
Prevent multiple instances of "ipfs bitswap reprovide" running at the same time #10513

Open
3 tasks done
Tracked by #10499
LeDechaine opened this issue Sep 14, 2024 · 1 comment
Labels
effort/hours Estimated to take one or several hours exp/intermediate Prior experience is likely helpful kind/bug A bug in existing code (including security flaws) P2 Medium: Good to have, but can wait until someone steps up

Comments

@LeDechaine

Checklist

Installation method

ipfs-update or dist.ipfs.tech

Version

Kubo version: 0.29.0
Repo version: 15
System version: amd64/linux
Golang version: go1.22.4

Config

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic-v1",
      "/ip4/0.0.0.0/udp/4001/quic-v1/webtransport",
      "/ip6/::/udp/4001/quic-v1",
      "/ip6/::/udp/4001/quic-v1/webtransport"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic-v1/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false
    }
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "Libp2pStreamMounting": false,
    "OptimisticProvide": true,
    "OptimisticProvideJobsPoolSize": 120,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "DeserializedResponses": null,
    "DisableHTMLErrors": null,
    "ExposeRoutingAPI": null,
    "HTTPHeaders": {},
    "NoDNSLink": true,
    "NoFetch": false,
    "PublicGateways": {
      "k51qzi5uqu5dj4zil10lqlbtckpmozoxghycqhtksngn215toulwb3n8k9sv2k": {
        "NoDNSLink": false,
        "Paths": []
      }
    },
    "RootRedirect": ""
  },
  "Identity": {
    "PeerID": "12D3KooWA6HLX9ebnT91TktUzRNx3WJta6Ks1FZrVetyU7AY9Rjf"
  },
  "Import": {
    "CidVersion": null,
    "HashFunction": null,
    "UnixFSChunker": null,
    "UnixFSRawLeaves": null
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128,
    "UsePubsub": true
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {},
  "Routing": {
    "Methods": null,
    "Routers": null
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

A more accurate title for this issue might be: Prevent multiple instances of "ipfs bitswap reprovide" running at the same time.

This is similar to a previously reported issue, but with "Reprovider.Strategy" not set.

According to its help text, "ipfs bitswap reprovide" "triggers reprovider to announce our data to network", and running it is often recommended on forums as a way to make IPFS work better. Hosting only a 20 MB website (about 100 files) on IPFS, I set up a cron job on two different VPSes to run "ipfs bitswap reprovide" every hour. Running the command manually appeared to just hang with no output whatsoever (which is probably a bug in itself), and I had to Ctrl+C out of it, but I added it to crontab anyway. TL;DR: don't.

Here's "journalctl -u ipfs" on server 1. (So yes, my new VPS actually meets the minimum IPFS requirements: it is a quad-core with 8 GB of RAM. The "ipfs config show" output above is from this machine.)

Sep 12 01:50:42 server systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 12 01:50:42 server systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Consumed 1h 9min 48.690s CPU time.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 12.
(...)
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 16.
Sep 12 11:18:06 server systemd[1]: Stopped ipfs.service - IPFS daemon.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.

"journalctl -u ipfs" on server 2 (a single-core with 512 MB of RAM, hosting one website and now running with "NoFetch"):

Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 1.
Sep 09 00:09:18 server2 systemd[1]: Stopped IPFS daemon.
Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Consumed 9min 10.547s CPU time.
Sep 09 00:09:18 server2 systemd[1]: Started IPFS daemon.
Sep 09 02:45:53 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 3.
(...)
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Consumed 16min 24.878s CPU time.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 5.

Yet server2 shows no restarts in the last 4 days?

"ps aux | grep ipfs" on server2:

ledecha+ 16088 2.5 12.8 2294940 56084 ? Ssl Sep11 40:41 ipfs daemon --migrate=true --enable-gc --routing=dhtclient
ledecha+ 16194 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16195 0.0 1.6 1659356 7276 ? Sl Sep11 0:29 ipfs bitswap reprovide
ledecha+ 16479 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16480 0.0 0.0 1733088 0 ? Sl Sep11 0:31 ipfs bitswap reprovide
ledecha+ 16730 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16731 0.0 0.0 1659356 4 ? Sl Sep11 0:24 ipfs bitswap reprovide

...which gave me 26 instances of "ipfs bitswap reprovide" running.

Long story short: executing "ipfs bitswap reprovide" every hour, even for 20 MB (about 200 files), is too much, and will systematically crash your IPFS daemon, even on a quad-core with 8 GB of RAM. Big server or not, this is definitely not the intended result. IPFS worked fine (stable, no crashes) for multiple months without "ipfs bitswap reprovide" as a cron job, even on the VPS with 1 core and 512 MB of RAM.

Maybe I was an idiot for running "ipfs bitswap reprovide" every hour, and maybe that's why IPFS crashed. If so, at a minimum, I recommend preventing a reprovide from starting while another reprovide job is already in progress.
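Until such a daemon-side guard exists, overlapping cron runs can be prevented on the caller's side. A hedged sketch using flock(1) (the lock path is an arbitrary choice, and a `sleep` stands in for the real command in the demonstration):

```shell
# Crontab entry idea: skip this run if the previous one still holds the lock.
#   0 3 * * *  flock -n /tmp/ipfs-reprovide.lock ipfs bitswap reprovide
#
# Demonstration of the skip behaviour with a placeholder long-running job:
flock -n /tmp/ipfs-reprovide.lock sleep 2 &
sleep 0.5
flock -n /tmp/ipfs-reprovide.lock true || echo "previous run still in progress; skipping"
wait
# prints "previous run still in progress; skipping"
```

With `-n`, flock exits non-zero immediately when the lock is held instead of queueing behind it, so invocations never pile up the way the `ps aux` output below shows.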

@LeDechaine LeDechaine added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Sep 14, 2024
@lidel
Member

lidel commented Sep 17, 2024

The default Reprovider.Interval is once every 22 hours. Modern Amino DHT servers remember records for 48h (libp2p/go-libp2p-kad-dht#793); old ones remembered them for 24h. There should be no reason to provide more often than once a day.

Forcing a reprovide every hour via cron is definitely not doing you any good, especially if providing your CIDs takes longer than this shortened interval. You should just disable the cron job and rely on Reprovider.Interval.
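For reference, the interval can also be pinned explicitly in the config (a sketch; "22h" and "all" are the documented defaults, so the empty `"Reprovider": {}` section in the config above already behaves this way):

```json
"Reprovider": {
  "Interval": "22h",
  "Strategy": "all"
}
```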

A more accurate title for this is possibly: Prevent multiple instances of "ipfs bitswap reprovide" running at the same time.

We don't have a global mutex around ipfs bitswap reprovide (it is backed by Provider.Reprovide(req.Context) from boxo, which always forces a run).
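Such a guard could look roughly like the following sketch (hypothetical names; Kubo's actual command plumbing differs), using sync.Mutex.TryLock (Go 1.18+) so a second concurrent call fails fast instead of stacking up:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reprovideMu is a hypothetical process-wide guard; this only
// illustrates the fail-fast pattern, not Kubo's real internals.
var reprovideMu sync.Mutex

// reprovideOnce runs fn unless another reprovide is already in flight,
// in which case it returns an error immediately instead of queueing.
func reprovideOnce(fn func() error) error {
	if !reprovideMu.TryLock() { // non-blocking acquire, Go 1.18+
		return errors.New("reprovide already in progress")
	}
	defer reprovideMu.Unlock()
	return fn()
}

func main() {
	fmt.Println(reprovideOnce(func() error { return nil })) // <nil>
}
```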

This is a sensible bug to fix as part of the reprovider work we plan to do (cc @gammazero).

@lidel lidel changed the title "ipfs bitswap reprovide" command gives no answer, and just crashes the ipfs daemon when run in the background. Prevent multiple instances of "ipfs bitswap reprovide" running at the same time Sep 17, 2024
@lidel lidel added P2 Medium: Good to have, but can wait until someone steps up exp/intermediate Prior experience is likely helpful effort/hours Estimated to take one or several hours and removed need/triage Needs initial labeling and prioritization labels Sep 17, 2024
@lidel lidel mentioned this issue Sep 17, 2024
31 tasks