
Prevent multiple instances of "ipfs bitswap reprovide" running at the same time #10513

Open
3 tasks done
Tracked by #10499
LeDechaine opened this issue Sep 14, 2024 · 1 comment
Labels
effort/hours Estimated to take one or several hours exp/intermediate Prior experience is likely helpful kind/bug A bug in existing code (including security flaws) P2 Medium: Good to have, but can wait until someone steps up

Comments

@LeDechaine

Checklist

Installation method

ipfs-update or dist.ipfs.tech

Version

Kubo version: 0.29.0
Repo version: 15
System version: amd64/linux
Golang version: go1.22.4

Config

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic-v1",
      "/ip4/0.0.0.0/udp/4001/quic-v1/webtransport",
      "/ip6/::/udp/4001/quic-v1",
      "/ip6/::/udp/4001/quic-v1/webtransport"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic-v1/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false
    }
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "Libp2pStreamMounting": false,
    "OptimisticProvide": true,
    "OptimisticProvideJobsPoolSize": 120,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "DeserializedResponses": null,
    "DisableHTMLErrors": null,
    "ExposeRoutingAPI": null,
    "HTTPHeaders": {},
    "NoDNSLink": true,
    "NoFetch": false,
    "PublicGateways": {
      "k51qzi5uqu5dj4zil10lqlbtckpmozoxghycqhtksngn215toulwb3n8k9sv2k": {
        "NoDNSLink": false,
        "Paths": []
      }
    },
    "RootRedirect": ""
  },
  "Identity": {
    "PeerID": "12D3KooWA6HLX9ebnT91TktUzRNx3WJta6Ks1FZrVetyU7AY9Rjf"
  },
  "Import": {
    "CidVersion": null,
    "HashFunction": null,
    "UnixFSChunker": null,
    "UnixFSRawLeaves": null
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128,
    "UsePubsub": true
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {},
  "Routing": {
    "Methods": null,
    "Routers": null
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

A more accurate title for this issue might be: Prevent multiple instances of "ipfs bitswap reprovide" running at the same time.

This is similar to a previously reported issue, but with "Reprovider.Strategy" not set.

According to its help text, "ipfs bitswap reprovide" "triggers reprovider to announce our data to network", and running it is often recommended on forums as a way to make IPFS work better. Hosting only a 20 MB website (about 100 files) on IPFS, I set up a cron job on two different VPSes to run "ipfs bitswap reprovide" every hour. Running the command manually appeared to just hang with no output whatsoever (which is probably a bug in itself), and I had to Ctrl+C out of it, but I added it to crontab anyway. TL;DR: don't.

Here's "journalctl -u ipfs" on server 1. (So yes, my new VPS actually meets the minimum IPFS requirements: it is a quad-core with 8 GB of RAM. The "ipfs config show" output above is from this machine.)

Sep 12 01:50:42 server systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 12 01:50:42 server systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Consumed 1h 9min 48.690s CPU time.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 12.
(...)
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 16.
Sep 12 11:18:06 server systemd[1]: Stopped ipfs.service - IPFS daemon.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.

"journalctl -u ipfs" on server 2 (a single-core with 512 MB of RAM, hosting one website and now running with "NoFetch"):

Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 1.
Sep 09 00:09:18 server2 systemd[1]: Stopped IPFS daemon.
Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Consumed 9min 10.547s CPU time.
Sep 09 00:09:18 server2 systemd[1]: Started IPFS daemon.
Sep 09 02:45:53 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 3.
(...)
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Consumed 16min 24.878s CPU time.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 5.

Yet server2 shows no restarts in the last 4 days?

"ps aux | grep ipfs" on server2:

ledecha+ 16088 2.5 12.8 2294940 56084 ? Ssl Sep11 40:41 ipfs daemon --migrate=true --enable-gc --routing=dhtclient
ledecha+ 16194 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16195 0.0 1.6 1659356 7276 ? Sl Sep11 0:29 ipfs bitswap reprovide
ledecha+ 16479 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16480 0.0 0.0 1733088 0 ? Sl Sep11 0:31 ipfs bitswap reprovide
ledecha+ 16730 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16731 0.0 0.0 1659356 4 ? Sl Sep11 0:24 ipfs bitswap reprovide

...which gave me 26 instances of "ipfs bitswap reprovide" running.

Long story short: executing "ipfs bitswap reprovide" every hour, even for 20 MB (about 200 files), is too much, and will systematically crash your IPFS daemon, even on a quad-core with 8 GB of RAM. Big server or not, this is definitely not the intended result. IPFS worked fine (stable, no crashes) for multiple months without "ipfs bitswap reprovide" as a cron job, even on the VPS with 1 core and 512 MB of RAM.

Maybe I was an idiot for running "ipfs bitswap reprovide" every hour, and maybe that's why IPFS crashed. If so, at a minimum, I recommend preventing a reprovide from starting while another reprovide job is already in progress.
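Until such a daemon-side guard exists, overlapping cron runs can be prevented on the caller's side. A hedged sketch using flock(1) (the lock path is an arbitrary choice, and a `sleep` stands in for the real command in the demonstration):

```shell
# Crontab entry idea: skip this run if the previous one still holds the lock.
#   0 3 * * *  flock -n /tmp/ipfs-reprovide.lock ipfs bitswap reprovide
#
# Demonstration of the skip behaviour with a placeholder long-running job:
flock -n /tmp/ipfs-reprovide.lock sleep 2 &
sleep 0.5
flock -n /tmp/ipfs-reprovide.lock true || echo "previous run still in progress; skipping"
wait
# prints "previous run still in progress; skipping"
```

With `-n`, flock exits non-zero immediately when the lock is held instead of queueing behind it, so invocations never pile up the way the `ps aux` output below shows.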

@LeDechaine LeDechaine added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Sep 14, 2024
@lidel
Member

lidel commented Sep 17, 2024

The default Reprovider.Interval is once every 22 hours. Modern Amino DHT servers remember records for 48h (libp2p/go-libp2p-kad-dht#793); old ones remembered them for 24h. There should be no reason to provide more often than once a day.

Forcing a reprovide every hour via cron is definitely not doing you any good, especially if providing your CIDs takes longer than this shortened interval. You should just disable the cron job and rely on Reprovider.Interval.
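For reference, the interval can also be pinned explicitly in the config (a sketch; "22h" and "all" are the documented defaults, so the empty `"Reprovider": {}` section in the config above already behaves this way):

```json
"Reprovider": {
  "Interval": "22h",
  "Strategy": "all"
}
```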

A more accurate title for this is possibly: Prevent multiple instances of "ipfs bitswap reprovide" running at the same time.

We don't have a global mutex around ipfs bitswap reprovide (it is backed by Provider.Reprovide(req.Context) from boxo, which always forces a run).
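Such a guard could look roughly like the following sketch (hypothetical names; Kubo's actual command plumbing differs), using sync.Mutex.TryLock (Go 1.18+) so a second concurrent call fails fast instead of stacking up:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reprovideMu is a hypothetical process-wide guard; this only
// illustrates the fail-fast pattern, not Kubo's real internals.
var reprovideMu sync.Mutex

// reprovideOnce runs fn unless another reprovide is already in flight,
// in which case it returns an error immediately instead of queueing.
func reprovideOnce(fn func() error) error {
	if !reprovideMu.TryLock() { // non-blocking acquire, Go 1.18+
		return errors.New("reprovide already in progress")
	}
	defer reprovideMu.Unlock()
	return fn()
}

func main() {
	fmt.Println(reprovideOnce(func() error { return nil })) // <nil>
}
```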

This is a sensible bug to fix as part of the reprovider work we plan to do (cc @gammazero).

@lidel lidel changed the title "ipfs bitswap reprovide" command gives no answer, and just crashes the ipfs daemon when run in the background. Prevent multiple instances of "ipfs bitswap reprovide" running at the same time Sep 17, 2024
@lidel lidel added P2 Medium: Good to have, but can wait until someone steps up exp/intermediate Prior experience is likely helpful effort/hours Estimated to take one or several hours and removed need/triage Needs initial labeling and prioritization labels Sep 17, 2024
@lidel lidel mentioned this issue Sep 17, 2024
31 tasks