storage2.piece-scan-on-startup: false we are seeing crazy massive reads on node startup. What is this?
More like a checking/cleaning trash, but you may use a debug endpoint to see what’s this.
rMB/s: 7.51, rrqm/s: 1724 and a utilization up to 100% for minutes on every startup.
If that’s required just for cleaning trash, that sounds insane.
Only trace from the debug endpoint can say, what is exactly it doing.
Is it this one?
[7530092614813486020,3333302239938971669] storj.io/storj/storagenode/pieces.(*TrashChore).Run() (elapsed: 6m54.901722685s, orphaned) [1942410985489664106,3333302239938971669] storj.io/storj/storagenode/pieces.(*Store).EmptyTrash() (elapsed: 41.157451163s) [5081579554688100975,3333302239938971669] storj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash() (elapsed: 41.157451433s) [8220748123886537845,3333302239938971669] storj.io/storj/storagenode/blobstore/filestore.(*Dir).EmptyTrash() (elapsed: 41.157452684s) [2136544656230198906,3333302239938971669] storj.io/storj/storagenode/blobstore/filestore.(*Dir).walkNamespaceInPath() (elapsed: 41.157453838s)
It looks like something about the trash.
I have just checked the size of the trash folder, it is 175GB on this node?
What exactly is this chore doing on every startup, no matter how recent the last start was?
I think so. It also doing that every hour:
docker exec -it storagenode2 ./storagenode setup --help | grep "collector.interval" --collector.interval duration how frequently expired pieces are collected (default 1h0m0s)
But what is this process doing? Is it going through every single file in the trash? And for what? And why on every startup even if the last start was just a couple of minutes ago? This does not make any sense.
I haven’t seen the same load like on startup every hour. So I am not sure if this is the same thing.
It checks last modified time to remove expired pieces and refreshes the cache (DB). I think it also checks all pieces in blobs, registered with expiration date, it’s stored inside pieces as a metadata.
Maybe this design needs a rework like the lazy-filewalker.
It does not sound to me like task that is required on every startup nor to run with high or even normal priority.
Also going through every file on startup again and again sounds like strange design. Modification times should not change I guess at least once a file has been moved to the trash. Therefore reading it from a file once should be sufficient.
And how it supposedly should figure out, which files are needed to remove from the trash without checking the modification time and comparing it with the current date?
If does not check dates, files could be here more than for 7 days, and we likely see complains about the next “bug” - why is it not removed asap when 7 days are passed.
Don’t we have an expiration database? I mean the moment we move a file into the trash we know that in +7 days it has to get removed. So this information could be stored in the database?
And then a low priority background task could query the database over the day whenever the node is not very busy and delete pieces.
But in any case, if the day is the relevant factor, checking the dates would be sufficient once a day.
I would love to be able to set a time as a config parameter at which these processes start every day. Right now it looks like it’s running on startup and then every day again around the container start-time.
That means that on my raspberry pi that is connected to multiple HDD‘s and has multiple nodes, the process usually runs for all storagenodes at the same time, because after a reboot all of them are started together.
Would be really nice to set different times for different nodes on the same machine to avoid overloading my little raspberry for one-two hours every day. Rest of the day he is more or less relaxed.
Fixed times are always bad as you really don’t know when the node gets hammered by up- or downloads or whatever is required.
In any case it would be required to define those tasks as low priority so that would not interfere. I think some resource optimization of the node software would be good.
This database is basically cache and it’s updated when scanned pieces.
You may post it as an idea in the Storage Node feature requests - voting - Storj Community Forum (official) category or as an issue on our GitHub.
you may start nodes with a delay, see
- Services top-level element | Docker Documentation
- Docker-Compose Startup Delay - #2 by zvin - Product support - balenaForums
or write a batch script:
#!/bin/bash docker compose up -d storagenode1 sleep 3600 docker compose up -d storagenode2 sleep 3600 docker compose up -d storagenode3
However, if your nodes on a separate disks, the parallel execution should not be a big problem.
And as these massive reads seem to be repeating I would like to know what is the current interval for the TrashChore as there seems to be no config setting to influence it. So what makes it start apart from node startup and at which times?
I think you are right with that observation. On one node it has just started again causing massive reads:
[8451136184016877735,3844120539144758204] storj.io/storj/storagenode/pieces.(*TrashChore).Run() (elapsed: 24h2m55.26998826s, orphaned) [6348737124026641307,3844120539144758204] storj.io/storj/storagenode/pieces.(*Store).EmptyTrash() (elapsed: 2m50.150998187s) [7309767258812830858,3844120539144758204] storj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash() (elapsed: 2m50.150997772s) [8270797393599020410,3844120539144758204] storj.io/storj/storagenode/blobstore/filestore.(*Dir).EmptyTrash() (elapsed: 2m50.150998758s) [4948374601870305559,3844120539144758204] storj.io/storj/storagenode/blobstore/filestore.(*Dir).deleteWithStorageFormatInPath() (elapsed: 76.419µs) [8455491530434153,3844120539144758204] storj.io/storj/storagenode/blobstore/filestore.(*Dir).walkNamespaceInPath() (elapsed: 2m50.151005695s)
And there is no config.yaml setting for this.
Something really needs to change with this process or my Storj days might be coming to an end very quickly… which I really don’t want to happen. My server keeps crashing now due to this. Server has been running fine all month but had to reboot for other reasons. Reboot and minutes later the whole system locks up over and over again. Filewalker is off on all of them, yet 17 drives all pegged at 100%. Databases all run on an SSD pool but that literally makes no differene anyway. If I don’t start the nodes it runs fine. With them running CPU is only ~10% and RAM is 33/128 so no issues there. But again, all drives used for Storj are pegged. I don’t have any other tasks accessing the drives, it’s all Storj. There’s no way this is a sustainable if this is how it confinues to work.