I don’t know because it’s a colocation setup, I just rent a 42U rack.
About one drive every two months. Most of them died after ~2.5 years of uptime.
We can take advantage of the two-level directory structure here, and sort the two-letter directories on our own.
Remember the last two-letter directory scanned. When running walkNamespaceInPath, first call Readdirnames() to learn all two-letter directories, then sort the list alphabetically, and only then run the inner loop, starting from the directory after the remembered one. Update the database counter after every few two-letter directories, maybe not more often than every 5 minutes.
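A minimal sketch of that resumable walk in Go, assuming a hypothetical blobs layout of `<root>/<two-letter prefix>/<piece files>` and made-up helper names; the real walkNamespaceInPath works differently, this only illustrates the cursor idea:

```go
package walker

import (
	"os"
	"path/filepath"
	"sort"
	"time"
)

// walkSortedPrefixes walks the two-letter prefix directories in alphabetical
// order, skipping everything up to and including lastDone (the cursor read
// from the database), and persists progress via saveCursor at most once per
// saveEvery interval.
func walkSortedPrefixes(root, lastDone string, saveEvery time.Duration,
	walkPrefix func(dir string) error, saveCursor func(prefix string) error) error {

	entries, err := os.ReadDir(root)
	if err != nil {
		return err
	}
	var prefixes []string
	for _, e := range entries {
		if e.IsDir() && len(e.Name()) == 2 {
			prefixes = append(prefixes, e.Name())
		}
	}
	sort.Strings(prefixes) // a fixed order is what makes the cursor meaningful

	lastSave := time.Now()
	for _, p := range prefixes {
		if lastDone != "" && p <= lastDone {
			continue // already counted before the restart
		}
		if err := walkPrefix(filepath.Join(root, p)); err != nil {
			return err
		}
		if time.Since(lastSave) >= saveEvery {
			if err := saveCursor(p); err != nil {
				return err
			}
			lastSave = time.Now()
		}
	}
	return saveCursor("") // finished: clear the cursor
}
```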
We could also try reducing the mismatch caused by uploads/deletes that happen while the file walker is running, though the separate file walker process makes this cumbersome. It would go like this: for each upload/delete, check whether its two-letter directory has already been scanned; if it has, update the total. Setting up additional communication with the lazy file walker process might be quite cumbersome, though…
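And a rough sketch of that adjustment, again with hypothetical names; how the cursor would be shared across the process boundary is exactly the cumbersome part and is not shown here:

```go
package walker

// adjustForLiveChange applies an upload (+size) or delete (-size) to the
// walker's running total only if the piece's two-letter prefix has already
// been scanned, i.e. it sorts at or before the current cursor. Hypothetical
// helper; it assumes the cursor is somehow visible to the main process.
func adjustForLiveChange(cursor, piecePrefix string, sizeDelta int64, runningTotal *int64) {
	if cursor != "" && piecePrefix <= cursor {
		*runningTotal += sizeDelta
	}
}
```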
You don’t have a separate item for power on your colocation bill?
it will report much less used space than it does now. It should not update the databases with temporary results.
I agree with @Toyoo on taking advantage of the two-level directory structure. In fact, I’ve been thinking along these lines as well.
We can use the database as a means of communication since the lazyfilewalker subprocess also has access to the DB. Or we can pipe the data continuously to stdout just like we do for the logs until the parser detects the final data.
We can set it up so the result is reported as incomplete data (but serves as a cursor for the lazyfilewalker) and is not used until the lazyfilewalker is actually done with the remaining files.
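Purely as an illustration of the "incomplete data plus cursor" idea, the subprocess could write something like this into a shared table. The schema, names, and SQLite-style SQL are assumptions for this sketch, not the real storagenode database; the main process would ignore rows where completed is false when reporting used space:

```go
package progress

import "database/sql"

// saveProgress upserts the walker's intermediate state. completed stays false
// until the walk finishes, so partial_total is never mistaken for the real
// used-space figure. Table and column names are made up for this sketch.
func saveProgress(db *sql.DB, satelliteID, lastPrefix string, partialTotal int64, completed bool) error {
	_, err := db.Exec(`
		INSERT INTO used_space_progress (satellite_id, last_prefix, partial_total, completed)
		VALUES (?, ?, ?, ?)
		ON CONFLICT(satellite_id) DO UPDATE SET
			last_prefix   = excluded.last_prefix,
			partial_total = excluded.partial_total,
			completed     = excluded.completed`,
		satelliteID, lastPrefix, partialTotal, completed)
	return err
}
```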
Maybe the optimisation process in UltraDefrag on NTFS can sort them on disk in order; I see such settings under the Optimisation tab in the preferences.
That's not something we can depend on. The first file uploaded after the defrag completes may already place a new directory entry out of order.
Thanks! That pushed me in the right direction; I found multiple "context canceled" errors under the used-space-filewalker.
So I just moved the node to a more powerful host; the container was peaking at around 5 GB RAM during the used-space-filewalker process and went down to 400 MB once the process finished.
Other threads suggest the "context canceled" errors happen because the disk (Seagate IronWolf 6 TB) can't keep up, but with no SSD cache there was nothing left to do but let it consume the RAM just to complete the process.
2024-01-19T10:40:42Z INFO lazyfilewalker.used-space-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2024-01-19T10:40:42Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2024-01-19T23:08:58Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker completed {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode", "piecesTotal": 2583159110780, "piecesContentSize": 2573010075772}
2024-01-19T23:08:58Z INFO lazyfilewalker.used-space-filewalker subprocess finished successfully {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
Thank you elek!
was peaking at around 5 GB RAM during the used-space-filewalker process
This is usually an indication that the disk subsystem is too slow.
Is it a single disk? How is it connected?
This is usually an indication that the disk subsystem is too slow.
The mantra is: “Use what you have” and “Don’t invest”.
That means the storagenode software must adapt to all kinds of different setups and function properly.
I agree, so we are trying to collect as much information as possible, and we may be able to help solve the issue.
As a result, it might improve the node or our documentation, as we did there:
Please check:
Yeah, I assumed so as well: single disk, single node over USB 3.0, ext4.
I have multiple nodes with a similar setup and the same type of HDD, and only this node consumed that much during the used-space-filewalker process, so I really can't explain it other than classifying it as an anomaly; no faults detected via SMART or otherwise, and defrag has already been run.
This is from another node running the same type of HDD over USB 2.0, ext4, on a Raspberry Pi 3 B+; the container is limited to 400 MB RAM. The disk is already full, so it does not really have to deal with the filewalker running and accepting loads of data at the same time.
2024-01-19T20:11:31Z INFO lazyfilewalker.used-space-filewalker.subprocess Database started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}
2024-01-19T20:11:31Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}
2024-01-19T22:53:08Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker completed {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode", "piecesTotal": 241228363008, "piecesContentSize": 241057498368}
so I really can't explain it
It's because of the USB connection. However, if you have this amount of RAM available, it should not be an issue.
The mantra is: “Use what you have” and “Don’t invest”.
That means the storagenode software must adapt to all kinds of different setups and function properly.
No, this means we should have better guidelines on the lowest reasonable specs a node requires, so that prospective node operators with low-spec setups don't go in expecting the node to work.
So will there be a solution to the problem?
TL;DR: with a high number of segments on one node, the bloom filter is less effective than it should be. A fix is on the way.
You can track the GitHub issue linked in the post above.
I have a node with about 54M. Maybe you need data from this node?
I have a node with about 54M. Maybe you need data from this node?
No, but thanks for the offer. We have enough data; the problem can be reproduced with the data already collected. We are waiting for the next release deployment; after that we will increase the size of the bloom filter.
It can be bumped up to 5 MB, which will slowly (!) clean up all the nodes. For faster cleanup we need more code changes (5 MB is the limit of a single request/response DRPC message, but we can switch to a streaming-based approach; it just needs more code changes).
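For context on why the filter size matters, here is the standard Bloom-filter false-positive estimate. The formula is generic textbook math and the numbers in the comment are only illustrative; they are not Storj's actual garbage-collection parameters:

```go
package bloom

import "math"

// falsePositiveRate approximates the false-positive probability of a Bloom
// filter with mBits bits, k hash functions and n inserted elements:
// p ≈ (1 - e^(-k·n/m))^k.
func falsePositiveRate(mBits, k, n float64) float64 {
	return math.Pow(1-math.Exp(-k*n/mBits), k)
}

// With a fixed filter size, the rate climbs quickly as the piece count grows,
// e.g. (illustrative numbers only, assuming a 2 MB ≈ 16 million bit filter
// and k = 7):
//
//	falsePositiveRate(16e6, 7, 1e6)  // ≈ 0.0007: fine for ~1M pieces
//	falsePositiveRate(16e6, 7, 54e6) // ≈ 1.0: nearly useless at ~54M pieces
//
// which is why a node with tens of millions of pieces keeps far more garbage
// than it should until the filter is made larger.
```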
So all we need to do is to enable FileWalker and keep the node up to date?
As always, I guess.
As always
Yes, exactly…