Disk usage discrepancy?

I noticed that the filewalker only uses a single thread to traverse the entire storage space. Since this is clearly an IO bottleneck, wouldn't it be faster to use multiple threads?

This may cause more IO while the filewalker is running. Although the OS and filesystem can combine multiple IO operations, it would be better to provide an option to adjust the number of threads.

Running a multi-threaded traversal on a spinning disk is a bad idea; the latency will grow dramatically unless you use a cache (SSD or RAM). However, you can disable the lazy filewalker to raise its priority to normal and speed up the process:

pieces.enable-lazy-filewalker: false
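To apply it, add that line to your config.yaml and restart the node. If you want to check the flag's description and default, the node binary can print it (assuming you can run the binary directly, as in the help command shown later in this thread):

./storagenode setup --help | grep lazy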

I see. I thought I might have missed a new feature setting.
There have been ideas for such a process during startup: Large temporary folder - #7 by nerdatwork
It should not be too hard to implement.

If you search the forum for "temp folder", you will find quite a few people, myself included, who have experienced or are still experiencing this issue.

Have you actually tested it? Modern HDDs are not that bad; NCQ allows the controller to reorder requests and read more efficiently, but only if there is enough IO depth.

In any case, it would be better to have an option. For nodes on RAID or network storage, increasing IO depth can significantly improve efficiency, and node operators could also set it to 1 to avoid overloading a single hard disk.

It has been tested by operators who tried to run several nodes on the same disk/pool (even though that is against the ToS). They got worse results: all their nodes were slow and affected each other, and they ended up with a discrepancy because the filewalker failed with "context canceled" errors (caused by high latency).

It's the reverse. RAID with parity is only as fast as the slowest disk in the pool, and such setups usually have much worse results than nodes running one per disk, without any RAID, on a native filesystem for their OS: ext4 for Linux, NTFS for Windows.


This is not the same thing. Download and upload operations are already parallelized and generate enough IO depth, but the filewalker does not.

In most cases, RAID will indeed be slower at an IO depth of 1, but at a higher IO depth RAID can outperform a single hard disk while providing higher reliability at the same time. Since node lifetime affects revenue, higher reliability isn't a bad thing.

With parity, no, unfortunately, and that is the most useful kind of RAID.
Without a cache you will have a much slower disk subsystem.

Maybe I am confusing something (I remember setting it to 4 MiB on my slow node), but here is a screenshot from my new second node, where I checked that I did not change it in the yaml.
The temp folder is in action there as well.

I have this problem in the log:

2023-12-29T10:58:08Z	INFO	Telemetry enabled	{"instance ID": "1XtRu6nGS2HmWxpBV7KpreymYBPdXgQPVR7QPXfGxYhDZaHA9h"}
2023-12-29T10:58:08Z	INFO	Event collection enabled	{"instance ID": "1XtRu6nGS2HmWxpBV7KpreymYBPdXgQPVR7QPXfGxYhDZaHA9h"}
2023-12-29T10:58:08Z	INFO	db.migration	Database Version	{"version": 54}
2023-12-29T10:58:08Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.
2023-12-29T10:58:12Z	INFO	preflight:localtime	local system clock is in sync with trusted satellites' system clock.
2023-12-29T10:58:12Z	INFO	bandwidth	Performing bandwidth usage rollups
2023-12-29T10:58:12Z	INFO	Node 1XtRu6nGS2HmWxpBV7KpreymYBPdXgQPVR7QPXfGxYhDZaHA9h started
2023-12-29T10:58:12Z	INFO	Public server started on [::]:61607
2023-12-29T10:58:12Z	INFO	Private server started on 127.0.0.1:7778
2023-12-29T10:58:12Z	INFO	trust	Scheduling next refresh	{"after": "5h32m54.590921875s"}
2023-12-29T10:58:12Z	INFO	pieces:trash	emptying trash started	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2023-12-29T10:58:12Z	INFO	piecestore	download started	{"Piece ID": "P62BII54SQL4ULN3WUIY2GDV5YDLMUDIHRUI3J3WTDEB2N4DFJYA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 277504, "Remote Address": "79.127.201.209:41516"}
2023-12-29T10:58:12Z	INFO	lazyfilewalker.used-space-filewalker	starting subprocess	{"satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-12-29T10:58:12Z	WARN	piecestore:monitor	Disk space is less than requested. Allocated space is	{"bytes": 5629214683136}
2023-12-29T10:58:12Z	INFO	piecestore	download started	{"Piece ID": "GC4OEZJQZQVI6I55CADRSZWBSZJ3QVRXPQ5AUFWHXU3L3VKQWX6A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "Offset": 76288, "Size": 4864, "Remote Address": "130.61.114.233:56736"}
2023-12-29T10:58:12Z	ERROR	piecestore	download failed	{"Piece ID": "P62BII54SQL4ULN3WUIY2GDV5YDLMUDIHRUI3J3WTDEB2N4DFJYA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Remote Address": "79.127.201.209:41516", "error": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:616\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2023-12-29T10:58:12Z	INFO	piecestore	download started	{"Piece ID": "IZW2PQRIEZN6ENWOYASUXACI6TVQ5EDZZTQH55MJ6IAOKLBR2P2A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "Offset": 791552, "Size": 4864, "Remote Address": "130.61.78.120:44680"}
2023-12-29T10:58:12Z	INFO	piecestore	download started	{"Piece ID": "ZBNP2ZQLASSGUFXPTRZCHIZVIPRK266R66RORZXPLAGYY2AJ7CVQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "Offset": 1659648, "Size": 4864, "Remote Address": "152.53.13.86:49224"}
2023-12-29T10:58:12Z	INFO	lazyfilewalker.used-space-filewalker	subprocess started	{"satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2023-12-29T10:58:12Z	INFO	piecestore	download started	{"Piece ID": "XVJH3TPZG2JS5XUUHJGJ2BDJ3KDHEOS6ZDCQ7GB7MZDYR5HDFYXA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "Offset": 661248, "Size": 4864, "Remote Address": "152.53.13.54:53094"}
2023-12-29T10:58:12Z	INFO	piecestore	download started	{"Piece ID": "OB75BT3WFK3AIWD6J7L4LHXPQJIASWT7SUN45WXTEZLJLCAZ5QEA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "Offset": 1030656, "Size": 4608, "Remote Address": "130.61.78.120:44654"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "2MGJXDS5CZAUKPA4IKPZNY3VKTNZM5RNLI6JKOU7E2CVWKV6INIA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 5629214683136, "Remote Address": "79.127.203.194:44704"}
2023-12-29T10:58:12Z	INFO	lazyfilewalker.used-space-filewalker.subprocess	Database started	{"satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode"}
2023-12-29T10:58:12Z	INFO	lazyfilewalker.used-space-filewalker.subprocess	used-space-filewalker started	{"satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "ETRHSUEPX7GE4UA2YJNVWLCENEQ6RDJD7SJ4HFGDQKWPIMHYOVEA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 5629210488832, "Remote Address": "79.127.220.99:60318"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "WUQUXONFZSIEXP2EX2QUKFSB2WSFSPUXA3GPSDENPM2ICLURULOA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 5629210488832, "Remote Address": "79.127.219.33:33872"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "JK7BUNWOP7RTMRAKVSLAU5772ECMSFHM4P5PQL6UOQAGCKU433XQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 5629210488832, "Remote Address": "5.161.117.79:63856"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "HJJC7BSMKEI5FXQYPW4WSYLP3BUKSXDBJ4AOZ2UVCQRN2QHVIYMA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 5629210488832, "Remote Address": "79.127.205.226:42984"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "MJKVP7NRWVM6YNKTRMDK5CKXMKDCR3S4HQSX5UPH442MNZW4B77A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 5629210488832, "Remote Address": "79.127.203.194:44690"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "PUH5A7ZPHUSFJQ6JXB7QF4MANDSTXKPNW7XAS6ZOXGOX5YB43Z3Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 5629189517312, "Remote Address": "72.52.83.203:59066"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "TLMFKATTKCB3KFATEC3J3RAIYBZ3PLSSRZN45EGRVLBO7YYAMZJA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 5629189009408, "Remote Address": "79.127.203.193:58860"}
2023-12-29T10:58:12Z	INFO	piecestore	upload started	{"Piece ID": "HBTIHBF3XB4UFDYHRPI45OIZDM24YXBTPMFAMWYJINY4RWFHJKPA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 5629189009408, "Remote Address": "5.161.77.10:26548"}
2023-12-29T10:58:12Z	ERROR	blobscache	piecesTotal < 0	{"piecesTotal": -35840}
2023-12-29T10:58:12Z	ERROR	blobscache	piecesContentSize < 0	{"piecesContentSize": -35840}
2023-12-29T10:58:12Z	ERROR	blobscache	satPiecesTotal < 0	{"satPiecesTotal": -145920}
2023-12-29T10:58:12Z	ERROR	blobscache	satPiecesContentSize < 0	{"satPiecesContentSize": -145408}
2023-12-29T10:58:12Z	INFO	collector	deleted expired piece	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "6YCGJF24VUFIMNGYLGJWIFZANT2X4PBKW26QR5D5D6WCSBMUCN3A"}
2023-12-29T10:58:13Z	ERROR	blobscache	satPiecesTotal < 0	{"satPiecesTotal": -145920}
2023-12-29T10:58:13Z	ERROR	blobscache	satPiecesContentSize < 0	{"satPiecesContentSize": -145408}
2023-12-29T10:58:13Z	INFO	collector	deleted expired piece	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "DMRAWFLBRIK4OXQD56YMHC5TJ2Y572ENQCGQ7R6YQK6Z7SQSN5WA"}
2023-12-29T10:58:13Z	ERROR	blobscache	satPiecesTotal < 0	{"satPiecesTotal": -17920}
2023-12-29T10:58:13Z	ERROR	blobscache	satPiecesContentSize < 0	{"satPiecesContentSize": -17408}
2023-12-29T10:58:13Z	INFO	piecestore	download started	{"Piece ID": "3CK4WNN7HQ3W2EKAXO4ZLMYRPJCP7IECK3UUYNRWBNXD2WQVZW3A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "Offset": 1038592, "Size": 4608, "Remote Address": "130.61.114.233:48172"}
2023-12-29T10:58:13Z	INFO	collector	deleted expired piece	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "JRP475XDCWIC6HQ5REINKMLJRWMXDBHQGKWPCCZV46KLC7OD2JVA"}
2023-12-29T10:58:13Z	INFO	collector	collect	{"count": 3}

There are two params: one is 128 KiB, the other 4 MiB. I think what you see is the result of the other one. As I understand it, one sets the buffer size in RAM, and the other sets the preallocation size on the HDD.

Found it!
It's: pieces.write-prealloc-size: 4.0 MiB

Wouldn't the max piece size be more logical?

In RAM, things are probably fast enough to be insignificant… but what does the 128 KiB setting do?

Somehow the disk does not have 5.6 TB?
Could you provide system specs: filesystem, cluster size, OS, disk type, non-Storj files on it, etc.?

./storagenode setup --help | grep write-

      --filestore.write-buffer-size memory.Size                  in-memory buffer for uploads (default 128.0 KiB)
      --pieces.write-prealloc-size memory.Size                   file preallocated for uploading (default 4.0 MiB)

For what? If the piece is smaller than that, the file will shrink; if it is bigger, it will have to grow anyway. This param is just trying to reduce fragmentation. On some sophisticated filesystems it could improve performance.

It will use this size to allocate memory blocks, and the data will be written to the disk when the 128 KiB buffer is filled. Increasing this parameter would allow collecting more of a piece in memory before flushing it to disk, but it will also increase total memory usage, since the buffer is allocated per transfer.
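If you do want to experiment, both knobs live in config.yaml and use the names shown by the help output above. A minimal sketch, assuming you are willing to spend a bit more RAM per transfer (the values are only an illustration, not a recommendation):

filestore.write-buffer-size: 4.0 MiB
pieces.write-prealloc-size: 4.0 MiB

Setting the buffer to the preallocation size means most pieces can be collected fully in memory before being flushed, at the cost of up to 4 MiB of RAM per concurrent upload.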

This means problems with the network: something is blocking your outgoing connections, or your DNS doesn't work and you need to change it, for example to 8.8.8.8.
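If the node runs in Docker (an assumption; adjust to your own setup), one way to force a working resolver is to pass it to the container explicitly, for example:

docker run -d --dns 8.8.8.8 <your usual options> storjlabs/storagenode:latest

Otherwise, change the DNS server in your OS or router settings.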

These errors are the result of a filewalker that has not finished. You need to wait until it finishes the calculation and make sure that there are no restarts during the process.

I would also suggest disabling the lazy filewalker to speed up the process:

pieces.enable-lazy-filewalker: false

Save the config and restart the node.
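For a Docker node that could look like this (the container name storagenode is an assumption, and -t gives it time to shut down gracefully):

docker restart -t 300 storagenode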

I get that it tries to reduce fragmentation by gathering more of a piece in RAM before transferring it to disk, but:

  • enterprise drives have 256/512 MB of cache, and
  • Linux ext4 has a built-in mechanism to prevent fragmentation.
    Can't these two things take care of the pieces on their own, by storing them in the HDD cache and writing them in 4 MB chunks, without occupying RAM?

Likely they can. So you may change these values to suit your setup (I do not think they actually help much, though).

I agree. My new node is cached in RAM anyway, and the old one does fine and is nearly full.

The HDD cache is managed by the drive and the OS. Its purpose is to buffer data, not to provide RAM. NCQ and similar mechanisms are also at work here.

Yes, I have followed your advice since last week and restarted all nodes without those two settings. I needed to do it anyway, because it's time to add new drives to each Syno and I need to free up all the RAM I can. I'm really curious how my new 22 TB Exos will work, along with the 16 TB one, in my NASes. I hope 18 GB of RAM is enough.

You don't use NTFS, I assume.

Synology, ext4, no RAID, no SSD cache. Works like a charm. :sunglasses:
