Disk has now run out of space, even though allocated storage was set?

Hi, I have a Storj Node running on a Windows 10 OS.
I have a 3.6 TB disk dedicated for Storj data (D:\ drive).
I have Storj installed on C:\Program Files\Storj (NVMe drive)

I have set the allocated disk space to 3.10 TB, but my disk has now run out of space.
I have now tried to set the allocated disk space to 2 TB and restart the Storj service. It still has no effect, but it's only been a couple of hours, so there might be a delayed effect?

Does anyone have a clue to why my disk is filling up beyond the allocated disk space?

The log:
C:\Program Files\Storj\Storage Node\storagenode.log

It begins on 2022-12-19 and runs up to today's date, and the log is now over 12 GB in size. Is this normal?
Anyway, someone mentioned a thing called “filewalker”, and I see it mentioned in the log several times with warnings and errors:

|2023-11-08T16:38:41+01:00|WARN|piecestore:monitor|Used more space than allocated. Allocated space is|{bytes: 2000000000000}|
|---|---|---|---|---|
|2023-11-08T16:39:12+01:00|WARN|reputation:service|node scores worsened|{Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Total Audits: 842857, Successful Audits: 836612, Audit Score: 1, Online Score: 0.9168789216122301, Suspension Score: 0.8958375645475731, Audit Score Delta: 0, Online Score Delta: -0.009090909090909038, Suspension Score Delta: 0.030452454116918548}|
|2023-11-08T16:39:37+01:00|ERROR|pieces|failed to lazywalk space used by satellite|{error: lazyfilewalker: exit status 1, errorVerbose: lazyfilewalker: exit status 1\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:83\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:717\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75, Satellite ID: 12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB}|
|2023-11-08T16:39:37+01:00|ERROR|lazyfilewalker.used-space-filewalker|failed to start subprocess|{satelliteID: 12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo, error: context canceled}|
|2023-11-08T16:39:37+01:00|ERROR|pieces|failed to lazywalk space used by satellite|{error: lazyfilewalker: context canceled, errorVerbose: lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:71\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:717\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75, Satellite ID: 12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo}|
|2023-11-08T16:39:37+01:00|ERROR|lazyfilewalker.used-space-filewalker|failed to start subprocess|{satelliteID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, error: context canceled}|
|2023-11-08T16:39:37+01:00|ERROR|pieces|failed to lazywalk space used by satellite|{error: lazyfilewalker: context canceled, errorVerbose: lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:71\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:717\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE}|
|2023-11-08T16:39:37+01:00|ERROR|lazyfilewalker.used-space-filewalker|failed to start subprocess|{satelliteID: 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6, error: context canceled}|
|2023-11-08T16:39:37+01:00|ERROR|pieces|failed to lazywalk space used by satellite|{error: lazyfilewalker: context canceled, errorVerbose: lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:71\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:717\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75, Satellite ID: 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6}|
|2023-11-08T16:39:37+01:00|ERROR|lazyfilewalker.used-space-filewalker|failed to start subprocess|{satelliteID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, error: context canceled}|
|2023-11-08T16:39:37+01:00|ERROR|pieces|failed to lazywalk space used by satellite|{error: lazyfilewalker: context canceled, errorVerbose: lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:71\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:717\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S}|
|2023-11-08T16:39:37+01:00|ERROR|lazyfilewalker.used-space-filewalker|failed to start subprocess|{satelliteID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, error: context canceled}|
|2023-11-08T16:39:37+01:00|ERROR|pieces|failed to lazywalk space used by satellite|{error: lazyfilewalker: context canceled, errorVerbose: lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:71\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:717\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs}|
|2023-11-08T16:39:37+01:00|ERROR|piecestore:cache|error getting current used space: |{error: filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled, errorVerbose: group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:726\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:726\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:726\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:726\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:726\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:69\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:74\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:726\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}|
|2023-11-08T16:39:40+01:00|WARN|piecestore:monitor|Used more space than allocated. Allocated space is|{bytes: 2000000000000}|

I have exactly the same problem, especially on SMR drives. It seems to be because the real used storage is never adequately updated: essentially, the filewalker never manages to finish before the node is restarted (and the filewalker then starts over).

The problem even remains after disabling the lazy filewalker (falling back to the more IO-intensive filewalker) and resetting piece_spaced_used.db.

So I’m actually very curious about the answer.

Temporary solution:

docker run -d --restart unless-stopped --stop-timeout 300 \
        --user $(id -u):$(id -g) \
        --mount type=bind,source="$sIDFolder",destination=/app/identity \
        --mount type=bind,source="$sNodeMnt",destination=/app/config \
        --log-opt max-size=20m --log-opt max-file=3 \
        --name "storagenode" storjlabs/storagenode:latest \
        --storage2.monitor.minimum-disk-space=100MB \
        --storage2.piece-scan-on-startup=true \
        --pieces.enable-lazy-filewalker=false

(The relevant part is the last three flags: lowering the minimum-disk-space check, forcing the piece scan on startup, and disabling the lazy filewalker.)
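For a Windows (non-docker) node the same settings should go into config.yaml rather than on the command line. The key names below simply mirror the flags above (storagenode command-line flags map to config.yaml keys), but please double-check them against your own config file:

storage2.monitor.minimum-disk-space: 100 MB
storage2.piece-scan-on-startup: true
pieces.enable-lazy-filewalker: false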

What file system and what cluster size are you using on the D: drive?

Yes, the D:\ drive is a spinning disk drive, where Storj data is located.

Storj is installed on Windows 10 Pro (no docker)
I'm unable to locate --pieces.enable-lazy-filewalker=false in the config.yaml.
But storage2.piece-scan-on-startup is now set to true, and I have restarted the Storj service.
Thanks :slight_smile:

Running NTFS with a 4096-byte cluster size.

So indeed, assuming nothing else takes this space, it will most likely be outdated bandwidth statistics.

Reduce your allocation to 500 GB, the minimum allowed. If your node believes it has more than 500 GB of data, this will stop uploads, allowing the used-space file walker to finish faster. The file walker is the easiest way to update the statistics. After it finishes and you see in the node's user interface that the used amount matches what you observe on the drive, you can restart the node with the original allocation.
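On a Windows GUI install that is roughly a one-line change in config.yaml plus a service restart (the key name and the "storagenode" service name are the usual ones for that install; verify against your own setup):

storage.allocated-disk-space: 500.00 GB

Then, from an elevated PowerShell:

Restart-Service storagenode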

You could potentially also update the numbers directly in the database, the file is bandwidth.db.

Why your node does not have accurate data is something you will have to investigate on your own, though.

It’s normal if you log INFO-level messages. Lots of node operators either log only at WARN or ERROR level, or invent various ways to rotate or compress logs.
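For example, on a Windows node the log level is a config.yaml setting (restart the service afterwards); the key is standard, the value is whatever level you prefer:

log.level: warn
# the log destination can also be changed with log.output (see the documentation)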


Thanks, I have set the allocation to 500 GB now, and will monitor…

And storage2.piece-scan-on-startup: true is the filewalker thing, I assume?

It looks like the log is already set to warn.

Yep!

Yep!


Ummm.
It's failing audits? That's a big problem.
Check the disk for errors:
S.M.A.R.T.
and probably chkdsk.
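For example, from an elevated PowerShell on the Windows node (the cmdlets below are standard Windows ones; a vendor S.M.A.R.T. tool such as CrystalDiskInfo will show more detail):

# read-only check first; chkdsk D: /f repairs errors but needs the volume taken offline
chkdsk D:
# rough S.M.A.R.T./reliability indication
Get-PhysicalDisk | Get-StorageReliabilityCounter | Format-List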

And at log level fatal, it only logs when the node gets stopped.

But here you have to investigate. Probably something is corrupted.


Please explain: how?
Because this definitely is a filewalker problem. In my case at least, it ends with the same filewalker errors now and then as for the TS (topic starter).

The file system probably doesn't have that much to do with it, because I use ext4. And no SMART issues here, although I'm curious whether the same goes for the TS.

No audit failures on my side, but that's also a good question for the TS to answer.

This is usually related to a slow disk subsystem; you may improve it if you are able to add more RAM (the OS will use it for cache).

Regarding modifying a database: you may stop and remove the container, open the database with sqlite3 (either the binary or a docker container with an sqlite3 image; you can search for one), see what is already there (a minimal sketch of just inspecting it follows below), and then add a correct record about the usage. But it is likely not bandwidth.db; rather piece_spaced_used.db and storage_usage.db.
I do not have the exact insert command, though; you may be able to figure it out from what is already there.
If you are not familiar with SQL at all and SQLite in particular, I would suggest fixing the underlying issue instead: the slow disk subsystem.
One of the workarounds is to reduce the allocation below the current usage (or just to 500 GB); the node will then stop receiving any ingress, which may help the filewalker finish. Do not disable the filewalker on start. You may disable the lazy one to make it run a little faster.
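A minimal sketch of just inspecting (not modifying) those databases with the sqlite3 shell, assuming the node is stopped and using an example path; the table and column names can differ between versions, so look at .tables/.schema first:

sqlite3 /path/to/storage/piece_spaced_used.db
.tables
.schema
SELECT * FROM piece_space_used;   -- use whatever table name .tables actually showed
.quit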


Right, as Alexey mentioned, it’s a different database. My mistake.

You can use any tool that can open an SQLite3 database; DBeaver is one nice option. But if you do not have the technical expertise to work with databases, you will likely make things worse.

I have run chkdsk, but there were no errors.

Looking in the config.yaml, it looks like the setting has been commented out by default:

#if set to true, all pieces disk usage is recalculated on startup
#storage2.piece-scan-on-startup: true

Does this mean it will not run?
Anyway, I have now uncommented the line:

storage2.piece-scan-on-startup: true

Stop the node. Delete/rename the logfile. Start the node.
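On the Windows GUI install that is roughly the following from an elevated PowerShell (service name and log path as in the standard install; adjust if yours differ):

Stop-Service storagenode
Rename-Item "C:\Program Files\Storj\Storage Node\storagenode.log" "storagenode.old.log"
Start-Service storagenode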


Is there data left over from the decommissioned test satellites?

It's the default behaviour to start the scan at startup.

But actually I think the filewalker is totally broken, since I've only started getting these errors recently. I'm using the configuration stated above. As far as I can see:

docker logs storagenode 2>&1 | grep -i walk

Doesn’t show any output.

So @Alexey, has the old filewalker been disabled at all? Or how can we check whether there's even a filewalker process running?
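For reference, the only check I could come up with, assuming the lazy filewalker shows up as a separate subprocess like in the log snippets elsewhere in this thread (with the lazy walker disabled the scan runs inside the main process, so the logs would be the only indicator):

# on the docker host
docker top storagenode
pgrep -af filewalker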

No RAM issues apparently on the nodes with these problems:

root@STORJ3:~# free -m
               total        used        free      shared  buff/cache   available
Mem:            1536         192        1136           0         207        1343
Swap:              0           0           0
root@STORJ3:~# df
Filesystem                          1K-blocks       Used Available Use% Mounted on
/dev/sda2                           131108864   75935260  51691796  60% /
none                                      492          4       488   1% /dev
tmpfs                                 8071764        300   8071464   1% /tmp
/dev/disk/by-partlabel/STORJ3-DATA 2883120408 2736088132    503096 100% /storj/disk
tmpfs                                 8071764          0   8071764   0% /dev/shm
tmpfs                                 3228708        216   3228492   1% /run
tmpfs                                    5120          0      5120   0% /run/lock
overlaid                              8071764        300   8071464   1% /MaintenanceScripts
overlaid                              8071764        300   8071464   1% /var/log
overlay                             131108864   75935260  51691796  60% /var/lib/docker/overlay2/3d81e627aca553035570b8bd2824ce139841d6fad90143f444bf604a21fa1c48/merged
root@STORJ7:~# free -m
               total        used        free      shared  buff/cache   available
Mem:            1536         198        1136           0         201        1337
Swap:              0           0           0
root@STORJ7:~# df
Filesystem                         1K-blocks      Used Available Use% Mounted on
/dev/sda2                          131108864  75934232  51692824  60% /
none                                     492         4       488   1% /dev
tmpfs                                8071764       276   8071488   1% /tmp
/dev/disk/by-partlabel/STORJ7-DATA 974648824 974132432    500008 100% /storj/disk
tmpfs                                8071764         0   8071764   0% /dev/shm
tmpfs                                3228708       216   3228492   1% /run
tmpfs                                   5120         0      5120   0% /run/lock
overlaid                             8071764       276   8071488   1% /MaintenanceScripts
overlaid                             8071764       276   8071488   1% /var/log
overlay                            131108864  75934232  51692824  60% /var/lib/docker/overlay2/e725d167d40466a120471cfe67f6d125cab48292625c3e218548a7b76683512e/merged

top is also showing <10% CPU use.
Strengthening the idea that there’s no file walker running…

Either you have configured log rotation, or you redirected the logs to a file (in which case you need to search the log file instead), or you have disabled the filewalker on start.

Unlikely.

And I can confirm that on my nodes:

$ grep walk /mnt/x/storagenode2/storagenode.log | tail
2023-11-09T12:00:09Z    INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-11-09T12:00:09Z    INFO    lazyfilewalker.used-space-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2023-11-09T12:00:09Z    INFO    lazyfilewalker.used-space-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2023-11-09T12:00:09Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started   {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "process": "storagenode"}
2023-11-09T16:33:41Z    INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2023-11-09T16:33:41Z    INFO    lazyfilewalker.gc-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2023-11-09T16:33:42Z    INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "process": "storagenode"}
2023-11-09T16:33:42Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "process": "storagenode", "createdBefore": "2023-09-19T17:00:07Z", "bloomFilterSize": 364472}
2023-11-09T16:42:38Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "process": "storagenode", "piecesCount": 610788, "piecesSkippedCount": 0}
2023-11-09T16:42:39Z    INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}