Used space gets reset after restart

Hello @acoustician,
Welcome to the forum!

By default, the scan on startup is enabled, so if you restarted the node, the used-space-filewalker should fix the issue.
If you see that all used-space-filewalkers completed the scan without issues, and after an hour the dashboard is still not updated to the actual usage, then it could be a bug.
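
For reference, the startup scan is controlled by this option in the storagenode’s config.yaml (shown with its default value; please verify the exact option name for your version):

# enables the used-space scan when the node starts (default: true)
storage2.piece-scan-on-startup: true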

The possible workaround is described above, but I didn’t get confirmation that it actually helps.

I tried to reproduce it with storj-up built from the main branch, and the used space shows correctly after a restart.

It’s not reproducible on 1.114.6 either.

Even after a couple of hours the disk space did not get updated.
My test node had 200GB of space used,
and all 4 filewalkers were done in seconds after the restart.

But when I use the workaround, it takes a couple of minutes for the 4 filewalkers to finish.

I also have this issue on v1.116.7.

I noticed the same or similar issues on my Asia Pacific satellite node running v1.115.5. The file walker seems to instantly pull severely outdated numbers from used_space_per_prefix.db instead of doing an actual file walk; the walk completes in as little as 8 milliseconds. Recreating the databases seems to have allowed the walker to actually run, and the reported numbers match what is actually on my disk. I’m not sure if I’ll need to remove the database every time until the bug is fixed.

File walker logs with notes below:

<file walker reported the same numbers for space and pieces for about a week>
2024-11-10T02:07:21Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "4m41.326967307s"}
2024-11-11T01:21:11Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "36.225692ms"}
2024-11-11T11:42:37Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "97.192031ms"}
2024-11-14T18:43:16Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "18.409076ms"}
2024-11-16T17:12:44Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "18.613078ms"}
2024-11-16T17:23:51Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "8.49543ms"}
2024-11-16T17:53:54Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "78.153171ms"}
2024-11-16T19:21:33Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 96280136192, "Total Pieces Count": 317121, "Duration": "57.764342ms"}
<I removed the databases and restarted the node; I get new numbers this time. Disk metadata was already cached from using the du command on the blobs folders, so the scan finished in about 2 seconds, which is more believable than 8 milliseconds>
2024-11-16T19:24:20Z	INFO	pieces	used-space-filewalker completed	 ... "Total Pieces Content Size": 82564829952, "Total Pieces Count": 277344, "Duration": "1.702440175s"}

I have the same problem: the filewalker completes in a couple of ms.
I run Windows with Storj v1.115.5.

Did it update the used space after you used the workaround?

Yes, it did work.
Only now that the node is growing in storage, the filewalker is also taking more time, so there will be some data that is not in sync while doing the workaround.


Still cannot reproduce it in a storj-up environment :frowning:
It’s built from the main branch though. I’ll try to build v1.115.5.

I can reproduce it consistently (a sketch of the sequence follows below):
start a fresh node (I am on v1.115.5)
let it accumulate some data
restart the container (everything should be fine)
let it run and get some more data
now when you restart the container, the 4 filewalkers will be done in a flash and the used space will be wrong
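
A minimal sketch of that sequence as docker commands, assuming a container named storagenode (the name is my assumption, adjust it to your setup):

docker restart storagenode    # first restart: filewalkers run normally
# ...let the node receive some more data...
docker restart storagenode    # second restart: filewalkers finish in ms, usage is wrong
docker logs storagenode 2>&1 | grep used-space-filewalker    # check the reported durations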

I did it multiple times. The current version of storj-up is 1.114.6, so I built everything from main, but still cannot reproduce it.
This doesn’t mean that I don’t believe you; it means that I cannot reproduce it.

@Alexey, remember I had the same problem. I was shocked today to see these stats - Node Operators / troubleshooting - Storj Community Forum (official) - it is database corruption,
as I fixed it after a DB replacement.
My filewalker also read everything in several ms.
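
If you suspect the same corruption, a minimal check you can run while the node is stopped (sqlite3 must be installed on the host, and the path is my assumption for a typical docker setup):

# run only while the node is stopped; prints "ok" if the database is intact
sqlite3 /mnt/storagenode/storage/used_space_per_prefix.db "PRAGMA integrity_check;"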


Maybe.
@mahir can you reproduce this bug on the node where you have applied the workaround?

Because I cannot reproduce it on a new node in storj-up yet (my own nodes don’t have this issue either: 2 Docker, 1 Windows service).

Fixed forever? Or do you have to do it after every restart?

You mean reproduce what Vadim said, by deleting the 2 DBs and moving the other DBs back and forth (like with the workaround, but this time deleting both used_space_per_prefix and piece_spaced_used)?

If it is enough to delete used_space_per_prefix only, then yes, I still have the bug, and I made a script to automate the workaround.
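
For reference, a minimal sketch of what such a script could look like; the container name and the host storage path are assumptions, adjust them to your setup:

#!/bin/sh
# workaround automation (sketch): remove the stale per-prefix cache and restart,
# so the used-space-filewalker does a real scan on the next start
docker stop storagenode
rm /mnt/storagenode/storage/used_space_per_prefix.db
docker start storagenode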

I also tried (again) on a fresh node (v1.115): same situation. On the second restart, the 4 filewalkers finish in ms.

After I changed the DBs, I didn’t have any problem with it anymore.


I mean: try to restart the node where you have already fixed the wrong usage using the workaround:

But is the usage correct after an hour? Because a fast used-space-filewalker is not a bug, it’s a feature. But if as a result it doesn’t update the usage, then this is a bug.

Well, for testing this I was running multiple nodes
and tried multiple cases; none of them showed the correct disk space after 1-2 hours.
Only the workaround fixes it, but as the node grows, the workaround can take a while, so some data is lost, I guess, while the workaround is running?
Deleting both DBs (what Vadim did) also resets the “bandwidth used this month”.

It has to be a bug, I think.
Also, the first time you restart a fresh node it works as expected; the problem is the second time you restart.

If it’s only me with this issue, I’m fine using the workaround every time.

Of course not, unless it’s expired or deleted by the customers.
The databases, and the dashboard in particular, are almost for informational purposes only. Only the expiration database may affect your node, regarding how long deleted data would still be on your node before elimination.

I tried to restart multiple times. The first value shown may differ from the previous one if the latest update was not flushed to the databases before the restart, but this is usually fixed by the used-space-filewalker.

I cannot reproduce it so far. So it seems I’m missing something.

Storage Node Dashboard ( Node Version: v0.0.0 )
======================
ID     12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 3h18m30s
                   Available         Used     Egress      Ingress
     Bandwidth           N/A     16.81 MB        0 B     16.81 MB (since Nov 1)
          Disk       0.97 GB     31.20 MB
Internal :30002
External storagenode1:30001
docker compose exec storagenode1 rm /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode1
docker compose exec storagenode1 storagenode dashboard --address=:30002
Storage Node Dashboard ( Node Version: v0.0.0 )
======================
ID     12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 32s
                   Available         Used     Egress     Ingress
     Bandwidth           N/A          0 B        0 B         0 B (since Nov 1)
          Disk       0.97 GB     31.20 MB
Internal :30002
External storagenode1:30001

I have exactly the same problem with my node on v1.116.7 (Docker on a Raspberry Pi).
Today I did the first reboot, and the used data dropped from 400GB to 35GB.
As others pointed out, the disk actually has about 400GB worth of data.

I tried restarting the node with log filtering and am seeing the used-space-filewalker running through the satellites. It is not very quick for the us1 satellite, which makes sense as it holds over 350GB.
For the moment, no errors from the filewalker.
I will wait 2 hours and report back whether the dashboard has updated.

Edit: Almost 2 hours have passed, but the used-space-filewalker for us1 hasn’t finished yet! I am on a Raspberry Pi 4 with 4GB of RAM and the total used space for us1 is about 400GB, but this still looks very slow to me…
ap1 was around 19GB and took 1m 7s, so in theory, if this scaled linearly, us1 should take around 23min 30s (67s × 400/19 ≈ 1410s).

I decided to restart the node again, and after doing that I see that 3 out of 4 used-space-filewalkers are done in under 1 sec. The only one taking longer is us1.
Will keep you posted.

After restarting the node, the us1 filewalker took 8 min, and after refreshing the dashboard everything is back to normal: the used space now makes sense and shows over 400GB.

This didn’t even need the 2 hours of waiting before refreshing the dashboard.

The 2h wait period is suggested because the usage cache is flushed to persistent storage (the DBs) every hour by default.
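
That interval is configurable in config.yaml; it looks roughly like this (default shown; please verify the exact option name for your version):

# how often the used-space cache is flushed to the databases (default: 1h)
storage2.cache-sync-interval: 1h0m0s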