API disk space used negative

With one of the latest updates it seems like the used-space calculation in the storagenode API has changed. Assume the following scenario: your node has 5 TB of data stored on disk, and in your config you set storage.allocated-disk-space to 1 TB.
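For reference, the allocation in that scenario would be set in the node's config.yaml roughly like this (the value shown is illustrative):

```yaml
# total disk space the node is allowed to use for pieces
storage.allocated-disk-space: 1.00 TB
```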

Previously the API correctly reported the used space of 5 TB; now it shows a negative amount, and I'm not sure how it is calculated.

Anybody else seen this?

I have a node with 7.0 TB used.

((curl http://127.0.0.1:14002/api/sno).Content | ConvertFrom-Json).satellites.id | %{"$_"; ((curl http://127.0.0.1:14002/api/sno/satellite/$_).Content | ConvertFrom-Json) | %{$_.storageSummary,$_.bandwidthSummary}}

118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW
867684838982.8856
188792576
1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE
1220013470465702.2
534805791744
121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6
92304750146855.62
48094575616
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S
93660540289892.98
50200412416
12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs
227493662388082.22
80415960320
12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB
748961356923475.4
116561979392

Changed the 7.0 TB allocation to 1.0 TB and restarted the service:

((curl http://127.0.0.1:14002/api/sno).Content | ConvertFrom-Json).satellites.id | %{"$_"; ((curl http://127.0.0.1:14002/api/sno/satellite/$_).Content | ConvertFrom-Json) | %{$_.storageSummary,$_.bandwidthSummary}}

118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW
867684838982.8856
188792576
1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE
1220013470465702.2
534831304704
121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6
92304750146855.62
48099212032
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S
93660540289892.98
50201259776
12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs
227493662388082.22
80418309120
12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB
748961356923475.4
116564298752

The diskSpace stat:

With a 7.0 TB allocation:

((curl http://127.0.0.1:14002/api/sno).Content | ConvertFrom-Json).diskSpace

         used     available     trash
         ----     ---------     -----
6998714426496 7000000000000 791169408

With a 1.0 TB allocation:

((curl http://127.0.0.1:14002/api/sno).Content | ConvertFrom-Json).diskSpace

        used     available     trash
        ----     ---------     -----
270179595904 1000000000000 791169408

So .diskSpace.used decreases as the allocation decreases.
And here is how it looks on the dashboard:

Before the change (allocation 7.0 TB)

After the change (7.0 TB → 1.0TB)

Ah, I'm not alone; looks like a bug then.

I have created a bug report here: https://github.com/storj/storj/issues/3942

It's overall a bad approach. To calculate used space we sum the space used by pieces for each satellite, so it can't drop from 7 TB to 1 TB just because you changed the allocated space; that's why you would get negative free space. But since we had negative-space problems before, we added a check to the code: if allocated - used - trash < 0, we recalculate free space from the directory's actual free space. To avoid these wrong calculations we are going to add a partial graceful exit for the extra pieces after the allocated space is decreased (it is on our roadmap). If you have any ideas or suggestions on how to handle this right now, we would gladly review them and try to implement them.
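A minimal sketch of the check described above, using the numbers from the 7.0 TB → 1.0 TB example earlier in the thread (this is my reading of the described workaround, not the actual storagenode code; the function and the 300 GB directory-free figure are made up for illustration):

```python
def free_space(allocated, used_by_pieces, trash, dir_free):
    """Sketch of the pre-1.13 workaround as described: the node normally
    reports allocated - used - trash, but a negative result falls back
    to the filesystem's actual free space."""
    free = allocated - used_by_pieces - trash
    if free < 0:
        # workaround branch: recalculate from the directory's free
        # space instead of reporting a negative number, which is why
        # derived "used" figures shift when the allocation shrinks
        free = dir_free
    return free

used, trash = 6_998_714_426_496, 791_169_408  # numbers from the example above

# 7.0 TB allocation: the allocation still covers the stored pieces
print(free_space(7_000_000_000_000, used, trash, dir_free=300_000_000_000))

# 1.0 TB allocation: the negative branch is taken and the (hypothetical)
# 300 GB of filesystem free space is reported instead
print(free_space(1_000_000_000_000, used, trash, dir_free=300_000_000_000))
```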

On my node the free space was negative (by 30 GB) running v1.11.1. After the update to v1.12.3, the node shows the correct used space, but now shows free space equal to the amount of physical space available on the disk. This might just be a reporting problem, since the node is not starting any new uploads to fill this "free" space.

Any chance this recalculation could be the problem?

I’d expect the following to happen once I decrease my allocated space below the used space:

  • My used space remains where it was, unchanged
  • My free space is now negative
  • NEVER ever should there be any graceful exit because of this change (!)
  • The node stops receiving new pieces and only serves existing pieces (if requested), until its used space is below the allocated amount again
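The expected behavior above could be sketched like this (illustrative Python, not the actual node code; names are invented):

```python
def node_status(allocated, used, trash):
    """Sketch of the proposed behavior: used space stays as measured,
    free space may go negative, and the node simply stops accepting
    new pieces while over-allocated."""
    free = allocated - used - trash   # may be negative, and that's fine
    accept_uploads = free > 0         # stop ingress; egress continues
    return {"used": used, "free": free, "accept_uploads": accept_uploads}

# 7 TB of pieces stored, allocation lowered to 1 TB:
s = node_status(1_000_000_000_000, 6_998_714_426_496, 791_169_408)
# used is unchanged, free is about -6 TB, and no new uploads are accepted
print(s)
```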

Actually, there is a discrepancy between the CLI and Web Dashboards:

Storage Node Dashboard ( Node Version: v1.12.3 )
======================
ID     
Status ONLINE
Uptime 1h18m37s

                   Available          Used        Egress      Ingress
     Bandwidth           N/A     600.66 GB     577.09 GB     23.56 GB (since Sep 1)
          Disk     339.41 GB       3.39 TB

[screenshot]

user@rock64:~$ df -H | grep /mnt/storj1
/dev/sdc1       4.0T  3.4T  340G  91% /mnt/storj1

(sorry to jump in on this thread, but it seems related to me)

Weird, since the calculations on the CLI and web dashboards are the same; I'm going to check why this keeps happening.


Thanks, we will discuss it with the team ASAP and decide on the next steps to improve this part!


Partial graceful exit would be a great feature, but I would prefer that it not trigger automatically. Sometimes it can be useful to briefly reduce the load on your node by lowering the allocation so it just stops accepting new uploads. But that doesn't mean you necessarily want to get rid of the data, so I suggest having a separate command to actually free up the space.


This is really confusing for the end user though, since depending on the situation, this number now means something completely different.

I recently had a fairly extreme version of this problem. I run one node on a Drobo device, which uses thin provisioning. It has a 16 TB volume, but far less physical disk space. I had only 1.9 TB assigned to Storj, which I now wanted to lower to 1.8 TB. (Drobos get slow when they fill up beyond 75%, so I wanted to lower it slightly so it would eventually drop below that threshold. I'm not in a hurry for this though, so I don't want to trigger a partial exit even if it were an option.)

This results in the CLI dashboard now showing the available space of the thin-provisioned volume, which is meaningless.
[screenshot]

The web dashboard is even more surprising!
[screenshot]
Apparently the node is currently using negative 13TB! :slight_smile:

For what it’s worth, it does look like the node doesn’t actually tell the satellites that it has free space, since it’s not getting new data. But it’s really confusing to track how much data is actually available or how far above the assigned space the node is.

As an end user I would by far prefer that the node ALWAYS display available space relative to the assigned space, even if that leads to negative numbers, since at least that makes it clear that the current usage is above the assigned space.
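One guess at where the negative 13 TB figure on the web dashboard might come from, assuming the fallback described earlier derives used space from the volume's free space, which a 16 TB thin-provisioned volume reports as far larger than the physical disk (all numbers here are assumptions for illustration):

```python
# Assumed: 1.8 TB allocated, and the thin 16 TB volume reporting
# roughly 14.8 TB "free" to the operating system.
allocated = 1_800_000_000_000
volume_free = 14_800_000_000_000

# If used space were derived as allocated - volume_free, the inflated
# free space of a thin-provisioned volume drives it far negative:
derived_used = allocated - volume_free
print(derived_used / 1e12)  # in TB
```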


I am seeing a less extreme version of the exact same problem as explained above. @Nikolai_Siedov did you manage to find time to look into this?

Looks like this commit in 1.13.2 has removed the temporary workaround that caused this issue.


Looks like that will be part of >= 1.14.0 though; it's not included in 1.13.3, only in 1.14.0-rc.

2 posts were split to a new topic: There is a discrepancy between the CLI and Web Dashboards in used space