Bug: SN gross space limit overstep

I’ve been “struggling” for half of June: 5 TB allocated, but I ended up at -150 GB free around the 15th.
Storj is on a 13 TB drive, so nothing bad is happening, but I would have preferred it to work correctly. The trash and garbage folders add up to about 20 GB.
Wondering about allocating a bit more space now…


Finally, someone admitting to having the same problem!

Hi @Storgeez.

Thank you for reporting this issue. It is a known one. We are currently working on fixing it, so hopefully the fix will be merged in the near future.

Have a nice day.


Oh finally an official response, great! Thanks!

Any updates on this?


How about now?

Still using about 170 GB more than advertised.

It will not be reduced until customers remove their data. The fix is supposed to prevent this issue from happening again.
And since there have been no new cases in the last month, we can assume it is fixed.

But your node will keep the overusage until customers delete their data.

I don’t understand. I’m not asking customers to remove their data; I’m asking for the SN to account for the used storage properly. I’m fine with the extra used space (I have room for 200 GB extra), but I want the dashboard to reflect it properly and not hide it or report 200 GB less than I’m actually using.

What is the filesystem on your drive?
What is reported by the web dashboard now?
What is reported by the CLI dashboard now?
How much is allocated in the config (binary version) or the STORAGE option (docker version)?
What is the actual usage in blobs? (in SI units)
What is the actual usage in trash? (in SI units)

du -hcd 1 --si /mnt/y/storagenode/storage

What the size of all databases? (in SI units)

du -hc --si /mnt/y/storagenode/storage/*.db

Filesystem is ZFS.
Web dashboard is reporting 7.15 TB total, 6.98 TB used, 164.80 GB free, 1.35 GB trash.
CLI dashboard is reporting 6.98 TB used, 164.80 GB free.
Allocated space is 6.5 TiB.
“du -hcd 1 --si” for blobs folder returns 7.1T.
“du -hcd 1 --si” for trash folder returns 1.6G.
“du -hc --si” for *.db returns 964M.

7.1 TB + 1.6 GB + 964 MB = 7.102564 TB
7.15 TB - 7.102564 TB = 0.047436 TB, so 47.436 GB is actually free in your allocation.
So your local database is missing 164.80 GB - 47.436 GB = 117.364 GB of space that is used on your disk.

How much space is used by the Stefan satellite?

du -hs --si /mnt/y/storagenode/storage/blobs/abforhuxbzyd35blusvrifvdwmfx4hmocsva4vmpp3rgqaaaaaaa

Well, a bit more than that due to the low-precision output from the command but, yes, approximately.

94 MB. Why didn’t that get deleted if the satellite is no more?

We can try to fix the issue with the local database, but I’m not sure whether it will help.
We currently do not have any automatic fixing tools, since the database corruption percentage is low.

  1. Stop the storagenode
  2. Create a backup of the piece_spaced_used.db database
  3. Remove the piece_spaced_used.db database
  4. Open the database either with a local sqlite3 (make sure the version is not older than v3.25.2) or with a docker version (see https://support.storj.io/hc/en-us/articles/360029309111 for reference); specify the correct path to piece_spaced_used.db:
sqlite3 piece_spaced_used.db
  5. When you see a sqlite> prompt, execute this script:
CREATE TABLE versions (version int, commited_at text);
CREATE TABLE piece_space_used (
    total INTEGER NOT NULL DEFAULT 0,
    content_size INTEGER NOT NULL,
    satellite_id BLOB
);
CREATE UNIQUE INDEX idx_piece_space_used_satellite_id ON piece_space_used(satellite_id);
insert into versions values(29, datetime('now', 'utc'));
insert into versions values(30, datetime('now', 'utc'));
insert into versions values(31, datetime('now', 'utc'));
.exit
  6. Start the storagenode
  7. Let it work with the disk (it could take a few hours for a full tree traversal)
  8. Check the usage on the dashboards
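If it helps, the database steps can be run as one non-interactive shell session; this is a sketch, assuming the storage path used earlier in this thread (adjust it to your node), with the same schema script fed to sqlite3 via a here-document:

```shell
# Recreate an empty piece_spaced_used.db (sqlite3 >= v3.25.2).
# Stop the storagenode first. Path is an example from this thread.
cd /mnt/y/storagenode/storage

# Back up, then remove, the space-usage database.
cp piece_spaced_used.db piece_spaced_used.db.bak
rm piece_spaced_used.db

# Recreate the empty schema non-interactively.
sqlite3 piece_spaced_used.db <<'EOF'
CREATE TABLE versions (version int, commited_at text);
CREATE TABLE piece_space_used (
    total INTEGER NOT NULL DEFAULT 0,
    content_size INTEGER NOT NULL,
    satellite_id BLOB
);
CREATE UNIQUE INDEX idx_piece_space_used_satellite_id ON piece_space_used(satellite_id);
insert into versions values(29, datetime('now', 'utc'));
insert into versions values(30, datetime('now', 'utc'));
insert into versions values(31, datetime('now', 'utc'));
EOF
```

Then start the storagenode again and let it rescan the disk.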

I thought the shutdown of the satellite meant all data from that satellite would get deleted on the nodes, not that each piece would be removed individually.

But regarding piece_spaced_used.db: does it hold the sum of the space used by all the pieces, computed from the individual files on the drive rather than from the piece database?

The last corruption I had was a year or so ago; it was fixed, and I haven’t had any ungraceful shutdowns since then. Unclear what happened to the database. I ran the commands as you requested, and the rebuild is in progress now.
It seems excruciatingly slow at the moment, running at about 1 TiB/month; hopefully it speeds up. I can’t stop uploads the normal way, and I don’t want to break something.

I will report back when done.

Thanks.

For ZFS this is normal. It traverses all the disks in the RAID.
If you mean the load from the network, then I have no predictions to offer :slight_smile:

Yes, I mean the database summing. This setup should be faster than a single drive, so something’s not quite right; the disks aren’t even fully loaded, they’re just above 50%. ETA 7 months. Does this process resume normally after restarting the node?

It is holding at around 50 IOPS, which is what I normally see after restarting the SN. Perhaps some startup task is still running.

And why can’t it just pull the data out of the main database that holds metadata?

It should check the actual space usage. When it used the information from your local database with piece info, you saw a discrepancy.
So let it run to check all the pieces. Please do not restart the storagenode until it finishes.
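Once the rescan finishes, the rebuilt per-satellite totals can be read back with a simple query. This is a sketch using the storage path from earlier in this thread; stop the node (or query a copy of the file) to avoid lock contention:

```shell
# Inspect the per-satellite totals recorded in piece_spaced_used.db.
# hex() renders the satellite_id BLOB readably; total and content_size
# are in bytes.
sqlite3 /mnt/y/storagenode/storage/piece_spaced_used.db \
  "SELECT hex(satellite_id), total, content_size FROM piece_space_used;"
```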

Oh wow, it finished already, okay. The SNOBoard might have been showing the space used by data I received during the check.

I think there’s still an issue; I’m running du now to check how much the actual used space changed. SNOBoard reports 7.01 TB used.

Okay, du finished. Total usage of the storage folder is 7.1655 TB, while the SNOBoard says 7.01 TB.

In which folder is the excess 0.15 TB?