I wrote this quick how-to to help debug space discrepancies (when the reported used space differs from the observed one). It might be too technical, and you probably don't need it for normal Storagenode operation. But if you are interested in what's going on under the hood and keen to debug the internals, here it is.
There are multiple sources of the used space:
- Operating system
- Storagenode
- Satellite
Operating system
Let's start with the operating-system-based space usage. The blobs directory is supposed to store all the segment files; each has a small metadata header (512 bytes) plus the raw data. You may also have overhead depending on the file system in use.
In my case:
sudo du --max-depth=1
4 ./temp
26060 ./trash
8612044 ./blobs
4 ./garbage
8639576 .
sudo du --max-depth=1 --apparent-size
4 ./temp
26056 ./trash
8599010 ./blobs
4 ./garbage
8626659 .
As you see, the OS thinks I used 8.6 GB (8.2 GiB).
The blobs directory has a separate subdirectory for each Satellite. You can double-check whether unused Satellites still have storage directories.
The trash directory is separate and will eventually be purged. An oversized trash directory can also be a signal of a problem.
Storagenode
Storagenode calculates the space with a walker, which checks all the blob files one by one. Due to the many small files it's quite an expensive operation, therefore it's executed during startup, and the counters are only updated in-memory during new reads/writes.
Executions are async; since the result of the previous execution is stored in a local cache database, it may not be fully up to date…
Note: To avoid huge IO pressure on big nodes, there is a lazy walker implementation which executes the size calculation in a separate process with lower IO priority.
To check this value, you have 3 options:
A) check the current value (from the dashboard or from the JSON output)
This one is displayed on the Storagenode Dashboard, but you can also get it the geeky way: curl -s 127.0.0.1:14001/api/sno/ | jq '.' (or just open 127.0.0.1:14001/api/sno in your browser).
In my case:
{
"nodeID": "...",
"wallet": "0x0000000000000000000000000000000000000000",
"satellites": [
{
"id": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S",
"url": "us1.storj.io:7777",
"disqualified": null,
"suspended": null,
"currentStorageUsed": 8776057856
}
],
"diskSpace": {
"used": 8778217472,
"available": 1000000000000,
"trash": 0,
"overused": 0
},
As you see, I am connected to only one satellite (this is a special storagenode that I have access to).
Both the satellite-level and the summarized numbers (diskSpace) are based on the results of the file walker.
B) cached values
These numbers are also stored in a local database to make the next startup fast (we don’t need to wait until everything is calculated).
/opt/storagenode1/config/storage# sqlite3 piece_spaced_used.db
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> select * from piece_space_used;
8778217472|8776057856|
0|0|trashtotal
8778217472|8776057856|O
Ll+z&'RMn
Here you can see the data both with and without the metadata header (512 bytes per blob file); the binary-looking third column is the raw satellite ID.
In my case: (8778217472 - 8776057856) / 512 = 4218
, so I expect that number of blob files in my storage directory.
Which seems to be fine:
cd storage/blobs
find -type f | wc
4218 4218 476634
(I also have only one satellite, so I didn't enter the subdirectories)
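The header arithmetic above can be reproduced directly in the shell, using the totals reported by this node:

```shell
# piecesTotal minus piecesContentSize, divided by the 512-byte per-file
# header, yields the expected number of blob files.
echo $(( (8778217472 - 8776057856) / 512 ))
# → 4218
```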
C) running the used-space-filewalker manually
Thanks to the walker separation, the value can also be checked with the walker itself (for each satellite):
echo '{"satelliteID":"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}' | ./storagenode used-space-filewalker --pieces ./config/storage --storage ./config/storage --info ./config/storage/pieceinfo.db --info2 ./config/storage/info.db
2024-01-08T13:29:28Z INFO Database started {"process": "storagenode"}
2024-01-08T13:29:28Z INFO used-space-filewalker started {"process": "storagenode"}
2024-01-08T13:29:28Z INFO used-space-filewalker completed {"process": "storagenode", "piecesTotal": 8778217472, "piecesContentSize": 8776057856}
{"piecesTotal":8778217472,"piecesContentSize":8776057856}
Satellite
Finally, there is the space calculated by the Satellite itself. This is the base of the payment (and it is supposed to be close to the value that storagenodes see).
This is calculated by the Satellite 2-3 times per day. We call this component the ranged loop, as it iterates over the segment data in ranges and performs different calculations on the stored segment metadata; in our case it's the sum of the sizes.
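The at_rest_total values below are expressed in byte-hours. As a quick illustration (with hypothetical numbers): storing 1 GB for 12 hours accrues

```shell
# Byte-hours illustration (hypothetical numbers): 1 GB held for 12 hours.
awk 'BEGIN { printf "%.0f byte-hours\n", 1e9 * 12 }'
# → 12000000000 byte-hours
```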
The values are retrieved by storagenode and stored in a local database:
cd /opt/storagenode1/config/storage
sqlite3 storage_usage.db "select timestamp,hex(satellite_id),at_rest_total,interval_end_time,julianday(interval_end_time) from storage_usage order by satellite_id,timestamp desc limit 6"
My results:
2024-01-08 00:00:00+00:00|A28B4F04E10BAE85D67F4C6CB82BF8D4C0F0F47A8EA72627524DEB6EC0000000|37182796689.5248|2024-01-08 03:01:24.917097+00:00|2460317.62598284
2024-01-07 00:00:00+00:00|A28B4F04E10BAE85D67F4C6CB82BF8D4C0F0F47A8EA72627524DEB6EC0000000|218215160132.345|2024-01-07 22:40:25.315725+00:00|2460317.44473745
2024-01-06 00:00:00+00:00|A28B4F04E10BAE85D67F4C6CB82BF8D4C0F0F47A8EA72627524DEB6EC0000000|180876738958.695|2024-01-06 21:08:43.610494+00:00|2460316.3810603
2024-01-05 00:00:00+00:00|A28B4F04E10BAE85D67F4C6CB82BF8D4C0F0F47A8EA72627524DEB6EC0000000|269904445870.378|2024-01-05 23:59:07.048166+00:00|2460315.49938713
2024-01-04 00:00:00+00:00|A28B4F04E10BAE85D67F4C6CB82BF8D4C0F0F47A8EA72627524DEB6EC0000000|154081883010.896|2024-01-04 16:24:36.307736+00:00|2460314.18375356
2024-01-03 00:00:00+00:00|A28B4F04E10BAE85D67F4C6CB82BF8D4C0F0F47A8EA72627524DEB6EC0000000|199409017263.589|2024-01-03 22:23:04.446944+00:00|2460313.43269036
(please note that you need to do this per satellite)
The trick here is that we don't really have exact daily values, because the space usage is updated periodically. So what we need to do is check the period between one line and the previous one. For example:
37182796689.5248 / (2460317.62598284 - 2460317.44473745) / 24
~8547986767.53580693371934407086
Here, the first value is the at_rest_total in byte-hours. 2460317.62598284 - 2460317.44473745 is the difference between the two lines (julianday(interval_end_time)), in days; dividing the byte-hours by that difference and by 24 (i.e. by the interval length in hours) gives the average stored bytes.
As you can see, there is a few hundred megabytes of difference between the space calculated by the Satellite and the space calculated by the Storagenode. Usually this is because there were deleted segments which are not yet deleted on the Storagenodes.
I wouldn't like to go into the details here, but the Satellite does something like this:
Example:
- the last daily calculation finished at 22:00 on Monday
- the next at 23:00 on Tuesday
- the next at 21:00 on Wednesday
In this case the Tuesday report covers 25 hours (104%) of usage, and the Wednesday report covers 22 hours (91%). This doesn't cause any problem for billing, as usage is billed in byte-hours, but it can be misleading (that's why we calculated the covered hours above, using the julianday function).
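The coverage percentages can be checked with a quick calculation (truncated to whole percents, as in the text above):

```shell
# 25-hour and 22-hour intervals as fractions of a nominal 24-hour day.
awk 'BEGIN { printf "%d%% and %d%%\n", int(25 / 24 * 100), int(22 / 24 * 100) }'
# → 104% and 91%
```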
Segments might be deleted in two different ways:
- By the Storagenode itself, based on expiration time (if the segments were uploaded with an expiration time)
- By the Satellite, if the related key/object is deleted.
The second is more complex, as this is an async process:
- the Satellite iterates over all the available segments and groups them by Storagenode
- each group is compressed and archived to a Storj bucket (more technically it's not compression but a bloom filter; it's way smaller, but as a trade-off a few segments may only be deleted at a later time…)
- and finally, the Satellite delivers all the filters to the Storagenodes (the Retain call)
- the Storagenode checks the bloom filters one by one and moves the pieces which are not required any more to trash
- trash content is checked every 24 hours, and files which were trashed more than one week ago are permanently deleted
This is usually run once per week (it's quite an expensive process, as we need to check all the segments).
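As a rough check of the one-week grace period, you can count trash files older than 7 days by mtime. This is only an approximation, since the node tracks the trashing time itself, and the path is just the one from my setup:

```shell
# Count trash files older than the 7-day grace period (approximation
# based on file mtime; adjust TRASH to your own node's storage directory).
TRASH=${TRASH:-/opt/storagenode1/config/storage/trash}
find "$TRASH" -type f -mtime +7 2>/dev/null | wc -l
```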
If you have Prometheus, you can check a few interesting metrics:
(TL;DR: metrics with the retain_ prefix are updated when the Satellite sends the request to the Storagenode, while garbage_collection_ metrics are updated when the jobs are done):
retain_creation_date{field="count"} → incremented by one when a bloom filter execution is created (usually during the weekends)
garbage_collection_pieces_skipped{field="recent"} → number of pieces which couldn't be read during the last GC process
garbage_collection_pieces_count{field="recent"} → number of all piece blobs
garbage_collection_pieces_to_delete_count{field="recent"} → pieces which are supposed to be deleted
garbage_collection_pieces_deleted{field="recent"} → pieces which were actually deleted (ideally the same as the previous number)
garbage_collection_loop_duration{field="recent"} → GC execution time (seconds)
If you don't have Prometheus, you can check the debug endpoint manually (but without the historical data):
For example:
curl 127.0.0.1:8001/metrics | grep garbage_collection | grep recent
Assuming you set the STORJ_DEBUG_ADDR=127.0.0.1:8001
environment variable for the storagenode process.
Check the log!
This is how it is supposed to work, but any of these components can fail (which is not typical, though). So if you see any strange disk space usage discrepancies:
- check the logs (especially error lines referencing walker, gc/garbage_collector, bloom filter or trash)
- check the size of the trash (only the pure blobs folder is supposed to be in sync with the usage calculated by the Satellite)
- check whether GC processes ran recently…
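A quick way to do the log check from the shell. The log path is an assumption (nodes may log to a file, to journald, or to docker logs; adapt the source accordingly):

```shell
# Filter a node log for error lines mentioning the components above.
LOG=${LOG:-/var/log/storagenode.log}
grep -Ei 'error.*(walker|garbage|retain|bloom|trash)' "$LOG" | tail -n 20
```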