EU1: sudden drop in storage usage reported by satellite (not just the graph display issue this time; the database shows unusually low data usage in the last report from the satellite)

I was working on a fix for my earnings calculator, which was showing a faulty low Disk Last Report usage from the satellite due to 0 records being added, and had just implemented that fix when I encountered something strange in the last report from EU1.

Screenshot of storage_usage.db

The second-to-last line was causing the initial issue, which I fixed by ignoring those lines. That works fine for past results, but for some reason the next report suddenly shows significantly lower storage usage for EU1, down to about a third of what it used to be, even though it should report on about 26 hours of usage, just like the few lines before it.

Is this a clean up initiated by Storj? Did something go wrong in storage usage calculation? Or am I interpreting the data wrong?

Note: at_rest_total is the storage used in byte*hours between the interval_end_time of the previous non-zero record and the current record. At least that’s how I’ve always interpreted it, and that has always seemed correct. Fluctuations in at_rest_total are usually caused by different time intervals being reported on, but with that last EU1 record the time interval is similar to previous records, yet at_rest_total has dropped to a third, indicating removal of 2/3rds of the data.
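Under that interpretation, the average number of bytes stored over an interval falls out by dividing at_rest_total by the interval length in hours. A minimal sketch of that conversion (the function name is my own; only the formula comes from the interpretation above):

```python
def average_bytes_stored(at_rest_total, interval_hours):
    """Convert a byte*hours tally into the average bytes stored
    over the reporting interval."""
    if interval_hours <= 0:
        # 0 records (or bogus intervals) carry no usable information
        raise ValueError("interval must be positive; skip 0 records")
    return at_rest_total / interval_hours

# A 26-hour record with an at_rest_total similar to a 24-hour record
# therefore means the *average* stored data actually went down.
print(average_bytes_stored(26.0, 26))  # prints 1.0
```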

Edit: For context, here is a screenshot of such a 0 record happening in the past, but here you can clearly see the following record accounts for all storage usage since the record prior to that 0 record.


My stats are confusing again too. I don’t know if it’s the same error as last month or not, but it makes me have doubts about the entire project and makes me want to turn off my node.

I wouldn’t have posted a new topic if it was the same error. Previous times it was just a display glitch related to the 0 records or missing reports. This time there seems to be an actual usage report with too low usage.

There’s no reason to have doubts about the project in my opinion. I’ve been around a long time and serious issues have always been addressed. Display glitches may stick around for a while as they don’t have any material impact, but if this is more than that, I’m sure it will be addressed appropriately. Let’s just wait for Storj to respond. It’s perfectly possible that I’m misinterpreting things, so I’m not ready to jump to conclusions just yet.

I have asked the team to have a look.


Thanks @Knowledge . Let me know if someone needs a node ID to look into it. I’ll help any way I can from my end.

@Aitor : See. Took them barely 2 hours to respond and take it seriously. Now let’s just give them a bit of time to investigate, as the cause of these things might not be immediately obvious and may need several people to be involved.
An attitude of “trust but verify” has always worked just fine for me here. :slightly_smiling_face:


Good work on documenting errors :slight_smile:


Having the ID of a node which was affected by this issue will definitely be helpful.


I sent you a PM with the ID. Thanks for looking into it!


I see the same dip on all my nodes in NA, so I don’t think it’s specific to the node. I can’t check which or how many satellites are reporting this (the drop-down list hasn’t worked for ages), but here is the screenshot.

And here is the node ID:


Maybe the problem is not specific to eu1.

77% of this node’s data belongs to us1, and only 21% to eu1.

And it doesn’t look like a 21% drop.

The issue in this topic has coincided with a day of skipped reporting. Unfortunately the screenshots of the dashboard aren’t useful as result of that, as they may just point to the display glitch we’ve had before.

If you want to check, you’ll have to get into the storage_usage.db file. I recommend working on a copy so you’re not interrupting your storagenode. You can install SQLite and use the CLI, or use a GUI tool like I did.

Satellite IDs are binary, so it’s easiest to run

select *, hex(satellite_id) from storage_usage

Then scroll all the way down for the latest storage reports. Also please read my top post for how to interpret the data.

At this point just posting dashboard screenshots is unfortunately not useful.


I did not make SQL queries, but made a couple of node API calls instead. This is the same data, I guess?

  • {"atRestTotal":5385395790.287455,"atRestTotalBytes":224391491.2619773,"intervalInHours":24,"intervalStart":"2024-05-01T00:00:00Z"}
  • {"atRestTotal":5166453222.760814,"atRestTotalBytes":224628400.9896006,"intervalInHours":23,"intervalStart":"2024-05-02T00:00:00Z"}
  • {"atRestTotal":4886447299.462085,"atRestTotalBytes":222111240.88464022,"intervalInHours":22,"intervalStart":"2024-05-03T00:00:00Z"}
  • {"atRestTotal":5882125887.29211,"atRestTotalBytes":226235611.04969656,"intervalInHours":26,"intervalStart":"2024-05-04T00:00:00Z"}
  • {"atRestTotal":5926946714.875766,"atRestTotalBytes":219516544.99539873,"intervalInHours":27,"intervalStart":"2024-05-05T00:00:00Z"}
  • {"atRestTotal":5238547274.144752,"atRestTotalBytes":227762924.96281528,"intervalInHours":23,"intervalStart":"2024-05-06T00:00:00Z"}
  • {"atRestTotal":1258055567.606437,"atRestTotalBytes":83870371.17376247,"intervalInHours":15,"intervalStart":"2024-05-07T00:00:00Z"}

  • {"atRestTotal":2356294395.0678535,"atRestTotalBytes":102447582.39425449,"intervalInHours":23,"intervalStart":"2024-05-01T00:00:00Z"}
  • {"atRestTotal":4201167166.6145864,"atRestTotalBytes":175048631.94227442,"intervalInHours":24,"intervalStart":"2024-05-02T00:00:00Z"}
  • {"atRestTotal":7762149472.636321,"atRestTotalBytes":337484759.67984,"intervalInHours":23,"intervalStart":"2024-05-03T00:00:00Z"}
  • {"atRestTotal":12781757385.352081,"atRestTotalBytes":532573224.3896701,"intervalInHours":24,"intervalStart":"2024-05-04T00:00:00Z"}
  • {"atRestTotal":15876136087.538837,"atRestTotalBytes":661505670.3141183,"intervalInHours":24,"intervalStart":"2024-05-05T00:00:00Z"}
  • {"atRestTotal":17248856244.113106,"atRestTotalBytes":749950271.4831785,"intervalInHours":23,"intervalStart":"2024-05-06T00:00:00Z"}
  • {"atRestTotal":12259365325.518219,"atRestTotalBytes":766210332.8448887,"intervalInHours":16,"intervalStart":"2024-05-07T00:00:00Z"}

  • {"atRestTotal":135491981858.88416,"atRestTotalBytes":6158726448.131098,"intervalInHours":22,"intervalStart":"2024-05-01T00:00:00Z"}
  • {"atRestTotal":242115828163.01443,"atRestTotalBytes":11529325150.619736,"intervalInHours":21,"intervalStart":"2024-05-02T00:00:00Z"}
  • {"atRestTotal":374562388716.4631,"atRestTotalBytes":15606766196.519295,"intervalInHours":24,"intervalStart":"2024-05-03T00:00:00Z"}
  • {"atRestTotal":410198847332.55286,"atRestTotalBytes":19533278444.40728,"intervalInHours":21,"intervalStart":"2024-05-04T00:00:00Z"}
  • {"atRestTotal":770972808294.7119,"atRestTotalBytes":23362812372.567028,"intervalInHours":33,"intervalStart":"2024-05-05T00:00:00Z"}
  • {"atRestTotal":0,"atRestTotalBytes":0,"intervalInHours":-17736262,"intervalStart":"2024-05-06T00:00:00Z"}
  • {"atRestTotal":0,"atRestTotalBytes":0,"intervalInHours":0,"intervalStart":"2024-05-07T00:00:00Z"}

  • {"atRestTotal":59342899576.29383,"atRestTotalBytes":3708931223.5183644,"intervalInHours":16,"intervalStart":"2024-05-01T00:00:00Z"}
  • {"atRestTotal":103817707497.24464,"atRestTotalBytes":4325737812.385194,"intervalInHours":24,"intervalStart":"2024-05-02T00:00:00Z"}
  • {"atRestTotal":138176462408.55994,"atRestTotalBytes":5314479323.406152,"intervalInHours":26,"intervalStart":"2024-05-03T00:00:00Z"}
  • {"atRestTotal":164786153942.29303,"atRestTotalBytes":6591446157.691721,"intervalInHours":25,"intervalStart":"2024-05-04T00:00:00Z"}
  • {"atRestTotal":243066076654.5307,"atRestTotalBytes":9348695255.94349,"intervalInHours":26,"intervalStart":"2024-05-05T00:00:00Z"}
  • {"atRestTotal":173607931962.00195,"atRestTotalBytes":10850495747.625122,"intervalInHours":16,"intervalStart":"2024-05-06T00:00:00Z"}
  • {"atRestTotal":191285819299.51373,"atRestTotalBytes":11252107017.618454,"intervalInHours":17,"intervalStart":"2024-05-07T00:00:00Z"}

API data is derived from the databases, but there is a calculation in between to summarize per day. The anomalous lines (the 0 tallies and the bogus negative interval) are a result of the 0 records not being handled well by that calculation. However, Saltlake seems to be showing a similar issue on your node as I saw on mine, but without the raw data in the database it’s hard to say whether it’s actually a report that is too low or a miscalculation in the API.

I’m less concerned about the node miscalculating locally than about the satellite miscalculating, as a satellite miscalculation would impact payouts.
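For anyone summarizing the API output themselves, here is a defensive sketch that drops the obviously bogus day records (zero tallies and non-positive intervals, like the -17736262 line above) before doing any arithmetic. The record shape matches the API output quoted earlier; this is just my own workaround, not how the official dashboard does it:

```python
def clean_tallies(records):
    """Drop per-day API records that are artifacts of missed reports:
    zero at-rest tallies or non-positive interval lengths."""
    return [r for r in records
            if r["atRestTotal"] > 0 and r["intervalInHours"] > 0]

days = [
    {"atRestTotal": 770972808294.7, "intervalInHours": 33},
    {"atRestTotal": 0, "intervalInHours": -17736262},  # bogus record
    {"atRestTotal": 0, "intervalInHours": 0},          # bogus record
]
print(len(clean_tallies(days)))  # prints 1
```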


As the node is young, these are all the results from query.


Same problem on my nodes. A sudden drop in capacity in the last two days.

Summarizing - It looks like there may have been an issue with tallies loading on the 6th for EU1. They recently (today) upgraded the EU1 cluster. It may take some time for the SNO graphs to catch up.


Thanks for the update!
Got quite a few questions, though I understand if you don’t have the answers just yet.

  1. Is it possible Saltlake was affected as well? @wildwaffle’s last screenshot seems to show a similar issue on Saltlake.
  2. Since incomplete tallies were sent to nodes, will they be replaced in the node db? If not, the graph won’t fix itself when new tallies come in.
  3. If tallies won’t be replaced, will the next tally compensate for the missed byte-hours, resulting in the graph showing a valley and a peak?
  4. If neither 2 nor 3 is answered with yes, how else will it be corrected?
  5. Will any of this affect payouts?

PS: Just to be clear, this is very much not about a delay in storage usage reports or a graph issue, but about actually received reports that are too low.
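To put made-up numbers on question 3: if a steady node misses one daily tally and the next report compensates, a graph that naively divides each tally by 24 hours would show a valley followed by a peak, like so:

```python
# Hypothetical steady node storing 100 (in arbitrary byte units).
stored = 100
normal_tally = stored * 24   # byte*hours for a normal 24 h report
missed_tally = 0             # the skipped day
catchup_tally = stored * 48  # next report covers 48 h and compensates
# Naive per-day graph values: 100.0, 0.0, 200.0 -> valley, then peak.
print(normal_tally / 24, missed_tally / 24, catchup_tally / 24)
```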


I’m having the same issue for both of my nodes. On the 6th of May the disk space used in the graph went to zero for the us1 and eu1 satellites. The saltlake and ap1 seemed to be unaffected.

But when I woke up this morning I was pleasantly surprised that for the eu1 the graph seems to be fixed (but not for the us1 yet).

I have to check whether the earnings report will be corrected. For eu1 alone, I have $0.79 for 7 days of 2.5 TB used. That seems to be correct now.

So the issue seems to be fixed on the eu1, but not the us1 yet.


I see a drop from many of my nodes on both EU1 and US1.

In the interest of providing my own answers whenever possible: this seems to be the case. On this node, my storage_usage.db now looks like this for EU1.

A missed tally on the 6th was added and the tally on the 7th now seems correct.

I’m seeing the issue on US1 now though.

Two missed days and then a record for the 8th that is way too low. Since I saw it resolve itself on EU1, I’m not too worried anymore about this.

But I would like to ask how we are supposed to interpret this data. I understand not getting a new calculation because it took a little longer, but I’m really confused about receiving new records with numbers that are too low. I would like to get an accurate measure of the latest storage usage according to the satellite for my earnings calculator, but I have no clue how to derive that from this mess. Help would be appreciated.
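One possible way out, though it’s only my own workaround and not an official answer: pool the byte*hours over a multi-day window and divide by the pooled hours, so late or split reports average out (a tally the satellite genuinely under-reported would still drag the estimate down):

```python
def average_stored_bytes(tallies):
    """Estimate average bytes stored from (at_rest_total, interval_hours)
    pairs by pooling byte*hours across the window. Records with
    non-positive intervals are skipped as bogus."""
    total_bh = sum(bh for bh, h in tallies if h > 0)
    total_h = sum(h for bh, h in tallies if h > 0)
    return total_bh / total_h if total_h else 0.0

# A missed day followed by a catch-up report still averages correctly:
print(average_stored_bytes([(2400, 24), (0, 0), (4800, 48)]))  # prints 100.0
```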