Wow, that's weird [drop on "Disk Space Used This Month" graph]

well, you can go check the numbers, i’m sure it’s okay… it does look kinda off… tho if you squint and tilt your head, it’s actually on the high end of the avg from the other side of the dip, so it could be that it’s actually right…

can be very difficult to tell from this graph… it’s shit, but there are real numbers behind it you can go access, and i do believe some people more or less do the accounting for this from time to time…

doubt they do it every month tho… and it does happen that they find mistakes, but that gets rarer and rarer all the time ofc…

but yeah, it’s a representation of numbers ofc, of which really only the avg can be used for anything…

*facepalms* i hate this graph

same issue here:
image
not corrected next day either

I see the same issue:

Only europe-north-1 tried to recover, but the other 3 just lose data.

Does Storj Labs have a problem with the satellite databases?

So I guess the other issue is this seems to be affecting only non-asian SATs, based on your graphs.

Even with Euro-N trying to correct itself… there’s something afoot with those four.

I confirm that europe-north tried to recover; asia-east didn’t have this issue.

it doesn’t lose data, the calculation is delayed and then it either catches up or just slowly slips further and further behind… and because it’s most likely running on some sort of vm, if one vm uses more cpu the rest of the vms have less to work with…

which might explain why it seems to shift between the different satellites… it’s only because there is so much accounting data that it’s difficult for them to process it… last time, before they noticed it was running behind, it was like more than a month behind, if not 6 months… i forget the actual number

was only on one satellite tho, last time… to my knowledge, but i’m not really that well informed on this.
would make sense that they had problems with processing, moved it to some sort of shared cloud thing, and now a few months later ran out of processing power again…

that would make sense if it’s being run from one cluster; ofc each continent would need its own, but in some regions it might be easier to just keep multiple satellites on the same hardware… or it was part of the patch implemented last time.

maybe we should hear what @littleskunk has to say

but yeah, it’s been an ongoing issue, and it has no detrimental impact aside from, at worst, late payouts because the space accounting hasn’t been processed or whatever…

Personally, I don’t get that much payout yet, so I don’t have any issue even if a couple of cents get lost or delayed.
My goal here is only to report and confirm that I also have this issue, so the Storj team is aware of the scale.

image (Node Dashboard screenshot, 2020-11-23 20:43)
same

looks like this explains why it’s going all crazy right now…
they are migrating the satellite cluster and the accounting system is being replaced… so no wonder it’s acting up. you can read about it in the new changelog

Yeah, saw that. Not happy about being part of an unclean migration, but I understand it can happen. Just not sure why there wasn’t more testing to develop a fix to mitigate this. I mean, I’d hope there’s a CI/CD setup specifically for making sure things aren’t breaking too badly.

well, it’s to fix the issue we were pondering… or that’s how i understand it…

and it’s not critical infrastructure for the nodes, meaning it really can’t go bad from our perspective…
and when implemented, maybe this disk space used this month graph will finally stop being all weird, because the new system won’t require the same level of computation that the old system did…

ofc it’s critical for tardigrade that there is no double spend… but even if there was for a short time… it doesn’t matter too much… ofc i suppose in theory the worst case would be tardigrade denying uploads if the system truly failed…

and the reason it’s affecting the space used… graph might just be that the cluster / satellites are doing a lot of processing right now to convert to the new system or whatever…

i dunno how it works… but it doesn’t worry me one bit… i can’t imagine this affecting SNOs much

i’m more worried about when to update… i skipped the last update because a good deal of people were having issues with the new orders.db or whatever… in 1.16.1
so hopefully i won’t get into trouble going from 1.15.3 to 1.17.1
and hopefully more SNOs won’t suffer when updating from 1.16.1 to 1.17.1

I must have missed the bullet on that one. 1.15.3 to 1.16.1 went fine, and I don’t imagine major issues going to 1.17.4. That being said, stranger things have happened.

from my understanding it was only an issue that hit very old nodes maybe… something with some of their orders being from an old version… so in theory i should be okay to just update…
but i saw more than a few affected on the forum… ofc that’s how it goes when updating software infrastructure or whatever one calls it… core processes, functions, thingamajigs…

most likely it won’t be a huge issue, since we didn’t see a new patched release, something which i sort of expected… found that a bit surprising, but maybe there was no fix for it, and if 98% of the nodes were updated… then the damage had already happened…

1.17.4 … i think i need to read up on version numbers again… why is it .4 and not .1
i mean we went from 1.15.1 to 1.15.3 (because of a mistake), then we went to 1.16.1, and now we go to 1.17.4
either i can’t count or whoever is in control of these version numbers can’t…

I believe there was a 1.17.1 for a period, but I think there’s merit to 1.17.4:

Here are the “missing” releases:


Not every version is released outside Storj’s own development.

Overall, version numbers are pretty arbitrary. Chrome is on version 87 for example. It’s whatever works for the devs.

i just don’t really understand why publicly released versions wouldn’t be sequential; sure, it’s more or less completely irrelevant, and ofc there may be some advantage to using the same version numbers for development as for public releases…

i was just wondering if there was something i missed about how to read the version numbers; i can ignore the last number … :smiley: if it helps the devs
just have to remember to write 1.17.x instead
does that mean devs have x factor?

This is about why we roll out different patch versions of each minor version to storage nodes (SNs).

Let me give some context about SN rollout.

SN rollouts take time because we release them in ~24-hour steps that make the new version applicable to an increasing percentage of storage nodes. Currently, the steps are: 5%, 10%, 20%, 40%, 80%, 100%. Weekends and official company holidays are excluded from this cadence, which means we don’t promise to advance to the next percentage step during those days, although we sometimes do; when a rollout isn’t finished and runs into any of those days, it continues the following business day.
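
The staged-percentage idea described above can be sketched as a toy model: it assumes each node is deterministically hashed into a bucket in [0, 100), which may or may not match the actual updater/version-server implementation; the function names and hashing scheme here are made up for illustration.

```python
import hashlib

# Rollout steps from the post above: roughly 24 h apart, each one opening
# the update to a larger share of nodes.
ROLLOUT_STEPS = [5, 10, 20, 40, 80, 100]  # percent of nodes eligible

def node_bucket(node_id: str) -> int:
    """Deterministically map a node ID to a bucket in [0, 100)."""
    digest = hashlib.sha256(node_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100

def eligible(node_id: str, step: int) -> bool:
    """Whether this node may update at the given rollout step (0-based)."""
    return node_bucket(node_id) < ROLLOUT_STEPS[step]
```

Because the bucket is a pure function of the node ID, a node that becomes eligible at the 5% step stays eligible at every later step, and at the 100% step every node is eligible.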

The storage nodes which follow the rollout cursor are the ones that use the storage node updater, which, currently, means the ones installed through our Windows and Linux installers.

The ones using the Docker images and Watchtower are updated when we publish the new Docker images and Watchtower executes its check. The Docker images are published right after we deploy the 100% step.

But, why do we only roll out some versions?

Because sometimes we need or want to release Satellite updates without picking up all the SN changes we want to ship in the next release, and other times there aren’t any new changes for the storage node, so we wait until we have some before creating a new release.

Considering that some Satellite updates are released within a pretty short period of time, sometimes just hours apart, if we started a new SN rollout for each version, the SN release might never catch up with the latest version, due to the ~24-hour percentage rollout described above.

I hope this comment clarifies why SN releases aren’t consecutive.

My stats this month show this curious drop in storage usage.

image

How can it happen?
I haven’t had downtime. And if it were due to somebody deleting and reuploading, I would have seen the effect on bandwidth (and bandwidth usage hasn’t skyrocketed).

That’s what happened to everyone

image

it doesn’t graph disk space used, it’s graphing satellite storage calculation speeds…
you can basically superimpose rmon’s graph on top of mine and see the exact same thing.
nearly… the only thing you can take away from this graph is the average it gives; it could be represented as a single line of the avg over the days, and it would be more useful and accurate… lol
makes me sad and angry every time i look at it… it’s just mass-broadcast confusion.
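
The point that only the average of this graph means anything can be shown with a toy example (all numbers invented): a late storage tally makes one day dip and the next day spike, but the monthly average, which is what actually matters, comes out the same.

```python
# Invented numbers: daily "disk space used" (TB*h) for a node that actually
# stored a steady amount every day of the month.
true_daily = [120.0] * 10

# The same data as the satellite might report it when one tally runs late:
# day 5 under-reports, and day 6 catches up with the backlog.
delayed = true_daily.copy()
delayed[4] -= 80.0
delayed[5] += 80.0

avg_true = sum(true_daily) / len(true_daily)
avg_delayed = sum(delayed) / len(delayed)
# The dip and the spike cancel out: both averages are 120.0 TB*h per day,
# even though the delayed series shows a scary-looking drop.
```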

just look at total disk space used, and generally that’s about what you’ve had for the month… and what you will get paid for…
ofc the less data stored, the less accurate it will be, because it can change rather quickly, like in dada181’s case… because it’s a new node.
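
As a rough back-of-the-envelope for the "what you will get paid for" part: the rate below is an assumption (the published storage rate around this time was about $1.50 per TB-month), and egress is paid separately and not included here.

```python
# Assumed storage rate; check the current payout terms before relying on it.
STORAGE_RATE_USD_PER_TB_MONTH = 1.50

def storage_payout(avg_tb_stored: float) -> float:
    """Estimated monthly storage payout for an average stored amount (TB)."""
    return avg_tb_stored * STORAGE_RATE_USD_PER_TB_MONTH
```

For example, a node averaging 4 TB stored over the month would earn roughly `storage_payout(4.0) == 6.0` dollars from storage alone.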
