Announcement: major storage node release (potential config changes needed!)

Can you confirm that this change does not have an impact on this issue?

Sometimes we observe that nodes stop getting ingress for no apparent reason. A restart solves it and uploads re-appear in the logs.
It sounds like the change would not affect this, because when that issue occurs the logs show no uploads, only downloads. But as we don't know the cause of the issue, I am not sure.
If, due to the change, the node displayed the intended uploads anyway, it would be harder to spot when the issue happens.
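
For what it is worth, a quick way to check for this state is to count upload lines in the log; this is just a sketch, assuming the default docker setup and the usual piecestore log messages:

docker logs storagenode --since 1h 2>&1 | grep -c "upload started"

If that count stays at zero while downloads keep appearing, the node is likely stuck as described above.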

I think real ingress is very useful to know, to see whether your node is working and how much it receives.

1 Like

But it's not real ingress anymore then.

Yes, real ingress as it was before is better and shouldn’t be changed.
And I agree that the new value has no real meaning.
But at least it would show that your node is healthy enough to accept uploads. If you completely abandon that, then you have no clue whether uploads are working.

5 Likes

I like that it will be less taxing on the bandwidth.db and will improve performance. There is third-party software available to monitor total ingress and egress for your router/computer/node.

I can confirm that writing the bandwidth db less frequently doesn't make it reject or fail to respond to any uploads.

1 Like

Is this change only to save writes to SQLite databases? Some people are running these on SSDs, where frequent writes to the database aren't a problem at all.
Would it be possible to make this configurable? Or maybe enable the previous behaviour when storage2.database-dir is specified manually?
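
Pointing the databases at a faster drive is already possible with that flag today. A minimal sketch, assuming the SSD is mounted into the container at /app/dbs (the host paths are examples):

docker run -d ... \
                --mount type=bind,source=/mnt/ssd/storagenode-dbs,destination=/app/dbs \
                ... storjlabs/storagenode:latest \
                --storage2.database-dir="dbs"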

2 Likes

…And also deleted 170GB
Yes, the disk usage would almost never match ingress.

Yes, please revert this change.
It does not make any sense for SNOs.
You could show an additional percentage value to show SNOs the difference between real traffic and "traffic you could have had if your node had been fast enough or closer to the data origin".

We can’t revert it. We need the performance improvement.

3 Likes

I'm just wondering: with the latest release it was implemented that:

  1. Bandwidth is saved to cache
  2. Only order data is stored, not actual usage

What I do not fully understand is why it was required to change the bandwidth monitoring to only use order information. I know that before, when bandwidth usage was written directly to the on-disk db, it was a very expensive operation, but now that it is written to a cache, wouldn't the impact of these operations be reduced significantly?

I would think that the chore in charge of writing the cache to the db periodically could be adjusted to do it less frequently, reducing IOPS to a minimum, and I do not fully understand why there would be that big of a difference between storing actual data usage in the cache versus storing order data in the cache.

I say this from the understanding that the most limiting factor on nodes is IOPS, so the extra CPU cycles from tracking actual bandwidth usage should not have that big of an impact.
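
For anyone who wants to verify the IOPS assumption on their own node, a quick sketch using iostat (from the sysstat package; sda is an example device name):

iostat -x sda 5

Watching the w/s and %util columns while the node is busy shows whether database writes are actually saturating the disk.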

I know that currently the focus is node performance, which is objectively more important for everyone: the company, the nodes, and the network.
But, as a node operator, I also strongly believe that accurate information about what is going on with our nodes is important and should be available when possible. (I know SNOs are divided on this topic: some don't care about stats much, some do; I happen to fall in the latter camp. The more information available for everyone, the better.)

I'm not saying that the change should be reverted; I'm trying to properly understand the reasoning for this change (order data vs. actual usage).

Thank you and have a good weekend!

5 Likes

I am fine with caching the information and writing it only every 30 minutes or so.
But you should be able to cache both and write them to their respective DBs every 30 minutes, like pasatmalo asked.
Or at least give us an optional parameter to do so, as at least on my setup I am not worried about some more IOPS.

2 Likes

This is a new log entry from version 104, logged hourly. What does it mean?

2024-05-16T12:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T13:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T14:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T15:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T16:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T17:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T18:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T19:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T20:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T21:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T22:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-16T23:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
2024-05-17T00:58:37Z    INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}

I think the change should be reverted:

  • Make the databases completely optional, except those that are required. There are node operators who do not care about the statistics and history, so everyone who does not care could opt out and gain better performance by doing so.
  • I have suggested it many times, and I still don't know if it is fully possible: move the databases to RAM only and back them up to disk at configurable intervals (see the sketch after this list).
  • Revert the change and make it configurable whether only order data is stored, because many SNOs already have the databases on SSDs, so there is no performance gain from this change for them.
  • Saving bandwidth to a cache (I hope it is a RAM cache?) seems fine. If you need more performance, increase the interval at which it is flushed to disk. Once every 10 or even 15 minutes should be sufficient.
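
A minimal sketch of the RAM-only database idea from the second bullet; the paths and interval are examples, not actual storagenode options:

# create a RAM-backed directory and point --storage2.database-dir at it
sudo mount -t tmpfs -o size=512m tmpfs /mnt/sn-dbs
# restore the last backup into RAM before starting the node
rsync -a /srv/storagenode/db-backup/ /mnt/sn-dbs/
# cron entry to copy the databases back to disk every 15 minutes:
# */15 * * * * rsync -a /mnt/sn-dbs/ /srv/storagenode/db-backup/

Note that copying SQLite files while they are being written can produce an inconsistent backup, so this only illustrates the idea.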

And yes, as a node operator I want accurate information about what the node is doing.

4 Likes

Without the DBs and dashboard, many bugs would pass unnoticed. They are a pain if you don't use an SSD or a premium flash drive, especially for drives holding 10+ TB, but they are necessary.
I believe the big majority of nodes run without Grafana or other fancy data displays, and the most used analyzer is the official dashboard/multinode dashboard, which relies on the DBs.

I don’t understand that answer. Maybe I was not clear enough.
My fear was/is that the issue I have mentioned could become less detectable.
The issue is that sometimes nodes stop getting ingress, so ingress is shown as 0. That was the ingress the node really received.
Now, with this change, could it be that a node does not receive ingress but also does not show 0, because it no longer displays the ingress it really received but the ingress it should have received?
As the issue causing the ingress to stop has not yet been found or resolved, this will make it harder to spot the nodes that are not getting ingress and need a restart.

Today I think my fears came true. I see several nodes which are below the new 5 GB limit of remaining space, so they naturally should not receive ingress. But I see they have received 70 bytes, some 60 bytes. These cases are all below 5 GB, so the cause of the missing ingress is clear.
But it seems that from now on, when ingress stops for other reasons, the nodes will no longer display 0 but some arbitrary value. Maybe only a few bytes; then spotting them could still be doable. Hopefully not other values, because then it will be impossible.

Another question:

Can the changed way of computing the ingress value affect the estimated payout that gets displayed?
I don't know how the estimated payout is calculated.

No, because nobody contacts your node. If it is contacted, then the allocation would be placed on the graph…

I think yes. The settled value is used for payouts, not the allocated one (e.g. if 100 MB was allocated but only 60 MB actually settled, the payout is based on the 60 MB).

How do I pass the --filestore.force-sync=false option in the docker run command?

Just like:

docker run -d --restart unless-stopped --stop-timeout 300 \
                -p 28967:28967/tcp \
                -p 28967:28967/udp \
                -p 14002:14002 \
                -e WALLET="$sWallet" \
                -e EMAIL="$sEmail" \
                -e ADDRESS="$sAddress" \
                -e STORAGE="1000GB" \
                --user $(id -u):$(id -g) \
                --mount type=bind,source="$sIDFolder",destination=/app/identity \
                --mount type=bind,source="$sNodeMnt",destination=/app/config \
                --mount type=bind,source="$sDBFolder",destination=/app/dbs \
                --log-opt max-size=20m --log-opt max-file=3 \
                --name "storagenode" storjlabs/storagenode:latest \
                --storage2.monitor.minimum-disk-space="1MiB" \
                --filestore.write-buffer-size="2MiB" \
                --storage2.min-upload-speed=16kB \
                --storage2.min-upload-speed-grace-duration=10s \
                --storage2.database-dir="dbs" \
                --filestore.force-sync=false
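
Note that everything after the image name (storjlabs/storagenode:latest) is passed as arguments to the storagenode process itself, which is why --filestore.force-sync=false can simply be appended there like the other options.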
2 Likes

This is the db migration:

INFO    bandwidth       Performing bandwidth usage rollups      {"Process": "storagenode"}
INFO    bandwidth       Performing bandwidth usage rollups      {"Process": "storagenode"}
INFO    db.migration.57 Create new bandwidth_usage table, backfilling data from bandwidth_usage_rollups and bandwidth_usage tables, and dropping the old tables.        {"Process": "storagenode"}
INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}