I’m confused by the results of the calculator and the dashboard

The dashboards are not working very well; the data also doesn’t match what I see, both on the single-node and on the multinode dashboard.

After payout info has been reported back by the satellites, the web dashboard switches from displaying the node’s internal bookkeeping to displaying what the satellites reported. The calculator, on the other hand, always displays the node’s own bookkeeping in the top overview you posted.

However, in the satellite breakdown below that, you can see both and compare. Could you restart the node (this completes the satellite reports, in case something is still missing on your node), post the entire output of the calculator, and compare it to what the dashboard shows? We may get a better hint at where it goes wrong.

PS: You can mask the transaction links if you want to keep your wallet address private.

Thanks for your reply. After restarting the node, the dashboard results are the same as before. I think the script’s results are correct, because they are not far off from what my local node logs report.

Full output here:

Well, this is an unfortunate case, but it’s exactly what this tool was built for. As you can see at the bottom, you’re being paid $1.19 less than you should be. Minor differences are expected, but not this; something is clearly wrong. Seeing that the difference seems to be in the download payout, the most likely scenario is that your node is having trouble sending bandwidth orders to the satellites.

Let’s do some diagnostics. Open your data location and look at the orders/unsent folder. How many files do you see there? There should usually be up to 3 files per satellite. If you have many more, your orders aren’t being sent and you’re not being paid for egress bandwidth.
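A quick way to do that count from a shell (the path is an assumption for a typical docker layout; adjust `ORDERS_DIR` to wherever your unsent folder actually lives):

```shell
# Count unsent order files; a healthy node usually has only a
# handful here (up to ~3 per satellite).
ORDERS_DIR="${ORDERS_DIR:-/mnt/storagenode/orders/unsent}"
find "$ORDERS_DIR" -type f 2>/dev/null | wc -l
```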

If that’s the case, you can remove all files older than 48 hours, since they won’t be accepted by the satellites anymore anyway. Hopefully that resolves the issue and fixes order sending from this point on. Unfortunately, there is no way to recover older missed payouts, but we can at least try to fix it for the future.
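The cleanup above can be sketched like this (again, `ORDERS_DIR` is an assumption for a typical docker layout; double-check the path before deleting anything):

```shell
ORDERS_DIR="${ORDERS_DIR:-/mnt/storagenode/orders/unsent}"

# -mmin +2880 matches files modified more than 48*60 minutes ago,
# i.e. orders the satellites will no longer accept anyway.
find "$ORDERS_DIR" -type f -mmin +2880 -delete 2>/dev/null || true
```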

Yes, there are 1445 files in orders/unsent. I then found “failed to settle orders for satellite” errors in the storagenode logs. I couldn’t get a correct DNS answer for “storj.io”; DNS pollution was the culprit.
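For anyone hitting the same thing, one way to confirm a DNS problem is to resolve the hostname from the node’s own environment, since `getent` uses the same system resolver the node would:

```shell
# Prints the resolved address on success, or a warning on failure.
getent hosts storj.io || echo "DNS lookup for storj.io failed"
```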


Ah, glad you found the cause. Good thing you checked the calculator; these things can go unnoticed for a long time if you have nothing to compare against. At least now you can start getting paid properly.

This to me definitely looks like another thing nodes should detect automatically, really… :confused:
And fix automatically if possible.

Hmm. I don’t have that folder (v1.69.2), though I do remember seeing it before, with orders in it, just as you describe. There is a .db file with that name, but not the folder. Maybe this has changed recently?

It’s not inside the storage folder, but one folder up, at least in docker setups. On Windows it might be in the install location; I’m not too familiar with Windows setups.
The .db file was the old way orders worked; it’s not used for order sending anymore.
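Since the location varies by setup, a hypothetical way to just search for the folder (the `SEARCH_ROOT` default is an assumption; point it at your config/data mount):

```shell
SEARCH_ROOT="${SEARCH_ROOT:-/mnt}"

# Look a few levels deep for any directory ending in orders/unsent.
find "$SEARCH_ROOT" -maxdepth 4 -type d -path '*orders/unsent' 2>/dev/null || true
```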

It was a DNS issue in this case. There’s not much the node can do to fix that, but it should indeed display a message on the dashboard if orders haven’t been sent for more than 24 hours or so.
I vaguely remember that after this was discussed a while ago, they made some changes to how order sending deals with corrupt order files, to prevent one bad file from blocking the rest of the orders. So some of it may have been fixed already. But fixing DNS is outside what the node can do itself.

Yes, on Windows the orders folder is placed in the installation location by default (the storage2.orders.path option in the config.yaml file, or the --storage2.orders.path parameter).


TLDR:

I don’t use Windows, but I found the folder: it’s located by default under the config folder, wherever that folder may be. Perhaps the docker container puts it on the share along with the data folder (I don’t use docker either, as you might have guessed :slight_smile:).

Details:
I’ve set up that node like so (these are the only flags I provided to the setup command):

  storagenode setup \
    --storage.path "${STORAGE_PATH}" \
    --config-dir "${CONFIG_DIR}" \
    --identity-dir "${IDENTITY_DIR}" \
    --operator.email "${OPERATOR_EMAIL}" \
    --console.address "${CONSOLE_ADDRESS}" \
    --operator.wallet "${OPERATOR_WALLET}" \
    --operator.wallet-features "${OPERATOR_WALLET_FEATURES}" \
    --contact.external-address "${CONTACT_EXTERNAL_ADDRESS}" \
    --storage.allocated-disk-space "${STORAGE_ALLOCATED_DISK_SPACE}"

Looking at the generated configuration file in CONFIG_DIR, I see this (note: it’s commented out; I never edited the config file manually):

# path to store order limit files in
# storage2.orders.path: /mnt/storj-two/config/orders

The value is present but commented out, apparently to indicate the default. So while it can be overridden, by default the orders folder is placed under the config folder. On my other node, I did not put the config folder under the data folder, and that’s why I was unable to find it. For my second node, I put the config folder, along with the identity folder, into the data folder, and it’s of course there.

Thank you all for clarification.

The .db file on my node has a modification time of Dec 27 last year, so it was being used at least up until two weeks ago:

 % ls -alt /mnt/storj-two/orders.db
-rw-r--r--  1 storj  storj  32768 Dec 27 18:13 /mnt/storj-two/orders.db

So was it two weeks ago that the .db stopped being used, and the storage node now relies purely on the file system to track orders?

Perhaps the “fix” should be another scheduled job that does something like find /.../orders -mtime +4 -delete to nuke stale old files?

You are correct, the orders folder is created in the config location (this is why it’s in the installation dir on Windows and in the data location when you use docker).
The orders.db database is not used anymore; orders have been stored on the filesystem for a long time already. But the database still exists in the migrations, even though it’s unused.

Usually that’s not needed; they are removed automatically. But if something goes wrong, you might never notice the problem with such a hack in place.


The hack can help “unclog” the node in cases where (if I understand the problem in this thread correctly) accumulated stale files prevent the node from functioning properly, regardless of why the files went stale. That could be an intermittent DNS issue like the one discussed above: DNS may be working again by now, but the node stays crippled. Ultimately, the node should handle this itself, by deleting or ignoring old order files, but that doesn’t seem to happen today, hence this thread.

The evidence of the original failure should still persist in the logs, and those should be monitored regardless (ideally by automation, with notifications when something is wrong).
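A minimal sketch of such a check, grepping for the error string quoted earlier in the thread (the log path is an assumption; docker users can pipe `docker logs storagenode` into grep instead):

```shell
LOG_FILE="${LOG_FILE:-/var/log/storagenode.log}"

# Prints how many order-settlement failures appear in the log;
# anything persistently nonzero deserves a closer look.
grep -c "failed to settle orders" "$LOG_FILE" 2>/dev/null || true
```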

I believe the problem is some corrupted file, not the number of files, so there is likely a bigger problem here: order files were corrupted, and likely data and perhaps databases too.

I have to agree

I believe we don’t delete such files automatically because it looks a lot like “deleting evidence” of things going wrong.

If, as you suggest, the number of files was a cause of further failures, leading to a feedback loop, we need to fix that. But I’m not convinced that’s what’s happening. An extra couple of thousand files in the directory shouldn’t break anything. Maybe the problem is that we don’t retry submitting the orders often enough before they expire?

(Can we split the unsent-orders discussion out into its own thread? This thread is huge.)


I don’t think I’ve seen the number of files cause an issue before; that may be a misreading of what I meant to say. In the past we’ve seen corrupt files get the process stuck entirely: not only unable to process the corrupt files, but also not picking up any subsequent files. I believe there was talk that this would be fixed, and it might already have been.
I have noticed this process being sensitive to slowdowns, though. Due to an unfortunate SSD failure, I’m running 4 nodes on the same array without an SSD cache at the moment, while that array also serves other purposes. Order sending fails frequently, though it succeeds often enough that the orders don’t go stale, I believe.

I second the suggestion to split the topic btw.