Why does my Node Dashboard show no traffic, but logs show recently completed down/uploads?

roryboreyalice · September 1, 2020, 2:31am

I have 5 nodes. Nodes 1, 3, 4, and 5 are all running without issue, reporting plenty of activity on their respective dashboards. Node 2 (second oldest) has been “silent” for about 3 months now, reporting zero traffic. If I look at the docker logs I see “Download Started”, “Downloaded”, “Upload Started”, “Uploaded”, and “Deleted” entries across a ton of files, all of them recent (as in, within the hour). However, the dashboard is dead and the graphs are empty. It reports the node as being online, with 300+ hours of uptime, last contacted 0 minutes ago, running v1.10.1.

What’s going on here?

baker · September 1, 2020, 2:55am

Are you running Windows GUI, or docker? I would suggest stopping the node and checking the databases for errors as per the following article:

roryboreyalice · September 1, 2020, 3:48pm

baker,

I attempted Step 4 using cp *.db* ~/storjbackup and got the following error:

cp: error reading 'bandwidth.db': Input/output error

Thoughts?

baker · September 1, 2020, 3:56pm

Sounds like it could be a problem with the bandwidth.db file, which would make sense since your node is reporting 0 bandwidth in the dashboard. With the node stopped you can safely run the integrity check on the file in place without a backup. I’m guessing the bandwidth.db file is corrupted and will likely return “File is not a database”. Try the check on the database(s) and see what the result is.
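
For readers who want to script the check baker describes, here is a minimal sketch, assuming the node is stopped and the databases are ordinary SQLite files sitting in one directory. The function name check_databases and the read-only open are choices made for this illustration, not part of any Storj tooling.

```python
import sqlite3
from pathlib import Path


def check_databases(db_dir):
    """Run PRAGMA integrity_check on every *.db file in db_dir.

    Returns a dict mapping file name to "ok" or to the error text.
    """
    results = {}
    for db_path in sorted(Path(db_dir).glob("*.db")):
        # Open read-only so the check itself cannot modify the file.
        uri = db_path.resolve().as_uri() + "?mode=ro"
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                rows = conn.execute("PRAGMA integrity_check;").fetchall()
            finally:
                conn.close()
            # A healthy database yields a single row containing "ok".
            results[db_path.name] = "; ".join(str(row[0]) for row in rows)
        except sqlite3.Error as exc:
            # Covers "file is not a database" as well as disk I/O errors.
            results[db_path.name] = f"error: {exc}"
    return results
```

Calling check_databases on the storage directory and printing the result shows at a glance which files report anything other than "ok".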

Alexey · September 1, 2020, 6:48pm

Please check your disk before checking the databases. It looks like the filesystem is corrupted.
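
The cp failure earlier in the thread suggests a quick first probe before a full filesystem check: read every file end to end and note which ones the kernel refuses to return. The sketch below is only an illustration of that idea. The name find_unreadable is hypothetical, and a clean result does not prove the filesystem is healthy.

```python
def find_unreadable(paths, chunk_size=1024 * 1024):
    """Read each file end to end; return {path: error text} for failures."""
    failures = {}
    for path in paths:
        try:
            with open(path, "rb") as handle:
                # Read in chunks so large databases do not fill memory.
                while handle.read(chunk_size):
                    pass
        except OSError as exc:
            # A kernel "Input/output error" surfaces here as an OSError.
            failures[str(path)] = str(exc)
    return failures
```

Passing it the same set of files as the cp *.db* glob would show whether bandwidth.db is the only file affected.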

roryboreyalice · September 1, 2020, 7:15pm

Alexey,

I didn’t think to run a filesystem check. Instead I ran a SMART test, which came back good. The only db that reported an error was bandwidth.db, which is being rebuilt now. Based on the dump_all_notrans.sql file size, it appears to be halfway done. If/when it completes, should I run the disk check or attempt restarting the node?
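
The thread does not show the rebuild steps, so the following is only a guess at the stage that produces dump_all_notrans.sql: taking a plain-text SQL dump and dropping its transaction-control lines. A dump taken from a damaged database may end in a ROLLBACK statement, which would undo everything when the dump is loaded into a fresh file. The function name and the input file name are assumptions for this sketch.

```python
def strip_transaction_lines(src_path, dst_path):
    """Copy an SQL dump, dropping transaction-control lines.

    Returns the number of lines dropped. Works on bytes so that any
    binary data embedded in the dump passes through unchanged.
    """
    markers = (b"TRANSACTION", b"ROLLBACK", b"COMMIT")
    dropped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            # Caveat: a data row that happens to contain one of these
            # words would be dropped as well.
            if any(marker in line for marker in markers):
                dropped += 1
                continue
            dst.write(line)
    return dropped
```

For example, strip_transaction_lines("dump_all.sql", "dump_all_notrans.sql"), where dump_all.sql is a hypothetical name for the raw dump.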

Alexey · September 1, 2020, 8:29pm

I would suggest running the filesystem check anyway once the rebuild finishes.

roryboreyalice · September 1, 2020, 8:58pm

Alexey,

We have an issue here. The rebuild was a success. I then tried fsck -y device, and it reports a “Bad magic number in super-block”. The alternative superblocks do not work either, so the filesystem appears to be corrupt.

I tried starting the node; it fired up and now reports traffic for the month.

What should I do? I have two new nodes that are on 4TB disks; this corrupted one is on a 3TB disk. Can I kill one of the 4TB nodes (it hasn’t run for a month yet) and migrate the 3TB node to the 4TB disk, expanding it in the process while keeping the existing 3TB node’s age?

Alexey · September 2, 2020, 6:51am

I would recommend fixing your disk first.
Also, if you plan to have two nodes and want to replace the disk with a healthy one in the near future, you can reduce the allocation for the second node to 0, put your older node on that disk alongside it (using a different storage location, of course), and run both nodes. Since the new node will have no free space, it will not affect the older one.
When you replace the disk, you can migrate the new node to it and leave the old one on the healthy disk. This reduces the friction of migration, since the new node has far less used space and will take far less time to move.