Node Dashboard not loading

kupan787 · September 7, 2020, 12:49am

I’m running on Linux. Running the latest node, 1.11.1

My node is up and running fine, and if I tail the log, I see plenty of download/upload commands:

2020-09-07T00:40:48.358Z	INFO	piecestore	download started	{"Piece ID": "SUYKO3WEYDUQYUL4LKXFPZDG2HUFGWJTZGT7RC2WTJKQY4O5QH6Q", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_REPAIR"}
2020-09-07T00:40:50.725Z	INFO	piecestore	upload started	{"Piece ID": "OMXQYOO45MIRGYLXSW4XCTNNZCLPUVOE35GB2XOX2AWP2WJPQDKA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 90941778976}
2020-09-07T00:40:50.988Z	INFO	piecestore	upload started	{"Piece ID": "3NPB4AOGFLW7QUECATCZYQXR2WJNNXZZQM4NAVBBZXEJE3STENKQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 90941777696}
2020-09-07T00:40:52.449Z	INFO	piecestore	download started	{"Piece ID": "OMXQYOO45MIRGYLXSW4XCTNNZCLPUVOE35GB2XOX2AWP2WJPQDKA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET"}
2020-09-07T00:40:53.066Z	INFO	piecestore	upload started	{"Piece ID": "DWIMGXVIHCVUVMEMGCUQRDGWGQGRUTKEJRKSTYJ3TFENUL536Y6A", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT_REPAIR", "Available Space": 90941770784}
2020-09-07T00:40:53.487Z	INFO	piecestore	download started	{"Piece ID": "OMXQYOO45MIRGYLXSW4XCTNNZCLPUVOE35GB2XOX2AWP2WJPQDKA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET"}

If I try and access the web dashboard, the page loads, but I just get a spinning circle. I’ve let it sit for 5 minutes, and it just keeps spinning.

Likewise, if I try the CLI dashboard, it starts to load, but again just sits without displaying the details:

storj@storj:~$ docker exec -it storagenode /app/dashboard.sh
2020-09-07T00:42:39.569Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
2020-09-07T00:42:39.579Z INFO Identity loaded. {"Node ID": "12ZSweTMauixx1kBErvAQ2pNU8GADHo5E6vnqCHpnXdJeuTGk56"}

It just sits showing those lines, and never loads. I let it sit for 5 minutes, but it didn’t load.

I know in previous versions the dashboard would load just fine. I’ve tried removing the docker image and re-pulling it. Is there any way to troubleshoot, or see what it is doing/why it is stuck loading the dashboard?

nerdatwork · September 7, 2020, 1:51am

Do you see any completed uploads/downloads ? Your log just shows them as started.

kupan787 · September 7, 2020, 2:33am

My 6 TB drive is about 95% full, so pretty sure they are completing. I’ve been running a node for over a year, and the dashboard has worked just fine before. It seems to have stopped working in the latest update (1.11.1) for me.

I see lines in my log like:

2020-09-07T02:24:31.105Z	INFO	piecestore	uploaded	{"Piece ID": "B6XS6S3TJLVFVSAP2KEZU63SQKT5KKPMBSZLM4MDZYRM4OSPBQWA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}

...

2020-09-07T02:28:36.465Z	INFO	piecestore	downloaded	{"Piece ID": "CWN5JBEGOOTL4DBQLA2JQ3GAVKD3IRIR3RVMKOCGMYOHSVJTO3TQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET"}

So I assume these are completed upload or completed download.
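
One quick way to compare started and completed transfers is to tally the piecestore messages in the log. This is only a sketch: it assumes the tab-separated layout of the log lines quoted above (timestamp, level, component, message, JSON) and a container named storagenode, and `count_piecestore_events` is a made-up helper name.

```shell
# Tally piecestore log messages (e.g. "upload started", "uploaded") read from stdin.
# Assumes tab-separated fields: timestamp, level, component, message, JSON details.
count_piecestore_events() {
  awk -F'\t' '$3 == "piecestore" { count[$4]++ }
              END { for (msg in count) printf "%d\t%s\n", count[msg], msg }' |
    LC_ALL=C sort -t "$(printf '\t')" -k2
}

# Usage (not run here):
#   docker logs storagenode 2>&1 | count_piecestore_events
```

If the "uploaded" and "downloaded" counts stay close to their "started" counterparts, transfers are completing.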

But scanning my log, I do see some lines like:

2020-09-07T02:27:06.554Z	ERROR	piecestore	could not get hash and order limit	{"error": "v0pieceinfodb error: sql: no rows in result set", "errorVerbose": "v0pieceinfodb error: sql: no rows in result set\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).Get:131\n\tstorj.io/storj/storagenode/pieces.(*Store).GetV0PieceInfo:680\n\tstorj.io/storj/storagenode/pieces.(*Store).GetHashAndLimit:460\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:523\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:1004\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-09-07T02:27:06.554Z	ERROR	piecestore	download failed	{"Piece ID": "CNCXRVQF25TKI7IQKRANSXXKIXEFCBFQNRQJM3M2HOXEK3V7RLQA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_REPAIR", "error": "v0pieceinfodb error: sql: no rows in result set", "errorVerbose": "v0pieceinfodb error: sql: no rows in result set\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).Get:131\n\tstorj.io/storj/storagenode/pieces.(*Store).GetV0PieceInfo:680\n\tstorj.io/storj/storagenode/pieces.(*Store).GetHashAndLimit:460\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:523\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:1004\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

Not dashboard related, but not sure if these are just normal part of the process.

Is there a log for the dashboard anyplace? Or any way to see what it is doing when it is just paused at the startup?

nerdatwork · September 7, 2020, 3:28am

I would recommend checking your database files.

There is a single log for the node and you are already referring to it.

kupan787 · September 7, 2020, 5:29am

My db files all look ok:

storj@storj:/mnt/Storj/config/storage$ docker run --rm -it --mount type=bind,source=${PWD},destination=/data sstc/sqlite3 find . -iname "*.db" -maxdepth 1 -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'
./bandwidth.dbok
./info.dbok
./notifications.dbok
./orders.dbok
./piece_spaced_used.dbok
./reputation.dbok
./satellites.dbok
./storage_usage.dbok
./used_serial.dbok
./heldamount.dbok
./pricing.dbok
./pieceinfo.dbok
./piece_expiration.dbok

Hmm, that’s unfortunate there is no additional logging for the dashboard. When I scroll through the logs, I don’t see anything written out about it. So hard to know what is going on, and why it seems to pause at startup.

Is there any other way to troubleshoot the dashboard?

nerdatwork · September 7, 2020, 5:38am

You can change log.level to debug from info in your config.yaml file. You have to restart your node after config file is updated.
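
As a rough sketch of that change: the helper below rewrites an existing uncommented `log.level:` line or appends one if none is present. The function name and the host path to config.yaml are placeholders, and it relies on GNU sed's `-i` flag, so treat it as an illustration rather than the official procedure.

```shell
# Set log.level in a storagenode config.yaml (path passed as $1, level as $2).
# Replaces an existing uncommented "log.level:" line, otherwise appends one.
# Uses GNU sed's -i flag, as found on typical Linux hosts.
set_log_level() {
  config_file="$1"
  level="$2"
  if grep -q '^log\.level:' "$config_file"; then
    sed -i "s/^log\.level:.*/log.level: ${level}/" "$config_file"
  else
    printf 'log.level: %s\n' "$level" >> "$config_file"
  fi
}

# Usage (not run here; the host path is a placeholder):
#   set_log_level /path/to/config.yaml debug
#   docker restart -t 300 storagenode
```

The `-t 300` gives the container up to 300 seconds to shut down cleanly before Docker kills it.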

swissstore · September 8, 2020, 9:33am

Experiencing exactly the same behaviour here

I’m running the current v1.11.1 version Node (Docker on Ubuntu 20.04.1 LTS) and am experiencing an unresponsive Dashboard.

Basically, for the first few minutes to hours after the node starts all is fine, but later the dashboard refuses to load. The log file is already at debug level and shows regular activity but no errors.

Anybody experiencing the same? Or do you have an idea what might be going on?

Host Details
4 CPU cores
6 GB RAM
5 TB dedicated Storage for Storagenode

Alexey · September 8, 2020, 9:27pm

Hello @swissstore,
Welcome to the forum!

I would recommend checking your databases too.
Also, please check for “database is locked” errors in your logs.
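
A minimal sketch of that log check, assuming the container is named storagenode and logs go to Docker's default log output (`count_db_locked` is just an illustrative name):

```shell
# Count log lines read from stdin that mention "database is locked" (case-insensitive).
count_db_locked() {
  # grep -c prints 0 but exits non-zero when nothing matches, so mask the exit status.
  grep -ci 'database is locked' || true
}

# Usage (not run here):
#   docker logs storagenode 2>&1 | count_db_locked
```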

Alexey · September 8, 2020, 9:28pm

How is your HDD connected to the PC with storagenode?
What is the filesystem?
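
If it helps, the filesystem type behind the storage path can be read with GNU coreutils `stat`. This is a small sketch; `fs_type` is a made-up helper name and the path in the usage line is only an example.

```shell
# Print the filesystem type backing a path (GNU coreutils stat, filesystem mode).
fs_type() {
  stat -f -c %T "$1"
}

# Usage (not run here; replace the path with your storage location):
#   fs_type /mnt/Storj
```

A network mount should show up with a network filesystem type rather than a local one such as ext4.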

kupan787 · September 9, 2020, 1:53am

Right now, my setup is a VM running storagenode connecting to my NAS via NFS mount. This seems to work fine for the actual workload, just not the dashboard.

I’m thinking about dumping the VM and just running storagenode directly on my NAS. I had done this originally months ago, but I had transitioned to the VM in the last few months.

kupan787 · September 9, 2020, 5:40am

So I just dumped the VM, and moved everything to the NAS. So now storagenode runs directly on the same hardware as the disks. Doing this, the CLI dashboard and web dashboard now load just fine.

It’s hard to diagnose what the issue was, as there is no entries in the log to identify what the dashboard is trying to do during startup. It seems odd to me that the node transactions (uploads, downloads, etc) were processing fine, but just displaying the dashboard was not working.

Alexey · September 9, 2020, 6:49am

NFS and SMB are not supported by storagenode. They can work, but usually cause a lot of problems: https://forum.storj.io/tag/nfs
https://forum.storj.io/tag/smb

The only compatible network protocol is iSCSI. However, the latency of network storage will increase the cancel rate significantly: there are plenty of nodes with locally connected drives, and they win the race more often than nodes with network drives.
https://forum.storj.io/tag/iscsi

Please use only locally connected drives instead.

swissstore · September 9, 2020, 11:40am

@Alexey, thanks for your reply. I followed your suggestions and answered your questions.

The log files contain not a single “database is locked” error, or any database error at all.
Storage is a FreeNAS NFS share mounted in the Docker host. All connected over 10G Ethernet.
Other containers on the same host are using different shares on the same FreeNAS system, none of which are experiencing any problems.
I just ran iperf to check network performance; no issues found there.
I checked every single db with the following command, and every one results in “ok”:
/mnt/storj/storage$ sqlite3 /mnt/storj/storage/bandwidth.db "PRAGMA integrity_check;"
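
To run the same check over every database in one go, a loop along these lines could be used. It is a sketch that assumes the `sqlite3` CLI is installed on the host; `check_dbs` is a hypothetical helper name, and it is best run while the node is stopped so nothing is writing to the files.

```shell
# Run PRAGMA integrity_check on every *.db file in a directory and print "name: result".
# Returns non-zero if any database reports something other than "ok".
check_dbs() {
  dir="$1"
  status=0
  for db in "$dir"/*.db; do
    [ -e "$db" ] || continue   # skip the unexpanded pattern when no *.db files exist
    result=$(sqlite3 "$db" 'PRAGMA integrity_check;' 2>&1)
    printf '%s: %s\n' "$(basename "$db")" "$result"
    [ "$result" = "ok" ] || status=1
  done
  return "$status"
}

# Usage (not run here):
#   check_dbs /mnt/storj/storage
```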

I don’t understand your reasoning as to why NFS “usually have a lot of problems”. What are those problems supposed to be?
The data is mostly coming from the FreeNAS cache anyway (“mean ARC hit ratio 98.38”), as the storagenode currently has only 96GB used on disk and FreeNAS has 128GB RAM.

I will start another Storagenode with an iscsi disk on the same Hardware to check if it behaves differently.

(Edit 9.9.2020:15:32, typo)

Alexey · September 9, 2020, 8:16pm

You can observe all topics with this tag: Topics tagged nfs

In short: NFS is not compatible with SQLite databases, and not all implementations of NFS are compatible with the DB that is used for blobs. In summary, NFS is not supported and not recommended.
As you can see from your own example, it has issues.

You should measure the mounted disk subsystem, not the network throughput. You will notice high latency when dealing with small chunks of data.
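
For a rough feel of small-write latency on the mounted storage, something like the following `dd` run could be used. It is a crude sketch rather than a proper benchmark (a tool like fio would give more detail); the helper name, block size and default count are arbitrary choices, and `oflag=dsync` is a GNU dd option.

```shell
# Rough latency check: write COUNT 4 KiB blocks with synchronous I/O into DIR,
# then remove the test file. dd reports elapsed time and throughput on stderr.
small_write_test() {
  dir="$1"
  count="${2:-200}"
  testfile="$dir/.latency_test.$$"
  dd if=/dev/zero of="$testfile" bs=4k count="$count" oflag=dsync
  rc=$?
  rm -f "$testfile"
  return "$rc"
}

# Usage (not run here; compare the network mount against a local disk):
#   small_write_test /mnt/storj/storage
#   small_write_test /tmp
```

Dividing the elapsed time dd reports by the block count gives an approximate per-write latency.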

kupan787 · September 9, 2020, 10:31pm

I think this is a pretty out of date assessment.

According to this, as long as you are using NFSv3 or NFSv4, locking works just fine, which was the complaint on the sqlite site for why to not use NFS. I personally use NFS backing a number of other sqlite databases, and have never had any issues. I also use it as a datastore for VMWare, and don’t have any issues there either. In fact, I don’t seem to have any issues with the normal processing of storagenode operations. It is just the viewing of the dashboard that is an issue.

If you guys don’t want to support it, that is fine. I get that you are primarily a “Windows First” organization, so supporting every possible use-case and setup is out of the question.

That said, I wish there was a way to provide more information or help troubleshoot this more. But there doesn’t seem to be a way to get any logging out of the dashboard in its current iteration. If the team ever chooses to add some more logging, I’d be very willing to test it out and see what is causing the hold-up.

Alexey · September 9, 2020, 11:23pm

Your own example disproves it. With NFS it does not fully work, and locking doesn’t work either.

In your classification we are a Linux-first organization; it is just that SQLite and BoltDB do not like network-connected storage, especially NFS.
If it helps: with SMB on Linux it doesn’t fully work either.

You can read here: How To Corrupt An SQLite Database File and here: Storage Node - Storj Docs
