Unrecoverable error {"error": "Error starting master database on storagenode: database: file is not a database\n\tstorj.io/storj/storagenode/storagenodedb

Your screenshot shows the same node ID as previous, so you didn’t start a fresh node. Since you had to recreate some of the databases, your previous usage info has been lost. But as Brightsilence said, your node will assess the disk usage and re-create this figure. This is the “filewalker” process. Probably after a couple of hours it will be complete and you will see the usage again.

Do you see anything else in the logs now? Or just errors?

Are you sure about that? I stopped the node because I am worried that this way it might get a lot of bad audits and get disqualified. But if you are sure that nothing bad should happen, I have no problem leaving it running for a day or two to finish whatever it is doing.

I will be waiting for a reply from any of you.

This is what I would do in your situation. This is how it is supposed to work: technically you could start the node with all of the databases gone and it would still run without problems, just with lost stats.

You can monitor the node logs for a short while. If you see download requests failing with "piece does not exist", then there is a problem with the node accessing its files. If you don't see any errors related to uploads/downloads/audits, you have nothing to worry about. Ideally you would also see some successful download requests.

Start the node, then run docker logs --tail 30 -f storagenode, assuming your logs are not redirected to a file and the Docker container is called storagenode.
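If you want to cut through the routine INFO lines, you can pipe the log through a filter. A minimal sketch, assuming the container is named storagenode as above; the patterns come from the errors discussed in this thread:

```shell
# Keep only log lines that point at real problems:
# ERROR/FATAL entries and the "piece does not exist" failure.
filter_node_log() {
  grep -E 'ERROR|FATAL|piece does not exist'
}

# Demo on two sample lines; only the ERROR line survives the filter.
printf '%s\n' \
  '2021-08-04T13:37:54.181Z INFO piecestore upload started' \
  '2021-08-05T00:58:06.684Z ERROR piecestore download failed' \
  | filter_node_log
```

In practice you would feed it the live log, e.g. `docker logs --tail 1000 -f storagenode 2>&1 | filter_node_log`.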

OK, I started it and I see only these warnings for now:

2021-08-04T13:37:41.608Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 finished
2021-08-04T13:37:43.279Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Error": "contact: failed to dial storage node (ID: 1fDbYw3z9DxdBGiASZqzukh6v9uYXqUVuKEz25MpPsW6ak8TVK) at address 188.254.208.174:38967 using QUIC: rpc: quic: timeout: no recent network activity"}
2021-08-04T13:37:43.310Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Error": "contact: failed to dial storage node (ID: 1fDbYw3z9DxdBGiASZqzukh6v9uYXqUVuKEz25MpPsW6ak8TVK) at address 188.254.208.174:38967 using QUIC: rpc: quic: timeout: no recent network activity"}
2021-08-04T13:37:43.605Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "Error": "contact: failed to dial storage node (ID: 1fDbYw3z9DxdBGiASZqzukh6v9uYXqUVuKEz25MpPsW6ak8TVK) at address 188.254.208.174:38967 using QUIC: rpc: quic: timeout: no recent network activity"}
2021-08-04T13:37:44.230Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Error": "contact: failed to dial storage node (ID: 1fDbYw3z9DxdBGiASZqzukh6v9uYXqUVuKEz25MpPsW6ak8TVK) at address 188.254.208.174:38967 using QUIC: rpc: quic: timeout: no recent network activity"}
2021-08-04T13:37:44.498Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Error": "contact: failed to dial storage node (ID: 1fDbYw3z9DxdBGiASZqzukh6v9uYXqUVuKEz25MpPsW6ak8TVK) at address 188.254.208.174:38967 using QUIC: rpc: quic: timeout: no recent network activity"}
2021-08-04T13:37:46.547Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Error": "contact: failed to dial storage node (ID: 1fDbYw3z9DxdBGiASZqzukh6v9uYXqUVuKEz25MpPsW6ak8TVK) at address 188.254.208.174:38967 using QUIC: rpc: quic: timeout: no recent network activity"}
2021-08-04T13:37:54.181Z INFO piecestore upload started {"Piece ID": "S3M2I3IXZFKNZG2ZJDENHPD5JI

Are they worrying?

Nothing to worry about. These just indicate that UDP hasn’t been forwarded to the node, which is an optional step for now. This became an option months ago, so you may have missed it. Something you might want to do in the future, but isn’t required at the moment.

will it improve things?

In theory it could lead to more ingress for your node. The QUIC protocol over UDP is supposed to give a slight boost to transfer speeds, and it is currently being tested by Storj. If you wanted to add this, you would need to forward the same UDP port number as your TCP port, being sure to add -p 38967:28967/udp (in this case, matching your existing TCP mapping) to your docker run command.

The size of your db files suggests you have a fresh new set of databases for some reason… did you move them away at some point, have the node recreate them, and skip the step to overwrite the new ones with backups of all the non-corrupt db's?

Edit: I’m referring to forgetting step 6 on this page How to fix database: file is not a database error – Storj

@baker is right that all you lost is stats. Keep an eye on the logs for audit failures or repair failures though; those would be a bad sign. The used space should update eventually, but most other historic stats are lost for good. This shouldn't impact node function and payout though.

and if I am using the Windows GUI?

A few minutes ago I copied back the DBs from the backup I made when Alexey suggested that article, but nothing changed in the GUI. Still only a few MB of used space is shown. I stopped and started the storage node, of course.

Anyway, thank you for helping me. I will update you tomorrow.

I don’t think you need to change anything, but if you did it would be in the config.yaml file. I don’t run a Win GUI node, but I’m pretty sure it will work if everything is forwarded correctly.

You can test if QUIC is working by pinging your node with this tool: Pingdom
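Before testing from outside, it can also help to confirm that a UDP listener was actually published on the host. A quick local check (an assumption on my part, not from the thread), using iproute2's ss and the external port 38967 from this setup:

```shell
# List UDP listening sockets and look for the node's external port.
# If nothing matches, Docker did not publish a UDP mapping.
ss -uln | grep 38967 || echo "no UDP listener on port 38967"
```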

Here is what I got:

  • started
  • TCP: dialed node in 100ms
  • QUIC: couldn’t connect to node: rpc: quic: context deadline exceeded

On the router I have forwarded ALL packets/protocols. Maybe I should do something on the Linux VM or in Docker itself?

Here is the command I use:

docker run -d --restart unless-stopped --stop-timeout 300 \
  -p 38967:28967 \
  -p 14004:14002 \
  -e WALLET="xxxxxx" \
  -e EMAIL="xxxxx" \
  -e ADDRESS="188.254.208.174:38967" \
  -e STORAGE="0.55TB" \
  --mount type=bind,source="/root/.local/share/storj/identity/storagenode1",destination=/app/identity \
  --mount type=bind,source="/mnt/storenode1",destination=/app/config \
  --name storagenode1 storjlabs/storagenode:latest

What should I modify, so both TCP and UDP packets are used?

-p 38967:28967/tcp
-p 38967:28967/udp

You don't need to forward all ports, just UDP on the same port number as the TCP port you have been forwarding. If you repeat the same steps you used to forward the TCP port, it should work for UDP as well.

You need to change:
-p 38967:28967
to
-p 38967:28967/tcp -p 38967:28967/udp

This will require you to stop, remove, and re-create the container with the new parameters. You should wait for your node to finish the filewalker and re-build your used space before you do this since it will have to start the process again if you stop the container. Best to wait for your repaired node to stabilize before making more changes.
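Putting that together with the command posted earlier in the thread, the re-created container would be started roughly like this (a sketch; the values are copied from that post, so adjust names and paths to your setup):

```shell
# Stop and remove the old container, then re-create it with both
# TCP and UDP mappings on the external port.
docker stop -t 300 storagenode1
docker rm storagenode1
docker run -d --restart unless-stopped --stop-timeout 300 \
  -p 38967:28967/tcp \
  -p 38967:28967/udp \
  -p 14004:14002 \
  -e WALLET="xxxxxx" \
  -e EMAIL="xxxxx" \
  -e ADDRESS="188.254.208.174:38967" \
  -e STORAGE="0.55TB" \
  --mount type=bind,source="/root/.local/share/storj/identity/storagenode1",destination=/app/identity \
  --mount type=bind,source="/mnt/storenode1",destination=/app/config \
  --name storagenode1 storjlabs/storagenode:latest
```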

(as a side note, feel free to edit your posts to add more info instead of making multiple posts. Helps with the readability for future users)

OK, I will test that.

Thank you for your suggestion. I apologize.


Hello,

It seems my node is normalizing:


But I noticed these notifications:

The only errors in the log are:

2021-08-05T00:58:06.684Z ERROR piecestore download failed {"Piece ID": "PXJ2I2QIU5HTZA23QETDEZXXNULCTXWFTDEEW5GTL73G62COYLUA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "write tcp 172.17.0.2:28967->95.216.20.146:49464: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->95.216.20.146:49464: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:211\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).rawWriteLocked:261\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:315\n\tstorj.io/common/pb.(*drpcPiecestore_DownloadStream).Send:302\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:608\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-08-04T16:53:36.740Z ERROR blobscache trashTotal < 0 {"trashTotal": -882972160}

2021-08-04T16:04:00.046Z ERROR piecestore failed to add bandwidth usage {"error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:711\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func6:650\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:674\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:217\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

Also a few warnings:

2021-08-04T15:53:06.888Z WARN contact:service Your node is still considered to be online but encountered an error. {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Error": "contact: failed to dial storage node (ID: 1fDbYw3z9DxdBGiASZqzukh6v9uYXqUVuKEz25MpPsW6ak8TVK) at address 188.254.208.174:38967 using QUIC: rpc: quic: timeout: no recent network activity"}

What is your opinion?

Looks like your stats are rebuilt, so that’s good news. Your node would have been suspended for being offline (scores below 60%). If you keep your node online, it should recover the online scores and come out of suspension. The online scores will take at least 30 days to get back to 100%, so expect those to rise slowly. I think your node will come out of suspension sooner than that though. While in suspension, your node will not receive any ingress data, only egress (download) requests.

This is a normal error that you will see from time to time. Nothing to worry about

This error is probably related to stats re-building. If it continues to occur, let us know. Shouldn’t be a problem.

This error usually occurs when the node is under heavy load, especially during the initial startup/filewalker process. If it shows up occasionally, it is nothing to worry about.

This is the message related to UDP/QUIC not being set-up. No action required except for setting up UDP/QUIC as described above.

You will also notice a lot of data in the trash for the near future, as many of your node's pieces will have been repaired to other nodes while yours was offline.

Happy to see you got the node back online! :+1:


Thank you very much for the assistance!!! :slight_smile:
