Ubuntu Node offline but QUIC is OK

I've tried a node restart, a PC reboot, and removing and reinstalling the docker container, but it's still offline. Uptime is 456 hours, last contact was 73 hours 6 minutes ago and climbing. The port check says the port is open. QUIC says OK, but the status is offline. v1.86.1

What to do?

sudo docker logs --tail 10 storagenodec1
2023-09-21T17:56:32Z INFO piecestore downloaded {“process”: “storagenode”, “Piece ID”: “22Q7JWABOMTCBJ5E5VYNUICH7NPUG2LSV2S7ZX4ARVGPTNJYRCYQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 1449216, “Size”: 290304, “Remote Address”: “5.161.104.6:49980”}
2023-09-21T17:56:32Z INFO piecestore uploaded {“process”: “storagenode”, “Piece ID”: “HDUFUDVBRWNBD3DOZWIRJU4UFII6TW5SBFX2MH6BTBCA7VIGTCVA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 145920, “Remote Address”: “5.161.220.231:29578”}
2023-09-21T17:56:32Z INFO piecestore download started {“process”: “storagenode”, “Piece ID”: “JB52KMBTPLB77OD7T5EIWIR2VLIMKQUCNUVKV7LX3VLBF5HPX3PA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 9728, “Remote Address”: “72.52.83.202:56650”}
2023-09-21T17:56:33Z INFO piecestore downloaded {“process”: “storagenode”, “Piece ID”: “JB52KMBTPLB77OD7T5EIWIR2VLIMKQUCNUVKV7LX3VLBF5HPX3PA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 9728, “Remote Address”: “72.52.83.202:56650”}
2023-09-21T17:56:33Z INFO piecestore download started {“process”: “storagenode”, “Piece ID”: “22Q7JWABOMTCBJ5E5VYNUICH7NPUG2LSV2S7ZX4ARVGPTNJYRCYQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 2029056, “Size”: 290048, “Remote Address”: “5.161.76.253:51794”}
2023-09-21T17:56:33Z INFO piecestore download started {“process”: “storagenode”, “Piece ID”: “H4NKLXRATNXVEOMVWIIE3363ETX4KQZ7LMXJZDYWTGDUAJS2BVSQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 9984, “Remote Address”: “72.52.83.202:47048”}
2023-09-21T17:56:33Z INFO piecestore download started {“process”: “storagenode”, “Piece ID”: “354IEAWCCC6W5IH56I34UNOBZXLJHN52BBDKGPI76OWG7JZPXMVQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 14080, “Remote Address”: “72.52.83.202:47054”}
2023-09-21T17:56:34Z INFO piecestore downloaded {“process”: “storagenode”, “Piece ID”: “H4NKLXRATNXVEOMVWIIE3363ETX4KQZ7LMXJZDYWTGDUAJS2BVSQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 9984, “Remote Address”: “72.52.83.202:47048”}
2023-09-21T17:56:34Z INFO piecestore downloaded {“process”: “storagenode”, “Piece ID”: “22Q7JWABOMTCBJ5E5VYNUICH7NPUG2LSV2S7ZX4ARVGPTNJYRCYQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 2029056, “Size”: 290048, “Remote Address”: “5.161.76.253:51794”}
2023-09-21T17:56:34Z INFO piecestore downloaded {“process”: “storagenode”, “Piece ID”: “354IEAWCCC6W5IH56I34UNOBZXLJHN52BBDKGPI76OWG7JZPXMVQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 14080, “Remote Address”: “72.52.83.202:47054”}

The port check may give you a false sense of security :wink: as most port checkers only test TCP.

What does it say when you look at http://[IP-OF-NODE]:28967?
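
For example, you could fetch it from a shell with curl (just a sketch, assuming curl is installed; replace the placeholder with your real public IP):

curl http://[IP-OF-NODE]:28967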

It should show you this:

{
  "Statuses": null,
  "Help": "To access Storagenode services, please use DRPC protocol!",
  "AllHealthy": true
}

This means all is okay! If you don't see this, it means your node is not reachable from outside and you should check your router settings/port forwarding.

And look into this helpful post:


The port checker says the port is open. My other docker node is working fine. The offline node had been working fine for months until now.

The log says downloads have started and completed. So doesn't that mean the node is really online?

When I stop and restart the node why doesn’t uptime reset to 0?

What is the response of this, using your public IP?

bash: http://149.75.178.72:28967: No such file or directory

Okay, this looks both good and bad. There are errors in your logs somewhere: it says AllHealthy: false.

Have you tried to find the errors with grep ERROR?
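
For example, something like this (a sketch, assuming your container is named storagenode; adjust the name to match yours):

docker logs storagenode 2>&1 | grep ERROR | tail -20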

Are the DBs on SSD?
I had a similar problem where, from one day to the next, my node ran into issues. I moved the DBs to SSD and the issues were fixed.

Not sure where the DBs reside on Ubuntu. Ubuntu is on an SSD; the Storj data is on an HDD. The other strange thing is that all the payout info in the dashboard says $0.00, even though the drive has 7.15 TB stored and has been operational since Sept 2020.

If you didn't change anything, then the DBs are on the HDD.

It sounds like you have an issue with your DBs.

You should try to find the ERROR and then maybe fix the DBs and move them to SSD.
That's what worked for me.
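
If it does come to moving the DBs, here is a minimal sketch of how that can look for a docker node (the container name, paths, and the storage2.database-dir option are my assumptions; check the official guide before doing it):

# stop the node before touching the databases (example container name)
docker stop -t 300 storagenodec1
# copy the database files to a directory on the SSD (example path)
mkdir -p /mnt/ssd/storagenode-dbs
cp /mnt/storj/storagenode/storage/*.db /mnt/ssd/storagenode-dbs/
# then re-create the container with an extra bind mount for the new directory
# and the storage2.database-dir option pointing at it, roughly:
#   --mount type=bind,source=/mnt/ssd/storagenode-dbs,destination=/app/dbs
#   --storage2.database-dir=/app/dbs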

Read this thread; there you will find a lot of information:
After reboot: Failed to add bandwidth usage - Node Operators / troubleshooting - Storj Community Forum (official)


Looks like I am in over my head regarding Ubuntu. I found the folder with 25+ DB files. I can't seem to find the log file, and then there is the issue of how to search it. I read a bunch of notes from the forum about how to fix malformed DBs, but they were too complicated for me.

This is too bad, as this node is the largest of my 6 nodes. I'd hate to lose it.
Right now I would just like to convert it to the Windows GUI (if that is even possible) and be done with Ubuntu.

Thanks for your efforts to help me.

If you use docker and did not configure it to redirect logs to a file, then you can check the logs with

docker logs --tail 20 storagenode
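
(As an aside, if you would rather have the logs written to a file, I believe this is done by setting the log.output option in config.yaml and restarting the container; the path is just an example and must be a location inside the container, such as the mounted config directory:)

log.output: "/app/config/node.log"

After that, docker logs will stay empty and the log ends up in node.log inside your config location.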

Databases are in the storage subdirectory inside your data location.
To check and fix the databases you may use this guide:

This should help at least find which databases are corrupted. If you are reluctant to fix them, you may re-create them, losing the historic data and the current stats (this doesn't affect reputation or payout):

If you want to migrate to Windows, it likely won't be possible to do so in place unless you have at least as much free space as is currently occupied. You may use this guide:

Here are the latest 5 log entries.
They look OK and show the node is working. How can that be if the node is offline?

2023-09-22T16:12:48Z INFO piecestore download started {"process": "storagenode", "Piece ID": "E3RYRNCCIUO2IBWEQVXHHROQB5TTOYX4QXEYGTNCXGI6AEBSFE7A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 73472, "Remote Address": "72.52.83.202:40944"}
2023-09-22T16:12:49Z INFO piecestore downloaded {"process": "storagenode", "Piece ID": "UFACJDNEZYT42NMXDXTJZ5FVLH5LFJENBD4G2J6N4USW75IPWN6Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 290048, "Remote Address": "5.161.61.225:42686"}
2023-09-22T16:12:50Z INFO piecestore upload started {"process": "storagenode", "Piece ID": "JRB5FU4J45GHW6BDQSTDCI7B3FCV2BZFEGWVFT5IKI2JEZCAEUCQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 2067489930624, "Remote Address": "5.161.192.57:20212"}
2023-09-22T16:12:50Z INFO piecestore uploaded {"process": "storagenode", "Piece ID": "JRB5FU4J45GHW6BDQSTDCI7B3FCV2BZFEGWVFT5IKI2JEZCAEUCQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 5120, "Remote Address": "5.161.192.57:20212"}
2023-09-22T16:12:51Z INFO piecestore download started {"process": "storagenode", "Piece ID": "ERS2G4KBSHVR5QMRWV6A7PLCCDN4BDKELRLEIRXN53POLDPCLCWQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 1928960, "Remote Address": "23.237.191.146:32944"}

Try this:

docker logs storagenode 2>&1 | grep ERROR

And then this:

docker run --rm -it --mount type=bind,source=${PWD},destination=/data sstc/sqlite3 find . -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'

Here, replace ${PWD} with the path where you found the DBs.
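
For example, if the DBs turned out to be in /mnt/storj/storagenode/storage (only an example path), the full command would look like this:

docker run --rm -it --mount type=bind,source=/mnt/storj/storagenode/storage,destination=/data sstc/sqlite3 find . -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'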

Then post both outputs here


Much to my surprise, the node came back online with 4 hours of uptime. I did not make any changes or fix anything. Very strange, but glad to have it back.

Sounds good, but try to find the errors nevertheless. I had the same thing, and 4 days later the issues came back.
Which HDD do you have? Is it SMR or CMR?

My recommendation:
Try to find the errors with the two commands above.
Move the DBs to SSD; that's what fixed the problem permanently for me.

The HDD is a Seagate Exos CMR 10TB drive.

I ran the docker logs command but there was no output. Do I need to change the 2>&1 part? I don't want to run the rm command for fear I will break the node and not be able to get it going again. Just going to leave it alone for now. Thanks for your help.

P.S. I had a several-hour internet outage early this morning and had to reboot my router at 8 am. Not sure if that has any relation to the node working at 1 pm (per the 4 hours of uptime on the dashboard). But who knows.

I was finally able to get the log output:

$ sudo docker logs storagenodec1 2>&1 | grep ERROR
2023-09-22T18:26:55Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “EOIRZF76XNTAZNGORLZAUACI23K7MLL24Q56IF56I6TZ7WW3WCSA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Remote Address”: “23.237.191.146:40662”, “error”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled”, “errorVerbose”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:615\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-09-22T18:26:55Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “UHLWQYAOK4CVHBLFSP6XD374TSIQUJO5MCB3AHW72EZQDP7M2GYA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Remote Address”: “23.237.191.146:40664”, “error”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled”, “errorVerbose”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:615\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-09-22T18:26:55Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “ZE6XDUAQ2243QCU5WF47ABIQHDNJVTWEZ2V52B2QUJTXK3Z2TRUA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Remote Address”: “184.104.224.98:57266”, “error”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled”, “errorVerbose”: “untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:615\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-09-22T18:26:57Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “QW4MPGIQDCVEOG2DO5GQ4GNUIJWHUCKHUDQ6BPDS73UH5PTMS3RA”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “GET”, “Remote Address”: “184.104.224.98:45950”, “error”: “trust: rpc: tcp connector failed: rpc: dial tcp 104.199.30.73:7777: operation was canceled”, “errorVerbose”: “trust: rpc: tcp connector failed: rpc: dial tcp 104.199.30.73:7777: operation was canceled\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2023-09-22T19:04:11Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “HSXY67XXY5VEY6WGXMXGC6THPNFNZPSK3HDPTQ4VCX4PB54FWNMA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 98304, “Size”: 491520, “Remote Address”: “72.52.83.202:41620”, “error”: “write tcp 172.17.0.3:28967->72.52.83.202:41620: write: connection reset by peer”, “errorVerbose”: “write tcp 172.17.0.3:28967->72.52.83.202:41620: write: connection reset by peer\n\tstorj.io/drpc/drpcstream.(*Stream).rawFlushLocked:401\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:462\n\tstorj.io/common/pb.(*drpcPiecestore_DownloadStream).Send:349\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).sendData.func1:830\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22”}
2023-09-22T19:45:11Z ERROR piecestore upload failed {“process”: “storagenode”, “Piece ID”: “YBYAHC74X5LMHR562RTJGGTGXXAUH2ZEUAY6JZC556SOJ5UQCMXA”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “error”: “context deadline exceeded”, “errorVerbose”: “context deadline exceeded\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:534\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”, “Size”: 131072, “Remote Address”: “5.161.220.231:11182”}
2023-09-22T23:04:23Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “ERS2G4KBSHVR5QMRWV6A7PLCCDN4BDKELRLEIRXN53POLDPCLCWQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 1218560, “Remote Address”: “72.52.83.202:57484”, “error”: “manager closed: read tcp 172.17.0.3:28967->72.52.83.202:57484: read: connection timed out”, “errorVerbose”: “manager closed: read tcp 172.17.0.3:28967->72.52.83.202:57484: read: connection timed out\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:211\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:171\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:226”}
2023-09-22T23:14:04Z ERROR piecestore download failed {“process”: “storagenode”, “Piece ID”: “ERS2G4KBSHVR5QMRWV6A7PLCCDN4BDKELRLEIRXN53POLDPCLCWQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 1339392, “Size”: 229376, “Remote Address”: “72.52.83.202:25728”, “error”: “write tcp 172.17.0.3:28967->72.52.83.202:25728: use of closed network connection”, “errorVerbose”: “write tcp 172.17.0.3:28967->72.52.83.202:25728: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).rawFlushLocked:401\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:462\n\tstorj.io/common/pb.(*drpcPiecestore_DownloadStream).Send:349\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).sendData.func1:830\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22”}

Likely because of corrupted databases. Please check them:

You may also change the current directory to the directory with the databases, e.g. (replace /mnt/storj/storagenode/storage with your actual path):

cd /mnt/storj/storagenode/storage

then run the command to check databases

docker run --rm -it --mount type=bind,source=${PWD},destination=/data sstc/sqlite3 find . -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'

(${PWD} will be replaced with the current directory automatically)
