Noob Node Runner needs some help, please

mining2023 · August 17, 2023, 7:30pm

Hello everyone!
I’m new to Storj. I’ve been running a node since April 2023, and I have a couple of small issues with it that I hope to resolve with your help.
Issue 1. As you can see in the attached image, satellites US2 and europe-north-1 are stuck at 43% and 39% online, while the rest have recovered to 98% online since the last problem with the node, which left it offline for 24 hours or so. That problem was caused by the node not auto-starting after a reboot, even though it was configured to auto-start as a service, and it later led to a temporary suspension. Anyway, that was fixed, and since then the rest of the satellites have recovered, but not those two. Any ideas why, or any suggestions? They’ve been stuck at those low percentages for about a week.
Issue 2. saltlake.tardigrade.io:7777 suddenly dropped to a 95% suspension score for a day and has now come back up to 98%. I’ve looked through the giant log file but couldn’t figure out what to look for to troubleshoot the suspension on this satellite. Any tips?
Thanks in advance for your time and help.

JWvdV · August 17, 2023, 7:45pm

Don’t worry: both satellites were test satellites.
Emphasis on "were", because they have been taken down, and therefore your online score will never increase on us2 and europe-north-1.

See also: "Announcement: Storj to shut down europe-north-1 and us2" in the Announcements category of the Storj Community Forum.

mining2023 · August 17, 2023, 7:53pm

Oh, I see, so that’s why. Thank you.

daki82 · August 18, 2023, 6:21am

Welcome to the forum!

Take a look at this post, especially the part about UptimeRobot.
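An UptimeRobot-style port monitor mostly answers one question: does the node's port accept TCP connections from outside? A minimal Python sketch of that check, assuming you run it from a machine outside your LAN; the function name `port_is_open` is hypothetical, and 28967 is simply the port that appears in the node's own log lines later in this thread.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Like a simple external "port" monitor, this does not speak the storagenode
    protocol; it only confirms that something accepts connections on the port.
    """
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts, unreachable networks and DNS failures
        # are all OSError subclasses, and all mean "down" for this check.
        return False
```

For example, `port_is_open("your.public.address", 28967)` (placeholder host) returns False while the router or the node is down. It only proves the port is reachable, not that the node is healthy.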


Alexey · August 18, 2023, 6:39am

You need to check why the suspension score dropped on Saltlake:

daki82 · August 18, 2023, 6:59am

It’s clear, IMHO. Good morning, @Alexey.

Alexey · August 18, 2023, 7:04am

Yes, but I wanted to know what was the error.

daki82 · August 18, 2023, 7:04am

Search your log for FATAL errors.
(To reduce the log size, use logrotate or simply set the log level to error in config.yaml.)
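When the log is too large to read by eye, a small filter helps. A minimal Python sketch, assuming each entry starts with a timestamp followed by the level (`FATAL`, `ERROR`, ...), separated by spaces or tabs, as in the log excerpts quoted in this thread; the helper names are hypothetical.

```python
from typing import Iterable, Iterator, List


def lines_at_level(lines: Iterable[str], level: str = "FATAL") -> Iterator[str]:
    """Yield log lines whose second whitespace-separated field equals `level`.

    Assumes entries look like "<timestamp> <LEVEL> <message...>", as in the
    storagenode log excerpts in this thread (spaces or tabs both work).
    """
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] == level:
            yield line.rstrip("\r\n")


def fatal_lines_in_file(path: str) -> List[str]:
    # Iterate over the open file so a multi-gigabyte log is streamed
    # line by line instead of being read into memory at once.
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(lines_at_level(log, "FATAL"))
```

On Windows the path would be the one used later in this thread, `C:\Program Files\Storj\Storage Node\storagenode.log`; `lines_at_level(f, "ERROR")` works the same way for other levels.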

Suspension can also come from router reboots or internet reconnects.
Maybe you have the 1-minute timeout error. Can you tell us about the node’s hardware?

daki82 · August 18, 2023, 7:13am

Since he calls himself a noob, I doubt he’s comfortable with PowerShell.

Alexey · August 18, 2023, 7:30am

To copy and paste a command from the linked KB article?

I guess they are able and capable, because they managed to at least generate an identity and sign it, which is CLI-only.

daki82 · August 18, 2023, 7:39am

Fair point, let’s wait for the answer(s). I’m curious too.

I forgot that entirely, and I started a node 3 weeks ago.

mining2023 · August 18, 2023, 5:01pm

Hello @Alexey
here is the PowerShell output after following the instructions on the page you provided:

PS C:\Users\NED> sls "GET_AUDIT|GET_REPAIR" "C:\Program Files\Storj\Storage Node\storagenode.log" | sls failed

C:\Program Files\Storj\Storage Node\storagenode.log:17669917:2023-06-15T17:34:48.515-0400 ERROR piecestore download failed {"Piece ID": "6Y5CZIHQ7RTOV2KN42UPTSZ5M6JGET57CY3XBIADHJDTVKPG6KTQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_REPAIR", "Offset": 0, "Size": 344064, "Remote Address": "128.140.12.124:56382", "error": "write tcp 10.0.1.33:28967->128.140.12.124:56382: use of closed network connection", "errorVerbose": "write tcp 10.0.1.33:28967->128.140.12.124:56382: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).rawFlushLocked:401\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:462\n\tstorj.io/common/pb.(*drpcPiecestore_DownloadStream).Send:349\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).sendData.func1:807\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}
I hope this makes sense to you.
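The PowerShell pipeline above keeps lines that mention GET_AUDIT or GET_REPAIR and also contain "failed" (Select-String matches case-insensitively by default). A rough Python equivalent that goes one step further and counts failures per satellite and action, assuming each entry is a single line ending in a JSON object with straight quotes, as in the raw log file; `failed_audits_and_repairs` is a hypothetical name.

```python
import json
import re
from collections import Counter
from typing import Iterable

# Same filter as the PowerShell pipeline, also case-insensitive.
ACTION_RE = re.compile(r"GET_AUDIT|GET_REPAIR", re.IGNORECASE)
FAILED_RE = re.compile(r"failed", re.IGNORECASE)


def failed_audits_and_repairs(lines: Iterable[str]) -> Counter:
    """Count failed GET_AUDIT/GET_REPAIR entries per (satellite ID, action)."""
    counts: Counter = Counter()
    for line in lines:
        if not (ACTION_RE.search(line) and FAILED_RE.search(line)):
            continue
        brace = line.find("{")
        if brace == -1:
            continue
        try:
            fields = json.loads(line[brace:])
        except json.JSONDecodeError:
            continue  # truncated or wrapped line: skip it rather than guess
        counts[(fields.get("Satellite ID", "?"), fields.get("Action", "?"))] += 1
    return counts
```

Called on an open log file, it returns a `Counter` keyed by `(satellite ID, action)`, which makes it easy to see whether one satellite (for example Saltlake) stands out.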

Just to make it clear, @daki82: I’m a noob to Storj nodes, but I know my way around computers. Searching the log file for FATAL errors didn’t give me any results, but I do have a lot of upload and download errors; I don’t know if that matters.

I started the node as an experiment on a mini PC with a Celeron 1017U dual-core CPU, 6 GB RAM, Windows 10 running on an SSD, and a network-attached 10 TB iSCSI volume from a Synology server as the Storj drive, with a 2 Gbps LAN and a 500 Mbps internet connection. (This is an experiment to see whether it’s worth running another spare server that is sitting offline for now, with 250 TB of storage and 2 Xeon CPUs.) Both the mini PC and the Synology server are powered 24/7, and I figured I could use an extra 10 TB for Storj and see what comes of it. Both have minimal downtime and the internet is solid; they do get some system updates that cause some downtime, but not for long.

As for auto-start on boot, I solved that a while ago by following the guide in your forum posts.
Thank you all for the replies and for trying to get this figured out.

Alexey · August 19, 2023, 1:16am

And this is the reason the node did not start after a reboot, and why your suspension score dropped:

Network-attached storage is not as reliable as local storage.
You need to configure your storagenode service to depend on the network and to start with a delay, so the OS can fully bring up the network before the node starts:
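Alexey's advice is to make the Windows service itself wait for the network. As an illustration of the same idea (not the Windows service-dependency setting), here is a minimal Python sketch of a readiness gate that a hypothetical start-up script could call before launching the node: it polls the storage directory until it exists and looks writable, or gives up after a deadline. The path, the deadline, and the function name are assumptions.

```python
import os
import time


def wait_for_storage(path: str, deadline_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll until `path` is an existing directory that looks writable.

    Returns True as soon as it does, or False once `deadline_s` seconds pass.
    """
    end = time.monotonic() + deadline_s
    while True:
        # os.access is only a coarse check; on network filesystems real I/O can
        # still fail even when it reports success.
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return True
        remaining = end - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))
```

For example, `wait_for_storage(r"S:\storj", deadline_s=300)` (hypothetical drive letter for the iSCSI volume) returns False if the volume is still missing after five minutes. Python's documentation warns that `os.access` can report success on network filesystems even when later I/O fails, so this only guards against the volume not being mounted yet.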

By the way, if your Synology model supports Docker, I would recommend running storagenode directly on the Synology instead. You only need to enable Docker and SSH and follow the CLI guide: CLI Install - Storj Docs
You should skip the section on installing Docker, since on Synology it is done differently.
You may also migrate the current node: Migrating from Windows GUI installation to Docker CLI - Storj Docs
When you start the existing node, you must skip the setup step too, because it should be performed only once in the node’s entire life, and you already did it when you installed the Windows GUI node.

mining2023 · August 21, 2023, 4:10pm

Hello Alexey,
my Storj service is already set to delayed start.
As mentioned, this was only a test node to figure things out, and I’m looking forward to converting it to the Docker version running on my Synology server.
So this morning I found my node offline again (why the hell does this only happen on weekends?). The PC was online and didn’t seem to have rebooted on its own over the weekend, and no updates were installed; the Synology server was also online, with no errors in its log.
So I looked for clues:
1. I checked the Windows Event Viewer for errors related to Storj and found one: “The Storj V3 Storage Node service terminated unexpectedly. It has done this 1 time(s).”
2. I looked in the Storj log file and saw a bunch of upload and download error messages:
2023-08-18T18:37:18-04:00 ERROR piecestore upload failed {"Piece ID": "PJLBWZN6MPTUVU53DHIAMBDECVBKD7KZWD5SHU3JNDVT5JJFLXTA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:500\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:506\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 36864, "Remote Address": "5.161.149.40:8678"}
2023-08-18T18:37:39-04:00 ERROR piecestore download failed {"Piece ID": "IFG4MVIOVTDUF4OYCK3FDVQRXA5ODCGBHSCYIGKQCZOFDZI7TYWQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "Offset": 0, "Size": 311296, "Remote Address": "51.77.227.245:50426", "error": "manager closed: read tcp 10.0.1.33:28967->51.77.227.245:50426: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.", "errorVerbose": "manager closed: read tcp 10.0.1.33:28967->51.77.227.245:50426: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:231"}

Do any of those give you a clue as to what caused the Storj node to stop? Should I look for clues anywhere else?
The Windows error came up this morning, but Storj started throwing those upload and download errors late in the evening of 08.18.23 and kept going until this morning, when the Storj service crashed.

mining2023 · August 21, 2023, 4:13pm

Forgot to post this fatal error as well, which showed up in the Storj log file this morning before the node service crashed:
2023-08-21T00:52:50-04:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:169\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:161\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}

daki82 · August 21, 2023, 7:53pm

This means your drive has a bottleneck.
Read here.
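The FATAL error in the previous post comes from the node's storage monitor (`storagenode/monitor` in the stack trace): it periodically checks that it can write to the storage directory, and when that check did not finish within 1m0s the node stopped. A rough Python approximation of such a timed write probe, not a port of Storj's code; the probe file name, the `fsync`, and the function names are assumptions. Running it against the iSCSI volume shows how close a slow disk gets to the timeout.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def _write_probe(directory: str) -> None:
    # Create, flush to disk, and remove a small temporary file in `directory`.
    fd, name = tempfile.mkstemp(dir=directory, prefix="write-probe-")
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the bytes through to the underlying storage
    finally:
        os.close(fd)
        os.remove(name)


def check_writable(directory: str, timeout_s: float = 60.0) -> float:
    """Time a write probe in `directory`; raise TimeoutError if it exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(_write_probe, directory)
    try:
        future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(
            f"timed out after {timeout_s}s while verifying writability of {directory}"
        ) from None
    finally:
        # Don't block here on a hung write. (The interpreter still waits for
        # the worker thread at exit, so a truly stuck disk will delay exit.)
        pool.shutdown(wait=False)
    return time.monotonic() - start
```

For example, `check_writable(r"S:\storj", timeout_s=60)` (hypothetical path) returns the elapsed seconds; values creeping toward 60 point to the same bottleneck.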

daki82 · August 21, 2023, 9:47pm

Installing the newest network card drivers from the manufacturer’s homepage is always recommended.

Alexey · August 22, 2023, 4:01am

Because of:

As I said, the disk subsystem is too slow. That is somewhat expected, since you use network-attached storage. In this case you would need to increase this writability timeout:

I would expect it to hit readability timeouts too, so in that case you would increase two parameters:

If any of the timeouts reaches 5 minutes after tuning, I would recommend reconsidering your setup.
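Timeouts in config.yaml are written as Go-style durations (`1m0s` in the FATAL message earlier in the thread). A worked check of the rule of thumb above: 1m30s = 60 s + 30 s = 90 s, well below 5 min = 300 s. The minimal Python sketch below does that conversion for the common units (h, m, s, ms) and flags values at or above 5 minutes; it covers only a subset of Go's duration syntax, and the helper names are hypothetical.

```python
import re

# Seconds per unit for the subset of Go duration syntax handled here.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is tried before "m" so "500ms" is read as milliseconds, not minutes.
_PART_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")

# Rule of thumb from this thread: a timeout that has to reach 5 minutes
# suggests the storage setup itself should be reconsidered.
RECONSIDER_AT_S = 5 * 60.0


def parse_duration(text: str) -> float:
    """Convert a Go-style duration such as "1m30s" to seconds."""
    text = text.strip()
    pos, total = 0, 0.0
    for match in _PART_RE.finditer(text):
        if match.start() != pos:  # junk before or between number+unit pairs
            raise ValueError(f"unparseable duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unparseable duration: {text!r}")
    return total


def verdict(timeout: str) -> str:
    return "reconsider the setup" if parse_duration(timeout) >= RECONSIDER_AT_S else "ok"
```

`verdict("1m30s")` gives "ok", while `verdict("5m0s")` gives "reconsider the setup".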

mining2023 · August 22, 2023, 4:52pm

Hi
so I made these 4 changes, as I understood them:

# how frequently to verify the location and readability of the storage directory
storage2.monitor.verify-dir-readable-interval: 1m30s

# how long to wait for a storage directory readability verification to complete
storage2.monitor.verify-dir-readable-timeout: 1m30s

# how frequently to verify writability of storage directory
storage2.monitor.verify-dir-writable-interval: 1m30s

# how long to wait for a storage directory writability verification to complete
storage2.monitor.verify-dir-writable-timeout: 1m30s

Correct?
I’ll play with those settings and keep it running somehow until I switch to Docker on the Synology.

mining2023 · August 22, 2023, 4:53pm

Hi
yes, I have all the latest drivers from the manufacturer’s website, not generic ones.