Online time dropping

Jov · September 14, 2021, 11:22am

Hi all, for a couple of weeks I’ve been constantly losing uptime and haven’t been able to solve it so far. Any ideas? Thanks a lot for your help. Will I get suspended at some point when I lose uptime, or are the Audit and Suspension scores the only things that matter?

2021-09-06T17:21:50.721Z	INFO	piecestore	uploaded	{"Piece ID": "DMN3GFTHMHEKU3RIUG3BWIHXPUISP26SLFBTLQPPUUHWWAR2NXEQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Size": 47872}
2021-09-06T17:21:51.257Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "attempts": 1, "error": "ping satellite: failed to dial storage node (ID: 12RLSXrobgFaMCrnZdQ34v2WbJgu1CBpYHQh8vTf25mo6KJXscF) at address strojstoragej.ddns.net:28967: rpc: dial tcp 158.181.127.35:28967: i/o timeout", "errorVerbose": "ping satellite: failed to dial storage node (ID: 12RLSXrobgFaMCrnZdQ34v2WbJgu1CBpYHQh8vTf25mo6KJXscF) at address strojstoragej.ddns.net:28967: rpc: dial tcp 158.181.127.35:28967: i/o timeout\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:141\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-09-06T17:21:51.507Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "attempts": 1, "error": "ping satellite: failed to dial storage node (ID: 12RLSXrobgFaMCrnZdQ34v2WbJgu1CBpYHQh8vTf25mo6KJXscF) at address strojstoragej.ddns.net:28967: rpc: dial tcp 158.181.127.35:28967: i/o timeout", "errorVerbose": "ping satellite: failed to dial storage node (ID: 12RLSXrobgFaMCrnZdQ34v2WbJgu1CBpYHQh8vTf25mo6KJXscF) at address strojstoragej.ddns.net:28967: rpc: dial tcp 158.181.127.35:28967: i/o timeout\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:141\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-09-06T17:21:51.594Z	WARN	console:service	unable to get Satellite URL	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "console: trust: satellite \"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW\" is untrusted", "errorVerbose": "console: trust: satellite \"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW\" is untrusted\n\tstorj.io/storj/storagenode/trust.(*Pool).getInfo:238\n\tstorj.io/storj/storagenode/trust.(*Pool).GetNodeURL:177\n\tstorj.io/storj/storagenode/console.(*Service).GetDashboardData:174\n\tstorj.io/storj/storagenode/console/consoleapi.(*StorageNode).StorageNode:45\n\tnet/http.HandlerFunc.ServeHTTP:2042\n\tgithub.com/gorilla/mux.(*Router).ServeHTTP:210\n\tnet/http.serverHandler.ServeHTTP:2843\n\tnet/http.(*conn).serve:1925"}
2021-09-06T17:21:51.855Z	WARN	contact:service	Your node is still considered to be online but encountered an error.	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Error": "contact: failed to dial storage node (ID: 12RLSXrobgFaMCrnZdQ34v2WbJgu1CBpYHQh8vTf25mo6KJXscF) at address strojstoragej.ddns.net:28967 using QUIC: rpc: quic: timeout: no recent network activity"}
2021-09-06T17:21:51.940Z	INFO	piecestore	download started	{"Piece ID": "QVDTNH6XE5GANBZQ7WLTY32UXGA3UPKXOT2MGAZX32IA5F5S4PCQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_REPAIR"}
2021-09-06T17:21:52.070Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite: failed to dial storage node (ID: 12RLSXrobgFaMCrnZdQ34v2WbJgu1CBpYHQh8vTf25mo6KJXscF) at address strojstoragej.ddns.net:28967: rpc: dial tcp 158.181.127.35:28967: i/o timeout", "errorVerbose": "ping satellite: failed to dial storage node (ID: 12RLSXrobgFaMCrnZdQ34v2WbJgu1CBpYHQh8vTf25mo6KJXscF) at address strojstoragej.ddns.net:28967: rpc: dial tcp 158.181.127.35:28967: i/o timeout\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:141\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-09-06T17:21:52.124Z	INFO	piecestore	downloaded	{"Piece ID": "QVDTNH6XE5GANBZQ7WLTY32UXGA3UPKXOT2MGAZX32IA5F5S4PCQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_REPAIR"}
2021-09-06T17:21:52.815Z	INFO	piecestore	upload started	{"Piece ID": "CYICGV4URRDAJ4RNP6HFZ6NP4PAUNYUN235PHCIA3RNHHZ6MOXXA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 25271717949592}

SGC · September 14, 2021, 2:23pm

Audits are the most critical: if the node gets disqualified (DQ), it cannot be recovered in almost all cases.

Suspension results in no ingress; the threshold is 60%, if memory serves.

Online score is the average online time over the last 30 days.
A 40% drop equals about 12 days of downtime, so an 8% drop (roughly 2.4 days) should show up as the connection being offline for extended periods.
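
The arithmetic above can be sketched in a few lines of Python. This is only the simple linear model described in this post (score drop times a 30-day window); the function name `downtime_days` is made up for illustration, and the satellites' actual calculation may differ.

```python
# Rough conversion from an online-score drop to days of downtime.
# Assumption: the score is a plain average over a 30-day window, as stated above.
WINDOW_DAYS = 30


def downtime_days(score_drop_percent: float, window_days: int = WINDOW_DAYS) -> float:
    """Return the approximate downtime (in days) implied by a score drop."""
    return window_days * score_drop_percent / 100.0


for drop in (40, 8):
    print(f"{drop}% drop is about {downtime_days(drop):.1f} days offline")
```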

If that doesn’t happen and your connection is stable, it could be caused by your DDNS, if you are using one, or maybe by a firewall blocking data. The drop seems rather consistent, so the blocking would have to be pretty random, or else it would be more satellite-dependent…

So my money would be on some sort of DDNS or router issue. Or you simply had a period of extended downtime for whatever reason, and because it takes 30 days for the online score to reset, it seems like it’s getting worse while it’s in fact stable.

There are services like UptimeRobot that will monitor your “node” for free.
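
If you would rather run a check yourself, a minimal sketch of the same idea is a plain TCP connect test against the node’s public address. The helper name `is_port_open` is hypothetical, and this only tests TCP reachability; it says nothing about QUIC (UDP) and is not how the satellites measure uptime.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False
```

Run it from a machine outside your own network, for example `is_port_open("strojstoragej.ddns.net", 28967)` with the address and port from the log above, on a schedule, and note the times at which it returns `False`.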

Jov · September 15, 2021, 7:06pm

Thanks a lot for your answer.
As you might have seen in the attached log, it’s kind of weird behaviour: in the same second that I get a “ping satellite failed” error, there is also a download finishing and an upload starting. So in general my node wasn’t really offline, but the ping failed, and that’s apparently why I get downtime. When I went through my logs I found a lot of cases like that, so I’m pretty sure it wasn’t just an extended downtime.
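
For anyone who wants to check their own logs for the same pattern, here is a rough sketch. It assumes the tab-separated layout of the lines pasted above (timestamp, level, component, message, JSON); the function name `overlapping_seconds` is made up for illustration.

```python
# Find seconds in which a failed satellite ping and normal piece traffic
# both appear in a storagenode log.
TRANSFER_MESSAGES = {"uploaded", "downloaded", "upload started", "download started"}


def overlapping_seconds(lines):
    """Return sorted timestamps (to the second) that have both a
    'ping satellite failed' entry and at least one piecestore transfer entry."""
    ping_failures = set()
    transfers = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a log line in the expected layout
        timestamp, _level, component, message = fields[:4]
        second = timestamp[:19]  # e.g. "2021-09-06T17:21:51"
        message = message.strip()  # the ping message has a trailing space
        if component == "contact:service" and message == "ping satellite failed":
            ping_failures.add(second)
        elif component == "piecestore" and message in TRANSFER_MESSAGES:
            transfers.add(second)
    return sorted(ping_failures & transfers)
```

Feed it the lines of the log file, for example `overlapping_seconds(open("node.log"))`, where `node.log` is a placeholder for wherever your node writes its log.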

SGC · September 16, 2021, 10:56am

Ah yeah, you are right. The drop in online score is across all of them, though… so maybe some sort of internet / ISP security against DDoS attacks or something.

Or your / the node’s connection is unstable, but it does seem like it is uploading and downloading while getting the ping error, which seems to indicate that it’s something that only affects some traffic and only at some points in time.

seems a bit weird.

@Alexey @BrightSilence any of you got a good idea about what this can be… i’m a bit puzzled by it.

BrightSilence · September 16, 2021, 1:00pm

The issue sounds similar to what @Jov posted a while back here: Node downtime after adding second node

I still suspect it’s an indication that the router can’t keep up. Try restarting it first and see if that helps. If not, try resetting the router settings. I had a similar issue in the beginning where connections were dropped intermittently. Resetting my router fixed it. Based on the other topic, I believe you have an Asus router as well, so it might be a similar issue. While you’re at it, make sure it has the latest updates. Also check the router logs for errors.

Jov · September 20, 2021, 8:24pm

I might have been able to fix the issue with some configuration changes on the router. I will see if the online score goes up again in the next couple of days. Thanks everyone for your help.