Linux node seems to stop showing data randomly?

darrenjlobb · November 29, 2019, 6:25pm

New to running my own node, am running on Linux (Ubuntu 18.04 Server)…

I successfully set up the Storj service with no problems and forwarded the ports, and everything was working great. However, it appears to be “stopping” randomly, I’d say around once every 24 hours…

When I say stopping, I think it’s stopping, but it’s still reporting as online… Basically I have the web interface open in a browser tab most of the time and hit refresh from time to time to look at the statistics. Around once a day it all zeros out (all the bandwidth / storage info on the web GUI). No matter how many times I refresh, it all stays at zero, yet it still reports the node as being “Online”… It’s running on a VM, so I just power cycle the VM, and within 5-10 seconds it all comes back up; I refresh the page again and all is well…

Any idea where to start troubleshooting? Is there an error log somewhere I can view to try and see what is happening when it stops?

Thanks!

Darren

Alexey · November 29, 2019, 8:09pm

Hello @darrenjlobb,
Welcome to the forum!

Please check the free RAM when it happens. Some logs would help too:

docker logs --tail 20 storagenode
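
To narrow the output down to errors only, something like the sketch below may help. It assumes the container is named `storagenode` (as in the command above) and that each log line carries its level as a separate word such as `ERROR`; the helper name `filter_errors` is made up for illustration.

```shell
# Print only the ERROR-level lines from log text supplied on stdin.
# "|| true" keeps the exit status at 0 when no line matches.
filter_errors() {
    grep -E '[[:space:]]ERROR[[:space:]]' || true
}

# Typical use (not run here; needs a running container):
#   docker logs --tail 200 storagenode 2>&1 | filter_errors
#   free -h    # shows free RAM at the moment the dashboard zeros out
```

The `2>&1` is there because `docker logs` replays the container's stderr stream separately from stdout, so without it the pipe could miss log lines.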

Vadim · November 29, 2019, 8:17pm

What time zone are you in?

darrenjlobb · November 29, 2019, 8:19pm

UK…GMT

Can this affect things?

Vadim · November 29, 2019, 8:20pm

Do you have a screenshot of what it looks like when it stops?

darrenjlobb · November 29, 2019, 8:21pm

No, but I will screenshot the next time it happens…

Vadim · November 29, 2019, 8:22pm

Did you restart it after it stopped, and then everything worked, or did it start working again by itself?

darrenjlobb · November 29, 2019, 8:23pm

Thanks, that’s the command I was trying to come up with to narrow things down.

I have just run it, and there are info logs all day, and then this error log, which I think is when it stopped… Looks to be a disk I/O issue? Wondering why, as it should all be up…

2019-11-29T20:18:09.987Z ERROR piecestore:cacheUpdate error persisting cache totals to the database: {"error": "piece space used error: disk I/O error", "errorVerbose": "piece space used error: disk I/O error\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).UpdateTotal:115\n\tstorj.io/storj/storagenode/pieces.(*CacheService).PersistCacheTotals:82\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:68\n\tstorj.io/storj/private/sync2.(*Cycle).Run:147\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:63\n\tstorj.io/storj/storagenode.(*Peer).Run.func6:435\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

darrenjlobb · November 29, 2019, 8:24pm

I did a full restart when it stopped, and everything works fine on restart, until it stops again…

Alexey · November 29, 2019, 8:25pm

How is your storage connected to the node?

darrenjlobb · November 29, 2019, 8:27pm

An NFS share mounted via fstab. The share is exported from the same host machine that the VM is running on… (It’s an UnRaid box…). Is there a better way to do it?

Vadim · November 29, 2019, 8:28pm

Today I had a similar problem on my node with one HDD: when I checked, the drive had disappeared from the list of HDDs.
The dashboard was showing the same thing, just no data and no bandwidth. After a restart it started to work, but I copied all the data to another HDD and replaced the drive. I was lucky because it was the first day of the node and the data was only some 5-6 GB. So I think you have an issue with the HDD; if it is an old HDD, it may be starting to die.

Alexey · November 29, 2019, 8:29pm

NFS and SMB are not supported, precisely because SQLite does not work reliably on network-connected drives.
Please use a locally connected drive, or at least iSCSI.
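
A quick way to check what kind of filesystem a storage path actually sits on is sketched below. It is only an illustration: the function names and the example path are made up, the list of network filesystem types is not exhaustive, and `findmnt` (from util-linux) is assumed to be installed.

```shell
# Return 0 if the given filesystem type name is a common network filesystem.
is_network_fstype() {
    case "$1" in
        nfs|nfs4|cifs|smb3|smbfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the filesystem type backing a path and flag network mounts.
# Returns 1 for a network filesystem, 2 if the type cannot be determined.
check_storage_path() {
    fstype=$(findmnt -n -o FSTYPE -T "$1") || return 2
    if is_network_fstype "$fstype"; then
        echo "$1 is on $fstype (network filesystem)"
        return 1
    fi
    echo "$1 is on $fstype"
}

# Example (hypothetical mount point):
#   check_storage_path /mnt/storagenode
```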

darrenjlobb · November 29, 2019, 9:52pm

Ok…I did wonder…

So I have now changed the setup: removed the NFS mount, added a directly attached disk, formatted it, mounted it at the same mount point, copied the 2 files and 2 folders from the old disk, and rebooted. But now it doesn’t appear to be running at all (no web interface, anyway)… Can I copy it like I did, or do I need to do something different?

Thanks

darrenjlobb · November 29, 2019, 9:54pm

Sorry, ignore the above, I didn’t start the docker container! All back up and running now with the new disk. Will monitor over the next few days and see how it goes. Thanks for the speedy help!

Alexey · November 29, 2019, 10:09pm

I hope you haven’t removed the data.

darrenjlobb · November 29, 2019, 10:21pm

No, left original disk / nfs share intact, just unmounted it… Copied to new disk / mounted it in same place and everything seems to be great, fingers crossed!
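
For anyone repeating this kind of move, the copy step can be sketched roughly as below. This is a minimal illustration, not an official migration procedure: the function name and both paths are hypothetical, and it assumes the node is stopped while the data is copied so nothing changes mid-copy.

```shell
# Copy everything (including hidden files) from the old data directory
# to the new one, then compare the two trees. Returns 0 only if the
# copy succeeded and both trees have identical contents.
copy_and_verify() {
    src=$1
    dst=$2
    mkdir -p "$dst" &&
    cp -a "$src"/. "$dst"/ &&
    diff -r "$src" "$dst" >/dev/null
}

# Example sequence (hypothetical paths; container name as used in this thread):
#   docker stop storagenode
#   copy_and_verify /mnt/old-nfs-share /mnt/new-disk
#   docker start storagenode
```

Leaving the original data untouched until the new disk has run cleanly for a while, as was done here, keeps a fallback copy available.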