Linux node seems to stop showing data randomly?

New to running my own node, am running on Linux (Ubuntu 18.04 Server)…

Successfully set up the storj service with no problems, and forwarded the ports; everything is working great. However, I am having a problem in that it appears to be “stopping” randomly, I’d say around once every 24 hours…

When I say stopping, I think it’s stopping, but it’s still reporting as online… Basically I have the web interface open on my computer most of the time in a tab somewhere, and hit refresh from time to time to look at the statistics, but around once a day it all zeroes out (all the bandwidth / storage info on the web GUI)… No matter how many times you refresh, it all stays at zero, yet it still reports the node as being “Online”… It’s running on a VM, so I just power cycle the VM, and within 5-10 seconds it all comes back up; I refresh the page again and all is well…

Any idea where to start troubleshooting? Is there an error log somewhere I can view to try and see what is happening when it stops?

Thanks!

Darren

Hello @darrenjlobb,
Welcome to the forum!

Please check the free RAM when it happens. Some logs could help too:

docker logs --tail 20 storagenode
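If the last 20 lines are all INFO entries, it may help to look further back and keep only the errors. The following is a minimal sketch, not an official tool: `filter_errors` is a hypothetical helper name, and it assumes the log level appears as a separate whitespace-delimited word (ERROR or FATAL) on each line.

```shell
# Print only ERROR/FATAL lines from storagenode log text read on stdin.
# Assumes lines look like "<timestamp> ERROR <component> <message>".
filter_errors() {
    grep -E '(^|[[:space:]])(ERROR|FATAL)([[:space:]]|$)' || true
}

# Usage on the node host (container name "storagenode" as in the command above):
#   free -h
#   docker logs --tail 1000 storagenode 2>&1 | filter_errors
```

`free -h` shows the available RAM at that moment, and `2>&1` merges the container's stderr into the pipe so that no log lines bypass the filter.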

What time zone are you in?

UK…GMT

Can this affect things?

Do you have a screenshot of what it looks like when it stops?

No, but I will screenshot the next time it happens…

Did you restart it after it stopped, and then everything worked, or did it start working again by itself?

Thanks, that’s the command I was trying to come up with to narrow things down.

Have just run it, and there are info logs all day, and then this error log, which I think is when it stopped… Looks to be a disk I/O issue? Wondering why, as it should all be up etc…

"2019-11-29T20:18:09.987Z ERROR piecestore:cacheUpdate error persisting cache totals to the database: {"error": "piece space used error: disk I/O error", "errorVerbose": "piece space used error: disk I/O error\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).UpdateTotal:115\n\tstorj.io/storj/storagenode/pieces.(*CacheService).PersistCacheTotals:82\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:68\n\tstorj.io/storj/private/sync2.(*Cycle).Run:147\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:63\n\tstorj.io/storj/storagenode.(*Peer).Run.func6:435\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}"

I did a full restart when it stopped, and everything works fine on restart, until it stops again…

How is your storage connected to there?

An NFS share mounted via fstab. The share is exported from the same host machine that the VM is running on… (It’s an UnRaid box…). Is there a better way to do it?

Today I had a similar problem on my node with one HDD; when I checked, it had disappeared from the list of HDDs.
The dashboard was showing the same thing: no data and no bandwidth. After a restart it started to work, but I copied all the data to another HDD and replaced the HDD. I was lucky because it was the first day of the node and there was only some 5-6 GB of data. So I think you have an issue with the HDD; if it is an old HDD, it may be starting to die.

NFS and SMB are not supported, precisely because SQLite is unable to work reliably on network-connected drives.
Please use a locally connected drive, or at least iSCSI.
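To confirm what the storage path is actually mounted as, something like the sketch below could be used. It is only an illustration: `is_network_fs` is a hypothetical helper, `/mnt/storagenode` is a placeholder path, and the list of filesystem type names is an assumption that covers the common NFS and SMB/CIFS variants rather than every network filesystem.

```shell
# Return success (0) if the given filesystem type is one of the network
# filesystems ruled out above (NFS or SMB/CIFS variants).
is_network_fs() {
    case "$1" in
        nfs|nfs4|cifs|smb*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (hypothetical storage path /mnt/storagenode):
#   fstype=$(findmnt -n -o FSTYPE -T /mnt/storagenode)
#   if is_network_fs "$fstype"; then echo "network mount: $fstype"; fi
```

An iSCSI disk appears to the system as a block device, so it will report the local filesystem it was formatted with (for example ext4) and pass this check.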

Ok…I did wonder…

So I have now changed the setup: removed the NFS mount, added a directly attached disk, formatted it, mounted it at the same mount point, copied the 2 files and 2 folders from the old disk, and rebooted. But now it doesn’t appear to be running at all, or at least there is no web interface… Can I copy it like I did, or do I need to do something different?

Thanks

Sorry, ignore the above, I didn’t start the docker container! All back up and running now with the new disk. Will monitor over the next few days and see how it goes. Thanks for the speedy help!

I hope that you haven’t removed the data.

No, left original disk / nfs share intact, just unmounted it… Copied to new disk / mounted it in same place and everything seems to be great, fingers crossed!
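For anyone repeating this kind of move, here is a rough sketch of the copy step, under some assumptions: `copy_node_data` is a hypothetical helper, the paths are placeholders, and the node is stopped first so that files do not change during the copy. It uses plain `cp -a`, which keeps permissions, ownership and timestamps and also picks up hidden files.

```shell
# Copy everything (including hidden files) from the old storage directory
# to the new one, preserving permissions, ownership and timestamps.
copy_node_data() {
    src=$1
    dst=$2
    mkdir -p "$dst" && cp -a "$src"/. "$dst"/
}

# Usage (hypothetical paths; stop the node first so files do not change mid-copy):
#   docker stop -t 300 storagenode
#   copy_node_data /mnt/old-nfs-share /mnt/storagenode
#   diff -r /mnt/old-nfs-share /mnt/storagenode && docker start storagenode
```

The `diff -r` check is optional, but it means the container is only started again once the two directory trees match.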
