Storj service keeps stopping and no clear reason why

7tigers · October 20, 2020, 1:58pm

Hi Storj,

I’m running a Windows node (not Docker) connected to iSCSI storage. The Storj service randomly stops after working fine for some number of days.

I need to be able to rely on the service to be stable and not shut down or need daily babysitting.

If there’s a problem with storage/database/network, an alert needs to come by email or IM, or pop up as a notification on the server, with the relevant bits from the log explaining why Storj had to stop…

Current audit/suspension score is thankfully still 100% across all satellites but at risk of dropping if the service keeps shutting down without any warning

Needed Improvements:
Stabilized service
Service notifications/status/alerts via Slack/Telegram/Desktop/etc.
The logfile needs to automatically archive every month to keep the bloat down
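
Until alerting like this is built in, the "relevant bits from the log" can be pulled out with a small script and handed to whatever mailer or chat webhook is already in use. This is only a minimal sketch: it assumes the log is a plain text file in which the level appears as the word ERROR or FATAL, and the function names are made up for illustration.

```python
from collections import deque
from typing import Iterable, List

# Assumption: each log line carries its level as a plain word such as ERROR or FATAL.
ALERT_LEVELS = ("ERROR", "FATAL")


def recent_alert_lines(lines: Iterable[str], limit: int = 20) -> List[str]:
    """Return the last `limit` lines that mention one of ALERT_LEVELS."""
    hits = deque(maxlen=limit)  # keeps only the newest `limit` matches
    for line in lines:
        if any(level in line for level in ALERT_LEVELS):
            hits.append(line.rstrip("\n"))
    return list(hits)


def alert_text_from_file(log_path: str, limit: int = 20) -> str:
    """Read a log file and join its most recent alert lines into one message body."""
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        return "\n".join(recent_alert_lines(handle, limit))
```

Run from a scheduled task, the text returned by `alert_text_from_file` could be sent only when it is non-empty, so that a quiet log produces no message.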

SGC · October 20, 2020, 7:18pm

you are most likely losing the connection to the iSCSI target for a brief moment, and that’s enough for the node to shut down. you may be able to simply set the windows storagenode service to restart automatically.

in theory that should keep the storagenode running at all times: if windows sees the service is off, it will relaunch it.
but i haven’t used it, because my storagenode is on a linux system… so cannot say that it works 100%, tho pretty sure it should.
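
For reference, the Windows recovery setting mentioned above can be applied with the built-in `sc.exe failure` command. The sketch below only builds and runs that command; the service name `storagenode`, the 60 second restart delay and the one day failure-counter reset are assumptions, so check the real service name in services.msc first. It needs an elevated prompt.

```python
import subprocess
import sys
from typing import List


def recovery_command(service: str = "storagenode",      # assumed service name
                     restart_delay_ms: int = 60_000,     # assumed: wait 60 s before restarting
                     reset_after_s: int = 86_400) -> List[str]:
    """Build the `sc.exe failure` call that restarts the service after a failure."""
    return [
        "sc.exe", "failure", service,
        "reset=", str(reset_after_s),              # seconds before the failure count resets
        "actions=", f"restart/{restart_delay_ms}",  # action/delay in milliseconds
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Only attempt the change on Windows; raises CalledProcessError if sc.exe refuses.
    subprocess.run(recovery_command(), check=True)
```

One caveat: Windows normally applies recovery actions only when the service process dies unexpectedly, so a clean self-shutdown by the node might not trigger a restart.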

doesn’t really solve the problem in a great way, but should work…
the storagenode software has recently gotten additional features to help protect against DQ when drives were disconnected… so it’s most likely not a bug… it’s a feature…

generally most network storage solutions don’t work or barely work with a storagenode… something with the database not working correctly over NFS and SMB (Samba)
iSCSI can work, but like you noted… it doesn’t really run flawlessly… it is possible that your network or iSCSI configuration / setup isn’t optimal and thus causes slight connection issues that you wouldn’t normally notice… but the storagenode will… even USB will sometimes not work correctly for various reasons, and thus the recommended method of attaching drives is directly to a sata and/or sas controller in the system.
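
One way to test the "brief connection issues" theory is to log every moment the iSCSI target stops answering and compare those timestamps with the times the service stopped. The sketch below is just that, a sketch: it only checks that a TCP connection to the target's port can be opened (3260 is the standard iSCSI port), which says nothing about the health of the iSCSI session itself, and the host passed in is whatever address your target uses.

```python
import socket
import time
from datetime import datetime
from typing import Optional

ISCSI_PORT = 3260  # standard iSCSI target port


def target_reachable(host: str, port: int = ISCSI_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host: str, port: int = ISCSI_PORT, interval: float = 5.0,
          checks: Optional[int] = None) -> None:
    """Print a timestamped line whenever reachability changes; loop forever if checks is None."""
    last = None
    done = 0
    while checks is None or done < checks:
        now = target_reachable(host, port)
        if now != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {'reachable' if now else 'UNREACHABLE'}")
            last = now
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)
```

Left running in a console with the output redirected to a file, `watch("<target address>")` gives a simple timeline of dropouts.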

also iSCSI is to my understanding difficult to set up correctly, but i haven’t really used it, so can’t really say…
but that’s what i heard from somebody that actually uses it…

7tigers · October 21, 2020, 1:11pm

I poked through the Log for Node2 and it’s complaining about disk space requirements not met…

It’s a 650GB slice, the minimum is 500GB and I’ve raised the capacity once before because of this same error last month
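
To see what Windows itself reports for the volume, as opposed to what was provisioned on the iSCSI side, a quick check like the one below can help. It is only a rough sketch: it does not reproduce whatever check the storagenode performs internally, the function name is made up, and the 500GB figure is simply the minimum quoted above.

```python
import shutil

GB = 10**9  # decimal gigabytes, matching how drive sizes are usually quoted


def volume_report(path: str, required_gb: float = 500) -> dict:
    """Report total and free space for the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_gb": usage.total / GB,
        "free_gb": usage.free / GB,
        # True when the volume as a whole is at least `required_gb` in size
        "meets_requirement": usage.total >= required_gb * GB,
    }


if __name__ == "__main__":
    # Replace "." with the node's storage path, e.g. the drive letter of the iSCSI volume.
    print(volume_report("."))
```

If the totals printed here are well below the size of the slice, the gap (filesystem overhead, other data on the volume) is worth ruling out before raising the capacity again.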

How does one repair the Storj file when it’s in this state?

Also, the iSCSI setup I have is very simple and straightforward, on a very robust network. The node has been fine for the past 8 months; only recently has it started dropping much more often…

SGC · October 21, 2020, 3:40pm

well, it’s only recently that the node shutting down on storage disconnects was added.
not sure if one can turn that off…

i doubt it’s capacity related. you can adjust capacity up or down like you want; if the node is over the max it will just stop ingress, wait until delete requests bring it below the max, and then start taking in data again…

the storagenode does do some sort of check on storage capacity to ensure it doesn’t fill a drive beyond max, and it might be that which is causing you grief… but i really have no clue how that works

but again it might be related to iSCSI, dunno…

@Alexey or @littleskunk might be able to help