Your node has been suspended

pietro · May 4, 2020, 3:27pm

I ordered exactly the same drive; it should arrive tomorrow. In another thread it was suggested that I cancel the order, which I did, but unfortunately Amazon rejected the cancellation and sent me the drive anyway. If it’s not up to the job, I’ll return it or repurpose it in my NAS for backups, and buy another model for Storj.

But, as I said in the dedicated thread, the problem is knowing which models are SMR and which are PMR before buying, and that’s very hard because vendors have started removing this information from their datasheets.

Lukasz · May 4, 2020, 3:59pm

@nerdatwork, @pietro
I checked the other drives I have. They are not used for Storj.
- 2x ST8000DM004
- 1x ST8000AS0002
Basically, according to Seagate and WD, the HDDs without SMR are the more expensive drives:

- Seagate IronWolf
- WD Red Pro and WD Gold
Is there any solution for my situation, or do I need to shut down everything if I have those drives?
I’ve been with Storj since 2017 without any break, starting from Storj v2.

P.S. @nerdatwork, thanks again for the “tag” tip.

BrightSilence · May 4, 2020, 6:31pm

Both WD and Toshiba have published lists of their models currently on sale that use SMR. Seagate has yet to do this. So… don’t buy Seagate unless you buy Ironwolf (Pro) or Exos X drives.

As for the drive you ordered: if you ordered it for Storj, I wouldn’t even bother trying it. Just send it back in its original sealed packaging; you’ll have less trouble with the store that way. There have been enough reports of nodes running into trouble that it’s really not worth the risk. It’s almost certain to cause you problems, although they may not appear until the node is vetted and getting full traffic.

4ich · May 4, 2020, 7:00pm

Does the Storj network have any issues? I’ve been seeing rather low traffic for the past 2 hours.

My Pi node doesn’t seem to have an SMR drive, since I can’t find any information about it… but I can’t rule it out.

bdurrer · May 4, 2020, 7:38pm

Am I the only one who finds it disturbing how these problems are handled? Instead of properly handling SQLite locking, there are articles on how to fix the database?

pietro · May 4, 2020, 8:12pm

Thanks. I think I’ll refuse the package from the delivery man so it goes straight back to Amazon, without my having to start the return procedure, print labels, repackage the item, etc. This will also avoid any discussion with them about whether the packaging was opened or the drive was used.

And yes, I intended to use it for Storj, because I ordered it before I knew about SMR. My node is already vetted and I’ve been on the network since September; my plan was to move the data to the new disk using rsync (a procedure I already performed successfully when I moved from my NAS to a dedicated setup).

anon27637763 · May 4, 2020, 8:25pm

If you look in the storage directory, you’ll note there are WAL files…

These files are there because SQLite is running in WAL mode…

Here is what the SQLite documentation says about WAL mode:

There are advantages and disadvantages to using WAL instead of a rollback journal. Advantages include:

- WAL is significantly faster in most scenarios.
- WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.
- Disk I/O operations tends to be more sequential using WAL.
- WAL uses many fewer fsync() operations and is thus less vulnerable to problems on systems where the fsync() system call is broken.

So… Storj has configured the database properly for the application.

My general feeling is that the choice of SQLite is mildly problematic… but that’s a controversial opinion.
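For anyone who wants to see this on their own machine, here is a minimal Python sketch using the standard `sqlite3` module. It switches a throwaway database to WAL mode with `PRAGMA journal_mode=WAL` and checks that the `-wal` file appears next to it while a connection is open. The file name `demo.db`, the `demo` table and the `wal_demo` function are made up for illustration; this is not how storagenode opens its own databases, and it should not be pointed at a live node’s files.

```python
import os
import sqlite3
import tempfile


def wal_demo(db_path: str) -> tuple:
    """Put the database at db_path into WAL mode and report what happened.

    Returns (journal_mode, wal_file_exists_while_open).
    """
    conn = sqlite3.connect(db_path)
    try:
        # The PRAGMA returns the journal mode actually in effect ("wal" on success).
        (mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
        # Write something so there is a committed change in the write-ahead log.
        conn.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY, note TEXT)")
        conn.execute("INSERT INTO demo (note) VALUES (?)", ("hello",))
        conn.commit()
        # While this connection is open, committed changes are appended to "<db>-wal".
        wal_exists = os.path.exists(db_path + "-wal")
        return mode, wal_exists
    finally:
        # Closing the last connection checkpoints the WAL back into the main file.
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_demo(os.path.join(tmp, "demo.db")))
```

Once the last connection closes, SQLite checkpoints the log into the main file and normally removes the `-wal` file, so seeing `-wal` files next to a running node’s databases is expected.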

bdurrer · May 4, 2020, 9:11pm

You can break an SQLite DB in WAL mode too. Believe me, our team has been there.

My point is that it breaks, and we, the operators, end up with the fallout. Nodes should be designed so that they can go back into operation with only part of their data, and if the DB is not failsafe, it should be restorable from the satellite.

Alexey · May 4, 2020, 9:15pm

…and will be disqualified if they do so.

Better not to start at all.

However, regarding databases I have to agree.

bdurrer · May 4, 2020, 9:48pm

You misunderstood my statement. I’m saying that when a node fails audits and is suspended, Storj code should handle bringing it back. I know you don’t want it to download missing data; that’s fine. But Storj could allow the node to rejoin the network with only the data that is healthy, instead of simply throwing it away. I see no reason why a single bad sector on a disk should make a node leave the network for good. Storj would benefit if someone is able to rescue 80% of the data.

littleskunk · May 4, 2020, 9:56pm

Aside from the fact that you get DQed for losing 20% of the data, this is already the case. You can reset all the SQLite DBs as long as you are still holding the data. The SQLite DBs are not required for that. They need to be in a healthy state.

I don’t agree that Storj has to provide that. We are a community. If you need a cleanup for your SQLite DB, I’m happy to help you, but it currently doesn’t have the priority that would allow me to work on it myself.
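As a rough illustration of what a “healthy state” check can look like: SQLite has a built-in consistency check, `PRAGMA integrity_check`, which returns a single row `ok` when it finds no problems. The Python sketch below wraps it; the function names `integrity_report` and `is_healthy` are hypothetical, and it is safest to run something like this against a copy of a database, or with the node stopped.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def integrity_report(db_path: str) -> list[str]:
    """Run SQLite's built-in PRAGMA integrity_check and return its rows."""
    # Open read-only through a URI so the check never modifies the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


def is_healthy(db_path: str) -> bool:
    """True only if integrity_check reports exactly ["ok"]."""
    try:
        return integrity_report(db_path) == ["ok"]
    except sqlite3.DatabaseError:
        # Covers "file is not a database", a missing file in read-only mode, etc.
        return False
```

Anything other than a single `ok` row (or an exception) means the file needs attention before the node relies on it again.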

Alexey · May 4, 2020, 9:57pm

It is already in place. However, if the audit score falls below 0.6, your node will be disqualified.
Why should customers trust a failed node? How can you be sure that the rest of its data is not corrupted or missing?
I would suggest reading this blog:

bdurrer · May 4, 2020, 10:12pm

Ah, nice, I didn’t know that. As for “why keep trusting the failed node?”:
Well, why not? You trusted it before; for all you know, the operator might have switched from a single drive to RAID 5. And you already have plenty of mechanisms to let it build up trust again, e.g. the same as with new nodes but accelerated. But then, you are the experts here. To me, as a lowly node operator, it feels like you are abandoned when something happens, even when it could just as well be Storj code that caused it.

Lukasz · May 5, 2020, 10:31am

Hi @Alexey,
Sorry to bother you.
Unfortunately, my HDD uses SMR technology. Since Sunday, my dashboard has shown my node as suspended on 4 satellites.
I had assumed that we could connect any unused HDD to the Storj network and run a node. Apparently not “any” HDD.
Yesterday I received 4 e-mails confirming that my node is suspended. I checked the logs: until yesterday there were a lot of “database lock” errors.
But from around 10:00 PM last night, that seems to have stopped. Currently I see “upload failed” errors, and I no longer see the node reported as suspended on any of the satellites.
I’ve also noticed that I’m losing data.
Should I invest in a much pricier HDD without SMR if I still want to keep my node?
Will I lose all of the data on my current HDD?
Thank you.

jammerdan · May 5, 2020, 10:50am

I think you are making an important point here. Moreover, Storj has repeatedly said not to buy hardware for node operation. If it now turns out that major hard drive models are not suitable for running a node, that reduces those claims and statements ad absurdum.

andrew2.hart · May 5, 2020, 11:09am

I use (the very naughty):
storage2.max-concurrent-requests: 1
on my SMR drive and don’t get any “database locked” errors.

I don’t suggest that anyone else use it.
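To make the trade-off concrete, here is a hypothetical Python sketch of the kind of cap this setting imposes. It assumes, as I understand `storage2.max-concurrent-requests`, that the node rejects new uploads once the number already in progress reaches the limit, and that 0 means unlimited. The `UploadLimiter` class and `handle_upload` function are invented for illustration and are not storagenode’s implementation.

```python
import threading


class UploadLimiter:
    """Hypothetical concurrency cap: reject, rather than queue, uploads over the limit.

    max_concurrent == 0 means "no limit" (assumed to mirror the real setting).
    """

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot for a new upload; False means the upload is rejected."""
        with self._lock:
            if self.max_concurrent and self._in_flight >= self.max_concurrent:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Free the slot once an upload finishes (successfully or not)."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def handle_upload(limiter: UploadLimiter, write_piece) -> str:
    """Hypothetical request handler showing how the limiter would be used."""
    if not limiter.try_acquire():
        return "rejected"
    try:
        write_piece()
        return "stored"
    finally:
        limiter.release()
```

With a limit of 1, the drive only ever services one upload’s writes at a time, which would plausibly ease the pressure on an SMR disk, but every upload that arrives while another is still in progress is refused, so the node will likely accept much less data.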

deathlessdd · May 5, 2020, 2:09pm

I also run SMR hard drives, but I have yet to get suspended, so that isn’t the leading cause of suspended nodes.

cdhowie · May 5, 2020, 2:50pm

I don’t know if you can jump from a sample size of one to “isn’t the leading cause”, but I will add that I am also running several nodes on SMR disks and haven’t had any suspension issues with them.

deathlessdd · May 5, 2020, 2:54pm

That’s true, but I can also point out that my node’s specs are dirt: an 11-year-old Dell OptiPlex with a dual-core Intel CPU. Running on full SMR hard drives (2x 4TB), it has not been suspended yet.

cdhowie · May 5, 2020, 2:56pm

Yes, I mostly agree with you. It seems like SMR is being used as the scapegoat for a different underlying issue.