Your node has been suspended

But then again, I don’t know if having an even bigger SMR hard drive would make much difference. Do you?

It’s a common issue:

And related to

This is because your node finally passed audits. I don’t think that SMR is directly related to the “database is locked” issue. My nodes have a similar error on PMR disks, but for orders and bandwidth databases.


Hi,
I got an email this morning that my node has been suspended on one satellite, however:

  • there is no evidence of a database lock in the log file (I searched through logs going back 20 days)
  • there is no evidence on the dashboard of any suspension
  • I checked the console-based report (curl -s http://127.0.0.1:14002/api/sno/ | jq .) and there is no evidence of suspension there either

Could this be a case of something wrong on the Storj backend side?
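For reference, the per-satellite suspension status can be pulled out of that same endpoint with jq. This is only a sketch: the suspended and disqualified field names are assumed from the current /api/sno/ response and may differ between versions.

  curl -s http://127.0.0.1:14002/api/sno/ | jq '.satellites[] | {id, suspended, disqualified}'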

Check your log for download failed and GET_AUDIT. If you can’t find it, then that email was probably sent out in error. Also, your web dashboard shows whether your node is disqualified or suspended.
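For example, if the node runs in Docker, a search like this should surface failed audits (the container name storagenode is just an assumption; use whatever yours is called):

  docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed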

From the wording it seems you are using multiple drives. So either multiple nodes or RAID, which means you’re at least cutting the load per drive in half. It’s not a scapegoat: SMR drives are causing issues for people. Running multiple nodes might fix that, though.

Yeah, I am running one drive per node once it fills. No RAID here. Could that be the long-term way to do it with SMR drives? I don’t know if the issue itself is with RAID and/or bigger SMR drives.

FWIW, I’m running at least one node on a single SMR HDD and I’m not having any of these kinds of issues… and I know for a fact that the drive is pretty slow, even for reads. It’s a miracle that it works at all, but I haven’t had any failed audits.


“At least one” still suggests you are running multiple nodes, which means this SMR HDD is not seeing the full write load it would see if it was the only node running on that /24 subnet.

I don’t see why bigger drives would be a bigger issue. And since RAID also spreads the load during daily operation, it’s actually a solution.
Though SMR drives can become a problem during RAID rebuilds. I would recommend lowering the speed of RAID rebuilds if you have SMR drives in the array, to give the drives some headroom to move data from the CMR cache to SMR storage.
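As a sketch for Linux md RAID (the limits are in KB/s and the right values depend on your drives), the rebuild speed can be capped with sysctl, run as root:

  # cap md resync/rebuild speed; values are in KB/s per device
  sysctl -w dev.raid.speed_limit_max=10000
  sysctl -w dev.raid.speed_limit_min=1000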

I started the node when the others were all full and before saltlake started deleting stuff, so it got pretty much full ingress traffic until it filled up. Then the deletes started.

Ahh, I see. Fair enough, that should get most of the load on one HDD after all. I guess not all SMR HDDs are alike.

Help me please

My Node was suspended

Node ID: 12vsSXDUMxMGdKcE8cYvYmanpn2UHodzBpmHUmtoeQfPw653niJ

What can I do?

Hey @dedede,
Welcome to the forums.

Suspension is caused by errors during audit checks. Please check your logs for lines with “error” and “GET_AUDIT” in them.

There is currently a known issue being worked on with databases being locked. If you see this error, for now the best you can do is vacuum the mentioned databases and perhaps defragment them.
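A minimal sketch of a vacuum with the sqlite3 command-line tool, assuming the node is stopped first; the path and orders.db are only examples, use the database named in your error messages:

  # stop the storagenode service/container first, then:
  sqlite3 /path/to/storage/orders.db "VACUUM;"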

While the solution is being worked on, nodes won’t get disqualified as a result of this suspension.

Yeah, I’ve just got the “Your node has been suspended on 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB.” note.

I’ve just upgraded from an 8TB Seagate IronWolf Pro to the 16TB so as to have just one drive for Storj, as I’m on Windows 10 Home and using the simple storagenode program for Storj. Trying to keep this simple and hassle-free. But now every morning I’m waking up to my node being offline for no reason. My PC is left locked, with sleep disabled for both the operating system and the hard drives. I have to restart the machine, and then everything works flawlessly until the next night, when it does the same thing. This has never been an issue before.

Anything I can check or do to sort this? Or is this also part of the known issue mentioned above? How can I “vacuum” the database? Windows does seem to think that drive needs defragging.

Please search for GET_AUDIT and failed in your logs:

Also, search for any errors containing “I/O” and for any “FATAL” errors.
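If the log is available as a plain text file (the path below is only an illustration), a case-insensitive search like this would catch both kinds of errors:

  grep -iE "i/o error|fatal" /path/to/storagenode.log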

I have an external disk plugged into USB 2.0 and an HDD connected via eSATA, combined in an LVM linear volume (Debian 10). Periodically (when there was heavy download traffic) I was getting “usedserialsdb error: database is locked”, and my node was suspended on almost all satellites. I defragmented and vacuumed the databases, but the errors persisted. A few days ago I disabled the write cache on both disks and the errors are gone.

Welcome to the forum @SeWIR

This is not advisable. It could lead to data corruption. I sure hope you have a very stable power grid, because there is no way the databases can protect themselves against corruption on power failure or an unclean shutdown now.

The write cache is disabled and data is written directly to disk, without going through RAM. Or do I misunderstand something?
I used this: hdparm -W0 -K1 /dev/sdx
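For what it’s worth, the current state can be confirmed with hdparm as well (replace /dev/sdx with your device):

  hdparm -W /dev/sdx   # without a value, -W reports whether write-caching is currently on or off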

That would also be my understanding, and it’s even advised on Windows (for nodes).

I think @BrightSilence meant it the other way around. Enabling the write cache could lead to data corruption. So you’re fine.
