Storagenode.exe process locks up from time to time

This is the second time it has done this; the version number is v1.56.4.
9GB node, 8GB used…

Single node running on Windows 10, i5-4690, 16GB RAM. A couple of other VirtualBox VMs are running on the same system.

When it locks up…

  • Stopping the service in order to restart it results in a perpetual ‘Stopping’ status, and the service cannot be restarted.
  • storagenode.exe cannot be killed using Task Manager; attempting this results in an error (from memory) “operation cannot be completed” or similar (see the sketch after this list)…
  • The web GUI is dead, with a “page cannot be displayed” or similar error.
  • Restarting the computer results in a perpetual “Restarting” window, I presume because Windows is trying to kill the process but can’t… Waited 20 minutes but no go. Had to shut the PC down the not-very-nice way…
  • It inevitably happens overnight, so the node ends up offline for several hours, resulting in yellow warnings for online time.
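
For the record, this is roughly what I try from an elevated command prompt when it hangs; just a minimal sketch, assuming the service is named “storagenode” (the default for the Windows installer), with the PID filled in from the query output:

```
REM look up the service state and its PID (service name "storagenode" is assumed)
sc queryex storagenode

REM try to force-kill the process by that PID; when the node is truly locked up,
REM even this fails with an "operation could not be completed" style error
taskkill /PID <pid> /F
```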

Not sure what to do about it… The last time it happened was a month or so ago, but I didn’t record any details at the time. This post is mainly to get something on the record…

I can’t see anything in the log about it, but the log is a mess and I really don’t understand all of it; it’s about 400MB…

I have a bunch (hundreds) of errors like
unable to update piece info {“Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs” …

the odd
unable to update piece info {“Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs” etc

and a few
console:service unable to get Satellite URL {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”

Maybe the same problem as in this topic? One Node is using more RAM

Sorry, I wasn’t clear about that: the 8/9GB is the storage capacity; RAM usage of the node is around 500MB.

I also don’t appear to have the parameter mentioned in the link in my config file, i.e.:

in-memory buffer for uploads

filestore.write-buffer-size: 4.0 MiB
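
In case it helps anyone else reading this, here is a quick way to check for the parameter and apply it; only a sketch, assuming the default Windows install path (adjust if yours differs):

```
REM check whether the parameter is already present in config.yaml
findstr /C:"filestore.write-buffer-size" "C:\Program Files\Storj\Storage Node\config.yaml"

REM if nothing is found, add a line like the following to config.yaml:
REM   filestore.write-buffer-size: 4.0 MiB
REM then restart the service so the change takes effect
net stop storagenode
net start storagenode
```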

I would recommend stopping the storagenode and checking your disk for errors; maybe it’s dying.
How is this disk connected to the system?
Do you really mean a 9 gigabyte node?
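
For example, something like this from an elevated prompt; only a sketch, where the service name “storagenode” and the drive letter D: are examples (use your own):

```
REM stop the node before checking the disk
net stop storagenode

REM read-only scan first; replace D: with the drive that holds the node data
chkdsk D:

REM if errors are reported, run a repair pass
chkdsk D: /f
```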

Doh, no that is 8TB used of a 9TB allocation.

It’s an ST10000VN0004-1ZD101 drive connected to a SATA port on the mainboard.

SMART does show Raw Read Error Rate values in the millions, but from what I can gather this is normal for Seagate drives.

I also have an LSI SAS 2008 card in the machine; would it help to switch the drive over to that?
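
For reference, the SMART data can also be pulled from the command line with smartmontools (a separate install on Windows); a rough sketch only, and the device name is just an example and may differ, especially behind the LSI HBA:

```
REM full SMART attribute dump for the first physical drive (example device name)
smartctl -a /dev/sda

REM extended report, sometimes needed for drives behind a SAS HBA
smartctl -x /dev/sda
```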

Such lockups usually mean some hardware issue: maybe a cable has a bad contact or is not fully seated, maybe the port on the mainboard is malfunctioning, etc.
I do not know whether that card would help. There could be some incompatibilities between SAS and SATA as well.
Could you please copy one of the “unable to update piece info” errors between two new lines with three backticks:

```
here is the error from the log
```

Like so:

2022-06-22T20:59:09.850+1200	ERROR	collector	unable to update piece info	{"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "4FPQL62CI47OHGTLWKPPEIJF4T4FBPZRXSQJONE5WC2XPGMSMLYA", "error": "pieceexpirationdb: database disk image is malformed", "errorVerbose": "pieceexpirationdb: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:547\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:103\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

They each appear to be followed by an “unable to delete piece” error:

2022-06-22T20:59:09.850+1200	ERROR	collector	unable to delete piece	{"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "4FPQL62CI47OHGTLWKPPEIJF4T4FBPZRXSQJONE5WC2XPGMSMLYA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

You need to fix the piece_expiration.db database file: https://support.storj.io/hc/en-us/articles/360029309111-How-to-fix-a-database-disk-image-is-malformed-
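
Roughly, that article walks through a standard SQLite dump-and-rebuild; the following is only a sketch of the shape of it (stop the node first, run the commands from the folder holding the databases, and follow the article for the exact steps, including any editing of the dump):

```
REM stop the node before touching the databases
net stop storagenode

REM confirm which database is damaged
sqlite3 piece_expiration.db "PRAGMA integrity_check;"

REM dump whatever can still be read and rebuild a fresh database from it
REM (the dump may need editing, as described in the linked article)
sqlite3 piece_expiration.db ".dump" > dump_all.sql
sqlite3 piece_expiration_new.db ".read dump_all.sql"

REM keep the damaged file as a backup, put the rebuilt one in its place, then start the node
move piece_expiration.db piece_expiration.db.bak
move piece_expiration_new.db piece_expiration.db
net start storagenode
```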

This error should be gone after 7 days; if not, you can apply this workaround: ERROR collector unable to delete piece - #11 by BrightSilence