Latest Update (v1.108.3) killed node

I doubt that will change anything. Settings passed in the docker run command take higher priority, so the 2TB line in the config file gets ignored.


I forget who mentioned this trick, but to use config.yaml for the storage size you can pass an empty string to docker: -e STORAGE="" \.
If omitted, it defaults to 2TB regardless of config.yaml.
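For anyone who wants to try it, here is a minimal sketch of that docker run invocation (all addresses, paths, and values below are placeholders, not real ones):

docker run -d --restart unless-stopped --name storagenode \
    -e WALLET="0x0000000000000000000000000000000000000000" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="your.ddns.example.com:28967" \
    -e STORAGE="" \
    --mount type=bind,source=/path/to/identity,destination=/app/identity \
    --mount type=bind,source=/path/to/storage,destination=/app/config \
    -p 28967:28967/tcp -p 28967:28967/udp -p 14002:14002 \
    storjlabs/storagenode:latest

The allocation then comes from config.yaml, e.g. the storage.allocated-disk-space line, if I remember the key correctly.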

It was @snorkel. He/she experimented with config.yaml.

ā€œHeā€. Iā€™m proud about my gendre. :sunglasses:


You may update the pronouns in your profile :slight_smile:

You were right; changing the storage size in config.yaml did nothing. However, setting storage2.piece-scan-on-startup: false in config.yaml AND lowering the storage size in the docker run command to 500GB did start the node successfully, and it stayed up for > 12 hours. I've also had success with a docker run setting of 2TB. I am now going to try 10TB, which would allow uploads, as that is slightly above what Storj currently uses.
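For the record, the combination that stayed up looked roughly like this (only the relevant pieces shown; 500GB was the value from my test):

# in config.yaml
storage2.piece-scan-on-startup: false

# in the docker run command
-e STORAGE="500GB" \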

OK, so it looks like it runs fine as long as the docker run storage allocation is low enough that the node doesn't try to accept new files. These are the last ~50 log lines after I set the docker run command back to 10TB so that it could accept uploads, before losing access to the dashboard; once again it took ~1 hr.

2024-07-28T22:36:29Z    INFO    piecestore      download canceled       {"Process": "storagenode", "Piece ID": "W654UZDSELW7BYJ4OAI4OKCYTVH2BQ655OR4L5EERPD2EUT3OC4A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 2304, "Remote Address": "79.127.219.36:59248"}
2024-07-28T22:36:29Z    INFO    piecestore      download canceled       {"Process": "storagenode", "Piece ID": "ORSNKXRARNRO6UORNRNVNCXZHSW5TAR5FRJ6GFO44N7SKPZ3QFTQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 68608, "Remote Address": "79.127.205.229:33082"}
2024-07-28T22:36:29Z    INFO    piecestore      download canceled       {"Process": "storagenode", "Piece ID": "W654UZDSELW7BYJ4OAI4OKCYTVH2BQ655OR4L5EERPD2EUT3OC4A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 2304, "Remote Address": "79.127.226.99:52310"}
2024-07-28T22:36:29Z    INFO    piecestore      upload canceled {"Process": "storagenode", "Piece ID": "2OOGSGIM6VUMT7SSFCOARXANB32M3GYWGVXEXQLA4C3YFPNM2CHQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT_REPAIR", "Remote Address": "199.102.71.53:45856", "Size": 0}
2024-07-28T22:36:29Z    INFO    piecestore      download canceled       {"Process": "storagenode", "Piece ID": "W654UZDSELW7BYJ4OAI4OKCYTVH2BQ655OR4L5EERPD2EUT3OC4A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 2304, "Remote Address": "109.61.92.75:41984"}
2024-07-28T22:36:29Z    INFO    piecestore      download canceled       {"Process": "storagenode", "Piece ID": "W654UZDSELW7BYJ4OAI4OKCYTVH2BQ655OR4L5EERPD2EUT3OC4A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 2304, "Remote Address": "79.127.219.42:40854"}
2024-07-28T22:36:29Z    INFO    piecestore      download canceled       {"Process": "storagenode", "Piece ID": "W654UZDSELW7BYJ4OAI4OKCYTVH2BQ655OR4L5EERPD2EUT3OC4A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 2304, "Remote Address": "121.127.47.27:56152"}
2024-07-28T22:36:29Z    INFO    piecestore      download canceled       {"Process": "storagenode", "Piece ID": "W654UZDSELW7BYJ4OAI4OKCYTVH2BQ655OR4L5EERPD2EUT3OC4A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 2304, "Remote Address": "79.127.205.241:58280"}
2024-07-28T22:36:29Z    INFO    piecestore      upload canceled (race lost or node shutdown)    {"Process": "storagenode", "Piece ID": "6QOJRSIP4NTODDOEKI6T2BLOOTLCHN63AJQ6KW72VAROPUDNATWQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.34:42008"}
2024-07-28T22:36:29Z    DEBUG   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "6QOJRSIP4NTODDOEKI6T2BLOOTLCHN63AJQ6KW72VAROPUDNATWQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.34:42008", "Size": 3840, "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:526\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:535\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-28T22:36:29Z    INFO    piecestore      upload canceled (race lost or node shutdown)    {"Process": "storagenode", "Piece ID": "DJNIB4MAAQ42HCJEE7M3W2WI5QR2BTDTOHMT6ATS2VLOFFXKM4XA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.36:34720"}
2024-07-28T22:36:29Z    DEBUG   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "DJNIB4MAAQ42HCJEE7M3W2WI5QR2BTDTOHMT6ATS2VLOFFXKM4XA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.36:34720", "Size": 3840, "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:526\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:535\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-28T22:36:29Z    INFO    piecestore      upload canceled (race lost or node shutdown)    {"Process": "storagenode", "Piece ID": "5TSOTQJMUMVAUZEWQFTFMGX7ZVVTWK5MOS5AZKMOZ24DQV4JECWA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.213.33:35192"}
2024-07-28T22:36:29Z    DEBUG   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "5TSOTQJMUMVAUZEWQFTFMGX7ZVVTWK5MOS5AZKMOZ24DQV4JECWA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.213.33:35192", "Size": 17920, "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:526\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:535\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-28T22:36:29Z    INFO    piecestore      upload canceled (race lost or node shutdown)    {"Process": "storagenode", "Piece ID": "ZITOAR5LEIRTZ5VKCB7IYKTYOGDKUKR6SLGFQVR3FV2ZRPIX4GRA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.33:60452"}
2024-07-28T22:36:29Z    DEBUG   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "ZITOAR5LEIRTZ5VKCB7IYKTYOGDKUKR6SLGFQVR3FV2ZRPIX4GRA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.33:60452", "Size": 4352, "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:526\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:535\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-28T22:36:29Z    INFO    piecestore      upload canceled (race lost or node shutdown)    {"Process": "storagenode", "Piece ID": "ZWF4EGAKVLL4INRQJ2EKO7WNCJ6XXVKQNWFW4SENP6KLS6JQOBEQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.226.102:55756"}
2024-07-28T22:36:29Z    DEBUG   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "ZWF4EGAKVLL4INRQJ2EKO7WNCJ6XXVKQNWFW4SENP6KLS6JQOBEQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.226.102:55756", "Size": 2048, "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:526\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:535\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-28T22:39:12Z    ERROR   failure during run      {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:175\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:164\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-28T22:39:12Z    DEBUG   Unrecoverable error     {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:175\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:164\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
2024-07-28 22:39:12,878 INFO exited: storagenode (exit status 1; not expected)
2024-07-28 22:39:13,889 INFO spawned: 'storagenode' with pid 485
2024-07-28 22:39:13,892 WARN received SIGQUIT indicating exit request
2024-07-28 22:39:13,895 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-07-28T22:39:13Z    INFO    Got a signal from the OS: "terminated"  {"Process": "storagenode-updater"}
2024-07-28 22:39:13,911 INFO stopped: storagenode-updater (exit status 0)
2024-07-28 22:39:13,917 INFO stopped: storagenode (terminated by SIGTERM)
2024-07-28 22:39:13,928 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
2024-07-28 22:39:19,283 INFO RPC interface 'supervisor' initialized
2024-07-28 22:39:19,284 INFO supervisord started with pid 1
2024-07-28 22:39:20,289 INFO spawned: 'processes-exit-eventlistener' with pid 13
2024-07-28 22:39:20,294 INFO spawned: 'storagenode' with pid 14
2024-07-28 22:39:20,299 INFO spawned: 'storagenode-updater' with pid 15
2024-07-28T22:39:20Z    INFO    Configuration loaded    {"Process": "storagenode", "Location": "/app/config/config.yaml"}
2024-07-28T22:39:20Z    INFO    Anonymized tracing enabled      {"Process": "storagenode"}
2024-07-28T22:39:20Z    DEBUG   tracing collector       started {"Process": "storagenode"}
2024-07-28T22:39:20Z    DEBUG   debug server listening on 127.0.0.1:41137       {"Process": "storagenode"}
2024-07-28T22:39:20Z    INFO    Operator email  {"Process": "storagenode", "Address": "myEmailAddr"}
2024-07-28T22:39:20Z    INFO    Operator wallet {"Process": "storagenode", "Address": "myWallete"}
2024-07-28 22:39:21,387 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-28 22:39:21,387 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-28 22:39:21,387 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

What type of system are you running the storagenode on? A computer? A Raspberry Pi? Something else?
Edit: Never mind, I just looked back at your first post and it looks like a Raspberry Pi. I'm on a Pi also. Before I did a graceful exit on the Salt Lake satellite, I remember my Pi was having some issues with all the test data for some reason. So I put this in my config to keep it from getting overloaded:
storage2.max-concurrent-requests: 30

You might try that and see if it makes any difference.
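A minimal sketch of applying it, assuming your container is named storagenode and config.yaml lives in your storage location:

# add to config.yaml
storage2.max-concurrent-requests: 30

# then restart the container so the setting is picked up
docker restart -t 300 storagenode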

Edit 2: Also, I see in the logs that it said: piecestore monitor: timed out after 1m0s while verifying writability of storage directory

You might want to take a look at your disk activity and see if your storage node drive is getting overloaded. I like a program called iostat.

You should be able to install it with:
sudo apt install sysstat

and run it with:
sudo iostat -xm 2

In my output, the bottom two rows are two drives, sda and sdb. Those are the USB hard drives that I use for some storage nodes. Their utilization (%util) is well below 100%, so the drives are not overloaded at the moment.
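If the full device list is noisy, you can also point iostat at specific devices (sda and sdb here are just examples); the %util column on the far right is the one to watch:

sudo iostat -xm 2 sda sdb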


I'd like to thank everyone that tried to help me find the root cause; I've finally found it. My 6-month-old 8TB SSDs running in a JBOD setup are experiencing data integrity issues, as determined by fsck; this seemingly lined up perfectly with the Storj update to 1.108.3. I apologize for pointing fingers in the original post when it was my own hardware that was failing. Thanks again, everyone.
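For anyone curious, the check was something along these lines: a forced e2fsck run against the unmounted filesystem (mount point and device name are placeholders):

sudo umount /mnt/storagenode          # make sure the filesystem is not mounted
sudo e2fsck -f /dev/sdX1              # -f forces a full check even if the fs looks clean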

Inode 22454611 (...) has invalid mode (00).
Clear? yes

Inode 22454613 (...) has invalid mode (00).
Clear? yes

Inode 22454616 (...) has invalid mode (00).
Clear? yes

Inode 22454620 (...) has invalid mode (00).
Clear? yes

Error reading block 179309110 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22454631 (...) has invalid mode (00).
Clear? yes

Error reading block 179309111 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22454654 (...) has invalid mode (00).
Clear? yes

Error reading block 179309112 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22454663 (...) has invalid mode (00).
Clear? yes

Error reading block 179309113 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22454687 (...) has invalid mode (00).
Clear? yes

Error reading block 179309114 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22454693 (...) has invalid mode (00).
Clear? yes

Error reading block 179309116 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22454732 (...) has invalid mode (00).
Clear? yes

Error reading block 179309117 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22454742 (...) has invalid mode (00).
Clear? yes

Inode 22454744 (...) has invalid mode (00).
Clear? yes

Error reading block 179309151 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455287 (...) has invalid mode (00).
Clear? yes

Error reading block 179309152 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455300 (...) has invalid mode (00).
Clear? yes

Error reading block 179309154 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455344 (...) has invalid mode (00).
Clear? yes

Error reading block 179309155 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455355 (...) has invalid mode (00).
Clear? yes

Inode 22455356 (...) has invalid mode (00).
Clear? yes

Error reading block 179309163 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455476 (...) has invalid mode (00).
Clear? yes

Inode 22455478 (...) has invalid mode (00).
Clear? yes

Inode 22455483 (...) has invalid mode (00).
Clear? yes

Error reading block 179309164 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455494 (...) has invalid mode (00).
Clear? yes

Error reading block 179309166 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455536 (...) has invalid mode (00).
Clear? yes

Error reading block 179309167 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455543 (...) has invalid mode (00).
Clear? yes

Inode 22455549 (...) has invalid mode (00).
Clear? yes

Inode 22455551 (...) has invalid mode (00).
Clear? yes

Error reading block 179309168 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455568 (...) has invalid mode (00).
Clear? yes

Error reading block 179309169 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455569 (...) has invalid mode (00).
Clear? yes

Error reading block 179309179 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455738 (...) has invalid mode (00).
Clear? yes

Error reading block 179309180 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455747 (...) has invalid mode (00).
Clear? yes

Error reading block 179309182 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455785 (...) has invalid mode (00).
Clear? yes

Inode 22455787 (...) has invalid mode (00).
Clear? yes

Inode 22455788 (...) has invalid mode (00).
Clear? yes

Error reading block 179309183 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22455800 (...) has invalid mode (00).
Clear? yes

Inode 22455804 (...) has invalid mode (00).
Clear? yes

Error reading block 179309196 (Input/output error).  Ignore error? yes

Force rewrite? yes

Inode 22456007 (...) has invalid mode (00).
Clear? yes

Inode 22456016 (...) has invalid mode (00).
Clear? yes

Error reading block 179309201 (Input/output error).  Ignore error? yes

Force rewrite? yes


Are you supplying power to the SSD properly? An SSD can sometimes draw a lot of power while writing, and a Pi 4 can't supply it for long intervals.

Hi Storj,

Nodes running on RPi 4 - 8GB
Databases and LVM cache on SSD
Disk is iSCSI-backed
Node size 10TB
Node is full = no ingress for the last few weeks

Node upgraded to 1.108.3 24 hours ago

Since then, memory usage for storagenode has gone from 512MB up to 8GB, at which point it gets killed by the OOM reaper - this is new in v1.108.3!

I have piece-scan-on-startup set to true, and lazy file walker enabled.
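For reference, the relevant lines in my config.yaml look roughly like this (assuming I have the key names right):

storage2.piece-scan-on-startup: true
pieces.enable-lazy-filewalker: true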

Looking in the logs:

collector unable to delete piece

I have around 2.6 million files in scope of the collector - it runs constantly, every hour, and it is repeatedly trying to delete the same files which don't exist! Wasn't this fixed so that if a file didn't exist, the collector would not try again?

I'm going to delete the database, but other people will probably be seeing this issue…

#edit: oh… that's new as well… it doesn't recreate the databases if they are missing in this version… sqlite it is then :confused:

FATAL   Unrecoverable error     {"Process": "storagenode", "error": "Error migrating tables for database on storagenode: migrate: v26: no such table: piece_expirations\n\tstorj.io/storj/private/migrate.

:heart:

CP

My nodes are full too, but the memory usage is small

CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O   PIDS
3d20fef76e67   storagenode5   0.03%     136.2MiB / 24.81GiB   0.54%     26.3GB / 5.33GB   0B / 0B     83
9fb28e5cf48a   storagenode2   19.64%    302.6MiB / 24.81GiB   1.19%     167MB / 7.22GB    0B / 0B     106

storagenode2 is currently working on GC, the used-space-filewalker, and the collector (which is full of the same WARN messages; however, it didn't try to delete the same pieces - all unique).

more details here:

Looks like the database got corrupted?
Please check all databases and fix them:
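If it helps, a minimal sketch of checking a single database with the sqlite3 CLI (stop the node first; the path is a placeholder):

sudo sqlite3 /mnt/storagenode/storage/bandwidth.db "PRAGMA integrity_check;"
# prints "ok" when the database is healthy; anything else means it needs repair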

It cannot recreate just one missing database and has never been able to do so. You need to use this guide:

Thanks Alexey :slight_smile:

Yep, I read that - it's clearly a bug if, every time the collector runs, it tries to delete exactly the same piece that the previous run identified as not existing - are you able to raise that?

To reproduce…

1- make sure node is running v1.108.3
2- stop node
3- delete piece_expiration.db
4- start node
5- node crashes on startup

I'll fix my databases by dropping all the tables on piece_expiration - looks like we have some more bloom filters being sent out, which will catch it :-1:
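For reference, "dropping the rows" can be done with the sqlite3 CLI; a rough sketch, assuming the node is stopped and using a placeholder path (piece_expirations is the table name from the migration error above):

sudo sqlite3 /mnt/storagenode/storage/piece_expiration.db "DELETE FROM piece_expirations; VACUUM;"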

Ah OK, I thought it was able to recreate just one missing DB - my mistake then. It's fine, I've dropped all the rows now :slight_smile:

ty anyway.

It seems you missed the important part of the information. There is no bug: if the customer deleted an object with a TTL before its expiration, it will be moved to the trash by the garbage collector, and the TTL collector will then complain that the expired object is missing. These are two different collectors :slight_smile: and they may work in parallel.
There was an issue where the collector might not update the databases, because it updated them only when the full loop was finished (and if you restarted the node during the process, the databases would not be updated); now it does this in batches and updates the databases more often (every 1000 pieces by default), and that change is released in the current version.

Why did you delete it? Now this data will only be collected by the garbage collector (2 more weeks, not when expired)…