Node/docker goes offline every few days

After an OK launch, issues started popping up after a while. I have to revive this node every few days now. Thinking I had made some setup errors - or that there were IP issues - I've restarted the node a few times. I'm out of diagnosing options, so I hope I can get some support here. I must've done something wrong.

  • I have a Raspberry Pi setup with a USB-connected disk (the same setup I had working elsewhere without issues)
  • I've removed the config.yaml file and reactivated the node (i.e. stopped the node, removed the container and repeated the launch commands - see the sketch after this list)
  • restarting temporarily fixes the issue
  • last restart was yesterday
  • see below for logs
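
The relaunch sequence was roughly this (the container name storagenode and the /mnt/storj/storagenode path are just examples from my template; adjust to your own setup):

docker stop -t 300 storagenode
docker rm storagenode
rm /mnt/storj/storagenode/config.yaml
# then re-run the setup command and the usual docker run command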

Thanks in advance for the help!

2022-08-12T01:33:33.575Z        INFO    piecestore      upload started  {"Process": "storagenode", "Piece ID": "SLBOBD3C2CPRGCNZJGKHRVVKLSRTJGOKVWON5D2GHYJXO3NAYJCA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 4253158278656}
2022-08-12T01:33:33.590Z        ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "JU6SYPII3LCXYPJ66M5S6GCV2MADVGCXWW47BJF4O6AVMCRA44SA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "pieces error: filestore error: open config/storage/temp/blob-419695670.partial: input/output error", "errorVerbose": "pieces error: filestore error: open config/storage/temp/blob-419695670.partial: input/output error\n\tstorj.io/storj/storage/filestore.(*blobStore).Create:170\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:213\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:302\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52", "Size": 0}
2022-08-12T01:33:33.610Z        ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "SLBOBD3C2CPRGCNZJGKHRVVKLSRTJGOKVWON5D2GHYJXO3NAYJCA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "pieces error: filestore error: open config/storage/temp/blob-2830581282.partial: input/output error", "errorVerbose": "pieces error: filestore error: open config/storage/temp/blob-2830581282.partial: input/output error\n\tstorj.io/storj/storage/filestore.(*blobStore).Create:170\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:213\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:302\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52", "Size": 0}
2022-08-12T01:33:33.839Z        INFO    piecestore      upload started  {"Process": "storagenode", "Piece ID": "SJ7RNC6BCOAZARRXXEDI7IDHV6R6N2RKCN7QDFKXP6YQ55EYHKLA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 4253158278656}
2022-08-12T01:33:33.840Z        ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "SJ7RNC6BCOAZARRXXEDI7IDHV6R6N2RKCN7QDFKXP6YQ55EYHKLA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "pieces error: filestore error: open config/storage/temp/blob-3710462421.partial: input/output error", "errorVerbose": "pieces error: filestore error: open config/storage/temp/blob-3710462421.partial: input/output error\n\tstorj.io/storj/storage/filestore.(*blobStore).Create:170\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:213\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:302\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52", "Size": 0}
2022-08-12T01:33:33.851Z        INFO    piecestore      upload started  {"Process": "storagenode", "Piece ID": "ERCRIIEOPX4FKIB2SFF3ACNKUL7WNQVJZKW4PJWADVJSMEW2ADLA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 4253158278656}
2022-08-12T01:33:33.851Z        ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "ERCRIIEOPX4FKIB2SFF3ACNKUL7WNQVJZKW4PJWADVJSMEW2ADLA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "pieces error: filestore error: open config/storage/temp/blob-1199158257.partial: input/output error", "errorVerbose": "pieces error: filestore error: open config/storage/temp/blob-1199158257.partial: input/output error\n\tstorj.io/storj/storage/filestore.(*blobStore).Create:170\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:213\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:302\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52", "Size": 0}
2022-08-12T01:33:34.386Z        INFO    piecestore      upload started  {"Process": "storagenode", "Piece ID": "IG3ZI5FJZPAZLMMZOKBIBMWJXG64FGBEJFBHNPZDD5L5H3RLZQXQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 4253158278656}
2022-08-12T01:33:34.387Z        ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "IG3ZI5FJZPAZLMMZOKBIBMWJXG64FGBEJFBHNPZDD5L5H3RLZQXQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "pieces error: filestore error: open config/storage/temp/blob-4233988681.partial: input/output error", "errorVerbose": "pieces error: filestore error: open config/storage/temp/blob-4233988681.partial: input/output error\n\tstorj.io/storj/storage/filestore.(*blobStore).Create:170\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:213\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:302\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52", "Size": 0}
2022-08-12T01:33:35.443Z        ERROR   services        unexpected shutdown of a runner {"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error", "errorVerbose": "piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:133\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:130\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Error: piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error
2022-08-12 01:33:35,510 INFO exited: storagenode (exit status 1; not expected)
2022-08-12 01:33:36,532 INFO spawned: 'storagenode' with pid 931
2022-08-12 01:33:36,543 WARN received SIGQUIT indicating exit request
2022-08-12 01:33:36,545 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2022-08-12T01:33:36.545Z        INFO    Got a signal from the OS: "terminated"  {"Process": "storagenode-updater"}
2022-08-12 01:33:36,551 INFO stopped: storagenode-updater (exit status 0)
2022/08/12 01:33:36 failed to check for file existence: stat config/config.yaml: input/output error
2022-08-12 01:33:36,647 INFO stopped: storagenode (exit status 1)
2022-08-12 01:33:36,653 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

Then you need to perform the setup step again, otherwise the config.yaml file will not be re-created.
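
A minimal sketch of that setup step, assuming the standard setup command from the documentation (substitute your own identity and storage paths):

docker run --rm -e SETUP="true" \
    --user $(id -u):$(id -g) \
    --mount type=bind,source="/path/to/identity/storagenode",destination=/app/identity \
    --mount type=bind,source="/mnt/storj/storagenode",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

This only re-creates config.yaml and the storage directory structure; it does not touch the identity.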

Did you run your node with --user $(id -u):$(id -g) from the beginning, or did you add this option only recently?
If the former - please check the permissions on the storage location. If the latter - you need to change the ownership to your user and group, and add your user to the docker group:

sudo usermod $(id -un) -aG docker
sudo chown $(id -un):$(id -gn) -R /mnt/storj/storagenode
sudo chmod ug+rw -R /mnt/storj/storagenode

After that you need to log out and log back in, then run the storagenode as usual.
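
A quick way to verify the result before starting the node again (paths are examples):

id -nG                                     # should list "docker" after you log back in
ls -ld /mnt/storj/storagenode              # owner and group should match your user
ls -l /mnt/storj/storagenode/config.yaml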

Yes, I did this: the initial setup again, and then the usual node run command.

I've never run the node with the --user option yet. I only noticed that change after making the template for this node, as I manage multiple nodes.

I'm also onto something with the SATA-USB connector, which might be failing. I'll replace it and see if that changes the issue. Will check back in a few days.

Do you have --restart=always set in your docker run command?
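
If you're not sure, you can check the restart policy of the existing container (assuming it is named storagenode):

docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' storagenode

It can also be changed in place with docker update --restart=always storagenode.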

This one concerns me: storagenode cannot read the protection file, so either the permissions are wrong or the disk is failing. Since it's an external disk, you need to provide an external power supply. Maybe the current supply is not enough.
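
A couple of checks to tell failing permissions apart from a dying disk or connection (the mount path is an example; adjust to yours):

dmesg -T | grep -iE 'usb|reset|i/o error'                             # USB disconnects/resets around the failure time
sudo ls -l /mnt/storj/storagenode/storage/storage-dir-verification    # the protection file the node could not open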

Yes. I discovered that the disk isn't mounted - or gets unmounted at some point. I've switched the USB/SATA cable, as that was the only difference from a working version of this setup.

My newer setups do have external power. Maybe I'll upgrade this one earlier than planned. Let's see how the other cable performs.
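
To check whether the disk is actually present and mounted before blaming the node (path is an example):

lsblk -f                            # is the disk visible to the kernel at all?
findmnt /mnt/storj/storagenode      # is it mounted where docker expects it?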

Yes, absolutely. A restart of the Pi will always start the node again. However, I have some experience with USB-to-SATA cables that disconnect out of nowhere, and the USB connection often doesn't restore without manual intervention (or a restart).
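
A rough safeguard I'm considering - sketched here assuming the mount point /mnt/storj/storagenode and a container named storagenode - is a cron job that stops the node if the disk drops, so it doesn't keep running against an empty mount point:

# /etc/cron.d/storj-mount-check (hypothetical file), runs every 5 minutes
*/5 * * * * root findmnt -rn /mnt/storj/storagenode > /dev/null || docker stop -t 300 storagenode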

To wrap this up: the node has been stable for the last few days, so it's fair to say the USB-to-SATA cable was the issue. It wasn't my first suspect, as I had bought a fairly expensive one compared to the one that's installed now. I've also secured the new one with a tie-wrap.

Feels like a relatively stupid issue, but sometimes that happens.