Unexpected shutdown of a runner

Hello. So i have strange problem. Node is offline. Checked with e2fsck, no problems found. Checked all DB’s. They are all OK.

Problem:

2024-05-02T11:42:06Z INFO piecestore downloaded {“Process”: “storagenode”, “Piece ID”: “4OLYXETP4X7Q4BAAIOMI3POKPKZTPLYAVJPKL5P7RTLVOFD47Y6Q”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET”, “Offset”: 833280, “Size”: 36608, “Remote Address”: “79.127.226.98:52404”}
2024-05-02T11:42:06Z INFO piecestore upload started {“Process”: “storagenode”, “Piece ID”: “4JG4RNELKYXDZX5Q5FQHY3STIRSLYY5PTB5Q67SUPQNUM56MPGPQ”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “PUT”, “Remote Address”: “79.127.213.34:39870”, “Available Space”: 180200741708}
2024-05-02T11:42:06Z ERROR piecestore upload failed {“Process”: “storagenode”, “Piece ID”: “4JG4RNELKYXDZX5Q5FQHY3STIRSLYY5PTB5Q67SUPQNUM56MPGPQ”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “PUT”, “Remote Address”: “79.127.213.34:39870”, “Size”: 0, “error”: “pieces error: filestore error: open config/storage/temp/blob-723450662.partial: input/output error”, “errorVerbose”: “pieces error: filestore error: open config/storage/temp/blob-723450662.partial: input/output error\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Create:198\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:240\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:394\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2024-05-02T11:42:06Z INFO piecestore upload started {“Process”: “storagenode”, “Piece ID”: “PTJEAZGTDXB3AB7GEZQG2ZKCMIEGZVE2RFIIV5QD43G3BNPBN6CQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Remote Address”: “x.x.x.x:xxxx”, “Available Space”: 180200741708}
2024-05-02T11:42:06Z ERROR piecestore upload failed {“Process”: “storagenode”, “Piece ID”: “PTJEAZGTDXB3AB7GEZQG2ZKCMIEGZVE2RFIIV5QD43G3BNPBN6CQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Remote Address”: “x.x.x.x:xxxx”, “Size”: 0, “error”: “pieces error: filestore error: open config/storage/temp/blob-1248418200.partial: input/output error”, “errorVerbose”: “pieces error: filestore error: open config/storage/temp/blob-1248418200.partial: input/output error\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Create:198\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:240\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:394\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2024-05-02T11:42:07Z ERROR services unexpected shutdown of a runner {“Process”: “storagenode”, “name”: “piecestore:monitor”, “error”: “piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error”, “errorVerbose”: “piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:160\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:143\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78”}
2024-05-02T11:42:07Z ERROR failure during run {“Process”: “storagenode”, “error”: “piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error”, “errorVerbose”: “piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:160\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:143\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78”}
Error: piecestore monitor: error verifying location and/or readability of storage directory: open config/storage/storage-dir-verification: input/output error
2024-05-02 11:42:07,190 INFO exited: storagenode (exit status 1; not expected)
2024-05-02 11:42:08,215 INFO spawned: ‘storagenode’ with pid 1394
2024-05-02 11:42:08,217 WARN received SIGQUIT indicating exit request
2024-05-02 11:42:08,219 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-05-02T11:42:08Z INFO Got a signal from the OS: “terminated” {“Process”: “storagenode-updater”}
2024-05-02 11:42:08,230 INFO stopped: storagenode-updater (exit status 0)
2024-05-02 11:42:08,231 INFO stopped: storagenode (terminated by SIGTERM)
2024-05-02 11:42:08,233 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

How is your HDD connected ?

external enclosure. But i see the drive, check is passed, looks like all OK.

No changes to mount points or anything like that?

Try to check the drive for errors. fsck for linux and chkdsk for Windows.

Happy Cake day @ACarneiro :birthday:

2 Likes

Hahahaha! Thank you! I had to go check what that was about, it got me confused :wink:

3 Likes

No changes to mount point.
Disk is already checked with e2fsck many times, no problems found.
Also checked DB’s. 100% db’s are OK.

2024-05-17 12:13:26,106 INFO stopped: storagenode-updater (exit status 0)
2024-05-17 12:13:26,108 INFO stopped: storagenode (terminated by SIGTERM)
2024-05-17 12:13:26,110 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
2024-05-17 12:13:27,893 INFO RPC interface ‘supervisor’ initialized
2024-05-17 12:13:27,893 INFO supervisord started with pid 1
2024-05-17 12:13:28,899 INFO spawned: ‘processes-exit-eventlistener’ with pid 12
2024-05-17 12:13:28,905 INFO spawned: ‘storagenode’ with pid 13
2024-05-17 12:13:28,910 INFO spawned: ‘storagenode-updater’ with pid 14
2024/05/17 12:13:28 failed to check for file existence: stat config/config.yaml: input/output error
2024/05/17 12:13:28 failed to check for file existence: stat config/config.yaml: input/output error
2024-05-17 12:13:28,947 INFO exited: storagenode-updater (exit status 1; not expected)
2024-05-17 12:13:28,948 INFO exited: storagenode (exit status 1; not expected)
2024-05-17 12:13:29,950 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-17 12:13:29,955 INFO spawned: ‘storagenode’ with pid 25
2024-05-17 12:13:29,962 INFO spawned: ‘storagenode-updater’ with pid 26
2024/05/17 12:13:29 failed to check for file existence: stat config/config.yaml: input/output error
2024/05/17 12:13:29 failed to check for file existence: stat config/config.yaml: input/output error
2024-05-17 12:13:29,993 INFO exited: storagenode-updater (exit status 1; not expected)
2024-05-17 12:13:29,997 INFO exited: storagenode (exit status 1; not expected)
2024-05-17 12:13:32,007 INFO spawned: ‘storagenode’ with pid 37
2024-05-17 12:13:32,013 INFO spawned: ‘storagenode-updater’ with pid 38
2024/05/17 12:13:32 failed to check for file existence: stat config/config.yaml: input/output error
2024-05-17 12:13:32,045 INFO exited: storagenode-updater (exit status 1; not expected)
2024/05/17 12:13:32 failed to check for file existence: stat config/config.yaml: input/output error
2024-05-17 12:13:32,048 INFO exited: storagenode (exit status 1; not expected)
2024-05-17 12:13:35,059 INFO spawned: ‘storagenode’ with pid 49
2024-05-17 12:13:35,065 INFO spawned: ‘storagenode-updater’ with pid 50
2024/05/17 12:13:35 failed to check for file existence: stat config/config.yaml: input/output error
2024-05-17 12:13:35,093 INFO exited: storagenode-updater (exit status 1; not expected)
2024/05/17 12:13:35 failed to check for file existence: stat config/config.yaml: input/output error
2024-05-17 12:13:35,098 INFO gave up: storagenode-updater entered FATAL state, too many start retries too quickly
2024-05-17 12:13:35,104 INFO exited: storagenode (exit status 1; not expected)
2024-05-17 12:13:35,105 INFO gave up: storagenode entered FATAL state, too many start retries too quickly
2024-05-17 12:13:35,105 WARN received SIGQUIT indicating exit request
2024-05-17 12:13:35,105 INFO waiting for processes-exit-eventlistener to die
2024-05-17 12:13:37,111 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

Just changed enclosure, did not helped.

2 posts were merged into an existing topic: Fatal Error on my Node

No problems:

sudo dd if=/dev/zero of=samplefile bs=1M count=100

[sudo] password for user:
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.201332 s, 521 MB/s

sudo dd if=/dev/zero of=samplefile bs=1M count=500

500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 14.4728 s, 36.2 MB/s

Another strange thing is that sometimes when i umount drive and try to run e2fsck. it says:

udo e2fsck -f /dev/sdb1
e2fsck 1.46.5 (30-Dec-2021)
/dev/sdb1 is in use.
e2fsck: Cannot continue, aborting.

How the drive can be in use, if it’s not mounted at all?

Did the umount command fail? It should work without any extra flags (like ‘-l’ for lazy–unmount). Fails can easily happen if something else is touching the disk: like NFS is sharing it or something.

umount command went with no problems.
drive is unmounted, but e2fsck, says “/dev/sdb1 is in use.” But there is no sdb1 at that time. only sdb.

‘’’
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 5.5T 0 disk
└─sda1 8:1 0 5.5T 0 part /home/user/storj/1
sdb 8:16 0 2.7T 0 disk
└─sdb1 8:17 0 2.7T 0 part
sdc 8:32 0 14.6T 0 disk
└─sdc1 8:33 0 14.6T 0 part /home/user/storj/3
mtdblock0 31:0 0 16M 0 disk
zram0 254:0 0 7.7G 0 disk [SWAP]
zram1 254:1 0 200M 0 disk /var/log
nvme0n1 259:0 0 476.9G 0 disk
├─nvme0n1p1 259:1 0 1G 0 part /boot
└─nvme0n1p2 259:2 0 475.9G 0 part /var/log.hdd
/
user@user:~ $ sudo e2fsck -f /dev/sdb1
e2fsck 1.46.5 (30-Dec-2021)
/dev/sdb1 is in use.
e2fsck: Cannot continue, aborting.
‘’’

It’s exist

You need to unmount /dev/sdb1 and run fsck again.
Check mounts:

df --si

Also make sure that your user for docker has permissions to the data location.
If you use --user $(id -u):$(id -g) in your docker run command, then you need to make sure that this user is the owner for the data location.

What’s filesystem by the way?

This is ext4.
But the problem is that this node was running for years. Could the permissions change them selves?

By saying sdb1 not exist, i mean, that it is not mounted. So how can it be busy?

Unlikely. However, you could change the docker run command to have a --user $(id -u):$(id -g) option, where you did not use it before.
If so, you need to update the owner to the user $(id -u):$(id -g) for all data.

Either some hardware/kernel issue on your OS or it’s mounted.
Please show the result of the command

sudo df --si -T

Here are test results:

sudo e2fsck -f /dev/sdb1
[sudo] password for user:
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: 9811707/183148544 files (2.2% non-contiguous), 563077932/732566129 blocks

Here is your command:

sudo df --si -T
Filesystem Type Size Used Avail Use% Mounted on
tmpfs tmpfs 1.7G 161M 1.5G 10% /run
/dev/nvme0n1p2 ext4 503G 9.0G 489G 2% /
tmpfs tmpfs 8.3G 82k 8.3G 1% /dev/shm
tmpfs tmpfs 5.3M 4.1k 5.3M 1% /run/lock
tmpfs tmpfs 8.3G 8.2k 8.3G 1% /tmp
/dev/nvme0n1p1 vfat 1.1G 122M 951M 12% /boot
/dev/zram1 ext4 197M 87M 96M 48% /var/log
/dev/sdc1 ext4 16T 12T 3.5T 78% /home/user/storj/3
/dev/sda1 ext4 6.0T 4.9T 852G 85% /home/user/storj/8
tmpfs tmpfs 1.7G 0 1.7G 0% /run/user/1001
overlay overlay 503G 9.0G 489G 2% /var/lib/docker/overlay2/0032c0db0bc37ef9829b57388a66dfb3eb71ab797ae734e8c16b2a4a7ef7bc84/merged
overlay overlay 503G 9.0G 489G 2% /var/lib/docker/overlay2/94d1a000a10490b3c2d33cb99638e675d8d692560ae96d3537b1616689f233d0/merged
overlay overlay 503G 9.0G 489G 2% /var/lib/docker/overlay2/8e1427e2316a16765b698ad3e8a1198bfb66eb728e8facd19375ab2de6ae4bae/merged
overlay overlay 503G 9.0G 489G 2% /var/lib/docker/overlay2/f7328fb207fa6425b88af749f3f83dc3f066f4842aa1c1840ab369b40aa8a43a/merged
overlay overlay 503G 9.0G 489G 2% /var/lib/docker/overlay2/fea52654354b6aa8b2d891a79a6cb107e434aa8cdb69eefb1ab60e0d9ed1f145/merged
/dev/sdb1 ext4 3.0T 2.3T 545G 81% /home/user/storj/j5

ta-dam. It’s mounted.

because i just mounted few sec ago, after e2fsck. It was me, manually.