My node is restarting again and again

Hi there,
I have been running a node for a month without any problems, but it has been offline since yesterday for an unknown reason.

The command I use to run my Docker container is:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 14002:14002 \
    -e WALLET="0x..." \
    -e EMAIL="xxx" \
    -e ADDRESS="82.xxx.yyy.zzz:28967" \
    -e STORAGE="3TB" \
    --user $(id -u):$(id -g) \
    --mount type=bind,source="/root/.local/share/storj/identity/storagenode",destination=/app/identity \
    --mount type=bind,source="/data/storj-storage-dir",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

The logs show this:

/usr/bin/python2: error while loading shared libraries: /lib/x86_64-linux-gnu/libutil.so.1: cannot read file data: Input/output error
/usr/bin/python2: error while loading shared libraries: /lib/x86_64-linux-gnu/libutil.so.1: cannot read file data: Input/output error
/usr/bin/python2: error while loading shared libraries: /lib/x86_64-linux-gnu/libutil.so.1: cannot read file data: Input/output error
/usr/bin/python2: error while loading shared libraries: /lib/x86_64-linux-gnu/libutil.so.1: cannot read file data: Input/output error
/usr/bin/python2: error while loading shared libraries: /lib/x86_64-linux-gnu/libutil.so.1: cannot read file data: Input/output error

I have tried fixing a malformed database, without success.
Has anyone encountered the same behaviour?

Thanks in advance for your help and your time.

How is your hard drive connected (SATA, USB)?

Hello @3918f510e3ee52bba3ea,
Welcome to the forum!

In which logs? If it’s journalctl, then you need to fix your OS: first check and fix errors on the system drive, then reinstall everything that is corrupted, or reinstall/reflash the OS. Make sure that you have copied/moved the identity off the system drive to the disk with the data.

What do you see in the node’s logs?

docker logs --tail 20 storagenode

Hi,

My drives are SATA.

Hi Alexey,

Thank you.

These are the errors I see when I check the logs of the storagenode container with that same command.

Have you already restarted the whole system? It looks like the drive failed or you lost the connection to it. Even if the connection has since been restored, the problem can persist if the whole Docker process hasn’t been restarted. The easiest possible solution is to restart the whole system, preferably cold (power off, wait 10 seconds, disconnect/reconnect USB drives if you use those, and then start the system again).

If that doesn’t work, check the file system for errors. Mount it elsewhere and check whether you can access the files by hand. If you can, then check for database errors: https://support.storj.io/hc/en-us/articles/360029309111-How-to-fix-a-database-disk-image-is-malformed-
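Checking whether the files are accessible means actually reading them: a file can be listed without problems and still fail with an Input/output error when read. A minimal sketch, assuming bash with GNU find and cat; the helper name unreadable_files and the example mount point are hypothetical. It reads every regular file under a directory and prints the paths that fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read every regular file under a directory and print
# the paths that cannot be read (e.g. because of I/O errors).
# Permission errors are reported too, so run it as root on a real system.
unreadable_files() {
    find "$1" -type f -print0 |
        while IFS= read -r -d '' f; do
            cat -- "$f" > /dev/null 2>&1 || printf '%s\n' "$f"
        done
}

# Usage, e.g. on the data disk mounted elsewhere (reads all data, so it is slow):
#   unreadable_files /mnt/storj-storage-dir
```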

But essentially, it looks most like a file system error.

Hi, and thank you for your help.

The whole system has already been restarted several times, and I have already followed the instructions for fixing a malformed database. I deleted the container with docker rm and recreated it by running the (updated) docker run -d… command again, with no better result.

Are all nodes running on the same computer / operating system?

If it’s Linux (Debian-based), are there any errors in dmesg?
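A minimal sketch of a dmesg filter, assuming the usual Linux kernel wording for disk problems ("I/O error", "Medium Error", "media error", "Unrecovered read error", "failed command"); the pattern list and the helper name disk_errors are assumptions, not an exhaustive list. It reads dmesg-style text on standard input and keeps only the lines that point at failing storage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only kernel messages that suggest failing storage.
# Reads dmesg-style lines on stdin; extend the pattern list as needed.
disk_errors() {
    # -i: case-insensitive, so "Medium Error" and "media error" both match.
    # "|| true": no match is not an error for this filter.
    grep -E -i 'I/O error|medium error|media error|unrecovered read error|failed command' || true
}

# Usage on a live system (-T prints human-readable timestamps; may need root):
#   dmesg -T | disk_errors
```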

If not, you can stop and remove all the storagenode containers: docker stop (...) && docker rm (...). Then run docker rmi storjlabs/storagenode (if that doesn’t work, use the image ID; see also: docker rmi | Docker Docs) and start the nodes again, which forces Docker to re-download the image.

But essentially, I rather doubt it’s Docker- or Storj-related. I/O errors are usually file system problems (in this case, apparently in the file system of the Docker image, or even your host file system).
You could even try to get into the Docker container and check for yourself whether the files mentioned above really exist: docker exec -it storagenode /bin/bash (this may be time-sensitive because of the restarts).

This is very weird, because we do not use Python in storagenode. Try stopping and removing the container, deleting the image, and pulling it again:

docker stop -t 300 storagenode
docker rm storagenode
docker rmi storjlabs/storagenode:latest
docker pull storjlabs/storagenode:latest

Then run it again using the full docker run command with all your parameters.
By the way, do you use docker compose?

Hi,
thanks for your interest. My host runs CentOS.
dmesg says:

[168150.023244] docker0: port 2(veth12cd752) entered blocking state
[168150.023259] docker0: port 2(veth12cd752) entered disabled state
[168150.023356] device veth12cd752 entered promiscuous mode
[168150.023511] IPv6: ADDRCONF(NETDEV_UP): veth12cd752: link is not ready
[168150.023605] docker0: port 2(veth12cd752) entered blocking state
[168150.023613] docker0: port 2(veth12cd752) entered forwarding state
[168150.031721] docker0: port 2(veth12cd752) entered disabled state
[168151.638414] eth0: renamed from vethbc45d89
[168151.645018] IPv6: ADDRCONF(NETDEV_CHANGE): veth12cd752: link becomes ready
[168151.645105] docker0: port 2(veth12cd752) entered blocking state
[168151.645113] docker0: port 2(veth12cd752) entered forwarding state
[168156.677003] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168156.677017] ata6.00: irq_stat 0x40000001
[168156.677024] ata6.00: failed command: READ DMA
[168156.677035] ata6.00: cmd c8/00:08:88:2c:00/00:00:00:00:00/e7 tag 10 dma 4096 in
         res 51/40:08:88:2c:00/00:00:07:00:00/e7 Emask 0x9 (media error)
[168156.677048] ata6.00: status: { DRDY ERR }
[168156.677053] ata6.00: error: { UNC }
[168156.680204] ata6.00: configured for UDMA/100
[168156.680240] sd 5:0:0:0: [sde] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=4s
[168156.680251] sd 5:0:0:0: [sde] tag#10 Sense Key : Medium Error [current]
[168156.680259] sd 5:0:0:0: [sde] tag#10 Add. Sense: Unrecovered read error - auto reallocate failed
[168156.680269] sd 5:0:0:0: [sde] tag#10 CDB: Read(10) 28 00 07 00 2c 88 00 00 08 00
[168156.680279] blk_update_request: I/O error, dev sde, sector 117451912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[168156.680337] ata6: EH complete
[168157.377210] vethbc45d89: renamed from eth0
[168157.396754] docker0: port 2(veth12cd752) entered disabled state
[168157.476782] docker0: port 2(veth12cd752) entered disabled state
[168157.480223] device veth12cd752 left promiscuous mode
[168157.480248] docker0: port 2(veth12cd752) entered disabled state
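The failing read can be cross-checked by hand. In a SCSI READ(10) command descriptor block (CDB), byte 0 is the opcode 0x28, bytes 2 to 5 hold the logical block address and bytes 7 to 8 the transfer length in blocks, both big-endian. The sketch below decodes the CDB from the log above and converts the block address to a byte offset, assuming 512-byte logical blocks; the helper name decode_read10 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a SCSI READ(10) CDB given as 10 hex bytes.
# Prints "lba=<n> blocks=<n> byte_offset=<n>", assuming 512-byte logical blocks.
decode_read10() {
    local -a b=("$@")
    if [ "${#b[@]}" -ne 10 ] || [ "${b[0],,}" != "28" ]; then
        echo "not a READ(10) CDB" >&2
        return 1
    fi
    # Bytes 2..5: logical block address, big-endian.
    local lba=$(( 16#${b[2]}${b[3]}${b[4]}${b[5]} ))
    # Bytes 7..8: transfer length in blocks, big-endian.
    local blocks=$(( 16#${b[7]}${b[8]} ))
    echo "lba=$lba blocks=$blocks byte_offset=$(( lba * 512 ))"
}

# CDB copied from the dmesg output above:
#   decode_read10 28 00 07 00 2c 88 00 00 08 00
#   -> lba=117451912 blocks=8 byte_offset=60135378944
```

The decoded address equals the sector 117451912 reported by blk_update_request, and 8 blocks of 512 bytes match the 4096-byte DMA read, so these lines all describe one failed 4 KiB read about 56 GiB into sde.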

A disk is dying (the system disk), and a second one too, but that one is in a RAID0.
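SMART data is the usual way to confirm which disks are failing. A minimal sketch, assuming smartmontools is installed and that smartctl -A prints the classic ATA attribute table (ID in column 1, name in column 2, raw value in column 10). It extracts the three counters most associated with failing media: 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). The helper name smart_bad_sectors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ID, name and raw value of SMART attributes
# 5, 197 and 198 from `smartctl -A` output on stdin.
# Non-zero raw values usually mean the disk has bad sectors.
smart_bad_sectors() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $1, $2, $10 }'
}

# Usage (needs root; /dev/sde is the disk named in the dmesg output above):
#   smartctl -A /dev/sde | smart_bad_sectors
```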

No. Should I?

You have saved my node, and the data still seems to be there.

Does it work now?
If your system disk is dying, you will likely need to reinstall every single package of the OS, so that the disk can remap the sectors holding binaries and configs to healthy ones. But it’s a cat-and-mouse game: you never know when your system will stop booting.
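On CentOS, one way to find which packages have damaged files is rpm -Va, which verifies all installed files against the RPM database; a 5 in the third position of the flag string means the file’s digest no longer matches. A minimal sketch, assuming that output format and paths without spaces; the helper name changed_files is hypothetical. Files the disk cannot read at all may only show up as error messages rather than flagged lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `rpm -Va` output on stdin, print the paths whose
# digest check failed ('5' as 3rd character of the flag string) or which are missing.
# Config files you edited on purpose (marked 'c') will show up too.
changed_files() {
    awk 'substr($1, 3, 1) == "5" || $1 == "missing" { print $NF }'
}

# Usage on the host (slow; reads every packaged file):
#   rpm -Va | changed_files | xargs -r rpm -qf | sort -u
# then reinstall the listed packages, e.g.: yum reinstall <package>
```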

Meanwhile, I strongly recommend copying/moving the identity from the system drive to the disk with the data and using that path in your docker run command.
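A minimal sketch of the identity move, assuming the node is stopped first and using a hypothetical destination /data/storj-identity/storagenode on the data disk. It copies the directory with permissions preserved and verifies the copy byte for byte, so nothing needs to be deleted from the system drive until the node runs from the new path. The helper name copy_identity is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a storagenode identity directory to a new location
# (e.g. on the data disk) and verify the copy. Stop the node before running it.
copy_identity() {
    local src=$1 dst=$2
    if [ ! -d "$src" ]; then
        echo "source identity dir not found: $src" >&2
        return 1
    fi
    mkdir -p "$dst"
    # -a keeps permissions, ownership (when run as root) and timestamps.
    cp -a "$src"/. "$dst"/
    # Recursive byte-for-byte comparison; non-zero exit if anything differs.
    diff -r "$src" "$dst"
}

# Example with the hypothetical target path:
#   docker stop -t 300 storagenode
#   copy_identity /root/.local/share/storj/identity/storagenode /data/storj-identity/storagenode
# then point the identity --mount source in docker run at the new path and
# recreate the container; remove the old copy only once the node runs fine.
```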

Did you really mean RAID0? With a single disk failure, the whole volume will be lost.
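To make the difference concrete, assume (as a simplification) that each of two disks fails independently with probability p over some period. A two-disk RAID0 volume is lost if either disk fails, while a RAID1 mirror is lost only if both do:

```latex
P_{\text{loss}}^{\text{RAID0}} = 1 - (1 - p)^2, \qquad P_{\text{loss}}^{\text{RAID1}} = p^2
```

For example, with p = 0.05 this gives 0.0975 for RAID0 versus 0.0025 for RAID1. The comparison covers whole-disk failures only, not silent data corruption.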


Hi there,
my node isn’t running very well, and I’m working on identifying the packages to reinstall. I will switch from CentOS to Debian, and I will do a cleaner Storj install while I’m at it.
Sorry for the fat finger, I meant RAID1. New disks have just arrived, and I will reinstall my NAS soon.

RAID1 without the self-healing that ZFS or BTRFS provide doesn’t help much: mdraid does not know which copy is corrupted and which is not, so it will mirror the data as is, and you will end up with corrupted data on both drives.
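mdraid can at least detect that the two halves of a mirror disagree, even though it cannot tell which copy is correct. A minimal sketch using the Linux md sysfs interface (sync_action and mismatch_cnt under /sys/block/<md device>/md/), assuming an array named md0 (hypothetical); the SYSFS_ROOT variable exists only so the functions can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the Linux md sysfs interface.
# SYSFS_ROOT defaults to /sys; override it only for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Start a consistency check of an md array (needs root on a real system).
md_start_check() {
    echo check > "$SYSFS_ROOT/block/$1/md/sync_action"
}

# Print the count of sectors the last check found inconsistent between mirrors.
md_mismatch_count() {
    cat "$SYSFS_ROOT/block/$1/md/mismatch_cnt"
}

# Usage on the NAS:
#   md_start_check md0
#   cat /proc/mdstat          # wait until the check has finished
#   md_mismatch_count md0     # non-zero: the mirrors disagree somewhere
```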