Hello!
I have a node on an RPi 3B+ with a 2TB USB HDD attached. It was running for about 8 months without any problems.
A few days ago I started to notice that the uptime was just a few hours. I looked into the log file and found this:
Following this, the node restarted and has been working normally since.
I checked with the df -h command whether there is enough free space, and I have 1.1TB free out of 1.7TB.
Can you please help with any suggestion?
Thank you!
Balázs
There are two related errors listed. The first shows that piecestore:cache was unable to determine the current amount of used space, and then the same piecestore:cache process stopped.
The errors point to the node (CPU or RAM) being overwhelmed or the storage being too slow to respond. It’s hard to know exactly without more errors, but it would be worth keeping a closer eye on the node.
Do you have a separate power supply for the HDD? Or is it relying on the power supplied by the RPi’s USB ports?
If the drive is running on the RPi power rails, it’s possible that the drive suddenly pulled too much power, resulting in a voltage drop on the power supply and thus a reboot of the RPi.
I have a genuine RPi power supply for the RPi and a dedicated PSU for the HDD. It is a 3.5" HDD.
On the other hand, I found another error message in the log which might be connected to this issue:
I would say this is a hardware issue; it’s always bad to see this error in any part of the logs. First I would try a different USB port, and if you can, change the USB cable. If that doesn’t help, then you may have a failing drive.
The first step is to stop the storagenode and check the disk for errors with sudo fsck -f /dev/sda1 (you need to unmount it first with sudo umount /mnt/storj, or whatever your system mount point is).
If it’s an NTFS filesystem, then you need to check it with chkdsk /f on Windows rather than on the Pi, and think about migrating from NTFS to ext4.
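The sequence above might look like this on the Pi, assuming the drive is /dev/sda1 mounted at /mnt/storj as in the example, and a Docker-based node; adjust the paths and the stop/start commands to your own setup:

```shell
# Stop the node first so nothing writes to the disk while it is checked
# (the -t 300 grace period lets the node shut down cleanly).
docker stop -t 300 storagenode

# Unmount the data drive, then check and repair it.
sudo umount /mnt/storj
sudo fsck -f /dev/sda1

# Remount (works as a bare path if the drive is listed in /etc/fstab)
# and start the node again once the check is clean.
sudo mount /mnt/storj
docker start storagenode
```

Running fsck on a mounted filesystem can cause further damage, which is why the unmount comes first.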
Thank you Alexey, it turned out that I missed the ext4 partition creation at the very beginning, so it is on NTFS.
I will check the drive first with chkdsk on a Windows machine.
Is there any easy way to convert NTFS to ext4? Or just backup-format-restore?
If there is an easy way, I might give it a try, provided chkdsk does not find any bad sectors.
If there are bad sectors, I will try to migrate/copy to another disk to keep the node running…
I found a 4TB WD HDD, put it into an external case (it has an external PSU) and attached it to the RPi’s second USB port.
Now both HDDs are attached to the RPi.
I partitioned and formatted the 4TB drive; it has only one ext4 partition.
Old drive /dev/sda1 → /mnt/storage
New drive /dev/sdb1 → /mnt/storage2
The old drive has only Storj data, so can I simply copy everything from the old drive to the new one, for example with the rsync command? I tried to understand the rsync parameters, but I’m not sure about the exact command to include everything…
Can you please help me with the exact command?
Then run the node again with all your parameters, including the changed ones (the path to the data, and maybe to the identity folder if it’s on the disk with the data; it’s recommended to move it there from your SD card).
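For reference, the copy itself is usually done in several passes so the node can stay online for most of the migration. A sketch, assuming the mount points above and a Docker-based node:

```shell
# First pass(es) while the node is still running; repeat this command
# until a pass finishes quickly (each run only copies what changed).
rsync -aP /mnt/storage/ /mnt/storage2/

# Final pass with the node stopped, so nothing changes underneath.
# --delete removes files on the destination that no longer exist on
# the source (deleted pieces, stale *.db-wal / *.db-shm journals).
docker stop -t 300 storagenode
rsync -aP --delete /mnt/storage/ /mnt/storage2/
```

Note the trailing slashes: with them, rsync copies the contents of /mnt/storage into /mnt/storage2 rather than creating a storage subfolder inside it.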
The poor old HDD is having a really hard time now, because due to the restarts the filewalker is running almost all the time, plus normal node operation, plus now I started rsync… I hope it will survive the next few days…
Rsync has been running for 2 days already (first pass)… 615GB out of 831GB copied over. I reduced the allocated space to 700GB after 6 hours, so there is no ingress at the moment. Yes, it is a bit quicker without ingress traffic, but still… it takes a while to properly copy everything…
Ok, so the situation is the following:
Rsync is done, both with the node running and with it offline.
I ran the db integrity checks and had to fix the bandwidth.db file. I did it, and the check says it is OK.
On the other hand, I have a “database disk image is malformed” error for piece_expiration.db.
I tried to fix it the same way as bandwidth.db, but failed:
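For context, one common way to attempt such a repair is the sqlite3 dump-and-rebuild approach. A sketch, with the node stopped first; the database path is an assumption, adjust it to your storage location:

```shell
# Path to the malformed database (assumed; adjust to your setup).
DB=/mnt/storage2/storage/piece_expiration.db

# Export whatever sqlite3 can still read from the damaged file.
sqlite3 "$DB" ".dump" > /tmp/piece_expiration.sql

# Keep the broken file around, then rebuild a fresh database
# from the dump and verify the result.
mv "$DB" "$DB.malformed"
sqlite3 "$DB" < /tmp/piece_expiration.sql
sqlite3 "$DB" "PRAGMA integrity_check;"
```

If the dump itself aborts partway through the corruption, the generated .sql file may end with a ROLLBACK statement that has to be changed to COMMIT before importing, otherwise the rebuilt database comes out empty.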
If you still have the original database, you can just copy it over; an outdated database is better than an empty one.
Perhaps during the last rsync you forgot to add the --delete option, and your destination still had journal files like *.db-wal and/or *.db-shm. They exist if the database was not closed properly or if you copied the files on the fly (your case), and they were wrongly imported when you started the node in its new place.
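You can quickly check for such leftover journal files on the destination; with the node stopped, any that remain are stale (the path is an assumption, adjust it to your layout):

```shell
# With the node stopped, these files should not exist; a final
# rsync pass with --delete would have removed them.
ls -l /mnt/storage2/storage/*.db-wal /mnt/storage2/storage/*.db-shm
```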
Yes, you are right, I forgot the last run with the --delete option.
Now the node is running again from the old HDD, so I’m back (almost) at the starting line.
I noticed that after I reduced the node size, the node was still restarting a few times. It seems that the restarts happened when it wanted to empty the trash. I also noticed that the trash has stayed above 106.1GB for weeks now. Sometimes it goes up to 108-109GB, then back down to 106.1GB.
Is this second database error related to this?
Thank you once again for your help! I managed to fix the node, although it was quite a rollercoaster ride…
On Wednesday the node started to crash more and more often, so I decided to stop it and try rsync a few more times. Unfortunately, I guess due to the old HDD’s errors, even rsync (with --delete) was not able to finish properly. I decided to take the old HDD and connect it to my Windows computer to run chkdsk. Even chkdsk /f was not able to run completely; it simply froze while it tried to fix various .sj1 files.
By midnight I simply gave up, because when I plugged the old drive back into the RPi, the system would not even start. It seemed I had lost my node…
For two days I wasn’t at home, but on Saturday evening I decided to give it one more try. I took another SD card, installed Ubuntu, and connected the new HDD. I had to fix the mount point, as there wasn’t one for the new drive, but by the end of the day the node was running!
The trash cleared properly, I have ingress and egress, and there are no warnings in the log.
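For anyone fixing a missing mount point like this, the usual approach is a permanent /etc/fstab entry by UUID. A sketch; the UUID below is a placeholder, and the device name and mount point are assumptions from earlier in the thread:

```shell
# Find the partition's UUID:
sudo blkid /dev/sdb1

# Then add a line like this to /etc/fstab (replace the placeholder UUID):
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/storage2  ext4  defaults  0  2

# Create the mount point and test the fstab entry without rebooting:
sudo mkdir -p /mnt/storage2
sudo mount -a
```

Mounting by UUID rather than by /dev/sdX name avoids the drive coming up under a different device name after a reboot or when a second USB disk is attached.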
Yes, it could be. You need to try over and over again until it fixes all the errors.
Unfortunately, Linux is unable to fix NTFS errors, so this node can stop working at any time.