I’m using an ext4 file system. I checked my filesystem with e2fsck and some pieces were removed from their original paths.
Many files were written to the lost+found directory. Each one has its inode number as its file name.
Those are the recovered pieces. I can’t restore them because I don’t know what their original names were. Is there any way to recover a piece's name? From the piece data, for example?
Because of the deleted pieces some downloads are failing.
Hello @Arrow ,
Welcome to the forum!
Likely no, but I will ask the team. The problem is not only their names, but also which satellite they belong to. There is a folder tree in blobs, so you need to know the original location and name of each piece. It also means that these recovered pieces could be either garbage or corrupted.
The satellite ID is stored as part of the OrderLimit message inside the piece header, stored in the first 512 bytes of each file. It also stores the piece ID. So technically it should be possible to write some custom code to put pieces back into directories.
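For anyone wondering how a path would fall out of those two IDs, here is a minimal sketch. It assumes both IDs are already available as raw 32-byte values (parsing the header itself is not shown), and that directory and file names are the lowercase, unpadded base32 form of those bytes, which is what the example paths in this thread look like. The function names are made up for illustration.

```python
import base64


def b32_lower_nopad(raw: bytes) -> str:
    """Base32-encode raw bytes, drop '=' padding, and lowercase the result."""
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()


def blob_relative_path(satellite_id: bytes, piece_id: bytes) -> str:
    """Build '<satellite>/<first 2 chars>/<rest>.sj1', relative to blobs.

    Assumption: the layout is inferred from the paths quoted in this thread.
    """
    sat = b32_lower_nopad(satellite_id)
    piece = b32_lower_nopad(piece_id)
    return f"{sat}/{piece[:2]}/{piece[2:]}.sj1"


if __name__ == "__main__":
    # Dummy all-zero IDs, only to show the shape of the result.
    print(blob_relative_path(bytes(32), bytes(32)))
```

A 32-byte ID encodes to 52 base32 characters, which matches the 52-character satellite directory names and the 2 + 50 character piece names seen in the thread.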
Wrote some Python code here to extract the path. I tested it on some of my piece files and couldn’t find one for which the computed file name was wrong, but obviously no warranties, and your node might just as well get disqualified the moment you run this code.
Thank you a lot @Alexey and @Toyoo. The script worked like a charm. I tested it with existing pieces and it gave the correct path and file name.
After running the script over the files in the “lost+found” directory, I could identify the files of the pieces that caused the download errors in the log file.
Thank you guys!!!
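A rough sketch of that matching step, in case it helps others. It assumes you have already pulled the failing piece IDs out of the log by hand or with grep, and that the log prints them in the same base32 form as the file names, possibly in a different letter case. The names `piece_id_from_relative_path` and `match_failed` are hypothetical and not part of the script above.

```python
def piece_id_from_relative_path(rel_path: str) -> str:
    """Rebuild the piece ID text from '<satellite>/<2 chars>/<rest>.sj1'."""
    parts = rel_path.split("/")
    prefix, file_name = parts[-2], parts[-1]
    if file_name.endswith(".sj1"):
        file_name = file_name[: -len(".sj1")]
    # Lowercase so the comparison does not depend on letter case.
    return (prefix + file_name).lower()


def match_failed(failed_ids, recovered):
    """Return the recovered files whose computed piece ID appears in failed_ids.

    failed_ids: iterable of piece ID strings taken from the log (assumption).
    recovered:  dict mapping a lost+found file to its computed relative path.
    """
    wanted = {pid.lower() for pid in failed_ids}
    return {
        src: rel
        for src, rel in recovered.items()
        if piece_id_from_relative_path(rel) in wanted
    }
```

Anything this returns is a recovered file that the node was actually asked for and could not serve, so those are the ones worth restoring first.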
I just ran into the same problem. The node restarted every 10 minutes, so I ran fsck on it. I see around 550 files in lost+found. Doing the described procedure manually would take forever.
My question is: what will happen if I just turn the node on again? Will it probably get disqualified?
I’ve got 11261 pieces in “lost+found”. I started the node after running e2fsck and the Audit score was oscillating between 90% and 100%. The data on the disk occupies 1.3 TB.
If the Audit score drops too much (I don’t know how much), your node could be disqualified.
You can restore your files faster if you generate a shell script that copies or moves the files from the “lost+found” directory to their original paths.
You can try to modify line 77 of the Python script so that the output for each file is a copy or a move command. For example, instead of having this output for each file:
“File lost+found/#92559603 should probably go do ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/y5/a7ok7l6xcujgiudtnpo6a2bnu5lptbe35je2n5khp5isi43tqq.sj1”
you will get:
“cp your_PATH_lost+found/#100010469 PATH_TO_BLOBS/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/7w/f2xfjrqkmlooj5sbp6hougk7jjpssdl36rxfvitanfc7xsoyaq.sj1”
After modifying the script, assign the lost+found path to the variable files_to_scan (line 8 of the script):
files_to_scan = 'your_PATH_lost+found/*'
Run the script and redirect the output to a file:
python3 script > file.sh
Open the file.sh. Its content will be for example:
“cp your_PATH_lost+found/#124399501 PATH_TO_BLOBS/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/f7/7elfswbyhvzmtfdiqpmf6nkeiejcrnpqsah2gweltaf3oywd5a.sj1”
“cp your_PATH_lost+found/#96482641 PATH_TO_BLOBS/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/f7/rqangqph5c7sijetp4pnirwsb675cnl4ma2k2ute2j4zen3lsa.sj1”
“cp your_PATH_lost+found/#96618903 PATH_TO_BLOBS/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/f7/vu4orqxzfmw7wqkw4y2yi27cl7qkik3p244oacgu75bjenw2ma.sj1”
“cp your_PATH_lost+found/#113003688 PATH_TO_BLOBS/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/gl/efl254eueb5qi42m5d4jkiuhjtnqcmougly7ybqr47ieejlwsa.sj1”
“cp your_PATH_lost+found/#96397789 PATH_TO_BLOBS/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/gl/gqqcogdpzwdzeyqaorysbnucgzlu3pmbkzixfs43rlv3c73f2q.sj1”
Insert at top the line:
Close the file and change the access permission:
chmod u+x file.sh
Run the file:
After running it, all the files in the “lost+found” directory are copied or moved (depending on the command chosen) to their original paths.
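The steps above could also be sketched as one small Python helper that writes the whole shell script for you. This is only an illustration under assumptions: `build_restore_script` is a hypothetical name, the mapping from lost+found files to relative blob paths is assumed to come from the path-extraction script, and the `#!/bin/sh` first line and the `mkdir -p` step are my own choices, not something taken from the original script.

```python
import shlex
from pathlib import PurePosixPath


def build_restore_script(mapping, blobs_root):
    """Return shell script text that copies each recovered file into blobs.

    mapping:    dict of lost+found file path -> path relative to blobs
    blobs_root: directory that contains the satellite folders
    """
    lines = ["#!/bin/sh", "set -eu"]  # stop at the first failing command
    for src, rel in sorted(mapping.items()):
        dst = PurePosixPath(blobs_root) / rel
        # Create the two-character directory in case it is missing.
        lines.append(f"mkdir -p -- {shlex.quote(str(dst.parent))}")
        # shlex.quote protects names such as '#100010469' from the shell.
        lines.append(f"cp -- {shlex.quote(src)} {shlex.quote(str(dst))}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = {"lost+found/#1": "sat/ab/cdef.sj1"}  # made-up example entry
    print(build_restore_script(demo, "/mnt/blobs"), end="")
```

Using cp rather than mv keeps the originals in lost+found until you have checked that the node is happy with the restored pieces.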
@Arrow I think it’s important to monitor your disk closely as it may be about to fail.
@remo Seems like you’re in a similar case and IMHO your disk should be monitored too. Turning your node ON won’t necessarily disqualify it as 500-ish files is probably not much compared to the total number of files your node must be holding (I guess - depends how old your node is). If it keeps losing pieces though, that would be concerning.
Besides, files from the lost+found directory may be incomplete or corrupted as pointed out by @Alexey so I’m not sure how relevant it is to restore them?
I guess it would at least kind of “repair” the ones that turn out to be still valid… Dunno if there is a way to check whether a Storj file is valid.
There was a recent decision to aim to allow nodes that lost 2% of data to operate. However as far as I understand, it has not yet been implemented, and the current way of computing the audit score depends on luck a lot.
From my experience, though I admit I didn’t have many opportunities to pick files out of lost+found, the chance for these files being complete is fairly high. Unless the drive is almost completely dead, metadata and data failures don’t seem to be very correlated, and files end up in lost+found if their metadata is corrupted.
On my side everything seems to be OK for now. I generated the cp commands with the file names and their places in the blobs directory, as Arrow said, and copied the files. I am closely monitoring that HDD for further problems and will try to move the node to another HDD to avoid problems in the future.
No signs of malfunction for now
Thanks Toyoo for your work on this problem. It came right on time, just when the problem occurred on my side.