You need to search for “unable to delete piece”. This error is definitely exist, since you have it on your Grafana dashboard. As far as I know the Prometheus node exporter scans logs for errors to give them to Prometheus.
ah ok this is to high linux for me
I am seeing lots and lots of errors like these and they are all for satellite US1
2022-07-17T20:11:01.069Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “Y4K5SAIOMZPB4MNY4BGBEQA65EXUS6FSQXV5YRS3BQXYYQSL6ABQ”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
2022-07-17T20:11:01.473Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “KRTK36VBCUG643L6KDJB22WH2SEUOJNGK4ZJ7GFKA5DHRLZKXPJA”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
2022-07-17T20:11:01.819Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “THFDCY5R5VPST6OPUKHG25HOG6ZKUYSZ5ALYLIQ3SMEHLDK6HODQ”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
2022-07-17T20:11:02.148Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “G5PVCGOEH3BGDNALHYSDADRVUKKY654YC3FP7MO4ETKM6G2RMANQ”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
2022-07-17T20:11:02.382Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “WOSA3WOFTX2BBJTAL6G3PEOAIFRHTRUD6DFXNEZRRIOSQMGNDSZA”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
2022-07-17T20:11:02.691Z ERROR collector unable to delete piece {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “RWDHN2COE6Z2G264CTW7UDXPH36JIY6BM6J5MX2GSDYNX6ZSX54A”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
Can someone help please
so basically the solution is to create empty files so they can be deleted. I have lots and lots of files which are not found. what could be the root cause of this?
The solution actually wait a week before trying a workaround. These pieces should not be mentioned in your logs anymore. If the same pieces still appear in the log after a week - then you can apply a workaround.
The reason is currently unknown. My guess it’s related to expired pieces - when the piece is expired, it got deleted by storagenode automatically, but the satellite sends a Bloom filter to remove these pieces to the trash, but since they are not exist anymore - they throws an error.
The problem that I have witnessed is that when there are way more files that throw this error. In my latest case there were around 25ish files. So every hour my log would show 25 error lines. 25x24x7=4200 lines of unnecessary log lines and related space. It will be more for multinodes.
I, personally, opt to create the empty files than wait for a week.
Fixed. created emtpy files for all the error. Now I have another problem of “bandwidthdb: database is locked”
I’m also getting those errors, it seems that it’s a recurrent problem because every once in a while they appear in groups of several files in the logs and if you dont take care they can realy spam your logs (every hour for each problematic file, ad infinitum)
The way I deal with them is creating empy files, but it should be nice if they fix the root cause.
Here is an example.
2022-03-05T01:27:42.788Z INFO piecestore upload started {"Piece ID": "C4N4EK2FR4JJRAJYXVGFK3J3P74LEIPDPANC5FWFMM5EU2OMAPHQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Available Space": 31148600576}
2022-03-05T01:27:43.012Z INFO piecestore uploaded {"Piece ID": "C4N4EK2FR4JJRAJYXVGFK3J3P74LEIPDPANC5FWFMM5EU2OMAPHQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Size": 145408}
2022-03-05T01:27:47.007Z INFO piecedeleter delete piece sent to trash {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "C4N4EK2FR4JJRAJYXVGFK3J3P74LEIPDPANC5FWFMM5EU2OMAPHQ"}
2022-08-26T20:38:30.722Z ERROR collector unable to delete piece {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "C4N4EK2FR4JJRAJYXVGFK3J3P74LEIPDPANC5FWFMM5EU2OMAPHQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-08-26T21:38:30.757Z ERROR collector unable to delete piece {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "C4N4EK2FR4JJRAJYXVGFK3J3P74LEIPDPANC5FWFMM5EU2OMAPHQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
In this case the collector is trying to delete a file that was sent to the trash 5 months before
hello, does anyone have a script that automates the creation of empty file of those missing files? could you please share it. i just noticed this “delete error” on my log, i see 124 unique pieces that are missing.
This is my script in PowerShell that works in Windows. You can run PowerShell on Linux & Mac too.
Edit: Make sure the log file only contains those error lines and nothing else. You can copy the error lines in another file and give the path of that file to the script.
Do you convert the filenames to lower case? I’m not fluent in Powershell, so I can’t tell. The case in filenames matters in *nix.
Here’s my bash script
#!/bin/bash
if [ "$1" == "" ];then
echo "usage $0 <storagenode_base_dir> [test]"
exit 1
fi
tempfile="/tmp/$$.log"
grep $(date "+%Y-%m-%dT%H" -d '2 hour ago') $1/node.log|grep -E "ERROR\s+collector\s+unable to delete piece"|cut -d'"' -f8,12 > $tempfile
if [ $(cat $tempfile|wc -l) -gt 0 ];then
while IFS= read -r line;do
satellite=$(echo $line|cut -d '"' -f1)
#converting piece_id from log to lower case and inserts a "/" after the 2nd character
piece_id=$(echo $line|cut -d '"' -f2|tr '[:upper:]' '[:lower:]' | sed 's:^\(.\{2\}\):\1/:')
case $satellite in
121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6)
blob=qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
;;
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S)
blob=ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
;;
12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs)
blob=v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa
;;
1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE)
blob=pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
;;
12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB)
blob=6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa
;;
12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo)
blob=arej6usf33ki2kukzd5v6xgry2tdr56g45pp3aao6llsaaaaaaaa
;;
*)
echo "Satellite $2 doesn't exist"
break 1
;;
esac
if [ -d $1/storage/blobs ];then
if [ "$2" == "test" ];then
echo "Satellite $satellite piece ID $piece_id"
else
touch $1/storage/blobs/${blob}/${piece_id}.sj1
ls -la $1/storage/blobs/${blob}/${piece_id}.sj1
fi
fi
done < $tempfile
fi
rm $tempfile
Use at your own risk. The script assumes the log is in the root folder of the node and is called "node.log. Adjust accordingly. Also adjust “2 hours ago” in the line
grep $(date "+%Y-%m-%dT%H" -d '2 hour ago') $1/node.log|grep -E "ERROR\s+collector\s+unable to delete piece"|cut -d'"' -f8,12 > $tempfile
to your offset between your local time and the time inside the container. My offset is 1h, so I use “2 hours ago” to grep the previous hour of the log.
If you use “test” as second parameter the script just prints out the pieceid’s together with the corresponding satellite.
I do not. I will make the necessary changes. Thank you for your suggestion.
Edit: Updated script to convert piece id to lower case.
what this row is realy mean?
what is *BlobsUsageCache?
2022-09-01T20:10:00.430+0300 ERROR collector unable to delete piece {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “IF4PEXQLSW4PQZMQT2JTIUQYDLK6IZ6C4JAMKNDCIUIKEGRRF2JA”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:226\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}
it mostly acure when restarted the node
I can confirm this and maybe it helps to pinpoint the underlying issue. I see this error a lot after I have moved a node. Which is weird, because you do a full rsync which means you have a full clone of a node. But after starting the cloned node, you see numerous collector unable to delete piece
right at the beginning of the log with the same error message like:
I’m seeing them a lot as well recently. I think the timing with the move may be a coincidence. I didn’t change anything in my setup and never really saw these before.
Maybe. It is just that I remember that this was the first time I saw these errors after a recent update and just had a node moved. I got scared really scared first because I thought the move had gone wrong. But I see this every time now so I stopped to worry about it.
Yeah, file does not exist errors scare me as well. I digged deeper when I first saw them and noticed they only really happen with garbage collection (collector). I’m guessing there is simply a mismatch between creation of the bloom filter and execution of the removals. My nodes have had perfect scores during this entire time and no similar errors on other transactions. So I guess it’s nothing to worry about. It just looks scary.
Hey just tried your script and it saved me a whole bunch of time, only issue is - I had to make some tweaks to it to make it work, you’ll notice the array indexes were different for me - not sure if it’s like that for anyone else, here is the version of your script I used:
<#
.DESCRIPTION
Script to fix "unable to delete piece" error message in the log
#>
Write-Host "`t###############################################################" -ForegroundColor Cyan
Write-Host "`t Script to stop `"unable to delete piece`" error message "
Write-Host "`t###############################################################" -ForegroundColor Cyan
$satIDHT = @{
"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S" = "ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa"
"12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs" = "v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa"
"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6" = "qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa"
"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE" = "pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa"
"12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB" = "6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa"
"12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo" = "arej6usf33ki2kukzd5v6xgry2tdr56g45pp3aao6llsaaaaaaaa"
"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW" = "abforhuxbzyd35blusvrifvdwmfx4hmocsva4vmpp3rgqaaaaaaa"
}
# Format of command: <scriptname> <path to storj> <path to log file>
if ($args.Count -eq 2){
$storjPath = $args[0]
$logfile = $args[1]
}
else{
Write-Host "Enter Storj path containing `"Storage`" folder : " -ForegroundColor Magenta -NoNewline
$storjPath = Read-Host
Write-Host "Enter path containing `"log`" file : " -ForegroundColor Magenta -NoNewline
$logfile = Read-Host
}
foreach ($logline in Get-Content $logfile ){
$logarr = $logline.Split('"')
$satID = $logarr[3]
$pieceID = $logarr[7].ToLower()
$satFolder = $satIDHT[$satID]
$pieceFolder = $pieceID.Substring(0, 2)
$fileName = $pieceID.Substring(2, $pieceID.Length - 2) + ".sj1"
Set-Location -Path ($storjPath + "\blobs\" + $satFolder + "\" + $pieceFolder)
Write-Host "Enter Storj path containing `"Storage`" folder : "
Write-Host ($storjPath + "\blobs\" + $satFolder + "\" + $pieceFolder)
if (New-Item $fileName){
Write-Host "`n`t" $pieceID -ForegroundColor Yellow -NoNewline
Write-Host " was successfully FIXED!" -ForegroundColor Green
}
}
Pause
Putting it here in case anyone else tries to use it and runs into the problem, might save them some time!
Maybe it wouldn’t look that scary if that was recognized as a normal behavior and if the log line wasn’t tagged as an ‘ERROR’ but as ‘INFO’ maybe…
I feel like there are still a few events like this one in the logs that are scary ERRORS although they seem to be kinda normal and frequent. It got better, but there is still room for improvement.