Could you please give logs for the period before and after?
Funny you should mention that.
Yesterday one of my nodes (which was previously full and had a lot of trash) purged its trash data, and so had free space again (about 2TB).
It wouldn't start accepting ingress data until I restarted it (I waited about 30 minutes to see if it had to "tell" the satellites something).
Other nodes didn't exhibit this behaviour after trash deletion.
No idea what happened.
Out of interest, was your offending node in a similar circumstance as this one?
EDIT: Crap, I just noticed how old this thread was. Sorry!
The default check-in interval is 1h…
However, a restart should work too. The node will try to check in immediately.
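If you want to double-check the interval on your node, this is roughly how I'd look it up on a Windows GUI install (just a sketch; the path is the default one and the setting is contact.interval, default 1h0m0s):
# Show whether the check-in interval was overridden in config.yaml (no output means the 1h default is used)
Select-String -Path "C:\Program Files\Storj\Storage Node\config.yaml" -Pattern 'contact.interval'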
I'm not sure if I should continue this thread or start a new topic, but my issue seems rather similar. Ingress just stopped, although I'm constantly online, there is plenty of space left on the disk and the internet speed is excellent. In fact, not only is ingress too low, but disk usage keeps falling. To make things worse, this didn't happen on just one node, but on several nodes hosted in the same location.
I think there is no point analyzing all of them, as the situation is similar for each one and the problems started all together. The worst online score I see is 99.67%, so that can't be the problem. As I remember, the node used to show around 5TB used, increasing by about 20GB/hour. Then the increase stopped and usage started falling. Now it's using 4.3TB. For the first 13 hours of September, I only have 8.43GB of ingress. Success rates look good in my eyes…
========== AUDIT =============
Critically failed: 0
Critical Fail Rate: 0,00%
Recoverable failed: 0
Recoverable Fail Rate: 0,00%
Successful: 506
Success Rate: 100,00%
========== DOWNLOAD ==========
Failed: 20
Fail Rate: 0,14%
Canceled: 239
Cancel Rate: 1,72%
Successful: 13649
Success Rate: 98,14%
========== UPLOAD ============
Rejected: 0
Acceptance Rate: 100,00%
---------- accepted ----------
Failed: 141
Fail Rate: 0,70%
Canceled: 14
Cancel Rate: 0,07%
Successful: 20118
Success Rate: 99,24%
========== REPAIR DOWNLOAD ===
Failed: 0
Fail Rate: 0,00%
Canceled: 0
Cancel Rate: 0,00%
Successful: 26
Success Rate: 100,00%
========== REPAIR UPLOAD =====
Failed: 0
Fail Rate: 0,00%
Canceled: 0
Cancel Rate: 0,00%
Successful: 590
Success Rate: 100,00%
Same yesterday on one of my nodes. Free space was around 1 TB.
Several restarts did not help. The only thing that helped was increasing the assigned space; right after that it started to accept uploads again.
Can you see how fast it receives uploads? It seems to receive some uploads, but the download rate seems to be rather higher.
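For example, something like this would give a rough upload vs. download count from the current log (a sketch, assuming a Windows GUI install with the default log path; adjust for Docker):
# Count completed uploads vs. downloads logged by the piecestore
$log = "C:\Program Files\Storj\Storage Node\storagenode.log"
"uploads:   " + (Select-String -Path $log -Pattern 'piecestore\s+uploaded').Count
"downloads: " + (Select-String -Path $log -Pattern 'piecestore\s+downloaded').Count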
I don't know if it's somehow connected, but in the logs I have plenty of
WARN collector unable to delete piece "error": "pieces error: filestore error: file does not exist"
How many per second or minute, you mean?
My logs for this node aren't good at the moment, as I am rejecting everything but 1.
But from the rate of rejections I can see something like 1-5 rejected uploads per second.
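If you want an exact number, something like this counts them in the current log (a sketch, assuming a Windows GUI install with the default log path and that the rejection message still contains "upload rejected"; on Docker you would grep the docker logs output instead):
# Count rejected uploads in the current log file
(Select-String -Path "C:\Program Files\Storj\Storage Node\storagenode.log" -Pattern 'upload rejected').Count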
The test data has completed and ingress fell 98% for me. Today, September 1st, after 14 hours I have about 32 GB of ingress.
I mean, what's the approximate ingress rate for your node? For example, for the current month, mine shows 10GB ingress. Considering it's the 1st of September and the time is approximately 15:30, that turns out to be 0.65GB/hour. That's very low for no apparent reason, but also used space is actually decreasing.
Just to make things clear, vetting completed a month ago.
This node has only around 8GB ingress today so far. But as I said, I have limited the number of uploads, so it is expected to be low.
This is exactly the real ingress before the test. Maybe a little more than the past average. So… be happy!
This is normal. We are not running any testing on Salt Lake servers right now. It is also a holiday weekend in the US so customers are likely not in the office doing work either. Possibly data will pick up early next week.
Yup: this is 100% authentic home-grown grass-fed antibiotic-free free-range ingress… no artificial test data. Data that's much more likely to stick around more than 30 days. The good stuff!
It just feels bad to watch that slow consistent climb in used-space get totally overwhelmed by expired TTL for more than a week now. But we're getting quality over quantity!
This needs to be checked. The node reports the calculated allocated and used space to the satellites, and if it believes that it's full, it will report so to the satellites and there will be no ingress.
So, please make sure that your node has free space according to your dashboard and that it matches the usage reported by the OS. If not, you need to enable the scan on startup if you disabled it (it's enabled by default) and restart the node. When the used-space filewalker finishes the scan for all trusted satellites, it should update the usage in the databases.
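To verify that the scan wasn't disabled, you could check the config (a sketch for a Windows GUI install with the default path; the relevant option is storage2.piece-scan-on-startup, which defaults to true):
# If this prints a line ending in "false", the used-space filewalker will not run on restart
Select-String -Path "C:\Program Files\Storj\Storage Node\config.yaml" -Pattern 'storage2.piece-scan-on-startup'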
You may also search for a "less than allocated" message in your logs.
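For example (a sketch, assuming a Windows GUI install with the default log location; on Docker you would grep the output of docker logs instead):
# Find warnings indicating the node thinks its free space is less than allocated
Select-String -Path "C:\Program Files\Storj\Storage Node\storagenode.log" -Pattern 'less than allocated'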
Well, the node is using the space of a 9TB vhdx. I have set the node to use 6TB for now.
Yesterday, it was using 3.75TB, with 2.07TB free and 173GB trash.
Today, itâs using 3.17TB, 2.66TB free, 173GB trash.
Windows VM reports 5.08TiB out of 8.18TiB free, 3.38TB used (which approximately matches what the node reports, 3.19TB+0.17TB=3.36TB).
I guess that rules out any available space problem.
Scan on startup is default, never touched it.
I searched for the word "allocated" in the logs and found nothing.
However, what makes me suspicious is that I see a lot of lines like this:
collector unable to delete piece {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Piece ID": "HDMVDGZGL7YM6COOIDW2RVDIAFGKFZ5NV7S7U7XEHLB6OI6FNRPQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:124\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:340\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).DeleteWithStorageFormat:320\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteSkipV0:359\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:112\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:68\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/collector.(*Service).Run:64\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
I'm wondering if there is a chance that these files were deleted in the past, but their deletion was never recorded? Because, as far as I see, the node never tries to delete the same file again. Although that doesn't explain the non-existent ingress, since there is plenty of space.
By the way, they are all saltlake.tardigrade.io:7777.
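This is roughly how I'd group such warnings per satellite to confirm it (a sketch, assuming the default Windows log path; it shows the satellite IDs rather than hostnames):
# Extract the Satellite ID from each "unable to delete piece" warning and count occurrences per satellite
Select-String -Path "C:\Program Files\Storj\Storage Node\storagenode.log" -Pattern 'unable to delete piece' |
  ForEach-Object { if ($_.Line -match '"Satellite ID":\s*"(\w+)"') { $Matches[1] } } |
  Group-Object | Sort-Object Count -Descending | Select-Object Count, Name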
Update:
I also found a lot of these, which are all us1.storj.io:7777:
2024-09-03T12:45:18+03:00 WARN pieces failed to migrate v0 piece. Piece may not be recoverable
2024-09-03T12:45:18+03:00 WARN retain failed to trash piece {"cachePath": "C:\Program Files\Storj\Storage Node/retain", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "DYJPKYB7UVBPZIV6E6YEDHPM534YFZ325H76LH6GIA6CV5OREZIA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:124\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:340\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).TrashWithStorageFormat:406\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:422\n\tstorj.io/storj/storagenode/retain.(*Service).trash:428\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces.func1:387\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).processTrashPiece:112\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).writeLine:99\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).Write:78\n\tio.copyBuffer:429\n\tio.Copy:386\n\tos/exec.(*Cmd).writerDescriptor.func1:560\n\tos/exec.(*Cmd).Start.func2:717"}
And then I saw stuff like these:
INFO collector expired pieces collection completed {"count": 6576}
The average count is approximately 7000 pieces. So, as I understand it, every hour the collector finds approximately 7000 expired pieces. Could that be normal?
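For what it's worth, this is roughly how the hourly counts can be pulled out of the log (a sketch, assuming the default Windows log path):
# Collect the "count" value from every completed collector run and show the number of runs, the total and the average
Select-String -Path "C:\Program Files\Storj\Storage Node\storagenode.log" -Pattern 'expired pieces collection completed' |
  ForEach-Object { if ($_.Line -match '"count":\s*(\d+)') { [int]$Matches[1] } } |
  Measure-Object -Sum -Average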
Hi Christos, I don't think these log lines are related to the issue you're facing. They typically appear when a piece has expired but was garbage collected earlier and therefore does not exist anymore. This is a common occurrence and shouldn't be a cause for concern.
If the WARN comes from the TTL collector or from the retain process, then the piece was likely either moved to the trash by a retain, or the collector deleted the expired piece before the retain could move it to the trash (which case it is depends on where you took this WARN from).
It would be a completely different story if you got it from GET_AUDIT or GET_REPAIR; in that case it would affect the audit score.
It seems to be the second case: the TTL collector was faster. In that case it can be ignored.
So, @edo is correct, it's not related to the issue.
Do you have a suspension score and online score higher than 60%?
Looks like the TTL collector removes the expired pieces.
Well, yes, the servers report online scores of 99.89, 99.56, 99.89 and 100.00.
So, are you actually saying that it's normal that I had 5TB used and now just 2TB? I mean, it was filling up like crazy and then one fine day it was like, "ok, you know what? I was wrong, remove that data from there". What's more, I see it's not just one node: at least 3 of my oldest ones were affected. Small nodes, so not so painful, but still. A good example: two full 1TB disks, where the first one suddenly shows 220GB used, and the second one shows 210GB free and 630GB in trash (!).
The only thing I could think of is that maybe by enabling my public (white) static IP and turning off the multiple IPs, the Storj system decided to relocate old data. But still, this doesn't explain why my big disk was emptied, as it only started working after I got the new IP.