Free space, but no uploads anymore

I am running v1.102.3 and have an issue with ingress: I suddenly get hardly any uploads, even though I have lots of free space.

[screenshot]

Log summary:
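# fields 2-4 of the tab-separated node log are level, subsystem, and message; uniq -c counts each combination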

[storj]cut -d $'\t' -f 2-4 //marstorage//Storj/storage/node.log | sort | uniq -c
     20 ERROR   piecestore      download failed
     14 INFO    bandwidth       Performing bandwidth usage rollups
     14 INFO    collector       collect
   2398 INFO    collector       deleted expired piece
      1 INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully
      1 INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed
      4 INFO    lazyfilewalker.trash-cleanup-filewalker starting subprocess
      4 INFO    lazyfilewalker.trash-cleanup-filewalker subprocess finished successfully
      4 INFO    lazyfilewalker.trash-cleanup-filewalker subprocess started
      4 INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      Database started
      4 INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker completed
      4 INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started
     14 INFO    orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6      finished
     14 INFO    orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6      sending
     14 INFO    orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S      finished
     14 INFO    orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S      sending
     14 INFO    orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs      finished
     14 INFO    orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs      sending
      3 INFO    orders.1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE       finished
      3 INFO    orders.1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE       sending
      4 INFO    pieces:trash    emptying trash finished
      4 INFO    pieces:trash    emptying trash started
    825 INFO    piecestore      download canceled
  11041 INFO    piecestore      download started
  10201 INFO    piecestore      downloaded
     17 INFO    piecestore      upload started
     17 INFO    piecestore      uploaded
     16 INFO    reputation:service      node scores updated
      1 INFO    retain  Moved pieces to trash during retain
      2 INFO    trust   Scheduling next refresh

As you can see, only 17 uploads.

The stats look OK.

My guess would be a database discrepancy, but I have not seen any database errors in the logs.

Does anybody have a clue what may be causing this?

Brgds, Marc

I see you have 3 GB free. The free space buffer on the satellite side has now been increased to 5 GB.
So maybe that is your issue. You could test this by increasing your allocated space so that you have more than 5 GB free.

The minimum is 5 GiB, see:

Thanks! Obviously I had not read that. Funny, though, that 17 uploads slipped through; those must have been handled under different criteria.

Marc

Not necessarily; perhaps your node still had enough free space to accept them at the time.

It is a free space buffer. The moment your node has less than 5 GB of free space, it notifies the satellite but otherwise keeps accepting any incoming upload. The satellite will continue selecting the storage node for up to 5 additional minutes because of the way it caches the nodes table. During those 5 minutes we don't want to return error messages to the customer, so ideally the free space buffer is big enough to keep writing pieces to disk for 5 more minutes.
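To illustrate the idea, here is a rough sketch in Go. This is not the actual storagenode code; the constant and function names are made up, and the 5 GB / 5 minute figures just mirror what is described above.

package main

import (
    "fmt"
    "time"
)

const (
    // Reserve the node keeps back; below this it tells the satellite it is full.
    freeSpaceBuffer int64 = 5 * 1024 * 1024 * 1024 // hypothetical name, roughly 5 GB
    // The satellite's node-selection cache may keep handing out this node for this long.
    selectionCacheTTL = 5 * time.Minute
)

// reportableSpace is what the node would report to the satellite as available.
// Once free space drops under the buffer it reports zero, but uploads that
// still arrive during the stale cache window are accepted and written to disk.
func reportableSpace(freeBytes int64) int64 {
    if freeBytes <= freeSpaceBuffer {
        return 0
    }
    return freeBytes - freeSpaceBuffer
}

func main() {
    fmt.Println(reportableSpace(3 * 1024 * 1024 * 1024)) // 3 GB free: reports 0 available
    fmt.Println(selectionCacheTTL)                       // selection can still lag by up to 5 minutes
}

So a node that drops under the buffer stops advertising free space, but a trickle of uploads from the stale selection cache can still land on it.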

This should be 10 GB or even 50 GB. An SNO doesn't get much payout from that space anyway, so why risk crashing the node by overusing the disk?
This is of course for SNOs that live on the edge.
I deallocated 1 TB on all my drives. Better to play it safe.

I didn't want to increase it that fast. You are correct that we might have to increase it further, depending on how much load we get.

After my post I received another 12 uploads, the last one many hours later. Free space was still well under 5 GB the whole time. Weird.

BTW, the node just updated to v1.104.5 and it works great with BTRFS: the upload success rate went from a depressing 38.705% to 98.666% with this new version. Disk IOPS utilization during ingress also dropped drastically. Great job by the development team.

Brgds, Marc
