Negative available space when upgrading to v1.18.1

There are already many posts about this subject, but they're pretty old, so I thought the issue had been resolved since.

My nodes updated to v1.18.1 15 hours ago, and some of them (not all) started receiving data again even though they were already full (~480MB free, so they had stopped receiving data many days earlier).

They eventually stopped receiving data again, but they now show 241MB, 50MB and -194MB of free space:
[screenshot showing the three nodes' free space]

The only thing these 3 nodes have in common is that they are small nodes (500GB each) on the same 2.5" SMR disk, which took more than 10 hours (!) to browse all files after the update.

So I was wondering: do nodes start receiving data again after an update, until they “realize” the disk is full once the filewalker has finished browsing all files?

The only things that come up when grepping the past 24h of logs for errors are these:

2020-12-09T14:44:27.986Z        ERROR   piecestore      download failed {"Piece ID": "RO36YWHA7X25G64SR7DJBCYNXQVNSBEKATDY5EB6OQDVR5HKZV3A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "write tcp 172.17.0.3:28967->176.9.121.114:51308: use of closed network connection", "errorVerbose": "write tcp 172.17.0.3:28967->176.9.121.114:51308: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}
2020-12-09T18:04:51.471Z        ERROR   servers unexpected shutdown of a runner {"name": "debug", "error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-12-09T18:04:55.624Z        FATAL   Unrecoverable error     {"error": "debug: http: Server closed", "errorVerbose": "debug: http: Server closed\n\tstorj.io/private/debug.(*Server).Run.func2:108\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-12-09T20:38:05.492Z        ERROR   piecestore      download failed {"Piece ID": "SDYFTABGT4VKUNOPYPCXO2IDI6XFSGOK27N6EJ5B25SLMIGXM2GA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "error": "write tcp 172.17.0.8:28967->46.4.33.240:45008: use of closed network connection", "errorVerbose": "write tcp 172.17.0.8:28967->46.4.33.240:45008: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}
2020-12-09T23:55:58.137Z        ERROR   piecestore      download failed {"Piece ID": "LOH25XEPCPC6PSZYWNPUECDBGLTEEED665L432WJUTDC635CDW5A", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET", "error": "write tcp 172.17.0.8:28967->176.9.121.114:52082: use of closed network connection", "errorVerbose": "write tcp 172.17.0.8:28967->176.9.121.114:52082: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}
2020-12-10T05:38:56.633Z        ERROR   piecestore      download failed {"Piece ID": "B7K5HQP45EYNASYPNPCKZOPYV22YFQXMJTYDWBBFPXCEO2GVI4HA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "error": "write tcp 172.17.0.8:28967->46.4.33.240:48330: use of closed network connection", "errorVerbose": "write tcp 172.17.0.8:28967->46.4.33.240:48330: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}
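For reference, here is roughly the check I ran to produce the excerpt above (a sketch assuming a Docker-based node whose container is named "storagenode"; adjust the container name and time window to your setup):

# last 24h of container logs, errors only (container name "storagenode" is an assumption)
docker logs --since 24h storagenode 2>&1 | grep -i error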

I’m wondering whether there is something that should be fixed in the Storj node software…?


Anyway, running multiple nodes (even small ones) on a weak SMR drive is obviously a very bad idea, so I’ll be migrating some of them elsewhere to ease the load on the disk a bit ^^’


Just realized the ingress received yesterday can’t explain the extra ~700MB of data supposedly received after the update.

So my initial assumption may be wrong, and I’m not sure what went on. Maybe the filewalker refreshed the amount of data actually stored, which was already off before the update? :thinking:
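If anyone wants to double-check this on their own node, a rough way to compare what is actually stored on disk with what the dashboard reports (a sketch assuming the default layout where pieces live under the storage/blobs folder; the path below is a placeholder):

# actual size of stored pieces on disk (placeholder path; this can take a long
# time on an SMR drive holding millions of small files)
du -sh /path/to/storagenode/storage/blobs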


How interesting: my 3 full nodes all got ~70MB of ingress right when they upgraded.

We are already working on a fix for this issue. Thank you for reporting this!
