Graceful Exit (log inspection)

Let’s analyze the log file and see what’s going on during the GE (Graceful Exit) process.

cat output.log |grep -c gracefulexit:chore
70956

70956 pieces were transferred, or attempted, during the GE process.

cat output.log |grep gracefulexit:chore| grep -c failed
5963

5963 of those transfers failed during the GE process (roughly 8.4% of all attempts).

Why did they fail? I asked myself: my storage node is not loaded (I monitor it with netdata), and the network (at that moment) was quiet. So I dug into the details of the log… (because the devil is always in the details :smiling_imp:)

Cut the log file down to GE errors only and go through all of them:
cat output.log |grep gracefulexit:chore| grep failed > errors.log
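As a side note: instead of grepping for each error pattern by hand (as I do below), a rough sketch like this should also pull the short "error" field out of every failure and count the unique messages, assuming the raw log uses plain double quotes around the fields:

# Sketch: extract the "error" message from every failed transfer in errors.log
# and count how often each unique message occurs, most frequent first.
sed -n 's/.*"error": "\([^"]*\)".*/\1/p' errors.log \
  | sort | uniq -c | sort -rn | head -20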

2020-05-28T05:48:17.769637997Z 2020-05-28T05:48:17.769Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “KQU5WWKHYMS6GGEGJEJSC4WIKLLONDC3C5TZQD
PQSGTHUCYQ3LTQ”, “error”: “protocol: pieces error: filestore error: truncate F:\temp\blob-655807176.partial: Espacio en disco insuficiente.", “errorVerbose”: "protocol: pieces error: filestore error: truncate F:\temp\blob-65580717
6.partial: Espacio en disco insuficiente.\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:182\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.io/uplink/private/piecestore.(*BufferedUpload).Write:32\n\tstorj.io
/uplink/private/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/uplink/private/piecestore.(*Client).UploadReader:76\n\tstorj.io/uplink/private/ecclient.(*ecClient).
PutPiece:216\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

These errors are on the remote side and I can do nothing about them, just count how many…

cat errors.log | grep -c "filestore error: truncate"
154

2020-05-28T05:49:28.188261774Z 2020-05-28T05:49:28.187Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “H4FMR73ZJZOIXENHWGNYP7YLJAZFA5AY4E3EFT
ADPGLT2HM2OPOA”, “error”: “piecestore: rpccompat: dial tcp [2a0b:f4c0:58:1:ac25:857:a749:6715]:28967: connect: cannot assign requested address", “errorVerbose”: "piecestore: rpccompat: dial tcp [2a0b:f4c0:58:1:ac25:857:a749:6715]:2896
7: connect: cannot assign requested address\n\tstorj.io/common/rpc.Dialer.dialTransport:264\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/up
link/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worke
r).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

An IPv6 node; why is the satellite offering it to me… but let’s count how many.

cat errors.log | grep -c "cannot assign requested address"
480

2020-05-28T05:49:37.015766784Z 2020-05-28T05:49:37.015Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “SVWBAKXQP4PESCHATYCL43JPATSMXM2FAOJTJJ
S6TC7VKV4HHUAQ”, “error”: "protocol: usedserialsdb error: disk I/O error", “errorVerbose”: “protocol: usedserialsdb error: disk I/O error\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:182\n\tbufio.(*Writer).Flush:593\n\tbufio.
(*Writer).Write:629\n\tstorj.io/uplink/private/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/private/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/upli
nk/private/piecestore.(*Client).UploadReader:76\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:216\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).R
un.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

Disk I/O errors on the remote side; can’t do anything, just counting.

cat errors.log | grep -c "disk I/O error"
104

2020-05-28T05:50:05.168852828Z 2020-05-28T05:50:05.168Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “2XPAGAXNXDWZ2ZZJCZKIRMUAHSQSA6T7YKAAITPBMUCYZQO3F4OQ”, “error”: “piecestore: rpccompat: context deadline exceeded”, “errorVerbose”: "piecestore: rpccompat: context deadline exceeded\n\tstorj.io/common/rpc.Dialer.dialTransport:264\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}

Another error on the remote side; can’t do anything, counting…

 cat errors.log | grep -c "deadline exceeded"
1912

2020-05-28T05:52:13.484607212Z 2020-05-28T05:52:13.484Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “IGFTUZXZZMFKJZQNM43AJUJYUTSIV2GJIAPC3UHGVI75DQ44PMOA”, “error”: “protocol: storage node overloaded, request limit: 2”, “errorVerbose”: “protocol: storage node overloaded, request limit: 2\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:182\n\tbufio.(*Writer).Flush:5
93\n\tbufio.(*Writer).Write:629\n\tstorj.io/uplink/private/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/private/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\ts
torj.io/uplink/private/piecestore.(*Client).UploadReader:76\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:216\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit
.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

An overloaded storage node on the remote side (it looks like it has a connection limit of 2); counting…

cat errors.log | grep -c "storage node overloaded"
184

2020-05-28T05:53:12.517507175Z 2020-05-28T05:53:12.517Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “EER6GZ7D6P5Y3JB6ZSYZYYUAJ32KZRLBB3UMIG
ZQDM6LRKUQP4EQ”, “error”: “protocol: pieces error: write config/storage/temp/blob-192182437.partial: no space left on device”, “errorVerbose”: “protocol: pieces error: write config/storage/temp/blob-192182437.partial: no space left on
device\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:182\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.io/uplink/private/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/private/piecestore.(*Locki
ngUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/uplink/private/piecestore.(*Client).UploadReader:76\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:216\n\tstorj.io/storj/sto
ragenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

No space on the remote side, but why is the satellite offering a storage node with no free space to me? Counting…

cat errors.log | grep -c "no space left on device"
436

2020-05-28T05:57:35.202904511Z 2020-05-28T05:57:35.202Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “EWBDOJLYYNMKTBQXTUWG5Y567DLCLKSUY5D67E
5HRULXG22SUAOA”, “error”: “piecestore: rpccompat: EOF”, “errorVerbose”: "piecestore: rpccompat: EOF\n\tstorj.io/common/rpc.Dialer.dialTransport:284\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\ts
torj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiec
e:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}

I did not completely understand this error, so just counting…

cat errors.log | grep -c "rpccompat: EOF"
120

2020-05-28T05:57:49.315963180Z 2020-05-28T05:57:49.315Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “D2LSIGSO4TVUUIL3FSIIYRUPBBMXO776RSRDGM
ZYZGJLLDDNQQBA”, “error”: “protocol: pieces error: filestore error: open G:\Strong V3 interfase for WIN10\temp\blob-924928862.partial: There is not enough space on the disk.”, “errorVerbose”: "protocol: pieces error: filestore erro
r: open G:\Strong V3 interfase for WIN10\temp\blob-924928862.partial: There is not enough space on the disk.\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:182\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tsto
rj.io/uplink/private/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/private/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/uplink/private/piecestore.(*Cl
ient).UploadReader:76\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:216\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/c
ommon/sync2.(*Limiter).Go.func1:43"}

A Windows storage node that has no free space; the question is the same, why is the satellite offering this storage node? :slight_smile:

cat errors.log | grep -c "filestore error: open"
366

2020-05-28T06:02:47.243508871Z 2020-05-28T06:02:47.243Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “QKOEUBB6FJPOZMQX7SI5M7R7TOIHZVXC23GZG3
BIUWFKIE4RZUGA”, “error”: “protocol: expected piece hash; usedserialsdb error: database is locked”, “errorVerbose”: “protocol: expected piece hash; usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/gracefulexit.(*W
orker).transferPiece:224\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

Our lovely error from our favorite topic, this time on the remote side :slight_smile:

cat errors.log | grep -c "database is locked"
150

2020-05-28T06:16:48.336886120Z 2020-05-28T06:16:48.336Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “A3622ZWQ5C2JYUTLRI7FH7NFYTJHYBPN4HLPDK
UDDNUWALJOQEFA”, “error”: “piecestore: rpccompat: dial tcp 72.80.88.83:28967: connect: no route to host”, “errorVerbose”: “piecestore: rpccompat: dial tcp 72.80.88.83:28967: connect: no route to host\n\tstorj.io/common/rpc.Dialer.dial
Transport:264\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private
/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

Communication issues reaching the Verizon side (a wireless client!?)

cat errors.log | grep -c "no route to host"
162

2020-05-28T06:50:48.911313385Z 2020-05-28T06:50:48.911Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “WZSNWSRV3GE56H57223NKQ2ETU4XCJGMUV3ULF
4TS5NXA3RPGVBA”, “error”: “piecestore: rpccompat: dial tcp: lookup spinme.asuscomm.com on 192.168.83.254:53: server misbehaving”, “errorVerbose”: “piecestore: rpccompat: dial tcp: lookup spinme.asuscomm.com on 192.168.83.254:53: serve
r misbehaving\n\tstorj.io/common/rpc.Dialer.dialTransport:264\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClie
nt).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/c
ommon/sync2.(*Limiter).Go.func1:43”}

A DDNS resolution issue, but why is the satellite offering a hostname instead of an IP? I remember this was reworked a long time ago.

cat errors.log | grep -c "dial tcp: lookup"
94

2020-05-28T07:00:02.723604423Z 2020-05-28T07:00:02.723Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “DED75GOCGNGYJXFMTRJTVNPR3ENA2N535NFVXJ
RZZC7BJS52BIRA”, “error”: “protocol: expected piece hash; context canceled”, “errorVerbose”: “protocol: expected piece hash; context canceled\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:224\n\tstorj.io/storj/sto
ragenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

The remote side canceled the request…

cat errors.log | grep -c "context canceled"
46

2020-05-28T07:10:14.789079205Z 2020-05-28T07:10:14.788Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “WB6OXPN2TOCG6APIGEOPVEYUSU5VFZA425ILYB
3EXOACL4MCZVWQ”, “error”: “piecestore: rpccompat: dial tcp 75.69.200.27:28967: connect: connection refused”, “errorVerbose”: “piecestore: rpccompat: dial tcp 75.69.200.27:28967: connect: connection refused\n\tstorj.io/common/rpc.Diale
r.dialTransport:264\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/p
rivate/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

The remote side refused the connection…

 cat errors.log | grep -c "connection refused"
280

2020-05-28T08:47:48.213756026Z 2020-05-28T08:47:48.213Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “GRQLBIYDPFBUOY6TA5Y2EAU2MHQSYDNVNA3GAI
44USIFQSZK7XIA”, “error”: "protocol: expected piece hash; pieces error: filestore error: sync S:\Storjplot_S1\temp\blob-134854034.partial: The request could not be performed because of an I/O device error.", “errorVerbose”: “protoc
ol: expected piece hash; pieces error: filestore error: sync S:\Storjplot_S1\temp\blob-134854034.partial: The request could not be performed because of an I/O device error.\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).trans
ferPiece:224\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

Disk trouble on the remote side…

cat errors.log | grep -c "filestore error: sync"
2

2020-05-28T08:54:21.563290721Z 2020-05-28T08:54:21.562Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “5BU2IBIYCGMBQVFNBZY7SGO4YDM4HAG6HMHYTS
5TZF7FKRNLGEZA”, “error”: “protocol: out of space”, “errorVerbose”: “protocol: out of space\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:182\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.io/uplink/private
/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/private/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/uplink/private/piecestore.(*Client).UploadReader:7
6\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:216\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limite
r).Go.func1:43”}

Out of space on the remote side.

 cat errors.log | grep -c "out of space"
10

2020-05-28T10:32:04.022911065Z 2020-05-28T10:32:04.022Z ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “Piece ID”: “VKYODNEW2BB777EQ7VHW7NMXNLLZWOQRU7CCB3
ZO5L5B6PPIMWAA”, “error”: “piecestore: rpccompat: dial tcp *****:: i/o timeout”, “errorVerbose”: “piecestore: rpccompat: dial tcp ******:: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:264\n\tstorj.i
o/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).Pu
tPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43”}

I was shocked… it was my own node (address masked with *****:) at another location, which did GE on Stefan’s satellite a few weeks ago… An absolutely epic fail on the satellite side!
It definitely should not have been offered to me during GE.

cat errors.log | grep -c "i/o timeout"
44

2020-05-28T11:06:27.099539573Z 2020-05-28T11:06:27.099Z ERROR gracefulexit:chore failed to transfer piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “error”: “write tcp 192.168.83.39:8376->78.
94.240.189:7777: write: broken pipe", “errorVerbose”: "write tcp 192.168.83.39:8376->78.94.240.189:7777: write: broken pipe\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:321\n\tstorj.
io/common/pb.(*drpcSatelliteGracefulExitProcessClient).Send:1345\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:274\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limi
ter).Go.func1:43”}

Communication issues…

cat errors.log | grep -c "write: broken pipe"
1001

2020-05-28T11:38:41.658935955Z 2020-05-28T11:38:41.658Z ERROR gracefulexit:chore failed to transfer piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “error”: "write tcp 192.168.83.39:14448->78
.94.240.189:7777: write: connection reset by peer", “errorVerbose”: “write tcp 192.168.83.39:14448->78.94.240.189:7777: write: connection reset by peer\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*S
tream).MsgSend:321\n\tstorj.io/common/pb.(*drpcSatelliteGracefulExitProcessClient).Send:1345\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:274\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\t
storj.io/common/sync2.(*Limiter).Go.func1:43”}

The remote peer is resetting connections…

cat errors.log | grep -c "connection reset by peer"
418

2020-05-28T22:22:10.727629759Z 2020-05-28T22:22:10.727Z ERROR gracefulexit:chore failed to transfer piece. {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “error”: "write tcp 192.168.83.39:50046->78
.94.240.189:7777: write: connection timed out", “errorVerbose”: “write tcp 192.168.83.39:50046->78.94.240.189:7777: write: connection timed out\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*Stream).M
sgSend:321\n\tstorj.io/common/pb.(*drpcSatelliteGracefulExitProcessClient).Send:1345\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:274\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io
/common/sync2.(*Limiter).Go.func1:43”}

Connections timing out…

cat errors.log | grep -c "connection timed out"
149

So, let’s summarize this very long post. Most errors are on the remote side, and we can’t do anything about them. The next chunk of errors is due to wrong offerings from the satellite side, and we can’t do anything about those either. Another big chunk is routing/communication issues.

I prepared a simple summary table:

Error type                         Count of errors
filestore error: truncate                      154
cannot assign requested address                480
disk I/O error                                 104
deadline exceeded                             1912
storage node overloaded                        184
no space left on device                        436
rpccompat: EOF                                 120
filestore error: open                          366
database is locked                             150
no route to host                               162
dial tcp: lookup                                94
context canceled                                46
connection refused                             280
filestore error: sync                            2
out of space                                    10
i/o timeout                                     44
write: broken pipe                            1001
connection reset by peer                       418
connection timed out                           149
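For reference, the whole table above can be regenerated in one pass with a small loop; this is just a sketch, assuming errors.log from the grep at the beginning and the same search patterns I used above:

# Sketch: count every error category from the analysis above in one pass.
for p in "filestore error: truncate" "cannot assign requested address" \
         "disk I/O error" "deadline exceeded" "storage node overloaded" \
         "no space left on device" "rpccompat: EOF" "filestore error: open" \
         "database is locked" "no route to host" "dial tcp: lookup" \
         "context canceled" "connection refused" "filestore error: sync" \
         "out of space" "i/o timeout" "write: broken pipe" \
         "connection reset by peer" "connection timed out"; do
    printf '%-32s %6d\n' "$p" "$(grep -c "$p" errors.log)"
done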
