"GET_REPAIR”, “error”: “used serial already exists in store”

I have 9 nodes on different networks; 5 of them have this problem.

All in the last 24 hours

2022-04-30T14:40:30.726+0100 ERROR piecestore download failed {"Piece ID": "VM5JOJIINPNWTURZYTDG5WTEO63S4BCCNVTQAKD5N5VBMDHSSX2Q", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_REPAIR", "error": "used serial already exists in store", "errorVerbose": "used serial already exists in store\n\tstorj.io/storj/storagenode/piecestore/usedserials.insertSerial:263\n\tstorj.io/storj/storagenode/piecestore/usedserials.(*Table).Add:117\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:498\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}



An earlier thread pointed to this one as the place to report if we were seeing these “used serial already exists” errors. They have shown up on my 3 nodes, and my first node has already been suspended on europe-north. This happened within the last 8 hours. Please advise which logs would help; currently I’m spooling them off for all nodes with the following command (adjusted per node #):

docker logs storagenode1 2>&1 | grep -E "GET_AUDIT|GET_REPAIR" | grep failed | grep 2022-05-26 > /tmp/logfailed$(date +%F).txt

Quick summary of log entries for today (EDT) on my first node:

pi@storjpi:~ $ grep "closed network connection" /tmp/logfailedn1-2022-05-26.txt | wc -l
2
pi@storjpi:~ $ grep "broken pipe" /tmp/logfailedn1-2022-05-26.txt | wc -l
3
pi@storjpi:~ $ grep "timed out" /tmp/logfailedn1-2022-05-26.txt | wc -l
78
pi@storjpi:~ $ grep "serial already exists" /tmp/logfailedn1-2022-05-26.txt | wc -l
103
pi@storjpi:~ $ cat /tmp/logfailedn1-2022-05-26.txt | wc -l
186
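
As an aside, the same per-reason breakdown can be pulled in a single awk pass over the spooled file, which is a bit quicker to repeat per node. A rough sketch, assuming the file name from the grep commands above:

awk '/serial already exists/ {a["serial already exists"]++; next}
     /timed out/ {a["timed out"]++; next}
     /broken pipe/ {a["broken pipe"]++; next}
     /closed network connection/ {a["closed network connection"]++; next}
     {a["other"]++}
     END {for (r in a) print a[r], r}' /tmp/logfailedn1-2022-05-26.txt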

What is the node’s version?

Same here, still occurring @thepaul:

docker logs sn1 2>&1 | grep 'LMZ25OLOVGVBOM4VKQMUPJ4H64QMNWF6RVEKI37BI62OMJZ5F3IQ'
2022-05-28T10:04:50.282Z	INFO	piecestore	download started	{"Process": "storagenode", "Piece ID": "LMZ25OLOVGVBOM4VKQMUPJ4H64QMNWF6RVEKI37BI62OMJZ5F3IQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_REPAIR"}
2022-05-28T10:05:54.047Z	INFO	piecestore	downloaded	{"Process": "storagenode", "Piece ID": "LMZ25OLOVGVBOM4VKQMUPJ4H64QMNWF6RVEKI37BI62OMJZ5F3IQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_REPAIR"}
2022-05-28T10:09:50.309Z	INFO	piecestore	download started	{"Process": "storagenode", "Piece ID": "LMZ25OLOVGVBOM4VKQMUPJ4H64QMNWF6RVEKI37BI62OMJZ5F3IQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_REPAIR"}
2022-05-28T10:09:50.310Z	ERROR	piecestore	download failed	{"Process": "storagenode", "Piece ID": "LMZ25OLOVGVBOM4VKQMUPJ4H64QMNWF6RVEKI37BI62OMJZ5F3IQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_REPAIR", "error": "used serial already exists in store", "errorVerbose": "used serial already exists in store\n\tstorj.io/storj/storagenode/piecestore/usedserials.insertSerial:263\n\tstorj.io/storj/storagenode/piecestore/usedserials.(*Table).Add:117\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:498\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}

My nodes are all on 1.55.1. For me the errors only seem to be happening on europe-north (the worst offender) and a bit on ap1 and eu1. us1/us2/saltlake seem unfazed and, as far as I recall, have been a steady 100%. I am located in the USA though; coincidence?

I’ve been getting this error too.

I’m finally back to investigating this.

Can anyone else (thank you, @Craig) provide me with rates of how often this is happening for you? The query command might look like:

docker logs storagenode 2>&1 | \
awk '$0~/used serial already exists in store/ {sub(/T.*/,"");a[$0]++} END {for(d in a){print d,a[d]}}' | sort

That will provide a per-day count. If Craig’s results are typical, this is happening a lot more than I thought.
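
Since a few reports above single out europe-north, a breakdown per satellite would help too. A sketch of the same idea, assuming the log format shown in the excerpts above (the "Satellite ID" field name and the storagenode container name are taken from them):

docker logs storagenode 2>&1 | \
awk '/used serial already exists in store/ {d=$1; sub(/T.*/,"",d); if (match($0, /"Satellite ID": "[^"]+"/)) s=substr($0, RSTART+17, RLENGTH-18); else s="?"; print d, s}' | sort | uniq -c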

Secondly, the latest logs from @baker and @Bivvo are very interesting, because they show the first cases I’ve noticed where the download succeeds the first time. That throws a whole different spin on things and leaves me particularly confused. I hope to get to the bottom of this soon.

Ran it on my log file, which started April 15th.

2022-05-05 18

That’s it. Seems like one specific day only.


But wow, 18 times on that one day. How strange. Did you get more traffic in general that day?

For the month of June:

Node 1 (3.55 TB, full)

2022-06-12 1
2022-06-17 1
2022-06-24 1

Node 2 (1.74 TB, full)

2022-06-21 1

I could go back further, although I am missing some logs from May so it would be an incomplete record.


I ran this on my log files starting 2022-04-07 (I have older log files; if you need them I can run it on those as well):

2022-04-08 10
2022-04-09 19
2022-04-10 24
2022-04-11 14
2022-06-14 1

Any suggestion on how to quickly find that out? I guess I could query the DBs, but I don’t want to lock the databases while the node is running.

I’m trying to think back though. I did some large data migrations around that time. It’s possible I created a fairly big IO bottleneck which may have impacted things. I feel like this odd outlier is more likely to be because of a special scenario on my end than high traffic alone.

You could tweak my command above so that it only looks for ERROR instead of used serial already exists in store. That would give a count of errors per day, which would probably go up if you had higher traffic or an unrelated local problem.
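
In other words, the same command with the match string swapped out:

docker logs storagenode 2>&1 | \
awk '$0~/ERROR/ {sub(/T.*/,"");a[$0]++} END {for(d in a){print d,a[d]}}' | sort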

That’s running now, though it might take a while since it’s a large log. But I would also see more errors if there was indeed an IO bottleneck, simply because of errors caused by timeouts, so it won’t really be conclusive.

Done!

2022-04-24 32
2022-04-25 29
2022-04-26 33
2022-04-27 114
2022-04-28 43
2022-04-29 37
2022-04-30 19
2022-05-01 185
2022-05-02 209
2022-05-03 102
2022-05-04 103
2022-05-05 121
2022-05-06 26
2022-05-07 11
2022-05-08 20
2022-05-09 36
2022-05-10 49
2022-05-11 140
2022-05-12 31
2022-05-13 21
2022-05-14 17
2022-05-15 32
2022-05-16 21

This does seem to be during the time I was doing data migrations. I snipped the relevant rows; the rest looks fairly normal at around 20-50 a day.
There is also a recent peak, though:

2022-06-22 51
2022-06-23 34
2022-06-24 63
2022-06-25 210
2022-06-26 89
2022-06-27 87
2022-06-28 45

Don’t think I was doing anything special that day…

Now that I think back on it, I’m almost certain the 5th fell during the migrations I was doing. I think I even saw the error and figured it wasn’t too big a deal, since others were seeing it too and it was probably temporary due to high load anyway. I was kind of busy at the time, so I didn’t really give it another thought.

I can provide the full log if that’s useful.

2022-06-19 2
2022-06-20 1
2022-06-21 4

Unfortunately logging has increased so much in recent weeks that my 100m x3 only lasts for 43h … older logs are gone, and I need to rethink how to store the logs efficiently elsewhere.
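
In the meantime I’ll probably just raise the limits, or take logging out of Docker’s hands entirely. A rough sketch, assuming my 100m x3 is the json-file driver’s max-size/max-file setting and that config.yaml sits on the mounted config volume:

# raise the rotation limits when recreating the container (json-file driver)
docker run -d --name storagenode --log-opt max-size=500m --log-opt max-file=6 ... # rest of the usual run options unchanged
# or let the node write its own log file on the mounted volume, outside Docker's rotation
# (in config.yaml, if I recall the option name correctly):
# log.output: "/app/config/node.log"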

Try running that same command with the capital T changed to a colon (:). That should give a per-hour count, which is still useful.
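
That is:

docker logs storagenode 2>&1 | \
awk '$0~/used serial already exists in store/ {sub(/:.*/,"");a[$0]++} END {for(d in a){print d,a[d]}}' | sort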

On my two bigger nodes, for June:

2.3TB node running on a weak 2.5" SMR drive:

2022-05-31 3
2022-06-01 12
2022-06-02 8
2022-06-03 8
2022-06-04 8
2022-06-05 13
2022-06-06 11
2022-06-07 8
2022-06-08 10
2022-06-09 6
2022-06-10 3
2022-06-11 2
2022-06-12 3
2022-06-13 3
2022-06-14 3
2022-06-15 6
2022-06-16 7
2022-06-17 2
2022-06-18 8
2022-06-19 7
2022-06-20 2
2022-06-21 2
2022-06-22 3
2022-06-23 3
2022-06-24 3
2022-06-25 4
2022-06-26 2
2022-06-27 3
2022-06-28 1

2.5TB node running on a kinda more performant 3.5" drive, but still SMR:

2022-06-01 1
2022-06-03 1
2022-06-07 1
2022-06-15 1
2022-06-22 1
2022-06-27 1
2022-04-09 6
2022-04-10 7
2022-04-11 7
2022-04-21 2
2022-04-22 1
2022-05-21 1
2022-06-11 1
2022-06-21 2

Grepped a year’s worth of logs of nodes storing ~13TB. Mostly older data though; I was out of free space and had barely any ingress for many months. Non-SMR 3.5" drives. It’s a bit curious how these events clump around the 10th and 21st of these months… I do not recall anything specific that could have impacted the nodes on those days.

From my 102 nodes combined:

root@server030:/disk102/storj/logs/storagenode# cat server*.log ./old/server*.log | awk '$0~/used serial already exists in store/ {sub(/T.*/,"");a[$0]++} END {for(d in a){print d,a[d]}}' | sort
2022-03-28 21
2022-03-29 2
2022-04-05 1
2022-04-06 2
2022-04-07 2
2022-04-08 2
2022-04-09 4
2022-04-10 11
2022-04-11 11
2022-04-12 7
2022-04-15 2
2022-04-20 53
2022-04-23 1
2022-04-24 5
2022-04-27 1
2022-04-28 4
2022-04-29 6
2022-04-30 2
2022-05-01 2
2022-05-02 1
2022-05-06 1
2022-05-10 9
2022-05-11 4
2022-05-12 1
2022-05-13 3
2022-05-14 5
2022-05-16 1
2022-05-19 1
2022-05-23 6
2022-05-27 5
2022-05-31 2
2022-06-02 2
2022-06-03 3
2022-06-05 1
2022-06-06 2
2022-06-10 16
2022-06-14 2
2022-06-15 3
2022-06-17 1
2022-06-19 1
2022-06-20 8
2022-06-21 8
2022-06-25 2
2022-06-27 4
2022-06-28 1

th3van.dk
