GE questions, exit speed

Node A has requested GE.
It sends a request to the satellite for orders to copy its data. It receives the list and executes it: it connects to the node from an order and transfers the piece. If successful, it receives a signature from the destination node and passes it to the satellite as confirmation of the transfer. The satellite then updates the pointer in its database to the piece's new location.
If an order could not be completed (the destination node returned an error), node A will request a new order from the satellite and keep doing so until that piece is transferred.
In other words, during GE node A acts as an uplink.
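
Very roughly, that loop can be sketched like this (a minimal illustration with made-up names — satelliteClient, pieceUploader and runGracefulExit are not the real identifiers; the actual code lives in storj.io/storj/storagenode/gracefulexit):

```go
package gracefulexitsketch

import (
    "context"
    "fmt"
)

// Hypothetical, simplified types standing in for the real satellite/node RPCs.
type transferOrder struct {
    PieceID  string
    DestNode string
}

type satelliteClient interface {
    // RequestOrders asks the satellite for the next batch of transfer orders.
    RequestOrders(ctx context.Context) (orders []transferOrder, done bool, err error)
    // ConfirmTransfer sends back the hash signed by the destination node.
    ConfirmTransfer(ctx context.Context, o transferOrder, signedHash []byte) error
    // ReportFailure tells the satellite the order could not be completed.
    ReportFailure(ctx context.Context, o transferOrder, transferErr error) error
}

type pieceUploader interface {
    // UploadPiece pushes the piece to the destination node and returns the
    // hash signed by that node on success.
    UploadPiece(ctx context.Context, destNode, pieceID string) ([]byte, error)
}

// runGracefulExit keeps asking the satellite for transfer orders and executes
// them until the satellite reports the exit as finished. Failed pieces are
// reported and the satellite hands them out again in a later batch.
func runGracefulExit(ctx context.Context, sat satelliteClient, up pieceUploader) error {
    for {
        orders, done, err := sat.RequestOrders(ctx)
        if err != nil {
            return fmt.Errorf("requesting orders: %w", err)
        }
        if done {
            return nil // nothing left to transfer, the exit is complete
        }
        for _, o := range orders {
            signedHash, err := up.UploadPiece(ctx, o.DestNode, o.PieceID)
            if err != nil {
                // The destination node failed or rejected the upload
                // (e.g. "storage node overloaded"); report it and move on.
                if repErr := sat.ReportFailure(ctx, o, err); repErr != nil {
                    return repErr
                }
                continue
            }
            // The signed hash lets the satellite move the piece pointer
            // to the new location.
            if err := sat.ConfirmTransfer(ctx, o, signedHash); err != nil {
                return err
            }
        }
    }
}
```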

So a database lock is a normal occurrence during GE and a DQ won't follow?

What does a database lock have to do with it? This is an error returned by the node the piece was being uploaded to.

@Alexey, so errors like these are a problem with the receiving node, not my database?

```
tiuUimbWgfATz21tuvgk3vzoA6", "error": "protocol: expected piece hash; serial number is already used: usedserialsdb error: disk I/O error\n\tstorj.io/storj/
storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/pi
ecestore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Metho
d.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).S
erveOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51", "errorVerbose": "protocol: expected piece 
hash; serial number is already used: usedserialsdb error: disk I/O error\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/st
orj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/
piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\
n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:14
7\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:115\n\tstorj.io/common/sync2.(*Limiter).Go.fu
nc1:41"}```

Usually, yes. But I could say more precisely if you copy a couple of lines above and below, plus this error in full.

@Alexey This is how it shows up on the forum, seemingly in full:

```
2020-01-18T15:38:06.375+0100    INFO    gracefulexit:chore      piece transferred to new storagenode    {"Storagenode ID": "12FADiHgiPU1c6L
oNGBF34fEQR9vuJFBXXiB5xBPyCdfgZrAvCi", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "OKLI52SHJFYB3BVXLI3NJVH3MUJJEXSI
NJKLY56FEZGFZMDXLUVQ"}
2020-01-18T15:38:06.402+0100    ERROR   gracefulexit:chore      failed to put piece.    {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbW
gfATz21tuvgk3vzoA6", "Piece ID": "DAQBZQ2N7C2EL4GIRULJZYQHPZ675JAWXJQLLBU2INRJXEGYQDUQ", "error": "protocol: expected piece hash; serial number is already 
used: usedserialsdb error: disk I/O error\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*En
dpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Uplo
ad:268\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*S
erver).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tr
acker).track:51", "errorVerbose": "protocol: expected piece hash; serial number is already used: usedserialsdb error: disk I/O error\n\tstorj.io/storj/stor
agenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/pieces
tore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.fu
nc1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).Serve
One:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51\n\tstorj.io/storj/storagenode/gracefulexit.(*Wor
ker).transferPiece:225\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:41"}
2020-01-18T15:38:06.402+0100    ERROR   gracefulexit:chore      failed to transfer piece.       {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1n
tiuUimbWgfATz21tuvgk3vzoA6", "error": "protocol: expected piece hash; serial number is already used: usedserialsdb error: disk I/O error\n\tstorj.io/storj/
storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/pi
ecestore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Metho
d.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).S
erveOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51", "errorVerbose": "protocol: expected piece 
hash; serial number is already used: usedserialsdb error: disk I/O error\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/st
orj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:77\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:319\n\tstorj.io/storj/storagenode/
piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\
n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:14
7\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:115\n\tstorj.io/common/sync2.(*Limiter).Go.fu
nc1:41"}
2020-01-18T15:38:06.503+0100    INFO    gracefulexit:chore      piece transferred to new storagenode    {"Storagenode ID": "1hWYMPgmss1fonV
1D7PHSdaPgtbdFd6JMbnmQYQ2exa2F39CZ9", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "EZFHXM6UK4H6PLP3K7YFOEUQCTJGGKLKY
7UX2BL6L3QLPLOWD4XA"}
```

And something new:

```
2020-01-18T15:55:55.524+0100 INFO gracefulexit:chore piece transferred to new storagenode {“Storagenode ID”: “12DDEFReF9vuS1U
AYMZyGpLiXQvYj17oNurqPAEFevT7U2qucVi”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Piece ID”: “7TP6WZ3367SKNJBJOKERGUTN637ICK3O
3M5CAZVB5ECNFB7M46CQ”}
2020-01-18T15:55:55.546+0100 ERROR gracefulexit:chore failed to put piece. {“Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbW
gfATz21tuvgk3vzoA6”, “Piece ID”: “PRT5U22BMTZIFJNUJSDRTM6FWBJGTHBIJL7UTNMJQOH3IIAK56SA”, “error”: “protocol: storage node overloaded”, “errorVerbose”: “pro
tocol: storage node overloaded\n\tstorj.io/storj/uplink/piecestore.(*Upload).Write:160\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.i
o/storj/uplink/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/storj/uplink/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\ts
torj.io/common/sync2.Copy:22\n\tstorj.io/storj/uplink/ecclient.(*ecClient).PutPiece:240\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:
213\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:41”}
2020-01-18T15:55:55.546+0100 ERROR gracefulexit:chore failed to transfer piece. {“Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1n
tiuUimbWgfATz21tuvgk3vzoA6”, “error”: “protocol: storage node overloaded”, “errorVerbose”: “protocol: storage node overloaded\n\tstorj.io/storj/uplink/piec
estore.(*Upload).Write:160\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.io/storj/uplink/piecestore.(*BufferedUpload).Write:32\n\tstor
j.io/storj/uplink/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/storj/uplink/ecclie
nt.(*ecClient).PutPiece:240\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:213\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run
.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:41”}
2020-01-18T15:55:55.591+0100 INFO gracefulexit:chore piece transferred to new storagenode {“Storagenode ID”: “12dEG5kbELAHnRn
uCNGrXr7LcexG5eip18b48aqZW7ahfodtRJE”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Piece ID”: “U6V7KI5LUVDZ37BCMFHXA3JSEOG6IU32
C5KMV4C745TLZB23TZ3A”}
```

As you can see, all the errors relate to the recipients.

OK, I had just seen the database lock and lowered the number of workers and threads. I'll go put back 500 workers with 10 threads :smile:

How are you doing on speed? And on bandwidth utilization?

Gigabit, but Storj puts out at most 60 Mbit at peak. The exit from the satellite holding 63 GB is only at 10% after a whole day! That's kind of ridiculous; I thought Storj would be able to squeeze out 500-800 Mbit/s.

It can do even more. But GE is apparently a slow thing.
I'll write to the developers.

Maybe we should try increasing the number of parallel transfers? Since the channel is sitting empty anyway.

I'd think this many should saturate the channel, but alas:

```
# number of concurrent transfers per graceful exit worker
graceful-exit.num-concurrent-transfers: 200

# number of workers to handle satellite exits
graceful-exit.num-workers: 200
```
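
For context, the stack traces above end in storj.io/common/sync2.(*Limiter).Go, i.e. the node caps the number of simultaneous transfers with a limiter, and num-concurrent-transfers is that cap per worker. A toy, self-contained sketch of how such a limiter behaves (my own illustration, not the Storj implementation):

```go
package limitersketch

import (
    "context"
    "sync"
)

// limiter is a toy stand-in for a concurrency limiter such as the
// sync2.Limiter visible in the stack traces above: at most `limit`
// submitted functions run at the same time.
type limiter struct {
    sem chan struct{}
    wg  sync.WaitGroup
}

func newLimiter(limit int) *limiter {
    return &limiter{sem: make(chan struct{}, limit)}
}

// Go runs fn in a goroutine, waiting while `limit` functions are already
// running. It returns false if the context is cancelled before a slot frees up.
func (l *limiter) Go(ctx context.Context, fn func()) bool {
    select {
    case l.sem <- struct{}{}:
    case <-ctx.Done():
        return false
    }
    l.wg.Add(1)
    go func() {
        defer func() {
            <-l.sem
            l.wg.Done()
        }()
        fn()
    }()
    return true
}

// Wait blocks until every function started with Go has finished.
func (l *limiter) Wait() { l.wg.Wait() }
```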

And here is what the egress traffic looks like (stefan in normal mode): roughly +/- 8 GB went out over the day.
[image]
CPU is at ~15-20% of a single E3-1275v6 core; disk:

DSK | sda | busy 1% | read 7 | write 222 | KiB/r 316 | KiB/w 6 | MBr/s 0.2 | MBw/s 0.2 | avq 4.15 | avio 0.58 ms |

My Node ID: 12kSz4gY5YDXAS8ZzPmZpczzV5C2kus7E8nuKiwm5eDb8W9CjKo

@Alexey and @littleskunk. The satellite seems to be the bottleneck for the exit. My node reaches a transfer speed of 400 Mbit+ for 3-4 seconds, and then, for no apparent reason, the speed drops. No, this is not a disk problem: for the test I put all the pieces in RAM, and the databases too ;)
I tested worker counts from 1 to 1000 and 1 to 200 parallel transfers per worker.

Done! Only 4 days to upload 63 GiB over a 1 Gbit connection. Super fast v3, on par with AWS :rofl:

We are deploying v0.30.5 on Stefan's satellite in a minute. By default the satellite will send 100 graceful exit orders per request. On Stefan's satellite we increased that to 1000. Please keep an eye on your storage nodes.

Please be careful with increasing the batch size on the storage node side. 100 parallel transfers should still work, but I would recommend a lower value. Basically, take the lowest values that max out your connection. If you aim for too many parallel transfers, some of them might fail. Remember that we do have a penalty in place for that. Be careful playing around with the options.

After the first attempt I saw that none of the combinations let me reach a reasonable load on the channel, not even 100 parallel transfers with all the pieces and databases in RAM. So 100 orders is a critically low value. I would like to see a queue of up to 10,000 pieces, with new orders added to the list automatically once 4,000-6,000 of the 10,000 pieces have been transferred.

That is too high for now. We don't want to disqualify slow storage nodes just because they can't transfer 10,000 pieces before the graceful exit orders get too old.

Yes please. I can point you to the place where this feature would have to be added. At the moment I don't think I can convince the developer team to implement it because of priorities. If someone from the community wants to take it, I am happy to help.
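
For illustration, the refill idea might look roughly like this (hypothetical names and a plain buffered channel as the queue — a sketch of the proposal, not a patch against the actual gracefulexit worker):

```go
package refillsketch

import (
    "context"
    "time"
)

// ordersSource is a stand-in for the satellite side of graceful exit.
type ordersSource interface {
    // FetchOrders asks the satellite for up to `limit` more transfer orders.
    FetchOrders(ctx context.Context, limit int) ([]string, error)
}

// keepQueueFilled illustrates the proposal above: keep up to maxQueued piece
// orders queued locally and top the queue up as soon as it drains below
// refillBelow, instead of waiting for a whole batch to finish first.
// queue should be a buffered channel with capacity maxQueued.
func keepQueueFilled(ctx context.Context, src ordersSource, queue chan string,
    maxQueued, refillBelow int) error {
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()
    for {
        if len(queue) < refillBelow {
            orders, err := src.FetchOrders(ctx, maxQueued-len(queue))
            if err != nil {
                return err
            }
            if len(orders) == 0 {
                return nil // the satellite has nothing more to hand out
            }
            for _, o := range orders {
                select {
                case queue <- o:
                case <-ctx.Done():
                    return ctx.Err()
                }
            }
        }
        // Re-check the queue level periodically instead of busy-waiting.
        select {
        case <-ticker.C:
        case <-ctx.Done():
            return ctx.Err()
        }
    }
}
```

With something like keepQueueFilled(ctx, src, queue, 10000, 5000), the workers draining the queue would rarely run dry, while the satellite would still control how many orders exist at once.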

After the satellite update, there are a lot of errors

To this IP I got many failures too.

Does the satellite give the node a single destination IP to transfer a piece to, or does it give a list of nodes for each piece that the node can transfer to?