Set up a new node; it seems to have stopped getting traffic, and most of its storage was deleted after 72 hours

orangepeelbeef · January 23, 2021, 1:01am

I have a node that has been up for 72 hours. It seemed to be going well, but all the data I was holding has left. Satellites are all showing 100% uptime. Maybe I am just being impatient =)

Looking through my logs, I only saw 11 failed events:

3 upload failed events: 2 say ‘context deadline exceeded’, 1 says ‘unexpected EOF’.
The 8 download failed events say ‘write tcp … use of closed network connection’.
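A tally like the one above can be produced straight from the container logs. A minimal sketch (the function name `tally_failures` is hypothetical) that counts failure lines by direction and error text; it assumes the log format posted later in this thread, where each failure line contains `piecestore upload failed` or `piecestore download failed` followed by a JSON object with an `"error"` field.

```bash
#!/usr/bin/env bash
# Tally "piecestore upload/download failed" log lines by direction and error.
# Reads storagenode log lines on stdin; prints "<count> <direction>: <error>".
tally_failures() {
  # Keep only failure lines and extract the direction plus the "error" field.
  sed -nE 's/.*piecestore[[:space:]]+(upload|download) failed.*"error": "([^"]*)".*/\1: \2/p' |
    # Drop the per-connection "write tcp <src>-><dst>: " prefix so identical
    # errors on different connections are counted together.
    sed -E 's/write tcp [^ ]+: //' |
    sort | uniq -c | sort -rn |
    # Normalise uniq's padded counts to "<count> <text>".
    sed -E 's/^ *([0-9]+) /\1 /'
}
```

Usage, assuming the container is named `storagenode` as elsewhere in this thread: `docker logs storagenode 2>&1 | tally_failures`.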

I have tried to run the script to check my audit scores, but the commands don’t seem to work.
There is no api/dashboard endpoint, and piping the api/sno response from wget into jq fails with “cannot iterate over null”, at least with version 1.19.6 in the Docker container.

Pac · January 23, 2021, 8:35am

Hello there and welcome

What makes you believe data disappeared? I assume you checked that directly with df or similar if your dashboard is down.
Data isn’t supposed to get deleted so fast.
At worst, deleted pieces would have been moved to the trash folder, where they are kept for 7 days before being permanently deleted, so no data should have left your disk at all after 72 hours.
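To check this directly on disk, a minimal sketch that reports the size of the piece store and of the trash separately. It assumes the usual storage node layout, where the `storage` folder inside the bind-mounted directory holds `blobs` (stored pieces) and `trash` (deleted pieces kept for 7 days); the helper `storj_disk_usage` and the example paths are hypothetical.

```bash
#!/usr/bin/env bash
# Report disk usage of the piece store and the trash folder separately.
# Assumption: <storage-dir> is the "storage" folder inside the directory you
# bind-mounted into the container, containing "blobs" and "trash" subfolders.
storj_disk_usage() {
  local storage_dir="${1:?usage: storj_disk_usage <storage-dir>}"
  local sub
  for sub in blobs trash; do
    if [ -d "$storage_dir/$sub" ]; then
      # du -sh prints "<size><TAB><path>"; keep only the size column.
      printf '%s\t%s\n' "$sub" "$(du -sh "$storage_dir/$sub" | cut -f1)"
    else
      printf '%s\t%s\n' "$sub" "missing"
    fi
  done
}
```

For example `storj_disk_usage /mnt/storj/storage`, alongside `df -h /mnt/storj` for the filesystem as a whole (both paths are placeholders).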

Could you show us exactly what command you’re running and what the resulting output is?

Could you copy/paste suspicious log lines here?
(Surround them with 3 backticks (```) before and after to format them correctly.)
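Collecting those lines can be scripted. A minimal sketch, with the hypothetical helper `fence_errors`, that keeps only `ERROR` lines from the log and wraps them in triple backticks, ready to paste into a post:

```bash
#!/usr/bin/env bash
# Wrap ERROR lines from a storagenode log (read on stdin) in triple backticks
# so they can be pasted into a forum post without the markup eating characters.
fence_errors() {
  printf '```\n'
  # grep exits 1 when nothing matches; that is not an error here.
  grep -E '[[:space:]]ERROR[[:space:]]' || true
  printf '```\n'
}
```

For example `docker logs --since 72h storagenode 2>&1 | fence_errors`, with the container name used in this thread.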

orangepeelbeef · January 23, 2021, 8:43am

2021-01-20T04:32:17.929Z ERROR piecestore download failed {"Piece ID": "TKEDEHN47YYXBOVCP4AIFQO5IMLALBR6RNSSOAW4JMROJX4WGKGQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "write tcp 172.17.0.2:28967->88.198.107.240:51364: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->88.198.107.240:51364: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-20T06:55:16.140Z ERROR piecestore download failed {"Piece ID": "53PPIIKIFWSYAGUSDXFM6IXBVNVZDWYXJOA5FIQENAJVCSG4Y66Q", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "write tcp 172.17.0.2:28967->88.198.107.240:47104: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->88.198.107.240:47104: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-20T09:11:42.397Z ERROR piecestore download failed {"Piece ID": "BVYP2R6XSGXRTKUCYPFINHRCS5AIFSUA3GUFST5BT2FTYRKR6Y2Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "write tcp 172.17.0.2:28967->88.198.107.240:38406: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->88.198.107.240:38406: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-21T01:17:54.880Z ERROR piecestore upload failed {"Piece ID": "ROW3M7UJJDPW35AMRMPMWBDLIVPNKNV6GO72K4DWM6YDW2H7Z3NQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "context deadline exceeded", "errorVerbose": "context deadline exceeded\n\tstorj.io/common/rpc/rpcstatus.Wrap:74\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:327\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:29\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

2021-01-21T02:57:03.251Z ERROR piecestore upload failed {"Piece ID": "V2U5L5NEMCFSZX5HRBFBIKSYQMY3IWUQ2ZPMX3EDEOTSEGDPVGQQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "unexpected EOF", "errorVerbose": "unexpected EOF\n\tstorj.io/common/rpc/rpcstatus.Error:82\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:325\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:29\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

2021-01-21T10:18:40.283Z ERROR piecestore download failed {"Piece ID": "CRCJQNAERCZ4DPGM3HRR6XQ3FOE5FWVYPPT2BLQMKMU5YA4M4HDQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "error": "write tcp 172.17.0.2:28967->88.198.107.240:53834: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->88.198.107.240:53834: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-21T13:11:25.944Z ERROR piecestore download failed {"Piece ID": "DRJXHFXTYLE7YOITK4INIJO2C5BLJ7XF4K36O5YUGYCVHVFTHLAQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "write tcp 172.17.0.2:28967->88.198.107.240:44148: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->88.198.107.240:44148: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-21T18:02:54.893Z ERROR piecestore download failed {"Piece ID": "O7YAPDR5BYGGHBOKMWP5JV2OQQBKVJYJY23NB3LVN4Q7GRU6OGTQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET", "error": "write tcp 172.17.0.2:28967->88.198.107.240:38182: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->88.198.107.240:38182: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-22T09:50:47.790Z ERROR piecestore download failed {"Piece ID": "ASSTCUPL6MND6ZUCSVISK246UXENOQR4S2FRVXCISNP6UQFA7MVQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "write tcp 172.17.0.2:28967->88.198.107.240:33768: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->88.198.107.240:33768: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-22T13:45:45.400Z ERROR piecestore download failed {"Piece ID": "ANJTJLKYMG5QSV6VZUHYHTF2TMWFJNXKTZLVIZ67FTZRPNSIEK4Q", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "error": "write tcp 172.17.0.2:28967->144.76.136.153:58290: use of closed network connection", "errorVerbose": "write tcp 172.17.0.2:28967->144.76.136.153:58290: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:228\n\tstorj.io/drpc/drpcwire.SplitN:29\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:276\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:322\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1089\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func5.1:580\n\tstorj.io/common/rpc/rpctimeout.Run.func1:22"}

2021-01-22T17:02:57.842Z ERROR piecestore upload failed {"Piece ID": "XWV2MPOJALMONKX7WTCJ64MC7DOWCNNYSW77GLPGNWZUQAZATCJA", "Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "Action": "PUT", "error": "context deadline exceeded", "errorVerbose": "context deadline exceeded\n\tstorj.io/common/rpc/rpcstatus.Wrap:74\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:327\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:29\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

orangepeelbeef · January 23, 2021, 8:57am

I had 130 GB of data for 2 days, and then it dropped to 7 GB. I am trying to query the SNO API, but right now I’m having trouble getting a for loop and a pipe to work.

for sat in wget -qO - localhost:14002/api/sno | jq .satellites[].id -r; do wget -qO - localhost:14002/api/sno/satellite/$sat; done
-bash: syntax error near unexpected token `|'
As for the other code I was looking at, I think the API changed to the sno endpoint.

Mams · January 23, 2021, 9:09am

Hi,
You forgot the backtick (`) before wget and after -r.

Try: for sat in `wget -qO - localhost:14002/api/sno | jq .satellites[].id -r`; do wget -qO - localhost:14002/api/sno/satellite/$sat | jq .id,.audit; done

orangepeelbeef · January 23, 2021, 9:13am

I think the forum is eating some of the characters. Yours looks exactly like mine.

Mams · January 23, 2021, 9:17am

You are right

orangepeelbeef · January 23, 2021, 9:18am

base64 encode it

Pac · January 23, 2021, 10:05am

@orangepeelbeef & @Mams: One way to escape such commands (or display logs) on this forum is to use the triple backtick syntax.
For instance, putting a line with three backticks before and after the command gives this:

for sat in `docker exec -i storagenode wget -qO - localhost:14002/api/sno | jq .satellites[].id -r`; do
  docker exec -i storagenode wget -qO - localhost:14002/api/sno/satellite/$sat | jq .id,.audit
done
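The backticks in this loop are exactly what the forum kept eating. Bash’s `$(...)` command substitution does the same job without backticks, so it survives even unformatted posts. A sketch of the same requests wrapped in a hypothetical `audit_scores` function; the endpoints and jq filters are taken from the loop above, and the default container name `storagenode` is an assumption:

```bash
#!/usr/bin/env bash
# Print each satellite ID and its audit stats from the node's dashboard API.
# Same requests as the backtick loop, but using $(...) and quoted variables.
audit_scores() {
  local container="${1:-storagenode}"   # assumed container name
  local api="localhost:14002/api/sno"
  local sat
  for sat in $(docker exec -i "$container" wget -qO - "$api" | jq -r '.satellites[].id'); do
    docker exec -i "$container" wget -qO - "$api/satellite/$sat" | jq '.id, .audit'
  done
}
```

Run it as `audit_scores`, or `audit_scores <container-name>` if your container is named differently.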

Instead of running this, you could also check out the following script, which shows more information (including audit scores) in a clearer way:

Do you have successful uploads/downloads in your logs, or is everything failing?
You might want to check that nothing is blocking your ports and that they are correctly forwarded on your router.
These tools can be helpful for confirming your node’s port is reachable from outside (though I guess it is, otherwise you wouldn’t receive download requests):

Is there any firewall that could block outgoing connections?
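Besides the online tools, a quick TCP check can be run from a shell on a machine outside your network. A minimal sketch using bash’s built-in `/dev/tcp` redirection and coreutils `timeout`; `port_open` is a hypothetical name, and 28967 is the node port seen in the logs in this thread. Testing your own public address from inside the LAN may give misleading results if the router does not support NAT loopback.

```bash
#!/usr/bin/env bash
# Succeed if <host>:<port> accepts a TCP connection within 5 seconds.
port_open() {
  local host="${1:?usage: port_open <host> [port]}" port="${2:-28967}"
  # Opening fd 3 on /dev/tcp/<host>/<port> makes bash attempt a TCP connect;
  # timeout(1) stops the attempt after 5 seconds if packets are dropped.
  timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example `port_open your.public.address 28967 && echo reachable || echo unreachable` (the address is a placeholder).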

orangepeelbeef · January 23, 2021, 10:11am


I do have successful entries in my logs; maybe the traffic just isn’t coming. I got about 1.5 GB of repair traffic yesterday and 430 MB of usage.

That command produces the following output:

"12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"
{
  "totalCount": 72,
  "successCount": 72,
  "alpha": 19.527008,
  "beta": 0,
  "unknownAlpha": 19.527008,
  "unknownBeta": 0,
  "score": 1,
  "unknownScore": 1
}
"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"
{
  "totalCount": 0,
  "successCount": 0,
  "alpha": 1,
  "beta": 0,
  "unknownAlpha": 1,
  "unknownBeta": 0,
  "score": 1,
  "unknownScore": 1
}
"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"
{
  "totalCount": 0,
  "successCount": 0,
  "alpha": 1,
  "beta": 0,
  "unknownAlpha": 1,
  "unknownBeta": 0,
  "score": 1,
  "unknownScore": 1
}
"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
{
  "totalCount": 0,
  "successCount": 0,
  "alpha": 1,
  "beta": 0,
  "unknownAlpha": 1,
  "unknownBeta": 0,
  "score": 1,
  "unknownScore": 1
}
"12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
{
  "totalCount": 0,
  "successCount": 0,
  "alpha": 1,
  "beta": 0,
  "unknownAlpha": 1,
  "unknownBeta": 0,
  "score": 1,
  "unknownScore": 1
}
"12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"
{
  "totalCount": 1,
  "successCount": 1,
  "alpha": 1.95,
  "beta": 0,
  "unknownAlpha": 1.95,
  "unknownBeta": 0,
  "score": 1,
  "unknownScore": 1
}

Pac · January 23, 2021, 10:19am

For such a young node it’s not surprising to have very little activity. It takes around one month for the node to get vetted and receive full traffic.

That said, I’m a bit surprised to see you’re already about to get vetted on the us2 test satellite (12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo). That’s probably simply because this satellite does not manage a lot of data, so vetting must be much faster on it. It’s good news anyway.
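Vetting progress can be read off the `totalCount` field of the output above. A minimal sketch, assuming a threshold of 100 audits per satellite (an assumption here, not stated in this thread); `vetting_progress` is hypothetical and parses the exact pretty-printed format the audit loop produces, a quoted satellite ID line followed by the audit object:

```bash
#!/usr/bin/env bash
# Summarise vetting progress from the audit loop's output read on stdin.
# Assumption: a node counts as vetted on a satellite after <threshold> audits.
vetting_progress() {
  local threshold="${1:-100}"
  awk -v t="$threshold" '
    /^"/ { id = $0; gsub(/"/, "", id) }          # satellite ID line
    /"totalCount":/ {                              # audits seen so far
      n = $2; sub(/,$/, "", n); n += 0
      pct = (n >= t) ? 100 : 100 * n / t
      printf "%s %d/%d audits (%.0f%%)\n", id, n, t, pct
    }'
}
```

Piping the loop’s output into it (`… done | vetting_progress`) would report 72/100 for the us2 satellite with the output posted above, and 0 or 1 for the others.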

So… your web dashboard does work?
If you can see activity (both ingress and egress) on all satellites, and your dashboard shows “Online”, then I think you’re fine. Just be patient: when launching a new node, the first weeks are very quiet. That’s expected.

orangepeelbeef · January 23, 2021, 10:21am

OK, I have a Fios connection, so it’s very stable. I had a bunch of traffic and storage on the second day, and then it all must have migrated elsewhere. I’m not really sure what the warning signs are :D. I will be patient and see how it pans out. I allocated 5 TB, so hopefully it gets there.

Pac · January 23, 2021, 10:27am

@BrightSilence made a great earnings estimator that you could use to get a rough idea of what to expect with 5 TB of storage. Just copy their Google Sheet to your account, then fill in your numbers to see how long it could take to fill 5 TB, what the possible ROI is, and so on…

Just keep in mind it’s an estimator: no one can really predict how the Tardigrade network is going to evolve.

But yeah, as long as you see activity, your node displays “Online”, and the scores on your dashboard do not drop below 95%, you should be okay.
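The 95% rule of thumb can be checked against the audit loop’s output. The API reports `score` between 0 and 1, so a minimal sketch (hypothetical `flag_low_scores`, assuming the dashboard percentage is simply `score` times 100) flags satellites below 0.95:

```bash
#!/usr/bin/env bash
# Flag satellites whose "score" in the audit loop's output (stdin) is below
# a minimum, 0.95 by default; print a summary line if none are.
flag_low_scores() {
  local min="${1:-0.95}"
  awk -v min="$min" '
    /^"/ { id = $0; gsub(/"/, "", id) }          # satellite ID line
    /"score":/ {                                   # matches "score", not "unknownScore"
      s = $2; sub(/,$/, "", s); s += 0
      if (s < min) { printf "%s score %.3f below %s\n", id, s, min; low++ }
    }
    END { if (!low) print "all scores >= " min }'
}
```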

When in doubt, don’t hesitate to browse this forum: many known issues have already been addressed here, and there are many cool tools for us SNOs.

orangepeelbeef · January 23, 2021, 10:58am

@Pac thanks, good information. Is it fairly easy to expand my storage if I have more space? I can expand my LVM volumes; I just wasn’t sure how this was going to work out or how much to dedicate. The calculator is helpful, and I realize it’s just an estimate.

Pac · January 23, 2021, 11:20am

LVM? So one virtual disk for one node, I guess?
Having one node per physical disk mitigates the risk of losing everything should one disk fail.
Whereas with RAID solutions (or similar), months of taking care of a large node could go out the window should the storage medium fail.
On the other hand, if the RAID solution used is resilient to data loss, it uses more disk space than the actual amount of stored data (for redundancy): that space could be allocated to Storj for more revenue instead. This has been debated many times on this forum and opinions differ, so… I’ll let you judge what suits you best.

I’ll just give you my personal opinion then:

I think it would be better to have the node running on a physical disk.
Then the best way to expand storage would be to start a new node on a new HDD.
Generally, though, I would suggest giving a first node time to get a decent amount of data before starting any other large node, and waiting for it to be nearly full, which is probably going to take around 8 to 10 months anyway with your 5 TB.
Then, when it is nearly full (say 90% full), you may start a new one on another disk to expand the storage dedicated to Tardigrade.

And so on.
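The 8 to 10 month figure implies an average net ingress of roughly 500 to 625 GB per month for 5 TB (taken as 5000 GB). A minimal sketch of that division, with a hypothetical `months_to_fill` helper; the monthly ingress you feed it is your own assumption (for instance from the earnings estimator), and the optional third argument gives the time to reach a fraction such as the 90% mentioned above:

```bash
#!/usr/bin/env bash
# Months needed to fill <capacity_gb> (times an optional fill fraction)
# at an assumed average net ingress of <net_gb_per_month>.
months_to_fill() {
  local capacity_gb="$1" net_gb_per_month="$2" fill_fraction="${3:-1}"
  awk -v c="$capacity_gb" -v r="$net_gb_per_month" -v f="$fill_fraction" \
    'BEGIN { if (r <= 0) { print "n/a"; exit 1 }; printf "%.1f\n", c * f / r }'
}
```

For example, `months_to_fill 5000 500` gives 10.0 and `months_to_fill 5000 500 0.9` gives 9.0.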

orangepeelbeef · January 24, 2021, 12:35am

It’s LVM on top of a RAID6 in a 22-bay NAS. I just use LVM so I can slice up the underlying array and not dedicate all of it to one project. When one need goes away, I can resize the partitions.
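With this layout, growing the node would roughly mean extending the logical volume and its filesystem, then recreating the container with a larger `STORAGE` value in the `docker run` command. A minimal dry-run sketch that only prints the commands for review; `plan_expand`, the LV path and the sizes are hypothetical, and `lvextend --resizefs` assumes a filesystem it can grow (such as ext4):

```bash
#!/usr/bin/env bash
# Print (do not run) the steps to grow an LVM-backed node's allocation.
plan_expand() {
  local lv="${1:?logical volume path}" extra="${2:?size to add, e.g. 1T}"
  local new_alloc="${3:?new STORAGE value, e.g. 6TB}"
  # Grow the LV and the filesystem on it in one step.
  echo "lvextend --resizefs -L +$extra $lv"
  # Give the node time to shut down cleanly before removing the container;
  # the data lives on the bind-mounted volume, not in the container.
  echo "docker stop -t 300 storagenode"
  echo "docker rm storagenode"
  echo "# re-run your original docker run command with -e STORAGE=\"$new_alloc\""
}
```

For example `plan_expand /dev/vg0/storj 1T 6TB`.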

Alexey · January 24, 2021, 4:28am

Not the best solution for a storagenode. It will compete for storage I/O with the other workloads on the array.
I expect your node would have a lot of canceled requests, because it will lose the race for pieces to other nodes with faster storage.
You can use this script to have an idea:
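The linked script is not reproduced in this thread. As a rough stand-in, a single awk pass can count transfer outcomes; this sketch (hypothetical `success_rate`) assumes piecestore logs the messages `uploaded`, `upload canceled`, `upload failed`, `downloaded`, `download canceled` and `download failed`:

```bash
#!/usr/bin/env bash
# Count piecestore transfer outcomes in storagenode logs read from stdin and
# print a success percentage for uploads and downloads.
success_rate() {
  awk '
    /piecestore[ \t]+uploaded/          { uok++ }
    /piecestore[ \t]+upload canceled/   { ucan++ }
    /piecestore[ \t]+upload failed/     { ufail++ }
    /piecestore[ \t]+downloaded/        { dok++ }
    /piecestore[ \t]+download canceled/ { dcan++ }
    /piecestore[ \t]+download failed/   { dfail++ }

    function report(kind, ok, can, fail,    total) {
      total = ok + can + fail
      if (total == 0) { printf "%s: no events\n", kind; return }
      printf "%s: %d ok, %d canceled, %d failed (%.1f%% success)\n",
        kind, ok, can, fail, 100 * ok / total
    }

    END {
      report("upload", uok, ucan, ufail)
      report("download", dok, dcan, dfail)
    }'
}
```

For example `docker logs storagenode 2>&1 | success_rate`. A high share of canceled uploads would be consistent with losing races to faster nodes.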

Also, keep in mind that NFS and SMB are not supported; your node will have problems with the storage sooner or later. The least of those problems is broken or unavailable databases (resulting in an empty dashboard).
See Topics tagged nfs and Topics tagged smb
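To confirm the data directory sits on local storage rather than NFS or SMB, the filesystem type behind it can be checked. A minimal sketch with hypothetical helpers `is_network_fs` and `check_storage_fs`; it relies on GNU coreutils `stat -f -c %T`, and the list of network filesystem names is illustrative rather than exhaustive:

```bash
#!/usr/bin/env bash
# Return success if a filesystem type name denotes network storage.
is_network_fs() {
  case "$1" in
    nfs|nfs4|cifs|smb|smbfs|smb2|smb3) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the filesystem type backing a directory and warn if it is NFS/SMB.
# GNU stat: -f reports on the filesystem, %T prints its type by name.
check_storage_fs() {
  local dir="${1:?usage: check_storage_fs <dir>}" fstype
  fstype=$(stat -f -c %T "$dir") || return 2
  if is_network_fs "$fstype"; then
    echo "WARNING: $dir is on $fstype (network storage is not supported)"
    return 1
  fi
  echo "$dir is on $fstype"
}
```

Run it on the host path that is bind-mounted into the container, e.g. `check_storage_fs /mnt/storj` (placeholder path).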

orangepeelbeef · January 24, 2021, 5:01am

It’s not over NFS, though I do provide NFS shares for other partitions of data. It’s a Linux host with an Areca ARC-1284Ml-24MM full of 4 TB disks. I have the storage node running in a Docker container on the host, with a bind mount to the local drive.