Upload "Remote Address" use docker Client public IP and not "ADDRESS=" endpoint

Hello, I wanted to ask if this is ok.

When a file is uploaded (“upload file” in the logs), the connection comes in on a UPnP port and uses the client endpoint with its current IP.

In my case I have it like this:

sub1.dom1.net:28281 (Public IP1) → “redirect traffic” → xxx.xxx.xxx.xxxx:28281 (Public IP2) → Docker Storj node

But the endpoint does not use the external address configured in Docker; it shows “Public IP2” directly, with a random port, and that gives me a lot of failures in the logs.

Thanks

Docker:

environment:
      - ADDRESS=sub1.dom1.net:28281
"Remote Address": "xxx.xxx.xxx.xx:38344"
"Remote Address": "xxx.xxx.xxx.xx:38342"
"Remote Address": "xxx.xxx.xxx.xx:38286"
"Remote Address": "xxx.xxx.xxx.xx:38348"
node3132            | 2023-04-22T17:58:03.275Z  ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "2SG7Q22HVZIO4G5VBURKXVJ56C45LQ7NKZOC6NO3CRHRDRGRUDMA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 7936, "Remote Address": "xxx.xxx.xxx.xx:38344"}
node3132            | 2023-04-22T17:58:03.292Z  ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "CKVBOWAMXZWRKP5ASVE2UZ2QEI3VCOZ4HHBRN2TRE7EGJJUIEM5Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 7936, "Remote Address": "xxx.xxx.xxx.xx:38342"}
node3132            | 2023-04-22T17:58:03.315Z  INFO    piecestore      uploaded        {"Process": "storagenode", "Piece ID": "TPFWCRN4KSZVOCUVQSJLVNJ33W7WDMAQY3CPE5NLMOLVR5YDE5CA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 36864, "Remote Address": "xxx.xxx.xxx.xx:38286"}
node3132            | 2023-04-22T17:58:03.321Z  ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "ZUXDRGSV7UXHA3LABVD5GYGZHP447EZXVYVPUMMZWFS4WQSQTH2A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 7680, "Remote Address": "xxx.xxx.xxx.xx:38348"}

(Linked source excerpt: the storagenode’s getRemoteAddr function.)

If your node is online, it’s OK. It cannot win all races; some nodes are faster.
The remote addresses here are the customers’ addresses connecting to your node, not your own addresses.
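
To illustrate (a minimal Go sketch, not the actual storagenode code): a TCP server learns the peer’s address from the accepted connection itself, so the “Remote Address” field can only ever be the connecting client’s IP with an ephemeral source port. The ADDRESS= setting only advertises where your node can be reached; it never appears as a remote address in your own logs.

// Minimal sketch: any Go TCP server sees the peer via conn.RemoteAddr().
package main

import (
	"fmt"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":28281")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		// RemoteAddr() is the client's endpoint with an ephemeral port,
		// e.g. "xxx.xxx.xxx.xx:38344" -- exactly what the log shows.
		fmt.Println("Remote Address:", conn.RemoteAddr().String())
		conn.Close()
	}
}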

But I think we still have this issue:

I have just checked it myself on one node. A piece for which an upload failed context canceled error was thrown is sitting happily in its correct destination folder.
If the file has been successfully uploaded, which seems to be the case (maybe it is still not successful and gets moved to trash later, or something), then this error is rubbish and completely misleading. In that case it should not be thrown; the log should show a successful upload, because it is definitely not a failed upload.

These are different causes; please do not mix everything together.
In the current thread the author has a different stack trace, which points out that this transfer was canceled by the client, not failed because of the error mentioned in your excerpt.

But this is what I am trying to tell you. I had the same error as the OP, but the piece has been uploaded to my node. So it seems that the issue @kaloyan described, where the logs get polluted with wrong error messages, is still present.

This is interesting. Do you have logs showing that the piece from the canceled transfer was actually uploaded? For example, was it later successfully audited or downloaded?
Because here the stack trace is different (maybe due to different versions, as I have started to think).

No, it is a very fresh piece; in fact, it is only about an hour old.
I picked it completely at random, just because of the context canceled error message.

I have the upload started message, followed by upload failed with the same error as the OP’s. And when I check for the piece, I can see it in the destination folder.

Maybe the OP should check as well whether the pieces in question are there or not (if not deleted).

By the way, the mentioned PR is merged, so this change is already in the current version.

That’s why I am wondering why this behavior is still (or again) present.

I don’t know whether work on this issue report will fix it, as it uses a download failed message as its example reference.

Basically what would be needed is to show a successful upload as successful in the logs.

@Alexey, is this amount of failed uploads normal, or should I ignore them? Thanks.

---------- accepted -----------
Failed:                63798
Fail Rate:             0.000%
Canceled:              622
Cancel Rate:           0.000%
Successful:            75069
Success Rate:          0.000%

Full report:

sh ./successrate.sh nodex
========== AUDIT ==============
Critically failed:     0
Critical Fail Rate:    -nan%
Recoverable failed:    0
Recoverable Fail Rate: 0.000%
Successful:            7421
Success Rate:          0.000%
========== DOWNLOAD ===========
Failed:                127
Fail Rate:             0.000%
Canceled:              16134
Cancel Rate:           0.000%
Successful:            18466
Success Rate:          0.000%
========== UPLOAD =============
Rejected:              0
Acceptance Rate:       100.000%
---------- accepted -----------
Failed:                63798
Fail Rate:             0.000%
Canceled:              622
Cancel Rate:           0.000%
Successful:            75069
Success Rate:          0.000%
========== REPAIR DOWNLOAD ====
Failed:                0
Fail Rate:             0.000%
Canceled:              0
Cancel Rate:           0.000%
Successful:            10351
Success Rate:          0.000%
========== REPAIR UPLOAD ======
Failed:                5341
Fail Rate:             0.000%
Canceled:              6
Cancel Rate:           0.000%
Successful:            20704
Success Rate:          0.000%
========== DELETE =============
Failed:                0
Fail Rate:             -nan%
Successful:            15530
Success Rate:          0.000%

If your node doesn’t have failed audits and the storagenode reputation is OK, then the node is functioning normally.
Regarding failed transfers: well, customers cancel excess connections on upload or download. Whether this amount is normal, you need to check in

But it’s not normal to have a failed upload in the log when the piece has been successfully uploaded.

Timing. The client can cancel exactly when the piece is uploaded, but before the confirmation has been received.
Such a piece will then be taken care of by the garbage collector.
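
A hypothetical Go sketch of that race (assumed flow, not the real piecestore code): the piece is fully written to disk, but the client’s cancellation arrives before the node can send its confirmation, so the handler returns context canceled and the log records upload failed even though the blob exists.

package main

import (
	"context"
	"errors"
	"fmt"
	"os"
)

// upload persists the piece, then checks for cancellation before the
// confirmation step, mirroring the timing described above.
func upload(ctx context.Context, path string, data []byte) error {
	// 1. The piece is persisted successfully.
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return err
	}
	// 2. The client cancels before the confirmation is sent.
	if err := ctx.Err(); err != nil {
		return err // logged as "upload failed ... context canceled"
	}
	// 3. The confirmation would be sent here.
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the client aborting at the worst possible moment
	err := upload(ctx, "piece.sj1", []byte("piece data"))
	if errors.Is(err, context.Canceled) {
		fmt.Println("upload failed: context canceled (but piece.sj1 is on disk)")
	}
}

The piece then sits in the blobs folder until the garbage collector removes it, which matches what was observed above.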

This happens too often to be such a coincidence.

Just this second I have checked the logs of a randomly selected node. I looked at the latest log entry with an upload context canceled error, from just a couple of minutes ago.
Result: the log shows upload failed,
but the piece is there in the blob folder.

So how can this be a failed upload?

To me this looks more like the code has been “optimized” for faster closing of connections, resulting in incorrect log entries.

The client does not notify storage nodes when it cancels a request, for speed reasons (it just forcibly closes the connection instead). I know this is inconvenient for SNOs, because the success rate script becomes almost useless: it doesn’t show the true picture of how many transfers were explicitly canceled and how many requests failed for other reasons.
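
For illustration, a rough Go sketch of that long-tail cancellation (assumed behavior, not uplink’s actual code, and the numbers are made up): the client starts more transfers than it needs and cancels the rest the moment enough of them succeed; the losing nodes only see their connection drop.

package main

import (
	"context"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

func main() {
	const total, needed = 10, 6 // hypothetical counts, for illustration only
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		mu   sync.Mutex
		done int
		wg   sync.WaitGroup
	)
	for i := 0; i < total; i++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			select {
			case <-time.After(time.Duration(rand.Intn(100)) * time.Millisecond):
				mu.Lock()
				done++
				if done == needed {
					cancel() // enough pieces stored: abandon the slow nodes
				}
				mu.Unlock()
				fmt.Printf("node %d: uploaded\n", node)
			case <-ctx.Done():
				// The node is not told why; it just sees a closed
				// connection and logs "context canceled".
				fmt.Printf("node %d: upload failed: context canceled\n", node)
			}
		}(i)
	}
	wg.Wait()
}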

Why aren’t such pieces deleted right away if the upload is considered failed? The way you describe it, they are only wasting space.

We don’t necessarily know that the upload failed; for example, the computer running the uplink might have died, so it can’t kick off deletes. We could have the storage node not save data that it never sends back a signature for, but we think there are a lot of storage nodes that do successfully send back signatures even when they aren’t selected by the uplink. This is something we could improve for some edge cases, but it is a hard problem to solve in general.

Yet it gets displayed as failed in the logs.
For me as an SNO, I’d consider an upload successful when the piece has been dropped into the blobs folder. If that still does not count as successful, the piece should get deleted right away.
If the piece has not been dropped into the blobs folder, then it’s obvious that the upload did not succeed.
I have no other way to judge.

I think it’s related to what I posted here.

Yes, it seems that this is exactly the same issue.