"drpc: remote closed the stream", "errorVerbose": "drpc

Hi,
I keep getting this error message on most of my nodes. Can anyone please help me with it? The nodes are online, but the log keeps showing this error. I think this might be the reason I am not getting much data.

2022-03-10T17:10:04.588-0600 ERROR piecestore download failed {"Piece ID": "4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "error": "drpc: remote closed the stream", "errorVerbose": "drpc: remote closed the stream\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:202\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:211"}

Hi @chain7
These messages could point to a slow node, where the response to a download request isn’t completed within the time limit. Please check that your hard drive is running OK.

You can also check the response time by looking for the corresponding ‘download started’ message with the same ‘Piece ID’.

Usually you can ignore piecestore as well as collector errors. :v:t2: Same for piecedeleter and “context canceled”.

Can you share more details, such as the age, size, and usage of the node and, if you run multiple nodes, whether they are within the same network?

I do see “download started”. But where can I find the response time, or how do I calculate it?

From the log file, the timestamp is at the beginning of each line. So you should see something like:

2022-03-11T17:16:10.165Z|INFO|piecestore|download started|{Piece ID: NPXX3HGNHBC27JHR6SXGDAHBMRKA2P6MXHE5DFBEB5GBCZLX4MJQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: GET}

2022-03-11T17:16:10.554Z|INFO|piecestore|downloaded|{Piece ID: NPXX3HGNHBC27JHR6SXGDAHBMRKA2P6MXHE5DFBEB5GBCZLX4MJQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: GET}

So for that download request it took my node 389ms to send the data.
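The same subtraction can be scripted. The following is a minimal Python sketch, not an official Storj tool: it pairs each "download started" line with the later "downloaded" or "download failed" line that has the same Piece ID and reports the elapsed milliseconds. The function names (`parse_ts`, `response_times`) are made up for illustration, the regular expressions only assume the line layouts pasted in this thread, and the sample lines are trimmed copies of those log lines.

```python
import re
from datetime import datetime, timedelta

# Message text and Piece ID as they appear in the log lines quoted in this thread.
EVENT = re.compile(r"(download started|download failed|downloaded)")
PIECE = re.compile(r"Piece ID\W+([A-Z0-9]+)")


def parse_ts(stamp):
    """Parse a timestamp ending in either 'Z' or a numeric offset like '-0600'."""
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+0000"
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")


def response_times(lines):
    """Return (piece_id, outcome, elapsed_ms) for each start/finish pair found."""
    started = {}
    results = []
    for line in lines:
        event = EVENT.search(line)
        piece = PIECE.search(line)
        if not event or not piece:
            continue
        # The timestamp is the first field, followed by whitespace or '|'.
        stamp = re.split(r"[\s|]", line.strip(), maxsplit=1)[0]
        when = parse_ts(stamp)
        pid = piece.group(1)
        if event.group(1) == "download started":
            # Simplification: only the most recent start per piece is kept.
            started[pid] = when
        elif pid in started:
            elapsed = when - started.pop(pid)
            results.append((pid, event.group(1), elapsed // timedelta(milliseconds=1)))
    return results


sample = [
    "2022-03-11T17:16:10.165Z|INFO|piecestore|download started|{Piece ID: NPXX3HGNHBC27JHR6SXGDAHBMRKA2P6MXHE5DFBEB5GBCZLX4MJQ, Action: GET}",
    "2022-03-11T17:16:10.554Z|INFO|piecestore|downloaded|{Piece ID: NPXX3HGNHBC27JHR6SXGDAHBMRKA2P6MXHE5DFBEB5GBCZLX4MJQ, Action: GET}",
    "2022-03-10T17:10:01.085-0600 INFO piecestore download started {Piece ID: 4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA, Action: GET}",
    "2022-03-10T17:10:04.588-0600 ERROR piecestore download failed {Piece ID: 4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA, Action: GET}",
]

for pid, outcome, ms in response_times(sample):
    print(f"{pid[:8]}... {outcome}: {ms} ms")
```

For a real log you would pass the lines of the exported log file instead of `sample`. If the same piece is requested several times at once, this sketch only remembers the latest start, so treat the numbers as a rough guide.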

This is what I see:

2022-03-10T17:10:01.085-0600 INFO piecestore download started {Piece ID: 4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: GET}
2022-03-10T17:10:04.588-0600 ERROR piecestore download failed {Piece ID: 4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: GET, error: drpc: remote closed the stream, errorVerbose: drpc: remote closed the stream\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:202\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:211}

I have been running the nodes for the last 3 months, and they have been online 99% of the time. I have about 4 GB of data in total and a capacity of 10 TB. I have been having a similar issue with my other nodes as well, and they are all in the same network.

Most probably the reason for that is:

Your nodes are sharing the traffic within the same network.

3 seconds, right? Can you run the “successrates” script and share the result please?

Here is the response after running the successrate script:

[D] Do not run [R] Run once [S] Suspend [?] Help (default is “D”): R
========== AUDIT =============
Critically failed: 0
Critical Fail Rate: 0.00%
Recoverable failed: 0
Recoverable Fail Rate: 0.00%
Successful: 68
Success Rate: 100.00%
========== DOWNLOAD ==========
Failed: 2
Fail Rate: 0.07%
Canceled: 23
Cancel Rate: 0.79%
Successful: 2885
Success Rate: 99.14%
========== UPLOAD ============
Rejected: 0
Acceptance Rate: 100.00%
---------- accepted ----------
Failed: 13
Fail Rate: 0.15%
Canceled: 100
Cancel Rate: 1.15%
Successful: 8579
Success Rate: 98.70%
========== REPAIR DOWNLOAD ===
Failed: 0
Fail Rate: 0.00%
Canceled: 0
Cancel Rate: 0.00%
Successful: 0
Success Rate: 0.00%
========== REPAIR UPLOAD =====
Failed: 0
Fail Rate: 0.00%
Canceled: 1
Cancel Rate: 0.14%
Successful: 732
Success Rate: 99.86%
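For anyone wondering how these percentages relate to the counts: the figures above are consistent with each rate being that outcome's count divided by the sum of failed, canceled, and successful transfers. The Python sketch below reproduces the DOWNLOAD and accepted UPLOAD numbers under that assumption. It is not the successrate script itself, and the function name `rates` is made up for illustration.

```python
def rates(failed, canceled, successful):
    """Return fail, cancel and success rates as percentage strings with two decimals."""
    total = failed + canceled + successful
    if total == 0:
        # No transfers of this kind yet, as in the REPAIR DOWNLOAD section.
        return {"fail": "0.00", "cancel": "0.00", "success": "0.00"}
    return {
        "fail": f"{100 * failed / total:.2f}",
        "cancel": f"{100 * canceled / total:.2f}",
        "success": f"{100 * successful / total:.2f}",
    }


# Counts copied from the DOWNLOAD and accepted UPLOAD sections above.
print("download:", rates(2, 23, 2885))
print("upload:  ", rates(13, 100, 8579))
```

With so few transfers, a handful of canceled requests moves these percentages noticeably, so the trend over time matters more than a single run.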

The “upload cancel rate” is an indication that network performance might not be the best, but an upload success rate of almost 99% is pretty good. You should keep an eye on it, as you do not have much traffic yet.

In general: the statistics look pretty fine! :white_check_mark:

I strongly advise running an alert script, especially if you want to be informed about any severe issue so that you can react quickly. I’ve created a script for that - feel free to check it out and run it on your node(s).

Anyway, you should also rethink your multiple-node setup. It will take years to fill your 10 TB when you run multiple nodes in the same network. You should run just this one and extend the storage capacity once the 10 TB are almost full. :v:t2:

Thank you for your feedback. Do you mean it is not a good idea to run multiple nodes in the same location with the same internet provider, even if I have a high internet speed of over 900 Mbps?

Yes, indeed, you should have just one node in one network. You can have a look at this link for further details.

Anyway: set up monitoring that alerts you to issues and low thresholds. This will keep you informed about missed updates, errors or fatal errors, outages, etc.

Thank you. Are you referring to the storage node health check monitor mentioned above?

Yes, this one

Do you have this script for Windows as well? I am running my node on Windows Server.

Unfortunately not, sorry. I am not sure whether it works, but you could try running it within a Linux VM and

  1. share your dashboard to your local network

  2. export your logs to a file that is accessible from the VM

These two things should be the only critical settings needed to make it work.