"drpc: remote closed the stream", "errorVerbose": "drpc

chain7 · March 11, 2022, 2:16pm

Hi,
I keep getting this error message on most of my nodes. Can anyone please help me with it? The nodes are online, but the log keeps showing this error. I think this might be the reason I'm not getting much data.

2022-03-10T17:10:04.588-0600 ERROR piecestore download failed {"Piece ID": "4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "error": "drpc: remote closed the stream", "errorVerbose": "drpc: remote closed the stream\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:202\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:211"}

Stob · March 11, 2022, 4:20pm

Hi @chain7
These messages could point to a slow node, where the response to a download request isn't completed within the time limit. Please check that your hard drive is running OK.

You can also check the response time by looking for the corresponding ‘download started’ message with the same ‘Piece ID’.

Bivvo · March 11, 2022, 4:36pm

Usually you can ignore piecestore and collector errors. The same goes for piecedeleter errors and “context canceled” messages.
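To apply this advice to a whole log, here is a minimal Python sketch that keeps only the ERROR/FATAL lines outside those usually-harmless groups. It assumes each line begins with timestamp, level, component and message separated by whitespace (tabs in the raw log file, as in the lines quoted in this thread); the function name `relevant_errors` is hypothetical and not part of any Storj tool.

```python
# Components whose ERROR lines are, per the advice above, usually harmless.
IGNORABLE_COMPONENTS = {"piecestore", "collector", "piecedeleter"}


def relevant_errors(lines):
    """Return ERROR/FATAL lines that are not in the usually-ignorable groups.

    Assumption: each line starts with timestamp, level, component and
    message, separated by whitespace.
    """
    kept = []
    for line in lines:
        parts = line.split(None, 3)  # split on any whitespace, at most 4 fields
        if len(parts) < 4:
            continue
        _timestamp, level, component, message = parts
        if level not in ("ERROR", "FATAL"):
            continue
        # Skip the groups that can usually be ignored.
        if component in IGNORABLE_COMPONENTS or "context canceled" in message:
            continue
        kept.append(line.rstrip("\n"))
    return kept
```

Passing an open log file (any iterable of lines) works too, e.g. `relevant_errors(open(path, encoding="utf-8"))`, where `path` is wherever your node writes its log.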

Can you share more details, such as node age, size and usage, and, if you run multiple nodes, whether they are all in the same network?

chain7 · March 11, 2022, 4:58pm

I do see “download started”. But where can I find the response time, or how do I calculate it?

Stob · March 11, 2022, 5:18pm

In the log file, the timestamp is at the beginning of each line, so you should see something like this:

2022-03-11T17:16:10.165Z|INFO|piecestore|download started|{Piece ID: NPXX3HGNHBC27JHR6SXGDAHBMRKA2P6MXHE5DFBEB5GBCZLX4MJQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: GET}

2022-03-11T17:16:10.554Z|INFO|piecestore|downloaded|{Piece ID: NPXX3HGNHBC27JHR6SXGDAHBMRKA2P6MXHE5DFBEB5GBCZLX4MJQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: GET}

So for that download request it took my node 389ms to send the data.
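The same calculation can be automated across a whole log. A minimal Python sketch, assuming lines look like the ones quoted in this thread (a timestamp first, separated by a tab, space or `|`, and a `Piece ID` field) and assuming downloads end with a “downloaded”, “download failed” or “download canceled” message: it pairs each “download started” line with the next end event for the same Piece ID and reports the elapsed seconds. The names `download_durations` and `parse_time` are hypothetical, and a piece that is downloaded twice at overlapping times would be mis-paired.

```python
import re
from datetime import datetime

# Assumed line shape: timestamp first, then somewhere a "Piece ID" field,
# with or without JSON quotes around the key and value.
TIMESTAMP = re.compile(r"^([^\s|]+)")
PIECE_ID = re.compile(r'Piece ID"?:\s*"?([A-Z0-9]+)')
END_EVENTS = ("downloaded", "download failed", "download canceled")


def parse_time(text):
    # %z accepts "Z", "-0600" and "-06:00" (Python 3.7+).
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")


def download_durations(lines):
    """Pair 'download started' with its end event by Piece ID.

    Returns a list of (piece_id, end_event, seconds) tuples.
    """
    started = {}
    results = []
    for line in lines:
        ts_match = TIMESTAMP.match(line)
        id_match = PIECE_ID.search(line)
        if not ts_match or not id_match:
            continue
        try:
            when = parse_time(ts_match.group(1))
        except ValueError:
            continue  # first field was not a timestamp
        piece_id = id_match.group(1)
        if "download started" in line:
            started[piece_id] = when
            continue
        for event in END_EVENTS:
            if event in line and piece_id in started:
                elapsed = (when - started.pop(piece_id)).total_seconds()
                results.append((piece_id, event, elapsed))
                break
    return results
```

Fed the two pairs of lines quoted in this thread, it returns 0.389 seconds for the successful download above and 3.503 seconds for chain7's failed one.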

chain7 · March 11, 2022, 6:01pm

This is what I see:

2022-03-10T17:10:01.085-0600	INFO	piecestore	download started	{"Piece ID": "4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET"}
2022-03-10T17:10:04.588-0600	ERROR	piecestore	download failed	{"Piece ID": "4EBPYGYSZTXHPJTS2KWRX5CT3YLPONAY4SBMGHJFQA4NW4GHZGDA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET", "error": "drpc: remote closed the stream", "errorVerbose": "drpc: remote closed the stream\n\tstorj.io/drpc/drpcstream.(*Stream).HandlePacket:202\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:211"}

chain7 · March 11, 2022, 6:09pm

I have been running the nodes for the last 3 months, and they have been online 99% of the time. I have about 4 GB of data in total and a capacity of 10 TB. I have been having the same issue with my other nodes as well, and they are all in the same network.

Bivvo · March 11, 2022, 6:27pm

Most probably the reason is that your nodes are sharing the traffic within the same network.

So about 3.5 seconds passed between “download started” (17:10:01.085) and “download failed” (17:10:04.588), right? Can you run the “successrates” script and share the results, please?

chain7 · March 11, 2022, 7:46pm

Here is the output after running the successrates script:

[D] Do not run [R] Run once [S] Suspend [?] Help (default is “D”): R
========== AUDIT =============
Critically failed: 0
Critical Fail Rate: 0.00%
Recoverable failed: 0
Recoverable Fail Rate: 0.00%
Successful: 68
Success Rate: 100.00%
========== DOWNLOAD ==========
Failed: 2
Fail Rate: 0.07%
Canceled: 23
Cancel Rate: 0.79%
Successful: 2885
Success Rate: 99.14%
========== UPLOAD ============
Rejected: 0
Acceptance Rate: 100.00%
---------- accepted ----------
Failed: 13
Fail Rate: 0.15%
Canceled: 100
Cancel Rate: 1.15%
Successful: 8579
Success Rate: 98.70%
========== REPAIR DOWNLOAD ===
Failed: 0
Fail Rate: 0.00%
Canceled: 0
Cancel Rate: 0.00%
Successful: 0
Success Rate: 0.00%
========== REPAIR UPLOAD =====
Failed: 0
Fail Rate: 0.00%
Canceled: 1
Cancel Rate: 0.14%
Successful: 732
Success Rate: 99.86%
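For reference, the percentages in this output are consistent with each count being divided by the sum of failed, canceled and successful for that section. A minimal Python sketch of that arithmetic, assuming that formula (this is not the successrates script's own code, and the function name `rates` is hypothetical; the upload "Acceptance Rate" is computed separately and is not covered here):

```python
def rates(failed, canceled, successful):
    """Return (fail %, cancel %, success %) rounded to two decimals.

    Assumption: each rate is count / (failed + canceled + successful),
    which reproduces the numbers in the output above.
    """
    total = failed + canceled + successful
    if total == 0:
        # No traffic at all: the output above shows 0.00% in that case.
        return (0.0, 0.0, 0.0)
    return tuple(round(100 * n / total, 2) for n in (failed, canceled, successful))
```

With the DOWNLOAD counts, `rates(2, 23, 2885)` gives `(0.07, 0.79, 99.14)`, and the UPLOAD counts `rates(13, 100, 8579)` give `(0.15, 1.15, 98.7)`, matching the script's output.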

Bivvo · March 11, 2022, 8:02pm

The “upload cancel rate” (1.15%) is an indication that network performance might not be the best, but an upload success rate of almost 99% (98.70%) is pretty good. You should keep an eye on it, as you do not have much traffic yet.

In general: the statistics look pretty fine!

I strongly advise running an alert script, especially if you want to be informed about any severe issue so that you can react quickly. I’ve created a script for that; feel free to check it out and run it on your node(s).

Anyway, you should also rethink your multiple-node setup. It will take years to fill your 10 TB when you run multiple nodes in the same network. You should run just this one node and extend its storage capacity once the 10 TB are almost full.

chain7 · March 11, 2022, 8:19pm

Thank you for your feedback. Do you mean it is not a good idea to run multiple nodes in the same location with the same internet provider, even if I have a fast connection of over 900 Mbps?

Bivvo · March 11, 2022, 8:39pm

Yes, indeed: you should have just one node per network. Have a look at this link for further details.
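Why this matters, under an assumption not stated in this thread: Storj satellites are commonly described as treating all nodes behind the same /24 IPv4 subnet as a single location when selecting nodes for uploads, so several nodes on one connection split the ingress that one node would receive, regardless of bandwidth. A minimal Python sketch to group your nodes' public IPv4 addresses by /24 and see which ones would share traffic under that assumption (the function name `group_by_slash24` is hypothetical, and the addresses below are documentation examples):

```python
import ipaddress


def group_by_slash24(ips):
    """Group IPv4 addresses by the /24 subnet they belong to.

    Assumption: nodes in the same /24 share upload traffic as if they
    were one node.
    """
    groups = {}
    for ip in ips:
        # strict=False lets a host address define its enclosing /24 network.
        net = ipaddress.IPv4Network(f"{ip}/24", strict=False)
        groups.setdefault(str(net), []).append(ip)
    return groups
```

For example, `group_by_slash24(["203.0.113.10", "203.0.113.77", "198.51.100.5"])` puts the first two addresses under `"203.0.113.0/24"`, so under this assumption those two nodes would compete for the same traffic.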

Anyway, set up monitoring that alerts you to issues and to values that drop below your thresholds. This will keep you informed about missed updates, errors or fatal errors, downtime, etc.

chain7 · March 11, 2022, 9:13pm

Thank you. Are you referring to the storage node health check-up monitor mentioned above?

Bivvo · March 11, 2022, 9:25pm

Yes, this one

chain7 · March 11, 2022, 11:47pm

Do you have this script for Windows as well? I am running my node on Windows Server.

Bivvo · March 12, 2022, 6:17am

Unfortunately not, sorry. I'm not sure whether it works, but you could try running it within a Linux VM and:

share your dashboard with your local network
export your logs to a file that the VM can access

These two things should be the only critical settings needed to make it work.