Question about Success Rates

Hey guys, this is my 2nd go at this, using an SSD instead of an HDD. Anyway, my current stats are:
========== AUDIT =============
Critically failed: 0
Critical Fail Rate: 0.00%
Recoverable failed: 0
Recoverable Fail Rate: 0.00%
Successful: 0
Success Rate: 0.00%
========== DOWNLOAD ==========
Failed: 1
Fail Rate: 0.26%
Canceled: 5
Cancel Rate: 1.32%
Successful: 373
Success Rate: 98.42%
========== UPLOAD ============
Rejected: 0
Acceptance Rate: 100.00%
---------- accepted ----------
Failed: 114
Fail Rate: 2.00%
Canceled: 112
Cancel Rate: 1.97%
Successful: 5465
Success Rate: 96.03%
========== REPAIR DOWNLOAD ===
Failed: 0
Fail Rate: 0.00%
Canceled: 0
Cancel Rate: 0.00%
Successful: 0
Success Rate: 0.00%
========== REPAIR UPLOAD =====
Failed: 1
Fail Rate: 0.54%
Canceled: 1
Cancel Rate: 0.54%
Successful: 184
Success Rate: 98.92%

Note the Failed: 114 under uploads - is there any specific reason I can find out why they failed? And why so many so soon?

Thanks in advance.

You can't win all races even if you have an M.2 NVMe PCIe 5.0 drive, an EPYC CPU, and 1 TB of RAM. You can't be the closest node to every client requesting pieces that you store. The success rate will approach 100% but never reach it, and it will actually tend to decrease as the amount of data you store grows. Losing races is normal.


Thanks! I've read that on here, I just wanted to double-check since the post I was reading was pretty old. Makes sense!

I had to switch from my HDD as it kept getting a Misconfigured warning on the dashboard. It seems my HDD is SMR and doesn't work well for this, from what I've read.

Hello @tin.can86,
Welcome to the forum!

If you mean the QUIC misconfigured warning, then it's unrelated to the used storage; it's a network issue that appears when the satellite cannot reach your node through its UDP port.
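
In practice that usually means checking that the node's UDP port is forwarded on your router and allowed through the local firewall. As a minimal sketch on Windows, assuming the default port 28967 and that no rule exists yet (the rule name is just an example):

# Allow inbound UDP to the storage node (28967 is the default port; adjust if you changed it)
New-NetFirewallRule -DisplayName "Storj Node UDP" -Direction Inbound -Protocol UDP -LocalPort 28967 -Action Allow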

True, but your quest is not over. The SSD should be a temporary solution. Buy an enterprise-grade CMR HDD and move the node there.

It worked fine for months, then all of a sudden I got a QUIC error daily, and nothing had changed. I'll also add that the logs kept saying upload failed every 2 seconds. I rebooted, it worked fine for hours, then I checked again - the same QUIC issue, and the log said the same thing. So I figured my HDD was having issues and couldn't keep up, since it was at 100% usage and had also started throwing errors telling me to repair the drive. So I just switched to my SSD for now.

I see. It seems your HDD started to struggle with the load, so your node could not respond to requests in time. Interesting.
What were the QUIC-related errors in your logs at that time?

I didn't look too deep into the log, wish I would've. Maybe I'll move back onto the HDD and see what happens, not sure yet. Is there a command to be more specific in the log? Or like a debug command?

I did notice there was 32 GB of trash, which seemed like a lot at the time. And my HDD never settled: 100% usage for like 6 months straight. It seems to be working fine on my SSD with no QUIC errors like when I used my HDD, and literally nothing changed on my network. I'm keeping an active log reader open to try and catch anything that spikes.

You may search for quic and/or udp:

cat /mnt/storj/storagenode/storagenode.log | grep -iE "quic|udp" | tail

Replace /mnt/storj/storagenode/storagenode.log with your actual path. This works if you redirected logs to a file; if you did not, the past logs are gone with the container.
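
For reference, redirecting the logs to a file is a single option in config.yaml; the path below is only an example, point it at your own mount and restart the node afterwards:

# config.yaml - write logs to a file so they survive container restarts (example path)
log.output: "/mnt/storj/storagenode/storagenode.log"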

not using docker
windows gui

Then in PowerShell:

cat "$env:ProgramFiles\Storj\Storage Node\storagenode.log" | sls "quic|udp" | select -last 20

My HDD is SMR, I just looked it up. Would it be possible to put the DBs on my NVMe SSD, and would that help the drive not get errors and bottleneck? It's only 1 TB. I've noticed over a few threads that SMR drives fail eventually anyway. Only asking because I have no real use for storing anything else on the HDD, as it's too slow for most of my needs.

It could work; however, if the drive produces errors and/or corrupts files, then it's likely a bad one and should be returned to the manufacturer if possible.
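
If you do try it, the relevant setting should be the database directory option in config.yaml; the path below is just an example on the NVMe SSD (stop the node, create the folder, move the existing *.db files there, then restart):

# config.yaml - keep only the databases on the faster drive (example path)
storage2.database-dir: 'D:\storagenode-db'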

It's pretty old, not sure how old; it's an ST1000LM049 though. I'll look into getting an enterprise-grade drive eventually.

The only known method to put it to work is to either run a second node with its own disk and identity in the same /24 subnet of public IPs (they will split the ingress and reduce the load on the SMR node), or reduce the number of concurrent requests (storage2.max-concurrent-requests:) to a low value (this will significantly slow down the usage of this node).
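
For the second option, it is a single line in config.yaml; the value below is only an example, the right number depends on what the SMR drive can keep up with:

# config.yaml - limit simultaneous transfers so the SMR drive isn't flooded with random I/O
storage2.max-concurrent-requests: 5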

I'll look into that, as I do have another ISP/IP besides the one I'm currently using for Storj.

No, this will not work. You need to use your current ISP and a different external port, see: How to add an additional drive? - Storj Node Operator Docs
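
As a rough sketch of how the second node's config.yaml would differ (the address, ports, and paths are examples only; each node also needs its own identity and storage folder):

# second node's config.yaml - same public IP, different ports (all values are examples)
contact.external-address: "your.external.address:28968"
server.address: ":28968"
server.private-address: "127.0.0.1:7779"
console.address: "127.0.0.1:14003"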

Ahhh okay, why would it split the ingress? Curious, as it is a separate node.

We want to be as decentralized as possible, so we select only one node from each /24 subnet of public IPs for each piece of each segment; an unvetted node has a 5% chance to be selected for uploads.