And which drives, exactly, make up the sample we are talking about?
What make, model, and size?
Let’s go to the raw data, which Backblaze helpfully publishes. This is a very good thing, and I am quite impressed that they provide it.
Let’s take a peek at a single day, 19 November 2019:
grep -nc "" 2019-11-19.csv
121295
Sample size = 121295 - 1 (header row) = 121294 drives
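If you would rather have the tool skip the header itself, here is a minimal sketch. The function name `count_drives` is my own, and it assumes the first line of the daily file is the column header. In awk, `NR` inside the `END` block is the number of lines read, so `NR - 1` is the number of drive rows; a last line without a trailing newline is still counted.

```shell
# Print the number of drive rows in a Backblaze daily CSV.
# Assumption: line 1 is the column header, every other line is one drive.
count_drives() {
  awk 'END { print NR - 1 }' "$1"
}
```

Running `count_drives 2019-11-19.csv` should print the same 121294 as the calculation above.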
12 TB Seagate drives:
grep -ncE "ST12000" 2019-11-19.csv
40710
8 TB Seagate drives:
grep -ncE "ST800" 2019-11-19.csv
24223
4 TB Hitachi drives:
grep -ncE "HGST HMS5C404" 2019-11-19.csv
15528
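The grep counts above match the pattern anywhere on a line, so in principle a serial number or some other field could also match. A stricter sketch only looks at the model field. It assumes that `model` is the third comma-separated column (check the header line of your file), that no field contains an embedded comma, and that line 1 is the header; the function name `count_model_prefix` is hypothetical. Using awk's `index()` makes the prefix a literal string, so no regex escaping is needed.

```shell
# Count drive rows whose model field (column 3) begins with the given prefix.
# Assumptions: comma-separated, no quoted fields with embedded commas,
# header on line 1, model in column 3. The prefix is matched literally.
count_model_prefix() {
  awk -F, -v p="$2" 'NR > 1 && index($3, p) == 1 { n++ } END { print n + 0 }' "$1"
}
```

Usage, mirroring the three counts above:

count_model_prefix 2019-11-19.csv ST12000
count_model_prefix 2019-11-19.csv ST800
count_model_prefix 2019-11-19.csv "HGST HMS5C404"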
Total for these three models from two brands:
24223 + 40710 + 15528 = 80461
Percent of total:
80461 / 121294 ≈ 66.336 %
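The percentage arithmetic can be done in the shell as well. This is a small sketch with a hypothetical helper named `percent`; awk's `printf "%.3f"` rounds to three decimal places instead of truncating.

```shell
# Print part/total as a percentage, rounded to three decimal places.
percent() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.3f\n", 100 * part / total }'
}
```

For example, `percent $((24223 + 40710 + 15528)) 121294` prints 66.336, and `percent 116132 121294` prints 95.744.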
So, if a Storj node operator is running a 12 TB or 8 TB Seagate drive or a 4 TB Hitachi drive, the Backblaze data may be at least somewhat useful.
How many Seagate and Hitachi drives are there in the list?
grep -ncE "ST[0-9]{3}" 2019-11-19.csv
87123
grep -ncE "HGST" 2019-11-19.csv
29009
87123 + 29009 = 116132
Percent of total drives that are either Hitachi or Seagate:
116132 / 121294 = 95.744 %
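To see the whole brand breakdown at once instead of one grep per brand, here is a sketch that tallies drives by the first word of the model field. It rests on an assumption about how model names appear in these files: Seagate models start with "ST" followed by digits and carry no vendor word, while other vendors' models start with a vendor word such as "HGST", "TOSHIBA" or "WDC". The function name `tally_brands` is hypothetical, and the same column assumptions as before apply (model in column 3, header on line 1, no embedded commas).

```shell
# Tally drive rows by manufacturer, inferred from the model string (column 3).
# Assumption: Seagate models look like "ST12000NM0007" (no vendor word);
# other models begin with a vendor word, which is used as the label.
tally_brands() {
  awk -F, 'NR > 1 {
      split($3, w, " ")
      brand = (w[1] ~ /^ST[0-9]/) ? "Seagate" : w[1]
      n[brand]++
    }
    END { for (b in n) print n[b], b }' "$1" | sort -rn
}
```

The output is one "count brand" line per label, largest first, so everything that is neither Seagate nor HGST shows up explicitly rather than only as the remainder of a subtraction.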
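That leaves only 121294 - 116132 = 5162 drives (about 4.3 %) for every other manufacturer combined. Therefore, if a Storage Node operator is running a node on WD Red drives, the Backblaze statistics provide essentially no insight into possible drive reliability.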
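Rather than relying on brand totals, an operator can check whether their exact model appears in a given day's file at all. This is a minimal sketch under the same column assumptions as above (model in column 3, header on line 1, no embedded commas); the function name `models_with_prefix` is hypothetical. It lists every distinct model that starts with a literal prefix, together with its drive count.

```shell
# List distinct models (column 3) that begin with a prefix, with per-model counts.
# Assumptions: comma-separated, header on line 1, model in column 3.
models_with_prefix() {
  awk -F, -v p="$2" 'NR > 1 && index($3, p) == 1 { n[$3]++ }
    END { for (m in n) print n[m], m }' "$1" | sort -rn
}
```

For example, `models_with_prefix 2019-11-19.csv WDC` lists every WDC model in that day's file with its count; if nothing is printed, the file contains no drives with that prefix.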