Why should we be concerned? First of all, similar customers have already started using the public network, just on a smaller scale. We expect these customers to scale up over time. If Storj Select nodes can handle the load (thanks to hashstore), the public network should also be able to handle it. I don’t see the problem. Sure, there will be some node operators complaining about hashstore, but we don’t force them to migrate. Everyone can use whatever they think works best.
I can explain it one more time. The hashstore backend has been designed for the Storj Select use cases, not for Storj Global (where the patterns are not predictable). It also significantly speeds up simple setups without any tuning, so it’s just Ansible/TF and you are done, and they work perfectly…
It can be useful for Storj Global as well, especially for limited, untuned systems, and it may work well even on hardware that is not usable for anything else…
If they’re not at the same scale, they aren’t a “similar customer”, since the problem only shows up at scale. And are you asking me… to remind you… of how you just reminded us… of the reason for the concern? OK, I’ll scroll up.
You described it. Then I said I “understand Storj is concerned a similar customer may start using the public nodes and hitting the same bottlenecks.” And you came back and asked why I thought there was a concern?
I think we’re talking about the same thing. Hashstore hopes to be a solution for performance issues that are currently a problem on Select and could affect Global. But Global SNOs don’t see them because nobody is paying for the same churn today.
No, I’m talking about deferred deletions that now have to be handled by HDDs, which will be slower outside of that niche use case where all TTL data is grouped together.
So it would be fast only in that niche use case, where an entire log can be deleted at once. In all other scenarios it would be slower. And yet, that niche use case is also handled sufficiently fast by the filesystem. So I still don’t see why Storj spent engineering resources developing the hashstore.
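To make the distinction concrete, here is a rough sketch of the two deletion patterns being discussed; the paths, file names, and the expired-pieces.txt list are made up for illustration and are not the real on-disk layout:

# hashstore-style: if all TTL pieces expiring on the same day sit in one ~1GB log,
# expiry is a single unlink of that log file
rm /mnt/storagenode/storage/hashstore/log-2024-09-18.dat

# piecestore-style: every expired piece is its own small file, so the same expiry
# becomes one unlink per piece, i.e. potentially millions of random metadata updates on an HDD
while read -r piece; do
  rm "/mnt/storagenode/storage/blobs/$piece"
done < expired-pieces.txt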
No, an SSD has already been part of the array. What array doesn’t have an SSD today to accelerate random I/O? Come on.
Who uses arrays for Storj? The recommended setup is one disk per node.
I think Storj is trying to simplify running nodes as far as possible in order to lower the entry barrier for SNOs.
People like me can just about manage to run a node. People like you can seriously optimise it by using your filesystem and O/S “tricks” but I dare say you’re quite a few knowledge notches above the average.
I suspect that’s why you may be frustrated that Storj seems to be aimed at “potato nodes”.
It’s interesting that hashstore is being mentioned as a fix for Select issues. Those are the facilities you’d expect to have the expertise to tune filesystems to workloads. Instead… they apparently meet the audit requirements for SOC 2… but run potato environments? And it’s the public/global SNOs that are tuning for speed (like with SSD caching layers)?
In the end I can’t complain: just do the best I can with the hardware I already own. Apparently more appropriate for Storj than the commercial providers…
From a commercial provider you get what you pay for. It’s as simple as that…
Apparently still the wrong conclusion. Can we get back to hashstore, please? Your assumptions about Storj Select are wrong and off topic now…
I would say it is the other way around. Piecestore works fine in some use cases, and the public network might be matching those for the time being. The moment customer behavior changes, hashstore will show its advantages. It is not just TTL data; there are a few scenarios in which hashstore will outperform piecestore. See it more as a trade-off: with piecestore, all maintenance jobs get slower the more data the node is holding, and there is no limit on that. If a customer goes nuts and uploads billions of small files, that will kill piecestore nodes. Hashstore can handle this with a known price tag. And I bet that price tag will be even cheaper than adding SSDs to keep piecestore from failing.
I wish there were Select operators sharing their experiences; it would be interesting to see what kinds of setups and challenges they have.
Isn’t Select just providing storage, while the nodes are operated by Storj?
I thought Select was all vanilla vendor-run Storj nodes… just that the Satellites knew their Identities were Select… so they got their own traffic and the /24 rule was ignored. But I can’t point to a link that says that.
Unrelated to the discussion at hand, but I just finished converting my 6TB node to hashstore. It completed a few days ago, but it left 52 piece files in the blobs directory. Any idea why it left these 52? I tried restarting the server, and I do have the filewalker enabled (which completes very fast now). They are all dated 9/18/2024 and are 0 bytes in size. I assume it’s safe to delete these? Perhaps orphaned pieces from an unsafe shutdown or something?
[root@storj blobs]# pwd
/mnt/storagenode/storage/blobs
[root@storj blobs]# find . -type f | wc -l
52
[root@storj blobs]# find . -type f | xargs ls -al
-rw-r--r--. 1 storj storj 0 Sep 18 2024 ./ukf<snip>/2m/venh<snip>.sj1
-rw-r--r--. 1 storj storj 0 Sep 18 2024 ./ukf<snip>/3o/f5b7<snip>.sj1
-rw-r--r--. 1 storj storj 0 Sep 18 2024 ./ukf<snip>/5q/62ag<snip>.sj1
-rw-r--r--. 1 storj storj 0 Sep 18 2024 ./ukf<snip>/5u/eixk<snip>.sj1
If they are zero bytes, they are definitely corrupted and cannot be migrated; you may delete them.
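If you want to double-check first, something like this lists only the zero-byte leftovers and then removes them (using the blobs path from the post above; adjust it to your own node):

# list only the zero-byte piece files left behind in blobs
find /mnt/storagenode/storage/blobs -type f -name '*.sj1' -size 0 -print

# once the list matches what you expect, delete them
find /mnt/storagenode/storage/blobs -type f -name '*.sj1' -size 0 -delete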
What are the chances of corrupting a whole 1GB Hashstore file after an unsafe shutdown?
Generally, unsafe shutdowns (power loss, crash, etc.) are prone to cause corruption, or at least metadata issues in most cases.
This is especially true for filesystems like EXT*, NTFS, ExFAT… in contrast to filesystems like ZFS or ReFS, which do CoW (copy-on-write) and in most cases are protected from these issues.
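For example, on ZFS you can at least verify after an unclean shutdown that nothing was silently damaged (the pool name tank is just a placeholder):

# verify checksums of everything stored on the pool
zpool scrub tank

# check scrub progress and whether any corruption was found or repaired
zpool status -v tank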
If something unfortunate happens and we lose a 1GB file’s data because of corruption, is that an instant disqualification?
You must lose at least 4% of the data to be DQed.
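As a rough sanity check against the 6TB node mentioned above (taking the ~4% figure at face value):

# ~4% of a 6TB node is roughly the amount you would have to lose before disqualification
echo "6*1024*0.04" | bc    # ≈ 245 GB
# so a single corrupted 1GB hashstore log is roughly 1/245 of that, well under 1% of the allowed loss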
We already raised this problem, though. The recommendation is: use a UPS.
The hashstore is too risky to run without one.
2 posts were split to a new topic: Failed to create storage node peer: hashstore: invalid header: