Having read the wiki myself, the difference between caching, tiered cache, and tiering (HSM) is now clear: StoreMI is a (probably read-only) caching solution, not HSM/tiering.
As explained here.
Because it’s not a tiered storage solution. It’s a caching solution. Now you know.
AMD StoreMI technology has been rebuilt from the ground up with a new algorithm that makes it safe and simple to use. Now, a StoreMI configuration simply mirrors your most-used files to an SSD of your choosing, leaving the original copy intact
Holy shit! So they bought a perfectly working product and killed it, replaced it with a shitty cache, but kept the name. What the duck, AMD?!
Anyway, FuzeDrive and AMD StoreMI were tiered storage for years after the acquisition, and I stand by what I said. Use the older version then, from before this idiotic switcheroo.
Well, then create a new product. Don’t just swap the core functionality and keep the name.
SSDs fail much less often than HDDs. I don’t buy that excuse. Apple used this approach extremely successfully for years, as a reference point for you.
There are gazillions of users. Some users do stupid shit. Some users post about it. You will always find a vocal minority complaining about issues with any solution. Google searches prove nothing except that the platform is popular enough.
In the same way, if you read this forum it looks like everyone is hosting a node on Windows, has DNS issues, and has the file walker running for weeks, when in reality it’s just the same three people all the time. Everyone else, who has no issues, doesn’t post.
Anecdotally, I’ve used Fusion Drive for over a decade across many systems and there is really no magic in assembling and disassembling the volume. Two of my Macs had their Seagate drives replaced. Creating the Fusion Drive is literally two well-documented subcommands of diskutil. Most of the time the OS installer even does it for you automatically. But this is beside the point.
I don’t know about famous, but Apple choosing this approach for their customer data is a strong voice in support of tiered storage in general.
A cache is worse in every respect (with very few exceptions that don’t apply to the vast majority of scenarios).
Storj already has a lever to nudge customers to store data in bigger chunks: the segment fee. It definitely drove my behavior to adjust the chunk size in my backup program, for what it’s worth.
It’s basically the same as jumping to a specific position in a video. The player translates your jump to 1m42s into a byte offset x in the video and asks Storj for a byte range starting at x, plus whatever extra it wants to cache. (Preloading works the same way.)
In the case of compressed blobs you would ask the satellite for a file, and the satellite knows that the file is in blob x from byte y to byte z, and these are in chunks u to w.
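As a very rough sketch (every name and number here is made up, and this is not how satellite metadata actually looks), the mapping I have in mind is something like this:

```go
package main

import "fmt"

// Hypothetical satellite-side manifest entry: where a packed file lives
// inside a larger blob. This is just a sketch of the
// "file -> blob x, bytes y..z" mapping described above.
type blobEntry struct {
	BlobKey string // object key of the blob that contains the file
	Offset  int64  // byte y: where the file starts inside the blob
	Length  int64  // (z - y): how many bytes belong to the file
}

// manifest maps original file paths to their location inside blobs.
var manifest = map[string]blobEntry{
	"site/index.html": {BlobKey: "blob-0001", Offset: 0, Length: 9_412},
	"site/app.js":     {BlobKey: "blob-0001", Offset: 9_412, Length: 48_770},
	"site/hero.jpg":   {BlobKey: "blob-0001", Offset: 58_182, Length: 301_554},
}

// resolve turns a request for a file into the byte range that a client
// (or the satellite on its behalf) would actually fetch from the blob.
func resolve(path string) (key string, firstByte, lastByte int64, ok bool) {
	e, ok := manifest[path]
	if !ok {
		return "", 0, 0, false
	}
	return e.BlobKey, e.Offset, e.Offset + e.Length - 1, true
}

func main() {
	if key, from, to, ok := resolve("site/app.js"); ok {
		// This range could then be requested the same way a video player
		// seeks, e.g. with a "Range: bytes=from-to" header.
		fmt.Printf("GET %s with Range: bytes=%d-%d\n", key, from, to)
	}
}
```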
I have planned to do some tests with containers similar to the compressed-blob idea for some time, but haven’t found the time yet. It might be useful for hosting websites, because you pack the whole site into a blob to get closer to (or over, depending on your site) the recommended 60MB segment size.
Those are what we call assumptions. But fair enough, you added a question mark. I have nodes in several locations, but yes, also some subnets in the same location. I’m small time though; there are node operators with hundreds of subnets in the same location, which is why an inflation factor of 1.2 is clearly impossible.
I would be okay with you trying to convince me OR other people either way. I’d just like to know what exactly I’m responding to and analogies I can’t follow don’t help with that. As you have probably noticed, I enjoy the debate.
Planes and cars don’t share the majority of the underlying tech though. So your analogy really doesn’t work. It’s more like selling computers and selling computers with a monitor.
Do you have data on this that I don’t? I wish we had this info from Storj, but as far as I’m aware, they have never published their edge service bandwidth costs.
Yeah, I read his posts and I appreciate that he probably told us everything he could say about this, but it isn’t helpful as it still doesn’t really specify proportions, leaving us guessing.
Thanks, that gave me a good chuckle.
If you did it without compressing, just by appending, they could store file offsets in metadata and it could be done performantly, but with quite a bit of coding overhead and additional metadata costs.
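A minimal sketch of that idea, assuming nothing about how Storj metadata is actually stored (paths and names are made up):

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// packed records where one original file ended up inside the blob.
// This is the "file offsets in metadata" part, purely illustrative.
type packed struct {
	Path   string
	Offset int64
	Length int64
}

// appendFiles concatenates the given files into one blob (no compression)
// and returns the offset table that would be stored as metadata.
func appendFiles(blobPath string, files []string) ([]packed, error) {
	blob, err := os.Create(blobPath)
	if err != nil {
		return nil, err
	}
	defer blob.Close()

	var index []packed
	var pos int64
	for _, p := range files {
		f, err := os.Open(p)
		if err != nil {
			return nil, err
		}
		n, err := io.Copy(blob, f) // append the raw bytes
		f.Close()
		if err != nil {
			return nil, err
		}
		index = append(index, packed{Path: p, Offset: pos, Length: n})
		pos += n
	}
	return index, nil
}

func main() {
	index, err := appendFiles("site.blob", []string{"index.html", "app.js", "hero.jpg"})
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, e := range index {
		fmt.Printf("%s: bytes %d..%d\n", e.Path, e.Offset, e.Offset+e.Length-1)
	}
}
```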
I don’t think file system level efficiency is the benchmark anyway when you already have to deal with a remote object storage.
With the downside that you increase the failure surface of the storage and lose data when either drive fails. I’ve tried the mentioned solutions and wasn’t really happy with the stability and performance of either PrimoCache or StoreMI. I’ve moved to SSD only storage for those systems now. (They aren’t used for Storj)
Kind of the result of “use what you have”. Though in general I would say Windows systems tend not to be the types of systems best fit for Storj anyway; they tend to run on more power-hungry hardware. But if that’s what you already have online 24/7, it doesn’t really matter, as the power is already being used.
They may not be the most profitable customers. But they help a lot in mind share. I don’t think you want to ditch a large part of the customer base.
I meant to respond to this earlier, but Storj doesn’t have VC money. Just token reserves. As far as I’m aware.
Those two statements are in contradiction. Frequently used blocks start out as infrequently used blocks too. Until they are used frequently enough, they are on the HDD and would be slow with tiering as well. I generally don’t find the tradeoff worth it: you get just a little more space, but now both drives need to survive to keep your data safe. But I guess it depends on your use case. This is kind of nice for game storage, where you can just download the games again anyway. But these days you really want your games on all-SSD storage anyway.
I think the usefulness of consumer tiering solutions has kind of disappeared with today’s SSD prices.
Pretty sure they stopped doing this as well though. But I didn’t check.
Problem is, they may take all their business to another store and badmouth your establishment to their friends and family. Loss leaders are a thing, they can be useful. (Also, I don’t think we have determined edge services are in fact a loss leader.)
Token Balance and Flow Reports are NOT company balance sheets; they do not show all expenses incurred by Storj Labs, only what is related to STORJ token movements. Please note that Storj Labs has voluntarily shared more details than should be expected from a private company.
We have already explained repeatedly what the “other” category on our Token Balance and Flow reports entails. Once again, we exhort you to abstain from making remarks or accusations implying that somehow Storj is involved in illegal activities. If you continue to do so, you will find yourself suspended from this forum. Please take this warning seriously. If you have any actual evidence to prove any of your accusations, please post it to support your claims, otherwise abstain from further such statements.
This is just delegating part of the local node’s filesystem job to the satellite, adding unnecessary dependencies, and obscuring data access, making it harder for the local filesystem to optimize usage.
And on the node you potentially have an extra metadata lookup and a seek to the right place in the file, so you are saving nothing.
In the end you have increased load on the satellite, increased complexity by entangling unrelated data together, and in the best case maybe slightly reduced the already minimal load on the storage node filesystem.
If a customer needs to fetch specific data, the local solution would be the fastest, i.e. let the filesystem do its job, and don’t disrupt caching and other optimizations by hiding complexity inside opaque blobs.
Video content providers in your example are in fact splitting video files into small chunks to improve performance, which is the opposite of what you are suggesting.
The filesystem still stores data in sectors, so by increasing blob size you are likely adding another metadata lookup and another seek, completely defeating the point: instead of seek to the file → read data, you now have seek to the file, open the file, seek to the middle, read data. And in the process you obscure the access, so the filesystem can’t optimize anything – maybe there was a faster way to reach those bytes if they were in a separate file (just one example: very small files can live right next to metadata on an SSD), but now it can’t employ it.
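To make that concrete, here is roughly what the two access paths look like on the node side; the paths and sizes are made up and this is not actual storagenode code:

```go
package main

import (
	"fmt"
	"os"
)

// Current layout: each piece is its own file. One open, one sequential read.
func readPiece(path string) ([]byte, error) {
	return os.ReadFile(path)
}

// Blob layout: the same bytes now sit in the middle of a larger file, so the
// node first has to know the offset (extra metadata lookup) and then seek
// into the blob before reading (extra seek).
func readFromBlob(blobPath string, offset, length int64) ([]byte, error) {
	f, err := os.Open(blobPath)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	buf := make([]byte, length)
	if _, err := f.ReadAt(buf, offset); err != nil { // positioned read = seek + read
		return nil, err
	}
	return buf, nil
}

func main() {
	// Hypothetical paths, just to show the two call shapes side by side.
	if data, err := readPiece("blobs/ab/piece.sj1"); err == nil {
		fmt.Println("direct read:", len(data), "bytes")
	}
	if data, err := readFromBlob("blobs/packed.blob", 1<<20, 64<<10); err == nil {
		fmt.Println("blob read:", len(data), "bytes")
	}
}
```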
And also, yes, delegating the job of the local filesystem to the cloud is indeed a dubious proposition.
It does not matter. SSDs almost never fail compared to HDDs. And even if they did, the probability is still too small. And a 2x larger HDD has 2x more sectors to fail anyway.
They are not. We are talking about caching metadata. All metadata is more or less localized. When a few files’ metadata is read, that block is moved to the SSD, accelerating access to metadata for a bunch more files. Very shortly all metadata ends up on the SSD, and for the vast majority of files the first access is a fast access, because the metadata block got cached thanks to the other file fetches. A slightly smarter solution can get hints from the filesystem and proactively place the metadata block on the SSD in the first place. ZFS is an extreme version of this – you literally say: here is an SSD, put all metadata right here from the get-go.
This is exemplified by the plots I posted in the other thread of IOPS load on disks when running the file walker on a 10TB node: there is a very sharp (4k IOPS) spike at the beginning for a few seconds, and then the load rapidly drops for the subsequent 10 minutes of the file walker run. That’s metadata block caching in action.
Nobody ships Macs with HDDs anymore :), so this is no longer a problem that needs solving.
Since we are talking about Macs: have you noticed that low-end, low-storage Macs have about half the storage throughput and IOPS of the next larger storage tier? That’s because in the rest of the Macs, except the very cheap ones, storage is configured in the equivalent of RAID 0 to double performance. This is on the topic of worrying about the effect of adding another SSD on the reliability of the storage.
In my mind, if Apple is doing something for the important data of millions of customers, I can afford to do that for a storage node.
The delegation happens anyway, since the node doesn’t get a full file, so the work is already done on the satellite. Also, Storj already allows HTTP Range requests, so I can already retrieve part of a file if I want to.
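As far as I know, the Go uplink library already exposes this as offset/length options on a download. A rough sketch (bucket and key are made up; check the docs for the exact API):

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"

	"storj.io/uplink"
)

func main() {
	ctx := context.Background()

	// Access grant from the environment; bucket and key are made up.
	access, err := uplink.ParseAccess(os.Getenv("STORJ_ACCESS"))
	if err != nil {
		panic(err)
	}
	project, err := uplink.OpenProject(ctx, access)
	if err != nil {
		panic(err)
	}
	defer project.Close()

	// Ask only for a slice of the object: 64 KiB starting at 1 MiB,
	// which is the same idea as an HTTP Range request.
	download, err := project.DownloadObject(ctx, "videos", "movie.mp4",
		&uplink.DownloadOptions{Offset: 1 << 20, Length: 64 << 10})
	if err != nil {
		panic(err)
	}
	defer download.Close()

	n, err := io.Copy(io.Discard, download)
	if err != nil {
		panic(err)
	}
	fmt.Println("read", n, "bytes from the middle of the object")
}
```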
The idea would be to get closer to the recommended segment size of 60MB. This would mostly benefit the customer, as they would not pay for so many segments. If we take a website as an example, you’ll have lots of really small files of 10kB or smaller and some images of around 300kB. The small files may even stay on the satellite in metadata already. Packing it all together into one file, you would maybe have 20MB, which can be properly distributed to the nodes.
As for video providers, the splitting is only indirectly about performance. It is also done to adapt the bitrate to the connection speed you have: in case your connection suddenly gets slow, the player can decide to load some parts at a lower bitrate so that your playback doesn’t get disrupted.
The blob would not contradict this. How the data is stored is one thing; how it is retrieved is another. I can still retrieve chunks of video while the data is stored as a blob. As a user you won’t notice.
The only thing I can think of that this would help with is moving the node to another drive: rsync would be faster.
I know of two IPTV systems, one home-built by an STB manufacturer and the other flussonic. Both systems serve HLS, so the STB gets the video as a series of ~6-second segments. The home-built system stores the 6-second segments as separate files (easier to code), and flussonic stores hour-long segments with metadata files that are a few bytes.
The longer segments are much, much better. Let’s say the server is running out of disk space and I want to delete the recordings of one channel to free some space. Doing that on the server with lots of tiny files makes the server slow to a crawl and takes hours to delete, say, 200GB; it’s better done at night, with access to the server stopped to reduce the load on the drives and multiple deletes running in parallel. Deleting a channel on the flussonic server is very fast and does not slow the server down.
You can use Duplicati, Duplicacy, restic, or just rclone or uplink to upload compressed ZFS snapshots, for example (TrueNAS).
It’s not difficult to use the existing bindings, but it requires either using a different tool that already has a native integration (like those mentioned above) or implementing your own.
In the case of S3 you likely do not need to code anything, just change a few config values.
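A rough sketch of what that looks like with the AWS Go SDK, assuming the hosted gateway endpoint gateway.storjshare.io and placeholder credentials generated for your access grant; only the endpoint and credentials differ from a plain AWS setup:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// The only things that change compared to talking to AWS:
	// the endpoint and the credentials (generated for your access grant).
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("https://gateway.storjshare.io"),
		Region:           aws.String("us-east-1"), // the SDK needs a value; any region name should do here, as far as I know
		Credentials:      credentials.NewStaticCredentials("ACCESS_KEY", "SECRET_KEY", ""),
		S3ForcePathStyle: aws.Bool(true),
	}))

	svc := s3.New(sess)
	out, err := svc.ListBuckets(&s3.ListBucketsInput{})
	if err != nil {
		panic(err)
	}
	for _, b := range out.Buckets {
		fmt.Println(*b.Name)
	}
}
```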
My nodes are inconsequential in the large scheme of things. But we have seen people posting about having hundreds of nodes on different subnets running on a handful of servers, likely in the same location. We’ve also seen 300 nodes go down for maintenance simultaneously. Not a problem with the current RS settings, but any new settings need to be resilient to that as well.
I don’t think we’re splitting hairs about the analogy, but I’m glad we’re back to discussing Storj directly. It’s quite difficult, to be honest. Full native requires you to integrate the Storj library into your implementation. That library is basically only available in Golang. There are libraries for other languages, but they tend to just be wrappers around the Golang one. Alternatively, you could host your own local S3 gateway (Gateway-ST). This is much simpler, since it doesn’t require you to change existing code, but you do have an additional component to manage. I personally think this is likely the solution many people would go for if differentiated pricing is introduced. But whether it’s easy to implement depends on your architecture. Local S3 gateways could also resolve the issue of native implementations not being browser compatible. I could technically try setting that up again for my NAS, but why would I bother currently when there is a hosted gateway available for the same price? Also, I remember running into some issues with it when I tried a while back, so… I gave up on it.
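For reference, “integrating the Storj library” looks roughly like this in Go; the bucket and key are made up and I’m sketching from memory, so treat it as an illustration rather than a drop-in:

```go
package main

import (
	"context"
	"os"

	"storj.io/uplink"
)

func main() {
	ctx := context.Background()

	// Parse an access grant and open the project: this is the part every
	// "native" integration has to embed.
	access, err := uplink.ParseAccess(os.Getenv("STORJ_ACCESS"))
	if err != nil {
		panic(err)
	}
	project, err := uplink.OpenProject(ctx, access)
	if err != nil {
		panic(err)
	}
	defer project.Close()

	// Upload one object; bucket and key are made up for the example.
	upload, err := project.UploadObject(ctx, "backups", "hello.txt", nil)
	if err != nil {
		panic(err)
	}
	if _, err := upload.Write([]byte("hello from native uplink")); err != nil {
		_ = upload.Abort()
		panic(err)
	}
	if err := upload.Commit(); err != nil {
		panic(err)
	}
}
```

The wrappers for other languages essentially drive this same flow through bindings, which is why they lag behind the Go library.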
Sure, but you have to search for the ones that are native here: Guides to Using Third-Party Tools - Storj Docs
The doc unfortunately doesn’t differentiate in the list. I think less than half of what’s listed is native; the rest uses S3. That makes the list weirdly incomplete, because lots of software supports custom S3 endpoints, many more than what’s listed there at least. But without price differentiation, customers don’t really care about what they use.
That said, software that supports custom S3 endpoints should just as easily be able to use local gateways. Storj could publish ready-made Docker images and even installers for this gateway to make that setup a lot easier as well. But without differentiated prices… why bother?
I’d like to point you to:
And point out that you admitted yourself that you don’t have the information to substantiate that claim. When we’re in the dark, let’s just leave it at that and admit we don’t know. I hope you’re wrong, but I can’t know either. If anyone from Storj wants to comment on whether edge services users are profitable, I’d love to hear that. Until then, I’m not making any assumptions.
Fair enough. Just wanted to mention it in case someone else read it and drew the wrong conclusions. The rest of your point there was valid.
Well, those are the right questions. (Though I have to point out, in the context of Storj, that is still an analogy. )
It’s just that we don’t have the answers.
Is Storj running at a loss for unit economics when customers use edge services?
Does Storj have enough native customers to make up for that?
Does offering edge service help bring a healthy mix of edge services and native customers on board to grow profitable unit economics overall?
I simply don’t know. I hope so.
Even with the token flow reports that’s all we can do.
Ehh, I don’t know. There are plenty of much less transparent companies with a healthy community. Those communities just don’t involve themselves in trying to figure out the company’s financials. In a way, we can be pretty invasive and demanding around those topics. I happily welcome any transparency already offered. That doesn’t mean I will stop asking for more; it’s just from a position of appreciation of what’s already offered and a wish for more.
I think your use of the word ‘shady’ may have triggered that response. Though that is quite a negative interpretation of what you said. And I sure didn’t interpret your words that way. I also don’t really think it’s shady to keep financials of a private company private. It’s just unclear to us what it means.
There are tradeoffs for sure. But it might be worth it, since it will increase the speed of initial writes and of all file walker instances, including garbage collection, which needs to run past all files even if the majority of files are infrequently accessed by customers. I still think it’s worth looking into whether that tradeoff helps or hurts.
That’s simply not true. SSD annual failure rates are just below 1%, HDDs just over 1%. Yes, SSDs fail less, but not by enough to ignore this.
ZFS is far from the only solution that lets you pin all metadata to the cache. Most solutions I’ve used for SSD caching even do this by default.
That’s one way to look at it. The other is that it isn’t worth the tradeoff anymore. There are 24TB HDDs right now. If SSD caching or tiering were really such a good solution, it would make sense to still offer larger Fusion Drives. But the performance just isn’t up to snuff for today’s use cases anymore.
Also, Apple isn’t the be-all and end-all of good decisions. They’ve made mistakes plenty of times. Just because they do something doesn’t mean it’s the best choice.
Isn’t that always the way? People care about security until it becomes inconvenient to do so. But I mean, what better way to promote it than by saying, the best way to use Storj happens to also be the cheapest!