Updates on Test Data

The customer didn’t know that they would require SOC 2. The requirement came from their security department, which popped up at the last second after all the paperwork was done. And of course we did not suggest Storj Select before that, and everyone was sure that Storj Public was the best fit. But I hope the lesson has been learned.
However, I think it’s tricky. If you ask, they may choose Storj Select instead of Storj Public from the beginning even if they don’t actually require SOC 2. And I think it would be costly to switch later (in either direction, actually).

1 Like

Storj offers two different networks for customers with different requirements. Part of the sales process for a customer with such a huge storage requirement should be to clarify which solution is best suited, not just performance-wise but also from a legal perspective.

I think Storj needs to be transparent about the advantages and disadvantages of the different networks. And this is not just about performance; it is about compliance. And if the Public network is not compliant, I don’t see any long-term advantage in hiding this fact from a customer.
And why should they choose the Select network if they don’t need its certification?
I am quoting you:

If the Select network is more expensive and they don’t really need the compliance, then they would normally not sign up for it.
But this is also why I am saying it should be easy to move the data around. So even if they signed up for the Select network first, when they realize they don’t need compliant storage for their data, or parts of it, they could simply move it to the Public network and save money.

1 Like

The customer was aware of the existence of Storj Select. I didn’t participate in the negotiations; however, I believe that everything was discussed. This was a last-minute change, so I do believe that our Sales and Engineering teams did a good job onboarding them.

We are. All options are publicly available. And I believe all advantages and disadvantages were discussed, especially when the SOC 2 requirement was put on the table.
However, I do not think that starting with Storj Select as option 1 is a good move for every deal, but that’s my opinion.

There are difficulties: Storj Select is not connected to the Public network. Bridging them is not easy, for legal reasons and due to technical limitations.
So someone has to pay for that transfer: either Storj (using special repair workers, which can connect to both networks) or the customer, by re-downloading and re-uploading their data, or more likely by using our partners, who can do data migrations for them.

This is fairly normal for customers, even large enterprise customers. Customers typically do not have a list of security requirements (some do, especially government-related entities).
There are usually two types of enterprise customers:

  1. A business unit such as a marketing department is trying to find a solution that the enterprise IT department isn’t able to offer (or is not a good fit for some reason).
  2. The IT department of an enterprise looking to offer new capabilities or reduce costs.

In the first case, the risk officer (or IT security group) is not in the loop until the contract is ready to be signed; I have been through this many times. As internal risk management, we try to train other departments to involve us very early.

In the second case, it is far more likely that the security or risk team is involved early and knows to be part of the early conversations and option evaluations.

In either case, there is rarely a hard ‘requirement’ to have a vendor provide a SOC 2. Internally, the discussion between security and the department wanting a new vendor or service goes like this:

  • What kind of data is involved? (we then figure out the risk level)
  • Is there system integration involved with existing IT systems? (more risk assessment)
  • Depending on the level of risk, we ask: how secure is the vendor’s system?
  • Next, how can they prove it? (If they have SOC 2, ISO 27001, or NIST attestations, this makes it much easier.)
  • If the project needs a secure system and the vendor doesn’t have good proof, we can still consider the proposal, but the amount of effort is very, very high, and the vendor’s service had better offer us extremely high value.

I have also been in situations where some department simply used a credit card to onboard a service without security approval but then needed privileged access. Oftentimes we simply could not grant them access due to the risk. (Grammarly.com was an example where internal corporate information was collected on their sites and they could not provide reasonable security assurances.)

The bottom line is that it is not reasonable to expect a customer to fully understand their own security requirements in advance. Security is not binary. Security is expensive and is always a balance between risk and value. A competent risk or security team will have a good understanding of corporate risk appetite and business needs. New and unique systems, including Storj, are considered high risk for sensitive data due to a lack of industry-wide adoption. In some cases a company’s cyber insurance policy may not cover or allow such services.

7 Likes

Can somebody point me to what the Storj Select network is? Is it a secret?

Did I miss something important?

I wonder if something like this page https://stats.storjshare.io exists for the Select network, or any other information about its current size?

Yes, I am wondering if there are any statistics for Storj Select.

Select doesn’t operate like the public side. Each customer that uses it has specific needs, and it is scoped around those needs only. Whereas the public side may have nodes all over the world with common redundancy, if a customer doesn’t need that level of redundancy or distribution, it won’t necessarily be built out for them that way in Select. So, even if you were able to compare the two, it would be apples to oranges.

2 Likes

For me it is good that this customer chose different parameters. It is better to keep this kind of low-TTL traffic off the public network because it is simply not profitable given the current rules. Some of my nodes are still, after more than 30 days, busy deleting data matched by bloom filters (BFs). I realise this is because of past bugs in testing, but an IOP is an IOP after all.

1 Like

Deletions (file unlink, specifically) are not IOPS-bound.

Deletions are updates, and updates are writes. Is any write not limited by IOPS?

As it turns out, no: Unlink performance on FreeBSD

Edit: to be more precise: yes, writes are limited by IOPS, but deletions are not bottlenecked by write performance.

That doesn’t show how IOPS affect unlink performance, since (it doesn’t appear) you varied your hardware IOPS between test runs. You simply looked at the 20/sec that were hitting your disk and guessed faster wouldn’t be better. Do a bulk delete from an SSD, then delete the same set of files from an HDD: if you get broadly similar performance, you can say IOPS don’t affect unlink.
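
For what it’s worth, here is a minimal sketch of that experiment as a small Go program (a hypothetical throwaway tool, not storagenode code; it ignores page-cache and filesystem-sync effects, so treat the numbers as rough). Run it once against an SSD-backed directory and once against an HDD-backed one and compare the rates:

```go
// unlinkbench: create <count> small files in <dir>, then time a bulk
// unlink of all of them. Comparing the files/sec rate between an
// SSD-backed and an HDD-backed directory shows how much the device's
// IOPS actually matter for unlink.
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strconv"
	"time"
)

func main() {
	if len(os.Args) != 3 {
		log.Fatalf("usage: %s <dir> <count>", os.Args[0])
	}
	dir := os.Args[1]
	n, err := strconv.Atoi(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}

	// Create n small files to delete.
	for i := 0; i < n; i++ {
		name := filepath.Join(dir, fmt.Sprintf("f%08d", i))
		if err := os.WriteFile(name, []byte("x"), 0o644); err != nil {
			log.Fatal(err)
		}
	}

	// Time the bulk unlink.
	start := time.Now()
	for i := 0; i < n; i++ {
		name := filepath.Join(dir, fmt.Sprintf("f%08d", i))
		if err := os.Remove(name); err != nil {
			log.Fatal(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("unlinked %d files in %v (%.0f files/sec)\n",
		n, elapsed, float64(n)/elapsed.Seconds())
}
```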

However…

When I was testing ZFS metadata-device sizing, I unpacked the same tar’d node many times and deleted it. The SSD setups smoked the HDD on delete speeds. So, same set of millions of files… only the hardware capabilities changed. Both were SATA devices on the same controller. I didn’t test like you did, but my feeling is that IOPS very much matter for any update to any filesystem on any OS.

Edit: I think I was typing while you edited, so now I’ll edit :slight_smile: - yeah, a node isn’t deleting enough every day for delete speeds to be an issue: totally agree!

1 Like

No. I looked at the time spent in the call and concluded that if IOPS were the bottleneck, the delete rate would have risen until that bottleneck was reached.

It was a bulk delete (by a storagenode) from an SSD — the special device. All delete traffic went to the SSD, which was not even close to half loaded. I only mentioned 20 IOPS on the HDD to defuse any objection that it’s IOPS-limited on the HDD. All metadata is on a 2 TB Intel NVMe SSD.

Right. Of course slow IOPS will affect deletes once the drive’s capabilities are reached. But in this case, deletes are not going fast enough for IOPS to be the problem. Hence, there is something else being done synchronously or sequentially that wastes time. Depending on which one it is — deleting in parallel may or may not help (see the sketch below).
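
For what it’s worth, here is a minimal sketch of how one could distinguish the two cases (a hypothetical throwaway tool, not storagenode code): if running several workers multiplies the delete rate, the serial path was latency-bound and parallelism helps; if the rate stays flat, something else is serializing the work.

```go
// parallel-unlink: delete every file in a directory using W workers.
// Compare the files/sec rate at workers = 1, 2, 4, 8 to see whether
// deletes are bottlenecked by per-call latency (rate scales up) or by
// something that serializes the work (rate stays flat).
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"sync"
	"time"
)

func main() {
	const workers = 8 // try 1, 2, 4, 8 and compare

	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <dir>", os.Args[0])
	}
	dir := os.Args[1]

	entries, err := os.ReadDir(dir)
	if err != nil {
		log.Fatal(err)
	}

	// Feed all file names to the workers over a channel.
	names := make(chan string, len(entries))
	total := 0
	for _, e := range entries {
		if !e.IsDir() {
			names <- filepath.Join(dir, e.Name())
			total++
		}
	}
	close(names)

	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range names {
				if err := os.Remove(name); err != nil {
					log.Println(err)
				}
			}
		}()
	}
	wg.Wait()

	elapsed := time.Since(start)
	fmt.Printf("%d workers: %d files in %v (%.0f files/sec)\n",
		workers, total, elapsed, float64(total)/elapsed.Seconds())
}
```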

But either way — 200 files per second is way too slow.

Side note — it’s not CPU-bound either — I doubled the CPU clock and it made no difference.

It could have been limited by memory, cache flushes, or something else (this is a dual-Xeon system), but I have not had time to investigate further yet.

1 Like

The non-TTL data isn’t much better. Have a look in the trash folders: most files there are not even 30 days old… :wink:

SNOs should have no say over when customers choose to delete their files. If they’re paying the bills… they can delete them… by TTL or by hand… whenever they want.

4 Likes

SNOs HAVE no say. We can take it as it is or leave it.

6 Likes

What a coincidence, 10 days ago:

So now we are part of Skynet?
Hmm… if I decide not to GE nicely, will an AI drone nuke my house?
The half-cloud logo should have a robot eye in it…