Ok so, sorry about the tone, but I am really angry about this. All I can say is that I find the current situation deeply unsatisfying. Honestly, quite a disaster has unfolded and management is silent. I get an uneasy sense that they are detached from the reality on the nodes.
Yes, technical team members are on the threads actively trying to straighten things out, but they also make clear that they speak for themselves, not for the company, and not for management. Where is the "head of something" making a statement about the various issues we are plagued with, or assuring us that progress on the fixes is closely monitored at the top level and that steps are being taken to prevent this from happening again? That would be leadership from management.
I have been with Storj for some time now and I cannot remember a situation like this one, with so many issues that you ask yourself whether Storj is up to par, or even capable of handling its own code.
There is the issue with a fundamental core function of the network: it failed to delete hundreds of terabytes of customer data from the nodes. This could affect customers who rest assured that their data was erased, and could even pose legal issues. Node earnings are of course impacted too: nodes are full but not paid for the space used, and on top of that they cannot receive ingress because they are considered full. In the end even the network cannot ingest data as expected when nodes report as full while in reality there is plenty of space, because the nodes are filled with garbage.
This was caused not by a single implementation but by several that proved inadequate, which makes it appear as if they were designed around a completely different idea of how the network behaves than how it actually behaves. It looks as if there was not enough understanding, anticipation, testing and monitoring.
Now, watching how things are getting fixed, I see new issues surfacing left and right. Nodes were unintentionally downgraded, which created new risks, as downgrades usually are not tested. Thankfully the next disaster, a tsunami of nodes broken by that, did not happen, but it seems we were just lucky.
And the fixes thrown at us bring new issues, some of them even reintroducing the same kind of problems, like:
Filewalkers not updating the databases, so ingress stops again for some nodes
A total mess with the trash date folders, so again data does not get deleted from the nodes and again nodes get no ingress
Bloom filters getting deleted again instead of being stored and resumed, which prevents deletions and, again, means no ingress
Resuming of the used-space filewalker not implemented yet, which means that for some nodes the garbage gets deleted but the used space does not get corrected, so still no ingress.
And SNOs have to find workarounds for these new issues left and right.
Of course management can choose what to comment on and pretend that this is how it should be, but that is just sad. This is not the quality I felt Storj was delivering in the past, and if I were a customer, it is certainly not the quality I would trust with my encryption passphrases, which Storj plans to store for its customers. So for myself I can say that this situation does not feel normal and costs reputation. And I ask myself:
Is management aware of all of this, and of core functions of the network failing on multiple levels? Do they even consider this normal?
How was it possible that the code was not prepared for growing and larger nodes? Why was this not anticipated, and constantly monitored and tested as the nodes grew?
Is there enough testing in general? Does Storj itself run enough nodes of different sizes, ages, speeds, hardware and software to test properly?
Is monitoring sufficient? Why is it always the node operators who detect the new issues?
Are coding processes up to par, or do they need refinement?
Are enough developer resources allocated?
Is there a roadmap for the node software with clear operational goals, and plans for what performance should be achieved and when?
And if they agree that the current situation is not normal, not what it should be, and needs improvement, then I ask: what are their plans to prevent a case like this and regain reputation?
External audits for code and processes?
Hackathons?
Add developer resources?
Provide better tools for devs and SNOs so issues can be detected, investigated and reported better and faster? (The first reports that something was wrong with the used space were posted in the forum a year ago.)
But it is not my responsibility, nor do I of course have the insight into the current mess, to make adequate suggestions and introduce the required changes. That is management's duty. All I can say is that it does not feel the way it should.
And before anybody gets me wrong: I see the work of the devs and their attempts to fix things, and at least in theory some of the things that were pushed out look promising. But to put it in the words of another SNO: