Fine, total offline time is more than 12h for 2020.
Current uptime over the last 30 days is less than 99.95%. How can we trust Tardigrade?
The uptime numbers cited usually exclude scheduled maintenance like today's.
Though the scheduled maintenance is supposedly done according to the status page…
But satellites still seem to be offline.
Yes, but total maintenance is already 12h. I can give links to the 4h+4h+4h windows.
I started monitoring them on January 22nd. I've not seen significant downtime since production launch.
europe-west-1 has the "worst" track record.
The earlier downtime of more than an hour was prior to production launch. And this one was scheduled, so it probably isn't going to count against the 99.95% number.
I don't know whether the January downtime was scheduled as well.
Other customer-facing satellites have had no downtime between January 22nd and now.
The strangest thing is: what idiot thought that complete unavailability of all services is normal?
Clients do not care whether downtime is planned or unplanned; they lose service either way.
So, unfortunately, the SLA should include any downtime that impacts availability for clients.
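To put rough numbers on the figures in this thread, here is a small sketch of the arithmetic. It assumes the 12h total claimed above and treats 2020 as a 366-day year; the function names are made up for illustration.

```python
# Illustrative arithmetic only; the 12 h figure is the total claimed in this thread.

def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Downtime budget, in minutes, that an availability target leaves over a period."""
    return (1 - sla_percent / 100) * period_hours * 60


def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Availability over a period if ALL downtime is counted, scheduled or not."""
    return (1 - downtime_hours / period_hours) * 100


HOURS_30_DAYS = 30 * 24
HOURS_2020 = 366 * 24  # 2020 is a leap year

print(round(allowed_downtime_minutes(99.95, HOURS_30_DAYS), 1))    # 21.6 minutes
print(round(allowed_downtime_minutes(99.95, HOURS_2020) / 60, 2))  # 4.39 hours
print(round(availability_percent(12, HOURS_2020), 3))              # 99.863 percent
```

So 99.95% leaves about 21.6 minutes per 30 days, and 12h counted in full over the year would come out at roughly 99.86%. Whether those hours count at all depends on what the SLA excludes.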
Unfortunately, this monitoring is not correct: responding to "ping" and "port is open" doesn't mean that the service is working.
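A check that goes beyond "ping" could, for example, do a small upload/download round trip. This is only a sketch: `upload` and `download` are hypothetical callables that you would wire to whatever client you use, not functions from any Storj or Tardigrade library, and the probe key is made up.

```python
import os
from typing import Callable


def round_trip_ok(upload: Callable[[str, bytes], None],
                  download: Callable[[str], bytes],
                  key: str = "healthcheck/probe") -> bool:
    """Return True only if a random payload can be stored and read back intact.

    `upload` and `download` are placeholders (assumptions) for real client calls.
    """
    payload = os.urandom(32)
    try:
        upload(key, payload)
        return download(key) == payload
    except Exception:
        # Any error during the round trip counts as "service not working".
        return False


# Stand-in backend for demonstration: a plain dict instead of a real service.
store: dict = {}
print(round_trip_ok(store.__setitem__, store.__getitem__))  # True
```

A monitor built this way reports a failure when the service accepts connections but cannot actually store or return data.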
I wasn't making a judgement, just telling how it probably is. In my experience scheduled downtime is excluded from these uptime numbers.
Agreed, best I have for now, but I've not seen any complaints about the service not working either.
It looks like they started a new maintenance window. The old one probably expired automatically at 10:00 UTC.
I also agree with you, any situation is possible; no one has seen the SLA from Storj Labs…
(remember, the devil is in the details!)
In my experience, if the agreement includes scheduled downtime and this time is excluded from the uptime figure, it should also be limited and have planned "maintenance windows".
9. Service Level Agreement ("SLA")
a. Company will use commercially reasonable efforts to meet the following service level commitment: except for scheduled maintenance, the Storage Services will be available 99.95% of the time. We calculate availability based upon the service records we maintain. We will use reasonable efforts to notify you in advance of any scheduled maintenance.
i. Our SLA obligations do not extend to any unavailability of the Storage Services that is caused by: (i) any hardware or software that you use in connection with the Storage Services; (ii) misuse of our Storage Services, including use in breach of the Agreement or use other than in accordance with any content or Documentation or other instructions provided by Company; (iii) circumstances or events beyond the reasonable control of Company; (iv) maintenance or scheduled downtime; or (v) our suspension or termination of your access to the Storage Services pursuant to the rights we have reserved under the Storj agreements.
- Scheduled Downtime. Scheduled Downtime will generally occur during the Maintenance Windows. Company will endeavor to provide notice at least eight hours in advance of any scheduled downtime occurring outside of the Maintenance Windows.
- Maintenance Windows. Company has an optional weekly maintenance window on Sundays from 2:00 a.m. EST/EDT to 6:00 a.m. EST/EDT during which scheduled maintenance, upgrades and repairs can occur.
- Usage of the maintenance window is scheduled according to the Company release calendar.
- Company may also perform emergency maintenance in a non-standard maintenance window.
- Company will use commercially reasonable efforts to perform emergency maintenance at the time of lowest use levels, as determined by web use logs from the previous month.
2. Emergency maintenance windows will last no longer than four (4) hours.
3. Company reserves the right to use two (2) emergency (non-scheduled) maintenance windows per year. Emergency maintenance beyond these two (2) additional windows will be considered downtime.
4. Company will inform User about all relevant changes planned for the upcoming maintenance window no less than one (1) week prior to the maintenance window.
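As a rough illustration of the weekly window quoted above (Sundays, 2:00 a.m. to 6:00 a.m. Eastern), the following sketch checks whether an outage falls entirely inside it. It assumes both timestamps are naive datetimes already expressed in US Eastern local time, and the dates in the example are illustrative, not the actual maintenance dates.

```python
from datetime import datetime, time

WINDOW_START = time(2, 0)  # 2:00 a.m. Eastern, per the quoted SLA
WINDOW_END = time(6, 0)    # 6:00 a.m. Eastern, per the quoted SLA
SUNDAY = 6                 # datetime.weekday(): Monday is 0, Sunday is 6


def in_weekly_window(start: datetime, end: datetime) -> bool:
    """True if [start, end] lies entirely inside the Sunday 2:00-6:00 window.

    Assumption: both values are naive datetimes in US Eastern local time.
    """
    return (
        start <= end
        and start.date() == end.date()
        and start.weekday() == SUNDAY
        and WINDOW_START <= start.time()
        and end.time() <= WINDOW_END
    )


# Illustrative dates only (2020-03-22 was a Sunday).
print(in_weekly_window(datetime(2020, 3, 22, 2, 0), datetime(2020, 3, 22, 6, 0)))   # True
print(in_weekly_window(datetime(2020, 3, 22, 2, 0), datetime(2020, 3, 22, 10, 0)))  # False
```

Anything that runs past 6:00 a.m. fails the check, which is the case where the emergency-window terms would come into play.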
This downtime was announced several weeks in advance.
I'm not saying this is the perfect way to do it, but they're following their own SLA and not breaking it like the topic title suggests.
The ToS mentions scheduled downtime, which shall not be more than 12 hours per year.
Hey friends!
I just wanted to jump in now that I'm awake again to add a bit of context about this specific scheduled downtime and what we expect going forward!
First off, I need to reread the SLA, but the intention was that we have two different types of downtime.
One is 4 hours on Sunday morning, 2-6am eastern US time, which requires a week or more of advance notice. I understand and agree with the point that the customer doesn't care if it's scheduled or not, so we will be working to eliminate these, but as it stands, 2-6am eastern US time downtime, if notified in advance, does not count against the current SLA.
The second type of downtime we also used today, which is emergency downtime, which we have a small maximum of per year. I need to confirm but I think we ate half of our entire yearly budget, due to the migration running long.
This was an exceptional migration that we don't expect to happen again going forward. Our plan prior to production was to use Cockroach DB for object metadata, but due to time constraints we did not make that migration happen prior to production. All object metadata is now on Cockroach DB, which is what we intended from long before production launch, and we do not expect to change this again going forward.
Thank you all for your patience with this migration! We're really excited by the new performance and scalability characteristics now available to Satellites with the new Cockroach DB backend. It will be a much more scalable service going forward!
For example, Google Cloud does maintenance in such a way that end users don't even notice interruptions.
Why can't you do maintenance without user impact? As I understand it, bad architecture is the cause. Here everything is bad; even on SSDs the DB locks up because of a bad structure.
Bla-bla-bla… In normal companies, there are generally closed test servers for "try this, try that…". This is PRODUCTION, you CAN'T be so irresponsible. It seems to me that you have no clients because of your bad attitude, and new ones don't want to come.
Thanks for the additional information. I noticed it running long, that's unfortunate. I'd like to give a tip: make sure that in such cases mails don't go out saying the service is back online at the end of the planned maintenance window. This caused a bit of confusion. It would be nice to get an email about the maintenance running long at that moment instead.
Wouldn't getting an email about maintenance being complete mean service is back up?
Yes, but it wasn't. That was the problem. The mail went out when the planned window ended, but before the work was done.
But to take several satellites out of service at one time? This is very much like centralization…
Where's the difference? Your data is only accessible through one satellite anyway.