Why did the delete expire?

Storgeez · March 11, 2021, 1:19am

What does “delete expired” log entry mean and why did the poor delete die?

nerdatwork · March 11, 2021, 2:05am

My best guess: deletes have to be executed within a given time, or they expire and are marked for garbage collection.

Storgeez · March 11, 2021, 2:29am

That’s what I thought, but it seems completely illogical, since there is no alternative: the piece will never stop needing to be deleted. It’s not as if the owner could stop wanting the delete, or the node could be too late to delete.

kevink · March 11, 2021, 6:08am

Best guess is it will expire so your node doesn’t get overwhelmed with pending delete requests. It will get cleaned up by the garbage collection eventually. Disadvantage: It will stay in trash for days after being collected.

Storgeez · March 11, 2021, 4:47pm

Right, yeah, rescheduling makes sense if deletes take so long that they cause a huge bottleneck. But depending on the given timeframe, if a delete takes so long that it times out, a node running on that setup might be hopeless anyway. At least it makes some sense now.

littleskunk · March 12, 2021, 8:07pm

Did you consider that files could have an optional expire date?

kevink · March 12, 2021, 8:25pm

Why don’t you answer that question for us? Is there such a feature yet?

Pac · March 12, 2021, 10:23pm

Well if the node software can do that for deletes, I wanna know why it couldn’t do the same for ingress requests… In order to prevent SMR disks to stall and crash the whole thing!

I know it’s a bit different as you can’t reschedule an ingress request, like you do for deletes, but it could be rejected/canceled or something…

littleskunk · March 13, 2021, 1:43am

I thought I did

Customers can upload files with an expire date. There will be no delete message. When the time comes, storage nodes simply delete them: “delete expired” (files).
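
To make the mechanism concrete, here is a minimal sketch of the idea, not the actual storagenode code. The `Piece` class, the `delete_expired` function and the in-memory `store` are all hypothetical names made up for illustration; the only thing taken from the description above is that a piece can carry an optional expiration date and the node removes it on its own once that date has passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Piece:
    piece_id: str
    expires_at: Optional[datetime] = None  # None means "no expiration date"


def delete_expired(pieces: Dict[str, Piece], now: datetime) -> List[str]:
    """Remove every piece whose expiration date has passed; return their IDs."""
    expired = [
        pid for pid, piece in pieces.items()
        if piece.expires_at is not None and piece.expires_at <= now
    ]
    for pid in expired:
        del pieces[pid]  # hypothetical: no delete message from outside is needed
    return expired


now = datetime(2021, 3, 13, tzinfo=timezone.utc)
store = {
    "a": Piece("a", now - timedelta(hours=1)),  # already expired
    "b": Piece("b", now + timedelta(days=30)),  # expires later
    "c": Piece("c"),                            # never expires
}
removed = delete_expired(store, now)
print("delete expired:", removed)
```

Only piece "a" is removed here; "b" and "c" stay until their own date comes (or forever, in the case of "c").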

kevink · March 13, 2021, 5:43am

Alright, thanks for answering that! (Maybe you did answer it and I just forgot…)

kevink · March 13, 2021, 5:45am

Ingress and egress requests do have a natural “expiration”: They expire once the customer got enough successful uploads/downloads.
But since my assumption about expired deletes was wrong, there’s no delete expiration either (even though I thought I read something like it). Anyway, even under that assumption it was not a limit on the number of deletes, just a time limit, which ingress/egress has naturally due to the race with other nodes.
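
As a rough illustration of that race (a toy model, not how the real transfer code works): the node names, the latencies and the `needed` count below are invented numbers, and the `race` function is hypothetical. The point is only that the slowest nodes never finish, because the request is cancelled once enough others have succeeded.

```python
from typing import Dict, List, Tuple


def race(latencies_ms: Dict[str, int], needed: int) -> Tuple[List[str], List[str]]:
    """Split nodes into those that finish in time and those that get cancelled."""
    order = sorted(latencies_ms, key=latencies_ms.get)  # fastest first
    return order[:needed], order[needed:]


# Hypothetical completion times for one piece upload, in milliseconds.
nodes = {"fast": 40, "medium": 90, "smr_stalling": 900, "slow_link": 300}
winners, cancelled = race(nodes, needed=2)
print("stored on:", winners)
print("expired for:", cancelled)
```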

Pac · March 13, 2021, 10:02am

Right okay ^^

Well, maybe things have changed since, but back in the days when ingress was massive (100GB+ per day), these “natural expirations” did not work at all on my single SMR node at the time.
I’m not sure exactly why, but my assumption is that the system was accepting ingress requests, putting them in RAM, and acknowledging the received pieces as successful although they hadn’t been written to disk yet.
This caused RAM usage to grow continuously while the SMR drive was stalling like crazy because it could not keep up, up to the point where RAM would eventually be full (with iowait around 200+) and the OOM Killer would kill the node. That is obviously the worst-case scenario, as I suspect it would make the node lose some pieces. Then docker would restart the node, and the cycle would repeat…

Again, maybe things changed (they at least improved as some enhancements were made to put less pressure on disks), but if the node software is still not waiting for disks to actually write data down, I believe these “natural expirations” for ingress don’t really exist at the disk level. They do exist at the Internet bandwidth level (i.e. if your Internet connection isn’t fast enough, you might lose some races), but as long as a RAM cache is used to store ingress pieces temporarily before writing them to disk, then RAM is the bottleneck, not the disk, until we run out of RAM.

In fact, putting pieces in RAM so the node is quick is a very good approach, I think it should be kept. Cache is useful, it makes IO operations way smoother, so that’s good. I just think it should be capped, for instance 512MiB could be dedicated to ingress cache, and when it’s full, it means the storage medium cannot keep up, so the node software should stop accepting new pieces.
For instance… Engineers know better ^^
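
Something like the following is what I have in mind, as a very rough sketch. Everything here is hypothetical (the `IngressCache` class, its method names, the tiny capacity used in the example); it only shows the principle of a capped buffer that refuses new pieces until the disk has caught up.

```python
from collections import deque
from typing import Callable, Deque


class IngressCache:
    """Bounded in-memory buffer for pieces waiting to be written to disk."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.pending: Deque[bytes] = deque()

    def accept(self, piece: bytes) -> bool:
        """Buffer the piece, or return False if the cap would be exceeded."""
        if self.used + len(piece) > self.capacity:
            return False  # storage cannot keep up: reject the upload
        self.pending.append(piece)
        self.used += len(piece)
        return True

    def flush_one(self, write: Callable[[bytes], None]) -> bool:
        """Write the oldest buffered piece out and free its space."""
        if not self.pending:
            return False
        piece = self.pending.popleft()
        write(piece)
        self.used -= len(piece)
        return True


cache = IngressCache(capacity_bytes=10)  # 10 bytes here; think 512MiB in practice
written = []
first = cache.accept(b"12345")       # fits
second = cache.accept(b"1234567")    # would exceed the cap, so it is rejected
cache.flush_one(written.append)      # the "disk" catches up
third = cache.accept(b"1234567")     # now there is room again
print(first, second, third)
```

With a cap like this, RAM usage stays bounded and the slow disk shows up as rejected uploads instead of an OOM kill.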

I hope I’m wrong, feel free to chime in and point out what’s wrong in my reasoning
(Sorry for being a bit off topic)

jammerdan · March 13, 2021, 10:53am

That’s interesting to read.
I wonder if the node could detect such a condition before the Docker container crashes and maybe report 0 free space to the satellite temporarily so it can catch up.

kevink · March 13, 2021, 11:35am

There’s no argument against that… natural expirations don’t solve the problem of overwhelming SMRs with too many requests.

That assumption is correct imo. The writes to the HDD are async, so they are stored in RAM and only flushed to disk every couple seconds. So the RAM cache grows while the SMR drive can’t keep up. Eventually leading to crashes and possibly data loss (but I think it would just kill the node and the cache will still be written to the SMR as that cache is not bound to the node but to the write operations from the os)
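
To show what I mean by async versus waiting for the disk, here is a small generic sketch. It is not taken from the storagenode code; `write_piece` and the file names are made up. A plain write returns as soon as the data sits in the OS cache, while the `os.fsync` variant blocks until the OS reports the data has been flushed to the device.

```python
import os
import tempfile


def write_piece(path: str, data: bytes, durable: bool) -> None:
    """Write a piece; if durable, block until the OS reports it is flushed."""
    with open(path, "wb") as f:
        f.write(data)  # buffered: returns before the data reaches the disk
        if durable:
            f.flush()             # push Python's buffer down to the OS
            os.fsync(f.fileno())  # ask the OS to flush to the device; may block


with tempfile.TemporaryDirectory() as tmp:
    fast = os.path.join(tmp, "async.piece")
    safe = os.path.join(tmp, "sync.piece")
    write_piece(fast, b"piece data", durable=False)  # fast, cached in RAM
    write_piece(safe, b"piece data", durable=True)   # slower, waits for the disk
    with open(safe, "rb") as f:
        stored = f.read()
print(stored)
```

On a stalling SMR drive the first call stays fast while the cache fills up; the second one would slow down to the speed of the drive.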

That would certainly be an interesting solution but one that needs to be developed a lot. Currently (afaik) they only use normal async write operations so the OS takes care of caching those in RAM.

If you’re using zfs, you could try setting the dataset to sync=always and only assign a slog with 512MB. Then all writes will be cached in RAM and the SLOG. I’m not 100% sure what happens if that SLOG gets filled. Afaik the SLOG replaces the ZIL on disk so I would expect that once the SLOG is full, applications will have to wait until some space gets free again. If that’s the case, you’d have your 512MB cache and after that is full, pieces would naturally expire as the storagenode would not get a confirmation from the drive that the write operation succeeded.
Note: I can’t find anything yet about what happens when the SLOG is full, but sync operations wait for a confirmation from the underlying storage system, so in the worst case it would wait until the SMR drive confirms the write operation. This should work in this case.
Also Note: The SLOG isn’t actually a write cache. The described application is a somewhat strange way of using a SLOG and I haven’t tested it, it’s just theoretical. Feel free to test it though

Storgeez · March 14, 2021, 4:21pm

So it’s actually data expiring, not the request? In that case it should say “piece expired”, not “delete expired”, the latter is extremely confusing lol. Or perhaps it’s an imperative (but that wouldn’t make sense in a log): “Delete expired!”?