Delay between PutObject and GetObject in updating the timestamp?

Hi all,

I’m using kopia to back up my computer and servers to a Storj bucket through the S3 Gateway-ST I’m running on a VPS. I have a specific issue that keeps coming up when running repository maintenance. Kopia relies on the passage of time to keep things concurrency-safe, and it gives me an error when I try to perform maintenance:

ERROR error checking for clock skew: Clock skew detected: local clock is out of sync with repository timestamp by more than allowed 5m0s (local: 2023-10-08 20:48:59.759323512 +0000 UTC repository: 2023-10-08 09:51:07 +0000 UTC)

kopia will put an object “kopia.maintenance” and do a GetMetadata on it to compare its timestamp against the local system clock, checking for a large difference (± 5 minutes) as a safety feature. I believe the issue is caused by the Storj gateway, because it returns the old metadata and timestamp after the new object is put, and I sometimes see rate limit errors in the gateway log output:
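
For reference, the check boils down to “write a marker blob, read its server-side timestamp back, compare with the local clock”. Here is a minimal Go sketch of that flow, assuming a hypothetical storage client with PutBlob/GetMetadata methods (the names and types are illustrative only, not kopia’s actual internals):

package skewcheck

import (
	"fmt"
	"time"
)

// Metadata is the server-side view of a stored blob.
type Metadata struct {
	ID        string
	Length    int64
	Timestamp time.Time // last-modified time reported by the storage backend
}

// storage is a stand-in for the blob storage client; illustrative only.
type storage interface {
	PutBlob(id string, data []byte) error
	GetMetadata(id string) (Metadata, error)
}

const maxClockSkew = 5 * time.Minute

// checkClockSkew writes the marker blob, reads its server-side timestamp
// back, and compares it with the local clock.
func checkClockSkew(st storage) error {
	if err := st.PutBlob("kopia.maintenance", []byte("marker")); err != nil {
		return err
	}
	md, err := st.GetMetadata("kopia.maintenance")
	if err != nil {
		return err
	}
	skew := time.Since(md.Timestamp)
	if skew < 0 {
		skew = -skew
	}
	if skew > maxClockSkew {
		return fmt.Errorf("clock skew detected: local and repository differ by %v (max allowed %v)", skew, maxClockSkew)
	}
	return nil
}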

API: GetObject(bucket=kopia, object=testing/kopia.maintenance)
Time: 17:51:20 AEDT 10/09/2023
DeploymentID: <redacted>
RequestID: 178C5DC9AD3147BF
RemoteHost: <redacted>
Host: <redacted>:7777
UserAgent: MinIO (linux; amd64) minio-go/v7.0.63
Error: io: read/write on closed pipe (*errors.errorString)
       4: /go/pkg/mod/storj.io/minio@v0.0.0-20230627070725-370d887aacde/cmd/api-errors.go:2012:cmd.toAPIErrorCode()
       3: /go/pkg/mod/storj.io/minio@v0.0.0-20230627070725-370d887aacde/cmd/api-errors.go:2037:cmd.ToAPIError()
       2: /go/pkg/mod/storj.io/minio@v0.0.0-20230627070725-370d887aacde/cmd/object-handlers.go:510:cmd.ObjectAPIHandlers.GetObjectHandler()
       1: net/http/server.go:2122:http.HandlerFunc.ServeHTTP()

As a temporary workaround I’ve added a 3 second sleep timer to the kopia maintenance routine between the Put and getMetadata for “kopia.maintenance” and don’t have any more errors. What do you think is the issue here? Why does the S3 gateway return the old metadata when queried too quickly?

Hello @AussieNick,
Welcome to the forum!

For me it looks like your local clock is out of sync. You need to sync it to have the same time as on the VPS.
It also seems that the time zone is configured incorrectly, with an 11-hour difference?

Well yes, my local timezone is UTC +11h. I tried changing the server timezone to UTC and that didn’t help. Looking into the logs a bit more, here is exactly what’s going on.

2023-10-09T05:19:21.960835Z DEBUG kopia/repo [STORAGE] PutBlob  {"blobID":"kopia.maintenance","length":3319,"error":null,"duration":"360.149637ms"}
2023-10-09T05:19:21.961540Z DEBUG kopia/repo [STORAGE] GetMetadata      {"blobID":"kopia.maintenance","result":{"id":"kopia.maintenance","length":3309,"timestamp":"2023-10-08T21:11:00Z"},"error":null,"duration":"634.813µs"}

It does a Put then a GetMetadata on the same object within a few milliseconds and the old timestamp is returned.

This is what happens with the 3 second delay added.

2023-10-09T06:51:15.525387Z DEBUG kopia/repo [STORAGE] PutBlob  {"blobID":"kopia.maintenance","length":1634,"error":null,"duration":"422.906842ms"}
2023-10-09T06:51:18.566661Z DEBUG kopia/repo [STORAGE] GetMetadata      {"blobID":"kopia.maintenance","result":{"id":"kopia.maintenance","length":1634,"timestamp":"2023-10-09T06:51:15Z"},"error":null,"duration":"39.375456ms"}

The timestamp is shown correctly. Is this still because of the system time being out of sync? I’ve compared the time between every reference clock I can find and it’s in sync down to the second.
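
In case it helps to reproduce this outside of kopia, here is a rough standalone test against the gateway using minio-go (the endpoint, bucket and credentials are placeholders): it overwrites an object, stats it immediately, then stats it again after the 3-second delay and prints both LastModified values.

package main

import (
	"bytes"
	"context"
	"fmt"
	"log"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()

	// Placeholder endpoint and credentials for the local Gateway-ST.
	client, err := minio.New("localhost:7777", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: false,
	})
	if err != nil {
		log.Fatal(err)
	}

	bucket, object := "kopia", "testing/kopia.maintenance"
	payload := []byte(fmt.Sprintf("written at %s", time.Now().UTC()))

	// Overwrite the object, like kopia's PutBlob does.
	_, err = client.PutObject(ctx, bucket, object,
		bytes.NewReader(payload), int64(len(payload)), minio.PutObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Stat immediately; this is where the stale timestamp shows up for me.
	info, err := client.StatObject(ctx, bucket, object, minio.StatObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("immediate LastModified:", info.LastModified, "size:", info.Size)

	// Stat again after the 3-second delay that works around the problem.
	time.Sleep(3 * time.Second)
	info, err = client.StatObject(ctx, bucket, object, minio.StatObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("delayed   LastModified:", info.LastModified, "size:", info.Size)
}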

Could you please try to run Gateway-ST on the same machine to rule out the time shift completely?
If there are no clock sync issues, then it is perhaps caching metadata, especially when it replaces a previous object with the same key (name).

Storj should never return old metadata after an object is put.

The 634.813µs GetMetadata call is suspiciously fast and must be coming from a now-stale cache. Your 3-second delay shows that the stale entry does eventually get invalidated, and Kopia then fetches the fresh metadata in the 39.375456ms GetMetadata call. I didn’t dig into it too much, but maybe the maintenance routine could be refactored to use PutBlobAndGetMetadata instead of PutBlob to avoid this issue.
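
To sketch the shape of what I mean (this is not kopia’s actual storage interface, just an illustration of the idea): the backend would return the authoritative metadata from the write itself, so there is no separate read for a cache to answer with stale data.

// Sketch only: not kopia's real blob storage interface.
package blobsketch

import "time"

type Metadata struct {
	ID        string
	Length    int64
	Timestamp time.Time
}

type Storage interface {
	// PutBlob writes the blob but returns nothing about the stored object,
	// forcing callers to issue a separate GetMetadata that may hit a cache.
	PutBlob(id string, data []byte) error
	GetMetadata(id string) (Metadata, error)

	// PutBlobAndGetMetadata writes the blob and returns the server-reported
	// metadata taken from the write response itself, so the timestamp does
	// not depend on a follow-up read.
	PutBlobAndGetMetadata(id string, data []byte) (Metadata, error)
}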

Thanks again for the reply on this! The kopia client is connecting to Gateway-ST on the same physical machine at the moment. I do have caching enabled on the Gateway-ST; do you think this might be a stale cache being served, especially with a response time of “634.813µs”?

Yeah, I agree that call is returning way too quickly. I will keep running the maintenance with my patched-in delay for now, and maybe try reducing it to a few milliseconds to see if that has any effect. Once I get a bit more data on this, I’ll open an issue on the kopia GitHub.

I’m not sure that Gateway-ST with caching enabled provides the read-after-write consistency that Kopia requires. A possible workaround for the maintenance routine might be to exclude the “kopia.maintenance” object from the cache. Maybe something like this: MINIO_CACHE_EXCLUDE="*kopia.maintenance"
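
Assuming Gateway-ST honours the standard MinIO caching environment variables (it appears to, since caching already works for you), the exclusion would sit next to your existing cache settings, something like:

MINIO_CACHE_DRIVES="/mnt/gateway-cache"      # placeholder: whatever cache path you already use
MINIO_CACHE_EXCLUDE="*kopia.maintenance"     # bypass the cache for the maintenance marker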

I’ll give that a go and see if it makes a difference.

Update: That seems to have fixed the issue on my test repo.
