ERROR lazyfilewalker.gc-filewalker.subprocess failed to save progress in the database

In my case, it’s pristine!

No, simpler: I was asleep and had to restart it manually the next day.

Nah, I have a UPS for all devices.

Besides, there is also no apparent relation with total sizes of the databases:

root@VM-HOST:/var/lib/lxc# cd /var/lib/lxc; for i in STORJ[1-9]*; do \
  echo $'\n\n'$i; \
  lxc-attach $i -- ls -lSh /storj/DBs | head -n 1; \
  lxc-attach $i -- docker logs storagenode --since 24h 2> /dev/null | grep locked | wc -l; \
  lxc-attach $i -- docker logs storagenode --since 24h 2> /dev/null | grep 'uploaded' | wc -l; \
  lxc-attach $i -- docker logs storagenode --since 24h 2> /dev/null | grep 'downloaded' | wc -l; \
  lxc-attach $i -- docker logs storagenode --since 24h 2> /dev/null | grep 'started' | wc -l; \
done


Node      DB size   locked   uploaded   downloaded   started
STORJ10   830M      4        24         35567        35905
STORJ11   396M      3        4817       3075         8037
STORJ16   67M       3        24         8724         8773
STORJ17   56M       5        24         6127         6177
STORJ18   375M      5        50708      3491         68069
STORJ22   308M      0        24         2857         2908
STORJ23   1.6G      0        24         3713         3760
STORJ4    244M      0        21808      4525         60695
STORJ6    752M      0        24         3724         3899

There also seems to be no relation between traffic, the data drive, and so on…

The numbers cover the last 24h: the number of database locks, finished uploads, finished downloads, and started downloads/uploads. These numbers also suggest that the data disk probably isn't all that much involved in this.

Well… the Salt Lake one just finally finished after 56 hours, and… no change to the dashboard. And no database locked messages to explain it.

Sorry, what is pristine?

no, I mean these log lines:

Do you see a time difference between these two lines?

Have all 4 trusted satellites now finished their scans?
This should only fix a discrepancy between the OS-reported usage and the pie chart on the dashboard.
The Avg Disk Space Used would not be updated; that is satellite-side reporting, and it currently has issues: Avg disk space used dropped with 60-70%.

Hah - not even close. Even with the disk full and no longer taking in ingress, it’s still taking forever. After running 56 hours to do just the Salt Lake satellite, it’s been working on the US Satellite for the last 13 hours. It’s up to directory “j7”.

I see. There I could not offer anything better than to disable the lazy mode to speed up the process. This should not be a problem now: since your node is full, it wouldn't reduce the success rate, but it would finish the scans earlier.
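If you decide to try it, a minimal sketch of the change, assuming a docker setup (please double-check the option name against your own config.yaml):

# in config.yaml: run the filewalkers in the main process instead of the low-priority lazy subprocess
pieces.enable-lazy-filewalker: false

# then restart the container so the change takes effect
docker restart -t 300 storagenode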

I might consider that if restarting didn't mean the last 69 hours of progress would be completely wasted and started over from scratch. I highly doubt that non-lazy could be so much faster that it could make up for having to redo what took the lazy walker 69 hours to do.

It would be helpful if the filewalkers would do the most outdated satellite first, instead of doing them in the same order every single time. The fact that the fourth satellite never gets done until all four can be completed in one go is, I think, a big reason that the distortions get maximized.

But you said that some satellites have finished? However, you are probably right. Until we have a resume feature for the startup scan, it could be a waste of time…

Only Salt Lake has finished. I think I may have finished it as often as 3 times in the last couple of weeks - twice for sure. But since that takes around 60 hours or more each time, I’ve never managed to finish a second satellite.

Hm, what is the concurrency for the scan?
The retain.concurrency option/flag (I'm not sure, though, that it affects the used-space-filewalker…)

Whatever the default is. I’ve never seen any guidance to change it. I see it commented out in the config.yaml with a value of 5.
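That commented-out entry typically looks something like this in config.yaml (the exact comment wording may vary by version):

# how many concurrent retain requests can be processed at the same time.
# retain.concurrency: 5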

By the way, since I moved the databases to SSD (and maybe it was there before, but if so I didn't notice it), I now see this in the logs:

"Invalid configuration file key {“Process”: “storagenode-updater”, “Key”; “storage2.database-dir”}

Since changing that value was part of the guide to move the DBs to an SSD, I've ignored this error… and frankly, anything that will STOP the node from being updated right now I see as a good thing, so if this is breaking the updater for now, good. For now.
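For reference, the entry that guide has you add to config.yaml looks something like this (the path below is just a placeholder for wherever the databases were moved):

storage2.database-dir: /mnt/ssd/storj-dbs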

One of my nodes, with a slower but still respectably fast drive, started having this issue. I tried a lot of different things, and eventually limiting its memory usage to 8GB was enough for it to stop.

I haven’t been able to replicate this issue since I had it. Initially I wasn’t even bothering to limit RAM usage, as I have an overwhelmingly high amount for my needs (512GB).

I have a theory that if I were to give it back all 512GB, this problem would show up again. However, I don’t have the equipment or the time to test right now.

Interesting. There’s been a couple of issues discussed in this thread, can you be specific as to which one? And how exactly did you limit the memory usage, please, under what OS?

Such INFO messages come from the storagenode-updater process, as you may see. It doesn’t recognize all of the storagenode’s options, but it shares the same config.yaml.

They likely mean Docker’s --memory option in the docker run command (if you use it, it should be placed before the image name).
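For illustration, a minimal sketch of where that flag would go, based on the standard docker run command; all values below are placeholders and should be replaced with your own:

# note: --memory (here 8g) must come before the image name at the end
docker run -d --restart unless-stopped --stop-timeout 300 \
    --memory 8g \
    -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
    -e WALLET="0x0000000000000000000000000000000000000000" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="node.example.com:28967" \
    -e STORAGE="10TB" \
    --mount type=bind,source=/path/to/identity,destination=/app/identity \
    --mount type=bind,source=/path/to/storage,destination=/app/config \
    --name storagenode storjlabs/storagenode:latest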
