Fatal Error on my Node / Timeout after 1 min

We, the SNO community, can help you with troubles with your node. You can ask your questions here on the forum, and I am sure they will all be answered.

You need to set up a “playground” where you can install some distributions and experiment with them: set up a remote connection via SSH, generate TLS certificates for more security, tune system/kernel settings …

Unfortunately, this is out of scope for the forum and off topic as well. I will send you a private message. Let's keep the topic clear for better focus.


You are correct; however, we suggest moving only the databases to an SSD, and only if you have errors like “database is locked”. Unfortunately, I do not have a better recommendation for this case other than moving the databases to a less loaded disk/SSD.
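For reference, the databases can be relocated with the storage2.database-dir option in config.yaml; the path below is only an example (e.g. a directory bind-mounted from an SSD into the container), and the existing *.db files must be copied there while the node is stopped:

```
# example only: a directory bind-mounted from an SSD into the container
storage2.database-dir: /app/dbs
```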

At least now I can understand why your disk subsystem could have a slow response…
And you already know how to speed it up.

This

is recommended anyway, but it's not a requirement. However, if you use a write cache, then it becomes highly recommended, especially if you have power cuts.

I disabled file access time recording and moved the DBs to a separate SSD.
It's better now, but it keeps filling up the RAM. It's been running 3 hours non-stop now with 20 GB of RAM used and still growing slowly…
I didn't expect that the drives couldn't handle data coming in at 6 MB/s, even with 3 drives together.
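For reference, disabling access time recording is usually done with the noatime mount option in /etc/fstab; an example entry (UUID and mount point are placeholders), applied with a remount or reboot:

```
# /etc/fstab - placeholder UUID and mount point
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/storj1  ext4  defaults,noatime  0  2
```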

Is there a way to increase the size of the packages so it's easier for the drives to ingest them from RAM?

Check the HDDs for errors, because even my 17 nodes do not consume 20 GB of RAM.

Thank you for the tip, I will think about it.

I would like to ask you to share more information. You are just pointing to many MDs, but that looks positive to me, because the load is spread across all MDs, not targeted at only one. I mean, if one SN process is working with a piece on MD4 and a second process is working with a piece on MD7, they can work in parallel without any interference. Or am I missing something?

Do you mean only the two risky solutions, or something else?

My questions are still open, let me summarize them:

  • Is it possible to run the walkers one by one (serialize them) rather than all at once (in parallel)? This would help me save some IOs for other operations.
  • Is it possible to limit the worker threads in StorageNode? A smaller number of non-interfering threads would be better for me than a bunch of noisy threads.

On Synology, Docker reports the used memory incorrectly. But your RAM will fill up anyway with cache and buffers.

I've got a single node on a 3-disk SHR1. I expect you have 1 disk per node, and I now see why everybody has such a setup. But that is not really my way, as I'm just using space that I have sitting there backing up other stuff, etc.

Anyway, I don't get any errors in the logs. It says it starts uploads of pieces and completes them. But it restarts sometimes, I think when it can't add more pieces to RAM because the drives can't keep up… It worked fine when I only had 2 disks :frowning:

This config variable should help in that respect:

```
# how many concurrent retain requests can be processed at the same time. default 5
retain.concurrency: 1
```

Regards

Heh, sorry for the massive font; it seems a # at the start of a line triggers that formatting on this forum. I don't know a way around that.

:slight_smile:

Thank you, I will try it.

Edit:

Unfortunately, this isn't working (state after the storagenode container restart):

```
$ iotop -n 1 -b -o | grep walker
3721495 be/7 root      879.05 K/s    0.00 B/s  0.00 % 87.37 % storagenode trash-cleanup-filewalker --storage config/storage --info config/storage/piecestore.db --info2 config/storage/info.db --pieces config/storage --driver  --filestore.write-buffer-size 4.0 MiB --filestore.force-sync=false --log.output stderr --log.encoding json --lower-io-priority=true
3721489 be/7 root        8.50 M/s    0.00 B/s  0.00 % 70.55 % storagenode used-space-filewalker --storage config/storage --info config/storage/piecestore.db --info2 config/storage/info.db --pieces config/storage --driver  --filestore.write-buffer-size 4.0 MiB --filestore.force-sync=false --log.output stderr --log.encoding json --lower-io-priority=true
```

Is the setting right?

Welp, looks like two processes are running - I don't run Docker. However, on Windows I believe it only spawns an additional process if the lazy filewalker is turned on (perhaps evidenced by the trash & used-space filewalker processes' flag '--lower-io-priority=true'), so you might try disabling that (false):

```
storage2.piece-scan-on-startup: true
pieces.enable-lazy-filewalker: false
```

Or, I guess in docker run command speak, add --pieces.enable-lazy-filewalker=false

Note the scan on startup is set to true. I haven't read your specific problem, having only scanned this thread. Maybe I'll go back and get more granular about what you're trying to solve.

Thank you for your effort, I really appreciate it.

My problem is simple - slow HDDs. Or rather - low IO.

So I am actually trying to tune StorageNode to get better performance. I discovered that StorageNode spawns a huge number of worker threads that attempt to do their tasks in parallel, which can easily exhaust my machine.

Some information about “throttling background processes” or “minimizing worker threads” would be very valuable. As in my previous comment - I mean not having all filewalkers running side by side, but letting them run one by one.

Meanwhile I found an issue where a guy shared his config. I found two settings about workers there:

```
storage2.delete-workers: 30
storage2.exists-check-workers: 30
```

I took the settings into my config, but decreased them to a value of 10. Of course, I don't have much information about what exactly these two settings do, but I believe this can help me improve the performance of foreground operations while sacrificing some time in finishing background operations.

BTW: can anybody explain these settings to me? What are their default values? Is there any documentation with all the settings, explanations, and default values? Something like that.

Meanwhile, I lack knowledge about the caching mechanism in the kernel, so the vm.* parameters will be my next destination. It might be good to use the “RAM cache” only during the time when the node starts and all the walkers attempt to do their job.
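For illustration, these are the kind of vm.* knobs I mean (the values are only examples to experiment with, not recommendations):

```
# example values only - tune and measure before keeping any of them
vm.dirty_background_ratio = 5   # start background writeback sooner
vm.dirty_ratio = 20             # limit dirty pages before writers must block
vm.vfs_cache_pressure = 50      # keep inode/dentry metadata cached longer
```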

And I will look for a reasonable UPS unit for the cases when some problems with electricity occur.

The next step can really be moving the DBs to faster storage (SSDs), but let's try things one by one :wink: .

OK… I read back a little.
Despite Alexey's assertion re: storage2.max-concurrent-requests, and knowing you've tried 20 & 25… and your logs are still choking at 30+… do something drastic: put it at 5 and explore what real effect that has. The choice of n is not a miracle; with bad IO you can still end up with only a 5% success rate - and at that point who are you helping… you'd just be wasting significant bandwidth.
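In config.yaml terms (the option name as used elsewhere in this thread), that would be:

```
# drastic test value, just to see what effect it really has
storage2.max-concurrent-requests: 5
```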

Re: storage2.monitor.verify-dir-writable-timeout: 1m30s
Go to 2m30s… expect more RAM use, of course, and ensure that its matching INTERVAL parameter is an equal or preferably higher value, but do not exceed 5 minutes… The shutdown on error prevents the satellites from tagging audit failures, i.e. too long to receive an audit confirmation = fail. A higher interval, I think, can allow the audit response to time out. Readable and writable timeouts on Windows are primarily caused by MFT fragmentation on NTFS, as its landing zones scatter everywhere depending on the density of the information flow (small files, MFT entries).
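In config.yaml that pairing would look something like this (values are illustrative, and the interval option name is assumed to mirror the timeout one):

```
# raise the writable check, keep its interval equal or higher, stay under 5m
storage2.monitor.verify-dir-writable-timeout: 2m30s
storage2.monitor.verify-dir-writable-interval: 2m30s
```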
…ramble ramble…
Your Optiplex is 20 years old… check the temps on the CPU/chipsets - Core 2 Duo? North bridge, south bridge… the heatsink bonding may no longer be effective… Flaky aged chipsets and RAM, like toyoo mentioned, can cause voodoo chit. You could burn-in test that RAM for 3 days and it may not bit-flip, yet it could still slowly corrupt random addresses that you'd never notice; consider replacing it if you've got some old hoarded, unused sticks around. Since it was a 'gaming' rig, it was probably overclocked at some point, over-volted, what have you - obviously you know the risks. However, all things considered, if you manage 2-3 months of stability at 75% load, that's pretty good - she's been a hard worker; you can't always get perfection.
You might also consider turning async back on, as that's a recent change and your problems coincide with it. There's also a way to limit slower ingress connections, cancelling those slower than X kbps within 10 seconds, etc.
Obviously you're deep into this subject matter; you'll come to workable conclusions soon enough, I'm sure.

Good luck

You may set the retain.concurrency: 1 option (it's 5 by default) either in config.yaml or as a command line argument after the image name in the docker run command: --retain.concurrency=1. However, it would affect only the garbage collector, not the used-space-filewalker. But the latter usually works sequentially anyway (at least as shown in my logs).
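A minimal sketch of where such a flag goes in a docker run command (mounts and names here are placeholders, and other required flags/environment variables are omitted):

```
docker run -d --name storagenode \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/data,destination=/app/config \
  storjlabs/storagenode:latest \
  --retain.concurrency=1   # options after the image name are passed to storagenode
```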

Yes, only these two, at least for LVM. For ZFS you may have a few more options to optimize, though it usually requires more resources.

I edited your post; you may use two new lines with three backticks on them and put the code between these lines, i.e.:

```
# how many concurrent retain requests can be processed at the same time. default 5
retain.concurrency: 1
```

And now you see the result.

```
docker exec -it storagenode ./storagenode setup --help
```

You will get all available config options with their description and default values.
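If you are interested only in a particular option, you can filter the output, for example:

```
docker exec -it storagenode ./storagenode setup --help | grep retain
```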


Please do not assume that the check timeouts somehow affect the general timeouts - they do not. These timeouts only regulate how sensitive the checks are to unresponsive hardware. But they can of course be used as an indicator: if you increase the offending timeout in small steps, you can figure out when the node stops crashing.
Likely exactly this kind of timeout may happen for uploads (and the success rate) if you are forced to increase the writable check timeout, and may happen for downloads and audits (much more dangerous) if you are forced to increase the readable check timeout (you also need to increase its check interval by the same amount, because both are initially set to 1m0s).
So, increasing these timeouts is a band-aid, not a solution. It just increases the risk of undetectable hangs of the hardware, and a disqualification as a result.
But of course increasing them may be unavoidable if the hardware cannot keep up; just be careful.
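For completeness, the readable check has the same pair of options; if you are forced to raise it, raise both by the same amount, for example (values illustrative):

```
# only if forced to: raise the readable timeout and its interval together
storage2.monitor.verify-dir-readable-timeout: 1m30s
storage2.monitor.verify-dir-readable-interval: 1m30s
```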

Probably SMR? That would be bad.

Yep, lower is definitely better for the machine, but, as you mentioned, not for the payout. So it is nice to have the parameter and some opportunity to tune it.

Thank you, but I would like to stay as close to the default values as possible. But fine, I can consider 5 minutes the highest acceptable level.
I am using Linux (Debian 11) and EXT4 with the noatime attribute. More attributes in fstab will not help my situation. Fragmentation sometimes occurs; occasionally I run a defragmentation command against the DBs, which change the most, but maybe it would be a good idea to run it against the blob store as well (when the walkers have finished their tasks, I can run it in check mode and discover the most fragmented files). I will come back with some results or examples for the next consultation :wink: .
Unfortunately, the walkers can take a few days to finish, so I must be patient :frowning: . Meanwhile I noticed that you know about the problem. So one day it will be solved, but in the meantime I will do my best to stay alive :wink: .
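The EXT4 defragmentation tool I mean is e4defrag; a check-only run reports fragmentation without changing anything (paths are examples):

```
# -c = check mode: report fragmentation scores, modify nothing
sudo e4defrag -c /mnt/storj/storage/blobs
# actually defragment the frequently changed database files
sudo e4defrag /mnt/ssd/storagenode-dbs/*.db
```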

Sorry, this was a misunderstanding. I was comparing stability between a “manufactured computer” (the Dell OptiPlex) and an “assembled computer” (my storage rig). It seems that the manufactured PCs are more stable and more reliable, with more durability against dust…
So, again, this old Dell OptiPlex is my monitoring machine only (InfluxDB, Grafana, ElasticSearch, …).

I have set it in the file, but, to be honest, no changes occurred :frowning: . Either I did not spot any difference between unset and set, or the setting has only a minor impact on my situation.
Anyway, thank you for the tip!

OK, thank you for the confirmation.

Thank you, this was exactly what I was looking for!

I agree. I am playing with the parameters with great caution. Defaults and standards exist for a good reason, but it is nice to have the opportunity to override them in case of trouble, for investigation.

I discovered problems only with the writable check, so I added an override only for this parameter. I will add an override for the readable check when I spot that error, but so far so good :slight_smile: .

No, they are too old for SMR. All are definitely CMR or a similar “conventional” technology.

Very well, thank you all for the hints and tips! Now it is my turn to play with the new knowledge :smiley: .

My current configuration is:

```
storage2.max-concurrent-requests: 23
retain.concurrency: 1
#storage2.delete-workers: 10 ### (makes no sense for me, default 1)
storage2.exists-check-workers: 1 ### (decreased for testing, default 5)
storage2.delete-queue-size: 2000 ### (decreased for testing, default 10000)
storage2.monitor.verify-dir-writable-timeout: 1m30s
```

My node looks basically the same as in its previous state, BUT now I believe it will be stable. Meanwhile I will play with the “RAM cache”. All the kernel parameters are changeable on the fly. A solution could be to turn on the “RAM cache” only during initialization, while the walkers are running, and turn it off in the default state.
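For example, toggling them on the fly could look like this (numbers are illustrative; check your own defaults with `sysctl vm.dirty_ratio` first):

```
# while the walkers run: let more writes buffer in RAM
sysctl -w vm.dirty_ratio=40
sysctl -w vm.dirty_background_ratio=10
# back to the (typical Debian) defaults afterwards
sysctl -w vm.dirty_ratio=20
sysctl -w vm.dirty_background_ratio=10
```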

But if anybody has any additional ideas, I will be grateful for them all :slight_smile: .

Edit:

Now I have discovered that I still have some garbage on my node from decommissioned satellites.

12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB (europe-north-1.tardigrade.io)
118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW (satellite.stefan-benten.de)
12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo (us2.storj.io)

I performed the magic described on the forum; I believe this can be helpful too.
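Assuming the “magic” refers to the forget-satellite cleanup procedure discussed on the forum (a sketch only, using one of the satellite IDs above; adjust the container name and paths to your setup):

```
# example invocation of the forget-satellite subcommand for one untrusted satellite
docker exec -it storagenode ./storagenode forget-satellite \
  --force 12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB \
  --config-dir config --identity-dir identity
```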


It's an interesting idea, though. But you may like it and not switch back (it would also keep the metadata cached while it's accessed by the regular uploads/downloads, which usually noticeably increases the success rate, so you may decide to make it a permanent change :slight_smile: ).