Fatal Error on my Node

I cloned the data with FreeFileSync while the node was running, then switched the node off and compared the data again. For 1 week there were no errors, and now it’s like before.

Hm, probably fragmentation? (that gets cloned as well)

By the way, I did not disable the defragmentation task for my disks, and did not even exclude them from Microsoft Defender.

The fragmentation of the disk is 8%, defragmentation is switched off, and Defender is deactivated for this disk. My feeling is that somehow deleting the trash folder causes the node to crash. I think the race is the problem: the filewalker scans all the data, and before it is done with that, the node gets the command to delete data again. Then the error occurs and the game starts over. There is no way to tell the node to run the filewalker first and only then delete the data.

2 Likes

You have several options:

  • disable the filewalker (if you do not use this disk for anything else, it should be pretty safe)
  • increase the timeout matching the error (for write timeouts increase the writeable timeout; for read timeouts increase the readable timeout and the readable check interval)
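As a sketch, the options above would look roughly like this in `config.yaml`. The parameter names below are my assumption based on recent storagenode releases; verify the exact names and defaults for your version before applying them:

```yaml
# Option 1: disable the startup piece scan (filewalker)
storage2.piece-scan-on-startup: false

# Option 2: raise the disk-check timeouts instead
# (write side)
storage2.monitor.verify-dir-writable-timeout: 2m30s
# (read side: both the timeout and how often the check runs)
storage2.monitor.verify-dir-readable-timeout: 2m30s
storage2.monitor.verify-dir-readable-interval: 2m30s
```

Remember that lines in `config.yaml` only take effect once the leading `#` is removed and the node is restarted.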

How can I tell whether the filewalker is disabled? I changed it in the config.

if set to true, all pieces disk usage is recalculated on startup

storage2.piece-scan-on-startup: false
is that okay?

I’ve already increased the read and write timeouts; they now stand at 2 min 30 s, but so far it has made no difference.

I just noticed that the configs of the two nodes are different, i.e. parameters that exist in one do not exist in the other and vice versa. Is that normal? Both are on the same version, 1.76.2.

The setting is correct. I hope you also uncommented the line by removing the # sign.
The config file is not updated very often, but you can add any parameter to it. If one node is newer than the other, it may have a more recent version of the config file.

1 Like

We need a wiki topic with all the config parameters, with detailed explanations and situations in which they should be used and how.

6 Likes

That would be really great; then you would also understand some parameters better.

OK, then the problem could be that some nodes’ configs are causing issues. I just copied the config of the working node over to the non-working one, keeping the node-specific settings (public address + port). It seems to be working so far; at least the hard disk load has dropped to 15% and there are no errors at the moment. Could it be that there were incorrect settings in the configs of some nodes due to older versions?

1 Like

@Alexey, which of the parameters is it not advisable to increase beyond 5 minutes without risking failed audits?
The readable or the writable parameters?

So do you run defrag automatically or not?

The readable timeout and interval. However, these parameters themselves will not cause audits to fail; rather, exceeding them would be an indication that your system potentially could not provide a piece for audit within 5 minutes. That would put your node into containment mode, and it would be asked for the same piece 2 more times before the audit is considered failed.

So these parameters are safe from the auditing point of view, but if you have read timeouts of more than 5 minutes, you should be aware that the node may start to fail audits as well, and maybe it’s better to let it crash before that.

Yes, automatic defragmentation is enabled for all drives on my Windows PC. I only disabled search indexing and excluded the drives from Windows Defender, because it makes no sense to scan or index pieces of encrypted data.

So I will wait until my online score recovers and then run the defrag while the node is set to “full”.
Let’s see if this cures the “slow subsystem”; it may take some weeks.

Defragging could increase the writing speed significantly; sadly, not so much the reading.

What is the correct format for setting those timeouts in the docker run command?

Just so you know, whichever parameter is not set in the docker run command is picked up from the config.yaml file.

Yes, I know, but we docker users prefer to set everything in the run command. :blush:

Addendum: after 24 hours the system is running as it did before version 1.75.2. No more errors, the recycle bin is finally empty again, the hard disk utilization is back in the normal range of 5-15%, all settings are set to normal again, and the system is running.
So there is something wrong with the config on some nodes.

2 Likes

Add `--` before the option and place it after the image name, or in the `command:` clause in the case of docker-compose.
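For example, a sketch of how the timeout options could be passed after the image name (the image name and the exact parameter names are my assumption; check them against your own setup):

```shell
docker run -d --restart unless-stopped --stop-timeout 300 \
    ... \
    storjlabs/storagenode:latest \
    --storage2.monitor.verify-dir-writable-timeout=2m30s \
    --storage2.monitor.verify-dir-readable-timeout=2m30s
```

In a docker-compose file, the same flags would go into the `command:` clause of the storagenode service.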

If you believe that the difference in the config is the root cause (and not the load, which perhaps just ended for your node), could you please post the difference?

1 Like