Fatal Error on my Node

I cloned the data with FreeFileSync while the node was running, then switched the node off and compared the data again. For 1 week there were no errors, and now it’s like before.

Hm, probably fragmentation? (that gets cloned as well)

By the way, I did not disable the defragmentation task for my disks, and did not even exclude them from Microsoft Defender.

The fragmentation of the disk is 8%, defragmentation is switched off, and Defender is deactivated for this disk. My feeling is that somehow deleting the trash folder causes the node to crash. I think the race is the problem: the filewalker scans all the data, and before it is done with that, the node gets the command to delete data again. Then the error occurs and the game starts over. There is no way to tell the node to run the filewalker first and only then delete the data.

2 Likes

You have several options:

  • disable the filewalker (if you do not use this disk for anything else, it should be pretty safe)
  • increase the timeout matching the error (for write timeouts increase the writeable timeout; for read timeouts increase the readable timeout and the readable check interval)
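As a sketch, the options above would look roughly like this in `config.yaml`. The parameter names below are my assumption based on recent storagenode releases; verify the exact names and defaults for your version before applying them:

```yaml
# Option 1: disable the startup piece scan (filewalker)
storage2.piece-scan-on-startup: false

# Option 2: raise the disk-check timeouts instead
# (write side)
storage2.monitor.verify-dir-writable-timeout: 2m30s
# (read side: both the timeout and how often the check runs)
storage2.monitor.verify-dir-readable-timeout: 2m30s
storage2.monitor.verify-dir-readable-interval: 2m30s
```

Remember that lines in `config.yaml` only take effect once the leading `#` is removed and the node is restarted.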

How can I tell whether the filewalker is disabled? I changed it in the config.

if set to true, all pieces disk usage is recalculated on startup

storage2.piece-scan-on-startup: false
is that okay?

I’ve already increased the read and write timeouts; they now stand at 2 min 30 s, but so far it has made no difference.

I just noticed that the configs of the two nodes are different, i.e. parameters that exist in one do not exist in the other and vice versa. Is that normal? Both are on the same version, 1.76.2.

The setting is correct. I hope you also uncommented the line by removing the # sign.
The config file is not updated very often, but you can add any parameter to it. If one node is newer than the other, it may have a more recent version of the config file.

1 Like

We need a wiki topic with all the config parameters, with detailed explanations and situations in which they should be used and how.

6 Likes

That would be really great; then you would also understand some parameters better.

OK, then the problem could be that some nodes’ configs are causing issues. I just copied the config of the working node over to the non-working one, keeping the node-specific settings (public address + port). It seems to be working so far; at least the hard disk load has dropped to 15% and there are no errors at the moment. Could it be that there were incorrect settings in the configs of some nodes due to older versions?

1 Like

@Alexey, which of the parameters is it not advisable to increase beyond 5 minutes without risking failed audits?
The readable or the writable parameters?

So do you run defrag automatically or not?

The readable timeout and interval. However, these parameters themselves will not cause audits to fail; rather, exceeding them would be an indication that your system potentially could not provide a piece for audit within 5 minutes. That would put your node into containment mode, and it would be asked for the same piece 2 more times before the audit is considered failed.

So these parameters are safe from the auditing point of view, but if you have read timeouts of more than 5 minutes, you should be aware that the node may start to fail audits as well, and maybe it’s better to let it crash before that.

Yes, automatic defragmentation is enabled for all drives on my Windows PC. I only disabled search indexing and excluded the drives from Windows Defender, because it makes no sense to scan or index pieces of encrypted data.

So I will wait until my online score recovers and then run the defrag while the node is set to “full”.
Let’s see if this cures the “slow subsystem”; it may take some weeks.

Defragging could increase the writing speed significantly; sadly, not so much the reading.

What is the correct format for setting those timeouts in the docker run command?

Just so you know, whichever parameter is not set in the docker run command is picked up from the config.yaml file.

Yes, I know, but we docker users prefer to set everything in the run command. :blush:

Addendum: after 24 hours the system is running as it did before version 1.75.2. No more errors, the recycle bin is finally empty again, the hard disk utilization is back in the normal range of 5-15%, all settings are set to normal again, and the system is running.
So there is something wrong with the config on some nodes.

2 Likes

Add `--` before the option and place it after the image name, or in the `command:` clause in the case of docker-compose.
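For example, a sketch of how the timeout options could be passed after the image name (the image name and the exact parameter names are my assumption; check them against your own setup):

```shell
docker run -d --restart unless-stopped --stop-timeout 300 \
    ... \
    storjlabs/storagenode:latest \
    --storage2.monitor.verify-dir-writable-timeout=2m30s \
    --storage2.monitor.verify-dir-readable-timeout=2m30s
```

In a docker-compose file, the same flags would go into the `command:` clause of the storagenode service.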

If you believe that the difference in the config is the root cause (and not the load, which perhaps just ended for your node), could you please post the difference?

1 Like