Got a node that crashes when it cannot read a file. I have some data loss/corruption (about 0.5% of the total drive size). Why does it crash?
2024-07-07T17:11:39+02:00 FATAL process/exec_conf.go:429 Unrecoverable error {"Process": "storagenode", "error": "filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ac/utdoz6lux7uhsvphpklt45ylywopllhjjoymlf2bnx7cyzcj4q.sj1; error=lstat /Storj/data/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ac/utdoz6lux7uhsvphpklt45ylywopllhjjoymlf2bnx7cyzcj4q.sj1: errno 97). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity; filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/ak/p45faihdd2aizdh7yzhlqxjoo6vor2tpzoiqdwvfekluwuwx4q.sj1; error=lstat /Storj/data/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/ak/p45faihdd2aizdh7yzhlqxjoo6vor2tpzoiqdwvfekluwuwx4q.sj1: errno 97). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity; filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/ab/wxrvb3zelabaaefc7a6wbz5zfglp5uoj2j3jsehtcpefetkcdq.sj1; error=lstat /Storj/data/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/ab/wxrvb3zelabaaefc7a6wbz5zfglp5uoj2j3jsehtcpefetkcdq.sj1: errno 97). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity; filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aa/ak27ty7kny3fi4cyyxku6ahlijoma6okrjtvcxc6evausjh44a.sj1; error=lstat /Storj/data/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aa/ak27ty7kny3fi4cyyxku6ahlijoma6okrjtvcxc6evausjh44a.sj1: errno 97). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity", "errorVerbose": "group:\n--- filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ac/utdoz6lux7uhsvphpklt45ylywopllhjjoymlf2bnx7cyzcj4q.sj1; error=lstat /Storj/data/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ac/utdoz6lux7uhsvphpklt45ylywopllhjjoymlf2bnx7cyzcj4q.sj1: errno 97). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/ak/p45faihdd2aizdh7yzhlqxjoo6vor2tpzoiqdwvfekluwuwx4q.sj1; error=lstat /Storj/data/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/ak/p45faihdd2aizdh7yzhlqxjoo6vor2tpzoiqdwvfekluwuwx4q.sj1: errno 97). 
This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/ab/wxrvb3zelabaaefc7a6wbz5zfglp5uoj2j3jsehtcpefetkcdq.sj1; error=lstat /Storj/data/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/ab/wxrvb3zelabaaefc7a6wbz5zfglp5uoj2j3jsehtcpefetkcdq.sj1: errno 97). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: unrecoverable error accessing data on the storage file system (path=/Storj/data/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aa/ak27ty7kny3fi4cyyxku6ahlijoma6okrjtvcxc6evausjh44a.sj1; error=lstat /Storj/data/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/aa/ak27ty7kny3fi4cyyxku6ahlijoma6okrjtvcxc6evausjh44a.sj1: errno 97). This is most likely due to disk bad sectors or a corrupted file system. Check your disk for bad sectors and integrity\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
storj.io/common/process.cleanup.func1
/go/pkg/mod/storj.io/common@v0.0.0-20240604134154-517cce55bb8c/process/exec_conf.go:429
github.com/spf13/cobra.(*Command).execute
/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039
storj.io/common/process.ExecWithCustomOptions
/go/pkg/mod/storj.io/common@v0.0.0-20240604134154-517cce55bb8c/process/exec_conf.go:112
main.main
/go/src/storj.io/storj/cmd/storagenode/main.go:34
runtime.main
/usr/local/go/src/runtime/proc.go:271
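As an aside, numeric errno values are platform-specific, so "errno 97" means different things on different systems: on Linux it is EAFNOSUPPORT (an odd thing for lstat to return, which itself hints at filesystem-level damage), while on FreeBSD-based systems it should be EINTEGRITY ("integrity check failed"), which would fit a ZFS checksum failure. A minimal Go check (assuming Go is available on the machine that produced this log) prints the local meaning:

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// The log above shows lstat failing with "errno 97"; print what that
	// number means on this particular platform.
	e := syscall.Errno(97)
	fmt.Printf("errno %d: %s\n", uint(e), e.Error())
}
```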
Try renaming that file. If there's a bad block under it, it'll stay f#d, but Storj will stop fatal-erroring at that point on every restart (the time-delayed crash loop) and will just skip it. Note that's only the 'ac' prefix of that satellite's blobs; there's likely much more fun ahead if a scrub doesn't work.
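If you would rather script that workaround than do it by hand, here is a rough sketch in Go (using the first unreadable path from the log above; the rename itself can still fail if the directory entry is what's damaged):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// First unreadable blob from the log; repeat for the others as they appear.
	bad := "/Storj/data/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/ac/utdoz6lux7uhsvphpklt45ylywopllhjjoymlf2bnx7cyzcj4q.sj1"

	// Move the damaged piece aside so the filewalker no longer trips over it.
	// The data is lost either way; this only stops the FATAL crash loop.
	if err := os.Rename(bad, bad+".corrupt"); err != nil {
		fmt.Println("rename failed (the directory entry itself may be damaged):", err)
	}
}
```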
The node should do something with it, not me. There's almost a percent of the data missing; nobody in their right mind is going to be able to remove these files…
I seriously hope the node isn't designed to throw away 100% of the data when it encounters expected corruption.
There's no such thing as 'expected corruption'. The file can have invalid contents (that fail an audit): that's OK. The file can be missing entirely: also OK. But the file shouldn't throw OS-level errors when you try to open it (which is what is happening now) - filesystem errors are something a scrub should have dealt with.
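To make that distinction concrete, here is a minimal sketch (not the actual storagenode code, just an illustration of the policy described above): a piece that simply does not exist can be tolerated and left to fail its audit, while any other lstat error is treated as unrecoverable.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// checkPiece illustrates the policy: a missing piece is tolerable,
// any other filesystem error is not.
func checkPiece(path string) error {
	_, err := os.Lstat(path)
	switch {
	case err == nil:
		return nil // piece is present and stat-able
	case errors.Is(err, fs.ErrNotExist):
		// Missing piece: it will simply fail its audit later.
		fmt.Println("piece missing, treated as lost:", path)
		return nil
	default:
		// e.g. an I/O or integrity error from a bad sector: the walker
		// cannot tell what else is broken, so it gives up.
		return fmt.Errorf("unrecoverable filesystem error: %w", err)
	}
}

func main() {
	// Placeholder path, for illustration only.
	if err := checkPiece("/Storj/data/storage/blobs/example.sj1"); err != nil {
		fmt.Println(err)
	}
}
```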
We expect the node to act selfishly to some degree. If there is a type of problem the node cannot interpret on its own, it's preferable for it to go down in case the problem would result in audit failures. Bad audits can disqualify a node in an hour or two if you're unlucky. If a node goes down instead, it can wait for days until you have the time and means to diagnose it; plus, it's easier to notice a node being down than a node showing errors in its logs from time to time.
(It would be better if known problems weren't logged as errors and SNOs just had a reliable way of getting notified about them, but this is a topic for other threads that already exist on the forum.)
This whole system is built on the premise that storage drives are neither infallible nor error-proof. Corruption is expected. I know there's a protection against the storage location becoming unwritable, but I didn't know it would also refuse to work when a single file is unserviceable. The last scrub is still running; it's useless, though - nothing will be accomplished by it except accounting for all the errors.
Yes, the way it deals with it is exactly this - by making the files unreadable. That's how it works; ZFS is very bad at dealing with corrupted data. I had all kinds of errors before it would start working.
This is understandable; I thought it would do so only when the directory is unwritable - I remember when that change was made. However, saving my node from disqualification in this fashion will get me disqualified. Is there a switch to turn off crashing the node when files are unreadable?
We are trying to protect the node from disqualification. If you are aware of the issue, you can fix it or apply a workaround (like renaming). The node software is not designed to work around hardware issues, it's true. And I believe we shouldn't reinvent what the OS or specialized recovery software does at a low hardware level.
It may result in disqualification if the node stays offline for more than 30 days, yes. However, that's much longer than the few hours it takes to be disqualified for answering audits but not providing the requested pieces.
There is no switch to disable this kind of safety check. Many SNOs asked for this safety check to be implemented, so we did.
Now it's your turn to fix the issue as best you can.
The node should ignore inaccessible data and consider it missing. Don't lose 10 TB of data because you cannot access 4096 bytes - this is idiocy. Hopefully somebody changes this behaviour or adds a switch for it.
Sorry to barge in to the conversation, but hopefully Storj never implements a switch to compensate for broken nodes. There is the (slight) possibility that all nodes storing a file's pieces have that switch turned on = the client loses their file.
On a side note, how did ZFS manage to corrupt data? Sounds like an underlying hardware issue to me, and the node did what it's supposed to do: get off the network so it can't mess everything up (from the client's POV).
No, you do not understand the issue: the node shuts down, taking all of its pieces offline, if even one piece isn't accessible = all clients lose the pieces on this node. The node should continue running and let the filewalker find the pieces that aren't accessible and delete them from the database = 0.5-1% of the data lost in my case; in reality, 100% of the node's data was lost.
Actually, I understand perfectly. The node shut down = went offline = the network will know within about 5 hours that the node is unavailable and will trigger repair of the pieces it stores (using other nodes that have pieces).
The issue was that instead of leaving the node offline while you repaired the corrupted filesystem (again, how did ZFS manage to corrupt data?), you rushed to bring it online, further adding to the damage.
The node could have stayed offline for 10 days while you repaired the 20TB drive (assuming that's the size) and still made it back in time before being disqualified.
Actually, you don't, because the node WAS offline for 11-12 days while I worked on the issues. The network was never the issue; the network will survive without those pieces. The problem is that good, redundant data gets thrown out of the network, and damage is done to the SNO as well.
ZFS didn't corrupt the data; the data got corrupted. However, once data gets corrupted in ZFS, ZFS doesn't deal well with it. ZFS still isn't designed well enough to handle corruption, because nearly nobody ever gets to that point with ZFS.
Also, the node should never add any further damage to the data.
You couldn't fix the node in 12 days? I find that hard to believe without a hardware issue.
How was the data corrupted? Was the disk failing? Was your RAM bad? Was the controller messing up?
If ZFS got to the point where corrupted data made it onto the disk (FYI: ZFS performs a checksum check on every read, and if it finds bad data it immediately tries to write back the correct data - a scrub isn't needed for this), then the corruption came from somewhere outside of the software's control (= a hardware issue, as I said in my first reply). The node got a read error from ZFS (again, this cannot happen in normal working conditions even with corrupted data, see the previous point), which the node interpreted as 'the sky is falling', hence it shut down.