This error appeared this morning on my node; it looks like there is an issue with the hashstore. Does anyone know of a workaround? (current version: v1.120.4)
2025-01-18 12:28:22,595 INFO spawned: 'storagenode' with pid 78
2025-01-18T12:28:22Z ERROR failure during run {“Process”: “storagenode”, “error”: “Failed to create storage node peer: hashstore: unable to flock: hashstore: bad file descriptor\n\tstorj.io/storj/storagenode/hashstore.NewStore:105\n\tstorj.io/storj/storagenode/hashstore.New:85\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:271\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:105\n\tstorj.io/storj/storagenode.New:618\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272”, “errorVerbose”: “Failed to create storage node peer: hashstore: unable to flock: hashstore: bad file descriptor\n\tstorj.io/storj/storagenode/hashstore.NewStore:105\n\tstorj.io/storj/storagenode/hashstore.New:85\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:271\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:105\n\tstorj.io/storj/storagenode.New:618\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272\n\tmain.cmdRun:86\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272”}
Error: Failed to create storage node peer: hashstore: unable to flock: hashstore: bad file descriptor
	storj.io/storj/storagenode/hashstore.NewStore:105
	storj.io/storj/storagenode/hashstore.New:85
	storj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:271
	storj.io/storj/storagenode/piecestore.NewHashStoreBackend:105
	storj.io/storj/storagenode.New:618
	main.cmdRun:84
	main.newRunCmd.func1:33
	storj.io/common/process.cleanup.func1.4:392
	storj.io/common/process.cleanup.func1:410
	github.com/spf13/cobra.(*Command).execute:983
	github.com/spf13/cobra.(*Command).ExecuteC:1115
	github.com/spf13/cobra.(*Command).Execute:1039
	storj.io/common/process.ExecWithCustomOptions:112
	main.main:34
	runtime.main:272
Thank you for alerting us! I’ve passed this on to the team and it will be investigated. @EasyRhino thank you for confirming you have seen the same error.
A team member will provide more information after the investigation.
This is perplexing. The most common reason for that error is that the file descriptor isn’t actually open, but looking at the code it seems that’s entirely impossible (it is opened immediately beforehand and the error is properly checked). flock() also doesn’t care if the descriptor is open for read or write, or if the file descriptor refers to a directory (on any of the platforms we support, afaik), so it’s not that.
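For context on what is being checked here, a minimal sketch of the open-then-flock pattern in Go (this is not the actual storagenode code; the function name and path are made up for illustration):

package main

import (
	"fmt"
	"os"
	"syscall"
)

// lockMeta opens the lock file and takes a non-blocking exclusive flock on it.
// The descriptor is opened immediately before the flock call, which is why a
// "bad file descriptor" error at this point is so surprising; as this thread
// shows, though, flock() over a misconfigured NFS mount can fail in odd ways.
func lockMeta(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, fmt.Errorf("open lock file: %w", err)
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, fmt.Errorf("unable to flock: %w", err)
	}
	return f, nil
}

func main() {
	f, err := lockMeta("meta/lock") // example path only
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer f.Close()
	fmt.Println("lock acquired")
}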
Most importantly: what platform are you running on? The code is a little different for Windows, and the semantics can vary slightly between macOS/BSD and Linux, so that might be relevant.
Also: can you check those files (I think they should be at storage/hashstore/*/s[01]/meta/lock) and make sure they look like normal files (i.e. not sockets, device nodes, symlinks, fifos, etc.)?
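If it helps, one way to check that (a sketch; adjust the path to wherever your storage directory actually lives):

# List the lock files and print their types; they should all be ordinary
# regular files, not symlinks, sockets, fifos, or device nodes.
ls -la /path/to/storage/hashstore/*/s*/meta/lock
file /path/to/storage/hashstore/*/s*/meta/lock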
3 are linux/x86 and 1 is linux/ARM (all docker)
one x86 and one ARM are using badger cache, the other two aren’t
Every one has its DBs on an SSD.
My remaining working nodes are all running version 1.119.15; the dead ones are running 1.120.4.
Aha! That makes sense then. NFS is, as Roberto said, not supported or recommended, and the network drive latency could really affect your performance.
If you are determined to use NFS despite these warnings, well, flock() type locks do in fact work over NFS if it’s configured properly. The internet should be able to tell you how to diagnose and make sure all the necessary services are running and able to address each other.
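For whoever hits this next, a rough sketch of what "configured properly" can look like on a Linux client (server, export, and paths are placeholders; see the nfs(5) man page for the exact semantics):

# NFSv4 carries locking in the protocol itself, so flock() over an NFSv4
# mount does not depend on the separate NLM/rpc.statd services:
#   nas:/export/storj  /mnt/storj  nfs4  rw,hard,noatime  0  0
#
# With NFSv3, the client-side lock service has to be running and reachable:
systemctl status rpc-statd
#
# Last-resort option: local_lock=flock satisfies flock() on this client only;
# the lock is NOT visible to the server or to other clients:
#   nas:/export/storj  /mnt/storj  nfs  rw,hard,local_lock=flock  0  0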
Well, thanks for the response. My initial Google attempts to work out how to configure NFS for file locking were unsuccessful.
I think my first question is: is it possible to relocate the hashstore directory to a local drive (similar to moving the databases to SSD)? Via config.yaml or some other option?
Thank you Alexey and thepaul. Here’s where I’m at.
I'm not smart enough to get Storj happy with locking the files on my NFS server. Maybe it's a client problem, maybe it's a problem with my TrueNAS settings, maybe there's no normal NFS locking option that works with the flock that Storj is attempting. I dunno.
However, I was able to get the nodes running… well enough, by defining a local drive as a volume in my docker compose. Either of these syntaxes worked:
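(The exact lines from that post aren't preserved here, but as a rough illustration of the kind of mapping being described, with placeholder host paths and assuming the usual /app/config bind-mount layout:)

services:
  storagenode:
    image: storjlabs/storagenode:latest
    volumes:
      # main storage stays on the NFS mount
      - /mnt/nfs/storagenode:/app/config
      # hashstore bind-mounted from a local disk over the top of it
      - /mnt/local-ssd/storagenode-hashstore:/app/config/storage/hashstore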
And the nodes seem to be up and running, uploading and downloading okay. I mean, I guess, the log isn't full of errors or anything.
HOWEVER, what’s interesting is the resulting setup:
The local SSD "home folder" mapping drive only has a single "meta" folder and a bare .migrate file, and nothing else in it.
The mounted NFS storage drive that has everything else also has a hashstore folder, and in there are some .bloomfilter files that seem to have actual data in them.
So in other words, the only thing that's stored locally seems to be the lock file, and the actual hashstore data is still on the forbidden NFS drive.
This may mean that Storj's attempt to flock the meta file is unnecessary in the first place.
My node randomly started having this issue earlier today after running for nearly 6 months. I run storagenode natively on Linux and have my storage on the forbidden NFS mount over a dedicated 10 Gig fiber connection. I relocated the hashstore to a local drive and symlinked it, and now all is well.
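For anyone wanting to replicate that, a sketch of the relocate-and-symlink approach on a native Linux install (paths and the service name are examples; stop the node before moving anything):

# stop the node, move the hashstore onto a local disk, leave a symlink behind
sudo systemctl stop storagenode
mv /mnt/nfs/storagenode/storage/hashstore /mnt/local-ssd/storagenode-hashstore
ln -s /mnt/local-ssd/storagenode-hashstore /mnt/nfs/storagenode/storage/hashstore
sudo systemctl start storagenode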
It's not forbidden, it's just not supported.
So you need to research how to enable flock support on your NFS client.
By the way, I tried hashstore with a CIFS mount in storj-up and it's working so far. Of course, there's not much data there and it's not prod, but I haven't hit this issue so far.
For SMB I was forced to use the nobrl option to make SQLite happy; this seems to have also allowed me to avoid the flock issue.
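For reference, the kind of mount being described (share name, credentials file, and ids are placeholders):

# nobrl stops the client from sending byte-range lock requests to the server,
# which is what keeps SQLite happy on CIFS and apparently sidesteps the flock
# problem here as well
sudo mount -t cifs //nas/storj /mnt/storj -o credentials=/root/.smbcred,uid=1000,gid=1000,nobrl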