I’ve been running a node for a very long time. Yesterday I had a little “incident”: I had to resize the docker volume (“/var/lib/docker”). Just to note, I have a separate volume for node storage.
After the resize was done, something must have gone wrong, as I had to re-deploy every docker container. All docker volumes were still present. When it came to restoring the storj node, at first glance it looked fine, but then I got these weird errors. I don’t know why this happened, as I stopped docker before doing the volume resize.
The storj-storage volume has 550 GB (and of course it was already almost full, since the node had been running for a while):
550G allocated, 522G used, 27G available (96% used)
2022-11-15T08:42:20.377Z WARN piecestore:monitor Disk space is less than requested. Allocated space is {"Process": "storagenode", "bytes": 28450369536}
2022-11-15T08:42:20.377Z ERROR piecestore:monitor Total disk space is less than required minimum {"Process": "storagenode", "bytes": 500000000000}
2022-11-15T08:42:20.377Z ERROR services unexpected shutdown of a runner
Error: piecestore monitor: disk space requirement not met
2022-11-15 08:42:20,510 INFO exited: storagenode (exit status 1; not expected)
and further on
Error: Error starting master database on storagenode: database: info opening file "config/storage/info.db" failed: context canceled
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:324
storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:306
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281
storj.io/storj/storagenode/storagenodedb.OpenExisting:248
main.cmdRun:193
storj.io/private/process.cleanup.func1.4:377
storj.io/private/process.cleanup.func1:395
github.com/spf13/cobra.(*Command).execute:852
github.com/spf13/cobra.(*Command).ExecuteC:960
github.com/spf13/cobra.(*Command).Execute:897
storj.io/private/process.ExecWithCustomConfigAndLogger:92
main.main:478
runtime.main:255
2022-11-15 08:42:21,540 INFO stopped: storagenode (exit status 1)
2022-11-15 08:42:21,541 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
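For context, the startup check that produces the “disk space requirement not met” error above behaves roughly like this (a minimal Python sketch, not the actual Go implementation; the byte values come from the log lines above):

```python
# Minimal sketch of the piecestore monitor's startup check (not the real Go code).
# "usable_on_disk" stands for the space the node could actually use: the data it
# already stores plus the free space left on the volume.

MINIMUM_REQUIRED = 500_000_000_000  # the 500 GB minimum from the error above


def startup_check(requested_bytes: int, usable_on_disk: int) -> int:
    # The node can only allocate what the disk actually offers.
    allocated = min(requested_bytes, usable_on_disk)
    if allocated < requested_bytes:
        print(f"WARN Disk space is less than requested. Allocated: {allocated}")
    if allocated < MINIMUM_REQUIRED:
        raise RuntimeError("piecestore monitor: disk space requirement not met")
    return allocated


# With the values from the log, only ~28.5 GB is usable - far below 500 GB:
try:
    startup_check(500_000_000_000, 28_450_369_536)
except RuntimeError as err:
    print("Error:", err)
```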
I didn’t change anything on the storj-storage volume I’m binding to the storj container! It still has the same size, and I’m very confused about why this happens. I already searched the forum and the storj docs, and tried backing up all db files and restarting the node to recreate them as new DBs, but the error happens again.
Please try to stop and remove the storagenode container, then run it again using all your parameters.
After that, make sure the databases are accessible to the user who runs the storagenode container.
Maybe you ran your node with sudo before and have now added the option --user $(id -u):$(id -g)?
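The ownership check can be sketched like this (a hypothetical Python helper, not part of storagenode; the path is only an example - use your own storage location). Compare the reported uid/gid against the user the container runs as:

```python
import stat
from pathlib import Path


def report_db_ownership(storage_dir: str) -> list:
    """List each SQLite database with its owner uid/gid, so you can compare
    against the uid/gid the container runs as (e.g. --user $(id -u):$(id -g))."""
    results = []
    for db in sorted(Path(storage_dir).glob("*.db")):
        st = db.stat()
        results.append((db.name, st.st_uid, st.st_gid))
        print(f"{db.name}: uid={st.st_uid} gid={st.st_gid} "
              f"mode={stat.filemode(st.st_mode)}")
    return results


# Example (hypothetical path - replace with your own storage location):
# report_db_ownership("/mnt/storj/storagenode/storage")
```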
Hello @Alexey,
I already tried that so many times now without success :-/
It feels like the only option is to completely prune the storage, but of course it really feels bad to purge all stored files and risk being penalized for that by the storj network.
2022-11-15T12:46:05.450Z WARN piecestore:monitor Disk space is less than requested. Allocated space is {"Process": "storagenode", "bytes": 28450234368}
2022-11-15T12:46:05.450Z ERROR piecestore:monitor Total disk space is less than required minimum {"Process": "storagenode", "bytes": 500000000000}
2022-11-15T12:46:05.450Z ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: disk space requirement not met", "errorVerbose": "piecestore monitor: disk space requirement not met\n\tstorj.io/storj/storagenode/monitor.(*Service).Run:125\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2022-11-15T12:46:05.450Z ERROR nodestats:cache Get pricing-model/join date failed {"Process": "storagenode", "error": "context canceled"}
I’m absolutely confused by this:
piecestore monitor: disk space requirement not met
I’m sure I read somewhere that even for a total node shutdown one should gradually reduce the storage to help the network relocate the data. But this indicates it’s obviously not possible.
But nevertheless… I don’t want to quit, but if the node doesn’t work anymore and storj reduces my reputation, I’m considering such a hard quit :-/
Please never run setup against a working node - you may destroy it if you provide a wrong path for the identity or data. This command should be run only once for the entire life of the node.
What I am asking is not related to config.yaml. Did you have the option --user $(id -u):$(id -g) in your normal full docker run command before?
If you do not remember, then please show the result of this command (replace the path /mnt/storj/storagenode/ with yours):
ls -l /mnt/storj/storagenode/storage
is related either to corrupted databases or to wrong permissions.
I found the culprit… for some reason, 1 TB was configured in config.yaml (maybe related to my attempt to fix things by recreating the config after moving the old original one to config.yaml~bak)… this explains a lot.
But the node still fails:
2022-11-15T13:01:23.462Z ERROR piecestore:cache error getting current used space: {"Process": "storagenode", "error": "context canceled; context canceled; context canceled; context canceled; context canceled; context canceled; context canceled", "errorVerbose": "group:\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled"}
Error: piecestore monitor: disk space requirement not met
2022-11-15 13:01:23,541 INFO exited: storagenode (exit status 1; not expected)
There’s something totally screwed …
And by the way… I know these sqlite DBs are very sensitive to crashes and the like. Why the hell is there no backup of such files inside the storj network, e.g. keyed by node identifier, enabling something like auto-recovery on such failures?
Funny… a storage/backup solution that doesn’t back up its own sensitive databases…
Did you have option --user $(id -u):$(id -g)
Regarding this… I’m using docker-compose, and on the virtual machine docker runs as root - same for the storage permissions, actually.
So I don’t think this can be the issue… I checked the file permissions in storage as well, just to be sure.
Could this cause any problems? My initial deployment was without docker-compose, but as I had some recurring issues with docker, I decided to switch to more reliable deployments using docker-compose instead of manual ones, where containers could disappear without the actual configuration being recorded (it is now reflected in my docker-compose.yaml, so I can redeploy at any time with the current configuration).
It doesn’t matter. The variables from the docker run command take precedence over the parameters in the config.yaml file.
Again, there is nothing related to your config.yaml file. This is either a broken database or wrong permissions; there are no other options.
You may register for free and do so
then please remove --user $(id -u):$(id -g)
The correct bind option for storagenode in the docker-compose.yaml file:
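(The original snippet is not quoted here; as a sketch with hypothetical paths, the long, explicit bind syntax in a docker-compose.yaml looks like the following. With `type: bind`, the container fails to start if the source directory is missing, instead of Docker silently creating an empty volume.)

```yaml
services:
  storagenode:
    # ...
    volumes:
      - type: bind
        source: /mnt/storj/storagenode/identity   # hypothetical path - use yours
        target: /app/identity
      - type: bind
        source: /mnt/storj-storage                # hypothetical path - use yours
        target: /app/config
```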
ERROR piecestore:monitor Total disk space is less than required minimum {"Process": "storagenode", "bytes": 500000000000}
Additional info for you:
df -h /mnt/storj-storage/
Filesystem Size Used Avail Use% Mounted on
- 570G 522G 47G 92% /mnt/storj-storage
I also tried to lvresize the volume by an additional 10 GB to make sure it has some free space (which it had before, though not 500 GB free). It’s clear that an already running node cannot have that much free space when the max storage is configured (500000000000 bytes = 500 GB).
So what options do I have? Prune the storage and start from scratch? Will the storj network recover my node, or will I get a bad reputation for that?
When the filewalker re-calculates the used space, it should update the database. After that you may remove this parameter.
However, if you provided a wrong path to the identity when you ran the setup command, that could explain why your node cannot recognize the used space - it belongs to another identity.
So I think that should be fine, even though it looks a little different from the setup you showed. To be sure, I also tried your configuration, setting up the volume directly in services.node.volumes, but it results in the same issue as before:
Something is screwed with the piecestore monitor…
2022-11-15T13:45:53.761Z ERROR collector error during collecting pieces: {"Process": "storagenode", "error": "pieceexpirationdb: context canceled", "errorVerbose": "pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:39\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:532\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:88\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
Error: piecestore monitor: disk space requirement not met
Please update your binding to a correct one; right now you use the dangerous -v equivalent (docker will create an empty volume if the disk is not mounted for any reason).
2022-11-15T13:48:35.430Z WARN piecestore:monitor Disk space is less than requested. Allocated space is {"Process": "storagenode", "bytes": 49924898816}
2022-11-15T13:48:35.430Z INFO trust Scheduling next refresh {"Process": "storagenode", "after": "5h57m21.523840512s"}
2022-11-15T13:48:35.431Z INFO bandwidth Performing bandwidth usage rollups {"Process": "storagenode"}
2022-11-15T13:48:44.716Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S sending {"Process": "storagenode", "count": 5}
....
Let’s see what happens. What exactly does this command argument do? Will it somehow skip something in monitoring?
It lowers the required minimum free space on the disk.
As a result, the node will start (even with foreign data), and the filewalker will start calculating the used space and should update the database.
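The setting discussed here appears to be the storagenode config option `storage2.monitor.minimum-disk-space`. In a docker-compose file it can be passed as a command argument, for example (a sketch - the value is only illustrative; remove the flag again once the filewalker has updated the database):

```yaml
services:
  storagenode:
    # ...
    command:
      # Temporarily lower the 500 GB minimum so the node can start and
      # the filewalker can recalculate the real used space.
      - --storage2.monitor.minimum-disk-space=10GB
```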
Ok, great… it looks like it’s starting now, and even the dashboard is showing up. How long should I wait before removing the additional argument? Will the node do some garbage collection or cleanup in the background?
Hmm… oh man… great. Yesterday it looked fine. Sorry, I was unable to report back due to the reply limitations here for new users.
But today “Your node has been disqualified”
Great how STORJ treats operators who are giving their best: disqualified for nothing but problems caused by internal issues with the STORJ node sqlite databases and something I would describe as “unclean garbage collection”.
As mentioned in the thread named “[MAIN NODE] Your node has been disqualified”, this is permanent. If that is true, I will quit STORJ completely now with my node. I’m really pissed off by this (not at you personally, but at STORJ).
It’s totally okay to do such things if an operator is doing something wrong, but for an operator who has run a node for months without interruption, this is like getting a kick in the balls for nothing.
I’m keeping the node online for now, waiting for your reply on whether we can do something to get it un-disqualified. But if that’s the final state, I will shut down immediately and not waste any more time on this.