My storage node ran flawlessly for about a month, but today some contractors cut the power without warning and now it won’t come back up properly. The Docker daemon is running and trying to start the storagenode and watchtower containers, but the storagenode container keeps restarting.
Inspecting the logs of the storagenode container reveals a lot of errors of the form:
2019-07-22T18:36:49.489Z INFO piecestore:orderssender.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW finished
2019-07-22T18:36:49.490Z ERROR piecestore:orderssender.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW failed to settle orders {"error": "order: unable to connect to the satellite: transport error: no address", "errorVerbose": "order: unable to connect to the satellite: transport error: no address\n\tstorj.io/storj/storagenode/orders.(*Sender).settle:161\n\tstorj.io/storj/storagenode/orders.(*Sender).Settle:142\n\tstorj.io/storj/storagenode/orders.(*Sender).runOnce.func1:127\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-07-22T18:36:49.507Z ERROR vouchers Error requesting voucher {"satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "voucher: unable to connect to the satellite: transport error: no address", "errorVerbose": "voucher: unable to connect to the satellite: transport error: no address\n\tstorj.io/storj/storagenode/vouchers.(*Service).request:132\n\tstorj.io/storj/storagenode/vouchers.(*Service).Request:116\n\tstorj.io/storj/storagenode/vouchers.(*Service).RunOnce.func1:103\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
I checked the router and all port forwarding still works. Does anybody know what causes this?
I don’t think I can get the output of the dashboard; any command I run against the container reports that it is restarting and tells me to try again later:
wander@wander-Z83-II:~$ docker exec -it storagenode sh
Error response from daemon: Container 789e43b8217447d521fb922532b6865ba5bad446c1fb0af97a1ed5f51610d66d is restarting, wait until the container is running
wander@wander-Z83-II:~$ docker exec -it storagenode ./dashboard.sh
Error response from daemon: Container 789e43b8217447d521fb922532b6865ba5bad446c1fb0af97a1ed5f51610d66d is restarting, wait until the container is running
As you can see, it fails both for a normal shell and for running the dashboard.sh script directly.
That produced some more output, though it is only mildly more informative:
2019-07-23T05:45:11.785Z DEBUG Fatal error: piecestore monitor: disk space requirement not met
storj.io/storj/storagenode/monitor.(*Service).Run:118
storj.io/storj/storagenode.(*Peer).Run.func6:351
golang.org/x/sync/errgroup.(*Group).Go.func1:57
2019-07-23T05:45:20.137Z INFO Configuration loaded from: /app/config/config.yaml
2019-07-23T05:45:20.140Z DEBUG debug server listening on 127.0.0.1:46441
2019-07-23T05:45:20.186Z INFO Operator email: wjw.hillen@gmail.com
2019-07-23T05:45:20.187Z INFO Operator wallet: 0x2F071599ea475dE9887aa27d09dD87b8d897D83B
2019-07-23T05:45:20.194Z DEBUG Binary Version: v0.15.3 with CommitHash 6217d9690fd7363d1ae35eefc12135ed6286a89f, built at 2019-07-17 18:42:01 +0000 UTC as Release true
2019-07-23T05:45:20.662Z DEBUG allowed minimum version from control server is: v0.15.1
2019-07-23T05:45:20.662Z INFO running on version v0.15.3
2019-07-23T05:45:20.667Z DEBUG Initialized telemetry batcher with id = "1pNBEBLoVJL4AacMsNz5Kkkfvsq3r8VZvrQdSL9kQLEdG71P3H"
2019-07-23T05:45:20.677Z INFO db.migration Latest Version {"version": 13}
2019-07-23T05:45:20.678Z DEBUG piecestore:orderssender sending
2019-07-23T05:45:20.680Z INFO piecestore:orderssender.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW sending {"count": 23}
2019-07-23T05:45:20.681Z INFO Node 1pNBEBLoVJL4AacMsNz5Kkkfvsq3r8VZvrQdSL9kQLEdG71P3H started
2019-07-23T05:45:20.681Z INFO Public server started on [::]:28967
2019-07-23T05:45:20.681Z INFO Private server started on 127.0.0.1:7778
2019-07-23T05:45:20.683Z INFO vouchers Checking vouchers
2019-07-23T05:45:20.685Z INFO vouchers Requesting voucher {"satellite": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2019-07-23T05:45:20.685Z INFO vouchers Requesting voucher {"satellite": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2019-07-23T05:45:20.685Z INFO vouchers Requesting voucher {"satellite": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2019-07-23T05:45:20.688Z INFO vouchers Requesting voucher {"satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2019-07-23T05:45:20.690Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 1999999947576064}
2019-07-23T05:45:20.690Z WARN piecestore:monitor Disk space is less than requested. Allocating space {"bytes": 20040899840}
2019-07-23T05:45:20.690Z ERROR piecestore:monitor Total disk space less than required minimum {"bytes": 500000000000}
2019-07-23T05:45:20.691Z ERROR Failed to do periodic version check: Get https://version.alpha.storj.io: context canceled
It says it no longer has access to enough disk space, which is odd because I can still access the disk from the CLI.
I managed to fix it! For future reference: the power outage caused my (USB) hard drive to become unmounted (even though it is listed in fstab), and remounting it immediately restored the node to proper working order. I’m still looking into why the fstab entry didn’t bring it back up, but for now the issue is solved.
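For anyone who runs into the same thing, the recovery looked roughly like this. The mount point /mnt/storagenode is a placeholder for illustration, not my actual path:

```shell
# Stop the containers so the node doesn't write into the empty mount point
docker stop storagenode watchtower

# Check whether the drive is actually mounted; if not, remount it
# (assumes the drive has an fstab entry, so mount only needs the mount point)
findmnt /mnt/storagenode || sudo mount /mnt/storagenode

# Bring the node back up
docker start storagenode watchtower
```

If the USB drive is slow to appear at boot, fstab options like nofail (don’t block booting on a missing drive) and x-systemd.automount (mount on first access) may help, but verify them against the systemd.mount documentation for your distro.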
2019-09-05T08:18:37.299Z INFO bandwidth Performing bandwidth usage rollups
2019-09-05T08:18:37.300Z INFO Node <MYNODEID> started
2019-09-05T08:18:37.300Z INFO Public server started on [::]:28967
2019-09-05T08:18:37.300Z INFO Private server started on 127.0.0.1:7778
2019-09-05T08:18:37.314Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S sending {"count": 1}
2019-09-05T08:18:37.314Z INFO orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW sending {"count": 71}
2019-09-05T08:18:37.314Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 sending {"count": 2}
2019-09-05T08:18:37.326Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 39990363710464}
2019-09-05T08:18:37.326Z WARN piecestore:monitor Disk space is less than requested. Allocating space {"bytes": 182663012096}
2019-09-05T08:18:37.327Z ERROR piecestore:monitor Total disk space less than required minimum {"bytes": 500000000000}
2019-09-05T08:18:37.328Z ERROR piecestore:cacheUpdate error getting current space used calculation: {"error": "context canceled"}
2019-09-05T08:18:37.328Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S finished
2019-09-05T08:18:37.328Z ERROR piecestore:cacheUpdate error persisting cache totals to the database: {"error": "piece space used error: context canceled", "errorVerbose": "piece space used error: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).UpdateTotal:121\n\tstorj.io/storj/storagenode/pieces.(*CacheService).PersistCacheTotals:78\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:66\n\tstorj.io/storj/internal/sync2.(*Cycle).Run:87\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:61\n\tstorj.io/storj/storagenode.(*Peer).Run.func7:410\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:37.328Z ERROR orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S failed to settle orders {"error": "order: unable to connect to the satellite: context canceled", "errorVerbose": "order: unable to connect to the satellite: context canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settle:260\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:192\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:171\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:37.328Z ERROR version Failed to do periodic version check: Get https://version.alpha.storj.io: context canceled
2019-09-05T08:18:37.329Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 finished
2019-09-05T08:18:37.330Z ERROR orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 failed to settle orders {"error": "order: unable to connect to the satellite: context canceled", "errorVerbose": "order: unable to connect to the satellite: context canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settle:260\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:192\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:171\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:37.330Z INFO orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW finished
2019-09-05T08:18:37.330Z ERROR orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW failed to settle orders {"error": "order: unable to connect to the satellite: context canceled", "errorVerbose": "order: unable to connect to the satellite: context canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settle:260\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:192\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:171\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:38.369Z FATAL Unrecoverable error {"error": "piecestore monitor: disk space requirement not met", "errorVerbose": "piecestore monitor: disk space requirement not met\n\tstorj.io/storj/storagenode/monitor.(*Service).Run:116\n\tstorj.io/storj/storagenode.(*Peer).Run.func6:407\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
No, it has been working for months.
What if the port was wrong? I start docker with a custom port in the run command, not the default 28967, and I have opened it on my router and firewall.
However, the log still says the public server started on 28967.
Could it be that a custom port no longer works in the latest update?
Make sure you use the --mount option in your run command and not the -v option. Also make sure you have enough free space on the HDD to actually store the amount of data you assigned to the node.
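For reference, a --mount based run command might look roughly like this. The paths, wallet, email, address, and sizes below are placeholders, and the exact flags should be double-checked against the current setup docs:

```shell
docker run -d --restart unless-stopped \
    -p 28967:28967 \
    -e WALLET="0x0000000000000000000000000000000000000000" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="your.ddns.example:28967" \
    -e BANDWIDTH="2TB" \
    -e STORAGE="1TB" \
    --mount type=bind,source=/mnt/storagenode/identity,destination=/app/identity \
    --mount type=bind,source=/mnt/storagenode/data,destination=/app/config \
    --name storagenode storjlabs/storagenode:beta
```

The practical difference: with --mount, Docker refuses to start the container if the source path does not exist, while -v silently creates an empty directory there. After a failed mount like the one in this thread, -v could let the node quietly write to the bare mount point on the system disk.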
A custom port is fine; the log still shows 28967 because that port is still used inside the container. If you use port 12345, your port mapping option should be -p 12345:28967
Well, the node seems to think there is less than 200GB of space available. You can try running this to see why it may think that.
docker exec storagenode df -h
It’s either because there genuinely isn’t enough space, a wrong mount, or something off with your file system. Are you doing anything special, or is it just local internal storage?
Ok, I think I see what might be going wrong, but… don’t get too excited as I don’t know how to fix it.
You only have 177GB of free space, and for some reason the node is not able to calculate the space it is currently using.
ERROR piecestore:cacheUpdate error getting current space used calculation: {"error": "context canceled"}
ERROR piecestore:cacheUpdate error persisting cache totals to the database: {"error": "piece space used error: context canceled", "errorVerbose": "piece space used error: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).UpdateTotal:121\n\tstorj.io/storj/storagenode/pieces.(*CacheService).PersistCacheTotals:78\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:66\n\tstorj.io/storj/internal/sync2.(*Cycle).Run:87\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:61\n\tstorj.io/storj/storagenode.(*Peer).Run.func7:410\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Normally the node would calculate [free space] + [space used by node] as the total space available to it, but since the second term could not be calculated, it falls back to just your free space, which isn’t enough to meet the 500GB threshold. While I don’t really know why it can’t calculate the used space, I hope this points someone who knows more in the right direction.
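To illustrate with the numbers from the log above (a sketch of the failure mode, not the node’s actual Go code):

```shell
free_bytes=182663012096      # free space on the file system, from the log
used_by_node=0               # normally added on top, but the lookup was canceled
min_required=500000000000    # the 500GB minimum the monitor checks against

total=$((free_bytes + used_by_node))
if [ "$total" -lt "$min_required" ]; then
    echo "piecestore monitor: disk space requirement not met"
fi
```

With the used-space term stuck at zero, the total can never clear the threshold no matter how much data the node already stores.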
For what it’s worth, this is where the error is thrown in the code.
One possible reason is that info.db was corrupted and deleted at some point.
Another is a wrong mount, for example a different folder on the disk being mounted into the container.
You can easily check it:
sudo find / -iname info.db
If it returns multiple instances, that is the cause, and you should mount the correct path into the container.