My storage node ran flawlessly for about a month, but today some contractors cut the power without warning and now it won’t come back up properly. The Docker daemon is running and trying to start the storagenode and watchtower containers, but the storagenode container keeps restarting.
Inspecting the logs of the storagenode container reveals a lot of errors of the form:
2019-07-22T18:36:49.489Z INFO piecestore:orderssender.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW finished
2019-07-22T18:36:49.490Z ERROR piecestore:orderssender.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW failed to settle orders {"error": "order: unable to connect to the satellite: transport error: no address", "errorVerbose": "order: unable to connect to the satellite: transport error: no address\n\tstorj.io/storj/storagenode/orders.(*Sender).settle:161\n\tstorj.io/storj/storagenode/orders.(*Sender).Settle:142\n\tstorj.io/storj/storagenode/orders.(*Sender).runOnce.func1:127\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-07-22T18:36:49.507Z ERROR vouchers Error requesting voucher {"satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "voucher: unable to connect to the satellite: transport error: no address", "errorVerbose": "voucher: unable to connect to the satellite: transport error: no address\n\tstorj.io/storj/storagenode/vouchers.(*Service).request:132\n\tstorj.io/storj/storagenode/vouchers.(*Service).Request:116\n\tstorj.io/storj/storagenode/vouchers.(*Service).RunOnce.func1:103\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
I checked the router and all port forwarding still works. Does anybody know what causes this?
I don’t think I can get the output of the dashboard; any command I run against the container reports that it is restarting and tells me to try again later:
wander@wander-Z83-II:~$ docker exec -it storagenode sh
Error response from daemon: Container 789e43b8217447d521fb922532b6865ba5bad446c1fb0af97a1ed5f51610d66d is restarting, wait until the container is running
wander@wander-Z83-II:~$ docker exec -it storagenode ./dashboard.sh
Error response from daemon: Container 789e43b8217447d521fb922532b6865ba5bad446c1fb0af97a1ed5f51610d66d is restarting, wait until the container is running
As you can see, it fails both for a normal shell and for running the dashboard.sh script directly.
That produced some more output, though it is only mildly more informative:
2019-07-23T05:45:11.785Z DEBUG Fatal error: piecestore monitor: disk space requirement not met
storj.io/storj/storagenode/monitor.(*Service).Run:118
storj.io/storj/storagenode.(*Peer).Run.func6:351
golang.org/x/sync/errgroup.(*Group).Go.func1:57
2019-07-23T05:45:20.137Z INFO Configuration loaded from: /app/config/config.yaml
2019-07-23T05:45:20.140Z DEBUG debug server listening on 127.0.0.1:46441
2019-07-23T05:45:20.186Z INFO Operator email: wjw.hillen@gmail.com
2019-07-23T05:45:20.187Z INFO Operator wallet: 0x2F071599ea475dE9887aa27d09dD87b8d897D83B
2019-07-23T05:45:20.194Z DEBUG Binary Version: v0.15.3 with CommitHash 6217d9690fd7363d1ae35eefc12135ed6286a89f, built at 2019-07-17 18:42:01 +0000 UTC as Release true
2019-07-23T05:45:20.662Z DEBUG allowed minimum version from control server is: v0.15.1
2019-07-23T05:45:20.662Z INFO running on version v0.15.3
2019-07-23T05:45:20.667Z DEBUG Initialized telemetry batcher with id = "1pNBEBLoVJL4AacMsNz5Kkkfvsq3r8VZvrQdSL9kQLEdG71P3H"
2019-07-23T05:45:20.677Z INFO db.migration Latest Version {"version": 13}
2019-07-23T05:45:20.678Z DEBUG piecestore:orderssender sending
2019-07-23T05:45:20.680Z INFO piecestore:orderssender.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW sending {"count": 23}
2019-07-23T05:45:20.681Z INFO Node 1pNBEBLoVJL4AacMsNz5Kkkfvsq3r8VZvrQdSL9kQLEdG71P3H started
2019-07-23T05:45:20.681Z INFO Public server started on [::]:28967
2019-07-23T05:45:20.681Z INFO Private server started on 127.0.0.1:7778
2019-07-23T05:45:20.683Z INFO vouchers Checking vouchers
2019-07-23T05:45:20.685Z INFO vouchers Requesting voucher {"satellite": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2019-07-23T05:45:20.685Z INFO vouchers Requesting voucher {"satellite": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2019-07-23T05:45:20.685Z INFO vouchers Requesting voucher {"satellite": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2019-07-23T05:45:20.688Z INFO vouchers Requesting voucher {"satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2019-07-23T05:45:20.690Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 1999999947576064}
2019-07-23T05:45:20.690Z WARN piecestore:monitor Disk space is less than requested. Allocating space {"bytes": 20040899840}
2019-07-23T05:45:20.690Z ERROR piecestore:monitor Total disk space less than required minimum {"bytes": 500000000000}
2019-07-23T05:45:20.691Z ERROR Failed to do periodic version check: Get https://version.alpha.storj.io: context canceled
It says it no longer has access to enough disk space, which is odd because I can still access the disk from the CLI.
I managed to fix it! For future reference: the power outage caused my (USB) hard drive to become unmounted (even though it is listed in fstab), and remounting it immediately restored the node to proper working order. I’m still looking into why the fstab entry didn’t bring it back up, but for now the issue is solved.
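For anyone who runs into the same thing, the recovery looked roughly like this. The mount point /mnt/storagenode is a placeholder for illustration, not my actual path:

```shell
# Stop the containers so the node doesn't write into the empty mount point
docker stop storagenode watchtower

# Check whether the drive is actually mounted; if not, remount it
# (assumes the drive has an fstab entry, so mount only needs the mount point)
findmnt /mnt/storagenode || sudo mount /mnt/storagenode

# Bring the node back up
docker start storagenode watchtower
```

If the USB drive is slow to appear at boot, fstab options like nofail (don’t block booting on a missing drive) and x-systemd.automount (mount on first access) may help, but verify them against the systemd.mount documentation for your distro.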
2019-09-05T08:18:37.299Z INFO bandwidth Performing bandwidth usage rollups
2019-09-05T08:18:37.300Z INFO Node <MYNODEID> started
2019-09-05T08:18:37.300Z INFO Public server started on [::]:28967
2019-09-05T08:18:37.300Z INFO Private server started on 127.0.0.1:7778
2019-09-05T08:18:37.314Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S sending {"count": 1}
2019-09-05T08:18:37.314Z INFO orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW sending {"count": 71}
2019-09-05T08:18:37.314Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 sending {"count": 2}
2019-09-05T08:18:37.326Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 39990363710464}
2019-09-05T08:18:37.326Z WARN piecestore:monitor Disk space is less than requested. Allocating space {"bytes": 182663012096}
2019-09-05T08:18:37.327Z ERROR piecestore:monitor Total disk space less than required minimum {"bytes": 500000000000}
2019-09-05T08:18:37.328Z ERROR piecestore:cacheUpdate error getting current space used calculation: {"error": "context canceled"}
2019-09-05T08:18:37.328Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S finished
2019-09-05T08:18:37.328Z ERROR piecestore:cacheUpdate error persisting cache totals to the database: {"error": "piece space used error: context canceled", "errorVerbose": "piece space used error: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).UpdateTotal:121\n\tstorj.io/storj/storagenode/pieces.(*CacheService).PersistCacheTotals:78\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:66\n\tstorj.io/storj/internal/sync2.(*Cycle).Run:87\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:61\n\tstorj.io/storj/storagenode.(*Peer).Run.func7:410\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:37.328Z ERROR orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S failed to settle orders {"error": "order: unable to connect to the satellite: context canceled", "errorVerbose": "order: unable to connect to the satellite: context canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settle:260\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:192\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:171\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:37.328Z ERROR version Failed to do periodic version check: Get https://version.alpha.storj.io: context canceled
2019-09-05T08:18:37.329Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 finished
2019-09-05T08:18:37.330Z ERROR orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 failed to settle orders {"error": "order: unable to connect to the satellite: context canceled", "errorVerbose": "order: unable to connect to the satellite: context canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settle:260\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:192\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:171\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:37.330Z INFO orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW finished
2019-09-05T08:18:37.330Z ERROR orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW failed to settle orders {"error": "order: unable to connect to the satellite: context canceled", "errorVerbose": "order: unable to connect to the satellite: context canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settle:260\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:192\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:171\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-09-05T08:18:38.369Z FATAL Unrecoverable error {"error": "piecestore monitor: disk space requirement not met", "errorVerbose": "piecestore monitor: disk space requirement not met\n\tstorj.io/storj/storagenode/monitor.(*Service).Run:116\n\tstorj.io/storj/storagenode.(*Peer).Run.func6:407\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
No, it has been working for months.
What if the port was wrong? I start docker with a custom port in the run command, not the default 28967, and I have opened it on my router and firewall.
However, the log still says the public server started on 28967.
Could it be that a custom port no longer works in the latest update?
Make sure you use the --mount option in your run command and not the -v option. Also make sure you have enough free space on the HDD to actually store the amount of data you assigned to the node.
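For reference, a --mount based run command might look roughly like this. The paths, wallet, email, address, and sizes below are placeholders, and the exact flags should be double-checked against the current setup docs:

```shell
docker run -d --restart unless-stopped \
    -p 28967:28967 \
    -e WALLET="0x0000000000000000000000000000000000000000" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="your.ddns.example:28967" \
    -e BANDWIDTH="2TB" \
    -e STORAGE="1TB" \
    --mount type=bind,source=/mnt/storagenode/identity,destination=/app/identity \
    --mount type=bind,source=/mnt/storagenode/data,destination=/app/config \
    --name storagenode storjlabs/storagenode:beta
```

The practical difference: with --mount, Docker refuses to start the container if the source path does not exist, while -v silently creates an empty directory there. After a failed mount like the one in this thread, -v could let the node quietly write to the bare mount point on the system disk.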
A custom port is fine; the log still shows 28967 because that port is still used inside the container. If you use port 12345, your port mapping option should be -p 12345:28967
Well, the node seems to think there is less than 200GB of space available. You can try running this to see why it may think that.
docker exec storagenode df -h
It’s either because there genuinely isn’t enough space, a wrong mount, or something off with your file system. Are you doing anything special, or is it just local internal storage?
Ok, I think I see what might be going wrong, but… don’t get too excited as I don’t know how to fix it.
You only have 177GB of free space, and for some reason the node is not able to calculate the space it is currently using.
ERROR piecestore:cacheUpdate error getting current space used calculation: {"error": "context canceled"}
ERROR piecestore:cacheUpdate error persisting cache totals to the database: {"error": "piece space used error: context canceled", "errorVerbose": "piece space used error: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).UpdateTotal:121\n\tstorj.io/storj/storagenode/pieces.(*CacheService).PersistCacheTotals:78\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:66\n\tstorj.io/storj/internal/sync2.(*Cycle).Run:87\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:61\n\tstorj.io/storj/storagenode.(*Peer).Run.func7:410\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Normally the node would calculate [free space] + [space used by node] as the total space available to it, but since the second term could not be calculated, it falls back to just your free space, which isn’t enough to meet the 500GB threshold. While I don’t really know why it can’t calculate the used space, I hope this points someone who knows more in the right direction.
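To illustrate with the numbers from the log above (a sketch of the failure mode, not the node’s actual Go code):

```shell
free_bytes=182663012096      # free space on the file system, from the log
used_by_node=0               # normally added on top, but the lookup was canceled
min_required=500000000000    # the 500GB minimum the monitor checks against

total=$((free_bytes + used_by_node))
if [ "$total" -lt "$min_required" ]; then
    echo "piecestore monitor: disk space requirement not met"
fi
```

With the used-space term stuck at zero, the total can never clear the threshold no matter how much data the node already stores.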
For what it’s worth, this is where the error is thrown in the code.
One possible reason is that info.db was corrupted and deleted at some point.
Another is a wrong mount, for example a different folder on the disk being mounted into the container.
You can easily check it:
sudo find / -iname info.db
If it returns multiple instances, that is the cause, and you should mount the correct path into the container.