Help with constantly restarting storagenode container?

Hi, after updating my node it’s constantly bombing out and restarting. Here are the logs:

2022-03-01T01:53:36.241Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-03-01T01:53:36.249Z        INFO    Operator email  {"Address": "zach@mydomain.com"}
2022-03-01T01:53:36.249Z        INFO    Operator wallet {"Address": "[REDACTED]"}
2022-03-01T01:53:36.494Z        INFO    Telemetry enabled
2022-03-01T01:53:36.596Z        INFO    db.migration    Database Version        {"version": 39}
Error: Error during preflight check for storagenode databases: storage node preflight database error: used_serial: expected schema does not match actual:   &dbschema.Schema{
-       Tables: []*dbschema.Table{
-               s"Name: used_serial_\nColumns:\n\tName: expiration\n\tType: TIMESTAMP\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\n\tName: satellite_id\n\tType: BLOB\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\n\tName: serial_number\n\tType: BLOB\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\nPrimaryKey: \nUniques:\n\t",
-       },
+       Tables: nil,
-       Indexes: []*dbschema.Index{
-               s`Index<Table: used_serial_, Name: idx_used_serial_, Columns: expiration, Unique: false, Partial: "">`,
-               s`Index<Table: used_serial_, Name: pk_used_serial_, Columns: satellite_id serial_number, Unique: false, Partial: "">`,
-       },
+       Indexes: nil,
  }

        storj.io/storj/storagenode/storagenodedb.(*DB).Preflight:323
        main.cmdRun:190
        storj.io/private/process.cleanup.func1.4:359
        storj.io/private/process.cleanup.func1:377
        github.com/spf13/cobra.(*Command).execute:840
        github.com/spf13/cobra.(*Command).ExecuteC:945
        github.com/spf13/cobra.(*Command).Execute:885
        storj.io/private/process.ExecWithCustomConfig:88
        storj.io/private/process.ExecCustomDebug:70
        main.main:320
        runtime.main:203
2022-03-01T01:54:38.832Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-03-01T01:54:38.838Z        INFO    Operator email  {"Address": "zach@mydomain.com"}
2022-03-01T01:54:38.839Z        INFO    Operator wallet {"Address": "[REDACTED]"}
2022-03-01T01:54:39.028Z        INFO    Telemetry enabled
2022-03-01T01:54:39.118Z        INFO    db.migration    Database Version        {"version": 39}
Error: Error during preflight check for storagenode databases: storage node preflight database error: used_serial: expected schema does not match actual:   &dbschema.Schema{
-       Tables: []*dbschema.Table{
-               s"Name: used_serial_\nColumns:\n\tName: expiration\n\tType: TIMESTAMP\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\n\tName: satellite_id\n\tType: BLOB\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\n\tName: serial_number\n\tType: BLOB\n\tNullable: false\n\tDefault: \"\"\n\tReference: nil\nPrimaryKey: \nUniques:\n\t",
-       },
+       Tables: nil,
-       Indexes: []*dbschema.Index{
-               s`Index<Table: used_serial_, Name: idx_used_serial_, Columns: expiration, Unique: false, Partial: "">`,
-               s`Index<Table: used_serial_, Name: pk_used_serial_, Columns: satellite_id serial_number, Unique: false, Partial: "">`,
-       },
+       Indexes: nil,
  }

        storj.io/storj/storagenode/storagenodedb.(*DB).Preflight:323
        main.cmdRun:190
        storj.io/private/process.cleanup.func1.4:359
        storj.io/private/process.cleanup.func1:377
        github.com/spf13/cobra.(*Command).execute:840
        github.com/spf13/cobra.(*Command).ExecuteC:945
        github.com/spf13/cobra.(*Command).Execute:885
        storj.io/private/process.ExecWithCustomConfig:88
        storj.io/private/process.ExecCustomDebug:70
        main.main:320
        runtime.main:203

I saw some other people who had similar problems with their used_serial.db, but I can’t find that file. Is it inside the running Docker container? If so, I don’t know how to connect to the container interactively to delete it; it doesn’t stay up long enough for me to connect.

Anyone have any ideas how to fix?

Thanks,
Zach

What’s option B? Recreating my node from scratch with a new ID?

Kinda would be sad because this node has been online continuously since Summer 2019.

Had a similar issue. Do you use Docker? Check if you have write access on the drive.

Yes, Docker running on a raspberry pi.

The trouble is that the container won’t stay running long enough for me to check anything… for example:

docker exec -it storagenode /bin/bash

Doesn’t work because the container won’t stay up long enough.

Is there maybe a way to keep it from restarting?

Stop the container. I mean check the drive from the Pi itself, not from inside the container.
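
Something like this from the Pi, for example (the /storj path and the .write-test file are just placeholders here; use wherever your data actually lives):

docker stop storagenode
# are the database files visible from the host?
ls -l /storj/storage/*.db
# can the drive be written to at all?
touch /storj/storage/.write-test && rm /storj/storage/.write-test
docker start storagenode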

You need to recreate these databases following this guide: https://support.storj.io/hc/en-us/articles/4403032417044-How-to-fix-database-file-is-not-a-database-error
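
The gist of it is roughly this (a sketch only; follow the article for the exact steps, and treat used_serial.db and the /tmp paths as examples):

# with the node stopped and sqlite3 installed on the host
sqlite3 /storj/storage/used_serial.db ".dump" > /tmp/used_serial.sql
# drop transaction statements so a partial dump can still be imported
grep -v -e TRANSACTION -e ROLLBACK -e COMMIT /tmp/used_serial.sql > /tmp/used_serial_clean.sql
# set the broken file aside and rebuild the database from the cleaned dump
mv /storj/storage/used_serial.db /storj/storage/used_serial.db.bad
sqlite3 /storj/storage/used_serial.db ".read /tmp/used_serial_clean.sql"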


My used_serial.db got accidentally deleted. I tried to recreate it, but then the node started complaining about satellites.db.

2022-03-05T01:04:03.113Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-03-05T01:04:03.119Z        INFO    Operator email  {"Address": "zach@mydomain.com"}
2022-03-05T01:04:03.119Z        INFO    Operator wallet {"Address": "[REDACTED]"}
2022-03-05T01:04:03.330Z        INFO    Telemetry enabled
2022-03-05T01:04:03.420Z        INFO    db.migration    Database Version        {"version": 39}
2022-03-05T01:04:04.199Z        INFO    Got a signal from the OS: "terminated"
Error: Error during preflight check for storagenode databases: storage node preflight database error: satellites: expected schema does not match actual:   &dbschema.Schema{
        Tables: []*dbschema.Table{
                &{
                        Name: "satellite_exit_progress",
                        Columns: []*dbschema.Column{
                                ... // 2 identical elements
                                &{Name: "finished_at", Type: "TIMESTAMP", IsNullable: true},
                                &{Name: "initiated_at", Type: "TIMESTAMP", IsNullable: true},
                                &{
                                        ... // 2 identical fields
                                        IsNullable: false,
                                        Default:    "",
-                                       Reference:  nil,
+                                       Reference:  s"Reference<Table: satellites, Column: node_id, OnDelete: , OnUpdate: >",
                                },
                                &{Name: "starting_disk_usage", Type: "INTEGER"},
                        },
-                       PrimaryKey: []string{"satellite_id"},
+                       PrimaryKey: nil,
                        Unique:     nil,
                },
                &{
                        Name: "satellites",
                        Columns: []*dbschema.Column{
                                &{Name: "added_at", Type: "TIMESTAMP"},
+                               s"Name: address\nType: TEXT\nNullable: true\nDefault: \"\"\nReference: nil",
                                &{Name: "node_id", Type: "BLOB"},
                                &{Name: "status", Type: "INTEGER"},
                        },
                        PrimaryKey: []string{"node_id"},
                        Unique:     nil,
                },
        },
        Indexes: nil,
  }

        storj.io/storj/storagenode/storagenodedb.(*DB).Preflight:323
        main.cmdRun:190
        storj.io/private/process.cleanup.func1.4:359
        storj.io/private/process.cleanup.func1:377
        github.com/spf13/cobra.(*Command).execute:840
        github.com/spf13/cobra.(*Command).ExecuteC:945
        github.com/spf13/cobra.(*Command).Execute:885
        storj.io/private/process.ExecWithCustomConfig:88
        storj.io/private/process.ExecCustomDebug:70
        main.main:320
        runtime.main:203

The instructions for doing surgery on these SQLite files are pretty confusing. Can’t I just delete the bad files, let the node recreate them, and keep going, as non-ideal as that would be?
Doesn’t seem like it. I tried that and no dice.

Thanks. I found them.

They were just in /storj/storage where they belong.

Somehow the update blew up my node, I think. I don’t think there is any disk issue. I ran

find /storj/storage/ -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'

and everything came back “ok”. Yet the node restarts endlessly. I guess I could muck around trying to recreate the files with new schemas and migrating data by hand. Ugh.

If your errors are still related to malformed/broken database files, be aware that, as a last resort, a node can be recovered with no databases at all.

Have a look here:

Yes, you can. The provided instructions basically do exactly that.
To allow the storagenode to recreate its databases, there must not be any databases present. Hence the steps: delete the corrupted ones, move the remaining ones to a backup location, start the node so it can re-create empty databases, then stop the node and restore the databases from the backup.
This way you can try to save most of the stats.
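
As a rough sketch, using the paths from earlier in this thread (the db-backup directory name is just an example):

docker stop -t 300 storagenode
mkdir -p /storj/storage/db-backup
# delete only the corrupted database(s), e.g. used_serial.db
rm /storj/storage/used_serial.db
# move the healthy databases out of the way
mv /storj/storage/*.db /storj/storage/db-backup/
# start the node briefly so it re-creates a fresh, empty set of databases
docker start storagenode
docker stop -t 300 storagenode
# put the healthy databases back over the empty ones, then start for real
cp /storj/storage/db-backup/*.db /storj/storage/
docker start storagenode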


Well I tried that, but the node just keeps restarting. I get these errors in the logs:

2022-03-09T22:44:48.595Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-03-09T22:44:48.601Z        INFO    Operator email  {"Address": "zach@mydomain.com"}
2022-03-09T22:44:48.602Z        INFO    Operator wallet {"Address": "[REDACTED]"}
2022-03-09T22:44:48.797Z        INFO    Telemetry enabled
2022-03-09T22:44:48.891Z        INFO    db.migration    Database Version        {"version": 39}
2022-03-09T22:44:49.744Z        WARN    trust   Failed to fetch URLs from source; used cache    {"source": "https://tardigrade.io/trusted-satellites", "error": "HTTP source: Get https://tardigrade.io/trusted-satellites: x509: certificate signed by unknown authority", "errorVerbose": "HTTP source: Get https://tardigrade.io/trusted-satellites: x509: certificate signed by unknown authority\n\tstorj.io/storj/storagenode/trust.(*HTTPSource).FetchEntries:63\n\tstorj.io/storj/storagenode/trust.(*List).fetchEntries:90\n\tstorj.io/storj/storagenode/trust.(*List).FetchURLs:49\n\tstorj.io/storj/storagenode/trust.(*Pool).fetchURLs:240\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:177\n\tstorj.io/storj/storagenode.(*Peer).Run:691\n\tmain.cmdRun:200\n\tstorj.io/private/process.cleanup.func1.4:359\n\tstorj.io/private/process.cleanup.func1:377\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:320\n\truntime.main:203"}
2022-03-09T22:44:49.804Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2022-03-09T22:44:50.473Z        INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.
2022-03-09T22:44:50.473Z        INFO    bandwidth       Performing bandwidth usage rollups
2022-03-09T22:44:50.475Z        WARN    piecestore:monitor      Disk space is less than requested. Allocating space    {"bytes": 75945963520}
2022-03-09T22:44:50.475Z        ERROR   piecestore:monitor      Total disk space less than required minimum     {"bytes": 500000000000}
2022-03-09T22:44:50.476Z        INFO    Node 12Qvm91dLJLoMNSS6gBjUdXTo7oeNgr2o5qp4MBh4LctWVCp4Df started
2022-03-09T22:44:50.476Z        INFO    Public server started on [::]:28967
2022-03-09T22:44:50.476Z        INFO    Private server started on 127.0.0.1:7778
2022-03-09T22:44:50.477Z        ERROR   collector       error during collecting pieces:         {"error": "piece expiration error: context canceled", "errorVerbose": "piece expiration error: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:39\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:492\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:88\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:55\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func1:56\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-09T22:44:50.478Z        INFO    trust   Scheduling next refresh {"after": "7h19m20.323808978s"}
2022-03-09T22:44:50.478Z        ERROR   nodestats:cache Get pricing-model/join date failed      {"error": "context canceled"}
2022-03-09T22:44:50.478Z        ERROR   bandwidth       Could not rollup bandwidth usage        {"error": "sql: transaction has already been committed or rolled back"}
2022-03-09T22:44:50.479Z        ERROR   piecestore:cache        error getting current space used calculation:   {"error": "context canceled; context canceled; context canceled; context canceled; context canceled; context canceled; context canceled", "errorVerbose": "group:\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled"}
2022-03-09T22:44:50.479Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:140\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:309\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:338\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-09T22:44:50.480Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:140\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:309\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:338\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-09T22:44:50.480Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:140\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:309\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:338\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-09T22:44:50.481Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:140\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:309\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:338\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-09T22:44:50.481Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:140\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:309\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:338\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-09T22:44:50.482Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:140\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:309\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:338\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Error: piecestore monitor: disk space requirement not met

There is disk space…

root@squatch:/storj# df
Filesystem      1K-blocks       Used Available Use% Mounted on
/dev/root      1924808788 1772477828  74165692  96% /
devtmpfs           494908          0    494908   0% /dev
tmpfs              499516          0    499516   0% /dev/shm
tmpfs              499516      58600    440916  12% /run
tmpfs                5120          4      5116   1% /run/lock
tmpfs              499516          0    499516   0% /sys/fs/cgroup
/dev/sda1           43234      22857     20377  53% /boot
overlay        1924808788 1772477828  74165692  96% /var/lib/docker/overlay2/656680bdb164ef2c3708bc6df0ac8704e2ebe33d98d5bfda8bb7b1602972a323/merged
tmpfs               99900          0     99900   0% /run/user/1000

So. I dunno.

Hi CutieePie,

It’s very kind of you to attempt to assist. Thanks.

uname output:

pi@squatch:~ $ uname -a
Linux squatch 4.19.66-v7+ #1253 SMP Thu Aug 15 11:49:46 BST 2019 armv7l GNU/Linux

usb-devices output:

pi@squatch:~ $ usb-devices

T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=480 MxCh= 1
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1d6b ProdID=0002 Rev=04.19
S:  Manufacturer=Linux 4.19.66-v7+ dwc_otg_hcd
S:  Product=DWC OTG Controller
S:  SerialNumber=3f980000.usb
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub

T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480 MxCh= 4
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
P:  Vendor=0424 ProdID=2514 Rev=0b.b3
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=2mA
I:  If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub

T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  3 Spd=480 MxCh= 3
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
P:  Vendor=0424 ProdID=2514 Rev=0b.b3
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=2mA
I:  If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub

T:  Bus=01 Lev=03 Prnt=03 Port=00 Cnt=01 Dev#=  5 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=ff(vend.) Sub=00 Prot=ff MxPS=64 #Cfgs=  1
P:  Vendor=0424 ProdID=7800 Rev=03.00
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=2mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=ff Driver=lan78xx

T:  Bus=01 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#=  4 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=0830 ProdID=0002 Rev=01.00
S:  Manufacturer=SupTronics
S:  Product=X830 V2.0
S:  SerialNumber=201902000007
C:  #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=0mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage

Storj data is on the same disk as the OS, not an external drive. I patterned my node after the nodes that Storj used to bootstrap their network. It’s a Pi 3B+ with a Geekworm SATA HDD Storage Expansion Board (X830 V2.0) that lets me boot off a 3.5" 2TB Western Digital Blue physical hard drive. It’s all powered off the same power adapter, plugged into a UPS.

When I say I updated my node, I think it was actually watchtower that updated it, although I do occasionally update the system OS with apt upgrade. I don’t think apt was what did it. I wasn’t able to load the web dashboard, which is when I went to investigate. However, it’s totally possible I updated with apt some time earlier and didn’t check that my node was still running fine right afterward. The node has been running without issue for so long that I tend to be more absent-minded and lackadaisical about its maintenance than I should be.

Docker command:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="[REDACTED]" \
    -e EMAIL="zach@mydomain.com" \
    -e ADDRESS="[REDACTED]:28967" \
    -e STORAGE="1.8TB" \
    --mount type=bind,source="/home/pi/.local/share/storj/identity/storagenode",destination=/app/identity \
    --mount type=bind,source="/storj",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

Thanks again for your help!

Zach

Hi Zach,

If you created all new databases, your node is now unaware that the data you hold on that drive is actually Storj data. It sees less than the minimum free space and won’t start. What you can do is temporarily reduce the minimum required free space so your node can start and the filewalker can read how much data you hold. You need to stop the node container and edit your config.yaml file as follows:

# how much disk space a node at minimum has to advertise
# storage2.monitor.minimum-disk-space: 500.00 GB

Uncomment the storage2.monitor.minimum-disk-space: 500.00 GB line and reduce the value to something less than your available free space, for example:

# how much disk space a node at minimum has to advertise
storage2.monitor.minimum-disk-space: 50.00 GB

Then start your node. You can comment it out again later if you wish.
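
If it helps, the whole sequence from the Pi is roughly this (your config.yaml should be at /storj/config.yaml on the host, since /storj is what you mount to /app/config; use whatever editor you like):

docker stop -t 300 storagenode
nano /storj/config.yaml
# uncomment and lower the line, e.g.:
#   storage2.monitor.minimum-disk-space: 50.00 GB
docker start storagenode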

Hmm. Well, that line isn’t in my config.yaml at all, not even commented out.

Here’s my entire config.yaml file:

# path to the certificate chain for this identity
identity.cert-path: "identity/identity.cert"

# path to the private key for this identity
identity.key-path: "identity/identity.key"

# the public address of the Kademlia node, useful for nodes behind NAT
kademlia.external-address: ""

# operator email address
kademlia.operator.email: ""

# operator wallet address
kademlia.operator.wallet: ""

# the minimum log level to log
log.level: info

# public address to listen on
server.address: ":28967"

# private address to listen on
server.private-address: "127.0.0.1:7778"

# total allocated bandwidth in bytes
storage.allocated-bandwidth: 500.0 GiB

I tried adding that line anyway, but it didn’t work:

2022-03-13T19:48:02.009Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-03-13T19:48:02.009Z        INFO    Invalid configuration file key  {"Key": "storage.monitor.minimum-disk-space"}
2022-03-13T19:48:02.014Z        INFO    Operator email  {"Address": "zach@mydomain.com"}
2022-03-13T19:48:02.014Z        INFO    Operator wallet {"Address": "[REDACTED]"}
2022-03-13T19:48:03.735Z        INFO    Telemetry enabled       {"instance ID": "[REDATED]"}
2022-03-13T19:48:03.820Z        INFO    db.migration    Database Version        {"version": 53}
2022-03-13T19:48:04.874Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2022-03-13T19:48:05.552Z        INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.
2022-03-13T19:48:05.553Z        INFO    Node 12Qvm91dLJLoMNSS6gBjUdXTo7oeNgr2o5qp4MBh4LctWVCp4Df started
2022-03-13T19:48:05.553Z        INFO    Public server started on [::]:28967
2022-03-13T19:48:05.553Z        INFO    Private server started on 127.0.0.1:7778
2022-03-13T19:48:05.553Z        INFO    failed to sufficiently increase receive buffer size (was: 160 kiB, wanted: 2048 kiB, got: 320 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
2022-03-13T19:48:06.722Z        INFO    trust   Scheduling next refresh {"after": "5h4m18.205126991s"}
2022-03-13T19:48:06.724Z        WARN    piecestore:monitor      Disk space is less than requested. Allocated space is{"bytes": 75249008640}
2022-03-13T19:48:06.724Z        ERROR   piecestore:monitor      Total disk space is less than required minimum  {"bytes": 500000000000}
2022-03-13T19:48:06.724Z        ERROR   services        unexpected shutdown of a runner {"name": "piecestore:monitor", "error": "piecestore monitor: disk space requirement not met", "errorVerbose": "piecestore monitor: disk space requirement not met\n\tstorj.io/storj/storagenode/monitor.(*Service).Run:125\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.727Z        ERROR   nodestats:cache Get pricing-model/join date failed      {"error": "context canceled"}
2022-03-13T19:48:06.729Z        ERROR   gracefulexit:chore      error retrieving satellites.    {"error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:149\n\tstorj.io/storj/storagenode/gracefulexit.(*service).ListPendingExits:89\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run.func1:53\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:50\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.729Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.730Z        ERROR   gracefulexit:blobscleaner       couldn't receive satellite's GE status  {"error": "context canceled"}
2022-03-13T19:48:06.730Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.729Z        ERROR   collector       error during collecting pieces:         {"error": "pieceexpirationdb: context canceled", "errorVerbose": "pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:39\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:521\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:88\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.730Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.731Z        INFO    bandwidth       Performing bandwidth usage rollups
2022-03-13T19:48:06.732Z        ERROR   bandwidth       Could not rollup bandwidth usage        {"error": "bandwidthdb: context canceled", "errorVerbose": "bandwidthdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Rollup:301\n\tstorj.io/storj/storagenode/bandwidth.(*Service).Rollup:53\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/bandwidth.(*Service).Run:45\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.731Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.733Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.734Z        ERROR   pieces:trash    emptying trash failed   {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-03-13T19:48:06.740Z        ERROR   piecestore:cache        error getting current used space:       {"error": "context canceled; context canceled; context canceled; context canceled; context canceled; context canceled; context canceled", "errorVerbose": "group:\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled"}
Error: piecestore monitor: disk space requirement not met

It has to be:

storage2.monitor.minimum-disk-space

Note the storage2, whereas you have:

storage.monitor.minimum-disk-space

Oh! The other entries in the file started with “storage” (sans 2), so I adjusted to match. I changed it to “storage2” and the node does seem to be working now. Do I leave this setting in place? I have to admit, I don’t really understand what it’s doing now.

LOL. Nice.

That’s okay, you can recover from suspension by keeping the node online. You should keep an eye on the disk space used to make sure your node doesn’t fill the disk before it has finished relearning how much space it is using.

Storage nodes are soft-coded to require 500 GB of allocated space or they will not start. Since your node “forgot” how much space was used when you wiped the databases, it determined that there was not enough free space to allocate at least 500 GB. We use this setting to override the 500 GB minimum. Once the node has had a chance to read how much space is actually being used, the setting will no longer be necessary. You can revert the change or leave it; it should functionally make no difference.
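
To keep an eye on it in the meantime, something simple from the Pi should be enough, assuming the usual blobs/ layout (the du can take a long time on a drive this full):

# free space on the filesystem holding the node data
df -h /storj
# actual size of the stored pieces
du -sh /storj/storage/blobs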


Okay, thank you, baker. I really do appreciate all the help from the forum. Seems like I’m back in business.

Best,
Zach
