Node shutting down after some time

My node keeps stopping and I don't know why.
Can anybody help me, please?
After a restart it works again for a few hours.

Thanks in advance
Alex

Linux raspberrypi 5.10.103-v7+ #1529 SMP Tue Mar 8 12:21:37 GMT 2022 armv7l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri Jan 13 23:15:19 2023 from 192.168.178.201
pi@raspberrypi:~ $ sudo docker exec -it storagenode /app/dashboard.sh
Error response from daemon: Container e1b694f05cbaae3dd22ed1d02586ed0379082db97a26dd8ac023a76542de0e11 is not running
pi@raspberrypi:~ $ sudo docker logs storagenode --tail 100
--graceful-exit.num-workers int number of workers to handle satellite exits (default 4)
--healthcheck.details Enable additional details about the satellite connections via the HTTP healthcheck.
--healthcheck.enabled Provide health endpoint (including suspension/audit failures) on main public port, but HTTP protocol. (default true)
-h, --help help for run
--identity.cert-path string path to the certificate chain for this identity (default "identity/identity.cert")
--identity.key-path string path to the private key for this identity (default "identity/identity.key")
--nodestats.max-sleep duration maximum duration to wait before requesting data (default 5m0s)
--nodestats.reputation-sync duration how often to sync reputation (default 4h0m0s)
--nodestats.storage-sync duration how often to sync storage (default 12h0m0s)
--operator.email string operator email address
--operator.wallet string operator wallet address
--operator.wallet-features wallet-features operator wallet features
--pieces.delete-to-trash move pieces to trash upon deletion. Warning: if set to false, you risk disqualification for failed audits if a satellite database is restored from backup. (default true)
--pieces.write-prealloc-size memory.Size file preallocated for uploading (default 4.0 MiB)
--preflight.database-check whether or not preflight check for database is enabled. (default true)
--preflight.local-time-check whether or not preflight check for local system clock is enabled on the satellite side. When disabling this feature, your storagenode may not setup correctly. (default true)
--retain.concurrency int how many concurrent retain requests can be processed at the same time. (default 5)
--retain.max-time-skew duration allows for small differences in the satellite and storagenode clocks (default 72h0m0s)
--retain.status storj.Status allows configuration to enable, disable, or test retain requests from the satellite. Options: (disabled/enabled/debug) (default enabled)
--server.address string public address to listen on (default ":7777")
--server.extensions.revocation if true, client leaves may contain the most recent certificate revocation for the current certificate (default true)
--server.extensions.whitelist-signed-leaf if true, client leaves must contain a valid "signed certificate extension" (NB: verified against certs in the peer ca whitelist; i.e. if true, a whitelist must be provided)
--server.peer-ca-whitelist-path string path to the CA cert whitelist (peer identities must be signed by one these to be verified). this will override the default peer whitelist
--server.peer-id-versions string identity version(s) the server will be allowed to talk to (default "latest")
--server.private-address string private address to listen on (default "127.0.0.1:7778")
--server.revocation-dburl string url for revocation database (e.g. bolt://some.db OR redis://127.0.0.1:6378?db=2&password=abc123) (default "bolt://config/revocations.db")
--server.use-peer-ca-whitelist if true, uses peer ca whitelist checking (default true)
--storage.allocated-bandwidth memory.Size total allocated bandwidth in bytes (deprecated) (default 0 B)
--storage.allocated-disk-space memory.Size total allocated disk space in bytes (default 1.00 TB)
--storage.k-bucket-refresh-interval duration how frequently Kademlia bucket should be refreshed with node stats (default 1h0m0s)
--storage.path string path to store data in (default "config/storage")
--storage.whitelisted-satellites storj.NodeURLs a comma-separated list of approved satellite node urls (unused)
--storage2.cache-sync-interval duration how often the space used cache is synced to persistent storage (default 1h0m0s)
--storage2.database-dir string directory to store databases. if empty, uses data path
--storage2.delete-queue-size int size of the piece delete queue (default 10000)
--storage2.delete-workers int how many piece delete workers (default 1)
--storage2.exists-check-workers int how many workers to use to check if satellite pieces exists (default 5)
--storage2.expiration-grace-period duration how soon before expiration date should things be considered expired (default 48h0m0s)
--storage2.max-concurrent-requests int how many concurrent requests are allowed, before uploads are rejected. 0 represents unlimited.
--storage2.max-used-serials-size memory.Size amount of memory allowed for used serials store - once surpassed, serials will be dropped at random (default 1.00 MB)
--storage2.min-upload-speed memory.Size a client upload speed should not be lower than MinUploadSpeed in bytes-per-second (E.g: 1Mb), otherwise, it will be flagged as slow-connection and potentially be closed (default 0 B)
--storage2.min-upload-speed-congestion-threshold float if the portion defined by the total number of alive connection per MaxConcurrentRequest reaches this threshold, a slow upload client will no longer be monitored and flagged (default 0.8)
--storage2.min-upload-speed-grace-duration duration if MinUploadSpeed is configured, after a period of time after the client initiated the upload, the server will flag unusually slow upload client (default 10s)
--storage2.monitor.interval duration how frequently Kademlia bucket should be refreshed with node stats (default 1h0m0s)
--storage2.monitor.minimum-bandwidth memory.Size how much bandwidth a node at minimum has to advertise (deprecated) (default 0 B)
--storage2.monitor.minimum-disk-space memory.Size how much disk space a node at minimum has to advertise (default 500.00 GB)
--storage2.monitor.verify-dir-readable-interval duration how frequently to verify the location and readability of the storage directory (default 1m0s)
--storage2.monitor.verify-dir-writable-interval duration how frequently to verify writability of storage directory (default 5m0s)
--storage2.order-limit-grace-period duration how long after OrderLimit creation date are OrderLimits no longer accepted (default 1h0m0s)
--storage2.orders.archive-ttl duration length of time to archive orders before deletion (default 168h0m0s)
--storage2.orders.cleanup-interval duration duration between archive cleanups (default 5m0s)
--storage2.orders.max-sleep duration maximum duration to wait before trying to send orders (default 30s)
--storage2.orders.path string path to store order limit files in (default "config/orders")
--storage2.orders.sender-dial-timeout duration timeout for dialing satellite during sending orders (default 1m0s)
--storage2.orders.sender-interval duration duration between sending (default 1h0m0s)
--storage2.orders.sender-timeout duration timeout for sending (default 1h0m0s)
--storage2.piece-scan-on-startup if set to true, all pieces disk usage is recalculated on startup (default true)
--storage2.retain-time-buffer duration allows for small differences in the satellite and storagenode clocks (default 48h0m0s)
--storage2.stream-operation-timeout duration how long to spend waiting for a stream operation before canceling (default 30m0s)
--storage2.trust.cache-path string file path where trust lists should be cached (default "config/trust-cache.json")
--storage2.trust.exclusions trust-exclusions list of trust exclusions
--storage2.trust.refresh-interval duration how often the trust pool should be refreshed (default 6h0m0s)
--storage2.trust.sources trust-sources list of trust sources (default https://www.storj.io/dcs-satellites)
--version.check-interval duration Interval to check the version (default 15m0s)
--version.request-timeout duration Request timeout for version checks (default 1m0s)
--version.server-address string server address to check its version against (default "https://version.storj.io")

Global Flags:
--color use color in user interface
--config-dir string main directory for storagenode configuration (default "config")
--db.conn_max_lifetime duration Maximum Database Connection Lifetime, -1ns means the stdlib default (default 30m0s)
--db.max_idle_conns int Maximum Amount of Idle Database connections, -1 means the stdlib default (default 1)
--db.max_open_conns int Maximum Amount of Open Database connections, -1 means the stdlib default (default 5)
--debug.addr string address to listen on for debug endpoints (default "127.0.0.1:0")
--debug.trace-out string If set, a path to write a process trace SVG to
--defaults string determines which set of configuration defaults to use. can either be 'dev' or 'release' (default "release")
--identity-dir string main directory for storagenode identity credentials (default "identity")
--log.caller if true, log function filename and line number
--log.development if true, set logging to development mode
--log.encoding string configures log encoding. can either be 'console', 'json', 'pretty', or 'gcloudlogging'.
--log.level Level the minimum log level to log (default info)
--log.output string can be stdout, stderr, or a filename (default "stderr")
--log.stack if true, log stack traces
--metrics.addr string address(es) to send telemetry to (comma-separated) (default "collectora.storj.io:9000")
--metrics.app string application name for telemetry identification. Ignored for certain applications. (default "storagenode")
--metrics.app-suffix string application suffix. Ignored for certain applications. (default "-release")
--metrics.event-addr string address(es) to send telemetry to (comma-separated) (default "eventkitd.datasci.storj.io:9002")
--metrics.instance-prefix string instance id prefix
--metrics.interval duration how frequently to send up telemetry. Ignored for certain applications. (default 1m0s)
--tracing.agent-addr string address for jaeger agent (default "agent.tracing.datasci.storj.io:5775")
--tracing.app string application name for tracing identification (default "storagenode")
--tracing.app-suffix string application suffix (default "-release")
--tracing.buffer-size int buffer size for collector batch packet size
--tracing.enabled whether tracing collector is enabled (default true)
--tracing.interval duration how frequently to flush traces to tracing agent (default 0s)
--tracing.queue-size int buffer size for collector queue size
--tracing.sample float how frequent to sample traces

2023-01-13 22:16:59,805 INFO stopped: storagenode (exit status 1)
2023-01-13 22:16:59,809 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
pi@raspberrypi:~ $

Where are the actual logs?

What do you mean? Those are the last 100 lines.

Error: trust: malformed cache: unexpected end of JSON input
2023-01-14 01:06:21,428 INFO exited: storagenode (exit status 1; not expected)
2023-01-14 01:06:24,450 INFO spawned: 'storagenode' with pid 64
2023-01-14T01:06:24.996Z INFO Configuration loaded {"Process": "storagenode", "Location": "/app/config/config.yaml"}
2023-01-14T01:06:24.999Z INFO Anonymized tracing enabled {"Process": "storagenode"}
2023-01-14T01:06:25.008Z INFO Operator email {"Process": "storagenode", "Address": "xxxx"}
2023-01-14T01:06:25.009Z INFO Operator wallet {"Process": "storagenode", "Address": "0x95222C72cD2E3FdB920daA5Aa711fd8506F152D0"}
Error: trust: malformed cache: unexpected end of JSON input
2023-01-14 01:06:25,287 INFO exited: storagenode (exit status 1; not expected)
2023-01-14 01:06:26,290 INFO gave up: storagenode entered FATAL state, too many start retries too quickly
2023-01-14 01:06:27,295 WARN received SIGQUIT indicating exit request
2023-01-14

That’s the error. Rename the trust-cache.json file and it will be recreated.
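For anyone following along, the rename could look roughly like the sketch below. The host path in `CONFIG_DIR` is only a placeholder (an assumption, not taken from this thread); use the folder that your own `docker run` command mounts as the node's config directory. The helper name `rename_trust_cache` is made up for this example.

```shell
# Hypothetical host path of the folder mounted into the container as
# /app/config. This is an assumption: replace it with your own path.
CONFIG_DIR="${CONFIG_DIR:-/mnt/storj/storagenode}"

# Rename trust-cache.json to trust-cache.json.bak inside the given
# directory, so the node can recreate the file on its next start.
rename_trust_cache() {
    dir="$1"
    cache="$dir/trust-cache.json"
    if [ -f "$cache" ]; then
        mv "$cache" "$cache.bak"
        echo "renamed $cache to $cache.bak"
    else
        echo "no trust cache found at $cache"
    fi
}

# Typical sequence, run by hand (sudo may be needed if root owns the files):
#   sudo docker stop -t 300 storagenode
#   rename_trust_cache "$CONFIG_DIR"
#   sudo docker start storagenode
```

Keeping the old file as `.bak` instead of deleting it means it can still be inspected later if the problem comes back.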

That did not work. I still have problems: after a few minutes of uptime the node stops, and the log says:

pi@raspberrypi:~ $ sudo docker logs storagenode --tail 100

goroutine 258 [select, 5 minutes]:
database/sql.(*DB).connectionOpener(0x2cb18c0, {0xf4523c, 0x2c90e70})
/usr/local/go/src/database/sql/sql.go:1226 +0x9c
created by database/sql.OpenDB
/usr/local/go/src/database/sql/sql.go:794 +0x188

goroutine 259 [select, 5 minutes]:
database/sql.(*DB).connectionCleaner(0x2cb18c0, 0x1a3185c5000)
/usr/local/go/src/database/sql/sql.go:1069 +0xe8
created by database/sql.(*DB).startCleanerLocked
/usr/local/go/src/database/sql/sql.go:1056 +0x204

goroutine 231 [select, 5 minutes]:
database/sql.(*DB).connectionOpener(0x2cb17a0, {0xf4523c, 0x2c90810})
/usr/local/go/src/database/sql/sql.go:1226 +0x9c
created by database/sql.OpenDB
/usr/local/go/src/database/sql/sql.go:794 +0x188

goroutine 232 [select, 5 minutes]:
database/sql.(*DB).connectionCleaner(0x2cb17a0, 0x1a3185c5000)
/usr/local/go/src/database/sql/sql.go:1069 +0xe8
created by database/sql.(*DB).startCleanerLocked
/usr/local/go/src/database/sql/sql.go:1056 +0x204

goroutine 229 [select, 5 minutes]:
database/sql.(*DB).connectionOpener(0x2cb1680, {0xf4523c, 0x2c907b0})
/usr/local/go/src/database/sql/sql.go:1226 +0x9c
created by database/sql.OpenDB
/usr/local/go/src/database/sql/sql.go:794 +0x188

goroutine 230 [select, 5 minutes]:
database/sql.(*DB).connectionCleaner(0x2cb1680, 0x1a3185c5000)
/usr/local/go/src/database/sql/sql.go:1069 +0xe8
created by database/sql.(*DB).startCleanerLocked
/usr/local/go/src/database/sql/sql.go:1056 +0x204

goroutine 205 [select, 5 minutes]:
database/sql.(*DB).connectionOpener(0x2abc480, {0xf4523c, 0x2871ce0})
/usr/local/go/src/database/sql/sql.go:1226 +0x9c
created by database/sql.OpenDB
/usr/local/go/src/database/sql/sql.go:794 +0x188

goroutine 207 [select, 5 minutes]:
database/sql.(*DB).connectionOpener(0x2abc7e0, {0xf4523c, 0x2871d40})
/usr/local/go/src/database/sql/sql.go:1226 +0x9c
created by database/sql.OpenDB
/usr/local/go/src/database/sql/sql.go:794 +0x188

goroutine 208 [select, 5 minutes]:
database/sql.(*DB).connectionCleaner(0x2abc7e0, 0x1a3185c5000)
/usr/local/go/src/database/sql/sql.go:1069 +0xe8
created by database/sql.(*DB).startCleanerLocked
/usr/local/go/src/database/sql/sql.go:1056 +0x204

goroutine 210 [select, 5 minutes]:
database/sql.(*DB).connectionCleaner(0x2abcb40, 0x1a3185c5000)
/usr/local/go/src/database/sql/sql.go:1069 +0xe8
created by database/sql.(*DB).startCleanerLocked
/usr/local/go/src/database/sql/sql.go:1056 +0x204

goroutine 297 [select, 5 minutes]:
database/sql.(*DB).connectionOpener(0x2cb1b00, {0xf4523c, 0x2c91710})
/usr/local/go/src/database/sql/sql.go:1226 +0x9c
created by database/sql.OpenDB
/usr/local/go/src/database/sql/sql.go:794 +0x188

goroutine 298 [select, 5 minutes]:
database/sql.(*DB).connectionCleaner(0x2cb1b00, 0x1a3185c5000)
/usr/local/go/src/database/sql/sql.go:1069 +0xe8
created by database/sql.(*DB).startCleanerLocked
/usr/local/go/src/database/sql/sql.go:1056 +0x204

goroutine 947 [sleep, 5 minutes]:
time.Sleep(0x8bb2c97000)
/usr/local/go/src/runtime/time.go:194 +0x170
github.com/spacemonkeygo/monkit/v3.(*ticker).run(0x1576b60)
/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.19/meter.go:203 +0x24
created by github.com/spacemonkeygo/monkit/v3.(*ticker).register
/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.19/meter.go:195 +0x80

goroutine 354 [select, 5 minutes]:
database/sql.(*DB).connectionOpener(0x2cb1c20, {0xf4523c, 0x2a808d0})
/usr/local/go/src/database/sql/sql.go:1226 +0x9c
created by database/sql.OpenDB
/usr/local/go/src/database/sql/sql.go:794 +0x188

goroutine 355 [select, 5 minutes]:
database/sql.(*DB).connectionCleaner(0x2cb1c20, 0x1a3185c5000)
/usr/local/go/src/database/sql/sql.go:1069 +0xe8
created by database/sql.(*DB).startCleanerLocked
/usr/local/go/src/database/sql/sql.go:1056 +0x204
2023-01-17 15:57:23,223 INFO exited: storagenode (exit status 2; expected)
2023-01-17 15:57:24,400 INFO spawned: 'storagenode' with pid 53
2023-01-17 15:57:24,403 WARN received SIGQUIT indicating exit request
2023-01-17 15:57:24,436 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-01-17T15:57:24.623Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode-updater"}
2023-01-17 15:57:24,876 INFO stopped: storagenode-updater (exit status 0)
2023-01-17 15:57:24,903 INFO stopped: storagenode (terminated by SIGTERM)
2023-01-17 15:57:24,908 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

If I reboot, it works again for a few minutes, then the node stops.

It looks like the cache file is not the only thing that is corrupted.
Please check your databases:
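The linked instructions are not reproduced in this thread. As a rough sketch only: the node's databases are SQLite files, so one way to check them is SQLite's built-in `PRAGMA integrity_check;`. This assumes the `sqlite3` command-line tool is installed and that the node is stopped first. The helper name `check_databases` and the example path are made up for illustration.

```shell
# Run SQLite's integrity check on every *.db file in a directory and print
# one result per file. A healthy database reports "ok".
# Stop the node before running this, so the files are not in use.
check_databases() {
    dir="$1"
    for db in "$dir"/*.db; do
        # Skip the unexpanded pattern when the directory has no .db files.
        [ -f "$db" ] || continue
        result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1)
        echo "$(basename "$db"): $result"
    done
}

# Example call with a hypothetical path (adjust to your storage location):
#   check_databases /mnt/storj/storagenode/storage
```

Any output other than `ok` for a file would suggest that database needs repair.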

The check says “ok”, so I assume this is not the problem?!
Thanks Alexey!

Yes, if the databases are OK, then we need to see the beginning of the exception.

Please try to find the beginning of this exception in your logs; it should start with a date and time.
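One possible way to locate that first line is to search the full log instead of only the last 100 lines. The sketch below is just a starting point: the patterns (`panic:`, `fatal error:`, `unexpected fault address`, `FATAL`) are guesses at how a Go crash or a fatal node error usually begins, not something confirmed in this thread, and the helper name `find_crash_start` is made up. It assumes GNU grep.

```shell
# Read log text from standard input and print the first line that looks
# like the start of a crash, with its line number and the three lines
# before it for context. The patterns are assumptions; extend as needed.
find_crash_start() {
    grep -n -m 1 -B 3 -E '^(panic:|fatal error:|unexpected fault address)|FATAL'
}

# Example (docker logs relays the container's stderr, hence the 2>&1):
#   sudo docker logs storagenode 2>&1 | find_crash_start
```

The line number in the output can then be used to copy the crash header and the first few goroutines into a reply, instead of the tail of the stack dump.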

Hi folks,

After I tried some of Alexey's suggested fixes and followed some of the comments above, it has now been working for 13 hours.
I hope that was the fix, even if I did not understand 100% of what I was doing.

Let’s see what the next days bring.

BR
Alfred

It has now been working for 130 hours non-stop. We can close this issue! Thanks to all of you!