Windows to Ubuntu move went wrong: Docker ping error on startup

Hi,
I have moved all data to Docker on Ubuntu. The container seems to start fine; the dashboard is visible and accessible from outside, but it shows OFFLINE. The logs show a repeating error that the node cannot contact the satellites. Both ports are forwarded correctly (EdgeOS).


Dashboard access from outside works fine.

Docker is running as the root user, so there should not be permission issues, but could these be authorization issues, or permission problems because the files were copied from Windows?
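To rule out permission problems from the Windows copy, a rough sketch (assuming the node data was copied to /mnt/storj; adjust the paths to your setup):

ls -la /mnt/storj/identity /mnt/storj/storage   # inspect ownership and modes of the copied files
sudo chown -R root:root /mnt/storj              # the container runs as root, so root ownership works
sudo chmod -R u+rwX /mnt/storj                  # make sure the owner can read/write everywhere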

Logs (I replaced my IP with XX.XX.XX.XX):

2025-01-05T11:13:56Z INFO Version is up to date {"Process": "storagenode-updater", "Service": "storagenode"}
2025-01-05T11:13:56Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.117.8"}
2025-01-05T11:13:56Z INFO Version is up to date {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2025-01-05 11:13:57,446 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-05 11:13:57,446 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-05 11:13:57,447 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-05T11:13:57Z ERROR contact:service ping satellite failed {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 1, "error": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp XX.XX.XX.XX:28967: connect: connection refused", "errorVerbose": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp XX.XX.XX.XX:28967: connect: connection refused\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:209\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:102\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-01-05T11:13:58Z ERROR contact:service ping satellite failed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 1, "error": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp XX.XX.XX.XX:28967: connect: connection refused", "errorVerbose": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp XX.XX.XX.XX:28967: connect: connection refused\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:209\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:102\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

I'm not an Ubuntu user, but isn't ufw turned on by default, so you'd have to open the necessary ports in ufw? Or simply turn ufw off…

" By default, UFW is set to **deny all incoming connections and allow all outgoing connections."

Nope, no UFW at all; everything is disabled and iptables is clean. The weird thing is that port 14002 works fine, but the main Storj port is not accessible from any IP other than 127.0.0.1.
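For anyone who wants to verify the same, roughly (run on the Docker host):

sudo ufw status verbose      # should print "Status: inactive" when UFW is off
sudo iptables -L -n -v       # list all rules; Docker adds its own DOCKER chains
sudo ss -tlnp | grep 28967   # confirm a listener actually exists on the node port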

Try these troubleshooting steps.


The port forward is correctly configured, as I had Windows running properly, and the Linux 14002 port is accessible. I think there might be an issue with the identity, as I copied the files from Windows. Is there a way to check those?
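One known check from the Storj documentation counts the certificates in the identity files (a sketch; adjust the path to wherever the identity was copied):

grep -c BEGIN /path/to/identity/storagenode/ca.cert         # expected output: 2
grep -c BEGIN /path/to/identity/storagenode/identity.cert   # expected output: 3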

Did you check the yougetsignal page nerdatwork mentioned? It says 28967 is closed to you (checked at around 9 AM EST; you missed a 7x.xx.xxx.x.0 edit above :wink: )

I know that it is not open. The problem is that it is not only closed from outside; it is not accessible from the inner network either. The dashboard on 14002 loads the web UI, but the main functionality on 28967 does not, as I cannot telnet to it from the local net, and I've run out of ideas what to check :frowning: The only way I can telnet to that port is from 127.0.0.1; if I try from the home network 192.168.1.X, it won't connect…
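Concretely, the test looks like this (nc as a telnet stand-in; 192.168.1.166 is the node host here):

nc -vz 127.0.0.1 28967       # on the node host itself: connects fine
nc -vz 192.168.1.166 28967   # from another LAN machine: no connection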

I don’t understand what problem you’re trying to solve. The “ping satellite: failed to ping storage node” error in your log is indicating a connection from outside is failing to make it in: there’s no part of Storj that will be talking to your node from your LAN.

Or does the node work 100% OK when you open 28967 to the Internet… but you have some internal monitoring tool that wants to touch 28967 too?

I've heard of a problem like that before but can't find the thread. It was an SNO who appeared to have everything fine in his router and in the VM running the node… but it ended up being something in Proxmox that was blocking 28967 from other LAN IPs. I'll try to search again…

AppArmor could be the culprit, so you can read :point_down:
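A quick way to check (a sketch; aa-status comes from the apparmor-utils package):

sudo aa-status                  # lists loaded AppArmor profiles and their modes
sudo dmesg | grep -i apparmor   # recent denials show up as apparmor="DENIED"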

Now I have some small progress: when I created config.yaml manually inside the Docker container and restarted, the port became accessible from outside, so I now see externally:
{
  "Statuses": null,
  "Help": "To access Storagenode services, please use DRPC protocol!",
  "AllHealthy": true
}

This is weird, as my Docker container creation log looked really good; I'm not sure why the main server was listening only on localhost (127.0.0.1) instead of all interfaces.
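For reference, that JSON is the node's HTTP healthcheck answering on the public port (the healthcheck.enabled option in config.yaml); roughly, assuming the container is named storagenode:

curl http://XX.XX.XX.XX:28967/   # returns the healthcheck JSON shown above
docker port storagenode          # shows which host ports Docker actually published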

Also, I see in the log that things started working:

However, the 14002 dashboard now shows me OFFLINE even though I see in the logs that things are online:

I am sure it works, as I just got emails confirming my node is up :slight_smile:

So now the only bit of help needed is to find out why the dashboard does not see that the node is working :slight_smile:

OK, if anyone has any ideas why my dashboard could be showing OFFLINE, I would appreciate it :slight_smile: I tried playing with the IP: switching to 0.0.0.0, to 192.168.1.166, and removing it altogether keeping only :port; nothing seems to work. Is there any kind of special log I can trace for the dashboard?
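Concretely, these are the console.address variants I tried in config.yaml (none of them changed the OFFLINE status):

console.address: 0.0.0.0:14002
console.address: 192.168.1.166:14002
console.address: :14002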

Try hard refresh. Ctrl+F5 in the dashboard page.

No, it's not the case :slight_smile: (I tried clearing caches, different browsers). I now think maybe all this happened because I probably hadn't run the setup step, but just started the Docker container and added the Windows files? That's why config.yaml wasn't there. But now when I look at it, I don't understand what could be missing for the dashboard.

You don't have to forward your dashboard port (14002), though.


Yes, I know, but that's not an issue; I can remove the forward any time. If any Storj admin would be interested in analyzing what happened and possibly helping me, I would be very grateful and would share access, as I am running out of ideas :slight_smile:

Maybe this? Fixing Docker's MTU Issues on Ubuntu - Civo.com
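The gist of that article, as a sketch (1400 is an example value; match it to your uplink's MTU): set the MTU in /etc/docker/daemon.json and restart the daemon.

# /etc/docker/daemon.json
{
  "mtu": 1400
}

sudo systemctl restart docker   # apply the new MTU to the default bridge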

I see the subfolder storagenode in your /app/identity; did you provide the correct identity in your docker run command?
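For comparison, the layout from the official docker run command (values and paths are placeholders): the /app/identity mount should point at the folder that directly contains identity.cert and identity.key.

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp -p 28967:28967/udp -p 14002:14002 \
    -e WALLET="0x..." -e EMAIL="user@example.com" \
    -e ADDRESS="XX.XX.XX.XX:28967" -e STORAGE="2TB" \
    --mount type=bind,source=/path/to/identity/storagenode,destination=/app/identity \
    --mount type=bind,source=/path/to/storage,destination=/app/config \
    --name storagenode storjlabs/storagenode:latest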

Also, if you recreated the config, please compare it with this default config:

# how frequently bandwidth usage rollups are calculated
# bandwidth.interval: 1h0m0s

# how frequently expired pieces are collected
# collector.interval: 1h0m0s

# use color in user interface
# color: false

# server address of the api gateway and frontend app
console.address: :14002

# path to static resources
# console.static-dir: ""

# the public address of the node, useful for nodes behind NAT
contact.external-address: ""

# how frequently the node contact chore should run
# contact.interval: 1h0m0s

# protobuf serialized signed node tags in hex (base64) format
# contact.tags: ""

# Maximum Database Connection Lifetime, -1ns means the stdlib default
# db.conn_max_lifetime: 30m0s

# Maximum Amount of Idle Database connections, -1 means the stdlib default
# db.max_idle_conns: 1

# Maximum Amount of Open Database connections, -1 means the stdlib default
# db.max_open_conns: 5

# address to listen on for debug endpoints
# debug.addr: 127.0.0.1:0

# expose control panel
# debug.control: false

# If set, a path to write a process trace SVG to
# debug.trace-out: ""

# open config in default editor
# edit-conf: false

# in-memory buffer for uploads
# filestore.write-buffer-size: 128.0 KiB

# how often to run the chore to check for satellites for the node to exit.
# graceful-exit.chore-interval: 1m0s

# the minimum acceptable bytes that an exiting node can transfer per second to the new node
# graceful-exit.min-bytes-per-second: 5.00 KB

# the minimum duration for downloading a piece from storage nodes before timing out
# graceful-exit.min-download-timeout: 2m0s

# number of concurrent transfers per graceful exit worker
# graceful-exit.num-concurrent-transfers: 5

# number of workers to handle satellite exits
# graceful-exit.num-workers: 4

# Enable additional details about the satellite connections via the HTTP healthcheck.
healthcheck.details: false

# Provide health endpoint (including suspension/audit failures) on main public port, but HTTP protocol.
healthcheck.enabled: true

# path to the certificate chain for this identity
identity.cert-path: identity/identity.cert

# path to the private key for this identity
identity.key-path: identity/identity.key

# if true, log function filename and line number
# log.caller: false

# if true, set logging to development mode
# log.development: false

# configures log encoding. can either be 'console', 'json', 'pretty', or 'gcloudlogging'.
# log.encoding: ""

# the minimum log level to log
log.level: info

# can be stdout, stderr, or a filename
# log.output: stderr

# if true, log stack traces
# log.stack: false

# address(es) to send telemetry to (comma-separated)
# metrics.addr: collectora.storj.io:9000

# application name for telemetry identification. Ignored for certain applications.
# metrics.app: storagenode

# application suffix. Ignored for certain applications.
metrics.app-suffix: -alpha

# address(es) to send telemetry to (comma-separated)
# metrics.event-addr: eventkitd.datasci.storj.io:9002

# instance id prefix
# metrics.instance-prefix: ""

# how frequently to send up telemetry. Ignored for certain applications.
metrics.interval: 30m0s

# maximum duration to wait before requesting data
# nodestats.max-sleep: 5m0s

# how often to sync reputation
# nodestats.reputation-sync: 4h0m0s

# how often to sync storage
# nodestats.storage-sync: 12h0m0s

# operator email address
operator.email: ""

# operator wallet address
operator.wallet: ""

# operator wallet features
operator.wallet-features: ""

# move pieces to trash upon deletion. Warning: if set to false, you risk disqualification for failed audits if a satellite database is restored from backup.
# pieces.delete-to-trash: true

# run garbage collection and used-space calculation filewalkers as a separate subprocess with lower IO priority
# pieces.enable-lazy-filewalker: true

# file preallocated for uploading
# pieces.write-prealloc-size: 4.0 MiB

# whether or not preflight check for database is enabled.
# preflight.database-check: true

# whether or not preflight check for local system clock is enabled on the satellite side. When disabling this feature, your storagenode may not setup correctly.
# preflight.local-time-check: true

# how many concurrent retain requests can be processed at the same time.
# retain.concurrency: 5

# allows for small differences in the satellite and storagenode clocks
# retain.max-time-skew: 72h0m0s

# allows configuration to enable, disable, or test retain requests from the satellite. Options: (disabled/enabled/debug)
# retain.status: enabled

# public address to listen on
server.address: :28967

# whether to debounce incoming messages
# server.debouncing-enabled: true

# if true, client leaves may contain the most recent certificate revocation for the current certificate
# server.extensions.revocation: true

# if true, client leaves must contain a valid "signed certificate extension" (NB: verified against certs in the peer ca whitelist; i.e. if true, a whitelist must be provided)
# server.extensions.whitelist-signed-leaf: false

# path to the CA cert whitelist (peer identities must be signed by one these to be verified). this will override the default peer whitelist
# server.peer-ca-whitelist-path: ""

# identity version(s) the server will be allowed to talk to
# server.peer-id-versions: latest

# private address to listen on
server.private-address: 127.0.0.1:7778

# url for revocation database (e.g. bolt://some.db OR redis://127.0.0.1:6378?db=2&password=abc123)
# server.revocation-dburl: bolt://config/revocations.db

# enable support for tcp fast open
# server.tcp-fast-open: true

# the size of the tcp fast open queue
# server.tcp-fast-open-queue: 256

# if true, uses peer ca whitelist checking
# server.use-peer-ca-whitelist: true

# total allocated bandwidth in bytes (deprecated)
storage.allocated-bandwidth: 0 B

# total allocated disk space in bytes
storage.allocated-disk-space: 2.00 TB

# how frequently Kademlia bucket should be refreshed with node stats
# storage.k-bucket-refresh-interval: 1h0m0s

# path to store data in
# storage.path: config/storage

# a comma-separated list of approved satellite node urls (unused)
# storage.whitelisted-satellites: ""

# how often the space used cache is synced to persistent storage
# storage2.cache-sync-interval: 1h0m0s

# directory to store databases. if empty, uses data path
# storage2.database-dir: ""

# size of the piece delete queue
# storage2.delete-queue-size: 10000

# how many piece delete workers
# storage2.delete-workers: 1

# how many workers to use to check if satellite pieces exists
# storage2.exists-check-workers: 5

# how soon before expiration date should things be considered expired
# storage2.expiration-grace-period: 48h0m0s

# how many concurrent requests are allowed, before uploads are rejected. 0 represents unlimited.
# storage2.max-concurrent-requests: 0

# amount of memory allowed for used serials store - once surpassed, serials will be dropped at random
# storage2.max-used-serials-size: 1.00 MB

# a client upload speed should not be lower than MinUploadSpeed in bytes-per-second (E.g: 1Mb), otherwise, it will be flagged as slow-connection and potentially be closed
# storage2.min-upload-speed: 0 B

# if the portion defined by the total number of alive connection per MaxConcurrentRequest reaches this threshold, a slow upload client will no longer be monitored and flagged
# storage2.min-upload-speed-congestion-threshold: 0.8

# if MinUploadSpeed is configured, after a period of time after the client initiated the upload, the server will flag unusually slow upload client
# storage2.min-upload-speed-grace-duration: 10s

# how frequently Kademlia bucket should be refreshed with node stats
# storage2.monitor.interval: 1h0m0s

# how much bandwidth a node at minimum has to advertise (deprecated)
# storage2.monitor.minimum-bandwidth: 0 B

# how much disk space a node at minimum has to advertise
# storage2.monitor.minimum-disk-space: 500.00 GB

# how frequently to verify the location and readability of the storage directory
# storage2.monitor.verify-dir-readable-interval: 1m0s

# how long to wait for a storage directory readability verification to complete
# storage2.monitor.verify-dir-readable-timeout: 1m0s

# if the storage directory verification check fails, log a warning instead of killing the node
# storage2.monitor.verify-dir-warn-only: false

# how frequently to verify writability of storage directory
# storage2.monitor.verify-dir-writable-interval: 5m0s

# how long to wait for a storage directory writability verification to complete
# storage2.monitor.verify-dir-writable-timeout: 1m0s

# how long after OrderLimit creation date are OrderLimits no longer accepted
# storage2.order-limit-grace-period: 1h0m0s

# length of time to archive orders before deletion
# storage2.orders.archive-ttl: 168h0m0s

# duration between archive cleanups
# storage2.orders.cleanup-interval: 5m0s

# maximum duration to wait before trying to send orders
# storage2.orders.max-sleep: 30s

# path to store order limit files in
# storage2.orders.path: config/orders

# timeout for dialing satellite during sending orders
# storage2.orders.sender-dial-timeout: 1m0s

# duration between sending
# storage2.orders.sender-interval: 1h0m0s

# timeout for sending
# storage2.orders.sender-timeout: 1h0m0s

# if set to true, all pieces disk usage is recalculated on startup
# storage2.piece-scan-on-startup: true

# allows for small differences in the satellite and storagenode clocks
# storage2.retain-time-buffer: 48h0m0s

# how long to spend waiting for a stream operation before canceling
# storage2.stream-operation-timeout: 30m0s

# file path where trust lists should be cached
# storage2.trust.cache-path: config/trust-cache.json

# list of trust exclusions
# storage2.trust.exclusions: ""

# how often the trust pool should be refreshed
# storage2.trust.refresh-interval: 6h0m0s

# list of trust sources
# storage2.trust.sources: https://www.storj.io/dcs-satellites

# address for jaeger agent
# tracing.agent-addr: agent.tracing.datasci.storj.io:5775

# application name for tracing identification
# tracing.app: storagenode

# application suffix
# tracing.app-suffix: -release

# buffer size for collector batch packet size
# tracing.buffer-size: 0

# whether tracing collector is enabled
# tracing.enabled: true

# how frequently to flush traces to tracing agent
# tracing.interval: 0s

# buffer size for collector queue size
# tracing.queue-size: 0

# how frequent to sample traces
# tracing.sample: 0

# Interval to check the version
# version.check-interval: 15m0s

# Request timeout for version checks
# version.request-timeout: 1m0s

# server address to check its version against
version.server-address: https://version.storj.io

You may generate it by running the setup step, but provide a different temporary folder for /app/config so you do not break your current node. It will generate the default folder structure and a config.yaml in the provided temp path. You may then copy the config.yaml to your real data folder and remove the temp folder.
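A sketch of that setup run with a temporary config folder (paths are examples):

docker run --rm -e SETUP="true" \
    --mount type=bind,source=/path/to/identity/storagenode,destination=/app/identity \
    --mount type=bind,source=/tmp/storj-config,destination=/app/config \
    --name storagenode-setup storjlabs/storagenode:latest
cp /tmp/storj-config/config.yaml /path/to/storage/   # copy into the real data folder
rm -rf /tmp/storj-config                             # remove the temp folder afterwards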

Thanks everyone for the help. Just for anyone who bumps into a similar situation: the dashboard started working by itself after I left everything alone for a few hours. And no, it wasn't a cache issue; as I said, I tried different PCs, incognito mode, and cache cleanups. As this was the first launch after moving 8 TB from Windows, I assume it maybe needed some time to start correctly? Anyway, Windows showed the dashboard instantly, while here it needed a lot of time to appear.
All the best to everyone here! :slight_smile: