Disqualified on Stefan satellite during Graceful Exit

Good day.
Yesterday I started a graceful exit.
Today I got disqualified.
Please help, I have 100% audits!

Here is the output of the graceful exit commands:
c:\Program Files\Storj\Storage Node>storagenode exit-satellite
2020-07-16T16:07:58.847+0300 INFO Identity loaded. {"Node ID": "12KJbX2TziK3SaCVPTz39iDY6YxpsqPbFqbuW9VNDiR72U9Gi3b"}
By starting a graceful exit from a satellite, you will no longer receive new uploads from that satellite.
This action can not be undone.
Are you sure you want to continue? [y/n]
:y
Domain Name Node ID Space Used
satellite.stefan-benten.de:7777 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW 38.5 GB
saltlake.tardigrade.io:7777 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE 3.8 TB
asia-east-1.tardigrade.io:7777 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 341.5 GB
us-central-1.tardigrade.io:7777 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S 355.4 GB
europe-west-1.tardigrade.io:7777 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs 583.8 GB
europe-north-1.tardigrade.io:7777 12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB 3.8 TB
Please enter a space delimited list of satellite domain names you would like to gracefully exit. Press enter to continue:
satellite.stefan-benten.de:7777

Domain Name Node ID Percent Complete Successful Completion Receipt
satellite.stefan-benten.de:7777 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW 0.00% N N/A

c:\Program Files\Storj\Storage Node>storagenode exit-status
2020-07-17T15:28:45.012+0300 INFO Identity loaded. {"Node ID": "12KJbX2TziK3SaCVPTz39iDY6YxpsqPbFqbuW9VNDiR72U9Gi3b"}

Domain Name Node ID Percent Complete Successful Completion Receipt
satellite.stefan-benten.de:7777 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW 66.32% N 0a483046022100d7efe13646740e11a988b4e7c1fb141071efb136379672721d37caadfc2dc7a4022100dd7382091f79a7fb690478d08d339fb2ae5dd8e859998bbd6707920e487d00c510021a20004ae89e970e703df42ba4ab1416a3b30b7e1d8e14aa0e558f7ee268000000002220ad419149f0970abdbd520e0bcafb23c396cdb24d540e45c1800e3010000000002a0c08d487c6f80510aeb8a4b503

I have my node's log, and there are many messages like this in it:
2020-07-16T16:46:14.774+0300 INFO gracefulexit:chore piece transferred to new storagenode {"Storagenode ID": "18ysLjqM8xTPdWSEVuozA7WzAEejnmTA7pf78goHNnFw2eUXKC", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "6FQMKORP2HY2IB63TM2ZZWPBCMP5N73HRMXRLCII27CT72HTAAEQ"}
2020-07-16T16:46:14.824+0300 INFO piecestore upload canceled {"Piece ID": "BROVG22ANWSE5SG4NBCOVL7MD6JNYPVPCOD6MUNVTFSYZ74XRTKA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2020-07-16T16:46:15.036+0300 ERROR gracefulexit:chore failed to put piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "WWFCY2FCNYWFCLCNO2JKW5TUQLLNG5SKRLPFPZBFCY7QZTX3NRWQ", "error": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID", "errorVerbose": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID\n\tstorj.io/common/rpc.Dialer.dialTransport:310\n\tstorj.io/common/rpc.Dialer.dial:267\n\tstorj.io/common/rpc.Dialer.DialNodeURL:177\n\tstorj.io/uplink/private/piecestore.DialNodeURL:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:66\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:220\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:247\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
2020-07-16T16:46:15.037+0300 ERROR gracefulexit:chore failed to transfer piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID", "errorVerbose": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID\n\tstorj.io/common/rpc.Dialer.dialTransport:310\n\tstorj.io/common/rpc.Dialer.dial:267\n\tstorj.io/common/rpc.Dialer.DialNodeURL:177\n\tstorj.io/uplink/private/piecestore.DialNodeURL:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:66\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:220\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:247\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
2020-07-16T16:46:15.249+0300 ERROR gracefulexit:chore failed to put piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "MZLKIHYP76MAFBNH5CPCPC6GQQ3BI6OXEOZLN3VLYHQ5KWW6M54A", "error": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID", "errorVerbose": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID\n\tstorj.io/common/rpc.Dialer.dialTransport:310\n\tstorj.io/common/rpc.Dialer.dial:267\n\tstorj.io/common/rpc.Dialer.DialNodeURL:177\n\tstorj.io/uplink/private/piecestore.DialNodeURL:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:66\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:220\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:247\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
2020-07-16T16:46:15.249+0300 ERROR gracefulexit:chore failed to transfer piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID", "errorVerbose": "piecestore: rpccompat: tls peer certificate verification error: tlsopts error: peer ID did not match requested ID\n\tstorj.io/common/rpc.Dialer.dialTransport:310\n\tstorj.io/common/rpc.Dialer.dial:267\n\tstorj.io/common/rpc.Dialer.DialNodeURL:177\n\tstorj.io/uplink/private/piecestore.DialNodeURL:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:66\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:220\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:247\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
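
If I understand this error correctly, the exit worker dials the replacement node and the connection is rejected when the node ID derived from the remote peer's TLS certificate does not match the ID the satellite asked us to send the piece to. A conceptual sketch of that check in Go (the function and variable names are hypothetical, not the actual storj.io code):

package main

import (
	"errors"
	"fmt"
)

// verifyPeerID models the check that fails above: the ID we were told
// to dial must equal the ID derived from the peer's TLS certificate.
func verifyPeerID(requestedID, certDerivedID string) error {
	if requestedID != certDerivedID {
		return errors.New("tlsopts error: peer ID did not match requested ID")
	}
	return nil
}

func main() {
	// The satellite asked us to transfer a piece to this node ID...
	requested := "18ysLjqM8xTPdWSEVuozA7WzAEejnmTA7pf78goHNnFw2eUXKC"
	// ...but the host answering at that address presented a certificate
	// hashing to a different ID (e.g. the address now points elsewhere).
	derived := "12sp6mbM2kvLbAhiKGQavae1tGRfc7PnUYT9ARSwZRckrPqNEQ4"
	if err := verifyPeerID(requested, derived); err != nil {
		fmt.Println("transfer fails with:", err) // same error as in my log
	}
}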

Please unblock my node and help me solve this problem with the satellite.
Thank you in advance!

What are your current Graceful Exit parameters in the config.yaml?


All the settings in the configuration file are standard! I did not change anything except that I moved the databases to an SSD disk.

# directory to store databases. if empty, uses data path
storage2.database-dir: C:\storj\db

# how frequently bandwidth usage rollups are calculated
# bandwidth.interval: 1h0m0s

# how frequently expired pieces are collected
# collector.interval: 1h0m0s

# use color in user interface
# color: false

# server address of the api gateway and frontend app
console.address: 192.168.14.12:14002

# path to static resources
# console.static-dir: ""

# the public address of the node, useful for nodes behind NAT
contact.external-address: 134.***.***.***:28967

# how frequently the node contact chore should run
# contact.interval: 1h0m0s

# Maximum Database Connection Lifetime, -1ns means the stdlib default
# db.conn_max_lifetime: -1ns

# Maximum Amount of Idle Database connections, -1 means the stdlib default
# db.max_idle_conns: 20

# Maximum Amount of Open Database connections, -1 means the stdlib default
# db.max_open_conns: 25

# address to listen on for debug endpoints
# debug.addr: 127.0.0.1:0

# If set, a path to write a process trace SVG to
# debug.trace-out: ""

# open config in default editor
# edit-conf: false

# how often to run the chore to check for satellites for the node to exit.
# graceful-exit.chore-interval: 15m0s

# the minimum acceptable bytes that an exiting node can transfer per second to the new node
# graceful-exit.min-bytes-per-second: 128 B

# the minimum duration for downloading a piece from storage nodes before timing out
# graceful-exit.min-download-timeout: 2m0s

# number of concurrent transfers per graceful exit worker
# graceful-exit.num-concurrent-transfers: 1

# number of workers to handle satellite exits
# graceful-exit.num-workers: 3

# path to the certificate chain for this identity
identity.cert-path: C:\Users\Администратор\AppData\Roaming\Storj\Identity\storagenode\identity.cert

# path to the private key for this identity
identity.key-path: C:\Users\Администратор\AppData\Roaming\Storj\Identity\storagenode\identity.key

# if true, log function filename and line number
# log.caller: false

# if true, set logging to development mode
# log.development: false

# configures log encoding. can either be 'console' or 'json'
# log.encoding: console

# the minimum log level to log
log.level: info

# can be stdout, stderr, or a filename
log.output: winfile:///C:\Program Files\Storj\Storage Node\\storagenode.log

# if true, log stack traces
# log.stack: false

# address to send telemetry to
# metrics.addr: collectora.storj.io:9000

# application name for telemetry identification
# metrics.app: storagenode.exe

# application suffix
# metrics.app-suffix: -release

# instance id prefix
# metrics.instance-prefix: ""

# how frequently to send up telemetry
# metrics.interval: 1m0s

# path to log for oom notices
# monkit.hw.oomlog: /var/log/kern.log

# maximum duration to wait before requesting data
# nodestats.max-sleep: 5m0s

# how often to sync reputation
# nodestats.reputation-sync: 4h0m0s

# how often to sync storage
# nodestats.storage-sync: 12h0m0s

# operator email address
operator.email: s*********@gmail.com

# operator wallet address
operator.wallet: 0x650*************

# how many concurrent retain requests can be processed at the same time.
# retain.concurrency: 5

# allows for small differences in the satellite and storagenode clocks
# retain.max-time-skew: 24h0m0s

# allows configuration to enable, disable, or test retain requests from the satellite. Options: (disabled/enabled/debug)
# retain.status: disabled

# public address to listen on
server.address: :28967

# log all GRPC traffic to zap logger
server.debug-log-traffic: false

# if true, client leaves may contain the most recent certificate revocation for the current certificate
# server.extensions.revocation: true

# if true, client leaves must contain a valid "signed certificate extension" (NB: verified against certs in the peer ca whitelist; i.e. if true, a whitelist must be provided)
# server.extensions.whitelist-signed-leaf: false

# path to the CA cert whitelist (peer identities must be signed by one these to be verified). this will override the default peer whitelist
# server.peer-ca-whitelist-path: ""

# identity version(s) the server will be allowed to talk to
# server.peer-id-versions: latest

# private address to listen on
server.private-address: 127.0.0.1:7778

# url for revocation database (e.g. bolt://some.db OR redis://127.0.0.1:6378?db=2&password=abc123)
# server.revocation-dburl: bolt://C:\Program Files\Storj\Storage Node/revocations.db

# if true, uses peer ca whitelist checking
# server.use-peer-ca-whitelist: true

# total allocated bandwidth in bytes
storage.allocated-bandwidth: 96.0 TB

# total allocated disk space in bytes
storage.allocated-disk-space: 9.08 TB

# how frequently Kademlia bucket should be refreshed with node stats
# storage.k-bucket-refresh-interval: 1h0m0s

# path to store data in
storage.path: D:\

# a comma-separated list of approved satellite node urls
# storage.whitelisted-satellites: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777,118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW@satellite.stefan-benten.de:7777,121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@asia-east-1.tardigrade.io:7777,12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs@europe-west-1.tardigrade.io:7777

# how often the space used cache is synced to persistent storage
# storage2.cache-sync-interval: 1h0m0s

# how soon before expiration date should things be considered expired
# storage2.expiration-grace-period: 48h0m0s

# how many concurrent requests are allowed, before uploads are rejected. 0 represents unlimited.
# storage2.max-concurrent-requests: 0

# how frequently Kademlia bucket should be refreshed with node stats
# storage2.monitor.interval: 1h0m0s

# how much bandwidth a node at minimum has to advertise
# storage2.monitor.minimum-bandwidth: 500.0 GB

# how much disk space a node at minimum has to advertise
# storage2.monitor.minimum-disk-space: 500.0 GB

# how long after OrderLimit creation date are OrderLimits no longer accepted
# storage2.order-limit-grace-period: 24h0m0s

# length of time to archive orders before deletion
# storage2.orders.archive-ttl: 168h0m0s

# duration between archive cleanups
# storage2.orders.cleanup-interval: 24h0m0s

# timeout for dialing satellite during sending orders
# storage2.orders.sender-dial-timeout: 1m0s

# duration between sending
# storage2.orders.sender-interval: 1h0m0s

# timeout for sending
# storage2.orders.sender-timeout: 1h0m0s

# allows for small differences in the satellite and storagenode clocks
# storage2.retain-time-buffer: 48h0m0s

# Interval to check the version
# version.check-interval: 15m0s

# Request timeout for version checks
# version.request-timeout: 1m0s

# server address to check its version against
# version.server-address: https://version.storj.io

Audits are 100%, all files are in place.
My Internet bandwidth is much larger than necessary!

I have one more node on a different server, with a different internet connection and a different address, and the errors in its log file are identical!

I would like to note that today my other nodes have received only 49 megabytes of data from this satellite. It's a network problem!
Below is information from my node:

satellite.stefan-benten.de:7777 (118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW)
 - egress max 243 MiB (2020-07-11), average 170 MiB
 - ingress max 503 MiB (2020-07-01), average 444 MiB
 - bandwidth max 732 MiB (2020-07-12), average 614 MiB
 - bandwidth total 2.82 GiB egress, 7.37 GiB ingress

I have asked the developers to take a look into the issue with your node.
Please let's wait for their answer and not invent another theory without properly backed data.

At the moment I can say that your node has successfully transferred 11,848 pieces and failed to transfer 5,766.
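
For a rough sense of scale, this is how those numbers work out (a quick sketch; I am assuming the satellite disqualifies an exiting node once the overall failure percentage crosses a satellite-side threshold, whose exact value I do not have at hand):

package main

import "fmt"

func main() {
	succeeded, failed := 11848.0, 5766.0
	total := succeeded + failed
	// prints: 17614 attempted transfers, 32.7% failed
	fmt.Printf("%.0f attempted transfers, %.1f%% failed\n", total, failed/total*100)
}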

OK, let's wait; I really hope for a quick response from the developers!

I am ready to provide all the necessary information and send you my files.
I hope you will unblock my node so I can continue …

Please save your logs from the start of the GE and do not shut down your node for a while.

Okay, the logs are saved separately.
My node will stay turned on.


I have the same problem. I started a GE from Stefan on July 15th; it was progressing very slowly, about 7% a day, even though I only have 17.9 GB from Stefan, and today I got disqualified at 19.57% completion. My node has 100% audit and uptime scores, is connected on Gb fiber, and has otherwise been running smoothly for months.
My node id is 1pp28BaTvY34hWoYBgDDG8RKGnEhM1XL3WQnaPjFuXc63FXCjv
The log shows a lot of GE errors, most in this style:
2020-07-15T23:39:37.646Z ERROR gracefulexit:chore failed to transfer piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "write tcp 172.17.0.2:59476->78.94.240.189:7777: write: connection reset by peer", "errorVerbose": "write tcp 172.17.0.2:59476->78.94.240.189:7777: write: connection reset by peer\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:321\n\tstorj.io/common/pb.(*drpcSatelliteGracefulExitProcessClient).Send:1345\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:309\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
2020-07-15T23:57:06.176Z ERROR gracefulexit:chore failed to put piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "WAMWB4KYRPTILIX327TQ2CIJDOZIE4YVKHW4VJ33EOKXA2MBZG3A", "error": "piecestore: rpccompat: dial tcp 79.112.141.137:28967: i/o timeout", "errorVerbose": "piecestore: rpccompat: dial tcp 79.112.141.137:28967: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:290\n\tstorj.io/common/rpc.Dialer.dial:267\n\tstorj.io/common/rpc.Dialer.DialNodeURL:177\n\tstorj.io/uplink/private/piecestore.DialNodeURL:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:66\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:220\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:247\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}

Can you help me?

It is strange; I made a GE from stefan-benten in June from 10 nodes in a row, and all was OK. The first node had 370 GB and it was done in 3-4 days. The others were smaller and finished faster.

One difference: the network load from test data was very small then, compared to now.

Thanks for your concern.
Yes, it's strange: I got 16,365 "connection reset by peer" messages out of 50,629 gracefulexit messages in the log.
172.17.0.2 is a Docker address.
78.94.240.189 is prod.stefan-benten.de; it's a node 17 hops and 32 ms away from me, and I see a lot of messages like:
2020-07-15T23:35:35.722Z INFO gracefulexit:chore piece transferred to new storagenode {"Storagenode ID": "12sp6mbM2kvLbAhiKGQavae1tGRfc7PnUYT9ARSwZRckrPqNEQ4", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "YNB2NOM7CGRJPWCZ7PAFTEWUEEZZQLB4ZKAIRFWZAEM7LEDQOJ2A"}

which looks like a successful transfer, followed within 1 ms by:

2020-07-15T23:35:35.723Z ERROR gracefulexit:chore failed to transfer piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "write tcp 172.17.0.2:59476->78.94.240.189:7777: write: connection reset by peer", "errorVerbose": "write tcp 172.17.0.2:59476->78.94.240.189:7777: write: connection reset by peer\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:321\n\tstorj.io/common/pb.(*drpcSatelliteGracefulExitProcessClient).Send:1345\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:309\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:111\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}

I have fast internet, but the nearest node I can reach is still well over 2 ms away. Is my link too fast? I have no clue what to do about a "connection reset by peer" message.
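
In case anyone wants to reproduce the count, this is roughly how I tallied it (a sketch that scans the saved log for the same phrases that appear in the entries above; storagenode.log is just where I saved my copy):

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("storagenode.log") // my saved copy of the node log
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var geTotal, resets int
	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // log lines can be very long
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, "gracefulexit") {
			continue // only count graceful exit messages
		}
		geTotal++
		if strings.Contains(line, "connection reset by peer") {
			resets++
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d of %d gracefulexit messages were connection resets\n", resets, geTotal)
}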

I would recommend keeping your full logs from that storage node and keeping it online for a while.
I have asked Stefan for help; I hope he can take a look into this. If he does not have time, I'll create an internal issue (or two) for both cases.

I have the full log saved; if you want it, just let me know where to send it (it's 153 MB). The node is running; it has been my most profitable one since July 9th, with a steady egress of around 40 GB/day. I tested GE on this node because it has the least withheld, 28, from Stefan. What I'm wondering is whether it is safe to do a GE from my other nodes. My first experience was pretty bad. My other 2 nodes have been running steadily since the beginning; I have no reason to be concerned about data corruption, and I don't have any tools to check node integrity. The Stefan satellite brings pennies per month; the chance of my system crashing in the next few months and losing a few hundred dollars ($380 withheld) doesn't compare with making a few dimes by keeping this satellite online.
What should I do with my other nodes?

You can send them via https://alpha.transfer.sh

I would recommend postponing it until v1.8.x.


I'd keep at least one node online for that satellite, as it was mentioned that it would be repurposed. You might be sorry if you exit them all.


Good day.
Any information regarding my node?
What is the timeline for solving the problem?
Any news?

I know that the fix is included in v1.8.x.
But I do not have an ETA.

Sorry, I did not understand how alpha.transfer.sh works, but you can find my log file at this link: http://dl.free.fr/rTpAgyIDD

I have a 1 TB node which earned me $82 of income from Stefan, with 238 withheld, and it brought me .15 in June. Either I keep it going, risking a disk crash and losing all the withheld amount, or I do a GE, get the 238, buy a 12 TB drive with it, and keep going. Most of the rumors say Stefan is going to close; a 12 TB drive would bring me a better income and would expand Storj's capacity. Isn't that the goal?
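
The quick arithmetic behind that reasoning (a sketch; the held amount and the June income are my own dashboard numbers quoted above):

package main

import "fmt"

func main() {
	held := 238.0         // USD held back, returned after a successful GE
	monthlyIncome := 0.15 // USD this satellite brought me in June
	// prints: months needed to out-earn the held amount: 1587
	fmt.Printf("months needed to out-earn the held amount: %.0f\n", held/monthlyIncome)
}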


I have no clue why the $ signs sometimes disappear, but I meant USD 238 and USD 0.15 in the previous post.
