Node Suspension

Please use this script to check audits instead: Script for Audits stat by satellites
To check the logs, use this script: Script: Calculate Success Rates for Audit, Download, Upload, Repair

Running the suggested script for audit check in Docker this is what I got…

"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"
{
  "totalCount": 3514,
  "successCount": 3510,
  "alpha": 18.05000000000001,
  "beta": 1.95,
  "score": 0.9025000000000001
}
"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"
{
  "totalCount": 5859,
  "successCount": 5854,
  "alpha": 16.290124999999954,
  "beta": 3.709875,
  "score": 0.8145062499999995
}
"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"
{
  "totalCount": 9630,
  "successCount": 9628,
  "alpha": 18.04999999999995,
  "beta": 1.95,
  "score": 0.9024999999999997
}
"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
{
  "totalCount": 5957,
  "successCount": 5948,
  "alpha": 19.000000000000014,
  "beta": 1,
  "score": 0.9500000000000001
}
"12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
{
  "totalCount": 4691,
  "successCount": 4685,
  "alpha": 18.05000000000001,
  "beta": 1.95,
  "score": 0.9025000000000001
}
"12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"
{
  "totalCount": 107,
  "successCount": 104,
  "alpha": 17.06894399902371,
  "beta": 2.8525,
  "score": 0.8568125884780344
}

Can you advise on how to solve the issue, please?
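For anyone trying to read the numbers above: in every block the score equals alpha / (alpha + beta), for example 18.05 / (18.05 + 1.95) = 0.9025. The beta values (1, 1.95, 2.8525, 3.709875) are also consistent with each failed audit multiplying both values by 0.95 and adding 1 to beta. The sketch below reproduces that pattern. `LAMBDA`, `WEIGHT` and the function names are assumptions inferred from this output, not values taken from the satellite code.

```python
from typing import Tuple

# Assumed constants, inferred from the numbers posted in this thread.
LAMBDA = 0.95  # how much of the old alpha/beta is kept after each audit
WEIGHT = 1.0   # how much a single audit result adds


def score(alpha: float, beta: float) -> float:
    """Score as it appears in the script output: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def apply_audit(alpha: float, beta: float, success: bool) -> Tuple[float, float]:
    """Update alpha and beta after one audit (assumed update rule)."""
    v = 1.0 if success else -1.0
    new_alpha = LAMBDA * alpha + WEIGHT * (1 + v) / 2
    new_beta = LAMBDA * beta + WEIGHT * (1 - v) / 2
    return new_alpha, new_beta


if __name__ == "__main__":
    # Start from a node with a perfect history and fail four audits in a row.
    alpha, beta = 20.0, 0.0
    for failures in range(1, 5):
        alpha, beta = apply_audit(alpha, beta, success=False)
        print(failures, round(alpha, 6), round(beta, 6), round(score(alpha, beta), 6))
```

Under these assumptions, a beta of 1.95 would correspond to two recent failed audits and 3.709875 to four, even though totalCount and successCount look almost perfect.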

The email contains a timestamp. If you search for that time in your storage node logs you should get very close to the last audit error that triggered suspension mode.

1 Like

So I also got suspended for europe-west-1.

I got suspended by 3 satellites in 3 hours.

Please search for failed audits in your logs. The second script can give you a summary, but you should look for the exact lines where an audit failed.
As @littleskunk suggested, try searching by timestamp. The timestamps in the logs are in UTC.
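A rough sketch of the timestamp search, in case it helps. It assumes the log has been redirected to a file and that each line starts with a UTC timestamp such as 2020-04-23T00:17:46.278Z. The function names, the 10-minute window and the command-line arguments are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(line):
    """Return the UTC timestamp at the start of a log line, or None if absent."""
    parts = line.split(None, 1)
    if not parts:
        return None
    try:
        stamp = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None
    return stamp.replace(tzinfo=timezone.utc)


def audit_lines_near(lines, when, window_minutes=10):
    """Yield GET_AUDIT lines logged within +/- window_minutes of `when` (UTC)."""
    window = timedelta(minutes=window_minutes)
    for line in lines:
        if "GET_AUDIT" not in line:
            continue
        ts = parse_ts(line)
        if ts is not None and abs(ts - when) <= window:
            yield line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python audit_near.py <log file> <UTC time from the email>
    if len(sys.argv) == 3:
        when = parse_ts(sys.argv[2])
        if when is None:
            sys.exit("time must look like 2020-04-23T00:17:46.278Z")
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for hit in audit_lines_near(log, when):
                print(hit)
```

Remember to convert the time in the email to UTC first if your mail client shows it in local time.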

I have no idea how to run the audit stat script… it returns “jq command not found”

Aren't the logs deleted after a node restart?

Is it recommended to keep them by bind mounting a directory from the host filesystem, so that they survive a container restart?

Thanks

Please install it.
How to do that depends on your distro.
Debian-based: sudo apt install jq; CentOS: sudo yum install jq

docker restart -t 300 storagenode will only restart the container without removing it.
Also, you can redirect your logs: https://documentation.storj.io/resources/faq/redirect-logs

Is there anything unusual about this successrate output that could guide me to what the problem with my node being suspended is?

========== AUDIT =============
Successful: 10648
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 454082
Failed: 3359
Success Rate: 99.266%
========== UPLOAD ============
Successful: 1262868
Rejected: 27135
Failed: 110
Acceptance Rate: 97.896%
Success Rate: 99.991%
========== REPAIR DOWNLOAD ===
Successful: 38522
Failed: 0
Success Rate: 100.000%
========== REPAIR UPLOAD =====
Successful: 34328
Failed: 2
Success Rate: 99.994%

It seems not. At least not at the info log level.
OK, then the only way is to look in the logs at the timestamp of the suspension email.

pi@storjpi:~ ./audit_satellites.sh
"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"
{
  "totalCount": 3059,
  "successCount": 3048,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"
{
  "totalCount": 6639,
  "successCount": 6635,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"
{
  "totalCount": 7841,
  "successCount": 7826,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
{
  "totalCount": 7180,
  "successCount": 7170,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
{
  "totalCount": 6053,
  "successCount": 6047,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"
{
  "totalCount": 254,
  "successCount": 245,
  "alpha": 19.99993377272333,
  "beta": 0,
  "score": 1
}
pi@storjpi:~

Please search for failed audits in your logs. Look for the exact lines with errors around the timestamp of the suspension email.

Thank you for providing the information.

Now I have redirected the logs as suggested, but I guess there will be no trace of the past (because of the problem described in "Storagenode does not start (network connectivity problem)", I had to kill the container earlier today).

Because of that, how am I supposed to know how to fix the issue causing the suspension?
Is there a way?

Thanks

It is quite simple. If you fixed it, you will get out of suspension mode and receive uploads from that satellite again (satellite ID can be found here: https://tardigrade.io/trusted-satellites)

If you don’t fix it then the following audits from that satellite will keep failing.

Just got the same email about us-central. It is working beautifully, no clue why this happened.

If it passes audits, your node will come out of suspension. If not, you will see errors in your logs when audits fail.
It’s better to switch to the debug log level to see more information.
The sign of a failed audit is either a concrete error, or an audit download (“download” and “GET_AUDIT”) that started for a specific piece but never finished (a hidden error; I hope it will be visible at the debug level).
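To find the second kind (started but never finished), something like the sketch below could be run over a redirected log file. It assumes tab-separated lines like the "uploaded" ones quoted in this thread, and it assumes the message texts are "download started", "downloaded" and "download failed". Those strings and the function name are assumptions, so adjust them to what your log actually shows.

```python
import json

# Assumed message texts; check them against your own log before relying on this.
STARTED = "download started"
CLOSED = {"downloaded", "download failed"}


def unfinished_audits(lines):
    """Return {piece_id: start line} for GET_AUDIT downloads with no closing line."""
    pending = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # Expected fields: timestamp, level, component, message, JSON details
        fields = line.split("\t", 4)
        if len(fields) < 5:
            continue
        message = fields[3]
        try:
            details = json.loads(fields[4])
        except ValueError:
            continue
        if not isinstance(details, dict) or details.get("Action") != "GET_AUDIT":
            continue
        piece = details.get("Piece ID")
        if message == STARTED:
            pending.setdefault(piece, line)
        elif message in CLOSED:
            pending.pop(piece, None)
    return pending


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python unfinished_audits.py <log file>
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for piece, line in unfinished_audits(log).items():
                print(piece, "->", line)
```

An audit that is still in progress at the end of the file will also show up here, so check the timestamp of each hit before treating it as a failure.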

Thank you @littleskunk

I guess then because I was suspended from both

  • us-central-1 (AKA 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S)
  • asia-east-1 (AKA 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6)

AND

from the current log…

2020-04-23T00:17:46.278Z	INFO	piecestore	uploaded	{"Piece ID": "M7S735CKBXHL6FTW4P7Z3EH3452XFDZK76XZPMNRZH4P3TQ4O5EA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2020-04-23T00:17:49.677Z	INFO	piecestore	uploaded	{"Piece ID": "ROV6PKCEPEHDXFI6G22IHL4GXYR2ZVZVUVAWLY4Y3AEEXOGSP6WA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2020-04-23T00:18:01.013Z	INFO	piecestore	uploaded	{"Piece ID": "RYPVNPDLRBD46S27LCWH77BFXL5DXJ2FDXHWSPCVDFFGL452QDZQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2020-04-23T00:18:13.638Z	INFO	piecestore	uploaded	{"Piece ID": "HIJARHM2MRBISZ64J5XNQRGBRRI3WJYEPZMWEXCIYCUWEBZ6ZZFA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2020-04-23T00:18:14.244Z	INFO	piecestore	uploaded	{"Piece ID": "4K5BHWQUWYPRTQA5UA4KP7AIB55OTLKISFPVCAOCE5AVHPZTNDQA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}


2020-04-23T00:20:18.720Z	INFO	piecestore	uploaded	{"Piece ID": "HATQDNNQ2G3CFCMWDG4ACLCBVRZZQTSOXNAJ4YCDTNWHJQXY456Q", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2020-04-23T00:20:31.826Z	INFO	piecestore	uploaded	{"Piece ID": "35YSKRPOQHS2O64YC7OY45FAJIHZC7ZQYKUWQKFFVTBWKDG6XSLQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2020-04-23T00:20:34.975Z	INFO	piecestore	uploaded	{"Piece ID": "XKK2XTW23OXIHP5VFEN4AC7LYHTZADP35WCNNQRVX5GWUE3JVRBQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2020-04-23T00:20:45.114Z	INFO	piecestore	uploaded	{"Piece ID": "SXYLP4IMI4SHKE65I6V2AK47CU7J7SZSMDZKZUGLS3W3ECLKKNZA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2020-04-23T00:20:50.725Z	INFO	piecestore	uploaded	{"Piece ID": "WTMOGYDLB3MY5Q4PBNEMUCUFXEJVI4GCHB44FRIACL3GUA3UGS3Q", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}

does it mean I was unsuspended from both satellites?
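Since a node that leaves suspension starts receiving uploads from that satellite again, counting "uploaded" lines per satellite ID is one quick way to check. This is only a sketch based on the log format quoted above; the function name is made up.

```python
import json
from collections import Counter


def uploads_per_satellite(lines):
    """Count 'uploaded' PUT log lines per Satellite ID."""
    counts = Counter()
    for raw in lines:
        # Expected fields: timestamp, level, component, message, JSON details
        fields = raw.rstrip("\n").split("\t", 4)
        if len(fields) < 5 or fields[3] != "uploaded":
            continue
        try:
            details = json.loads(fields[4])
        except ValueError:
            continue
        if isinstance(details, dict) and details.get("Action") == "PUT":
            counts[details.get("Satellite ID")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python uploads.py <log file>
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for satellite, count in uploads_per_satellite(log).most_common():
                print(satellite, count)
```

A satellite that is missing from the output, or whose count stops growing, would be the one to look at more closely.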

100% on audits for us-central, how can I get this?