Changelog v0.28.2

Garbage Collection
Every 5 days the satellite sends a bloom filter to the storage node, which the node uses to identify garbage pieces. You will notice a few delete messages in the storage node log, but the space on disk will still be used for an additional 7 days. The logging is currently misleading: we first move the pieces into a trash folder and only delete them 7 days later, without any additional log messages. If something goes wrong and the satellite sends empty bloom filters, we can send a command to all storage nodes and they will automatically restore the content of the trash folder. It is better to have a disaster recovery plan, even if it is unlikely we will need it.
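
For illustration, here is a minimal sketch in Go of how a retain-and-trash flow like this can look. The names (retainPieces, emptyTrash, the flat directory layout, using mtime as the trash timestamp) are made up for the example and are not the storage node's actual code. Note that a bloom filter can produce false positives (garbage kept a little longer) but never false negatives, so pieces the satellite still references are never trashed.

package gcsketch

// Sketch only: pieces not present in the satellite's bloom filter are moved
// into a trash directory, and the trash is emptied after a 7-day grace period.

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// Filter stands in for the bloom filter received from the satellite.
type Filter interface {
	Contains(pieceID string) bool
}

const trashGracePeriod = 7 * 24 * time.Hour // assumed 7-day delay before real deletion

// retainPieces moves every piece that is NOT in the filter from blobDir to trashDir.
func retainPieces(blobDir, trashDir string, f Filter) error {
	entries, err := os.ReadDir(blobDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if e.IsDir() || f.Contains(e.Name()) {
			continue // piece is still referenced by the satellite
		}
		src := filepath.Join(blobDir, e.Name())
		dst := filepath.Join(trashDir, e.Name())
		if err := os.Rename(src, dst); err != nil { // rename, not copy: disk space stays used
			return err
		}
		fmt.Println("moved to trash:", e.Name())
	}
	return nil
}

// emptyTrash permanently deletes trash entries older than the grace period.
// A real implementation would record when each piece was trashed; the file
// modification time is used here only to keep the sketch short.
func emptyTrash(trashDir string, now time.Time) error {
	entries, err := os.ReadDir(trashDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			return err
		}
		if now.Sub(info.ModTime()) > trashGracePeriod {
			if err := os.Remove(filepath.Join(trashDir, e.Name())); err != nil {
				return err
			}
		}
	}
	return nil
}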

Trusted Satellite List
The storage node now gets the list of trusted satellites from https://tardigrade.io/trusted-satellites. This allows us to add trusted satellites without having to deploy a new release.
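
As a rough sketch of the idea (assuming the endpoint simply serves one satellite URL per line; the function name and parsing details are illustrative, not the node's actual trust code), fetching and refreshing the list could look like this:

package trustsketch

// Sketch: download the trusted-satellite list from a URL. The real storage
// node caches the result and schedules periodic refreshes.

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func fetchTrustedSatellites(source string) ([]string, error) {
	resp, err := http.Get(source)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s from %s", resp.Status, source)
	}

	var urls []string
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") { // skip blanks and comments
			continue
		}
		urls = append(urls, line)
	}
	return urls, scanner.Err()
}

A caller would then pass "https://tardigrade.io/trusted-satellites" as the source and replace its cached list with the result.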

Bugfix Order DB Lock
A few storage nodes were getting error messages because the order DB was locked by a long-running query. After upgrading your storage node you might see these error messages one last time.
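
For context only (this is a generic illustration, not the actual v0.28.2 fix): with SQLite, a long-running query holds the database lock, and other writers fail with "database is locked" unless they are told to wait. A busy timeout is the usual mitigation, shown here with the mattn/go-sqlite3 driver:

package ordersketch

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3"
)

func openOrdersDB(path string) (*sql.DB, error) {
	// _busy_timeout makes a blocked connection retry for up to 10s
	// before returning SQLITE_BUSY ("database is locked").
	db, err := sql.Open("sqlite3", "file:"+path+"?_busy_timeout=10000")
	if err != nil {
		return nil, err
	}
	// SQLite allows only one writer at a time; limiting the pool to a
	// single connection keeps writers from competing with each other.
	db.SetMaxOpenConns(1)
	return db, nil
}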

Graceful Exit
I am sorry, we missed implementing one important restriction. The terms state that graceful exit is only available for storage nodes that are at least 15 months old. Without that restriction it would be possible to game the system. A 15-month restriction is a bit too much for our early storage nodes, so most likely we will start with 6 months. Graceful exit remains disabled until we have implemented this restriction.
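
The check itself is simple; a minimal sketch of the planned restriction is below. The names, the signature, and the 6-month constant are illustrative (taken from the paragraph above), not the actual graceful exit API.

package exitsketch

import (
	"fmt"
	"time"
)

const minMonthsBeforeExit = 6 // planned starting value; the terms mention 15 months

// canGracefulExit reports whether a node has been on the network long enough
// to start a graceful exit.
func canGracefulExit(nodeCreatedAt, now time.Time) error {
	eligibleAt := nodeCreatedAt.AddDate(0, minMonthsBeforeExit, 0)
	if now.Before(eligibleAt) {
		return fmt.Errorf("graceful exit not allowed before %s (node must be at least %d months old)",
			eligibleAt.Format("2006-01-02"), minMonthsBeforeExit)
	}
	return nil
}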

Warning: Garbage collection is messing up the free space calculation. If your storage node is full and executes garbage collection, it might accept more pieces than it has free space for and will eventually get disqualified. We are working on a fix. Please reduce your allocated size to make sure the storage node does not crash. Bug is now fixed: Changelog v0.28.4 (storage node only)

I think it is a typo: “garbage collection” should be “Graceful Exit”.

Corrected. Thank you.

3 posts were split to a new topic: Error creating tables for master database on storagenode: migrate: creating version table failed: migrate: database disk image is malformed

I received the email today about updating to the new release. Nice instructions for the Docker folks, but absolutely no instructions for those of us running a node on a Windows host without Docker.

@bney Because the Windows GUI version updates automatically.

One of my nodes hung:

2019-12-19T01:16:28.299Z  INFO   piecestore  downloaded  {"Piece ID": "WRN3KCPFGFJASMI2RB2MA32HNUNZKG27OJ2F6OUP5XKUP5AYWGGQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-12-19T01:16:30.280Z  INFO   piecestore  upload started  {"Piece ID": "XO3HBKR37PSZ7L6YBQRNONTSSFHISF4UB5IKCO2UAIFXQPDFMKKQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT"}
2019-12-19T05:00:36.163Z  INFO   Configuration loaded from: /mnt/node19/config.yaml
2019-12-19T05:00:36.163Z  DEBUG  debug server listening on 127.0.0.1:38909
2019-12-19T05:00:36.172Z  INFO   Operator email: [censored]
2019-12-19T05:00:36.172Z  INFO   operator wallet: [censored]
2019-12-19T05:00:36.187Z  DEBUG  Binary Version: v0.28.2 with CommitHash 4b1939981a19095e1e71bf5f8c742e055406dd98, built at 2019-12-18 13:45:13 +0000 UTC as Release true
2019-12-19T05:00:36.679Z  DEBUG  version  allowed minimum version from control server is: v0.27.0
2019-12-19T05:00:36.679Z  INFO   version  running on version v0.28.2
2019-12-19T05:00:36.680Z  DEBUG  telemetry  Initialized batcher with id = "1aGsGmNPwGzwQvr2RMGDoeedsdGrC9wKnY1ZqEvxo1s36Lxbd4"
2019-12-19T05:00:36.843Z  INFO   db.migration  Database Version  {"version": 28}
2019-12-19T05:00:37.402Z  DEBUG  trust  Fetched URLs from source; updating cache  {"source": "https://tardigrade.io/trusted-satellites", "count": 4}
2019-12-19T05:00:37.433Z  DEBUG  trust  Satellite is trusted  {"id": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2019-12-19T05:00:37.433Z  DEBUG  trust  Satellite is trusted  {"id": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2019-12-19T05:00:37.433Z  DEBUG  trust  Satellite is trusted  {"id": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2019-12-19T05:00:37.433Z  DEBUG  trust  Satellite is trusted  {"id": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2019-12-19T05:00:37.433Z  DEBUG  gracefulexit:chore  checking pending exits
2019-12-19T05:00:37.433Z  INFO   bandwidth  Performing bandwidth usage rollups
2019-12-19T05:00:37.433Z  INFO   trust  Scheduling next refresh  {"after": "6h46m23.501949099s"}
2019-12-19T05:00:37.435Z  INFO   contact:chore  Storagenode contact chore starting up
2019-12-19T05:00:37.435Z  INFO   Node [censored] started
2019-12-19T05:00:37.435Z  INFO   Public server started on [::]:4319
2019-12-19T05:00:37.435Z  INFO   Private server started on 127.0.0.1:4219
2019-12-19T05:00:37.436Z  INFO   pieces:trashchore  Storagenode TrashChore starting up
2019-12-19T05:00:37.436Z  DEBUG  pieces:trashchore  starting EmptyTrash cycle
2019-12-19T05:00:37.437Z  DEBUG  orders  sending
2019-12-19T05:00:37.437Z  DEBUG  orders  cleaning
2019-12-19T05:00:37.437Z  DEBUG  gracefulexit:chore  no satellites found
2019-12-19T05:00:37.438Z  INFO   orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW  sending  {"count": 10}
2019-12-19T05:00:37.438Z  INFO   orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S  sending  {"count": 9}
2019-12-19T05:00:37.438Z  INFO   orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs  sending  {"count": 1}
2019-12-19T05:00:37.515Z  INFO   orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW  finished
2019-12-19T05:00:37.529Z  INFO   orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs  finished
2019-12-19T05:00:37.551Z  DEBUG  version  allowed minimum version from control server is: v0.27.0
2019-12-19T05:00:37.551Z  INFO   version  running on version v0.28.2
2019-12-19T05:00:37.638Z  INFO   piecestore:monitor  Remaining Bandwidth  {"bytes": 31908091781120}
2019-12-19T05:00:37.641Z  DEBUG  contact:chore  Starting cycle  {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2019-12-19T05:00:37.641Z  DEBUG  contact:chore  Starting cycle  {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2019-12-19T05:00:37.641Z  DEBUG  contact:chore  Starting cycle  {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2019-12-19T05:00:37.641Z  DEBUG  contact:chore  Starting cycle  {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2019-12-19T05:00:37.710Z  DEBUG  contact:endpoint  pinged  {"by": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "srcAddr": "34.76.59.218:51908"}
2019-12-19T05:00:37.737Z  DEBUG  contact:endpoint  pinged  {"by": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "srcAddr": "78.94.240.180:53924"}
2019-12-19T05:00:37.786Z  DEBUG  orders  cleanup finished  {"items deleted": 1126}
2019-12-19T05:00:37.846Z  INFO   orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S  finished
2019-12-19T05:00:38.216Z  DEBUG  contact:endpoint  pinged  {"by": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "srcAddr": "146.148.100.200:52680"}
2019-12-19T05:00:39.125Z  DEBUG  contact:endpoint  pinged  {"by": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "srcAddr": "104.199.235.109:45724"}
2019-12-19T05:00:56.571Z  INFO   piecestore  upload started  {"Piece ID": "735DG5YORNCAJETAUI73GO2AGKEEMVVPNWQUJ6LB5XLYGTKVHUVQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2019-12-19T05:01:00.379Z  INFO   piecestore  uploaded  {"Piece ID": "735DG5YORNCAJETAUI73GO2AGKEEMVVPNWQUJ6LB5XLYGTKVHUVQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2019-12-19T05:01:09.529Z  INFO   piecestore  deleted  {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "735DG5YORNCAJETAUI73GO2AGKEEMVVPNWQUJ6LB5XLYGTKVHUVQ"}
2019-12-19T05:01:31.475Z  INFO   piecestore  upload started  {"Piece ID": "QD4LLQTIQOHPSOAHN3EJRRETMTPGWF4CEV63FUUODUPTC3T25BEA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2019-12-19T05:01:31.722Z  INFO   piecestore  uploaded  {"Piece ID": "QD4LLQTIQOHPSOAHN3EJRRETMTPGWF4CEV63FUUODUPTC3T25BEA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PU

I restarted it manually.
Where can I complain and give advice to the developers?

Screenshot of the syslog from that time:

From 4:42 I tried to restart the service manually.

We NEED bindings to the Linux/systemd watchdog/notify mechanism.
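
As a sketch of what such bindings could look like (a suggestion, not something the storage node currently ships), the existing github.com/coreos/go-systemd/v22/daemon package can be combined with a unit file that sets Type=notify and WatchdogSec=...; if the node stops pinging, systemd restarts it automatically:

package watchdogsketch

import (
	"time"

	"github.com/coreos/go-systemd/v22/daemon"
)

// runWithWatchdog reports readiness to systemd and then keeps pinging the
// watchdog as long as the supplied health check passes.
func runWithWatchdog(healthy func() bool) error {
	// Tell systemd the service finished starting up (Type=notify).
	if _, err := daemon.SdNotify(false, daemon.SdNotifyReady); err != nil {
		return err
	}

	// WatchdogSec in the unit file determines the interval; 0 means disabled.
	interval, err := daemon.SdWatchdogEnabled(false)
	if err != nil || interval == 0 {
		return err
	}

	ticker := time.NewTicker(interval / 2) // ping well within the deadline
	defer ticker.Stop()
	for range ticker.C {
		if healthy() {
			// A missed ping lets systemd kill and restart a hung node.
			daemon.SdNotify(false, daemon.SdNotifyWatchdog)
		}
	}
	return nil
}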

Does the node move the files to “trash” or copy+delete them when doing garbage collection?

It does quite a lot of IO during that operation, so it appears to me that the files are copied and then deleted from the original location instead of moved.
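
For reference, whether a “move” is cheap depends on whether source and destination are on the same filesystem: a rename there is a metadata-only operation, while a cross-filesystem move (or an explicit copy+delete) rewrites every byte. A sketch of the common fallback pattern, purely to illustrate the difference being asked about and not the storage node's actual code:

package movesketch

import (
	"io"
	"os"
)

func moveFile(src, dst string) error {
	// Cheap path: same filesystem, no data is copied.
	if err := os.Rename(src, dst); err == nil {
		return nil
	}
	// Fallback: copy the contents, then delete the original.
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Close(); err != nil {
		return err
	}
	return os.Remove(src)
}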

2 posts were merged into an existing topic: Error piecestore protocol: rpc error: code = canceled desc = context canceled

Same here (Windows), this needs to be fixed…

Warning: Garbage collection is messing up the free space calculation. If your storage node is full and executes garbage collection, it might accept more pieces than it has free space for and will eventually get disqualified. We are working on a fix. Please reduce your allocated size to make sure the storage node does not crash.

Instructions on how to change your allotted space are here

This warning should have gone out as an email blast to all SNOs, since it has the ability to crash nodes. I would expect critical information like that to be communicated directly.

We are working on getting the info out to everyone who can be affected! Littleskunk’s posting in the forum is the fastest first step. Thank you!!

5 posts were merged into an existing topic: Node offline after update to the next version

A post was merged into an existing topic: Node offline after update to the next version

Bug is now fixed:
