Hi Alexey. I ran chown
on the entire folder the node uses. I do not run as root. The space still resets to zero. Same behaviour as I described before.
Please enable the debug log level in your config.yaml (log.level: debug), save the config, and restart the node. Then check for any errors related to databases.
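If editing by hand is inconvenient, the same change can be scripted; a minimal sketch using a throwaway file (the real config path, e.g. /app/config/config.yaml inside the container, depends on your setup):

```shell
# Demonstrated on a throwaway copy; point CONFIG at the real config.yaml
# on your node (path varies per setup).
CONFIG=$(mktemp)
printf 'log.level: info\n' > "$CONFIG"   # stand-in for the real config.yaml

# Switch the log level to debug in place.
sed -i 's/^log.level:.*/log.level: debug/' "$CONFIG"

cat "$CONFIG"   # → log.level: debug
```

After saving, restart the node (e.g. docker restart storagenode) for the new level to take effect.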
After updating to 0.27.1, my main node (1.6TB as per the config), which was full, reported zero disk usage. I did not check the dashboard at first after updating, just the container logs, and all seemed fine.
However, what worries me is that the node has apparently started accepting new data.
I have a dedicated hard drive for its mount point, and the mount works fine. The storage directory lists successfully and is the expected size (1546388M as of writing this). However, according to the node dashboards (both the CLI and the web-based GUI) I have used 18.9GB, and the logs reveal plenty of uploads. The timestamps of files within the storage dir are recent, which means that the container still uses this very directory.
The configs were not changed in any way, nor was the launch script that I use.
I am also running a second node with another hard-drive dedicated to it (since the first one was full), and that worked just fine.
I am at a complete loss here. I am concerned about effectively all of the stored data, and about filling the drive to capacity, since the old files are there but not taken into account.
What do I do guys?
E: FWIW, the successrate checker script doesn't report anything out of the ordinary either:
========== AUDIT ==========
Successful: 35
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 5059
Failed: 199
Success Rate: 96.215%
========== UPLOAD ============
Successful: 22679
Rejected: 0
Failed: 245
Acceptance Rate: 100.000%
Success Rate: 98.931%
========== REPAIR DOWNLOAD ===
Successful: 330
Failed: 26
Success Rate: 92.697%
========== REPAIR UPLOAD =====
Successful: 113
Failed: 0
Success Rate: 100.000%
Hi fragamemnon,
I have encountered a strikingly similar situation; however, my circumstances are slightly different, and for me this happened while running v0.26.2. I have not updated to v0.27.1, as I have still not resolved the issue.
Update: After a graceful reboot, the node will not start due to an insufficient disk space error (the drive is 2TB, with 288GB currently free).
Makes sense, since the minimum space requirement is 500GB and the node can't see that the used space is actually being used by storj.
When I updated and ran the node for the first time, the disk had ~300GB free, which is still well below the minimum. It ran successfully.
This seems exactly like the pattern of behaviour I saw. Although I have more than enough free space, on the first restart, when my used space went back to zero, the node did not detect that my allocated space exceeded the actual free space, and gave no errors to that effect. On subsequent restarts my node did detect that the allocated space was greater than the free space, logged the warning, and reduced the allocated space to match. (All verified in the logs.)
I am still unable to get the node to read the old database.
I'd be reluctant to delete the possibly abandoned data, because I risk deleting something else by accident (timestamps do not give me absolute certainty).
Still looking for advice; downtime is growing…
@Alexey , could you chime in?
Please enable the debug log level for your storagenode (log.level: debug in your config.yaml), save the config, and restart the node.
Please check the node's logs for any errors related to a database.
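One way to do that filtering (a sketch; the container name storagenode is an assumption, and the sample log line below only demonstrates the filter):

```shell
# On a live node:
#   docker logs storagenode 2>&1 | grep -iE 'error.*(database|db)'
# Demonstrated here on one sample log line:
printf '2019-12-16T07:06:11.335Z ERROR piecestore:cacheUpdate error persisting cache totals to the database\n' \
  | grep -iE 'error.*(database|db)'
```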
Thanks for chiming in, Alexey.
Here's the log output:
2019-12-16T07:06:10.201Z INFO Configuration loaded from: /app/config/config.yaml
2019-12-16T07:06:10.202Z DEBUG debug server listening on 127.0.0.1:33127
2019-12-16T07:06:10.219Z INFO Operator email: fragamemnon@overclocked.net
2019-12-16T07:06:10.219Z INFO operator wallet: 0x86e846f759da3025b65732ec133f41c16bc5aa6b
2019-12-16T07:06:10.675Z DEBUG Binary Version: v0.27.1 with CommitHash 08a5cb34f1f43e6c851dabbd3f9e9d1510534dcd, built at 2019-12-11 12:51:29 +0000 UTC as Release true
2019-12-16T07:06:11.289Z DEBUG version allowed minimum version from control server is: v0.26.0
2019-12-16T07:06:11.289Z INFO version running on version v0.27.1
2019-12-16T07:06:11.290Z DEBUG telemetry Initialized batcher with id = "128oWRvkvoWtetkJ6ntVzU9KJhPvkrJnwqArHxFZVCxvyDQxb5J"
2019-12-16T07:06:11.296Z INFO db.migration Database Version {"version": 26}
2019-12-16T07:06:11.297Z DEBUG gracefulexit:chore checking pending exits
2019-12-16T07:06:11.297Z INFO contact:chore Storagenode contact chore starting up
2019-12-16T07:06:11.297Z INFO Node 128oWRvkvoWtetkJ6ntVzU9KJhPvkrJnwqArHxFZVCxvyDQxb5J started
2019-12-16T07:06:11.297Z INFO Public server started on [::]:28967
2019-12-16T07:06:11.297Z INFO Private server started on 127.0.0.1:7778
2019-12-16T07:06:11.298Z INFO bandwidth Performing bandwidth usage rollups
2019-12-16T07:06:11.298Z INFO pieces:trashchore Storagenode TrashChore starting up
2019-12-16T07:06:11.298Z DEBUG pieces:trashchore starting EmptyTrash cycle
2019-12-16T07:06:11.298Z DEBUG orders cleaning
2019-12-16T07:06:11.299Z DEBUG orders sending
2019-12-16T07:06:11.299Z DEBUG gracefulexit:chore no satellites found
2019-12-16T07:06:11.303Z DEBUG orders no orders to send
2019-12-16T07:06:11.329Z DEBUG orders cleanup finished {"items deleted": 0}
2019-12-16T07:06:11.333Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 59926975982336}
2019-12-16T07:06:11.334Z WARN piecestore:monitor Disk space is less than requested. Allocating space {"bytes": 340815607040}
2019-12-16T07:06:11.334Z ERROR piecestore:monitor Total disk space less than required minimum {"bytes": 500000000000}
2019-12-16T07:06:11.334Z ERROR piecestore:cacheUpdate error getting current space used calculation: {"error": "context canceled"}
2019-12-16T07:06:11.334Z ERROR version Failed to do periodic version check: version control client error: Get https://version.storj.io: context canceled
2019-12-16T07:06:11.335Z ERROR piecestore:cacheUpdate error persisting cache totals to the database: {"error": "piece space used error: context canceled", "errorVerbose": "piece space used error: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).UpdateTotal:115\n\tstorj.io/storj/storagenode/pieces.(*CacheService).PersistCacheTotals:82\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:68\n\tstorj.io/storj/private/sync2.(*Cycle).Run:87\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:63\n\tstorj.io/storj/storagenode.(*Peer).Run.func6:445\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2019-12-16T07:06:11.362Z FATAL Unrecoverable error {"error": "piecestore monitor: disk space requirement not met", "errorVerbose": "piecestore monitor: disk space requirement not met\n\tstorj.io/storj/storagenode/monitor.(*Service).Run:118\n\tstorj.io/storj/storagenode.(*Peer).Run.func2:433\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
This is the consequence of losing the database…
The only way is to free up some space on that drive so that the node can run. Since it doesn't recognize its own pieces, it can only see the free space. There must be at least 500GB free to be able to start the node.
Please stop and remove the container, then check the piece_spaced_used.db with this instruction:
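A common way to check a suspect storagenode database is sqlite3's built-in integrity check; a sketch on a throwaway database (the sqlite3 CLI and the path to piece_spaced_used.db are assumptions — run it against a copy of the real file while the node is stopped):

```shell
# Empty stand-in database; point DB at a *copy* of the real
# .../storage/piece_spaced_used.db on your node instead.
DB=$(mktemp)
sqlite3 "$DB" 'CREATE TABLE t (x INTEGER);'

# Prints "ok" if the file is structurally sound; anything else
# indicates corruption.
sqlite3 "$DB" 'PRAGMA integrity_check;'
```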
So it seems I might have been a bit impatient when I reported back initially. After replacing the owner on the entire node working folder, I started the node and the used space started at 0 again. After letting the node run for about 60s, the used space had gone up to about 100 MB. I declared the problem unfixed and logged out; however, I had forgotten to stop the node. When I remembered about an hour later, I logged back in and saw that the node was showing 1.9 TB of used space. I tried restarting the node, and the used space remained at 1.9 TB. This was all while the node was running v0.26.2. (This all happened Dec 14, but I haven't been able to report back until now.)
I forgot to dump the logs before watchtower updated to v0.27.1, so unfortunately I can't get any info about what may have fixed the problem. I guess it was the ownership issue, although I don't know what could have changed, as the node had been running fine for months. And it seems that the node was properly tracking all of the pieces during this, since I have not seen any unrecoverable failed audits.
Thanks for your help @deathlessdd and Alexey. Looks like I am back up and running.
I've followed the database repair steps. However, the HDD has a 2TB capacity, and I have 1.6TB of node data.
How should I go about freeing up space?
Please open the config.yaml in your data folder and try to uncomment (remove the #) this line in the config:
# storage2.monitor.minimum-disk-space: 500.0 GB
then replace the 500.0 with 300.0, save the configuration file, and restart the container:
docker restart -t 300 storagenode
Then check whether it is able to start and recognize the space.
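For reference, the uncomment-and-replace step can be scripted as well; a sketch on a stand-in file (the real config.yaml path is setup-specific):

```shell
# Stand-in for the real config.yaml; point CONFIG at yours.
CONFIG=$(mktemp)
printf '# storage2.monitor.minimum-disk-space: 500.0 GB\n' > "$CONFIG"

# Uncomment the line and lower the minimum in one pass.
sed -i 's/^# *\(storage2.monitor.minimum-disk-space:\) 500.0 GB/\1 300.0 GB/' "$CONFIG"

cat "$CONFIG"   # → storage2.monitor.minimum-disk-space: 300.0 GB
```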
2 posts were split to a new topic: Storagenode had corrupted data in over 400 blocks on the hard-drive storage
Hello all,
I'm quite new to the storj community, and I have the same problem.
Each time the docker container is restarted, the Disk Space Remaining is reset.
I checked the logs and saw this line:
2020-01-09T19:46:03.215Z ERROR piecestore:cache error getting current space used calculation: {"error": "lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/ys/r4gkofts26bk5m3gflnvyvkhjttix5tods2577zl2onse6fceq.sj1: structure needs cleaning"}
Is it something I can fix?
Thanks in advance!
This suggests file system issues. It would be a good idea to run e2fsck to try to fix it.
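For anyone unfamiliar with it, here is a sketch of an e2fsck run against a small file-backed ext4 image, so nothing real is touched; on the node you would stop the container, unmount the drive, and run it against the actual device (e.g. e2fsck -f /dev/sdX1, where the device name is an assumption for your setup):

```shell
# Build a tiny ext4 image as a safe stand-in for the real partition.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=1M count=8 status=none
mkfs.ext4 -q -F "$IMG"   # -F: allow formatting a regular file

# -f forces a full check even if the fs looks clean; -n answers "no"
# to all repair prompts, making this a read-only dry run. Drop -n
# (and add -y if you want automatic fixes) for a real repair.
e2fsck -fn "$IMG"
```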
Are you saying the storj node corrupted a clean new disk in less than a month?
That doesn't seem plausible to me. As far as I understand (and see in the dashboard), new nodes don't get much traffic. What are they doing with the disk if it got corrupted that fast?
To me this looks more like some software error.
For @Dylan, it's "almost" exactly as in this document: https://documentation.storj.io/setup/cli/storage-node
docker run -d --restart unless-stopped -p 28967:28967 \
-p 14002:14002 \
-e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
-e EMAIL="user@example.com" \
-e ADDRESS="ip:28967" \
-e BANDWIDTH="10TB" \
-e STORAGE="3TB" \
--mount type=bind,source="/media/drive/identity/storagenode",destination=/app/identity \
--mount type=bind,source="/media/drive/storj",destination=/app/config \
--name storagenode storjlabs/storagenode:beta