I cannot explain it, I only have facts. A supposedly fast disk, with supposedly fast hardware and supposedly no HW bugs, has the “database is locked” issue, while thousands of other nodes don’t have it. Is it a software bug? Could be, but that’s unlikely given the number of deployments that don’t have it. Is it possible that this exact setup has bugs? Yes, but I also have no tools to check that. What do I have? The “database is locked” issue, occurring far more often here than on the comparable hardware others have referenced. What can I assume? That this particular setup has greater problems than the other references. Is it a software bug? Maybe, but unlikely.
What’s the option to mitigate? Move the databases to another disk. However, the author is resistant to doing so. What are my options then? None. Sorry.
You’ve given me reasons for why this being a software issue is unlikely.
I’m giving you reasons why this being a hardware bug is IMPOSSIBLE.
There is no hardware issue that can explain it effectively being slow on only one sqlite3 database at a time. None. That is prima facie a software issue.
And when you’ve eliminated the impossible, whatever remains, however unlikely, must be the truth.
I’m glad. But I have only one suggestion, which may not seem very impactful: please move the databases to another disk. Please, move them to any other disk. I can help you do so. You will likely be rid of your nightmares once and for all. Please?
The alternative is to move to LVM with an SSD cache layer, or to ZFS with a special device (since you have 64GB of RAM).
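Very roughly, the ZFS variant could look like this (only a sketch; the pool, dataset and device names are placeholders, adjust them to your own layout, and keep in mind that losing a special vdev loses the pool, hence the mirror):

# ZFS: add a mirrored special vdev, then route metadata and small blocks (up to 64K, newly written data only) to it
zpool add tank special mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B
zfs set special_small_blocks=64K tank/storagenode

# LVM alternative: carve a cache volume out of the SSD and attach it to the existing data LV
lvcreate -L 100G -n dbcache vg0 /dev/ssd
lvconvert --type cache --cachevol vg0/dbcache vg0/storj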
Fine, since you have asked nicely. I have done so, and will let it run for a while.
But understand something. Even if I don’t see a single additional database lock, you will never, ever convince me that a hardware issue was responsible for locking only a single sqlite3 database on one run, and only a single different database on another run. Because. It. Is. Impossible.
Best case scenario for you is that the faster speed somehow compensates for the bug. But that there is a software bug responsible for the locking behavior in the first place is undeniable. Hey, the bug could somehow be in my OS (not sure how, pretty vanilla install, but whatever)! But the idea that it is hardware causing only one sqlite3 database at a time to lock, and picking a different database each time to lock exclusively, is ludicrous on its face.
Honestly, I won’t be all that surprised if something like taking this step does resolve it. Because that would explain why it isn’t affecting everyone, which you strangely seem to think is a prerequisite for anything that qualifies as a bug. There must be some mitigation that is covering up the bug for the people it has affected. But that it is a software bug on some level is certain, because hardware CANNOT explain what you’re trying to have it explain here. Period.
Perhaps not. However, I used stats from the other nodes (hundreds of requests per minute…), and the conclusion is: you must move your databases to another drive. End of message.
Sorry to be rude… Unfortunately this is the only solution I can give you: please move the databases to another drive.
“Impossible” is a very big word.
“Highly improbable” is more likely. Who knows, you may have some weird combination of hardware or some setting that is causing these problems.
But I agree, that does seem odd.
It appears that stress testing the nodes has been very fruitful in that it has brought up a whole load of issues.
I have never known so many posts on this forum of people with quite legitimate questions or concerns about performance or odd node behaviour.
I would suggest, @Alexey, that you pass on to the engineers that there are a lot of odd behaviours being reported that may have been hidden with lower loads and are surfacing now. I’m sure you already do, to be fair. Although most people seem to be running their nodes just fine (my “potatoes” are surviving, although I think I’ll be changing the OS drive to an NVMe stick to better handle logging and databases), there sure seem to be a lot of niggles that merit attention.
Just my non-IT-Professional two cents.
With nodes growing in size and additional stress-test data being pushed to them, this is not surprising.
The solution: we need to get rid of databases and filewalking processes as much as possible. Even if these are optimized now, the problem will resurface with even larger nodes and even more data on them in the future. Also, the current fix of moving databases to an SSD is more of a band-aid than a solution.
Exactly. This stress test clearly has shown limitations in some of the design options of the software.
Does need addressing before nodes get even bigger and network throughput increases even more. I’m sure people high up the ranks in Storj labs are paying a lot of attention to this.
With a daily 30G ingress it was not a big deal if you restarted the pod, because missing 2-3 days of new progress only meant about 100G missing from your stats. However, in some cases with a daily TB of ingress, missing 2-3 days of disk usage data can mean 2-3TB, which is far beyond most people’s 10% safety net.
In the hopes that this is true, allow me to give a bit more detail on what I have been experiencing for months now (this locking issue was affecting me even when my node was only around 3TB, rather than the 12+TB it is at now).
Right now, piece_expiration.db seems to be roughly 12x or so the size of bandwidth.db. I think the difference was considerably smaller when my node was much smaller. That shifting size difference roughly corresponds to how often it picks one database or the other to lock exclusively. Right now, on pretty much 11 out of 12 restarts, it’ll lock piece_expiration.db and ONLY piece_expiration.db, while access to bandwidth.db is completely error free. But 1 in 12, it will lock ONLY bandwidth.db - I can’t really be sure if it does it more or less frequently, but if there’s a difference it’s not obvious - while access to the much larger piece_expiration.db is completely error free.
I suspect that if I restarted the node a few hundred/thousand times, I’d see it affect the 3rd largest db, storage_usage.db (seems to be around 192k), while bandwidth.db and piece_expiration.db would be fine.
How that could be the fault of a disk drive just being slow in general, such that it is only slow on the 800MB database this run, and only slow on this other 60MB database the next run… I can’t fathom any possible mechanism by which that could happen.
It could in theory be an OS setting, sure, though I have made very, very few modifications from the defaults. But I would consider that a software issue, not hardware. There’s no way that simply “ur disk is too slow” covers it. A faster disk for the databases may compensate for whatever is causing this bizarre behavior - but that would clearly be a band-aid, not a treatment for the underlying cause.
Maybe their SQLite implementation requires only a little bit of tuning. Check this out: https://archive.is/Xfjh6
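For context, guides like that one usually boil down to a handful of PRAGMAs. A minimal sketch of what they tend to suggest (the path is just an example, and I’m not claiming these are the values storagenode currently uses):

sqlite3 /mnt/storj/storagenode/storage/piece_expiration.db <<'SQL'
PRAGMA journal_mode=WAL;    -- persists in the db file; readers no longer block the writer
PRAGMA synchronous=NORMAL;  -- per-connection; fewer fsyncs per commit while in WAL mode
PRAGMA busy_timeout=10000;  -- per-connection; wait up to 10s instead of failing with "database is locked"
SQL

Note that only journal_mode sticks to the database file itself; the per-connection settings would have to be set by the application.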
Awesome find, those look like exactly the right kinds of tuning that are needed here.
If somebody were to give me some guidance on where exactly I could edit some of these settings within the storj docker structure (I am very good with various versions of SQL, but not so much with docker), I’d be happy to put my dbs back on the disks and try playing with these settings to see if one of them would fix the problem.
Or better yet, a Storj engineer can try their best bet on the settings and send me test versions, since the problem is so readily reproducible for me.
Assuming my node survives the day, of course.
Btw, with the filewalker on startup enabled and the lazy filewalker disabled, my Pi crashes after 2 days.
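For reference, the two options I toggled were roughly these in config.yaml (option names written from memory, so please double-check them against your own config before copying):

# scan used space on startup
storage2.piece-scan-on-startup: true
# run the filewalker at normal (non-lazy) IO priority
pieces.enable-lazy-filewalker: false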
Yes, of course I passed this on to the team. @Qwinn’s setup looks very correct to me; even though I personally do not like Seagate HDDs, it shouldn’t struggle, in my opinion. But as shown, it is happening.
The only problem is that it’s happening on a small subset of setups. And I have failed to figure out what could be the culprit in @Qwinn’s setup. Perhaps the disk is somehow broken? But S.M.A.R.T. is ok.
Unfortunately, right now I cannot suggest anything better than moving the databases to another disk - any less loaded disk, or an SSD if possible.
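If it helps, with the docker setup this is usually just an extra bind mount plus the storage2.database-dir option. A rough sketch (paths are examples; stop the node and copy the existing database files first):

docker stop -t 300 storagenode
mkdir -p /mnt/ssd/storagenode-dbs
cp -a /mnt/storj/storagenode/storage/*.db /mnt/ssd/storagenode-dbs/
# add to the docker run command:
#   --mount type=bind,source=/mnt/ssd/storagenode-dbs,destination=/app/dbs
# and point the node at it in config.yaml:
#   storage2.database-dir: /app/dbs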
Hm. Perhaps you found it. What are the sizes of the piece_expiration.db and bandwidth.db databases?
What other databases were locked? Please show their sizes too.
It’s possible that the size plays a role.
Or simply run:
ls -l --si /mnt/storj/storagenode/storage/*.db
For my biggest node (6TB):
$ ls -l --si /mnt/x/storagenode2/storage/*.db
-rw-r--r-- 1 root root 76M Jun 16 07:37 /mnt/x/storagenode2/storage/bandwidth.db
-rw-r--r-- 1 root root 25k Jun 16 08:30 /mnt/x/storagenode2/storage/garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 173k Jun 16 04:10 /mnt/x/storagenode2/storage/heldamount.db
-rw-r--r-- 1 root root 17k Jun 15 04:06 /mnt/x/storagenode2/storage/info.db
-rw-r--r-- 1 root root 25k Jun 15 04:06 /mnt/x/storagenode2/storage/notifications.db
-rw-r--r-- 1 root root 33k Jun 15 04:36 /mnt/x/storagenode2/storage/orders.db
-rw-r--r-- 1 root root 175M Jun 16 08:06 /mnt/x/storagenode2/storage/piece_expiration.db
-rw-r--r-- 1 root root 25k Jun 15 04:06 /mnt/x/storagenode2/storage/piece_spaced_used.db
-rw-r--r-- 1 root root 25k Jun 15 04:06 /mnt/x/storagenode2/storage/pieceinfo.db
-rw-r--r-- 1 root root 25k Jun 15 04:06 /mnt/x/storagenode2/storage/pricing.db
-rw-r--r-- 1 root root 37k Jun 16 08:08 /mnt/x/storagenode2/storage/reputation.db
-rw-r--r-- 1 root root 33k Jun 15 05:06 /mnt/x/storagenode2/storage/satellites.db
-rw-r--r-- 1 root root 25k Jun 15 04:06 /mnt/x/storagenode2/storage/secret.db
-rw-r--r-- 1 root root 1.2M Jun 16 04:10 /mnt/x/storagenode2/storage/storage_usage.db
-rw-r--r-- 1 root root 21k Jun 15 04:06 /mnt/x/storagenode2/storage/used_serial.db
-rw-r--r-- 1 root root 25k Jun 15 04:06 /mnt/x/storagenode2/storage/used_space_per_prefix.db
What was the reason for the crash? A FATAL error? OOM?
Unfortunately nothing in the storj logs:
2024-06-15T23:01:01Z INFO piecestore upload started {"Process": "storagenode", "Piece ID": "2IBW72X4IBKYD3YSXYUWZQZKPTVVYOJOI5K3O2SEGBC5BRS35YLQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "79.127.205.227:34932", "Available Space": 1016587703236}
2024-06-15T23:01:01Z INFO piecestore upload started {"Process": "storagenode", "Piece ID": "6U4TFC47QOH7V3YQPRLBTTKG2NMIXLBVYLULVJBKUZH4G544WDBQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "79.127.205.240:36938", "Available Space": 1016585449924}
2024-06-16T07:59:25Z INFO Configuration loaded {"Process": "storagenode", "Location": "/app/config/config.yaml"}
2024-06-15T23:01:01Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "WHAEZXQNKQAUAIMYCMZRF2U2DP7SHLCFRSIDDGQGKHDAVEHC7FLQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR", "Offset": 0, "Size": 2319360, "Remote Address": "199.102.71.57:46938"}
2024-06-15T23:01:01Z INFO piecestore upload started {"Process": "storagenode", "Piece ID": "MKOZ6FRG5PKUWA4US4WKM4IS5PIFWMHYQZVM4UV6EZTOB7L2GM4A", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "79.127.219.42:59328", "Available Space": 364150549606}
2024-06-16T07:59:25Z INFO Configuration loaded {"Process": "storagenode", "Location": "/app/config/config.yaml"}
Any hint where to look?
The Pi was not accessible via SSH any more, so I had to reboot it by cutting and restoring the power.
Hello Alexey
Sorry for the delay with the wiki, I was in full scripting mania and now have a script ready to show.
For now I have it on Google Drive:
$ ~/storj/nodes# ./node-used-space-progress.sh
Date (local time): 2024-06-16T09:31:00+02:00, Date (sec): 1718523060
Satellite: US1, Subfolder scanning: xq, Progress: 73,44%, 752 folders out of 1024 done.
Average time in seconds per folder: 224
Remaining folders: 272
Remaining time estimation : 0 day(s) 16 hour(s) 55 minute(s).
Filewalker current runtime: 1 day(s) 22 hour(s) 55 minute(s).
What is now the best way to make it more public?
Edit my linked post and add it at the end or create a new thread for this and link it too?
Currently it works at least on:
- Debian 12 Bookworm
- Raspberry Pi OS Bookworm
- Armbian 24.2.1 Bookworm
But I have no Fedora, FreeBSD, Solaris, or other *nix systems to test on.
It’s up to you. Usually scripts are easier to host on GitHub, but let me know which approach you would like to use. We can always move a separate post to a new thread in the getting started - Storj Community Forum (official) category, for example, and make it a wiki (to allow unlimited edits).
Hm, that looks like an external event, and an abrupt one. Then check dmesg or journalctl.
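For example (the journalctl part assumes persistent journald storage, otherwise the previous boot’s logs are already gone):

# kernel ring buffer with readable timestamps: look for OOM-killer, I/O errors or undervoltage messages
dmesg -T | tail -n 200
# system journal from the previous boot, warnings and above only
journalctl -b -1 -p warning --no-pager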