That’s an interesting question. I don’t think it would… as the used-space-filewalker is just reading metadata… and this fsync change only affects writes?
I believe only indirectly. We now need fewer IOPS because there are fewer fsync
calls, and several writes can be merged into one operation when the data is eventually flushed from memory to disk. So in theory reads have more IOPS budget.
That was my train of thought as well. If the lazy filewalkers (all of them) aren’t waiting around for IO to finish before resuming, they may get a few seconds more of runtime.
What are similar issues? I have seen different things like OOM kills, Docker forcefully removing the container, kills from the node software due to read or write timeouts, a reboot when the node doesn’t stop in time, and there is a report here:
Would these situations with force-sync=false lead to losing pieces and failing audits?
I think it was the recent segment failure that occurred for a storage node that made Storj add this flag to prevent future occurrences.
No, only when the filesystem is not synchronized prior to power off. So: panics, hangs, system resets.
Some (most?) filesystems allow you to specify sync behavior regardless of what the app wants. I have had sync disabled on my storagenode datasets from day one. But I have a UPS and an OS that does not reboot out of the blue.
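For example (a sketch assuming ZFS, since the post mentions datasets; the pool/dataset name is made up):
zfs set sync=disabled tank/storagenode
With sync=disabled, ZFS acknowledges writes without honoring the application’s fsync requests, so that dataset behaves this way regardless of the node’s force-sync setting.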
What’s the size of a piece in this test? If I want to test with 10 GiB of data, what number of pieces should I use?
A good install guide for non-coders covering all the dependencies would be useful. Put it in the first post.
Or some binaries, if the tool is final and doesn’t change daily. I don’t want to install all sorts of stuff on my NASes just for a test. Thanks!
Where does that flag go in the run command? Before or after the container name? What’s the exact syntax?
Will the default value be --filestore.force-sync=false?
OK… when can we use it on nodes?
I’m in a mess with the entire server at the moment.
Hello. Where can I find the code for this script?
I will try to make a Windows app for Windows users for this.
go install ./cmd/tools/piecestore-benchmark/
# storj.io/storj/shared/dbutil/sqliteutil
shared\dbutil\sqliteutil\db.go:86:28: undefined: sqlite3.Error
shared\dbutil\sqliteutil\db.go:87:25: undefined: sqlite3.ErrConstraint
shared\dbutil\sqliteutil\migrator.go:104:24: destDB.Backup undefined (type *sqlite3.SQLiteConn has no field or method Backup)
For Windows you also need cgo.
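A minimal sketch of what that means in practice (assuming a C compiler such as mingw-w64 gcc is installed and on PATH, which go-sqlite3 needs):
go env -w CGO_ENABLED=1
go install ./cmd/tools/piecestore-benchmark/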
& 'C:\Program Files\Go\bin\go.exe' version
go version go1.22.3 windows/amd64
By default the benchmark is using 62068-byte pieces. Times 29 pieces, that is a 1.8 MB file. We wanted to go with a good default size that is somewhere in the middle between a big and a small file.
A 10 GB file would get split into 64 MB segments first, so 64 MB / 29 is the piece size for big files. The benchmark doesn’t care about this limitation, so you could go bigger and thereby test what would happen if the segment size were higher: --piece-size
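For a rough 10 GiB test at the default piece size, that works out to about 10 GiB / 62068 B ≈ 173,000 pieces, for example (using the -pieces-to-upload flag shown further down in the thread):
piecestore-benchmark -pieces-to-upload 173000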
Not possible at the moment. The PR needs to get merged first and even that only removes one line from the installation instructions.
Golang is easy to install. I would bet there are some instructions out there for your NAS. I am not saying you have to; I am just saying it is easy enough to do on a NAS. And all the compile artifacts are kept in one directory, so you can remove the compiled binaries very easily. They don’t get copied all over the place.
Like any other flag: you can set it via the config file, an environment variable, or a run command flag.
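A quick sketch of the three options for this particular flag (the environment variable name follows the usual STORJ_ naming pattern, and the docker commands are abbreviated; adjust for your own setup). For docker, the flag goes after the image name, because everything after the image is passed on to the storagenode process:
# in config.yaml
filestore.force-sync: false
# as an environment variable
docker run -d ... -e STORJ_FILESTORE_FORCE_SYNC=false storjlabs/storagenode:latest
# as a run command flag, after the image name
docker run -d ... storjlabs/storagenode:latest --filestore.force-sync=false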
Yes, that will be the default.
Try this: Go & SQLite on Windows. I was in need of a fast in-memory and… | by Aravind Yarram | Medium
I’m feeling slightly uneasy about this tradeoff of data integrity for performance.
It seems to me that there will be more instances of either missing or corrupt pieces (since at any given time there will be nodes powering off ungracefully, crashing, etc.).
That the audits may not pick up these failures doesn’t mean they are not there.
Conceivably, with the passing of time, is it not possible for all the nodes holding pieces of one particular segment to have silently corrupted or lost their data?
Or have the Storj statisticians calculated that the probability of that happening is so vanishingly low that it’s effectively negligible?
That, that’s the one.
In short, if it’s significant enough to cause an issue, it’s significant enough to be found by audits and disqualify your node.
But this is almost certainly never coming anywhere near being a problem like that, unless, as @littleskunk said, you have power-loss issues more often than daily.
It’s not so much the issue of my node losing lots of pieces (and thus failing audits and getting disqualified).
It’s more the issue of lots of nodes silently losing small numbers of pieces (thus passing audits) but all of the required nodes for that one segment of data having lost enough pieces that it can’t be reconstructed.
I’m not sure I’m quite able to express myself very well on a keyboard.
No, I get what you mean. But the point is that the audit system is tuned in such a way that it only allows survivable losses to silently exist. On average it disqualifies nodes at somewhere between 2 and 4 percent data loss, and the network could easily deal with even all nodes having about 4% data loss. So there is no added risk here.
I’m pretty sure I will lose pieces on one of my nodes, because it’s slow, with 1 GB RAM, and I always find old pieces in temp. So both nodes on that machine will opt out of this improvement. They are almost full anyway.
All my nodes have a UPS, but as the batteries age, I can’t always guarantee a clean shutdown.
It would be helpful if we had a statistic on lost pieces somewhere, maybe on the dashboard that everyone checks. In the logs… not so helpful.
piecestore-benchmark -pieces-to-upload 100000
uploaded 100000 pieces in 2m17.2761269s (43.12 MiB/s)
collected 100000 pieces in 55.3187177s (107.00 MiB/s)
OS: Windows 10 Pro
HDD: Seagate Exos
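For reference, those rates are consistent with the 62068-byte default piece size mentioned earlier: 100,000 × 62068 B ≈ 5.78 GiB, which gives roughly 5919 MiB / 137.3 s ≈ 43.1 MiB/s for the upload and 5919 MiB / 55.3 s ≈ 107 MiB/s for the collect.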