Creating identity on storagenode host seemingly disrupted existing storagenode identity

:thinking: :no_good_man: that’s impossible unless your whole setup is screwed…

Apart from that, why would the storagenode need the identity to access the DBs? It only needs it for the satellites.


best point i've heard thus far lol
i dunno… what it's used for…
one thing finished and another started blinking red like crazy
seemed related

So you’re now arguing that a completely independent process manipulated the memory of another process? Despite all protections in place against that? Not only that, but you’re suggesting that this happened inadvertently. You do realize that this is the kind of thing hackers try to do to mess with other software. It’s not exactly something that can happen by accident.

The node uses it to identify itself to other entities on the network. Not to access the databases. If there was an issue with the identity you would see errors around trying to settle orders that aren't your node's, etc. You'd also be offline with an unknown identity.

You’re just absolutely 100% wrong. What you suggest is absolutely impossible. But I’m out, several people have told you this now. You keep ignoring it and sticking to your own faulty conclusions. Enjoy chasing your own tail.


well i have had problems with docker in my nested containers interfering with my host docker, so it really doesn't seem like that far a stretch, even if some of the logic behind it is flawed…

but if my logic is flawed, then how flawed is it to use the word impossible…
doesn't really seem like that's something that actually exists, today's impossible is tomorrow's human rights…

I reproduced your error, but not quite:
1. create identity
2. copy the 4 files to a safe location
3. authorize key1, move the 6 files and start storagenode1
4. move the 4 files back from the safe location
5. authorize key2 and start storagenode2

At this point storagenode1 seemed to stop receiving. That is expected I think. Could you have authorized your new node with the old identity files by mistake?

haven’t signed the new identity yet, the only real overlap would be that both identities were created in the same location, with the same name, but the old identity isn’t on the same pool…

but since they are both zfs, there could be some sort of ARC related thing, because then which drive it’s located on wouldn’t matter… ofc each file should have its own metadata and be unique…

but really it goes deeper than that, because if a file is larger than 128k (in this case), it gets split up into multiple 128k records which, if they checksum to the same data, are then merged with existing data instead… so that the same piece of data isn’t stored twice… and only if it’s changed will it then be split from the original record into its own unique data, once the checksums differ…

so unless the files are nearly 100% unique, they would become the same records in ARC
zfs is weird… but efficient… but should work… ofc zfs on linux is still kinda the new kid on the zfs block and still has a few bugs and caveats.
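For reference: ZFS only deduplicates records when the off-by-default dedup property is enabled, and only when two records checksum to exactly the same bytes; as far as I know the ARC caches blocks by their on-disk location and never merges records itself. A toy Python sketch of that checksum-keyed idea, purely illustrative and not ZFS code:

```python
import hashlib

RECORD_SIZE = 128 * 1024  # 128 KiB, the default ZFS recordsize

# Hypothetical content-addressed store: records are keyed by their checksum,
# so two records are only ever merged when they are byte-for-byte identical.
store = {}  # checksum -> record bytes

def write_file(data: bytes) -> list:
    """Split data into 128 KiB records and store each one under its checksum."""
    keys = []
    for offset in range(0, len(data), RECORD_SIZE):
        record = data[offset:offset + RECORD_SIZE]
        key = hashlib.sha256(record).hexdigest()
        store.setdefault(key, record)   # identical record -> stored only once
        keys.append(key)
    return keys

write_file(b"x" * RECORD_SIZE * 2)            # two identical records -> stored once
write_file(b"x" * RECORD_SIZE + b"y" * 10)    # first record shared, second is new
print(len(store))  # 2: "nearly the same" still means a different checksum, no merge
```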

so yes and no… yes i believe the files could for some reason become mixed, it isn’t likely but at least a part of them might… ofc like kevink pointed out the identity shouldn’t be used for the databases, which i guess makes sense… but then on the other hand this stuff deals with a lot of cryptography, so i dunno if the databases would be encrypted or use other such security measures for whatever reason…

tho i’m sure people will start ranting about how wrong i am for not making an assumption about something like that…

but yeah, long story short… i haven’t moved or copied the newly generated files, and the old identity has been in the storagenode location for months… but it’s possible that something went wrong in memory, i guess… that somehow disrupted the identity’s zfs records or the zfs metadata of those records, due to parts of the files being similar… never really had something like that happen before… but i don’t suppose files can get much closer to actually being the same… even if it seems unlikely…

or maybe the way the identity generation writes the data upon completion triggered the underlying issue with my bad drive and pushed it over the edge… and into an extended cycle of high latency…

just had an incident with the vm that has been giving me grief… not sure if some sort of malware had gotten onto it… was trying to reinstall it and it basically stalled out even my proxmox webgui…
so i figured i would go check the storagenode logs, and i can barely even find a cancelled download…

i suppose i should go back and try to study my logs from when it happened… i mean if it is disk latency for either writes or reads then one could argue that downloads and uploads would have high cancel rates.

that would be one way to rule out whether it was an issue with access to the pool.
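if you want to quantify that instead of eyeballing the log, something like this rough Python sketch would count the outcomes (it assumes the node log is redirected to a file, and the message strings are assumptions that may differ between versions):

```python
import collections, re, sys

# Count upload/download outcomes in a storagenode log to estimate cancel rates.
# The message strings below are assumed typical storagenode log output and may
# need adjusting for your version / log format.
PATTERNS = {
    "download ok":       re.compile(r"\bdownloaded\b"),
    "download canceled": re.compile(r"download canceled"),
    "upload ok":         re.compile(r"\buploaded\b"),
    "upload canceled":   re.compile(r"upload canceled"),
}

counts = collections.Counter()
with open(sys.argv[1], errors="replace") as log:
    for line in log:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1

for name, count in counts.items():
    print(f"{name}: {count}")
```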

no i said exactly that i didn’t make an assumption either way…
i know that most of the databases are open… but you saying all of them are open is most likely you making an assumption… just like you and brightsilence assume my system lost pool access…

when it’s clear from the image of the log that there are no cancelled files in either upload or download,
thus we can conclude that pool access was in fact not affected, just as i would have expected…

so the IOwait / disk latency was a blind alley, i guess i’m back at… i should really try to replicate it… :smiley:

funny thing is that the whole latency thing was sort of busted just by looking at the first thing on the thread… :smiley:

I recently got a problem where my pc with an i5, ssd and 8gb ram worked slowly. Cpu load was low, and I didn't understand why. After I added another 8gb of ram (16 total) everything started to fly. Before adding it there was no sign that the ram was overloaded, it showed only 50% used.
Now it's only 20%, but I see it caches much more in ram and seems to use the page file less; even if the ssd is fast, it is much slower than ram.

This is a completely different issue, where you’re actually using the same identity on two nodes, which will definitely get them both disqualified quickly, and they’ll be fighting to claim the node’s address.

@SGC I think you have a lot of knowledge about certain things, but not always the insight of when to apply it. Yes, ZFS (and other file systems and even memory management) has systems to deduplicate data that is the same. But these systems are built with the incredibly high risk of cross-process manipulation in mind. If it were possible to manipulate instance B in such a way that instance A is affected by it, the file system would basically be broken and unsuitable for pretty much any serious use. Since ZFS is pretty widely adopted for very serious use cases, we can assume that this isn’t happening. Again, this kind of stuff is what hackers often try to exploit when attacking such systems. It takes a lot of effort and meticulous attacks to find and exploit a hole, and when one is found, it’s big news in the security community. This could never happen by accident, because these systems are built specifically to prevent that.

Also:

  1. Nearly the same could never lead to deduplication taking place. Only exactly the same.
  2. A newly generated identity would be pretty much 100% different from another one. That’s kind of the point. Nothing about it is the same except for perhaps similar file sizes. I suggest you compare 2 different identities (see the sketch below).
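A quick way to check just how different two identities really are is to hash their files in 128 KiB chunks and count overlaps; a rough Python sketch with placeholder paths (point it at wherever your identity folders actually live):

```python
import hashlib, pathlib, sys

CHUNK = 128 * 1024  # compare in 128 KiB pieces, i.e. the default ZFS recordsize

def chunk_hashes(directory: str) -> set:
    """Hash every 128 KiB chunk of every file under the given identity directory."""
    hashes = set()
    for path in pathlib.Path(directory).rglob("*"):
        if path.is_file():
            data = path.read_bytes()
            for offset in range(0, len(data), CHUNK):
                hashes.add(hashlib.sha256(data[offset:offset + CHUNK]).hexdigest())
    return hashes

# e.g.: python compare_identities.py /path/to/old-identity /path/to/new-identity
old, new = chunk_hashes(sys.argv[1]), chunk_hashes(sys.argv[2])
print(f"shared chunks: {len(old & new)} of {len(old | new)}")  # expect 0 shared
```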

You absolutely cannot conclude that. Almost all of the db writes are working fine as well. Nobody said you couldn’t write at all. We just said there was enough delay that caused some of the db writes to hit a still-open lock from another process. Once again you dismiss the most likely cause, the one we’ve seen as the common factor in all database lock issues. By all means, keep running into the same concrete wall, I’m sure it’ll break at some point. Ignore the people who are telling you you can easily walk around it… I mean, what do they know about concrete walls, right?
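For illustration, a minimal sqlite3 sketch of exactly that situation (file name and timings are made up): one connection holds a write transaction, and a second writer whose busy timeout is shorter than the hold time gets "database is locked":

```python
import sqlite3

# Writer A takes the write lock and holds it, simulating a long-running
# transaction. isolation_level=None means manual transaction control.
a = sqlite3.connect("lock-demo.db", isolation_level=None)
a.execute("CREATE TABLE IF NOT EXISTS usage (amount INTEGER)")
a.execute("BEGIN IMMEDIATE")                   # take the write lock now
a.execute("INSERT INTO usage VALUES (1)")

# Writer B only waits 100 ms for the lock before giving up.
b = sqlite3.connect("lock-demo.db", timeout=0.1)
try:
    b.execute("INSERT INTO usage VALUES (2)")  # blocked by A's open transaction
except sqlite3.OperationalError as err:
    print(err)                                 # "database is locked"

a.execute("COMMIT")                            # A releases the lock
b.execute("INSERT INTO usage VALUES (2)")      # now B succeeds
b.commit()
```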


well my system doesn’t work like that… i can only get latency on reads, and even if one of the raidz1 vdevs was unresponsive, data written to the pool would “simply” be allocated to another of the raidz1s, depending on which one isn’t overloaded…

on top of that, all writes go to the slog, which will ack a sync write in a few milliseconds, partly why i did my zfs setup like i did… to prevent / mitigate such issues, and from day 1 of starting up on the new pool i’ve barely even seen errors… so it was very odd timing… i cannot explain why… or what happened… yet… but i intend to try and find out…

we also completely agree that what i propose / my unsubstantiated and most likely wrong view would be a thing that should not happen, even under the worst of conditions… since it would, like you say, mean that basic virtualization / memory management / computing would quickly degrade into a big mess of random corrupted data…

so if we assume that didn’t happen either… that only leaves the still open question of what actually happened…

i will admit that my hypothesis was based upon more than a few flawed assumptions, like the one kevink pointed out with the identity not being related to the databases…

i’m also very new to linux… i know from my many years with windows that running multiple programs can at times create instability and unforeseen effects through stuff like memory allocation or shared system variables being used or whatever… i’m not much of a programmer, but i know that at least in windows it has happened a lot in the past… and i also think i still see it in linux… been having issues with a paravirtualized vm running advanced java interfering with netdata… or so it seems…

been an ongoing issue of trying to figure out what actually keeps killing my netdata install…
and i know damn well it shouldn’t happen, maybe i got something configured in a wrong way…

it’s also not recommended to install docker directly into proxmox, since proxmox is a hypervisor and docker is essentially also a hypervisor… and now i’m starting to do containers with a nested docker inside… so it’s difficult to rule stuff out when i usually have 5-6+ vms / containers running…

it’s a bit like when i was trying to do gpu passthrough… also an experimental feature in proxmox… then the vm i did the gpu passthrough to would work… but then after a while the entire server would crash… until it just kept crashing and crashing… eventually i fixed it… took me 4-5 days until i finally pulled the power cable and did a cold start… which apparently changed something…

this stuff doesn’t always end up making sense…
and it turned out the issue that initiated the reboot problem was a vm cpu choice… O.o

Database operations are never just a simple read or write. Since you saw the locks happening on the bandwidth.db, let’s take bandwidth usage rollup as an example. Every once in a while your node will aggregate all bandwidth usage from the past time period in a rollup per satellite, per type of bandwidth usage. Since this operation requires writing to the bandwidth_usage_rollups table, it requires a write lock. With sqlite, that lock has to be a full file lock (since there is no database service and sqlite uses file system locks instead). During that operation the node has to read all bandwidth records for the past time period, compute the aggregation and then write the result into the bandwidth_usage_rollups table, which has a primary key that has to be unique, so for that insert it needs to do an index scan on the bandwidth_usage_rollups table in order to check that there is no unique constraint violation. After that the now aggregated records are removed from the bandwidth_usage table. This likely all happens in a single transaction, which requires keeping logs in order to be able to roll back the entire operation if a step goes wrong. And after everything is done the changes are committed to the database. Only then is the file lock removed and other processes can make changes to that database again.
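As a rough sketch of what such a rollup looks like in SQL terms (the table layout and column names here are simplified assumptions based on the description above, not the actual storagenode schema or code):

```python
import sqlite3

# Sketch of a bandwidth rollup: aggregate raw usage rows into
# bandwidth_usage_rollups inside one transaction, then delete the raw rows.
con = sqlite3.connect("bandwidth.db", isolation_level=None)
con.executescript("""
CREATE TABLE IF NOT EXISTS bandwidth_usage (
    satellite_id TEXT, action INTEGER, amount INTEGER, created_at TEXT);
CREATE TABLE IF NOT EXISTS bandwidth_usage_rollups (
    interval_start TEXT, satellite_id TEXT, action INTEGER, amount INTEGER,
    PRIMARY KEY (interval_start, satellite_id, action));
""")

cutoff = "2020-07-01 00:00:00"
con.execute("BEGIN IMMEDIATE")                 # whole-file write lock for the rollup
con.execute("""
    INSERT OR REPLACE INTO bandwidth_usage_rollups
    SELECT datetime(created_at, 'start of day'), satellite_id, action, SUM(amount)
    FROM bandwidth_usage WHERE created_at < ?
    GROUP BY datetime(created_at, 'start of day'), satellite_id, action
""", (cutoff,))
con.execute("DELETE FROM bandwidth_usage WHERE created_at < ?", (cutoff,))
con.execute("COMMIT")                          # only now is the lock released
```

If any other process tries to write to the same database file between the BEGIN IMMEDIATE and the COMMIT, it has to wait, and if it waits longer than its busy timeout you get the familiar "database is locked" error.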

You claim to know your system well enough that during that entire process there can’t be any IO constraint that slows this process down. I claim that even the most experienced people won’t be able to say that with any certainty. Especially not on a relatively complex setup like you have.

Actually, if you dig long and deep enough it always does. Computers don’t do magic, they do logic.
So if it doesn’t make sense yet, you don’t yet understand the problem and you need to keep digging.


well i dug… and found something odd that would sort of explain everything, and then some…

my slog (write cache) ssd cannot keep up, because i run the storagenode, and actually most stuff these days, fully as sync writes…

sort of patched it now by adding another one… but still not fast enough
not sure why this has become a problem now with all the downloads… might be a good stack of deletes in the mix, and i ofc never did move my database… whatever it is, it’s a bottleneck i was aware of and have been browsing and researching solutions for, for a long time… just never actually got around to making the purchase…
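One crude way to sanity-check whether the slog keeps up is to time small sync writes on the dataset the databases live on; a rough Python sketch, with a placeholder path:

```python
import os, time

# Time a series of small synchronous writes, roughly what every sync write
# on the pool has to wait for before it is acknowledged by the slog.
# The target path is a placeholder; point it at the dataset holding the DBs.
PATH = "/tank/storagenode/fsync-test.tmp"
SAMPLES, BLOCK = 200, b"\0" * 4096

latencies = []
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o600)
try:
    for _ in range(SAMPLES):
        start = time.perf_counter()
        os.write(fd, BLOCK)
        os.fsync(fd)              # forces the write to stable storage (the slog)
        latencies.append(time.perf_counter() - start)
finally:
    os.close(fd)
    os.remove(PATH)

latencies.sort()
print(f"median {latencies[len(latencies)//2]*1000:.2f} ms, "
      f"worst {latencies[-1]*1000:.2f} ms")
```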

doesn’t really explain exactly why the identity generation caused the lock to happen… but i could easily imagine it did for various reasons… like i had many vms running with the ability to swap… so it could steal their memory, thus requiring them to swap, which put additional stress on an already loaded ssd because the node was kinda active, or whatever…

and would explain 10 other different annoying things i’ve been running into…

and yes i know you and everybody else will just say… well, why don’t you just run default instead of sync always…

which will end up in one of those way too long explanations…
so it was a disk thing… just on a part of my pool i hadn’t noticed latency on before… and i guess when i checked it the last few times in netdata i thought it was a hdd, not an ssd, due to its latency lol
not like it was off the charts or anything… 150ms, and only in brief peaks… really wouldn’t consider that really bad… ofc on an ssd it’s not great…

but this is not a new ssd either, so a little latency is to be expected at high loads… didn’t even look that bad… but ofc if one adds thousands of iops… and then 150ms latency… or backlog… not sure how netdata actually gets its numbers… but a 150ms delay on maybe each io or per “data burst” could quickly scale up like crazy… which then leads me to ponder whether my increase in latency on the first drive of the first raidz1 was actually an artifact of the io caused by the slog ssd higher up the chain… and it just ended up showing up as reads, because it would only read from the drive, since writes would ofc go elsewhere…
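It does scale roughly as Little's law: requests in flight ≈ IOPS × latency. A quick back-of-the-envelope check (the IOPS figure is just an assumed example):

```python
# Little's law: average requests in flight = throughput * average latency.
iops = 2000          # "thousands of iops" (assumed figure)
latency_s = 0.150    # 150 ms peaks

in_flight = iops * latency_s
print(f"~{in_flight:.0f} requests queued/in flight during a 150 ms spell")  # ~300
```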

kinda almost makes my head spin lol if i cannot trust the latency directly stated by the system for each drive, because it obviously would descend through the pool’s data handling structure…

so issues might show up on lower level devices before they actually become noticeable on the primarily affected drive…

So, you’re saying it was an IO bottleneck?

Wish I or anyone else would have thought of that…

Do you know what it’s called when something is red and yellow and, when it flies through your neighbor’s window, your phone rings?



A coincidence…

well it did cause it :smiley: was just the last straw that broke the camel’s back…
might just have ended up chalking it up to a coincidence, had i not dug deep enough…
thanks for the help… even tho it might not seem like it at times, i do try to listen…

even when i don’t agree… i only noticed it just now because everything was looking fine
and the storagenode was slowly starting to climb in memory usage again…
and i have been relentlessly monitoring my disk activity, in the best ways i could manage.

but when the problem looks like this… it’s not easy to spot
i mean it looks totally fine, even when causing me grief… i’ll have to get a proper pcie nvme ssd for the slog… will be very interesting to see if that completely mitigates what i thought was a hdd issue, because it was the hdd showing 2s to even 8s latency at the worst times… pretty sure this is the cause tho… even tho it doesn’t look like it…

and like i already stated, 1 hdd having latency issues shouldn’t affect the overall pool performance one bit… afaik it would just make the system a bit more unreliable in regard to data redundancy and more prone to actual data errors.