Wow! One month? And restarts or updates don’t affect the migration? It just resumes from where it was?
Are there any log entries for migration start and finish?
I’ve executed the code from the first post in the hashstore/meta folder with all “true” values for passive and active migrations and restarted the node. Now all new pieces go to the hashstore folder, but the old blobs folder is not getting any smaller.
I’m on 1.119.8, should I wait for 1.120, or did I miss something?
@littleskunk, how did you trigger the full migration?
How does the hashstore handle power outages without a UPS? Can entire log files that are being written or compacted at that time be lost?
You just need to wait, the migration is pretty slow. I tested it on storj-up and it works; all data will be migrated to the hashstore eventually. Trash is supported normally (it will not be migrated though, so wait a week and the old trash should be gone).
I don’t like to repeat myself. All important information is mentioned in the first post, including the required versions!
That’s a crucial question in my opinion. The old filestore is quite resilient against file corruption or even the complete loss of some files. Now it seems everything depends on one file.
Or did I miss something? Could a node recover from losing a hashtable file?
.15?!?! They almost didn’t get that bun out of the oven
How will migration handle corrupted files or other files that should not be there?
As reported, I saw the issue that corrupted pieces from the blob folders got successfully moved to the trash by the retain process. But the trash deleter obviously did not delete them; it errored and made the deletion quit, which in my opinion should not happen. Anything in the trash should be deleted, regardless of whether it is a valid piece or not.
So what will happen if the migration process stumbles across e.g. corrupted pieces?
Technically it could just write a new file and only delete the old one once the write completes. So in case of file corruption or a power outage it could rewrite the same file without any loss. Or am I wrong?
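For what it’s worth, a minimal sketch of that copy-first, delete-last idea in Go (purely illustrative, with made-up paths and function names; this is not the actual storagenode migration code):

```go
package main

// Sketch of "write the new copy first, delete the old one last":
// the destination is fully written and synced before the source is removed,
// so an interruption at any point leaves at least one complete copy on disk.

import (
	"io"
	"log"
	"os"
)

func migrateFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	tmp := dst + ".tmp"
	out, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		os.Remove(tmp)
		return err
	}
	// Flush to disk before the old copy is touched, so a power loss here
	// still leaves the original piece intact.
	if err := out.Sync(); err != nil {
		out.Close()
		os.Remove(tmp)
		return err
	}
	if err := out.Close(); err != nil {
		return err
	}
	// Publish the new copy, then remove the old one.
	if err := os.Rename(tmp, dst); err != nil {
		return err
	}
	return os.Remove(src)
}

func main() {
	// Illustrative paths only.
	if err := migrateFile("blobs/aa/piece.sj1", "hashstore/piece"); err != nil {
		log.Fatal(err)
	}
}
```

The point is simply the ordering: because the source is only removed after the destination is synced, an interruption at any step leaves at least one complete copy.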
Not known yet. You may be the first one, if you want to try.
I would assume that such a piece would be ignored, or in the worst case the migration would try to migrate it over and over again.
I think it may miss some pieces if the interruption happens when the piece is uploaded but the record is not yet added. However, I believe that in that case it would also not confirm the successful upload to the client.
However, with bad luck the upload may be confirmed and the record added, but the file becomes corrupted due to a “lucky” interruption. In that case all pieces could be lost (I may be wrong and there may be a replay function).
I would ask the team, how is it handled.
Thanks.
This is what an ls of the trash folder with corrupted pieces looked like:
ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-10-21/xg
''$'\304''g'$'\t\346''3V'$'\345\255''OzIB'$'\326'')^'$'\016''d1'
'\222''G+'$'\306''Xƹ'$'\373\240''b'$'\373\251''f'$'\016'']o'$'\362\332''"Ww'$'\177\002''G*.'$'\003''N'$'\031\364''p'$'\253\271''`ts'\'''$'\345''QlS'$'\376\213\311''"'$'\b''6'$'\216\272\352''J'$'\233\244''?a'$'\200\341''DZi6'$'\341''W0'$'\311''R '$'\276\274\037''qW'$'\361''ڢwy'$'\r\346\354''Ofەz'$'\277\331''`'$'\300\243''-'$'\307\313\031\335''n'$'\211\217\017''謞'$'\312''٠'$'\303\325\005'
'\355\024''|'$'\256''Io['$'\004''"'$'\301\214\021\213\026''1K'$'\353''Đ'$'\034''N4'$'\366''MD7'$'\n\326\335''e'$'\326''cDC'$'\311\037\372\375''e'$'\351''b-'$'\211'')vz'$'\005\a''t^?'$'\264\267\343''^'$'\267\314''-L'\''}.'$'\222\335''T'$'\242\241''(.7'$'\026'\''ԡ`oƖ'$'\331''B'$'\324''M%"'$'\003\211\364''!'$'\001\261\353\235''ZŁ'$'\204''}'$'\344''?Ex'$'\277''O'$'\035''q'$'\272\371''|F'$'\362\031\244'
'PΣ'$'\243''g'$'\353''d'\'')1'$'\223\310''&;'$'\264''c'$'\230\200''`Mr'$'\336\034''f'$'\017\235\b''{'$'\367''Fi>'$'\317'',:'$'\370''}'$'\351'':'$'\361\a\345\334''_'
'R'$'\214\207\331\002\241''ۭk'$'\317''o'$'\244''鷗0'$'\314''o'$'\017\217\261\243''[1̲{'$'\244'
''$'\361\033''Y'$'\033''_'$'\205''a'$'\262\251''p'$'\316\203''4'$'\361''Jq'$'\004'\'''$'\353\346''V-'$'\024\221\304'']='$'\202\250''薉'$'\035\210''B['
The deletion process quit on that, but manual deletion of the piece was possible without any issues.
It will log the problem on info level and continue to migrate remaining pieces.
It seems like log files are just raw data. Without the hashtable file all the data is useless. So a single file error can kill the node.
In my opinion the hashtable files need some kind of redundancy, implemented either by Storj or by the underlying file system.
Maybe the 1GB hashstore data files contain something similar to the .sj1 filenames we have today, as internal headers before each piece of raw data?
So under normal operation the hashtable would directly index you to the proper location in each 1GB file… but if you lost that hashtable… you could still read through every data file and recreate the hashtable? It wouldn’t be pretty to have to read everything… but recovery doesn’t need to be pretty… just possible.
Right now a log file is chunks of [rawdata]. If it were chunks of [piece id, size, rawdata], the data could be restored in case of a hashtable loss.
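Here is a rough sketch of what such a recovery scan could look like, assuming a hypothetical [piece id, size, rawdata] record layout (the real log file format may well differ):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// Hypothetical record layout, as suggested above:
// [32-byte piece ID][8-byte big-endian size][size bytes of raw piece data].
type indexEntry struct {
	offset int64 // where the raw piece data starts in the log file
	size   int64
}

// rebuildIndex scans one log file front to back and reconstructs an in-memory
// pieceID -> location map, i.e. what you would do if the hashtable were lost.
func rebuildIndex(path string) (map[[32]byte]indexEntry, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	index := make(map[[32]byte]indexEntry)
	var offset int64
	for {
		var id [32]byte
		if _, err := io.ReadFull(f, id[:]); err == io.EOF {
			return index, nil // clean end of the log
		} else if err != nil {
			return index, err // truncated record: keep what was recovered so far
		}
		var sizeBuf [8]byte
		if _, err := io.ReadFull(f, sizeBuf[:]); err != nil {
			return index, err
		}
		size := int64(binary.BigEndian.Uint64(sizeBuf[:]))
		dataStart := offset + 32 + 8
		index[id] = indexEntry{offset: dataStart, size: size}
		// Skip over the raw piece data to the next record header.
		if _, err := f.Seek(size, io.SeekCurrent); err != nil {
			return index, err
		}
		offset = dataStart + size
	}
}

func main() {
	idx, err := rebuildIndex("hashstore/s0/log-0001") // illustrative path
	if err != nil {
		fmt.Println("partial rebuild:", err)
	}
	fmt.Println("recovered", len(idx), "pieces")
}
```

It would not be fast, but as said above, recovery only needs to be possible, not pretty.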
So… there are:
- log files that contain the pieces, written append-only; each is 1GB in size.
- hashtable file/files.
Is there only one hashtable file for the entire node?
Or is there a hashtable file for each log file?
Case A: You can lose 40 log files of 1GB each per 1TB of stored data and be DQed (4% limit).
Case B: one hashtable file. You can lose the hashtable file for the entire node and be DQed.
Case C: multiple hashtable files. You can lose 40 hashtable files that make 40 log files unusable and be DQed.
Case D: the hashtable file/files can be recreated from the log files, so if you lose any percentage of hashtable files, you are not DQed.
Which one is true?
There are 2 hashstores per satellite (one active, one ready for compaction), and each hashstore has its own hashtable file. So there is a total of 8 hashtable files atm. I don’t know if the hashtable files will be split once a certain size is reached.
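If I understand that correctly, the on-disk layout would look roughly like this (purely illustrative; the real directory and file names may differ):

```
hashstore/
├── <satellite 1>/
│   ├── s0/   ← log files + its own hashtable file
│   └── s1/   ← second store, used while the other one is compacted
├── <satellite 2>/
│   ├── s0/
│   └── s1/
├── <satellite 3>/
│   └── ...
└── <satellite 4>/
    └── ...
```

4 satellites × 2 stores = 8 hashtable files, which matches the count above.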