Hashstore rollout commencing!

So far I haven’t seen anyone reporting better success rates or more data stored on nodes with hashstore, only suppositions that it is better… is there any proof? All I see are reports of bad success rates and less data stored.

I’ve fully converted half of my nodes so far. My average download/upload rate has doubled. I don’t track success rates. Disk usage has started climbing slowly back up.

3 Likes

The proof so far seems to be the Select network, where they implemented this:

However, it is probably safe to assume that the Select and global networks have different characteristics:

I am not seeing such effects yet, but I have nodes that are still migrating. What I can say is that at this point none of them has a better upload success rate than before. Maybe more pieces are being thrown at them now? I don’t know. But hashstore should “absorb” that load, according to what was posted. I’ll have to wait a few more days until the migrations have completed, then restart the nodes and see.

Can’t compare Select and global. I don’t believe the race to store a piece is the same. Success rate seems to be somewhat strange, and maybe less representative of node performance, so let’s ignore that.
All that matters is the data stored at the end of the month. I will switch 2 more nodes to hashstore, ones that are not bottlenecked by hardware, and then I will have a better comparison within my own farm. I see many SNOs switching all of their nodes instead of just a few, leaving themselves no way to compare them.

1 Like

Almost all of my nodes finished migration. Performance has not changed.

2 Likes

Regarding the rollout schedule, maybe you should take IPs into account and not activate the migration on more than one node at a time for a particular IP. Many multi-node setups run on the same system, and starting 2 or more migrations at the same time might make the system too busy…?
Or this could be the operator’s job: opt out of migration on every other node and wait for the first one to finish…

My ancient 6700K with 32 GB RAM did 24 migrations in parallel. No problems.

6 Likes

Should be fine, as the bottleneck is the disk IO.

We recommend using one storagenode instance per physical hard disk. It’s not recommended to run multiple instances on the same disk.

(With a dedicated disk it’s even easier: you can use the faster disk usage calculation.)

We tested the migration on nodes with 36 disks (36 storagenode instances) without any problem.

But true: if somebody has over-provisioned disks, it’s better to closely monitor the system state.
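
For reference, the dedicated-disk mode mentioned above is enabled in the node configuration. A minimal sketch follows; the option name is written from memory here, so please verify it against your own config.yaml before using it:

    # config.yaml (option name assumed; verify against your node's config)
    # Tells the node the whole disk belongs to it, enabling the faster
    # disk usage calculation mentioned above.
    storage2.monitor.dedicated-disk: true

    # The same setting as an environment variable for docker setups:
    STORJ_STORAGE2_MONITOR_DEDICATED_DISK=true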

2 Likes

Is it? It seems the disk IO during migration does not exceed 60 IOPS, and the CPU is not bottlenecked either. It could go much faster, but it doesn’t. I assume that’s a conscious decision?

Is it possible to have it go as fast as possible?

Defaults suggest going full speed though…

1 Like

I’m going to set up a few new nodes… I’m lazy and I’m not going to modify the meta files for hashstore. I’m counting on you to automate everything quickly :slight_smile:

1 Like

That’s the sort of spirit I can get behind!

(But truthfully, I also just set up a new node, and I went ahead and made it full hashstore just to avoid a later migration.)

Do I get this right: when all data is stored in 1 GB files, I basically don’t need a special device on ZFS anymore (for the very small files)?

Sounds about right. For hashstore an SSD doesn’t add much, but RAM does.

For hashstore, is more RAM preferable?

I just happened to get 256 GB cheap for my server.
Do you know if I need to do anything for nodes on Windows to allow them to use more RAM?
So when hashstore comes out, more RAM is better?

It should work with any amount of RAM, but from what I have read, yes, and not only for hashstore: it’s also useful for almost any filesystem operation, independent of the backend used.
You could migrate one of your nodes to hashstore and measure memory usage before and after. That could shed more light on this topic.

More RAM is always better. If nothing else, it’s a better feeling to have it :smiley:

In addition to what Alexey wrote (the OS usually utilizes RAM for read/write caching), you can utilize it directly with hashstore:

  1. Either you can use memtbl (keeping an index of the table records in memory). This needs roughly 1.3 GB of RAM per TB of stored data.
  2. Or you can enable MMAP (map the hashtbl into memory). A configuration sketch for both options follows below.

We usually use SSD for the hashtbl (metadata) and HDD for the logs (raw binary) on servers.

We use memtbl only on selected instances (we don’t have 1.3 GB of RAM per TB everywhere).
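
To make the two options concrete, here is a minimal sketch of how they can be selected via environment variables. The memtbl variable names also appear later in this thread; the hashtbl MMAP variable is assumed by analogy with STORJ_HASHSTORE_MEMTBL_MMAP, so verify it against the storagenode config before relying on it:

    # Option 1: memtbl, keeping the table index in RAM
    # (budget roughly 1.3 GB of RAM per TB of stored data)
    STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl

    # Option 2: stay on hashtbl, but map the table file into memory
    # (variable name assumed by analogy; double-check before use)
    STORJ_HASHSTORE_TABLE_DEFAULT_KIND=hashtbl
    STORJ_HASHSTORE_HASHTBL_MMAP=true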

2 Likes

What happens if I enable memtbl without having 1.3 GB of RAM per TB? Could this cause issues if someone makes a mistake?

1 Like

I have STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl set. I’m not sure how STORJ_HASHSTORE_MEMTBL_MMAP fits into this. Should the MMAP variable be set to true or false for the best performance with memtbl? What is the default?
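
To illustrate, on a docker node this is roughly how it gets passed (an excerpt only, other flags and mounts omitted); the commented-out variable at the end is the one I’m unsure about:

    docker run -d --name storagenode \
      -e STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl \
      storjlabs/storagenode:latest

    # the open question, once I know which value memtbl prefers:
    #   -e STORJ_HASHSTORE_MEMTBL_MMAP=true|false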