Virtual constipation: a coder decided he knows better than the operator how much space the node has, or, the road to hell is paved with good intentions

AiS1972 · April 28, 2024, 11:13am

I put off writing about this for a long time, but two bloom filters from one satellite finally scared me into it.

So… Storj recently announced that it has begun to size storage correctly on Windows nodes.
As a result, many operators have run into virtual constipation: the node was working normally and had free space, but after the update the free space dropped to zero and a storage overhead appeared.
Brilliant, isn’t it?

My request, I would even say demand: please remove this patch.

github.com/storj/storj

storagenode/blobstore: fix disk space on windows

committed 01:37PM - 15 Mar 24 UTC

profclems

+3 -2

So on windows nodes, the total disk space was showing the total number of free bytes on the disk due to this change https://github.com/storj/storj/commit/d96c411ddb107a882a3d13980dacfcba9249e6c5.

According to the documentation for GetDiskFreeSpaceExW:
- 1st pointer = Directory name
- Second pointer = the total amount of free space available to the user associated with the calling thread
- 3rd pointer = total amount of space
- 4th pointer = total amount of free space

Ref: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getdiskfreespaceexw

Could be the reason for nodes crushing on windows https://github.com/storj/storj/issues/6818

Change-Id: I47081ed658caf26557f7da618b3be4aecb95cf1a
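The parameter order described in the commit message can be illustrated with a small Python sketch. This is not Storj's code (the storagenode is written in Go); the names `disk_space` and `DiskSpace` are hypothetical, and the non-Windows fallback via `shutil.disk_usage` is an assumption added only so the sketch runs anywhere.

```python
import ctypes
import os
import shutil
from typing import NamedTuple


class DiskSpace(NamedTuple):
    free_to_caller: int  # 2nd pointer: free bytes available to the calling user
    total: int           # 3rd pointer: total bytes on the volume
    total_free: int      # 4th pointer: total free bytes on the volume


def disk_space(path: str) -> DiskSpace:
    """Query disk space for `path`, keeping the three outputs clearly apart."""
    if os.name == "nt":
        free_to_caller = ctypes.c_ulonglong(0)
        total = ctypes.c_ulonglong(0)
        total_free = ctypes.c_ulonglong(0)
        ok = ctypes.windll.kernel32.GetDiskFreeSpaceExW(
            ctypes.c_wchar_p(path),        # 1st pointer: directory name
            ctypes.byref(free_to_caller),  # 2nd pointer
            ctypes.byref(total),           # 3rd pointer
            ctypes.byref(total_free),      # 4th pointer
        )
        if not ok:
            raise ctypes.WinError()
        return DiskSpace(free_to_caller.value, total.value, total_free.value)
    # Assumption for non-Windows systems: shutil.disk_usage has no separate
    # "total free" figure, so the same free value is reported twice.
    usage = shutil.disk_usage(path)
    return DiskSpace(usage.free, usage.total, usage.free)


space = disk_space(".")
print(space)
```

Mixing up the 3rd and 4th outputs is exactly the kind of mistake the commit describes: the "total" figure would then show free bytes instead.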

I am perfectly capable of specifying the node size myself and staying out of virtual constipation.

And second…
Is it normal to have two blooms per satellite?

Alexey · April 28, 2024, 12:44pm

Could you please elaborate on what exactly the problem is?
The linked commit just says that we want to show the actual disk space, in this case - for Windows.

Yes, if your node is so slow that it does not finish processing the previous one within a week.
If so, well, you have a problem. How is your storagenode configured? Does it use a network filesystem to store data (SMB/CIFS, NFS, etc.), some kind of RAID (a foreign one, of course, in the case of a VM), or other remote disks?

AiS1972 · April 28, 2024, 12:53pm

The problem is that the node software determines the disk space itself, and does it incorrectly.
There is plenty of free space on the disk, but the node thinks there is none and does not accept data. This is what I call virtual constipation.
Increasing the disk size in the node settings does not help, because the software believes it knows the disk size better than the operator.
This behavior appeared on Windows nodes; on Linux, increasing the disk size in the node settings works correctly.
That is why I ask you to remove this broken disk space detector and let the operator set the disk size himself.

Alexey · April 28, 2024, 12:55pm

Could you please back up that statement? I’m sorry, but if it were true, we would have at least a GitHub issue by now.

We do this detection for users who are not aware of what they have specified as allocated space, to prevent the node from dying due to insufficient space. Unfortunately, this is not a joke: many operators specify their full disk capacity instead of what is actually available.

AiS1972 · April 28, 2024, 12:57pm

This node is the only one on the disk

And this node has PrimoCache installed

Alexey · April 28, 2024, 1:02pm

I’m sorry, but this proves nothing unless you have evidence that the available space is calculated (and advertised) incorrectly.

AiS1972 · April 28, 2024, 1:03pm

Great. Let this be the default for housewives…
But please leave at least a switch in the settings, so that those who understand a little more can adjust the node size themselves.

Alexey · April 28, 2024, 1:06pm

I would suggest creating a separate feature request here: Storage Node feature requests - voting - Storj Community Forum (official), otherwise it will likely be buried under other questions…

AiS1972 · April 28, 2024, 1:06pm

Alexey, do you see the screenshot in the first message of the topic?
There is free space on the disk, but there is no free space on the node. Previously, this was solved simply by increasing the allocated space in the node settings.
Now this is impossible.
So, OK, let the default stay as it is.
But there needs to be a switch to disable this, at the node owner's own risk. Previously it all worked and there were no problems.

Alexey · April 28, 2024, 1:08pm

Yes, but I do not know all the configuration decisions you have made.
So I cannot suggest anything helpful so far.

Do you know the version, so that we can send the developers a diff?

AiS1972 · April 28, 2024, 1:10pm

Alexey, could you please file a feature request like that for us Windows users, so that it doesn't get buried under a pile of messages?

Alexey · April 28, 2024, 1:13pm

Yes, I could. The problem is something else: I am part of the company, so my requests would be weighted about 10 times lower than any request from the Community.
So if you really want this to be noticed, create your own request. Apologies in advance.

You know, communication is a two-way street. I play a very small role here because I am, so to speak, on the inside.

AiS1972 · April 28, 2024, 1:16pm

v1.97.3: on that version the node still respected the size set in the yaml; after that, the software started considering itself smarter than the operator… but there is a nuance: operators do differ, after all.

AiS1972 · April 28, 2024, 1:23pm

Alexey!
It's elementary! You create the request, and I, for example, sign on to it.
In general, I have solved this problem for myself, albeit temporarily.
And not by allowing the filewalker to run on startup, because after the filewalker finishes the free space may match, or be smaller, or be larger.
I simply deleted the databases on almost all nodes and started the nodes without the filewalker.
And bingo: all my nodes are accepting data, and whatever the dashboard shows makes no difference; payment is for what the satellites count from the orders.

Alexey · April 28, 2024, 1:37pm

You are correct. What is submitted by nodes and confirmed by libuplink will be paid. Your node sends orders, signed by both sides, to the satellite to be paid.
However, deleting the databases is not a solution!
You need to fix the other issues to fix the problem.

In Russian (translated)

ok

Yes, you are right here. What is displayed on the dashboard has little to do with what will be paid… (only what was signed by both the client and the node and sent to the satellite will be paid). Ideally, they should match (we are working on it…).

AiS1972 · April 30, 2024, 7:19am

Alexey!
I will venture to disagree that wiping the databases and disabling the filewalker is a poor option.
The screenshot shows a node that has finished its filewalker run (by the way, thanks for the fact that ingress stops once the stored amount has been tallied; that makes it easier to tell, among a crowd of nodes, where the filewalker has finished after the databases were deleted).
So, the filewalker finished, not quickly, I will note, with PrimoCache enabled, but that is not such a big problem. The problem is that after the filewalker I ended up in virtual constipation: the disk has 988 GB free, while the node is constipated, aggravated by a virtual overhead: 0 free and 137 GB of overhead.
That is exactly why I am now going to wipe the databases and disable the filewalker.
And that is exactly why I ask, for those Windows users who have not yet fully turned into housewives, for a switch in the settings that disables the matrix's super-brain when it determines the disk space. Something can always go wrong, and there must be a manual way to configure it.
Thanks in advance, Alexey.

P.S. Perhaps the cause is storagenode/blobstore: fix disk space on windows · storj/storj@28d00b5 · GitHub. I can say for certain that 3+ months ago the problem in the screenshot was solved simply: I would have added 137+966 GB to the node size in the yaml and restarted the node, and everything would have been fine.
I hope Storj adds a switch to turn off housewife mode for determining the free space on the node.

AiS1972 · May 3, 2024, 11:21am

Alexey!
Any comments?
If there are two bloom filters, do they run simultaneously or sequentially?
And is a 6.5-hour gap between bloom filters of 9.7 MB each also normal?

Alexey · May 5, 2024, 4:48am

In your screenshot you are comparing values in different units. Windows calculates space in binary units but wrongly labels them with decimal unit names, while the node calculates space in decimal units. So what Windows shows as 966 GB (it is actually GiB) is 1037.23 GB in decimal (SI) units.
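The conversion behind that figure is plain arithmetic; a minimal Python sketch (the helper name `gib_to_gb` is made up for illustration):

```python
GIB = 2**30  # bytes in one gibibyte (binary unit, what Windows actually counts)
GB = 10**9   # bytes in one gigabyte (decimal SI unit, what the node counts)


def gib_to_gb(gib: float) -> float:
    """Convert a size in GiB to the same size expressed in GB."""
    return gib * GIB / GB


print(f"{gib_to_gb(966):.2f} GB")  # prints "1037.23 GB"
```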

This is already implemented:

# if set to true, all pieces disk usage is recalculated on startup (default true)
# storage2.piece-scan-on-startup: true

Just uncomment it, change it to false, save the config and restart the node. However, I believe this filewalker is not what you think it is.
What it actually does: on start (only), it scans all pieces to calculate the used space, including trash, and updates the databases after it finishes successfully. If you disable it, your node will update these databases only after successful operations on pieces, such as uploading new pieces, moving pieces to the trash, or removing pieces that have expired or are in the trash.
If the databases hold wrong values for the initially used space, they will remain wrong and merely be adjusted by later changes (this is where a virtual overusage can come from, by the way).
So, if there is a discrepancy between the real usage and the displayed usage, you shouldn’t disable the used-space-filewalker on start; the node needs it to correct the values in its databases.
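A toy Python model of that reasoning, with made-up numbers: a cached counter that only receives deltas keeps whatever initial error it had, while a full scan resets it to the real value. The class `UsedSpaceCache` is purely hypothetical and is not how the storagenode stores its data.

```python
class UsedSpaceCache:
    """Toy model of a cached used-space counter (not Storj's real code)."""

    def __init__(self, cached_bytes: int) -> None:
        self.cached_bytes = cached_bytes

    def apply_delta(self, delta_bytes: int) -> None:
        # Incremental update after an upload (+) or a deletion (-).
        self.cached_bytes += delta_bytes

    def full_scan(self, actual_piece_sizes: list[int]) -> None:
        # What a scan on startup does in this model: recount everything.
        self.cached_bytes = sum(actual_piece_sizes)


pieces = [100, 200, 300]                  # real usage: 600
cache = UsedSpaceCache(cached_bytes=900)  # database starts out wrong by +300

pieces.append(50)                         # a new upload
cache.apply_delta(50)
error_without_scan = cache.cached_bytes - sum(pieces)  # still 300

cache.full_scan(pieces)
error_after_scan = cache.cached_bytes - sum(pieces)    # 0

print(error_without_scan, error_after_scan)
```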

You need to use scripts to check how much outdated data is in the trash, rather than looking at the creation date (in PowerShell you need to use +8 to see the outdated data that should have been deleted):

How long does trash stay before getting deleted? - #6 by Alexey

However, I believe that we do not account for subfolders correctly, so because of this bug you may have subfolders whose date name is older than 7 days. It should be fixed in a new version. Meanwhile, you may remove subfolders with a date name older than 7 days from the trash.
See: Trash does not go away in 7 days - #57 by jammerdan
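A rough Python sketch of the manual check for date-named trash subfolders older than 7 days. It assumes the subfolder names are ISO dates such as `2024-04-20`, which the thread does not confirm; the function name `expired_trash_folders` is hypothetical. It only lists candidates and deletes nothing.

```python
from datetime import date, timedelta


def expired_trash_folders(folder_names, today, max_age_days=7):
    """Return the folder names whose date is more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in folder_names:
        try:
            # Assumption: the folder name is a YYYY-MM-DD date.
            folder_date = date.fromisoformat(name)
        except ValueError:
            continue  # skip anything that is not a date-named folder
        if folder_date < cutoff:
            expired.append(name)
    return expired


names = ["2024-04-20", "2024-04-30", "not-a-date"]
print(expired_trash_folders(names, today=date(2024, 5, 5)))  # ['2024-04-20']
```

In practice the names would come from listing the trash directory, and each candidate should be checked by hand before anything is removed.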

In Russian (translated)

In the screenshot you are comparing decimal TB with binary TiB; they will never match. Windows shows space in binary units (base 2) while labeling them with decimal names, and the node counts in decimal units (base 10), so the 966 that Windows calls GB (actually GiB) is 1037.23 GB.

This is already implemented:

# if set to true, all pieces disk usage is recalculated on startup (default true)
# storage2.piece-scan-on-startup: true

Uncomment it, change it to false, save the config and restart the node.
However, it does not do what you probably imagine.
This parameter runs the used-space-filewalker to calculate the used space, including trash, and update the databases. If it is disabled, the node will update the databases only on piece operations, i.e. on uploads, garbage collection, deletion of pieces from the trash and moving pieces to the trash.
But if the initial state of the databases contained incorrect data and you disabled the used-space-filewalker, the values will stay incorrect even though they keep being updated (hence the virtual overusage).
So in this case (if the used space and the displayed space differ) you must not disable it, so that the node can correct the databases to the right values for used space and trash.

You need to use scripts to count the outdated data in the trash rather than relying on the creation date (for PowerShell you need to use the value +8):

How long does trash stay before getting deleted? - #6 by Alexey

However, we have a bug: we no longer account for subfolders in the trash, and because of several version downgrades you may have a mess in the trash like this:

Trash does not go away in 7 days - #57 by jammerdan

In that case you can manually delete the subfolders whose modification date (or date-containing name) is older than 7 days.