Disk usage discrepancy?

Hi, I have this flag activated for all my nodes. I have 30 nodes, but I don't have any errors or slow filewalkers. I have activated the filewalker with high priority. My Windows nodes run on a VM with 12 cores and 48 GB of RAM; I have 12 nodes on this machine.

@daki82 in your experience, if this flag is off, does the filewalker run faster?

I have unchecked it; it is processing files.

Well, I guess it does not have to update the index file all the time… who knows.

Unlikely. This indexing takes fewer IOPS than everyone believes.
As far as I can see, almost any Windows VM is affected, especially if the Windows VM is running not on Hyper-V but on some Linux distro. Bare metal or a Linux VM with ext4 likely would not have this issue.

How did you connect the disk to this VM?

The VM is on VMware ESXi. The disk was connected to the VM by creating a RAID 0 on the RAID card, then creating a datastore from the entire disk in ESXi, and then passing it completely to the VM.

One disk failure and the data is gone. But I think you know this.
As expected, it's not a Hyper-V VM, where the integration is relatively good.
You may try this method to improve the speed of the disk subsystem:

The final goal is to have all filewalkers successfully finish their work:

Thanks @Alexey, I certainly know the problems of RAID 0. It was precisely to follow the Storj policy, or more importantly the advice to use one disk per node and not RAID 1, 5, or 6, that I did it this way. However, I have not had a problem in over 3.5 years, the disks complete the filewalkers fully, and I have never had differences in used space or strange blocks, apart from once on one node, and there it was my mistake.

The multinode dashboard currently shows me 120 TB occupied with a monthly average of 116.5 TB, but this month I had problems with scheduled electricity downtime due to maintenance of the supplier's electrical substations, so I have an online audit score at 98%, which negatively affects the average. The rest is all OK. Anyway, thanks for the advice and the notification, I learned something else I didn't know.

This is not a policy; we recommend not using any RAID at all (any level), unless it already exists.
But yes, it should be one node per disk, see How to add an additional drive? - Storj Docs.

Yes, the problem becomes noticeable when your disk is close to full capacity, partly because of how NTFS works and the fact that it requires periodic defragmentation, especially after the kind of usage our customers produce.
Virtualization only makes the situation worse, especially when you run a Windows VM on a Linux host. It's better to use Docker in this case - far fewer wasted resources and IOPS.
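
Just as a sketch of what that could look like on the Linux host itself (the wallet, email, address, paths and allocation below are placeholders, not values from this thread):

# run the node as a Docker container directly on the Linux host (ext4 mount assumed)
docker run -d --restart unless-stopped --stop-timeout 300 \
  -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
  -e WALLET="0x0000000000000000000000000000000000000000" \
  -e EMAIL="user@example.com" \
  -e ADDRESS="node.example.com:28967" \
  -e STORAGE="8TB" \
  --mount type=bind,source=/mnt/storj/identity/storagenode,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/storagenode,destination=/app/config \
  --name storagenode storjlabs/storagenode:latest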

Just noticed one node is not getting new data; checked the system:

root@raspberrypi:/store# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G  9.8G   18G  36% /
/dev/sda1       9.1T  9.0T   39G 100% /store

What is the best way to get this fixed?

What is the version of this node?

Have you read this thread? :point_down:

Version 1.96.6.
I did take a quick look, but I was thinking it's another problem, because I'm missing 2.3 TB, not just a few GB (which I wouldn't care about); this much feels bad.

Maybe I should not have opened a new thread; it looks like the same problem on one of my nodes:

RPi 2
9 TB external HDD
ext4 (like all my nodes, but only this one eats my free space)

This bug makes me a bit angry; it can't be that hard for this piece of software to check which files it thinks are used (the 6.66 TB) and compare that to what is really on the filesystem… in other words, please fix this annoying bug ASAP.
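
A rough way to make that comparison yourself, assuming the default data layout under /store/storagenode, would be something like:

# actual piece data on disk (the path is an assumption based on the mount point above)
du -sh /store/storagenode/storage/blobs
# deleted pieces still waiting for garbage collection to empty the trash
du -sh /store/storagenode/storage/trash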

Do you see errors in your log about filewalkers?

Too bad I have mostly disabled logs, but I will try to activate them again.
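
In case it helps someone else, turning them back on should just be something like this in config.yaml (the path is an assumption; adjust to your mount), followed by a container restart:

# /store/storagenode/config.yaml
log.level: info
log.output: stderr    # or a file path such as /app/config/node.log

docker restart node2  # assuming the container is named node2, as in the commands below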

Found something, but it looks like there's no error:


2024-03-28T02:28:33Z    INFO    lazyfilewalker.used-space-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-03-28T02:28:33Z    INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-28T02:28:33Z    INFO    lazyfilewalker.used-space-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-28T02:28:35Z    INFO    lazyfilewalker.used-space-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}
2024-03-28T02:28:35Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started   {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}

I think maybe I enabled the lazy filewalker a long time ago because the RPi 2 is old and has a very slow CPU.
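
If I wanted to turn that off and let the used-space scan run at normal priority, my understanding is it's this option in config.yaml (or the equivalent --pieces.enable-lazy-filewalker=false appended after the image name in the docker run command):

# config.yaml - run filewalkers in the main process instead of a low-priority subprocess
pieces.enable-lazy-filewalker: false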

Removed the container and made a new one, but I still see the lazy filewalker in the logs:

root@raspberrypi:/store/storagenode# docker logs -f node2 --tail 100 | grep file
2024-03-28T02:44:22Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker completed {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "process": "storagenode", "piecesTotal": 12579867904, "piecesContentSize": 12573946112}
2024-03-28T02:44:22Z    INFO    lazyfilewalker.used-space-filewalker    subprocess finished successfully        {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-03-28T02:44:22Z    INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-28T02:44:22Z    INFO    lazyfilewalker.used-space-filewalker    subprocess started      {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-28T02:44:24Z    INFO    lazyfilewalker.used-space-filewalker.subprocess Database started        {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}
2024-03-28T02:44:24Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started   {"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode"}

Just checked Grafana, and this looks like roughly the start of the problem, about 6 days ago, on 22.3.

Edit: looks like the filewalker is doing some I/O; I hope it fixes itself:

Edit²: just saw the version changed to 1.95.1.
Strange, did it downgrade itself?

There was an issue with 1.97, so the rollout was reverted and the suggested version went back to 1.95.1.

Keep an eye on filewalkers and see if they complete properly without an error.

Will do, there is for sure something going on:
/dev/sda1 9.1T 9.0T 32G 100% /store
It decreased from 33G to 32G of free space :confused:
I have to go to sleep soon; maybe I'll restart the node with the storage set to 6 TB so that it will not try to fill the HDD with more than can fit.

You should definitely do that.
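
A sketch of that restart, assuming the container is named node2 and everything else in your run command stays the same:

docker stop -t 300 node2
docker rm node2
# re-run your usual "docker run ..." command unchanged except for the allocation,
# e.g. -e STORAGE="6TB", so the node stops accepting new uploads once 6 TB is stored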

You likely have "context canceled" errors from filewalkers, or "FATAL" errors and restarts. These read timeouts from the disk may also affect your audit score, so it's better to figure out why your disk is so slow to respond.
It could be a bad USB connection or cable, not enough power (no external power supply for the disk, or one that isn't sufficient), or the USB controller in the HDD enclosure is simply bad, malfunctioning, or overheating.
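
A quick way to check for both, assuming the container is still named node2:

docker logs node2 2>&1 | grep -iE "context canceled|fatal" | tail -n 50
# kernel messages often show USB resets or disconnects from a flaky cable or enclosure
dmesg | grep -iE "usb|sda" | tail -n 50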