Hi
So on Dec 2nd, my node started showing 100% disk busy time, 100% of the time.
It started after a reboot of my system and after adding two disks to an unrelated pool, which freed up two drives that I'm now using for two other nodes on the same system.
I don't think there is any correlation to my issue, but for full disclosure: it did change things on my system around the time the issue started.
Both docker stats and iostat confirm the “heavy” load on the read side: around 1-2 MB/s of reads, with high iowait and utilization:
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 0.60 52.89 0.00 46.25
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
vda 27.00 1212.80 218.90 89.02 292.43 44.92 1.60 354.40 89.90 98.25 191.75 221.50 0.00 0.00 0.00 0.00 0.00 0.00 0.40 30.00 8.21 96.32
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
105048dfb1cd watchtower 0.00% 14.05MiB / 5.8GiB 0.24% 1.69kB / 0B 10.8MB / 10.7MB 10
ad01c7ab94f9 storagenode 1.38% 2.021GiB / 5.8GiB 34.84% 11.9MB / 260MB 4.32GB / 74MB 46
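docker stats already attributes the block I/O to the container; to pin it down at the process/file level, per-process I/O accounting helps (iotop and pidstat may need installing first; the flags below are standard):

sudo iotop -oPa    # only active processes, accumulated I/O totals
pidstat -d 2       # per-process disk stats every 2 seconds (from sysstat)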
That would be great if I were being paid for it, but I'm not (it would translate into 100GB+ of egress per day).
If I stop the docker container, the load stops.
Looking at the logs, I see two issues:
- Lots of “upload started” and “upload canceled” messages, which is weird to me, as the node is full per its configured allocation (6.3TB):
2023-10-18T13:40:45Z INFO piecestore upload started {"process": "storagenode", "Piece ID": "Y3CRVWYNY4XF7EZ5MH2QHJHLX627OQNFPT7NQSDSGLIJLOA7SC7Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 1193176610432, "Remote Address": "5.161.192.57:32442"}
2023-10-18T13:40:45Z INFO piecestore upload canceled (race lost or node shutdown) {"process": "storagenode", "Piece ID": "5QF3I4GNRO4WN4EM64D4JV7OLZOJMZREXBL6BVZOZM2JNS4CQVTQ"}
2023-10-18T13:40:45Z INFO piecestore upload started {"process": "storagenode", "Piece ID": "UN7ORTS5UB6FFRJLWR2TVB7UOWEVF6R6TMZ24HBES52XVYTQIH7A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 1193176596608, "Remote Address": "5.161.220.231:45664"}
2023-10-18T13:40:45Z INFO piecestore upload canceled (race lost or node shutdown) {"process": "storagenode", "Piece ID": "Y3CRVWYNY4XF7EZ5MH2QHJHLX627OQNFPT7NQSDSGLIJLOA7SC7Q"}
2023-10-18T13:40:46Z INFO piecestore upload canceled (race lost or node shutdown) {"process": "storagenode", "Piece ID": "JDDL5VROUAODKRZWHT47CCY667L5PIYLHDEZM6LOR3E36FWHHOWQ"}
2023-10-18T13:40:46Z INFO piecestore upload started {"process": "storagenode", "Piece ID": "N7MVRVOHZA4WRHPBUM7URUSLBWT3JRXWT6HZMGPQ7CTFSS7ZR76A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 1193176582528, "Remote Address": "23.237.232.130:34448"}
2023-10-18T13:40:46Z INFO piecestore upload started {"process": "storagenode", "Piece ID": "VMN5UAG6F4RFMA6TIUXAGSRONGRRCSGPBR3ZMVBYOHCRHMS2GTAA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 1193176572288, "Remote Address": "5.161.117.79:44630"}
2023-10-18T13:40:46Z INFO piecestore upload canceled (race lost or node shutdown) {"process": "storagenode", "Piece ID": "UN7ORTS5UB6FFRJLWR2TVB7UOWEVF6R6TMZ24HBES52XVYTQIH7A"}
2023-10-18T13:40:46Z INFO piecestore upload canceled (race lost or node shutdown) {"process": "storagenode", "Piece ID": "N7MVRVOHZA4WRHPBUM7URUSLBWT3JRXWT6HZMGPQ7CTFSS7ZR76A"}
2023-10-18T13:40:46Z INFO piecestore upload canceled (race lost or node shutdown) {"process": "storagenode", "Piece ID": "VMN5UAG6F4RFMA6TIUXAGSRONGRRCSGPBR3ZMVBYOHCRHMS2GTAA"}
2023-10-18T13:40:46Z INFO piecestore upload canceled {"process": "storagenode", "Piece ID": "MPIIDWWBR4JE22KDUI5L5DCSGGVDGNSGG7QFPR3CVZU7CLKIY4KA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 65536, "Remote Address": "72.52.83.203:46234"}
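For scale, the started vs. canceled messages can be tallied like this (assuming the container still holds the log history, i.e. logs weren't redirected to a file):

docker logs storagenode 2>&1 | grep -c "upload started"
docker logs storagenode 2>&1 | grep -c "upload canceled"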
- The uploads seem to keep coming because the node still reports free space, judging by the “Available Space” field:
2023-10-18T14:52:54Z INFO piecestore upload started {"process": "storagenode", "Piece ID": "JPPI2AYVVGE3A4XBV4OF2CP7Q2MJFOJCQMVVUHQ5QUVGCXVCBQ4A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 1192009931392, "Remote Address": "23.237.232.146:55970"}
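To cross-check what the node itself thinks it has used and allocated, the web dashboard API can be queried (assuming the default dashboard port 14002 is published; jq is only for readability, and the exact JSON field names may differ between versions):

curl -s http://localhost:14002/api/sno/ | jq '.diskSpace'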
df confirms the disk usage, and that there is still physical space available on the disk:
df -T --si
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 3.1G 0 3.1G 0% /dev
tmpfs tmpfs 623M 603k 623M 1% /run
/dev/vda2 ext4 7.4T 6.4T 607G 92% /
tmpfs tmpfs 3.2G 0 3.2G 0% /dev/shm
tmpfs tmpfs 5.3M 0 5.3M 0% /run/lock
/dev/vda1 vfat 536M 7.8M 529M 2% /boot/efi
overlay overlay 7.4T 6.4T 607G 92% /var/lib/docker/overlay2/6a97fefe52f0f138985176c9690db0ba0d9c96cf88c2ff5e556d4857873f59ce/merged
tmpfs tmpfs 623M 0 623M 0% /run/user/1000
overlay overlay 7.4T 6.4T 607G 92% /var/lib/docker/overlay2/5d10a35cf3b7082f947c9727335641436acf97f7da409d2895e5d31b946fae1b/merged
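To double-check that the node's data actually accounts for that usage, du over the blobs folder can be compared against df (slow on a disk this full; /mnt/storagenode/storage below is a placeholder for whatever path is mounted into the container):

du -s --si /mnt/storagenode/storage/blobs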
I'm also getting “download canceled” messages, which makes sense: my node is now very slow to respond to download requests because the disk is 100% busy.
Which brings me back to the question: why do I have a high read load when hardly anything is being served to anyone? It feels very much like a DoS attack from the satellite, or some bug, but I just can't find the issue.
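One guess I still want to rule out: the used-space filewalker or garbage collection re-scanning every piece after the reboot, which would produce exactly this kind of sustained read load. Something like this should show whether either is active (exact log tags vary by node version):

docker logs storagenode 2>&1 | grep -iE "filewalker|retain"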
Anyone got a clue where I should look next to troubleshoot this issue?