Hi Egon,
Sorry for taking so long to get back to you. Your hunch about the filesystem was correct. The consistency check found some errors and fixed them, and the node is working properly again. What I find odd is that the node worked fine after I updated from v0.31.9 to v0.31.10 earlier on Jan 31. The problem only came up when I updated from v0.31.10 to v0.31.12 later that day. There were no restarts in between, and the system was stable during that time. I always stop the node with the stop -t 300 option. I doubt it will be useful, but here is the output from fsck:
fsck output
user@rock64:~$ sudo fsck -cfk /dev/sda1
fsck from util-linux 2.29.2
e2fsck 1.43.4 (31-Jan-2017)
Checking for bad blocks (read-only test): done
storj1: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'sw4ya375p2yzobnq2o2524supvudenzeynjn5icguodw45ndcq.sj1' in /v3alpha/storage/blobs/abforhuxbzyd35blusvrifvdwmfx4hmocsva4vmpp3rgqaaaaaaa/6v (182845651) has deleted/unused inode 185522922. Clear<y>? yes
Entry 'vmoddrhex4cuusm462omlgrcdvad264qt6ohh4a2mvcfb42whq.sj1' in /v3alpha/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/v3 (183107920) has deleted/unused inode 185522916. Clear<y>? yes
Entry 'lyeiv7q6ldn2o64gmgu4bemssp7ogjlljolomcy6y5c7ob6m3a.sj1' in /v3alpha/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/cc (183108055) has deleted/unused inode 185522915. Clear<y>? yes
Entry 'nmzohfbmil234soiscuf4x76boycbkoq7oaikoyajappqwmtxq.sj1' in /v3alpha/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/w5 (183238890) has deleted/unused inode 185522917. Clear<y>? yes
Entry 'himu2lbxqftawh5lqxplddrlmand6zrmc5j4vl57jmy5p2q6pq.sj1' in /v3alpha/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/st (183369738) has deleted/unused inode 185522914. Clear<y>? yes
Entry '5yttgjqkjo3k5j37gs4mdbd4cnd5bjpwipctal6oyctv5vceia.sj1' in /v3alpha/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/3v (183369993) has deleted/unused inode 185522921. Clear<y>? yes
Entry 'ifoql56domt26lbesby6loadwafcq7muifmmrqwpxr3ew7rsrq.sj1' in /v3alpha/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/4s (183369815) has deleted/unused inode 185522924. Clear<y>? yes
Entry 'q7lf4u452i353qv7eiatznc2qbmumzfazkfg3a7qphf4ycuokq.sj1' in /v3alpha/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/as (183763999) has deleted/unused inode 185522913. Clear<y>? yes
Entry 'bgmh7pvpurt4yqrmk33wxuvcxb5jyze3q6h3q5gfgof3uqw7ja.sj1' in /v3alpha/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/3g (183894022) has deleted/unused inode 185522919. Clear<y>? yes
Entry 'ug3c4kge3cfvsgsrth4pgvs4hnx7pemjs7oxz2r27vgeervxja.sj1' in /v3alpha/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/no (184044993) has deleted/unused inode 185522920. Clear<y>? yes
Entry 'lkfhqh6lgr76zuz6zg7id2mgstdpuw5z2ebrb37gv4c6s6zvsq.sj1' in /v3alpha/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/e6 (184045497) has deleted/unused inode 185522918. Clear<y>? yes
Entry 'onctlznq376s3l6bwh5gfzo2r4lgie6nf5j4reew5mwcvtlota.sj1' in /v3alpha/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/fy (184162034) has deleted/unused inode 185522923. Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 185522929
Connect to /lost+found<y>? yes
Inode 185522929 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522930
Connect to /lost+found<y>? yes
Inode 185522930 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522931
Connect to /lost+found<y>? yes
Inode 185522931 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522932
Connect to /lost+found<y>? yes
Inode 185522932 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522933
Connect to /lost+found<y>? yes
Inode 185522933 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522934
Connect to /lost+found<y>? yes
Inode 185522934 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522935
Connect to /lost+found<y>? yes
Inode 185522935 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522936
Connect to /lost+found<y>? yes
Inode 185522936 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522937
Connect to /lost+found<y>? yes
Inode 185522937 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522938
Connect to /lost+found<y>? yes
Inode 185522938 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 185522942
Connect to /lost+found<y>? yes
Inode 185522942 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186030414
Connect to /lost+found<y>? yes
Inode 186030414 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186128954
Connect to /lost+found<y>? yes
Inode 186128954 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186132228
Connect to /lost+found<y>? yes
Inode 186132228 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186132229
Connect to /lost+found<y>? yes
Inode 186132229 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186132230
Connect to /lost+found<y>? yes
Inode 186132230 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186132231
Connect to /lost+found<y>? yes
Inode 186132231 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186132232
Connect to /lost+found<y>? yes
Inode 186132232 ref count is 2, should be 1. Fix<y>? yes
Unattached inode 186132234
Connect to /lost+found<y>? yes
Inode 186132234 ref count is 2, should be 1. Fix<y>? yes
Pass 5: Checking group summary information
Block bitmap differences: -(558731904--558732010) +(558761472--558761983) +(558765056--558765109) +(558765984--558766000) +(558767616--558767999) +(558768640--558769151) -(558770688--558771253) -(558772224--558772789) +(558781952--558782005) -(558794304--558794357) -(558809088--558809141) -(558820352--558820863) -742137734 -742138234 -(742139482--742139485)
Fix<y>? yes
Free blocks count wrong for group #17051 (11181, counted=11288).
Fix<y>? yes
Free blocks count wrong for group #17052 (10200, counted=12613).
Fix<y>? yes
Free blocks count wrong for group #17053 (11838, counted=12458).
Fix<y>? yes
Free blocks count wrong for group #22648 (6816, counted=6822).
Fix<y>? yes
Free blocks count wrong (463954884, counted=463958030).
Fix<y>? yes
Inode bitmap differences: -(185522913--185522928)
Fix<y>? yes
Free inodes count wrong for group #22646 (33, counted=49).
Fix<y>? yes
Free inodes count wrong (242534050, counted=242534066).
Fix<y>? yes
storj1: ***** FILE SYSTEM WAS MODIFIED *****
storj1: 1661262/244195328 files (7.5% non-contiguous), 512796146/976754176 blocks
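For anyone who wants to tally a log like this rather than read it line by line, here is a rough Python sketch. It only recognizes the three message types that appear in the output above, and the names `PATTERNS` and `summarize_fsck` are made up for illustration; it is not part of e2fsck or the storagenode.

```python
import re
from collections import Counter

# Patterns for three kinds of e2fsck messages seen in the log above.
# Other message types are simply ignored by this sketch.
PATTERNS = {
    "cleared_dir_entries": re.compile(
        r"^Entry '.*' in .* has deleted/unused inode \d+"
    ),
    "unattached_inodes": re.compile(r"^Unattached inode \d+"),
    "ref_count_fixes": re.compile(r"^Inode \d+ ref count is \d+, should be \d+"),
}


def summarize_fsck(lines):
    """Count how many lines match each known e2fsck message pattern."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        for label, pattern in PATTERNS.items():
            if pattern.match(text):
                counts[label] += 1
                break  # a line belongs to at most one category
    return dict(counts)
```

If the output were saved to a file (say a hypothetical `fsck.log`), passing the open file object to `summarize_fsck` would return a dictionary of counts per category.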
My setup is a rock64 4GB board running Debian Stretch, with a 4TB external USB drive formatted as ext4. It is auto-mounted at boot time with an fstab entry. The only unusual thing there is that I mount the drive with the noatime option. The share location is attached to the container with the bind mount type.
Since the node seems able to recover the used space calculation once the filesystem inconsistencies are fixed, does this mean the node was stuck trying to read the size of a bad file and wouldn’t continue reading the rest of the pieces? I haven’t had any failed audits (yet). I feel like this is something the node should be able to recover from on its own by rebuilding the used space db automatically, although I will admit that I have only a loose understanding of what goes on behind the scenes.
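To make the question concrete, here is a minimal Python sketch of the behavior I would naively expect from a used space scan: add up the sizes of all files under a directory, and when one entry cannot be read, note it and carry on instead of stopping. This is only an illustration and not the storagenode's actual code; `used_space` and its `stat` parameter are hypothetical names.

```python
import os


def used_space(root, stat=os.lstat):
    """Sum file sizes under root, skipping entries that cannot be read.

    Returns (total_bytes, skipped_paths). The stat parameter exists only
    so a failing lookup can be simulated; it defaults to os.lstat.
    """
    total = 0
    skipped = []

    def on_walk_error(err):
        # os.walk calls this when a directory cannot be listed.
        skipped.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += stat(path).st_size
            except OSError:
                # A damaged or vanished entry: record it and keep going.
                skipped.append(path)
    return total, skipped
```

Calling `used_space` on the blobs directory would return the total in bytes plus a list of paths that could not be read, so one bad entry would lower the total slightly rather than block the whole calculation.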
Thanks again for your help! If you feel this needs further investigation, I would be happy to help with what I can. Also, thanks @Alexey for splitting this off into its own thread. I should have done so in the first place.
(edit: cleaned up fsck output)