Ingress stopped on a node with migration ON

I didn’t receive any emails about the node being offline, the node is not full, it’s the only one of mine currently migrating to hashstore, and the ingress stopped suddenly a few days ago. The logs are clean, nothing out of the ordinary at my custom info log level.
The second node is identical to this one but is not migrating. They share the same IP and run on the same machine, a Synology DS216+ with 8 GB RAM. That one still has ingress. What could it be?

Node 1:

Node 2:

Is egress still working? When you look at the logs, do you see downloaded entries but no uploaded ones? Have you tried restarting the node?
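Something like this should show whether any ingress is arriving at all (assuming Docker and the default container name; point it at your log file instead if you redirect logs elsewhere):

docker logs storagenode 2>&1 | grep uploaded | tail -n 5
docker logs storagenode 2>&1 | grep downloaded | tail -n 5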

Egress is still working according to the dashboard. I have the download and upload log entries filtered out, so the dashboard output is all I have to go on.
I looked at the logs from 1 Feb until now. The migration started on 1 Feb. No errors, no DB errors, nothing. All the walkers are finishing their jobs. It’s like the node thinks it’s full, or the satellites think my node is full. But the dashboard and the OS both show the disk is not full.
Maybe I should open a ticket with support…
I don’t remember how to check the success rate without Grafana; I will look into it.

I’ve now restarted the node. These are the log lines after the restart, with the private info removed, of course:

INFO    pieces:trash    emptying trash finished {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "elapsed": "231.100764ms"}
INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
INFO    pieces:trash    emptying trash finished {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "elapsed": "42.341742ms"}
INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
INFO    pieces:trash    emptying trash finished {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "elapsed": "36.658883ms"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "count": 376}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "count": 210}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "count": 3163}
INFO    reputation:service      node scores updated     {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Total Audits": 1563888, "Successful Audits": 1546649, "Audit Score": 1, "Online Score": 0.9967923938284693, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}
INFO    reputation:service      node scores updated     {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Total Audits": 2355913, "Successful Audits": 2343627, "Audit Score": 1, "Online Score": 0.9948546918889446, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}
INFO    reputation:service      node scores updated     {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Total Audits": 2154622, "Successful Audits": 2132783, "Audit Score": 1, "Online Score": 0.9952870145775708, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 10000, "size": 2222186496}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 20000, "size": 4440919808}
INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "count": 295}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "count": 244}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "count": 4045}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 30000, "size": 6658002176}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 40000, "size": 8927165696}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 50000, "size": 11160358656}
INFO    pieces  used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 1117424919808, "Total Pieces Content Size": 1114304376576, "Total Pieces Count": 6094811, "Duration": "1h39m14.062601676s"}
INFO    pieces  used-space-filewalker started   {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
INFO    pieces  used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 46186965504, "Total Pieces Content Size": 46081612800, "Total Pieces Count": 205767, "Duration": "5m30.686663165s"}
INFO    pieces  used-space-filewalker started   {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 60000, "size": 13424731904}
INFO    bandwidth       Persisting bandwidth usage cache to db  {"Process": "storagenode"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "count": 349}
WARN    contact:service Your node is still considered to be online but encountered an error.    {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Error": "contact: failed to ping storage node using QUIC, your node indicated error code: 0, rpc: quic: timeout: no recent network activity"}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "count": 219}
INFO    orders  finished        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "count": 4194}
INFO    pieces  used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 111694336, "Total Pieces Content Size": 111550464, "Total Pieces Count": 281, "Duration": "16m23.991922546s"}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 70000, "size": 15676877824}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 80000, "size": 17805887488}
INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 90000, "size": 20015865600}

Does your QUIC always say misconfigured, or is that new?

Yes, since some QUIC update last year. That’s not an issue.
So, they changed some of the fields in the debug stats, but I managed to find something:
the download success rate is 83%, but there are no uploads at all since the restart:

Node 1, find all “upload” entries:

function,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore current=0.000000
function,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore highwater=0.000000
function,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore successes=0.000000
function,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore errors=0.000000
function,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore panics=0.000000
function,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore failures=0.000000
function,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore total=0.000000
function_times,kind=success,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore count=0.000000
function_times,kind=failure,name=upload-stream-recv,scope=storj.io/storj/storagenode/piecestore count=0.000000
function,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore current=0.000000
function,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore highwater=0.000000
function,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore successes=0.000000
function,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore errors=0.000000
function,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore panics=0.000000
function,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore failures=0.000000
function,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore total=0.000000
function_times,kind=success,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore count=0.000000
function_times,kind=failure,name=upload-stream-send-and-close,scope=storj.io/storj/storagenode/piecestore count=0.000000
function,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore current=0.000000
function,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore highwater=0.000000
function,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore successes=0.000000
function,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore errors=0.000000
function,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore panics=0.000000
function,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore failures=0.000000
function,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore total=0.000000
function_times,kind=success,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore count=0.000000
function_times,kind=failure,name=upload-handle-message,scope=storj.io/storj/storagenode/piecestore count=0.000000
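In case it helps anyone else, I pulled these from the node’s debug endpoint, roughly like this (this assumes debug.addr is pinned to a fixed port such as 127.0.0.1:5999 in config.yaml; by default the debug port is random, and the exact path may vary between versions):

curl -s http://127.0.0.1:5999/mon/stats > node1_stats.txt
grep upload node1_stats.txt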

The other node, the one that has ingress, has a 97% download success rate and shows this for ingress…
Node 2, find all “upload_success” entries:

upload_success_count,scope=storj.io/storj/storagenode/piecestore high=36438.000000
upload_success_count,scope=storj.io/storj/storagenode/piecestore low=1.000000
upload_success_count,scope=storj.io/storj/storagenode/piecestore value=36438.000000
upload_success_byte_meter,scope=storj.io/storj/storagenode/piecestore rate=697502.900446
upload_success_byte_meter,scope=storj.io/storj/storagenode/piecestore total=7703840532.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore count=36438.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore sum=7703840532.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore min=10.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore max=2319360.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore rmin=512.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore ravg=213488.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore r10=768.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore r50=8960.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore r90=291737.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore r99=2319360.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore rmax=2319360.000000
upload_success_size_bytes,scope=storj.io/storj/storagenode/piecestore recent=2319360.000000
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore count=36438.000000
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore sum=56848785227.734955
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore min=13.234460
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore max=39305419.876404
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore rmin=11084.911133
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore ravg=804899.687500
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore r10=75874.743750
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore r50=293532.234375
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore r90=1993298.412500
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore r99=5388751.885000
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore rmax=5738156.500000
upload_success_rate_bytes_per_sec,scope=storj.io/storj/storagenode/piecestore recent=564788.358752
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore count=36438.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore sum=48791566987484.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore min=2621147.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore max=803995111419.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore rmin=3319833.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore ravg=1359864576.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore r10=11194290.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore r50=22616566.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore r90=1051107084.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore r99=28459872860.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore rmax=55587495936.000000
upload_success_duration_ns,scope=storj.io/storj/storagenode/piecestore recent=4106600223.000000
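For scale, 7,703,840,532 bytes over 36,438 successful uploads works out to roughly 211 KB per uploaded piece, which is right in line with the ravg of about 213 KB, so node 2 is clearly still taking normal ingress while node 1 sits at zero.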

No idea, but the new 1.122.1 version is being rolled out now. Maybe you can manually download it and see if it works any better for your hashstore node.

What the heck? The dashboard shows less occupied space than a few hours ago:
I won’t do a manual update, because any restart might bring the old version back.
All the startup scans finished, because the badger cache is still on and they finish pretty quickly.

Oh yeah, I forgot to mention: the badger cache is ON, the startup piece scan is ON, and the lazy filewalker is OFF.
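For reference, the relevant config.yaml entries on this node look roughly like this (option names from memory, so double-check them against your own config.yaml before copying anything):

# startup used-space scan on, lazy filewalker off, badger file-stat cache on
storage2.piece-scan-on-startup: true
pieces.enable-lazy-filewalker: false
pieces.file-stat-cache: badger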

I think that’s just how the current version works with hashstore. It’s a bug they fixed in the newest release. It seems to only show the size of the data in the old piecestore, which keeps shrinking because everything is being moved into the hashstore, whose size it doesn’t show.

From what I’ve seen, it stopped doing version rollbacks when they started saving the executable inside the bin folder.

Also, your custom logging does not show date and time?

Ah, yes, I forgot about the dashboard bug. That’s right.
I removed the timestamps for privacy. I know, I exaggerated. :smile:

Ah, I was just trying to calculate your migration rate from your logs.

Between 7 and 12 minutes per 10,000 pieces. I started on 1 Feb around 0:00 with 5 TB.
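Rough math from the log excerpt above: each 10,000-piece batch adds about 2.2 GB, so 7 to 12 minutes per batch works out to roughly 11 to 19 GB per hour for pieces of that size. At that pace, 5 TB of similarly small pieces would need roughly 11 to 19 days, although batches of larger pieces should move more data per hour, since a lot of the cost looks to be per piece rather than per byte.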

You may check the reported free space using the multinode dashboard:

Most likely the free space is being reported as negative too.
The workaround is to enable the dedicated disk feature.
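If you want to see the raw numbers the dashboard is working from, you can also query the node’s local API (this assumes the default dashboard port 14002 and that jq is installed):

curl -s http://localhost:14002/api/sno/ | jq .diskSpace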

I forgot about that post, and I’ve now read the entire thread. Thanks! I figured it’s related to the migration.
I can’t use the dedicated disk feature because the disk isn’t dedicated entirely to the storagenode.
I will wait for the update. The ingress is going to the other node for now, so I’m not missing anything.

The node updated to 1.122.1 and now shows the HDD as full, but DSM reports 2.5 TB free. The drive is 8 TB. The migration is still active and not finished.

What size do you get when you run du --si inside your hashstore folder?
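For example (the path is just an assumption; adjust it to wherever your storage directory actually lives on the Synology):

du --si -s /volume1/storagenode/storage/hashstore
du --si -s /volume1/storagenode/storage/blobs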

I’ll let it finish the migration. I don’t want to choke it with another scan.

The nice thing about the hashstore is that instead of millions of little files there are only a few big files, so it only takes a second to scan the file size.

Yeah, a hashstore scan finishes quickly, but if there is still blob storage left it will take ages. Before starting the migration I ran a du scan to count the files, and it took 5 days to finish with 35M files.

I’m having the same issue: ingress has been at zero for the last two days. The migration should be over in 2-3 days according to my estimates, but I doubt the ingress drop is related to that, especially since it persists even on the version where the disk space calculations are finally correct.

I believe your node is reporting negative free space too, as here:

The workaround is to use the whole disk/partition allocation feature.
This problem should be fixed in 1.122.x

That’s why I mentioned the version. It’s still not fixed: it now shows 635 GB overused, but I have 2.5 TB of disk free.