Next trash problem: Trash not deleting

Next trash problem:
I don’t see a reason why this trash does not get deleted:

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
2024-04-22  2024-04-30  2024-05-10  2024-05-12  2024-05-13  2024-05-14  2024-05-16  2024-05-17  2024-05-18

For comparison, the trash folder of another satellite:

ls /storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
2024-05-16  2024-05-17  2024-05-18

So it appears to be working generally but not for the US-1 satellite.

These are the subfolders in the oldest US-1 trash date folder. Their number does not decrease.

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-22
a2  al  b4  bn  c6  cp  da  dr  ec  et  fe  fv  gg  gx  hi  hz  ik  j3  jm  k5  ko  l7  lq  mb  ms  nd  nu  of  ow  ph  py  qj  r2  rl  s4  sn  t6  tp  ua  ur  vi
a3  am  b5  bo  c7  cq  db  ds  ed  eu  ff  fw  gh  gy  hj  i2  il  j4  jn  k6  kp  la  lr  mc  mt  ne  nv  og  ox  pi  pz  qk  r3  rm  s5  so  t7  tq  ub  us  vj
a4  an  b6  bp  ca  cr  dc  dt  ee  ev  fg  fx  gi  gz  hk  i3  im  j5  jo  k7  kq  lb  ls  md  mu  nf  nw  oh  oy  pj  q2  ql  r4  rn  s6  sp  ta  tr  uc  ut  vk
a5  ao  b7  bq  cb  cs  dd  du  ef  ew  fh  fy  gj  h2  hl  i4  in  j6  jp  ka  kr  lc  lt  me  mv  ng  nx  oi  oz  pk  q3  qm  r5  ro  s7  sq  tb  ts  ud  uu  vl
a6  ap  ba  br  cc  ct  de  dv  eg  ex  fi  fz  gk  h3  hm  i5  io  j7  jq  kb  ks  ld  lu  mf  mw  nh  ny  oj  p2  pl  q4  qn  r6  rp  sa  sr  tc  tt  ue  uv
a7  aq  bb  bs  cd  cu  df  dw  eh  ey  fj  g2  gl  h4  hn  i6  ip  ja  jr  kc  kt  le  lv  mg  mx  ni  nz  ok  p3  pm  q5  qo  r7  rq  sb  ss  td  tu  uf  uw
aa  ar  bc  bt  ce  cv  dg  dx  ei  ez  fk  g3  gm  h5  ho  i7  iq  jb  js  kd  ku  lf  lw  mh  my  nj  o2  ol  p4  pn  q6  qp  ra  rr  sc  st  te  tv  ug  ux
ab  as  bd  bu  cf  cw  dh  dy  ej  f2  fl  g4  gn  h6  hp  ia  ir  jc  jt  ke  kv  lg  lx  mi  mz  nk  o3  om  p5  po  q7  qq  rb  rs  sd  su  tf  tw  uh  uy
ac  at  be  bv  cg  cx  di  dz  ek  f3  fm  g5  go  h7  hq  ib  is  jd  ju  kf  kw  lh  ly  mj  n2  nl  o4  on  p6  pp  qa  qr  rc  rt  se  sv  tg  tx  ui  uz
ad  au  bf  bw  ch  cy  dj  e2  el  f4  fn  g6  gp  ha  hr  ic  it  je  jv  kg  kx  li  lz  mk  n3  nm  o5  oo  p7  pq  qb  qs  rd  ru  sf  sw  th  ty  uj  va
ae  av  bg  bx  ci  cz  dk  e3  em  f5  fo  g7  gq  hb  hs  id  iu  jf  jw  kh  ky  lj  m2  ml  n4  nn  o6  op  pa  pr  qc  qt  re  rv  sg  sx  ti  tz  uk  vb
af  aw  bh  by  cj  d2  dl  e4  en  f6  fp  ga  gr  hc  ht  ie  iv  jg  jx  ki  kz  lk  m3  mm  n5  no  o7  oq  pb  ps  qd  qu  rf  rw  sh  sy  tj  u2  ul  vc
ag  ax  bi  bz  ck  d3  dm  e5  eo  f7  fq  gb  gs  hd  hu  if  iw  jh  jy  kj  l2  ll  m4  mn  n6  np  oa  or  pc  pt  qe  qv  rg  rx  si  sz  tk  u3  um  vd
ah  ay  bj  c2  cl  d4  dn  e6  ep  fa  fr  gc  gt  he  hv  ig  ix  ji  jz  kk  l3  lm  m5  mo  n7  nq  ob  os  pd  pu  qf  qw  rh  ry  sj  t2  tl  u4  un  ve
ai  az  bk  c3  cm  d5  do  e7  eq  fb  fs  gd  gu  hf  hw  ih  iy  jj  k2  kl  l4  ln  m6  mp  na  nr  oc  ot  pe  pv  qg  qx  ri  rz  sk  t3  tm  u5  uo  vf
aj  b2  bl  c4  cn  d6  dp  ea  er  fc  ft  ge  gv  hg  hx  ii  iz  jk  k3  km  l5  lo  m7  mq  nb  ns  od  ou  pf  pw  qh  qy  rj  s2  sl  t4  tn  u6  up  vg
ak  b3  bm  c5  co  d7  dq  eb  es  fd  fu  gf  gw  hh  hy  ij  j2  jl  k4  kn  l6  lp  ma  mr  nc  nt  oe  ov  pg  px  qi  qz  rk  s3  sm  t5  to  u7  uq  vh

The number of files inside /vl does not decrease either:

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-22/vl | wc -l
9518
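To tell whether the trash cleanup is making any progress at all, it can help to count files per date folder and repeat the count after a while. This is only a minimal sketch, assuming the trash layout shown above (satellite folder, then date folders, then two-character prefix folders). The function name count_trash_by_date is made up for illustration and is not part of storagenode:

```shell
# Print "<date-folder> <file count>" for every date folder of one satellite.
# count_trash_by_date is a hypothetical helper name, not a storagenode command.
count_trash_by_date() {
  sat_dir=$1
  for d in "$sat_dir"/*/; do
    [ -d "$d" ] || continue
    # Count regular files at any depth below the date folder.
    n=$(find "$d" -type f | wc -l | tr -d ' ')
    printf '%s %s\n' "$(basename "$d")" "$n"
  done
}

# Example (path taken from the post above):
# count_trash_by_date /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
```

Running it twice, some hours apart, and comparing the two outputs shows which date folders are shrinking and which are stuck.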

I see only one possible reason: the node is not on 1.104.5 or later.

My API is saying "version":"1.104.5".

This is interesting, because it should have been resolved in that version.

No, it is not interesting. I am getting really tired over this.
I am sitting on a mountain of trash on various nodes and it does not get deleted for many different reasons and every node seems to have a different issue.
I don’t know why this was introduced this way.

It seems only your node has this issue. What is the difference?
My nodes are working perfectly fine.

Could you please describe your setup for the node which has this problem?

Also, are your databases OK, and are there no filesystem problems (did you run fsck on them)?

Does this look normal to you?

2024-05-18T22:46:54Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "createdBefore": "2024-04-27T17:59:59Z", "bloomFilterSize": 5960770, "Process": "storagenode"}

This looks like a very old bloomfilter, but the log line is from yesterday.
All others seem to be on time:

2024-05-18T22:46:54Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Process": "storagenode", "createdBefore": "2024-05-14T17:59:59Z", "bloomFilterSize": 18591}
2024-05-18T22:46:54Z    INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode"}
2024-05-18T22:46:54Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "createdBefore": "2024-04-27T17:59:59Z", "bloomFilterSize": 5960770, "Process": "storagenode"}
2024-05-18T22:46:55Z    INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode"}
2024-05-18T22:46:55Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode", "createdBefore": "2024-05-12T17:59:59Z", "bloomFilterSize": 1781944}
2024-05-18T22:46:55Z    INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Process": "storagenode"}
2024-05-18T22:46:55Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Process": "storagenode", "createdBefore": "2024-05-12T17:59:59Z", "bloomFilterSize": 311747}

I don’t understand it but it looks like the US-1 bloomfilter is behind the others.
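To compare the bloomfilter dates side by side, the "createdBefore" value can be pulled out of the gc-filewalker lines per satellite. A rough sketch that reads log text on stdin; it assumes the log format shown in the excerpts above, and gc_filter_dates is a made-up helper name:

```shell
# Read storagenode log lines on stdin and print "<createdBefore> <satelliteID>"
# for every "gc-filewalker started" line, oldest filter first.
# gc_filter_dates is a hypothetical helper name.
gc_filter_dates() {
  grep 'gc-filewalker started' |
  while IFS= read -r line; do
    # The two keys can appear in either order, so extract them separately.
    sat=$(printf '%s\n' "$line" | sed -n 's/.*"satelliteID": "\([^"]*\)".*/\1/p')
    created=$(printf '%s\n' "$line" | sed -n 's/.*"createdBefore": "\([^"]*\)".*/\1/p')
    printf '%s %s\n' "$created" "$sat"
  done | sort -u
}

# Example, with the container name as a placeholder:
# docker logs <storagenode_container_name> 2>&1 | gc_filter_dates
```

The satellite whose filter is lagging behind ends up on the first line of the output.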

I do not see any retain in your excerpts. Is it intentional?
Only the retain process will move data to the trash.

And then pieces:trash should remove expired data from the trash.

No, I grepped for gc-filewalker only.
This is grepped with retain:

2024-05-18T22:46:51Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-05-14T17:59:59Z", "Filter Size": 18591, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-05-18T22:46:51Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-04-27T17:59:59Z", "Filter Size": 5960770, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-05-18T22:46:51Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-05-12T17:59:59Z", "Filter Size": 1781944, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-18T22:46:51Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-05-12T17:59:59Z", "Filter Size": 311747, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}

Still it looks like the filter for US-1 is somehow behind the others. Maybe it took much longer to process other bloomfilters before this one? US-1 is the largest satellite.
But then why is the date folder for the 22nd of April not deleted?

So it has not been performed yet. Wait for the operation to succeed for each satellite.

But why:

  • Bloomfilter date "Created Before": "2024-04-27T17:59:59Z" when the others are "Created Before": "2024-05-12T17:59:59Z"
  • /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa 2024-04-22 not deleted and not deleting? Still 9518 files. So no progress.

Just stop the lazy file walker already. It only creates problems. Since I stopped it months ago, I have never had problems with undeleted things, databases not updated, dashboards left behind etc. Just stop that lazy walker, run a full file walker, and then you can stop that too.
So many complain but don’t try to change anything in their parameters…

But there is no progress in removing them from trash:

2024-05-18T22:47:01Z    INFO    lazyfilewalker.trash-cleanup-filewalker starting subprocess     {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-05-18T22:47:01Z    INFO    lazyfilewalker.trash-cleanup-filewalker subprocess started      {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-05-18T22:47:01Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "dateBefore": "2024-05-11T22:47:01Z"}
2024-05-18T22:47:02Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      Database started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode"}

It should be working on deletion of the date folder 2024-04-22.

But there is no progress. Numbers are like before:

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-22/vl | wc -l
9518

Only this satellite still has folders older than "dateBefore": "2024-05-11T22:47:01Z".
All others only have folders from 11/05 and later.
So the trash deletion process is not working/progressing for this satellite only.
Or does it not empty the trash while retain is running? But retain is not progressing either.
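For checking which date folders are overdue, the folder names can be compared with the date part of "dateBefore". A small sketch, assuming (as the post does, not confirmed here) that the cleanup is meant to remove date folders older than that cutoff; expired_trash_dirs is a hypothetical helper name:

```shell
# List date folders of one satellite whose name sorts before the cutoff date.
# Folder names are YYYY-MM-DD, so a plain string comparison matches date order.
# expired_trash_dirs is a hypothetical helper name.
expired_trash_dirs() {
  sat_dir=$1
  cutoff=$2   # e.g. 2024-05-11, the date part of "dateBefore"
  for d in "$sat_dir"/*/; do
    [ -d "$d" ] || continue
    basename "$d"
  done | awk -v cutoff="$cutoff" '($0 "") < (cutoff "")'
}

# Example (path and date taken from the post above):
# expired_trash_dirs /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa 2024-05-11
```

An empty output would mean nothing is overdue for that satellite under this assumption.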

It seems that the whole retain/trashing process for this satellite has died and is not doing anything. The latest trash date folder for US-1 remains at:

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-05-18/ | wc -l
1

I don’t know if it is a lazy filewalker issue. Maybe. But it is the default setting so I expect it to work and not to break everything.
But I can try without it. However, if I restart now, the bloomfilters will probably get deleted. So I’ll first wait to see whether there are other suggestions on what to do or what else to check.

Please search for

, not the

or at least search for finished.

2024-05-18T22:46:51Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-18T22:46:53Z    INFO    pieces:trash    emptying trash finished {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "elapsed": "1.637657263s"}
2024-05-18T22:46:53Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-05-18T22:47:00Z    INFO    pieces:trash    emptying trash finished {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "elapsed": "7.088689171s"}
2024-05-18T22:47:00Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-05-18T22:47:01Z    INFO    pieces:trash    emptying trash finished {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "elapsed": "1.361623768s"}
2024-05-18T22:47:01Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
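Pairing the "started" and "finished" lines by hand gets tedious with several satellites. The following sketch reads log text on stdin and prints the satellites whose latest "emptying trash started" has no later "emptying trash finished". It assumes the pieces:trash log format shown above; unfinished_trash_runs is a made-up helper name:

```shell
# Print the Satellite ID of every pieces:trash run that started but has not
# (yet) logged a matching "emptying trash finished" line.
# unfinished_trash_runs is a hypothetical helper name.
unfinished_trash_runs() {
  awk '
    /pieces:trash/ && /emptying trash (started|finished)/ {
      if (!match($0, /"Satellite ID": "[^"]*"/)) next
      # Skip the 17-character key prefix and drop the closing quote.
      id = substr($0, RSTART + 17, RLENGTH - 18)
      if ($0 ~ /emptying trash started/) pending[id] = 1
      else delete pending[id]
    }
    END { for (id in pending) print id }
  ' | sort
}

# Example, with the container name as a placeholder:
# docker logs <storagenode_container_name> 2>&1 | unfinished_trash_runs
```

For the excerpt above, only the US-1 satellite ID would remain, since it is the only one without a "finished" line.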

Can you check if there’s still a subprocess running the trash chore? Maybe with docker top <storagenode_container_name> if it’s a docker node

Of course. This is what I see when I grep for “trash”:

root                1888494             1888336             0                   May18               ?                   00:08:14            /app/storagenode trash-cleanup-filewalker --storage storage --info database/piecestore.db --info2 database/info.db --pieces storage --driver --filestore.write-buffer-size 4.0 MiB --filestore.force-sync=false --log.output stderr --log.encoding json --lower-io-priority=true

I don’t know how I can display headers for the line.
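One way to keep the header row is to filter with awk instead of grep, always passing the first line through. A minimal sketch; with_header is a made-up helper name, and the container name is a placeholder:

```shell
# Keep the first line (the column headers) plus every line matching a pattern.
# with_header is a hypothetical helper name.
with_header() {
  awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# Example:
# docker top <storagenode_container_name> | with_header trash
```

The pattern is treated as an awk regular expression, so a plain word like trash works the same way it does with grep.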

Thanks @jammerdan. Did you say there’s a retain/gc filewalker running for the same satellite? I’m trying to confirm if there might be possible resource contention between the two processes which has stalled the trash chore


I cannot tell if something has been deleted. But no, there was one bloomfilter for every satellite
in the retain folder, and it seemed that all of them were getting processed except the one for the US-1 satellite.
In the meantime a second bloomfilter for the Saltlake satellite has arrived, so there are two of them now.
It might be that the trash thread for the US-1 satellite has died or something and does not get revived.