Space accounting appears to be broken on version 1.5.2

Here’s something interesting I just noticed -
Dashboard:

                   Available        Used     Egress     Ingress
     Bandwidth           N/A      2.7 TB     0.7 TB      2.0 TB (since Jun 1)
          Disk        3.1 GB     14.2 TB

df output

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        15T   13T  2.6T  83% /storj

The run command has the allocated space set to 14200 GB.
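For reference, that’s just the STORAGE variable; a minimal sketch of my run command with placeholder values (identity path, wallet, email and address below are not my real ones):

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 -p 127.0.0.1:14002:14002 \
    -e WALLET="0x..." -e EMAIL="me@example.com" -e ADDRESS="mynode.example.com:28967" \
    -e STORAGE="14200GB" \
    --mount type=bind,source=/path/to/identity,destination=/app/identity \
    --mount type=bind,source=/storj,destination=/app/config \
    --name storagenode storjlabs/storagenode:latest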

The trash folder is 1.1TB though.
So, is trash being counted twice in the used space?
OTOH, at least it’s not the other way around, with the disk actually running out of space while the node still thinks it has room.

EDIT: hmm… the web dashboard does show 1.1 TB available, so which value does the node actually use? I guess I’ll find out when the available space drops to zero on the CLI dashboard.


I’ve noticed the same thing as a result of the massive garbage collection on stefan-benten. My node isn’t out of space yet, but I did see the same jump in used space on the CLI dashboard, while the web dashboard is showing a lot more available space. However, I know that 1.6.3 changes this part of the web dashboard significantly, so I’m not sure the 1.5.2 values still matter.

zfs compresses away the empty space in each file, so files of a given logical size take up less once written to disk on zfs… i’m getting a good deal out of that…
are you sure that’s not what you are seeing? you should run zfs get all <dataset>
and check the compression ratio against what’s written on disk… the linux commands usually show what is actually written to disk, not the uncompressed size of the files,
which is what storj seems to count… if i correct for the compression ratio then my numbers are within about 1%

the compression ratio is expressed relative to the original size… a x2.0 ratio means the data on disk takes up 50% of its logical size, and x1.5 means roughly 67%… multiplying the size written to disk by the ratio only gives an approximation, since the ratio is rounded, so the simplest thing is to just look at the logicalused number …

or go through the trouble of actually doing the math, which i find annoying… i’m sure somebody more educated in mathematics could calculate that with ease…
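if you just want the relevant numbers without scrolling through the whole property list, something like this does it (dataset name is mine, swap in your own):

zfs get -H -o property,value used,compressratio,logicalused tank/storagenodes
# sanity check: used x compressratio should land close to logicalused,
# e.g. 9.04T x 1.12 ≈ 10.1T vs the 10.2T logicalused in the listing below
# (the ratio is rounded to two decimals, hence the small gap)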

as you can see here the logicalused is 10.2tb
granted my node says 14.15tb remaining out of 24tb assigned… but i have had something like 150k files deleted in the last 72hr… so that difference is most likely just sitting in trash, since the numbers were nearly flawless before this big deletion sequence.

NAME               PROPERTY              VALUE                  SOURCE
tank/storagenodes  type                  filesystem             -
tank/storagenodes  creation              Sat Jun 13 14:05 2020  -
tank/storagenodes  used                  9.04T                  -
tank/storagenodes  available             12.1T                  -
tank/storagenodes  referenced            30.6K                  -
tank/storagenodes  compressratio         1.12x                  -
tank/storagenodes  mounted               yes                    -
tank/storagenodes  quota                 none                   default
tank/storagenodes  reservation           none                   default
tank/storagenodes  recordsize            512K                   local
tank/storagenodes  mountpoint            /tank/storagenodes     default
tank/storagenodes  sharenfs              off                    default
tank/storagenodes  checksum              on                     default
tank/storagenodes  compression           zle                    local
tank/storagenodes  atime                 off                    local
tank/storagenodes  devices               on                     default
tank/storagenodes  exec                  on                     default
tank/storagenodes  setuid                on                     default
tank/storagenodes  readonly              off                    default
tank/storagenodes  zoned                 off                    default
tank/storagenodes  snapdir               hidden                 default
tank/storagenodes  aclinherit            restricted             default
tank/storagenodes  createtxg             34                     -
tank/storagenodes  canmount              on                     default
tank/storagenodes  xattr                 off                    local
tank/storagenodes  copies                1                      default
tank/storagenodes  version               5                      -
tank/storagenodes  utf8only              off                    -
tank/storagenodes  normalization         none                   -
tank/storagenodes  casesensitivity       sensitive              -
tank/storagenodes  vscan                 off                    default
tank/storagenodes  nbmand                off                    default
tank/storagenodes  sharesmb              off                    default
tank/storagenodes  refquota              none                   default
tank/storagenodes  refreservation        none                   default
tank/storagenodes  guid                  12218601271439980832   -
tank/storagenodes  primarycache          all                    default
tank/storagenodes  secondarycache        all                    default
tank/storagenodes  usedbysnapshots       0B                     -
tank/storagenodes  usedbydataset         30.6K                  -
tank/storagenodes  usedbychildren        9.04T                  -
tank/storagenodes  usedbyrefreservation  0B                     -
tank/storagenodes  logbias               latency                default
tank/storagenodes  objsetid              68                     -
tank/storagenodes  dedup                 off                    default
tank/storagenodes  mlslabel              none                   default
tank/storagenodes  sync                  always                 inherited from tank
tank/storagenodes  dnodesize             legacy                 default
tank/storagenodes  refcompressratio      1.00x                  -
tank/storagenodes  written               30.6K                  -
tank/storagenodes  logicalused           10.2T                  -
tank/storagenodes  logicalreferenced     12K                    -
tank/storagenodes  volmode               default                default
tank/storagenodes  filesystem_limit      none                   default
tank/storagenodes  snapshot_limit        none                   default
tank/storagenodes  filesystem_count      none                   default
tank/storagenodes  snapshot_count        none                   default
tank/storagenodes  snapdev               hidden                 default
tank/storagenodes  acltype               off                    default
tank/storagenodes  context               none                   default
tank/storagenodes  fscontext             none                   default
tank/storagenodes  defcontext            none                   default
tank/storagenodes  rootcontext           none                   default
tank/storagenodes  relatime              off                    default
tank/storagenodes  redundant_metadata    all                    default
tank/storagenodes  overlay               off                    default
tank/storagenodes  encryption            off                    default
tank/storagenodes  keylocation           none                   default
tank/storagenodes  keyformat             none                   default
tank/storagenodes  pbkdf2iters           0                      default
tank/storagenodes  special_small_blocks  0                      default

This is ext4, viewed from inside the VM.

oh… rgr

i really should get my node moved to vm’s also… it would make my bandwidth monitoring so much easier…

For me it’s ext4 inside a zvol.
Since I need to move the data to another zvol, I thought about using zfs inside the VM, but it looks like ext4 is a bit faster than zfs-inside-zfs.
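In case anyone wants to replicate the setup, this is roughly how an ext4-in-zvol volume gets created (pool name and size below are just examples, not my actual layout):

# create a sparse 500G zvol and format it with ext4
zfs create -s -V 500G tank/storagenode-vol
mkfs.ext4 /dev/zvol/tank/storagenode-vol
# then hand /dev/zvol/tank/storagenode-vol to the VM as a block device,
# or mount it directly on the host for testing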

Hi,
same problem again, but with more gigabytes this time.
Storage Node Dashboard ( Node Version: v1.5.2 )
Last Contact ONLINE
Uptime 125h46m41s

               Available        Used      Egress     Ingress
 Bandwidth           N/A     78.2 GB     78.1 GB     53.0 MB (since Jun 1)
      Disk     -253.7 GB      0.8 TB

I’ve configured 500 GB. My VPS will fill up if the configured limit isn’t respected again. How can I solve that?

I’ve stopped the storage node and started it again, and now it’s OK.
Do I need to check the storage often?
Uptime 2m5s

               Available         Used      Egress     Ingress
 Bandwidth           N/A      78.2 GB     78.1 GB     53.0 MB (since Jun 1)
      Disk      119.8 MB     499.9 GB
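In case it helps anyone, the restart and the follow-up check look roughly like this on a docker node (container name is the usual storagenode, adjust if yours differs):

docker restart storagenode
# reopen the CLI dashboard and keep an eye on the Available column for a while
docker exec -it storagenode /app/dashboard.sh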

initially my reason to go bare metal was performance, and then it also turned out to be fairly difficult to actually keep the storagenode safe and healthy on shared storage… i’m still looking for a good solution but thus far most of them are crap… i’ve tried nfs and smb, and afaik iSCSI only works for sharing a full volume…

NFS is slow, tho much easier to get working than smb; ofc when connecting from a windows machine one has to jump through hoops, and don’t even get me started on the permissions…
otoh, nfs has the advantage of being very multi-user friendly and has some pretty advanced options.

SMB can be configured to work better when accessing from a windows machine, making it a seamless experience, once one finds the right way to write the config file, which took me ages… and user management and permissions… FUBAR i think best covers it…

so i’ve been looking for a way to give my vm direct access to parts of the zfs pool… but yeah, not really easy it seems… at least if i want to be able to access it from other vm’s as well as from my host OS.

yeah, i don’t think you gain anything from running zfs on zfs either… :smiley: aside from twice the overhead in performance loss, if not more.
i do set my vm’s to CoW, or qcow2 or whatever it’s called in proxmox, not sure if that’s actually a mistake… i mean the vm has no connection to the pool below aside from data storage, so if it copies something it could basically destroy the data i suppose… or make write holes… not sure tho,
maybe i should do a test of that in the near future.

i should really look into finding a way to get my storagenode onto a vm while keeping the storagenode data directly on the pool… but who has the time

Just wanted to point to the topic >> Space accounting appears to be broken on version 1.5.2

Well, apparently the node does use the CLI dashboard value, so the space accounting IS broken:

2020-06-19T06:06:30.110Z        INFO    piecestore      upload started  {"Piece ID": "E45S66IX7WWZM5PZPURBRGXEIV5SFIOYNCROHMJ5PRKR4P7EJXJA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT_REPAIR", "Available Space": -1750182528}
2020-06-19T06:06:31.531Z        ERROR   piecestore      upload failed   {"Piece ID": "E45S66IX7WWZM5PZPURBRGXEIV5SFIOYNCROHMJ5PRKR4P7EJXJA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT_REPAIR", "error": "out of space", "errorVerbose": "out of space\n\tstorj.io/common/rpc/rpcstatus.Error:82\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:352\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
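If you want to check your own node for the same thing, grepping the log for a negative Available Space is enough (container name assumed to be storagenode):

docker logs storagenode 2>&1 | grep 'Available Space": -' | tail -n 5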

The API shows slightly different numbers:

  "diskSpace": {
    "used": 13089285879936,
    "available": 14200000000000,
    "trash": 1590589517568
  },
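For the record, that JSON comes from the node’s dashboard API; this is roughly the call I used, though the exact path may differ between versions (14002 is the default dashboard port):

curl -s http://localhost:14002/api/sno/ | jq '.diskSpace'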

And here’s what df shows:

Filesystem          1B-blocks           Used     Available Use% Mounted on
/dev/sda1      16490148446208 13482789212160 2853410840576  83% /storj

Neither the first nor the second (NFS or SMB) is compatible with SQLite. You will have problems sooner or later. Please do not use them.


Restarting the node fixed it. Now it shows 0.7 TB available and is accepting data.

Same here, it may take a while after the restart for the numbers to be corrected though. Just keep it running after the restart and it should fix itself.

if it was somebody else i would have assumed it to be a bit flip… i haven’t … had… [checks]
wow yeah mine is also so far off it’s not even funny…

NAME                USED  AVAIL     REFER  MOUNTPOINT
tank/storagenodes  9.13T  12.0T     30.6K  /tank/storagenodes

and log says
Available Space": 7359872049131

and uptime is 145h
that’s a pretty big deviation…

we should really have some sort of alert that people can set, so the node can tell us when there is a problem with it… setting up individual alerts for all of this stuff by hand would be a nightmare.
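until something like that exists, a small cron job would at least catch this particular failure mode… a rough sketch, where the dashboard API path and the mount point are just taken from my own setup:

#!/bin/sh
# warn when the node's space accounting disagrees badly with the filesystem
# assumptions: dashboard API reachable on localhost:14002, storage mounted at /tank/storagenodes
NODE_FREE=$(curl -s http://localhost:14002/api/sno/ | jq '.diskSpace.available - .diskSpace.used')
DISK_FREE=$(df -B1 --output=avail /tank/storagenodes | tail -n 1)
# negative means the node already blew past its allocation,
# and "more free than the disk actually has" is the dangerous case mentioned earlier in the thread
if [ "$NODE_FREE" -lt 0 ] || [ "$NODE_FREE" -gt "$DISK_FREE" ]; then
    echo "storagenode space accounting looks off: node=$NODE_FREE bytes free, disk=$DISK_FREE bytes free"
fi
# run it from cron and let cron mail the output, or pipe it into whatever notifier you prefer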