Super heavy queries executed every hour - why?

I have moved the main DB file to the SSD and mounted it inside the docker container with the --mount option. Works like a charm.
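For context, the relevant part of the docker run looks roughly like this (host paths are illustrative, not my exact ones):

$ docker run -d --name storagenode \
    --mount type=bind,source=/mnt/hdd/storj,destination=/app/config \
    --mount type=bind,source=/mnt/ssd/storj/info.db,destination=/app/config/storage/info.db \
    storjlabs/storagenode
# identity mount, port mappings and the usual env vars are omitted here;
# the in-container path is where info.db lives in my layout, adjust to yours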
However, once every hour there is a huge spike in random disk reads, and all of them target the database file (I straced the culprit process to find that out). See a four-minute fragment of iostat output from the high-usage period below [1], filtered to show only sdb (the SSD) and sda (the HDD with the data).

The SSD can easily tolerate 1500 reads per second; my spinning disk, on the other hand, is at least an order of magnitude slower. If I were running the storage node in the default configuration, it would spend roughly half of its time at 100% utilization of the spinning disk: the spike adds up to over 200k random reads, which at the ~100-150 IOPS a spinning disk can sustain works out to roughly half an hour of every hour (it probably also depends on the database size; if my DB were >2x bigger, I’d be totally screwed). It seems wasteful and inefficient.

In another post I read that there’s a plan to calculate the size of the storage directory instead, also once every hour. That would make it even harder to offload random reads from the HDD to the fast SSD (I can imagine using bcache, but that would require me to reformat the drive and restore all the data, which I’d rather avoid).

If it matters, my node has about 1.5 TB of stored data (6 TB allocated to Storj) and info.db is about 1.2 GB.

Thanks!

[1]:

31/08/19 12:44:04
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util
sdb               0.00     0.02  457.37    5.35  1873.13    21.87     8.19     0.21    0.50    0.49    0.88   0.46  21.22
sda               0.00     2.35    1.38   23.30     6.40   122.20    10.42     0.02    1.10    2.71    1.01   0.68   1.68
31/08/19 12:45:04
sdb               0.00     0.00 1567.55    0.02  6658.73     0.07     8.50     0.75    0.49    0.49    1.00   0.48  74.82
sda               0.53     3.00    7.70   38.32   134.47   190.53    14.13     0.05    1.78    5.26    1.08   1.10   5.05
31/08/19 12:46:04
sdb               0.00     0.00 1446.28    0.00  6365.07     0.00     8.80     0.71    0.52    0.52    0.00   0.49  70.80
sda               0.53     2.33   13.30   54.18   282.13   260.87    16.09     0.09    2.12    5.57    1.27   1.30   8.77
31/08/19 12:47:04
sdb               0.35     0.03  462.15    1.38  2104.40     6.80     9.11     0.23    0.55    0.55    0.90   0.49  22.90
sda               3.05     2.65   19.92   23.67   369.60   122.33    22.57     0.08    2.66    4.60    1.03   1.83   7.97
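Output like the above can be collected with something along these lines (the 60-second interval and the grep filter are just one way to reproduce it):

$ iostat -x -t 60 | grep -E --line-buffered '^(sd[ab]|Device|[0-9])'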

Hello @xopok,
Welcome to the forum!

This is the query that collects orders to send to the satellite for payout.

Thanks Alexey :slight_smile:

Given the amount of data read, it looks like a full scan of a very big table. Might there be missing indexes that would speed up the query?
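For example, something like this could show whether the query plan ends up as a full scan (the db path, the unsent_order table and the satellite_id column are just my guesses at the schema):

$ sqlite3 /mnt/ssd/storj/info.db     # path is an example
sqlite> .indexes
sqlite> .schema unsent_order
sqlite> EXPLAIN QUERY PLAN SELECT * FROM unsent_order WHERE satellite_id = x'00';

A "SCAN TABLE ..." row in the plan output means a full table scan; a "SEARCH TABLE ... USING INDEX ..." row means an index is being used.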

Please don’t guess what’s going on when we have a simple way to find out: Guide to debug my storage node

On my storage node the order sender takes only a few milliseconds!
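You can read that straight off the debug endpoint; something along these lines should list the per-function timings (assuming the /mon/funcs listing is exposed next to /mon/ps):

$ curl -s localhost:7777/mon/funcs | grep -i -A 4 orders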

Wow! What a great debugging tool, thanks!

I’ve updated the config and restarted the node.
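The change itself is tiny; I’m assuming the key the guide refers to is debug.addr, and the port is whatever you pick:

# expose the monitoring/debug endpoint on a fixed port
debug.addr: 127.0.0.1:7777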
Right after the node started, GetExpired took 4.5 minutes to execute.
Then SpaceUsedTotalAndBySatellite took more than 10 minutes to finish (again, that’s against info.db on the SSD). This was captured about 1-2 minutes before the function succeeded:

$ curl localhost:7777/mon/ps
...
[8515920101861776382] storj.io/storj/storagenode/pieces.(*CacheService).Run() (elapsed: 10m26.250118251s)
   [1768256393082306129] storj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite() (elapsed: 10m26.249991427s)
    [8491757356978730587] storj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces() (elapsed: 8m46.858267664s)
     [2639336955131488925] storj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).WalkSatelliteV0Pieces() (elapsed: 3m43.261450696s)

Is this something unexpected?

I would say yes, it is expected. With v0.19.0 we moved the piece info out of the sqlite database into files.

I found many triggers in config.yaml.

I think these run every hour, or are they not implemented yet?

Maybe these are related:

# how frequently bandwidth usage rollups are calculated
# bandwidth.interval: 1h0m0s

# how frequently expired pieces are collected
# collector.interval: 1h0m0s

# how often the space used cache is synced to persistent storage
# storage2.cache-sync-interval: 1h0m0s

# how frequently Kademlia bucket should be refreshed with node stats
# storage2.monitor.interval: 1h0m0s

# duration between sending
# storage2.orders.sender-interval: 1h0m0s

That (storage2.cache-sync-interval) is the one you might want to change. Be aware of the side effects it will have: worst case, the storage node will fill the drive to 100% without noticing it, because the next cache sync is too far away.
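If you do change it, it’s just a matter of uncommenting the key in config.yaml and restarting the node, for example (the value below is only an illustration, not a recommendation):

# sync the space-used cache less frequently
storage2.cache-sync-interval: 2h0m0s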