Guide to debug my storage node, uplink, s3 gateway, satellite

We have integrated a package called monkit in all storage nodes. It is very useful for debugging. By default the debug endpoint listens on a random port.

In order to connect to the debug port you will need to:

  1. Add debug.addr: ":5999" to the config file. Now the port is fixed and will not change on every restart.
  2. Add -p 127.0.0.1:5999:5999 to the docker run command.
  3. Ready for action!
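
To keep the two settings in sync, here is a minimal sketch that derives both from a single port number. The helper names config_line and publish_arg are made up for illustration; only the debug.addr line and the -p argument come from the steps above.

```shell
# Pick one port and derive both settings from it so they cannot drift apart.
DEBUG_PORT=5999

# config_line PORT -> prints the line to add to the config file (step 1).
config_line() {
    printf 'debug.addr: ":%s"\n' "$1"
}

# publish_arg PORT -> prints the extra docker run argument (step 2).
# Binding the host side to 127.0.0.1 keeps the debug port off the network.
publish_arg() {
    printf '%s\n' "-p 127.0.0.1:$1:$1"
}

config_line "$DEBUG_PORT"
publish_arg "$DEBUG_PORT"
```

Both values must name the same port, otherwise docker forwards to a port nothing is listening on.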

You can request the following information:

  1. curl localhost:5999/mon/funcs
    Shows how long a function such as GetExpired takes to execute.
  2. curl localhost:5999/mon/ps
    Shows what is running at the moment and for how long it has been running.
  3. curl localhost:5999/mon/stats
    Statistics.
  4. curl 'localhost:5999/debug/pprof/goroutine?debug=2'
    Full stack trace of everything that is running.
  5. curl 'localhost:5999/mon/trace/svg?regex=Upload' > upload.svg
    Waits for the next call of a function matching the regex and writes a trace file.
  6. curl localhost:5999/debug/pprof/heap > heap
    Heap profile, useful for hunting memory leaks.
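
If you want to grab everything in one go, a rough sketch like the following could save the non-blocking endpoints from the list above into a directory. The names DEBUG_BASE, debug_url and snapshot_debug, as well as the output file names, are my own assumptions and not part of the node software.

```shell
# Base URL of the debug endpoint; 5999 matches the debug.addr example above.
DEBUG_BASE="${DEBUG_BASE:-http://localhost:5999}"

# debug_url PATH -> prints the full URL for a debug path.
debug_url() {
    printf '%s%s\n' "$DEBUG_BASE" "$1"
}

# snapshot_debug DIR -> saves the endpoints that answer immediately into DIR.
# /mon/trace/svg is left out because it waits for the next matching call.
snapshot_debug() {
    dir="$1"
    mkdir -p "$dir" || return 1
    curl -sS -o "$dir/funcs.txt"      "$(debug_url /mon/funcs)"
    curl -sS -o "$dir/ps.txt"         "$(debug_url /mon/ps)"
    curl -sS -o "$dir/stats.txt"      "$(debug_url /mon/stats)"
    curl -sS -o "$dir/goroutines.txt" "$(debug_url '/debug/pprof/goroutine?debug=2')"
    curl -sS -o "$dir/heap"           "$(debug_url /debug/pprof/heap)"
}

# Usage (needs a node with the debug port enabled):
#   snapshot_debug "debug-$(date +%Y%m%d-%H%M%S)"
```

Note that the URL with the query string is quoted, so the shell does not try to interpret the question mark.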

As of v1.5.2 it is possible to change the log level without restarting the process.

curl -XPUT 'http://127.0.0.1:5999/logging' --header 'Content-Type: text/plain' --data-raw '{"level":"debug"}'
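
As a convenience, the call above could be wrapped in a small shell function. This is only a sketch: the function names are hypothetical, and the accepted level list (debug, info, warn, error) is an assumption based on common log levels, so adjust it if your node accepts others.

```shell
# log_level_payload LEVEL -> prints the JSON body used in the curl call above.
# The list of accepted levels is an assumption, not taken from the node's code.
log_level_payload() {
    case "$1" in
        debug|info|warn|error) printf '{"level":"%s"}\n' "$1" ;;
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
}

# set_log_level LEVEL [HOST:PORT] -> sends the PUT request to /logging.
set_log_level() {
    payload="$(log_level_payload "$1")" || return 1
    curl -sS -X PUT "http://${2:-127.0.0.1:5999}/logging" \
        --header 'Content-Type: text/plain' \
        --data-raw "$payload"
}

# Usage (needs a node with the debug port enabled):
#   set_log_level debug
#   set_log_level info 127.0.0.1:5999
```

Rejecting unknown levels before calling curl avoids sending a typo to the node.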

This is just a quick introduction off the top of my head. The more advanced calls are missing right now. Feel free to add them.

I tried to research it and I think my head exploded.

Anything I can help with?

Putting the pieces of @subwolf’s head back together, I think.

Well I found the entry point in /storagenode/pkg/process/debug.go, it’s trying to navigate through https://godoc.org/gopkg.in/spacemonkeygo/monkit.v2 … just, no.

@Alexey can you please make my post editable?

With v1.5.2 it is possible to change the log level without having to restart the process.

curl -XPUT 'http://127.0.0.1:5999/logging' --header 'Content-Type: text/plain' --data-raw '{"level":"debug"}'

done.
I made it a wiki.

Small follow-up question: I assume this means the change only applies to the current session and reverts to what’s in the config file after a restart. Can you confirm?

Yes that is correct.

Is there currently a debugging method for those using the Windows GUI?

Sure, you just have to do the first step of adding debug.addr: ":5999" to the config.yaml.

I just put the URLs localhost:5999/mon/funcs etc in a browser after that, so you don’t need curl.

Do any of you know how to set debug.addr through the docker run command instead of via the config.yaml? Something along the lines of -e --debug.addr 5999?

Hello @Doom4535,
Welcome to the forum!

The other way around: you can pass it as a flag after the image name, i.e.

docker run ... storjlabs/storagenode:latest --debug.addr=":5999"

Awesome, thank you @Alexey

In order to use this monitoring tool, do I need to set the log.level to debug? Currently it is running with log.level error…

Hi
I’ve followed the guide and got the debug port exported (storage node).
It seems to be working fine but some info is missing.
E.g. of the curl examples you mention, the first one (/mon/funcs) only shows two functions:
storj.io/private/process.root
storj.io/storj/private/version/checker.(*Client).All

The fifth example (/mon/trace/svg with regex=Download) returns:
Bad Request: regex preselect matches 0 functions
After adding &preselect=false, it never completes (no reply), even though I see downloads in the log in the meantime.

Is there something else that needs to be enabled?
The log level is info if that is relevant.

Are you running the storage node updater? It might grab the debug port before the storage node gets it, because they both share the same config file. My solution to this problem was to specify the debug port in my systemd service with --debug.addr
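
A possible way to check which process answered on the debug port, as a sketch: count the lines of /mon/funcs that match a pattern. The helper names are hypothetical, and the assumption that storage node functions show up under storj.io/storj/storagenode is mine, so verify it against your own output.

```shell
# count_matching_funcs REGEX -> reads /mon/funcs output on stdin and prints
# how many lines match REGEX (extended regex, as understood by grep -E).
count_matching_funcs() {
    # grep -c exits non-zero when nothing matches; the count is still printed.
    grep -c -E -- "$1" || true
}

# looks_like_storagenode -> succeeds if stdin lists any storage node function.
# Assumption: storage node functions live under storj.io/storj/storagenode.
looks_like_storagenode() {
    [ "$(count_matching_funcs 'storj\.io/storj/storagenode')" -gt 0 ]
}

# Usage (needs a node with the debug port enabled):
#   curl -sS localhost:5999/mon/funcs | count_matching_funcs 'Download'
#   curl -sS localhost:5999/mon/funcs | looks_like_storagenode \
#       || echo "this does not look like the storage node"
```

A count of 0 for the regex you plan to trace would be consistent with the "regex preselect matches 0 functions" error.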

Yep, that’s what it was.
I added the --debug.addr (instead of the config file) and it’s now working as expected.
Thanks!
