Hey @hashbackup,
Your investigation seems to have been fruitful: you already have almost all of the relevant files. So I will focus on clarifying a few details:
Firstly, we have different queries for selecting nodes in different circumstances. Many of the functions in the `satellite/satellitedb/nodeselection.go` file actually have nothing to do with selecting nodes for normal uploads. EDIT: we may have used some of these functions for normal uploads at one point, but it doesn’t look like anything in `nodeselection.go` is used right now with the current cached node selection for uploads.
Here is the precise path that is followed for a new segment upload (links include the line numbers):
- The uplink makes a request to the satellite to create a segment. The endpoint is handled inside the `satellite/metainfo` package. (Side note: “metainfo” refers to the data about where a file is stored, such as which pieces are on which storage nodes, as well as the encrypted segment encryption keys and the encrypted path.)
- Inside the `BeginSegment` function in the metainfo package, `overlay.FindStorageNodesForUpload()` is called: storj/metainfo.go at 4b79f5ea862d4dcf5998bf9ae63ec52cd151e26c · storj/storj · GitHub (“overlay” is the service we use to interact with the `nodes` table in the database; it includes a caching layer for functionality like node selection for segment uploads).
- `FindStorageNodesForUpload` basically attempts to get the storage nodes from the upload cache: storj/service.go at 4b79f5ea862d4dcf5998bf9ae63ec52cd151e26c · storj/storj · GitHub
- If the upload cache has been updated recently (within the last 3 minutes), the cache selects the nodes from memory: storj/uploadselection.go at 4b79f5ea862d4dcf5998bf9ae63ec52cd151e26c · storj/storj · GitHub
  - I am not as familiar with how the cache is implemented, but it looks like it essentially takes the same configuration as the database for node selection (I will get to that in the next point).
- If the upload cache has not been updated in the last 3 minutes, the cache is refreshed from the database: (1) storj/uploadselection.go at 4b79f5ea862d4dcf5998bf9ae63ec52cd151e26c · storj/storj · GitHub, (2) storj/uploadselection.go at 4b79f5ea862d4dcf5998bf9ae63ec52cd151e26c · storj/storj · GitHub. The function that is called on the database is defined here: storj/overlaycache.go at 4b79f5ea862d4dcf5998bf9ae63ec52cd151e26c · storj/storj · GitHub, and the upload config that is used is defined here: storj/config.go at 4b79f5ea862d4dcf5998bf9ae63ec52cd151e26c · storj/storj · GitHub
- After the cache is synced with the database, the nodes are selected from the cache in the same way as if no refresh had been needed.
Node selection is something we have modified and reworked a lot over time, especially as we have added new features (e.g. separate node selection mechanisms for repair and graceful exit). Furthermore, we have services like audit, which performs audits on a per-node basis but selects segments to audit without even looking at the `nodes` table or using the `overlay` service.
Anyway, I hope this helps clarify things. I’m happy to answer any follow-up questions, as I may not have fully addressed your comment.