Make the node crash with a fatal error if the storage path becomes unavailable

While similar to the suggestion posted here, this one should require no changes to satellite code and focuses on taking the node offline, rather than failing audits, when a storage location becomes unavailable. After discussing with @kevink we decided this is different enough to post as a separate suggestion.

We seem to have 2 issues that we want to prevent.

  1. The node starts over at 0 if the mount point isn’t available at start.
  2. The node keeps running if the storage location somehow becomes unavailable during runtime.

Let’s start with point 1. This is really only an issue on Linux if the data location is in the root of an HDD. There are several ways to prevent this, like using a subdirectory on that HDD or placing your identity on the node’s HDD as well. If either of those is the case, the node simply won’t start and the problem is solved. I suggest changing the installation instructions to include storing the identity on the same HDD as the storage location, as well as a note to use a subfolder for data rather than the root of the HDD. Additionally, for the Docker implementation, the entrypoint could be altered to not automatically run setup if the config file is missing. That way the node would stop with an error about the missing config file.

Moving on to point 2. I think this can be fixed by simply making the node check whether the storage path is available from time to time, or whenever a transfer fails with a “file not found” or similar error. If the path is unavailable, the node should crash with a FATAL error of “storage location not available”.
Edit: This check should not only make sure that the location can be read from and written to, but should also double-check that it is a valid node storage path. Ideally this could be done by storing a file there with the node ID in it, so the node can verify that the data pointed to belongs to the node that’s accessing it. This would also avoid mistakes of pointing to the wrong storage location on multi-node systems.

The combination of these 2 things would ensure the node doesn’t automatically restart either, as it won’t start if the storage location isn’t available. As for notifications, work is already being done to email SNOs when their node goes offline, and in the meantime the node being offline can be detected with Uptime Robot, which is already used by a large part of the community. It wouldn’t mention the specific problem, but a quick look at the node logs would instantly show what the issue is.

The upside of this approach is simplicity. It fixes a very specific problem of the storage location not being available and doesn’t touch the way normal audits work. It doesn’t give SNOs a way to avoid responding to audits, and it doesn’t require changes on the satellite end. Hopefully this relatively simple solution will be considered a quick win by the Storj team.

I noticed @moby responded to the other suggestion related to the same thing. So just pinging you to this one as well. Though something similar may now already be in the works.

I should point out that there are security implications to placing the node’s identity files in a location that is readable by the Docker installation.

While it is true that the node software will simply not start if it can’t find the identity files, a potential attacker might be able to steal the identity files through some as-yet-unknown, so-called zero-day, exploit in Docker.

The identity files should only be readable by the user on the system who is running the node.

The default installation stores the identity files at:

~/.local/share/storj/identity/storagenode

The permissions on the directory are such that only the identity owner can traverse the directory structure.

drwxr--r-- 3 user user 4096 Oct 25 2019 storj

Both the permissions and the filesystem separation are very important security measures. In general, it is not a good idea to store keys or certificates in filesystem areas that are directly readable by an Internet-facing service.


I agree with you; there would be simpler solutions, and my suggestion only covers startup. But we need a solution for disconnected USB drives too (even though that is a dangerous scenario that often leads to filesystem corruption on the HDD).

I suggest you create a new thread with your ideas, because I hope it will reach more people than reviving an old idea that never got an official answer anyway.

This is by definition the case, since you’re mapping the identity location to a Docker path which needs to be accessible by the node software. There is nothing preventing you from setting different file and folder permissions on the identity if it’s stored on the same HDD. I’m not saying you should store it in the storage location, just on the same physical disk.

Additionally, Storj is not like a web server, where entire paths can be accessed from the outside, opening you up to path traversal attacks. Neither is Docker. They both work with limited APIs that make the kind of attack you are describing nearly impossible.

The solution I proposed would cover that scenario as well. I intentionally called it a check on whether the storage location exists. It doesn’t matter where the data is stored, if the path doesn’t exist, it shouldn’t start or if it’s already started, it should crash with a fatal error.

@Alexey would you be so kind as to split off these last 4 posts as a separate idea? Title: Make the node crash with a fatal error if the storage path becomes unavailable


Already done:


I should have checked first. That’s awesome, thanks!


I added this to the top post to describe in more detail what the node could do to check correct config as well as storage availability.

This check should not only make sure that the location can be read from and written to, but should also double-check that it is a valid node storage path. Ideally this could be done by storing a file there with the node ID in it, so the node can verify that the data pointed to belongs to the node that’s accessing it. This would also avoid mistakes of pointing to the wrong storage location on multi-node systems.


This change should implement this suggestion pretty much as described! Thanks!
https://review.dev.storj.io/c/storj/storj/+/2272


The check for drive unmounts has been implemented in v1.11.1.
