Docker restart loop after hard disk was unplugged

Hi,

My node has been offline for a few hours now; I accidentally unplugged my USB drive and the Docker container is not starting anymore.
Here are some logs. I don't know which error is fatal, nor how to fix them, if that's even possible.
Any help appreciated, thanks :slight_smile:

2020-09-29T09:29:48.777Z	ERROR	piecestore:cache	error getting current space used calculation: 	{"error": "pieces error: failed to enumerate satellites: readdirent: bad message", "errorVerbose": "pieces error: failed to enumerate satellites: readdirent: bad message\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:644\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func1:56\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.777Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 1, "error": "ping satellite error: rpccompat: context canceled", "errorVerbose": "ping satellite error: rpccompat: context canceled\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.778Z	INFO	contact:service	context cancelled	{"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2020-09-29T09:29:48.778Z	INFO	contact:service	context cancelled	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2020-09-29T09:29:48.779Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite error: rpccompat: context canceled", "errorVerbose": "ping satellite error: rpccompat: context canceled\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.779Z	INFO	contact:service	context cancelled	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2020-09-29T09:29:48.779Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:138\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.779Z	INFO	contact:service	context cancelled	{"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2020-09-29T09:29:48.780Z	ERROR	nodestats:cache	Get pricing-model/join date failed	{"error": "context canceled"}
2020-09-29T09:29:48.780Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "attempts": 1, "error": "ping satellite error: context canceled", "errorVerbose": "ping satellite error: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:138\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.780Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 1, "error": "ping satellite error: rpccompat: context canceled", "errorVerbose": "ping satellite error: rpccompat: context canceled\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.780Z	INFO	contact:service	context cancelled	{"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2020-09-29T09:29:48.780Z	INFO	contact:service	context cancelled	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2020-09-29T09:29:48.797Z	ERROR	pieces:trash	emptying trash failed	{"error": "pieces error: filestore error: open config/storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: bad message", "errorVerbose": "pieces error: filestore error: open config/storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa: bad message\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:150\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:359\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.820Z	ERROR	pieces:trash	emptying trash failed	{"error": "pieces error: filestore error: open config/storage/trash/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa: bad message", "errorVerbose": "pieces error: filestore error: open config/storage/trash/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa: bad message\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:150\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:359\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.842Z	ERROR	pieces:trash	emptying trash failed	{"error": "pieces error: filestore error: open config/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa: bad message", "errorVerbose": "pieces error: filestore error: open config/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa: bad message\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:150\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:359\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.864Z	ERROR	pieces:trash	emptying trash failed	{"error": "pieces error: filestore error: open config/storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa: bad message", "errorVerbose": "pieces error: filestore error: open config/storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa: bad message\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:150\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:359\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.886Z	ERROR	pieces:trash	emptying trash failed	{"error": "pieces error: filestore error: open config/storage/trash/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa: bad message", "errorVerbose": "pieces error: filestore error: open config/storage/trash/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa: bad message\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:150\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:359\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-29T09:29:48.908Z	ERROR	piecestore	upload failed	{"Piece ID": "CSOJEWUHGAZXQNFITBNZSLRJMA55HNZWY3MAOO23JLXWFORLF2BQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "pieces error: filestore error: open config/storage/temp/blob-944619741.partial: bad message", "errorVerbose": "pieces error: filestore error: open config/storage/temp/blob-944619741.partial: bad message\n\tstorj.io/storj/storage/filestore.(*blobStore).Create:166\n\tstorj.io/storj/storagenode/pieces.(*Store).Writer:210\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:290\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
Error: pieces error: failed to enumerate satellites: readdirent: bad message

BTW, I can fetch https://version.storj.io/, so my ISP is not blocking it, and port 28967 is open.
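For reference, roughly the checks I ran (a sketch from my setup; the container name "storagenode" and the port are whatever you used in your docker run command):

    # outbound connectivity: should return a JSON document with version info
    curl -s https://version.storj.io/

    # is the node port listening locally? (28967 is my configured port)
    ss -tlnp | grep 28967

    # is the container actually running, or restarting in a loop?
    docker ps -a | grep storagenode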

Just a few hints:

  • What is your node status (you can see it on the GUI or CLI dashboard)?
  • Are you sure your USB device is mounted on your machine (if you are on Linux, there's a good chance you need to manually re-mount your disk)?
  • Did you try to restart the Docker container?
  • Did you try to restart your whole server (the one hosting the Docker container) after plugging your HDD back in?

  • Node status shows offline on the web dashboard (:14002), and it is unreachable through the GUI (I need to wait for the Storj boot to complete).
  • It is definitely mounted; I'm running Ubuntu 20.04 and have already tried to umount/remount it several times (quick checks below).
  • I tried to restart the container and even the server; still the same issue.
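For the record, this is roughly how I double-checked the mount (a sketch; /mnt/storj is a stand-in for my actual mountpoint):

    # is the disk really mounted at the expected mountpoint?
    findmnt /mnt/storj

    # list block devices, their filesystems and where they are mounted
    lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT

    # any recent I/O errors reported by the kernel?
    dmesg | grep -iE "error|i/o" | tail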

Check your disk for errors using the fsck command.
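Something along these lines (a rough sketch; /dev/sdb1, /mnt/storj and the ext4 filesystem are placeholders for your own partition, mountpoint and filesystem; stop the node first and never run fsck on a mounted filesystem):

    # stop the node so nothing writes to the disk
    docker stop -t 300 storagenode

    # unmount the data disk (adjust the mountpoint to your setup)
    sudo umount /mnt/storj

    # check and repair an ext4 partition; -f forces a full check even if it looks clean
    sudo fsck.ext4 -f /dev/sdb1

    # remount (assumes an /etc/fstab entry exists) and start the node again
    sudo mount /mnt/storj
    docker start storagenode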

It's in progress; it seems the disk does have errors.
Fingers crossed! :slight_smile:

Please keep us posted :slight_smile:
If your disk has errors, it’s not a good sign :confused:

I hope your disk is still OK and you just have data corruption. In that case you will probably need to create a new identity, since your node may be DQed.
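If it comes to that, creating a fresh identity is roughly the following (a sketch assuming the identity binary from the official docs is installed; the email:token pair is the authorization token you request from Storj):

    # generate a brand-new node identity (this can take a long time)
    identity create storagenode

    # sign it with a fresh authorization token
    identity authorize storagenode your@email.com:AUTH_TOKEN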

fsck fixed some errors and my node is up and running again!
Thank you very much :slight_smile:

I hope I won’t be disqualified in the next few days then.

2 Likes

Just got a disqualification email :frowning:
Is there any chance I can host data on this node again in the next few days?
Or do I need to start a new node from scratch with a new identity?
It's pretty harsh, since I had been online for 5 months without a second of downtime before this unfortunate event :confused:

Disqualification is permanent and you have to start from scratch now. Are you disqualified on all satellites?

Only one for now (Europe North, the main one for me).

Could you check whether the node ID in your DQ email matches the one shown on the dashboard? (Just out of interest.)

You shouldn't get DQed just for accidentally unplugging your HDD…
Are you sure that's why it got DQed?

You can get DQed from that, actually. For example:

  • the automount option (GUI) may simply mount the disk at a different, random mountpoint;
  • mounting via device name (e.g. /dev/sda1) instead of by UUID;
  • no mount entry in /etc/fstab at all;

Using the root of the drive for storage combined with one or more of the above means the data ends up stored in the mountpoint directory (i.e. on your system drive) and is then hidden when the actual mount happens (common on Unraid, because it mounts disks into userspace after the full load, when Docker has already started).
And of course, a hardware problem: the disk was unplugged abruptly, so data can be corrupted. Depending on the filesystem it could be heavily corrupted (exFAT, for example).
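A rough sketch of the safer variant (the UUID and mountpoint are examples; take the UUID from the blkid output for your own partition):

    # find the UUID of your data partition
    sudo blkid /dev/sda1

    # /etc/fstab: mount by UUID to a fixed mountpoint instead of /dev/sdX
    UUID=b14382af-cff4-4d80-b1e8-1f36b546240d  /mnt/storj  ext4  defaults  0  2

    # create the mountpoint and apply fstab without rebooting
    sudo mkdir -p /mnt/storj
    sudo mount -a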

1 Like

I'm DQed on all satellites now. Yes, my data was corrupted and fsck changed a lot of things; that's why, I think.
I will start another node today and will be more careful with my power plugs…
Thanks anyway! It's great to know the community is here to help :slight_smile:

3 Likes

Though I'm on Linux, I haven't really tried fsck,
but I know that on Windows I always avoid running chkdsk like the plague…
I've often seen it totally … over a filesystem. I'm sure it does something useful in some cases, since it can at times fix a disk that won't boot and such, but to be fair I've seen it mess with data and directories it shouldn't more often than it has helped me.

Maybe you can fix the data if you haven't deleted it already… modifications to a filesystem could very well be reversible or something… but I dunno, just saying it may be an option.

Of course, if you are DQed, recovering the data won't help much…
Really, don't let tools run on your drives if they don't have to…

It would be interesting to know whether it was the disk or the repair tool that broke the data…
because if it was the repair tool, then that's a danger for most SNOs.

The data is already corrupted. Do you want to hit unknown bugs in the future, wonder why they happened, spend money on hardware, and only then figure out that the filesystem was simply corrupted and you could have saved a lot of time and money by running fsck or chkdsk in time?
It's better to find that out right now than at some point in the future. Fixing past errors is far more costly than fixing them in time.

1 Like

I just know that I've had more cases where chkdsk ruined my data than fixed it… afterwards it's all weird file and folder names… and everything worked fine before chkdsk ran.
So I basically stopped using it a long, long time ago and never really looked back… never really had problems doing without it. I mean, disks go bad at times… that's just how it is… chkdsk starts automatically and then in 30 seconds it has changed the names of 50,000 files.
That's just not optimal for my usage; I never did get around to figuring out why it would do that…

But I've seen it happen across many different computers and over vast spans of time, so I just always turned it off when I got my hands on a Windows machine… it never seemed to give me any grief doing without it on close to maybe a hundred machines over a decade or so…

If I don't have any important data on a drive, it can sometimes fix the filesystem so Windows will boot… :smiley: but that's about all I dare use it for…

chkdsk sucks IMHO, but at least chkdsk isn't my headache anymore… and neither is fsck :smiley: since I'm now on ZFS and it has its own tools for that… which seem to do a great job so far.

That is why I set up the following configuration for my HDDs (I'm using Linux):

  • each disk is mounted by UUID instead of device path (e.g. “UUID=b14382af-cff4-4d80-b1e8-1f36b546240d” instead of “/dev/sda1”);
  • each disk is mounted on a dedicated directory under /media (for example /media/storj1);
  • on each disk there are two directories: storj1_data and storj1_identity.

This way, if the Storj container restarts without the disk attached (or with it attached at the wrong place, or with the wrong HDD attached to that directory), the container stops without even trying to connect.
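In the docker run command I then bind-mount those subdirectories with --mount, which (unlike -v) refuses to start the container when the source path does not exist, e.g. when the disk is not mounted. A trimmed sketch using the layout above (ports, wallet/email/address and storage-size flags omitted):

    docker run -d --restart unless-stopped --stop-timeout 300 \
        --mount type=bind,source=/media/storj1/storj1_identity,destination=/app/identity \
        --mount type=bind,source=/media/storj1/storj1_data,destination=/app/config \
        --name storagenode storjlabs/storagenode:latest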

2 Likes