Raspberry Pi4 - Node crashes since today... weird GO error

Hey guys,

do you know where to start digging for the problem?

It started this afternoon, all of a sudden… The RP4 is fully up to date and I am using the latest Storj Node docker version.

Any help is appreciated :slight_smile:

Can you paste it between 3 backticks like ``` so it’s visible?

panic: runtime error: makeslice: len out of range [recovered],
	panic: runtime error: makeslice: len out of range [recovered],
	panic: runtime error: makeslice: len out of range,
,
goroutine 916 [running]:,
github.com/spacemonkeygo/monkit/v3.newSpan.func1(0x0),
	/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.7-0.20200515175308-072401d8c752/ctx.go:147 +0x2e0,
panic(0x8aafc8, 0xa609b8),
	/usr/local/go/src/runtime/panic.go:969 +0x118,
github.com/spacemonkeygo/monkit/v3.newSpan.func1(0x0),
	/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.7-0.20200515175308-072401d8c752/ctx.go:147 +0x2e0,
panic(0x8aafc8, 0xa609b8),
	/usr/local/go/src/runtime/panic.go:975 +0x3c4,
storj.io/storj/storagenode/orders.readLimitAndOrder(0xa6c850, 0x28eae48, 0x2a4d700, 0x2991a00, 0x0, 0x0),
	/go/src/storj.io/storj/storagenode/orders/store.go:536 +0x9c,
storj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1(0x2a6d2d0, 0x69, 0xa777e0, 0x2d1d4d0, 0x0, 0x0, 0x0, 0x0),
	/go/src/storj.io/storj/storagenode/orders/store.go:242 +0x3a4,
path/filepath.walk(0x2a6d2d0, 0x69, 0xa777e0, 0x2d1d4d0, 0x2d34c04, 0x0, 0x0),
	/usr/local/go/src/path/filepath/path.go:360 +0x2fc,
path/filepath.walk(0x2cd2ba0, 0x14, 0xa777e0, 0x2c5c360, 0x2d34c04, 0x0, 0x6729d0),
	/usr/local/go/src/path/filepath/path.go:384 +0x204,
path/filepath.Walk(0x2cd2ba0, 0x14, 0x2d34c04, 0x1c, 0x2a03780),
	/usr/local/go/src/path/filepath/path.go:406 +0xe8,
storj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite(0x2b22840, 0xee55620b, 0xbfda4c05, 0x759ba7c9, 0x3, 0x1046d78, 0x2a037a0, 0x0, 0x0),
	/go/src/storj.io/storj/storagenode/orders/store.go:198 +0xdc,
storj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore(0x2814000, 0xa758f8, 0x2c0ccc0, 0xee55620b, 0xbfda4c05, 0x759ba7c9, 0x3, 0x1046d78),
	/go/src/storj.io/storj/storagenode/orders/service.go:398 +0x300,
storj.io/storj/storagenode/orders.(*Service).SendOrders(0x2814000, 0xa75a78, 0x2a168d8, 0xee55620b, 0xbfda4c05, 0x759ba7c9, 0x3, 0x1046d78),
	/go/src/storj.io/storj/storagenode/orders/service.go:192 +0x168,
storj.io/storj/storagenode/orders.(*Service).Run.func1(0xa75a78, 0x2a168d8, 0xa75a78, 0x2a168d8),
	/go/src/storj.io/storj/storagenode/orders/service.go:139 +0x84,
storj.io/common/sync2.(*Cycle).Run(0x2c58990, 0xa758f8, 0x2a30cc0, 0x2a168c0, 0x0, 0x0),
	/go/pkg/mod/storj.io/common@v0.0.0-20200925121432-61f74bdf4b5c/sync2/cycle.go:92 +0x134,
storj.io/common/sync2.(*Cycle).Start.func1(0x9a9ab0, 0x0),
	/go/pkg/mod/storj.io/common@v0.0.0-20200925121432-61f74bdf4b5c/sync2/cycle.go:71 +0x34,
golang.org/x/sync/errgroup.(*Group).Go.func1(0x2bfe2d0, 0x2b1d640),
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:57 +0x50,
created by golang.org/x/sync/errgroup.(*Group).Go,
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:54 +0x50,

What is the version of your node ?

The latest docker version: v1.14.7

Please, do this:

  1. Stop and remove the container:
     docker stop -t 300 storagenode
     docker rm storagenode
  2. Execute these commands:
     docker rmi storjlabs/storagenode
     docker pull storjlabs/storagenode
  3. Then run your storagenode as usual

Sadly it didn’t solve my issue…same error :frowning:

The worst thing is that I don’t understand what could have caused this, because I didn’t change anything.

After checking the source code of v1.14.7, I see that the problem comes from an order limit whose size exceeds the maximum value of an int32. I cannot say whether a valid order limit can have a size above that value or whether this happens because there is an invalid/malformed order limit.

The crash comes from https://github.com/storj/storj/blob/e3b8c02b58c90962c3baad410f50a44b64684cb3/storagenode/orders/store.go#L536

In that line a new slice is made, and its length is set by the line above it: limitSize := binary.LittleEndian.Uint32(sizeBytes[:])

What happens there is that the Go int type can be 32 or 64 bits depending on the architecture and OS. In your case I’m assuming it’s 32 bits because, although the Raspberry Pi4 has a 64-bit architecture, it’s running a 32-bit OS.
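If you want to check which int width a Go binary actually gets on a given machine, a tiny program like the one below prints it. This is only an illustrative sketch and not part of the storagenode code; the helper name intBits is made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
)

// intBits reports the width of Go's int type for the platform this binary
// was compiled for (32 or 64). Hypothetical helper, for illustration only.
func intBits() int {
	return strconv.IntSize
}

func main() {
	// On a 32-bit OS build this reports 32, even on 64-bit capable hardware.
	fmt.Printf("GOARCH=%s, int is %d bits\n", runtime.GOARCH, intBits())
}
```

The result depends on the target the binary was built for, not on the CPU alone, which matches the 32-bit OS assumption above.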

Then, when the value is greater than the max int32, the limitSize variable, which is of type uint32, gets passed to the make builtin function, whose second parameter is of type int. That’s when it gets coerced to a negative int32, making the make function panic with panic: runtime error: makeslice: len out of range.

Following the stack trace, the call to readLimitAndOrder (that’s the function that panics) comes from ListUnsentBySatellite (https://github.com/storj/storj/blob/e3b8c02b58c90962c3baad410f50a44b64684cb3/storagenode/orders/store.go#L242).

We could add a check for limitSize and return if the value becomes negative when coerced to int, but it turns out that we have had some refactorings of that code that may have solved the problem, and they are already merged into master.

  1. https://github.com/storj/storj/commit/fbf2c0b242af768ee5c81b2b4eab8fde57f6cba0
  2. https://github.com/storj/storj/commit/02cbf1e72a3be1a3d81543c122d299d154a5135c

They are tagged in v1.15.1, so I expect them to be in the next official release, but I’m going to make sure that they get in.

Hey,

thanks for all your effort and work.
I’m understanding roughly 1/10 of what you were saying, but it’s okay :slight_smile:
If you’re saying I have to wait until the next version, then I think my node will get suspended for being offline in the meantime…but let’s see.

Regards

It won’t be suspended for being offline because we haven’t activated it yet.

On the other hand, I think that the refactoring may solve the issue because it reuses code that has been there for a long time without that kind of problem, but because of the amount of changes I cannot be 100% sure, so :pray: that once you update to the new release you won’t run into this problem anymore.

Thanks for your patience.

@ifraixedes when you say an order limit, do you mean the max concurrent setting in the config yaml?

it seems weird to me that a RPI most likely running Raspbian, i mean i run debian, i’m fine…
my storagenode image in docker is the same…i know its all rather complex under the hood… but still

wouldn’t it have to be a settings issue…

@peppoonline
if you got max concurrent set to anything other than 0 in the config.yaml , then try setting it to 0
else try to turn off the device, and literally pull the power cable, wait a good 15sec to 1 minute… and then turn it on again and see if it will run correctly…

a bit of a long shot… and i duno if the max concurrent thing could be the culprit, but it’s really the only thing i can think of in relation to the storagenode when talking about orders.

but maybe somebody else can pitch in on that.

it’s not because i think ifraixedes is wrong, just that there must be something different with his RPI that
makes it not work when tons of others do on the same software… that’s what strikes me as odd

Concurrent Setting is 0

I’ll try the complete power off thing. Sadly this RP4 is offsite for like 4 months now…need to find some “remote hands” for that :wink:
Yeah, like I said, I don’t understand it either…I have two more identical RPs and they work fine…same Storjnode version, same updates and software.

i doubt the power off thing works… but sometimes i’ve had errors i just couldn’t solve and it’s saved my bacon… one of those hail mary’s i duno whats going on, but i’m going to try to do something different…

not sure why it even works… maybe something can get stuck in memory or transistors can be blocked in some odd way, … who knows… like i say it’s worked for me a couple of times… but don’t bother with too much trouble… because it most likely won’t work…

but aside from a flat out reinstall… which is even more hell on a remote device…

No, I was looking at the code and see how a slice is created for reading an order limit stored in one of the files.

I have a Raspberry PI too and I haven’t had that issue so far, so that’s what makes me think that it could be because of an invalid/malformed order limit.

Not a solution, but if you’re using a Pi with 64 bit hardware, you could change to a 64bit installation as a work around (likely technically a better move anyway). You can get a 64 bit ubuntu install or hunt down a beta 64 bit build for raspbian.

@Doom4535
without a doubt 64bit should always be preferred, there is just such a big leap for some tasks, ofc it does require the cpu to support 64bit… which i suppose most do, but not sure…

@peppoonline
did you get your issue resolved?

Hey guys, no it didn’t get resolved.
“ifraixedes” told me to wait for the next version…that’s all I could do :frowning:

@peppoonline If you have an extra uSD card, you could setup a 64bit install and then pop in the old one later to see if a newer docker version fixes your issue.

yeah i would try to do a reinstall of the OS just remember not to format the storagenode and keep your identity files… if you haven’t placed them in the storagenode folder then i can recommend doing that.

i think gambling that a new storagenode version will fix the issue is a long shot at best unless if there was a specific problem that was solved… which to my understanding there wasn’t…

i would proceed assuming that there is something different about your configuration and try to figure out what that is…

an OS reinstall should be fairly basic, ofc you only had remote access to the RPI which could be a problem…

Hi @peppoonline,

Please try this

It’s something that I thought of yesterday when I learned that the next release will take a bit longer, because we have to fix some issues that our QA team has found.

Sorry :pray: I forgot about updating this post.