Known issues we are working on

I would like to avoid wasting our time testing things that are already known issues. Here is a list of the issues we are aware of and are currently working on.

Zombie Segments
Fixed with v0.33.4. Please let us know if you still have problems with zombie segments.

Previous summary:

We split file uploads into 64 MB segments. A file with, let's say, 3 segments will create an s0, an s1 and an l segment in the database. The l segment is the last segment and it is the important one: a listing searches the database for all l segments. Zombie segments are segments without an l segment, which can happen if an upload is canceled after the first few segments. Zombie segments can't be listed and they can't be deleted. If you try to upload a file with the same path, the upload will error out because you can't overwrite the zombie segment. The only workaround is uploading the file to a different path.
We have a zombie segment reaper, but even if we ran it every day it would only delete zombie segments that are older than 3 days. It is a cleanup job, not a solution. The developer team is working on it and I will keep you updated.
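
For anyone who wants to picture the mechanism, here is a tiny illustrative sketch. The key layout and function names are made up for this example and are not the real satellite schema; the point is only that a listing scans the l keys, so a canceled upload that never wrote its l segment stays invisible.

```go
// Illustrative only: hypothetical key layout, not the actual satellite schema.
package main

import (
	"fmt"
	"strings"
)

// segmentKeys returns the database keys a 3-segment upload would create:
// numbered segments s0, s1 and the final "l" segment that marks completion.
func segmentKeys(path string, segments int) []string {
	keys := make([]string, 0, segments)
	for i := 0; i < segments-1; i++ {
		keys = append(keys, fmt.Sprintf("s%d/%s", i, path))
	}
	keys = append(keys, "l/"+path) // only written once the upload finishes
	return keys
}

// listObjects mimics a listing that only scans "l" keys. A canceled upload
// never wrote its "l" key, so its s0/s1 segments are invisible zombie segments.
func listObjects(keys []string) []string {
	var visible []string
	for _, k := range keys {
		if strings.HasPrefix(k, "l/") {
			visible = append(visible, strings.TrimPrefix(k, "l/"))
		}
	}
	return visible
}

func main() {
	complete := segmentKeys("bucket/movie.mp4", 3)
	zombie := segmentKeys("bucket/backup.zip", 3)[:2] // upload canceled before the last segment

	fmt.Println(listObjects(append(complete, zombie...))) // [bucket/movie.mp4]
}
```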

Slow Deletes
Fixed with v0.31.12, but we have a new bug now: the satellite is too slow to communicate with all storage nodes and will drop most of the delete messages. Garbage collection has to handle the rest.

Previous summary:

We already moved the delete messages from the uplink to the satellite. Now the uplink only has to tell the satellite to delete a file and the satellite will contact the storage nodes. The performance is better but still not as good as we would like. We are working on it. The satellite has to return as quickly as possible and send the delete messages in the background.
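
To illustrate the idea, here is a minimal sketch of that pattern. The types and method names are invented for this example and are not the real satellite code; the delete call only enqueues and returns, while a background worker contacts the storage nodes.

```go
// Minimal sketch of the async delete pattern described above; not the real satellite API.
package main

import (
	"fmt"
	"sync"
	"time"
)

type deleteRequest struct {
	path  string
	nodes []string // storage nodes holding pieces of this file
}

type satellite struct {
	queue chan deleteRequest
	wg    sync.WaitGroup
}

func newSatellite() *satellite {
	s := &satellite{queue: make(chan deleteRequest, 1024)}
	s.wg.Add(1)
	go s.drain() // background worker contacts the storage nodes
	return s
}

// Delete is what the uplink triggers: it only enqueues and returns immediately,
// so the caller never waits for every storage node to answer.
func (s *satellite) Delete(req deleteRequest) {
	s.queue <- req
}

func (s *satellite) drain() {
	defer s.wg.Done()
	for req := range s.queue {
		for _, node := range req.nodes {
			// stand-in for an RPC to the storage node
			time.Sleep(10 * time.Millisecond)
			fmt.Printf("delete %s on node %s\n", req.path, node)
		}
	}
}

func main() {
	sat := newSatellite()
	sat.Delete(deleteRequest{path: "bucket/old.bin", nodes: []string{"node-a", "node-b"}})
	close(sat.queue)
	sat.wg.Wait()
}
```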

Uplink fast at the beginning, slow at the end
The uplink is currently unable to track the upload and download speed. It only tracks the speed at which the file is read into the buffer. At the beginning of a segment the buffer fills quickly, and at the end of the segment it looks like there is no progress while the buffer is still being drained and sent to the network. We are working on it, but we might not be able to fix it in time. We understand that the uplink progress bar is confusing, but it would be even worse to remove it. Please use external tools to watch the transfer speed.
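
A rough sketch of why the numbers mislead (purely illustrative, not the actual uplink code): the counter the progress bar is based on advances as soon as data leaves the file, long before the buffer has been drained to the network.

```go
// Sketch of the misleading measurement: counting bytes read from the source
// versus bytes actually handed to the network. Names are hypothetical.
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countingReader is what the current progress bar effectively measures:
// it advances as soon as data leaves the source file.
type countingReader struct {
	r    io.Reader
	read int64
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.read += int64(n)
	return n, err
}

// countingWriter is what users actually care about: bytes sent out.
type countingWriter struct {
	w    io.Writer
	sent int64
}

func (c *countingWriter) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	c.sent += int64(n)
	return n, err
}

func main() {
	src := &countingReader{r: bytes.NewReader(make([]byte, 8<<20))}
	buf := new(bytes.Buffer)

	// Step 1: fill the buffer. The "read" counter jumps ahead almost instantly.
	io.Copy(buf, src)
	fmt.Printf("read into buffer: %d bytes\n", src.read)

	// Step 2: drain the buffer to the (slow) network. The real transfer happens
	// here, but a progress bar keyed to src.read already looks finished.
	net := &countingWriter{w: io.Discard}
	io.Copy(net, buf)
	fmt.Printf("sent to network:  %d bytes\n", net.sent)
}
```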

Timeouts and slow satellite responses
Fixed with v0.31.12

Previous summary:

We are working on that one as well but I don’t have a good description of the problem at the moment.

Upload fails with less than 80 pieces
Fixed with v0.31.12, in combination with a few other bugs that still need to be fixed.

Previous summary:

If you are using an old bucket that was created with an old uplink, you are starting the upload with old Reed-Solomon settings. In the next release we will change the behavior on the satellite side and make sure we ignore the Reed-Solomon settings stored with the bucket. The new setting should be 110 instead of 95, which gives us a higher error tolerance. We are also hunting down the bad storage nodes and trying to fix some of the error messages they are returning. The next release will stop some of these nodes from starting.
I am not sure it is a good idea to wait for the next release. If you are affected by this issue, please make sure you are using the latest uplink and create a new bucket with it. If possible, please run uploads with log level debug and give us the output; it will contain all the storage node errors.
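
For a feel of what the change buys, here is the quick arithmetic, assuming the 80-piece success threshold from the heading above (the exact thresholds are the satellite's to decide):

```go
// Quick arithmetic behind raising the upload target from 95 to 110 pieces,
// assuming the 80-piece success threshold mentioned above.
package main

import "fmt"

func main() {
	const successThreshold = 80 // pieces that must be stored for the upload to succeed

	for _, total := range []int{95, 110} {
		fmt.Printf("uploading to %d nodes tolerates %d failures\n",
			total, total-successThreshold)
	}
	// Output:
	// uploading to 95 nodes tolerates 15 failures
	// uploading to 110 nodes tolerates 30 failures
}
```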

Bandwidth accounting delayed by 4 days
Customer and storage node bandwidth accounting is delayed by 4 days.

Speed of graceful exit
Fixed with v0.31.12. Graceful exit is able to move data quickly. We have some tickets to improve the performance a bit more but at the current state the speed is acceptable.

Previous summary:

This should also get better with the next release. I have changed most of the graceful exit settings. I wouldn't say it is fixed; I am only trying to get the maximum speed out of the current graceful exit implementation. My hope is that this will be enough to let us ignore the limitations of the current implementation for the moment.

Billing includes download overhead
For fast download speed the uplink downloads more pieces than required to reconstruct the file. With the current Reed-Solomon settings and the current uplink implementation the overhead is up to 30%. This additional traffic will show up in billing. We promised to be half the price of other cloud storage providers and are looking into other download implementations with similar speed but less overhead.
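
As a back-of-the-envelope example (the piece counts below are hypothetical; only the "up to 30%" figure comes from the paragraph above): if a segment needs 30 pieces to reconstruct and the uplink fetches 39 so it can drop the slowest nodes, the 9 extra pieces are 30% overhead.

```go
// Back-of-the-envelope for the billing overhead, with hypothetical piece counts.
package main

import "fmt"

func main() {
	needed := 30.0    // hypothetical: pieces required to reconstruct a segment
	requested := 39.0 // hypothetical: pieces the uplink fetches so the slowest can be dropped

	overhead := (requested - needed) / needed
	fmt.Printf("download overhead: %.0f%%\n", overhead*100) // 30%
}
```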

Graceful exit node receives audit requests after deleting a piece
The graceful exit node deletes the piece too early, before the satellite is aware of the new owner.

Garbage collection deletes unpaid data after 7 days
Uplinks don't have to send delete messages to the storage nodes. The satellite keeps the deleted segments in memory and contacts the storage nodes in the background. At the moment the satellite is unable to drain that queue fast enough, so garbage collection will kick in even if the storage node was online the whole time and didn't miss any delete messages. The storage node will keep the unpaid data for an additional 7 days. The 7-day delay is intentional and we are not going to change it. I have created an issue to make sure the satellite can send out the delete messages quickly and doesn't fall back on garbage collection.
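
A minimal sketch of that 7-day grace period. The retention constant and the data layout here are illustrative only, not the storage node's real on-disk format.

```go
// Illustrative sketch: a piece flagged by garbage collection is only removed
// for good once it has been held for the full retention period.
package main

import (
	"fmt"
	"time"
)

const retentionPeriod = 7 * 24 * time.Hour // unpaid data is kept this long after GC flags it

type trashedPiece struct {
	id        string
	trashedAt time.Time
}

// deletable reports whether a flagged piece has sat out the grace period.
func deletable(p trashedPiece, now time.Time) bool {
	return now.Sub(p.trashedAt) >= retentionPeriod
}

func main() {
	now := time.Now()
	pieces := []trashedPiece{
		{id: "piece-a", trashedAt: now.Add(-8 * 24 * time.Hour)},
		{id: "piece-b", trashedAt: now.Add(-2 * 24 * time.Hour)},
	}
	for _, p := range pieces {
		fmt.Printf("%s deletable: %v\n", p.id, deletable(p, now))
	}
}
```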

Project limits not reset
The project limits are reset after 30 days, which is not the same as the end of the month.

1 TB coupon missing
The 1 TB coupon is only triggered by a single STORJ transaction of $50 or more. Two transactions of $30 each will not trigger it. Credit card payments should always trigger it.

Low request limit
By default each project is limited to 10 requests. The limit is too low. We are running tests to find a better value.


How can node owners detect if they are affected by one of these issues?

The storage node will stop working at some point. You will find the reason in the log messages.


Updated with the current situation after v0.31.12


Which issues are now fixed? And which issues still need to be solved before going to production?


This is likely to be published with the changelog for the next release. I would not expect an update here until the coming release is out.


When might that be? 🙂

Still a ton to do, keep it going, folks.

When is the next release? The Aha roadmap is not up to date.


We don't have an ETA on the next release at the moment.

@littleskunk Could you please add the "DNS resolve issue" to the list?
I see the fix, but I don't see this problem on the list.


This thread should be pinned.

Great, so they actually implemented that! That should significantly improve the speed!

Hi @jensamberg, I spoke with our product manager @brandon. He said there are actually 3 new roadmaps coming out, so they should be visible in the next few weeks. Thanks for paying attention, we love it when the community does that!


Updated to reflect the current version v0.33.4