This Tardigrade Thursday episode is a podcast with our VP of Engineering JT Olio, and is a discussion about end-to-end encryption.
Next week, tune in for a chat about GitBackup with founder Shawn Wilkinson.
Hey, I’m Jocelyn Matthews. I am the community manager at Storj Labs, and with me today I have JT Olio. JT is our vice president of engineering, and the topic today is going to be “End-to-end encryption: what developers need to know.” So first off, JT, how are you doing today?
Good, I’m glad to be here.
Great. Encryption is something that a lot of people are thinking and talking about right now, and it actually has some pretty specific definitions, so I was hoping you could shed some light on what encryption is, what it means to developers, and how we implement it in our platform.
Sure, that’s a great question. Typically people understand encryption (sorry for my dog chewing food in the background; we’re all at home) in the sense that it makes it so your files can’t be read without some way of decrypting them, and that involves having a key.
But the question really comes down to: who has the keys, where are the keys stored, and could someone get a copy of the keys? Could someone get their hands on the keys and encrypt or decrypt your data without your involvement?
In a lot of services that offer encryption, specifically many of the existing cloud object storage providers, the default is encrypted storage, but the key is managed by the company that’s storing the data. When you interact with that service, the data you’re sending to and from them is decrypted on their end, and then they encrypt it on their side with keys they manage before storing it. Really, that means it’s only a little bit better than not having it encrypted at all.
You don’t really have any control over the data in transit, and you don’t really have any control over what happens if the company changes its business practices, if they decide they want to sell your data. Encrypted data where you don’t manage the keys is just another way of checking a box that makes you feel secure but potentially isn’t doing as much as you’d think.
So end-to-end encryption is the idea that you and the person on the other end of the communication (and potentially that’s also you again) are the only people who have the encryption keys. And there are a lot of trade-offs to that, right?
End-to-end encryption is sometimes not very user friendly, and you can see this in how Signal, the secure text messaging app, has been dealing with recovering your information when you get a new phone: all of your safety numbers change with Signal, and you don’t necessarily get the same history of all your messages.
If you have cloud-based encryption where the service manages your keys, you can do password recovery, and you can do all sorts of neat features like browsing your content through a web browser, because the server is doing a lot of the decryption and those sorts of things.
The mechanisms to make end-to-end encryption user friendly are very challenging. Here is one of the ways that plays out for us: Storj is end-to-end encrypted, so the user who’s uploading the data chooses who gets the keys, and our service doesn’t get the decryption keys. One benefit of that is that we can’t be compromised in a way that will decrypt your data.
But the downside is that if you lose your key, we can’t help you. And from a user interface perspective, without a significantly higher level of client-side application support than we have, our web interface can’t show your files to you, because the web interface is hosted by our server and our server doesn’t have the keys to decrypt your data.
Some of these challenges are solvable for end-to-end encryption. Keybase and Signal are making great strides toward user-friendly strong encryption tools, but it’s hard, and not every company does that. A lot of companies take the shortcut of storing and managing the keys themselves.
So there are interesting trade-offs. It often comes down to end-to-end encryption feeling much more brittle, in the sense that it’s very unforgiving if you lose your keys, and it’s very unforgiving if you want certain functionality: you just can’t do certain things with end-to-end encryption without some backflips.
And so we’ve decided to make the trade-off. The market is already saturated with people doing the trick of storing the key for you, so we’re providing strong end-to-end encryption where only you have access to your keys. That means that if you lose your key we can’t help you, but it also means that we can never be compromised in such a way, and we can never change our business practices and decide to sell your data.
Okay, so I’m hearing a couple of things.
One is that there’s a trade-off between handing that piece off to someone else and doing it yourself with end-to-end encryption. In one case you have complete control, but you also have complete responsibility, and for a lot of people it’s worth it to have that kind of control and security and that peace of mind.
The second is that I’d like to unpack a little of what you said about the web interface, which some people may not be familiar with. When you upload to the platform, you encrypt the data yourself on your own machine before it ever goes anywhere; that’s pretty clear. You make your API key on the web interface, but then everything else you do is in the command line, right?
Okay, so I just wanted to clarify that for anyone who has been interested in the platform but maybe hasn’t gotten their hands on it and experimented with it yet: the initial steps occur in a web browser, and the rest doesn’t, specifically because of these encryption considerations, because we’re not exposing your keys to our servers. Is that right?
Yeah, the signup flow is: you sign up through our web interface, the Satellite web console, just like you would with S3 at Amazon or with Google Cloud Storage. You sign up through your web browser and configure your access, but then you can’t actually upload or download data through the web console.
Okay, great: different tools for different things. So we passed lightly over some different kinds of encryption and their trade-offs. Is there anything to spotlight within that, or any specific terms people should familiarize themselves with? What are some of the key concepts they should be aware of?
Sure. The major thing is that we follow the most recent standard best practice for encryption, which is authenticated encryption: our default configuration uses AES-256-GCM authenticated encryption.
But we also have support in our code base, currently not as tunable as we plan to make it, for Daniel J. Bernstein’s secretbox construction for encryption and decryption. And it’s an open source project.
So of course, if you believe there’s something we can do to improve our encryption, or something to add for your encryption needs, we would certainly be interested in pull requests.
The major thing we’re doing for encryption in the architecture of our system is this: every time you upload an object, we choose an encryption key for that object on the client side, under your control, purely at random.
That encryption key is chosen with a cryptographically secure random generator, and that lets us make sure there’s no connection between how an object gets uploaded and really anything else.
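The per-object random key can be sketched as below. This is a minimal illustration under stated assumptions, not Storj’s code: to stay within the Python standard library it swaps in a toy HMAC-SHA256 keystream where the real system uses AES-256-GCM, and `toy_keystream`, `encrypt_object`, and `decrypt_object` are hypothetical names. It shows that uploading the same bytes twice yields unrelated keys and unrelated ciphertexts.

```python
import hashlib
import hmac
import secrets


def toy_keystream(key: bytes, length: int) -> bytes:
    """Toy stream cipher: concatenated HMAC-SHA256(key, counter) blocks.

    Assumed stand-in for AES-256-GCM, for illustration only. Starting the
    counter at zero every time is acceptable here only because each key
    encrypts exactly one object.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_object(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt one object under a fresh, cryptographically random 256-bit key.

    Returns (object_key, ciphertext). The key is generated on the client and
    never depends on the object's content, name, or upload time.
    """
    object_key = secrets.token_bytes(32)
    stream = toy_keystream(object_key, len(plaintext))
    return object_key, bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt_object(object_key: bytes, ciphertext: bytes) -> bytes:
    stream = toy_keystream(object_key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Calling `encrypt_object(b"same bytes")` twice returns two different keys and two different ciphertexts, and only the matching key decrypts each one. A real AEAD such as AES-256-GCM would also add a nonce and an authentication tag.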
So we choose a random encryption key for your data and store the data, encrypted with that key, into the network. Then there’s the metadata: keeping track of where we put your file (where we sprinkle the encrypted grains of sand, to use a metaphor) and keeping track of that random key.
We then take that metadata and encrypt it with keys chosen by a hierarchical deterministic encryption mechanism. Our hierarchical deterministic encryption is similar to what Bitcoin uses for hierarchical wallets. It’s not quite the same, though, because we have paths, while the Bitcoin protocol for hierarchical wallet key derivation (BIP 32, if I remember the number right) is based on numbers. So we have a hierarchical derivation for paths.
What that means is that it allows you to share subtrees, sub-prefixes of the data you’ve uploaded into a bucket, and give someone the ability to decrypt some small amount of your data without necessarily giving them the keys for everything in your bucket.
So we have a rich access control and sharing mechanism that lets you break your bucket down into smaller sections and share just those: you can share just a file, or just a folder.
They’re not technically folders. As with the S3 API, they’re actually prefixes delimited with forward slashes: for our encryption needs we have keys and prefixes, and there isn’t really a concept of folders, but you can sort of mimic one with prefixes. (Internally we also support secretbox; by default we use AES-GCM.)
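The idea of hierarchical derivation over slash-delimited paths can be illustrated with a minimal sketch. The construction here (one HMAC-SHA256 step per path segment) is an assumption chosen for clarity, not Storj’s exact key-derivation function, and the function names and demo root key are hypothetical. Because each step is one-way, a key for a prefix lets its holder derive keys for everything beneath it, but not for the parent or for sibling prefixes.

```python
import hashlib
import hmac


def derive_child_key(parent_key: bytes, segment: str) -> bytes:
    """One-way derivation step: child = HMAC-SHA256(parent, segment)."""
    return hmac.new(parent_key, segment.encode("utf-8"), hashlib.sha256).digest()


def key_for_path(start_key: bytes, path: str) -> bytes:
    """Derive the key for a slash-delimited path, one segment at a time."""
    key = start_key
    for segment in path.split("/"):
        key = derive_child_key(key, segment)
    return key


if __name__ == "__main__":
    root_key = bytes(32)  # hypothetical root key; a real one would be random and secret
    # The bucket owner shares only the key for the "photos/2020" prefix.
    shared = key_for_path(root_key, "photos/2020")
    # The recipient can derive keys for anything under that prefix...
    assert key_for_path(shared, "beach.jpg") == key_for_path(root_key, "photos/2020/beach.jpg")
    # ...but reaching "docs/..." would require recovering root_key, i.e. inverting HMAC-SHA256.
    print("prefix sharing works")
```

The design choice mirrors the transcript’s point about BIP 32: the tree is indexed by path segments rather than by integers, so sharing “a folder” is just handing over one intermediate key.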
One trade-off of having all your paths and filenames encrypted is that our system doesn’t list your files in alphabetical order; it lists them in the sort order of the encrypted path names. So one of the options we’re considering is to re-expose the ability for users to choose not to encrypt their paths, so they can get them in alphabetical order if that’s more important to them.
But we want to make sure that the default, and the easiest thing to do with our system, is to have all of your path and file data remain encrypted.
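Why encrypted listings come back in an unexpected order can be sketched as follows. The sketch assumes a deterministic, SIV-style name encryption built from HMAC-SHA256 so it runs on the Python standard library alone; it is a toy stand-in, not Storj’s path cipher, and the function names and demo key are hypothetical. The server only ever sees and sorts ciphertext bytes, so after the client decrypts, the names appear in an order unrelated to the alphabet.

```python
import hashlib
import hmac


def _prf(key: bytes, label: bytes, data: bytes) -> bytes:
    # Domain-separated HMAC-SHA256 so the IV and keystream uses never collide.
    return hmac.new(key, label + data, hashlib.sha256).digest()


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += _prf(key, b"ks", iv + counter.to_bytes(8, "big"))
        counter += 1
    return bytes(out[:length])


def encrypt_name(key: bytes, name: str) -> bytes:
    """Deterministic: the same name under the same key always gives the same bytes."""
    data = name.encode("utf-8")
    iv = _prf(key, b"iv", data)[:16]  # synthetic IV derived from the plaintext
    return iv + bytes(d ^ s for d, s in zip(data, _keystream(key, iv, len(data))))


def decrypt_name(key: bytes, blob: bytes) -> str:
    iv, body = blob[:16], blob[16:]
    data = bytes(b ^ s for b, s in zip(body, _keystream(key, iv, len(body))))
    # Recomputing the synthetic IV doubles as an integrity check.
    if not hmac.compare_digest(iv, _prf(key, b"iv", data)[:16]):
        raise ValueError("encrypted name failed authentication")
    return data.decode("utf-8")


def list_bucket(key: bytes, stored: list[bytes]) -> list[str]:
    """The server sorts what it can see (ciphertext); the client decrypts in that order."""
    return [decrypt_name(key, blob) for blob in sorted(stored)]
```

Determinism is what lets the server look an object up by its encrypted name at all; the cost is that the ciphertext order, not the plaintext order, drives listings.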
what are some other important terms for people to be familiar with?
Yeah, I think there are a couple of different things. I mentioned authenticated encryption. When people talk about SHA-256 hashes, those are a way of taking data and turning it into a fingerprint of that data in a way that’s irreversible: you can’t go backwards. So we take hashes of a lot of data to make sure it doesn’t change while it’s in flight or while it’s stored.
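The fingerprint check described above looks roughly like this with Python’s standard `hashlib` module. It is an illustrative sketch with hypothetical function names, not Storj’s verification code. A bare hash catches changes only if the recorded fingerprint itself is kept somewhere an attacker can’t rewrite.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """SHA-256 digest: a fixed-size, one-way fingerprint of the data."""
    return hashlib.sha256(data).hexdigest()


def unchanged(data: bytes, recorded: str) -> bool:
    """True if data still matches the fingerprint recorded before upload or transfer."""
    return hmac.compare_digest(fingerprint(data), recorded)


if __name__ == "__main__":
    recorded = fingerprint(b"abc")
    print(recorded)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    print(unchanged(b"abc", recorded), unchanged(b"abd", recorded))  # True False
```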
We’re very good at authenticating, from a cryptographic perspective, that the data you stored with us is the data you’re getting back, and we do that through SHA-256. And like I mentioned, our encryption is AES-256-GCM authenticated encryption, which is a best practice for encrypting data in a way that builds an integrity check (an authentication tag, similar in purpose to these hashes) into the decryption step, so decryption fails if the data has been even slightly altered or is incorrect. The reason that’s important is that many encryption ciphers don’t do this, and this is actually part of what’s going on with Zoom.
There are a lot of questions going around about Zoom’s practices with cryptography and security. Zoom uses the ECB cipher mode, which, without going into the other details, doesn’t have authentication as part of it. If you decrypt tampered data in a cipher mode like that, you can just get back garbage without any error. So we’re making sure that the data you get back is the data you put there, and that it fails to decrypt if something got tampered with: there are lots of tamper-evident seals, basically, on all the data that gets stored.
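The difference between authenticated and unauthenticated decryption can be shown with a small sketch. To stay within the standard library it uses a toy HMAC-SHA256 keystream plus an encrypt-then-MAC tag as an assumed stand-in for AES-256-GCM (GCM computes its tag with GHASH instead, but the refuse-on-tamper behavior is the same idea); it does not model ECB specifically, and all names are hypothetical. Flipping one ciphertext bit silently flips the matching plaintext bit in the unauthenticated path, while the authenticated path refuses to decrypt.

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 16, 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher (HMAC-SHA256 in counter mode); illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC; output layout is nonce || ciphertext || tag."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ct = _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def decrypt_unauthenticated(enc_key: bytes, blob: bytes) -> bytes:
    """Ignores the tag, like a mode without authentication: returns whatever comes out."""
    nonce, ct = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN]
    return _xor(ct, _keystream(enc_key, nonce, len(ct)))


def decrypt_authenticated(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Checks the tag before decrypting and fails loudly if any byte changed."""
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: data was modified")
    return _xor(ct, _keystream(enc_key, nonce, len(ct)))
```

Checking the tag before touching the plaintext is the design choice that matters: callers never see data that failed authentication.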
We’ve also talked a lot about erasure coding and Reed-Solomon, and it’s worth pointing out that those are not encryption mechanisms. It’s important that we do encryption first; then, on the encrypted data, we use the Reed-Solomon erasure code to reliably break that data up, store it on lots of different nodes, and reconstitute it once we get some of the pieces we stored back. Reed-Solomon gives us a really interesting property. Let’s say we configure the Reed-Solomon code to be 20/40 or something.
I think right now in production we’re using 29/80. What that means is that of the 80 pieces we store, we only need any 29 of them to get the data back. And because the data we fed into Reed-Solomon was protected with authenticated encryption, once we reconstitute it, the act of decrypting it confirms that the data we got back is exactly the data that was stored.
How widely is Reed-Solomon used?
Reed-Solomon is actually an old algorithm. It’s used in CDs; it’s the reason that when you scratch a CD, it will keep working. It’s used in satellite communication: real satellites, you know, things orbiting the planet, use Reed-Solomon to make sure the data transmitted to them isn’t lost. So yeah, Reed-Solomon is old. There are some other erasure codes with interesting properties that just came out of patent protection and may be worth pursuing.
If people have heard of Tornado codes or some of these other things, there are more recent techniques on the table. We’re using Reed-Solomon mainly because it’s well understood, it’s fast, it solves our problems in a good way, and we actually have a library.
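The k-of-n property can be illustrated with a minimal Reed-Solomon-style erasure code over the prime field GF(257), using Lagrange interpolation. This is a teaching sketch, not Storj’s Go library: production codes typically work over GF(2^8) so every symbol fits in a byte (here a parity symbol can equal 256), process data in many stripes, and use much faster algorithms. The function names are hypothetical. Data symbols sit at x = 0..k-1, parity symbols are the same polynomial evaluated at x = k..n-1, and any k pieces pin the polynomial down again. In the pipeline described above, the input symbols would already be ciphertext.

```python
P = 257  # prime modulus; symbols are integers 0..256


def _interpolate_at(points: list[tuple[int, int]], x: int) -> int:
    """Value at x of the unique polynomial of degree < len(points) through points (mod P).

    Assumes the x-coordinates in points are distinct.
    """
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(data: list[int], n: int) -> list[tuple[int, int]]:
    """Systematic k-of-n encoding; returns n pieces as (x, value) pairs."""
    k = len(data)
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < P")
    if any(not 0 <= d < P for d in data):
        raise ValueError("symbols must be field elements")
    base = list(enumerate(data))  # data symbol i is the polynomial's value at x = i
    return [(x, data[x] if x < k else _interpolate_at(base, x)) for x in range(n)]


def decode(pieces: list[tuple[int, int]], k: int) -> list[int]:
    """Recover the k data symbols from any k distinct surviving pieces."""
    if len(pieces) < k:
        raise ValueError(f"need at least {k} pieces, got {len(pieces)}")
    subset = pieces[:k]
    return [_interpolate_at(subset, x) for x in range(k)]
```

For the 29/80 setting mentioned in the conversation, `encode(symbols, 80)` with 29 symbols produces 80 pieces of which any 29 suffice, so up to 51 pieces can be lost, at a storage expansion of 80/29 ≈ 2.76×.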
Not every Reed-Solomon library supports actually doing error correction, where if some of the pieces are slightly corrupted, Reed-Solomon is able to detect and correct them. The Reed-Solomon library we’re using for Go was written by one of our employees, who worked out an implementation of the Berlekamp-Welch error correction algorithm, and I don’t know of many other libraries that implement it in a way that allows us to do error recovery.
Many Reed-Solomon libraries essentially just assume there are no errors in any of the pieces and hope for the best, with the expectation that you’ve probably done some hashing of your own to confirm the pieces were accurate before decoding. We take sort of a best-of-both-worlds approach. This is kind of how we do audits and how we do recovery and restores. It’s a neat trick.
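The “hash first, then decode” approach that many libraries expect can be sketched like this. The per-piece SHA-256 manifest, where it is stored, and the function names are assumptions for illustration; this is not how Storj’s Berlekamp-Welch-based library works, which can instead locate and correct bad pieces during decoding.

```python
import hashlib


def make_manifest(pieces: dict[int, bytes]) -> dict[int, str]:
    """At upload time, record a SHA-256 hash for each piece number."""
    return {num: hashlib.sha256(piece).hexdigest() for num, piece in pieces.items()}


def usable_pieces(received: dict[int, bytes], manifest: dict[int, str], k: int) -> dict[int, bytes]:
    """Drop pieces whose hash no longer matches, then check that enough remain to decode."""
    good = {
        num: piece
        for num, piece in received.items()
        if manifest.get(num) == hashlib.sha256(piece).hexdigest()
    }
    if len(good) < k:
        raise ValueError(f"only {len(good)} intact pieces, need {k}")
    return good
```

A decoder that trusts its input would then run only on the returned pieces. An error-correcting decoder such as Berlekamp-Welch can also handle bad pieces that slip past, at the known cost that each unlocated error consumes two pieces of redundancy versus one for a known erasure (a code with n pieces and k data symbols handles e errors and s erasures when 2e + s ≤ n - k).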
Cool. Okay, tell me something that surprises people about end-to-end encryption.
Yeah, I think ultimately, whenever I’ve worked with people on end-to-end encryption (and I’ve worked a lot with people trying to use end-to-end encrypted products), it’s surprising how much functionality is affected by the decision to use end-to-end encryption.
At a previous company, we were storing data in an end-to-end encrypted manner, and one of the kinds of data being stored was pictures. Someone I worked with asked: we’re having trouble, we want to show the user all these pictures in a web browser, can we just generate thumbnails real quick? And it turns out, no, you can’t just go generate thumbnails. If this were WordPress, sure, we could make a little thumbnail generation feature, but because this is encrypted, it’s a lot harder: you need to get the user to opt in, and you need the user’s client software to generate the thumbnails. We can’t just do it on the server side.
There are a lot of really interesting trade-offs between usability, functionality, and security that get really intertwined. It’s not impossible to build a really user-friendly product with end-to-end encryption, but reading some of the experience reports about, for instance, how Signal is going about persisting the social network of its users is really eye-opening in terms of how hard some of these problems are.
So when people are surprised about end-to-end encryption, it’s really just: wait, why is this simple thing so hard? If it’s a product decision to do end-to-end encryption, you have to jump into it with both eyes wide open: okay, this is a trade-off we’re making, and there are going to be greater challenges from a technical perspective. But if you’ve got the appetite to solve them, they can be really fun.
So I’m hearing a lot of talk about trade-offs and considerations. How important is it to make that decision up front about how you’re handling encryption? Is it something that needs to be factored in before you start a project?
I ask because I think a lot of developers just need to get something done, get something up and running, and to be honest, encryption and security are not always front of mind; the urgency is to get something done. So does that lead to situations where you hit a point where it’s too late to implement these things?
How do you factor that in when weighing it against real-world considerations?
Yeah, that’s a really great question. Especially with something like end-to-end encryption, it’s really hard to retrofit once you already have a shipping product: you probably have users depending on features that would not necessarily be easy to implement under end-to-end encryption.
So, to your point, it’s really important that this is something you think about heavily at the architectural level before you get started. It’s not really something to bake in afterward.
What’s interesting about data center based cloud object storage is that it’s really easy to punt on this problem. You can just not worry about encryption right away, because you manage all the servers where you’re storing people’s data. People trust you, and if they trust you, then it’s up to you to secure your perimeter.
In a decentralized or distributed storage product like ours, you don’t have that choice. There’s no perimeter to defend, since everything is split up; you have absolutely no choice other than to solve strong security and strong encryption from the absolute get-go. It’s just table stakes. So, interestingly, having your data spread across a lot of different storage nodes creates a harsher environment for a product to succeed in, and because of that, I think products that actually work at all face a much higher bar and are much more likely to be secure.
You can’t take shortcuts. So once you get past the initial proving-out of your decentralized storage application, the fact that your data is potentially resting under people’s beds, as opposed to in a locked-down data center, means that your data is actually probably safer at an electronic level, at a network level.
So it’s an interesting distinction: we don’t get the choice of doing perimeter defense only. We have to make every fundamental level of our system secure, all the way through, if this is going to work at all. Having really strong encryption is an all-or-nothing, table-stakes kind of thing. I’d be much more surprised to see data in an end-to-end encrypted decentralized product get leaked or lost than data from a cloud operator, where all you have to do is penetrate the perimeter. Once you’re inside the data center, if they don’t have good security in there, and you can get to the servers and the provider-managed keys, you have access to probably a lot more than just one user’s data.
But it is definitely a different level and a different phase, and each of these new phases brings with it different assumptions and different attitudes toward security. Even regular laypeople now have very different attitudes toward how they handle passwords or how they think about backing things up compared to where we were 10 years, 5 years, 3 years ago.
What do you think the evolution is going to look like as things like Tardigrade keep spreading and become more and more mainstream? As more people use them, do you see expectations around security and privacy shifting in predictable ways?
Yeah, I think some predictable and some not. Going back to this concept of perimeter defense versus other kinds of defense: companies relied for a long time on VPNs to secure their employees and their internal data, and there’s been a growing push to rethink that model.
Google pushed their BeyondCorp project, which was basically an effort to standardize on security models where you’re not just relying on a perimeter VPN for your security. You’ve also seen a lot of people start advocating for password managers, security keys, and two-factor authentication. The reason you do two-factor authentication at all is that there’s a growing surface area of your things that can be attacked by some hacker nowhere near where you live.
People have this strong idea of locality: none of my neighbors are going to want to get into my email, so it’s fine. That works for your garage door, but the truth is there are 7 billion people on the planet, and with everything connected to the internet, any of those 7 billion people with internet access could be attempting to get into your account. So the security properties of having these simple walls keep getting more complicated. People are setting up two-factor authentication, people are moving away from VPNs, and people are starting to think a lot more about peer-to-peer communication rather than just how secure a server is.
And so in the long-term view, I do think things are moving toward an inherently decentralized security mindset, even if everyone works for the same company and it’s not actually a decentralized product you’re working on. Even this most recent global disaster we find ourselves in has pushed a lot of people toward remote work, and people aren’t in offices.
So the same sort of transition is happening: we’re realizing more and more that security has to be baked into everything from the beginning, and you can’t cut a corner and just put a fence around something.
Thanks so much, this has been a really interesting talk. Any last words before we sign off?
No, thank you. It seems like the sort of thing I could talk about for a while; it’s an exciting topic. It’s really fun getting to work at Storj Labs specifically because this is the type of product and the type of problem that is at the forefront of these issues.
As a decentralized storage product, having world-class security is obviously table stakes for us, so it’s exciting to get to work on that and to push the envelope.
Okay, thanks JT, it’s been great, and I look forward to speaking with you again.
Yeah, thanks, you too, Jocelyn. Take care.
Thanks for this nice explanation. This podcast gave me an idea of a nice explainer video you could do. Encryption is tightly connected with access control. Some of the concepts were mentioned in this video, but how to set up access control can be a bit abstract as the web interface only lets you generate API keys, but not set up specific access control.
I’d love to see a video that briefly explains how hierarchical access control works using macaroons and then shows how you can create limited access keys for specific folders, write-only access, etc. To be honest, I haven’t needed this yet and I know there is some documentation, but most of the “getting started” docs skip over this part, and a little explainer video can get the message across much more quickly.
Thanks @brightsilence, yes, we have a “restrictive macaroons” tutorial on the to-do list; I agree it’s something people will be interested in. We also have an upcoming demo video on writing a sample app that uploads and serves a multimedia file that I think folks will like.
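The restrictive-macaroon idea the comments above ask about rests on chained HMAC caveats. The sketch below is a simplified illustration of that idea only: the caveat strings (`prefix=` and `op=`), the dictionary layout, and the function names are hypothetical and are not Storj’s serialized API key format. Anyone holding a key can append a caveat to narrow it (for example to one folder, write-only), but removing a caveat breaks the signature.

```python
import hashlib
import hmac


def _chain(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """The service creates an unrestricted token signed with its secret root key."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def restrict(token: dict, caveat: str) -> dict:
    """Anyone holding a token can add a caveat; the new signature chains off the old one."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def _satisfied(caveat: str, request: dict) -> bool:
    kind, _, value = caveat.partition("=")
    if kind == "prefix":
        return request["path"].startswith(value)
    if kind == "op":
        return request["op"] in value.split(",")
    return False  # unknown caveats fail closed


def authorize(root_key: bytes, token: dict, request: dict) -> bool:
    """Service-side check: recompute the HMAC chain, then enforce every caveat."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    return all(_satisfied(c, request) for c in token["caveats"])
```

For example, `restrict(restrict(full, "prefix=photos/"), "op=write")` yields a key that can only write under `photos/`; the holder cannot strip either caveat without the root key.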