Machine learning on encrypted data
The O’Reilly Data Show Podcast: Alon Kaufman on the interplay between machine learning, encryption, and security.
In this episode of the Data Show, I spoke with Alon Kaufman, CEO and co-founder of Duality Technologies, a startup building tools that will allow companies to apply analytics and machine learning to encrypted data. In a recent talk, I described the importance of data, various methods for estimating the value of data, and emerging tools for incentivizing data sharing across organizations. As I noted, the main motivation for improving data liquidity is the growing importance of machine learning. We’re all familiar with the importance of data security and privacy, but probably not as many people are aware of the emerging set of tools at the intersection of machine learning and security. Kaufman and his stellar roster of co-founders are doing some of the most interesting work in this area.
Here are some highlights from our conversation:
Running machine learning models on encrypted data
Four or five years ago, techniques for running machine learning models on data while it’s encrypted were being discussed in the academic world. We did a few trials of this and although the results were fascinating, it still wasn’t practical.
… There have been big breakthroughs that have led to it becoming feasible. A few years ago, it was more theoretical. Now it’s becoming feasible. This is the right time to build a company, not only because of the technological feasibility but also because of the need in the market.
From inference to training
A classical example would be model inference. I have data; you have some predictive model. I want to consume your model. I’m not willing to share my data with you, so I’ll encrypt my data; you’ll apply your model to the encrypted data, so you’ll never see the data. I will never see your model. The result that comes out of this computation, which is encrypted as well, will be decrypted only by me, as I have the key. This means I can basically utilize your predictive insight, you can sell your model, and neither data nor models are ever exchanged between the parties.
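To make the inference flow above concrete, here is a minimal sketch in Python. It is not Duality's implementation: it assumes a toy Paillier cryptosystem (additively homomorphic only), a linear model with integer weights, and two fixed Mersenne primes that are far too small and too structured for real use. The function names (`keygen`, `encrypt`, `decrypt`, `encrypted_linear_score`) are hypothetical and chosen for illustration.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair. The default primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m (negative values are encoded modulo n)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def encrypted_linear_score(n, enc_x, weights, bias):
    """Model owner: compute Enc(w . x + b) using only the public key.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to the
    power w multiplies its plaintext by w.
    """
    n2 = n * n
    acc = encrypt(n, bias)
    for c, w in zip(enc_x, weights):
        acc = acc * pow(c, w % n, n2) % n2
    return acc


# Data owner: generate keys and encrypt the features.
n, priv = keygen()
enc_x = [encrypt(n, v) for v in [3, -2, 5]]

# Model owner: sees only ciphertexts and the public key n.
enc_score = encrypted_linear_score(n, enc_x, weights=[4, 7, -1], bias=10)

# Data owner: only the private key holder can read the score.
print(decrypt(n, priv, enc_score))  # 4*3 + 7*(-2) + (-1)*5 + 10 = 3
```

The model owner never sees the features, and the data owner never sees the weights. Nonlinear models need richer schemes (for example, fully homomorphic encryption), which this sketch does not attempt.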
… The next frontier of research is doing model training with these types of technologies. We have some great results, and there are others who are starting to do and implement some things in hardware. … Some of our recent work around applying deep learning to encrypted data combines different methods. Homomorphic encryption has its pros and cons; secure multi-party computation has other advantages and disadvantages. We basically mash various methods together to derive very, very interesting results. … For example, we have applied algorithms to genomic data at scale and we obtained impressive performance.
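For readers unfamiliar with secure multi-party computation, the following sketch shows one of its simplest building blocks, additive secret sharing. It is an illustrative assumption, not a description of the combined methods mentioned in the interview; the modulus `Q` and the helper names are hypothetical.

```python
import secrets

Q = 2**61 - 1  # public modulus; all arithmetic on shares is done mod Q


def share(value, n_parties=3):
    """Split value into n_parties random shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % Q for a, b in zip(a_shares, b_shares)]


def reconstruct(shares):
    """Combine all shares to reveal the underlying value."""
    return sum(shares) % Q


# Two private inputs are shared among three parties; no single party
# learns either input, yet together they can compute the sum.
a_shares = share(120)
b_shares = share(45)
print(reconstruct(add_shared(a_shares, b_shares)))  # 165
```

Addition is cheap here because it needs no interaction, whereas multiplying shared values requires extra communication between parties. Trade-offs of this kind are one reason practitioners combine secret sharing with homomorphic encryption.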
Related resources:
- Sharad Goel and Sam Corbett-Davies on “Why it’s hard to design fair machine learning models”
- Chang Liu on “How privacy-preserving techniques can lead to more robust machine learning models”
- “How to build analytic products in an age when data privacy has become critical”
- “Data collection and data markets in the age of privacy and machine learning”
- “What machine learning means for software development”
- “Lessons learned turning machine learning models into real products and services”