Can Machine Learning protect privacy?

People often blame AI (artificial intelligence) for helping Big Brother intrude on our privacy, and it's undeniably a weapon of mass intrusion, but it could also do some good.

Disclaimer

Some people think AI consists of using computers and machines to mimic the problem-solving and decision-making capabilities of the human mind.

Some people think it's a way to replace human input in various tasks such as communicating with customers online or playing games.

Some people confuse Machine Learning (ML) with AI, even though ML is only a branch of AI.

There are different points of view on this topic, and if you don't have the same vision and definition of AI, it's hard to have a constructive and instructive debate.

In this post, let's consider only one subfield of ML: federated learning.

The danger of mobile keyboards

Mobile phones are not the best devices for privacy (Do you like that euphemism? ^^). By default, they expose your location and a lot of other sensitive information.

You have to uncheck a massive number of default settings and sometimes even install a custom Android ROM to get rid of trackers.

However, mobile keyboards are probably the worst. On both Android and iOS, you can customize your keyboard by installing third-party solutions with various features.

I bet you already know that, but did you know that many third-party keyboards send what you type over the internet?

The keyboard has access to everything you type: private conversations, passwords, credit card numbers, etc. Some third-party applications require a ridiculously large number of permissions, including access to confidential information, to run correctly.

To provide better text prediction and autocorrection, some developers (not all, of course) abuse this access and process sensitive data like any other data.

As a result, that data sometimes gets stolen, and it's unacceptable.

Better privacy often means fewer features

Some people argue it's not a big deal. If you don't want to risk anything, then switch off some features and keep calm.

There are two main problems with that argument:

Not all third-party keyboards are clear about which features require processing in the cloud and which features can be used safely.
You have to give up great features to keep a minimum of privacy.

This leads to a paradox: Google and Microsoft's default keyboards are more trustworthy than some third-party applications.

Is that the best we can do? Do we need to keep the default keyboard?

For now, the answer is probably yes, but some approaches might help solve this problem.

Federated learning to the rescue

According to Google:

Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do Machine Learning from the need to store the data in the cloud

Source: Google AI blog - Federated Learning

Instead of sending raw data, including critical information, to a remote server, the program on the phone summarizes what it has learned and sends only the recommended changes to the shared model.

All personal data stay on the phone. Federated learning does not come without technical challenges, but it looks way better for privacy from the user's perspective.

This way, companies can use multiple local datasets without exchanging users' data. Unlike with standard ML models, you don't need to centralize the data on one server. You don't need any hub that collects raw data: a server only has to coordinate and combine the model updates.

There's no need to send what is typed or spoken by the user.
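
To make "sends recommended changes to the model" a bit more concrete, here is a minimal sketch of what the on-device part could look like. It is only an illustration under loud assumptions: the toy model (y = w * x + b), the `ModelUpdate` class, and the `local_update` function are all hypothetical names, not what any real keyboard ships.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModelUpdate:
    """What leaves the device: weight changes and a sample count, no raw data."""
    delta: List[float]
    num_examples: int


def local_update(global_weights: List[float],
                 local_data: List[Tuple[float, float]],
                 lr: float = 0.01,
                 epochs: int = 5) -> ModelUpdate:
    """Train on the device and return only the change to the weights.

    Hypothetical toy model: y = w * x + b, fitted with plain gradient descent.
    """
    w, b = global_weights
    for _ in range(epochs):
        for x, y in local_data:
            error = (w * x + b) - y
            w -= lr * error * x
            b -= lr * error
    return ModelUpdate(
        delta=[w - global_weights[0], b - global_weights[1]],
        num_examples=len(local_data),
    )


# This list stands in for what the user typed; it never leaves the function.
private_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
update = local_update([0.0, 0.0], private_data)
print(update)
```

The only thing the phone would upload here is `update`: two numbers describing how the weights moved, plus how many examples were used.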

How can it even work?

Federated learning works on much the same principle as any other kind of ML. The idea is to train a model.

ML often uses what are called artificial neural networks (ANNs). Roughly speaking, they are structures that mimic the way biological neurons communicate with each other.

Source: IBM Cloud Learn Hub

You often find the term "node" instead of "neuron". In federated learning, "node" also refers to a participating device: nodes exchange what they have learned but not the raw data. Then the game consists of combining all the local models into a global model.
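
One common way to combine local models into a global one is to average their weights, giving more influence to devices that trained on more examples (this is usually called federated averaging). The sketch below assumes each local model is just a list of numbers; `federated_average` and the sample values are made up for illustration.

```python
from typing import List


def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    """Average local models, weighted by how many examples each device used."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical phones: the second one trained on three times more examples.
local_models = [[1.0, 0.0], [3.0, 2.0]]
sizes = [10, 30]

global_model = federated_average(local_models, sizes)
print(global_model)  # [2.5, 1.5]
```

Note that the server only ever sees the weights and the example counts, never the text that produced them.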

Wrap up

There are ways to use Machine Learning without compromising the data of millions of users.

Alternative approaches such as federated learning have great potential for data privacy and security, IMHO.

Photo by Jason Dent on Unsplash