DevFest Pisa 2024 - Is Your Model Private?

Is your model private? Luca Corbucci https://lucacorbucci.me/

Why should we care about privacy when training ML models?
i.e. What could possibly go wrong?

There exist attacks against Neural Networks

What’s the color of the cat? In a Membership Inference Attack, an attacker wants to
know if a sample was used to train the model.

What’s the color of the cat? The model will be more confident when we query it
with the image that was in the training dataset.
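
The confidence gap described above can be sketched on a toy problem. This is a minimal illustration, not the attack from the referenced paper: the "model" is a hypothetical distance-weighted classifier that memorizes random labels, and the 0.99 threshold is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with random labels, so the only way to fit it is to memorize it.
X_train = rng.normal(size=(50, 5))
y_train = rng.integers(0, 2, size=50)
X_out = rng.normal(size=(50, 5))  # samples the model has never seen


def predict_proba(x, temperature=0.1):
    """Hypothetical overfit model: distance-weighted vote over training points."""
    d = np.linalg.norm(X_train - x, axis=1)
    w = np.exp(-(d - d.min()) / temperature)  # shift by the min distance for stability
    p1 = w[y_train == 1].sum() / w.sum()
    return np.array([1.0 - p1, p1])


def confidence(x):
    """Confidence = probability assigned to the predicted class."""
    return predict_proba(x).max()


conf_in = np.array([confidence(x) for x in X_train])  # members
conf_out = np.array([confidence(x) for x in X_out])   # non-members

# The attacker guesses "member" whenever the model is very confident.
threshold = 0.99  # assumed value, chosen only for illustration
guess_in = conf_in > threshold
guess_out = conf_out > threshold

print("mean confidence on members:    ", conf_in.mean())
print("mean confidence on non-members:", conf_out.mean())
```

Real attacks typically calibrate the decision rule with shadow models instead of a fixed threshold; the sketch only shows why a confidence gap leaks membership.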

ML models can memorize sensitive information about their training set

The model could leak sensitive information

How can we defend against these threats?

Differential Privacy

Who uses Differential Privacy? https://desfontain.es/blog/real-world-differential-privacy.html

Differential Privacy (An intuition using databases) Suppose you have two
databases that differ in a single instance.

Differential Privacy (An intuition using databases) You query both of
them and get two different results. From the difference, you can infer something about the missing instance.
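
A minimal sketch of this "differencing" leak, using two made-up in-memory databases (the names and records are hypothetical and exist only for illustration):

```python
# Two hypothetical databases that differ in exactly one record (Alice's).
db_with_alice = [
    {"name": "Alice", "disease": "Diabetes"},
    {"name": "Bob", "disease": "Diabetes"},
    {"name": "Carol", "disease": "Asthma"},
]
db_without_alice = [row for row in db_with_alice if row["name"] != "Alice"]


def count_diabetes(db):
    """Exact (non-private) counting query."""
    return sum(1 for row in db if row["disease"] == "Diabetes")


# Comparing the two exact answers reveals Alice's diagnosis.
difference = count_diabetes(db_with_alice) - count_diabetes(db_without_alice)
alice_has_diabetes = difference == 1
print(alice_has_diabetes)
```

Neither query mentions Alice directly; the leak comes entirely from comparing two exact answers.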

Differential Privacy (An intuition using databases) Differential Privacy allows you to query the databases,
adding some randomisation to the answer. You will get (more or less) the same output regardless of the presence of one sample.

Differential Privacy (A slightly more advanced definition) Given two databases $D$ and $D'$ which differ in only one instance:
$P[A(D) = O] \le e^{\epsilon} \, P[A(D') = O]$

How to interpret the ϵ Given two databases $D$ and $D'$ which differ in only one instance:
$P[A(D) = O] \le e^{\epsilon} \, P[A(D') = O]$
$e^{\epsilon}$ tells us how similar these two probabilities are. ϵ is called the “privacy budget” and represents an upper bound on how much information we can leak.
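
The bound can be checked numerically for one concrete mechanism. The sketch below assumes a counting query with sensitivity 1 answered through Laplace noise of scale sensitivity/ϵ, and compares output densities (the continuous analogue of the probabilities in the definition) on two neighbouring databases. The counts 98 and 97 and the value ϵ = 0.5 are illustrative assumptions.

```python
import numpy as np


def laplace_pdf(x, loc, scale):
    """Density of a Laplace distribution centred at `loc`."""
    return np.exp(-np.abs(x - loc) / scale) / (2 * scale)


epsilon = 0.5
sensitivity = 1.0             # a count changes by at most 1 between neighbours
scale = sensitivity / epsilon

count_d = 98         # true count on the first database (hypothetical)
count_d_prime = 97   # true count on the neighbouring database

# Ratio of the two output densities over a grid of possible noisy answers.
outputs = np.linspace(80, 115, 701)
ratio = laplace_pdf(outputs, count_d, scale) / laplace_pdf(outputs, count_d_prime, scale)

max_ratio = ratio.max()
bound = np.exp(epsilon)
print(f"largest ratio: {max_ratio:.4f}, e^epsilon: {bound:.4f}")
```

No output is more than a factor of e^ϵ more likely under one database than under the other, which is exactly what the inequality requires.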

Differential Privacy (A more relaxed definition) Given two databases $D$ and $D'$ which differ in only one instance:
$P[A(D) = O] \le e^{\epsilon} \, P[A(D') = O] + \delta$
The parameter δ quantifies the probability that something goes wrong. Informally, the algorithm will be ϵ-differentially private with probability 1 - δ.
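
One common way to obtain an (ϵ, δ) guarantee is the Gaussian mechanism. The sketch below uses the classic noise calibration from the differential privacy literature, which is valid only for 0 < ϵ < 1. The function names are hypothetical, and the count of 98 is an illustrative value.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Noise std for the classic Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("this calibration is only valid for 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def noisy_count(true_count, epsilon, delta, rng):
    """Release a count with Gaussian noise calibrated to (epsilon, delta)."""
    return true_count + rng.gauss(0.0, gaussian_sigma(epsilon, delta))


rng = random.Random(0)
sigma_small_delta = gaussian_sigma(0.5, 1e-6)
sigma_large_delta = gaussian_sigma(0.5, 1e-3)
answer = noisy_count(98, 0.5, 1e-5, rng)
print(sigma_small_delta, sigma_large_delta)
```

A smaller δ tolerates less failure probability and therefore needs more noise for the same ϵ.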

Essentially, instead of returning the real output of the query,
we return a noisy output.

Example: we want to query our database to know how many patients have Diabetes
>>> df[df["Disease"] == "Diabetes"].shape[0]
98

Example: we want to query our database to know how many patients have Diabetes
>>> df[df["Disease"] == "Diabetes"].shape[0] + rnd.laplace(loc=0, scale=sensitivity/eps_1)
97.19888273257044
>>> df[df["Disease"] == "Diabetes"].shape[0] + rnd.laplace(loc=0, scale=sensitivity/eps_2)
94.0943263602294
What’s the privacy cost here?

Example: we want to query our database to know how many patients have Diabetes
>>> int(df[df["Disease"] == "Diabetes"].shape[0] + rnd.laplace(loc=0, scale=sensitivity/eps_1))
95
Am I removing DP when I round the result?
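
Both questions have standard answers. Under basic sequential composition, releasing an ϵ₁-DP answer and an ϵ₂-DP answer about the same data costs ϵ₁ + ϵ₂ in total. Rounding or truncating a noisy answer is post-processing: it uses only the released value, so it does not weaken the guarantee. The sketch below assumes made-up data, a sensitivity of 1 for the counting query, and illustrative values for `eps_1` and `eps_2`; the `spent` list is a hypothetical bookkeeping helper.

```python
import numpy as np

rnd = np.random.default_rng(42)

# Hypothetical dataset: 98 patients with Diabetes, 52 with another disease.
diseases = np.array(["Diabetes"] * 98 + ["Asthma"] * 52)
sensitivity = 1.0        # one patient changes a count by at most 1
eps_1, eps_2 = 0.5, 0.2  # illustrative privacy parameters

spent = []  # epsilon consumed by each released answer


def noisy_count(eps):
    """Laplace mechanism for the counting query; records the budget it spends."""
    true_count = int((diseases == "Diabetes").sum())
    spent.append(eps)
    return true_count + rnd.laplace(loc=0, scale=sensitivity / eps)


first = noisy_count(eps_1)
second = noisy_count(eps_2)

# Basic sequential composition: the two releases together cost eps_1 + eps_2.
total_epsilon = sum(spent)

# Post-processing: truncating touches only the noisy value, never the raw data.
rounded = int(first)
print(total_epsilon, rounded)
```

Tighter composition theorems exist, but the simple sum is always a valid upper bound on the total cost.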

How does this change when it comes to neural networks?

Dataset A trains Model A; Dataset B trains Model B.

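Differential Privacy Given two datasets $D$ and $D'$ which differ in only one sample, and a trained model $M$:
$P[A(D) = M] \le e^{\epsilon} \, P[A(D') = M]$
The outputs of the two neural networks will be similar regardless of the presence of a single sample in the dataset.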

SGD
def sgd():
    for each batch L_t:
        for each sample x_i in the batch:
            g_t(x_i) = compute_gradient(M, x_i)
        g_t = average of gradients
        M = M - lr * g_t
    return M

SGD / DP-SGD
def sgd():
    for each batch L_t:
        for each sample x_i in the batch:
            g_t(x_i) = compute_gradient(M, x_i)
        g_t = average of gradients
        M = M - lr * g_t
    return M
def dp_sgd():
    for each batch L_t:
        for each sample x_i in the batch:
            g_t(x_i) = compute_gradient(M, x_i)
            g_t(x_i) = clip_gradient(g_t(x_i), C)
        g_t = average of clipped gradients + Noise
        M = M - lr * g_t
    return M
The noise can be Gaussian noise $\mathcal{N}(0, \sigma^2 C^2 I)$, where C is the clipping threshold.
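
The DP-SGD pseudocode can be turned into a small runnable sketch. The version below is an illustration under several assumptions, not a reference implementation: the model is plain linear regression in NumPy, the noise is added to the sum of clipped gradients before averaging (as in the "Deep Learning with Differential Privacy" paper), batches have a fixed size, and no privacy accounting is performed. The data and hyperparameters are made up.

```python
import numpy as np


def dp_sgd_step(w, X, y, lr, C, sigma, rng):
    """One DP-SGD step for linear regression with squared loss."""
    # Per-sample gradients of 0.5 * (x.w - y)^2, shape (batch, dim).
    residual = X @ w - y
    grads = residual[:, None] * X
    # Clip every per-sample gradient to L2 norm at most C.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads / np.maximum(1.0, norms / C)
    # Add Gaussian noise N(0, sigma^2 C^2 I) to the sum, then average.
    noise = rng.normal(0.0, sigma * C, size=w.shape)
    g = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * g, clipped


rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical).
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=256)

w = np.zeros(3)
for _ in range(200):
    batch = rng.choice(len(X), size=64, replace=False)
    w, clipped = dp_sgd_step(w, X[batch], y[batch], lr=0.1, C=1.0, sigma=1.0, rng=rng)

print("learned weights:", w)
```

Clipping bounds how much any single sample can move the model, and the noise scale `sigma * C` is tied to that bound. Libraries such as Opacus also track the resulting (ϵ, δ), which this sketch omits.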

Does this look scary?

Luckily, there are implementations of DP-SGD for both PyTorch and
TensorFlow

Differentially Private NN are just a wrapper away*
* if you carefully choose your privacy parameters
from opacus import PrivacyEngine
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,  # the model you want to train with DP
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=EPOCHS,
    target_epsilon=EPSILON,  # privacy budget
    target_delta=DELTA,
    max_grad_norm=MAX_GRAD_NORM,  # clipping value
)

A few notes on the privacy parameters Choosing ϵ is a tradeoff between the utility of the model and the privacy we want to guarantee.
If we set a low ϵ, we will need to introduce a lot of noise during training. This will degrade the model's performance!
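
The same tradeoff is easy to see on the simpler counting query from the earlier example, used here instead of DP-SGD to keep the sketch short. It assumes a sensitivity of 1 and measures the average error introduced by Laplace noise for a few illustrative values of ϵ.

```python
import numpy as np

rng = np.random.default_rng(0)
sensitivity = 1.0  # assumed sensitivity of a counting query


def mean_abs_error(eps, trials=10_000):
    """Average absolute error added by Laplace noise of scale sensitivity/eps."""
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps, size=trials)
    return float(np.abs(noise).mean())


# Smaller epsilon means stronger privacy but a noisier answer.
errors = {eps: mean_abs_error(eps) for eps in (0.1, 1.0, 10.0)}
for eps, err in errors.items():
    print(f"epsilon={eps:>4}: mean absolute error ~ {err:.2f}")
```

The expected absolute error of Laplace noise equals its scale, sensitivity/ϵ, so dividing ϵ by ten multiplies the typical error by ten.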

THANK YOU! Luca Corbucci https://lucacorbucci.me/

References
1) Evaluating and Testing Unintended Memorization in Neural Networks https://bair.berkeley.edu/blog/2019/08/13/memorization/
2) Scalable Extraction of Training Data from (Production) Language Models https://arxiv.org/pdf/2311.17035
3) Membership Inference Attacks against Machine Learning Models https://arxiv.org/abs/1610.05820
4) A friendly, non-technical introduction to differential privacy https://desfontain.es/blog/friendly-intro-to-differential-privacy.html
5) Deep Learning with Differential Privacy https://arxiv.org/abs/1607.00133
6) Opacus https://opacus.ai/
7) TensorFlow Privacy https://github.com/tensorflow/privacy

References
8) A list of real-world uses of differential privacy https://desfontain.es/blog/real-world-differential-privacy.html
9) Improving Gboard language models via private federated analytics https://research.google/blog/improving-gboard-language-models-via-private-federated-analytics/
10) Learning with Privacy at Scale https://docs-assets.developer.apple.com/ml-research/papers/learning-with-privacy-at-scale.pdf
