Cristina BUTUCEA (CREST-ENSAE), 3 décembre 2021 à 11H30

mercredi 1er décembre 2021
par  Alain Celisse

Local differential privacy has prevailed as the most convenient formalism
to randomize sensitive data via privacy mechanisms (that are Markov
kernels) submitted to some constraints. We address the problem of support
recovery of the sparse mean of a $d-$dimensional Gaussian vector,
observed independently $n$ times, under the additional constraints that we have to
produce and use only $\alpha-$locally differentially private data for
inference. We provide lower and upper bounds on the rate of convergence
for the expected Hamming loss over classes of sparse vectors whose non-zero
coordinates are separated from 0 by a constant $a>0$.

We derive necessary and sufficient conditions (up to log factors) for
support recovery. When we restrict our attention to non-interactive
mechanisms that act independently on each coordinate our lower bound shows
that, contrary to the non-private setting, both exact and almost full
recovery are impossible whatever the value of $a$ in the high-dimensional
regime such that $n \alpha^2/ d^2\lesssim 1$. However, in the regime
$n\alpha^2/d^2\gg \log(n\alpha^2/d^2)\log(d)$ we can exhibit a critical
value $a^*$ (up to a logarithmic factor) where a phase transition occurs.
These results can be improved when allowing for all non-interactive
mechanisms that act globally on all coordinates, in the sense that phase
transitions occur at lower levels.