Statistics¶
Content Explanation¶
In this repo, we include the secure statistics developed for LANCELOT-SELECTED. Specifically, it is a library for performing secure exploratory data analyses on vertically partitioned data and for quantifying privacy leakage.
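To make "vertically partitioned" concrete, the following is a minimal cleartext sketch, not part of the package: two parties hold different attributes of the same records, and a statistic such as the covariance needs both. The party names, record IDs, and the helper `sample_covariance` are hypothetical, and the n - 1 denominator is an assumption chosen to match NumPy's default. The secure library is meant to produce this kind of result without pooling the columns in the clear as done here.

```python
# Hypothetical vertically partitioned data: same records, different attributes.
party_a = {"id1": 1.0, "id2": 3.0, "id3": 2.0}   # attribute x, held by party A
party_b = {"id1": 2.0, "id2": 11.0, "id3": 9.0}  # attribute y, held by party B


def sample_covariance(xs, ys):
    """Cleartext sample covariance with an n - 1 denominator (assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Align the two columns on the record IDs that both parties know.
shared_ids = sorted(party_a.keys() & party_b.keys())
xs = [party_a[i] for i in shared_ids]
ys = [party_b[i] for i in shared_ids]

# This pooling step is exactly what a secure computation would avoid.
cleartext_cov = sample_covariance(xs, ys)
print("Cleartext covariance:", cleartext_cov)
```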
Included scripts are:
generate_data.py: generates data to be used in the statistics computation
utils.py: provides utilities for the scripts
run_statistics.py: runs the desired statistic
Install¶
Install the tno.mpc.mpyc.statistics package using one of the following options.
Personal access token
Deploy tokens
Cloning this repo (developer mode)
Personal access token¶
Generate a personal access token with read_api scope. Instructions are found here.
Install:
python -m pip install tno.mpc.mpyc.statistics --extra-index-url https://__token__:<personal_access_token>@ci.tno.nl/gitlab/api/v4/groups/3209/-/packages/pypi/simple
Deploy tokens¶
Generate a deploy token with read_package_registry scope. Instructions are found here.
Install:
python -m pip install tno.mpc.mpyc.statistics --extra-index-url https://<GITLAB_DEPLOY_TOKEN>:<GITLAB_DEPLOY_PASSWORD>@ci.tno.nl/gitlab/api/v4/groups/3209/-/packages/pypi/simple
Dockerfile¶
FROM python:3.8
ARG GITLAB_DEPLOY_TOKEN
ARG GITLAB_DEPLOY_PASSWORD
RUN python -m pip install tno.mpc.mpyc.statistics --extra-index-url https://$GITLAB_DEPLOY_TOKEN:$GITLAB_DEPLOY_PASSWORD@ci.tno.nl/gitlab/api/v4/groups/3209/-/packages/pypi/simple
Example Usage¶
import numpy as np
from mpyc.runtime import mpc

from tno.mpc.mpyc.statistics import covariance

# Secure fixed-point type: 64 bits in total, 32 of them fractional
secnum = mpc.SecFxp(l=64, f=32)


def get_mpc_data(row_1, row_2):
    # Convert the plaintext values to secure fixed-point numbers
    row_1_mpc = [secnum(x) for x in row_1]
    row_2_mpc = [secnum(y) for y in row_2]
    return row_1_mpc, row_2_mpc


def distribute_data_over_players(row_1_mpc, row_2_mpc):
    # Player 0 secret-shares both rows with all players
    row_1_mpc_shared = mpc.input(row_1_mpc, senders=0)
    row_2_mpc_shared = mpc.input(row_2_mpc, senders=0)
    return row_1_mpc_shared, row_2_mpc_shared


async def covariance_example():
    print("Covariance example")
    row_1 = [1.0, 3.0, 2.0, 1.0, 5.0, 6.0, 3.0]
    row_2 = [2.0, 11.0, 9.0, 0.0, 8.0, 2.0, 2.1]
    row_1_np = np.array(row_1)
    row_2_np = np.array(row_2)
    row_1_mpc, row_2_mpc = get_mpc_data(row_1_np, row_2_np)
    async with mpc:
        row_1_mpc_shared, row_2_mpc_shared = distribute_data_over_players(
            row_1_mpc, row_2_mpc
        )
        secure_cov = covariance(row_1_mpc_shared, row_2_mpc_shared)
        # Reveal the secret-shared result to the players
        revealed_cov = await mpc.output(secure_cov)
    # Reference value computed in the clear with NumPy
    np_cov = np.cov(row_1, row_2)[0][1]
    print("Secure Covariance:", revealed_cov)
    print("Numpy Covariance:", np_cov)


if __name__ == "__main__":
    mpc.run(covariance_example())
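The two printed values may differ slightly, because `mpc.SecFxp(l=64, f=32)` works with fixed-point numbers that carry 32 fractional bits, whereas NumPy uses floating point. The sketch below illustrates only that rounding effect in plain Python. The helpers `to_fixed` and `from_fixed` are hypothetical and are not MPyC's API; MPyC's internal encoding and its rounding during multiplication may differ in detail.

```python
FRACTIONAL_BITS = 32  # matches f=32 in the example above


def to_fixed(x, f=FRACTIONAL_BITS):
    """Hypothetical helper: scale by 2**f and round to the nearest integer."""
    return round(x * 2**f)


def from_fixed(n, f=FRACTIONAL_BITS):
    """Hypothetical helper: map the scaled integer back to a float."""
    return n / 2**f


row_2 = [2.0, 11.0, 9.0, 0.0, 8.0, 2.0, 2.1]
for value in row_2:
    approx = from_fixed(to_fixed(value))
    # Rounding to the nearest multiple of 2**-f costs at most 2**-(f + 1).
    print(value, "->", approx, "error:", abs(approx - value))
```

Values such as 2.0 are represented exactly, while 2.1 picks up a rounding error of at most 2**-33 on input. Arithmetic on the rounded values can add further small deviations, so comparing the secure and NumPy results with a tolerance is more appropriate than testing for equality.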