Statistics

Content Explanation

This repository contains the secure statistics developed for LANCELOT-SELECTED: a library for performing secure exploratory data analysis on vertically partitioned data and for quantifying privacy leakage.
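Vertically partitioned data means that the parties hold different attributes (columns) about the same set of records (rows), linked by a shared identifier. The toy split below is a hypothetical illustration of that layout only. The column names, values, party names and the `split_vertically` helper are not part of this package.

```python
from typing import Dict, List, Tuple

# Hypothetical table type: column name -> column values.
Table = Dict[str, List[float]]

# Hypothetical toy records; names and values are illustrative only.
full_table: Table = {
    "id": [101, 102, 103, 104],
    "age": [34, 51, 29, 62],
    "income": [41.0, 58.5, 37.2, 66.1],
    "has_condition": [0, 1, 0, 1],
}


def split_vertically(
    table: Table, columns_a: List[str], key: str = "id"
) -> Tuple[Table, Table]:
    """Hypothetical helper: split a table by columns.

    Both parties keep the key column so their rows can be aligned; every
    other column goes to exactly one party.
    """
    party_a: Table = {key: list(table[key])}
    party_b: Table = {key: list(table[key])}
    for name, values in table.items():
        if name == key:
            continue
        target = party_a if name in columns_a else party_b
        target[name] = list(values)
    return party_a, party_b


bank, hospital = split_vertically(full_table, columns_a=["age", "income"])
print(sorted(bank))      # ['age', 'id', 'income']
print(sorted(hospital))  # ['has_condition', 'id']
```

Neither party sees the other's columns; a secure statistic such as a covariance between `income` and `has_condition` then has to be computed without pooling the data in the clear.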

Included scripts are:

  • generate_data.py generates data to be used in the statistics computations

  • utils.py provides utility functions shared by the scripts

  • run_statistics.py runs the selected statistic

Install

Install the tno.mpc.mpyc.statistics package using one of the following options.

  • Personal access token

  • Deploy tokens

  • Cloning this repo (developer mode)

Personal access token

  1. Generate a personal access token with read_api scope. Instructions can be found here.

  2. Install

    python -m pip install tno.mpc.mpyc.statistics --extra-index-url https://__token__:<personal_access_token>@ci.tno.nl/gitlab/api/v4/groups/3209/-/packages/pypi/simple
    

Deploy tokens

  1. Generate a deploy token with read_package_registry scope. Instructions can be found here.

  2. Install

    python -m pip install tno.mpc.mpyc.statistics --extra-index-url https://<GITLAB_DEPLOY_TOKEN>:<GITLAB_DEPLOY_PASSWORD>@ci.tno.nl/gitlab/api/v4/groups/3209/-/packages/pypi/simple
    

Dockerfile

FROM python:3.8

ARG GITLAB_DEPLOY_TOKEN
ARG GITLAB_DEPLOY_PASSWORD

RUN python -m pip install tno.mpc.mpyc.statistics --extra-index-url https://$GITLAB_DEPLOY_TOKEN:$GITLAB_DEPLOY_PASSWORD@ci.tno.nl/gitlab/api/v4/groups/3209/-/packages/pypi/simple
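If you prefer to script the installation, for example in CI, the sketch below builds the same extra index URL from environment variables and calls pip through the running interpreter. Only the host, group ID, package name and the GITLAB_DEPLOY_TOKEN / GITLAB_DEPLOY_PASSWORD variable names come from the commands above; the helper functions `build_index_url` and `install` are hypothetical.

```python
import os
import subprocess
import sys
from urllib.parse import quote

# Values taken from the install commands above.
GITLAB_HOST = "ci.tno.nl"
GROUP_ID = 3209
PACKAGE = "tno.mpc.mpyc.statistics"


def build_index_url(username: str, secret: str) -> str:
    """Hypothetical helper: GitLab group PyPI index URL with credentials.

    Credentials are percent-encoded so that characters such as '@', ':' or
    '/' in a token cannot break the URL.
    """
    user = quote(username, safe="")
    password = quote(secret, safe="")
    return (
        f"https://{user}:{password}@{GITLAB_HOST}/gitlab/api/v4/groups/"
        f"{GROUP_ID}/-/packages/pypi/simple"
    )


def install(index_url: str) -> None:
    """Hypothetical helper: run the pip command above with this interpreter."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", PACKAGE,
         "--extra-index-url", index_url]
    )


if __name__ == "__main__":
    # Deploy token: both values come from the environment.
    # For a personal access token, use "__token__" as the username instead.
    token = os.environ.get("GITLAB_DEPLOY_TOKEN")
    password = os.environ.get("GITLAB_DEPLOY_PASSWORD")
    if token and password:
        install(build_index_url(token, password))
    else:
        print("Set GITLAB_DEPLOY_TOKEN and GITLAB_DEPLOY_PASSWORD first.")
```

Reading the credentials from the environment avoids typing them on the command line, and pip decodes the percent-encoded username and password before authenticating.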

Example Usage

import numpy as np
from mpyc.runtime import mpc
from tno.mpc.mpyc.statistics import covariance


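# Secure fixed-point type: 64-bit values with 32 fractional bits.
secnum = mpc.SecFxp(l=64, f=32)


def get_mpc_data(row_1, row_2):
    """Convert two plaintext rows into lists of secure fixed-point numbers."""
    row_1_mpc = [secnum(x) for x in row_1]
    row_2_mpc = [secnum(y) for y in row_2]
    return row_1_mpc, row_2_mpc


def distribute_data_over_players(row_1_mpc, row_2_mpc):
    """Secret-share both rows among the parties, with party 0 as input provider.

    In this demo party 0 supplies both rows; with vertically partitioned data
    each row would typically be input by the party that holds it.
    """
    row_1_mpc_shared = mpc.input(row_1_mpc, senders=0)
    row_2_mpc_shared = mpc.input(row_2_mpc, senders=0)
    return row_1_mpc_shared, row_2_mpc_shared


async def covariance_example():
    print("Covariance example")

    row_1 = [1.0, 3.0, 2.0, 1.0, 5.0, 6.0, 3.0]
    row_2 = [2.0, 11.0, 9.0, 0.0, 8.0, 2.0, 2.1]

    row_1_np = np.array(row_1)
    row_2_np = np.array(row_2)

    row_1_mpc, row_2_mpc = get_mpc_data(row_1_np, row_2_np)

    # The secure computation and the reveal must run while the MPC runtime
    # is active, i.e. inside the "async with mpc" block.
    async with mpc:
        row_1_mpc_shared, row_2_mpc_shared = distribute_data_over_players(
            row_1_mpc, row_2_mpc
        )
        secure_cov = covariance(row_1_mpc_shared, row_2_mpc_shared)
        revealed_cov = await mpc.output(secure_cov)

    # Plaintext reference: off-diagonal entry of numpy's 2x2 covariance matrix.
    np_cov = np.cov(row_1, row_2)[0][1]

    print("Secure covariance:", revealed_cov)
    print("Numpy covariance:", np_cov)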


if __name__ == "__main__":
    mpc.run(covariance_example())
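
By default np.cov normalizes by n - 1 (ddof=1), so the Numpy line in the example prints the sample covariance cov(x, y) = Σ (x_i - x̄)(y_i - ȳ) / (n - 1). The plain-Python sketch below spells out that formula as a reference point. Whether `covariance` from this package uses the same n - 1 normalization is an assumption to check against its documentation; `sample_covariance` is a hypothetical helper, not part of the package.

```python
from typing import Sequence


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Hypothetical helper: unbiased sample covariance, like np.cov(xs, ys)[0][1]."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, divided by n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


row_1 = [1.0, 3.0, 2.0, 1.0, 5.0, 6.0, 3.0]
row_2 = [2.0, 11.0, 9.0, 0.0, 8.0, 2.0, 2.1]
print(sample_covariance(row_1, row_2))  # prints a value very close to 1.5
```

For the example rows the exact value is 9 / 6 = 1.5; floating-point rounding may change the last digits.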

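`mpc.SecFxp(l=64, f=32)` declares secure fixed-point numbers of 64 bits, 32 of which are fractional. As a simplified plaintext model (not MPyC's internal code), such a number can be thought of as the integer round(x · 2^f), so values are represented only up to a resolution of 2^-f; this is one reason the secure covariance may differ slightly from the Numpy result. The helpers `encode` and `decode` and the range check are hypothetical.

```python
from fractions import Fraction

L_BITS = 64  # total bit length, as in mpc.SecFxp(l=64, f=32)
F_BITS = 32  # fractional bits


def encode(x: float, f: int = F_BITS, l: int = L_BITS) -> int:
    """Hypothetical model: scale x by 2**f and round to the nearest integer."""
    n = round(x * 2**f)
    # Rough signed range check for an l-bit integer (simplified assumption).
    if not -(2 ** (l - 1)) <= n < 2 ** (l - 1):
        raise OverflowError(f"{x} does not fit in {l} bits with {f} fractional bits")
    return n


def decode(n: int, f: int = F_BITS) -> float:
    """Hypothetical model: map the scaled integer back to a float."""
    return n / 2**f


print(decode(encode(1.5)))  # 1.5 (exactly representable)
error = abs(Fraction(decode(encode(0.1))) - Fraction(0.1))
print(error <= Fraction(1, 2**33))  # True: rounding error is at most 2**-(f+1)
```

Values with a finite binary expansion of at most 32 fractional bits, such as 1.5, round-trip exactly; others, such as 0.1, are off by at most half a step.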