Building block: Secure Inner Join
Inspired by the work done in the BigMedilytics project. For more information see https://youtu.be/hvBb80eXuZg.
This building block is included in the TNO MPC Python Toolbox.
Protocol description
A visual representation of the protocol is shown below.
Install
Install the tno.mpc.protocols.secure_inner_join package using one of the following options.
Personal access token
Deploy tokens
Cloning this repo (developer mode)
Personal access token
Generate a personal access token with read_api scope. Instructions are found here.
Install:
python -m pip install tno.mpc.protocols.secure_inner_join --extra-index-url https://__token__:<personal_access_token>@ci.tno.nl/gitlab/api/v4/projects/7690/packages/pypi/simple
Deploy tokens
Generate a deploy token with read_package_registry scope. Instructions are found here.
Install:
python -m pip install tno.mpc.protocols.secure_inner_join --extra-index-url https://<GITLAB_DEPLOY_TOKEN>:<GITLAB_DEPLOY_PASSWORD>@ci.tno.nl/gitlab/api/v4/projects/7690/packages/pypi/simple
Dockerfile
FROM python:3.8
ARG GITLAB_DEPLOY_TOKEN
ARG GITLAB_DEPLOY_PASSWORD
RUN python -m pip install tno.mpc.protocols.secure_inner_join --extra-index-url https://$GITLAB_DEPLOY_TOKEN:$GITLAB_DEPLOY_PASSWORD@ci.tno.nl/gitlab/api/v4/projects/7690/packages/pypi/simple
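Usage
The protocol is asymmetric: two parties act as database owners (Alice and Bob in the example below) and a third party (Henri) acts as the helper. To run the protocol, you need to run three separate instances, one per party.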
Note: Identifiers are assumed to be unique.
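Because the protocol assumes unique identifiers, it may be worth checking that assumption on your own data before starting a run. The following is a minimal sketch of such a check. The helper name find_duplicate_identifiers is hypothetical and not part of the tno.mpc.protocols.secure_inner_join package; it accepts any iterable of identifiers, such as the first column of the DataFrame used in the example.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def find_duplicate_identifiers(identifiers: Iterable[Hashable]) -> List[Hashable]:
    """Return every identifier that occurs more than once, in first-seen order.

    Hypothetical helper for illustration; not part of the toolbox.
    """
    counts = Counter(identifiers)
    return [identifier for identifier, count in counts.items() if count > 1]


# Identifier column taken from Alice's example data.
alice_identifiers = ["Thomas", "Michiel", "Bart", "Nicole"]

duplicates = find_duplicate_identifiers(alice_identifiers)
if duplicates:
    raise ValueError(f"Duplicate identifiers found: {duplicates}")
```

With a pandas DataFrame, the same check could presumably be run as find_duplicate_identifiers(df["identifier"]) before constructing the DatabaseOwner.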
example_usage.py
"""
Example usage for performing secure set intersection
Run three separate instances e.g.,
$ python example_usage.py -p Alice
$ python example_usage.py -p Bob
$ python example_usage.py -p Henri
"""
import argparse
import asyncio
from typing import Optional

import pandas as pd

from tno.mpc.communication import Pool
from tno.mpc.protocols.secure_inner_join import DatabaseOwner, Helper


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-p",
        "--player",
        help="Name of the sending player",
        type=str.lower,
        required=True,
        choices=["alice", "bob", "henri"],
    )
    args = parser.parse_args()
    return args


async def main(player_instance):
    await player_instance.run_protocol()
    if player_instance.identifier in player_instance.data_parties:
        print("Gathered shares:")
        print(player_instance.feature_names)
        print(player_instance.shares)


if __name__ == "__main__":
    # Parse arguments and acquire configuration parameters
    args = parse_args()
    player = args.player
    parties = {
        "alice": {"address": "127.0.0.1", "port": 8080},
        "bob": {"address": "127.0.0.1", "port": 8081},
        "henri": {"address": "127.0.0.1", "port": 8082},
    }

    port = parties[player]["port"]
    del parties[player]

    pool = Pool()
    pool.add_http_server(port=port)
    for name, party in parties.items():
        assert "address" in party
        pool.add_http_client(
            name, party["address"], port=party["port"] if "port" in party else 80
        )  # default port=80

    df: Optional[pd.DataFrame] = None
    if player == "henri":
        player_instance = Helper(
            identifier=player,
            pool=pool,
        )
    else:
        if player == "alice":
            df = pd.DataFrame(
                {
                    "identifier": ["Thomas", "Michiel", "Bart", "Nicole"],
                    "feature_A1": [2, -1, 3, 1],
                    "feature_A2": [12.5, 31.232, 23.11, 8.3],
                }
            )
        elif player == "bob":
            df = pd.DataFrame(
                {
                    "identifier": ["Thomas", "Victor", "Bart", "Michiel", "Tariq"],
                    "feature_B1": [5, 231, 30, 40, 42],
                    "feature_B2": [10, 2, 1, 8, 6],
                }
            )
        player_instance = DatabaseOwner(
            identifier=player,
            data=df.to_numpy(dtype="object"),
            feature_names=tuple(df.columns[1:]),
            pool=pool,
        )

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(player_instance))
Run three separate instances specifying the players:
$ python example_usage.py -p Alice
$ python example_usage.py -p Bob
$ python example_usage.py -p Henri
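For orientation, the sketch below computes an ordinary plaintext inner join on the example data, to show which identifiers Alice and Bob have in common. It is not the secure protocol: in the example script the database owners print shares rather than plaintext rows, and nothing here describes how those shares are formed or ordered. The function name plaintext_inner_join is hypothetical and only intended for sanity-checking test data that you are allowed to combine in the clear.

```python
from typing import Dict, Tuple

Row = Tuple[float, ...]

# Example data from example_usage.py, keyed by identifier.
alice_rows: Dict[str, Row] = {
    "Thomas": (2, 12.5),
    "Michiel": (-1, 31.232),
    "Bart": (3, 23.11),
    "Nicole": (1, 8.3),
}
bob_rows: Dict[str, Row] = {
    "Thomas": (5, 10),
    "Victor": (231, 2),
    "Bart": (30, 1),
    "Michiel": (40, 8),
    "Tariq": (42, 6),
}


def plaintext_inner_join(left: Dict[str, Row], right: Dict[str, Row]) -> Dict[str, Row]:
    """Keep identifiers present on both sides and concatenate their features.

    Hypothetical plaintext reference; relies on identifiers being unique,
    which the dictionary keys enforce.
    """
    return {key: left[key] + right[key] for key in left if key in right}


joined = plaintext_inner_join(alice_rows, bob_rows)
for identifier, features in joined.items():
    # Columns: feature_A1, feature_A2, feature_B1, feature_B2
    print(identifier, features)
```

On the example data, only Thomas, Michiel and Bart appear in both tables, so the plaintext join contains three rows.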