Building block: Kaplan-Meier
Inspired by the work done in the CONVINCED project. For more information see https://www.tno.nl/en/tno-insights/articles/privacy-friendly-data-technology-expands-oncology-research-opportunities/.
This building block is included in the TNO MPC Python Toolbox.
Protocol description
A more elaborate protocol description can be found in CONVINCED – Enabling privacy-preserving survival analyses using Multi-Party Computation. In ERCIM News 126 (July 2021), we presented some extra context.
Figure 1. The protocol to securely compute the log-rank statistic for vertically partitioned data. One party (Blue) owns data on patient groups; the other party (Orange) owns data on event times (whether the patient experienced an event, ‘1’, or not, ‘0’, and when it occurred). Protocol outline: Blue encrypts its data using additively homomorphic encryption and sends the encrypted data to Orange. Using the additive homomorphic properties of the encryptions, Orange can securely split its data, without decryption, into the patient groups specified by Blue (1). Orange performs some preparatory local computations (2) and, with the help of Blue, secret-shares the data (3) between Blue, Orange and Purple, where Purple is introduced for efficiency purposes. All parties together securely compute the log-rank statistic associated with the (never revealed) Kaplan-Meier curves (4) and reveal only the final statistical result (5).
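Step (1) relies on the additive homomorphism: from an encryption E(g) of a 0/1 group indicator, Orange can compute E(g * x) = E(g)^x for its own plaintext value x without decrypting anything. The sketch below illustrates this with textbook Paillier encryption and tiny hard-coded primes, and adds one common way to turn a ciphertext into additive shares, as needed for step (3). It is an illustration under stated assumptions, not the package's implementation: the key size, the helper names (encrypt, decrypt, add, scalar_mul) and the masking step are hypothetical, and the real protocol may split and share the data differently.

```python
import math
import random

# Toy Paillier key pair held by Blue. Tiny primes: insecure, illustration only.
P, Q = 293, 433
N = P * Q
N_SQUARED = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse (Python 3.8+); valid because g = N + 1


def encrypt(message):
    """E(m) = (N + 1)^m * r^N mod N^2 with a random r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, message % N, N_SQUARED) * pow(r, N, N_SQUARED) % N_SQUARED


def decrypt(ciphertext):
    """m = L(c^lambda mod N^2) * mu mod N, where L(u) = (u - 1) // N."""
    return (pow(ciphertext, LAMBDA, N_SQUARED) - 1) // N * MU % N


def add(c1, c2):
    """E(a) * E(b) = E(a + b)."""
    return c1 * c2 % N_SQUARED


def scalar_mul(ciphertext, k):
    """E(a)^k = E(k * a) for a plaintext integer k."""
    return pow(ciphertext, k, N_SQUARED)


# Blue's group indicators and Orange's event indicators (example test data).
group_a = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
events = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]

# Blue encrypts its column and sends it to Orange.
encrypted_group = [encrypt(g) for g in group_a]

# (1) Orange keeps each event only for Group A patients, without decrypting.
encrypted_split = [scalar_mul(c, e) for c, e in zip(encrypted_group, events)]

# (2) A local computation on ciphertexts: the number of Group A events.
encrypted_total = encrypt(0)
for c in encrypted_split:
    encrypted_total = add(encrypted_total, c)

# (3) Encryption-to-shares: Orange masks with a random value, Blue decrypts.
orange_share = random.randrange(N)
blue_share = decrypt(add(encrypted_total, encrypt(-orange_share)))
```

Here (orange_share + blue_share) % N recovers 4, the number of Group A events, while Blue only ever decrypts a randomly masked value and Orange never sees a plaintext group indicator.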
Install
Install the tno.mpc.protocols.kaplan_meier package using one of the following options.
Personal access token
Deploy tokens
Cloning this repo (developer mode)
Personal access token
Generate a personal access token with the read_api scope. Instructions can be found in the GitLab documentation. Then install the package:
python -m pip install tno.mpc.protocols.kaplan_meier --extra-index-url https://__token__:<personal_access_token>@ci.tno.nl/gitlab/api/v4/projects/7749/packages/pypi/simple
Deploy tokens
Generate a deploy token with the read_package_registry scope. Instructions can be found in the GitLab documentation. Then install the package:
python -m pip install tno.mpc.protocols.kaplan_meier --extra-index-url https://<GITLAB_DEPLOY_TOKEN>:<GITLAB_DEPLOY_PASSWORD>@ci.tno.nl/gitlab/api/v4/projects/7749/packages/pypi/simple
Dockerfile
FROM python:3.8
ARG GITLAB_DEPLOY_TOKEN
ARG GITLAB_DEPLOY_PASSWORD
RUN python -m pip install tno.mpc.protocols.kaplan_meier --extra-index-url https://$GITLAB_DEPLOY_TOKEN:$GITLAB_DEPLOY_PASSWORD@ci.tno.nl/gitlab/api/v4/projects/7749/packages/pypi/simple
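Usage
The protocol is asymmetric: each party plays a different role. To run it, start three separate instances, one per party: Alice, which holds the event times (Orange in Figure 1), Bob, which holds the patient groups (Blue), and Helper (Purple), which holds no input data.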
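Instead of opening three terminals, the three instances can be started from a single Python process. This is a minimal sketch, assuming the example script lives at script/example_usage.py and accepts the MPyC flags -M (number of parties) and -I (index of this party) plus -p, exactly as in the docstring of the example below; build_commands and launch_all are hypothetical helper names.

```python
import subprocess
import sys

# Party names in MPyC index order (-I0, -I1, -I2), as in the example docstring.
PLAYERS = ("Alice", "Bob", "Helper")
SCRIPT = "script/example_usage.py"  # assumed location of the example script


def build_commands(script=SCRIPT):
    """One command line per party; -M3 tells MPyC there are three parties."""
    return [
        [sys.executable, script, "-M3", f"-I{index}", "-p", name]
        for index, name in enumerate(PLAYERS)
    ]


def launch_all(script=SCRIPT):
    """Start all parties concurrently, wait for them and return their exit codes."""
    processes = [subprocess.Popen(command) for command in build_commands(script)]
    return [process.wait() for process in processes]
```

Calling launch_all() from the repository root should start all parties at once; each process prints its own output. The processes are started concurrently on purpose: the parties wait for each other to connect, so running them one after the other would likely block.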
example_usage.py
""" Example usage for performing Kaplan-Meier analysis Run three separate instances e.g., $ ./script/example_usage.py -M3 -I0 -p Alice $ ./script/example_usage.py -M3 -I1 -p Bob $ ./script/example_usage.py -M3 -I2 -p Helper All but the last argument are passed to MPyC. """ import argparse import asyncio import lifelines import pandas as pd from tno.mpc.communication import Pool from tno.mpc.protocols.kaplan_meier import Alice, Bob, Helper def parse_args(): parser = argparse.ArgumentParser() parser.add_argument( "-p", "--player", help="Name of the sending player", type=str, required=True ) args = parser.parse_args() return args async def main(player_instance): await player_instance.start_protocol() if __name__ == "__main__": # Parse arguments and acquire configuration parameters args = parse_args() player = args.player parties = { "Alice": {"address": "127.0.0.1", "port": 8080}, "Bob": {"address": "127.0.0.1", "port": 8081}, } test_data = pd.DataFrame( { "time": [3, 5, 6, 8, 10, 14, 14, 18, 20, 22, 30, 30], "event": [1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1], "Group A": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0], "Group B": [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1], } ) if player in parties.keys(): port = parties[player]["port"] del parties[player] pool = Pool() pool.add_http_server(port=port) for name, party in parties.items(): assert "address" in party pool.add_http_client( name, party["address"], port=party["port"] if "port" in party else 80 ) # default port=80 if player == "Alice": event_times = test_data[["time", "event"]] player_instance = Alice( identifier=player, data=event_times, pool=pool, ) elif player == "Bob": groups = test_data[["Group A", "Group B"]] player_instance = Bob( identifier=player, data=groups, pool=pool, ) elif player == "Helper": player_instance = Helper(player) else: raise ValueError(f"Unknown player was provided: '{player}'") loop = asyncio.get_event_loop() loop.run_until_complete(main(player_instance)) print("-" * 32) print(player_instance.statistic) 
print("-" * 32) # Validate results event_times = test_data[["time", "event"]] groups = test_data[["Group A", "Group B"]] print( lifelines.statistics.multivariate_logrank_test( event_times["time"], groups["Group B"], event_times["event"] ) ) print("-" * 32)