Building block: Kaplan-Meier

This building block was inspired by work done in the CONVINCED project. For more information, see https://www.tno.nl/en/tno-insights/articles/privacy-friendly-data-technology-expands-oncology-research-opportunities/.

This building block is included in the TNO MPC Python Toolbox.

Protocol description

A more detailed protocol description can be found in "CONVINCED – Enabling privacy-preserving survival analyses using Multi-Party Computation". Additional context is presented in ERCIM News 126 (July 2021).

Kaplan-Meier High Level Overview

Figure 1. The protocol to securely compute the log-rank statistic for vertically partitioned data. One party (Blue) owns data on patient groups; the other party (Orange) owns data on event times (whether the patient experienced an event, ‘1’, or not, ‘0’, and when). Protocol outline: Blue encrypts its data using additively homomorphic encryption and sends the encrypted data to Orange. Using the additive homomorphic properties of the ciphertexts, Orange securely splits its data into the patient groups specified by Blue, without decrypting anything (1). Orange then performs some local preparatory computations (2) and, with the help of Blue, secret-shares the data (3) among Blue, Orange and Purple, where Purple is introduced for efficiency purposes. All parties together securely compute the log-rank statistic associated with the (never revealed) Kaplan-Meier curves (4) and reveal only the final statistical result (5). In the usage example in this README, Orange corresponds to Alice, Blue to Bob, and Purple to Helper.
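For orientation, the quantity revealed in step (5) can be sketched in plaintext. The snippet below is a minimal, non-secure illustration of the standard two-group log-rank chi-squared statistic. It is not the package's implementation, and the function name `logrank_statistic` is hypothetical. It assumes all data is available in one place, which is exactly what the protocol avoids.

```python
def logrank_statistic(times, events, in_group):
    """Plaintext two-group log-rank chi-squared statistic (hypothetical helper).

    times:    observed time per patient
    events:   1 if the patient experienced the event, 0 if censored
    in_group: 1 if the patient belongs to the group of interest, else 0
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Only time points at which at least one event occurred contribute.
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        failing = [i for i in at_risk if times[i] == t and events[i] == 1]
        n = len(at_risk)                         # at risk, both groups
        n_g = sum(in_group[i] for i in at_risk)  # at risk, group of interest
        d = len(failing)                         # events, both groups
        d_g = sum(in_group[i] for i in failing)  # events, group of interest
        observed_minus_expected += d_g - d * n_g / n
        if n > 1:  # the variance term is taken as 0 when one subject remains
            variance += d * (n_g / n) * (1 - n_g / n) * (n - d) / (n - 1)
    return observed_minus_expected**2 / variance


# Same toy data as in the usage example of this README.
times = [3, 5, 6, 8, 10, 14, 14, 18, 20, 22, 30, 30]
events = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
group_a = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(logrank_statistic(times, events, group_a))
```

With two groups the statistic is symmetric: passing the complementary group indicator gives the same value, because the observed-minus-expected sums of the two groups cancel.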

Install

Install the tno.mpc.protocols.kaplan_meier package using one of the following options.

  • Personal access token

  • Deploy tokens

  • Cloning this repo (developer mode)

Personal access token

  1. Generate a personal access token with read_api scope. Instructions are found here.

  2. Install

    python -m pip install tno.mpc.protocols.kaplan_meier --extra-index-url https://__token__:<personal_access_token>@ci.tno.nl/gitlab/api/v4/projects/7749/packages/pypi/simple
    

Deploy tokens

  1. Generate a deploy token with read_package_registry scope. Instructions are found here.

  2. Install

    python -m pip install tno.mpc.protocols.kaplan_meier --extra-index-url https://<GITLAB_DEPLOY_TOKEN>:<GITLAB_DEPLOY_PASSWORD>@ci.tno.nl/gitlab/api/v4/projects/7749/packages/pypi/simple
    

Dockerfile

FROM python:3.8

ARG GITLAB_DEPLOY_TOKEN
ARG GITLAB_DEPLOY_PASSWORD

RUN python -m pip install tno.mpc.protocols.kaplan_meier --extra-index-url https://$GITLAB_DEPLOY_TOKEN:$GITLAB_DEPLOY_PASSWORD@ci.tno.nl/gitlab/api/v4/projects/7749/packages/pypi/simple

Usage

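The protocol is asymmetric: each party plays a different role. Alice holds the event times, Bob holds the group labels, and Helper holds no input data. To run the protocol, start three separate instances, one per party.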

example_usage.py

"""
Example usage for performing Kaplan-Meier analysis
Run three separate instances e.g.,
   $ ./script/example_usage.py -M3 -I0 -p Alice
   $ ./script/example_usage.py -M3 -I1 -p Bob
   $ ./script/example_usage.py -M3 -I2 -p Helper
All but the last argument are passed to MPyC.
"""

import argparse
import asyncio
import lifelines
import pandas as pd

from tno.mpc.communication import Pool

from tno.mpc.protocols.kaplan_meier import Alice, Bob, Helper


def parse_args():
   parser = argparse.ArgumentParser()
   parser.add_argument(
       "-p", "--player", help="Name of the sending player", type=str, required=True
   )
   args = parser.parse_args()
   return args


async def main(player_instance):
   await player_instance.start_protocol()


if __name__ == "__main__":
   # Parse arguments and acquire configuration parameters
   args = parse_args()
   player = args.player
   parties = {
       "Alice": {"address": "127.0.0.1", "port": 8080},
       "Bob": {"address": "127.0.0.1", "port": 8081},
   }

   test_data = pd.DataFrame(
       {
           "time": [3, 5, 6, 8, 10, 14, 14, 18, 20, 22, 30, 30],
           "event": [1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1],
           "Group A": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
           "Group B": [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1],
       }
   )

   if player in parties:
       # Alice and Bob: serve on our own port and register every other
       # listed party as an HTTP client.
       port = parties[player]["port"]
       del parties[player]

       pool = Pool()
       pool.add_http_server(port=port)
       for name, party in parties.items():
           assert "address" in party
           pool.add_http_client(
               name, party["address"], port=party.get("port", 80)
           )  # default port=80
       if player == "Alice":
           event_times = test_data[["time", "event"]]
           player_instance = Alice(
               identifier=player,
               data=event_times,
               pool=pool,
           )
       elif player == "Bob":
           groups = test_data[["Group A", "Group B"]]
           player_instance = Bob(
               identifier=player,
               data=groups,
               pool=pool,
           )
   elif player == "Helper":
       player_instance = Helper(player)
   else:
       raise ValueError(f"Unknown player was provided: '{player}'")

   loop = asyncio.get_event_loop()
   loop.run_until_complete(main(player_instance))

   print("-" * 32)
   print(player_instance.statistic)
   print("-" * 32)

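   # Validate results against a plaintext log-rank test from lifelines
   # (for demonstration only: this uses the combined data locally)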
   event_times = test_data[["time", "event"]]
   groups = test_data[["Group A", "Group B"]]
   print(
       lifelines.statistics.multivariate_logrank_test(
           event_times["time"], groups["Group B"], event_times["event"]
       )
   )
   print("-" * 32)
