secure_inner_join.database_owner module

Module containing the DatabaseOwner class (either Alice or Bob) for performing secure set intersection.

class secure_inner_join.database_owner.DatabaseOwner(*args, identifiers, data, identifiers_phonetic=None, identifiers_phonetic_exact=None, identifier_date=None, identifier_zip6=None, feature_names=(), paillier_scheme=<tno.mpc.encryption_schemes.paillier.paillier.Paillier object>, randomness_length=64, phonetic_algorithm=<function phonem_encode>, lsh_slices=1000, hash_fun=<function sha256_hash_digest>, **kwargs)[source]

Bases: Player

Class for a database owner

class Collection(feature_names=<factory>, intersection_size=None, paillier_scheme=<factory>, randomness=<factory>, share=None)[source]

Bases: object

Nested data class to store received data

feature_names: Dict[str, Tuple[str, ...]]
intersection_size: Optional[int] = None
paillier_scheme: Dict[str, Paillier]
randomness: Dict[str, int]
share: Optional[ndarray[Any, dtype[object_]]] = None

__init__(*args, identifiers, data, identifiers_phonetic=None, identifiers_phonetic_exact=None, identifier_date=None, identifier_zip6=None, feature_names=(), paillier_scheme=<tno.mpc.encryption_schemes.paillier.paillier.Paillier object>, randomness_length=64, phonetic_algorithm=<function phonem_encode>, lsh_slices=1000, hash_fun=<function sha256_hash_digest>, **kwargs)[source]

Initializes a database owner instance

Parameters:
  • identifiers (ndarray[Any, dtype[Any]]) – identifiers to find exactly matching data for

  • data (ndarray[Any, dtype[Any]]) – attributes (feature values) that will end up in the secure inner join

  • identifiers_phonetic (Optional[ndarray[Any, dtype[Any]]]) – identifiers to find matching data for, which may contain phonetic errors

  • identifiers_phonetic_exact (Optional[ndarray[Any, dtype[Any]]]) – exact identifiers to append to phonetic encoding

  • identifier_date (Optional[ndarray[Any, dtype[Any]]]) – date identifiers (e.g. date of birth) to find matching data for, which may contain errors. Each value should be of the form dd-mm-yyyy

  • identifier_zip6 (Optional[ndarray[Any, dtype[Any]]]) – zip6 code identifiers to find matching data for, which may contain errors. Each value should be of the form 1234AB

  • feature_names (Tuple[str, ...]) – optional names of the shared features

  • paillier_scheme (Paillier) – Instance of a Paillier scheme.

  • randomness_length (int) – number of bits for shared randomness salt

  • phonetic_algorithm (Callable[[str], str]) – phonetic algorithm (function) to use for phonetic matching

  • lsh_slices (int) – number of slices/hyperplanes to construct for Locality-Sensitive Hashing (LSH); a higher number results in higher accuracy

  • hash_fun (Callable[[bytes], bytes]) – hash function used (default sha256).

Raises:

ValueError – raised when helper or data parties are not in the pool.
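
The formats required for identifier_date (dd-mm-yyyy) and identifier_zip6 (1234AB) can be checked before the arrays are handed to DatabaseOwner. The following is a minimal sketch of such a check. The helper names is_valid_date and is_valid_zip6 are hypothetical and not part of this module, and the assumption that the two zip6 letters are uppercase is inferred only from the example 1234AB.

```python
import re
from datetime import datetime

# Hypothetical helpers for illustration; not part of secure_inner_join.
DATE_PATTERN = re.compile(r"\d{2}-\d{2}-\d{4}")
# Assumption: four digits followed by two uppercase letters, as in "1234AB".
ZIP6_PATTERN = re.compile(r"\d{4}[A-Z]{2}")


def is_valid_date(value: str) -> bool:
    """Return True if value has the form dd-mm-yyyy and is a real calendar date."""
    if DATE_PATTERN.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%d-%m-%Y")
    except ValueError:
        return False
    return True


def is_valid_zip6(value: str) -> bool:
    """Return True if value has the assumed zip6 form, e.g. 1234AB."""
    return ZIP6_PATTERN.fullmatch(value) is not None
```

Running such checks locally, before any data is hashed or encrypted, makes formatting problems visible while the plaintext is still available for inspection.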

encode_lsh_data()[source]

Encode the Locality-Sensitive Hashing identifiers of the dataset

Return type:

None
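
To illustrate what "slices/hyperplanes" means in Locality-Sensitive Hashing, the sketch below implements generic random-hyperplane LSH using only the standard library. It is not the encoding used by this module: how dates and zip6 codes are turned into numeric vectors, and how the hyperplanes are seeded, are not documented here, so the functions make_hyperplanes, lsh_signature and hamming_distance and the seed parameter are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def make_hyperplanes(num_slices: int, dim: int, seed: int) -> List[List[float]]:
    """Draw num_slices random hyperplane normals; a fixed seed makes them reproducible."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slices)]


def lsh_signature(vector: Sequence[float], hyperplanes: List[List[float]]) -> List[int]:
    """One bit per hyperplane: 1 if the vector lies on its non-negative side, else 0."""
    return [
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    ]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))
```

With more hyperplanes, the Hamming distance between two signatures becomes a finer-grained estimate of the angle between the underlying vectors, which is consistent with the statement that a higher lsh_slices value gives higher accuracy.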

encode_phonetic_data()[source]

Encode and hash the phonetic identifiers of the dataset

Return type:

None

encrypt_data()[source]

Encrypts this party's own data by hashing the identifier column using the shared randomness and by Paillier-encrypting the feature values.

Return type:

None

property feature_names: Tuple[str, ...]

The feature names of the inner join (same order for all data parties).

Returns:

Tuple of feature names.

generate_shares()[source]

Generates random additive shares for all other data parties.

Return type:

None
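
For readers unfamiliar with additive sharing, the following minimal sketch shows the general idea: a value is split into random summands that reveal nothing individually but add up to the original value. The modulus, the function names and the use of the secrets module are assumptions for illustration; the module's own share generation (which works on arrays of feature values) may differ.

```python
import secrets
from typing import List

MODULUS = 2**64  # assumed modulus, chosen only for this illustration


def make_additive_shares(value: int, num_shares: int, modulus: int = MODULUS) -> List[int]:
    """Split value into num_shares random summands that add up to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(num_shares - 1)]
    # The last share is fixed so that all shares together sum to the value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares: List[int], modulus: int = MODULUS) -> int:
    """Recombine additive shares into the original value."""
    return sum(shares) % modulus
```

Any proper subset of the shares is uniformly random, so a single party learns nothing about the shared value from its own share alone.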

hash_data()[source]

Hash the identifiers of the dataset using the shared randomness.

Return type:

None
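
A salted hash of an identifier can be sketched as follows. How this module actually combines the shared randomness with the identifier bytes is not specified in this reference, so prepending the salt as big-endian bytes, and the function name salted_hash, are assumptions.

```python
import hashlib


def salted_hash(identifier: str, shared_randomness: int) -> bytes:
    """Hash an identifier together with an integer salt (illustrative construction)."""
    # Assumption: the integer salt is serialised big-endian and prepended.
    num_bytes = max(1, (shared_randomness.bit_length() + 7) // 8)
    salt = shared_randomness.to_bytes(num_bytes, "big")
    return hashlib.sha256(salt + identifier.encode("utf-8")).digest()
```

Because all data parties use the same salt, equal identifiers produce equal digests and can be matched by the helper, while a party that does not know the salt cannot precompute digests for guessed identifiers.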

property intersection_size: int

The intersection size as determined by the helper.

Returns:

Intersection size.

Raises:

ValueError – raised when there is no intersection size available yet.

async receive_all_feature_names()[source]

Receive the feature names of all other data parties.

Return type:

None

async receive_all_paillier_schemes()[source]

Receive the Paillier schemes of all other parties, thereby making encryption with their public keys possible.

Return type:

None

async receive_all_randomness()[source]

Receive randomness from the other data owners, to be used in the salted hash.

Return type:

None

async receive_and_verify_data_parties()[source]

Receive all data parties, with their addresses and ports, from the helper and verify that they match this party's own data parties tuple exactly (including order).

Raises:

ValueError – In case the data parties do not match exactly (including order).

Return type:

None

async receive_intersection_size()[source]

Receive the computed intersection size from the helper party.

Return type:

None

async receive_share()[source]

Receive an additive share of your own feature values (columns)

Return type:

None

property received_paillier_schemes: Dict[str, Paillier]

The received Paillier schemes of all data parties.

Returns:

A dictionary mapping data party identifiers to Paillier schemes.

Raises:

ValueError – Raised when not all Paillier schemes have been received yet.

async run_protocol()[source]

Run the entire protocol, start to end, in an asynchronous manner

Return type:

None

async send_encrypted_data()[source]

Send the encrypted data to the helper

Return type:

None

async send_feature_names_to_all()[source]

Send the feature names of this party's own dataset to all other data parties.

Return type:

None

async send_hashed_identifiers()[source]

Send the hashed identifiers to the helper

Return type:

None

async send_lsh_identifiers()[source]

Send the encoded Locality-Sensitive Hashing identifiers to the helper

Return type:

None

async send_paillier_scheme_to_all()[source]

Send the Paillier scheme to all other parties, which enables them to encrypt values with your public key. The private key is NOT communicated.

Return type:

None

async send_phonetic_identifiers()[source]

Send the hashed phonetic identifiers to the helper

Return type:

None

async send_randomness_to_all()[source]

Send randomness to the other data owners, to be used in the salted hash.

Return type:

None

async send_shares()[source]

Send the randomly generated shares for all other data parties to the helper party.

Return type:

None

property shared_randomness: int

The shared randomness (sum of own randomness and that of the other parties).

Returns:

Shared randomness.
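
The description above (own randomness plus that of the other parties) can be sketched as below. The function names are hypothetical, and the dictionary of received values mirrors the Dict[str, int] type of Collection.randomness; the party names used as keys are placeholders.

```python
import secrets
from typing import Dict


def draw_randomness(randomness_length: int = 64) -> int:
    """Draw a random integer of at most randomness_length bits."""
    return secrets.randbits(randomness_length)


def combine_randomness(own: int, received: Dict[str, int]) -> int:
    """Sum own randomness with the values received from the other parties."""
    return own + sum(received.values())
```

Since addition is commutative, every data party arrives at the same sum regardless of which contribution is its own, so no further agreement step is needed to obtain a common salt.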

property shares: ndarray[Any, dtype[object_]]

The shares of the complete secure inner join.

Returns:

All secure inner join shares.

Raises:

ValueError – Raised when not all shares are available.

secure_inner_join.database_owner.sha256_hash_digest(bytes_string)[source]

Returns the sha256 hash digest of byte length 32 (256 bits).

Parameters:

bytes_string (bytes) – bytes string to be hashed.

Return type:

bytes

Returns:

The hash digest.
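
A function with this behaviour can be written with the standard library as shown below. This is a sketch matching the documented contract (bytes in, 32-byte digest out), not necessarily the module's own source; the names sha256_digest_sketch and sha3_256_digest are hypothetical. The second function illustrates an alternative with the same Callable[[bytes], bytes] shape that the hash_fun parameter expects.

```python
import hashlib


def sha256_digest_sketch(bytes_string: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of the input."""
    return hashlib.sha256(bytes_string).digest()


def sha3_256_digest(bytes_string: bytes) -> bytes:
    """Alternative hash with the same signature, usable wherever bytes -> bytes is expected."""
    return hashlib.sha3_256(bytes_string).digest()
```

All data parties would need to use the same hash function, since otherwise equal identifiers would no longer produce equal digests.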