secure_inner_join.lsh module

This implements Locality-Sensitive Hashing for dates and zip2-codes.

secure_inner_join.lsh.encode(day, month, year, zip4_code)[source]

Encodes day, month, year and zip2-code as a tuple.

Parameters:
  • day (int) – day of birth

  • month (int) – month of birth

  • year (int) – year of birth

  • zip4_code (int) – the four digits of the postal code

Return type:

Tuple[int, int, int, int]

Returns:

encoded representation

secure_inner_join.lsh.get_hyper_planes(amount=2000, seed=42, mask=False)[source]

Constructs the specified number of hyperplanes using a fixed seed. The dimensions are assumed to be in the following order: (day, month, year, zip2-code).

Parameters:
  • amount (int) – number of hyperplanes to construct

  • seed (int) – seed to use for the random generator

  • mask (bool) – set to True to also generate a bit mask to use for masking

Return type:

Union[ndarray[Any, dtype[int64]], Tuple[ndarray[Any, dtype[int64]], bitarray]]

Returns:

array containing the random hyperplanes, together with the bit mask if mask is True
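To illustrate seeded hyperplane construction, here is a minimal sketch that uses only the standard library. It is not the module's implementation: the helper name `sample_thresholds`, the use of `random.Random`, and the choice to draw one integer coordinate per dimension are assumptions made for illustration. The per-dimension ranges are taken from the hyper_planes parameter description of lsh_hash.

```python
import random

# Per-dimension ranges in the order (day, month, year, zip2-code),
# as stated in the lsh_hash parameter description.
RANGES = ((0, 62), (0, 12), (0, 100), (10, 100))


def sample_thresholds(amount=2000, seed=42):
    """Hypothetical helper: draw `amount` points, one coordinate per dimension."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    return [
        tuple(rng.randrange(low, high) for low, high in RANGES)
        for _ in range(amount)
    ]


planes_a = sample_thresholds(amount=5, seed=42)
planes_b = sample_thresholds(amount=5, seed=42)
print(planes_a == planes_b)  # the same seed yields the same hyperplanes
```

Because the generator is seeded, two independent calls with the same arguments produce identical hyperplanes, which is what makes hashes computed in separate runs comparable.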

secure_inner_join.lsh.lsh_hash(day, month, year, zip4_code, hyper_planes, bit_mask=None)[source]

Computes a hash encoding for an encoded input, using a given collection of hyperplanes.

Parameters:
  • day (int) – day of birth

  • month (int) – month of birth

  • year (int) – year of birth

  • zip4_code (int) – the four digits of the postal code

  • hyper_planes (ndarray[Any, dtype[int64]]) – \(n\) hyperplanes sampled from \([0,62)\times[0,12)\times[0,100)\times[10,100)\)

  • bit_mask (Optional[bitarray]) – bit mask to apply to the hash

Return type:

bitarray

Returns:

an encoded hash in which the first \(n\) bits belong to day, the second \(n\) bits belong to month, etc.
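The bit layout described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the helper `threshold_hash`, the "value >= threshold" comparison rule, and the use of plain lists instead of a bitarray are all hypothetical. Only the grouping of bits per dimension follows the description of the return value.

```python
def threshold_hash(values, planes):
    """Hypothetical helper: one bit per (dimension, hyperplane) pair.

    `values` is an already-encoded (day, month, year, zip2-code) tuple and
    `planes` is a list of n points with one coordinate per dimension.
    The n bits of each dimension are stored contiguously.
    """
    bits = []
    for dim, value in enumerate(values):
        # Assumed rule: a bit is 1 when the value lies on or above the threshold.
        bits.extend(1 if value >= plane[dim] else 0 for plane in planes)
    return bits


planes = [(10, 5, 50, 40), (20, 8, 70, 60), (30, 2, 90, 80)]  # n = 3
hashed = threshold_hash((15, 6, 85, 55), planes)
print(hashed)  # [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]
```

With n = 3 hyperplanes the result has 12 bits: positions 0 to 2 belong to day, 3 to 5 to month, 6 to 8 to year, and 9 to 11 to the zip2-code.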

secure_inner_join.lsh.weighted_hamming_distance(hash_1, hash_2)[source]

If score ~= 1, then we expect at most one element to be one-off.

The score represents the actual distance between two encodings if the number of buckets is large enough.

Parameters:
  • hash_1 (bitarray) – first hash

  • hash_2 (bitarray) – second hash

Return type:

Tuple[float, Tuple[float, float, float, float]]

Returns:

an x-off distance score, and a tuple of x-off distances per (day, month, year, zip2)
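As a rough illustration of a per-field distance, the sketch below counts differing bits in each of the four equally sized segments of two hashes. It does not reproduce the weighting used by weighted_hamming_distance, which this documentation does not specify. The helper name `per_field_hamming` and the use of plain lists instead of bitarrays are assumptions.

```python
def per_field_hamming(hash_1, hash_2, fields=4):
    """Hypothetical helper: unweighted Hamming distance per field.

    Both hashes are split into `fields` contiguous segments of equal length,
    in the order (day, month, year, zip2).
    """
    if len(hash_1) != len(hash_2) or len(hash_1) % fields != 0:
        raise ValueError("hashes must have equal length divisible by `fields`")
    n = len(hash_1) // fields  # bits per field
    return tuple(
        sum(a != b for a, b in zip(hash_1[i * n:(i + 1) * n],
                                   hash_2[i * n:(i + 1) * n]))
        for i in range(fields)
    )


hash_a = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]
hash_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]
distances = per_field_hamming(hash_a, hash_b)
print(distances)  # (1, 0, 0, 0): only the day segment differs
```

Reporting the counts per field, rather than a single total, shows which of day, month, year, or zip2 is responsible for a mismatch.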