secure_inner_join.lsh module

This implements Locality-Sensitive Hashing for dates and zip2-codes.

secure_inner_join.lsh.encode(day, month, year, zip4_code)[source]

Encodes day, month, year and zip2-code as a tuple.

Parameters:
  • day (int) – day of birth

  • month (int) – month of birth

  • year (int) – year of birth

  • zip4_code (int) – the four digits of the postal code

Return type:

tuple[int, int, int, int]

Returns:

encoded representation

secure_inner_join.lsh.get_hyper_planes(amount=2000, seed=42, mask=False)[source]

Construct a specified number of hyper planes with a set seed. We assume the following order: (day, month, year, zip2-code).

Parameters:
  • amount (int) – number of hyper planes to construct

  • seed (int) – seed to use for the random generator

  • mask (bool) – set to True to also generate a bit mask to use for masking

Return type:

ndarray[Any, dtype[int64]] | tuple[ndarray[Any, dtype[int64]], bitarray]

Returns:

array containing the random hyper planes; if mask is True, a tuple of that array and the bit mask
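
As a rough illustration of seeded hyper plane construction, the sketch below draws `amount` integer points with a seeded generator. The name `sample_hyper_planes` is hypothetical, the ranges are borrowed from the `hyper_planes` description of `lsh_hash`, and a plain list of tuples stands in for the NumPy array; the library may sample differently.

```python
import random

# Assumed half-open sampling ranges in the order (day, month, year, zip2).
RANGES = ((0, 62), (0, 12), (0, 100), (10, 100))


def sample_hyper_planes(amount=2000, seed=42):
    """Hypothetical stand-in: draw `amount` points, one coordinate per dimension."""
    rng = random.Random(seed)  # the same seed reproduces the same planes
    return [
        tuple(rng.randrange(lo, hi) for lo, hi in RANGES)
        for _ in range(amount)
    ]


planes_a = sample_hyper_planes(amount=5, seed=42)
planes_b = sample_hyper_planes(amount=5, seed=42)
```

Because the generator is seeded, `planes_a` and `planes_b` are identical, so hashes computed against either set are comparable.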

secure_inner_join.lsh.lsh_hash(day, month, year, zip4_code, hyper_planes, bit_mask=None)[source]

Computes a hash encoding for a given encoded input, given a collection of hyperplanes.

Parameters:
  • day (int) – day of birth

  • month (int) – month of birth

  • year (int) – year of birth

  • zip4_code (int) – the four digits of the postal code

  • hyper_planes (ndarray[Any, dtype[int64]]) – $n$ hyperplanes sampled from $[0,62) \times [0,12) \times [0,100) \times [10,100)$

  • bit_mask (bitarray | None) – masking to apply to the hashing

Return type:

bitarray

Returns:

an encoded hash: the first $n$ bits belong to day, the second $n$ bits belong to month, etc.
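
The block layout of the returned hash can be sketched in plain Python. This assumes that each bit records whether an encoded value exceeds the matching coordinate of one hyperplane. That comparison rule, the name `sketch_hash`, and the use of a list of 0/1 values instead of a `bitarray` are illustrative assumptions, not confirmed library behaviour.

```python
def sketch_hash(encoded, hyper_planes):
    """Hypothetical: n bits per dimension, with the dimension blocks concatenated.

    encoded      -- tuple (day, month, year, zip2) of already-encoded values
    hyper_planes -- sequence of n 4-tuples in the same order
    """
    bits = []
    for dim, value in enumerate(encoded):
        # Block `dim` holds one bit per hyperplane for this dimension.
        bits.extend(1 if value > plane[dim] else 0 for plane in hyper_planes)
    return bits


planes = [(10, 3, 50, 40), (30, 6, 80, 70), (50, 9, 20, 90)]
bits = sketch_hash((31, 6, 85, 35), planes)
# bits[0:3] is the day block, bits[3:6] the month block, and so on.
```

With three hyperplanes the result has 12 bits, three per dimension.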

secure_inner_join.lsh.weighted_hamming_distance(hash_1, hash_2)[source]

If the score is approximately 1, then we expect at most one element to be off by one.

The score represents the actual distance between two encodings if the number of buckets is large enough.

Parameters:
  • hash_1 (bitarray) – first hash

  • hash_2 (bitarray) – second hash

Return type:

tuple[float, tuple[float, float, float, float]]

Returns:

an x-off distance score, and a tuple of x-off distances per (day, month, year, zip2)
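
One way to read an x-off distance: if each dimension's block of bits comes from thresholds drawn uniformly from that dimension's range, then the fraction of differing bits in a block, multiplied by the width of the range, estimates how far apart the two values are. The sketch below relies on that assumption. The name `sketch_weighted_hamming`, the range widths, and the choice to sum the per-dimension values into the overall score are all hypothetical.

```python
# Assumed range widths in the order (day, month, year, zip2).
WIDTHS = (62, 12, 100, 90)


def sketch_weighted_hamming(hash_1, hash_2, widths=WIDTHS):
    """Hypothetical: per-block Hamming distance scaled to each dimension's range."""
    if not hash_1 or len(hash_1) != len(hash_2) or len(hash_1) % len(widths):
        raise ValueError("hashes must be non-empty, equally long, and split evenly per dimension")
    n = len(hash_1) // len(widths)  # bits per dimension
    per_dim = []
    for i, width in enumerate(widths):
        block_1 = hash_1[i * n:(i + 1) * n]
        block_2 = hash_2[i * n:(i + 1) * n]
        differing = sum(a != b for a, b in zip(block_1, block_2))
        per_dim.append(differing / n * width)
    return sum(per_dim), tuple(per_dim)


hash_a = [1, 0, 1, 1, 0, 0, 1, 0]
hash_b = [1, 1, 1, 1, 0, 0, 1, 0]
score, per_dimension = sketch_weighted_hamming(hash_a, hash_b)
```

With only two bits per dimension the estimate is very coarse; it becomes meaningful only when the number of hyperplanes is large.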