cox_regression.survival_stacking module

Implementation of survival stacking as described in https://arxiv.org/pdf/2107.13480.pdf.

cox_regression.survival_stacking.stack(covariates, times, failed, ids=None, time_bins=None)[source]

This function stacks a dataset as described in https://arxiv.org/pdf/2107.13480.pdf. The input is given as separate numpy arrays: for the patient at index i, covariates[i] holds the covariates, times[i] the failure or censoring time, and failed[i] the event indicator. All arrays must have the same length. Depending on the input, the function handles time-varying covariates (when ids is provided) and/or discretization (when time_bins is provided).

With discretization, each time interval is represented by the situation at its beginning. For example, if a covariate changes within a time interval, the change is recorded at the start of the next interval. Hence, if a covariate changes twice during an interval, the intermediate value is lost. Patients censored within an interval are recorded as having survived that interval.
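To make the idea of stacking concrete, here is a minimal sketch of continuous-time survival stacking without time-varying covariates or discretization. The helper name naive_stack and the layout of its output (covariates followed by one indicator column per risk set) are assumptions chosen for illustration; they are not taken from this module and may differ from what stack() returns. The sketch also assumes that at least one failure is present.

```python
import numpy as np


def naive_stack(covariates, times, failed):
    """Illustrative continuous-time stacking; NOT this module's implementation."""
    covariates = np.asarray(covariates, dtype=float)
    if covariates.ndim == 1:
        covariates = covariates[:, None]
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)

    # Sorted distinct failure times; each one defines a risk set.
    event_times = np.unique(times[failed])

    blocks, targets = [], []
    for k, t in enumerate(event_times):
        # Patients still under observation at time t.
        at_risk = times >= t
        n_at_risk = int(at_risk.sum())

        # One-hot column marking which risk set a stacked row belongs to.
        indicators = np.zeros((n_at_risk, len(event_times)))
        indicators[:, k] = 1.0

        blocks.append(np.hstack([covariates[at_risk], indicators]))
        # Target is True only for patients who fail exactly at time t.
        targets.append(failed[at_risk] & (times[at_risk] == t))

    return np.vstack(blocks), np.concatenate(targets)


# Three patients, one covariate each; the second patient is censored.
X = np.array([[1.0], [2.0], [3.0]])
t = np.array([5.0, 8.0, 10.0])
d = np.array([True, False, True])
stacked, target = naive_stack(X, t, d)
```

In this toy example the first risk set (at time 5) contains all three patients and the second (at time 10) contains only the third, so the stacked array has four rows. The binary target can then be modelled with any classifier.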

Parameters:
  • covariates (ndarray[Any, dtype[int64 | float64]]) – The covariates of the patients. Can have multiple columns.

  • times (ndarray[Any, dtype[int64 | float64]]) – The failure/censoring times.

  • failed (ndarray[Any, dtype[bool_]]) – The event indicators. Should contain boolean values.

  • ids (ndarray[Any, dtype[int64]] | None) – The patient ids. Can be used to specify time-varying covariates. The id is unique per patient and a patient can have multiple rows. However, a patient id can have only one failure.

  • time_bins (ndarray[Any, dtype[float64]] | None) – If provided, a discrete stacker is used. The parameter should contain the starting times of each time interval. E.g. [0, 200, 400, 600] denotes time intervals 0-200, 200-400, and 400-600. Its first value must be zero, and its largest value must be greater than the largest failure/censoring time.
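The following sketch shows one way the time_bins convention can be read, using the [0, 200, 400, 600] example from the parameter description. It only illustrates how times map to intervals and which rows a discretized patient would contribute; it does not call this module. Treating a time that falls exactly on a boundary as belonging to the interval that starts there is an assumption, since the documentation does not state which side of an interval is closed.

```python
import numpy as np

time_bins = np.array([0.0, 200.0, 400.0, 600.0])
times = np.array([150.0, 200.0, 450.0])
failed = np.array([True, False, True])

# Interval index per patient: 0 -> 0-200, 1 -> 200-400, 2 -> 400-600.
# Assumption: a time equal to a boundary falls into the interval starting there.
interval = np.searchsorted(time_bins, times, side="right") - 1

# Each patient contributes one row per interval up to and including their last one.
# The target is True only in the interval where a failure occurs; a patient
# censored within an interval is recorded as having survived that interval.
rows = []
for patient, (k, event) in enumerate(zip(interval, failed)):
    for j in range(int(k) + 1):
        rows.append((patient, j, bool(event) and j == int(k)))
```

Here the second patient, censored at time 200, contributes two rows that both have a False target, while the third patient contributes three rows with a True target only in the last one.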

Return type:

tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[bool_]]]

Returns:

The stacked data set in the form of a multidimensional array containing the input data and a vector containing the target data.

Raises:

ValueError – if the parameter values are inconsistent or not as specified above.
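The exact checks that trigger this error are not listed here. As a rough sketch, the constraints stated in the parameter descriptions could be verified before calling stack() along the following lines. The helper check_stack_inputs is hypothetical and not part of this module, and the error messages are invented for illustration.

```python
import numpy as np


def check_stack_inputs(covariates, times, failed, ids=None, time_bins=None):
    """Hypothetical pre-checks mirroring the documented constraints."""
    covariates, times, failed = map(np.asarray, (covariates, times, failed))

    # All input arrays must have the same length.
    if not (len(covariates) == len(times) == len(failed)):
        raise ValueError("covariates, times and failed must have the same length")

    # The event indicators should be boolean.
    if failed.dtype != np.bool_:
        raise ValueError("failed must contain boolean values")

    if ids is not None:
        ids = np.asarray(ids)
        if len(ids) != len(failed):
            raise ValueError("ids must have the same length as the other arrays")
        # A patient id can have only one failure.
        failing_ids = ids[failed]
        if len(np.unique(failing_ids)) != len(failing_ids):
            raise ValueError("each patient id may have at most one failure")

    if time_bins is not None:
        time_bins = np.asarray(time_bins, dtype=float)
        # The first value must be zero.
        if time_bins.size == 0 or time_bins[0] != 0:
            raise ValueError("time_bins must start at zero")
        # The largest value must exceed the largest failure/censoring time.
        if time_bins.max() <= times.max():
            raise ValueError("largest time_bins value must exceed the largest time")
```

Running such checks up front gives a more specific message than a generic failure deeper in the stacking logic, but the library's own validation remains authoritative.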