cox_regression.survival_stacking module
Implementation of survival stacking as described in https://arxiv.org/pdf/2107.13480.pdf.
- cox_regression.survival_stacking.stack(covariates, times, failed, ids=None, time_bins=None)[source]
This function stacks a dataset as described in https://arxiv.org/pdf/2107.13480.pdf. The input consists of separate numpy arrays: for patient i, covariates[i] holds the covariates, times[i] the failure or censoring time, and failed[i] the event indicator. All arrays must have the same length. Depending on the input, the function handles time-varying covariates (when ids is provided) and/or discretization (when time_bins is provided). With discretization, the state at the beginning of each time interval is always used. For example, if a covariate changes within a time interval, the change is recorded at the start of the next interval. Hence, if a covariate changes twice during an interval, the intermediate value is lost. Patients censored within an interval are recorded as having survived that interval.
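To make the idea of stacking concrete, the following is a minimal sketch of the basic case only (no time-varying covariates, no discretization). It is not this module's implementation: the name stack_sketch is hypothetical, and the layout of the output (covariates followed by one indicator column per distinct failure time) is an assumption based on the general technique in the referenced paper.

```python
import numpy as np

def stack_sketch(covariates, times, failed):
    """Illustrative survival stacking for the simplest case (hypothetical helper)."""
    covariates = np.asarray(covariates, dtype=float)
    if covariates.ndim == 1:
        covariates = covariates[:, None]  # treat a 1-D input as a single column
    times = np.asarray(times)
    failed = np.asarray(failed, dtype=bool)
    if not (len(covariates) == len(times) == len(failed)):
        raise ValueError("all input arrays must have the same length")

    event_times = np.unique(times[failed])  # sorted distinct failure times
    if event_times.size == 0:
        raise ValueError("at least one failure is needed to build risk sets")

    blocks, targets = [], []
    for k, t in enumerate(event_times):
        at_risk = times >= t  # patients still under observation at time t
        indicators = np.zeros((int(at_risk.sum()), event_times.size))
        indicators[:, k] = 1.0  # marks which risk set the rows belong to
        blocks.append(np.hstack([covariates[at_risk], indicators]))
        # Target is True only for patients who fail exactly at time t.
        targets.append(failed[at_risk] & (times[at_risk] == t))
    return np.vstack(blocks), np.concatenate(targets)

# Three patients, one covariate each; the second patient is censored.
X, y = stack_sketch(
    covariates=np.array([[1.0], [2.0], [3.0]]),
    times=np.array([5, 8, 10]),
    failed=np.array([True, False, True]),
)
print(X.shape)  # one covariate column plus two risk-set indicator columns
print(y)
```

Each distinct failure time contributes one block of rows, one row per patient still at risk, so the stacked data set can be passed to any binary classifier.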
- Parameters:
covariates (ndarray[Any, dtype[int64 | float64]]) – The covariates of the patients. Can have multiple columns.
times (ndarray[Any, dtype[int64 | float64]]) – The failure/censoring times.
failed (ndarray[Any, dtype[bool_]]) – The event indicators. Should contain boolean values.
ids (ndarray[Any, dtype[int64]] | None) – The patient ids. Can be used to specify time-varying covariates. The id is unique per patient, and a patient can have multiple rows. However, a patient id can have only one failure.
time_bins (ndarray[Any, dtype[float64]] | None) – If provided, a discrete stacker is used. The parameter should contain the starting times of each time interval. E.g. [0, 200, 400, 600] denotes time intervals 0-200, 200-400, and 400-600. Its first value must be zero, and its largest value must exceed the largest failure/censoring time.
- Return type:
tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[bool_]]]
- Returns:
The stacked data set in the form of a multidimensional array containing the input data and a vector containing the target data.
- Raises:
ValueError – if the parameter values are inconsistent or not as specified above.
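The constraints on time_bins described above can be sketched as follows. This is an illustration, not the module's own validation code: the helper names check_time_bins and interval_index are hypothetical, the requirement that the bin edges be strictly increasing is an assumption, and so is the choice that a time exactly on a boundary belongs to the interval starting there. Times are assumed to be non-negative.

```python
import numpy as np

def check_time_bins(time_bins, times):
    """Validate bin start times against the documented rules (hypothetical helper)."""
    time_bins = np.asarray(time_bins, dtype=float)
    times = np.asarray(times, dtype=float)
    if time_bins.ndim != 1 or time_bins.size < 2:
        raise ValueError("time_bins must be a 1-D array with at least two values")
    if time_bins[0] != 0:
        raise ValueError("the first value of time_bins must be zero")
    if np.any(np.diff(time_bins) <= 0):
        # Assumption: the edges are expected to be strictly increasing.
        raise ValueError("time_bins must be strictly increasing")
    if time_bins[-1] <= times.max():
        raise ValueError("the largest time bin must exceed the largest time")
    return time_bins

def interval_index(time_bins, times):
    """Return, for each time, the index of the interval that contains it."""
    # side="right" puts a time equal to an edge into the interval starting there.
    return np.searchsorted(time_bins, times, side="right") - 1

bins = check_time_bins([0, 200, 400, 600], times=[150, 200, 599])
idx = interval_index(bins, np.array([150.0, 200.0, 599.0]))
print(idx)
```

With the bins [0, 200, 400, 600], a time of 150 falls in the first interval (0-200) and a time of 599 in the last (400-600).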