cox_regression.survival_stacking module
This module stacks a survival dataset so that it can be used with a classification method. See https://arxiv.org/pdf/2107.13480.pdf for more information.
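The idea of stacking can be illustrated with a minimal NumPy sketch. It is not this module's implementation: the function name `stack_sketch` and the exact column layout (covariates followed by one indicator column per distinct failure time) are assumptions made for illustration, based on the general scheme in the referenced paper. For every distinct failure time, the sketch builds one block holding the patients still at risk, and the target marks who fails at that time.

```python
import numpy as np


def stack_sketch(covariates, times, events):
    """Illustrative survival stacking (hypothetical helper, not the library code)."""
    covariates = np.asarray(covariates, dtype=float)
    if covariates.ndim == 1:
        covariates = covariates[:, None]  # treat a vector as a single column
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    # Sorted distinct times at which at least one failure occurs.
    event_times = np.unique(times[events])
    if event_times.size == 0:
        return np.empty((0, covariates.shape[1])), np.empty(0, dtype=bool)

    blocks, targets = [], []
    for k, t in enumerate(event_times):
        at_risk = times >= t  # patients not yet failed or censored before t
        indicators = np.zeros((int(at_risk.sum()), event_times.size))
        indicators[:, k] = 1.0  # marks which block each row belongs to
        blocks.append(np.hstack([covariates[at_risk], indicators]))
        # Target is True only for patients who fail exactly at time t.
        targets.append(events[at_risk] & (times[at_risk] == t))
    return np.vstack(blocks), np.concatenate(targets)


# Three patients; the second one is censored at time 2.
X, y = stack_sketch([[1.0], [2.0], [3.0]], [1.0, 2.0, 3.0], [True, False, True])
# X has 4 rows: all three patients at time 1, only the third patient at time 3.
# y is [True, False, False, True].
```

The stacked pair `(X, y)` can then be passed to any binary classifier that accepts a feature matrix and a boolean target.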
- cox_regression.survival_stacking.stack(covariates, times, events, time_bins=None)
This function stacks a dataset as described in https://arxiv.org/pdf/2107.13480.pdf. The input is given as separate NumPy arrays: for the first patient, the covariates are in covariates[0], the failure or censoring time is in times[0], and the event indicator is in events[0]. All these arrays must have the same length. Time bins allow for discretized stacking, where there is one stacked block for each time bin.
- Parameters:
covariates (ndarray[tuple[int, ...], dtype[floating[Any]]]) – The covariates of the patients. Can have multiple columns.
times (ndarray[tuple[int, ...], dtype[floating[Any]]]) – The failure or censoring time of each patient.
events (ndarray[tuple[int, ...], dtype[bool]]) – The event indicators. Should contain boolean values.
time_bins (ndarray[tuple[int, ...], dtype[floating[Any]]] | None) – If provided, a discrete stacker is used. The parameter should contain the boundaries of the time intervals. E.g. [0, 200, 400, 600] denotes time intervals 0-200, 200-400, and 400-600. Time bins are closed on the left and open on the right. That is, 200-400 includes 200 but not 400. Its first value must be zero, and its largest value must be bigger than the biggest failure/censoring time value.
- Return type:
tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], ndarray[tuple[int, ...], dtype[bool]]]
- Returns:
The stacked data set in the form of a multidimensional array containing the input data and a vector containing the target data.
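The time_bins convention described above (first value zero, left-closed and right-open bins, last value larger than every failure/censoring time) can be sketched as follows. The helper `assign_time_bins` is hypothetical and not part of this module; the checks for non-negative times and strictly increasing boundaries are additional assumptions of this sketch.

```python
import numpy as np


def assign_time_bins(times, time_bins):
    """Map each time to the index of its bin (hypothetical helper)."""
    times = np.asarray(times, dtype=float)
    time_bins = np.asarray(time_bins, dtype=float)

    if time_bins[0] != 0:
        raise ValueError("the first value of time_bins must be zero")
    if np.any(np.diff(time_bins) <= 0):
        # Assumption of this sketch: boundaries are strictly increasing.
        raise ValueError("time_bins must be strictly increasing")
    if np.any(times < 0):
        raise ValueError("times must be non-negative")
    if times.max() >= time_bins[-1]:
        raise ValueError("the largest value of time_bins must exceed every time")

    # side="right" puts a time equal to a boundary into the bin that starts
    # there, which gives bins that are closed on the left and open on the right.
    return np.searchsorted(time_bins, times, side="right") - 1


bins = assign_time_bins([0.0, 199.9, 200.0, 599.0], [0, 200, 400, 600])
# bins is [0, 0, 1, 2]: 200 falls in the bin 200-400, not in 0-200.
```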
- cox_regression.survival_stacking.stack_time_varying(ids, covariates, start_times, end_times, events, time_bins=None)
This function stacks a time-varying dataset as described in https://arxiv.org/pdf/2107.13480.pdf. The input is given as separate NumPy arrays: for the first interval of the first patient, the patient id is in ids[0], the covariates are in covariates[0], the interval times are in start_times[0] and end_times[0], and the event indicator is in events[0]. All these arrays must have the same length. Time bins allow for discretized stacking, where there is one stacked block for each time bin. When using discretization, the situation at the beginning of a time bin decides the covariates for that time bin. E.g. if a covariate changes within a time bin, this is recorded at the start of the next bin. Hence, if a covariate changes twice during a bin, the intermediate value will be lost. Patients censored within a time bin will be recorded as having survived the bin. When a patient fails in a time bin, it is recorded as having failed at the end of that bin.
- Parameters:
ids (ndarray[tuple[int, ...], dtype[int64]]) – The patient ids. Can be used to specify time-varying covariates. The id is unique per patient and a patient can have multiple rows. However, a patient id can have only one failure.
covariates (ndarray[tuple[int, ...], dtype[floating[Any]]]) – The covariates of the patients. Can have multiple columns.
start_times (ndarray[tuple[int, ...], dtype[floating[Any]]]) – The start times of the time intervals.
end_times (ndarray[tuple[int, ...], dtype[floating[Any]]]) – The end times of the time intervals.
events (ndarray[tuple[int, ...], dtype[bool]]) – The event indicators. Should contain boolean values.
time_bins (ndarray[tuple[int, ...], dtype[floating[Any]]] | None) – If provided, a discrete stacker is used. The parameter should contain the boundaries of the time intervals. E.g. [0, 200, 400, 600] denotes time intervals 0-200, 200-400, and 400-600. Its first value must be zero, and its largest value must be bigger than the biggest failure/censoring time value.
- Return type:
tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], ndarray[tuple[int, ...], dtype[bool]]]
- Returns:
The stacked data set in the form of a multidimensional array containing the input data and a vector containing the target data.
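The discretization rules for time-varying data can be made concrete with a small sketch for a single patient. This is an illustration under stated assumptions, not the module's implementation: the helper `discretize_patient` is hypothetical, it assumes the patient's intervals are contiguous and start at time zero, it reuses the left-closed, right-open bin convention documented for stack, and it omits any time-bin indicator columns the real output may contain.

```python
import numpy as np


def discretize_patient(start_times, end_times, covariates, events, time_bins):
    """Rows of ONE patient -> one row per time bin (hypothetical helper)."""
    start_times = np.asarray(start_times, dtype=float)
    end_times = np.asarray(end_times, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    if covariates.ndim == 1:
        covariates = covariates[:, None]
    events = np.asarray(events, dtype=bool)
    time_bins = np.asarray(time_bins, dtype=float)

    # Sort the patient's intervals chronologically.
    order = np.argsort(start_times)
    start_times, end_times = start_times[order], end_times[order]
    covariates, events = covariates[order], events[order]

    last_time = end_times[-1]    # failure or censoring time
    failed = bool(events.any())  # a patient has at most one failure
    last_bin = int(np.searchsorted(time_bins, last_time, side="right")) - 1

    rows, targets = [], []
    for b in range(last_bin + 1):
        # The interval in effect at the START of the bin decides the covariates.
        row = int(np.searchsorted(start_times, time_bins[b], side="right")) - 1
        rows.append(covariates[row])
        # A failure is attributed to the bin that contains it; a patient
        # censored inside a bin counts as having survived that bin.
        targets.append(failed and b == last_bin)
    return np.array(rows), np.array(targets, dtype=bool)


# The covariate changes at 210, 300 and 390; the patient fails at 450.
X, y = discretize_patient(
    start_times=[0, 210, 300, 390],
    end_times=[210, 300, 390, 450],
    covariates=[[1.0], [2.0], [3.0], [4.0]],
    events=[False, False, False, True],
    time_bins=[0, 200, 400, 600],
)
# X is [[1.0], [1.0], [4.0]]: the intermediate values 2.0 and 3.0 are lost.
# y is [False, False, True].
```

In this example the bin 200-400 still carries the value 1.0, because that was the covariate at time 200; the changes inside the bin only become visible at the start of the bin 400-600.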