utility_metrics.spiderplot module

Module that contains the main interface for users to test the quality of synthetic datasets. It computes all four utility metrics so that they can be visualized together in one spider plot.

utility_metrics.spiderplot.compute_all_metrics(original_data, synthetic_datasets, cols_cat, n_bins=50, frac_training=0.7, random_state=1)

Computes the utility metrics for the synthetic datasets.

Parameters:

original_data (DataFrame) – A pandas dataframe that contains the original data
synthetic_datasets (dict[Any, Any]) – A dictionary containing the synthetic data pandas dataframes. The keys specify the names of the datasets. The order and names of the columns should be the same as in the original dataframe
cols_cat (Sequence[str]) – A list containing the names of the categorical columns
n_bins (int) – The number of bins that is used to discretise the numerical columns
frac_training (float) – The fraction of the dataset that is used for training the model. The rest is used for calculating the accuracy and R-squared scores
random_state (int) – The random seed that is used to split the dataset into training and test sets

Return type:

dict[Any, Any]

Returns:

Dictionary containing the values for the utility metrics
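
The column requirement on synthetic_datasets can be checked before calling compute_all_metrics. The following is a minimal sketch and not part of the library: the helper name find_column_mismatches is hypothetical, and it works on plain lists of column names (for a pandas dataframe df, that would be list(df.columns)). The dataset and column names are made up for illustration.

```python
from typing import Dict, List, Sequence


def find_column_mismatches(
    original_columns: Sequence[str],
    synthetic_columns: Dict[str, Sequence[str]],
) -> List[str]:
    """Hypothetical helper: return the names of the synthetic datasets whose
    columns differ from the original in name or in order."""
    expected = list(original_columns)
    return [
        name
        for name, columns in synthetic_columns.items()
        if list(columns) != expected
    ]


# Illustrative column names; in practice these come from the dataframes.
original_columns = ["age", "income", "city"]
synthetic_columns = {
    "generator_a": ["age", "income", "city"],  # same names, same order
    "generator_b": ["income", "age", "city"],  # same names, different order
}

mismatches = find_column_mismatches(original_columns, synthetic_columns)
print(mismatches)
```

Comparing ordered lists rather than sets is deliberate here, since the documented requirement covers both the names and the order of the columns.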

utility_metrics.spiderplot.compute_spider_plot(metrics_results)

Visualizes the four main metrics of synthetic data quality in a spider plot.

Parameters:

metrics_results (dict[Any, Any]) – A dictionary that maps synthetic data set names to an overview of metric values. Each overview of metrics contains an entry for each key in METRIC_NAMES.

Return type:

Any

Returns:

A Figure object representing the spider plot.
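
The expected shape of metrics_results, one overview per dataset with an entry for every metric key, can be verified with a small check such as the sketch below. It is not part of the library. The helper find_missing_metrics is hypothetical, and EXAMPLE_METRIC_NAMES is a placeholder standing in for the module's METRIC_NAMES, whose actual values are not listed here. The metric values are invented for illustration.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Placeholder keys for illustration only; the real keys are those
# defined in the module's METRIC_NAMES.
EXAMPLE_METRIC_NAMES = ("metric_1", "metric_2", "metric_3", "metric_4")


def find_missing_metrics(
    metrics_results: Mapping[Any, Mapping[str, Any]],
    metric_names: Sequence[str],
) -> Dict[Any, List[str]]:
    """Hypothetical helper: map each dataset name to the metric keys
    that are absent from its overview. Complete overviews are omitted."""
    missing: Dict[Any, List[str]] = {}
    for dataset_name, overview in metrics_results.items():
        absent = [name for name in metric_names if name not in overview]
        if absent:
            missing[dataset_name] = absent
    return missing


# Invented values, shaped like the documented input.
metrics_results = {
    "generator_a": {"metric_1": 0.9, "metric_2": 0.8, "metric_3": 0.7, "metric_4": 0.6},
    "generator_b": {"metric_1": 0.5, "metric_2": 0.4, "metric_3": 0.3},
}

missing = find_missing_metrics(metrics_results, EXAMPLE_METRIC_NAMES)
print(missing)
```

An empty result means every overview is complete with respect to the given metric names.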