utility_metrics.spiderplot module
Module containing the main interface for testing the quality of synthetic datasets. It computes all four utility metrics and visualizes them together in a single spider plot.
- utility_metrics.spiderplot.compute_all_metrics(original_data, synthetic_datasets, cols_cat, n_bins=50, frac_training=0.7, random_state=1)
Compute the utility metrics of the synthetic datasets.
- Parameters:
  - original_data (DataFrame) – A pandas dataframe that contains the original data.
  - synthetic_datasets (dict[Any, Any]) – A dictionary of synthetic datasets stored as pandas dataframes. The keys specify the names of the datasets. Each dataframe should have the same column names, in the same order, as the original dataframe.
  - cols_cat (Sequence[str]) – A list containing the names of the categorical columns.
  - n_bins (int) – The number of bins used to discretise the numerical columns.
  - frac_training (float) – The fraction of the dataset used to train the model. The remainder is used to calculate the accuracy and R-squared scores.
  - random_state (int) – The random seed used to split the dataset into training and test sets.
- Return type: dict[Any, Any]
- Returns: Dictionary containing the values for the utility metrics.
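Because compute_all_metrics expects every synthetic dataframe to share the original's column names and order, it can help to check this before calling it. The sketch below assumes pandas; check_synthetic_columns and split_train_test are hypothetical helpers, not part of utility_metrics.spiderplot. split_train_test only illustrates what frac_training and random_state plausibly control (a seeded random split); the module's own split may be implemented differently.

```python
import pandas as pd

# Hypothetical helpers illustrating the documented input contract of
# compute_all_metrics; they are NOT part of utility_metrics.spiderplot.


def check_synthetic_columns(original_data: pd.DataFrame,
                            synthetic_datasets: dict) -> None:
    """Raise ValueError if a synthetic dataframe's columns differ in name or order."""
    expected = list(original_data.columns)
    for name, df in synthetic_datasets.items():
        if list(df.columns) != expected:
            raise ValueError(
                f"Synthetic dataset {name!r} has columns {list(df.columns)}, "
                f"expected {expected}"
            )


def split_train_test(df: pd.DataFrame, frac_training: float = 0.7,
                     random_state: int = 1):
    """Seeded random split: frac_training of the rows for training, the rest held out.

    Assumed behaviour for illustration only; the real split may differ.
    """
    train = df.sample(frac=frac_training, random_state=random_state)
    test = df.drop(train.index)
    return train, test


# Toy original data with two numerical columns and one categorical column.
original = pd.DataFrame({
    "age": [23, 35, 47, 52, 61, 29, 44, 38, 57, 31],
    "income": [1.2, 2.5, 3.1, 4.0, 2.2, 1.9, 3.6, 2.8, 4.4, 2.0],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B", "A", "C"],
})

# Dictionary keys act as dataset names, as in the synthetic_datasets parameter.
synthetic = {
    "copy": original.copy(),
    "shuffled": original.sample(frac=1.0, random_state=0).reset_index(drop=True),
}

check_synthetic_columns(original, synthetic)  # passes: same names, same order
train, test = split_train_test(original, frac_training=0.7, random_state=1)
```

With these inputs, the call to the documented function would presumably look like compute_all_metrics(original, synthetic, cols_cat=["city"]).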
- utility_metrics.spiderplot.compute_spider_plot(metrics_results)
Visualize the four main metrics of synthetic data quality in a spider plot.
- Parameters:
  - metrics_results (dict[Any, Any]) – A dictionary that maps synthetic dataset names to an overview of metric values. Each overview contains an entry for each key in METRIC_NAMES.
- Return type: Any
- Returns: A Figure object representing the spider plot.
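A spider plot places each metric on its own evenly spaced axis and draws one closed polygon per dataset. The standard-library sketch below shows that geometry for an input shaped like metrics_results. EXAMPLE_METRIC_NAMES and spider_polygons are hypothetical: the real keys live in METRIC_NAMES, and the plotting library that compute_spider_plot uses is not stated here.

```python
import math

# Hypothetical metric names; the actual keys are defined by METRIC_NAMES
# in utility_metrics.spiderplot and may differ.
EXAMPLE_METRIC_NAMES = ["metric_a", "metric_b", "metric_c", "metric_d"]


def spider_polygons(metrics_results, metric_names):
    """Turn {dataset name: {metric: value}} into closed (angles, values) polygons.

    Each metric gets an evenly spaced axis in radians; the first point is
    repeated at the end so the outline closes when drawn on a polar axis.
    """
    n = len(metric_names)
    angles = [2 * math.pi * i / n for i in range(n)]
    polygons = {}
    for name, overview in metrics_results.items():
        # Mirror the documented requirement: every overview has every metric key.
        missing = [m for m in metric_names if m not in overview]
        if missing:
            raise KeyError(f"{name!r} is missing metrics: {missing}")
        values = [overview[m] for m in metric_names]
        polygons[name] = (angles + angles[:1], values + values[:1])
    return polygons


results = {
    "synthetic_1": {"metric_a": 0.9, "metric_b": 0.8, "metric_c": 0.7, "metric_d": 0.6},
    "synthetic_2": {"metric_a": 0.5, "metric_b": 0.6, "metric_c": 0.9, "metric_d": 0.4},
}
polygons = spider_polygons(results, EXAMPLE_METRIC_NAMES)
```

If matplotlib were used, each (angles, values) pair could be drawn with ax.plot(angles, values) on an axis created by fig.add_subplot(projection="polar"), with the metric names as tick labels at angles[:-1].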