graphbin.graphbin module
Implementation of the GraphBin algorithm.
- class graphbin.graphbin.GraphBin(source_graph, k, random=None)
Bases: object
Class that implements the GraphBin algorithm.
- __init__(source_graph, k, random=None)
Initialize the GraphBin algorithm with a source graph and parameters.
- Parameters:
source_graph (Graph) – The original graph from which to generate a synthetic graph.
k (int) – The number of clusters with which to bin the source graph.
random (None | int | BitGenerator | SeedSequence | Generator) – If given, the random generator is initialized from this seed or generator. If None, fresh randomness is used.
- generate()
Generate a synthetic graph. The edges are generated using the GraphBin algorithm.
Note: the GraphBin object must be initialized before this method is called, either by generating a random source graph from scratch (see GraphBinFromScratch) or by providing a source graph (not yet implemented).
- Raises:
GraphBinNotInitializedError – If the GraphBin object has not been initialized.
- Return type:
Graph
- Returns:
Graph object representing the generated graph.
- k: int
The number of clusters with which to bin the source graph.
- class graphbin.graphbin.GraphBinFromScratch(n_samples=100, param_feature=28000, param_degree=50, cor=0.5, k=5, param_edges=4, random_state=None)
Bases: GraphBin
This class is a variant of the GraphBin algorithm with some parts overridden so that it works without a real source graph.
As a starting point, the parameters are used to generate a random graph, which serves both as the source graph and as the starting point of the synthetic graph.
Since the randomly generated graph does not have real edges, an adjacency matrix is generated randomly as well.
- __init__(n_samples=100, param_feature=28000, param_degree=50, cor=0.5, k=5, param_edges=4, random_state=None)
Initialize the GraphBin algorithm for generating a random synthetic graph from scratch.
To generate the random graph, the various parameters are used in the following way.
First, n_samples two-dimensional values are sampled from a bivariate standard normal distribution. A bivariate distribution is used to impose a correlation between the two variables (feature, degree) that are derived next. The cumulative distribution function is then applied to each dimension, and the results are used to derive two different variables. The first sampled dimension is transformed to an exponential distribution, the shape of which can be changed using the param_feature parameter, and the first variable is sampled from it. This first variable is a node-level domain feature of the synthetic graph, such as the transaction activity of the node, and is referred to as ‘feature’. The second sampled dimension is transformed to a powerlaw distribution, which can be changed using the param_degree parameter, and the second variable is sampled from it. This second variable corresponds to the degree of the node (its number of edges).
- Parameters:
n_samples (int) – Number of nodes.
param_feature (int) – Governs the feature distribution. This parameter specifies the shape of the exponential distribution from which the feature values are sampled (scipy.stats.expon).
param_degree (int) – Governs the degree distribution. This parameter specifies the shape of the powerlaw distribution from which the degrees are sampled (scipy.stats.powerlaw).
cor (float) – Strength of the correlation between the transformed feature and degree. This is the correlation used to generate the bivariate normally distributed values.
k (int) – Number of clusters (bins).
param_edges (int) – Roughly relates to the strength of the effect of the bins on the edge probabilities. A matrix of probabilities of edges occurring between different types of nodes is filled by sampling values from a Poisson distribution and placing them at random positions in the matrix.
random_state (None | int | BitGenerator | SeedSequence | Generator) – If int, the random generator is seeded with the integer. If a BitGenerator, SeedSequence, or Generator instance, the random generator is initialized from it. If None, fresh randomness is used.
- Raises:
ValueError – If n_samples is smaller than 1.
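The sampling steps described for __init__ can be sketched with the Python standard library alone. The sketch below is not graphbin's implementation. The function name sample_feature_and_degree and its arguments feature_scale and degree_shape are hypothetical, it assumes that param_feature acts as the scale of the exponential distribution, and it uses closed-form inverse CDFs where the documentation refers to scipy.stats.expon and scipy.stats.powerlaw. The degree variable stays in [0, 1]; how the library rescales it to an edge count is not described here and is not shown.

```python
import math
import random

_EPS = 1e-12  # keeps CDF values strictly inside (0, 1)


def _normal_cdf(z):
    """Standard normal CDF, clamped away from exactly 0 and 1."""
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return min(max(p, _EPS), 1.0 - _EPS)


def sample_feature_and_degree(n_samples, feature_scale, degree_shape, cor, seed=None):
    """Draw correlated (feature, degree) pairs via a Gaussian copula.

    Illustrative only: names and parameter meanings are assumptions,
    not graphbin's API.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if not -1.0 <= cor <= 1.0:
        raise ValueError("cor must lie in [-1, 1]")

    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        # Two standard normal values with correlation `cor`.
        z1 = rng.gauss(0.0, 1.0)
        z2 = cor * z1 + math.sqrt(1.0 - cor * cor) * rng.gauss(0.0, 1.0)

        # The normal CDF turns each value into a uniform value on (0, 1).
        u1 = _normal_cdf(z1)
        u2 = _normal_cdf(z2)

        # Inverse CDF of an exponential distribution with the given scale.
        feature = -feature_scale * math.log1p(-u1)

        # Inverse CDF of the density a * x**(a - 1) on [0, 1],
        # the form used by scipy.stats.powerlaw with shape a.
        degree = u2 ** (1.0 / degree_shape)

        pairs.append((feature, degree))
    return pairs
```

For example, sample_feature_and_degree(100, 2.0, 3.0, 0.5, seed=0) returns 100 (feature, degree) pairs. With cor=1.0 the two variables become monotonically related, and with cor=0.0 they are independent.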