Analytical

The maximum likelihood method can be used to compute the expected value and the standard deviation of any metric that is based on the adjacency matrix. Depending on the underlying model, some details change, but the principle remains. This formalism allows us to computed z-scores and assess which topological properties are consistent with their randomized value within a statistical error, and which deviate significantly from the null model expectation. In the latter case, the observed property cannot be traced back to the constraints, and therefore requires additional explanations or generating mechanisms besides those required in order to explain the constraints themselves.

Some topological properties are available in the package default (e.g. ANND and different network motifs), but you can define an additional metrics as well. This allows you to obtain both the expected value and the standard deviation of any matrix in a standardized way. In the expression of the variance of a topological property $X$, we find $\frac{\partial X}{\partial a_{ij}}$. We use leverage Julia's autodiff capabilities to compute these terms. If desired, you can always compute the gradient of a specific metric by hand and implement it yourself as well. The downside of using this approach is that you need the complete adjacency matrix, so this is not suited for the analysis of very large graphs due to memory constraints. Depending on the size of the problem, different autodiff techniques can give different performance results. You might want to experiment a bit with this for your own use case (some examples are provided as well).

Expected value

Using the maximum likelihood method method the expected value for any topological property X can be computed from the expected values in the adjacency matrix of the graph G (this approximation ignores the second and higher order terms in the multidimensional Taylor expansion of X).

\[X \left( G \right) = X \left( \left< G \right> \right)\]

Variance

The variance of a topological property S can be written as follows

\[\sigma ^{2} \left[ X \right] = \sum_{i,j} \sum_{t,s} \sigma \left[g_{ij}, g_{ts} \right] \left( \frac{\partial X}{\partial g_{ij}} \frac{\partial X}{\partial g_{ts}} \right)_{G = \left< G \right>}\]

where

\[\sigma \left[ g_{ij}, g_{ts} \right] = \left< g_{ij}g_{ts}\right> - \left< g_{ij}\right>\left< g_{ts}\right>\]

Using the appropriate expressions for $\left< g_{ij} \right>$ and $\left< g_{ij}\right>$ (depending on the model considered, cf. examples), a highly reliable estimate for the variance of the metric can be obtained.

Examples

Some examples using built-in function of package are listed below:

Assortativity in the UBCM
Motif significance in the DBCM
Significance of V-motifs and projection in the BiCM

Assortativity in the UBCM

Let us consider the UBCM applied to the Zachary Karate Club network. We want to analyze if the assortativity of each node (measured by its ANND) is statistically significant from what one would expect under the null model.

First, we define the network and the associated UBCM model.

using Graphs
using MaxEntropyGraphs

# define the network
G = MaxEntropyGraphs.Graphs.SimpleGraphs.smallgraph(:karate)
# generate a UBCM model from the karate club network
model = UBCM(G); 
# compute the maximum likelihood parameters
solve_model!(model); 
# compute and set the expected adjacency matrix
set_Ĝ!(model); 
# compute and set the standard deviation of the adjacency matrix
set_σ!(model); 
nothing

Now we can define our specific metric. Before computing the z-score for all nodes, we illustrate the process for a single node. We use X as variable name for our metric. Defining methods for X in such a way that it can with both an AbstractArray and an AbstractGraph is recommended, but not necessary.

# We consider the ANND of node 1 as our metric
node_id = 1
X = A -> ANND(A, node_id, check_dimensions=false, check_directed=false);    
# Expected value under the null model
X_expected = X(model.Ĝ)
# Expected standard deviation under the null model
X_std = σₓ(model, X)
# Observed value (using the underlying network)
X_observed = X(model.G)
# compute z-score
z_X = (X_observed - X_expected) / X_std

In the same way, we can compute the z-score for every node:

# Observed value
ANND_obs = [ANND(G, i) for i in vertices(G)]
# Expected values
ANND_exp = [ANND(model.Ĝ, i) for i in vertices(G)]
# Standard deviation
ANND_std = [σₓ(model, A -> ANND(A, i, check_dimensions=false, check_directed=false)) for i in vertices(G)]
# Z-score
Z_ANND = (ANND_obs - ANND_exp) ./ ANND_std;

Motif significance in the DBCM

Let us consider the DBCM applied to the Chesapeake Bay foodweb. We want to analyze if any of the different network motifs is statistically significant of what one would expect under the null model.

First, we define the network and the associated UBCM model.

using Graphs
using MaxEntropyGraphs
import Statistics: mean, std

# define the network
G = chesapeakebay()
# extract its adjacency matrix
A = adjacency_matrix(G)
# generate a UBCM model from the karate club network
model = DBCM(G); 
# compute the maximum likelihood parameters
solve_model!(model); 
# compute and set the expected adjacency matrix
set_Ĝ!(model); 
# compute and set the standard deviation of the adjacency matrix
set_σ!(model); 
nothing

We want to know the values of M1, ..., M13. These are network-wide measures.

# compute the observed motif counts
motifs_observed = [@eval begin $(f)(A) end for f in MaxEntropyGraphs.directed_graph_motif_function_names];
# Expected value under the null model
motifs_expected = [@eval begin $(f)(model) end for f in MaxEntropyGraphs.directed_graph_motif_function_names];
# Expected standard deviation under the null model
motifs_std = [@eval begin  σₓ(model, $(f), gradient_method=:ForwardDiff) end for f in MaxEntropyGraphs.directed_graph_motif_function_names];
# compute the z-score
motifs_z = (motifs_observed .- motifs_expected) ./ motifs_std

Significance of V-motifs and projection in the BiCM

Let us consider the BiCM applied to the corporate club membership network. This bipartite network is composed of 25 persons and 15 social organizations. An edge between a person and a social organization shows that the person is a member. We want to obtain the projection of this network on both the person and the social organization layer. We also want to determine if any of these connections in the projected networks are statistically significant under the BiCM null model. This is done by evaluating if the number of organizations that two users have in common is more than what one would expect under the null model.

First, we define the network and the associated BiCM model.

using MaxEntropyGraphs
using Graphs

# define the network
G = corporateclub();
# project the model on its layers
G_persons, G_organizations = project(G, layer=:bottom), project(G, layer=:top);
# generate a BiCM from the corporate club network
model = BiCM(G); 
# compute the maximum likelihood parameters
solve_model!(model); 
# determine the validated projected networks
G_persons_validated, G_organizations_validated = project(model, layer=:bottom), project(model, layer=:top);
# check the number of (validated) edges in both networks
(ne(G_persons),ne(G_persons_validated)),(ne(G_organizations), ne(G_organizations_validated))

# output
((259, 0), (66, 0))

We can observe that no links were found to be statistically significant, using the default settings, i.e. a significance threshold of $\alpha=0.05$ and the Benjamini Hochberg procedure from MultiTest to correct for multiple testing. These can by modified if required.

# using a very high threshold value for significance
project(model, layer=:bottom, α=0.9)

# output
{25, 255} undirected simple Int64 graph