Performance, scalability & GPU
Memory: the analysis is matrix-free
The expected adjacency matrix $\hat{G}$ is dense and $O(n^2)$ — about 512 MB at $n = 8\,000$ and $\approx 500$ GB at $n = 250\,000$ in Float64. The package therefore does not materialize it for standard analyses:
- Metric values (
degree,ANND,wedges, V-motifs, projections, …) and sampling (rand(m)with the defaultprecomputed=false) compute each entry on the fly from the reduced maximum-likelihood parameters via element accessors (MaxEntropyGraphs.A(m, i, j)), so they use $O(n)$ memory. (Measured:degree(m)allocates ~0.25 MB at $n = 8\,000$; a single element accessor allocates nothing.) Ĝ(m)/set_Ĝ!,σˣ(m)/set_σ!, and theprecomputed=truecode paths do materialize a dense $n \times n$ matrix and are intended only for small networks — avoid them for large $n$.- The custom-metric variance via error propagation (
σₓ) currently requires the densem.σ(andm.Ĝ); for large networks prefer the sampling route to estimate variances. (Making this path matrix-free is a known follow-up.)
Multithreading
Sampling rand(m, n) and the p-value computations parallelise over Threads.@threads; start Julia with -t auto (or set JULIA_NUM_THREADS). rand(m, n; rng = …) is reproducible and independent of the thread count — each sample draws its own seeded Xoshiro stream from the supplied rng.
GPU acceleration (assessment)
GPU support is not currently shipped. Where it would (and would not) help:
| Kernel | GPU-amenable? | Verdict |
|---|---|---|
| Sampling Bernoulli draws / $\hat{G}$-element evaluation | Yes — elementwise, embarrassingly parallel | Best (and only) candidate, and only at large $n$ ($\gtrsim 10^5$) × many samples |
| Likelihood / gradient | No — the reduced loops are $O(K^2)$ over unique degrees ($K$ small); kernel-launch overhead dominates | Keep on CPU |
| Metric p-values / motif counts | Partly — but the Poisson / Poisson–binomial CDFs are not trivially GPU-portable | Low priority |
Recommended design (future, optional): a package extension (ext/…) with the GPU backend as a [weakdeps], so the package stays installable without a GPU. Write kernels with KernelAbstractions.jl for vendor portability (CUDA / Metal / ROCm), detect hardware at run time (e.g. CUDA.functional()), default to gpu=false, and keep a CPU fallback that is exercised in CI (which has no GPU). Ship it only if it clearly beats the threaded CPU sampler at scale.