7.10 Stochastic Block Models

7.10.1 Stochastic Block Model

  • Latent Class Model
  • for each actor (node), we assign a group membership

There are two components in the block model:

  1. the vector of group memberships: $z$
  2. conditional on the group memberships, the block matrix gives the edge probability between two nodes

$$ \begin{bmatrix} 0.8 & 0.05 & 0.02 \\ 0.05 & 0.9 & 0.03 \\ 0.02 & 0.03 & 0.7 \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} $$

  • the probability 0.8 assigned to $b_{11}$: the connection prob within group 1
  • the probability 0.9 assigned to $b_{22}$: the connection prob within group 2
  • the probability 0.05 assigned to $b_{12}$: the connection prob b/w group 1 and group 2

※ notations:

  • $Y$: adjacency matrix
  • $K$: the total number of membership groups ($K < n$)
  • $z_i$: $K$-vector with a 1 in the entry of the group node $i$ belongs to and 0 elsewhere
  • $Z := (z_1, \dots, z_n)$
  • $C$: $K \times K$ block matrix
    • entry $C_{ij} \in [0,1]$: probability of occurrence of an edge from a node in group $i$ to a node in group $j$

The idea behind the block matrix $C$ is that, given the block memberships, the edges are conditionally independent.

1. $y_{ij}$ follows a Bernoulli distribution with success probability $z_i^\top C z_j$.
2. $y_{ij}$ is independent of $y_{kl}$ for $(i,j) \neq (k,l)$, given $z_i$ and $z_j$.
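The generative model above can be sketched in a few lines of numpy. This is a minimal illustration, not code from the text; the block matrix reuses the hypothetical 3-group example, and the group sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block matrix from the example above (groups 1..3)
C = np.array([[0.80, 0.05, 0.02],
              [0.05, 0.90, 0.03],
              [0.02, 0.03, 0.70]])

n, K = 30, 3
groups = rng.integers(0, K, size=n)   # group label of each node
Z = np.eye(K)[groups]                 # one-hot membership vectors z_i

# Edge probability for every dyad: P(y_ij = 1) = z_i^T C z_j
P = Z @ C @ Z.T

# Sample a symmetric adjacency matrix Y with no self-loops
upper = np.triu(rng.random((n, n)) < P, k=1)
Y = (upper | upper.T).astype(int)
```

Conditional independence shows up directly: every dyad is drawn with its own independent uniform draw once `P` is fixed.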




7.10.1.1 Likelihood function

$$ P(Y \mid Z, C) = \prod_{i<j} P(y_{ij} \mid Z, C) = \prod_{i<j} \left[ (z_i^\top C z_j)^{y_{ij}} (1 - z_i^\top C z_j)^{1 - y_{ij}} \right] $$
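Evaluating this likelihood on the log scale is a one-liner over the upper triangle; the sketch below uses hypothetical toy inputs (two groups, four nodes) and a small `eps` guard against `log(0)`.

```python
import numpy as np

def sbm_loglik(Y, Z, C, eps=1e-12):
    """Bernoulli SBM log-likelihood over the dyads i < j."""
    P = Z @ C @ Z.T                      # p_ij = z_i^T C z_j
    iu = np.triu_indices_from(Y, k=1)
    p, y = P[iu], Y[iu]
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy example: nodes {0,1} in group 1, nodes {2,3} in group 2
Z = np.eye(2)[[0, 0, 1, 1]]
C = np.array([[0.9, 0.1],
              [0.1, 0.8]])
Y = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
ll = sbm_loglik(Y, Z, C)   # 5*log(0.9) + log(0.8)
```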

In real data, $Z$ and $C$ are unknown, so we need additional assumptions:

1. $z_i$ is independent of $z_j$.
2. $P(z_{ik} = 1) = \theta_k$, with $\sum_{k=1}^{K} \theta_k = 1$.

The latent group $z_i$ follows a multinomial distribution with parameter $\theta$, i.e.,

$$ \pi(Z \mid \theta) = \prod_{i=1}^{n} \theta^{z_i} = \prod_{k=1}^{K} \theta_k^{n_k} $$

, where $n_k$ is the total number of nodes that belong to group $k$.
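The identity $\prod_i \theta^{z_i} = \prod_k \theta_k^{n_k}$ is easy to check numerically; the sketch below uses hypothetical values of $\theta$ and the labels.

```python
import numpy as np

theta = np.array([0.5, 0.3, 0.2])       # group probabilities (assumed)
groups = np.array([0, 0, 1, 2, 2, 2])   # labels of n = 6 nodes
Z = np.eye(3)[groups]

n_k = Z.sum(axis=0)                     # group counts (n_1, ..., n_K)
prior = np.prod(theta ** n_k)           # pi(Z | theta) = prod_k theta_k^{n_k}
```

Here `n_k` is `[2, 1, 3]`, so `prior` equals `0.5**2 * 0.3 * 0.2**3`.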

In Bayesian inference, we can assign a symmetric Dirichlet$(\alpha, \dots, \alpha)$ prior to $\theta$, with hyperprior $\pi(\alpha) \sim \text{Gamma}(a, b)$.




7.10.1.2 Inference

By combining equations (1) and (2), we can make frequentist inference using the EM algorithm.
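As a partial sketch, the M-step under hard group assignments has closed forms: $\theta_k = n_k / n$ and $C_{ab}$ equal to the observed edge fraction between groups $a$ and $b$. The E-step (updating the memberships) is omitted here; all names are hypothetical.

```python
import numpy as np

def m_step(Y, groups, K):
    """M-step given hard labels: theta_k = n_k / n and
    C_ab = (edges between groups a and b) / (dyads between a and b)."""
    n = len(groups)
    theta = np.bincount(groups, minlength=K) / n
    C = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            mask = np.outer(groups == a, groups == b)
            if a == b:
                mask = np.triu(mask, k=1)   # count each within-group dyad once
            m = mask.sum()
            C[a, b] = Y[mask].sum() / m if m else 0.0
    return theta, C

# Toy network: edges (0,1) and (2,3), groups {0,1} and {2,3}
Y = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
theta_hat, C_hat = m_step(Y, np.array([0, 0, 1, 1]), K=2)
```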




7.10.1.3 Bayesian

We need to assign a Beta prior for $C$: $C_{ij} \sim \text{Beta}(A_{ij}, B_{ij})$, where $A$ and $B$ are $K \times K$ matrices.




7.10.1.4 Posterior Distribution

$$ \begin{aligned} \pi(Z, \theta, C, \alpha \mid Y) &\propto f(Y \mid Z, \theta, C, \alpha)\, \pi(Z, \theta, C, \alpha) \\ &= f(Y \mid Z, \theta, C, \alpha)\, \pi(Z \mid \theta, C, \alpha)\, \pi(\theta \mid C, \alpha)\, \pi(C, \alpha) \\ &= f(Y \mid Z, C)\, \pi(Z \mid \theta)\, \pi(\theta \mid \alpha)\, \pi(C)\, \pi(\alpha) \\ &\propto \prod_{i<j} \left[ (z_i^\top C z_j)^{y_{ij}} (1 - z_i^\top C z_j)^{1 - y_{ij}} \right] \prod_{k=1}^{K} \theta_k^{n_k} \left[ \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^{K}} \prod_{k=1}^{K} \theta_k^{\alpha - 1} \right] \prod_{i<j}^{K} \left[ C_{ij}^{A_{ij}-1} (1 - C_{ij})^{B_{ij}-1} \right] \alpha^{a-1} e^{-b\alpha} \end{aligned} $$

Computation is very straightforward via MCMC.
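For instance, given $Z$ the Beta prior is conjugate to the Bernoulli likelihood, so each block probability has a closed-form full conditional, $C_{ab} \mid Y, Z \sim \text{Beta}(A_{ab} + e_{ab},\, B_{ab} + m_{ab} - e_{ab})$, with $e_{ab}$ the edge count and $m_{ab}$ the dyad count between groups $a$ and $b$. A minimal one-step Gibbs sketch with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_update_C(Y, groups, K, A=1.0, B=1.0):
    """One Gibbs step for C under a Beta(A, B) prior on each entry."""
    C = np.empty((K, K))
    for a in range(K):
        for b in range(a, K):
            ia, ib = np.where(groups == a)[0], np.where(groups == b)[0]
            if a == b:
                pairs = [(i, j) for i in ia for j in ib if i < j]
            else:
                pairs = [(i, j) for i in ia for j in ib]
            e = sum(Y[i, j] for i, j in pairs)   # edges between groups a, b
            m = len(pairs)                       # dyads between groups a, b
            C[a, b] = C[b, a] = rng.beta(A + e, B + m - e)
    return C

# Toy usage with a hypothetical 4-node, 2-group network
Y = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
C_draw = gibbs_update_C(Y, np.array([0, 0, 1, 1]), K=2)
```

A full sampler would alternate this step with updates of $Z$, $\theta$, and $\alpha$.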








7.10.2 Mixed Membership Block Model (MMBM)

In the SBM above, each actor has exactly one membership. The MMBM was proposed to relax (alleviate) this restriction of the SBM. In this model, instead of assigning 1 to a single membership, a probability is assigned to each membership, so that multiple memberships are allocated to each node.

$$ z_i = (1, 0, 0) \;\longrightarrow\; \theta_i = (0.7, 0.2, 0.1) $$

Here each entry of the MMBM vector $\theta_i$ is the probability of group 1, group 2, and group 3, respectively.

  • θi: weights or probabilities in the groups that have to be non-negative and sum to 1.

For each dyad $y_{ij}$, a latent membership $z_{ij}$ is drawn from the multinomial distribution with parameter $\theta_i$.

$$ \pi(Z \mid \Theta) = \prod_{i<j} \theta_i^{z_{ij}} \, \theta_j^{z_{ji}}, $$ where $\Theta := (\theta_1, \dots, \theta_n)$ is the $n \times K$ matrix of membership probabilities.

The likelihood is $f(Y \mid Z, C) = \prod_{i<j} \left[ (z_{ij}^\top C z_{ji})^{y_{ij}} (1 - z_{ij}^\top C z_{ji})^{1 - y_{ij}} \right]$.

The goal of inference is estimating not $Z$ but the mixed memberships $\Theta$.
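The dyad-level generative process above can be sketched directly: each dyad draws its own pair of memberships from the node-level mixtures. A minimal numpy illustration with hypothetical $\Theta$ and $C$:

```python
import numpy as np

rng = np.random.default_rng(2)

n, K = 20, 3
# Hypothetical mixed memberships: each row of Theta lies on the simplex
Theta = rng.dirichlet(np.ones(K), size=n)
C = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.1],
              [0.1, 0.1, 0.6]])

Y = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        # Dyad-specific memberships: z_ij ~ Mult(theta_i), z_ji ~ Mult(theta_j)
        z_ij = rng.multinomial(1, Theta[i])
        z_ji = rng.multinomial(1, Theta[j])
        p = z_ij @ C @ z_ji            # edge probability for this dyad
        Y[i, j] = Y[j, i] = rng.binomial(1, p)
```

Unlike the SBM, node $i$ may act as a member of different groups in different dyads.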




7.10.2.1 Degree-corrected SBM

The degree-corrected SBM uses a Poisson model.

  • $y_{ij}$: the number of edges in dyad $(i,j)$, following a Poisson distribution
  • $C_{ij}$: the expected number of edges from a node in group $i$ to a node in group $j$

The likelihood is

$$ \pi(y_{ij} \mid Z, C) = (y_{ij}!)^{-1} \exp\{ -(z_i^\top C z_j) \} \, (z_i^\top C z_j)^{y_{ij}} $$

In a large sparse graph, where the edge probability is approximately equal to the expected number of edges, this Poisson model is asymptotically equivalent to the Bernoulli counterpart in equation (3).

  • $\phi_i$: the ratio of node $i$'s degree to the sum of degrees in node $i$'s group

Under the constraint $\sum_{i=1}^{n} \phi_i \, I(z_{ik} = 1) = 1$ for each group $k$, equation (3) becomes

$$ f(y_{ij} \mid Z, C, \phi) = (y_{ij}!)^{-1} \exp\{ -(\phi_i \phi_j \, z_i^\top C z_j) \} \, (\phi_i \phi_j \, z_i^\top C z_j)^{y_{ij}}, $$

which is the DC-SBM.
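The degree correction is easy to compute from data: $\phi_i$ is node $i$'s share of its group's total degree, and the Poisson rate for dyad $(i,j)$ is $\phi_i \phi_j \, z_i^\top C z_j$. A minimal sketch with a hypothetical 4-node multigraph:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed edge counts and group labels
Y = np.array([[0, 3, 1, 0],
              [3, 0, 0, 1],
              [1, 0, 0, 2],
              [0, 1, 2, 0]])
groups = np.array([0, 0, 1, 1])
K = 2

deg = Y.sum(axis=1)                    # node degrees
group_deg = np.array([deg[groups == k].sum() for k in range(K)])
phi = deg / group_deg[groups]          # phi_i: degree share within own group

# The constraint sum_i phi_i * I(z_ik = 1) = 1 holds by construction
for k in range(K):
    assert np.isclose(phi[groups == k].sum(), 1.0)

# Poisson rate for each dyad: phi_i * phi_j * z_i^T C z_j
Z = np.eye(K)[groups]
C = np.array([[4.0, 1.0],
              [1.0, 5.0]])             # expected edge counts (assumed)
rate = np.outer(phi, phi) * (Z @ C @ Z.T)
y_sim = rng.poisson(rate)              # one draw per ordered pair
```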

The SBM ignores the variation of node degrees in a real network. The DC-SBM makes a better correction for this.