7.10 Stochastic Block Models
7.10.1 Stochastic Block Model
- Latent class model: for each actor (node), we assign a group membership.
- Two components in the block model:
  - the vector of group memberships: $z$
  - conditional on the group memberships, the block matrix, which gives the edge probability between two nodes
$$
\begin{bmatrix} 0.80 & 0.05 & 0.02 \\ 0.05 & 0.90 & 0.03 \\ 0.02 & 0.03 & 0.70 \end{bmatrix}
\qquad
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
$$
- the probability 0.8 assigned to $b_{11}$: the connection probability within group 1
- the probability 0.9 assigned to $b_{22}$: the connection probability within group 2
- the probability 0.05 assigned to $b_{12}$: the connection probability between group 1 and group 2
※ notations:
- $Y$: adjacency matrix
- $K$: the total number of membership groups ($K < n$)
- $z_i$: a $K$-vector with a 1 in the entry of the group that node $i$ belongs to and 0 elsewhere
- $Z := (z_1, \cdots, z_n)'$
- $C$: $K \times K$ block matrix
- entry $C_{ij} \in [0,1]$: the probability of occurrence of an edge from a node in group $i$ to a node in group $j$
The idea behind the block matrix $C$ is that, given the block memberships, the edges are conditionally independent:

⇒ 1. $Y_{ij} \sim$ Bernoulli with success probability $z_i' C z_j$; 2. $Y_{ij}$ is independent of $Y_{kl}$ for $(i,j) \neq (k,l)$, given $z_i$ and $z_j$.
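As a concrete illustration, here is a minimal sketch (not from the source) of simulating an undirected network from this generative model, using the 3-group block matrix above; the group proportions and network size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

C = np.array([[0.80, 0.05, 0.02],
              [0.05, 0.90, 0.03],
              [0.02, 0.03, 0.70]])   # block matrix: C[k, l] = edge prob between groups k and l
theta = np.array([0.5, 0.3, 0.2])    # hypothetical group proportions (assumption)
n = 30                               # hypothetical number of nodes (assumption)

# one-hot membership vectors z_i stacked as rows of Z
groups = rng.choice(len(theta), size=n, p=theta)
Z = np.eye(len(theta))[groups]

# edge probabilities z_i' C z_j for every dyad, then Bernoulli draws on the upper triangle
P = Z @ C @ Z.T
Y = np.triu(rng.random((n, n)) < P, k=1).astype(int)
Y = Y + Y.T                          # symmetric adjacency matrix, no self-loops
```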
7.10.1.1 Likelihood function
$$ P(Y|Z,C) = \prod_{i<j} P(y_{ij}|Z,C) = \prod_{i<j} \left[ (z_i' C z_j)^{y_{ij}} (1 - z_i' C z_j)^{1-y_{ij}} \right] $$
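A small sketch of how this likelihood could be evaluated in code; the function name and the clipping constant are assumptions added for numerical stability, not from the source.

```python
import numpy as np

def sbm_loglik(Y, Z, C, eps=1e-12):
    """log P(Y | Z, C): sum over dyads i < j of
    y_ij * log(p_ij) + (1 - y_ij) * log(1 - p_ij), where p_ij = z_i' C z_j."""
    P = Z @ C @ Z.T
    iu = np.triu_indices_from(Y, k=1)          # dyads i < j
    p, y = P[iu].clip(eps, 1 - eps), Y[iu]
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

With `Y`, `Z`, and `C` from the simulation sketch above, `sbm_loglik(Y, Z, C)` returns the observed-data log-likelihood.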
In real data, $Z$ and $C$ are unknown ⇒ we need additional assumptions:

1. $z_i$ is independent of $z_j$.
2. $P(z_{ik} = 1) = \theta_k$, with $\sum_{k=1}^{K} \theta_k = 1$ ⇒ the latent group $z_i$ follows a multinomial distribution with parameter $\theta$, i.e.,
$$ \pi(Z|\theta) = \prod_{i=1}^{n} z_i' \theta = \prod_{k=1}^{K} \theta_k^{n_k} $$
where $n_k$ is the total number of nodes that belong to group $k$.
For Bayesian inference, we can assign a symmetric Dirichlet$(\alpha, \cdots, \alpha)$ prior on $\theta$ with hyperprior $\pi(\alpha) \sim \text{Gamma}(a, b)$.
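A hedged sketch of drawing from these priors; the hyperparameter values $a = b = 1$ and the Beta priors on the block matrix are illustrative choices, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n, a, b = 3, 30, 1.0, 1.0                       # illustrative sizes and hyperparameters

alpha = rng.gamma(shape=a, scale=1.0 / b)          # alpha ~ Gamma(a, b) with rate b
theta = rng.dirichlet(np.full(K, alpha))           # theta ~ Dirichlet(alpha, ..., alpha)
Z = rng.multinomial(1, theta, size=n)              # one-hot membership vectors z_i
C = rng.beta(1.0, 1.0, size=(K, K))                # assumed Beta priors on block probabilities
C = np.triu(C) + np.triu(C, 1).T                   # keep C symmetric for an undirected graph
```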
7.10.1.2 Inference
By combining equations (1) and (2), we can carry out frequentist inference using the EM algorithm.
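In practice the exact E-step over all memberships is intractable, so a mean-field variational EM is a common choice; the initialization and update forms below are a rough sketch under that approximation, not necessarily the algorithm the source has in mind.

```python
import numpy as np

def sbm_vem(Y, K, n_iter=50, seed=0, eps=1e-10):
    """Mean-field variational EM sketch for the Bernoulli SBM."""
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    tau = rng.dirichlet(np.ones(K), size=n)            # soft memberships tau[i, k]
    for _ in range(n_iter):
        # M-step: update group proportions and block matrix from current tau
        theta = tau.mean(axis=0)
        num = tau.T @ Y @ tau                          # expected edge counts between blocks
        den = np.outer(tau.sum(0), tau.sum(0)) - tau.T @ tau
        C = np.clip(num / np.maximum(den, eps), eps, 1 - eps)
        # E-step: tau_ik ∝ theta_k * prod_{j≠i} prod_l [C_kl^y_ij (1-C_kl)^(1-y_ij)]^tau_jl
        logtau = (np.log(theta)[None, :]
                  + Y @ tau @ np.log(C).T
                  + (1 - Y - np.eye(n)) @ tau @ np.log(1 - C).T)
        logtau -= logtau.max(axis=1, keepdims=True)
        tau = np.exp(logtau)
        tau /= tau.sum(axis=1, keepdims=True)
    return tau, theta, C
```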
7.10.1.4 Posterior Distribution
$$
\begin{aligned}
\pi(Z,\theta,C,\alpha|Y) &\propto f(Y|Z,\theta,C,\alpha)\cdot\pi(Z,\theta,C,\alpha) \\
&= f(Y|Z,\theta,C,\alpha)\cdot\pi(Z|\theta,C,\alpha)\cdot\pi(\theta|C,\alpha)\cdot\pi(C,\alpha) \\
&= f(Y|Z,C)\cdot\pi(Z|\theta)\cdot\pi(\theta|\alpha)\cdot\pi(C)\cdot\pi(\alpha) \\
&\propto \prod_{i<j}\left[(z_i' C z_j)^{y_{ij}}(1-z_i' C z_j)^{1-y_{ij}}\right]\cdot\prod_{k=1}^{K}\theta_k^{n_k}\cdot\left[\frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K}\prod_{k=1}^{K}\theta_k^{\alpha-1}\right]\cdot\prod_{i\le j}\left[C_{ij}^{A_{ij}-1}(1-C_{ij})^{B_{ij}-1}\right]\cdot\alpha^{a-1}e^{-b\alpha}
\end{aligned}
$$
⇒ Computation is very straightforward via MCMC.
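A minimal sketch of one Gibbs sweep over $(C, \theta, Z)$ using the conjugate updates implied by the posterior above, with $\alpha$ held fixed for simplicity; the Beta$(A, B)$ hyperparameter values are illustrative assumptions, not from the source.

```python
import numpy as np

def gibbs_sweep(Y, Z, alpha, A=1.0, B=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    n, K = Z.shape
    nk = Z.sum(axis=0)

    # C | Z, Y: conjugate Beta update from block-wise edge / dyad counts
    E = (Z.T @ Y @ Z).astype(float)              # edges between blocks (within-block counted twice)
    D = np.outer(nk, nk).astype(float)           # dyads between blocks
    np.fill_diagonal(E, np.diag(E) / 2)
    np.fill_diagonal(D, nk * (nk - 1) / 2)
    C = rng.beta(A + E, B + D - E)
    C = np.triu(C) + np.triu(C, 1).T             # keep C symmetric

    # theta | Z: conjugate Dirichlet update
    theta = rng.dirichlet(alpha + nk)

    # z_i | rest: categorical full conditional, updated one node at a time
    logC, log1mC = np.log(C.clip(1e-12)), np.log((1 - C).clip(1e-12))
    for i in range(n):
        e = Y[i] @ Z                             # i's edges into each group (y_ii = 0)
        ne = (1 - Y[i]) @ Z - Z[i]               # i's non-edges into each group, excluding itself
        logp = np.log(theta) + logC @ e + log1mC @ ne
        p = np.exp(logp - logp.max())
        Z[i] = np.eye(K)[rng.choice(K, p=p / p.sum())]
    return Z, theta, C
```

Repeating this sweep many times and discarding a burn-in period gives posterior draws of $(Z, \theta, C)$; an extra Metropolis step would be needed to update $\alpha$.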
7.10.2 Mixed Membership Block Model (MMBM)
In the SBM above, each actor has exactly one membership. To relax (alleviate) this restriction of the SBM, the MMBM was proposed. Instead of assigning a 1 to a single membership, this model gives each node multiple memberships by assigning a probability to each membership.
$$ z_i = (1, 0, 0) \qquad \theta_i = (0.7, 0.2, 0.1) $$
Here each entry of the MMBM vector $\theta_i$ is the probability of belonging to group 1, group 2, and group 3, respectively.
- $\theta_i$: the weights or probabilities over the groups, which must be non-negative and sum to 1.
For each dyad $y_{ij}$, a latent membership $z_{ij}$ is drawn from the multinomial distribution with parameter $\theta_i$.
$$ \pi(Z|\Theta) = \prod_{i \neq j} \left( z_{ij}'\theta_i \times z_{ji}'\theta_j \right), $$ where $\Theta := (\theta_1, \cdots, \theta_n)'$ is the $n \times K$ matrix of membership probabilities.
The likelihood is
$$ f(Y|Z,C) = \prod_{i<j}\left[(z_{ij}' C z_{ji})^{y_{ij}} (1 - z_{ij}' C z_{ji})^{1-y_{ij}}\right]. $$
The goal of inference is to estimate not $Z$ but the mixed memberships $\Theta$.
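For intuition, here is a hedged sketch of simulating an undirected network from this mixed membership model, where each dyad draws fresh pairwise memberships from $\theta_i$ and $\theta_j$; the Dirichlet concentration used to generate $\Theta$ is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 30, 3
C = np.array([[0.80, 0.05, 0.02],
              [0.05, 0.90, 0.03],
              [0.02, 0.03, 0.70]])
Theta = rng.dirichlet(np.full(K, 0.3), size=n)      # n x K matrix of mixed memberships

Y = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        zij = rng.multinomial(1, Theta[i])          # sender-side membership for dyad (i, j)
        zji = rng.multinomial(1, Theta[j])          # receiver-side membership for dyad (i, j)
        p = zij @ C @ zji                           # edge probability z_ij' C z_ji
        Y[i, j] = Y[j, i] = rng.binomial(1, p)
```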
7.10.2.1 Degree-corrected SBM
Use a Poisson model:

- $y_{ij}$: the number of edges for dyad $(i,j)$, following a Poisson distribution
- $C_{ij}$: the expected number of edges from a node in group $i$ to a node in group $j$
The likelihood is
$$ \pi(y_{ij}|Z,C) = (y_{ij}!)^{-1} \exp\left(-z_i' C z_j\right)\left(z_i' C z_j\right)^{y_{ij}} $$
In a large sparse graph, where the edge probability is approximately equal to the expected number of edges, this Poisson model is asymptotically equivalent to the Bernoulli counterpart in equation (3).
- $\phi_i$: the ratio of node $i$'s degree to the sum of degrees in node $i$'s group.
Under the constraint $\sum_{i=1}^{n} \phi_i \cdot I(z_{ik}=1) = 1$ for each group $k$, equation (3) becomes
$$ f(y_{ij}|Z,C,\phi) = (y_{ij}!)^{-1} \exp\left(-\phi_i \phi_j z_i' C z_j\right)\left(\phi_i \phi_j z_i' C z_j\right)^{y_{ij}}, $$
which is the DC-SBM.
⇒ SBM ignores the variation of node degrees in a real network.
With the DC-SBM, we can correct for this degree variation.
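A sketch of computing $\phi$ and the DC-SBM Poisson log-likelihood for an adjacency matrix of edge counts; the function name is an assumption, and SciPy's `gammaln` is used for $\log(y!)$.

```python
import numpy as np
from scipy.special import gammaln

def dcsbm_loglik(Y, Z, C):
    """Poisson log-likelihood of the DC-SBM with means phi_i * phi_j * z_i' C z_j."""
    deg = Y.sum(axis=1).astype(float)
    group_deg = Z.T @ deg                              # total degree in each group
    phi = deg / np.maximum(Z @ group_deg, 1e-12)       # phi_i: degree / total degree of i's group
    M = (phi[:, None] * phi[None, :]) * (Z @ C @ Z.T)  # Poisson mean for each dyad
    iu = np.triu_indices_from(Y, k=1)                  # dyads i < j
    y, mu = Y[iu], np.clip(M[iu], 1e-12, None)
    return np.sum(y * np.log(mu) - mu - gammaln(y + 1))
```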