Cavity method and belief propagation for the Ising model

Variational approach using the Bethe approximation

The Boltzmann distribution is

 P(\underline\sigma)=\frac{1}{Z}e^{-\beta E(\underline\sigma)}
 Z=\sum_{\underline\sigma}e^{-\beta E(\underline\sigma)}
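Throughout these notes the energy is assumed to decompose into pairwise terms living on the edges of the interaction graph,

 E(\underline\sigma)=\sum_{\langle ij\rangle}E_{ij}(\sigma_i,\sigma_j),

which for the usual Ising convention reads E_{ij}(\sigma_i,\sigma_j)=-J_{ij}\sigma_i\sigma_j (the sign convention is not essential; only the pairwise structure is used below).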

The Bethe approximation amounts to using the ansatz

 \hat P(\underline \sigma)=\frac{\prod_{\langle ij\rangle}P_{ij}(\sigma_i,\sigma_j)}{\prod_iP_i^{d_i-1}(\sigma_i)}

Here \textstyle\langle ij\rangle denotes a pair of nodes connected in the topology of the Ising model, i.e. \textstyle J_{ij}>0; \textstyle d_i is the degree of node i in the topology, and

\textstyle P_{ij}(\sigma_i,\sigma_j)=\sum_{\underline\sigma\backslash\{\sigma_i,\sigma_j\}}P(\underline\sigma) and
\textstyle P_{i}(\sigma_i) =\sum_{\underline\sigma\backslash\sigma_i}P(\underline\sigma) are the two-point and one-point marginals, respectively.

The task is then to find the parameters of the Bethe approximation such that

\{\hat P_{ij},\hat P_i \}=\arg\min_{P_{ij},P_i}D_{\textrm{KL}}(\hat P\,\|\,P).

The Kullback–Leibler divergence can be written as


\begin{align}
D_{\textrm{KL}}(\hat P\,\|\,P) &= \sum_{\underline\sigma}\hat P(\underline\sigma)\log\frac{\hat P(\underline\sigma)}{ P(\underline\sigma)}\\
&=\sum_{\underline\sigma}\hat P(\underline\sigma)\log \hat P(\underline\sigma)-\sum_{\underline\sigma}\hat P(\underline\sigma)\log P(\underline\sigma)\\
&=-S_{\hat P}+\sum_{\underline\sigma} \hat{P}(\underline\sigma)\beta E(\underline\sigma)+\sum_{\underline\sigma}\hat P(\underline\sigma)\log Z\\
&=-S_{\hat P}+\beta\langle E\rangle_{\hat P} +\log Z\\
&=\beta F_{\hat P}-\beta F_{P}\\
&\geq 0.
\end{align}

This means that:

  • A good estimate of the parameters of the approximation can be obtained by minimizing the variational free energy \textstyle F_{\hat P}, i.e. \{\hat P_{ij},\hat P_i \}=\arg\min_{P_{ij},P_i}F_{\hat P}.
  • The variational free energy gives an upper bound on the true free energy.

Since

 \hat P(\underline \sigma)=\frac{\prod_{\langle ij\rangle}P_{ij}(\sigma_i,\sigma_j)}{\prod_iP_i^{d_i-1}(\sigma_i)},

we have

\log \hat{P}(\underline\sigma)=\sum_{\langle ij\rangle}\log P_{ij}-\sum_i(d_i-1)\log P_i.

Thus

 
\begin{align}
\beta F_{\hat P}&=\sum_{\underline\sigma}\hat P(\underline\sigma)\,\beta E(\underline\sigma)+\sum_{\underline\sigma}\hat P(\underline\sigma)\sum_{\langle ij\rangle}\log P_{ij}(\sigma_i,\sigma_j)-\sum_{\underline\sigma}\hat P(\underline\sigma)\sum_i(d_i-1)\log P_i(\sigma_i)\\
&=\sum_{\langle ij\rangle}\sum_{\sigma_i\sigma_j}P_{ij}(\sigma_i,\sigma_j)\left(\beta E_{ij}(\sigma_i,\sigma_j)+\log P_{ij}(\sigma_i,\sigma_j)\right)-\sum_i(d_i-1)\sum_{\sigma_i}P_i(\sigma_i)\log P_i(\sigma_i)
\end{align}
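As an illustration, the variational free energy above can be evaluated numerically for given marginals. The following is a minimal sketch (not part of the original notes), assuming spins \textstyle\sigma=\pm 1 and a hypothetical container format for the marginals; on a single-edge tree it reproduces the exact \textstyle -\log Z.

# Sketch: evaluate the Bethe variational free energy beta*F for given pair and
# single-site marginals (hypothetical format: dicts keyed by edges and nodes).
import numpy as np

def bethe_beta_F(E_pair, P_pair, P_node, degree, beta):
    """E_pair[(i,j)], P_pair[(i,j)]: 2x2 arrays over (s_i, s_j); P_node[i]: length-2 array."""
    bF = 0.0
    for (i, j), Pij in P_pair.items():
        bF += (Pij * (beta * E_pair[(i, j)] + np.log(Pij))).sum()
    for i, Pi in P_node.items():
        bF -= (degree[i] - 1) * (Pi * np.log(Pi)).sum()
    return bF

# Check on a single ferromagnetic bond (E_01 = -s_0*s_1) at beta = 1:
S = np.array([1.0, -1.0])
E = {(0, 1): -np.outer(S, S)}
P01 = np.exp(-E[(0, 1)]); P01 /= P01.sum()          # exact pair marginal at beta = 1
P_node = {0: P01.sum(axis=1), 1: P01.sum(axis=0)}
print(bethe_beta_F(E, {(0, 1): P01}, P_node, {0: 1, 1: 1}, beta=1.0))
print(-np.log(4 * np.cosh(1.0)))                     # = -log Z, should agree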

Now let us introduce a Lagrangian with multipliers enforcing the normalization and marginalization constraints:

 
\begin{align}
\mathcal L&=\sum_{\langle ij\rangle}\sum_{\sigma_i\sigma_j}P_{ij}(\sigma_i,\sigma_j)\left(\beta E_{ij}(\sigma_i,\sigma_j)+\log P_{ij}(\sigma_i,\sigma_j)\right)-\sum_i(d_i-1)\sum_{\sigma_i}P_i(\sigma_i)\log P_i(\sigma_i)\\
&-\sum_i\lambda_i\left (\sum_{\sigma_i}P_i(\sigma_i)-1\right )-\sum_{\langle ij\rangle}\sum_{\sigma_i}\lambda_{j\to i}(\sigma_i)\left ( \sum_{\sigma_j} P_{ij}(\sigma_i,\sigma_j)-P_i(\sigma_i)\right)-\sum_{\langle ij\rangle}\sum_{\sigma_j}\lambda_{i\to j}(\sigma_j)\left ( \sum_{\sigma_i} P_{ij}(\sigma_i,\sigma_j)-P_j(\sigma_j)\right)
\end{align}

By setting derivatives to zero we have

 
\begin{align}
\frac{\partial \mathcal L}{\partial P_{i}(\sigma_i)}&=-(d_i-1)\left(1+\log P_i(\sigma_i)\right)-\lambda_i+\sum_{j\in\partial i}\lambda_{j\to i}(\sigma_i)=0,\\
\frac{\partial \mathcal L}{\partial P_{ij}(\sigma_i,\sigma_j)}&=\beta E_{ij}(\sigma_i,\sigma_j)+1+\log P_{ij}(\sigma_i,\sigma_j)-\lambda_{j\to i}(\sigma_i)-\lambda_{i\to j}(\sigma_j)=0.
\end{align}

These equations yield

 
\begin{align}
P_i(\sigma_i)&\propto e^{\frac{1}{d_i-1}\sum_{j\in\partial i}\lambda_{j\to i}(\sigma_i)},\\
P_{ij}(\sigma_i,\sigma_j)&\propto e^{-\beta E_{ij}(\sigma_i,\sigma_j)+\lambda_{j\to i}(\sigma_i)+\lambda_{i\to j}(\sigma_j)}.
\end{align}

Introducing cavity messages via \textstyle e^{\lambda_{j\to i}(\sigma_i)}=\psi_{i\to j}(\sigma_i) (up to normalization) and using the marginalization constraints, the stationary point can be written as

 
\begin{align}
P_i(\sigma_i)&\propto \prod_{j\in\partial i}\sum_{\sigma_j}e^{-\beta E_{ij}(\sigma_i,\sigma_j)}\,\psi_{j\to i}(\sigma_j),\\
P_{ij}(\sigma_i,\sigma_j)&\propto e^{-\beta E_{ij}(\sigma_i,\sigma_j)}\,\psi_{i\to j}(\sigma_i)\,\psi_{j\to i}(\sigma_j).
\end{align}

where \textstyle \{\psi_{i\to j}\} are cavity messages living on the directed edges of the graph. Imposing the marginalization constraints \textstyle\sum_{\sigma_j}P_{ij}(\sigma_i,\sigma_j)=P_i(\sigma_i) determines the messages through the Belief Propagation (also called Sum-Product) equations:


\begin{align}
\psi_{i\to j}(\sigma_i)&=\frac{1}{Z_{i\to j}}\prod_{k\in\partial i\backslash j}\sum_{\sigma_k}e^{-\beta E_{ik}(\sigma_i,\sigma_k)}\psi_{k\to i}(\sigma_k),\\
Z_{i\to j}&=\sum_{\sigma_i}\prod_{k\in\partial i\backslash j}\sum_{\sigma_k}e^{-\beta E_{ik}(\sigma_i,\sigma_k)}\psi_{k\to i}(\sigma_k).
\end{align}
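For Ising spins \textstyle\sigma_i=\pm1 with the usual convention \textstyle E_{ij}(\sigma_i,\sigma_j)=-J_{ij}\sigma_i\sigma_j (assumed here only for illustration), each message can be parametrized by a single cavity field, \textstyle\psi_{i\to j}(\sigma_i)\propto e^{\beta h_{i\to j}\sigma_i}, and the Belief Propagation equations reduce to

 h_{i\to j}=\frac{1}{\beta}\sum_{k\in\partial i\backslash j}\mathrm{artanh}\!\left[\tanh(\beta J_{ik})\tanh(\beta h_{k\to i})\right].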


Deriving Belief Propagation directly from conditional independence

We can also obtain the marginals directly by adding one additional spin i to a system of N spins. Below, \underline{\sigma_j} denotes the spins of the neighbors of i and \underline{\sigma_k} the remaining spins; the key step is the assumption (exact on a tree, and approximately valid on locally tree-like graphs) that the cavity distribution of the neighbors factorizes, P_N(\underline{\sigma_j})=\prod_{j\in\partial i}P_N(\sigma_j), where P_N is the Boltzmann distribution of the system without spin i. The cavity message \psi_{j\to i}(\sigma_j) is then identified with the cavity marginal P_N(\sigma_j).
\begin{align}
	P_i(\sigma_i)&=\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k}\}}P_{N+1}(\sigma_i;\underline{\sigma_j};\underline{\sigma_k})\\
	&=\frac{\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }\\
	&=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }\\
	&=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\frac{\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})}}{\sum_{\{\underline{\sigma_j} \}}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} } }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\frac{\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})}}{\sum_{\{\underline{\sigma_j} \}}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} } }\\
	&=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}P_N(\underline{\sigma_j},\underline{\sigma_k}) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}P_N{ (\underline{\sigma_j},\underline{\sigma_k})} }\\
	&=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}P_N(\underline{\sigma_j}) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}P_N{ (\underline{\sigma_j})} }\\
	&=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\prod_{j\in\partial i}P_N(\sigma_j) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\prod_{j\in\partial i}P_N{( {\sigma_j})} }\\
	&=\frac{\sum_{\{\underline{\sigma_j} \}}\prod_{j\in\partial i}e^{ -\beta  E_{i,j}(\sigma_i,\sigma_j)}\psi_{j\to i}(\sigma_j) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}\prod_{j\in\partial i}e^{ -\beta E_{i,j}(\sigma_i,{\sigma_j})}\psi_{j\to i}{( {\sigma_j})} }\\
	&=\frac{\prod_{j\in\partial i}\sum_{ {\sigma_j}  }e^{ -\beta  E_{i,j}(\sigma_i,\sigma_j)}\psi_{j\to i}(\sigma_j) }{\sum_{\sigma_i}\prod_{j\in\partial i}\sum_{{\sigma_j} }e^{ -\beta E_{i,j}(\sigma_i,{\sigma_j})}\psi_{j\to i}{( {\sigma_j})} }
\end{align}
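The fixed point of these equations can be found by simple iteration of the cavity update. Below is a minimal numerical sketch (not part of the original notes), assuming \textstyle\pm1 spins, pair energies \textstyle E_{ij}=-J_{ij}\sigma_i\sigma_j, and a small hypothetical graph; the names J, neighbors and beta are illustrative. Messages are stored as length-2 vectors over \sigma=\pm1, so the update is a direct transcription of the equations above.

# Minimal sketch of Belief Propagation for a pairwise Ising model, assuming
# spins s = +-1 and E_ij = -J_ij*s_i*s_j (hypothetical example values below).
import numpy as np

SPINS = np.array([1.0, -1.0])           # index 0 -> s=+1, index 1 -> s=-1
beta = 1.0                              # inverse temperature (illustrative)

# hypothetical small graph: symmetric couplings J_ij on the edges
J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.5, (1, 3): 0.8}
J.update({(j, i): Jij for (i, j), Jij in list(J.items())})
neighbors = {}
for (i, j) in J:
    neighbors.setdefault(i, set()).add(j)

def pair_weight(i, j):
    """Boltzmann factor exp(-beta*E_ij) as a 2x2 array over (s_i, s_j)."""
    return np.exp(beta * J[(i, j)] * np.outer(SPINS, SPINS))

# cavity messages psi[(i, j)] = psi_{i->j}(s_i), initialized uniformly
psi = {(i, j): np.full(2, 0.5) for i in neighbors for j in neighbors[i]}

for _ in range(200):                    # plain fixed-point iteration, no damping
    new_psi = {}
    for (i, j) in psi:
        m = np.ones(2)
        for k in neighbors[i] - {j}:
            m *= pair_weight(i, k) @ psi[(k, i)]    # sum over s_k
        new_psi[(i, j)] = m / m.sum()
    psi = new_psi

# one-point marginals and magnetizations from the converged messages
for i in sorted(neighbors):
    m = np.ones(2)
    for j in neighbors[i]:
        m *= pair_weight(i, j) @ psi[(j, i)]
    P_i = m / m.sum()
    print(f"node {i}: P(+1) = {P_i[0]:.4f}, magnetization = {P_i @ SPINS:+.4f}")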


Computing macro observables

The Bethe free energy can be computed as


\begin{align}
F&=\sum_{i}\Delta F_i-\sum_{\langle ij\rangle}\Delta F_{ij},\\
\Delta F_i&=-\frac{1}{\beta}\log Z_i,\qquad Z_i=\sum_{\sigma_i}\prod_{k\in\partial i}\sum_{\sigma_k}e^{-\beta E_{ik}(\sigma_i,\sigma_k)}\,\psi_{k\to i}(\sigma_k),\\
\Delta F_{ij}&=-\frac{1}{\beta}\log Z_{ij},\qquad Z_{ij}=\sum_{\sigma_i,\sigma_j}e^{-\beta E_{ij}(\sigma_i,\sigma_j)}\,\psi_{i\to j}(\sigma_i)\,\psi_{j\to i}(\sigma_j).
\end{align}

The average internal energy and the entropy can then be computed as


\begin{align}
E&=\sum_{i}\Delta E_i-\sum_{\langle ij\rangle}\Delta E_{ij}=\sum_{i}\frac{\partial(\beta\Delta F_i)}{\partial \beta}-\sum_{\langle ij\rangle}\frac{\partial (\beta\Delta F_{ij})}{\partial \beta},\\
S&=\beta(E-F).
\end{align}
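As a numerical check of these formulas, the following sketch (same assumptions and illustrative names as above: \textstyle\pm1 spins, \textstyle E_{ij}=-J_{ij}\sigma_i\sigma_j) evaluates F, E and S from a set of cavity messages at a BP fixed point; here the internal energy is accumulated directly as \textstyle\sum_{\langle ij\rangle}\langle E_{ij}\rangle_{P_{ij}}, which at a fixed point coincides with the \textstyle\beta-derivative above.

# Sketch: Bethe free energy, internal energy, and entropy from cavity messages,
# for +-1 spins with E_ij = -J_ij*s_i*s_j (same assumptions as the sketch above).
import numpy as np

SPINS = np.array([1.0, -1.0])

def pair_weight(J_ij, beta):
    """Boltzmann factor exp(-beta*E_ij) as a 2x2 array over (s_i, s_j)."""
    return np.exp(beta * J_ij * np.outer(SPINS, SPINS))

def bethe_observables(J, neighbors, psi, beta):
    """Return (F, E, S) from cavity messages psi[(i, j)](s_i) at a BP fixed point."""
    logZ_nodes = 0.0
    for i in neighbors:
        m = np.ones(2)
        for k in neighbors[i]:
            m *= pair_weight(J[(i, k)], beta) @ psi[(k, i)]   # sum over s_k
        logZ_nodes += np.log(m.sum())
    logZ_edges, E = 0.0, 0.0
    for (i, j) in J:
        if i < j:                                             # count each edge once
            B = pair_weight(J[(i, j)], beta) * np.outer(psi[(i, j)], psi[(j, i)])
            logZ_edges += np.log(B.sum())
            P_ij = B / B.sum()                                # pair marginal
            E += -(J[(i, j)] * np.outer(SPINS, SPINS) * P_ij).sum()   # <E_ij>
    F = -(logZ_nodes - logZ_edges) / beta
    S = beta * (E - F)
    return F, E, S

# Minimal check on a single bond (a tree, where the Bethe result is exact):
beta = 1.0
J = {(0, 1): 1.0, (1, 0): 1.0}
neighbors = {0: [1], 1: [0]}
psi = {(0, 1): np.full(2, 0.5), (1, 0): np.full(2, 0.5)}      # BP fixed point here
F, E, S = bethe_observables(J, neighbors, psi, beta)
print(F, -np.log(4 * np.cosh(beta)) / beta)                   # these should agree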


Taking the ensemble average using Population Dynamics

If we use \textstyle \mathcal F_{\textrm{RS}}(\{ \psi_{k\to i} \}) to denote the replica-symmetric cavity (Belief Propagation) update written in the previous sections, then the distribution of messages on the directed edges, sampled over an ensemble of graphs, satisfies


\mathcal P(\psi_{i\to j})=\int \prod_{k\in\partial i\backslash j}d\psi_{k\to i}\,\mathcal P(\psi_{k\to i})\;\delta\left [ \psi_{i\to j} - \mathcal F_{\textrm{RS}}(\{ \psi_{k\to i} \})\right ].

For a random-graph ensemble, the number of incoming messages in this update is drawn from the excess degree distribution of the ensemble. Macroscopic observables such as the free energy, internal energy, and entropy can then be computed as averages over this distribution of messages, which is conveniently sampled with the population dynamics algorithm sketched below.
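A minimal population-dynamics sketch, assuming a random d-regular ensemble with uniform ferromagnetic coupling J and the cavity-field parametrization \textstyle\psi_{i\to j}(\sigma_i)\propto e^{\beta h_{i\to j}\sigma_i} used above (all parameter values are illustrative):

# Sketch: population dynamics for the RS cavity equation on a random d-regular
# Ising ferromagnet (assumed ensemble), in the cavity-field parametrization.
import numpy as np

rng = np.random.default_rng(0)
beta, J, d = 1.0, 1.0, 3            # inverse temperature, coupling, degree (illustrative)
N_pop, n_sweeps = 10_000, 200       # population size, number of update sweeps

# population of cavity fields, initialized with a small symmetry-breaking bias
population = 0.1 + 0.01 * rng.standard_normal(N_pop)

def rs_update(h_in):
    """RS cavity update h_out = sum_k (1/beta)*artanh(tanh(beta*J)*tanh(beta*h_k))."""
    return np.arctanh(np.tanh(beta * J) * np.tanh(beta * h_in)).sum(axis=-1) / beta

for _ in range(n_sweeps):
    # for every population member, draw d-1 incoming fields (excess degree of the
    # regular ensemble) uniformly at random and apply the RS update
    incoming = population[rng.integers(N_pop, size=(N_pop, d - 1))]
    population = rs_update(incoming)

# ensemble-averaged magnetization: a site receives d incoming cavity fields
H = rs_update(population[rng.integers(N_pop, size=(50_000, d))])
print("average magnetization:", np.tanh(beta * H).mean())

For an irregular ensemble the number of incoming fields would instead be drawn from the excess degree distribution, and the same population can be reused to average the Bethe free-energy contributions \textstyle\Delta F_i and \textstyle\Delta F_{ij} over the ensemble.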