 # Cavity method and belief propagation for the Ising model

## Variational approach using the Bethe approximation

The Boltzmann distribution is $P(\underline\sigma)=\frac{1}{Z}e^{-\beta E(\sigma)}$ $Z=\sum_{\underline\sigma}e^{-\beta E(\sigma)}$

The Bethe approximation asks to use $\hat P(\underline \sigma)=\frac{\prod_{\langle ij\rangle}P_{ij}(\sigma_i,\sigma_j)}{\prod_iP_i^{d_i-1}(\sigma_i)}$

Here $\textstyle\langle ij\rangle$ denotes pair of nodes that is connected in the topology of the Ising model. That is, $\textstyle J_{ ij}>0,$ $\textstyle d_i$ is the degree of node i in the topology, and $\textstyle P_{ij}(\sigma_i,\sigma_j)=\sum_{\underline\sigma\backslash\sigma_i,\sigma_j}P(\underline\sigma),$, $\textstyle P_{i}(\sigma_i) =\sum_{\underline\sigma\backslash\sigma_i}P(\underline\sigma)$ are two-point and one-point marginals respectively.

Then the task is to find parameters of the Bethe approximation, such that $\{\hat P_{ij},\hat P_i \}=\arg\min_{P_{ij},P_i}D_{\textrm{KL}}(P,\hat P).$

And we can write the Kellback-Leibler divergence as \begin{align} D_{\textrm{KL}}(P,\hat P) &= \sum_{\underline\sigma}\hat P(\underline\sigma)\log\frac{\hat P(\underline\sigma)}{ P(\underline\sigma)}\\ &=\sum_{\underline\sigma}\hat P(\underline\sigma)\log \hat P(\underline\sigma)-\sum_{\underline\sigma}\hat P(\underline\sigma)\log P(\underline\sigma)\\ &=-S_{\hat P}+\sum_{\underline\sigma} \hat{P}(\underline\sigma)\beta E(\underline\sigma)+\sum_{\underline\sigma}\hat P(\underline\sigma)\log Z\\ &=-S_{\hat P}+\beta\langle E\rangle_{\hat P} +\log Z\\ &=\beta F_{\hat P}-\beta F_{P}\\ &\geq 0. \end{align}

It means that

• A good estimate of parameters in the approximation can be obtained by minimizing the variational free energy $\textstyle F_{\hat P}$.
• The variational free energy gives an upper-bound to the true free energy. $\{\hat P_{ij},\hat P_i \}=\arg\min_{P_{ij},P_i}F_{\hat P}.$

Since $\hat P(\underline \sigma)=\frac{\prod_{\langle ij\rangle}P_{ij}(\sigma_i,\sigma_j)}{\prod_iP_i^{d_i-1}(\sigma_i)},$

we have $\log \hat{P}(\underline\sigma)=\sum_{\langle ij\rangle}\log P_{ij}-\sum_i(d_i-1)\log P_i.$

Thus \begin{align} F_{\hat P}&=\sum_{\underline\sigma}\hat P(\underline\sigma)\sum_{\langle ij\rangle}J_{ij}\sigma_i\sigma_j+\sum_{\underline\sigma}\hat P(\underline\sigma)\sum_{\langle ij\rangle}\log P_{ij}-\sum_{\underline\sigma}\hat P(\underline\sigma)(d_i-1)\log P_i\\ &=\sum_{\langle ij\rangle}\sum_{\sigma_i\sigma_j}P_{ij}(\sigma_i,\sigma_j)(E_{ij}+\log P_{ij})-\sum_i\sum_{\sigma_i}P_i(\sigma_i)(d_i-1)\log P_i(\sigma_i) \end{align}

Now let us introduce several Lagragnian with several multipliers \begin{align} \mathcal L&=\sum_{\langle ij\rangle}\sum_{\sigma_i\sigma_j}P_{ij}(\sigma_i,\sigma_j)(E_{ij}+\log P_{ij})-\sum_i\sum_{\sigma_i}P_i(\sigma_i)(d_i-1)\log P_i(\sigma_i)\\ &-\sum_i\lambda_i\left (\sum_{\sigma_i}P_i(\sigma_i)-1\right )-\sum_{\langle ij\rangle}\lambda_{j\to i}\left ( \sum_{\sigma_j} P_{ij}(\sigma_i,\sigma_j)-P_i(\sigma_i)\right)-\sum_{\langle ij\rangle}\lambda_{i\to j}\left ( \sum_{\sigma_i} P_{ij}(\sigma_i,\sigma_j)-P_j(\sigma_j)\right) \end{align}

By setting derivatives to zero we have \begin{align} \frac{\partial \mathcal L}{\partial P_{i}(\sigma_i)}&=(1-d_i)(1+\log P_i(\sigma_i))-\lambda_i-\sum_{\langle ij\rangle}\lambda_{j\to i}=0,\\ \frac{\partial \mathcal L}{\partial P_{ij}(\sigma_i,\sigma_j)}&=E_{ij}(1+\log P_{ij})-\lambda_{j\to i}-\lambda_{i\to j}=0. \end{align}

Last equations yield that \begin{align} P_i(\sigma_i)&\propto e^{\sum_{\langle ij\rangle}\lambda_{j\to i}},\\ P_{ij}(\sigma_i,\sigma_j)&\propto e^{-E_{ij}-\lambda_{j\to i}-\lambda_{i\to j}}. \end{align}

After introducing new variables, we have \begin{align} P_i(\sigma_i)&\propto \prod_{j\in\partial i}\psi^{j\to i},\\ P_{ij}(\sigma_i,\sigma_j)&\propto e^{-E_{ij}}\psi^{j\to i}\psi^{i\to j}. \end{align}

where $\textstyle \{\psi_{i\to j}\}$ are cavity messages along the directed edges of the graph, and they can be determined using the Belief Propagation equation (also called Sum-Product equation): \begin{align} \psi_{i\to j}(\sigma_j)&=\frac{1}{Z_{i\to j}}\sum_{\sigma_i}e^{\beta E_{ij}(\sigma_i,\sigma_j)}\prod_{k\in\partial i\backslash j}\psi_{k\to i}(\sigma_i),\\ Z_{i\to j}&=\sum_{\sigma_j}\sum_{\sigma_i}e^{\beta E_{ij}(\sigma_i,\sigma_j)}\prod_{k\in\partial i\backslash j}\psi_{k\to i}(\sigma_i). \end{align}

## Computing macro observables \begin{align} F&=\sum_{i}\Delta F_i-\sum_{\langle ij\rangle}\Delta F_{ij},\\ \Delta F_i&=-\frac{1}{\beta}\log Z_i=-\frac{1}{\beta}\sum_{\sigma_i}\prod_{k\in\partial i\backslash j}\sum_{\sigma_k}e^{-\beta E_{ik}(\sigma_i,\sigma_k)}\psi_{\sigma_k}^{k\to i},\\ \Delta F_{ij}&=-\frac{1}{\beta}\log Z_{ij}=-\frac{1}{\beta}\sum_{\sigma_i,\sigma_j}e^{-\beta E_{ij}(\sigma_i,\sigma_j)}\psi_{\sigma_i}^{i\to j}\psi_{\sigma_j}^{j\to i}. \end{align}

## Deriving Belief Propagation using directly the conditional independence

We can actually try to solve the marginals directly by adding an additional spin i into a system with n spins. \begin{align} P_i(\sigma_i)&=\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k}\}}P_{N+1}(\sigma_i;\underline{\sigma_j};\underline{\sigma_k})\\ &=\frac{\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\frac{\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})}}{\sum_{\{\underline{\sigma_j} \}}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} } }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\frac{\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})}}{\sum_{\{\underline{\sigma_j} \}}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} } }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}P_N(\underline{\sigma_j},\underline{\sigma_k}) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}P_N{ (\underline{\sigma_j},\underline{\sigma_k})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}P_N(\underline{\sigma_j}) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}P_N{ (\underline{\sigma_j})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\prod_{j\in\partial i}P_N(\sigma_j) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\prod_{j\in\partial i}P_N{( {\sigma_j})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}\prod_{j\in\partial i}e^{ -\beta E_{i,j}(\sigma_i,\sigma_j)}\psi_{j\to i}(\sigma_j) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}\prod_{j\in\partial i}e^{ -\beta E_{i,j}(\sigma_i,{\sigma_j})}\psi_{j\to i}{( {\sigma_j})} }\\ &=\frac{\prod_{j\in\partial i}\sum_{ {\sigma_j} }e^{ -\beta E_{i,j}(\sigma_i,\sigma_j)}\psi_{j\to i}(\sigma_j) }{\sum_{\sigma_i}\prod_{j\in\partial i}\sum_{{\sigma_j} }e^{ -\beta E_{i,j}(\sigma_i,{\sigma_j})}\psi_{j\to i}{( {\sigma_j})} } \end{align}

## Computing macro observables

The Bethe free energy can be computed as \begin{align} F&=\sum_{i}\Delta F_i-\sum_{\langle ij\rangle}\Delta F_{ij},\\ \Delta F_i&=-\frac{1}{\beta}\log Z_i=-\frac{1}{\beta}\sum_{\sigma_i}\prod_{k\in\partial i\backslash j}\sum_{\sigma_k}e^{-\beta E_{ik}(\sigma_i,\sigma_k)}\psi_{\sigma_k}^{k\to i},\\ \Delta F_{ij}&=-\frac{1}{\beta}\log Z_{ij}=-\frac{1}{\beta}\sum_{\sigma_i,\sigma_j}e^{-\beta E_{ij}(\sigma_i,\sigma_j)}\psi_{\sigma_i}^{i\to j}\psi_{\sigma_j}^{j\to i}. \end{align}

Then average internal energy and entropy can be computed as \begin{align} E&=\sum_{i}\Delta E_i-\sum_{\langle ij\rangle}\Delta E_{ij},\\ &=\sum_{i}\frac{\partial(\beta F_i)}{\delta \beta}-\sum_{\langle ij\rangle}\frac{\partial (\beta\Delta F_{ij})}{\partial \beta}\\ S&=F-\beta E \end{align}

## Taking ensemble average using the Population Dynamics

If we use $\textstyle \mathcal F_{\textrm{RS}}(\{ P_{\sigma_k}^{k\to i} \}$ to denote the Replica Symmetric equations written in the previous sections, then distribution of messages in the directed edges sampled from ensemble of graphs can be written as $\mathcal P(P^{i\to j}_{\sigma_i})=\int \prod_{k\in\partial i\backslash j}dP_{\sigma_k}^{k\to i}\mathcal P(P_{\sigma_k}^{k\to i})\cdot\delta\left [ P^{i\to j}_{\sigma_i} - \mathcal F_{\textrm{RS}}(\{ P_{\sigma_k}^{k\to i} \}\right ] ).$

Then the macro observables like energy and entropy can be computed using the distribution of messages.