The Ising Model and the Maximum Entropy Distribution


Definitions

n spins \textstyle\underline\sigma\in \{+1,-1\}^n are connected by couplings \textstyle J\in\mathbb{R}^{n\times n}.
Different realizations of \textstyle J_{ij} give different systems, for example (a small Python sketch constructing such couplings follows the list):

  • J_{ij}=\textrm{const}: ferromagnetic model (J_{ij}>0) or anti-ferromagnetic model (J_{ij}<0).
  • J_{ij}\sim\mathcal{N}(0,\Delta): Sherrington-Kirkpatrick model, spin glasses.
  • J_{ij} set by Hebb's rule: Hopfield model, associative memories.
  • J_{ij} are learned from data: neural networks.
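
To make these coupling choices concrete, here is a minimal Python sketch of how the corresponding J matrices could be constructed. The function names, the symmetrization, and the 1/n scalings mentioned in the comments are illustrative assumptions, not part of the original text.

```python
import numpy as np

def ferromagnet(n, J0=1.0):
    """Constant couplings J_ij = J0 (ferromagnet for J0 > 0, anti-ferromagnet for J0 < 0)."""
    J = np.full((n, n), J0)
    np.fill_diagonal(J, 0.0)              # no self-coupling
    return J

def sherrington_kirkpatrick(n, delta=1.0, rng=None):
    """Gaussian couplings J_ij ~ N(0, delta), symmetrized (SK model / spin glass).
    A variance scaling delta/n is often used for a well-defined large-n limit."""
    rng = np.random.default_rng(rng)
    J = rng.normal(0.0, np.sqrt(delta), size=(n, n))
    J = (J + J.T) / 2.0                   # keep J symmetric
    np.fill_diagonal(J, 0.0)
    return J

def hopfield(patterns):
    """One common form of Hebb's rule: J_ij = (1/n) sum_mu xi_i^mu xi_j^mu (Hopfield model)."""
    patterns = np.asarray(patterns)       # shape (p, n), entries +/-1
    n = patterns.shape[1]
    J = patterns.T @ patterns / n
    np.fill_diagonal(J, 0.0)
    return J
```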


The energy of a configuration \underline\sigma is

E(\underline\sigma)=-\sum_{ij}J_{ij}\sigma_i\sigma_j-\sum_i\sigma_i\theta_i,

where \theta_i is the external field applied to spin \textstyle i.
Note that throughout the discussion the external field is set to zero; this does not qualitatively change the results we are going to show, but it significantly shortens the formulas :)
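
As an illustration, the energy function above translates directly into a few lines of Python. This is only a sketch: the function name energy and the dense matrix representation of J are assumptions, and the field term is kept even though \theta=0 in the rest of the discussion.

```python
import numpy as np

def energy(sigma, J, theta=None):
    """E(sigma) = -sum_{ij} J_ij sigma_i sigma_j - sum_i theta_i sigma_i."""
    sigma = np.asarray(sigma)             # spin configuration, entries in {+1, -1}
    E = -sigma @ J @ sigma                # coupling term: -sum_{ij} J_ij sigma_i sigma_j
    if theta is not None:
        E -= theta @ sigma                # external-field term
    return float(E)
```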

In the canonical ensemble, the probability of finding a configuration in equilibrium at inverse temperature \beta follows the Boltzmann distribution:

P(\underline\sigma)=\frac{1}{Z}e^{-\beta E(\underline\sigma)}=\frac{1}{Z}e^{\beta\sum_{ij} J_{ij}\sigma_i\sigma_j},

where

Z=\sum_{\underline\sigma}e^{\sum_{ij}\beta J_{ij}\sigma_i\sigma_j}

is the partition function.
Notice that

  • There are 2^n configurations in total in the summation.
  • When \beta=0, every configuration has the same Boltzmann weight, so each has probability \textstyle 2^{-n}.
  • When \beta\to\infty, only the configurations with the lowest energy retain non-vanishing probability (both limits are checked in the brute-force sketch after this list).
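
Both limits can be checked by brute force for a tiny system, where summing over all 2^n configurations is still feasible. The sketch below assumes zero external field; boltzmann is a hypothetical helper, not from the original article.

```python
import itertools
import numpy as np

def boltzmann(J, beta):
    """Exact Boltzmann distribution over all 2^n configurations (tiny n only)."""
    n = J.shape[0]
    configs = np.array(list(itertools.product([1, -1], repeat=n)))   # 2^n rows
    energies = np.array([-s @ J @ s for s in configs])               # zero-field energy
    weights = np.exp(-beta * energies)
    Z = weights.sum()                                                # partition function
    return configs, weights / Z

J = np.array([[0.0, 1.0],
              [1.0, 0.0]])                # two ferromagnetically coupled spins

_, p = boltzmann(J, beta=0.0)
print(p)                                  # -> [0.25 0.25 0.25 0.25], i.e. 2^{-n} each

_, p = boltzmann(J, beta=10.0)
print(p.round(4))                         # essentially all mass on the two aligned states
```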

Why the Ising model?

In addition to physical motivations (phase transitions, criticality, ...), another reason the Ising model is useful in modern science and technology is that it is the maximum-entropy model given the first two moments of the observations, i.e. the distribution that introduces the least bias and makes the weakest claims about the observed data beyond those moments.

Suppose we have m configurations \textstyle \{\underline\sigma^t\}\in\{+1,-1\}^{m\times n} sampled from the Boltzmann distribution of the model. Then we can define the following statistics, which can be observed from the data (see the sketch after this list):

  • Magnetization \textstyle m_i= \frac{1}{m}\sum_{t=1}^m\sigma_i^t\approx\langle \sigma_i\rangle
  • Correlations \textstyle C_{ij}= \frac{1}{m}\sum_{t=1}^m\sigma_i^t\sigma_j^t\approx\langle \sigma_i\sigma_j\rangle
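
Assuming the m samples are stored as an m \times n array with entries \pm 1, the empirical moments can be computed as below (a sketch; the 1/m normalization makes both quantities sample averages):

```python
import numpy as np

def empirical_moments(samples):
    """Magnetizations m_i and correlations C_ij from an (m, n) array of +/-1 samples."""
    samples = np.asarray(samples, dtype=float)
    m_i = samples.mean(axis=0)                    # m_i  ~ <sigma_i>
    C_ij = samples.T @ samples / len(samples)     # C_ij ~ <sigma_i sigma_j>
    return m_i, C_ij
```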

Many distributions can generate data with the given first and second moments; suppose \textstyle P(\underline\sigma) is such a distribution. Then we can write the entropy of this distribution as

S_p=-\sum_{\underline\sigma}P(\underline\sigma)\log P(\underline\sigma).

Of course, there are constraints that need to be satisfied:

\sum_{\underline\sigma}P(\underline\sigma)=1
\forall i,\,\, \sum_{\underline\sigma}P(\underline\sigma)\sigma_i=m_i
\forall (i,j),\,\, \sum_{\underline\sigma}P(\underline\sigma)\sigma_i\sigma_j=C_{ij}.

We define a Lagrangian as

\mathcal {L}_P=-\sum_{\underline\sigma}P(\underline\sigma)\log P(\underline\sigma)+\sum_i\lambda_i\left (m_i-\sum_{\underline\sigma}P(\underline\sigma)\sigma_i\right )+\sum_{ij}\lambda_{ij}\left (C_{ij}-\sum_{\underline\sigma}P(\underline\sigma)\sigma_i\sigma_j\right )+\lambda\left (\sum_{\underline\sigma}P(\underline\sigma)-1\right ),

where \textstyle \{\lambda_i\}, \{\lambda_{ij}\}, and \lambda are Lagrange multipliers.

By setting \textstyle \frac{\partial\mathcal {L}_P}{\partial P(\underline\sigma)}=0 for every configuration \underline\sigma, we have

 -\log P(\underline\sigma)-1-\sum_i\lambda_i\sigma_i-\sum_{ij}\lambda_{ij}\sigma_i\sigma_j+\lambda=0,

which yields

P(\underline\sigma)=\frac{1}{Z}e^{-\sum_i\lambda_i\sigma_i-\sum_{ij}\lambda_{ij}\sigma_i\sigma_j},\qquad Z=e^{1-\lambda}.

Identifying \textstyle \lambda_i=-\beta\theta_i (which vanishes for zero external field) and \textstyle \lambda_{ij}=-\beta J_{ij}, we recover exactly the Boltzmann distribution of the Ising model:

P(\underline\sigma)=\frac{1}{Z}e^{\beta\sum_{ij} J_{ij}\sigma_i\sigma_j}.
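
This identification can be verified numerically for a very small system: adjust the multipliers by iteratively matching the model correlations to the data correlations (computed here exactly by enumeration); the fitted values should approach \beta J_{ij} (the sign flips relative to the \lambda_{ij}=-\beta J_{ij} identification above because the sketch parametrizes the exponent with a plus sign). The sketch is only an illustration: it assumes zero field, exact enumeration (feasible only for small n), and arbitrary learning-rate and iteration settings.

```python
import itertools
import numpy as np

def model_correlations(lam, n):
    """<sigma_i sigma_j> under P(sigma) proportional to exp(sum_ij lam_ij sigma_i sigma_j)."""
    configs = np.array(list(itertools.product([1, -1], repeat=n)))
    weights = np.exp(np.einsum('ti,ij,tj->t', configs, lam, configs))
    p = weights / weights.sum()
    return np.einsum('t,ti,tj->ij', p, configs, configs)

def fit_max_entropy(C_data, lr=0.05, steps=5000):
    """Gradient ascent on the multipliers until model correlations match C_data."""
    n = C_data.shape[0]
    lam = np.zeros((n, n))
    for _ in range(steps):
        lam += lr * (C_data - model_correlations(lam, n))    # moment matching
    return lam

# Ground truth: a small random Ising model at inverse temperature beta
rng = np.random.default_rng(0)
n, beta = 4, 0.5
J = rng.normal(0.0, 0.3, size=(n, n))
J = (J + J.T) / 2.0
np.fill_diagonal(J, 0.0)

C_true = model_correlations(beta * J, n)      # exact correlations of the true model
lam = fit_max_entropy(C_true)
print(np.allclose(lam, beta * J, atol=1e-2))  # should print True: multipliers recover beta*J
```

This moment-matching iteration is essentially Boltzmann machine learning; for larger n the exact enumeration must be replaced by Monte Carlo or mean-field estimates of the model moments.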

