The Ising model is arguably the most famous model in (condensed matter) physics. It is described by the simple Hamiltonian
\[ H = - J \sum_{\langle i,j \rangle} s_i s_j. \]
Here, the $s_i \in \{-1,+1\}$ are classical, binary magnetic moments (spins) sitting on a two-dimensional square lattice, and $\langle i,j \rangle$ indicates that only interactions between neighboring spins are taken into account. For simplicity, we will set $J=1$.
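To make this concrete, here is a small helper that evaluates $H$ for a given spin configuration with periodic boundary conditions. It is not used in the rest of the tutorial, and the name ising_energy is ours; it is only meant to pin down the notation.

function ising_energy(conf) # conf: an L×L matrix of ±1 spins, J = 1
    E = 0.0
    for j in axes(conf, 2), i in axes(conf, 1)
        # couple each spin to its right and lower neighbor (periodic boundaries),
        # so every nearest-neighbor bond is counted exactly once
        E -= conf[i, j] * conf[mod1(i + 1, size(conf, 1)), j]
        E -= conf[i, j] * conf[i, mod1(j + 1, size(conf, 2))]
    end
    return E
end

ising_energy(ones(Int, 8, 8)) # fully polarized state: E = -2 * 8^2 = -128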
Most importantly, the Ising model shows a phase transition between a paramagnetic and a ferromagnetic phase as a function of temperature. The critical temperature $T_c$ at which this change of magnetic character occurs has been calculated exactly by Lars Onsager. He found
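\[ T_c = \frac{2}{\ln(1+\sqrt{2})} \approx 2.269 \]
(in our units, where $J=1$ and $k_B=1$), which we can evaluate directly in Julia: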
const IsingTc = 1/(1/2*log(1+sqrt(2))) # Exact Onsager solution
2.269185314213022
In this tutorial we aim to reproduce this result (roughly) using a simple neural network. Basically, what we'll do was worth a Nature Physics publication not so long ago!
We will start by quickly simulating the Ising model using the Monte Carlo method to obtain representative sets of spin configurations for a bunch of temperatures. Afterwards, we do not take the traditional approach of inspecting the magnetization (the order parameter of the transition) and its susceptibility. Instead, we use supervised learning to train a simple neural network to automagically learn the transition temperature.
The Monte Carlo method for the Ising model is very straightforward: take a random configuration of spins to begin with and propose individual spin flips until you fall asleep. To decide whether a spin should be flipped we use the Metropolis criterion
\[ p = \min{\{1, e^{-\beta\Delta E}\}} \]
where $\Delta E = E' - E$ is the energy difference between the new (spin-flipped) and the old configuration according to $H$ above, and $\beta = 1/T$ is the inverse temperature. Since $\Delta E$ only depends on the local environment of the spin to be flipped (its nearest neighbors), we can evaluate it locally. Without going into details, here is a simple implementation of the Monte Carlo idea for the Ising model. It is not optimized by any means, but it will serve its purpose.
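Concretely, flipping a single spin $s_i$ changes the energy by
\[ \Delta E = 2 J s_i \sum_{j \in \mathrm{nn}(i)} s_j, \]
where the sum runs over the four nearest neighbors of site $i$; this is exactly the expression evaluated inside the sweep loop below.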
using Printf, Dates

# functions to obtain neighbors of a given site i
up(neighs, i)    = neighs[1, i]
right(neighs, i) = neighs[2, i]
down(neighs, i)  = neighs[3, i]
left(neighs, i)  = neighs[4, i]

function montecarlo(; L, T)
    # set parameters & initialize
    nsweeps = 10^7
    measure_rate = 5_000
    beta = 1/T
    conf = rand([-1, 1], L, L)
    confs = Matrix{Int64}[] # storing intermediate configurations

    # build nearest neighbor lookup table
    lattice = reshape(1:L^2, (L, L))
    ups    = circshift(lattice, (-1, 0))
    rights = circshift(lattice, (0, -1))
    downs  = circshift(lattice, (1, 0))
    lefts  = circshift(lattice, (0, 1))
    neighs = vcat(ups[:]', rights[:]', downs[:]', lefts[:]')

    start_time = now()
    println("Started: ", Dates.format(start_time, "d.u yyyy HH:MM"))

    # walk over the lattice and propose to flip each spin `nsweeps` times
    for sweep in 1:nsweeps
        # sweep
        for i in eachindex(conf)
            # calculate ΔE
            ΔE = 2.0 * conf[i] * (conf[up(neighs, i)] + conf[right(neighs, i)] +
                                  conf[down(neighs, i)] + conf[left(neighs, i)])
            # Metropolis criterion
            if ΔE <= 0 || rand() < exp(-beta * ΔE)
                conf[i] *= -1 # flip spin
            end
        end

        # store the spin configuration
        iszero(mod(sweep, measure_rate)) && push!(confs, copy(conf))
    end

    end_time = now()
    println("Ended: ", Dates.format(end_time, "d.u yyyy HH:MM"))
    @printf("Duration: %.2f minutes", (end_time - start_time).value / 1000. / 60.)

    # return the recorded spin configurations
    return confs
end
montecarlo (generic function with 1 method)
When we run this code for a small system with $L=8$ ($L$ is the linear dimension of a square lattice of size $L \times L$), we obtain 2000 (approximately representative) spin configurations.
montecarlo(L=8, T=5)
Started: 5.Aug 2019 20:42
Ended: 5.Aug 2019 20:42
Duration: 0.28 minutes

2000-element Array{Array{Int64,2},1}:
 [1 1 … -1 1; -1 1 … 1 1; … ; -1 -1 … -1 -1; -1 -1 … -1 -1]
 [1 -1 … -1 -1; -1 -1 … -1 -1; … ; -1 1 … 1 -1; -1 -1 … -1 -1]
 [-1 1 … -1 1; 1 1 … 1 -1; … ; -1 -1 … 1 -1; -1 -1 … 1 -1]
 [-1 1 … -1 1; -1 -1 … 1 1; … ; -1 -1 … -1 -1; -1 -1 … 1 1]
 [1 -1 … 1 1; 1 1 … -1 -1; … ; -1 1 … 1 1; 1 -1 … 1 1]
 [-1 -1 … 1 1; 1 -1 … -1 1; … ; -1 -1 … -1 1; -1 -1 … -1 -1]
 [1 -1 … 1 -1; -1 -1 … 1 -1; … ; 1 -1 … -1 -1; 1 -1 … 1 -1]
 [1 -1 … 1 -1; -1 1 … 1 -1; … ; 1 -1 … 1 1; -1 -1 … 1 1]
 [-1 -1 … 1 1; -1 -1 … -1 -1; … ; -1 1 … -1 -1; -1 1 … 1 1]
 [-1 -1 … -1 1; 1 1 … 1 1; … ; -1 1 … -1 -1; 1 1 … -1 -1]
 ⋮
 [-1 -1 … 1 1; -1 -1 … -1 -1; … ; -1 -1 … -1 1; -1 -1 … -1 1]
 [-1 1 … 1 1; 1 1 … 1 -1; … ; 1 -1 … -1 -1; 1 1 … -1 -1]
 [1 1 … -1 1; 1 -1 … -1 -1; … ; -1 1 … 1 -1; -1 -1 … -1 -1]
 [-1 -1 … -1 -1; -1 1 … -1 1; … ; -1 -1 … -1 -1; -1 -1 … -1 -1]
 [-1 -1 … -1 -1; -1 1 … 1 -1; … ; -1 1 … 1 1; -1 -1 … 1 1]
 [-1 -1 … -1 1; -1 -1 … -1 -1; … ; -1 -1 … -1 1; -1 1 … -1 1]
 [1 -1 … -1 -1; 1 1 … -1 1; … ; 1 1 … 1 1; 1 1 … 1 1]
 [1 -1 … -1 1; -1 -1 … 1 1; … ; 1 -1 … 1 1; 1 1 … -1 1]
 [-1 1 … -1 1; 1 -1 … 1 1; … ; -1 -1 … 1 -1; 1 -1 … -1 -1]
Of course, to identify the ferromagnetic phase transition as a function of temperature, we'll need at least a few more temperatures. Let's look at the following set:
Ts = [1.189, 1.733, 2.069, 2.269, 2.278, 2.469, 2.822, 3.367]
8-element Array{Float64,1}:
 1.189
 1.733
 2.069
 2.269
 2.278
 2.469
 2.822
 3.367
This set contains the exact Onsager solution IsingTc and a bunch of temperatures around it, as shown here:
# visualize temperatures
using Plots

vline(Ts, grid=false, axis=:x, framestyle=:origin,
      xlim=(minimum(Ts)-0.1, maximum(Ts)+0.1), size=(800,200), label="Ts")
scatter!(Ts, fill(0, length(Ts)), color=:lightblue, label="")
vline!([IsingTc], color=:red, label="Tc")
Alright, now that we are prepared, let's run those simulations (takes about 4 minutes on my i5 desktop machine) and store the configurations in a T => confs dictionary.
confs = Dict{Float64, Array{Float64,3}}() # key: T, value: confs

for T in Ts
    println("T = $T"); flush(stdout);
    c = montecarlo(L=8, T=T)
    confs[T] = cat(c..., dims=3)
    println("Done.\n")
end
T = 1.189
Started: 5.Aug 2019 20:42  Ended: 5.Aug 2019 20:42  Duration: 0.33 minutes
Done.

T = 1.733
Started: 5.Aug 2019 20:42  Ended: 5.Aug 2019 20:43  Duration: 0.34 minutes
Done.

T = 2.069
Started: 5.Aug 2019 20:43  Ended: 5.Aug 2019 20:43  Duration: 0.33 minutes
Done.

T = 2.269
Started: 5.Aug 2019 20:43  Ended: 5.Aug 2019 20:43  Duration: 0.32 minutes
Done.

T = 2.278
Started: 5.Aug 2019 20:43  Ended: 5.Aug 2019 20:44  Duration: 0.32 minutes
Done.

T = 2.469
Started: 5.Aug 2019 20:44  Ended: 5.Aug 2019 20:44  Duration: 0.31 minutes
Done.

T = 2.822
Started: 5.Aug 2019 20:44  Ended: 5.Aug 2019 20:44  Duration: 0.31 minutes
Done.

T = 3.367
Started: 5.Aug 2019 20:44  Ended: 5.Aug 2019 20:45  Duration: 0.29 minutes
Done.
How can we utilize machine learning to find $T_c$? Basically, we'll build upon a simple fact: even if we don't know the precise value of $T_c$, clearly, at high temperatures the system should be disordered, i.e. paramagnetic, since temperature is wiggling our spins such that they cannot align. Similarly, for small $T$, where temperature fluctuations are frozen out, we should be in an ordered (ferromagnetic) phase, where the vast majority of spins point in the same direction.
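As a quick sanity check (not part of the workflow below, and the helper name mean_abs_magnetization is ours), we can compute the mean absolute magnetization per site of the recorded configurations: it should be close to 1 at the lowest temperature and much smaller at the highest one.

using Statistics

# mean |magnetization| per site, averaged over all recorded configurations at temperature T
mean_abs_magnetization(T) = mean(abs.(mean(confs[T], dims=(1, 2))))

mean_abs_magnetization(1.189), mean_abs_magnetization(3.367)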
Ok, but how does that help? In the spirit of supervised learning, this knowledge allows us to put phase labels on some of our configurations, namely those for the highest (Tright = 3.367) and lowest (Tleft = 1.189) values of $T$. We'll consider the following simple neural network.
The green neurons will be our input configurations. Note that we'll linearize our two-dimensional configurations, that is, we'll just throw away the dimensionality information and treat each of them as one big, one-dimensional vector.
The red neurons are the hidden neurons. We'll have only a single hidden layer of 10 neurons (only 5 are shown above) which is densely connected (all-to-all) to our input neurons.
The blue neurons are the output neurons, which will indicate the network's confidence about whether we are in the paramagnetic or ferromagnetic phase (we'll use a softmax activation to ensure that their values sum up to 1).
Our training strategy now is as follows. First, we feed the network the Tleft configurations and, based on our knowledge that we should be ordered at this temperature, optimize the network parameters with respect to producing a 1-0 output in favor of the ferromagnetic phase. Then, we do the same for the Tright configurations, where we expect to have a 0-1 prediction.
Afterwards, we fix the weights and biases of the network and ask it for every intermediate temperature: How confident are you that we are in the paramagnetic/ferromagnetic phase? The point of maximal confusion, i.e. 0.5-0.5, will be what we'll identify as $T_c$.
Let's get to it. We start by loading Flux.jl, the pure Julia machine-learning stack, and a few standard libraries.
using Flux
using Flux: crossentropy, onecold, onehotbatch, params, throttle, @epochs
using Statistics, Random
As mentioned above, we won't/can't feed the network two-dimensional input data but have to flatten the configurations. Also, to improve the data from our quick-and-dirty simulation above, we exploit the $Z_2$ symmetry of the Ising Hamiltonian $H$ which states that we can flip all spins collectively without changing the energy of the system ($s_i \rightarrow -s_i$). This way, we effectively double our dataset.
function flatten_and_Z2(confs, T)
    c = confs[T]
    cs = Float64.(reshape(c, (64, :))) # flatten space dimension
    cs = hcat(cs, -one(eltype(cs)) .* cs) # concatenate Z2 (spin flip) symmetry partners
    return cs
end
flatten_and_Z2 (generic function with 1 method)
L = 8
Tleft = 1.189
Tright = 3.367

confs_left = flatten_and_Z2(confs, Tleft)
confs_right = flatten_and_Z2(confs, Tright);
Since visualizations are always a good thing, let's visualize the configurations at the lowest and highest temperatures.
# visualize configurations
using Colors # provides Gray (may already be loaded via Plots)

printconfs(confs) = plot([heatmap(Gray.(reshape(confs[:, i], (L, L))), ticks=false)
                          for i in 1:100:size(confs, 2)]...)
printconfs (generic function with 1 method)
printconfs(confs_left)
printconfs(confs_right)
Before we can start defining our model and training it, we have to group our data into the (X, Y) structure that Flux expects. Once again, to artificially enlarge our dataset, we repeat the data we have 10 times.
# set up as training data
neach = size(confs_left, 2)

X = hcat(confs_left, confs_right)
labels = vcat(fill(1, neach), fill(0, neach))
Y = onehotbatch(labels, 0:1)

dataset = Base.Iterators.repeated((X, Y), 10); # repeat dataset 10 times
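In case onehotbatch is new to you: it turns a vector of labels into a matrix with one one-hot column per sample, which is the target format that crossentropy works with. A tiny illustration:

# columns are the one-hot encodings of the labels 1, 0, 1;
# rows are ordered according to the label set 0:1,
# i.e. the first row flags label 0 and the second row flags label 1
onehotbatch([1, 0, 1], 0:1)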
Alright. So far, we have only prepared our data. Let's finally define our neural network.
Random.seed!(123)

m = Chain(
    Dense(L^2, 10, relu), # 10 hidden neurons
    Dense(10, 2), # two output neurons for paramagnetic vs ferromagnetic
    softmax)
Chain(Dense(64, 10, NNlib.relu), Dense(10, 2), NNlib.softmax)
Currently, the network parameters are random and the network has no predictive power. A good way to visualize this fact is a confidence plot.
# classify phases at all intermediate temperatures
function confidence_plot()
    results = Dict{Float64, Vector{Float32}}()
    for T in Ts
        c = flatten_and_Z2(confs, T)
        results[T] = vec(mean(m(c), dims=2).data)
    end
    results = sort(results)

    p = plot(keys(results) |> collect, reduce(hcat, values(results))',
             marker=:circle, xlab="temperature", ylabel="NN confidence",
             labels=["paramagnet", "ferromagnet"], frame=:box)
    plot!(p, [IsingTc, IsingTc], [0, 1], ls=:dash, color=:black, label="IsingTc")

    if (@isdefined IJulia) # "animation" in jupyter
        IJulia.clear_output(true)
    end
    display(p)
end

confidence_plot()
The network clearly has no idea which phase we are in. Let's see what happens after training it for a couple of seconds.
# define cost-function
loss(x, y) = crossentropy(m(x), y)

# define optimizer
opt = ADAM()

# train for 100 epochs
for i in 1:100
    Flux.train!(loss, params(m), dataset, opt)
end

confidence_plot()
Boom. The point of maximal confusion agrees with the exact Onsager solution remarkably well.
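If you prefer a number over reading the crossing off the plot, you can linearly interpolate where the ferromagnet confidence drops below 0.5. This is a minimal sketch (the function estimate_Tc is ours, not part of the analysis above) and assumes the confidence curve crosses 0.5 exactly once between the two training temperatures:

function estimate_Tc()
    # ferromagnet confidence (second network output, cf. confidence_plot) for each T;
    # Ts is already sorted in ascending order
    ferro = [mean(m(flatten_and_Z2(confs, T)), dims=2).data[2] for T in Ts]
    k = findfirst(f -> f < 0.5, ferro) # first temperature classified as paramagnetic
    # linear interpolation between the two temperatures bracketing the 0.5 crossing
    T1, T2, f1, f2 = Ts[k-1], Ts[k], ferro[k-1], ferro[k]
    return T1 + (0.5 - f1) * (T2 - T1) / (f2 - f1)
end

estimate_Tc()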
A couple of things have to be noted.
The quality of the result might depend a bit on where (at which temperatures) we train the network.
We are somewhat stretching things here as our system is tiny ($L=8$) and finite-size effects are expected. After all, the phase transition is a property of the infinite system.
Also, our simulation was quick-and-dirty and our configurations might not actually be representative. We'd have to use cluster updates (see the sketch below) and run the simulations longer to be safe.
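For the curious, this is roughly what a single cluster update (Wolff algorithm) looks like. It takes the same kind of 4×L² neighbor lookup table that montecarlo builds internally; the function name wolff_update! is ours and the sketch is not used anywhere else in this tutorial.

function wolff_update!(conf, neighs, beta)
    p_add = 1 - exp(-2 * beta) # bond activation probability (J = 1)
    seed = rand(1:length(conf))
    cluster_spin = conf[seed]
    conf[seed] *= -1 # flip immediately; flipped sites can't be added again
    stack = [seed]
    while !isempty(stack)
        site = pop!(stack)
        for nb in view(neighs, :, site) # the four neighbors of `site`
            if conf[nb] == cluster_spin && rand() < p_add
                conf[nb] *= -1
                push!(stack, nb)
            end
        end
    end
    return conf
end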
Despite those points, we have seen that Monte Carlo + machine learning can be used to identify phase transitions in a physical system - an interesting and exciting new field!
Optional: In Jupyter notebooks and Juno, we can visualize the learning process by updating the confidence plot during learning via a callback (uncomment and run the following code).
# # Define a callback
# evalcb = () -> begin
#     # @show(loss(X, Y))
#     # @show(accuracy(X, Y))
#     confidence_plot()
# end

# # Reset the network and the optimizer
# Random.seed!(123)
# m = Chain(
#     Dense(L^2, 10, relu),
#     Dense(10, 2),
#     softmax)
# opt = ADAM()

# # Train for 100 epochs (with "animation")
# for i in 1:100
#     Flux.train!(loss, params(m), dataset, opt, cb = throttle(evalcb, 50))
# end