Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.
CC-BY_NC

To get started, we must install pomp and pompExamples. Run the following to do so.

install.packages("pomp",repos="http://R-Forge.R-Project.org")
install.packages("pompExamples",repos="http://kinglab.eeb.lsa.umich.edu/R")

POMPs in pomp

Partially observed Markov process (POMP) models

AKA: stochastic dynamical systems, (non-linear, non-Gaussian) state-space models, hidden Markov models.

CC-BY_NC
Schematic of the structure of a POMP showing causal relations and direction of information flow. The key perspective to keep in mind is that the model is to be viewed as the process that generated the data.

CC-BY_NC

Another POMP model schematic, showing dependence among model variables. State variables, x, at time t depend only on state variables at the previous timestep. Measurements, y, at time t depend only on the state at that time. Formally, Prob[xt|x,y]=Prob[xt|xt1] and Prob[yt|x,y]=Prob[yt|xt] for all x={x1,x2,,xT}, y={y1,y2,,yT}, and t=1,,T.


Example: the Ricker model

The deterministic skeleton

The Ricker map describes the deterministic dynamics of a simple population:

Nt+1=rNtexp(Nt)

Here, Nt is the population density at time t and r is a fixed value (a parameter), related to the population’s intrinsic capacity to increase. N is a state variable, r is a parameter. If we know r and the initial condition N0, the deterministic Ricker equation predicts the future population density at all times t=1,2,. We can view the initial condition, N0 as a special kind of parameter, an initial-value parameter.

Process noise

We can model process noise in this system by making the growth rate r into a random variable. For example, if we assume that the intrinsic growth rate is log-normally distributed, N becomes a stochastic process governed by

Nt+1=rNtexp(Nt+εt),εtnormal(0,σ),
where the new parameter σ is the standard deviation of the noise process ε.

Measurement error

Let’s suppose that the Ricker model is our model for the dynamics of a real population. However, we cannot know the exact population density at any time, but only estimate it through sampling.

Let’s model measurement error by assuming the measurements, yt, are Poisson with mean ϕNt:

ytPoisson(ϕNt)

In this equation,

  1. Nt is the true population density at time t,
  2. yt is the number of individuals sampled at time t,
  3. the choice of units for N is peculiar and depends on the parameters (e.g., N=log(r) is the equilibrium),
  4. the parameter ϕ is proportional to our sampling effort, and also has peculiar units.

Implementing the Ricker model in pomp

The R package pomp provides facilities for modeling POMPs, a toolbox of statistical inference methods for analyzing data using POMPs, and a development platform for implmenting new POMP inference methods. Let’s see how the model we’ve just described can be implemented in pomp. The file contains some daily measurements. Load and plot the data:

loc <- url("http://kinglab.eeb.lsa.umich.edu/SIMM/data/sample_data.csv")
dat <- read.csv(loc)
head(dat)
##   day   y
## 1   1   2
## 2   2  87
## 3   3   0
## 4   4  12
## 5   5 174
## 6   6   0
plot(y~day,data=dat,type='o')

The basic data-structure provided by pomp is the object of class pomp, or, for short, pomp object. It is a container that holds real or simulated data and a POMP model, possibly together with other information such as model parameters, that may be needed to do things with the model and data. We’ll construct a pomp object to hold the data and the Ricker model; we’ll do this in stages.

The call to construct a pomp object is, naturally enough, pomp. Documentation on this function can be had by doing ?pomp. Do class?pomp to get documentation on the pomp class. Learn about the various things you can do once you have a pomp object by doing methods?pomp and following the links therein. Read an overview of the package as a whole with links to its main features by doing package?pomp. A complete index of the functions in pomp is returned by the command library(help=pomp). Finally, the home page for the pomp project is ; there you have access to the compleete source code, manuals, mailing lists, etc.

require(pomp)
po <- pomp(dat,times="day",t0=0)

The times argument specifies that the column labelled “day” gives the measurement times; t0 is the “zero-time”, the time at which the state process will be initialized. One can examine the pomp object in various ways, and convert it into a data-frame:

plot(po)
print(po)
show(po)
time(po)
obs(po)
as.data.frame(po)

Adding in the deterministic skeleton

We can implement the Ricker model deterministic skeleton by writing a function that maps Nt to Nt+1 according to the Ricker model. For pomp to be able to use it, this has to be a function of a certain form.

ricker.skel <- function (x, t, params, ...) {
  r <- params["r"]
  N <- x["N"]
  unname(r*N*exp(-N))
  }

[Note the need to remove the names in the last line (unname). If we don’t do this, we have problems due to R’s efforts to guess the correct name for the result.] We can then add this to the pomp object:

po <- pomp(po,skeleton=ricker.skel,skeleton.type='map')

The skeleton.type argument tells pomp that the skeleton is a discrete-time dynamical system (a map) rather than a continuous-time system (a vectorfield).

With just the skeleton defined, we are in a position to compute the trajectories of the deterministic skeleton at any point in parameter space. For example,

traj <- trajectory(po,params=c(N.0=1,r=12))
class(traj)
## [1] "array"
dim(traj)
## [1]  1  1 50

Sometimes it’s useful to obtain the results as a data-frame:

traj <- trajectory(po,params=c(N.0=1,r=12),as.data.frame=TRUE)
head(traj)
##           N time traj
## 1 4.4145533    1    1
## 2 0.6409909    2    1
## 3 4.0518588    3    1
## 4 0.8455429    4    1
## 5 4.3561445    5    1
## 6 0.6705544    6    1
plot(N~time,data=traj,type='o')

A note on terminology

If we know the state, x(t0), of the system at time t0, it makes sense to speak about the entire trajectory of the system for all t>t0. This is true whether we are thinking of the system as deterministic or stochastic. Of course, in the former case, the trajectory is uniquely determined by x(t0), while in the stochastic case, only the probability distribution of x(t), t>t0 is determined. In pomp, to avoid confusion, we use the term “trajectory” exclusively to refer to trajectories of a deterministic process. Thus, the trajectory command iterates or integrates the deterministic skeleton forward in time, returning the unique trajectory determined by the specified parameters. When we want to speak about sample paths of a stochastic process, we use the term simulation. Accordingly, the simulate command always returns individual sample paths from the POMP. In particular, we avoid “simulating a set of differential equations”, preferring instead to speak of “integrating” the equatioins “computing trajectories”.

Adding in the process model simulator

We can put the stochastic Ricker model into the pomp object by writing a function that simulates one realization of the stochastic process, from an arbitary time t to t+1, given arbitrary states and parameters.

ricker.sim <- function (x, t, params, ...) {
  e <- rnorm(n=1,mean=0,sd=params["sigma"]) 
  c(N=unname(exp(log(params["r"])+log(x["N"])-x["N"]+e)))
  }

po <- pomp(po,rprocess=discrete.time.sim(step.fun=ricker.sim,delta.t=1))

The discrete.time.sim function generates a function suitable for use as the rprocess argument to pomp. It assumes that step.fun is a function like our ricker.sim that makes a single random draw from a stochastic state process. The time-step of the discrete-time process is delta.t, here, 1 day.

At this point, we have what we need to simulate the stochastic Ricker model.

sim <- simulate(po,params=c(N.0=1,r=12,sigma=1),states=TRUE)
class(sim)
## [1] "array"
dim(sim)
## [1]  1  1 50
sim <- simulate(po,params=c(N.0=1,r=12,sigma=1),states=TRUE,
                as.data.frame=TRUE)
plot(N~time,data=sim,type='o')
lines(N~time,data=traj,type='l',col='red')

Adding in the measurement model and parameters

We complete the specification of the POMP by specifying the measurement model. In accordance with the measurement model, we write

po <- pomp(po,measurement.model=y~pois(phi*N))

Note the syntax of the measurement.model argument: R must be able create a distribution function by prepending d or r to the right-hand side of the formula. In this case, dpois and rpois are distribution functions for the Poisson distribution.

Now we can simulate the whole POMP. First, let’s add some parameters to the pomp object:

po <- pomp(po,params=c(N.0=1,r=12,sigma=1,phi=10))
sim <- simulate(po)
class(sim)
## [1] "pomp"
## attr(,"package")
## [1] "pomp"
plot(sim)

sim <- simulate(po,nsim=3,as.data.frame=T)
head(sim)
##   time  y          N sim
## 1    1 92 9.08913451   1
## 2    2  0 0.02480737   1
## 3    3 15 1.27878180   1
## 4    4 67 5.52928081   1
## 5    5  1 0.20530943   1
## 6    6 11 1.02230493   1

Parameter transformations

It frequently happens that it helps to estimate parameters on a scale different from that on which they appear in the model. For example, the parameters r, σ, ϕ, and N0 in the Ricker model are constrained to be positive. When we estimate these parameters using numerical search algorithms, we must find some way to ensure that these constraints will be honored. A straightforward way to accomplish this is to transform the parameters so that they become unconstrained. To introduce some terminology, we want to transform the parameters from the natural scale to another, estimation scale, on which they will be unconstrained.

pomp provides facilities to make it easier to work with parameter transformations. In particular, we can optionally specify two transformation functions when we create a pomp object. One will take transform our parameters from the estimation scale to the natural scale. the other will invert that operation, transforming the parameters from the natural to the estimation scale. In the Ricker model example, the natural-scale parameters are r, σ, ϕ, and N0, all of which are constrained to be positive. We’ll log-transform these parameters to enforce this constrant.

pomp(
  po,
  parameter.transform=function(params,...) {
    exp(params)
    },
  parameter.inv.transform=function(params,...) {
    log(params)
    }
  ) -> po

Note that the function given to the parameter.transform argument transforms parameters to the natural scale while that given to the parameter.inv.transform argument transforms them from the natural scale. Operationally, the defining property of the natural scale is that this is the scale on which the elementary functions rprocess, rmeasure, dprocess, dmeasure, skeleton, and init.state expect to see the parameters.

The parameter transformations come into play in the coef and coef<- methods for getting and setting parameters. If we have a vector of parameters on the natural scale, one can set them as parameters of po by doing

coef(po) <- c(r=44,phi=10,sigma=0.3,N.0=7)

and get the parameters by

theta <- coef(po)

On the other hand, given parameter values on the estimation scale, one can set the parameters by

coef(po,transform=TRUE) <- c(r=3.8,phi=2,sigma=-1,N.0=3)

and get them by

coef(po,transform=TRUE)
##     r   phi sigma   N.0 
##   3.8   2.0  -1.0   3.0

There is also a function partrans that gives you access to the parameter transformation functions inside the pomp object. Do ?partrans to read the documentation.

To be clear: the parameter-setting expression

coef(po) <- theta

assumes that theta is a named vector of parameters on the natural scale, while

coef(po,transform=TRUE) <- theta

assumes that theta is on the estimation scale.

One can also get or set parameters individually or in groups:

coef(po)
##          r        phi      sigma        N.0 
## 44.7011845  7.3890561  0.3678794 20.0855369
coef(po,"sigma") <- 0.3
coef(po,"sigma",transform=TRUE)
##     sigma 
## -1.203973
coef(po,c("r","sigma"),transform=TRUE) <- c(2,0)
coef(po)
##         r       phi     sigma       N.0 
##  7.389056  7.389056  1.000000 20.085537

Note that it’s the user’s responsibility to ensure that the parameter.transform and parameter.inv.transform are actually mutually inverse. A simple (but not foolproof) test of this is

theta <- coef(po)
## theta is on the natural scale
identical(
  theta,
  partrans(po,
           partrans(po,theta,dir='inverse'),
           dir='forward')
  ) 
## [1] TRUE
theta <- coef(po,transform=TRUE)  
## theta is on the estimation scale
identical(
  theta,
  partrans(po,
           partrans(po,theta,dir='forward'),
           dir='inverse')
  ) 
## [1] TRUE

Example pomp objects

pomp comes with several examples built in. These can be loaded with the data command. For example,

pompExample(ricker)
## newly created object(s):
##  ricker

loads a pomp object just like the one we’ve just defined, except that its internals are implemented in C and so are substantially faster. Later on, we’ll learn how to write pomp objects using C.

Exercise

Reparameterize the Ricker model so that the units are standard by making the scaling of N explicit:

Nt+1=rNtexp(Nt/K).

Modify the pomp object we created above to reflect this reparameterization.

Modify the measurement model so that

ytnegbin(ϕNt,k),
i.e., yt is negative-binomially distributed with mean ϕNt and clumping parameter k. [See ?NegBinomial for documentation on the negative binomial distribution.]

Exercise

Create a pomp object encoding a partially-observed Hassell-May host-parasitoid model

Nt+1Poisson(λNt(1aPtk)k)
Pt+1binom(Nt,1(1aPtk)k)
ytbinom(Nt,ϕ)
ztbinom(Pt,ϕ)
where N and P are host and parasitoid numbers, respectively, and y, z are noisy observations of these variables. Simulate and compute trajectories to verify that your pomp object is correct.

Compartmental models in epidemiology

The SIR model

Below is a diagram of the SIR model. The host population is divided into three classes according to their infection status: S, susceptible hosts; I, infected (and infectious) hosts; R, recovered and immune hosts. Births result in new susceptibles and all individuals have a common death rate μ. The raw birth rate is B. If B=μN, then the population size remains constant, on average. The SI rate, λ, called the force of infection, depends on the number of infectious individuals according to λ(t)=βI/N. The IR, or recovery, rate is γ. The case reports, C, result from a process by which infections are recorded with probability ρ. Since diagnosed cases are treated with bed-rest and hence removed, infections are counted upon transition to R.

CC-BY_NC

Implementing compartmental models in pomp

Earlier, we saw how to implement the deterministic skeleton of a compartmental model. Let’s revisit that example.

baseurl <- "http://kinglab.eeb.lsa.umich.edu/ICTPWID/SaoPaulo_2015/"
url <- paste0(baseurl,"data/boarding_school_influenza.csv")
bbs <- read.csv(url,comment.char="#",colClasses=c(date="Date"))
bbs <- subset(bbs,status=="confined",select=-status)
bbs$day <- as.numeric(bbs$date-min(bbs$date))

pomp(
  dat,
  times="day",
  t0=0,
  skeleton.type="vectorfield",
  skeleton=function(t,x,params,...){
    N <- sum(x[c("S","I","R")])
    foi <- params["beta"]*x["I"]/N
    terms <- c(
      x["S"]*foi,
      x["I"]*params["gamma"]
      )
    terms <- unname(terms)
    c(
      S=-terms[1],
      I=terms[1]-terms[2],
      R=terms[2]
      )
    }
  ) -> flu.sir

With flu.sir as written, we are in a position to compute trajectories for any set of parameters. In fact, let’s put some parameters into the pomp object now:

flu.sir <- pomp(flu.sir,
                params=c(gamma=1/3,beta=3,S.0=1600,I.0=1,R.0=0,cases.0=0))

To fit the model to the data, however, we need to think about the process that actually generates the data. In other words, we need to model the measurement process. The data are new infections per day, but the SIR model as we’ve written doesn’t track new infections, only numbers of S, I, and R individuals. We can keep track of the new infections by adding a new state variable, cases, that accumulates the number of recoveries:

pomp(
  flu.sir,
  skeleton=function(t,x,params,...){
    N <- sum(x[c("S","I","R")])
    foi <- params["beta"]*x["I"]/N
    terms <- c(
      x["S"]*foi,
      x["I"]*params["gamma"]
      )
    terms <- unname(terms)
    c(
      S=-terms[1],
      I=terms[1]-terms[2],
      R=terms[2],
      cases=terms[2]                 # d(cases)/dt
      )
    }
  ) -> flu.sir

Note we’ve only changed the skeleton slot of flu.sir; the rest of it remains the same. Compute the trajectories and plot the cases by doing:

x <- trajectory(flu.sir,as.data.frame=TRUE,times=seq(0,100))
plot(cases~time,data=x,type='o')

This is the cumulative count of cases. We can obtain the number of cases per day by telling pomp that cases is an accumulator variable using the zeronames argument. This will result in the cases being reset to zero immediately after each observation.

pomp(flu.sir,zeronames="cases") -> flu.sir

x <- trajectory(flu.sir)

plot(range(time(flu.sir)),
     range(c(x["cases",,],obs(flu.sir))),
     type='n',xlab="time",ylab="cases")
lines(time(flu.sir),obs(flu.sir),col='black',type='o')
lines(time(flu.sir),x["cases",,],col='red')

Now that we’ve got a plausible deterministic skeleton, let’s flesh out the model by specifying the stochastic state process. We’ll use an Euler-multinomial approximation. First, we’ll need a function that takes a single Euler step.

sir.proc.sim <- function (x, t, params, delta.t, ...) {
  N <- sum(x[c("S","I","R")])
  foi <- params["beta"]*x["I"]/N
  c(
    reulermultinom(n=1,size=x["S"],rate=foi,dt=delta.t),
    reulermultinom(n=1,size=x["I"],rate=params["gamma"],dt=delta.t)
    ) -> trans
  x[c("S","I","R","cases")]+c(
    -trans[1],
    trans[1]-trans[2],
    trans[2],
    trans[2]
    )
  }

flu.sir <- pomp(flu.sir,rprocess=euler.sim(sir.proc.sim,delta.t=1/20))

Note that, in the above, the Euler step size has been set to 1/20 day.

It remains to think about the measurement model. To begin with, let’s assume that the measurement errors are binomially distributed.

pomp(
  flu.sir,
  measurement.model=reports~binom(size=cases,prob=rho),
  params=c(rho=0.5,gamma=1/3,beta=3,S.0=1600,R.0=0,I.0=1,cases.0=0)
  ) -> flu.sir

plot(simulate(flu.sir))

Once we start to try to estimate the parameters in this model, we’ll need to come to terms with the fact that there are positivity constraints on our parameters. As before, we can get around this by log-transforming them. The reporting probability, ρ, is constrained to be in (0,1), so let’s logit-transform that.

pomp(
  flu.sir,
  logvars=c("beta","gamma"),
  logitvars=c("rho"),
  parameter.inv.transform=function(params,logvars,logitvars,...){
    params[logvars] <- log(params[logvars])
    params[logitvars] <- qlogis(params[logitvars])
    params
    },
  parameter.transform=function(params,logvars,logitvars,...){
    params[logvars] <- exp(params[logvars])
    params[logitvars] <- plogis(params[logitvars])
    params
    }
  ) -> flu.sir

Again, note that the parameter.transform argument takes us to the natural scale (the parameter scale on which the skeleton and rprocess functions were written) and parameter.inv.transform takes us from the natural scale to the estimation scale. You will not find anything about the arguments logvars and logitvars in the documentation of pomp. Instead, you will find mention of pomp’s ... argument. “Unofficial” arguments given to pomp are not ignored: they are stored within the pomp object in such a way that they canbe used by the functions you write. In the above example, the parameter transformation functions use logvars and logitvars.

Our initial conditions have a special constraint. In particular, S(0)+I(0)+R(0) must equal the size of the population of the school. pomp allows us to choose an arbitrary parameterization of the initial conditions of the state process. We accomplish this via the initializer argument to pomp:

pomp(
  flu.sir,
  initializer=function(params,t0,...) {
    fracs <- params[c("S.0","I.0","R.0")]
    x <- unname(fracs/sum(fracs)*params["pop"])
    c(
      S=round(x[1]),
      I=round(x[2]),
      R=round(x[3]),
      cases=0
      )
    },
  logvars=c("beta","gamma","S.0","I.0","R.0"),
  params=c(rho=0.5,gamma=1/3,beta=3,S.0=1,I.0=0.0006,R.0=0,pop=1600)
  ) -> flu.sir

Other initial-condition parameterizations are possible, but this symmetric one seems to be particularly useful for compartmental models.

Examples of compartmental models provided with pomp

Three examples implementing the SIR model are provided with pomp. gillespie.sir uses the Gillespie algorithm for simulations. euler.sir uses the Euler-multinomial algorithm with a step-size of 1/52/20 yr. You can read about the euler.sir example in the “Introduction to pomp” vignette (vignette("intro_to_pomp")). The bbs example contains the data from the boarding school influenza outbreak together with an SIR model which is just a bit more complex than the one we constructed above. It has the deterministic skeleton

dSdt=μNμSλ(I)S
dIdt=λ(I)SγIμI
dRdt=γIμR
λ(I)=1N(ι+βI)

Here, ι represents “imported infections”. The process model simulator uses an Euler-multinomial algorithm with a 2 hr timestep. The stochastic simulator has extra-demographic stochasticity:

λ(t)=1N(ι+βIdWdt),
where W is an integrated gamma white noise process (He et al. 2010,Bretó and Ionides (2011)). The intensity of this process is set by the parameter beta.sd; setting beta.sd <- 0 turns the extra-demographic stochasticity off. Finally, the measurement model that comes with bbs is
Rnormal(ρC,1+σC)
where R are the reported cases and C is the true incidence. Load bbs and inspect its default parameters by doing

pompExample(bbs)
## newly created object(s):
##  bbs
coef(bbs)
##        gamma           mu         iota         beta      beta.sd 
##    0.3333333    0.0000000    0.0000000    1.4000000    0.0000000 
##          pop          rho        sigma          S.0          I.0 
## 1400.0000000    0.9000000    3.6000000    0.9990000    0.0010000 
##          R.0 
##    0.0000000

Monte Carlo methods

Let’s return to considering a general state-space model. As before, let y1,,yT be the data, x1,,xT be the states. Then the likelihood function is

L(θ)=Prob[y1:T|θ]=x1xTt=1TProb[yt|xt,θ]Prob[xt|xt1,θ]

i.e., computation of the likelihood requires summing over all possible values of the unobserved process at each time point. This is very hard to do, in general. Today, we will learn about Monte Carlo methods for evaluating this and other difficult integrals. An excellent technical reference on Monte Carlo techniques generally is Robert and Casella (2004).

Simulation

Simulation refers to the generation of random variables. The general problem of simulation is: given a probability distribution f, find a procedure that generates random draws from f. This is a very important problem in scientific computing and much thought and effort has gone into producing reliable simulators for many basic random variables. There are two basic ways of solving this problem:

  1. the transformation method and
  2. the rejection method

The transformation method

This method works for discrete or continuous scalar random variables. Let f be the probability distribution function we seek to draw from (the target distribution) and F be the cumulative distribution function, i.e., F(x)=xf(x)dx. Let F1(u)=inf{x:F(x)u} be the inverse of F. A basic fact is that, if Xf, then F(X)uniform(0,1). [Proof: If f(X)>0, Prob[F(X)u]=Prob[X<F1](u)=F(F1(u))=u.] This suggests that, if we can compute F1, we use the following algorithm to generate Xf:

  1. draw Uuniform(0,1).
  2. let X=F1(U).

The rejection method

The transformation method is very efficient in that we are guaranteed to obtain a valid Xf for every Uuniform(0,1) we generate. Sometimes, however, we cannot compute the inverse of the c.d.f., as required by the transformation method. Under such circumstances, the rejection method offers a less efficient, but more flexible, alternative. We’ll see how and why this method works.

Region D lies within region U. A random variable X is said to be uniformly distributed over a region D (Xuniform(D)) if, for any AD, Prob[XA]=area(A)/area(D).
Diagram of the rejection method. We propose points by drawing them uniformly from the area under the graph of Mg and accept them if they lie under the graph of f. The X-coordinate of the points are then distributed according to f.

First, let’s suppose that we wish to simulate Xuniform(D), where DU. If we know how to generate Yuniform(U), then we can simply do so until YD, at which point we take X=Y. Since for any AD,

Prob[XA]=Prob[YA]|YD=area(A)area(U)/area(D)area(U)=area(A)area(D),
it follows that Yuniform(D). This is akin to throwing darts. If the darts are thrown in such a way as to be equally likely to land anywhere in U, then those that do land in D are equally likely to land anywhere in D.

There is another, seemingly trivial fact, that underlies the rejection method and that Robert and Casella (2004) refer to as the fundamental theorem of simulation. Let h be an arbitary positive, integrable function, let D={(x,u):0uh(x)} be the area under the graph of h, and consider the random pair (X,U)uniform(D). What is the marginal distribution of X?

h(x)0du=h(x)

So h is the probability distribution function for X!

This suggests the following rejection method for simulating an arbitrary random variable. Let f be the target distribution and g be another distribution function (from which it is easy to simulate) (see Figure above). Let M be such that Mg(x)f(x) for all x. The following procedure simulates Xf.

  1. draw Yg and Uuniform(0,Mg(Y)).
  2. if Uf(Y), then let X=Y else repeat step 1.

Monte Carlo integration

Monte Carlo integration is a technique developed for evaluating integrals. First let’s look at the example of finding the area of a region, DRn, with complicated boundaries. Solution: find some region UD and throw darts at U. Count up the darts that land in D and reckon the area of D relative to U as the fraction of darts that land in D. Let Xk, k=1,2,,N be N random darts thrown at U.

area(D)area(U)=DdxUdx=UID(x)dxUdxNk=1ID(Xk)Nk=11=1Nk=1NID(Xk)

In the above, the indicator function ID is defined by

ID(x)={10xDxD

We’ve just shown that we can evaluate the integral of ID over U. More generally, we can use this technique to approximate the integral of any integrable function, h, on U.

Uh(x)dxUdx1Nk=1Nh(Xk)

Here, we’re relying on the assumption that the darts fall uniformly on U. More generally still, we can compute an expectation with respect to an arbitrary distribution using Monte Carlo. This leads to what is known as the fundamental theorem of Monte Carlo integration. Specifically, let f(x) be the probability distribution function for x and let Xk, k=1,,N be independent and distributed according to f. We speak of Xk as random samples from f. Let hN¯¯¯¯ be the empirical average of h:

hN¯¯¯¯=1Nk=1Nh(Xk).
Then hN¯¯¯¯ converges to E[h(X)] as N with probability 1. Thus
hN¯¯¯¯=1Nk=1Nh(Xk)E[h(X)]=h(x)f(x)dx.

Moreover, we can estimate the error in this approximation, because the empirical variance

vN=1N1k=1N(h(Xk)hN¯¯¯¯)2
approximates the true variance, Var[h(X)]=E[(h(X)E[h(X)])2]. This implies that the standard error on the approximation hN¯¯¯¯=E[h(X)] is
vNN
and that the error is approximately normally distributed:
hN¯¯¯¯E[h(X)]normal(0,vNN).

Importance sampling

Sometimes it is difficult to sample directly from the distribution of X. In this case, we can often make use of importance sampling, in which we generate random samples from another distribution (easier to simulate) and make the appropriate correction.

Specifically, suppose we wish to compute E[h(X)], where Xf, but that it’s difficult or impossible to draw random samples from f. Suppose g is a probability distribution from which it’s relatively easy to draw samples and let Ykg for k=1,N be i.i.d. random variables drawn from g. Notice that

E[h(X)]=h(x)f(x)dx=h(x)f(x)g(x)g(x)dx.
So we can use the Monte Carlo integration theorem to conclude that
E[h(X)]1Nk=1Nh(Yk)f(Yk)g(Yk).

Since E[f(Y)/g(Y)]=1, we can replace the importance sampling rule above with another, which is slightly less accurate but more precise. Let wk=f(Yk)/g(Yk) be the importance weights. Then we have

E[h(X)]h¯=wkh(Yk)wk.
The Monte Carlo variance associated with this estimate is
wk(h(Yk)h¯)2wk.

Obtaining accurate estimates requires some thought to the importance distribution g. Specifically, if the tails of g are lighter than those of f, the Monte Carlo variance will be inflated and the estimates can be unusable.

Likelihood for state-space models

We can imagine using Monte Carlo integration for computing the likelihood of a state space model,

L(θ)=Prob[y1:T|θ]=x1xTt=1TProb[yt|xt,θ]Prob[xt|xt1,θ].
Specifically, if we have some probabilistic means of proposing trajectories for the unobserved state process, then we could just generate a large number of these and approximate L(θ) by its Monte Carlo estimate. Specifically, we generate N trajectories of length T, xt,k, k=1,N, t=1,,T. Let wk denote the probability of proposing trajectory k. For each trajectory, we compute the likelihood of that trajectory
k(θ)=t=1TProb[yt|xt,k,θ]Prob[xt,k|xt1,k,θ]
Then by the Monte Carlo theorem, we have
L(θ)1Nk=1Nk(θ)wk.

How shall we choose our trajectories? One idea would be to choose them so as to simplify the computation. If we choose them such that

wk=t=1TProb[xt,k]|xt1,k,θ,
then we have
L(θ)1Nk=1Nk(θ)wk=1Nk=1NTt=1Prob[yt|xt,k],θProb[xt,k]|xt1,k,θTt=1Prob[xt,k]|xt1,k,θ=1Nk=1Nt=1TProb[yt|xt,θ]

This implies that if we generate trajectories by simulation, all we have to do is compute the likelihood of the data with given each trajectory and average.

Let’s go back to the British boarding school influenza outbreak to see what this would look like. Load a pomp object containing the boarding-school flu data by doing pompExample(bbs). The process model here is just the SIR model we discussed last time. The measurement model is

flutNegBin(ρCt,1/sigma2)
where Ct are the number of new cases occuring at day t, ρ is the reporting probability, and σ modulates the measurement error variance, which is
Var[flut|Ct]=ρCt+(σρCt)2.

require(pomp)
pompExample(bbs)
## newly created object(s):
##  bbs
plot(bbs)

Let’s generate a bunch of simulated trajectories at some particular point in parameter space.

coef(bbs)
##        gamma           mu         iota         beta      beta.sd 
##    0.3333333    0.0000000    0.0000000    1.4000000    0.0000000 
##          pop          rho        sigma          S.0          I.0 
## 1400.0000000    0.9000000    3.6000000    0.9990000    0.0010000 
##          R.0 
##    0.0000000
x <- simulate(bbs,nsim=60,states=T)
matplot(time(bbs),t(x["cases",,]),type='l',lty=1,
        xlab="time",ylab="infections",bty='l')

We can use the function dmeasure to evaluate the log likelihood of the data given the states, the model, and the parameters:

ell <- dmeasure(bbs,y=obs(bbs),x=x,times=time(bbs),
                params=parmat(coef(bbs),nrep=60),
                log=T)
dim(ell)
## [1] 60 14

According to the equation above, we should sum up the log likelihoods across time:

ell <- apply(ell,1,sum); ell
##  [1]      -Inf      -Inf      -Inf      -Inf      -Inf -98.49835 -94.53501
##  [8] -94.33567      -Inf      -Inf      -Inf -98.20784 -99.72224 -94.81910
## [15] -95.23027 -94.73599      -Inf      -Inf      -Inf -95.49592      -Inf
## [22]      -Inf -95.24240      -Inf      -Inf      -Inf      -Inf -94.68817
## [29]      -Inf      -Inf      -Inf      -Inf      -Inf      -Inf      -Inf
## [36]      -Inf      -Inf      -Inf      -Inf      -Inf      -Inf      -Inf
## [43] -96.49108      -Inf      -Inf      -Inf      -Inf      -Inf      -Inf
## [50]      -Inf      -Inf -95.70486      -Inf -94.61263 -94.61592      -Inf
## [57]      -Inf -94.96263      -Inf      -Inf

The variance of these estimates is very high and therefore the estimated likelihood is very imprecise:

loglik <- mean(ell); loglik
## [1] -Inf
loglik.se <- sd(ell)/sqrt(length(ell)); loglik.se
## [1] NaN

We are going to need very many simulations to get an estimate of the likelihood sufficiently precise to be of any use in parameter estimation or model selection.

What’s the problem? Essentially, way too many of the trajectories don’t pass near the data. Moreover, once a trajectory diverges from the data, it almost never comes back. This is a consequence of the fact that we are proposing trajectories in a way that is completely unconditional on the data. The problem will only get worse with longer data sets!

Sequential Monte Carlo

We can arrive at a more efficient algorithm by factorizing the likelihood in a different way.

L(θ)=Prob[y1:T|θ]=tProb[yt|y1:t1,θ]=txtProb[yt|xt,θ]Prob[xt|y1:t1,θ]
Now the Markov property gives us that
Prob[xt|y1:t1,θ]=xt1Prob[xt|xt1,θ]Prob[xt1|y1:t1,θ]
and Bayes’ theorem tells us that
Prob[xt|y1:t,θ]=Prob[xt|yt,y1:t1,θ]=Prob[yt|xt,θ]Prob[xt|y1:t1,θ]xtProb[yt|xt,θ]Prob[xt|y1:t1,θ].

This suggests that we keep track of two key distributions. We’ll refer to the distribution of xt given y1:t1 as the prediction distribution at time t and the distribution of xt given y1:t as the filtering distribution at time t.

Let’s use Monte Carlo techniques to estimate the sums. Suppose xFt1,k, k=1,,N is a set of points drawn from the filtering distribution at time t1. We obtain a sample xPt,k of points drawn from the prediction distribution at time t by simply simulating the process model:

xPt,kprocess(xFt1,k,θ),k=1,,N.
Having obtained xPt,k, we obtain a sample of points from the filtering distribution at time t by resampling from xPt,k with weights Prob[yt]|xt,θ. The Monte Carlo theorem tells us, too, that the conditional likelihood
t(θ)=Prob[yt|y1:t1,θ]=xtProb[yt|xt,θ]Prob[xt|y1:t1,θ]1NkProb[yt|xPt,k,θ].
We can iterate this procedure through the data, one step at a time, alternately simulating and resampling, until we reach t=T. The full log likelihood is then approximately
logL(θ)=tlogt(θ).

This is known as the sequential Monte Carlo algorithm or the particle filter. Key references include Kitagawa (1987), Arulampalam et al. (2002) and the book Doucet et al. (2001).

Sequential Monte Carlo in pomp

In pomp, the basic particle filter is implemented in the command pfilter. We must choose the number of particles to use by setting the Np argument.

pf <- pfilter(bbs,Np=1000)
logLik(pf)
## [1] -96.79192

We can run a few particle filters to get an estimate of the Monte Carlo variability:

pf <- replicate(n=10,pfilter(bbs,Np=1000))
ll <- sapply(pf,logLik)
logmeanexp(ll,se=TRUE)
##                        se 
## -96.71380549   0.05242974

Note that we’re careful here to counteract Jensen’s inequality. The particle filter gives us an unbiased estimate of the likelihood, not of the log-likelihood.

To get an idea of what the likelihood surface looks like in the neighborhood of the default parameter set supplied by bbs, we can construct some likelihood slices. We’ll make slices in the β and γ directions. Both slices will pass through the default parameter set.

require(pomp)
pompExample(bbs)

center <- coef(bbs)

p <- sliceDesign(
  coef(bbs),
  beta=rep(seq(from=0.5,to=8,length=20),each=3),
  gamma=rep(seq(from=0.05,to=2,length=20),each=3)
  )
p$loglik <- NA

for (k in seq_len(nrow(p))) {
  pf <- pfilter(bbs,params=unlist(p[k,]),Np=1000)
  p$loglik[k] <- logLik(pf)
  }
require(ggplot2)
require(grid)

pl1 <- ggplot(
  data=subset(p,slice=="beta"),
  mapping=aes(x=beta,y=loglik)
  )+
  geom_point()+
  geom_vline(xintercept=center["beta"],color='red')+
  geom_smooth(method="loess",span=0.25)+
  xlab("beta")

pl2 <- ggplot(
  data=subset(p,slice=="gamma"),
  mapping=aes(x=gamma,y=loglik)
  )+
  geom_point()+
  geom_vline(xintercept=center["gamma"],color='red')+
  geom_smooth(method="loess",span=0.25)+
  xlab("gamma")

print(pl1,vp=viewport(x=unit(0.25,"npc"),width=unit(0.5,"npc")))
print(pl2,vp=viewport(x=unit(0.75,"npc"),width=unit(0.5,"npc")))

Exercise

Construct likelihood slices on σ, ρ, and I0, the latter being the fraction of infectives on day 0.

Clearly, the default parameter set is not particularly close to the MLE. One way to find the MLE is to try optimizing the estimated likelihood directly. There are many optimization algorithms to choose from, and many implemented in R.

Two issues arise immediately. First, the particle filter gives us a stochastic estimate of the likelihood. We can reduce this variability by making Np larger, but we cannot make it go away. If we use a deterministic optimizer (i.e., one that assumes the objective function is evaluated deterministically), then we must control this variability somehow. For example, we can fix the seed of the pseudo-random number generator. A side effect will be that the objective function becomes jagged, marked by many small local knolls and pits. Alternatively, we can use a stochastic optimization algorithm, with which we will be only be able to obtain estimates of our MLE. This is the trade-off between a noisy and a rough objective function. Second, because the particle filter gives us just an estimate of the likelihood and no information about the derivative, we must choose an algorithm that is “derivative-free”. There are many such, but we can expect less efficiency than would be possible with derivative information. Note that finite differencing is not an especially promising way of constructing derivatives. The price would be a n-fold increase in cpu time, where n is the dimension of the parameter space. Also, since the likelihood is only estimated, we would expect the derivative estimates to be noisy.

Here, let’s opt for deterministic optimization of a rough function. We’ll try using optim’s default method: Nelder-Mead.

require(pomp)
pompExample(bbs)
## newly created object(s):
##  bbs
set.seed(837065482)

neg.ll <- function (par, est, ...) {
  ## parameters to be estimated are named in 'est'
  allpars <- coef(bbs,transform=TRUE)
  allpars[est] <- par
  try(
    pfilter(
      bbs,
      params=partrans(bbs,allpars,dir="forward"),
      ...
      )
    ) -> pf
  if (inherits(pf,"try-error")) {
    1e10 ## a big number
    } else {
      -logLik(pf)
      }
  }
require(plyr)
## use Nelder-Mead with fixed RNG seed
fit <- optim(
  par=c(-1.1, 0.33, 2.2, 1.28),
  est=c("gamma","beta","rho","sigma"),
  Np=200,
  fn=neg.ll,
  seed=3488755,
  method="Nelder-Mead",
  control=list(maxit=400,trace=0)
  )

mle <- bbs
coef(mle,c("gamma","beta","rho","sigma"),trans=T) <- fit$par
coef(mle,c("gamma","beta","rho","sigma")) ## point estimate
##     gamma      beta       rho     sigma 
## 0.5125661 2.0198047 0.9681525 0.2501303
dat <- as.data.frame(bbs)
dat$type <- "data"

sims <- simulate(mle,nsim=10,as.data.frame=TRUE)
sims$type <- "simulation"
sims <- merge(dat,sims,all=T)

lls <- replicate(n=5,logLik(pfilter(mle,Np=200)))
ll <- logmeanexp(lls,se=TRUE)
ll.se <- ll[2]
ll <- ll[1]

Some simulations at these parameters are shown next:

require(ggplot2)
pl <- ggplot(data=sims,
             mapping=aes(x=time,y=reports,group=sim,color=type)
             )+
  geom_line()+
  geom_text(x=9,y=320,
            label=paste("log~L==",round(ll,1),"%+-%",round(ll.se,1)),
            color="black",
            parse=T,hjust=0)
print(pl)

Exercise

Use simulated annealing to maximize the likelihood. Be sure to try several starting guesses.

Exercise

Construct likelihood slices on β and γ through the MLE you found above.

Iterated Filtering

Basics

Iterated filtering is an algorithm based on sequential Monte Carlo that shares many features of the Liu and West (2001) method. However, it is a likelihood maximization method and so is typically used in a frequentist mode, thus avoiding many of the complexities associated with Bayesian inference. Specifically, iterated filtering is an algorithm for maximizing the likelihood of a partially observed Markov process (POMP) model computed via sequential Monte Carlo (AKA particle filtering). The basic idea of iterated filtering is the following. We wish to compute the likelihood L(θ) given time-series data y1:T and a POMP model, M. As in the algorithm of Liu and West (2001), we replace M by a related model M in which the fixed parameters θ are replaced by time-varying parameters θ(t). Specifically, at each observation time, we perturb the parameters with a random perturbation: we draw θ(t+1)normal(θ(t),σ), so that E[θ(t+1)]=θ(t) and Var[θ(t+1)|θ(t)]=σ2. Effectively M is just like M except that the parameters θ are not fixed but randomly walk in the parameter space. If σ is small, then M is very close to M but the likelihood computation will be easier. This is true for three reasons:

  1. As in the Liu and West (2001) algorithm, the additional variability added to the parameters helps combat particle depletion.
  2. The extra variability added to M relative to M has the effect that the likelihood surface of M is smoother than that of M, and hence is easier to maximize.
  3. In a particle-filter context, the added variability to the parameters allows us to construct an approxmation to the gradient of the likelihood at the same time we compute its value, with negligible additional computational cost.

The idea will be to use this approximate gradient to climb the likelihood surface, while at the same time reducing the added variability so that M converges to M.

The iterated filtering algorithm

In the method of Liu and West (2001), we make one particle filtering pass through the data. In iterated filtering we make many passes. After each pass, we obtain an estimate of the likelihood and its gradient. We take a step of judicious size in the direction of the gradient, reduce the added variability, σ, and repeat. In pseudocode:

  1. Choose the algorithmic parameters: the initial random-walk s.d. σ, the cooling factor 0<α<1, a variance factor C, the number of particles B, and the total number of iterations, N.
  2. Provide a guess for the MLE; call it θ(0).
  3. For k=0,,N1, repeat the following steps:

    1. Draw an initial set of B particles, θ(k)i(0)normal(θ(k),Cσαk), i=1,,B.
    2. Initialize the state process: let xi(0), be the initial condition corresponding to θ(k)i(0), i=1,,B.
    3. Do particle filtering on the random-parameters model: for t=1,,T, do

      1. Simulate the state process: for i=1,,B, draw
        xi(t)rprocess(X(t)X(t1)=xi(t1),θ(k)i(t1))
      2. Evaluate the likelihood of each particle: for i=1,,B, let
        wi=dprocess(Y(t)=ytX(t)=xi(t),θ(k)i(t1))
      3. Let t=log1Bwi be the SMC estimate of the conditional log likelihood logProb[yt|y1:(t1)].
      4. Compute the prediction mean, prediction variance, and filtering mean of the parameters:
        PM(k)(t)=1Biθ(k)i(t1)

        PV(k)(t)=1B1i(θ(k)i(t1)PM(k)(t))2+(σαk)2

        FM(k)(t)=iwiθ(k)i(t1)iwi

      5. Resample the particles with replacement according to their weights.
      6. Perturb each of the particles: for i=1,,B, draw
        θ(k)i(t)normal(θ(k)i(t1),σαk)

    4. Compute the approximate gradient of the likelihood according to the MIF formula:
      logL(θ(k))G(θ(k),σαk)t=1TPV(k)(t)1(FM(k)(t)FM(k)(t1))
    5. Take a step in that direction: let
      θ(k+1)=θ(k)+(1+C2)(σαk)2G(θ(k),σαk)
  4. Take θ(N) to be the MLE.

The iterated filtering theorem states that, if θ(k) converges, it converges to a local maximum of the likelihood.

Remark

We’ve assumed that the observation times t are integers; this is just for notational convenience. Nothing stops us from using arbitrary, even irregularly-spaced sequence of observation times.

Remark

The random-walk s.d. can (and typically should) be different for each parameter. One can take σ to be a diagonal matrix of random-walk s.d.s.

Remark

In the above, we draw the initial set of particles from a normal distribution; one could use another distribution. Likewise, the assumption that the parameter perturbation be normal is not needed.

Remark

The basic MIF algorithm described above does not work particularly well for initial condition parameters. This is because the information pertaining to such parameters is not evenly distributed through the data. Rather, it is concentrated in the first part of the time series. As implemented in pomp, mif treats initial condition parameters differently. Specifically, for initial condition parameters, step 3e is replaced by:
3e. Let θ(k+1)=FM(k)(TIC).

That is, we update the initial condition parameters at each MIF iteration by their filtering mean at some initial condition lag TIC. This is called fixed-lag smoothing of the initial condition parameters.

Remark

The cooling factor we use here is geometric (αk); there is some argument to be made for slower cooling schedules.

Remark

We might want to let the number of particles change through time; there is nothing stopping us from doing this and there are advantages to be gained.

Remark

There are other reasonable choices for step 3e. In particular, one might instead take θ(k+1)=FM(k)(T).

Ionides et al. (2006) first present the method; a thorough development of its theoretical foundation is given by Ionides et al. (2011). Applications of the method include studies of cholera (King et al. 2008,Bretó et al. (2009)), measles (He et al. 2010), and malaria (Laneri et al. 2010). Although the applications have for the most part been to infectious disease systems, this does not reflect any limitation of the method.

Iterated filtering in practice

In pomp, iterated filtering is implemented via the mif command. The following shows the basic usage. Here, we are trying to maximize the likelihood for the Ricker model example.

require(pomp)
pompExample(ricker)
## newly created object(s):
##  ricker
set.seed(1251928907L)
dat <- window(ricker,end=20)
estpars <- c("r","sigma","phi")
coef(dat)
##        r    sigma      phi      N.0      e.0 
## 44.70118  0.30000 10.00000  7.00000  0.00000
guess <- c(r=exp(5),sigma=1,phi=20,N.0=7,e.0=0)
mf <- mif(dat,start=guess,
          transform=TRUE,
          Nmif=100,Np=1000,
          cooling.factor=0.99,var.factor=2,ic.lag=3,
          rw.sd=c(r=0.1,sigma=0.1,phi=0.1))

We can have a look at the results of this by doing

plot(mf)

Since we seem to be still climbing, let’s continue:

mf <- continue(mf,Nmif=300)

To really see if we’re converging, we should run several MIFs from different starting guesses.

replicate(
  n=5,
  {
    guess <- coef(dat)
    guess[estpars] <- rlnorm(
      n=3,
      meanlog=log(guess[estpars]),
      sdlog=0.5
      )
    mif(
      dat,
      Nmif=300,
      start=guess,
      transform=TRUE,
      pars=estpars,
      Np=1000,
      var.factor=2,
      ic.lag=3,
      cooling.factor=0.99,
      rw.sd=c(r=0.1,sigma=0.1,phi=0.1)
      )
    }
  ) -> mf

We can then compare the results using compare.mif(mf). Here’s a prettier version of the results.

To compute the likelihoods at these putative MLEs, we’ll need to run pfilter. As before, we’ll want to do this several times to get an idea of the Monte Carlo error in the likelihood computation.

pf <- replicate(n=10,lapply(mf,pfilter))
ll <- lapply(pf,logLik); dim(ll) <- dim(pf)
ll <- apply(ll,1,
            function(x){
              x <- as.numeric(x)
              m <- mean(x)
              loglik <- m+log(mean(exp(x-m)));
              se <- sd(exp(x-m))/exp(loglik-m)
              c(loglik=loglik,loglik.se=se)
              }
            )
mletable <- rbind(sapply(mf,coef,c("r","sigma","phi")),ll)
colnames(mletable) <- paste("mif",seq_along(mf))
truth <- coef(dat,c("r","sigma","phi"))
ll.truth <- replicate(n=10,logLik(pfilter(dat,Np=1000)))
m <- mean(ll.truth)
loglik <- m+log(mean(exp(ll.truth-m)))
se <- sd(exp(ll.truth-m))/exp(loglik-m)
signif(cbind(mletable,truth=c(truth,loglik,se)),digits=3)
##              mif 1   mif 2    mif 3    mif 4    mif 5   truth
## r          45.5000  45.500  45.5000  45.7000  45.7000  44.700
## sigma       0.0397   0.036   0.0391   0.0399   0.0445   0.300
## phi        10.1000  10.100  10.2000  10.2000  10.2000  10.000
## loglik    -51.0000 -50.700 -50.7000 -50.9000 -51.0000 -57.200
## loglik.se   0.2540   0.323   0.2740   0.2840   0.3100   0.157

Exercise

Experiment with estimating SIR model parameters for the British boarding school data using iterated filtering. Try using 500–1000 particles (Np), a cooling factor of 0.999, a variance multiplier of 1–4 (var.factor), and random-walk intensities (rw.sd) on the order of 0.01-0.02. The lag for the initial-value parameter smoother can be set to the length of the data set (ic.lag=14).

References

Arulampalam, M. S., S. Maskell, N. Gordon, and T. Clapp. 2002. A tutorial on particle filters for online nonlinear, non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50:174–188.

Bretó, C., and E. L. Ionides. 2011. Compound markov counting processes and their applications to modeling infinitesimally over-dispersed systems. Stochastic Processes and their Applications 121:2571–2591.

Bretó, C., D. He, E. L. Ionides, and A. A. King. 2009. Time series analysis via mechanistic models. Annals of Applied Statistics 3:319–348.

Doucet, A., N. de Freitas, and N. Gordon, editors. 2001. Sequential Monte Carlo Methods in Practice. Springer-VerlagNew York.

He, D., E. L. Ionides, and A. A. King. 2010. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. Journal of The Royal Society Interface 7:271–283.

Ionides, E. L., A. Bhadra, Y. Atchadé, and A. A. King. 2011. Iterated filtering. Annals of Statistics 39:1776–1802.

Ionides, E. L., C. Bretó, and A. A. King. 2006. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences of the U.S.A. 103:18438–18443.

King, A. A., E. L. Ionides, M. Pascual, and M. J. Bouma. 2008. Inapparent infections and cholera dynamics. Nature 454:877–880.

Kitagawa, G. 1987. Non-Gaussian state-space modeling of nonstationary time series. Journal of the American Statistical Association 82:1032–1041.

Laneri, K., A. Bhadra, E. L. Ionides, M. Bouma, R. C. Dhiman, R. S. Yadav, and M. Pascual. 2010. Forcing versus feedback: epidemic malaria and monsoon rains in northwest India. PLoS Computational Biology 6:e1000898.

Liu, J., and M. West. 2001. Combining Parameter and State Estimation in Simulation-Based Filtering. Pages 197–224 in A. Doucet, N. de Freitas, and N. J. Gordon, editors. Sequential Monte Carlo methods in practice. Springer, New York.

Robert, C. P., and G. Casella. 2004. Monte Carlo Statistical Methods. 2nd editions. Springer-Verlag.