© 2018 Aaron A. King.
It is frequently natural to formulate expected relationships among variables in terms of differential equations (DE). Simply put, such equations express the relationship between the values of variables and the rates at which those values are changing. This maps well onto the way we typically think about systems. The solution of a set of DE is the trajectory of the variables through time. Thus DE often naturally arise when we analyze dynamical data.
None but the very simplest DE have solutions that can be expressed in closed form, i.e., as simple expressions in terms of elementary functions. However, as a result of decades of intensive research, various highly reliable and accurate numerical algorithms exist for the approximation of solutions of DE. In this laboratory exercise, we will see how these numerical integration algorithms can be accessed in R and explore some uses of DE in statistical analysis.
We begin by introducing a simple DE model. We next learn how to obtain numerical solutions in R. Finally, we embed numerical integration in a nonlinear least squares context to obtain parameter estimates and perform inference.
The classical SIR compartmental model divides a population of hosts into three classes: susceptible, infected, recovered (see the diagram below). The model describes how the numbers of individuals in each of these classes changes with time. Births are modeled as flows from “nowhere” into the susceptible class; deaths are modeled as flows from the S, I, or R compartment into “nowhere”.
Diagram of the SIR compartmental model.
The host population is divided into three classes according to their infection status:
Let’s let \(S\), \(I\), and \(R\) refer to the numbers of hosts in the respective compartments. Then these state variables change according to the system of differential equations: \[\label{eq:sir} \begin{aligned} {\frac{\mathrm{d}^{}{S}}{\mathrm{d}{t}^{}}} &= B-\lambda\,S-\mu\,S\\ {\frac{\mathrm{d}^{}{I}}{\mathrm{d}{t}^{}}} &= \lambda\,S-\gamma\,I-\mu\,I\\ {\frac{\mathrm{d}^{}{R}}{\mathrm{d}{t}^{}}} &= \gamma\,I-\mu\,R\\ \end{aligned} \] Here, \(B\) is the crude birth rate (births per unit time), \(\mu\) is the death rate, and \(\gamma\) is the recovery rate. We’ll assume that the force of infection, \(\lambda\), has the form \[ \lambda = \beta\,\frac{I}{N} \] where \(N\) is the total population size (\(N=S+I+R\)), so that the risk of infection a susceptible faces is proportional to the prevalence (the fraction of the population that is infected). This is known as the assumption of frequency-dependent transmission.
Since these equations are nonlinear, it’s not surprising that one can’t solve them analytically. However, we can compute the trajectories of a continuous-time model such as this one by integrating the equations numerically. Doing this accurately involves a lot of calculation, and there are smart ways and not-so-smart ways of going about it. This very common problem has been very thoroughly studied by numerical analysts for generations. The result of all that work is that, when the equations are smooth, well-behaved functions, excellent numerical integration algorithms are readily available to compute approximate solutions to high precision. In particular, R has several sophisticated DE solvers which (for many problems) will give highly accurate solutions. These algorithms are flexible, automatically perform checks, and give informative errors and warnings. To use the numerical differential equation solver package, we load the deSolve package
library(deSolve)
The ODE solver we’ll use is called ode
. Have a look at the help page for this function.
?ode
The help page tells us that a call to ode
looks like
ode(y, times, func, parms,
method = c("lsoda", "lsode", "lsodes", "lsodar", "vode", "daspk",
"euler", "rk4", "ode23", "ode45", "radau",
"bdf", "bdf_d", "adams", "impAdams", "impAdams_d", "iteration"),
...)
In particular, ode
needs to know the initial values of the state variables (y
), the times
at which we want solutions, and the right-hand side of the ODE func
. The latter can optionally depend on some parameters (parms
). Finally, there are a large number of different algorithms (method
) that we can choose from; the default is the Livermore solver of ordinary differential equations with automatic method switching (LSODA). It’s an excellent, general purpose solver.
Let’s study the SIR model for a closed population, i.e., one in which we can neglect births and deaths. Putting \(B=\mu=0\) into the SIR equations, we obtain \[\label{eq:closed-sir}
\begin{aligned}
{\frac{\mathrm{d}^{}{S}}{\mathrm{d}{t}^{}}} &= -\frac{\beta\,S\,I}{N}\\
{\frac{\mathrm{d}^{}{I}}{\mathrm{d}{t}^{}}} &= \frac{\beta\,S\,I}{N}-\gamma\,I\\
{\frac{\mathrm{d}^{}{R}}{\mathrm{d}{t}^{}}} &= \gamma\,I\\
\end{aligned}
\] To encode these equations in a form suitable for use as the func
argument to ode
, we’ll need to write a function. For example:
closed.sir.model <- function (t, x, params) {
## first extract the state variables
S <- x[1]
I <- x[2]
R <- x[3]
## now extract the parameters
beta <- params["beta"]
gamma <- params["gamma"]
N <- S+I+R
## now code the model equations
dSdt <- -beta*S*I/N
dIdt <- beta*S*I/N-gamma*I
dRdt <- gamma*I
## combine results into a single vector
dxdt <- c(dSdt,dIdt,dRdt)
## return result as a list!
list(dxdt)
}
Note that the order and type of the arguments and output of this function must exactly match ode
’s expectations. Thus, for instance, the time variable t
must be the first argument even if, as is the case here, nothing in the function depends on time. [When the RHS of the ODE are independent of time, we say the ODE are autonomous; when they depend explicitly on time, they are said to be nonautonomous.] Note also, that ode
expects the values of the ODE RHS to be the first element of a list
.
Now we can call ode
to compute trajectories of the model, provided we have values for the parameters and the initial conditions, i.e., the values of the state variables \(S\), \(I\), and \(R\) at some initial time. If we’re thinking of a disease something like measles, and measuring time in years, we might use something like:
parms <- c(beta=400,gamma=365/13)
We now state the times at which we want solutions and specify the initial conditions:
times <- seq(from=0,to=60/365,by=1/365/4)
xstart <- c(S=999,I=1,R=0)
Next, we compute a model trajectory with the ode
command and store the result in a data-frame:
library(tidyverse)
ode(
func=closed.sir.model,
y=xstart,
times=times,
parms=parms
) %>%
as.data.frame() -> out
and plot the results using the commands:
out %>%
gather(variable,value,-time) %>%
ggplot(aes(x=time,y=value,color=variable))+
geom_line(size=2)+
theme_classic()+
labs(x='time (yr)',y='number of individuals')