© 2018 Aaron A. King.

library(plyr)
library(reshape2)
library(magrittr)
library(ggplot2)
options(stringsAsFactors=FALSE)
theme_set(theme_bw())
set.seed(291807036)

Evolution of a trait controlled by a single genetic locus with two alleles

Here, we study evolution (i.e., how allele frequencies change with time) in a simple, but not trivial, case. We suppose that a trait is controlled by a single gene (a single locus, i.e., one place on one single chromosome) and that there are two alleles for this gene.

Neutral evolution, an individual-based model

We’ll start by making the simplest assumptions, then gradually make things more interesting. Specifically, we’ll begin by assuming that mating is random, that each coupling produces 2 offspring (one male and one female), and that all genotypes have the same fitness.

[Q: What is fitness, precisely?]

The following codes simulate multiple generations of random mating with neutral evolution. The plot shows how the frequencies of the three genotypes change through time.

popsize <- 100
freq <- c(AA=0.6,AB=0.0,BB=0.4)
tmax <- 50

femmes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)
hommes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)

mate <- function (femme, homme, noffspring = 1) {
  f <- sample(strsplit(femme,"")[[1]],size=noffspring,replace=TRUE)
  m <- sample(strsplit(homme,"")[[1]],size=noffspring,replace=TRUE)
  paste0(f,m) -> offspring
  offspring[offspring=="BA"] <- "AB"
  offspring
}

t <- seq(0,tmax)

g <- array(dim=c(length(t),3),dimnames=list(time=t,genotype=names(freq)))
n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
g[1,] <- n/sum(n)

for (k in seq(from=2,to=length(t))) {
  partners <- hommes[order(runif(n=length(hommes)))]
  hommes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=1)
  femmes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=1)
  n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
  g[k,] <- n/sum(n)
}

g %>% melt(value.name="frequency") %>%
  ggplot(aes(x=time,y=frequency,color=genotype,group=genotype))+
  geom_line()+
  expand_limits(y=c(0,1))

Task A.

  1. Study the codes above and annotate them with comments to explain what they are doing.
  2. Modify the codes above to study how the dynamics depend on (a) initial rarity of the B allele and (b) population size. Describe what you find.
  3. Are the dynamics consistent with the Hardy-Weinberg principle? Why or why not? Use your simulations to verify your conclusion.

Evolution with selection, an individual-based model

We now introduce selection by supposing that the different genotypes can vary in their survival. Specifically, each coupling will now produce exactly \(b\) females and \(b\) males. However, not all of these will survive. The relative probability of survival of genotype \(X\) will be \(w_X\). This is achieved by randomly selecting survivors from among the offspring of each generation so that the population size remains constant.

We take inspiration from the example of sickle-cell anemia, a heritable red blood cell disorder. The disease is caused by an abnormal hemoglobin, hemoglobin S, which is coded for by the HgbS allele. People who are heterozygous for this allele are said to have sickle cell trait. For the most part, heterozygotes show no symptoms, because they are able to produce enough normal hemoglobin to sustain normal functioning under most circumstances. Homozygotes, however, produce red blood cells which are fragile and susceptible to rupture. Also, due to abnormalities in the hemoglobin S structure, affected red blood cells occasionally collapse, which can lead to occlusion of blood vessels.

The following codes simulate evolution under this scenario.

popsize <- 100
freq <- c(AA=0.9,AB=0.1,BB=0)
b <- 2
w <- c(AA=1,AB=1,BB=0.7)
tmax <- 50

femmes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)
hommes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)

mate <- function (femme, homme, noffspring = 1) {
  f <- sample(strsplit(femme,"")[[1]],size=noffspring,replace=TRUE)
  m <- sample(strsplit(homme,"")[[1]],size=noffspring,replace=TRUE)
  paste0(f,m) -> offspring
  offspring[offspring=="BA"] <- "AB"
  offspring
}

t <- seq(0,tmax)

g <- array(dim=c(length(t),3),dimnames=list(time=t,genotype=names(freq)))
n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
g[1,] <- n/sum(n)

for (k in seq(from=2,to=length(t))) {
  partners <- hommes[order(runif(n=length(hommes)))]
  hommes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=b)
  femmes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=b)
  hommes <- sample(hommes,size=popsize/2,replace=FALSE,prob=w[hommes])
  femmes <- sample(femmes,size=popsize/2,replace=FALSE,prob=w[femmes])
  n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
  g[k,] <- n/sum(n)
}

g %>% 
  melt(value.name="frequency") %>%
  ggplot(aes(x=time,y=frequency,color=genotype,group=genotype))+
  geom_line()+
  expand_limits(y=c(0,1))

Task B.

  1. Study the codes above, paying special attention to the differences between these and the preceding ones (i.e., the neutral case). As before, annotate the code to explain what it is doing.
  2. By modifying the above codes, explore how the dynamics and the outcome depend on (a) initial rarity of the B allele and (b) population size.

The multinomial approximation

The above codes represent each and every individual in computer memory explicitly. Consequently, they are expensive when the population size is large. We can achieve much faster run-times with an approximation that tracks only the numbers of individuals of each genotype.

The following codes implement evolution with selection using the multinomial approximation.

popsize <- 100
freq <- c(AA=0.9,AB=0.1,BB=0)
w <- c(AA=1,AB=1,BB=0.7)
tmax <- 50

t <- seq(0,tmax)
N <- rmultinom(n=1,size=popsize,prob=freq)

g <- array(dim=c(length(t),3),dimnames=list(time=t,genotype=names(freq)))
g[1,] <- N/sum(N)

for (k in seq(from=2,to=length(t))) {
  f <- c(A=2*N[1]+N[2], B=N[2]+2*N[3])/popsize/2
  N <- rmultinom(n=1,size=popsize,prob=c(AA=w[1]*f[1]^2,AB=2*w[2]*f[1]*f[2],BB=w[3]*f[2]^2))
  g[k,] <- N/sum(N)
}

g %>% 
  melt(value.name="frequency") %>%
  ggplot(aes(x=time,y=frequency,color=genotype,group=genotype))+
  geom_line()+
  expand_limits(y=c(0,1))

Interestingly, in regions of the world where malaria is endemic, heterozygotes have a fitness advantage over normal homozygotes. This is because the rupturing of red blood cells disrupts the life cycle of the malaria parasite, which reduces the severity of disease symptoms. The HgbS homozygotes, unfortunately, are deprived of this benefit, and actually experience more severe malaria symptoms than normal homozygotes.

Task C.

  1. Study the codes above and annotate them as before.
  2. Modify these codes to explore how the evolutionary dynamics might change in the presence of malaria.
  3. Generate histograms showing plausible distributions of allele frequencies in regions where malaria is present and where it is absent.
  4. Show how these distributions depend on population size.

Underdominance

Underdominance refers to the situation where the heterozygote has a fitness that is depressed relative to that of either homozygote.

Task D.

  1. Modify the above codes (using the multinomial approximation) to explore the dynamics associated with underdominance. Specifically, quantify how (a) the magnitude of the heterozygote fitness depression and (b) the initial allele frequencies affect the evolutionary outcome.
  2. Explain your findings in biological terms.

Important Note: You will turn in each draft of your report via the course Canvas site. Upload both the Rmarkdown document and a PDF version of your report. Choose the filenames according to the following formula: Challenge_X-Y.Z where X is the challenge problem number, Y is the draft number (1, 2, or 3), and Z is the appropriate extension (.Rmd or .pdf).