© 2018 Aaron A. King.
library(plyr)
library(reshape2)
library(magrittr)
library(ggplot2)
options(stringsAsFactors=FALSE)
theme_set(theme_bw())
set.seed(291807036)
Here, we study evolution (i.e., how allele frequencies change with time) in a simple, but not trivial, case. We suppose that a trait is controlled by a single gene (a single locus, i.e., one place on one single chromosome) and that there are two alleles for this gene.
We’ll start by making the simplest assumptions, then gradually make things more interesting. Specifically, we’ll begin by assuming that mating is random, that each coupling produces 2 offspring (one male and one female), and that all genotypes have the same fitness.
[Q: What is fitness, precisely?]
The following codes simulate multiple generations of random mating with neutral evolution. The plot shows how the frequencies of the three genotypes change through time.
popsize <- 100
freq <- c(AA=0.6,AB=0.0,BB=0.4)
tmax <- 50
femmes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)
hommes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)
mate <- function (femme, homme, noffspring = 1) {
f <- sample(strsplit(femme,"")[[1]],size=noffspring,replace=TRUE)
m <- sample(strsplit(homme,"")[[1]],size=noffspring,replace=TRUE)
paste0(f,m) -> offspring
offspring[offspring=="BA"] <- "AB"
offspring
}
t <- seq(0,tmax)
g <- array(dim=c(length(t),3),dimnames=list(time=t,genotype=names(freq)))
n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
g[1,] <- n/sum(n)
for (k in seq(from=2,to=length(t))) {
partners <- hommes[order(runif(n=length(hommes)))]
hommes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=1)
femmes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=1)
n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
g[k,] <- n/sum(n)
}
g %>% melt(value.name="frequency") %>%
ggplot(aes(x=time,y=frequency,color=genotype,group=genotype))+
geom_line()+
expand_limits(y=c(0,1))
We now introduce selection by supposing that the different genotypes can vary in their survival. Specifically, each coupling will now produce exactly \(b\) females and \(b\) males. However, not all of these will survive. The relative probability of survival of genotype \(X\) will be \(w_X\). This is achieved by randomly selecting survivors from among the offspring of each generation so that the population size remains constant.
We take inspiration from the example of sickle-cell anemia, a heritable red blood cell disorder. The disease is caused by an abnormal hemoglobin, hemoglobin S, which is coded for by the HgbS allele. People who are heterozygous for this allele are said to have sickle cell trait. For the most part, heterozygotes show no symptoms, because they are able to produce enough normal hemoglobin to sustain normal functioning under most circumstances. Homozygotes, however, produce red blood cells which are fragile and susceptible to rupture. Also, due to abnormalities in the hemoglobin S structure, affected red blood cells occasionally collapse, which can lead to occlusion of blood vessels.
The following codes simulate evolution under this scenario.
popsize <- 100
freq <- c(AA=0.9,AB=0.1,BB=0)
b <- 2
w <- c(AA=1,AB=1,BB=0.7)
tmax <- 50
femmes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)
hommes <- sample(c("AA","AB","BB"),prob=freq,replace=TRUE,size=popsize/2)
mate <- function (femme, homme, noffspring = 1) {
f <- sample(strsplit(femme,"")[[1]],size=noffspring,replace=TRUE)
m <- sample(strsplit(homme,"")[[1]],size=noffspring,replace=TRUE)
paste0(f,m) -> offspring
offspring[offspring=="BA"] <- "AB"
offspring
}
t <- seq(0,tmax)
g <- array(dim=c(length(t),3),dimnames=list(time=t,genotype=names(freq)))
n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
g[1,] <- n/sum(n)
for (k in seq(from=2,to=length(t))) {
partners <- hommes[order(runif(n=length(hommes)))]
hommes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=b)
femmes <- mapply(mate,femmes,partners,USE.NAMES=FALSE,noffspring=b)
hommes <- sample(hommes,size=popsize/2,replace=FALSE,prob=w[hommes])
femmes <- sample(femmes,size=popsize/2,replace=FALSE,prob=w[femmes])
n <- table(factor(c(femmes,hommes),levels=c("AA","AB","BB")))
g[k,] <- n/sum(n)
}
g %>%
melt(value.name="frequency") %>%
ggplot(aes(x=time,y=frequency,color=genotype,group=genotype))+
geom_line()+
expand_limits(y=c(0,1))
The above codes represent each and every individual in computer memory explicitly. Consequently, they are expensive when the population size is large. We can achieve much faster run-times with an approximation that tracks only the numbers of individuals of each genotype.
The following codes implement evolution with selection using the multinomial approximation.
popsize <- 100
freq <- c(AA=0.9,AB=0.1,BB=0)
w <- c(AA=1,AB=1,BB=0.7)
tmax <- 50
t <- seq(0,tmax)
N <- rmultinom(n=1,size=popsize,prob=freq)
g <- array(dim=c(length(t),3),dimnames=list(time=t,genotype=names(freq)))
g[1,] <- N/sum(N)
for (k in seq(from=2,to=length(t))) {
f <- c(A=2*N[1]+N[2], B=N[2]+2*N[3])/popsize/2
N <- rmultinom(n=1,size=popsize,prob=c(AA=w[1]*f[1]^2,AB=2*w[2]*f[1]*f[2],BB=w[3]*f[2]^2))
g[k,] <- N/sum(N)
}
g %>%
melt(value.name="frequency") %>%
ggplot(aes(x=time,y=frequency,color=genotype,group=genotype))+
geom_line()+
expand_limits(y=c(0,1))
Interestingly, in regions of the world where malaria is endemic, heterozygotes have a fitness advantage over normal homozygotes. This is because the rupturing of red blood cells disrupts the life cycle of the malaria parasite, which reduces the severity of disease symptoms. The HgbS homozygotes, unfortunately, are deprived of this benefit, and actually experience more severe malaria symptoms than normal homozygotes.
Underdominance refers to the situation where the heterozygote has a fitness that is depressed relative to that of either homozygote.
Important Note: You will turn in each draft of your report via the course Canvas site. Upload both the Rmarkdown document and a PDF version of your report. Choose the filenames according to the following formula: Challenge_X-Y.Z
where X
is the challenge problem number, Y
is the draft number (1, 2, or 3), and Z
is the appropriate extension (.Rmd
or .pdf
).