The following compiles and loads the Moran Genealogy Process (MGP) simulator, encoded in the file mgp.cc
. The contents of this file are displayed below.
The following functions run simulators of the various genealogy processes. The first simulates a realization of the MGP (without sampling). It returns snapshots of the MGP at the specified times.
The second simulates one realization of the Sampled MGP. If drop = TRUE
, the black balls will be dropped.
The third simulates a realization of the Observed Chain of the stationary Sampled Moran Genealogy Process.
The following function extracts information about an MGP state. In addition to the sample times and event counts, it returns the final tree, plus the times of the epochs (the interval boundaries) and the number of lineages on each interval. Finally, it computes the conditional log likelihoods and cumulative hazards, of each successive sample, conditional on the previous ones.
The treeplot
function uses ggtree [1] to generate a plot of the genealogy from its Newick representation.
In the following animation depicting one simulation of the SMGP, colored points are drawn to indicate players colored according to the scheme in the text. Red points correspond to “live samples” and “red players”: each one holds one red and one blue ball. Blue points correspond to “dead samples” and “blue players”: each one holds one blue and one green ball. These are samples that are the direct ancestor of another sample. Green points (“green players”) represent branching points: each green player holds two green balls.
The following tests the newick routines.
Now we investigate the distributions of the cumulative hazards of the conditional attachment process (main text, Theorem 9). These should be perfectly distributed according to the standard exponential distribution.
::bake(file="mgp_hazards.rds",{
pompregisterDoParallel()
registerDoRNG()
2000
nrep <-expand.grid(
rep=seq_len(nrep),
popsize=c(10,100,1000),
relsr=c(0.1,1,10),
nsamples=21
design
) ->
foreach (i=iter(design,"row"),
.options.multicore=list(preschedule=FALSE)) %dopar%
{
i$relsr * i$popsize
samplerate <- cumsum(rexp(n=i$nsamples,rate=samplerate))
tout <-system.time({
playObsSMGP(n=i$popsize,times=tout,t0=0,tree=FALSE) |>
getMGPinfo(drop=TRUE) -> x
c("cumhaz","nuggets","dd","loglik")] |>
x[ as_tibble() |>
bind_cols(times=tout) |>
tail(-1) |>
mutate(
sample=seq_along(times),
rep=i$rep,
samplerate=samplerate,
popsize=i$popsize
x
) ->
} tm
) ->$etime <- tm[3]
x
x|>
} bind_rows() |>
mutate(
p=exp(-cumhaz-nuggets)
dat
) ->
attr(dat,"cores") <- getDoParWorkers()
dat
dat }) ->
The above required 4.6 min on 32 cores. Efficiency: 0.207.
We use Kolmogorov-Smirnov to test the hypothesis that the cumulative hazards are distributed as expected.
|>
dat ggplot(aes(x=p))+
geom_abline(slope=1)+
stat_ecdf()+
facet_wrap(~sample)+
coord_equal()+
labs(x=expression(italic(p)),y=expression(italic(F(p))))+
expand_limits(x=c(0,1),y=c(0,1))
|>
dat mutate(ratio=factor(samplerate/popsize)) |>
ggplot(aes(x=p))+
geom_abline(slope=1)+
stat_ecdf()+
facet_grid(popsize~ratio,labeller=label_both)+
coord_equal()+
labs(x=expression(italic(p)),y=expression(italic(F(p))))+
expand_limits(x=c(0,1),y=c(0,1))
library(broom)
|>
dat filter(p>0) |>
group_by(sample) |>
do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
select(p.value) |>
ungroup()
# A tibble: 20 × 2
sample p.value
<int> <dbl>
1 1 0.225
2 2 0.840
3 3 0.233
4 4 0.948
5 5 0.878
6 6 0.388
7 7 0.00514
8 8 0.723
9 9 0.138
10 10 0.960
11 11 0.807
12 12 0.203
13 13 0.640
14 14 0.0192
15 15 0.288
16 16 0.933
17 17 0.457
18 18 0.962
19 19 0.642
20 20 0.0232
|>
dat filter(p>0) |>
group_by(popsize,samplerate) |>
do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
select(p.value) |>
ungroup()
# A tibble: 9 × 3
popsize samplerate p.value
<dbl> <dbl> <dbl>
1 10 1 0.0662
2 10 10 0.408
3 10 100 0.878
4 100 10 0.677
5 100 100 0.499
6 100 1000 0.262
7 1000 100 0.612
8 1000 1000 0.875
9 1000 10000 0.820
|>
dat filter(p>0) |>
group_by(popsize) |>
do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
select(p.value) |>
ungroup()
# A tibble: 3 × 2
popsize p.value
<dbl> <dbl>
1 10 0.767
2 100 0.977
3 1000 0.626
|>
dat filter(p>0) |>
group_by(samplerate/popsize) |>
do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
select(p.value) |>
ungroup()
# A tibble: 3 × 2
`samplerate/popsize` p.value
<dbl> <dbl>
1 0.1 0.747
2 1 0.414
3 10 0.743
|>
dat filter(p>0) |>
do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
select(p.value)
# A tibble: 1 × 1
p.value
<dbl>
1 0.753
The following C++ codes are contained in the file mgp.cc
.
// Moran Genealogy Process Simulator (C++)
// To build, execute: 'R CMD SHLIB <this_file>.cc'
#include <R.h>
#include <Rmath.h>
#include <Rdefines.h>
#include <vector>
#include <string>
#include <cstring>
#define err(...) errorcall(R_NilValue,__VA_ARGS__)
#define rprint(S) Rprintf("%s\n",(S).c_str())
const int na = R_NaInt;
const double inf = R_PosInf;
const double default_slate = R_NaReal;
// check if the slate is blank.
static bool blank_slate (double &x) {
return ISNA(x);
}
// interface with R's integer RNG
static int random_integer (int n) {
return int(floor(R_unif_index(double(n))));
}
typedef unsigned char raw_t; // must match with R's 'Rbyte' (see Rinternals.h)
// MGP CLASS
// the class to hold the state of the MGP (a "tableau").
class mgp_tableau_t {
private:
// ball colors
typedef enum {red = 0, green = 1, black = 2, blue = 4} color_t;
class ball_t;
class player_t;
// BALL CLASS
// each ball has:
// - a color
// - a unique name within its color
// - the name of the player in whose hand it lies.
class ball_t {
private:
int _hand;
public:
int name;
color_t color;
// basic constructor
ball_t (color_t col = red, int nom = na, int who = na) {
color = col;
name = nom;
_hand = who;
};
// copy constructor
ball_t (const ball_t &b) = default;
// copy assignment operator
ball_t & operator= (const ball_t &b) = delete;
// in whose hand do I lie?
int hand (void) const {
return _hand;
};
// change hands
void hand (int &who) {
_hand = who;
};
int is_green (void) const {
return color==green;
};
int is_black (void) const {
return color==black;
};
int is_blue (void) const {
return color==blue;
};
int is_red (void) const {
return color==red;
};
// human-readable colors
std::string color_name (void) const {
std::string o;
switch (color) {
case black: o = "black"; break;
case blue: o = "blue"; break;
case green: o = "green"; break;
case red: default: o = "red"; break;
}
return o;
};
// human-readable info
std::string describe (void) const {
return color_name() + "(" + std::to_string(name) + ")";
};
};
// PLAYER CLASS
// each player has:
// - a unique name
// - two balls
// - a slate to hold the time of seating
// - knowledge of the players to left and right
class player_t {
public:
int name;
ball_t *ballA, *ballB;
player_t *left, *right;
double slate;
// basic constructor
player_t (int nom = na, ball_t *a = 0, ball_t *b = 0) {
name = nom;
ballA = a;
ballB = b;
left = 0;
right = 0;
slate = default_slate;
};
// copy constructor
player_t (const player_t &p) = delete;
// copy assignment operator
player_t & operator= (const player_t & p) = delete;
// does this player hold this ball?
int holds (ball_t *b) const {
return (ballA == b) || (ballB == b);
};
// does this player hold a ball of this color?
int holds (color_t c) const {
return (ballA->color==c) || (ballB->color==c);
};
// return a pointer to the other ball
ball_t *other (const ball_t *b) const {
ball_t *o = 0;
if (b == ballA)
o = ballB;
else if (b == ballB)
o = ballA;
else // this should never occur
err("inconceivable! 'other' error");
return o;
};
// human-readable info
std::string describe (void) const {
return "player(" + std::to_string(name) + ") {"
+ ballA->describe() + ","
+ ballB->describe() + "}, t = "
+ std::to_string(slate) + "\n";
};
// binary output
int serial (raw_t *o = 0) const {
int buf[5];
color_t buf1[2];
buf[0] = name;
buf[1] = (left != 0) ? left->name : na;
buf[2] = (right != 0) ? right->name : na;
buf[3] = ballA->name;
buf[4] = ballB->name;
buf1[0] = ballA->color;
buf1[1] = ballB->color;
if (o != 0) {
memcpy(o,buf,sizeof(buf)); o += sizeof(buf);
memcpy(o,buf1,sizeof(buf1)); o += sizeof(buf1);
memcpy(o,&slate,sizeof(double)); o += sizeof(double);
}
return sizeof(buf)+sizeof(buf1)+sizeof(double);
};
// reset from serialized binary input
raw_t *reset (raw_t *o, mgp_tableau_t *tab) {
int buf[5];
color_t buf1[2];
memcpy(buf,o,sizeof(buf)); o += sizeof(buf);
memcpy(buf1,o,sizeof(buf1)); o += sizeof(buf1);
memcpy(&slate,o,sizeof(double)); o += sizeof(double);
left = (buf[1] != na) ? tab->player[buf[1]] : 0;
right = (buf[2] != na) ? tab->player[buf[2]] : 0;
ballA = tab->get_ball(buf1[0],buf[3]);
ballB = tab->get_ball(buf1[1],buf[4]);
ballA->hand(name);
ballB->hand(name);
return o;
};
};
private:
int popsize; // number of black balls
int nlive; // number of seated players
int nsample; // number of samples taken so far
std::vector<ball_t*> ball; // array of pointers to balls
std::vector<ball_t*> verde, negro, azul, rojo; // arrays for each color class
std::vector<player_t*> player; // array of pointers to players
player_t *rightmost; // pointer to player seated farthest to right
double _time; // current time
double _scale; // mean waiting time between Moran events
// draw a random integer in the interval [0,n-1]
void draw_one (int n, int *x) const {
*x = random_integer(n);
};
// draw a pair of random integers in the interval [0,n-1]
void draw_two (int n, int *x) const {
x[0] = random_integer(n);
x[1] = random_integer(n-1);
if (x[1] >= x[0]) x[1]++;
};
// swap balls a and b, wherever they lie
void ballswap (ball_t *a, ball_t *b) {
player_t *A = player[a->hand()];
player_t *B = player[b->hand()];
if (a == A->ballA)
A->ballA = b;
else if (a == A->ballB)
A->ballB = b;
else
err("inconceivable! ballswap error A");
if (b == B->ballA)
B->ballA = a;
else if (b == B->ballB)
B->ballB = a;
else
err("inconceivable! ballswap error B");
a->hand(B->name);
b->hand(A->name);
};
// initialize a tableau with:
// - popsize = n
// - maximum number of samples = m,
// - each player holds his own green ball.
// - only the first player is seated.
// - time is NA.
void setup (int n, int m = 0) {
popsize = n;
nlive = 1;
nsample = 0;
int maxp = n+m+m; // max number of players
player.reserve(maxp);
ball.reserve(maxp+maxp);
verde.reserve(maxp);
negro.reserve(n);
azul.reserve(m);
rojo.reserve(m);
_time = default_slate;
_scale = double(popsize);
// initialize the players and balls
// first the n players of the basic game
int k = 0;
for (int i = 0; i < n; i++) {
ball_t *g = new ball_t(green,k,k);
ball_t *b = new ball_t(black,i,k);
ball.push_back(g); verde.push_back(g);
ball.push_back(b); negro.push_back(b);
player_t *p = new player_t(k++,g,b);
player.push_back(p);
}
rightmost = player[0];
// now the 2m players representing samples
for (int i = 0; i < m; i++) {
ball_t *g, *b;
player_t *p;
g = new ball_t(green,k,k);
b = new ball_t(blue,i,k);
ball.push_back(g); verde.push_back(g);
ball.push_back(b); azul.push_back(b);
p = new player_t(k++,g,b);
player.push_back(p);
g = new ball_t(green,k,k);
b = new ball_t(red,i,k);
ball.push_back(g); verde.push_back(g);
ball.push_back(b); rojo.push_back(b);
p = new player_t(k++,g,b);
player.push_back(p);
}
};
public:
// binary output
int serial (raw_t *o = 0) const {
int buf1[4] = {
popsize, nsample, nlive, rightmost->name
};
double buf2[2] = {_time, _scale};
int n = sizeof(buf1)+sizeof(buf2);
int ps = player[0]->serial();
int np = popsize+2*nsample;
n += np*ps;
if (o != 0) {
memcpy(o,buf1,sizeof(buf1)); o += sizeof(buf1);
memcpy(o,buf2,sizeof(buf2)); o += sizeof(buf2);
for (int i = 0; i < np; i++) {
o += player[i]->serial(o);
}
}
return n;
};
private:
// set the tableau from serialized binary.
// allow for extra samples.
void reset (raw_t *o, int extra = 0) {
int buf1[4];
double buf2[2];
memcpy(buf1,o,sizeof(buf1)); o += sizeof(buf1);
memcpy(buf2,o,sizeof(buf2)); o += sizeof(buf2);
setup(buf1[0],buf1[1]+extra);
nsample = buf1[1];
nlive = buf1[2];
rightmost = player[buf1[3]];
_time = buf2[0];
_scale = buf2[1];
int np = buf1[0]+2*buf1[1];
for (int k = 0; k < np; k++)
o = player[k]->reset(o,this);
};
// get pointer to ball by name and color
ball_t *get_ball (color_t &col, int nom = 0) const {
ball_t *b;
switch (col) {
case green: b = verde[nom]; break;
case black: b = negro[nom]; break;
case blue: b = azul[nom]; break;
case red: b = rojo[nom]; break;
default: err("inconceivable! there is no color %d",col);
}
return b;
};
public:
// basic constructor
// n is population size
// m >= total number of samples to be taken
mgp_tableau_t (void) = delete;
mgp_tableau_t (int n, int m = 0) {
setup(n,m);
};
// constructor from serialized binary form
mgp_tableau_t (raw_t *o, int extra = 0) {
reset(o,extra);
};
// copy constructor
mgp_tableau_t (const mgp_tableau_t &T) {
int s = T.serial();
raw_t *o = new raw_t[s];
T.serial(o);
reset(o);
delete[] o;
};
// copy assignment operator
mgp_tableau_t & operator= (const mgp_tableau_t &T) = delete;
// initialize the MGP with a draw from its stationary distribution at time t0.
// 'mu' is the Moran event rate.
void init (double mu = 1, double t0 = 0) {
// seat each of the active players.
// each chooses an ancestor by selecting a black ball to the left at random.
// the coalescent intervals are chosen from the stationary distribution.
_time = 0;
_scale = 1/mu;
double scale = _scale*choose(popsize,2);
int draw;
for (int j = 1; j < popsize; j++) {
draw_one(j,&draw);
birth(verde[j],negro[draw]);
double deltat = rexp(scale/choose(j+1,2));
_time += deltat;
}
// shift time so that the t0 is the current time.
for (int j = 1; j < popsize; j++) {
player[j]->slate += t0-_time;
}
_time = t0;
};
// initialize an MGP at a given state.
void init (raw_t *o) {
reset(o);
};
// destructor
~mgp_tableau_t (void) {
for (int i = 0; i < int(ball.size()); i++) delete ball[i];
for (int i = 0; i < int(player.size()); i++) delete player[i];
};
// get current time.
double time (void) const {
return _time;
};
// get Moran rate.
double rate (void) const {
return 1/_scale;
};
// set Moran rate.
void rate (double &mu) {
_scale = 1/mu;
};
// get root player (just to the right of the "shelf")
player_t *get_root (void) const {
return player.front()->right;
};
// get number of samples
int num_sample (void) const {
return nsample;
};
// report all the seating times
int get_times (double *x = 0) const {
player_t *p = get_root();
int n = 0;
while (p != 0) {
n++;
if (x != 0) *(x++) = p->slate;
p = p->right;
}
return n;
};
// report the times of live samples
int get_epochs (double *x = 0) const {
player_t *p = get_root();
int n = 0;
while (p != 0) {
if (p->holds(red)) {
n++;
if (x != 0) *(x++) = p->slate;
}
p = p->right;
}
return n;
};
// count the number of extant sample lineages on each interval.
void get_lineage_count (int *x) const {
player_t *p = get_root();
int count = 1;
while (p != 0) {
count--;
if (p->ballA->is_green()) count++;
if (p->ballB->is_green()) count++;
*(x++) = count;
p = p->right;
}
};
// check the validity of the mgp_tableau.
void valid (void) const {
if (popsize != int(negro.size())) err("ooops");
if (player.size() != verde.size()) err("oopsy");
if (ball.size() != 2*player.size()) err("no no NOOOO!");
// the leftmost player is the "shelf"
player_t *p = player[verde[0]->hand()];
if (!(p->name == 0 && p == player.front() && p->left == 0 && blank_slate(p->slate)))
err("invalid shelf:\n%s",p->describe().c_str());
// check seating arrangement
int n = 1;
while (p->right != 0) {
n++;
if (p->right->slate < p->slate) err("inconCEIVable!"); // seating times are out of order
if (p->right->left != p) err("seven years' bad luck"); // right and left are not mirrored
p = p->right;
}
if (n != nlive) err("cannot traverse right %d %d",n,nlive);
if (p != rightmost) err("rightmost player is not rightmost");
while (p->left != 0) {
n--;
if (p->left->right != p) err("holy MOSES!"); // right and left are not mirrored
p = p->left;
}
if (n != 1) err("cannot traverse left %d",n);
if (_time < rightmost->slate)
err("invalid 'time': %le < %le",_time,rightmost->slate);
// for each player, check that both balls are in hand
for (int j = 0; j < int(player.size()); j++) {
p = player[j];
if (p->name != j)
err("player %d has incorrect name (%d)",j,p->name);
if (p->ballA->hand() != j)
err("player %d, ball 0 (color %s) is not in hand",j,p->ballA->color_name().c_str());
if (p->ballB->hand() != j)
err("player %d, ball 1 (color %s) is not in hand",j,p->ballB->color_name().c_str());
if (p->left==0 && p->right==0 &&
!(blank_slate(p->slate) && p->holds(verde[p->name])))
err("zombie players!");
}
// check each ball
for (int j = 0; j < int(ball.size()); j++) {
ball_t *b = ball[j];
if (!player[b->hand()]->holds(b))
err("ball %d (color %s) is not in hand of its player",j,b->color_name().c_str());
}
// check each color-class of balls
for (int j = 0; j < int(verde.size()); j++) {
ball_t *b = verde[j];
if (!b->is_green())
err("ball %d is not green",b->name);
if (b->name != j)
err("ball %d is misnamed",b->name);
}
for (int j = 0; j < int(negro.size()); j++) {
ball_t *b = negro[j];
if (!b->is_black())
err("ball %d is not black",b->name);
if (b->name != j)
err("ball %d is misnamed",b->name);
}
for (int j = 0; j < int(azul.size()); j++) {
ball_t *b = azul[j];
if (!b->is_blue())
err("ball %d is not blue",b->name);
if (b->name != j)
err("ball %d is misnamed",b->name);
}
for (int j = 0; j < int(rojo.size()); j++) {
ball_t *b = rojo[j];
if (!b->is_red())
err("ball %d is not red",b->name);
if (b->name != j)
err("ball %d is misnamed",b->name);
}
// check that direct descents are proper, i.e.:
// p->left->slate == p->slate => p->left->holds(verde[p->name])
// p->right->slate == p->slate => p->holds(verde[p->right->name])
for (int j = 0; j < nsample; j++) {
player_t *p = player[azul[j]->hand()];
if (p->left->slate == p->slate && !p->left->holds(verde[p->name]))
err("inconceivable!! bad sample %d",j);
}
p = get_root();
while (p->right != 0) {
if (p->right->slate == p->slate && !p->holds(verde[p->right->name])) {
err("improper length 0 interval:\n%s %s",
p->describe().c_str(),p->right->describe().c_str());
}
p = p->right;
}
};
// human-readable info
std::string describe (void) const {
player_t *p = player.front();
std::string o = "";
while (p != 0) {
o += p->describe();
p = p->right;
}
o += "time = " + std::to_string(time()) + "\n"
+ "rate = " + std::to_string(rate()) + "\n";
return o;
};
private:
// recursive function to put genealogy into Newick format.
// TODO: modify so as to handle "pruned" trees.
std::string newick (int &name, double &tpar) const {
std::string o;
player_t *p = player[name];
if (p->holds(blue)) {
ball_t *b = (p->ballA->is_blue()) ? p->ballA : p->ballB;
ball_t *c = p->other(b);
switch (c->color) {
case green:
o = "(" + newick(c->name,p->slate) + ")";
break;
case red:
o = "";
break;
default:
err("INCONceivABLE!");
break;
}
o += std::to_string(b->name) + ":" + std::to_string(p->slate - tpar);
} else {
ball_t *b;
o = "(";
b = p->ballA;
switch (b->color) {
case black:
o += "o:" + std::to_string(_time - p->slate) + ",";
break;
case green:
o += newick(b->name,p->slate) + ",";
break;
default:
err("InCoNcEiVaBlE!");
break;
}
b = p->ballB;
switch (b->color) {
case black:
o += "o:" + std::to_string(_time - p->slate);
break;
case green:
o += newick(b->name,p->slate);
break;
default:
err("InCoNcEiVaBlE!");
break;
}
o += ")g:" + std::to_string(p->slate - tpar);
}
return o;
};
public:
// put genealogy at current time into Newick format
std::string newick (void) const {
player_t *p = get_root();
return newick(p->name,p->slate) + ";";
// note that this ensures that the branch length at the root is 0
};
// seat the player holding ball a.
// take as parent the player holding ball b.
// a does not change hands; b does.
// the newly seated player will be in the rightmost position
// holding balls a and b.
// the parent will have exchanged b for the green ball with
// the newly seated player's name.
void birth (const ball_t *a, ball_t *b) {
int child = a->hand();
player_t *A = player[child];
player_t *C = rightmost; // this is why we need 'rightmost'
ball_t *g = verde[child];
if (!A->holds(g)) err("INconCEIVable!");
ballswap(g,b);
C->right = A;
A->left = C;
A->right = 0;
rightmost = A;
A->slate = _time;
nlive++;
};
// unseat the player holding ball b.
// b does not change hands.
// the newly unseated player will hold b and its own green ball,
// having exchanged its other ball for the latter.
void death (const ball_t *b) {
int name = b->hand();
player_t *A = player[name];
ball_t *g = verde[name];
ball_t *o = A->other(b);
player_t *L = A->left;
player_t *R = A->right;
ballswap(o,g);
if (L != 0)
L->right = R;
if (R != 0) {
R->left = L;
} else {
rightmost = L;
}
// return A to its prenatal state
A->left = 0;
A->right = 0;
A->slate = default_slate;
nlive--;
};
// A "Moran move" is a death followed by a birth.
void moran_move (void) {
int draw[2];
draw_two(popsize,draw);
death(negro[draw[0]]);
birth(negro[draw[0]],negro[draw[1]]);
};
// a sample adds 2 players at the same time.
// The first ends up holding the random black ball plus
// the green ball of the second player.
// The second player ends up holding a blue ball and an
// red ball.
void sample () {
int draw;
draw_one(popsize,&draw);
birth(azul[nsample],negro[draw]);
birth(rojo[nsample],azul[nsample]);
nsample++;
};
// run MGP to a specified time.
// return number of Moran events that have occurred.
int play (double tfin) {
int count = 0;
double deltat = rexp(_scale);
while (_time+deltat < tfin) {
_time += deltat;
moran_move();
count++;
deltat = rexp(_scale);
}
_time = tfin; // relies on Markov property
return count;
};
// take one step of the MGP.
// return new time.
double play1 (void) {
double deltat = rexp(_scale);
_time += deltat;
moran_move();
return _time;
};
// successively drop all players holding black balls.
void drop_black (void) {
for (int j = 0; j < popsize; j++)
death(negro[j]);
negro.clear(); // this will prevent all subsequent births, deaths, and samples.
};
// successively drop all players holding blue and red balls.
void drop_samples (void) {
for (int j = 0; j < nsample; j++) {
death(rojo[j]);
death(azul[j]);
}
};
// drop black balls and zero-length branches.
void prune (void) {
drop_black();
for (int j = 0; j < nsample; j++) {
player_t *p = player[azul[j]->hand()];
ball_t *g = verde[p->name];
if (p->left->holds(g) && p->slate == p->left->slate) {
ballswap(rojo[j],p->left->other(g));
death(rojo[j]);
nlive--;
}
}
};
// walk back along each sampled lineage, computing
// cumulative hazard of coalescence and log likelihood,
// conditional on previous samples.
// 'll': conditional log likelihoods
// 'hz': continuous portion of cumulative hazard
// 'ng': discrete portion of cumulative hazard
// 'dd': indicator of direct descent events
void walk (double *ll = 0, double *hz = 0, double *ng = 0, int *dd = 0) const {
int np = int(player.size());
int *lineages = new int[np]; // number of extant lineages in each interval
int *breadcrumb = new int[np];
int i, n = popsize;
double mu = rate()/choose(n,2);
// initialize lineage counter and breadcrumbs
for (int j = 0; j < np; j++) {
lineages[j] = 0;
breadcrumb[j] = 0;
}
// loop over samples, in order of their addition
for (int s = 0; s < nsample; s++) {
if (ll != 0) ll[s] = 0;
if (hz != 0) hz[s] = 0;
if (ng != 0) ng[s] = 0;
if (dd != 0) dd[s] = 0;
player_t *p = player[azul[s]->hand()];
player_t *pl = p->left;
ball_t *g = verde[p->name];
double t, deltat, H;
// traverse from right to left
while (pl != 0) {
t = p->slate;
deltat = t - pl->slate;
p = pl;
pl = p->left;
i = lineages[p->name]; // just for convenience
// cumulative hazard of coalescence within an interval
H = (i > 0) ? mu*i*deltat : 0;
if (ll != 0) ll[s] -= H;
if (hz != 0) hz[s] += H;
// deal with direct descent (or lack thereof).
// NB: no sample can be the direct ancestor of more than one later sample
// (hence the breadcrumb).
if (p->holds(blue) && !breadcrumb[p->name]) {
// hazard of coalescing by direct descent: -log(1-1/(n-i))
double nug = double(1)/double(n-i);
if (nug > 1) err("c'est impossible!");
if (pl->slate == p->slate && pl->holds(g)) { // coalescence
if (ll != 0) ll[s] += log(nug); // log(1/(n-i))
if (ng != 0) ng[s] -= log(1-runif(0,nug));
if (dd != 0) dd[s] += 1;
breadcrumb[p->name] = 1; // ensures no second coalescence here
lineages[pl->name]++; // no effect, since deltat = 0
pl = 0;
} else { // no coalescence
H = log(1-nug);
if (nug >= 1) err("coalescence should be assured here");
if (ng != 0) ng[s] -= H;
if (ll != 0) ll[s] += H;
}
} else if (p->holds(g)) { // is this player an ancestor?
if (breadcrumb[p->name]) { // coalescence
if (ll != 0 && deltat > 0) ll[s] += log(mu*i);
pl = 0; // stop
} else { // haven't come this way yet
g = verde[p->name]; // continue
breadcrumb[p->name] = 1;
}
}
// increase the number of lineages that cross this interval:
lineages[p->name]++;
}
}
delete[] lineages;
delete[] breadcrumb;
};
// first prune, then walk back along each sampled lineage, computing
// cumulative hazard of coalescence and log likelihood,
// conditional on previous samples.
// 'll': conditional log likelihoods
// 'hz': continuous portion of cumulative hazard
// 'ng': discrete portion of cumulative hazard
// 'dd': indicator of direct descent events
// NB: this modifies the MGP tableau.
void prune_and_walk (double *ll = 0, double *hz = 0, double *ng = 0, int *dd = 0) {
prune();
int np = int(player.size());
int *lineages = new int[np]; // number of extant lineages in each interval
int *breadcrumb = new int[np];
int i, n = popsize;
double mu = rate()/choose(n,2);
// initialize lineage counter and breadcrumbs
for (int j = 0; j < np; j++) {
lineages[j] = 0;
breadcrumb[j] = 0;
}
// loop over samples, in order of their addition
for (int s = 0; s < nsample; s++) {
if (ll != 0) ll[s] = 0;
if (hz != 0) hz[s] = 0;
if (ng != 0) ng[s] = 0;
if (dd != 0) dd[s] = 0;
player_t *p = player[azul[s]->hand()];
player_t *pl = p->left;
ball_t *g = verde[p->name];
double t, deltat, H;
// traverse from right to left
while (pl != 0) {
t = p->slate;
deltat = t - pl->slate;
if (deltat == 0) err("inconceivable! a duration-zero interval!");
p = pl;
pl = p->left;
i = lineages[p->name]; // just for convenience
// cumulative hazard of coalescence within an interval
H = (i > 0) ? mu*i*deltat : 0;
if (ll != 0) ll[s] -= H;
if (hz != 0) hz[s] += H;
// deal with direct descent (or lack thereof).
if (p->holds(blue) && !breadcrumb[p->name]) {
// hazard of coalescing by direct descent: -log(1-1/(n-i))
H = double(1)/double(n-i);
if (p->holds(g)) { // coalescence
if (ll != 0) ll[s] -= log(H); // log(1/(n-i))
if (ng != 0)
ng[s] -= log(1-runif(0,H)); // jitter
if (dd != 0) dd[s] += 1;
breadcrumb[p->name] = 1;
pl = 0; // stop
} else { // no coalescence
if (H >= 1) err("coalescence should be assured here");
if (ng != 0) ng[s] += -log(1-H);
if (ll != 0) ll[s] += log(1-H);
}
} else if (p->holds(g)) { // is this player an ancestor?
if (breadcrumb[p->name]) { // coalescence
if (ll != 0) ll[s] += log(mu*i);
pl = 0; // stop
} else { // haven't come this way yet
g = verde[p->name]; // continue
breadcrumb[p->name] = 1;
}
}
// increase the number of lineages that cross this interval:
lineages[p->name]++;
}
}
delete[] lineages;
delete[] breadcrumb;
};
};
// helper function for filling a return list
static int set_list_elem (SEXP list, SEXP names, SEXP element,
const char *name, int pos) {
SET_ELEMENT(list,pos,element);
SET_STRING_ELT(names,pos,mkChar(name));
return ++pos;
}
// create the serialized MGP state:
static SEXP serialize (const mgp_tableau_t *T) {
SEXP out;
int s = T->serial();
PROTECT(out = NEW_RAW(s));
T->serial(RAW(out));
UNPROTECT(1);
return out;
}
// create a human-readable description
static SEXP describe (const mgp_tableau_t *T) {
SEXP out;
PROTECT(out = NEW_CHARACTER(1));
SET_STRING_ELT(out,0,mkChar(T->describe().c_str()));
UNPROTECT(1);
return out;
}
// extract the tree structure in Newick form.
// store in element k of character-vector x.
static void newick (SEXP x, int k, const mgp_tableau_t *T) {
SET_STRING_ELT(x,k,mkChar(T->newick().c_str()));
}
extern "C" {
// MPG without sampling.
// optionally compute genealogies in Newick form ('tree = TRUE').
SEXP playMGP (SEXP X, SEXP Rate, SEXP Times, SEXP T0, SEXP Tree) {
int nprotect = 0;
int nout = 3;
SEXP out, outnames;
SEXP times, count, tree = R_NilValue;
int ntimes = LENGTH(Times);
mgp_tableau_t *mgp = 0;
double t = R_NaReal;
int do_newick = *(INTEGER(AS_INTEGER(Tree)));
PROTECT(times = AS_NUMERIC(duplicate(Times))); nprotect++;
PROTECT(count = NEW_INTEGER(ntimes)); nprotect++;
if (do_newick) {
PROTECT(tree = NEW_CHARACTER(ntimes)); nprotect++;
nout++;
}
int *xc = INTEGER(count);
double *xt = REAL(times);
double rate = R_NaReal;
if (!isNull(Rate)) { // Moran event rate
rate = *(REAL(AS_NUMERIC(Rate)));
}
GetRNGstate();
if (LENGTH(X)>1) { // restart the MGP from the specified state
mgp = new mgp_tableau_t(RAW(X));
mgp->valid();
t = mgp->time();
if (!isNull(Rate)) mgp->rate(rate); // override the stored rate
} else { // a fresh MGP
PROTECT(X = AS_INTEGER(X)); nprotect++;
t = *(REAL(AS_NUMERIC(T0)));
mgp = new mgp_tableau_t(*(INTEGER(X)));
mgp->init(rate,t);
}
if (t > xt[0]) err("must not have t0 > times[1]");
for (int k = 0; k < ntimes; k++, xc++, xt++) {
*xc = mgp->play(*xt);
if (do_newick) {
newick(tree,k,mgp);
}
R_CheckUserInterrupt();
}
PutRNGstate();
// pack everything up in a list
int k = 0;
PROTECT(out = NEW_LIST(nout)); nprotect++;
PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
k = set_list_elem(out,outnames,times,"times",k);
k = set_list_elem(out,outnames,count,"count",k);
if (do_newick) {
k = set_list_elem(out,outnames,tree,"tree",k);
}
k = set_list_elem(out,outnames,serialize(mgp),"state",k);
SET_NAMES(out,outnames);
delete mgp;
UNPROTECT(nprotect);
return out;
}
// Sampled MGP.
// optionally drop black balls ('drop = TRUE').
// optionally compute Newick form of genealogies at sample times ('tree = TRUE').
SEXP playSMGP (SEXP X, SEXP Rate, SEXP Times, SEXP T0, SEXP Drop, SEXP Tree) {
int nprotect = 0;
int nout = 3;
SEXP out, outnames;
SEXP times, count, tree = R_NilValue;
int ntimes = LENGTH(Times);
double t;
mgp_tableau_t *mgp = 0;
int drop = *(INTEGER(AS_INTEGER(Drop)));
int do_newick = *(INTEGER(AS_INTEGER(Tree)));
PROTECT(times = AS_NUMERIC(duplicate(Times))); nprotect++;
PROTECT(count = NEW_INTEGER(ntimes)); nprotect++;
if (do_newick) {
PROTECT(tree = NEW_CHARACTER(ntimes)); nprotect++;
nout++;
}
int *xc = INTEGER(count);
double *xt = REAL(times);
double rate = R_NaReal; // Moran event rate
if (!isNull(Rate)) {
rate = *(REAL(AS_NUMERIC(Rate)));
}
GetRNGstate();
if (LENGTH(X)>1) { // restart the MGP from the specified serialized state
mgp = new mgp_tableau_t(RAW(X),ntimes);
mgp->valid();
t = mgp->time();
if (!isNull(Rate)) mgp->rate(rate);
} else { // run a fresh MGP
PROTECT(X = AS_INTEGER(X)); nprotect++;
t = *(REAL(AS_NUMERIC(T0)));
mgp = new mgp_tableau_t(*(INTEGER(X)),ntimes);
mgp->init(rate,t); // initialize with a draw from the stationary distribution
}
if (t > xt[0]) err("must not have t0 > times[1]");
for (int k = 0; k < ntimes; k++, xc++, xt++) {
*xc = mgp->play(*xt); // play the MGP to the specified time
mgp->sample(); // make a sample
if (do_newick) {
mgp_tableau_t U = *mgp;
if (drop) U.drop_black();
newick(tree,k,&U);
}
R_CheckUserInterrupt();
}
PutRNGstate();
int k = 0;
PROTECT(out = NEW_LIST(nout)); nprotect++;
PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
k = set_list_elem(out,outnames,times,"times",k);
k = set_list_elem(out,outnames,count,"count",k);
if (do_newick) {
k = set_list_elem(out,outnames,tree,"tree",k);
}
k = set_list_elem(out,outnames,serialize(mgp),"state",k);
SET_NAMES(out,outnames);
delete mgp;
UNPROTECT(nprotect);
return out;
}
// Observed chain of the stationary SMGP.
// optionally compute Newick form of genealogies at sample times ('tree = TRUE').
SEXP playObsSMGP (SEXP X, SEXP Rate, SEXP Times, SEXP T0, SEXP Tree) {
int nprotect = 0;
int nout = 3;
SEXP out, outnames;
SEXP times, count, tree = R_NilValue;
int ntimes = LENGTH(Times);
double t;
mgp_tableau_t *mgp = 0;
int do_newick = *(INTEGER(AS_INTEGER(Tree)));
PROTECT(times = AS_NUMERIC(duplicate(Times))); nprotect++;
PROTECT(count = NEW_INTEGER(ntimes)); nprotect++;
if (do_newick) {
PROTECT(tree = NEW_CHARACTER(ntimes)); nprotect++;
nout++;
}
int *xc = INTEGER(count);
double *xt = REAL(times);
double rate = R_NaReal; // Moran event rate
if (!isNull(Rate)) {
rate = *(REAL(AS_NUMERIC(Rate)));
}
GetRNGstate();
if (LENGTH(X)>1) { // restart the MGP from the specified serialized state
mgp = new mgp_tableau_t(RAW(X),ntimes);
mgp->valid();
t = mgp->time();
if (!isNull(Rate)) mgp->rate(rate);
} else { // run a fresh MGP
PROTECT(X = AS_INTEGER(X)); nprotect++;
t = *(REAL(AS_NUMERIC(T0)));
mgp = new mgp_tableau_t(*(INTEGER(X)),ntimes);
mgp->init(rate,t); // initialize with a draw from the stationary distribution
}
if (t > xt[0]) err("must not have t0 > times[1]");
for (int k = 0; k < ntimes; k++, xc++, xt++) {
*xc = mgp->play(*xt); // play the MGP to the specified time
mgp->sample(); // make a sample
if (do_newick) {
mgp_tableau_t U = *mgp;
U.prune();
newick(tree,k,&U);
}
R_CheckUserInterrupt();
}
int k = 0;
PROTECT(out = NEW_LIST(nout)); nprotect++;
PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
k = set_list_elem(out,outnames,times,"times",k);
k = set_list_elem(out,outnames,count,"count",k);
if (do_newick) {
k = set_list_elem(out,outnames,tree,"tree",k);
}
k = set_list_elem(out,outnames,serialize(mgp),"state",k);
SET_NAMES(out,outnames);
PutRNGstate();
delete mgp;
UNPROTECT(nprotect);
return out;
}
// extract/compute basic information.
SEXP get_MGPS_info (SEXP X, SEXP Drop) {
int nprotect = 0;
int nout = 11;
SEXP out, outnames;
SEXP tout, rate, tree, epoch, etimes, lineage, desc;
SEXP loglik, cumhaz, nuggets, dd;
int drop = *(INTEGER(AS_INTEGER(Drop)));
PROTECT(tout = NEW_NUMERIC(1)); nprotect++;
PROTECT(rate = NEW_NUMERIC(1)); nprotect++;
PROTECT(tree = NEW_CHARACTER(1)); nprotect++;
// reconstruct the tableau from its serialization
mgp_tableau_t *mgp = new mgp_tableau_t(RAW(X));
// check validity
mgp->valid();
// extract current time and MGP event rate
*REAL(tout) = mgp->time();
*REAL(rate) = mgp->rate();
// drop the black balls if requested
if (drop) mgp->drop_black();
// get human-readable description
PROTECT(desc = describe(mgp)); nprotect++;
// get the tree in Newick form
if (drop) mgp->prune();
newick(tree,0,mgp);
// get the lineage count
int nt = mgp->get_times();
PROTECT(etimes = NEW_NUMERIC(nt)); nprotect++;
PROTECT(lineage = NEW_INTEGER(nt)); nprotect++;
mgp->get_times(REAL(etimes));
mgp->get_lineage_count(INTEGER(lineage));
// get epochs
int nepoch = mgp->get_epochs();
PROTECT(epoch = NEW_NUMERIC(nepoch)); nprotect++;
mgp->get_epochs(REAL(epoch));
// walk the tree for cumulative hazards and log likelihood
int ns = mgp->num_sample();
PROTECT(cumhaz = NEW_NUMERIC(ns)); nprotect++;
PROTECT(nuggets = NEW_NUMERIC(ns)); nprotect++;
PROTECT(dd = NEW_INTEGER(ns)); nprotect++;
PROTECT(loglik = NEW_NUMERIC(ns)); nprotect++;
mgp->prune_and_walk(REAL(loglik),REAL(cumhaz),REAL(nuggets),INTEGER(dd));
// pack it all up in a list
int k = 0;
PROTECT(out = NEW_LIST(nout)); nprotect++;
PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
k = set_list_elem(out,outnames,tout,"time",k);
k = set_list_elem(out,outnames,rate,"rate",k);
k = set_list_elem(out,outnames,tree,"tree",k);
k = set_list_elem(out,outnames,desc,"description",k);
k = set_list_elem(out,outnames,epoch,"epochs",k);
k = set_list_elem(out,outnames,etimes,"etimes",k);
k = set_list_elem(out,outnames,lineage,"lineages",k);
k = set_list_elem(out,outnames,cumhaz,"cumhaz",k);
k = set_list_elem(out,outnames,nuggets,"nuggets",k);
k = set_list_elem(out,outnames,dd,"dd",k);
k = set_list_elem(out,outnames,loglik,"loglik",k);
SET_NAMES(out,outnames);
delete mgp;
UNPROTECT(nprotect);
return out;
}
}
// TODO:
// - newick input?
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: America/Detroit
tzcode source: system (glibc)
attached base packages:
[1] parallel stats graphics grDevices utils
[6] datasets methods base
other attached packages:
[1] broom_1.0.5 doRNG_1.8.6 rngtools_1.5.2
[4] doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2
[7] ggtree_3.8.2 lubridate_1.9.3 forcats_1.0.0
[10] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[13] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[16] ggplot2_3.5.0 tidyverse_2.0.0 knitr_1.45
loaded via a namespace (and not attached):
[1] gtable_0.3.4 xfun_0.43
[3] bslib_0.7.0 lattice_0.22-6
[5] tzdb_0.4.0 Cairo_1.6-2
[7] vctrs_0.6.5 tools_4.3.3
[9] generics_0.1.3 yulab.utils_0.1.4
[11] fansi_1.0.6 highr_0.10
[13] pkgconfig_2.0.3 data.table_1.15.4
[15] ggplotify_0.1.2 lifecycle_1.0.4
[17] compiler_4.3.3 farver_2.1.1
[19] treeio_1.24.3 munsell_0.5.1
[21] codetools_0.2-19 ggfun_0.1.4
[23] htmltools_0.5.8 sass_0.4.9
[25] yaml_2.3.8 lazyeval_0.2.2
[27] pillar_1.9.0 jquerylib_0.1.4
[29] cachem_1.0.8 nlme_3.1-164
[31] pomp_5.7.0.2 tidyselect_1.2.1
[33] aplot_0.2.2 digest_0.6.35
[35] mvtnorm_1.2-4 stringi_1.8.3
[37] labeling_0.4.3 fastmap_1.1.1
[39] grid_4.3.3 colorspace_2.1-0
[41] cli_3.6.2 magrittr_2.0.3
[43] patchwork_1.2.0 utf8_1.2.4
[45] ape_5.7-1 withr_3.0.0
[47] backports_1.4.1 scales_1.3.0
[49] timechange_0.3.0 rmarkdown_2.26
[51] deSolve_1.40 hms_1.1.3
[53] coda_0.19-4.1 memoise_2.0.1
[55] evaluate_0.23 gridGraphics_0.5-1
[57] rlang_1.1.3 Rcpp_1.0.12
[59] glue_1.7.0 tidytree_0.4.6
[61] jsonlite_1.8.8 R6_2.5.1
[63] fs_1.6.3
1. Yu, G., Smith, D., Zhu, H., Guan, Y. & Lam, T. T.-Y. 2017 Ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36. (doi:10.1111/2041-210X.12628)