The Moran game simulator

The following compiles and loads the Moran Genealogy Process (MGP) simulator, encoded in the file mgp.cc. The contents of this file are displayed below.

The following functions run simulators of the various genealogy processes. The first simulates a realization of the MGP (without sampling). It returns snapshots of the MGP at the specified times.

The second simulates one realization of the Sampled MGP. If drop = TRUE, the black balls will be dropped.

The third simulates a realization of the Observed Chain of the stationary Sampled Moran Genealogy Process.

The following function extracts information about an MGP state. In addition to the sample times and event counts, it returns the final tree, plus the times of the epochs (the interval boundaries) and the number of lineages on each interval. Finally, it computes the conditional log likelihoods and cumulative hazards, of each successive sample, conditional on the previous ones.

Plotting utilities.

The treeplot function uses ggtree [1] to generate a plot of the genealogy from its Newick representation.

One simulation of the MGP

One simulation of the SMGP

Simulations of the Observed SMGP

Observed SMGP with more detail

In the following animation depicting one simulation of the SMGP, colored points are drawn to indicate players colored according to the scheme in the text. Red points correspond to “live samples” and “red players”: each one holds one red and one blue ball. Blue points correspond to “dead samples” and “blue players”: each one holds one blue and one green ball. These are samples that are the direct ancestor of another sample. Green points (“green players”) represent branching points: each green player holds two green balls.

Verifying the SMGP formulae

Now we investigate the distributions of the cumulative hazards of the conditional attachment process (main text, Theorem 9). These should be perfectly distributed according to the standard exponential distribution.

pomp::bake(file="mgp_hazards.rds",{
  registerDoParallel()
  registerDoRNG()
  
  nrep <- 2000
  expand.grid(
    rep=seq_len(nrep),
    popsize=c(10,100,1000),
    relsr=c(0.1,1,10),
    nsamples=21
  ) -> design
  
  foreach (i=iter(design,"row"),
           .options.multicore=list(preschedule=FALSE)) %dopar%
    {
      samplerate <- i$relsr * i$popsize
      tout <- cumsum(rexp(n=i$nsamples,rate=samplerate))
      system.time({
        playObsSMGP(n=i$popsize,times=tout,t0=0,tree=FALSE) |>
          getMGPinfo(drop=TRUE) -> x
        x[c("cumhaz","nuggets","dd","loglik")] |>
          as_tibble() |>
          bind_cols(times=tout) |>
          tail(-1) |>
          mutate(
            sample=seq_along(times),
            rep=i$rep,
            samplerate=samplerate,
            popsize=i$popsize
          ) -> x
      }
      ) -> tm
      x$etime <- tm[3]
      x
    } |>
    bind_rows() |>
    mutate(
      p=exp(-cumhaz-nuggets)
    ) -> dat
  
  attr(dat,"cores") <- getDoParWorkers()
  
  dat
  
}) -> dat

The above required 4.6 min on 32 cores. Efficiency: 0.207.

We use Kolmogorov-Smirnov to test the hypothesis that the cumulative hazards are distributed as expected.

dat |>
  ggplot(aes(x=p))+
  geom_abline(slope=1)+
  stat_ecdf()+
  facet_wrap(~sample)+
  coord_equal()+
  labs(x=expression(italic(p)),y=expression(italic(F(p))))+
  expand_limits(x=c(0,1),y=c(0,1))

dat |>
  mutate(ratio=factor(samplerate/popsize)) |>
  ggplot(aes(x=p))+
  geom_abline(slope=1)+
  stat_ecdf()+
  facet_grid(popsize~ratio,labeller=label_both)+
  coord_equal()+
  labs(x=expression(italic(p)),y=expression(italic(F(p))))+
  expand_limits(x=c(0,1),y=c(0,1))

library(broom)

dat |>
  filter(p>0) |>
  group_by(sample) |>
  do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
  select(p.value) |>
  ungroup()
# A tibble: 20 × 2
   sample p.value
    <int>   <dbl>
 1      1 0.225  
 2      2 0.840  
 3      3 0.233  
 4      4 0.948  
 5      5 0.878  
 6      6 0.388  
 7      7 0.00514
 8      8 0.723  
 9      9 0.138  
10     10 0.960  
11     11 0.807  
12     12 0.203  
13     13 0.640  
14     14 0.0192 
15     15 0.288  
16     16 0.933  
17     17 0.457  
18     18 0.962  
19     19 0.642  
20     20 0.0232 
dat |>
  filter(p>0) |>
  group_by(popsize,samplerate) |>
  do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
  select(p.value) |>
  ungroup()
# A tibble: 9 × 3
  popsize samplerate p.value
    <dbl>      <dbl>   <dbl>
1      10          1  0.0662
2      10         10  0.408 
3      10        100  0.878 
4     100         10  0.677 
5     100        100  0.499 
6     100       1000  0.262 
7    1000        100  0.612 
8    1000       1000  0.875 
9    1000      10000  0.820 
dat |>
  filter(p>0) |>
  group_by(popsize) |>
  do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
  select(p.value) |>
  ungroup()
# A tibble: 3 × 2
  popsize p.value
    <dbl>   <dbl>
1      10   0.767
2     100   0.977
3    1000   0.626
dat |>
  filter(p>0) |>
  group_by(samplerate/popsize) |>
  do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
  select(p.value) |>
  ungroup()
# A tibble: 3 × 2
  `samplerate/popsize` p.value
                 <dbl>   <dbl>
1                  0.1   0.747
2                  1     0.414
3                 10     0.743
dat |>
  filter(p>0) |>
  do(tidy(ks.test(x=.$cumhaz+.$nuggets,y=pexp,rate=1))) |>
  select(p.value)
# A tibble: 1 × 1
  p.value
    <dbl>
1   0.753

MGP simulator code

The following C++ codes are contained in the file mgp.cc.

// Moran Genealogy Process Simulator (C++)
// To build, execute: 'R CMD SHLIB <this_file>.cc'

#include <R.h>
#include <Rmath.h>
#include <Rdefines.h>

#include <vector>
#include <string>
#include <cstring>

#define err(...) errorcall(R_NilValue,__VA_ARGS__)
#define rprint(S) Rprintf("%s\n",(S).c_str())

const int na = R_NaInt;
const double inf = R_PosInf;
const double default_slate = R_NaReal;

// check if the slate is blank.
static bool blank_slate (double &x) {
  return ISNA(x);
}

// interface with R's integer RNG
static int random_integer (int n) {
  return int(floor(R_unif_index(double(n))));
}

typedef unsigned char raw_t; // must match with R's 'Rbyte' (see Rinternals.h)

// MGP CLASS
// the class to hold the state of the MGP (a "tableau").
class mgp_tableau_t {

private:

  // ball colors
  typedef enum {red = 0, green = 1, black = 2, blue = 4} color_t;

  class ball_t;
  class player_t;

  // BALL CLASS
  // each ball has:
  // - a color
  // - a unique name within its color
  // - the name of the player in whose hand it lies.
  class ball_t {
  private:
    int _hand;
  public:
    int name;
    color_t color;
    // basic constructor
    ball_t (color_t col = red, int nom = na, int who = na) {
      color = col;
      name = nom;
      _hand = who;
    };
    // copy constructor
    ball_t (const ball_t &b) = default;
    // copy assignment operator
    ball_t & operator= (const ball_t &b) = delete;
    // in whose hand do I lie?
    int hand (void) const {
      return _hand;
    };
    // change hands
    void hand (int &who) {
      _hand = who;
    };
    int is_green (void) const {
      return color==green;
    };
    int is_black (void) const {
      return color==black;
    };
    int is_blue (void) const {
      return color==blue;
    };
    int is_red (void) const {
      return color==red;
    };
    // human-readable colors
    std::string color_name (void) const {
      std::string o;
      switch (color) {
      case black: o = "black"; break;
      case blue: o = "blue"; break;
      case green: o = "green"; break;
      case red: default: o = "red"; break;
      }
      return o;
    };
    // human-readable info
    std::string describe (void) const {
      return color_name() + "(" + std::to_string(name) + ")";
    };
  };

  // PLAYER CLASS
  // each player has:
  // - a unique name
  // - two balls
  // - a slate to hold the time of seating
  // - knowledge of the players to left and right
  class player_t {

  public:

    int name;
    ball_t *ballA, *ballB;
    player_t *left, *right;
    double slate;

    // basic constructor
    player_t (int nom = na, ball_t *a = 0, ball_t *b = 0) {
      name = nom;
      ballA = a;
      ballB = b;
      left = 0;
      right = 0;
      slate = default_slate;
    };
    // copy constructor
    player_t (const player_t &p) = delete;
    // copy assignment operator
    player_t & operator= (const player_t & p) = delete;
    // does this player hold this ball?
    int holds (ball_t *b) const {
      return (ballA == b) || (ballB == b);
    };
    // does this player hold a ball of this color?
    int holds (color_t c) const {
      return (ballA->color==c) || (ballB->color==c);
    };
    // return a pointer to the other ball
    ball_t *other (const ball_t *b) const {
      ball_t *o = 0;
      if (b == ballA)
        o = ballB;
      else if (b == ballB)
        o = ballA;
      else                      // this should never occur
        err("inconceivable! 'other' error");
      return o;
    };
    // human-readable info
    std::string describe (void) const {
      return "player(" + std::to_string(name) + ") {"
        + ballA->describe() + ","
        + ballB->describe() + "}, t = "
        + std::to_string(slate) + "\n";
    };
    // binary output
    int serial (raw_t *o = 0) const {
      int buf[5];
      color_t buf1[2];
      buf[0] = name;
      buf[1] = (left != 0) ? left->name : na;
      buf[2] = (right != 0) ? right->name : na;
      buf[3] = ballA->name;
      buf[4] = ballB->name;
      buf1[0] = ballA->color;
      buf1[1] = ballB->color;
      if (o != 0) {
        memcpy(o,buf,sizeof(buf)); o += sizeof(buf);
        memcpy(o,buf1,sizeof(buf1)); o += sizeof(buf1);
        memcpy(o,&slate,sizeof(double)); o += sizeof(double);
      }
      return sizeof(buf)+sizeof(buf1)+sizeof(double);
    };
    // reset from serialized binary input
    raw_t *reset (raw_t *o, mgp_tableau_t *tab) {
      int buf[5];
      color_t buf1[2];
      memcpy(buf,o,sizeof(buf)); o += sizeof(buf);
      memcpy(buf1,o,sizeof(buf1)); o += sizeof(buf1);
      memcpy(&slate,o,sizeof(double)); o += sizeof(double);
      left = (buf[1] != na) ? tab->player[buf[1]] : 0;
      right = (buf[2] != na) ? tab->player[buf[2]] : 0;
      ballA = tab->get_ball(buf1[0],buf[3]);
      ballB = tab->get_ball(buf1[1],buf[4]);
      ballA->hand(name);
      ballB->hand(name);
      return o;
    };
  };

private:

  int popsize;                  // number of black balls
  int nlive;                    // number of seated players
  int nsample;                  // number of samples taken so far
  std::vector<ball_t*> ball;    // array of pointers to balls
  std::vector<ball_t*> verde, negro, azul, rojo; // arrays for each color class
  std::vector<player_t*> player; // array of pointers to players
  player_t *rightmost;   // pointer to player seated farthest to right
  double _time;          // current time
  double _scale;         // mean waiting time between Moran events

  // draw a random integer in the interval [0,n-1]
  void draw_one (int n, int *x) const {
    *x = random_integer(n);
  };
  // draw a pair of random integers in the interval [0,n-1]
  void draw_two (int n, int *x) const {
    x[0] = random_integer(n);
    x[1] = random_integer(n-1);
    if (x[1] >= x[0]) x[1]++;
  };
  // swap balls a and b, wherever they lie
  void ballswap (ball_t *a, ball_t *b) {
    player_t *A = player[a->hand()];
    player_t *B = player[b->hand()];
    if (a == A->ballA)
      A->ballA = b;
    else if (a == A->ballB)
      A->ballB = b;
    else
      err("inconceivable! ballswap error A");
    if (b == B->ballA)
      B->ballA = a;
    else if (b == B->ballB)
      B->ballB = a;
    else
      err("inconceivable! ballswap error B");
    a->hand(B->name);
    b->hand(A->name);
  };

  // initialize a tableau with:
  // - popsize = n
  // - maximum number of samples = m,
  // - each player holds his own green ball.
  // - only the first player is seated.
  // - time is NA.
  void setup (int n, int m = 0) {

    popsize = n;
    nlive = 1;
    nsample = 0;

    int maxp = n+m+m;           // max number of players

    player.reserve(maxp);
    ball.reserve(maxp+maxp);
    verde.reserve(maxp);
    negro.reserve(n);
    azul.reserve(m);
    rojo.reserve(m);

    _time = default_slate;
    _scale = double(popsize);

    // initialize the players and balls
    // first the n players of the basic game
    int k = 0;

    for (int i = 0; i < n; i++) {
      ball_t *g = new ball_t(green,k,k);
      ball_t *b = new ball_t(black,i,k);
      ball.push_back(g); verde.push_back(g);
      ball.push_back(b); negro.push_back(b);
      player_t *p = new player_t(k++,g,b);
      player.push_back(p);
    }

    rightmost = player[0];

    // now the 2m players representing samples
    for (int i = 0; i < m; i++) {
      ball_t *g, *b;
      player_t *p;
      g = new ball_t(green,k,k);
      b = new ball_t(blue,i,k);
      ball.push_back(g); verde.push_back(g);
      ball.push_back(b); azul.push_back(b);
      p = new player_t(k++,g,b);
      player.push_back(p);

      g = new ball_t(green,k,k);
      b = new ball_t(red,i,k);
      ball.push_back(g); verde.push_back(g);
      ball.push_back(b); rojo.push_back(b);
      p = new player_t(k++,g,b);
      player.push_back(p);
    }

  };
  
public:
  // binary output
  int serial (raw_t *o = 0) const {
    int buf1[4] = {
      popsize, nsample, nlive, rightmost->name
    };
    double buf2[2] = {_time, _scale};
    int n = sizeof(buf1)+sizeof(buf2);
    int ps = player[0]->serial();
    int np = popsize+2*nsample;
    n += np*ps;
    if (o != 0) {
      memcpy(o,buf1,sizeof(buf1)); o += sizeof(buf1);
      memcpy(o,buf2,sizeof(buf2)); o += sizeof(buf2);
      for (int i = 0; i < np; i++) {
        o += player[i]->serial(o);
      }
    }
    return n;
  };
  
private:
  // set the tableau from serialized binary.
  // allow for extra samples.
  void reset (raw_t *o, int extra = 0) {
    int buf1[4];
    double buf2[2];
    
    memcpy(buf1,o,sizeof(buf1)); o += sizeof(buf1);
    memcpy(buf2,o,sizeof(buf2)); o += sizeof(buf2);

    setup(buf1[0],buf1[1]+extra);

    nsample = buf1[1];
    nlive = buf1[2];
    rightmost = player[buf1[3]];
    _time = buf2[0];
    _scale = buf2[1];

    int np = buf1[0]+2*buf1[1];
    for (int k = 0; k < np; k++)
      o = player[k]->reset(o,this);

  };

  // get pointer to ball by name and color
  ball_t *get_ball (color_t &col, int nom = 0) const {
    ball_t *b;
    switch (col) {
    case green: b = verde[nom]; break;
    case black: b = negro[nom]; break;
    case blue:  b = azul[nom]; break;
    case red: b = rojo[nom]; break;
    default: err("inconceivable! there is no color %d",col);
    }
    return b;
  };
  
public:
  
  // basic constructor
  //   n is population size
  //   m >= total number of samples to be taken
  mgp_tableau_t (void) = delete;
  mgp_tableau_t (int n, int m = 0) {
    setup(n,m);
  };
  // constructor from serialized binary form
  mgp_tableau_t (raw_t *o, int extra = 0) {
    reset(o,extra);
  };
  // copy constructor
  mgp_tableau_t (const mgp_tableau_t &T) {
    int s = T.serial();
    raw_t *o = new raw_t[s];
    T.serial(o);
    reset(o);
    delete[] o;
  };
  // copy assignment operator
  mgp_tableau_t & operator= (const mgp_tableau_t &T) = delete;

  // initialize the MGP with a draw from its stationary distribution at time t0.
  // 'mu' is the Moran event rate.
  void init (double mu = 1, double t0 = 0) {

    // seat each of the active players.
    // each chooses an ancestor by selecting a black ball to the left at random.
    // the coalescent intervals are chosen from the stationary distribution.
    _time = 0;
    _scale = 1/mu;
    double scale = _scale*choose(popsize,2);
    int draw;
    for (int j = 1; j < popsize; j++) {
      draw_one(j,&draw);
      birth(verde[j],negro[draw]);
      double deltat = rexp(scale/choose(j+1,2));
      _time += deltat;
    }

    // shift time so that the t0 is the current time.
    for (int j = 1; j < popsize; j++) {
      player[j]->slate += t0-_time;
    }

    _time = t0;

  };

  // initialize an MGP at a given state.
  void init (raw_t *o) {
    reset(o);
  };

  // destructor
  ~mgp_tableau_t (void) {
    for (int i = 0; i < int(ball.size()); i++) delete ball[i];
    for (int i = 0; i < int(player.size()); i++) delete player[i];
  };

  // get current time.
  double time (void) const {
    return _time;
  };

  // get Moran rate.
  double rate (void) const {
    return 1/_scale;
  };

  // set Moran rate.
  void rate (double &mu) {
    _scale = 1/mu;
  };

  // get root player (just to the right of the "shelf")
  player_t *get_root (void) const {
    return player.front()->right;
  };

  // get number of samples
  int num_sample (void) const {
    return nsample;
  };

  // report all the seating times
  int get_times (double *x = 0) const {
    player_t *p = get_root();
    int n = 0;
    while (p != 0) {
      n++;
      if (x != 0) *(x++) = p->slate;
      p = p->right;
    }
    return n;
  };

  // report the times of live samples
  int get_epochs (double *x = 0) const {
    player_t *p = get_root();
    int n = 0;
    while (p != 0) {
      if (p->holds(red)) {
        n++;
        if (x != 0) *(x++) = p->slate;
      }
      p = p->right;
    }
    return n;
  };

  // count the number of extant sample lineages on each interval.
  void get_lineage_count (int *x) const {
    player_t *p = get_root();
    int count = 1;
    while (p != 0) {
      count--;
      if (p->ballA->is_green()) count++;
      if (p->ballB->is_green()) count++;
      *(x++) = count;
      p = p->right;
    }
  };

  // check the validity of the mgp_tableau.
  void valid (void) const {

    if (popsize != int(negro.size())) err("ooops");
    if (player.size() != verde.size()) err("oopsy");
    if (ball.size() != 2*player.size()) err("no no NOOOO!");

    // the leftmost player is the "shelf"
    player_t *p = player[verde[0]->hand()];
    if (!(p->name == 0 && p == player.front() && p->left == 0 && blank_slate(p->slate)))
      err("invalid shelf:\n%s",p->describe().c_str());

    // check seating arrangement
    int n = 1;
    while (p->right != 0) {
      n++;
      if (p->right->slate < p->slate) err("inconCEIVable!"); // seating times are out of order
      if (p->right->left != p) err("seven years' bad luck"); // right and left are not mirrored
      p = p->right;
    }
    if (n != nlive) err("cannot traverse right %d %d",n,nlive);
    if (p != rightmost) err("rightmost player is not rightmost");
    while (p->left != 0) {
      n--;
      if (p->left->right != p) err("holy MOSES!"); // right and left are not mirrored
      p = p->left;
    }
    if (n != 1) err("cannot traverse left %d",n);

    if (_time < rightmost->slate)
      err("invalid 'time': %le < %le",_time,rightmost->slate);

    // for each player, check that both balls are in hand
    for (int j = 0; j < int(player.size()); j++) {
      p = player[j];
      if (p->name != j)
        err("player %d has incorrect name (%d)",j,p->name);
      if (p->ballA->hand() != j)
        err("player %d, ball 0 (color %s) is not in hand",j,p->ballA->color_name().c_str());
      if (p->ballB->hand() != j)
        err("player %d, ball 1 (color %s) is not in hand",j,p->ballB->color_name().c_str());
      if (p->left==0 && p->right==0 &&
          !(blank_slate(p->slate) && p->holds(verde[p->name])))
        err("zombie players!");
    }

    // check each ball
    for (int j = 0; j < int(ball.size()); j++) {
      ball_t *b = ball[j];
      if (!player[b->hand()]->holds(b))
        err("ball %d (color %s) is not in hand of its player",j,b->color_name().c_str());
    }

    // check each color-class of balls
    for (int j = 0; j < int(verde.size()); j++) {
      ball_t *b = verde[j];
      if (!b->is_green())
        err("ball %d is not green",b->name);
      if (b->name != j)
        err("ball %d is misnamed",b->name);
    }
    for (int j = 0; j < int(negro.size()); j++) {
      ball_t *b = negro[j];
      if (!b->is_black())
        err("ball %d is not black",b->name);
      if (b->name != j)
        err("ball %d is misnamed",b->name);
    }
    for (int j = 0; j < int(azul.size()); j++) {
      ball_t *b = azul[j];
      if (!b->is_blue())
        err("ball %d is not blue",b->name);
      if (b->name != j)
        err("ball %d is misnamed",b->name);
    }
    for (int j = 0; j < int(rojo.size()); j++) {
      ball_t *b = rojo[j];
      if (!b->is_red())
        err("ball %d is not red",b->name);
      if (b->name != j)
        err("ball %d is misnamed",b->name);
    }

    // check that direct descents are proper, i.e.:
    //   p->left->slate == p->slate  => p->left->holds(verde[p->name])
    //   p->right->slate == p->slate => p->holds(verde[p->right->name])
    for (int j = 0; j < nsample; j++) {
      player_t *p = player[azul[j]->hand()];
      if (p->left->slate == p->slate && !p->left->holds(verde[p->name]))
        err("inconceivable!! bad sample %d",j);
    }
    p = get_root();
    while (p->right != 0) {
      if (p->right->slate == p->slate && !p->holds(verde[p->right->name])) {
        err("improper length 0 interval:\n%s %s",
            p->describe().c_str(),p->right->describe().c_str());
      }
      p = p->right;
    }

  };

  // human-readable info
  std::string describe (void) const {
    player_t *p = player.front();
    std::string o = "";
    while (p != 0) {
      o += p->describe();
      p = p->right;
    }
    o += "time = " + std::to_string(time()) + "\n"
      + "rate = " + std::to_string(rate()) + "\n";
    return o;
  };

private:

  // recursive function to put genealogy into Newick format.
  // TODO: modify so as to handle "pruned" trees.
  std::string newick (int &name, double &tpar) const {

    std::string o;
    player_t *p = player[name];

    if (p->holds(blue)) {

      ball_t *b = (p->ballA->is_blue()) ? p->ballA : p->ballB;
      ball_t *c = p->other(b);
      switch (c->color) {
      case green:
        o = "(" + newick(c->name,p->slate) + ")";
        break;
      case red:
        o = "";
        break;
      default:
        err("INCONceivABLE!");
        break;
      }

      o += std::to_string(b->name) + ":" + std::to_string(p->slate - tpar);

    } else {
    
      ball_t *b;

      o = "(";

      b = p->ballA;
      switch (b->color) {
      case black:
        o += "o:" + std::to_string(_time - p->slate) + ",";
        break;
      case green:
        o += newick(b->name,p->slate) + ",";
        break;
      default:
        err("InCoNcEiVaBlE!");
        break;
      }
      
      b = p->ballB;
      switch (b->color) {
      case black:
        o += "o:" + std::to_string(_time - p->slate);
        break;
      case green:
        o  += newick(b->name,p->slate);
        break;
      default:
        err("InCoNcEiVaBlE!");
        break;
      }
      o += ")g:" + std::to_string(p->slate - tpar);
    }

    return o;
  };

public:

  // put genealogy at current time into Newick format
  std::string newick (void) const {
    player_t *p = get_root();
    return newick(p->name,p->slate) + ";";
    // note that this ensures that the branch length at the root is 0
  };

  // seat the player holding ball a.
  // take as parent the player holding ball b.
  // a does not change hands; b does.
  // the newly seated player will be in the rightmost position
  // holding balls a and b.
  // the parent will have exchanged b for the green ball with
  // the newly seated player's name.
  void birth (const ball_t *a, ball_t *b) {
    int child = a->hand();
    player_t *A = player[child];
    player_t *C = rightmost; // this is why we need 'rightmost'
    ball_t *g = verde[child];
    if (!A->holds(g)) err("INconCEIVable!");
    ballswap(g,b);
    C->right = A;
    A->left = C;
    A->right = 0;
    rightmost = A;
    A->slate = _time;
    nlive++;
  };

  // unseat the player holding ball b.
  // b does not change hands.
  // the newly unseated player will hold b and its own green ball,
  // having exchanged its other ball for the latter.
  void death (const ball_t *b) {
    int name = b->hand();
    player_t *A = player[name];
    ball_t *g = verde[name];
    ball_t *o = A->other(b);
    player_t *L = A->left;
    player_t *R = A->right;
    ballswap(o,g);
    if (L != 0)
      L->right = R;
    if (R != 0) {
      R->left = L;
    } else {
      rightmost = L;
    }
    // return A to its prenatal state
    A->left = 0;
    A->right = 0;
    A->slate = default_slate;
    nlive--;
  };

  // A "Moran move" is a death followed by a birth.
  void moran_move (void) {
    int draw[2];
    draw_two(popsize,draw);
    death(negro[draw[0]]);
    birth(negro[draw[0]],negro[draw[1]]);
  };

  // a sample adds 2 players at the same time.
  // The first ends up holding the random black ball plus
  // the green ball of the second player.
  // The second player ends up holding a blue ball and an
  // red ball.
  void sample () {
    int draw;
    draw_one(popsize,&draw);
    birth(azul[nsample],negro[draw]);
    birth(rojo[nsample],azul[nsample]);
    nsample++;
  };

  // run MGP to a specified time.
  // return number of Moran events that have occurred.
  int play (double tfin) {
    int count = 0;
    double deltat = rexp(_scale);
    while (_time+deltat < tfin) {
      _time += deltat;
      moran_move();
      count++;
      deltat = rexp(_scale);
    }
    _time = tfin;                // relies on Markov property
    return count;
  };

  // take one step of the MGP.
  // return new time.
  double play1 (void) {
    double deltat = rexp(_scale);
    _time += deltat;
    moran_move();
    return _time;
  };

  // successively drop all players holding black balls.
  void drop_black (void) {
    for (int j = 0; j < popsize; j++)
      death(negro[j]);
    negro.clear(); // this will prevent all subsequent births, deaths, and samples.
  };

  // successively drop all players holding blue and red balls.
  void drop_samples (void) {
    for (int j = 0; j < nsample; j++) {
      death(rojo[j]);
      death(azul[j]);
    }
  };

  // drop black balls and zero-length branches.
  void prune (void) {
    drop_black();
    for (int j = 0; j < nsample; j++) {
      player_t *p = player[azul[j]->hand()];
      ball_t *g = verde[p->name];
      if (p->left->holds(g) && p->slate == p->left->slate) {
        ballswap(rojo[j],p->left->other(g));
        death(rojo[j]);
        nlive--;
      }
    }
  };

  // walk back along each sampled lineage, computing
  // cumulative hazard of coalescence and log likelihood,
  // conditional on previous samples.
  //   'll': conditional log likelihoods
  //   'hz': continuous portion of cumulative hazard
  //   'ng': discrete portion of cumulative hazard
  //   'dd': indicator of direct descent events
  void walk (double *ll = 0, double *hz = 0, double *ng = 0, int *dd = 0) const {
    
    int np = int(player.size());
    int *lineages = new int[np]; // number of extant lineages in each interval
    int *breadcrumb = new int[np];
    int i, n = popsize;
    double mu = rate()/choose(n,2);
    
    // initialize lineage counter and breadcrumbs
    for (int j = 0; j < np; j++) {
      lineages[j] = 0;
      breadcrumb[j] = 0;
    }

    // loop over samples, in order of their addition
    for (int s = 0; s < nsample; s++) {

      if (ll != 0) ll[s] = 0;
      if (hz != 0) hz[s] = 0;
      if (ng != 0) ng[s] = 0;
      if (dd != 0) dd[s] = 0;

      player_t *p = player[azul[s]->hand()];
      player_t *pl = p->left;
      ball_t *g = verde[p->name];
      double t, deltat, H;

      // traverse from right to left
      while (pl != 0) {
        t = p->slate;
        deltat = t - pl->slate;
        p = pl;
        pl = p->left;
        i = lineages[p->name]; // just for convenience

        // cumulative hazard of coalescence within an interval
        H = (i > 0) ? mu*i*deltat : 0;
        if (ll != 0) ll[s] -= H;
        if (hz != 0) hz[s] += H;

        // deal with direct descent (or lack thereof).
        // NB: no sample can be the direct ancestor of more than one later sample
        // (hence the breadcrumb).
        if (p->holds(blue) && !breadcrumb[p->name]) {
          // hazard of coalescing by direct descent: -log(1-1/(n-i))
          double nug = double(1)/double(n-i);
      if (nug > 1) err("c'est impossible!");
      if (pl->slate == p->slate && pl->holds(g)) { // coalescence
            if (ll != 0) ll[s] += log(nug);    // log(1/(n-i))
            if (ng != 0) ng[s] -= log(1-runif(0,nug));
            if (dd != 0) dd[s] += 1;
            breadcrumb[p->name] = 1; // ensures no second coalescence here
            lineages[pl->name]++;    // no effect, since deltat = 0
            pl = 0;
          } else {              // no coalescence
        H = log(1-nug);
            if (nug >= 1) err("coalescence should be assured here");
            if (ng != 0) ng[s] -= H;
            if (ll != 0) ll[s] += H;
          }
        } else if (p->holds(g)) {    // is this player an ancestor?
          if (breadcrumb[p->name]) { // coalescence
            if (ll != 0 && deltat > 0) ll[s] += log(mu*i);
            pl = 0;             // stop
          } else {              // haven't come this way yet
            g = verde[p->name];   // continue
            breadcrumb[p->name] = 1;
          }
        }
        // increase the number of lineages that cross this interval:
        lineages[p->name]++;
      }

    }

    delete[] lineages;
    delete[] breadcrumb;

  };

  // first prune, then walk back along each sampled lineage, computing
  // cumulative hazard of coalescence and log likelihood,
  // conditional on previous samples.
  //   'll': conditional log likelihoods
  //   'hz': continuous portion of cumulative hazard
  //   'ng': discrete portion of cumulative hazard
  //   'dd': indicator of direct descent events
  // NB: this modifies the MGP tableau.
  void prune_and_walk (double *ll = 0, double *hz = 0, double *ng = 0, int *dd = 0) {

    prune();
    
    int np = int(player.size());
    int *lineages = new int[np]; // number of extant lineages in each interval
    int *breadcrumb = new int[np];
    int i, n = popsize;
    double mu = rate()/choose(n,2);
    
    // initialize lineage counter and breadcrumbs
    for (int j = 0; j < np; j++) {
      lineages[j] = 0;
      breadcrumb[j] = 0;
    }

    // loop over samples, in order of their addition
    for (int s = 0; s < nsample; s++) {

      if (ll != 0) ll[s] = 0;
      if (hz != 0) hz[s] = 0;
      if (ng != 0) ng[s] = 0;
      if (dd != 0) dd[s] = 0;

      player_t *p = player[azul[s]->hand()];
      player_t *pl = p->left;
      ball_t *g = verde[p->name];
      double t, deltat, H;

      // traverse from right to left
      while (pl != 0) {
        t = p->slate;
        deltat = t - pl->slate;
        if (deltat == 0) err("inconceivable! a duration-zero interval!");
        p = pl;
        pl = p->left;
        i = lineages[p->name];  // just for convenience

        // cumulative hazard of coalescence within an interval
        H = (i > 0) ? mu*i*deltat : 0;
        if (ll != 0) ll[s] -= H;
        if (hz != 0) hz[s] += H;

        // deal with direct descent (or lack thereof).
        if (p->holds(blue) && !breadcrumb[p->name]) {
          // hazard of coalescing by direct descent: -log(1-1/(n-i))
          H = double(1)/double(n-i);
          if (p->holds(g)) {                        // coalescence
            if (ll != 0) ll[s] -= log(H); // log(1/(n-i))
            if (ng != 0)
              ng[s] -= log(1-runif(0,H)); // jitter
            if (dd != 0) dd[s] += 1;
            breadcrumb[p->name] = 1;
            pl = 0;             // stop
          } else {              // no coalescence
            if (H >= 1) err("coalescence should be assured here");
            if (ng != 0) ng[s] += -log(1-H);
            if (ll != 0) ll[s] += log(1-H);
          }
        } else if (p->holds(g)) {    // is this player an ancestor?
          if (breadcrumb[p->name]) { // coalescence
            if (ll != 0) ll[s] += log(mu*i);
            pl = 0;             // stop
          } else {              // haven't come this way yet
            g = verde[p->name];   // continue
            breadcrumb[p->name] = 1;
          }
        }
        // increase the number of lineages that cross this interval:
        lineages[p->name]++;
      }

    }

    delete[] lineages;
    delete[] breadcrumb;

  };
};

// helper function for filling a return list
static int set_list_elem (SEXP list, SEXP names, SEXP element,
                          const char *name, int pos) {
  SET_ELEMENT(list,pos,element);
  SET_STRING_ELT(names,pos,mkChar(name));
  return ++pos;
}

// create the serialized MGP state:
static SEXP serialize (const mgp_tableau_t *T) {
  SEXP out;
  int s = T->serial();
  PROTECT(out = NEW_RAW(s));
  T->serial(RAW(out));
  UNPROTECT(1);
  return out;
}

// create a human-readable description
static SEXP describe (const mgp_tableau_t *T) {
  SEXP out;
  PROTECT(out = NEW_CHARACTER(1));
  SET_STRING_ELT(out,0,mkChar(T->describe().c_str()));
  UNPROTECT(1);
  return out;
}

// extract the tree structure in Newick form.
// store in element k of character-vector x.
static void newick (SEXP x, int k, const mgp_tableau_t *T) {
  SET_STRING_ELT(x,k,mkChar(T->newick().c_str()));
}

extern "C" {

  // MPG without sampling.
  // optionally compute genealogies in Newick form ('tree = TRUE').
  SEXP playMGP (SEXP X, SEXP Rate, SEXP Times, SEXP T0, SEXP Tree) {
    int nprotect = 0;
    int nout = 3;
    SEXP out, outnames;
    SEXP times, count, tree = R_NilValue;
    int ntimes = LENGTH(Times);
    mgp_tableau_t *mgp = 0;
    double t = R_NaReal;

    int do_newick = *(INTEGER(AS_INTEGER(Tree)));

    PROTECT(times = AS_NUMERIC(duplicate(Times))); nprotect++;
    PROTECT(count = NEW_INTEGER(ntimes)); nprotect++;
    if (do_newick) {
      PROTECT(tree = NEW_CHARACTER(ntimes)); nprotect++;
      nout++;
    }

    int *xc = INTEGER(count);
    double *xt = REAL(times);
    double rate = R_NaReal;
    if (!isNull(Rate)) {        // Moran event rate
      rate = *(REAL(AS_NUMERIC(Rate)));
    }

    GetRNGstate();
      
    if (LENGTH(X)>1) {     // restart the MGP from the specified state

      mgp = new mgp_tableau_t(RAW(X));
      mgp->valid();
      t = mgp->time();
      if (!isNull(Rate)) mgp->rate(rate); // override the stored rate
      
    }  else {                   // a fresh MGP

      PROTECT(X = AS_INTEGER(X)); nprotect++;
      t = *(REAL(AS_NUMERIC(T0)));
      mgp = new mgp_tableau_t(*(INTEGER(X)));
      mgp->init(rate,t);

    }
      
    if (t > xt[0]) err("must not have t0 > times[1]");

    for (int k = 0; k < ntimes; k++, xc++, xt++) {
      *xc = mgp->play(*xt);
      if (do_newick) {
        newick(tree,k,mgp);
      }
      R_CheckUserInterrupt();
    }
      
    PutRNGstate();
      
    // pack everything up in a list
    int k = 0;
    PROTECT(out = NEW_LIST(nout)); nprotect++;
    PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
    k = set_list_elem(out,outnames,times,"times",k);
    k = set_list_elem(out,outnames,count,"count",k);
    if (do_newick) {
      k = set_list_elem(out,outnames,tree,"tree",k);
    }
    k = set_list_elem(out,outnames,serialize(mgp),"state",k);
    SET_NAMES(out,outnames);
      
    delete mgp;

    UNPROTECT(nprotect);
    return out;
  }

  // Sampled MGP.
  // optionally drop black balls ('drop = TRUE').
  // optionally compute Newick form of genealogies at sample times ('tree = TRUE').
  SEXP playSMGP (SEXP X, SEXP Rate, SEXP Times, SEXP T0, SEXP Drop, SEXP Tree) {
    int nprotect = 0;
    int nout = 3;
    SEXP out, outnames;
    SEXP times, count, tree = R_NilValue;
    int ntimes = LENGTH(Times);
    double t;
    mgp_tableau_t *mgp = 0;

    int drop = *(INTEGER(AS_INTEGER(Drop)));
    int do_newick = *(INTEGER(AS_INTEGER(Tree)));

    PROTECT(times = AS_NUMERIC(duplicate(Times))); nprotect++;
    PROTECT(count = NEW_INTEGER(ntimes)); nprotect++;
    if (do_newick) {
      PROTECT(tree = NEW_CHARACTER(ntimes)); nprotect++;
      nout++;
    }

    int *xc = INTEGER(count);
    double *xt = REAL(times);
    double rate = R_NaReal;     // Moran event rate
    if (!isNull(Rate)) {
      rate = *(REAL(AS_NUMERIC(Rate)));
    }

    GetRNGstate();

    if (LENGTH(X)>1) { // restart the MGP from the specified serialized state

      mgp = new mgp_tableau_t(RAW(X),ntimes);
      mgp->valid();
      t = mgp->time();
      if (!isNull(Rate)) mgp->rate(rate);

    }  else {                   // run a fresh MGP

      PROTECT(X = AS_INTEGER(X)); nprotect++;
      t = *(REAL(AS_NUMERIC(T0)));
      mgp = new mgp_tableau_t(*(INTEGER(X)),ntimes);
      mgp->init(rate,t); // initialize with a draw from the stationary distribution

    }
      
    if (t > xt[0]) err("must not have t0 > times[1]");

    for (int k = 0; k < ntimes; k++, xc++, xt++) {
      *xc = mgp->play(*xt);     // play the MGP to the specified time
      mgp->sample();            // make a sample
      if (do_newick) {
        mgp_tableau_t U = *mgp;
        if (drop) U.drop_black();
        newick(tree,k,&U);
      }
      R_CheckUserInterrupt();
    }

    PutRNGstate();

    int k = 0;
    PROTECT(out = NEW_LIST(nout)); nprotect++;
    PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
    k = set_list_elem(out,outnames,times,"times",k);
    k = set_list_elem(out,outnames,count,"count",k);
    if (do_newick) {
      k = set_list_elem(out,outnames,tree,"tree",k);
    }
    k = set_list_elem(out,outnames,serialize(mgp),"state",k);
    SET_NAMES(out,outnames);

    delete mgp;

    UNPROTECT(nprotect);
    return out;
  }

  // Observed chain of the stationary SMGP.
  // optionally compute Newick form of genealogies at sample times ('tree = TRUE').
  SEXP playObsSMGP (SEXP X, SEXP Rate, SEXP Times, SEXP T0, SEXP Tree) {
    int nprotect = 0;
    int nout = 3;
    SEXP out, outnames;
    SEXP times, count, tree = R_NilValue;
    int ntimes = LENGTH(Times);
    double t;
    mgp_tableau_t *mgp = 0;

    int do_newick = *(INTEGER(AS_INTEGER(Tree)));

    PROTECT(times = AS_NUMERIC(duplicate(Times))); nprotect++;
    PROTECT(count = NEW_INTEGER(ntimes)); nprotect++;
    if (do_newick) {
      PROTECT(tree = NEW_CHARACTER(ntimes)); nprotect++;
      nout++;
    }

    int *xc = INTEGER(count);
    double *xt = REAL(times);
    double rate = R_NaReal;     // Moran event rate
    if (!isNull(Rate)) {
      rate = *(REAL(AS_NUMERIC(Rate)));
    }

    GetRNGstate();

    if (LENGTH(X)>1) { // restart the MGP from the specified serialized state

      mgp = new mgp_tableau_t(RAW(X),ntimes);
      mgp->valid();
      t = mgp->time();
      if (!isNull(Rate)) mgp->rate(rate);

    }  else {                   // run a fresh MGP

      PROTECT(X = AS_INTEGER(X)); nprotect++;
      t = *(REAL(AS_NUMERIC(T0)));
      mgp = new mgp_tableau_t(*(INTEGER(X)),ntimes);
      mgp->init(rate,t); // initialize with a draw from the stationary distribution

    }
      
    if (t > xt[0]) err("must not have t0 > times[1]");

    for (int k = 0; k < ntimes; k++, xc++, xt++) {
      *xc = mgp->play(*xt);     // play the MGP to the specified time
      mgp->sample();            // make a sample
      if (do_newick) {
        mgp_tableau_t U = *mgp;
        U.prune();
        newick(tree,k,&U);
      }
      R_CheckUserInterrupt();
    }

    int k = 0;
    PROTECT(out = NEW_LIST(nout)); nprotect++;
    PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
    k = set_list_elem(out,outnames,times,"times",k);
    k = set_list_elem(out,outnames,count,"count",k);
    if (do_newick) {
      k = set_list_elem(out,outnames,tree,"tree",k);
    }
    k = set_list_elem(out,outnames,serialize(mgp),"state",k);
    SET_NAMES(out,outnames);

    PutRNGstate();

    delete mgp;

    UNPROTECT(nprotect);
    return out;
  }

  // extract/compute basic information.
  SEXP get_MGPS_info (SEXP X, SEXP Drop) {
    int nprotect = 0;
    int nout = 11;
    SEXP out, outnames;
    SEXP tout, rate, tree, epoch, etimes, lineage, desc;
    SEXP loglik, cumhaz, nuggets, dd;

    int drop = *(INTEGER(AS_INTEGER(Drop)));

    PROTECT(tout = NEW_NUMERIC(1)); nprotect++;
    PROTECT(rate = NEW_NUMERIC(1)); nprotect++;
    PROTECT(tree = NEW_CHARACTER(1)); nprotect++;

    // reconstruct the tableau from its serialization
    mgp_tableau_t *mgp = new mgp_tableau_t(RAW(X));
    // check validity
    mgp->valid();

    // extract current time and MGP event rate
    *REAL(tout) = mgp->time();
    *REAL(rate) = mgp->rate();

    // drop the black balls if requested
    if (drop) mgp->drop_black();

    // get human-readable description
    PROTECT(desc = describe(mgp)); nprotect++;

    // get the tree in Newick form
    if (drop) mgp->prune();
    newick(tree,0,mgp);
    
    // get the lineage count
    int nt = mgp->get_times();
    PROTECT(etimes = NEW_NUMERIC(nt)); nprotect++;
    PROTECT(lineage = NEW_INTEGER(nt)); nprotect++;
    mgp->get_times(REAL(etimes));
    mgp->get_lineage_count(INTEGER(lineage));

    // get epochs
    int nepoch = mgp->get_epochs();
    PROTECT(epoch = NEW_NUMERIC(nepoch)); nprotect++;
    mgp->get_epochs(REAL(epoch));

    // walk the tree for cumulative hazards and log likelihood
    int ns = mgp->num_sample();
    PROTECT(cumhaz = NEW_NUMERIC(ns)); nprotect++;
    PROTECT(nuggets = NEW_NUMERIC(ns)); nprotect++;
    PROTECT(dd = NEW_INTEGER(ns)); nprotect++;
    PROTECT(loglik = NEW_NUMERIC(ns)); nprotect++;
    mgp->prune_and_walk(REAL(loglik),REAL(cumhaz),REAL(nuggets),INTEGER(dd));

    // pack it all up in a list
    int k = 0;
    PROTECT(out = NEW_LIST(nout)); nprotect++;
    PROTECT(outnames = NEW_CHARACTER(nout)); nprotect++;
    k = set_list_elem(out,outnames,tout,"time",k);
    k = set_list_elem(out,outnames,rate,"rate",k);
    k = set_list_elem(out,outnames,tree,"tree",k);
    k = set_list_elem(out,outnames,desc,"description",k);
    k = set_list_elem(out,outnames,epoch,"epochs",k);
    k = set_list_elem(out,outnames,etimes,"etimes",k);
    k = set_list_elem(out,outnames,lineage,"lineages",k);
    k = set_list_elem(out,outnames,cumhaz,"cumhaz",k);
    k = set_list_elem(out,outnames,nuggets,"nuggets",k);
    k = set_list_elem(out,outnames,dd,"dd",k);
    k = set_list_elem(out,outnames,loglik,"loglik",k);
    SET_NAMES(out,outnames);

    delete mgp;

    UNPROTECT(nprotect);
    return out;
  }

}

// TODO:
// - newick input?

Session Info

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C          
 [3] LC_TIME=en_GB.UTF-8    LC_COLLATE=C          
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: America/Detroit
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
 [1] broom_1.0.5       doRNG_1.8.6       rngtools_1.5.2   
 [4] doParallel_1.0.17 iterators_1.0.14  foreach_1.5.2    
 [7] ggtree_3.8.2      lubridate_1.9.3   forcats_1.0.0    
[10] stringr_1.5.1     dplyr_1.1.4       purrr_1.0.2      
[13] readr_2.1.5       tidyr_1.3.1       tibble_3.2.1     
[16] ggplot2_3.5.0     tidyverse_2.0.0   knitr_1.45       

loaded via a namespace (and not attached):
 [1] gtable_0.3.4       xfun_0.43         
 [3] bslib_0.7.0        lattice_0.22-6    
 [5] tzdb_0.4.0         Cairo_1.6-2       
 [7] vctrs_0.6.5        tools_4.3.3       
 [9] generics_0.1.3     yulab.utils_0.1.4 
[11] fansi_1.0.6        highr_0.10        
[13] pkgconfig_2.0.3    data.table_1.15.4 
[15] ggplotify_0.1.2    lifecycle_1.0.4   
[17] compiler_4.3.3     farver_2.1.1      
[19] treeio_1.24.3      munsell_0.5.1     
[21] codetools_0.2-19   ggfun_0.1.4       
[23] htmltools_0.5.8    sass_0.4.9        
[25] yaml_2.3.8         lazyeval_0.2.2    
[27] pillar_1.9.0       jquerylib_0.1.4   
[29] cachem_1.0.8       nlme_3.1-164      
[31] pomp_5.7.0.2       tidyselect_1.2.1  
[33] aplot_0.2.2        digest_0.6.35     
[35] mvtnorm_1.2-4      stringi_1.8.3     
[37] labeling_0.4.3     fastmap_1.1.1     
[39] grid_4.3.3         colorspace_2.1-0  
[41] cli_3.6.2          magrittr_2.0.3    
[43] patchwork_1.2.0    utf8_1.2.4        
[45] ape_5.7-1          withr_3.0.0       
[47] backports_1.4.1    scales_1.3.0      
[49] timechange_0.3.0   rmarkdown_2.26    
[51] deSolve_1.40       hms_1.1.3         
[53] coda_0.19-4.1      memoise_2.0.1     
[55] evaluate_0.23      gridGraphics_0.5-1
[57] rlang_1.1.3        Rcpp_1.0.12       
[59] glue_1.7.0         tidytree_0.4.6    
[61] jsonlite_1.8.8     R6_2.5.1          
[63] fs_1.6.3          

References

1. Yu, G., Smith, D., Zhu, H., Guan, Y. & Lam, T. T.-Y. 2017 Ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36. (doi:10.1111/2041-210X.12628)