\documentclass[twoside]{book}
\usepackage{epsfig}
\input ICCSsty.tex
\shtitle{Formalizing the gene centered view of evolution}
\title{Formalizing the gene centered view of evolution}
\author{Yaneer Bar-Yam and Hiroki Sayama%
\affil{New England Complex Systems Institute\\%
24 Mt.\ Auburn St., Cambridge, MA 02138, USA\\%
yaneer@necsi.net / sayama@necsi.net}}
\abstract{A historical dispute in the conceptual underpinnings of
evolution is the validity of the gene centered view of evolution. We
transcend this debate by formalizing the gene centered view and
establishing the limits on its applicability. We show that the
gene centered view is a dynamic version of the well-known mean field
approximation. It breaks down for trait divergence, which corresponds
to symmetry breaking in evolving populations.}
\begin{document}
\maketitle
\section{Introduction}
A basic formulation of evolution requires reproduction (trait
heredity) with variation and selection with competition. At a
particular time, there are a number of organisms which differ from
each other in traits that affect their ability to survive and
reproduce. Differential reproduction over generations leads one
organism's offspring to progressively dominate the others and
changes the composition of the population of organisms. Variation
during reproduction allows offspring to differ from the parent, so that
an ongoing process of change over multiple generations is possible. One
of the difficulties with this conventional view of evolution is that
many organisms reproduce sexually, and thus the offspring of an
organism may be quite different from the parent. A partial solution to
this problem is recognizing that it is sufficient for offspring traits
to be correlated to parental traits for the principles of evolution to
apply.
However, the gene centered view\cite{dawkins89} offers a
simpler perspective for addressing this problem. In the gene
centered view there are assumed to be indivisible elementary units of
the genome (thought of as individual genes) that are preserved from
generation to generation. Different versions of the gene (alleles)
compete and mutate rather than the organism as a whole. Thus the
subject of evolution is the allele, and, in effect, the selection is
of alleles rather than organisms. This simple picture was strongly
advocated by some evolutionary biologists, while others maintained
more elaborate pictures which, for example, differentiate between
vehicles of selection (the organisms) and replicators (the genes).
However, a direct analysis of the gene centered view that reveals its
domain of applicability has not yet been presented.
In this article we will review the mathematics of some standard
conceptual models of evolution to clarify the relationship between
gene centered and organism based notions of evolution. We will show
that the gene centered view is of limited validity and is equivalent
to a mean field approximation where correlations between the different
genes are ignored, i.e.\ each gene evolves in an average environment
(mean field) within a sexually reproducing population. By showing this
we can recognize why the gene centered view is useful, and also when
it is invalid---when correlations are relevant.
Correlations between genes arise when the presence of one allele in
one place in the genome affects the probability of another allele
appearing in another place in the genome, which is technically called
linkage disequilibrium. One of the confusing points about the gene
centered theory is that there are two stages in which the dynamic
introduction of correlations must be considered: selection and sexual
reproduction (gene mixing). Correlations occur in selection when the
probability of survival favors certain combinations of alleles, rather
than being determined by a product of terms given by each allele
separately. Correlations occur in reproduction when parents are more
likely to mate if they have certain combinations of alleles. If
correlations only occur in selection and not in reproduction, the mean
field approximation continues to be at least partially valid. However,
if there are correlations in both selection and sexual reproduction
then the mean field approximation and the gene centered view break
down. Indeed, in some cases even very weak correlations in sexual
reproduction are sufficient for the breakdown to occur. For example,
in a population of organisms distributed over space, a bias of
reproductive coupling toward organisms that are born closer to each
other can self-consistently generate allelic correlations in sexual
reproduction through symmetry breaking. This mechanism is particularly
relevant to the trait divergence of subpopulations. Simulations of
models that illustrate
trait divergence through symmetry breaking can be found
elsewhere\cite{sayama00,sayama00pre}.
\section{Formalizing the gene centered view}
To clarify how standard models of evolution are related to the picture
described above, it must be recognized that the assumptions used to
describe the effect of sexual reproduction are as important as the
assumptions that are made about selection.
A standard first model of sexual reproduction assumes that
recombination of the genes during sexual reproduction results in a
complete mixing of the possible alleles not just in each pair of
mating organisms but rather throughout the species---the group of
organisms that is mating and reproducing. Offspring are assumed to be
selected from the ensemble which represents all possible combinations
of the genomes from reproducing organisms (panmixia).
If we further simplify the model by assuming that each gene controls a
particular phenotypic trait for which selection occurs independent of
other gene-related traits, then each gene would evolve independently;
a selected allele reproduces itself and its presence within an
organism is irrelevant. Without this further assumption, selection
should be considered to operate on the genome of the organism, which may
induce correlations in the allele populations in the surviving
(reproducing) organisms. As the frequency of one allele in the
population changes due to evolution over generations, the fitness of
another allele at a different gene will be affected. However, due to
the assumption of complete mixing in sexual reproduction, the
correlations disappear in the offspring and only the average effect
(mean field) of one gene on another is relevant. From the point of
view of a particular allele at a particular gene, the complete mixing
means that at all other genes alleles will be present in the same
proportion that they appear in the population. Thus the assumption of
complete mixing in sexual reproduction is equivalent to a gene based
mean field approximation.
The mean field approximation is widely used in statistical physics as
a ``zeroth'' order approximation to understanding the properties of
systems. There are many cases where it provides important insight into
some aspects of a system (e.g.\ the Ising model of magnets) and others
where it is essentially valid (conventional BCS superconductivity). The
application of the mean field approximation to a problem involves
assuming an element (or small part of the system) can be treated in
the average environment that it finds throughout the system. This is
equivalent to assuming that the probability distribution of the states
of the elements factors.\footnote{Systematic strategies for improving
the study of systems beyond the mean field approximation, both
analytically and through simulations, allow the inclusion of
correlations between element behaviors. An introduction to the mean
field approximation and a variety of applications can be found in
Bar-Yam\cite{baryam97}.}
This qualitative discussion of standard models of evolution and their
relationship to the mean field approximation can be shown formally. In
the mean field approximation, the probability of appearance of a
particular state of the system $s$ (e.g.\ a particular genome) is
considered as the product of probabilities of the components $a_i$
(e.g.\ its alleles):
\begin{equation}
P(s) = P(a_1,\ldots,a_n) = \prod_i P_i(a_i)
\end{equation} In the usual application of this approximation, it can
be shown to be equivalent to allowing each of the components to be
placed in an environment which is an average over the possible
environments formed by the other components of the system, hence the
term ``mean field approximation.''
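As a minimal illustration of this factorization (the two-gene setting
and the symbol $D$ are introduced here only for the example), consider
$n=2$ genes, each with alleles $\pm 1$. The deviation of the joint
distribution from the mean field form can be measured by
\[
D = P(1,1) - P_1(1)\,P_2(1)
\]
which vanishes exactly when the distribution factors. For a population
made up in equal parts of the genomes $(1,1)$ and $(-1,-1)$, we have
$P_1(1)=P_2(1)=1/2$ and $P(1,1)=1/2$, so that
\[
D = \frac{1}{2} - \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}
\]
The mean field form would instead assign probability $1/4$ to each of
the four genomes, including $(1,-1)$ and $(-1,1)$, which are absent
from the actual population.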
The key to applying this in the context of evolution is to consider
carefully the effect of the reproduction step, not just the selection
step. The two steps of reproduction and selection can be written quite
generally as:
\begin{eqnarray}
\{N(s,t+1)\} &=& R[\{N'(s,t)\}] \\
\{N'(s,t)\} &=& D[\{N(s,t)\}]
\end{eqnarray} The first equation describes reproduction. The number of
offspring $N(s,t+1)$ having a particular genome $s$ is written as a
function of the reproducing organisms $N'(s,t)$ from the previous
generation. The second equation describes selection. The reproducing
population $N'(s,t)$ is written as a function of the same generation
at birth $N(s,t)$. The brackets on the left indicate that each
equation represents a set of equations for each value of the genome.
The brackets within the functions indicate, for example, that each of
the offspring populations depends on the entire parent population.
The proportion of alleles can be written as the number of organisms
which have a particular allele $a_i$ at gene $i$ divided by the total
number of organisms:
\begin{equation}
P'_i(a_i,t) = \frac{1}{N'_0(t)} \sum_{a_j,~j\neq i} N'(s,t)
\eqlabel{baryam:eq3}
\end{equation} where $s=(a_1,\ldots,a_n)$ represents the genome in
terms of alleles $a_i$.\footnote{This expression applies generally to
haploid, diploid, or other cases.} The sum is over all alleles of
genes $j$ except gene $i$ that is fixed to allele $a_i$. $N'_0(t)$ is
the total reproducing population at time $t$. Using the assumption of
complete allelic mixing by sexual reproduction, the frequency of
allele $a_i$ in the offspring is determined by only the proportion of
$a_i$ in the parent population. Then, the same offspring would be
achieved by an `averaged' population with a number of reproducing
organisms given by
\begin{equation}
\tilde{N}'(s,t) = N'_0(t) \prod_i P'_i(a_i,t) \eqlabel{baryam:eq4}
\end{equation} since this $\tilde{N}'(s,t)$ has the same allelic
proportions as $N'(s,t)$ in \eq{baryam:eq3}. Thus complete
reproductive mixing assumes that:
\begin{equation}
R[\{\tilde{N}'(s,t)\}] \approx
R[\{N'(s,t)\}]
\end{equation} The form of \eq{baryam:eq4} indicates that the effective
probability of a particular genome can be considered as a product of
the probabilities of the individual genes---as if they were
independent. It follows that a complete step including both
reproduction and selection can also be written in terms of the allele
probabilities in the whole population. Given the above equations the
update of an allele probability is:
\begin{equation}
P'_i(a_i,t+1) \approx \frac{1}{N'_0(t+1)} \sum_{a_j,~j\neq i}
D_s[R[\{\tilde{N}'(s,t)\}]]
\end{equation} where $D_s$ is a function which satisfies
$N'(s,t)=D_s[\{N(s,t)\}]$. Given the form of \eq{baryam:eq4} and the
additional assumption that the relative dynamics of change of genome
proportions is not affected by the absolute population size $N'_0$, we
could write this as an effective one-step update
\begin{equation}
P'_i(a_i,t+1) = \tilde{D}[\{P'_i(a_i,t)\}] \eqlabel{baryam:eq7}
\end{equation} which describes the change in allele proportions from
one generation to the next. Since this equation describes the
behavior of a single allele, it corresponds to the gene centered view.
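The statement that the averaged population of \eq{baryam:eq4} has the
same allelic proportions as $N'(s,t)$ can be checked directly. Summing
\eq{baryam:eq4} over the alleles of all genes other than $i$, and using
the normalization $\sum_{a_j} P'_j(a_j,t)=1$ for each gene $j$, gives
\begin{eqnarray*}
\frac{1}{N'_0(t)} \sum_{a_j,~j\neq i} \tilde{N}'(s,t)
&=& P'_i(a_i,t) \prod_{j\neq i} \sum_{a_j} P'_j(a_j,t) \\
&=& P'_i(a_i,t)
\end{eqnarray*}
which is the allele proportion defined in \eq{baryam:eq3}. The two
populations therefore differ only in their correlations between genes,
which is the information discarded by complete mixing.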
There is still a difficulty pointed out by Sober and
Lewontin\cite{sober82}. The effective fitness of each allele depends
on the distribution of alleles in the population. Thus, the fitness of
an allele is coupled to the evolution of other alleles. This is
apparent in \eq{baryam:eq7} which, as indicated by the brackets, is a
function of all the allele populations. It corresponds, as in other
mean field approximations, to placing an allele in an average
environment formed from the other alleles. This problem with fitness
assignment would not be present if each allele separately coded for an
organism trait. Although this is a partial violation of the simplest
conceptual view of evolution, the applicability of a gene
centered view can still be justified, as long as the contextual
assignment of fitness is included. When the fitness of an organism's
phenotype depends on the relative frequency of phenotypes in a
population of organisms, this is known as frequency dependent selection;
the same concept is being applied to genes in this context.
A more serious breakdown of the mean field approximation occurs
when the assumption of complete mixing during reproduction does
not hold. This corresponds to symmetry breaking.
\section{Breakdown of the gene centered view}
We can illustrate the breakdown of the mean field approximation with a
simple example. We start with a simple model
for population growth, in which an organism that reproduces at a rate of
$\lambda$ offspring per individual per generation has a population
growth described by the iterative equation:
\begin{equation}
N(t+1) = \lambda N(t) \eqlabel{baryam:eq8}
\end{equation} We obtain a standard model for fitness and selection by
taking two equations of the form \eq{baryam:eq8} for two populations
$N_1(t)$ and $N_2(t)$ with $\lambda_1$ and $\lambda_2$ respectively,
and normalize the population at every step so that the total number of
organisms remains fixed at $N_0$. We have that:
\begin{equation}
\def\arraystretch{2.2}\begin{array}{rcl}
N_1(t+1) &=& \displaystyle \frac{\lambda_1 N_1(t)}{\lambda_1 N_1(t)
+\lambda_2 N_2(t)}N_0 \\
N_2(t+1) &=& \displaystyle \frac{\lambda_2 N_2(t)}{\lambda_1 N_1(t)
+\lambda_2 N_2(t)}N_0
\end{array}
\end{equation} The normalization does not change the relative dynamics
of the two populations; thus the faster-growing population will
dominate the slower-growing one according to their relative
reproduction rates. If we call $\lambda_i$ the fitness of the $i$th
organism, we see that according to this model the organism populations
grow at a rate that is determined by the ratio of their fitness to the
average fitness of the population.
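To make the last statement explicit (the proportions $P_i$, the average
$\bar{\lambda}$ and the numerical values below are chosen here only for
illustration), write $P_i(t)=N_i(t)/N_0$ and
$\bar{\lambda}(t)=\lambda_1P_1(t)+\lambda_2P_2(t)$. The normalized
update then reads
\[
P_i(t+1) = \frac{\lambda_i}{\bar{\lambda}(t)}\,P_i(t)
\]
and the ratio of the two populations evolves as
\[
\frac{N_1(t)}{N_2(t)} =
\left(\frac{\lambda_1}{\lambda_2}\right)^{t}\frac{N_1(0)}{N_2(0)}
\]
For example, with $\lambda_1=1.1$, $\lambda_2=1$ and equal initial
populations, the ratio after $50$ generations is $1.1^{50}\approx 117$,
so the first population makes up more than 99\% of the total.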
Consider now sexual reproduction where we have multiple genes. In
particular, consider two nonhomologous genes with selection in favor of
a particular combination of alleles on these genes. Specifically, after
selection, when allele $A_1$ appears on the first gene, allele $B_1$ must
appear on the second gene, and when allele $A_{-1}$ appears on the
first gene, allele $B_{-1}$ must appear on the second gene. We can
write these high fitness organisms with the notation $(1,1)$ and
$(-1,-1)$, and the organisms with lower fitness (for simplicity,
$\lambda=0$) as $(1,-1)$ and $(-1,1)$. When correlations in
reproduction are neglected there are two stable states of the
population with all organisms $(1,1)$ or all organisms $(-1,-1)$. If
we start with exactly 50\% of each allele, then there is an unstable
steady state in which 50\% of the organisms reproduce and 50\% do not
in every generation. Any small bias in the proportion of one or the
other will cause there to be progressively more of one type over the
other, and the population will eventually have only one set of alleles.
We can solve this example explicitly for the change in population in
each generation when correlations in reproduction are neglected. It
simplifies matters to realize that the reproducing parent population
(either $(1,1)$ or $(-1,-1)$) must contain the same proportion of the
correlated alleles ($A_1$ and $B_1$) so that:
\begin{equation}
\def\arraystretch{1.2}\begin{array}{rclcl}
P_{1,1}(t)+P_{1,-1}(t) &=& P_{1,1}(t)+P_{-1,1}(t) &=& p(t) \\
P_{-1,1}(t)+P_{-1,-1}(t) &=& P_{1,-1}(t)+P_{-1,-1}(t) &=& 1-p(t)
\end{array}
\end{equation} where $p$ is the proportion of allele $A_1$ or $B_1$. The
reproduction equations are:
\begin{equation}
\def\arraystretch{1.2}\begin{array}{rcl}
P_{1,1}(t+1) &=& p(t)^2 \\
P_{1,-1}(t+1) = P_{-1,1}(t+1) &=& p(t)(1-p(t)) \\
P_{-1,-1}(t+1) &=& (1-p(t))^2
\end{array}
\end{equation}
The proportion of the alleles in generation $t$ is given by the
selected organisms:
\begin{equation}
p(t) = P'_{1,1}(t) + P'_{1,-1}(t) = P'_{1,1}(t) + P'_{-1,1}(t) \eqlabel{baryam:eq12}
\end{equation} Since the less fit organisms $(1,-1)$ and $(-1,1)$ do
not reproduce, this is described by:
\begin{equation}
p(t) = P'_{1,1}(t) = \frac{P_{1,1}(t)}{P_{1,1}(t)+P_{-1,-1}(t)}
\end{equation}
This gives the update equation
\begin{equation}
p(t+1) = \frac{p(t)^2}{p(t)^2+(1-p(t))^2} \eqlabel{baryam:eq14}
\end{equation} which has the behavior described above and shown in
\fig{baryam:fig1}. This problem is reminiscent of an Ising ferromagnet
at very low temperature. Starting from a nearly random state with a
slight bias in the number of {\sc up} and {\sc down} spins, the spins
align becoming either all {\sc up} or all {\sc down}.
\Fig{baryam:fig1}{\psfig{file=bs-fig1.eps,width=0.6\columnwidth}}
{Behavior of $p$ in \eq{baryam:eq14} with several different initial
values.}
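The fixed points of \eq{baryam:eq14} and their stability can be worked
out directly (the symbols $f$ and $r$ are introduced here only for this
check). Writing the update as $p(t+1)=f(p(t))$, the condition $f(p)=p$
gives $p=0$ or $2p^2-3p+1=(2p-1)(p-1)=0$, so the fixed points are
$p=0$, $1/2$ and $1$. The derivative
\[
f'(p) = \frac{2p(1-p)}{\left(p^2+(1-p)^2\right)^2}
\]
vanishes at $p=0$ and $p=1$, which are therefore stable, and equals $2$
at $p=1/2$, which is therefore unstable. Equivalently, the ratio
$r(t)=p(t)/(1-p(t))$ satisfies $r(t+1)=r(t)^2$, so that
\[
r(t) = r(0)^{2^t}
\]
A starting value of $p(0)=0.51$, for instance, gives
$p(7)\approx 0.994$.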
Since we can define the proportion of an allele in generation $t$ and in
generation $t+1$, we can always write an expression for allele
evolution in the form
\begin{equation}
P_i(a_i,t+1) =
\frac{\lambda_{a_i}}{\sum_{a'_i}\lambda_{a'_i}P_i(a'_i,t)}P_i(a_i,t)
\end{equation} so that we have evolution that can be described in terms
of gene rather than organism behavior. The fitness coefficient
$\lambda_1$ for allele $A_1$ or $B_1$ is seen from \eq{baryam:eq14} to
be
\begin{equation}
\lambda_1(t) = p(t) \eqlabel{baryam:eq16}
\end{equation} with the corresponding $\lambda_{-1}=1-\lambda_1$. The
assignment of a fitness to an allele reflects the gene centered view.
The explicit dependence on the population composition has been
objected to on grounds of biological appropriateness\cite{sober82}.
For our purposes, we recognize this dependence as the natural outcome
of a mean field approximation.
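As a check, substituting \eq{baryam:eq16} and
$\lambda_{-1}(t)=1-p(t)$ into the general form above, with allele
proportions $p(t)$ and $1-p(t)$, gives
\[
p(t+1) = \frac{\lambda_1(t)\,p(t)}
{\lambda_1(t)\,p(t)+\lambda_{-1}(t)\,(1-p(t))}
= \frac{p(t)^2}{p(t)^2+(1-p(t))^2}
\]
which is \eq{baryam:eq14}. In this example the fitness of an allele is
thus equal to the current proportion of its partner allele at the other
gene, since that pairing is the only context in which the allele
survives selection.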
It is interesting to consider when this picture breaks down more
severely due to a breakdown in the assumption of complete reproductive
mixing. In this example, if there is a spatial distribution in the
organism population with mating correlated by spatial location and
fluctuations so that the starting population has more of the alleles
represented by $1$ in one region and more of the alleles represented
by $-1$ in another region, then patches of organisms that have
predominantly $(1,1)$ or $(-1,-1)$ will form after several generations.
This symmetry breaking, as in a ferromagnet, is the usual breakdown
of the mean field approximation. Here it creates correlations in the
genetic makeup of the population. When the correlations become
significant, the whole population comes to contain a number of
types. The formation of organism types depends on the existence of
correlations in reproduction that are, in effect, a partial form of
speciation. For an example of such symmetry breaking and pattern
formation see Refs.~\cite{sayama00,sayama00pre}.
Thus we see that the most dramatic breakdown of the mean field
approximation / gene centered view occurs when multiple organism types
form. This is consistent with our understanding of ergodicity
breaking, phase transitions and the mean field approximation.
Interdependence at the genetic level is echoed in the population
through the development of subpopulations. We should emphasize again
that this symmetry breaking requires both selection and reproduction
to be coupled to gene correlations.
\section{Conclusion}
The gene centered view can be applied directly in populations where
sexual reproduction causes complete allelic mixing, and only so long
as effective fitnesses are understood to be relative to the prevailing
gene pool. However, structured populations (e.g.\ species with
demes---local mating neighborhoods) are unlikely to conform to the
mean field approximation / gene centered view. Moreover, the gene
centered view does not apply to the consequences of trait divergence,
which can occur when such correlations in organism mating are present.
These issues are important in understanding problems that traditionally
lie at scales between those of population biology and those of
evolutionary theory, e.g.\ the understanding of ecological diversity
and sympatric speciation\cite{sayama00,sayama00pre}.
\bibliography{bs-revised}
\end{document}