
Volume 391, 21 February 2016, Pages 21–34

Selection maintaining protein stability at equilibrium

Sanzo Miyazawa


doi:10.1016/j.jtbi.2015.12.001


Highlights

• Protein stability is kept at equilibrium by random drift and positive selection.
• Neutral selection is predominant only for low-abundant, non-essential proteins.
• Protein abundance decreases evolutionary rate more for less-constrained proteins.
• Structural constraint decreases evolutionary rate more for less-abundant, less-essential proteins.
• Protein stability $(-ΔG_{e}/kT)$ and $〈K_{a}/K_{s}〉$ are predicted to decrease as growth temperature increases.

Abstract

The common understanding of protein evolution has been that neutral mutations are fixed by random drift, and a proportion of neutral mutations depending on the strength of structural and functional constraints primarily determines evolutionary rate. Recently it was indicated that fitness costs due to misfolded proteins are a determinant of evolutionary rate and selection originating in protein stability is a driving force of protein evolution. Here we examine protein evolution under the selection maintaining protein stability.

Protein fitness is a generic form of fitness costs due to misfolded proteins; $s = κ \exp (ΔG / kT) (1 - \exp (ΔΔG / kT))$, where $s$ and $ΔΔG$ are the selective advantage and stability change of a mutant protein, $ΔG$ is the folding free energy of the wild-type protein, and $κ$ is a parameter representing protein abundance and indispensability. The distribution of $ΔΔG$ is approximated by a bi-Gaussian distribution, which represents structurally slightly- or highly-constrained sites. Also, the mean of the distribution is negatively proportional to $ΔG$.

The evolution of this gene has an equilibrium point ($ΔG_{e}$) of protein stability, the range of which is consistent with observed values in the ProTherm database. The probability distribution of $K_{a}/K_{s}$, the ratio of nonsynonymous to synonymous substitution rate per site, over fixed mutants in the vicinity of the equilibrium shows that nearly neutral selection is predominant only in low-abundant, non-essential proteins of $ΔG_{e} > -2.5$ kcal/mol. In the other proteins, positive selection on stabilizing mutations is significant to maintain protein stability at equilibrium as well as random drift on slightly negative mutations, although the average $〈K_{a}/K_{s}〉$ is less than 1. Slow evolutionary rates can be caused by both high protein abundance/indispensability and large effective population size, which produce positive shifts of $ΔΔG$ through decreasing $ΔG_{e}$, and by strong structural constraints, which directly make $ΔΔG$ more positive. Protein abundance/indispensability affects evolutionary rate more for less constrained proteins, and structural constraint affects it more for less abundant, less essential proteins. The effect of protein indispensability on evolutionary rate may be hidden by the variation of protein abundance and detected only in low-abundant proteins. Also, protein stability $(-ΔG_{e}/kT)$ and $〈K_{a}/K_{s}〉$ are predicted to decrease as growth temperature increases.


Keywords

Neutral theory;
Positive selection;
Evolutionary rate;
Structural constraints;
Protein abundance

1. Introduction

The common understanding of protein evolution has been that amino acid substitutions observed in homologous proteins are selectively neutral (Kimura, 1968, Kimura, 1969, Kimura and Ohta, 1971 and Kimura and Ohta, 1974) or slightly deleterious (Ohta, 1973 and Ohta, 1992), and that random drift is the primary force fixing amino acid substitutions in a population. The rate of protein evolution has been understood to be determined primarily by the proportion of neutral mutations, which may be measured by the ratio of nonsynonymous to synonymous substitution rate per site ($K_{a}/K_{s}$) (Miyata and Yasunaga, 1980) and determined by functional density (Zuckerkandl, 1976) weighted by the relative variability at specific-function sites of a protein (Go and Miyazawa, 1980). These theories have been widely accepted. Recently, however, the question has been raised of whether the diversity of protein evolutionary rates among genes can be explained only by the proportion and the variability of specific-function sites, and molecular and population-genetic constraints on protein evolutionary rate have been explored.

Recent works have revealed that protein evolutionary rate is correlated with gene expression level; highly expressed genes evolve slowly, and expression level accounts for as much as 34% of rate variation in yeast (Pál et al., 2001). Of course, there are many reports that support the principle of a lower evolutionary rate for stronger functional density. Proteins expressed broadly across many tissues tend to evolve more slowly than tissue-specific ones (Kuma et al., 1995 and Duret and Mouchiroud, 2000). The connectivity of well-conserved proteins in a network is shown (Fraser et al., 2002) to be negatively correlated with their rate of evolution, because a greater proportion of the protein sequence is directly involved in its function. A fitness cost due to protein–protein misinteraction affects the evolutionary rate of surface residues (Yang et al., 2012). Protein dispensability in yeast is correlated with the rate of evolution (Hirsh and Fraser, 2001 and Hirsh and Fraser, 2003), although one report found no correlation between them (Pál et al., 2003). Other reports indicate that the correlation between gene dispensability and evolutionary rate, although low, is significant (Zhang and He, 2005, Wall et al., 2005 and Jordan et al., 2002).

It was proposed (Drummond et al., 2005, Drummond and Wilke, 2008 and Geiler-Samerotte et al., 2011) that low substitution rates of highly expressed genes could be explained by fitness costs due to functional loss and toxicity (Stoebel et al., 2008 and Geiler-Samerotte et al., 2011) of misfolded proteins. Misfolding reduces the concentration of functional proteins, and wastes cellular time and energy on production of useless proteins. Also misfolded proteins form insoluble aggregates (Geiler-Samerotte et al., 2011). Fitness cost due to misfolded proteins is larger for highly expressed genes than for less expressed ones.

Fitness cost due to misfolded proteins was formulated (Drummond and Wilke, 2008 and Geiler-Samerotte et al., 2011) to be related to the proportion of misfolded proteins. Knowledge of protein folding indicates that protein folding primarily occurs as a two-state transition (Miyazawa and Jernigan, 1982a and Miyazawa and Jernigan, 1982b), which means that the ensemble of protein conformations is a mixture of completely folded and unfolded conformations. The free energy ($ΔG$) of protein stability, which is equal to the free energy of the denatured state subtracted from that of the native state, and the stability change ($ΔΔG$) due to amino acid substitutions are collected in the ProTherm database (Kumar et al., 2006), although the data are not sufficient. Prediction methods for $ΔΔG$, however, have improved enough to reproduce real distributions of $ΔΔG$ (Schymkowitz et al., 2005 and Yin et al., 2007). Therefore, on a biophysical basis, the distribution of fitness can be estimated and protein evolution can be studied. The Shakhnovich group studied protein evolution on the basis of knowledge of protein folding (Serohijos and Shakhnovich, 2014 and Dasmeh et al., 2014) and showed (Serohijos et al., 2012) that the negative correlation between protein abundance and $K_{a}/K_{s}$ was caused by the distribution of $ΔΔG$ that negatively correlates with the $ΔG$ of a wild type. Also, it was shown (Serohijos et al., 2013) that highly abundant proteins had to be more stable than low abundant ones. The relationship between evolutionary rate and protein stability has been studied from various points of view (Echave et al., 2015 and Faure and Koonin, 2015).

Here we study the relationship between evolutionary rate and selection on protein stability in a monoclonal approximation. The fitness assumed here for a protein is a generic form to which all formulations (Drummond and Wilke, 2008, Geiler-Samerotte et al., 2011, Serohijos et al., 2012, Serohijos et al., 2013, Serohijos and Shakhnovich, 2014 and Dasmeh et al., 2014) previously employed for protein fitness reduce under the condition $\exp (βΔG) ⪡ 1$, which is satisfied in the typical range of folding free energies shown in Fig. 1; $β = 1/(kT)$, $k$ is the Boltzmann constant and $T$ is absolute temperature. The generic form of Malthusian fitness of a protein-coding gene is $m \equiv - κ \exp (βΔG)$, where $κ$ is a parameter, which may be a function of protein abundance and dispensability; see Methods for details. The distribution of stability change $ΔΔG$ due to single amino acid substitutions is approximated as a weighted sum of two Gaussian functions, which was shown (Tokuriki et al., 2007) to reproduce actual distributions of $ΔΔG$ well. One of the two Gaussian functions describes substitutions at structurally less-constrained surface sites, and the other at more-constrained core sites of proteins. The proportion of less-constrained surface sites is a parameter ($θ$).

Fig. 1.

Distribution of folding free energies of monomeric protein families. Stability data of monomeric proteins for which the item dG_H2O or dG was obtained under the experimental conditions $6.7 \leq pH \leq 7.3$ and $20\,°C \leq T \leq 30\,°C$, and whose folding-unfolding transition is two-state and reversible, are extracted from ProTherm (Kumar et al., 2006); in the case of dG, only thermal transition data are used. Thermophilic proteins and proteins observed with salts or additives are also removed. An equal sampling weight is assigned to each species of homologous protein, and the total sampling weight of each protein family is normalized to one. In the case in which multiple data exist for the same species of protein, its sampling weight is divided among the data. However, proteins whose stabilities are known may be a biased sample of the protein universe. The value $ΔG_{e} = -5.24$ kcal/mol of equilibrium stability at the representative parameter values, $\log 4 N_{e} κ = 7.55$ and $θ = 0.53$, agrees with the most probable value of $ΔG$ in the distribution above. Also, the range of $ΔG$ shown above is consistent with the range, $-2$ to $-12.5$ kcal/mol, expected from the present model. The kcal/mol unit is used for $ΔG$. A similar distribution was also compiled (Zeldovich et al., 2007).

The fixation probability of a mutant with $ΔΔG$ can be calculated for a duploid population with effective population size $N_{e}$ (Crow and Kimura, 1970). In a population of genes with such a fitness, protein stability is evolutionarily maintained at equilibrium, and the equilibrium stability ($ΔG_{e}$) negatively correlates with protein abundance/indispensability ($κ$). The range of $ΔG_{e}$ is consistent with the observed range of folding free energies shown in Fig. 1.

The probability density functions (PDF) of $K_{a}/K_{s}$, the ratio of nonsynonymous to synonymous substitution rate per site (Miyata and Yasunaga, 1980), at equilibrium and also in the vicinity of equilibrium are numerically examined over the whole domain of the parameters, $0 \leq \log 4 N_{e} κ \leq 20$ and $0 \leq θ \leq 1$. The dependences of evolutionary rate on protein abundance/dispensability and on structural constraint are quantitatively described, and it is shown that neither factor can be ignored for protein evolutionary rate, although protein abundance/indispensability affects evolutionary rate more for less constrained proteins, and structural constraint affects it more for less abundant, less essential proteins. Like protein abundance, protein indispensability must correlate with evolutionary rate, but a correlation between them may be hidden by the variation of protein abundance as well as effective population size, and detected only in low-abundant proteins. It has also become clear that nearly neutral selection is predominant only in low-abundant, non-essential proteins with $\log 4 N_{e} κ < 2$ or $ΔG_{e} > -2.5$ kcal/mol, and that in the other proteins positive selection acts significantly to stabilize a less-stable wild type. Also, a significant amount of slightly negative mutants are fixed in the population by random drift. This view of protein evolution is contrary to the previous understanding. The present model, based on biophysical knowledge of protein stability, also indicates that protein stability $(-βΔG_{e})$ and the average of $K_{a}/K_{s}$ decrease as growth temperature increases.

2. Methods

2.1. Fitness costs due to misfolded proteins

Misfolding can impose costs in three distinct ways (Geiler-Samerotte et al., 2011); loss of function, diversion of protein synthesis resources away from essential proteins, and toxicity of the misfolded molecules. Fitness cost due to functional loss was formulated (Drummond and Wilke, 2008) by taking account of protein dispensability. Assuming that fitness cost of each gene is additive in the Malthusian fitness scale, the total Malthusian fitness of a genome was estimated as

$$m_{dispensability} \equiv - \sum_{i} γ_{i} (1 - f_{i}^{native}) \tag{1}$$

where $-γ_{i}$ is defined as $-γ_{i} \equiv \log$ (deletion-strain growth rate/max growth rate), and $f_{i}^{native}$ is the fraction of the native conformation for gene $i$.

Protein folding primarily occurs as a two-state transition, which means that protein conformations are a mixture of completely folded and unfolded conformations (Miyazawa and Jernigan, 1982a and Miyazawa and Jernigan, 1982b). Therefore, if the completely folded (native) state is more stable by a free energy difference $ΔG$ than the unfolded (denatured) state, then the native fraction in the conformational ensemble will be equal to

$$f^{native} = \frac{e^{- βΔG}}{1 + e^{- βΔG}} \tag{2}$$

where $β = 1/kT$; $k$ is the Boltzmann constant and $T$ is absolute temperature.
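As an illustrative sketch (not part of the original analysis), Eq. (2) can be evaluated numerically. The function name `native_fraction` is hypothetical, and $kT = 0.593$ kcal/mol (the value the text quotes later for $T = 298$ K) is assumed.

```python
import math

KT_298 = 0.593  # kcal/mol; assumed value of kT at T = 298 K, as quoted later in the text

def native_fraction(delta_g, kt=KT_298):
    """Two-state native fraction of Eq. (2) for folding free energy delta_g (kcal/mol)."""
    # e^{-bG} / (1 + e^{-bG}) rewritten as 1 / (1 + e^{bG}), which is algebraically
    # identical and does not overflow for very stable (very negative delta_g) proteins.
    return 1.0 / (1.0 + math.exp(delta_g / kt))

if __name__ == "__main__":
    for delta_g in (-2.0, -5.24, -12.5):
        f = native_fraction(delta_g)
        # The misfolded fraction 1 - f is close to exp(beta * dG) when that term is small.
        print(delta_g, f, 1.0 - f, math.exp(delta_g / KT_298))
```

For stabilities in the range discussed here, the misfolded fraction $1 - f^{native}$ printed by the loop is close to $\exp (βΔG)$, which is the approximation used in the equations that follow.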

Thus, Eq. (1) for the Malthusian fitness of a genome can be transformed as follows in terms of the folding free energy $ΔG$ of the native conformation:

$$m_{dispensability} = - \sum_{i} γ_{i} \frac{e^{βΔG_{i}}}{e^{βΔG_{i}} + 1} \tag{3}$$

Because $\exp (βΔG) ⪡ 1$ in the typical range of folding free energies shown in Fig. 1, the above definition of fitness is approximated by

$$m_{dispensability} = - \sum_{i} γ_{i} [e^{βΔG_{i}} - O (e^{2βΔG_{i}})] \tag{4}$$

Drummond and Wilke (2008) took note of the toxicity of misfolded proteins as well as the diversion of protein synthesis resources, and formulated the Malthusian fitness ($m_{misfolds}$) of a genome to be negatively proportional to the total amount of misfolded proteins, which must be produced to obtain the necessary amount of folded proteins (Serohijos et al., 2012):

$$m_{misfolds} = - c \sum_{i} A_{i} \frac{1 - f_{i}^{native}}{f_{i}^{native}} \tag{5}$$

$$m_{misfolds} = - c \sum_{i} A_{i} e^{βΔG_{i}} \tag{6}$$

where $c$ is a positive constant and assumed to be $c = 0.0001$, and $A_{i}$ is the abundance of protein $i$.
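The per-gene factors in Eqs. (3) to (6) can be compared numerically. The following is a minimal sketch under the assumption $kT = 0.593$ kcal/mol; the function names are hypothetical and are not taken from the cited works.

```python
import math

KT = 0.593  # kcal/mol; assumed kT at T = 298 K

def loss_term(dg, kt=KT):
    """Per-gene factor of Eq. (3): misfolded fraction e^{bG} / (e^{bG} + 1)."""
    x = math.exp(dg / kt)
    return x / (1.0 + x)

def misfold_term(dg, kt=KT):
    """Per-gene factor of Eq. (5): (1 - f_native) / f_native."""
    f = 1.0 / (1.0 + math.exp(dg / kt))
    return (1.0 - f) / f

def generic_term(dg, kt=KT):
    """Leading-order factor e^{bG} shared by Eqs. (4) and (6)."""
    return math.exp(dg / kt)

if __name__ == "__main__":
    for dg in (-3.0, -5.24, -10.0):
        rel_err = abs(loss_term(dg) - generic_term(dg)) / generic_term(dg)
        print(dg, loss_term(dg), misfold_term(dg), generic_term(dg), rel_err)
```

The factor of Eq. (5) equals $e^{βΔG}$ exactly, so Eq. (6) is an identity, whereas the factor of Eq. (3) differs from $e^{βΔG}$ by a relative error of order $e^{βΔG}$, which is below 1% for $ΔG \leq -3$ kcal/mol under the assumed $kT$.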

2.2. Fitness of a linear metabolic pathway

Serohijos and Shakhnovich (2014) examined the evolution of a linear metabolic pathway whose Wrightian fitness was defined as

$$w_{\mathrm{linear\_pathway}} \equiv w_{flux} + w_{misfolds} \tag{7}$$

$$w_{flux} \equiv \frac{\sum_{i} ε_{i} A_{i}^{-1}}{\sum_{i} ε_{i} {(A_{i} f_{i}^{native})}^{-1}} \tag{8}$$

$$w_{misfolds} \equiv - c \sum_{i} A_{i} (1 - f_{i}^{native}) \tag{9}$$

where $ε_{i}$ was defined as enzyme efficiency and assumed to be $ε_{i} = 1$. The $w_{flux}$ is a fitness originating from the enzymatic flux of a linear metabolic pathway, and $w_{misfolds}$ represents the effect of toxicity of misfolded proteins and has the same functional form as Eq. (1), although Eq. (1) is a definition for Malthusian fitness. Then, the Malthusian fitness corresponding to the Wrightian fitness above can be represented as

$$m_{\mathrm{linear\_pathway}} = \log \left[ \left\{ 1 + \sum_{i} \frac{ε_{i} A_{i}^{-1}}{\sum_{i} ε_{i} A_{i}^{-1}} e^{βΔG_{i}} \right\}^{-1} - c \sum_{i} A_{i} e^{βΔG_{i}} {(1 + e^{βΔG_{i}})}^{-1} \right] \tag{10}$$

$$= - \sum_{i} \left\{ \frac{ε_{i} A_{i}^{-1}}{(\sum_{i} ε_{i} A_{i}^{-1})} + cA_{i} \right\} e^{βΔG_{i}} + O \left( \left( \sum_{i} \left\{ \frac{ε_{i} A_{i}^{-1}}{(\sum_{i} ε_{i} A_{i}^{-1})} + cA_{i} \right\} e^{βΔG_{i}} \right)^{2} \right) \tag{11}$$

Because $cA_{i} \leq 0.459$ (Serohijos and Shakhnovich, 2014), $ΔG < -3$ kcal/mol and $\sum_{i = 1}^{10} cA_{i} \exp (βΔG_{i}) < 0.03$, the higher order terms can be neglected in this case. However, the fitness costs due to the flux and misfolded proteins may be formulated to be additive in the Malthusian scale rather than in the Wrightian scale, employing Eq. (6) for the fitness cost due to misfolded proteins:
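The bound quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes $kT = 0.593$ kcal/mol, ten enzymes (the upper limit of the sum), and the worst case in which every enzyme has $cA_{i} = 0.459$ and $ΔG_{i} = -3$ kcal/mol; the function name is hypothetical.

```python
import math

def misfold_sum_bound(n_enzymes=10, max_ca=0.459, dg_upper=-3.0, kt=0.593):
    """Upper bound on sum_i cA_i exp(beta dG_i) when cA_i <= max_ca and dG_i < dg_upper."""
    return n_enzymes * max_ca * math.exp(dg_upper / kt)

if __name__ == "__main__":
    # Roughly 0.029, i.e. just below the 0.03 quoted in the text.
    print(misfold_sum_bound())
```

Since the leading term is at most about 0.03, its square, which sets the size of the neglected higher-order terms, is below $10^{-3}$.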

$$m_{\mathrm{linear\_pathway}} \equiv - \sum_{i} \left\{ \frac{ε_{i} A_{i}^{-1}}{\sum_{i} ε_{i} A_{i}^{-1}} + cA_{i} \right\} e^{βΔG_{i}} \tag{12}$$

2.3. Other formulations of protein fitness

Also, the following simple definition for fitness to maintain protein stability was used (Dasmeh et al., 2014):

$$w \propto f^{native} \tag{13}$$

$$m = - e^{βΔG} + O (e^{2βΔG}) + \text{constant} \tag{14}$$

In addition, Eq. (3) for functional loss was employed with $γ_{i} \Rightarrow cA_{i}$ to represent toxicity of misfolded proteins (Serohijos et al., 2012 and Serohijos et al., 2013).

2.4. A generic form of protein fitness

Thus, all expressions above for the Malthusian fitness of a protein can be well approximated by the following expression, because $\exp (βΔG) ⪡ 1$ in the typical range of folding free energies shown in Fig. 1:

$$m \equiv - \sum_{i} κ_{i} e^{βΔG_{i}} \quad \text{with } κ_{i} \geq 0 \tag{15}$$

where $κ_{i}$ is a parameter. If the fitness costs of functional loss and toxicity due to misfolded proteins are taken into account, $κ_{i}$ will be defined as

$$κ_{i} = cA_{i} + γ_{i} \geq 0 \tag{16}$$

assuming their additivity in the Malthusian fitness scale.

The selective advantage of a mutant, in which each protein is destabilized by $ΔΔG_{i}$, relative to the wild type can be represented by

$$s \equiv m^{mutant} - m^{wildtype} = \sum_{i} s_{i} \tag{17}$$

$$s_{i} = κ_{i} e^{βΔG_{i}} (1 - e^{βΔΔG_{i}}) \quad \text{with } κ_{i} \geq 0 \tag{18}$$
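For a single protein, Eq. (18) follows directly from the generic fitness of Eq. (15). The sketch below checks this numerically; the function names and the sample value of $κ$ are hypothetical, and $kT = 0.593$ kcal/mol is assumed.

```python
import math

KT = 0.593  # kcal/mol; assumed kT at T = 298 K

def malthusian_fitness(kappa, dg, kt=KT):
    """Generic single-protein fitness m = -kappa * exp(beta * dG), cf. Eq. (15)."""
    return -kappa * math.exp(dg / kt)

def selective_advantage(kappa, dg, ddg, kt=KT):
    """Selective advantage of Eq. (18): kappa * exp(beta dG) * (1 - exp(beta ddG))."""
    return kappa * math.exp(dg / kt) * (1.0 - math.exp(ddg / kt))

if __name__ == "__main__":
    kappa, dg = 10.0, -5.24  # illustrative values only
    for ddg in (-1.0, 0.0, 1.0):
        s_direct = malthusian_fitness(kappa, dg + ddg) - malthusian_fitness(kappa, dg)
        print(ddg, selective_advantage(kappa, dg, ddg), s_direct)
```

Stabilizing mutations ($ΔΔG < 0$) give $s > 0$ and destabilizing ones give $s < 0$; because $1 - e^{βΔΔG} < 1$, $s$ can never exceed $κ e^{βΔG}$.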

3. Results

3.1. Protein stability and fitness

Here, we consider the evolution of a single protein-coding gene in which the selective advantage of mutant proteins in Malthusian parameters is assumed to be

$$s = κ e^{βΔG} (1 - e^{βΔΔG}) \quad \text{with } κ \geq 0 \tag{19}$$

and therefore $s$ is upper-bounded by

$$s \leq κ e^{βΔG} \tag{20}$$

where $ΔG$ is the stability of a wild-type protein, $ΔΔG$ is a stability change of a mutant protein, and $β = 1/kT$; unless specified, $β = 1/0.593$ kcal⁻¹ mol, corresponding to $T = 298$ K. $κ$ is a parameter whose meaning may depend on the situation; refer to Methods for details. If the fitness costs of functional loss and toxicity due to misfolded proteins are both taken into account and assumed to be additive in the Malthusian fitness scale, $κ$ will be defined as

$$κ = cA + γ \tag{21}$$

where $c$ is the fitness cost per misfolded protein (Drummond and Wilke, 2008 and Geiler-Samerotte et al., 2011), $A$ is the cellular abundance of the protein (Drummond and Wilke, 2008 and Geiler-Samerotte et al., 2011), and $γ$ is indispensability (Drummond and Wilke, 2008), defined to be $γ = - \log$ (deletion-strain growth rate/max growth rate). Equation (19) indicates that the selective advantage $s$ is upper-bounded by $κ \exp (βΔG)$. The parameter $κ$ is assumed in the present analysis to take values in the range $0 \leq \log 4 N_{e} κ \leq 20$ with effective population size $N_{e}$, taking account of the values of the parameters, $c \sim 10^{-4}$ (Drummond and Wilke, 2008), $10 < A < 10^{6}$ (Ghaemmaghami et al., 2003), $γ = 10$ for essential genes (Drummond and Wilke, 2008), and $N_{e} \sim 10^{4}$ to $10^{5}$ for vertebrates, $\sim 10^{5}$ to $10^{6}$ for invertebrates, $\sim 10^{7}$ to $10^{8}$ for unicellular eukaryotes, and $> 10^{8}$ for prokaryotes (Lynch and Conery, 2003). The above ranges of the parameters indicate that the effect of protein indispensability ($γ$) may be hidden by the variation of protein abundance ($cA$) as well as effective population size ($N_{e}$), and may be detected only in low-abundant proteins.

Based on measurements of stability changes due to single amino acid substitutions in proteins, which are collected in the ProTherm database (Kumar et al., 2006), Serohijos et al. (2012) reported that the distribution of $ΔΔG$ is approximately a Gaussian distribution with mean = 1 kcal/mol and standard deviation = 1.7 kcal/mol. In addition, it was shown (Serohijos et al., 2012) that the mean of $ΔΔG$ is negatively proportional to $ΔG$, and that this dependence of the mean of $ΔΔG$ on $ΔG$ is not large but still important to cause the observed negative correlation between protein abundance and evolutionary rate. On the other hand, Tokuriki et al. (2007) computationally predicted $ΔΔG$ for all possible single amino acid substitutions in 21 different globular, single-domain proteins, and showed that the predicted distributions of $ΔΔG$ were strikingly similar despite a range of protein sizes and folds and largely follow a bi-Gaussian distribution: one of the two Gaussian distributions results from substitutions on protein surfaces and is a narrow distribution with a mildly destabilizing mean $ΔΔG$, whereas the other, due to substitutions in protein cores, is a wider distribution with a more strongly destabilizing mean (Tokuriki et al., 2007).

Here, according to Tokuriki et al. (2007), the distribution of $ΔΔG$ due to single amino acid substitutions is approximated as a bi-Gaussian function with the dependence of mean $ΔΔG$ on $ΔG$, in order to examine the effects of structural constraint on evolutionary rate. The probability density function (PDF) of $ΔΔG$, $p (ΔΔG)$, for nonsynonymous substitutions is assumed to be

$$p (ΔΔG) = θ N (μ_{s}, σ_{s}) + (1 - θ) N (μ_{c}, σ_{c}) \tag{22}$$

where $0 \leq θ \leq 1$, and $N (μ, σ)$ is a normal distribution with mean $μ$ and standard deviation $σ$. Since the majority of substitutions appear to be single nucleotide substitutions, the values of the standard deviations ($σ_{s}$ and $σ_{c}$) estimated in Tokuriki et al. (2007) for single nucleotide substitutions are employed here; in kcal/mol units,

$$μ_{s} = - 0.14 ΔG - 0.17, \quad σ_{s} = 0.90 \tag{23}$$

$$μ_{c} = - 0.14 ΔG + 1.23, \quad σ_{c} = 1.93 \tag{24}$$

To analyze the dependences of the means, $μ_{s}$ and $μ_{c}$, on $ΔG$, we plotted the observed values of $ΔΔG$ of single amino acid mutants against $ΔG$ of the wild type, which are collected in the ProTherm database (Kumar et al., 2006); the same analysis was done by Serohijos et al. (2012). Fig. 2 shows a significant dependence of $ΔΔG$ on $ΔG$; the regression line is $μ = - 0.14 ΔG + 0.49$. The linear slopes of $μ_{s}$ and $μ_{c}$ are taken to be equal to the slope ($-0.14$) of the regression line. The intercepts have been estimated to satisfy the following two conditions:

1. Equations (23) and (24) satisfy $μ_{s} (ΔG_{0}) = 0.56$ and $μ_{c} (ΔG_{0}) = 1.96$, which were estimated for single nucleotide substitutions in Tokuriki et al. (2007), at a certain value ($ΔG_{0}$) of $ΔG$.
2. The total mean of the two Gaussian functions agrees with the regression line, $μ = - 0.14 ΔG + 0.49$. The value of $θ$ is taken to be 0.53, which is equal to the average of $θ$ over proteins used in Tokuriki et al. (2007).
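The two conditions can be checked against Eqs. (22) to (24) with a short script. This is a sketch with hypothetical function names; it takes $ΔG_{0} = -5.24$ kcal/mol, the value the text assigns to $ΔG_{0}$, and $θ = 0.53$.

```python
import math

SIGMA_S, SIGMA_C = 0.90, 1.93  # kcal/mol, from Eqs. (23) and (24)

def mu_s(dg):
    """Mean ddG at less-constrained (surface) sites, Eq. (23)."""
    return -0.14 * dg - 0.17

def mu_c(dg):
    """Mean ddG at more-constrained (core) sites, Eq. (24)."""
    return -0.14 * dg + 1.23

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def ddg_pdf(ddg, dg, theta=0.53):
    """Bi-Gaussian PDF of ddG for all arising mutants, Eq. (22)."""
    return (theta * normal_pdf(ddg, mu_s(dg), SIGMA_S)
            + (1.0 - theta) * normal_pdf(ddg, mu_c(dg), SIGMA_C))

def total_mean(dg, theta=0.53):
    """Mean of the mixture; its slope in dG is -0.14 for any theta."""
    return theta * mu_s(dg) + (1.0 - theta) * mu_c(dg)

if __name__ == "__main__":
    dg0 = -5.24
    print(round(mu_s(dg0), 2), round(mu_c(dg0), 2))  # condition 1: 0.56 and 1.96
    print(round(total_mean(0.0), 2))                 # condition 2: intercept 0.49
```

With $θ = 0.53$ the mixture intercept is $0.53 \times (-0.17) + 0.47 \times 1.23 \approx 0.49$, which matches the regression line.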

A representative value, 7.550, of $\log 4 N_{e} κ$ is determined in such a way that the equilibrium value of $ΔG$ is equal to $ΔG_{0} = -5.24$ introduced above; $ΔG_{e}$ is explicitly defined later. It is interesting that this value, $ΔG_{e} = -5.24$ kcal/mol, agrees with the most probable value of $ΔG$ in the observed distribution of protein stabilities shown in Fig. 1. The fraction $θ$ of less-constrained residues, such as most residues on the protein surface, is correlated with protein length for globular, monomeric proteins; $θ = 1.27 - 0.33 \cdot \log_{10} (\text{protein length})$ for $50 \leq \text{length} \leq 330$ (Tokuriki et al., 2007). However, residues taking part in protein-protein interactions may be regarded as core residues rather than surface residues.

Fig. 2.

Dependence of stability changes, $ΔΔG$, due to single amino acid substitutions on the protein stability, $ΔG$, of the wild type. A solid line shows the regression line, $ΔΔG = - 0.14 ΔG + 0.49$; the correlation coefficient and p-value are equal to $-0.20$ and $< 10^{-7}$, respectively. Broken lines show the two means of the bi-Gaussian distributions, $μ_{s}$ in blue and $μ_{c}$ in red. Blue dotted lines show $μ_{s} \pm 2 σ_{s}$ and red dotted lines $μ_{c} \pm 2 σ_{c}$. See Eqs. (22), (23) and (24) for the bi-Gaussian distribution. Stability data of single amino acid mutants for which the items dG_H2O and ddG_H2O or dG and ddG were obtained under the experimental conditions $6.7 \leq pH \leq 7.3$ and $20\,°C \leq T \leq 30\,°C$, and whose folding-unfolding transitions are two-state and reversible, are extracted from ProTherm (Kumar et al., 2006). In the case of dG, only thermal transition data are used. In the case in which multiple data exist for the same protein, only one of them is used. The kcal/mol unit is used for $ΔΔG$ and $ΔG$. A similar distribution was also compiled (Serohijos et al., 2012). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

The dependence of the PDF, $p (ΔΔG)$, of $ΔΔG$ on $θ$ is shown in Fig. 3. Also, the PDF of selective advantage, $p (4 N_{e} s) = - p (ΔΔG) \, d ΔΔG / d 4 N_{e} s$, is shown in Fig. S.4 to have a peak at a small, positive value of selective advantage, which moves toward more positive values as $θ$ and/or $4 N_{e} κ$ increase.

Fig. 3.

PDFs of stability changes, $ΔΔG$, due to single amino acid substitutions in all mutants and in fixed mutants at equilibrium of protein stability, $ΔG = ΔG_{e}$. The PDF of $ΔΔG$ due to single amino acid substitutions in all arising mutants is assumed to be bi-Gaussian; see Eq. (22). Unless specified, $\log 4 N_{e} κ = 7.55$ and $θ = 0.53$ are employed. The kcal/mol unit is used for $ΔΔG$ and $ΔG_{e}$.

3.2. Equilibrium state of protein stability in protein evolution

The fixation probability $u$ for a mutant gene with selective advantage $s$ and gene frequency $q$ in a duploid system of effective population size $N_{e}$ was given as a function of $4 N_{e} s$ and $q$ by Crow and Kimura (1970):

$$u (4 N_{e} s) = \frac{1 - e^{- 4 N_{e} s q}}{1 - e^{- 4 N_{e} s}} \tag{25}$$

where $q = 1/(2N)$ for a single mutant gene in a population of size $N$. Population size is taken to be $N = 10^{6}$. The ratio of the substitution rate per nonsynonymous site ($K_{a}$) for nonsynonymous substitutions with selective advantage $s$ to the substitution rate per synonymous site ($K_{s}$) for synonymous substitutions with $s = 0$ is

$$\frac{K_{a}}{K_{s}} = \frac{u (4 N_{e} s)}{u (0)} = \frac{u (4 N_{e} s)}{q} \quad \text{with } q = \frac{1}{2N} \tag{26}$$

$$≃ \frac{4 N_{e} s}{1 - e^{- 4 N_{e} s}} \quad \text{for } \frac{| 4 N_{e} s q |}{2} ⪡ 1 \tag{27}$$

assuming that synonymous substitutions are completely neutral and mutation rates at both types of sites are the same. Equations (19) and (25) indicate that $4 N_{e} κ$ can be regarded as a single parameter for $K_{a}/K_{s}$. Furthermore, if the dependence of the mean $ΔΔG$ on $ΔG$ could be neglected, $4 N_{e} κ \exp (βΔG)$ could be regarded as a single parameter. In the range $| 4 N_{e} s q | / 2 ⪡ 1$, both $K_{a}/K_{s}$ and the PDF of $K_{a}/K_{s}$ do not depend on $q = 1/(2N)$; see Eqs. (27) and (S.15).
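Equations (25) to (27) can be sketched in code as follows. The function names are hypothetical, and the rearrangement used for negative $4 N_{e} s$ is a numerical convenience assumed here to avoid overflow; it is algebraically the same expression as Eq. (25).

```python
import math

def fixation_probability(x, q):
    """Eq. (25): u = (1 - e^{-x q}) / (1 - e^{-x}), with x = 4 N_e s."""
    if x == 0.0:
        return q  # neutral limit, u(0) = q
    if x > 0.0:
        return math.expm1(-x * q) / math.expm1(-x)
    # For x < 0, multiply numerator and denominator by e^{x} so that no
    # exponential of a large positive number is ever evaluated.
    return math.exp(x * (1.0 - q)) * math.expm1(x * q) / math.expm1(x)

def ka_ks_exact(x, q):
    """Eq. (26): K_a/K_s = u(4 N_e s) / q."""
    return fixation_probability(x, q) / q

def ka_ks_approx(x):
    """Eq. (27): x / (1 - e^{-x}), valid for |x q| / 2 << 1."""
    if x == 0.0:
        return 1.0
    if x > 0.0:
        return x / -math.expm1(-x)
    return x * math.exp(x) / math.expm1(x)

if __name__ == "__main__":
    q = 1.0 / (2 * 10**6)  # single mutant gene, N = 10^6 as in the text
    for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
        print(x, ka_ks_exact(x, q), ka_ks_approx(x))
```

With $N = 10^{6}$ the exact and approximate ratios agree to about one part in $10^{6}$ for moderate $4 N_{e} s$, which illustrates why $K_{a}/K_{s}$ is insensitive to $q$ in this range.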

The PDF of $ΔΔG$ of fixed mutant genes, $p (ΔΔG_{fixed})$, is

$$p (ΔΔG_{fixed}) \equiv p (ΔΔG) \frac{u (4 N_{e} s)}{〈 u 〉} \tag{28}$$

$$〈 u 〉 \equiv \int_{- \infty}^{\infty} u (4 N_{e} s) \, p (ΔΔG) \, d ΔΔG \tag{29}$$

where $〈 u 〉$ is the average fixation rate. Fig. 3 shows the PDF of $ΔΔG$ of fixed mutant genes. The PDF of $4 N_{e} s$ in fixed mutants is also shown in Fig. S.4; $p (4 N_{e} s_{fixed}) = - p (ΔΔG_{fixed}) \, d ΔΔG / d 4 N_{e} s$. Then, the average of $ΔΔG$ in fixed mutant genes can be calculated:

$${〈 ΔΔG 〉}_{fixed} \equiv \int_{- \infty}^{\infty} ΔΔG \, p (ΔΔG_{fixed}) \, d ΔΔG$$

Fig. 4 shows that the average of $ΔΔG$ over fixed mutant genes, ${〈 ΔΔG 〉}_{fixed}$, monotonically decreases with $ΔG$, indicating that the temporal process of $ΔG$ is stable at ${〈 ΔΔG 〉}_{fixed} (ΔG_{e}) = 0$ due to the balance between random drift on destabilizing mutations and positive selection on stabilizing mutations; $ΔG_{e}$ is the folding free energy at the equilibrium state. If a wild-type protein becomes less stable than the equilibrium, $ΔG > ΔG_{e}$, more stabilizing mutants will fix, primarily due to positive selection and secondarily due to random drift, because stabilizing mutants will increase due to negative shifts of $ΔΔG$ and also the effect of stability change on selective advantage will be more amplified; see Eqs. (23) and (24) for the dependence of $ΔΔG$ on $ΔG$, and Eq. (19) for the fitness of stability change. As shown in Fig. S.6, the probability of $K_{a}/K_{s} > 1.0$, that is, positive selection, significantly increases as $ΔG$ becomes more positive than the equilibrium stability $ΔG_{e}$. On the other hand, if a wild-type protein becomes more stable than the equilibrium, $ΔG < ΔG_{e}$, more destabilizing mutants will fix due to random drift, because destabilizing mutants will increase due to positive shifts of $ΔΔG$ and also more destabilizing mutants become nearly neutral due to the less-amplified effect of stability change on selective advantage. As shown later, the PDF of $K_{a}/K_{s}$ in the vicinity of equilibrium confirms this mechanism for maintaining protein stability at equilibrium.

Fig. 4.

The average, 〈ΔΔG〉_fixed ${〈 Δ Δ G 〉}_{fixed}$ , of stability changes over fixed mutants versus protein stability, ΔG $Δ G$ , of the wild type. ΔG_e $Δ G_{e}$ , where 〈ΔΔG〉=0 $〈 Δ Δ G 〉 = 0$ , is the stable equilibrium value of folding free energy, ΔG $Δ G$ , in protein evolution. The averages of ΔΔG $Δ Δ G$ , 4N_es $4 N_{e} s$ , and K_a/K_s $K_{a} / K_{s}$ over fixed mutants are plotted against protein stability, ΔG $Δ G$ , of the wild type by solid, broken, and dash-dot lines, respectively. Thick dotted lines show the values of ${〈 Δ Δ G 〉}_{fixed} \pm Δ Δ G_{fixed}^{sd}$ , where $Δ Δ G_{fixed}^{sd}$ is the standard deviation of ΔΔG $Δ Δ G$ over fixed mutants. $\log 4 N_{e} κ = 7.55$ and θ=0.53 $θ = 0.53$ are employed. The kcal/mol unit is used for ΔΔG $Δ Δ G$ and ΔG $Δ G$ .

It was claimed (Serohijos et al., 2012 and Serohijos et al., 2013) that the equilibrium point would correspond to the minimum of the average fixation probability. However, in Fig. 4 for $\log 4 N_{e} κ = 7.550$ and θ=0.53 $θ = 0.53$ , the average 〈s〉_fixed ${〈 s 〉}_{fixed}$ of selective advantage in fixed mutants has a minimum at ΔG=−5.50 $Δ G = - 5.50$ kcal/mol and changes its sign at ΔG=−4.58 $Δ G = - 4.58$ kcal/mol, where the average 〈K_a/K_s〉=〈u〉/q $〈 K_{a} / K_{s} 〉 = 〈 u 〉 / q$ has a minimum and which is more positive than the equilibrium stability ΔG_e=−5.24 $Δ G_{e} = - 5.24$ kcal/mol. In other words, Fig. 4 and Fig. S.16 show that the values of ΔG $Δ G$ at 〈ΔΔG〉_fixed=0 ${〈 Δ Δ G 〉}_{fixed} = 0$ and at the minimum of 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ may be close but differ from each other, and indicate that the value of ΔG $Δ G$ corresponding to the minimum of 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ is not a good approximation for the equilibrium stability, because 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ gently changes in the vicinity of the equilibrium stability as shown in Fig. S.16.

3.3. Equilibrium stability, ΔG_e $Δ G_{e}$

The equilibrium value, ΔG_e $Δ G_{e}$ , of ΔG $Δ G$ that satisfies 〈ΔΔG_fixed〉=0 $〈 Δ Δ G_{fixed} 〉 = 0$ in fixed mutants depends directly on θ $θ$ and indirectly on 4N_eκ $4 N_{e} κ$ through fixation probability; see Eqs. (19) and (25). As shown in Fig. 5, ΔG_e $Δ G_{e}$ depends weakly on θ $θ$ . On the other hand, ΔG_e $Δ G_{e}$ depends more strongly on, and is almost negatively proportional to, $\log 4 N_{e} κ$ , as also shown in real proteins (Serohijos et al., 2013). If the dependence of the means, μ_s and μ_c in Eqs. (23) and (24), of ΔΔG $Δ Δ G$ in all mutants on ΔG $Δ G$ could be neglected, $4 N_{e} κ \exp (β Δ G)$ could be regarded as a single parameter, and so $4 N_{e} κ \exp (β Δ G_{e})$ would be constant, irrespective of 4N_eκ $4 N_{e} κ$ . Thus, the dependence of $\log 4 N_{e} κ \exp (β Δ G_{e})$ on $\log 4 N_{e} κ$ shown in Fig. 5 is caused solely by the linear dependence of the means μ_s and μ_c of ΔΔG $Δ Δ G$ on ΔG $Δ G$ (Serohijos et al., 2012). Notably, as $\log 4 N_{e} κ$ varies from 0 to 20, ΔG_e $Δ G_{e}$ changes from −1.5 to −12.5 kcal/mol, a range that is consistent with the experimental values of protein folding free energies shown in Fig. 1.
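The single-parameter argument above can be written out as a short derivation. It is only a sketch: the symbol $A^{*}$ for the constant value of $4 N_{e} κ \exp (β Δ G_{e})$ is introduced here for illustration, and T = 298 K is an assumed temperature for the numerical value of kT.

```latex
\begin{aligned}
4 N_{e} κ \exp (β Δ G_{e}) &= A^{*} \quad \text{(constant, if the PDF of } Δ Δ G \text{ did not depend on } Δ G \text{)} \\
β Δ G_{e} &= \log A^{*} - \log 4 N_{e} κ \\
\frac{d Δ G_{e}}{d \log 4 N_{e} κ} &= - kT \approx - 0.59 \ \text{kcal/mol} \quad (T = 298 \ \text{K})
\end{aligned}
```

For comparison, a change of ΔG_e $Δ G_{e}$ from −1.5 to −12.5 kcal/mol over 20 units of $\log 4 N_{e} κ$ corresponds to an average slope of about −0.55 kcal/mol; within this sketch, the difference from −kT reflects the dependence of the means of ΔΔG $Δ Δ G$ on ΔG $Δ G$ .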

Fig. 5.

Dependence of equilibrium stability, ΔG_e $Δ G_{e}$ , on parameters, 4N_eκ $4 N_{e} κ$ and θ $θ$ . ΔG_e $Δ G_{e}$ is the equilibrium value of folding free energy, ΔG $Δ G$ , in protein evolution. The value of $β Δ G_{e} + \log 4 N_{e} κ$ is the upper bound of $\log 4 N_{e} s$ , and would be constant if the mean of ΔΔG $Δ Δ G$ in all arising mutants did not depend on ΔG $Δ G$ ; see Eq. (19). The kcal/mol unit is used for ΔG_e $Δ G_{e}$ .

3.4. K_a/K_s $K_{a} / K_{s}$ at equilibrium, ΔG=ΔG_e $Δ G = Δ G_{e}$

Equations (23) and (24) indicate that the distribution of ΔΔG $Δ Δ G$ shifts toward the positive direction as ΔG $Δ G$ becomes more negative. Hence, increasing 4N_eκ $4 N_{e} κ$ , which makes ΔG_e $Δ G_{e}$ more negative, results in positive shifts of the distribution of ΔΔG $Δ Δ G$ , which increase destabilizing mutations. In addition, as indicated by Eq. (19), the upper bound of 4N_es $4 N_{e} s$ , $4 N_{e} κ \exp (β Δ G_{e})$ , scales the effect of ΔΔG $Δ Δ G$ on protein fitness. The larger $4 N_{e} κ \exp (β Δ G_{e})$ is, the larger the effect of ΔΔG $Δ Δ G$ on selective advantage becomes. Thus, the increase of 4N_eκexp(βΔG_e) $4 N_{e} κ \exp (β Δ G_{e})$ caused by the increase of κ $κ$ and/or N_e $N_{e}$ increases both destabilizing mutations and their fitness costs, and results in slow evolutionary rates for proteins with large κ $κ$ and/or N_e $N_{e}$ . In other words, highly expressed and indispensable genes, and genes with a large effective population size, must evolve slowly. On the other hand, the decrease of θ $θ$ , that is, the increase of highly constrained residues, directly shifts the average of ΔΔG $Δ Δ G$ in all arising mutants toward the positive direction, and causes slow evolutionary rates.

The average of K_a/K_s $K_{a} / K_{s}$ over all mutants, which can be observed as the ratio of average nonsynonymous substitution rate per nonsynonymous site to average synonymous substitution rate per synonymous site, and also that over fixed mutants only are shown in Fig. 6. At any value of θ $θ$ , 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ decreases as $\log 4 N_{e} κ$ increases, explaining the observed relationship that highly expressed and indispensable genes evolve slowly (Drummond et al., 2005, Drummond and Wilke, 2008 and Serohijos et al., 2012). Likewise, at any value of $\log 4 N_{e} κ$ , 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ increases as θ $θ$ increases. In other words, the more structurally constrained a protein is, the more slowly it evolves. The effect of protein abundance/indispensability on evolutionary rate is more remarkable for less constrained proteins and the effect of structural constraint is more remarkable for less abundant, less essential proteins.

Fig. 6.

The average of K_a/K_s $K_{a} / K_{s}$ over all mutants or over fixed mutants only at equilibrium of protein stability, ΔG=ΔG_e $Δ G = Δ G_{e}$ .

The average of K_a/K_s $K_{a} / K_{s}$ over all mutants, 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ , is less than 1.0 over the whole domain shown in the figure, indicating that the average of K_a/K_s $K_{a} / K_{s}$ over a long time interval and over many sites should not show any positive selection. Even the average of K_a/K_s $K_{a} / K_{s}$ over fixed mutants is less than 1.0, and falls into a narrow range of 0.97–0.85, which is much narrower than the range of 0.96–0.15 for that over all mutants; the average of K_a/K_s $K_{a} / K_{s}$ over fixed mutants is equal to 〈(K_a/K_s)²〉/〈K_a/K_s〉 $〈 {(K_{a} / K_{s})}^{2} 〉 / 〈 K_{a} / K_{s} 〉$ , and as a matter of course must be equal to or larger than the average of K_a/K_s $K_{a} / K_{s}$ over all mutants. However, the average of K_a/K_s $K_{a} / K_{s}$ over a short time interval and over a small number of sites may exhibit values larger than one. In Fig. 7, the PDFs of K_a/K_s $K_{a} / K_{s}$ for all mutants and also for fixed mutants only are shown; p(K_a/K_s)=p(4N_es)d(4N_es)/d(K_a/K_s) $p (K_{a} / K_{s}) = p (4 N_{e} s) d (4 N_{e} s) / d (K_{a} / K_{s})$ . A significant fraction of fixed mutants fix with K_a/K_s>1 $K_{a} / K_{s} > 1$ .
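The relation between the two averages follows from weighting each mutant by its fixation probability, which is proportional to its own K_a/K_s $K_{a} / K_{s}$ . A minimal sketch with made-up sample values (the list `sample` is purely illustrative and not data from the paper):

```python
def mean_over_all(kaks_values):
    """Plain average of Ka/Ks over all arising mutants."""
    return sum(kaks_values) / len(kaks_values)


def mean_over_fixed(kaks_values):
    """Average of Ka/Ks over fixed mutants.

    Each mutant is weighted by its fixation probability, which is
    proportional to Ka/Ks, so the result is <x^2> / <x>.
    """
    return sum(v * v for v in kaks_values) / sum(kaks_values)


# Illustrative values only.
sample = [0.1, 0.5, 1.0, 1.5]
all_mean = mean_over_all(sample)
fixed_mean = mean_over_fixed(sample)
```

Because 〈x²〉 ≥ 〈x〉² for any distribution, the fixed-mutant average can never fall below the all-mutant average, whatever values are supplied.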

Fig. 7.

PDFs of K_a/K_s $K_{a} / K_{s}$ in all mutants and in fixed mutants only at equilibrium of protein stability, ΔG=ΔG_e $Δ G = Δ G_{e}$ . Unless specified, $\log 4 N_{e} κ = 7.55$ and θ=0.53 $θ = 0.53$ are employed.

Arbitrarily, the value of K_a/K_s $K_{a} / K_{s}$ is categorized into four classes: negative, slightly negative, nearly neutral, and positive selection categories, whose K_a/K_s $K_{a} / K_{s}$ values are within the ranges of K_a/K_s≤0.5, 0.5<K_a/K_s≤0.95, 0.95<K_a/K_s≤1.05 $K_{a} / K_{s} \leq 0.5, 0.5 < K_{a} / K_{s} \leq 0.95, 0.95 < K_{a} / K_{s} \leq 1.05$ , and 1.05<K_a/K_s $1.05 < K_{a} / K_{s}$ , respectively. Then, the probabilities of each selection category in all mutants and in fixed mutants are calculated and shown in Fig. S.10 and Fig. 8, respectively. At the largest abundance ( $\log 4 N_{e} κ = 20$ ), most arising mutations are negative mutations whose K_a/K_s $K_{a} / K_{s}$ values are less than 0.5. This is reasonable, because under this condition the wild-type protein is very stable with low equilibrium values ΔG_e $Δ G_{e}$ as shown in Fig. 5, and therefore most mutations destabilize the wild type and tend to be removed from the population. Most fixed mutants are positive mutants or slightly negative mutants fixed by random drift. Nearly neutral mutants are less than 3% of all mutants, and less than 15% of fixed mutants.
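The four-way classification can be expressed directly in code. The thresholds below are the ones stated in the text; the function names are hypothetical and chosen only for this sketch.

```python
from collections import Counter


def selection_category(kaks):
    """Map a Ka/Ks value to one of the four selection categories."""
    if kaks <= 0.5:
        return "negative"
    if kaks <= 0.95:
        return "slightly negative"
    if kaks <= 1.05:
        return "nearly neutral"
    return "positive"


def category_fractions(kaks_values):
    """Fraction of values falling into each selection category."""
    counts = Counter(selection_category(v) for v in kaks_values)
    total = len(kaks_values)
    return {name: counts[name] / total
            for name in ("negative", "slightly negative",
                         "nearly neutral", "positive")}
```

Applied to K_a/K_s $K_{a} / K_{s}$ values of all arising mutants, `category_fractions` gives unweighted proportions; proportions among fixed mutants would additionally require weighting each value by its fixation probability.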

Fig. 8.

Probability of each selection category in fixed mutants at equilibrium of protein stability, ΔG=ΔG_e $Δ G = Δ G_{e}$ . Arbitrarily, the value of K_a/K_s $K_{a} / K_{s}$ is categorized into four classes; negative, slightly negative, nearly neutral, and positive selection categories in which K_a/K_s $K_{a} / K_{s}$ is within the ranges of K_a/K_s≤0.5,0.5<K_a/K_s≤0.95,0.95<K_a/K_s≤1.05 $K_{a} / K_{s} \leq 0.5, 0.5 < K_{a} / K_{s} \leq 0.95, 0.95 < K_{a} / K_{s} \leq 1.05$ , and 1.05<K_a/K_s $1.05 < K_{a} / K_{s}$ , respectively.

On the other hand, at the other extreme of $\log 4 N_{e} κ < 2$ , there are no mutations of the positive selection category. This is because the upper bound of K_a/K_s $K_{a} / K_{s}$ , which corresponds to the upper bound ( $4 N_{e} κ \exp (β Δ G)$ ) of 4N_es $4 N_{e} s$ , at the equilibrium stability ΔG_e $Δ G_{e}$ becomes less than 1.05, which is the lower bound for the positive selection category; see Eq. (19). A significant fraction of mutations becomes nearly neutral. As θ $θ$ changes from 1 to 0, that is, as structural constraints increase, the proportion of nearly neutral mutations changes from 0.75(0.56) to 0.31(0.22), and instead negative mutations increase and most of them are removed from the population. Thus, the selection mechanism for fixing stabilizing mutants in little expressed, non-essential genes ( $\log 4 N_{e} κ < 2$ ) is not positive selection but nearly neutral selection, that is, random drift.
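The link between the upper bound of 4N_es $4 N_{e} s$ and the upper bound of K_a/K_s $K_{a} / K_{s}$ can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: K_a/K_s is approximated by S/(1 − exp(−S)) with S = 4N_es (a standard approximation to Kimura's fixation probability relative to a neutral mutant), and kT = 0.593 kcal/mol (about 298 K) is assumed.

```python
import math


def kaks(S):
    """Ka/Ks approximated as S / (1 - exp(-S)), with S = 4*Ne*s."""
    if S < -700.0:       # strongly deleterious: fixation is negligible
        return 0.0
    if abs(S) < 1e-12:   # neutral limit
        return 1.0
    return S / -math.expm1(-S)


def kaks_upper_bound(log4nek, dg, kt=0.593):
    """Ka/Ks at the upper bound of 4*Ne*s, i.e. 4*Ne*kappa*exp(dG/kT)."""
    return kaks(math.exp(log4nek + dg / kt))


def scaled_selection_for(target_kaks, lo=1e-9, hi=50.0, iters=80):
    """Bisection for the S > 0 at which kaks(S) equals target_kaks (> 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kaks(mid) < target_kaks:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this approximation, `scaled_selection_for(1.05)` gives the smallest upper bound of 4N_es $4 N_{e} s$ at which any mutation can reach the positive selection category; proteins whose $4 N_{e} κ \exp (β Δ G_{e})$ lies below that value have none.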

The probability of each selection category in fixed mutants depends strongly on 4N_eκ $4 N_{e} κ$ , but much less on θ $θ$ . The current common understanding is that amino acid substitutions in protein evolution are either neutral or lethal (Kimura, 1968), or at most slightly deleterious or lethal (Ohta, 1973), unless functional selection operates and functional changes occur. In contrast, in the present model nearly neutral fixations are predominant only in proteins with $\log 4 N_{e} κ < 2$ or ΔG_e>−2.5 $Δ G_{e} > - 2.5$ kcal/mol, and positive selection is significant in the other proteins. On the other hand, slightly negative selection is always significant. An interesting result is that the effects of structural constraint on K_a/K_s $K_{a} / K_{s}$ are the most remarkable in proteins with $\log 4 N_{e} κ < 2$ or, equivalently, ΔG_e>−2.5 $Δ G_{e} > - 2.5$ kcal/mol, in which nearly neutral fixations are predominant.

3.5. K_a/K_s $K_{a} / K_{s}$ in the vicinity of equilibrium

In Fig. 4, 〈ΔΔG〉_fixed ${〈 Δ Δ G 〉}_{fixed}$ plus and minus the standard deviation of ΔΔG $Δ Δ G$ over fixed mutants is also drawn. The standard deviation of ΔΔG $Δ Δ G$ of fixed mutants is equal to 0.84 kcal/mol at the equilibrium, ΔG_e $Δ G_{e}$ , indicating that protein stability ΔG $Δ G$ fluctuates more or less within ΔG_e±0.84 $Δ G_{e} \pm 0.84$ kcal/mol instantaneously. Such a deviation from the equilibrium must be canceled by compensatory substitutions that occur consecutively; otherwise the protein stability would depart far from its equilibrium point.

In Fig. 9 and Fig. 10 and Figs. S.12 and S.14, the probabilities of each selection category in fixed mutants and in all arising mutants are shown as a function of ΔG $Δ G$ and 4N_eκ $4 N_{e} κ$ or θ $θ$ , respectively. The range of ΔG $Δ G$ shown around ΔG_e $Δ G_{e}$ , which is marked by a blue line on the surface grid, is within two standard deviations of ΔΔG $Δ Δ G$ over fixed mutants at ΔG_e $Δ G_{e}$ .

Fig. 9.

Dependence of the probability of each selection category in fixed mutants on 4N_eκ $4 N_{e} κ$ and ΔG $Δ G$ . A blue line on the surface grid shows ΔG=ΔG_e $Δ G = Δ G_{e}$ , which is the equilibrium value of ΔG $Δ G$ in protein evolution. The range of ΔG $Δ G$ shown in the figures is $| Δ G - Δ G_{e} | < 2 \cdot Δ Δ G_{fixed}^{sd}$ , where $Δ Δ G_{fixed}^{sd}$ is the standard deviation of ΔΔG $Δ Δ G$ over fixed mutants at ΔG=ΔG_e $Δ G = Δ G_{e}$ . Arbitrarily, the value of K_a/K_s $K_{a} / K_{s}$ is categorized into four classes; negative, slightly negative, nearly neutral, and positive selection categories in which K_a/K_s $K_{a} / K_{s}$ is within the ranges of K_a/K_s≤0.5,0.5<K_a/K_s≤0.95,0.95<K_a/K_s≤1.05 $K_{a} / K_{s} \leq 0.5, 0.5 < K_{a} / K_{s} \leq 0.95, 0.95 < K_{a} / K_{s} \leq 1.05$ , and 1.05<K_a/K_s $1.05 < K_{a} / K_{s}$ , respectively. θ=0.53 $θ = 0.53$ is employed. The kcal/mol unit is used for ΔG $Δ G$ . (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 10.

Dependence of the probability of each selection category in fixed mutants on θ $θ$ and ΔG $Δ G$ . A blue line on the surface grid shows ΔG=ΔG_e $Δ G = Δ G_{e}$ , which is the equilibrium value of ΔG $Δ G$ in protein evolution. The range of ΔG $Δ G$ shown in the figures is $| Δ G - Δ G_{e} | < 2 \cdot Δ Δ G_{fixed}^{sd}$ , where $Δ Δ G_{fixed}^{sd}$ is the standard deviation of ΔΔG $Δ Δ G$ over fixed mutants at ΔG=ΔG_e $Δ G = Δ G_{e}$ . Arbitrarily, the value of K_a/K_s $K_{a} / K_{s}$ is categorized into four classes; negative, slightly negative, nearly neutral, and positive selection categories in which K_a/K_s $K_{a} / K_{s}$ is within the ranges of K_a/K_s≤0.5,0.5<K_a/K_s≤0.95,0.95<K_a/K_s≤1.05 $K_{a} / K_{s} \leq 0.5, 0.5 < K_{a} / K_{s} \leq 0.95, 0.95 < K_{a} / K_{s} \leq 1.05$ , and 1.05<K_a/K_s $1.05 < K_{a} / K_{s}$ , respectively. $\log 4 N_{e} κ = 7.55$ is employed. The kcal/mol unit is used for ΔG $Δ G$ . (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

As indicated by Eqs. (23) and (24), it is shown in Figs. S.12 and S.14 that stabilizing mutations increase due to negative shifts of ΔΔG $Δ Δ G$ as the wild type becomes less stable than the equilibrium, ΔG>ΔG_e $Δ G > Δ G_{e}$ , and that destabilizing mutations increase due to positive shifts of ΔΔG $Δ Δ G$ as ΔG<ΔG_e $Δ G < Δ G_{e}$ . In addition, as indicated by Eq. (19), it is shown in Fig. 9 and Fig. 10 that positive selection on stabilizing mutants is more amplified as ΔG>ΔG_e $Δ G > Δ G_{e}$ , and that more destabilizing mutations become nearly neutral due to the less-amplified effect of stability change on selective advantage as ΔG<ΔG_e $Δ G < Δ G_{e}$ . This is a mechanism of maintaining protein stability at equilibrium.

3.6. Lower bound of K_a/K_s $K_{a} / K_{s}$ for adaptive substitutions on protein function

The observed value of K_a/K_s>1 $K_{a} / K_{s} > 1$ is often used to indicate functional selection. The averages of K_a/K_s $K_{a} / K_{s}$ over all mutants and even over fixed mutants are less than 1 as shown in Fig. 6. Therefore, the average of K_a/K_s $K_{a} / K_{s}$ over long time or many sites does not indicate positive selection. However, the probability of K_a/K_s>1 $K_{a} / K_{s} > 1$ is not negligible as shown in Fig. 7 and Fig. 8. Then, a question is how large K_a/K_s $K_{a} / K_{s}$ due to selection on protein stability can be.

The distribution of K_a/K_s $K_{a} / K_{s}$ significantly changes with ΔG $Δ G$ , as shown in Fig. S.6 and Fig. 11. It may be appropriate to examine the average of K_a/K_s $K_{a} / K_{s}$ , 〈K_a/K_s〉_fixed ${〈 K_{a} / K_{s} 〉}_{fixed}$ , in mutants fixed at ΔG>ΔG_e $Δ G > Δ G_{e}$ , because the upper bound of K_a/K_s $K_{a} / K_{s}$ becomes larger for ΔG>ΔG_e $Δ G > Δ G_{e}$ than at the equilibrium, and also positive mutants must fix to improve the protein stability of the wild type. Fig. 11 shows that 〈K_a/K_s〉_fixed ${〈 K_{a} / K_{s} 〉}_{fixed}$ can be very large for proteins with low equilibrium stabilities (large 4N_eκ $4 N_{e} κ$ and small θ $θ$ ), although 〈K_a/K_s〉_fixed ~ 1 for ΔG<ΔG_e $Δ G < Δ G_{e}$ , where nearly neutral and slightly negative selections are predominant; 〈K_a/K_s〉_fixed is 1.7 (1.2) at $Δ G_{e} + Δ Δ G_{fixed}^{sd}$ and 6.1 (5.6) at $Δ G_{e} + 2 \cdot Δ Δ G_{fixed}^{sd}$ for $\log 4 N_{e} κ = 20 (θ = 0.0)$ , where $Δ Δ G_{fixed}^{sd}$ means the standard deviation of ΔΔG $Δ Δ G$ in fixed mutants at ΔG_e $Δ G_{e}$ . 85% of fixed mutants have ΔΔG $Δ Δ G$ within the standard deviation. Therefore, a lower bound for adaptive substitutions may be taken to be about 1.7, which is almost equal to the upper bound of K_a/K_s $K_{a} / K_{s}$ at the equilibrium for $\log 4 N_{e} κ = 20$ and θ=1 $θ = 1$ ; see Fig. S.17. However, as shown in Fig. S.17, the more genes are expressed and/or the stronger structural constraints are, the larger the upper bound of K_a/K_s $K_{a} / K_{s}$ at the equilibrium is. Judging adaptive changes may require not only K_a/K_s>1 $K_{a} / K_{s} > 1$ but also other supporting evidence, such as substitutions being localized at specific sites.

Fig. 11.

Dependence of the average of K_a/K_s $K_{a} / K_{s}$ over all mutants or over fixed mutants only on protein stability, ΔG $Δ G$ , of the wild type. A blue line on the surface grid shows ΔG=ΔG_e $Δ G = Δ G_{e}$ , which is the equilibrium value of ΔG $Δ G$ in protein evolution. The range of ΔG $Δ G$ shown in the figures is $| Δ G - Δ G_{e} | < 2 \cdot Δ Δ G_{fixed}^{sd}$ , where $Δ Δ G_{fixed}^{sd}$ is the standard deviation of ΔΔG $Δ Δ G$ over fixed mutants at ΔG=ΔG_e $Δ G = Δ G_{e}$ . Unless specified, $\log 4 N_{e} κ = 7.55$ and θ=0.53 $θ = 0.53$ are employed. The kcal/mol unit is used for ΔG $Δ G$ . (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

3.7. Dependences of protein stability (βΔG_e) $(β Δ G_{e})$ and evolutionary rate (〈K_a/K_s〉) $(〈 K_{a} / K_{s} 〉)$ on growth temperature

It is natural that the folding free energies, ΔG_e $Δ G_{e}$ , of proteins in organisms growing at higher temperatures must be lower than those at the normal temperature, in order to attain the same stabilities and fitnesses as at the normal temperature. Equations (2) and (15) indicate that the same stability and fitness will be attained if βΔG_e $β Δ G_{e}$ is constant. This means that it is sufficient for ΔG_e $Δ G_{e}$ at $100 ° C$ to be 373/298=1.25 $373 / 298 = 1.25$ times that at the normal temperature $(25 ° C)$ , that is, more negative by that factor. Is this the figure expected for folding free energies of thermophilic proteins at high growth temperature? There are not enough data on ΔG $Δ G$ at high temperature in ProTherm (Kumar et al., 2006) to answer this question; $Δ G (T = 75 ° C) = 10.76$ kcal/mol for the oxidized and 4.3 kcal/mol for the reduced CuA domain of cytochrome oxidase from Thermus thermophilus (Wittung-Stafshede et al., 1998) and $Δ G (T = 60 ° C) = 13.01$ kcal/mol for pyrrolidone carboxyl peptidase from Pyrococcus (Ogasahara et al., 1998). The present model indicates that βΔG_e $β Δ G_{e}$ slightly increases as growth temperature increases.
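The factor of 1.25 is simple arithmetic on the constant-βΔG_e condition. The sketch below only restates that scaling; the function name and the example reference value are hypothetical, and the rounded temperatures 298 K and 373 K are those used in the text.

```python
def dg_at_constant_beta_dg(dg_ref, t_ref=298.0, t_new=373.0):
    """Folding free energy at t_new that keeps dG/(kT) equal to its value at t_ref.

    Constant beta*dG means dG_new / t_new == dG_ref / t_ref (temperatures in kelvin).
    """
    return dg_ref * t_new / t_ref


ratio = 373.0 / 298.0  # scaling factor from 25 C to 100 C
# Example with a hypothetical reference stability of -5.0 kcal/mol at 298 K.
dg_hot = dg_at_constant_beta_dg(-5.0)
```

This is the scaling that would hold if βΔG_e $β Δ G_{e}$ were strictly constant; the model's own prediction, stated above, is that βΔG_e $β Δ G_{e}$ increases slightly with growth temperature.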

In Fig. 12 and Fig. S.18, $β Δ G_{e} + \log 4 N_{e} κ$ is shown as a function of absolute temperature T $T$ and $\log 4 N_{e} κ$ or θ $θ$ , assuming that the distribution of ΔΔG $Δ Δ G$ and its dependency on ΔG $Δ G$ do not depend on T $T$ , that is, Eqs. (22), (23) and (24). At fixed values of $\log 4 N_{e} κ$ and θ $θ$ , $β Δ G_{e} + \log 4 N_{e} κ$ increases as T $T$ increases, meaning that protein stability, −βΔG $- β Δ G$ , decreases as growth temperature increases. This tendency is slightly larger at smaller values of $\log 4 N_{e} κ$ , that is, for less abundant proteins.

Fig. 12.

Dependence of equilibrium stability, ΔG_e $Δ G_{e}$ , on parameters, 4N_eκ $4 N_{e} κ$ and T $T$ . ΔG_e $Δ G_{e}$ is the equilibrium value of folding free energy, ΔG $Δ G$ , in protein evolution. T $T$ is absolute temperature; β=1/kT $β = 1 / kT$ , where k $k$ is the Boltzmann constant. Equations (22), (23) and (24) are assumed for the distribution of ΔΔG $Δ Δ G$ and its dependency on ΔG $Δ G$ ; they are assumed to be independent of T $T$ . θ=0.53 $θ = 0.53$ is employed. The value of $β Δ G_{e} + \log 4 N_{e} κ$ is the upper bound of $\log 4 N_{e} s$ , and would not depend on $\log 4 N_{e} κ$ if the mean of ΔΔG $Δ Δ G$ in all arising mutants did not depend on ΔG $Δ G$ ; see Eq. (19). The kcal/mol unit is used for ΔG_e $Δ G_{e}$ .

The effects of growth temperature on K_a/K_s $K_{a} / K_{s}$ are shown in Fig. S.19. The present model predicts that 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ decreases as growth temperature increases, provided that no other parameter changes.

4. Discussion

Recently, fitness costs due to misfolded proteins have been widely noticed, particularly in neurological disorders linked to misfolded protein toxicity (Bucciantini et al., 2002). Fitness costs that originate in functional loss (Geiler-Samerotte et al., 2011) and in diversion of protein synthesis and aggregation of proteins have been evaluated (Drummond and Wilke, 2008) to be related to the proportion of misfolded proteins. Also, previous studies indicate that the factors that relate protein stability to protein fitness are protein abundance, protein indispensability, and structural constraints of proteins. Current knowledge of protein folding can provide an exact formulation for the proportion of misfolded proteins as a function of folding free energy, and reasonable predictions (Schymkowitz et al., 2005, Yin et al., 2007 and Tokuriki et al., 2007) of stability changes due to single amino acid substitutions in protein native structures. Thus, on the basis of knowledge of protein biophysics, it has become possible to study the effects of amino acid substitutions on protein stability and then on protein evolution (Drummond and Wilke, 2008, Serohijos et al., 2012, Serohijos and Shakhnovich, 2014, Echave et al., 2015 and Faure and Koonin, 2015).

Here, the effects of protein abundance and indispensability (κ) $(κ)$ and of structural constraint (θ) $(θ)$ on protein evolutionary rate (K_a/K_s) $(K_{a} / K_{s})$ have been examined in detail. The two effects are represented with different functional forms. Structural constraints affect the distribution of stability change ΔΔG $Δ Δ G$ due to mutations. On the other hand, protein abundance/indispensability affects the effectiveness of stability change on protein fitness as well as the distribution of ΔΔG $Δ Δ G$ .

The common understanding of protein evolution has been that amino acid substitutions found in homologous proteins are selectively neutral (Kimura, 1968, Kimura, 1969, Kimura and Ohta, 1971 and Kimura and Ohta, 1974) or slightly deleterious (Ohta, 1973 and Ohta, 1992), and that random drift is the primary force that fixes amino acid substitutions in a population. However, there is a selection maintaining protein stability at equilibrium (Drummond and Wilke, 2008, Serohijos et al., 2012 and Serohijos and Shakhnovich, 2014). From the present analysis of the PDF of K_a/K_s $K_{a} / K_{s}$ , it has become clear how the equilibrium of stability is maintained; see Fig. 9 and Fig. 10. In less-stable proteins of ΔG>ΔG_e $Δ G > Δ G_{e}$ , more stabilizing mutations fix due to positive selection, because negative shifts of ΔΔG $Δ Δ G$ increase stabilizing mutants and the effect of stability change on selective advantage is also more amplified; see Eqs. (23), (24) and (19). In more-stable proteins of ΔG<ΔG_e $Δ G < Δ G_{e}$ , more destabilizing mutants are fixed by random drift, because positive shifts of ΔΔG $Δ Δ G$ increase destabilizing mutants and more destabilizing mutants also become nearly neutral owing to the less-amplified effect of stability change on selective advantage. It has been revealed that, contrary to the neutral theory, nearly neutral selection is predominant only in low-abundant, non-essential proteins with $\log 4 N_{e} κ < 2$ or with low equilibrium stability $(Δ G_{e} > - 2.5 kcal / mol)$ ; see Fig. 8.

The average 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ and even 〈K_a/K_s〉_fixed ${〈 K_{a} / K_{s} 〉}_{fixed}$ at equilibrium stability ΔG=ΔG_e $Δ G = Δ G_{e}$ are less than one over the whole parameter range; see Fig. 6. Hence, as far as selection is on protein stability, the average of K_a/K_s $K_{a} / K_{s}$ over a long time interval and over many sites will be expected to be less than one, if all synonymous mutations are neutral (Spielman and Wilke, 2015). However, because the probability of K_a/K_s>1 $K_{a} / K_{s} > 1$ is significant, branches with K_a/K_s>1 $K_{a} / K_{s} > 1$ in phylogenetic trees may be observed, as observed in a population dynamics simulation (Serohijos and Shakhnovich, 2014), even though synonymous mutations are neutral and no adaptive selection operates on protein function. According to the present estimate, a lower bound of K_a/K_s $K_{a} / K_{s}$ to indicate adaptive substitutions must be at least as large as 1.7.

Protein equilibrium stability (ΔG_e) $(Δ G_{e})$ has been clearly described here as a function of 4N_eκ $4 N_{e} κ$ and θ $θ$ . The more expressed a gene is (the larger 4N_eκ $4 N_{e} κ$ is), the more stable the wild-type protein at equilibrium is (the more negative ΔG_e $Δ G_{e}$ becomes); see Fig. 5. The decrease of ΔG_e $Δ G_{e}$ shifts the distribution of ΔΔG $Δ Δ G$ toward the positive direction, generating more highly destabilizing mutants; see Eqs. (23) and (24). In addition, as 4N_eκ $4 N_{e} κ$ increases, the net effect, $4 N_{e} κ \exp (β Δ G_{e})$ , increases and further amplifies the effects of stability changes (ΔΔG) $(Δ Δ G)$ on selective advantage (s) $(s)$ ; see Fig. 5 and Eq. (19). As a result, highly expressed and indispensable genes, and genes with a large effective population size, evolve slowly; see Fig. 6. However, if the distribution of ΔΔG $Δ Δ G$ did not depend on ΔG $Δ G$ , $4 N_{e} κ \exp (β Δ G_{e})$ would be constant, and K_a/K_s $K_{a} / K_{s}$ would not depend on 4N_eκ $4 N_{e} κ$ , that is, protein abundance/indispensability and effective population size.

On the other hand, structural constraints on proteins affect protein evolutionary rate by changing the distribution of ΔΔG $Δ Δ G$ due to amino acid substitutions. As shown in Fig. 6, at any value of $\log 4 N_{e} κ$ , 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ decreases as θ $θ$ decreases. In other words, the more a protein is structurally constrained, the more slowly it evolves, as claimed by Zuckerkandl (1976). Fig. 6 shows that the effect of protein abundance/indispensability on evolutionary rate is more remarkable for less constrained proteins, and the effect of structural constraint is more remarkable for less abundant, less essential proteins.

As a result, the average of K_a/K_s $K_{a} / K_{s}$ over all arising mutants decreases roughly by 0.4–0.8 as $\log 4 N_{e} κ$ increases from 0 to 20; see Fig. 6. On the other hand, it decreases by 0.1–0.4 as the proportion of the residues of the surface type, θ $θ$ , decreases from 1 to 0. For monomeric, globular proteins, the proportion of protein surface may range from 0.7 to 0.45. Thus, in typical globular proteins, protein abundance/indispensability may cause larger differences of evolutionary rate between proteins than structural constraint. However, proteins that interact with other molecules on the protein surface effectively have fewer residues of the protein-surface type (Franzosa and Xia, 2009). Both protein abundance/indispensability and structural constraint must be taken into account for protein evolutionary rate.

Protein abundance and indispensability both affect evolutionary rate similarly through protein fitness. It was shown in real proteins that protein abundance correlates with evolutionary rate (Pál et al., 2001). The present model of protein fitness Eq. (19) also indicates that protein indispensability must correlate with evolutionary rate (Hirsh and Fraser, 2001, 2003), but a correlation between them may be hidden by the variation of protein abundance and detected only in low-abundant proteins (Pál et al., 2003); see Eq. (21). In addition, effective population size must affect ΔG_e $Δ G_{e}$ and 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ together with κ $κ$ as 4N_eκ $4 N_{e} κ$ .

In the present model, protein equilibrium stability (ΔG_e) $(Δ G_{e})$ and evolutionary rate (〈K_a/K_s〉) $(〈 K_{a} / K_{s} 〉)$ are predictable from θ $θ$ and 4N_eκ $4 N_{e} κ$ . The proportion of the surface type of residues may be estimated as those whose surface accessibility values (ASA) are less than 0.25 (Tokuriki et al., 2007), but experimental measurements of protein abundance, indispensability, and effective population size to determine 4N_eκ $4 N_{e} κ$ may be relatively hard. Instead the experimental value of protein stability may be employed as equilibrium stability to predict evolutionary rate and others, although it is not an independent variable. Fig. 13 shows evolutionary rate as a function of ΔG_e $Δ G_{e}$ and θ $θ$ . Needless to say, mutational effects on ΔΔG $Δ Δ G$ , such as θ $θ$ and the distribution of ΔΔG $Δ Δ G$ , must be well estimated for various categories of proteins (Faure and Koonin, 2015) to obtain successful predictions. Also, accurate estimations of ΔG $Δ G$ for various proteins are needed to examine the present predictions. It is interesting to examine if protein stability (−βΔG) $(- β Δ G)$ and the average of K_a/K_s $K_{a} / K_{s}$ decrease as growth temperature increases.

Fig. 13.

The average of K_a/K_s $K_{a} / K_{s}$ over all mutants as a function of ΔG_e $Δ G_{e}$ and θ $θ$ .

5. Conclusions

•: The range, −2 to −12.5 kcal/mol, of equilibrium values, ΔG_e $Δ G_{e}$ , of protein stability calculated with the present fitness model is consistent with the distribution of experimental values shown in Fig. 1.
•: Contrary to the neutral theory, nearly neutral selection is predominant only in low-abundant, non-essential proteins of $\log 4 N_{e} κ < 2$ or ΔG_e>−2.5 $Δ G_{e} > - 2.5$ kcal/mol. In the other proteins, positive selection on stabilizing mutations is significant to maintain protein stability at equilibrium as well as random drift on slightly negative mutations. However, 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ and even 〈K_a/K_s〉_fixed ${〈 K_{a} / K_{s} 〉}_{fixed}$ at ΔG=ΔG_e $Δ G = Δ G_{e}$ are less than 1.
•: Protein abundance/indispensability (κ) $(κ)$ and effective population size (N_e) $(N_{e})$ affect evolutionary rate more strongly for less constrained proteins, and structural constraint (θ) $(θ)$ does so for less abundant, less essential proteins.
•: Protein indispensability must negatively correlate with evolutionary rate like protein abundance, but the correlation between them may be hidden by the variation of protein abundance and detected only in low-abundant proteins.
•: Evolutionary rates of proteins may be predicted from equilibrium stability (ΔG_e) $(Δ G_{e})$ and structural constraints (PDF of ΔΔG $Δ Δ G$ ) of the protein.
•: The present model indicates that protein stability (−βΔG_e) $(- β Δ G_{e})$ and 〈K_a/K_s〉 $〈 K_{a} / K_{s} 〉$ decrease as growth temperature increases.

Appendix A. Supplementary data

Application 1. PDF file (2975 K)

Application 2. ZIP file (13 K)

Application 3. ZIP file (9 K)

References

- Bucciantini et al., 2002
- M. Bucciantini, E. Giannoni, F. Chiti, F. Baroni, L. Formigli, J. Zurdo, N. Taddei, G. Ramponi, C.M. Dobson, M. Stefani
- Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases
- Nature, 416 (2002), pp. 507–511
- Crow and Kimura, 1970
- J.F. Crow, M. Kimura
- An Introduction to Population Genetics Theory
- Harper & Row publishers, New York (1970)
- Dasmeh et al., 2014
- P. Dasmeh, A.W. Serohijos, K.P. Kepp, E.I. Shakhnovich
- The influence of selection for protein stability on dN/dS estimations
- Genome Biol. Evol., 6 (2014), pp. 2956–2967
- Drummond et al., 2005
- D.A. Drummond, J.D. Bloom, C. Adami, C.O. Wilke, F.H. Arnold
- Why highly expressed proteins evolve slowly
- Proc. Natl. Acad. Sci. USA, 102 (40) (2005), pp. 14338–14343
- Drummond and Wilke, 2008
- D.A. Drummond, C.O. Wilke
- Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution
- Cell, 134 (2) (2008), pp. 341–352 http://dx.doi.org/10.1016/j.cell.2008.05.042
- Duret and Mouchiro, 2000
- L. Duret, D. Mouchiroud
- Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate
- Mol. Biol. Evol., 17 (2000), pp. 68–85
- Echave et al., 2015
- J. Echave, E.L. Jackson, C.O. Wilke
- Relationship between protein thermodynamic constraints and variation of evolutionary rates among sites
- Phys. Biol., 12 (2) (2015), p. 025002 〈http://stacks.iop.org/1478-3975/12/i=2/a=025002〉
- Faure and Koonin, 2015
- G. Faure, E.V. Koonin
- Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins
- Phys. Biol., 12 (3) (2015), p. 035001 〈http://stacks.iop.org/1478-3975/12/i=3/a=035001〉
- Franzosa and Xia, 2009
- E.A. Franzosa, Y. Xia
- Structural determinants of protein evolution are context-sensitive at the residue level
- Mol. Biol. Evol., 26 (2009), pp. 2387–2395
- Fraser et al., 2002
- H.B. Fraser, A.E. Hirsh, L.M. Steinmetz, C. Scharfe, M.W. Feldman
- Evolutionary rate in the protein interaction network
- Science, 296 (2002), pp. 750–752
- Geiler-Samerotte et al., 2011
- K.A. Geiler-Samerotte, M.F. Dion, B.A. Budnik, S.M. Wang, D.L. Hartl, D.A. Drummond
- Misfolded proteins impose a dosage-dependent fitness cost and trigger a cytosolic unfolded protein response in yeast
- Proc. Natl. Acad. Sci. USA, 108 (2011), pp. 680–685
- Ghaemmaghami et al., 2003
- S. Ghaemmaghami, W.-K. Huh, K. Bower, R.W. Howson, A. Belle, N. Dephoure, E.K. O'Shea, J.S. Weissman
- Global analysis of protein expression in yeast
- Nature, 425 (2003), pp. 737–741
- Go and Miyazawa, 1980
- M. Go, S. Miyazawa
- Relationship between mutability, polarity and exteriority of amino acid residues in protein evolution
- Int. J. Pept. Protein Res., 15 (1980), pp. 211–224
- Hirsh and Fraser, 2001
- A.E. Hirsh, H.B. Fraser
- Protein dispensability and rate of evolution
- Nature, 411 (2001), pp. 1047–1049
- Hirsh and Fraser, 2003
- A.E. Hirsh, H.B. Fraser
- Genomic function (communication arising): rate of evolution and gene dispensability
- Nature, 421 (2003), pp. 497–498
- Jordan et al., 2002
- I.K. Jordan, I.B. Rogozin, Y.I. Wolf, E.V. Koonin
- Essential genes are more evolutionarily conserved than are nonessential genes in bacteria
- Genome Res., 12 (2002), pp. 962–968
- Kimura, 1968
- M. Kimura
- Evolutionary rate at the molecular level
- Nature, 217 (1968), pp. 624–626
- Kimura, 1969
- M. Kimura
- The rate of molecular evolution considered from the standpoint of population genetics
- Proc. Natl. Acad. Sci. USA, 63 (1969), pp. 1181–1188
- Kimura and Ohta, 1971
- M. Kimura, T. Ohta
- Protein polymorphism as a phase of molecular evolution
- Nature, 229 (1971), pp. 467–469
- Kimura and Ohta, 1974
- M. Kimura, T. Ohta
- On some principles governing molecular evolution
- Proc. Natl. Acad. Sci. USA, 71 (1974), pp. 2848–2852

- Kuma et al., 1995
- K. Kuma, N. Iwabe, T. Miyata
- Functional constraints against variations on molecules from the tissue level: slowly evolving brain-specific genes demonstrated by protein kinase and immunoglobulin supergene families
- Mol. Biol. Evol., 12 (1995), pp. 123–130
- Kumar et al., 2006
- M. Kumar, K. Bava, M. Gromiha, P. Prabakaran, K. Kitajima, H. Uedaira, A. Sarai
- ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions
- Nucleic Acids Res., 34 (2006), pp. D204–D206
- Lynch and Conery, 2003
- M. Lynch, J.S. Conery
- The origins of genome complexity
- Science, 302 (2003), pp. 1401–1404
- Miyata and Yasunaga, 1980
- T. Miyata, T. Yasunaga
- Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its applications
- J. Mol. Evol., 16 (1980), pp. 23–36
- Miyazawa and Jernigan, 1982a
- S. Miyazawa, R.L. Jernigan
- Equilibrium folding and unfolding pathways for a model protein
- Biopolymers, 21 (1982), pp. 1333–1363
- Miyazawa and Jernigan, 1982b
- S. Miyazawa, R.L. Jernigan
- Most probable intermediates in protein folding-unfolding with a non-interacting globule-coil model
- Biochemistry, 21 (1982), pp. 5203–5213
- Ogasahara et al., 1998
- K. Ogasahara, M. Nakamura, S. Nakura, S. Tsunasawa, I. Kato, T. Yoshimoto, K. Yutani
- The unusually slow unfolding rate causes the high stability of pyrrolidone carboxyl peptidase from a hyperthermophile, Pyrococcus furiosus: equilibrium and kinetic studies of guanidine hydrochloride-induced unfolding and refolding
- Biochemistry, 37 (1998), pp. 17537–17544
- Ohta, 1973
- T. Ohta
- Slightly deleterious mutant substitutions in evolution
- Nature, 246 (1973), pp. 96–98
- Ohta, 1992
- T. Ohta
- The nearly neutral theory of molecular evolution
- Annu. Rev. Ecol. Syst., 23 (1992), pp. 263–286
- Pál et al., 2001
- C. Pál, B. Papp, L.D. Hurst
- Highly expressed genes in yeast evolve slowly
- Genetics, 158 (2001), pp. 927–931
- Pál et al., 2003
- C. Pál, B. Papp, L.D. Hurst
- Genomic function (communication arising): rate of evolution and gene dispensability
- Nature, 421 (2003), pp. 496–497
- Schymkowitz et al., 2005
- J. Schymkowitz, J. Borg, F. Stricher, R. Nys, F. Rousseau, L. Serrano
- The FoldX web server: an online force field
- Nucleic Acids Res., 33 (2005), pp. W382–W388
- Serohijos et al., 2012
- A. Serohijos, Z. Rimas, E. Shakhnovich
- Protein biophysics explains why highly abundant proteins evolve slowly
- Cell Rep., 2 (2) (2012), pp. 249–256 〈http://dx.doi.org/10.1016/j.celrep.2012.06.022〉
- Serohijos et al., 2013
- A. Serohijos, S.Y. Ryan Lee, E. Shakhnovich
- Highly abundant proteins favor more stable 3D structures in yeast
- Biophys. J., 104 (3) (2013), pp. L1–L3 http://dx.doi.org/10.1016/j.bpj.2012.11.3838
- Serohijos and Shakhnovich, 2014
- A.W. Serohijos, E.I. Shakhnovich
- Contribution of selection for protein folding stability in shaping the patterns of polymorphisms in coding regions
- Mol. Biol. Evol., 31 (2014), pp. 165–176
- Spielman and Wilke, 2015
- S.J. Spielman, C.O. Wilke
- The relationship between dN/dS and scaled selection coefficients
- Mol. Biol. Evol., 32 (2015), pp. 1097–1108
- Stoebel et al., 2008
- D.M. Stoebel, A.M. Dean, D.E. Dykhuizen
- The cost of expression of Escherichia coli lac operon proteins is in the process, not in the product
- Genetics, 178 (2008), pp. 1653–1660
- Tokuriki et al., 2007
- N. Tokuriki, F. Stricher, J. Schymkowitz, L. Serrano, D.S. Tawfik
- The stability effects of protein mutations appear to be universally distributed
- J. Mol. Biol., 369 (2007), pp. 1318–1332
- Wall et al., 2005
- D.P. Wall, A.E. Hirsh, H.B. Fraser, J. Kumm, G. Giaever, M.B. Eisen, M.W. Feldman
- Functional genomic analysis of the rates of protein evolution
- Proc. Natl. Acad. Sci. USA, 102 (2005), pp. 5483–5488
- Wittung-Stafshede et al., 1998
- P. Wittung-Stafshede, B.G. Malmstrom, D. Sanders, J.A. Fee, J.R. Winkler, H.B. Gray
- Effect of redox state on the folding free energy of a thermostable electron-transfer metalloprotein: the CuA domain of cytochrome oxidase from Thermus thermophilus
- Biochemistry, 37 (1998), pp. 3172–3177

- Yang et al., 2012
- J.-R. Yang, B.-Y. Liao, S.-M. Zhuang, J. Zhang
- Protein misinteraction avoidance causes highly expressed proteins to evolve slowly
- Proc. Natl. Acad. Sci. USA, 109 (2012), pp. E831–E840
- Yin et al., 2007
- S. Yin, F. Ding, N.V. Dokholyan
- Eris: an automated estimator of protein stability
- Nat. Methods, 4 (2007), pp. 466–467
- Zeldovich et al., 2007
- K.B. Zeldovich, P. Chen, E.I. Shakhnovich
- Protein stability imposes limits on organism complexity and speed of molecular evolution
- Proc. Natl. Acad. Sci. USA, 104 (2007), pp. 16152–16157
- Zhang and He, 2005
- J. Zhang, X. He
- Significant impact of protein dispensability on the instantaneous rate of protein evolution
- Mol. Biol. Evol., 22 (2005), pp. 1147–1155
- Zuckerkandl, 1976
- E. Zuckerkandl
- Evolutionary processes and evolutionary noise at the molecular level
- J. Mol. Evol., 7 (1976), pp. 167–183
