Hypotheses on the origin and evolution of metabolic pathways
The emergence and refinement of basic biosynthetic pathways allowed primitive organisms to become increasingly less dependent on exogenous sources of amino acids, purines, and other compounds that had accumulated in the primitive environment as a result of prebiotic syntheses. But how did these metabolic pathways originate and evolve? And what role did the molecular mechanisms described above (gene elongation, duplication and/or fusion) play in the assembly of metabolic routes? How the major metabolic pathways actually originated is still an open question, but several theories have been suggested to account for the establishment of metabolic routes. All these ideas are based on gene duplication.
The Retrograde hypothesis (Horowitz, 1945)
The first attempt to explain in detail the origin of metabolic pathways was made by Horowitz, who based this on two pieces of work. The first was the “primordial soup” hypothesis and the second was the one-to-one correspondence between genes and enzymes noticed by Beadle and Tatum.
Horowitz suggested that biosynthetic enzymes had been acquired via gene duplications that took place in the reverse of the order found in current pathways. This idea, also known as the Retrograde hypothesis, has intuitive appeal. It states that if the contemporary biosynthesis of compound “A” requires the sequential transformation of precursors “D”, “C” and “B” via the corresponding enzymes, then the final product “A” of the metabolic route was the first compound used by the primordial heterotrophs.
In other words, if compound “A” was essential for the survival of primordial cells, its depletion from the primitive soup should have imposed a selective pressure favoring the survival and reproduction of those cells that had become able to transform a chemically related compound “B” into “A”. This transformation, catalyzed by enzyme “a”, would have led to a simple, one-step pathway.
The selection of variants having a mutant enzyme “b”, related to “a” via a duplication event and capable of transforming a chemically related molecule “C” into “B”, would lead to an increasingly complex route. This process would continue until the entire pathway was established in a backward fashion, starting with the synthesis of the final product, then the penultimate pathway intermediate, and so on down the pathway to the initial precursor.
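The backward order of acquisition described above can be made concrete with a small toy model. This is only an illustrative sketch: the function name `retrograde_assembly` is hypothetical, and naming each enzyme after the lowercase letter of its product simply extends the “a”/“b” convention used in the text (so the enzyme “c” is an assumption, not something named in the original description).

```python
def retrograde_assembly(pathway):
    """Return the enzymatic steps in the order they would be acquired
    under the Retrograde hypothesis.

    `pathway` lists compounds from the initial precursor to the final
    product, e.g. ["D", "C", "B", "A"].
    Each step is a tuple (enzyme, substrate, product).
    """
    steps = []
    # Walk from the end product back towards the initial precursor:
    # each depletion event selects for an enzyme that makes the
    # depleted compound from the next compound upstream.
    for i in range(len(pathway) - 1, 0, -1):
        product = pathway[i]
        substrate = pathway[i - 1]
        enzyme = product.lower()  # assumed naming: enzyme "a" makes "A"
        steps.append((enzyme, substrate, product))
    return steps


if __name__ == "__main__":
    for enzyme, substrate, product in retrograde_assembly(["D", "C", "B", "A"]):
        print(f"enzyme {enzyme}: {substrate} -> {product}")
```

The first step acquired is the last step of the modern pathway (B to A), and the last step acquired is the first one (D to C), which is the sense in which the route is built "backwards".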
Twenty years later, the discovery of operons prompted Horowitz to restate his model, arguing that it was also supported by the clustering of genes, which could be explained by a series of early tandem duplications of an ancestral gene. In other words, genes belonging to the same operon and/or to the same metabolic pathway should form a paralogous gene family.
The retrograde hypothesis establishes a clear evolutionary connection between prebiotic chemistry and the development of metabolic pathways, and may be invoked to explain some routes. However, the evolution of metabolic pathways in a backward direction requires special environmental conditions in which useful organic compounds and potential precursors have accumulated. Although these conditions might have existed at the dawn of life, they must have become less common as life forms became more complex and depleted the environment of ready-made useful compounds. Furthermore, the origin of many other anabolic routes cannot be understood in terms of their backwards development as they involve many unstable intermediates and it is difficult to explain their synthesis and accumulation in both the prebiotic and extant environments.
In addition, many of these metabolic intermediates are phosphorylated compounds that could not permeate primordial membranes in the absence of specialized transport systems, which were probably absent in primitive cells. It has also been argued that the Horowitz hypothesis fails to account for the origin of catabolic pathway regulatory mechanisms, and for the development of biosynthetic routes involving dissimilar reactions. Moreover, if the enzymes catalyzing successive steps in a given metabolic pathway resulted from a series of gene duplication events, then they should share structural similarities. Even though there are a handful of cases where adjacent enzymes in a pathway are indeed homologous, the list of known examples confirmed by sequence comparisons is small. Perhaps the most extensively documented examples are the gene pair hisA and hisF and four of the genes involved in nitrogen fixation (nifD, K, E, and N).
The Granick hypothesis
An alternative and less well-known proposal is the development of biosynthetic pathways in the forward direction, in which prebiotic compounds do not play any role. Granick proposed that the biosynthesis of some end-products could be explained by forward evolution from relatively simple precursors. This model predicts that simpler biochemical compounds predated the appearance of more complicated ones; hence, the enzymes catalyzing earlier steps of a metabolic route are older than those catalyzing later ones.
For this to operate it is necessary for each of the intermediates to be useful to the organism, since the development of multiple genes simultaneously in a sequence is too improbable. This might work with heme and chlorophyll as cited by Granick, but problems arise with pathways such as purine and branched chain amino acid syntheses, where the intermediates are of no apparent use. Another example where the Granick proposal has been applied is the development of the isoprene lipid pathway.
The Patchwork hypothesis (Ycas, 1974; Jensen, 1976)
Gene duplication has also been invoked in another theory proposed to explain the origin and evolution of metabolic pathways, the so-called “patchwork” hypothesis, according to which metabolic pathways may have been assembled through the recruitment of primitive enzymes that could react with a wide range of chemically related substrates. Such relatively slow, non-specific enzymes may have enabled primitive cells containing small genomes to overcome their limited coding capabilities. The model can be summarized in three steps:
1. the ancestral enzyme E1, endowed with low substrate specificity, is able to bind three substrates (S1, S2 and S3) and catalyze three different, but similar, reactions;
2. a paralogous duplication of the gene encoding enzyme E1 and the subsequent divergence of the new sequence lead to the appearance of enzyme E2, with an increased and narrowed specificity;
3. a further duplication event leads to enzyme E3, showing a diversification of function and a narrowing of specificity.
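The three steps above can be sketched as a toy model in which each enzyme is represented simply by the set of substrates it accepts. This is a minimal illustration under stated assumptions: the function `duplicate_and_specialize` is hypothetical, and the choice that the parent enzyme gives up the substrate taken over by its duplicate (so that E1, E2 and E3 end up with S1, S2 and S3 respectively) is one possible reading of "narrowed specificity", not a detail given in the text.

```python
def duplicate_and_specialize(enzymes, parent, child, substrate):
    """Model one duplication-and-divergence event.

    `enzymes` maps an enzyme name to the set of substrates it accepts.
    The gene for `parent` is duplicated; the copy (`child`) diverges and
    specializes on `substrate`. Assumption: the parent then no longer
    needs to handle that substrate and loses it.
    Returns a new mapping; the input is left unchanged.
    """
    if substrate not in enzymes[parent]:
        raise ValueError(f"{parent} does not accept {substrate}")
    updated = {name: set(subs) for name, subs in enzymes.items()}
    updated[child] = {substrate}
    updated[parent].discard(substrate)
    return updated


if __name__ == "__main__":
    # Step 1: a single broad-specificity ancestral enzyme.
    enzymes = {"E1": {"S1", "S2", "S3"}}
    # Step 2: duplication of E1 gives E2, specialized on S2.
    enzymes = duplicate_and_specialize(enzymes, "E1", "E2", "S2")
    # Step 3: a further duplication gives E3, specialized on S3.
    enzymes = duplicate_and_specialize(enzymes, "E1", "E3", "S3")
    for name in sorted(enzymes):
        print(name, sorted(enzymes[name]))
```

After the two duplication events, one slow generalist has been replaced by three specialists, which is how a small ancestral gene set could seed several distinct pathways.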
In this way the ancestral enzyme E1, belonging to a given metabolic route, is “recruited” to serve other novel pathways. The patchwork hypothesis is also consistent with the possibility that an ancestral pathway may have had a primitive enzyme catalyzing two or more similar reactions on related substrates of the same metabolic route, and whose substrate specificity was refined as a result of later duplication events. In this way primordial cells might have expanded their metabolic capabilities. Additionally, this mechanism may have permitted the evolution of regulatory mechanisms coincident with the development of new pathways.
Related to this view is the idea that enzyme evolution has been driven by the retention of catalytic mechanisms. There is good evidence to suggest that this has occurred within many protein families. The patchwork hypothesis is supported by several lines of evidence. First, the broad substrate specificity of some enzymes means that they can catalyze a whole class of different chemical reactions.
As demonstrated by whole genome sequence comparisons, there is a significant percentage of metabolic genes that are the outcome of paralogous duplications described in completely sequenced cellular genomes. Sequence comparisons of enzymes catalyzing different reactions in the biosynthesis of threonine, tryptophan, isoleucine and methionine indicate that each protein has evolved from a single common ancestral molecule active in several metabolic pathways.
The recruitment of enzymes belonging to different metabolic pathways to serve novel biosynthetic routes is well documented under laboratory conditions. These are the so-called “directed evolution experiments”, in which microbial populations are subjected to a strong selective pressure leading to heterotrophic phenotypes capable of using new substrates (see below). Some fascinating examples of Nature’s opportunism in assembling new pathways using this “patchwork” approach have been found. The urea cycle in terrestrial animals clearly evolved by the addition of a new enzyme, arginase, to a set of four enzymes previously involved in the biosynthesis of arginine.
The Krebs cycle is postulated to have evolved by the combination of several pre-existing enzymes from the pathways for the biosynthesis of aspartate and glutamate with four additional enzymes. In addition, some ancestral biosynthetic routes, such as histidine and tryptophan biosynthesis, nitrogen fixation, and the biosynthesis of lysine, arginine and leucine, were very likely assembled through this mechanism. There are also clear examples of recent adaptation to completely new compounds by the patchwork mechanism. This is particularly true for metabolic pathways evolved by microorganisms in order to either exploit new carbon sources or detoxify toxic compounds, such as xenobiotic chemicals.
One of the most striking examples is the evolution of the pathway for degradation of pentachlorophenol (PCP), a xenobiotic pesticide, in Sphingomonas chlorophenolica, which has been suggested to be the outcome of the “patchwork” combination of enzymes from two different existing pathways.
Semienzymatic origin of metabolic pathways (Lazcano and Miller, 1996)
In order to explain the origin of the very early metabolic pathways, Lazcano and Miller proposed a different approach that may be applicable to the origin of some but not all metabolic routes. They based their idea on the following assumptions:
* a set of rather stable prebiotic compounds was available in the primitive ocean;
* compounds due to leakage from existing pathways within cells were also available. These compounds need not be particularly stable because they are produced within the cell and used rapidly;
* existing enzyme types are assumed to be available from gene duplication and they were non-specific according to Jensen;
* starter-type enzymes are assumed to arise by non-enzymatic reactions followed by acquisition of the enzyme.
It is known that most steps in biosynthetic routes are mediated by enzymes, but some occur spontaneously. In other cases the corresponding chemical step can be achieved by changing the reaction conditions and reagents in the absence of the enzyme.
Experimental evidence has demonstrated prototrophic growth under high ammonia concentrations of a Klebsiella pneumoniae strain with a mutated hisH gene. Lazcano and Miller propose that the reaction first took place with NH3, followed by the development of HisH, followed in turn by the substitution of glutamine for NH3 as this compound disappeared from the prebiotic soup.
The reconstruction of the origin and evolution of metabolic pathways
How can the origin and evolution of metabolic pathways be studied and reconstructed? Assuming that useful hints may be inferred from the analysis of metabolic pathways existing in contemporary cells, important insights into the evolutionary development of microbial metabolic pathways can be obtained by:
* the use of bioinformatic tools, which allow the comparison of genes and genomes from organisms belonging to the three cell domains (Archaea, Bacteria and Eukarya). This approach takes advantage of the known phylogenetic relationships among (micro)organisms, and possibly of differences in the structure and organization of orthologous genes. Moreover, the more ancient a pathway is, the more information can be retrieved from this comparative analysis;
* laboratory studies in which new substrates are used as carbon, nitrogen, or energy sources. These are the so-called “directed-evolution experiments”, in which a microbial (typically, bacterial) population is subjected to a (strong) selective pressure that leads to the establishment of new phenotypes capable of exploiting different substrates. Assuming that the processes involved in acquiring new metabolic abilities are comparable to those found in natural populations, “directed-evolution experiments” can provide useful insights into early cellular evolution.
The bioinformatic approach
Recent years have seen a dramatic increase in genomics and proteomics data from organisms belonging to all three major domains of life. In parallel, bioinformatic tools have allowed the storage and interpretation of several sources of information (gene structure and organization, gene regulation, protein–protein interactions) and, probably more importantly, their integration, a fundamental step towards a global understanding of genome properties and dynamics. This, in turn, has allowed the emergence of comparative analyses of a huge number of genes and genomes from organisms belonging to taxonomically unrelated groups. The data gained from both genomic and evolutionary studies of different species can be combined, resulting in a new kind of approach, referred to as phylogenomics.
This novel way of investigating the evolutionary history of genes introduced several advantages. In fact, adopting a genome-scale approach should in principle overcome the incongruence that arises from molecular phylogenies based on single genes, mainly because (i) non-orthologous comparison (i.e. the comparison of genes erroneously defined as orthologous) is much more misleading when the analysis is performed on a single gene, whereas it is probably buffered in a multigene analysis, and (ii) stochastic error tends to diminish as more and more genes are considered. The use of genomic data is fundamental for addressing the evolution of metabolic pathways, and it strictly depends on the correct identification of orthologous proteins shared by different genomes.
The identification of orthologs between two genomes often relies on the so-called bidirectional best-hit (BBH) criterion, which is based on reciprocal BLAST searches: two proteins, a and b, from genomes A and B respectively, are considered orthologs if a is the best hit (i.e. the most similar sequence) of b in genome A and vice versa. For three or more genomes, groups of orthologous sequences can be constructed by extending the BBH relationships with a clustering algorithm. One of the most interesting topics in the study of metabolic genes is their organization on the chromosome. Since genes belonging to the same operon are often involved in the same metabolic pathway, the analysis of gene organization may provide useful hints about the forces driving the assembly and shaping of a given metabolic route.
Moreover, if an unknown gene is found in an operon together with genes of a specific process, it might be involved in the same or a related process, especially if this association is evolutionarily conserved. The analysis of such genes may provide useful hints for understanding the evolution of the complex system of metabolic interconnections within the cell. In this context, operon detection and prediction are two of the main phases in the study of the evolution of biosynthetic genes. Although the evolutionary origin of operons and the selective forces promoting or opposing their formation are still a matter of debate, it is well established that one of the major benefits of an operon is the co-expression of its component genes, leading for example to the co-expression of all the genes involved in the same biosynthetic route.
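The bidirectional best-hit criterion described above can be expressed in a few lines. The sketch below is illustrative only: it assumes that pairwise similarity scores (for example, BLAST bit scores) have already been computed in both search directions and stored in plain dictionaries, and the function names, protein identifiers and scores are all made up for the example.

```python
def best_hit(scores):
    """Map each query protein to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score,
    e.g. a BLAST bit score. On ties, the first pair seen is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hit(a_vs_b)  # best hit in genome B for each protein of A
    best_ba = best_hit(b_vs_a)  # best hit in genome A for each protein of B
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores for searches of genome A against genome B...
    a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90,
              ("a2", "b2"): 180, ("a3", "b2"): 200}
    # ...and of genome B against genome A.
    b_vs_a = {("b1", "a1"): 245, ("b2", "a3"): 198, ("b2", "a2"): 175}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this toy data set a2 has b2 as its best hit, but b2 prefers a3, so the pair (a2, b2) fails the reciprocity test and is not reported as orthologous. This is the situation in which a2 and a3 may be paralogs within genome A.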
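One simple way to look for a conserved association between an unknown gene and genes of a known process is to count in how many genomes two orthologous groups lie next to each other on the chromosome. The sketch below is a deliberately simplified illustration, not an operon-prediction method: it ignores strand, intergenic distance and chromosome circularity, the function name `conserved_neighbors` is hypothetical, and the gene orders are invented toy data in which each label stands for an orthologous group.

```python
from collections import Counter


def conserved_neighbors(genomes, min_genomes=2):
    """Count adjacent gene pairs that recur across genomes.

    `genomes` maps a genome name to its gene order, given as a list of
    ortholog-group labels. Returns {(gene1, gene2): n} for every pair
    found adjacent (in either orientation) in at least `min_genomes`
    genomes, with each pair written in sorted order.
    """
    counts = Counter()
    for order in genomes.values():
        # Use a set so that a pair is counted once per genome.
        pairs = {frozenset(pair) for pair in zip(order, order[1:])
                 if pair[0] != pair[1]}
        counts.update(pairs)
    return {tuple(sorted(pair)): n
            for pair, n in counts.items() if n >= min_genomes}


if __name__ == "__main__":
    # Invented gene orders: "his1"/"his2" stand for two genes of a known
    # pathway, "unk" for a gene of unknown function.
    genomes = {
        "genome1": ["his1", "his2", "unk", "x"],
        "genome2": ["unk", "his2", "his1", "y"],
        "genome3": ["his1", "his2", "y", "unk"],
    }
    for pair, n in sorted(conserved_neighbors(genomes).items()):
        print(pair, n)
```

Here the unknown gene sits next to his2 in two of the three toy genomes, which under the reasoning in the text would make it a candidate for involvement in the same or a related process.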
Following and integrating these computational approaches, it is now possible to trace back the evolutionary dynamics of all the genes belonging to a certain metabolic pathway.
The directed evolution experiments
Microorganisms have the potential to adapt to changes in their environment. They may develop novel metabolic functions, via activation of cryptic and silent genes, the selection of mutations in regulatory or structural genes, or by the acquisition of new genes by horizontal transfer.
Extensive studies by R.P. Mortlock and his co-workers have shown that under laboratory conditions bacterial catabolic evolution can lead to the use of new sugars, and that the development of new catabolic activities is often due to the recruitment of pre-existing enzymes following regulatory mutations. Such experiments have also been performed to demonstrate that the capture of horizontally transferred (xenologous) sequences carried by plasmids of different host ranges can lead to the accretion of biosynthetic or catabolic abilities.
Indeed, the acquisition of metabolic activities from donor cells to heterologous recipients may have taken place since Archaean times. However, how can the newly acquired genes be brought into the pre-existing regulatory system of the host organism?
It is reasonable to assume that mutational adjustment of pre-existing promoter sequences would take place. This issue can be analyzed under laboratory conditions, by transferring genes for a given metabolic route from a donor organism into a heterologous recipient lacking that pathway, and whose transcriptional apparatus does not recognize the regulatory signals of the donor DNA.
It has been reported that under starvation conditions recognition by the host of the transcription signals of the donor DNA can be adjusted to the host transcriptional milieu. There is experimental evidence that when a cluster of biosynthetic genes (involved in histidine biosynthesis), catabolic genes (Pseudomonas phe genes) or an antibiotic resistance gene (E. coli cam gene), either promoterless or carrying promoters not recognized by the host RNA polymerase, is transferred from a donor bacterium to a heterologous strain initially unable to recognize the transcriptional signals of the donor gene(s), regulatory point mutations occurring under stress conditions can lead to the activation of the donor DNA by the host RNA polymerase on a very short timescale.
In addition to point mutations, the movement of mobile elements, such as transposons, from the host genome to the introgressed plasmid may be responsible for the activation of promoterless genes. Hence, the genetic and molecular analysis showed that, under stress conditions, the transcriptional barriers to the heterologous gene expression were overcome by the occurrence of genetic changes in the donor plasmid, leading to mutated sequences that were efficiently recognized by the host RNA polymerase. By assuming that the processes involved in acquiring new metabolic abilities are comparable to those found in natural populations, it is plausible that during the early stages of cellular evolution entire metabolic pathways might have been spread through the bacterial communities via HGT and adaptation of existing promoters to the new genetic background(s).