De novo gene birth


De novo gene birth is the process by which new genes evolve from DNA sequences that were ancestrally non-genic.[3] De novo genes represent a subset of novel genes, and may be protein-coding or instead act as RNA genes.[4] The processes that govern de novo gene birth are not well understood, although several models describe possible mechanisms by which it may occur.

Although de novo gene birth may have occurred at any point in an organism's evolutionary history, ancient de novo gene birth events are difficult to detect. Most studies of de novo genes to date have thus focused on young genes, typically taxonomically restricted genes (TRGs) that are present in a single species or lineage, including so-called orphan genes, defined as genes that lack any identifiable homolog. Not all orphan genes arise de novo, however; some instead emerge through well-characterized mechanisms such as gene duplication (including retroposition) or horizontal gene transfer followed by sequence divergence, or through gene fission/fusion.[5][6]

Although de novo gene birth was once viewed as a highly unlikely occurrence,[7] several unequivocal examples have now been described,[8] and some researchers speculate that de novo gene birth could play a major role in evolutionary innovation.[9][10]

As early as the 1930s, J. B. S. Haldane and others suggested that copies of existing genes may lead to new genes with novel functions.[6] In 1970, Susumu Ohno published the seminal text Evolution by Gene Duplication.[11] For some time subsequently, the consensus view was that virtually all genes were derived from ancestral genes,[12] with François Jacob famously remarking in a 1977 essay that "the probability that a functional protein would appear de novo by random association of amino acids is practically zero."[7]

In the same year, however, Pierre-Paul Grassé coined the term "overprinting" to describe the emergence of genes through the expression of alternative open reading frames (ORFs) that overlap preexisting genes.[13] These new ORFs may be out of frame with or antisense to the preexisting gene. They may also be in frame with the existing ORF, creating a truncated version of the original gene, or represent 3’ extensions of an existing ORF into a nearby ORF. The first two types of overprinting may be thought of as a particular subtype of de novo gene birth; although overlapping with a previously coding region of the genome, the primary amino-acid sequence of the new protein is entirely novel and derived from a frame that did not previously contain a gene. The first examples of this phenomenon in bacteriophages were reported in a series of studies from 1976 to 1978,[14][15][16] and since then numerous other examples have been identified in viruses, bacteria, and several eukaryotic species.[17][18][19][20][21][22]
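As a rough illustration of what overprinting looks like at the sequence level, the Python sketch below scans a DNA string in all six reading frames and reports ORFs that overlap one another in different frames or on opposite strands. The names find_orfs, overlapping_pairs, and Orf, the min_codons threshold, and the toy sequence are assumptions made for this example and are not taken from the cited studies. The scan assumes the standard start and stop codons and records only the first ATG of each ORF, so in-frame truncations are not reported.

```python
from typing import List, NamedTuple, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code (assumed)
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str  # "+" or "-"
    frame: int   # 0, 1 or 2, counted from the 5' end of the scanned strand
    start: int   # 0-based start on the forward strand
    end: int     # exclusive end on the forward strand (stop codon included)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> List[Orf]:
    """Return ATG...stop ORFs in all six frames (illustrative only)."""
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == START_CODON:
                    orf_start = i
                elif orf_start is not None and codon in STOP_CODONS:
                    end = i + 3
                    # Count codons from ATG up to, but excluding, the stop.
                    if (end - orf_start) // 3 - 1 >= min_codons:
                        if strand == "+":
                            orfs.append(Orf(strand, frame, orf_start, end))
                        else:
                            # Map reverse-strand coordinates back to the forward strand.
                            orfs.append(Orf(strand, frame, n - end, n - orf_start))
                    orf_start = None
    return orfs


def overlapping_pairs(orfs: List[Orf]) -> List[Tuple[Orf, Orf]]:
    """Pairs of ORFs whose forward-strand intervals overlap.

    ORFs returned by find_orfs never overlap within a single frame,
    so every pair found here differs in frame or strand.
    """
    pairs = []
    for idx, a in enumerate(orfs):
        for b in orfs[idx + 1:]:
            if a.start < b.end and b.start < a.end:
                pairs.append((a, b))
    return pairs


# Toy sequence (hypothetical): a frame-0 ORF with a frame-1 ORF inside it.
toy = "ATGCATGCCCTGACGTAA"
for a, b in overlapping_pairs(find_orfs(toy)):
    print(a, "overlaps", b)
```

In the toy sequence, the frame-0 ORF spans the whole string, while a second ORF begins at the ATG at position 4 and is read in frame 1. The two share nucleotides but are read in different codons, which is why an overprinted ORF can encode an entirely different amino-acid sequence.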

The phenomenon of exonization also represents a special case of de novo gene birth, in which, for example, often-repetitive intronic sequences acquire splice sites through mutation, leading to de novo exons. This was first described in 1994 in the context of Alu sequences found in the coding regions of primate mRNAs.[23] Such de novo exons are frequently found in minor splice variants, which may allow the evolutionary “testing” of novel sequences while retaining the functionality of the major splice variant(s).[24]
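The idea that a mutation can create a splice site may be made concrete with a deliberately simplified Python sketch. It lists positions in an intronic sequence where a single base substitution would produce one of the canonical intron-boundary dinucleotides (GT at the 5' end of an intron, AG at the 3' end). The function name near_splice_sites and the toy sequence are hypothetical, and the model is an assumption for illustration only: real splice-site recognition depends on extended sequence motifs and other signals that are ignored here.

```python
from typing import List, Tuple


def near_splice_sites(intron: str, motif: str) -> List[Tuple[int, str, str]]:
    """Find dinucleotides exactly one substitution away from `motif`.

    Returns (position, current_base, needed_base) tuples, where position
    is the 0-based index of the base that would have to mutate.
    """
    intron = intron.upper()
    hits = []
    for i in range(len(intron) - 1):
        pair = intron[i:i + 2]
        mismatches = [k for k in range(2) if pair[k] != motif[k]]
        if len(mismatches) == 1:
            k = mismatches[0]
            hits.append((i + k, pair[k], motif[k]))
    return hits


# Toy intronic fragment (hypothetical sequence).
fragment = "CAAGC"
print("one step from AG:", near_splice_sites(fragment, "AG"))
print("one step from GT:", near_splice_sites(fragment, "GT"))
```

Existing matches to the motif are not reported, since they require no mutation. The sketch only counts candidate sites and says nothing about whether a new exon would actually be spliced in.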


Novel genes can emerge from ancestrally non-genic regions through poorly understood mechanisms. (A) A non-genic region first gains transcription and an open reading frame (ORF), in either order, facilitating the birth of a de novo gene. The ORF is for illustrative purposes only, as de novo genes may also be multi-exonic, or lack an ORF, as with RNA genes. (B) Overprinting. A novel ORF is created that overlaps with an existing ORF, but in a different frame. (C) Exonization. A formerly intronic region becomes alternatively spliced as an exon, such as when repetitive sequences are acquired through retroposition and new splice sites are created through mutational processes. Overprinting and exonization may be considered as special cases of de novo gene birth.
Novel genes can be formed from ancestral genes through a variety of mechanisms.[1] (A) Duplication and divergence. Following duplication, one copy experiences relaxed selection and gradually acquires novel function(s). (B) Gene fusion. A hybrid gene formed from some or all of two previously separate genes. Gene fusions can occur by different mechanisms; shown here is an interstitial deletion. (C) Gene fission. A single gene separates to form two distinct genes, such as by duplication and differential degeneration of the two copies.[2] (D) Horizontal gene transfer. Genes acquired from other species by horizontal transfer undergo divergence and neofunctionalization. (E) Retroposition. Transcripts may be reverse transcribed and integrated as an intronless gene elsewhere in the genome. This new gene may then undergo divergence.