Genome

A genome is the entire complement of genetic material present in an organism. This material consists of nucleotide sequences made of DNA, or of RNA in the case of many viruses. A genome includes protein-coding genes, non-coding functional elements, and often extensive regions of non-coding DNA, including repetitive sequences and segments with no clearly defined function. In eukaryotes, genetic material exists not only in the nucleus but also within mitochondria and, in algae and plants, within chloroplasts. The study of genomes, known as genomics, seeks to catalogue, sequence, and interpret this full set of hereditary information.
Advances in sequencing technologies and bioinformatics have transformed genomics into a major branch of modern biology. From the first viral genome sequenced in the 1970s to contemporary large-scale projects, genomic research has illuminated evolution, development, health, disease, and biodiversity across the tree of life.

Origin and Concept

The term genome was introduced in 1920 by Hans Winkler, combining the words gene and chromosome to denote the complete set of genes carried by an organism. Although initially coined in the context of nuclear genetic material, the definition broadened with growing awareness that many organisms also carry distinct genomic elements in organelles. Related words such as biome and rhizome illustrate the pattern of -ome terminology in biological sciences.
A genome generally refers to the principal hereditary material of an organism. In bacteria, it typically denotes the main chromosomal DNA molecule, rather than plasmids, even though plasmids can carry important genes. In eukaryotes, usage usually distinguishes between the nuclear genome and the organelle genomes of mitochondria and chloroplasts.

Nuclear and Organelle Genomes

In eukaryotes, the nuclear genome is defined as one full set of chromosomes: one copy of each autosome and, where relevant, one copy of each sex chromosome. Many species, including humans, are diploid, meaning they carry two copies of each chromosome in their somatic cells, but the genome is still counted as a single set. For humans, the reference nuclear genome therefore comprises 22 autosomes plus an X and a Y chromosome.
Mitochondria in nearly all eukaryotes contain a small, typically circular genome. In algae and plants, chloroplasts likewise hold distinct genomes known as plastomes. These organelle genomes reflect the evolutionary origins of mitochondria and chloroplasts from bacterial ancestors and contribute genes essential for cellular respiration and photosynthesis.

Genome Sequencing and Mapping

Genome sequencing determines the complete order of nucleotides in an organism’s genetic material. Although individuals of a species share the great majority of their genome, sequencing multiple genomes allows scientists to assess genetic diversity, identify polymorphisms, and detect structural variation.
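As a rough illustration of how comparing sequenced genomes exposes polymorphisms, the sketch below lists the positions at which two sequences differ. It assumes the sequences are already aligned and of equal length, so it only finds single-base substitutions; the function name `find_polymorphisms` and the two 12-base sequences are made up for the example and do not come from any real genome.

```python
from typing import List, Tuple


def find_polymorphisms(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are already aligned and of equal length,
    so only substitutions are reported (no insertions or deletions).
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != alt_base
    ]


# Two invented sequences that differ at two positions.
reference = "ATGCCGTAGCTA"
sample = "ATGCAGTAGCTG"
print(find_polymorphisms(reference, sample))
# [(4, 'C', 'A'), (11, 'A', 'G')]
```

Real variant calling works from many overlapping reads per position and has to handle alignment gaps and sequencing errors, which this toy comparison ignores.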
Major milestones in sequencing include:

  • 1976: sequencing of the RNA genome of bacteriophage MS2.
  • 1977: first DNA genome sequenced, the bacteriophage ΦX174.
  • 1995: first complete bacterial genome, Haemophilus influenzae.
  • 1996: first complete eukaryotic genome, the yeast Saccharomyces cerevisiae.
  • 1996: first archaeal genome, Methanococcus jannaschii.
  • 2001: publication of draft human genome sequences as part of the Human Genome Project.
  • 2013: sequencing of a high-coverage Neanderthal genome from ancient bone material.

Sequenced genomes now span thousands of species, including plants such as rice and Arabidopsis thaliana, model animals such as the mouse, and diverse microbial taxa. High-throughput sequencing has dramatically reduced cost and time, enabling broad comparative studies and applications in medicine, conservation, and evolutionary biology.

Viral Genomes

Viral genomes exhibit remarkable diversity. They may contain DNA or RNA, and each may be single-stranded or double-stranded. Genome architecture varies from monopartite forms, where all genetic material is contained within a single molecule, to multipartite forms containing several separate molecules.
Key viral genome types include:

  • Single-stranded RNA genomes, as seen in many human and plant viruses.
  • Double-stranded RNA genomes, characteristic of reoviruses.
  • Single-stranded DNA genomes, typical of certain small bacteriophages.
  • Double-stranded DNA genomes, common in large bacteriophages and many animal viruses.

The structure and size of viral genomes influence replication mechanisms, mutation rates, and evolutionary dynamics.
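The categories above can be captured in a small data structure. The following is a minimal sketch, assuming a genome is described by just three properties: nucleic acid, number of strands, and number of separate molecules. The class name `ViralGenome` and its fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViralGenome:
    nucleic_acid: str  # "DNA" or "RNA"
    strands: int       # 1 = single-stranded, 2 = double-stranded
    segments: int      # number of separate genome molecules

    def __post_init__(self) -> None:
        # Reject combinations that the description above does not allow.
        if self.nucleic_acid not in ("DNA", "RNA"):
            raise ValueError("nucleic_acid must be 'DNA' or 'RNA'")
        if self.strands not in (1, 2):
            raise ValueError("strands must be 1 or 2")
        if self.segments < 1:
            raise ValueError("segments must be at least 1")

    def describe(self) -> str:
        strandedness = "single-stranded" if self.strands == 1 else "double-stranded"
        architecture = "monopartite" if self.segments == 1 else "multipartite"
        return f"{strandedness} {self.nucleic_acid}, {architecture}"


# Two invented examples, not tied to any particular virus.
print(ViralGenome("RNA", 1, 1).describe())  # single-stranded RNA, monopartite
print(ViralGenome("DNA", 2, 3).describe())  # double-stranded DNA, multipartite
```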

Prokaryotic Genomes

Prokaryotic genomes are usually compact, with little sequence that is not in active use. Most bacteria and archaea possess a single circular chromosome, although exceptions include species with multiple or linear chromosomes. In fast-growing cells, a new round of DNA replication can begin before the previous one has finished, so successive rounds of replication overlap.
General features include:

  • Low levels of repetitive DNA, with most of the sequence encoding proteins or serving as regulatory elements.
  • Auxiliary genetic material, notably plasmids, which can encode useful traits such as antibiotic resistance and additional metabolic capacities.
  • Occasional genome reduction, especially in symbiotic bacteria, where genes that are no longer essential may decay into pseudogenes and eventually be lost.

Prokaryotic genomes provide valuable insights into metabolic diversity, adaptation, and microbial ecology.

Eukaryotic Genomes

Eukaryotic genomes typically consist of multiple linear chromosomes housed within the cell nucleus. They show immense variation in chromosome number and genome size. For example, some ants and nematodes possess only one pair of chromosomes, while certain ferns may have hundreds of chromosome pairs.
Despite the large amounts of DNA in many eukaryotes, only a small fraction encodes proteins. Genome size variation largely reflects differing quantities of:

  • Repetitive DNA
  • Transposable elements
  • Introns
  • Non-coding regulatory regions

Eukaryotic genomes often contain large proportions of repetitive sequences, particularly in mammals and plants. Nuclear genomes also include numerous gene families derived from duplication events.

Coding and Non-Coding Sequences

The term coding sequence refers to DNA segments that specify the amino acid sequence of a protein. The proportion of coding DNA varies widely among organisms. Larger genomes do not necessarily contain more genes; rather, they often include more non-coding regions.
Non-coding sequences encompass:

  • Introns
  • Non-coding RNAs
  • Regulatory elements
  • Repetitive DNA

In humans, approximately 98 per cent of the genome is non-coding. These regions contribute to gene regulation, chromosomal structure, and genome stability, although many repetitive elements remain poorly understood.
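The proportion of coding DNA can be computed from a list of annotated coding intervals. The sketch below assumes half-open `(start, end)` coordinates and merges overlapping intervals so that no base is counted twice. The function name `coding_fraction` and the toy numbers are illustrative only; they are chosen so that the result matches the roughly 2 per cent coding share quoted for humans.

```python
from typing import Iterable, Tuple


def coding_fraction(genome_length: int, coding_intervals: Iterable[Tuple[int, int]]) -> float:
    """Fraction of the genome covered by coding intervals.

    Intervals are half-open (start, end) pairs; overlapping or adjacent
    intervals are merged before their lengths are summed.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(coding_intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps or touches the current interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy genome of 1,000 bases with three annotated coding intervals,
# two of which overlap (100-110 and 105-115 merge into 100-115).
fraction = coding_fraction(1000, [(100, 110), (105, 115), (500, 505)])
print(f"coding: {fraction:.0%}, non-coding: {1 - fraction:.0%}")
# coding: 2%, non-coding: 98%
```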
Repetitive DNA includes:

  • Tandem repeats, such as microsatellites and minisatellites.
  • Interspersed repeats, many of which derive from transposable elements.

Tandem repeats can serve important functions: telomeres, for example, consist of short repeated motifs that help protect chromosome ends.
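A simple way to spot short tandem repeats in a sequence is a regular expression with a backreference. The sketch below is one possible approach, not a standard tool: the function name `find_tandem_repeats` and the default limits (units of 1 to 6 bases, at least 3 copies) are assumptions made for the example. The test sequence is invented and embeds four copies of TTAGGG, the motif found in vertebrate telomeres.

```python
import re
from typing import List, Tuple


def find_tandem_repeats(
    sequence: str, min_unit: int = 1, max_unit: int = 6, min_copies: int = 3
) -> List[Tuple[int, str, int]]:
    """Return (start, repeat_unit, copy_number) for each perfect tandem repeat.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    and the backreference \\1 requires further identical copies right after it.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    repeats = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        repeats.append((match.start(), unit, copies))
    return repeats


# Invented sequence: four copies of TTAGGG flanked by unrelated bases.
sequence = "GATC" + "TTAGGG" * 4 + "CATG"
print(find_tandem_repeats(sequence))
# [(4, 'TTAGGG', 4)]
```

This only finds perfect repeats. Real microsatellites and minisatellites often contain mismatches, so dedicated repeat finders tolerate imperfect copies.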

Genome Sequencing Technologies and Structural Variation

Modern sequencing platforms enable deep sequencing coverage, facilitating the detection of polymorphisms, structural variations, and chromosomal rearrangements. Analyses of read depth, mapping patterns, and alignment discrepancies reveal translocations, inversions, duplications, and deletions.
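The read-depth idea can be sketched as follows: regions with unusually low coverage suggest a deletion, and regions with unusually high coverage suggest a duplication. The example below is a minimal sketch under stated assumptions. The function name `classify_windows`, the window size, and the ratio thresholds of 0.5 and 1.5 are illustrative choices, and the depth values are invented.

```python
from statistics import median
from typing import List, Sequence, Tuple


def classify_windows(
    depths: Sequence[float], window: int = 4, low: float = 0.5, high: float = 1.5
) -> List[Tuple[int, float, str]]:
    """Flag fixed-size windows whose mean depth deviates from the median.

    Returns (window_start, depth_ratio, label) for each full window, where
    depth_ratio is the window's mean depth divided by the overall median.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for start in range(0, len(depths) - window + 1, window):
        mean_depth = sum(depths[start:start + window]) / window
        ratio = mean_depth / baseline
        if ratio <= low:
            label = "possible deletion"
        elif ratio >= high:
            label = "possible duplication"
        else:
            label = "normal"
        calls.append((start, round(ratio, 2), label))
    return calls


# Invented per-base depths: the second window has about half the typical
# coverage and the fourth window has about twice the typical coverage.
depths = [30, 31, 29, 30, 14, 15, 16, 15, 30, 29, 31, 30, 61, 59, 60, 60, 30, 30, 29, 31]
for call in classify_windows(depths):
    print(call)
# (0, 1.0, 'normal')
# (4, 0.5, 'possible deletion')
# (8, 1.0, 'normal')
# (12, 2.0, 'possible duplication')
# (16, 1.0, 'normal')
```

Depth alone cannot reveal balanced rearrangements such as inversions and translocations, which is why the mapping patterns and alignment discrepancies mentioned above are analysed as well.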
Comparisons with reference genomes support studies in:

  • Population genetics
  • Medical diagnostics
  • Conservation genomics
  • Evolutionary relationships

Originally written on July 3, 2018 and last modified on November 20, 2025.
