Protein biosynthesis
Protein biosynthesis, or protein synthesis, is a fundamental biological process by which cells build new proteins to maintain homeostasis, replace degraded proteins and respond to internal and external signals. Proteins are essential macromolecules that act as enzymes, structural components, transporters, signalling molecules and hormones. The overall process is broadly conserved between prokaryotes and eukaryotes, though important mechanistic differences exist, particularly in RNA processing and cellular compartmentalisation.
Overview of the Process
Protein biosynthesis can be divided into two major stages: transcription and translation.
- In transcription, a specific segment of DNA (a gene) is copied into messenger RNA (mRNA) by RNA polymerase.
- In translation, the mRNA sequence is decoded by ribosomes to assemble a polypeptide chain of amino acids in the correct order.
After translation, the newly synthesised polypeptide must fold into a precise three-dimensional structure and may undergo further post-translational modifications to become a fully functional protein. Errors at any stage can impair protein function and are frequently implicated in human disease.
Transcription: From DNA to pre-mRNA
In eukaryotic cells, transcription occurs in the nucleus, using DNA as a template. In prokaryotes, which lack a nucleus, transcription takes place directly in the cytoplasm.
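DNA is an antiparallel double helix composed of two complementary strands held together by hydrogen bonds between paired bases. To expose the gene, the helix is unwound locally, breaking these hydrogen bonds over a short stretch of DNA. RNA polymerase itself opens and maintains this unwound region, assisted in eukaryotes by the helicase activity of the general transcription factor TFIIH; the replicative helicases that separate the strands during DNA replication are not involved.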
Only one of the two strands acts as the template strand, which is read by RNA polymerase in the 3′→5′ direction. The opposite strand, the coding strand, has the same sequence as the RNA transcript (except that RNA uses uracil instead of thymine).
RNA polymerase catalyses the formation of phosphodiester bonds between ribonucleotides, synthesising pre-mRNA in the 5′→3′ direction. Free ribonucleoside triphosphates base-pair with the exposed bases of the template strand (template A with U, T with A, C with G and G with C), and the enzyme joins them into a growing RNA chain.
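The base-pairing rules above can be sketched in a few lines of Python. This is only an illustration of the pairing logic, not a model of the enzyme: the function names `transcribe` and `coding_strand` and the example sequence are hypothetical, and the template is assumed to be written 3′→5′ so that the transcript reads 5′→3′.

```python
# Template DNA base -> RNA base that pairs with it.
RNA_PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}
# Template DNA base -> DNA base on the opposite (coding) strand.
DNA_PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA transcript (5'->3') for a template strand written 3'->5'."""
    return "".join(RNA_PAIRING[base] for base in template_3_to_5.upper())

def coding_strand(template_3_to_5: str) -> str:
    """Return the coding strand (5'->3') opposite a template written 3'->5'."""
    return "".join(DNA_PAIRING[base] for base in template_3_to_5.upper())

template = "TACGGCATT"             # hypothetical template, written 3'->5'
rna = transcribe(template)         # "AUGCCGUAA"
coding = coding_strand(template)   # "ATGCCGTAA"
```

Replacing T with U in `coding` gives the same string as `rna`, which is the relationship between the coding strand and the transcript described above.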
As RNA polymerase progresses, DNA behind it re-anneals, so only a short window of about 10–12 base pairs remains unwound at any time. The enzyme adds nucleotides at a rapid rate (around 20 per second), enabling many pre-mRNA molecules to be synthesised from a single gene within an hour.
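As a rough worked example of the rate quoted above, the time for one polymerase to traverse a gene can be estimated by dividing its length by the elongation rate. The sketch assumes a constant rate with no pausing and ignores initiation and termination; the 6,000-nucleotide gene length is a hypothetical value chosen for the arithmetic.

```python
RATE_NT_PER_S = 20  # approximate elongation rate quoted in the text

def transcription_time_seconds(gene_length_nt: int, rate: float = RATE_NT_PER_S) -> float:
    """Time for one polymerase to traverse the gene at a constant rate."""
    return gene_length_nt / rate

seconds = transcription_time_seconds(6000)  # hypothetical 6,000-nt gene
minutes = seconds / 60                      # 300 s, i.e. 5 minutes
```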
RNA polymerase contains an intrinsic proofreading mechanism. When an incorrect nucleotide is incorporated, the enzyme can excise it and replace it with the correct one, improving transcript fidelity. Transcription terminates when RNA polymerase reaches a specific termination sequence; it then releases the newly formed pre-mRNA and detaches from the DNA.
Post-transcriptional Modifications in Eukaryotes
In eukaryotes, the initial transcript is called pre-mRNA or primary transcript. It must be processed into mature mRNA before export to the cytoplasm. Three major post-transcriptional modifications occur:
- Addition of a 5′ cap: A modified guanine nucleotide (7-methylguanosine) is attached to the 5′ end of the transcript through an unusual 5′–5′ triphosphate linkage; the guanine is methylated after it has been added. This 5′ cap:
  - protects the mRNA from degradation by exonucleases,
  - facilitates ribosome binding at the start of translation, and
  - helps the cell distinguish mRNA from other RNAs.
- Addition of a 3′ poly(A) tail: A sequence of roughly 100–200 adenine nucleotides is added to the 3′ end. Together, the 5′ cap and the poly(A) tail signal that the mRNA is intact and ready for export and translation, and both contribute to mRNA stability.
- RNA splicing: Most eukaryotic genes contain introns (non-coding regions) and exons (coding regions). Introns are transcribed into pre-mRNA but must be removed. A large ribonucleoprotein complex called the spliceosome recognises specific splice sites and excises introns, joining exons together to form a continuous coding sequence. Splicing can be alternative, allowing different combinations of exons to be joined and thereby increasing the diversity of proteins produced from a single gene.
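Splicing and alternative splicing can be illustrated with simple string slicing. The sketch below assumes the exon coordinates are already known (0-based, end-exclusive); it does not model how the spliceosome recognises splice sites. The `splice` function, the sequence and the coordinates are all hypothetical. Exons are written in upper case and introns in lower case purely for readability.

```python
def splice(pre_mrna, exons):
    """Join the given exons, a list of (start, end) pairs, in the order listed."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: three 6-nt exons separated by two 12-nt introns.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaugcccuuag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]

full_length = splice(pre_mrna, exons)                   # "AUGGCCGAUUACUGGUAA"
exon_skipped = splice(pre_mrna, [exons[0], exons[2]])   # "AUGGCCUGGUAA"
```

Leaving out the middle exon yields a shorter mature mRNA from the same pre-mRNA, which is the essence of how alternative splicing produces different proteins from one gene.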
Once capped, tailed and spliced correctly, the mature mRNA is exported through nuclear pores into the cytoplasm.
In prokaryotes, post-transcriptional processing is minimal: mRNA is typically usable immediately after transcription, and transcription and translation are often coupled in space and time.
Translation: From mRNA to Polypeptide
Translation occurs on ribosomes, which are complex assemblies of ribosomal RNA (rRNA) and protein. Each ribosome consists of a large and a small subunit that together clamp around the mRNA molecule.
The ribosome reads the mRNA in the 5′→3′ direction, interpreting the nucleotide sequence in triplets known as codons, each codon specifying a particular amino acid (or a start or stop signal). Translation proceeds through three main stages: initiation, elongation and termination.
- Initiation: The small ribosomal subunit binds near the 5′ end of the mRNA and scans along it until it recognises a start codon (usually AUG, coding for methionine). An initiator tRNA carrying methionine binds to this codon via complementary base-pairing between the tRNA anticodon and the mRNA codon. The large ribosomal subunit then joins, forming a complete initiation complex.
- Elongation: The ribosome catalyses the stepwise addition of amino acids:
  - A new aminoacyl-tRNA, carrying the next amino acid, enters the ribosome and binds to the next mRNA codon.
  - The ribosome catalyses the formation of a peptide bond between the amino acid of the incoming tRNA and the growing polypeptide chain.
  - The ribosome then translocates along the mRNA by one codon, shifting the tRNAs between binding sites and freeing one site for the next aminoacyl-tRNA.
Each tRNA is about 70–80 nucleotides long and has a characteristic cloverleaf structure with an anticodon loop, which recognises the mRNA codon, and an acceptor stem, which carries the corresponding amino acid. The accurate pairing of tRNA anticodon and mRNA codon ensures that the amino acid sequence of the polypeptide matches the genetic code.
- Termination: When the ribosome encounters a stop codon (UAA, UAG or UGA), no tRNA corresponds to it. Instead, release factors bind to the ribosome, prompting it to hydrolyse the bond between the final tRNA and the completed polypeptide chain. The ribosomal subunits then dissociate and can be reused.
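The three stages above can be mirrored in a short Python sketch that decodes an mRNA string with the standard genetic code. It is a simplification under stated assumptions: translation starts at the first AUG found, wobble pairing is ignored, and the input contains only A, C, G and U. The names `translate` and `anticodon` and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; one-letter amino acids, "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}

def anticodon(codon: str) -> str:
    """Anticodon (5'->3') that pairs antiparallel with a codon written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))

def translate(mrna: str) -> str:
    """Translate the first open reading frame of an mRNA written 5'->3'."""
    start = mrna.find("AUG")                   # initiation: first start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # elongation: one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                  # termination: stop codon reached
            break
        peptide.append(amino_acid)
    return "".join(peptide)

protein = translate("GGAUGGCCGAUUACUGGUAAGC")  # "MADYW"
start_anticodon = anticodon("AUG")             # "CAU"
```

Here the codons AUG, GCC, GAU, UAC and UGG are read in turn before the stop codon UAA ends the chain, giving the one-letter sequence MADYW (Met-Ala-Asp-Tyr-Trp).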
Multiple ribosomes may translate the same mRNA simultaneously, forming a polysome, which greatly increases the efficiency of protein synthesis.
Protein Folding and Post-translational Modifications
The newly synthesised polypeptide is initially a linear chain of amino acids and is typically non-functional. To become an active protein, it must fold into a precise secondary and tertiary structure, driven by interactions such as hydrogen bonds, ionic interactions, hydrophobic effects and disulphide bonds.
- Secondary structure includes motifs such as α-helices and β-pleated sheets.
- Tertiary structure refers to the overall three-dimensional folding of the entire polypeptide.
- Some proteins also form quaternary structures, assembling into complexes of multiple polypeptide subunits.
Cells use chaperone proteins to assist in proper folding and to prevent aggregation. Following folding, proteins may undergo post-translational modifications, including phosphorylation, glycosylation, methylation, cleavage of signal peptides or addition of lipid groups. These modifications can alter a protein’s activity, stability, localisation (for example, directing it to the nucleus, cytoplasm or membrane) and interactions with other molecules.
Protein Biosynthesis and Disease
Because protein biosynthesis is central to cell function, disturbances in this process are a major source of disease. Mutations in DNA change the mRNA sequence transcribed from it. Point mutations within a coding sequence include:
- Nonsense mutations, which introduce a premature stop codon and produce truncated, often non-functional proteins.
- Missense mutations, which substitute one amino acid for another. This can disrupt the protein’s structure, catalytic activity or folding.
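The distinction between these mutation types can be made concrete by comparing what a codon encodes before and after a single-base substitution. The sketch below is illustrative only: `classify_substitution` is a hypothetical helper, and the "silent" and "stop-loss" labels cover cases the list above does not discuss (no change in amino acid, and a stop codon changed into a sense codon).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_substitution(original_codon: str, mutated_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    before = CODON_TABLE[original_codon]
    after = CODON_TABLE[mutated_codon]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"     # premature stop codon
    if before == "*":
        return "stop-loss"    # stop codon replaced by a sense codon
    return "missense"         # one amino acid replaced by another

examples = {
    ("UGG", "UGA"): classify_substitution("UGG", "UGA"),  # Trp -> stop
    ("GAG", "GUG"): classify_substitution("GAG", "GUG"),  # Glu -> Val
    ("GAA", "GAG"): classify_substitution("GAA", "GAG"),  # Glu -> Glu
}
```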
Proteins that fail to fold correctly may be targeted for degradation, but misfolded proteins can also form aggregates that overwhelm quality control systems. Such aggregates are characteristic of several neurodegenerative diseases, including Alzheimer’s disease and Parkinson’s disease, where they are associated with neuronal damage and impaired brain function.
Errors in splicing, defects in translation factors, ribosomal mutations and failures in post-translational processing similarly contribute to a wide range of genetic and acquired disorders. Because of this, many therapeutic strategies aim either to correct defective protein synthesis or to modulate the stability and folding of specific proteins.