Preprocessor

In computer science, a preprocessor or precompiler is a program that transforms source text before it undergoes further stages of compilation or interpretation. Preprocessors may carry out operations such as macro expansion, file inclusion and the application of language extensions. Depending on their type and purpose, they operate either on the raw lexical structure of a program or on its abstract syntactic representation. Although many preprocessors are designed for specific languages or systems, some are described as general-purpose because they can be applied to a wide range of text-processing tasks.

Lexical Preprocessing

Lexical preprocessors operate at the textual level and rely solely on lexical analysis. They manipulate sequences of characters or tokens without requiring knowledge of the host language’s syntax. Typical operations include:

  • macro substitution
  • conditional inclusion of text
  • insertion of external files
  • simple rule-based text substitution
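
To make these operations concrete, here is a minimal Python sketch of a lexical preprocessor, assuming a C-like directive syntax (#define, #ifdef, #ifndef, #endif). It handles only object-like macros, substitutes whole identifier tokens in a single pass without rescanning, and never parses the surrounding language, which is what makes it lexical. The function name preprocess and the sample input are illustrative assumptions, not taken from any real tool.

```python
import re

# Identifier-like tokens. Substitution applies only to whole tokens,
# so a macro named N leaves "N2" or "INT" untouched.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def preprocess(text: str) -> str:
    """Expand object-like macros and honour #ifdef/#ifndef/#endif blocks."""
    macros: dict[str, str] = {}
    active = [True]  # stack of flags: is the current region emitted?
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            parts = stripped[1:].split(None, 2)
            directive = parts[0] if parts else ""
            if directive == "define" and all(active) and len(parts) > 1:
                macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            elif directive == "ifdef" and len(parts) > 1:
                active.append(parts[1] in macros)
            elif directive == "ifndef" and len(parts) > 1:
                active.append(parts[1] not in macros)
            elif directive == "endif" and len(active) > 1:
                active.pop()
            continue  # directive lines never reach the output
        if all(active):
            # Single pass: replacement text is not rescanned for further macros.
            out.append(TOKEN.sub(lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(out)


if __name__ == "__main__":
    source = "#define N 10\n#ifdef N\nint a[N];\n#endif\nint N2 = N;"
    print(preprocess(source))
```

A #define inside a region excluded by #ifdef has no effect, mirroring how conditional inclusion is applied before anything else in that region is processed.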

The most prominent example is the C preprocessor, which interprets directives beginning with the # character, such as #define, #include and #ifdef, and performs its transformations independently of the C grammar. Although it was devised for C and related languages, it can be applied to other kinds of text because its substitutions are purely lexical.
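File inclusion, another of the operations listed above, can be sketched in the same spirit. The Python example below is a hypothetical illustration: it reads "files" from an in-memory dictionary rather than the file system, recognises only the quoted #include "name" form, and reports circular inclusion as an error instead of relying on include guards. The function expand_includes and the file names are assumptions.

```python
import re

# Only the quoted form, e.g. #include "util.h", is recognised.
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand_includes(name: str, files: dict[str, str],
                    seen: frozenset[str] = frozenset()) -> str:
    """Replace each #include line with the recursively expanded named file."""
    if name in seen:
        raise ValueError(f"recursive include of {name!r}")
    lines = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Track the current inclusion path to detect cycles.
            lines.append(expand_includes(match.group(1), files, seen | {name}))
        else:
            lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    FILES = {  # hypothetical in-memory "file system"
        "main.c": '#include "util.h"\nint main(void) { return helper(); }',
        "util.h": "int helper(void);",
    }
    print(expand_includes("main.c", FILES))
```

Because the cycle check follows only the current inclusion path, a header reached twice along different paths is inserted twice, which is the behaviour C include guards exist to prevent.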
Other lexical preprocessors include m4, widely employed in cross-platform build systems such as GNU Autoconf, and various open-source macro processors that apply context-based substitution rules.
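m4 macros can take arguments, referred to in the macro body as $1, $2 and so on. The sketch below imitates that idea in Python with a hypothetical macro table; it supports at most nine arguments and no nested parentheses and, unlike real m4, performs no quoting or rescanning of the expanded text. The function expand and the macro names are assumptions for illustration.

```python
import re

# A macro call: an identifier followed by a parenthesised,
# comma-separated argument list (nested parentheses are not supported).
CALL = re.compile(r"\b([A-Za-z_]\w*)\(([^()]*)\)")
ARG_REF = re.compile(r"\$([1-9])")


def expand(text: str, macros: dict[str, str]) -> str:
    """Expand calls to user-defined macros whose bodies refer to $1..$9."""

    def substitute(call: re.Match) -> str:
        name = call.group(1)
        if name not in macros:
            return call.group(0)  # not a macro: leave the call untouched
        # Arguments are split on commas and stripped of surrounding whitespace.
        args = [a.strip() for a in call.group(2).split(",")]

        def arg(ref: re.Match) -> str:
            i = int(ref.group(1))
            # As in m4, a missing argument expands to the empty string.
            return args[i - 1] if i <= len(args) else ""

        return ARG_REF.sub(arg, macros[name])

    return CALL.sub(substitute, text)


if __name__ == "__main__":
    table = {"greet": "Hello, $1!", "swap": "$2 $1"}
    print(expand("greet(world) swap(a, b)", table))
```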
Some programming languages make a separate lexical preprocessor unnecessary by providing equivalent built-in features: inlining in place of function-like macros, template systems, compile-time imports in place of file inclusion, and language-level conditional constructs that, combined with dead-code elimination, take over the role of conditional compilation.

Syntactic Preprocessing

Syntactic preprocessors modify or generate code by transforming syntax trees rather than raw text. They are closely associated with the Lisp family of languages, where macros are written in the host language itself and operate directly on program structure. This enables compile-time reflection and powerful forms of language extension.
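Python is not a Lisp, but its standard ast module allows the same kind of tree-level rewrite. The sketch below, which assumes Python 3.9 or later for ast.unparse, expands a hypothetical square(e) "macro" into e * e. Because the rewrite operates on the syntax tree, operator precedence is preserved automatically, whereas a naive textual macro would need explicit parentheses. The names InlineSquare and expand_square are illustrative.

```python
import ast
import copy


class InlineSquare(ast.NodeTransformer):
    """Replace each call square(e) with the expression e * e."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # expand calls nested inside the arguments first
        if (
            isinstance(node.func, ast.Name)
            and node.func.id == "square"
            and len(node.args) == 1
            and not node.keywords
        ):
            arg = node.args[0]
            # Two independent copies of the argument subtree.
            return ast.BinOp(left=arg, op=ast.Mult(), right=copy.deepcopy(arg))
        return node


def expand_square(source: str) -> str:
    """Parse source, apply the rewrite, and return the transformed code as text."""
    tree = InlineSquare().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # new nodes need positions if later compiled
    return ast.unparse(tree)


if __name__ == "__main__":
    print(expand_square("y = square(a + 1)"))  # y = (a + 1) * (a + 1)
```

As with a C function-like macro, the argument expression is duplicated, so any side effects in it would run twice.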
Other languages, such as XML-based languages, may rely on external syntactic transformation systems. XSLT and its statically typed counterpart CDuce are notable examples. These processors let users define conversions that reshape syntax trees or implement custom compile-time behaviours.
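XSLT is a separate language, so the following is only an analogy in Python using the standard xml.etree.ElementTree module: it reads one XML tree and builds a differently shaped one, which is the kind of tree-to-tree conversion an XSLT stylesheet describes. The input vocabulary (items, item, name) and the function to_html_list are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_html_list(xml_text: str) -> str:
    """Rebuild every <item name="..."/> in the input as an <li> inside a new <ul>."""
    source = ET.fromstring(xml_text)
    target = ET.Element("ul")
    for item in source.iter("item"):  # visits matching elements at any depth
        li = ET.SubElement(target, "li")
        li.text = item.get("name", "")
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(target, encoding="unicode")


if __name__ == "__main__":
    print(to_html_list('<items><item name="m4"/><item name="cpp"/></items>'))
    # <ul><li>m4</li><li>cpp</li></ul>
```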
Syntactic preprocessors can:

  • customise language syntax, as seen in OCaml, whose Camlp4 preprocessor supports both a standard and a revised syntax
  • extend a language with new primitives, enabling paradigms such as object-oriented or imperative programming in languages originally designed around functional cores
  • build internal domain-specific languages (DSLs), a frequent practice in large Lisp systems, where specialised mini-languages support tasks such as database querying, graphical user interface design or structured iteration

Examples include the LOOP macro of Common Lisp, which provides an embedded language for iteration, and MetaOCaml, a multi-stage extension of OCaml used to construct DSL compilers by combining interpretation and code generation.
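The staging idea behind MetaOCaml can be approximated in Python by generating specialised source code and compiling it at run time. The sketch below follows the classic power example: given a fixed exponent n, it emits a function whose body is the unrolled product x * x * ... * x. Unlike MetaOCaml's typed code quotations, this version builds plain strings, so the generated code is not checked until exec runs it; make_power is a hypothetical name.

```python
from collections.abc import Callable


def make_power(n: int) -> tuple[Callable[[float], float], str]:
    """Generate, compile and return a function computing x**n, plus its source.

    The exponent is fixed at generation time, so the loop over n runs once,
    while the source is written, rather than on every call.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    body = " * ".join(["x"] * n) or "1"  # an empty product is 1
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace: dict = {}
    exec(source, namespace)  # compile the generated code (the second stage)
    return namespace[f"power_{n}"], source


if __name__ == "__main__":
    cube, src = make_power(3)
    print(src)      # def power_3(x):
                    #     return x * x * x
    print(cube(2))  # 8
```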

General-Purpose Preprocessors

A preprocessor is described as general-purpose when it is not tied to a single programming language or narrowly defined task. Instead, it is designed to support a broad range of text-processing applications. Such preprocessors can be employed for generating configuration files, expanding macros in arbitrary documents or automating repetitive text transformations.
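Generating several configuration files from one source is a typical task of this kind. The Python sketch below uses the standard library's string.Template as a stand-in for a general-purpose preprocessor; the section name, keys, environment values and the render_all function are hypothetical.

```python
from string import Template

# Hypothetical configuration layout shared by every target environment.
CONFIG = Template(
    "[server]\n"
    "host = $host\n"
    "port = $port\n"
    "log_level = $level\n"
)

# Hypothetical per-environment values substituted into the template.
ENVIRONMENTS = {
    "dev": {"host": "localhost", "port": 8080, "level": "debug"},
    "prod": {"host": "example.com", "port": 443, "level": "warning"},
}


def render_all(template: Template, environments: dict[str, dict]) -> dict[str, str]:
    """Return one rendered configuration file per environment name."""
    # substitute() raises KeyError if a placeholder has no value,
    # which catches incomplete environment definitions early.
    return {name: template.substitute(values) for name, values in environments.items()}


if __name__ == "__main__":
    for name, text in render_all(CONFIG, ENVIRONMENTS).items():
        print(f"--- {name}.ini ---")
        print(text)
```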
The m4 language is among the most widely known general-purpose preprocessors. Although developed alongside Unix systems, it has been applied in build systems, document generation and software configuration because of its pattern-based substitution capabilities. The C preprocessor, although designed for C, is sometimes used in similar generic roles.
General-purpose preprocessors may also serve as template engines, generating output files such as HTML or code by substituting variables and processing directives. Their capacity to manipulate plain text gives them flexibility comparable to that of dedicated template engines.
Typical applications include:

  • preprocessing JavaScript or other scripting languages
  • supporting build systems, for example imake, which ran the C preprocessor over templates to generate makefiles for the X Window System
  • processing configuration files in scientific software such as GROMACS, whose grompp tool interprets C-preprocessor-style directives such as #include and #define in simulation topology files

Preprocessors as Template Engines

General-purpose preprocessors may function as template processors when they are used to produce formatted output based on variable substitution and rule-based expansion. Tools such as m4, or even the C preprocessor, can be used to generate HTML, configuration templates or other structured documents. Template engines and general-purpose preprocessors share core features: text substitution, conditional processing and reusable patterns.
Because both aim to automate text generation, general-purpose preprocessors can effectively operate as template systems, and many template engines derive conceptually from macro-processing techniques.
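The overlap can be sketched with Python's string.Template, used here as a minimal stand-in for a template engine: a parameterised fragment plays the role of a reusable macro, and each use substitutes different values. The page layout, link data and render_page function are hypothetical; values are HTML-escaped because plain textual substitution would otherwise pass markup through unchanged.

```python
import html
from string import Template

# A reusable fragment, comparable to a parameterised macro.
LINK = Template('<li><a href="$href">$label</a></li>')
PAGE = Template("<html><body><h1>$title</h1><ul>$links</ul></body></html>")


def render_page(title: str, links: list[tuple[str, str]]) -> str:
    """Expand LINK once per (label, href) pair, then substitute into PAGE."""
    items = "".join(
        LINK.substitute(href=html.escape(href), label=html.escape(label))
        for label, href in links
    )
    return PAGE.substitute(title=html.escape(title), links=items)


if __name__ == "__main__":
    print(render_page("Preprocessors", [("m4", "/m4"), ("cpp", "/cpp")]))
```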

Language Customisation and Extension

Preprocessors, whether lexical or syntactic, provide mechanisms for modifying or extending the capabilities of programming languages. They enable developers to:

  • customise syntactic appearance
  • embed specialised languages within a host language
  • introduce compile-time constructs unavailable in the core language
  • support domain-specific abstractions

Originally written on October 1, 2016 and last modified on December 4, 2025.
