Word Sense Disambiguation
Word sense disambiguation (WSD) is the process of determining which meaning, or sense, of a word is intended when the word appears in a particular sentence or linguistic context. The English word ‘bank’, for example, may denote a financial institution or the side of a river, and only the surrounding context distinguishes the two. Because many words in natural language are polysemous, the correct interpretation depends on contextual cues that human readers resolve subconsciously. For computational systems, however, identifying the intended sense has long posed a central challenge in natural language processing.
WSD is essential for tasks such as machine translation, information retrieval, semantic search and text understanding, where accurate interpretation of ambiguous vocabulary is required. The problem reflects a deeper issue: natural language is shaped by the neural processing abilities of the human brain, and translating these abilities into computational systems requires extensive modelling of linguistic structure, context and world knowledge.
Techniques and Performance
Over the decades, many approaches to WSD have been explored:
- Dictionary-based methods, which rely on lexical resources to match context with predefined sense descriptions, as in the Lesk algorithm (a minimal sketch follows this list).
- Supervised machine learning, where classifiers are trained on corpora of manually sense-annotated examples. These have historically produced the highest accuracy.
- Unsupervised clustering methods, which infer sense distinctions by grouping similar contextual usages without relying on annotated data.
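The dictionary-based idea can be made concrete with a minimal sketch in the spirit of the Lesk algorithm, assuming NLTK is installed and the WordNet corpus has been downloaded via nltk.download('wordnet'):

```python
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_words):
    """Score each WordNet sense of `word` by the overlap between its
    dictionary gloss and the words surrounding the target, and return
    the sense with the largest overlap."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sense = simplified_lesk("bank", "I deposited money at the bank".split())
if sense:
    print(sense.name(), "-", sense.definition())
```

Here the shared word ‘money’ favours the financial-institution gloss over the river-bank gloss; fuller Lesk implementations extend the overlap to example sentences and related synsets.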
Supervised techniques have been particularly successful, often exceeding 90 per cent accuracy on coarse-grained homograph distinctions. At the finer-grained level, evaluation exercises such as Senseval and SemEval have reported top accuracies between roughly 59 and 69 per cent, with the baseline of always selecting the most frequent sense typically around 51–57 per cent. Accuracy figures depend heavily on sense granularity, corpus characteristics and the choice of sense inventory.
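The most-frequent-sense baseline itself is simple to implement with WordNet, whose synsets for a word are ordered by their frequency in sense-tagged corpora; a minimal sketch, again assuming NLTK with WordNet downloaded:

```python
from nltk.corpus import wordnet as wn

def most_frequent_sense(word, pos=None):
    """Most-frequent-sense baseline: WordNet orders a word's synsets
    by sense-tagged corpus frequency, so the first synset approximates
    the most frequent sense."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

sense = most_frequent_sense("bank", pos=wn.NOUN)
if sense:
    print(sense.name(), "-", sense.definition())
```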
Variants of the WSD Task
Two major task formats are recognised:
- Lexical sample tasks, which focus on disambiguating a predetermined set of target words. Because annotators repeatedly encounter the same small set of words and senses, these tasks are less labour-intensive to annotate.
- All-words tasks, which require disambiguation of every open-class word in a running text. These tasks more closely resemble real-world applications but demand significantly greater annotation effort (a sketch of this setting appears below).
Both formats require a dictionary or sense inventory defining the possible meanings, and some methods additionally use labelled training data.
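A minimal sketch of the all-words setting, assuming NLTK's tokenizer and tagger models plus the WordNet corpus are downloaded, and using NLTK's built-in Lesk implementation as the per-word disambiguator:

```python
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

# Map Penn Treebank tag prefixes of open-class words to WordNet POS codes.
OPEN_CLASS = {"NN": wn.NOUN, "VB": wn.VERB, "JJ": wn.ADJ, "RB": wn.ADV}

def all_words_wsd(sentence):
    """Assign a WordNet sense to every open-class word in the sentence."""
    tokens = word_tokenize(sentence)
    senses = {}
    for word, tag in pos_tag(tokens):
        wn_pos = OPEN_CLASS.get(tag[:2])
        if wn_pos:  # closed-class words (determiners, prepositions) are skipped
            senses[word] = lesk(tokens, word, pos=wn_pos)
    return senses

for word, sense in all_words_wsd("The bank raised interest rates").items():
    print(word, "->", sense)
```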
Historical Development
WSD emerged in the 1940s as one of the earliest recognised problems in computational linguistics. Warren Weaver’s 1949 memorandum on machine translation framed the issue in computational terms. Yehoshua Bar-Hillel later argued that the task was unsolvable without modelling extensive world knowledge: in his well-known example ‘The box was in the pen’, deciding whether ‘pen’ denotes a writing implement or an enclosure requires knowledge of typical object sizes, not grammar alone.
Rule-based systems dominated early research, particularly in the 1970s with approaches such as preference semantics. These systems were limited by the knowledge acquisition bottleneck: the difficulty of encoding sufficiently rich linguistic and commonsense knowledge by hand. In the 1980s, large lexical resources such as the Oxford Advanced Learner’s Dictionary became available, enabling automated knowledge extraction, though methods remained largely dictionary-based.
The 1990s marked a statistical shift, as machine learning approaches began to outperform rule-based systems. By the 2000s, improvements in supervised methods began to plateau, prompting renewed interest in coarser sense distinctions, graph-based and knowledge-based approaches, semi-supervised techniques and domain adaptation. Despite these developments, supervised learning remains the most accurate general approach.
Difficulties and Challenges
Differences in Sense Inventories
One major obstacle in WSD is deciding what constitutes a distinct sense. Dictionaries and thesauruses often divide meanings differently. While selecting a single reference dictionary can reduce inconsistency, fine-grained distinctions frequently lead to lower accuracy. Consequently, many studies favour coarser sense sets. WordNet is commonly used for English WSD, with additional resources such as Roget’s Thesaurus, Wikipedia and the multilingual BabelNet also serving as sense inventories.
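The granularity problem is easy to see in WordNet itself; assuming NLTK with the WordNet corpus downloaded, the following lists the noun senses of ‘line’, of which there are around thirty:

```python
from nltk.corpus import wordnet as wn

# Fine-grained inventories in practice: enumerate the noun senses of "line".
for i, synset in enumerate(wn.synsets("line", pos=wn.NOUN), start=1):
    print(f"{i:2d} {synset.name():<18} {synset.definition()}")
```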
Interaction with Part-of-Speech Tagging
Sense assignment is closely related to grammatical classification. Part-of-speech tagging affects sense interpretation and vice versa, though the immediate context required for the two tasks differs. Part-of-speech tagging typically achieves much higher accuracy—around 96 per cent for English—because it depends strongly on local context, whereas WSD often relies on more distant contextual cues.
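A small illustration of this interaction, again assuming NLTK with WordNet: restricting the lookup by part of speech immediately prunes the candidate senses.

```python
from nltk.corpus import wordnet as wn

# The candidate senses of "duck" differ entirely by part of speech.
for pos, label in [(wn.NOUN, "noun"), (wn.VERB, "verb")]:
    for synset in wn.synsets("duck", pos=pos)[:2]:
        print(label, synset.name(), "-", synset.definition())
```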
Interjudge Variability
Human annotators often disagree on sense assignments, particularly for fine-grained distinctions. While part-of-speech categories are relatively easy to memorise and apply, sense inventories may be large and their distinctions open to interpretation. Human agreement therefore establishes an upper bound for computational performance and is typically lower for fine-grained tasks. This discrepancy has encouraged increasing use of coarse-grained evaluations.
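Agreement itself is straightforward to quantify; a minimal sketch with hypothetical sense labels (the instances and sense IDs below are illustrative, not drawn from any real corpus):

```python
# Hypothetical sense labels assigned by two annotators to the same
# ten instances of an ambiguous word.
annotator_a = ["s1", "s1", "s2", "s1", "s3", "s2", "s1", "s2", "s1", "s3"]
annotator_b = ["s1", "s2", "s2", "s1", "s3", "s2", "s1", "s1", "s1", "s2"]

matches = sum(a == b for a, b in zip(annotator_a, annotator_b))
print(f"raw inter-annotator agreement: {matches / len(annotator_a):.0%}")  # 70%
```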
Task Dependence and Algorithm Suitability
A universal sense inventory is unrealistic because different applications prioritise different distinctions. For example, machine translation requires selecting an appropriate target-language equivalent, whereas information retrieval may only require matching sense usage between queries and documents. As a result, different algorithms and sense granularities suit different tasks.
Discreteness and the Nature of Word Meaning
Word senses are not always discrete; meaning shades into context-dependent nuances. Lexicographers generalise from corpus evidence to create dictionary senses, but these categories do not necessarily align with computational requirements. Sense boundaries are often fuzzy or overlapping, which complicates automatic classification. The lexical substitution task introduced in 2009 attempts to bypass the discreteness issue by requiring systems to propose context-appropriate substitute words rather than select from fixed sense lists.
Approaches and Methods
Two broad methodological families exist:
- Deep approaches, which aim to use extensive commonsense knowledge bases. These approaches are limited by the scarcity of comprehensive, machine-readable world knowledge.
- Shallow approaches, which rely on statistical relationships, distributional context, lexical resources and pattern recognition. These methods have dominated practical WSD due to their comparative efficiency and adaptability; a minimal sketch follows this list.
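A minimal vector-space sketch of a shallow method, assuming NLTK with WordNet and scikit-learn are installed: each sense gloss and the context are embedded as TF-IDF vectors, and the sense with the most similar gloss wins. This is a distributional cousin of the Lesk overlap shown earlier, not a state-of-the-art system.

```python
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def shallow_wsd(word, context):
    """Pick the WordNet sense whose gloss is most similar to the
    context under a TF-IDF bag-of-words representation."""
    senses = wn.synsets(word)
    if not senses:
        return None
    docs = [s.definition() for s in senses] + [context]
    vectors = TfidfVectorizer().fit_transform(docs)
    sims = cosine_similarity(vectors[-1], vectors[:-1])[0]
    return senses[sims.argmax()]

sense = shallow_wsd("bank", "I deposited money at the bank")
if sense:
    print(sense.name(), "-", sense.definition())
```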