VirtualWorks firmly believes that natural language processing applications make little sense unless they draw on high-coverage, coherent morpho-semantic dictionaries.
The foundation layer of any natural language processing system must be an extensive lexicon containing information about the forms and meanings of words. Without such a lexicon, most analytic operations are simply more or less sophisticated forms of “guessing”, and guessing is rarely enough! A system that has no reliable advance information about the items (words, entities, propositions) that it claims to identify is simply not trustworthy.
VirtualWorks has developed (in over 40 years of total development time) a system of electronic dictionaries covering all the basic full-forms (and their lemmas), together with their morphological, syntactic and semantic information. These dictionary systems are available for all the major European languages. In addition to so-called “simple words”, VirtualWorks dictionaries also contain millions of complex (i.e. multi-word) expressions, which are typically absent from other existing dictionaries. The dictionaries are implemented as finite-state machines that can perform lexical look-ups at a rate of millions of words per second.
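
To make the finite-state idea concrete, the following is a minimal illustrative sketch in Python and not VirtualWorks' actual implementation. It stores full-forms in a trie, the simplest kind of deterministic acyclic finite-state automaton, and returns the lemma and a tag string for each form. The class names, the sample entries and the tag notation (`N+pl`, `multiword`, and so on) are all assumptions invented for this example. Production-scale lexicons would typically use minimized automata or transducers to save memory.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """One automaton state: outgoing transitions plus entries if accepting."""
    transitions: dict = field(default_factory=dict)  # character -> State
    entries: list = field(default_factory=list)      # (lemma, tags) pairs


class FiniteStateLexicon:
    """Toy full-form dictionary stored as a trie (hypothetical, for illustration)."""

    def __init__(self):
        self.start = State()

    def add(self, full_form, lemma, tags):
        """Insert a full-form; spaces are ordinary characters, so multi-word
        entries are stored the same way as simple words."""
        state = self.start
        for ch in full_form:
            state = state.transitions.setdefault(ch, State())
        state.entries.append((lemma, tags))

    def lookup(self, full_form):
        """Follow one transition per character; cost is linear in word length."""
        state = self.start
        for ch in full_form:
            state = state.transitions.get(ch)
            if state is None:
                return []          # no such path: the form is not in the lexicon
        return list(state.entries)

    def longest_match(self, text, start=0):
        """Return (end, entries) for the longest entry beginning at text[start]
        that ends at a word boundary; (start, []) if nothing matches."""
        state = self.start
        best = (start, [])
        for i in range(start, len(text)):
            state = state.transitions.get(text[i])
            if state is None:
                break
            at_boundary = i + 1 == len(text) or not text[i + 1].isalnum()
            if state.entries and at_boundary:
                best = (i + 1, list(state.entries))
        return best


# Hypothetical sample entries; the tag notation is made up for this sketch.
lex = FiniteStateLexicon()
lex.add("mouse", "mouse", "N+sg")
lex.add("mice", "mouse", "N+pl")
lex.add("hot", "hot", "A")
lex.add("hot dog", "hot dog", "N+sg+multiword")
lex.add("hot dogs", "hot dog", "N+pl+multiword")

print(lex.lookup("mice"))                       # [('mouse', 'N+pl')]
print(lex.longest_match("hot dogs are tasty"))  # (8, [('hot dog', 'N+pl+multiword')])
```

Because each look-up follows exactly one transition per input character, its cost depends on the length of the word and not on the size of the dictionary, which is the property that makes very high look-up rates plausible. The longest-match method shows why multi-word entries matter: with them, "hot dogs" is recognized as one unit instead of as the adjective "hot" followed by an unrelated noun.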