Ogden's Basic English as a Lexical Database in Natural Language Processing
by Scott R. Hawkins
Early efforts in the area of associative networks illustrated the difficulty
of coming up with a representative cross section of the English language. The
tendency is to explore the areas of the language with which one's system
deals well in great detail while ignoring other areas almost completely. In
partial explanation, this is not necessarily pure laziness on the part of the
designer; it is often at least partly a side effect of the designer's working
in his or her native language. A speaker's vocabulary
is often as fundamental to the speaker as water is to a fish; sorting out the
essential from the superfluous can be a monumental task in itself.
Fortunately, it is not necessary to do so, as C. K. Ogden has already collected in
his System of Basic English (Ogden, 1934) a vocabulary well suited to such
The System of Basic English was first published in 1934 (see Appendix
A). Its stated purpose was to extract from the approximately 20,000 English
words in common use a subset which would provide enough vocabulary to
get by in day-to-day operations, yet still be simple enough for an intelligent
foreign student to absorb in a month or so of study.
The first and most obvious advantage of the system of Basic English is
that it is well thought out and fairly representative. In associative network
design it is tempting to concentrate on the concepts which one's network
handles well. With 20,000 words or more to choose from, it is possible to
successfully encode a fairly large vocabulary using an approach that is not
generalizable. Using Basic English as an input set closed off many blind alleys
of that sort. The system contains vocabulary adequate to discuss a great many
concepts, but few are explored in detail.
As a side effect, the vocabulary's purpose tended to focus the subjects of
the words chosen even as it limited their number. Words dealing with the
basic areas of personal finance, interpersonal relations, food, water, medicine
and sleep are treated heavily,[3] while words dealing with, for example, the
sciences are left almost untouched. I hasten to point out that this is not
necessarily a blessing. Systems with a working knowledge of the 'common
sense' subjects dealt with in Basic English still remain largely the province of
As the previous paragraph would suggest, Basic English leans heavily
toward nouns.[4] The verb vocabulary has been confined to the absolute
essentials, though many words in the General Things category can be treated
as verbs as well as nouns. There are 150 Qualities, more commonly referred to
as adjectives and adverbs. The Operations category is a grab bag of necessary
words that are not nouns, adjectives or adverbs, mostly consisting of the
so-called 'function words.'

[3] Interestingly enough, there was no word for 'bathroom.' [ water closet ]
[4] Every word in the Things category can be used as a noun. Things comprise
600 of the 850 words, 70.6%.
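
The category proportions can be checked with a few lines of Python. This is only a sketch for illustration: the counts for Things (600) and Qualities (150) and the total (850) come from the text, while the Operations count is an assumption, inferred here as the remainder. The names `category_sizes` and `category_shares` are hypothetical.

```python
# Category sizes for the 850-word Basic English list.
# Things (600) and Qualities (150) are stated in the text; the Operations
# count is an assumption, inferred as the remainder of the 850 words.
TOTAL_WORDS = 850
category_sizes = {"Things": 600, "Qualities": 150}
category_sizes["Operations"] = TOTAL_WORDS - sum(category_sizes.values())


def category_shares(sizes, total):
    """Return each category's share of the vocabulary as a percentage."""
    return {name: round(100 * count / total, 1) for name, count in sizes.items()}


shares = category_shares(category_sizes, TOTAL_WORDS)
for name, pct in shares.items():
    print(f"{name:<10} {category_sizes[name]:>3} words  {pct:>4}%")
```

Under that assumption, Things come to roughly 70.6 percent of the list, which is the sense in which the vocabulary leans heavily toward nouns.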
Though Basic English was originally selected mostly as a matter of
convenience, it had many other advantages which slowly emerged during the
categorization process. The most abstract and hence most difficult words
(from a categorical standpoint) were omitted. For example, Ogden did not
include the word 'ascend' because the same meaning could be communicated
using the construct 'go up.' With few exceptions, all of Ogden's selected verbs
were primitive and irreducible.
In retrospect, this fact may have saved me from yet another trap. If I
had planned to encode 'ascend' in my network, the obvious approach would
be to reduce it to 'go up,' or some variation thereof. This is certainly possible,
and might have left the impression that I had accomplished something
which I had not. In reality, such a reduction simply pushes back the
fundamental issues one step farther: the computer would be no closer to an
understanding of the concepts of 'go' and 'up,' much less 'ascend.'
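
A toy lexicon makes the point concrete. The sketch below is purely illustrative and is not the network described in this paper: the `REDUCTIONS` table, the `PRIMITIVES` set, and the `reduce_word` helper are hypothetical names chosen for the example.

```python
# Hypothetical toy lexicon: derived words map to phrases built from simpler words.
REDUCTIONS = {
    "ascend": ["go", "up"],
    "descend": ["go", "down"],
}

# Words treated as irreducible (an assumption for illustration; in the text,
# Basic English words such as 'go' and 'up' play this role).
PRIMITIVES = {"go", "up", "down"}


def reduce_word(word):
    """Expand a word into primitives; fail on anything the lexicon does not cover."""
    if word in PRIMITIVES:
        return [word]
    if word not in REDUCTIONS:
        raise KeyError(f"no reduction known for {word!r}")
    expanded = []
    for part in REDUCTIONS[word]:
        expanded.extend(reduce_word(part))
    return expanded


print(reduce_word("ascend"))  # ['go', 'up']
```

The expansion succeeds, yet the result is only a list of uninterpreted strings. Nothing in the program says what 'go' or 'up' means, so the reduction has moved the problem rather than solved it.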
For the three reasons mentioned above:
- Representative nature of the vocabulary
- Relative simplicity of the terms of the vocabulary
- Irreducible nature of the elements of the vocabulary
I feel that Basic English is an appropriate collection of words from which to
construct a general purpose, expandable lexicon.
About this Page: hawkins20.html - Project 712 page 20 Basic English
Last updated January 16, 2015.