Ogden's Basic English as a Lexical Database in Natural Language Processing
by Scott R. Hawkins
Introduction and Statement of Purpose
The prospect of representing knowledge in associative networks is
intuitively appealing. It takes only a moment's introspection to appreciate
that human knowledge may be stored in some sort of associative network.
From that realization it is a quick jump to a vision of vast bodies of
knowledge neatly and elegantly encoded in computer networks. The notion
is seductive, but also misleading. Virtually everyone who has worked
extensively with associative networks has ultimately dismissed them as
disappointing, if not actually useless. The current prospects for the use of
associative networks in artificial intelligence, and more particularly in natural
language processing, appear discouraging when considered in light of the
intuitive correctness of the basic idea.
In my case, at least, this disillusionment stemmed from skimming
over a difference between humans and machines which is so fundamental
and huge that it is easy to overlook. Computer scientists are (at least
nominally) engaged in an attempt to reproduce 'intelligent' behavior
mechanically. Unfortunately, there is no universally accepted definition of
intelligence. Arguably, the closest thing to such a definition is the Turing
Test. The intuitive appeal of such a test is undeniable: people are the only
thing currently classified as intelligent. Therefore, if we can create a machine
to imitate human behavior precisely enough to fool a human observer we
have created a thinking machine.
This task reduces quickly and neatly to one of natural language
processing. NLP textbooks treat such issues as syntax, semantics, the relative
importance of subjects, verbs, and modifiers, and other grammatical issues in
depth. Many useful and elegant methodologies for symbol manipulation
have been added to scientific knowledge. The issues of computational
linguistics are rich and intriguing, and it is easy to overlook the fact that
while in human minds words are language-dependent labels for common
experiences, to computers words are merely arbitrary labels.
Consider the 'cat.' The Japanese label for cat is 'neko,' the German label
'katze,' the Russian label something else, and so on. Each label applies to a set
of memories and sensory experiences--touch, sight, smell, allergic reaction or
the lack thereof--which differ only slightly not merely from individual to
individual but from culture to culture. It is the similarity of these experiences
which allows meaningful labels to be created. Any English speaker would
undoubtedly conclude that 'cats are furry.' The Germans would agree that in
general 'katze' are 'mit Pelz besetzen.'
Indeed, members of some Oriental cultures might agree with dogs that
cats are 'oishii'--tasty. Unfortunately dogs cannot express themselves vocally
as well as people, or even computers. Still, at present one would be more
likely to experience that ephemeral phenomenon known as communication
when discussing the taste of cat with one's dog than with one's computer.
The dog might not be able to express that in his mind an attribute which
could be labeled 'tasty' exists, but it most assuredly does.
My point is that associative network designers are presented with a
much larger problem than is immediately obvious. Though we are able to
categorize words into nouns, verbs and adjectives--or whatever else might
seem appropriate--though we are able to assign attributes such as 'furry' or
even 'tasty' to nodes which we label 'cat,' we are not really accomplishing
much in the way of knowledge representation.
Though a modern computer system might conceivably incorporate
information storage capabilities and certain pattern recognition algorithms
which are near the level of human abilities, such a system would still be
limited by the quality of data available to those operations. No one has yet
designed a mechanical input system which approaches a human level of sensory perception.
Hence, we are left with the concept of an attribute. If we can associate
an attribute 'fuzzy' with the node 'dog,' we have for many purposes achieved
roughly the same result as giving the computer the power to recognize the
pattern of 'fuzziness' independently.
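As a rough illustration of the idea, the fragment below treats an attribute as nothing more than a label attached to a node. This is a minimal sketch; the class and field names are my own, not drawn from any system discussed here.

```python
# Minimal sketch of attribute association; names are illustrative only.

class Node:
    def __init__(self, label):
        self.label = label
        self.attributes = set()   # e.g. 'fuzzy', 'tasty'

    def has_attribute(self, attr):
        return attr in self.attributes

dog = Node('dog')
dog.attributes.add('fuzzy')

# For many purposes this lookup stands in for actually
# recognizing the pattern of 'fuzziness'.
print(dog.has_attribute('fuzzy'))   # True
```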
The difficulty lies in the assignation process. Though it is a
straightforward matter to construct an association between any two given
words, no thorough and comprehensive system for assigning attributes
currently exists. Where do you stop? In addition to being fuzzy, dogs can also
be cute, mean, short, tall, big, small, brown, black, etc. The brute force
method--have some human check each node in the network against each
possible attribute--might be workable,1 but it seems inelegant.
Now, from these observations and others like them, computer
scientists can conclude some rather damning things about the usefulness of
associative networks. The perceived main strength of the associative
network--its usefulness in grouping concepts--has been undermined by the
lack of consensus regarding the best way to exploit those properties of the
network which deal with knowledge representation. Thus, the associative
network is not yet a standard of the field.
I do not take issue with these conclusions. Certainly associative
networks are a tricky medium with which to represent knowledge, and no
consensus exists on the best way in which to do so (Brachman, 1979, pp. 43-45).
However, I felt that there were other potentials inherent in the network
model which might be profitably explored.
1 Worst case for Basic English: 850 x 850 = 722,500 comparisons. Assuming an
average of one comparison every two seconds, the process would take about
1,445,000 seconds, or roughly ten 40-hour work weeks.
Statement of Purpose
The purpose of this project is to construct an associative network from
the system of Basic English (Ogden, 1934). The primary goal is to design and
construct an expandable general purpose data structure to serve as a lexicon.
In addition, the data structure should provide support to the operations of
sentence parsing, sentence generation, and the construction of a logical form.
In Chapter II an abbreviated history of the associative network is
presented. First, networks whose features were utilized directly are briefly
described. Also included are descriptions of some networks which, while not
utilized directly, embody significant features which might profitably be
incorporated into future refinements of this project.
Chapter III describes the System of Basic English, presented together
with an analysis of properties of that system which proved useful or interesting.
Chapter IV contains a description of the lexicon. Also included is an
explanation of how the data structure's properties can aid in sentence parsing
and the generation of a logical form.
In Chapter V significant features of the implementation of the lexicon are described.
In Chapter VI the capabilities inherent in the data structure are shown.
The results of a series of tests using a primitive question and answer interface
are presented and explained.
In Chapter VII, I list proposed routine improvements of the data
structure followed by a proposal of directions for future research.
History of the Associative Network
The observation that words fall into categories is ancient; it dates back
at least to Aristotle, and probably earlier. Computer science applications date
from M. Ross Quillian's publication of "Semantic Memory" (Quillian, 1966).
Following Quillian, several papers were published (Levesque and
Mylopoulos, 1979; Hendrix, 1979; Brachman, 1979) in which the semantic
hierarchy/associative network concept was explored in great detail.
Case Grammar
Though not directly related to the concept of the associative network,
case grammar had enough influence on the design of many such networks to
justify its inclusion in this section as background.
The early attempts at natural language computation demonstrated that
syntax-directed methods were not powerful enough to parse real world
sentences unaided. Such tools as Generative Grammars, Augmented
Transition Networks, and their associated refinements are at best
intermediate steps in the attempt to automate natural language processing.
Charles Fillmore's introduction of Case Grammar (1968) was
indicative of a shift of emphasis from syntax to semantics. Broadly speaking,
Case Grammar is an attempt to store sentences with similar meanings but
different surface structures in the same way. For example, the two sentences
'The man ate the doughnut'
'The doughnut was eaten by the man.'
are identical in meaning but significantly different in structure. Case
Grammar techniques reduce both sentences to the same internal
representation. This goal came to be known as the quest for a logical form.
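As a toy illustration of that reduction (my own, not Fillmore's actual notation), both surface forms can be mapped onto a single case-frame record:

```python
# Toy case-frame reduction; the frame layout is invented for illustration.
def case_frame(agent, action, patient):
    return {'action': action, 'agent': agent, 'patient': patient}

active  = case_frame('man', 'eat', 'doughnut')   # 'The man ate the doughnut'
passive = case_frame('man', 'eat', 'doughnut')   # 'The doughnut was eaten by the man.'
print(active == passive)   # True: one internal representation for both
```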
Quillian
By all accounts, the first semantic memory was created by M. Ross Quillian (1968).
Quillian's Ph.D. thesis described a system based in part on
the observation that dictionary definitions are often circular. That is, if you
start by looking up a word X defined as Y, then look up Y and see that it is
defined as Z, the definitions will eventually lead back to X. Quillian used this
observation to implement the memory model described below.
Each word was stored in a node. Nodes were connected by various
types of associative links. The associative links included:
- a single pointer to the node's parent or type node
- a collection of pointers to the other nodes which roughly made
up the node's dictionary definition.
The set of a node and its associative links was called the plane of that node.
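A rough sketch of this scheme in modern terms might look like the following. The field names are my own, and the structure is an interpretation of the description above rather than Quillian's actual implementation.

```python
# Sketch of a Quillian-style node: one pointer to the parent/type node
# plus a collection of pointers approximating the dictionary definition.

class QNode:
    def __init__(self, word, type_node=None):
        self.word = word
        self.type_node = type_node   # single pointer to the parent or type node
        self.definition = []         # pointers to nodes forming the definition

    def plane(self):
        # the node together with all of its associative links
        links = list(self.definition)
        if self.type_node is not None:
            links.append(self.type_node)
        return [self] + links

planet = QNode('planet')
animal = QNode('animal')
earth = QNode('earth', type_node=planet)
earth.definition.append(animal)
```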
The end product was an 'extremely complex' network of associated
nodes. It is important to note that there was no particular hierarchy to the
network. Whichever node you happened to start with served as the root.
From an organizational standpoint it was a mess, but it did yield some
interesting results. For example, when asked to compare the words 'Earth' and 'Live,'
Quillian's network yielded the following:
Earth is a planet of animal
To live is to have existence as animal
These conclusions were reached by searching the network for the
shortest path between the nodes 'Earth' and 'Live.' That path went through
the node 'animal,' and the two sentences above were generated as a result.
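The search itself can be pictured as an ordinary breadth-first search over the word graph. The sketch below is my own reconstruction; the toy edges are invented to mirror the Earth/live/animal example, not taken from Quillian's data.

```python
from collections import deque

# Invented toy graph mirroring the Earth/live/animal example.
graph = {
    'earth':     {'planet'},
    'planet':    {'earth', 'animal'},
    'live':      {'existence', 'animal'},
    'existence': {'live'},
    'animal':    {'planet', 'live'},
}

def shortest_path(start, goal):
    """Breadth-first search for the shortest chain of links."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# The path runs through 'animal', the node from which the two
# comparison sentences were generated.
print(shortest_path('earth', 'live'))   # ['earth', 'planet', 'animal', 'live']
```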
The project's ultimate goal was to enable the computer to infer the
most likely meaning of an unclear sentence by tracing the links through the
network to the place where they intersected.
This would be useful in sentence parsing in the following way: suppose
the computer was parsing the sentence 'Animals which live on the Earth
breathe oxygen.' Upon encountering the word 'live' the computer is faced
with a variety of conflicting possible meanings.2 Hopefully, the network
would help to resolve the conflict.
The project's success was limited by the bulkiness of its end product in
terms of late-1960s memory. Ultimately only twenty words or so could be
networked at one time.
Quillian's work on networks continued in 1969 when he teamed up
with a psychologist, Alan Collins (Collins and Quillian, 1969). Together they
conducted experiments designed to determine what resemblance Quillian's
model bore to the way humans store information and conduct inferencing.
The results of these experiments appeared to reaffirm the basic correctness of
the associative network concept.
2 'Live' as a verb has several possible meanings--for example, in the sentence
'Where do you live?,' the word 'live' is a synonym for 'dwell.' However, in
the sentence 'Let him live!' the verb 'live' acquires an altogether different meaning.
By 1969 Quillian had refined his model somewhat. Instead of the
somewhat amorphous data structure used previously, Quillian and Collins
constructed a semantic hierarchy. This network, which came to be known as
a taxonomic tree, or sometimes an 'isa' hierarchy, was the first instance of
semantic hierarchies being used for natural language processing. However,
the authors were careful to note that the idea of taxonomic hierarchies being
implicit in natural language dates back to Aristotle.
Their experiment presented human subjects with propositions of the form
'A is a B'
and measured their reaction times in determining whether the proposition
was true or false. Their hypothesis was that human reaction times would be
proportional to the number of links separating A and B in the data structure.
Experimental evidence bore their hypothesis out.
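In code, the quantity being measured corresponds to a simple link count up the hierarchy. The toy tree below is the classic canary/bird/animal illustration, chosen by me; it is not the authors' stimulus set.

```python
# Invented toy taxonomy; parent[x] is the type node of x.
parent = {'canary': 'bird', 'bird': 'animal', 'shark': 'fish', 'fish': 'animal'}

def isa_links(a, b):
    """Number of is-a links from a up to b, or None if b is not an ancestor."""
    steps = 0
    while a != b:
        if a not in parent:
            return None
        a = parent[a]
        steps += 1
    return steps

# The hypothesis: verifying 'a canary is a bird' (1 link) should be
# faster than verifying 'a canary is an animal' (2 links).
print(isa_links('canary', 'bird'))    # 1
print(isa_links('canary', 'animal'))  # 2
```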
Simmons
In 1973 Robert Simmons drew on the concepts of case grammar and
semantic networks for his paper "Semantic Networks: Their Computation
and Use for Understanding English Sentences" (Simmons, 1973). He
abandoned the taxonomic hierarchy proposed by Quillian and Collins. In its
place he used a non-hierarchical associative network which incorporated
features of Case Grammar.
His program utilized a sophisticated ATN to generate network
representations of the deep structure of English sentences. Once the net for a
sentence was created, the program could use it to generate English sentences
in response to queries. Furthermore, the program had some ability to
recognize when two sentences with different surface structures had identical meanings.
Though his paper contained some interesting philosophical
discussions on the nature of thought and computation, his actual product was
not much more than an implementation of standard Case Grammar ideas.
Hendrix
Like Simmons, Gary Hendrix (1979) also eschewed the taxonomic
hierarchy concept. However, the refinements he added to Simmons' non-
hierarchical scheme endowed Hendrix's model with powers comparable to
the taxonomic hierarchies of Quillian and Collins, in addition to the
flexibility inherent in Simmons' program.
Hendrix's model had many distinct types of arc. First, he provided the
arcs necessary to implement standard Case Grammar features. In addition,
there were arcs to denote the concepts of subset, element, disjoint and distinct.
This gave Hendrix's model the ability to answer questions of categorization.
Hendrix was apparently something of a mathematician as well.
By viewing his network purely in the abstract, he was able to include
some of the broader taxonomic concepts without explicitly defining
links for them. This was accomplished by assigning each node to a
space with the option of assigning spaces to metaspaces called vistas.
Vistas or spaces might have supernodes associated with them.
Supernodes were nodes used to store taxonomic knowledge about the space
or vista with which they were associated.
For example, the node HENRY L. STIMSON might be linked to
the node NUCLEAR SUBMARINE by an e-arc, indicating that the
HENRY L. STIMSON was an example of the concept NUCLEAR
SUBMARINE. The concept of NUCLEAR SUBMARINE was part of
the BOAT space, which was in turn part of a larger vista. Given such
links, the network could support exchanges like:
Q: Is the HENRY L. STIMSON a submarine?
A: Yes it is.
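Read as data, the example might be arranged as below. This is an interpretive sketch of the description above (the dictionaries and the SUBMARINE node are my own simplification), not Hendrix's formalism.

```python
# e-arcs link an individual to a concept; subset arcs link concepts;
# each concept belongs to a space. All entries here are illustrative.
e_arc  = {'HENRY L. STIMSON': 'NUCLEAR SUBMARINE'}
subset = {'NUCLEAR SUBMARINE': 'SUBMARINE'}
space  = {'NUCLEAR SUBMARINE': 'BOAT', 'SUBMARINE': 'BOAT'}

def is_a(individual, concept):
    """Follow the e-arc, then subset arcs, looking for the concept."""
    node = e_arc.get(individual)
    while node is not None:
        if node == concept:
            return True
        node = subset.get(node)
    return False

print(is_a('HENRY L. STIMSON', 'SUBMARINE'))   # True -> "Yes it is."
```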
Hendrix's model did not really have any more ability to sort out
cases than Simmons' work, nor did it contain more taxonomic
knowledge than Quillian's work. However, Hendrix did manage to
wrap the best features of both up in one elegant little package.
One final feature of Hendrix's model is worth mentioning. His
spaces were also subject to logic operations. The interface could sort
out the truth values of simple propositions by use of conjunction,
disjunction, and negation.
Levesque and Mylopoulos
Hector Levesque and John Mylopoulos' paper "A Procedural
Semantics for Semantic Networks" (Levesque and Mylopoulos, 1979)
included an interesting addendum to the taxonomic hierarchy
schemata. There were three aspects to their representation of semantic
networks. The first two, classes and relations, implemented the
standard features of isa (sic) trees. Classes could be created, destroyed, or
retrieved. Objects could be tested for membership within a given class.
Relationships between two objects could be tested for existence and type
or simply fetched.
The feature that made their work interesting was that actual
source code objects such as functions and procedures were part of the
network as well. Levesque and Mylopoulos defined a four step
formalism to deal with the unique problems posed by incorporating
meta-knowledge in a semantic network.
A proposed program was defined in four parts: the prerequisites
necessary before the code could be executed, the executable code itself,
the effect this execution had on the network, and the complaint which
would result from execution failure. The actual coding was done in
LISP due to that language's self-referential features.
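In outline, a program object under this formalism carries those four parts. The sketch below is a schematic rendering in Python (the original work was done in LISP), and the field and function names are mine.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class NetworkProgram:
    prerequisite: Callable[[], bool]  # must hold before the body may run
    body: Callable[[], None]          # the executable code itself
    effect: Callable[[], None]        # how the network changes afterwards
    complaint: str                    # reported on execution failure

    def run(self):
        if not self.prerequisite():
            return self.complaint     # failure path
        self.body()
        self.effect()
        return 'ok'

prog = NetworkProgram(
    prerequisite=lambda: True,
    body=lambda: None,
    effect=lambda: None,
    complaint='prerequisite not satisfied',
)
print(prog.run())   # 'ok'
```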
Schubert, Cercone, and Goebel
The last variation on the semantic network concept covered
here is also the most complicated. It was developed by Lenhart K.
Schubert, Nicholas I. Cercone, and Randolph I. Goebel (1979). Their
network was primarily concerned with knowledge representation.
Rather than further refine the now traditional Case Grammar
variations, they chose instead to work in predicate calculus. The
'extended' part of the title was justified by a second organizational
scheme called topic hierarchies which they wove into the knowledge base.
The choice of propositional calculus as the primary storage
scheme naturally expedited the internal representation of propositions.
For example, the proposition 'Scott does not have a computer' would
be stored as:
[[Scott has computer] NOT]
The standard corollaries such as equivalence, causation, implication,
etc., were covered by similar means.
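The bracketed form above maps naturally onto nested lists. The evaluator below is a sketch of the idea (only NOT is handled, as in the example; the other connectives would follow the same pattern), not the authors' actual machinery.

```python
facts = set()   # ground propositions currently believed true

def holds(prop):
    """Evaluate a bracketed proposition; only NOT is sketched here."""
    if prop[-1] == 'NOT':
        return not holds(prop[0])
    return tuple(prop) in facts

scott_has_computer = ['Scott', 'has', 'computer']
# [[Scott has computer] NOT] holds while the ground fact is absent.
print(holds([scott_has_computer, 'NOT']))   # True
```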
Provisions for temporal data were also implicit in the structure.
For example, the fact that the verb 'to go' implies a succession of states
(start time to stop time) yielded a variety of new storage fields for all
the possible tenses of the verb.
Another noteworthy feature of the extended semantic network
was the probability assessment features. English words automatically
yielded various probabilities. If the word 'doubt' occurred in a
proposition, the proposition was assigned a C-value of thirty percent.
The word 'probably' yielded a C-value of eighty percent, 'maybe'
produced fifty percent, and so on.
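The mapping is direct enough to state in a few lines. The three C-values below come from the text; the default of 1.0 for unhedged propositions is my assumption.

```python
# C-values from the text; the 1.0 default is an assumption.
C_VALUES = {'doubt': 0.30, 'probably': 0.80, 'maybe': 0.50}

def c_value(words):
    """Return the certainty contributed by the first hedging word found."""
    for word in words:
        if word in C_VALUES:
            return C_VALUES[word]
    return 1.0

print(c_value('Scott probably has a computer'.split()))   # 0.8
```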
One of the drawbacks of semantic nets is the apparent
randomness with which new relations spring up. When accessing
information on a given topic a query executor might have to scan
dozens or perhaps hundreds of links before finding the appropriate
one. This problem is addressed by the topic hierarchy structure.
A topic hierarchy is a generalized template of the knowable facts
about a given topic. As the network accumulates knowledge about the
topic, slots are filled in. Essentially it is an index, but it cuts the access
time considerably.
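A topic hierarchy can be pictured as a template whose slots fill in over time, as in the sketch below; the slot names are invented for illustration.

```python
# Invented slot template for a topic; real templates would be richer.
TOPIC_TEMPLATE = {'appearance': None, 'habitat': None, 'diet': None}

topics = {}

def learn(topic, slot, value):
    entry = topics.setdefault(topic, dict(TOPIC_TEMPLATE))
    entry[slot] = value

def lookup(topic, slot):
    # One indexed access replaces a scan over dozens of links.
    return topics.get(topic, {}).get(slot)

learn('dog', 'appearance', 'fuzzy')
print(lookup('dog', 'appearance'))   # 'fuzzy'
```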
More recently, research related to associative networks has shifted from
their use in representing the meanings of individual words to their uses in
representing the meaning of entire sentences (Sowa, 1992). The so-called
conceptual structures have been demonstrated to be a useful formalism for
this type of knowledge representation.
The methodologies involved are, in part, refinements of the topic
hierarchies and Case Grammar techniques discussed above. Again, the main
difference lies in the purpose of the new type of network. The conceptual
structures described by Sowa are not designed to be used as lexicons.
Consequently, they can make use of linguistic constructs (Ex: Time-period,
Supported-By) and abbreviations (Ex: Attr, Chrc) which could not be included
in a lexicon. Therefore, they were not of direct use in this project.
I was impressed with the case grammar arcs implemented by Gary
Hendrix. However, after some thought I concluded that they are more
applicable to the knowledge representation component of natural language
processing than to lexical storage.
The classes and relations of the Levesque / Mylopoulos model proved
to be unnecessary for expressing the relationships between words in my
vocabulary of Basic English. I found that equal and sometimes greater lexical
knowledge of a word could be encoded by using two arcs (described below)
together with a type field. In addition, my network ultimately shaped up as a
fairly strict hierarchy. The advantages of this property become clear when
some of their situation representations are viewed in graphical format: the
representations were rather untidy and often confusing.
Ultimately, I settled on a three-arc model for my implementation. This
choice was dictated as much by the general nature of the vocabulary as any
historical factor. Though the multitude of arc types historically available
allows a rich variety of structural and semantic concepts to be expressed, I felt
the vocabulary required only two types of arc, instance and component, to be
adequately expressive. For example, the node for the word 'bone' is a
component of the parent node 'animal.' On the other hand, the node for the
word 'quadruped' is an instance of the parent node 'animal.' I found this
two-arc representation method to be entirely adequate for application to most
elements of Basic English.
In addition, I included a synonym arc. I felt that to be more appropriate
than simply associating a list of synonyms with a given word, since words
often have similar meanings but not identical uses. For example, the
adjectives 'long' and 'tall' convey roughly the same piece of information; but
'tall' is the preferred word for describing a person's height.
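Putting the three arcs together, a lexicon node might be sketched as follows. The field names are mine, and the sketch stands in for the implementation described in later chapters rather than reproducing it.

```python
class LexNode:
    """One word in the lexicon, with the three arc types described above."""
    def __init__(self, word):
        self.word = word
        self.instance_of = None    # e.g. 'quadruped' is an instance of 'animal'
        self.component_of = None   # e.g. 'bone' is a component of 'animal'
        self.synonyms = []         # e.g. 'long' <-> 'tall'

animal = LexNode('animal')
bone = LexNode('bone')
bone.component_of = animal
quadruped = LexNode('quadruped')
quadruped.instance_of = animal

tall, long_ = LexNode('tall'), LexNode('long')
tall.synonyms.append(long_)
long_.synonyms.append(tall)
```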
Finally, my concept of a systems hierarchy (described below) owes
something to the topic hierarchies of Schubert, Cercone, and Goebel.
However, their implementation was heavily oriented toward case grammar
and lacked simplicity. Though their ideas were useful, it was not feasible to
duplicate their implementation in a data structure intended to serve
primarily as a lexicon.
Before moving on to a more thorough discussion of the categorization process and the resulting network model,
I believe it would be in order to briefly discuss the source vocabulary, the System of Basic English.
Note: This project uses the Basic word list, not Basic English the language. There are only 16 Basic verbs. The Basic word 'living' is an adjective, a modifier that needs a verb. Thus 'dwell' becomes 'Where are you living?' and the operator form of 'Let him live' would be 'Let him be living'. After a person has learned basic Basic, many words are easily expanded to a fuller English; for example, the noun 'a hammer' expands to the action verb 'to hammer'. Such expansion is useful for moving toward full English, but risks losing some international understanding if that is a concern.