Project 712
   

Ogden's Basic English as a Lexical Database in Natural Language Processing
by Scott R. Hawkins


Chapter I
Introduction and Statement of Purpose

Introduction


    The prospect of representing knowledge in associative networks is intuitively appealing. It takes only a moment's introspection to appreciate that human knowledge may be stored in some sort of associative network. From that realization it is a quick jump to a vision of vast bodies of knowledge neatly and elegantly encoded in computer networks. The notion is seductive, but also misleading. Virtually everyone who has worked extensively with associative networks has ultimately dismissed them as disappointing, if not actually useless. The current prospects for the use of associative networks in artificial intelligence, and more particularly in natural language processing, appear dim when set against the intuitive correctness of the basic idea.

    In my case, at least, this disillusionment stemmed from skimming over a difference between humans and machines so fundamental, and so large, that it is easy to overlook. Computer scientists are (at least nominally) engaged in an attempt to reproduce 'intelligent' behavior mechanically. Unfortunately, there is no universally accepted definition of intelligence. Arguably, the closest thing to such a definition is the Turing Test. The intuitive appeal of such a test is undeniable: people are the only things currently classified as intelligent. Therefore, if we can create a machine that imitates human behavior precisely enough to fool a human observer, we have created a thinking machine.

    This task reduces quickly and neatly to one of natural language processing. NLP textbooks treat such issues as syntax, semantics, the relative importance of subjects, verbs, and modifiers, and other grammatical matters in depth. Many useful and elegant methodologies for symbol manipulation have been added to scientific knowledge. The issues of computational linguistics are rich and intriguing, and it is easy to overlook the fact that in human minds words are language-dependent labels for common experiences, while to computers they are just arbitrary symbols.

    Consider the 'cat.' The Japanese label for cat is 'neko,' the German label 'Katze,' the Russian label something else, and so on. Each label applies to a set of memories and sensory experiences--touch, sight, smell, allergic reaction or the lack thereof--which differ only slightly, not merely from individual to individual but from culture to culture. It is the similarity of these experiences which allows meaningful labels to be created. Any English speaker would undoubtedly conclude that 'cats are furry.' The Germans would agree that, in general, 'Katzen' are 'mit Pelz besetzt.'


    Indeed, members of some East Asian cultures might agree with dogs that cats are 'oishii'--tasty. Unfortunately, dogs cannot express themselves vocally as well as people, or even computers. Still, at present one would be more likely to experience that ephemeral phenomenon known as communication when discussing the taste of cat with one's dog than with one's computer. The dog might not be able to express that an attribute which could be labeled 'tasty' exists in its mind, but it most assuredly does.

    My point is that associative network designers are presented with a much larger problem than is immediately obvious. Though we are able to categorize words into nouns, verbs, and adjectives--or whatever else might seem appropriate--and though we are able to assign attributes such as 'furry' or even 'tasty' to nodes which we label 'cat,' we are not really accomplishing much in the way of knowledge representation.

    Though a modern computer system might conceivably incorporate information storage capabilities and certain pattern recognition algorithms which are near the level of human abilities, such a system would still be limited by the quality of the data available to those operations. No one has yet designed a mechanical input system which approaches a human level of sophistication.

    Hence, we are left with the concept of an attribute. If we can associate an attribute 'fuzzy' with the node 'dog,' we have for many purposes achieved roughly the same result as giving the computer the power to recognize the pattern of 'fuzziness' independently.
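
    To make the idea concrete, the following minimal sketch (in Python; the class and names are illustrative, not drawn from this project's code) shows an attribute being attached to a node by hand rather than perceived:

    # A node carries a word and a set of human-assigned attribute labels.
    class Node:
        def __init__(self, word):
            self.word = word          # the label, e.g. 'dog'
            self.attributes = set()   # attributes assigned by hand

    dog = Node('dog')
    dog.attributes.add('fuzzy')

    # The network 'knows' dogs are fuzzy only because a human said so;
    # no pattern recognition was involved.
    print('fuzzy' in dog.attributes)   # True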


    The difficulty lies in the assignment process. Though it is a straightforward matter to construct an association between any two given words, no thorough and comprehensive system for assigning attributes currently exists. Where do you stop? In addition to being fuzzy, dogs can also be cute, mean, short, tall, big, small, brown, black, etc. The brute force method--have some human check each node in the network against each possible attribute--might be workable,¹ but it seems inelegant.
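
    For illustration, the brute force method amounts to a doubly nested loop over words and attributes, with a human serving as the oracle. The sketch below is hypothetical; ask_human stands in for the human judge:

    # Hypothetical sketch of the brute force method: every word is
    # checked against every candidate attribute by a human judge.
    def assign_attributes(words, attributes, ask_human):
        table = {word: set() for word in words}
        for word in words:
            for attribute in attributes:
                if ask_human(word, attribute):   # e.g. 'Is a dog fuzzy?'
                    table[word].add(attribute)
        return table

    # The human must render len(words) * len(attributes) judgments,
    # which is precisely why the method seems inelegant.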

    From these observations and others like them, computer scientists can conclude some rather damning things about the usefulness of associative networks. The main perceived strength of the associative network--its usefulness in grouping concepts--has been undermined by the lack of consensus on how best to exploit its knowledge-representation properties. Thus, the associative network is not yet a standard tool of the field.

    I do not take issue with these conclusions. Certainly associative networks are a tricky medium with which to represent knowledge, and no consensus exists on the best way to do so (Brachman, 1979, pp. 43-45). However, I felt that there were other possibilities inherent in the network model which might profitably be explored.


Statement of Purpose


    The purpose of this project is to construct an associative network from the system of Basic English (Ogden, 1934). The primary goal is to design and construct an expandable general-purpose data structure to serve as a lexicon. In addition, the data structure should provide support to the operations of sentence parsing, sentence generation, and the construction of a logical form.

    In Chapter II an abbreviated history of the associative network is presented. First, networks whose features were utilized directly are briefly described. Also included are descriptions of some networks which, while not utilized directly, embody significant features which might profitably be incorporated into future refinements of this project.

    Chapter III describes the System of Basic English, together with an analysis of the properties of that system which proved useful or interesting.

    Chapter IV contains a description of the lexicon. Also included is an explanation of how the data structure's properties can aid in sentence parsing and the generation of a logical form.

    In Chapter V significant features of the implementation of the lexicon are described.


    In Chapter VI the capabilities inherent in the data structure are shown. The results of a series of tests using a primitive question and answer interface are presented and explained.

    In Chapter VII, I list proposed routine improvements to the data structure, followed by directions for future research.


Chapter II
History of the Associative Network

Prelude


    The observation that words fall into categories is ancient; it dates back at least to Aristotle, and probably earlier. Computer science applications date from M. Ross Quillian's publication of "Semantic Memory" (Quillian, 1966). Following Quillian, several papers were published (Levesque and Mylopoulos, 1979; Hendrix, 1979; Brachman, 1979) in which the semantic hierarchy/associative network concept was explored in great detail.

Case Grammar


    Though not directly related to the concept of the associative network, case grammar had enough influence on the design of many such networks to justify its inclusion in this section as background.

    The early attempts at natural language computation demonstrated that syntax-directed methods were not powerful enough to parse real world sentences unaided. Such tools as Generative Grammars, Augmented Transition Networks, and their associated refinements are at best intermediate steps in the attempt to automate natural language processing.


    Charles Fillmore's introduction of Case Grammar (1968) was indicative of a shift of emphasis from syntax to semantics. Broadly speaking, Case Grammar is an attempt to store sentences with similar meanings but different surface structures in the same way. For example, the two sentences

'The man ate the doughnut.'
and
'The doughnut was eaten by the man.'

are identical in meaning but significantly different in structure. Case Grammar techniques reduce both sentences to the same internal representation. This goal came to be known as the quest for a logical form (Allen, 1987).
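
    As an illustration, both sentences might reduce to the single case frame sketched below (the case roles are standard ones; the exact record layout is mine, not Fillmore's):

    # One possible logical form shared by both sentences.
    logical_form = {
        'PRED':   'eat',
        'AGENT':  'man',        # who did the eating, in either sentence
        'OBJECT': 'doughnut',   # what was eaten, in either sentence
        'TENSE':  'past',
    }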

Quillian

    By all accounts, the first semantic memory was created by M. Ross Quillian (1968). Quillian's Ph.D. thesis described a system based in part on the observation that dictionary definitions are often circular. That is, if you start by looking up a word X defined as Y, then look up Y and see that it is defined as Z, the definitions will eventually lead back to X. Quillian used this observation to implement the memory model described below.


    Each word was stored in a node. Nodes were connected by various types of associative links. The associative links included ... The set of a node and its associative links was called the plane of that node.

    The end product was an 'extremely complex' network of associated nodes. It is important to note that there was no particular hierarchy to the network. Whichever node you happened to start with served as the root. From an organizational standpoint it was a mess, but it did yield some interesting results.

    For example, when asked to compare the words 'Earth' and 'Live,' Quillian's network yielded the following:

Earth is a planet of animal
To live is to have existence as animal



    These conclusions were reached by searching the network for the shortest path between the nodes 'Earth' and 'Live.' That path went through the node 'animal,' and the two sentences above were generated as a result.
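
    A breadth-first search over an adjacency map captures the flavor of that intersection search. The sketch below is mine, and the toy graph is illustrative rather than Quillian's actual data:

    from collections import deque

    # Breadth-first search for the shortest path between two nodes.
    def shortest_path(graph, start, goal):
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for neighbor in graph.get(path[-1], ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(path + [neighbor])
        return None

    # A toy fragment of the network (illustrative links only).
    graph = {
        'earth':  ['planet'],
        'planet': ['earth', 'animal'],
        'animal': ['planet', 'live'],
        'live':   ['animal'],
    }
    print(shortest_path(graph, 'earth', 'live'))
    # ['earth', 'planet', 'animal', 'live'] -- the path runs through 'animal'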



    The project's ultimate goal was to enable the computer to infer the most likely meaning of an unclear sentence by tracing the links through the network to the place where they intersected.

    This would be useful in sentence parsing in the following way: suppose the computer was parsing the sentence 'Animals which live on the Earth breathe oxygen.' Upon encountering the word 'live' the computer is faced with a variety of conflicting possible meanings.² Hopefully, the network would help to resolve the conflict.

    The project's success was limited by the bulkiness of its end product in terms of late-1960s memory. Ultimately only twenty words or so could be networked at one time.

    Quillian's work on networks continued in 1969 when he teamed up with a psychologist, Alan Collins (Collins and Quillian, 1969). Together they conducted experiments designed to determine what resemblance Quillian's model bore to the way humans store information and conduct inferencing. The results of these experiments appeared to reaffirm the basic correctness of the associative network concept.


    By 1969 Quillian had refined his model somewhat. Instead of the somewhat amorphous data structure used previously, Quillian and Collins constructed a semantic hierarchy. This network, which came to be known as a taxonomic tree or sometimes as an 'is a' hierarchy, was the first instance of semantic hierarchies being used for natural language processing. However, the authors were careful to note that the idea of taxonomic hierarchies being implicit in natural language dates back to Aristotle.

    Their experiment presented human subjects with propositions of the form

A is a B

and measured their reaction times in determining whether the proposition was true or false. Their hypothesis was that human reaction times would be proportional to the number of links separating A and B in the data structure. Experimental evidence bore their hypothesis out.
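
    The quantity their hypothesis depends on, the number of links separating two nodes, is easy to compute. The sketch below uses a toy taxonomy of the kind reported for such experiments; the code itself is mine:

    # Count the is-a links from a word up to a putative superordinate.
    parent = {'canary': 'bird', 'shark': 'fish', 'bird': 'animal', 'fish': 'animal'}

    def links_between(a, b):
        hops, node = 0, a
        while node != b:
            node = parent.get(node)
            if node is None:
                return None      # b is not above a in the tree
            hops += 1
        return hops

    print(links_between('canary', 'bird'))    # 1 -- predicts a fast 'true'
    print(links_between('canary', 'animal'))  # 2 -- predicts a slower 'true'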

Simmons

    In 1973 Robert Simmons drew on the concepts of case grammar and semantic networks for his paper "Semantic Networks: Their Computation and Use for Understanding English Sentences" (Simmons, 1973). He abandoned the taxonomic hierarchy proposed by Quillian and Collins. In its place he used a non-hierarchical associative network which incorporated features of Case Grammar.


    His program utilized a sophisticated ATN to generate network representations of the deep structure of English sentences. Once the net for a sentence was created, the program could use it to generate English sentences in response to queries. Furthermore, the program had some ability to recognize when two sentences with different surface structures had identical meanings.

    Though his paper contained some interesting philosophical discussions on the nature of thought and computation, his actual product was not much more than an implementation of standard Case Grammar ideas.

Hendrix



    Like Simmons, Gary Hendrix (1979) eschewed the taxonomic hierarchy concept. However, the refinements he added to Simmons' non-hierarchical scheme endowed Hendrix's model with powers comparable to the taxonomic hierarchies of Quillian and Collins, in addition to the flexibility inherent in Simmons' program.

    Hendrix's model had many distinct types of arc. First, he provided the arcs necessary to implement standard Case Grammar features. In addition, there were arcs to denote the concepts of subset, element, disjoint, and distinct. This gave Hendrix's model the ability to answer questions of categorization.


    Hendrix was apparently something of a mathematician as well. By viewing his network purely in the abstract, he was able to include some of the broader taxonomic concepts without explicitly defining links for them. This was accomplished by assigning each node to a space, with the option of assigning spaces to metaspaces called vistas. Vistas or spaces might have supernodes associated with them; supernodes were nodes used to store taxonomic knowledge about entire spaces.

    For example, the node HENRY L. STIMSON might be linked to the node NUCLEAR SUBMARINE by an e-arc, indicating that the HENRY L. STIMSON was an example of the concept NUCLEAR SUBMARINE. The concept of NUCLEAR SUBMARINE was a part of the BOAT space which was in turn part of the vista of TRANSPORTATION.
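
    In outline, the example amounts to something like the following (the record layout and field names are mine, not Hendrix's):

    # Illustrative encoding of the HENRY L. STIMSON example.
    network = {
        'nodes': {
            'HENRY L. STIMSON':  {'space': 'BOAT'},
            'NUCLEAR SUBMARINE': {'space': 'BOAT'},
        },
        'e_arcs': [('HENRY L. STIMSON', 'NUCLEAR SUBMARINE')],  # element-of
        'spaces': {'BOAT': {'vista': 'TRANSPORTATION'}},
    }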

    Hendrix's model did not really have any more ability to sort out cases than Simmons' work, nor did it contain more taxonomic knowledge than Quillian's work. However, Hendrix did manage to wrap the best features of both up in one elegant little package.


    One final feature of Hendrix's model is worth mentioning. His spaces were also subject to logic operations. The interface could sort out the truth values of simple propositions by use of conjunction, disjunction, etc.

Levesque and Mylopoulos

    Hector Levesque and John Mylopoulos' paper "A Procedural Semantics for Semantic Networks" (Levesque and Mylopoulos, 1979) included an interesting addendum to the taxonomic hierarchy schema. There were three aspects to their representation of semantic networks. The first two, classes and relations, implemented the standard features of isa (sic) trees. Classes could be created, destroyed, or retrieved. Objects could be tested for membership within a given class. Relationships between two objects could be tested for existence and type, or simply fetched.

    The feature that made their work interesting was that actual source code objects such as functions and procedures were part of the network as well. Levesque and Mylopoulos defined a four-step formalism to deal with the unique problems posed by incorporating meta-knowledge in a semantic network.

    A proposed program was defined in four parts: the prerequisites necessary before the code could be executed, the executable code itself, the effect this execution had on the network, and the complaint which would result from execution failure. The actual coding was done in LISP due to that language's self-referential features.
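
    A rough rendering of the four-part formalism follows. The original was written in LISP; Python stands in here, and the toy network operation is my own illustration:

    # Illustrative four-part program definition.
    program = {
        'prerequisites': lambda net: 'dog' in net,             # must hold before running
        'body':          lambda net: net['dog'].add('fuzzy'),  # the executable code
        'effect':        "attribute 'fuzzy' attached to node 'dog'",
        'complaint':     "node 'dog' does not exist",          # reported on failure
    }

    net = {'dog': set()}
    if program['prerequisites'](net):
        program['body'](net)          # net is now {'dog': {'fuzzy'}}
    else:
        print(program['complaint'])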


Schubert, Cercone, and Goebel

    The last variation on the semantic network concept covered here is also the most complicated. It was developed by Lenhart K. Schubert, Nicholas J. Cercone, and Randolph G. Goebel (1979). Their network was primarily concerned with knowledge representation. Rather than further refine the now traditional Case Grammar variations, they chose instead to work in predicate calculus. The 'extended' in the name of their extended semantic network was justified by a second organizational scheme, called topic hierarchies, which they wove into the knowledge base.

    The choice of predicate calculus as the primary storage scheme naturally expedited the internal representation of propositions. For example, the proposition 'Scott does not have a computer' would be stored as:

[[Scott has computer] NOT]
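
    The bracketed notation maps naturally onto nested lists. The first line below renders the proposition above; the second, an implication, is my own illustration of the same style:

    proposition = [['Scott', 'has', 'computer'], 'NOT']
    rule = [['x', 'is', 'a', 'dog'], 'IMPLIES', ['x', 'is', 'an', 'animal']]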


    The standard logical relations, such as equivalence, causation, and implication, were covered by similar means.

    Provisions for temporal data were also implicit in the structure. For example, the fact that the verb 'to go' implies a succession of states (start time to stop time) yielded a variety of new storage fields for all the possible tenses of the verb.

    Another noteworthy aspect of the extended semantic network was its probability assessment features. English words automatically yielded various probabilities: if the word 'doubt' occurred in a proposition, the proposition was assigned a C-value of thirty percent; 'probably' yielded a C-value of eighty percent; 'maybe' produced fifty percent; and so on.
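
    The C-values quoted above translate directly into a lookup table. The table entries come from the text; the function around them is my own illustration:

    C_VALUES = {'doubt': 0.30, 'probably': 0.80, 'maybe': 0.50}

    def c_value(words, default=1.0):
        # A proposition's certainty is set by the hedging words it contains.
        return min((C_VALUES[w] for w in words if w in C_VALUES), default=default)

    print(c_value('Scott probably has a computer'.split()))   # 0.8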

    One of the drawbacks of semantic nets is the apparent randomness with which new relations spring up. When accessing information on a given topic, a query executor might have to scan dozens or perhaps hundreds of links before finding the appropriate one. This problem is addressed by the topic hierarchy structure.

    A topic hierarchy is a generalized template of the knowable facts about a given topic. As the network accumulates knowledge about the topic, slots are filled in. Essentially it is an index, and it cuts access time significantly.
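
    A sketch of the idea (the slot names are mine; the actual templates were considerably richer):

    # A topic hierarchy as a slot template that fills in as knowledge
    # accumulates and doubles as an index into the network.
    dog_topic = {
        'appearance': None,
        'behavior':   None,
        'diet':       None,
    }
    dog_topic['appearance'] = ['fuzzy', 'four-legged']

    # A query about appearance now reads one slot instead of scanning
    # every link out of the 'dog' node.
    print(dog_topic['appearance'])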


Sowa

    More recently, research related to associative networks has shifted from their use in representing the meanings of individual words to their use in representing the meaning of entire sentences (Sowa, 1992). These so-called conceptual structures have been demonstrated to be a useful formalism for this type of knowledge representation.

    The methodologies involved are, in part, refinements of the topic hierarchies and Case Grammar techniques discussed above. Again, the main difference lies in the purpose of the new type of network. The conceptual structures described by Sowa are not designed to be used as lexicons. Consequently, they can make use of linguistic constructs (e.g., Time-period, Supported-By) and abbreviations (e.g., Attr, Chrc) which could not be included in a lexicon. Therefore, they were not of direct use in this project.

Analysis

    I was impressed with the case grammar arcs implemented by Gary Hendrix. However, after some thought I concluded that they are more applicable to the knowledge representation component of natural language processing than to lexical storage.


    The classes and relations of the Levesque/Mylopoulos model proved to be unnecessary for expressing the relationships between words in my vocabulary of Basic English. I found that equal and sometimes greater lexical knowledge of a word could be encoded by using two arcs (described below) together with a type field. In addition, my network ultimately shaped up as a fairly strict hierarchy. The advantages of this property become clear when one sees some of their situation representations presented in graphical form; those representations were rather untidy and often confusing.

    Ultimately, I settled on a three-arc model for my implementation. This choice was dictated as much by the general nature of the vocabulary as by any historical factor. Though the multitude of arc types historically available allows a rich variety of structural and semantic concepts to be expressed, I felt the vocabulary required only two types of arc, instance and component, to be adequately expressive. For example, the node for the word 'bone' is a component of the parent node 'animal.' On the other hand, the node for the word 'quadruped' is an instance of the parent node 'animal.' I found this two-arc representation method to be entirely adequate for application to most elements of Basic English.

    In addition, I included a synonym arc. I felt that to be more appropriate than simply associating a list of synonyms with a given word, since words often have similar meanings but not identical uses. For example, the adjectives 'long' and 'tall' convey roughly the same piece of information, but 'tall' is the preferred word for describing a person's height.
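
    Taken together, the three arcs yield lexicon entries along the following lines (the record layout here is an illustrative sketch; the actual lexicon is described in Chapter IV):

    # Illustrative three-arc lexicon entries using the examples above.
    lexicon = {
        'animal': {'instance':  ['quadruped'],  # a quadruped is a kind of animal
                   'component': ['bone'],       # a bone is a part of an animal
                   'synonym':   []},
        'long':   {'instance': [], 'component': [], 'synonym': ['tall']},
    }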


    Finally, my concept of a systems hierarchy (described below) owes something to the topic hierarchies of Schubert, Cercone, and Goebel. However, their implementation was heavily oriented toward case grammar and lacked simplicity. Though their ideas were useful, it was not feasible to duplicate their implementation in a data structure intended to serve primarily as a lexicon.

    Before moving on to a more thorough discussion of the categorization process and the resulting network model, I believe it would be in order to briefly discuss the source vocabulary, the System of Basic English.
   



