Readme Notes for "Simple English" Wordlists
and TOOLBOX
Simple English = Basic English 1500 + Most Frequent 1000 words (320 additional).
Draft of May 2, 2008
Toolbox Files included:
readsimple.html 19KB (this note)
en_ZW.dic - the Simple English spell checking dictionary.
en_ZW.aff - the Simple English affix file (suffixes). Same as en_US.aff.
-Commonwealth users may choose to use en_GB.aff.
dictionary.lst - list of languages available on your PC.
Purpose:
Provide a spell checking filter for Simple English for use with the free OpenOffice.org software suite. The vocabulary of
Simple English is composed of Basic English plus
the most frequent 1000 words in English. This page addresses the technical issues and electronic
office aids that may help writers using Simple English.
The remaining material here is leftover notes from an earlier draft of this file, kept in case we find something useful.
TO USE:
Copy en_ZW.dic ,
en_ZW.aff , and
dictionary.lst
into the folder where OpenOffice.org keeps its shared dictionaries.
This is probably:
C:\Program Files\OpenOffice.org 2.4\share\dict\ooo
(You will also find en_US.dic , etc. in that same directory.)
(These instructions are the same as for Basic English (en_BE.dic) for those
familiar with that spell checking.)
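The copy step above can also be scripted. The following is only a sketch: the function name install_toolbox and the backup file name dictionary.lst.bak are made up for illustration, and the folder you pass in must be the one on your own PC.

```python
import shutil
from pathlib import Path

# File names taken from the list above.
TOOLBOX_FILES = ["en_ZW.dic", "en_ZW.aff", "dictionary.lst"]


def install_toolbox(source_dir, dict_dir):
    """Copy the toolbox files into the OpenOffice.org dictionary folder.

    An existing dictionary.lst is backed up first (as dictionary.lst.bak,
    a name chosen for this sketch) so the original can be restored later.
    """
    source_dir, dict_dir = Path(source_dir), Path(dict_dir)
    copied = []
    for name in TOOLBOX_FILES:
        target = dict_dir / name
        if name == "dictionary.lst" and target.exists():
            shutil.copy2(target, dict_dir / (name + ".bak"))
        shutil.copy2(source_dir / name, target)
        copied.append(target)
    return copied
```

For example, install_toolbox("toolbox", r"C:\Program Files\OpenOffice.org 2.4\share\dict\ooo") would copy the three files from a folder named toolbox. Keeping a backup of dictionary.lst makes the "To Undo" step a simple rename.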
We have to fool Open Office for use of Simple English and pretend it is an
international language known to OpenOffice.org.
Start up Open Office.
Select Tools from the top line.
Select Options from the bottom of the drop-down menu.
Click on Language Settings
Select Writing Aids
Under User-defined dictionaries, select New.
Enter name: "Simple"
Select from list: English (Zimbabwe)
Put a check mark in front of Simple (English (Zimbabwe))
Click OK.
You have finished adding a new language to Open Office.
Click on Languages.
Go to the line: Default language for Documents.
On the line Western -- note that English (Zimbabwe) is now
available to you and it has spell checking.
If you intend Simple English to be used only occasionally,
then select For this Document Only whenever you want to use
Simple English (Zimbabwe) spell checking.
To Undo:
There are no registry entries. Simply delete or don't use any features
that are no longer wanted.
Notes about OpenOffice.org
Download OpenOffice.org;
it is freeware and a large file (95MB). Or order it as a CD from one of their
partners.
We paid $5.50 for a copy.
Somebody owns the word "OpenOffice" so the software must be called
OpenOffice.org. Wonder what the story there is?
OOo calls spell-checking word lists -- a dictionary.
They call a translation chart -- a thesaurus or synonym list.
An OOo dictionary (.dic) file is a simple text file saved
from OpenOffice.org Writer as text, but with a file name ending in .dic . A "techy type" might choose to save as "text encoded", with LF, without CR. (It saves one
carriage return per word, about 10%.) Saving as a regular text
file will work fine and is easier to work with for additions and changes.
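The LF-only saving can be checked with a few lines of Python. This is only a sketch; the function names and the tiny sample list are made up for illustration.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CR+LF line endings to LF only."""
    return data.replace(b"\r\n", b"\n")


def saving_percent(data: bytes) -> float:
    """Percentage of the file size saved by dropping the carriage returns."""
    if not data:
        return 0.0
    return 100.0 * (len(data) - len(to_lf(data))) / len(data)


# A tiny made-up sample: a count line followed by three words.
sample = b"3\r\nable\r\nabout\r\naccount\r\n"
print(to_lf(sample))
print(round(saving_percent(sample), 1))
```

The saving depends on the average word length, so a real wordlist will give a different figure from this short sample.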
OpenOffice QuickStarter must be turned "off" once so that OOo recognizes new dictionaries
or affix files. After that first restart, QuickStarter can be used again to save loading time.
This page was adapted from a similar page for use of Basic English with OOo. That page may be able to amplify on the discussion on this page.
This page started as creation of Basic 1500 file, but expanded to go directly to Simple English because of a need seen in the Simple Wikipedia.
People interested in VOASE (Special English) should contact the Institute; we
have a version that does Basic plus VOASE.
dictionary.lst file.
The file "dictionary.lst" contains a list of all dictionaries that OpenOffice is capable of using. We have added: en_BE (Basic English) and en_ZW (simple English) as dialects of English.
basic.dic 65KB
business.dic 4KB
science.dic 5KB
verse.dic 5KB
socialsci.dic 1KB
next 150 animals, plants, foods [future]
subseq350.txt 4KB+
freq.txt 3KB
capital.dic 169KB
miscellaneous.dic - of interest to Simple English / Internet
dictionary.lst (this replaces the one that comes with OOo and includes Simple English)
en_BE.aff 2KB (Basic English version of affix file)
en_SIMPLE.aff 2KB (Simple English version of affix file.)
en_US.aff 2KB (spare copy of OOo full affix file.)
Note : .aff, .dic, and .txt are simply text files that can be read and edited with any simple text editor. The .dic files have affixes applied and possibly complex words. The .txt files are wordlists that have no affixes or are not complete. They can be used, but derivatives will not be recognized.
HISTORY:
Jan 27, 2006 -- some may still be relevant.
This represents the state of things with plans of what might be available in a short time. You have the tools to make your own list.
Basic -- The Basic 850, international, and complex words have affixes. This is the core of Basic English spell checking used for the last several years and is complete.
General Topics -- The root words are included with affixes, but not complex words.
Subsequent 350 -- About 10% of the root words have affixes applied and complex words. Affixes will be added in the future.
Most Frequent -- The most frequent 1000 root words that are not already in Basic (320 words) are provided. Affixes and some complex words will be added sometime.
The articles (a, an, the) comprise almost 10% of all words used. This shows that a few words can make up a large share of regular English usage.
Miscellaneous -- Words useful in a Simple English environment,
especially if the big Capital.dic file is not used. Also, lower case things that are useful
for writers in an Internet environment. These are to reduce the number of apparent non-Simple English words.
Complex words (compounds) grow with the addition of words. Complex words are only addressed within the
individual lists, and most often only those that begin with the root word (not those where the root follows). No attempt has been made to find complex words between lists. You will note that a large number can be created
from just the common Basic words : in-, out-, outer-, over-, self-, under-, up-, upper-.
Other complex words can be added as uncovered in daily usage.
If you wish to help, such as by adding affixes to Most Frequent words or proofreading/editing the wordlists, please email for current status to find where you can start.
A separate project available for someone is to search for all Standard English words (en_US.dic) that contain each Simple English word. Select those words found that are made of only Simple English words and allowed derivatives where the sense of the word-candidate is clear from the component Simple English words.
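The first pass of that search project could be sketched as below. The function names are made up for illustration. The sketch only finds words that split cleanly into Simple English words; it ignores allowed derivatives, and a person must still judge whether the sense of each candidate is clear from its parts.

```python
def splits_into(word, vocab, min_part=2):
    """Return one way to split `word` into vocabulary words, or None.

    Only splits into two or more parts count; a word that is itself in the
    vocabulary is not reported as a compound.
    """
    word = word.lower()

    def helper(rest, parts):
        if not rest:
            return parts if len(parts) >= 2 else None
        for end in range(min_part, len(rest) + 1):
            head = rest[:end]
            if head in vocab:
                found = helper(rest[end:], parts + [head])
                if found:
                    return found
        return None

    return helper(word, [])


def compound_candidates(full_words, vocab):
    """Yield (word, parts) for every word made only of vocabulary words."""
    for entry in full_words:
        word = entry.split("/")[0].strip()  # drop affix codes such as /MS
        parts = splits_into(word, vocab)
        if parts:
            yield word, parts
```

Here full_words would be the lines of en_US.dic (without its count line) and vocab a set of Simple English root words.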
Proper Nouns are in a separate wordlist. The Proper names of persons, places and
things that are capitalized are outside of vocabulary and any properly capitalized word is
considered good Basic and, hence, good Simple English. A wordlist of proper nouns (capitalized words)
is very large because, for example, it has 20 spellings of Alice.
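The capitalization rule can be stated in a few lines, which shows why a small checker does not strictly need the big proper noun list. This is a sketch of the rule as described here, not of how OOo itself checks words; the function name is made up.

```python
def is_accepted(word, vocab):
    """True if `word` is in the wordlist or is a capitalized (proper) name."""
    if word.lower() in vocab:
        return True
    # Proper names: first letter upper case, the rest lower case.
    return word[:1].isupper() and word[1:].islower()
```

Note that this rule passes any capitalized word, including a misspelled name, which is one reason a real list such as capital.dic is still useful.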
Special English -- Root words for this option are provided, but NOT complex words and affixes. VOASE follows the Basic rules for derivatives.
To Use:
Using OpenOffice.org Writer, open en_SIMPLE.dic.
Copy one or more or all of the specialty wordlist(s) into en_SIMPLE.dic.
While the file is open
Add your name, town, street, etc., one word per line.
Add a code word to confirm which wordlist is active : aasimple.
Sort alphabetically.
Delete duplicates. (note : affix codes may be in different order)
Change the word count in the first line to the final count number.
Save as en_SIMPLE.dic or en_anyname.dic
That's all.
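The copy, sort, de-duplicate, and count steps above can be sketched as one function. The name merge_wordlists is made up, and the sketch assumes one-letter affix codes after a "/" as in the MySpell lists. Duplicates whose affix codes differ or come in a different order are combined into one entry.

```python
def merge_wordlists(*lists):
    """Merge .dic-style word lists into one sorted list with a count line.

    Each input is a list of lines such as "account/S" or "able".
    A first line that is only a number (the word count) is skipped.
    Duplicate words are combined and their affix codes joined, so
    "act/DS" and "act/SG" become "act/DGS".
    """
    flags_by_word = {}
    for lines in lists:
        for i, line in enumerate(lines):
            line = line.strip()
            if not line or (i == 0 and line.isdigit()):
                continue
            word, _, flags = line.partition("/")
            flags_by_word.setdefault(word, set()).update(flags)
    merged = []
    for word in sorted(flags_by_word, key=str.lower):
        flags = "".join(sorted(flags_by_word[word]))
        merged.append(word + "/" + flags if flags else word)
    return [str(len(merged))] + merged
```

The returned list starts with the new word count, so it can be written out one item per line and saved as en_SIMPLE.dic or en_anyname.dic.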
You are ready to continue on to creating another language
in the "dictionary.lst" file.
It may have a path something like this :
C:\Program Files\OpenOffice.org 2.0\share\dict\ooo\dictionary.lst
Add or confirm this line :
English (Zimbabwe) will now be recognized as a language with spell checking capabilities.
Configure the OOo text processor to recognize the language "English (Zimbabwe)" as either "Default"
or "For the current document only."
Exit OOo QuickStart and re-start OOo.
Affix Details:
The number at the top is the word count. This saves the system
from having to do two passes thru the file. Therefore when you add
your name, town, etc. to the list, you will want to increase the word count.
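After adding words by hand, the count line can be corrected with a short helper. This is a sketch with a made-up function name; it works on the lines of the file with line endings already removed.

```python
def fix_word_count(lines):
    """Return a copy of the .dic lines with a corrected count on line 1."""
    body = [line for line in lines[1:] if line.strip()]
    if lines and not lines[0].strip().isdigit():
        # No count line at all: treat the first line as a word.
        body = [line for line in lines if line.strip()]
    return [str(len(body))] + body
```

For example, a list that still says 2 at the top after the name Alice was added comes back with 3 at the top.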
Affix file.
Spell checking software often makes use of "affix" files and an
algorithm to add prefix and suffix forms to the root word.
The OpenOffice.org affix file currently has 22 affixes defined.
Ogden's Basic English makes use of only some of these.
Affix files have some
idiosyncrasies; for example, re- is one of seven prefix options used by MySpell
and is coded as option A. The word "read" therefore might be coded as ad/A.
This can get confusing. The Institute will eventually provide several
versions for different user levels of spell checking needs. The Basic 1500
level seems to have some demand. This idea has been extended to the
Simple Wikipedia, which resulted in this page.
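The ad/A example can be made concrete with a much simplified sketch. Only the "A means re-" code comes from the text above; the one-entry table and the function name are made up for illustration, and a real .aff file defines many more codes, with conditions on when each applies.

```python
# Hypothetical one-entry prefix table: only the "A means re-" code comes from
# the text above. A real .aff file defines many more codes and conditions.
PREFIXES = {"A": "re"}


def expand_entry(entry, prefixes=PREFIXES):
    """Return every spelling accepted for one .dic entry such as "ad/A"."""
    root, _, flags = entry.strip().partition("/")
    forms = [root]
    for flag in flags:
        if flag in prefixes:
            forms.append(prefixes[flag] + root)
    return forms
```

With this table, the entry ad/A expands to both "ad" and "read", which is why a coded wordlist can be hard to read by eye.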
The affix file en_ZW.aff is just a copy of en_US.aff, which should already be in the program folder for normal English documents.
en_GB.aff may also be there for Commonwealth documents. The affix file is a simplified way for the software to efficiently handle common suffixes and prefixes (affixes) to root words. There are idiosyncrasies in this software, and the
user may skip the affix process and enter every spelling, one per line.
un-, -able, -ed, -er, -ers, -est, -ful, -full, -ing, -ings,
-ly, -s, -'s, -th (-th is a Basic Science suffix.)
Not recognized by MySpell and must be handled manually.
-edly, -eds, -ered, -erly, -ers, -erest, -ingly, -lies, and similar.
-ful, -full, -fully, -hold, -like, -most, -self, self-, -some, the prepositions,
and some other words.
For discussion : these are available in MySpell (OOo) but are
unknown to Basic.
con-, de-, dis-, e-, pro-, re-, -ive, -ion, -ions, -ication (how?), -ment, -ness, -y.
(e- is a Basic Computer prefix)
Not available in MySpell -- for discussion for manual recognition.
non-, pre- , -tion (non- is a Basic Science prefix.)
dictionary.lst file.
Preset as: en ZW en_ZW
en ZW indicates that a document whose language is English, country dialect Zimbabwe, will use the named spellcheck dictionary and affix file (en_ZW here; en_SIMPLE in the earlier draft).
If you create multiple spelling lists, another less used dialect acknowledged by OOo is English (Trinidad) = en TT. Note English (Belize) is currently used for pure Basic, and English (Jamaica) for Basic2 (1500). Basic will be used by the most skilled Simple English writers. ;^)
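Adding or confirming the entry can be sketched as below. The fields en ZW en_ZW are from the text above. The leading DICT keyword and the function name are assumptions, so compare the result with the lines already in your own dictionary.lst before relying on it.

```python
def ensure_dict_entry(lst_text, lang="en", country="ZW", name="en_ZW"):
    """Return dictionary.lst text that contains the wanted DICT line.

    The "DICT lang COUNTRY name" layout is an assumption; compare it with
    the lines already in your own dictionary.lst before relying on it.
    """
    wanted = f"DICT {lang} {country} {name}"
    lines = lst_text.splitlines()
    if any(line.split() == wanted.split() for line in lines):
        return lst_text  # already present, nothing to add
    if lst_text and not lst_text.endswith("\n"):
        lst_text += "\n"
    return lst_text + wanted + "\n"
```

Calling the function a second time on its own output changes nothing, so it is safe to run whenever you are unsure whether the line is there. A second list could be added the same way with country="TT".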
dictionary.lst file.
A file suitable for Simple English (and Basic) usage is provided to replace the file that came with OOo. The additional features of Hyphenation and Thesaurus are pre-set
to use full English. Commonwealth users may prefer to change these from _US to _GB.
Note : Spelling in all files is first American with the most useful examples of Great
Britain included. Writers can use either, but try to be consistent within
any one page.
This "how-to" will be simplified over time as the elements are
consolidated, duplicates removed, affixes added, and so on. The consolidation will reduce the initial six (6) Basic English files into one word list. Eventually, there may become only one Simple English dictionary and one affix file. If anybody does this, please send a copy to share with others.
Reference project 452
About this Page: readsimple.html -- discussion of writing aids, spell checking wordlist for Simple English using OpenOffice.org software suite (HunSpell,MySpell).
Last updated : May 2, 2008 -- Simplify to one main Spell Check file.
February 1, 2006. -- minor wording changes
Created : January 14, 2006. Plan of aids for Simple English
URL: http://www.basic-english.org/down/readsimplemore.html