Basic English Spell-check for OpenOffice.org Word Processor
This page discusses the spell-check project and its software.
en_BZ.zip readooo.html For OOo versions 1.1 through 2.4 April 8, 2008.
This is a spell-check wordlist for OpenOffice.org (OOo), the word processor suite. It is in proper form because it was made by taking the regular OOo dictionary (spell-check wordlist) and removing the non-Basic words. This spell-check list was then tested on most of Ogden's writings that he said are written in Basic English (they are) and has now been in use for over three years.
Files contained: The ZIP file will automatically download, but you will need
software to UNZIP it.
If you don't want to use the ZIP form, click on any of the filenames below and SAVE AS to your disk.
This is a spell-checking word list for use with OpenOffice.org software as a Basic English spell checker. It should also work with any other software that uses HunSpell or MySpell for spell checking.
This word list is the Basic English wordlist with derivatives, complex words, and contractions. The OpenOffice.org spell-checking file provides the capitalized words (see note 1), proper nouns and names, with the non-Basic words removed. These two files are merged into a single file.
See "toddlist" for an early Basic English wordlist that is without proper nouns. The file named "en_BE.dic" has more compound words.
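The way the list was made (take the regular OOo dictionary, keep the capitalized entries, and remove the non-Basic words) might be sketched as below. This is only an illustration under stated assumptions: the function name `filter_basic`, the sample entries and their flags are hypothetical, and the input is assumed to follow the usual .dic layout of a count line followed by one `word/FLAGS` entry per line.

```python
def filter_basic(dic_lines, basic_words):
    """Keep capitalized entries and entries whose root is a Basic word.

    dic_lines   -- lines of a .dic file; the first line is the word count
    basic_words -- set of lower-case Basic roots (hypothetical input)
    Returns new .dic lines with a corrected count on the first line.
    """
    kept = []
    for line in dic_lines[1:]:             # skip the old count line
        entry = line.strip()
        if not entry:
            continue
        root = entry.split("/", 1)[0]      # drop affix flags after the slash
        if not root:
            continue
        if root[0].isupper() or root in basic_words:
            kept.append(entry)
    return [str(len(kept))] + kept


# Hypothetical sample data, not taken from the real en_US.dic
sample = ["5", "able/U", "abandon/DGS", "Alice/M", "about", "zebra/S"]
result = filter_basic(sample, {"able", "about"})
print("\n".join(result))
```

The count on the first line is rebuilt from the entries that survive, so the output stays in proper .dic form.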
SETUP for Windows -- To Use:
OpenOffice.org offers a selection of languages. Several come bundled with the OOo download: US, GB, DE, IT, RU (partial) [American, British, German, Italian, Russian]. You can download many others. Select a country dialect of minimal interest to you. In this example, English (Belize), en_BZ, is selected (see note 2).
The OOo dictionary root directory -- our working directory -- is probably named
C:\Program Files\OpenOffice.org.(version number)\share\dict\ooo
It will contain dictionary files for several languages. The spell-checking files for each language
include a dictionary, its affix file, a hyphenation file, and maybe a thesaurus file.
You will make one of the languages into Basic English and do your Basic English work using that country name, but it will really be Basic English.
1 . Copy en_BE.dic and en_BE.aff into the directory:
Program Files\OpenOffice.org.(version number)\share\dict\ooo\ (see note 3)
2 . Make a backup copy of the original word processing files.
Make a backup directory under this directory, call it "originals", and put a copy of the entire ../ooo subdirectory into it, that is, into ../ooo/originals/ .
3 . Copy the three unzipped files into the first directory:
Program Files\OpenOffice.org.(version number)\share\dict\ooo\
4 . Commonwealth users (only) will want to use the GB thesaurus and hyphenation.
Open Program Files\OpenOffice.org 2.0\share\dict\ooo\dictionary.lst with any text editor.
dictionary.lst - you will delete the original file of this name.
Change these lines:
HYPH en BZ hyph_en_US to HYPH en BZ hyph_en_GB
THES en BZ th_en_US_v2 to THES en BZ thes_en_GB
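For readers who would rather not make this change by hand, the two-line change for Commonwealth users could be sketched as below. It is a minimal illustration only: the function name `use_gb_for_bz` is hypothetical, the file names are the ones given in the lines above, and the code works on the text of dictionary.lst, not on the file on disk.

```python
def use_gb_for_bz(lst_text):
    """Point the en BZ hyphenation and thesaurus entries at the GB files."""
    replacements = {
        "HYPH en BZ hyph_en_US": "HYPH en BZ hyph_en_GB",
        "THES en BZ th_en_US_v2": "THES en BZ thes_en_GB",
    }
    out = []
    for line in lst_text.splitlines():
        key = " ".join(line.split())       # normalise runs of spaces
        out.append(replacements.get(key, line))
    return "\n".join(out) + "\n"


before = "DICT en BZ en_BE\nHYPH en BZ hyph_en_US\nTHES en BZ th_en_US_v2\n"
after = use_gb_for_bz(before)
print(after)
```

All other lines, with the DICT entry among them, are passed through unchanged.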
On the taskbar, select Tools | Options | Language Settings | Languages.
Under "Default languages for documents" | Western, select English (Belize).
(Keep "Language of" the software as "Default". This will be the language
set during installation of OOo.)
Again, on the taskbar, select Tools | Options | Language Settings | Writing Aids and, under Options, be sure that
these are NOT marked:
[ ] "Check in all languages" and
[ ] "Do not mark errors".
That is all that is needed.
Restart OOo (see note 5). Any new writing in OOo will be spell checked for Basic English, unless or until you change the language back to your native language. You are able to change this for any or every document.
1 . We find it simple to accept all capitalized words as legitimate Basic English, but provide the capitalized word list to make editing more clear with less underlining.
We found that the capitals file is excessively large, 169KB. For example, there are 17 spellings of the name Angelique. However, OpenOffice.org is fast enough that we provide the capitals in the default file.
Thesaurus. The thesaurus is a drop-down from toolbar Tools | Language | Thesaurus. Test the thesaurus feature with the word "fractional". Multiple choices should appear. The thesaurus provided is not Basic English. However, we have a Basic English thesaurus in Beta Test that works okay and that you can install after you have OOo working. We may include this in future issues of the en-BE "Zip file" download.
2 . OOo is capable of identifying a language for each memo or all future memos and this can be changed at will. You are able to maintain the flexibility of using your native language in OOo; this is why we recommend making a seldom used language into Basic English.
3 . The hyphen feature has not been tested, but try it. Commonwealth members should change these instructions to use hyph_en_GB.dic
4 . The first opening of OOo (swriter.exe) will take a few seconds. Subsequent openings that same day will be quicker. OOo has a Quick Start icon that speeds up the first opening by loading all the files as part of turning on the computer. If you have Quick Start on today, you will have to close the Quick Start icon so that OOo can have a full start to recognize your new Basic English files. Future uses will start normally.
To Undo :
Change Tools | Options | Language Settings | Languages to your native language.
There are no registry changes. You can delete files you don't want.
SETUP for UNIX and APPLE
Adjust the instructions for a Windows setup to your system using the Basic English word list provided.
Apple users, please tell us what wording is better known to you, so we are able to make instructions friendly to other readers.
Unix users, please tell us what wording is better for you, so we are able to make instructions friendly to other readers.
Advanced Unix users might use MySpell or its enhancement, HunSpell, directly and not need OpenOffice.org. If you do use it directly, please tell us how to do it so we can make instructions available to other readers. If you know how to make MySpell/HunSpell usable by Windows users, please tell us.
AutoCorrect is a useful feature. For example, the word "can" is not Basic.
You can have this automatically changed to "are able to" by going to the taskbar and clicking Tools | AutoCorrect...
Select the language English (Belize) and enter under Replace, "can", and under With, "may" or, perhaps, "are able to". Or, to be more general, "(be) able to". Hint: look under English (US) to see
how AutoCorrect works; there are likely a few spelling corrections or features already there, such as replace "(c)" with "©".
Simplifying the working folder :
These files, and/or their counterparts for other languages, will be in the file folder.
dictionary.lst -- key file, see below
DicOOo.sxw -- get more languages
FontOOo.sxw -- get more fonts (for OOo only)
WordNet_License.txt
en_US.aff -- US affixes
en_US.dic -- US spelling wordlist
hyph_en_US.dic -- US hyphen rules
th_en_US_v2.dat -- US thesaurus data
th_en_US_v2.idx -- US thesaurus index
en_BE.aff -- Basic English affix file
en_BE.dic -- Basic English dictionary file
hyph_en_US.dic -- copy of hyph_en_US.dic or your native language
The file "dictionary.lst" tells OOo which dictionary, affixes, hyphenation rules, and thesaurus to use. The first column is the
type of file, the 2nd is the language, the 3rd is the variation (country), and the 4th is the file name.
The dictionary.lst file gives the names of the languages and features of those languages available to OOo. Other languages can be added, and those that will never be used may be deleted. Because these are the only file names recognized, other language files can be deleted.
Here is what the Institute version of the dictionary.lst file looks like.
DICT en GB en_GB
HYPH en GB hyph_en_GB
DICT en US en_US
HYPH en US hyph_en_US
THES en US th_en_US_v2
# These are added to activate Basic English
DICT en BZ en_BE
HYPH en BZ hyph_en_US
THES en BZ th_en_BE.dat
THES en BZ th_en_BE.idx
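To make the four-column layout more clear, here is a small sketch that reads text in the dictionary.lst form and groups the file names by language and country. The function name `parse_dictionary_lst` is hypothetical; the sample text is the en BZ part of the listing above. It is not how OOo itself reads the file, only a way to see what the columns mean.

```python
from collections import defaultdict


def parse_dictionary_lst(text):
    """Group dictionary.lst entries as {"en_BZ": {"DICT": [...], ...}}."""
    entries = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                        # skip blank lines and comments
        parts = line.split()
        if len(parts) != 4:
            continue                        # not a type/language/country/name line
        kind, lang, country, name = parts
        entries[f"{lang}_{country}"].setdefault(kind, []).append(name)
    return dict(entries)


sample = """\
# These are added to activate Basic English
DICT en BZ en_BE
HYPH en BZ hyph_en_US
THES en BZ th_en_BE.dat
THES en BZ th_en_BE.idx
"""
parsed = parse_dictionary_lst(sample)
print(parsed["en_BZ"])
```

The result shows that the Belize entry really points at the Basic English dictionary file, which is the whole trick of the setup.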
Affix means prefixes and suffixes.
Spell-checking software uses an algorithm to add prefix and suffix forms to the root word. The OpenOffice.org affix file currently has 22 affixes defined. Ogden's Basic English will make use of 10 of these. Unix-created affix files have some idiosyncrasies; for example, re- is one of seven prefix options and has a code value of option /A. The word "read" is coded automatically as ad/A. This means that files prepared by hand are likely to contain human errors.
There are Unix programs, called "munch" and "unmunch", for creating these lists; they use Unix utilities in the OpenOffice.org project. This is not a problem with our Basic English wordlist, which is created in Excel on Windows. Every Basic English word is listed without affixes, although some may have been manually added.
The OpenOffice.org affix file en_US.aff is duplicated as en_BE.aff for use with capitalized words. Any manually added affixes for Basic words use only 10 of the available affixes.
The Basic prefix and suffixes are:
Prefix: un- . Suffixes: -ed , -est , -er , -ers , -ing , -ings , -ly , -s , -'s , plus restricted use of -able , -en and -th.
Details -- more than you need to know:
The spell-checking software is HunSpell, a Hungarian university enhancement of MySpell by Kevin B. Hendricks, which was an enhancement of Ispell.
The extra non-Basic prefixes include:
re- , in- , de- , dis- , con- , pro-
And suffixes of:
-ive , -ion , -en , -ication , -ions , -ens , -ness , -th , -able , -ment .
These might be allowed in an advanced Basic situation, and with a little work they can be selectively included in the en_BE.dic file. An instructor of advanced Basic might selectively introduce additional affixes.
Flag  Affix
U     un-
D     -ed
T     -est
G     -ing
S     -s
J     -ings
Y     -ly
M     -'s
Z     -ers
Restricted:
B     -able
N     -en
H     -th
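The flag letters above are what come after the slash in a dictionary entry. As a rough sketch, the code below takes one entry and says which Basic affixes its flags give. The name `describe_entry` and the sample entry "help/DGSA" are hypothetical, and it is assumed that every flag is a single letter, as in the table.

```python
# Flag letters and affixes as given in the table above
BASIC_FLAGS = {
    "U": "un-", "D": "-ed", "T": "-est", "G": "-ing", "S": "-s",
    "J": "-ings", "Y": "-ly", "M": "-'s", "Z": "-ers",
    "B": "-able", "N": "-en", "H": "-th",   # restricted use
}


def describe_entry(entry):
    """Split 'word/FLAGS' into the root, its Basic affixes, and other flags."""
    root, _, flags = entry.partition("/")
    known = [BASIC_FLAGS[f] for f in flags if f in BASIC_FLAGS]
    unknown = [f for f in flags if f not in BASIC_FLAGS]
    return root, known, unknown


# Hypothetical entry; A is the non-Basic re- option named earlier
print(describe_entry("help/DGSA"))
```

Any flag that comes back in the last list is not one of the Basic affixes, which makes this a quick check on entries that were edited by hand.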
Inside the main dictionary file.
The number at the top is the word count. This saves the system from having to do two passes through the file. Therefore, when you add your name and town to the list, you will want to increase the word count to match.
"TRY esianrtolcdugmphbyfvkw" in the affix file is the alphabetic search frequency list. It tells the affix software which letters to look for first.
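Keeping the word count at the top of the dictionary file in step with the entries is easy to forget when adding words by hand. A minimal sketch of doing both at once is below; the function name `add_words` and the sample words are hypothetical, and the text is assumed to be a count line followed by one entry per line.

```python
def add_words(dic_text, new_words):
    """Add words to .dic text and rewrite the count on the first line."""
    lines = dic_text.splitlines()
    entries = [line for line in lines[1:] if line.strip()]
    existing = {entry.split("/", 1)[0] for entry in entries}
    for word in new_words:
        if word not in existing:           # do not add a word twice
            entries.append(word)
            existing.add(word)
    return "\n".join([str(len(entries))] + entries) + "\n"


sample = "3\nable\nabout\nAlice/M\n"
updated = add_words(sample, ["Smith", "Springfield", "Alice"])
print(updated)
```

Here "Alice" is already in the list, so only two words are added and the count goes from 3 to 5.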
Very little trouble has been reported in using OpenOffice.org for Basic English spell checking.
Problem: does not find non-Basic words as misspelled.
Select "Tools" from the tool bar, then "Options" at the bottom of the drop-down list. Then open "Language Settings" in the middle of the next drop-down list. Then open "Writing Aids". In the "Options" window near the bottom, UN-mark "Check in all languages".
Be sure that QuickStart is not on and restart OOo after any changes to options.
Status and Future
The Basic English wordlist has been used for six years and is stable.
More compound words are added from time to time.
1. The big list that is provided today, which contains all Basic words (spellings) and all Proper Nouns.
2. The basic Basic wordlist only. This is much more than 850 words. Derivatives (affixes: -ed, -er, -ing, -ly, -s, un-), international words and compounds extend this to over 5000 different "words".
3. The Proper Noun wordlist from OOo. All of these are allowed in Basic and prevent thousands of false signs of misspelling. Proper nouns are names of people, titles, and places and are capitalized: Alice, Betty, Charles; Ambassador, Bishop, Captain; Asia, Brazil, China; etc. We let OOo provide these words and affixes.
There is a need for several versions to satisfy different needs and levels of readers, but they are NOT coming soon.
1 . Very basic Basic for education. Ogden identified the derivatives allowed for each root word. Ogden suggests a limited number of complex words.
2 . Basic English as we use it today allows all six derivative affixes put with all Basic root words where the new word is existent in English. It makes addition of many complex words from basic Basic and includes some modern words as international (computer, internet).
3 . As 2, and makes addition of the general-level special wordlists.
4 . Intermediate Basic, with Ogden's "next step" words towards full English, and making
addition of complex words from all the Basic words in lists 2 and 3. The Basic words Able and Full will be allowed as suffixes where the meaning is in a good Basic sense. This is the level
at which Basic will be used in public media.
5 . Advanced Basic will make use of a few more common affixes (-ment, -th, -tion, non-) that
the learner will have found in daily usage. Technically, this simply allows use of the default OOo affix list.
6 . Simplified English. This is advanced Basic with the addition of the 1,000 most frequent
English words and with the addition of complex words made from them. This tends to be used in instruction manuals and at Simple Wikipedia.
7 . Make separate American and British spellings. They are joined today and allow mixed usage in text, which may not be good.
8 . Basic words only -- where the big capital words are not needed.
OpenOffice.org and HunSpell/MySpell are supposed to allow User Dictionaries. We had not been able to get them to work in OOo version 1. In the future we will test OOo ver 2 with the Basic Specialty Lists (Science, Business, Verse) and Basic Detail Lists (Geology, Economics, Bible, etc.). We hope this will work because it will allow your selection of one or more additional lists of allowed words for spell checking. Our expectation is that you can select the Basic 850, one or more Specialty lists (100) and any detail word lists (50) as desired.
An instructor in two levels of Basic might use the technique described here to make another language into his Basic2 variation that might be either more strict or more advanced.
The standard language code is: 2 lower-case letters to indicate the language ("en" means English); an underscore; and 2 capital letters indicating the language as spoken in a country. US, GB and CA are USA, Great Britain and Canada. Thus en_US is American English and en_GB is British.
The country dialect code BZ is used only because it is recognized by OOo.
We may ask the standards committee or OOo to
recognize a dialect of BE for "Basic English: International Second Language"
at some time.
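The form of the language code described above (two lower-case letters, an underscore, two capital letters) can be checked with a short pattern. This is a sketch only; the names `LOCALE_RE` and `split_locale` are hypothetical, and the check is on the form of the code, not on whether OOo has a language by that name.

```python
import re

# two lower-case letters, an underscore, two capital letters
LOCALE_RE = re.compile(r"^([a-z]{2})_([A-Z]{2})$")


def split_locale(code):
    """Return (language, country) for a code such as 'en_US'."""
    match = LOCALE_RE.match(code)
    if match is None:
        raise ValueError(f"not in language_COUNTRY form: {code!r}")
    return match.group(1), match.group(2)


print(split_locale("en_BZ"))
```

A code such as "en_BE" has the right form as well, which is why a dialect of BE would be a possible request at some time.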
Other files :
See toddlist.txt for a text file of only basic Basic words.
Todd showed us the way to use Basic English in the OOo software.
Or, see the Institute's spellist.xls in Excel form. The Todd and Excel lists are each without proper nouns and are a more strict use of derivatives.
Notes about OOo.
Somebody owns the word "OpenOffice" so the software must be called OpenOffice.org or OOo. Wonder what the story there is?
OOo calls a spell-checking wordlist a dictionary -- it is really a vocabulary.
They call a synonym list a thesaurus -- which is really a translation table. The OOo thesaurus is more capable than we make use of in Basic.
An OOo dictionary, .dic, file is a simple text file that can be created or modified with any text processor. After you have confirmed that things are working properly, you can play with (customize) the file adding some local or personal words.
For efficiency when using the OpenOffice.org word processor, if you have limited computer power, save as "text encoded", with LF, without CR. (It saves one carriage return per word, which is about 10% of file size. CR alone will not work.) But saving as a regular text file with any text processor will work fine. Just be sure to rename to .dic before using. And remember the .dic files have a word count at the top.
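If the file is made by a program and not by the word processor, the same LF-only saving can be done as in the sketch below. The function name `save_with_lf` and the sample words are hypothetical. The ISO-8859-1 encoding is an assumption; the right encoding for a given dictionary is the one named on the SET line of its .aff file.

```python
import os
import tempfile


def save_with_lf(path, text, encoding="iso-8859-1"):
    """Write dictionary text with LF line ends only (no CR)."""
    # newline="\n" stops Python from writing CR+LF on Windows
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(text.replace("\r\n", "\n"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "en_BE.dic")
    save_with_lf(path, "2\r\nable\r\nabout\r\n")
    with open(path, "rb") as f:
        raw = f.read()                     # bytes as they are on disk

print(raw)
```

The saving is one byte for every line, which is where the "about 10%" comes from with short words.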
OpenOffice.org must be restarted to recognize any changes.
Thus OpenOffice.org QuickStart must be "off" for OOo to recognize new dictionaries
or affix files. Otherwise they are not recognized until the next system restart (reboot).
This list is not yet approved by the OpenOffice.org project.
After we get the words stabilized and in the proper form, the
Basic English language will be submitted to the OOo Project for consideration.
Sample of the data:
The affix options used after a slash for Basic English are : (See OOo for complete instructions.)
About this Page: readoomore.html --Discussion of the spell checking project for a Basic English version of OpenOffice.org versions 1.1 through 2.4 spellcheck wordlist (dictionary).
Last updated April 8, 2008 -- this page is created to allow simplification of readooo.html installation instructions.
History : Jan 13, 2006 -- Simplifications
Dec 26, 2005 -- creation of this page.
Mar 18, 2005 -- OOo ver 2; Replace another language name, rather than English.
Feb 2, 2005 -- OOo 1.1.4
Aug 14, 2004 -- simplified
Feb 13, 2003 -- for OOo 1.0
Apr 28, 2002 -- initial release