SIMPLE (ENGLISH) WIKIPEDIA Wordlist for Spell Checking
More than you may want to know.
Draft warning -- this note is in transition between a prior Wiki
readme for developers and a simplified version, with only one main file, for use by writers. This Wiki note will eventually look similar to the one for Simple English, except that the VOA words are kept separate there, whereas
VOA is included in this Wiki version. A couple of file names will change, but it is otherwise the same. That readme may be more helpful until planting season is over and this memo can be made clearer.
Simple English has no firm definition, but it is generally regarded as Basic English (to be able to say anything), plus the 1000 (pick a number) most frequent English words not already in Basic (to give fluidity). Some, including Simple.Wikipedia.Org, add VOA Special English (because of its wide access for beginners). These have been provided as lists at http://www.basic-english.org/down/readsimple.html for selection, experimentation, and
use in the OpenOffice.org word processing (and more) suite, wherein the non-Simple words are
highlighted for simplification. Comment within SimpleWiki is desired, and perhaps a standard definition of the SimpleWiki vocabulary may be developed. Until such time as
some standards are established, we offer a Simple English writer spell check and translation of non-Simple words into Basic English (not into the wider Simple English). Please note that all proper nouns (capitalized words) are allowed in Simple English, even though most capitalized words will show as misspelled (and not Wiki). A full range of Basic, frequency, VOA and capitalized word files may be provided for a Simple English developer. If you as a Wiki writer need proper nouns, then concatenate capital.dic to WIKI.dic, sort, change the line count in the first line, and save WIKI.dic.
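The concatenate, sort, and recount steps for adding proper nouns can also be done with a short script instead of by hand. What follows is only a sketch, not part of the distributed files. The function names are made up for illustration, and the latin-1 encoding is an assumption that should be checked against your own copy of the files.

```python
from pathlib import Path


def merge_dic(main_lines, extra_lines):
    """Merge two .dic line lists into one sorted list with a fresh count.

    The first line of each input is its word count, so it is skipped.
    Duplicate entries and blank lines are dropped.
    """
    words = set()
    for lines in (main_lines, extra_lines):
        for line in lines[1:]:
            entry = line.strip()
            if entry:
                words.add(entry)
    merged = sorted(words)  # plain sort: capitalized words come first
    return [str(len(merged))] + merged


def merge_dic_files(main_path, extra_path, out_path, encoding="latin-1"):
    """File wrapper around merge_dic (the encoding is an assumption)."""
    main_lines = Path(main_path).read_text(encoding=encoding).splitlines()
    extra_lines = Path(extra_path).read_text(encoding=encoding).splitlines()
    merged = merge_dic(main_lines, extra_lines)
    # newline="\n" writes LF only, with no CR
    with open(out_path, "w", encoding=encoding, newline="\n") as handle:
        handle.write("\n".join(merged) + "\n")
```

A call such as merge_dic_files("WIKI.dic", "capital.dic", "WIKI2.dic") writes the merged list to a new file, so the original WIKI.dic stays untouched until the result has been checked.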
This page is being separated into its own page, rewritten for writers
of Simple.Wikipedia English rather than developers of Simple English. Please pardon the unfinished nature of this page.
I. FOR SIMPLE WIKI WRITERS
Files included:
a. readwiki.html 12B
b. WIKI.dic -- 4760 root words (line items). 50 KB
Consists of Basic 1500, plus Frequent 1000 (adds 680 words), plus VOA-SE (adds 450 words).
c. WIKI.aff 2 KB. An "affix" includes both prefixes and suffixes. This is a feature of the software that adds some efficiency. It is the same as en_US.aff.
d. dictionary.lst (this lists the dictionaries that are available to OpenOffice.org; it replaces the one that comes with OOo and includes Simple English/Wiki).
e. Read Wiki More -- 14 KB, this note.
Follow the instructions for substituting your selected vocabulary in OpenOffice.org.
Note : Many people new to simple languages are surprised that a few root words multiply into many times their number of spellings and senses. Learning the Basic 850 results in over 5000 simple derivatives and compound words.
Example : "equal" becomes equaled, equaler, equaling, equally, equals, unequal, unequaled, unequally.
Common words making complex words are : -able, -full ; any-, out-, over-, short-, side-, some-, under-, up/upper-, work-.
Complex words have not been added for Frequent and VOA words
to this trial dictionary. Many affix derivatives have been added; more will follow.
Note : .aff, .dic, and .txt are simple text files that can be read or changed with any simple text editor.
Purpose:
Provide a spell checking filter for use in writing Simple English (for example, Simple Wikipedia) with the HunSpell (MySpell) software
that is most notably used by the free office suite, OpenOffice.org. The recommended vocabulary of
Simple English is Basic English plus
the most Frequent words in English. Wiki-Simple English adds VOA Special English.
Basic 1500
Every learner of Basic English is expected to know the 850 words,
the international words, six affixes and complex words, plus
one area of General interest with 100 words, such as Science, Business,
or Verse; and one Specialty within that general topic with an additional 50 words,
such as Biology, Economics, or Bible. The Specialty words are not included here.
Basic English is a full language for general living
and work as an auxiliary international language. It is good English. The limited vocabulary allows
quick learning -- weeks, not years. Obviously it is an excellent first step in learning
full English because it allows almost immediate immersion into daily English-speaking life.
Note, Basic English is a subset of Standard English with simple rules of grammar -- there is
NO unlearning required to progress to full English. The originators of Basic also provide
a learning path beyond basic Basic of 150 Next Step words and 350 Subsequent words at
which point the learner should be able to continue at his own pace. Because Simple English comes after Basic English, the expanded, "next step" Basic is included.
For general Simple Wiki use we have included the general subject words but NOT
the Specialty lists of Basic. We included the 350 Subsequent words as the next step
from Basic towards full English. This combined list is sometimes referred to as the Basic 1500.
The First Supplement of 150 words for common
foods, plants, and animals has been lost. If found they will be added, else a good guess will be provided
sometime.
There is much overlap between the three sources. For example,
98 of the 100 most frequent words are already in Basic, and
half of the VOA-SE words are also Basic words.
Note that the Simple English versions here have attempted to remove duplicates.
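The way overlapping sources are combined, and how an "adds N words" figure comes about, can be pictured with a small sketch. This is an illustration only and not the tool used to build WIKI.dic. The function name is made up, and the sample words in the usage note are toy data, not real list membership.

```python
def combine_lists(*named_lists):
    """Combine word lists in order, keeping only the first copy of each word.

    named_lists holds (name, words) pairs. Returns the combined list and a
    dict giving how many new words each list added. Comparison ignores case.
    """
    seen = set()
    combined = []
    added = {}
    for name, words in named_lists:
        new_words = []
        for word in words:
            key = word.lower()
            if key not in seen:
                seen.add(key)
                new_words.append(word)
        added[name] = len(new_words)
        combined.extend(new_words)
    return combined, added
```

For example, combine_lists(("Basic", ["a", "the", "go"]), ("Frequent", ["the", "was", "go"])) keeps four words in all and reports that the Frequent list added only one.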
The dictionary.lst file may have a path something like this :
C:\Program Files\OpenOffice.org 2.0\share\dict\ooo\dictionary.lst
Add or confirm this line :
DICT en GH WIKI
English (Ghana) will now be recognized as a language with spell checking capabilities.
Configure the OOo text processor to recognize the language "English (Ghana)" as either "Default"
or "For the current document only."
Exit OOo QuickStart and re-start OOo.
OOo Tools.
Sorting is done by Edit | Select All. Then Tools | Sort.
Count is done in Tools | Line Count | check box in upper left.
Because there is one word per line, this is the word count, except,
subtract one for the word count number at the top. Go back into Tools |
Line Count | uncheck box in upper left.
Spell check is done in Tools | Spell Check. Your added
words will be underlined until you have added the new list and
re-opened OpenOffice.org Writer.
Save As WIKI2.dic. Select Text Encoded from the
type list. Check LF; do not check CR. There may be a harmless
message about (errors possible?)
To UNINSTALL:
There are no registry entries. Simply delete or don't use any features
that are no longer wanted.
Dictionary Details:
The number at the top is the word count. This saves the system
from having to make two passes through the file. Therefore, when you add
your name, town, etc. to the list, you will want to increase the word count.
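After adding words by hand it is easy to forget the count at the top. A quick check along these lines can catch that. It is a sketch only. The function name is made up for illustration, and it assumes one entry per line with the count alone on the first line, as described above.

```python
def check_word_count(lines):
    """Compare the count on the first line with the entries that follow.

    Returns (declared, actual). Blank lines are not counted as entries.
    """
    declared = int(lines[0].strip())
    actual = sum(1 for line in lines[1:] if line.strip())
    return declared, actual
```

If the two numbers differ, change the first line of the file to the second number.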
Affix file.
Spell checking software often makes use of "affix" files and an
algorithm to add prefix and suffix forms to the root word. This is
for efficiency and you need not be concerned with them.
The OpenOffice.org affix file currently has 22 affixes defined.
Ogden's Basic English and VOA-SE make use of only some of these.
Affix files have some
idiosyncrasies; for example, re- is one of seven prefix options used by MySpell
and is coded as option A. The word "read" therefore might be coded as ad/A .
This can get confusing. But you can add the full and correct spelling to the files.
Just sort afterwards and change the line count at the top of the file. The affix algorithm has trouble with words ending with double letters, etc.
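To see why an entry such as ad/A can stand for "read", here is a minimal sketch of how a prefix flag is expanded. It is not the HunSpell algorithm itself. The flag table is an assumption modelled on the usual en_US.aff prefix flags and should be checked against your own WIKI.aff. Suffix flags are ignored.

```python
# Assumed prefix flags, modelled on en_US.aff; check them against your .aff file.
PREFIX_FLAGS = {
    "A": "re",
    "I": "in",
    "U": "un",
    "C": "de",
    "E": "dis",
    "F": "con",
    "K": "pro",
}


def expand_prefixes(entry):
    """Expand one .dic entry (root/FLAGS) into the root plus its prefixed forms.

    Flags that are not in PREFIX_FLAGS (for example suffix flags) are skipped.
    """
    root, _, flags = entry.partition("/")
    forms = [root]
    for flag in flags:
        prefix = PREFIX_FLAGS.get(flag)
        if prefix is not None:
            forms.append(prefix + root)
    return forms
```

Under these assumed flags, expand_prefixes("ad/A") gives "ad" and "read", which is the confusing case mentioned above. Listing "read" in full on its own line avoids the puzzle.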
The affix file, en_US.aff, should be in the program files for normal English documents,
and/or en_GB.aff for Commonwealth documents.
The Simple English Wiki leadership team (sic) may
have recommendations in the future. Our opening suggestion is to include Basic
affixes and include those that OOo offers that are made from Basic
and VOA-SE words .
un- -able, -ed, -er, -ers, -est, -ful, -full, -ing, -ings,
-ly, -s, -'s, -th (-th is a Basic Science suffix.)
A few things are not recognized by HunSpell (MySpell) and must be handled manually:
words ending in certain double letters, and some exceptions to the ending letter "y";
-ful, -full, -fully, -hold, -like, -most, -self, self-, -some, the prepositions,
and some other words.
For discussion : these are available in HunSpell (MySpell) and OOo, but are
unknown to Basic.
con-, de-, dis-, e-, pro-, re-, -ive, -ion, -ions, -ication (how?), -ment, -ness, -y.
(e- is a Basic Computer prefix)
Not in HunSpell (MySpell) -- for discussion for manual recognition.
non-, pre- , -tion (non- is a Basic Science prefix.)
The name en_WIKI is pre-set in the dictionary.lst file as a country dialect of Ghana English.
If you create multiple spelling lists, another less-used dialect acknowledged by OOo is English (Trinidad) = en TT. Note that English (Belize) is currently used for pure Basic, and English (Jamaica) for Next Step Basic (Basic 1500). Basic will be used by the most skilled Simple Wiki writers. ;^)
dictionary.lst file.
A file suitable for Basic English and Simple English usage is provided to replace the file that came with OOo. Wiki users will have to make one change in the dictionary.lst file, changing
DICT en GH SIMPLE to
DICT en GH WIKI
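The one-line change can be made in any text editor. For those who prefer a script, a sketch follows. The function name is made up for illustration. It works on the text of dictionary.lst and assumes the fields are separated by spaces, as in the lines shown above.

```python
def set_ghana_dictionary(lst_text, name="WIKI"):
    """Point the English (Ghana) DICT line of dictionary.lst at `name`.

    An existing "DICT en GH ..." line is replaced. If there is none,
    the line is added at the end. All other lines are kept as they are.
    """
    wanted = "DICT en GH " + name
    out = []
    found = False
    for line in lst_text.splitlines():
        if line.split()[:3] == ["DICT", "en", "GH"]:
            out.append(wanted)
            found = True
        else:
            out.append(line)
    if not found:
        out.append(wanted)
    return "\n".join(out) + "\n"
```

Keep a copy of the original dictionary.lst before saving the changed text, and restart OOo afterwards so that the new line is read.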
The additional features of Hyphenation and Thesaurus are pre-set
to use full English. Commonwealth users may prefer to change these from US to GB.
Note : Spelling in all files is first American with the most useful examples of Great
Britain included. Wikipedia writers can use either, but try to be consistent within
any one page.
Our Rules: VOA is silent on affixes, we have used the same rules that we use for Basic English.
- Where a listed word is a derivative of an existing Basic/Simple word, then the
root is used and all derivatives of that root are used. Example: "container" is
a derivative of "contain", therefore we allow "contained, containing, contains," etc.
- We make no distinction for parts of speech : noun and verb senses might be taken from the same word-spelling.
- There is no limitation to use of Basic derivatives (-ed,-ing, etc.) ; if a derivative word by the rules exists, then it is used.
- We use the VOA word list including their Science option words.
Notes about OpenOffice.org
Download OpenOffice.org;
it's freeware and a large file (96MB), or order it as a CD from one of their
partners.
We paid $5.50 for a copy.
Somebody owns the word "OpenOffice" so the software must be called
OpenOffice.org. Wonder what the story there is?
OOo calls spell-checking word lists -- a dictionary.
They call a translation chart -- a thesaurus or synonym list.
An OOo dictionary (.dic) file is a simple text file saved
with OpenOffice.org Writer as text, but with the name-end of .dic . A "techy type" might choose to save as "text encoded", with LF, without CR. (It saves one
carriage return per word, which is a lot.) Saving as a regular text
file will work fine and is easier to work with for additions and changes.
OpenOffice QuickStarter must be "off" only once, to recognize new dictionaries
or affix files. QuickStarter is no longer useful for OOo 2.4 and higher and may be permanently turned off.
About this Page: readwikimore.html -- discussion of writing aids, spell checking word list for Simple Wiki using HunSpell (MySpell) software, specifically for use with OpenOffice.org.
Last updated : May 21, 2008. Add Ghana dialect as WIKI. This page needs work !!!
May 8, 2008. Replace Basic 850 with Basic 1500
May 3, 2008. Simplify to one main word list, without Capitals.
July 26, 2007. Add two Wiki Option Lists
Created : January 14, 2005. Plan of aids for Simple English / Wikipedia
URL: http://www.basic-english.org/down/readwikimore.html
LINKS : Simple English Wiki