========================================================

Project 333 . Translation Software   -- WORKING FILES
    Provide translation from English to Basic suitable for large amounts of text.
We have a word-at-a-time translator in IDP Companion. We need something much faster, with editing or within an editor, to allow the author to rearrange wording so that the Basic suggestions from the mechanical translator flow smoothly.

MEMBERS
Mike
Toan
Spencer
Luis
Jim
Alan

DECEMBER 2008
Mike I like the idea of doing the whole thing in Open Office, although I'm not
 sure what that means.
 I see what you mean about the importance of the spell checker. 
 I'll talk to Toan to see if we can squeeze some time for that.

Mike -- Tue, December 2, 2008    Hope to get back to work on the Basic
paraphraser and add an Open Office front end to it. The concept is to
write everything as close to Basic as possible in an Open Office window, and have Open Office
spell check the document. Push a button and the program starts translating the Open
Office text into Basic.

Spencer -- Thu, December 4, 2008    The Romanian file is excellent. I would not have developed the
Simple English Translation program without it. The list I'm working with
should be thought of as a subset: words that work within context.
 The policy so far is that if there is any ambiguity, or there are different
uses (bow, etc.), the word won't be included at all.
I have experimented with WordNet's word-sense tool, to maybe
use later, but right now, as you know, there are so many bugs to
resolve.
 If the translation proves to be stable, and the list becomes refined
and big enough to be relevant, I think too that we have something
important. I have been tossing around ideas about a simplifying bot
crawling through Wikipedia, or even a browser extension that
translates while you surf the web.
 Do you have more info on Colorado's page-at-a-time translator?
 http://simple.wikipedia.org/wiki/User:Spencerk/list_of_straight-up_substitutables
 http://simple.wikipedia.org/wiki/User:Spencerk/multiple_word_translations

Mike -- Thu, December 18, 2008    Would your Wiki person working on his Simple
English translator and/or your volunteer be willing to work with Toan on a common
project? In any event, I'll have Toan send you the source code to see where it
leads. Originally I had thought of placing the source code on a Wiki, which we
still could do through you. If you think it appropriate, could you send me the
Simple English person's email?

Meanwhile I've reread some of Alan's documentation on Entente and was again
impressed with the genius---and I rarely use the word---behind it. I'm also
re-reading a computer science master's thesis by a Scott R. Hawkins, entitled
"Ogden's Basic English as a Lexical Database for Natural Language processing,"
written in 1993. I'll have it scanned and send you a digital copy for posting. 

Luis -- Fri, Dec 19, 2008    Translation in OOo -- as another extension built upon OOo. That's
theoretically possible, though I'd never thought about something like that.

Of course, I'd say it's possible in principle to port whatever tool
you have to either Java or Python to plug it in as an extension for
OOo; but I don't know if this is what you mean. For instance,
LanguageTool, the current grammar checker for OOo, is written as an
extension (in Java, IIRC).

Spencer -- Sat, December 20, 2008    Great ideas.
I'd love to see the data I'm making used as an OOo translator.
My code is in PHP, and the bulk deals with Wikipedia script, so it may
not be useful. I've posted it here --
 http://simple.wikipedia.org/wiki/User:Spencerk/source
This application translates regular English into Simple English; it is free and open source.
It was made by Spencer Kelly in 2008, using data from OpenOffice and Charles Kay Ogden.
It was made to be friendly with wiki-script, for use with the Simple English Wikipedia.
There's no post-processing.
What's Entente?
    [Entente is a traveler's translator written by Alan Mole. It uses a PC, with
both parties present, for conversation between English and 14 languages.]
Tell the guys I'd love to work with them.
I'm going on vacation over Christmas.

Spencer -- I know about the failure of machine translation and the challenges of natural language processing,
but this project is modest enough that it can succeed. Substituting the word 'change' for the word
'alteration' is fair for every usage of alteration I can think of. The inevitable change in meaning is
insubstantial, only of nuance really, especially in the context of ESL learning or aphasia.
While I'm prepared to be embarrassed to learn otherwise, no project of this kind has been taken on
by anyone -- at least none available freely on the Internet -- and that's ridiculous because
it was easy to make.
-Details-
It was written in PHP, and references three XML files:
 * basic.xml - Ogden's basic 850 words,
   + international English words (nuclear, macaroni)
   + simple compound words (bedroom, eyeball)
   + proper nouns and abbreviations (DiMaggio, Rudolf, PMS)
   The words on this list (17,900 items) are left untranslated.
 * translate.xml - single-word substitution.
   very. context. sensitive.
   I have used a lot of translations of the Internet Dictionary Project's list of 1600.
   My list is around 2,000 items. You can edit this list as a wiki here.
 * idiom.xml - multiple-word translation ('back and forth' -> 'backwards and forwards',
   'a cash cow' -> 'an easy way to make money').
   As far as I know this is the only developer-friendly idiom substitution list.
   I'm making it by hand, . . . usually on the bus; it's around 300 items.
The remaining untranslatable words are turned red, and/or given ((wiki:|)) tags, according to the version.
Yup, it's pretty brute force. The only heuristic I've used is for plurals.
As I am not a strong programmer (or /SQL'er omg), this project is, if anything, a function of the collaborative power
of the internet.
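
The three-list approach described under Details can be sketched roughly as below. This is only an illustration in Python, not Spencer's PHP code: the tiny KEEP, TRANSLATE and IDIOMS tables stand in for basic.xml, translate.xml and idiom.xml, the [[...]] marker stands in for the red highlighting, and the plural rule (drop a final "s", look up again, put the "s" back) is a guess at what a plural heuristic might look like.

```python
import re

# Hypothetical miniature stand-ins for the three XML lists.
KEEP = {"the", "was", "go", "bedroom"}          # basic.xml: left untranslated
TRANSLATE = {"alteration": "change"}            # translate.xml: word -> word
IDIOMS = {                                      # idiom.xml: phrase -> phrase
    "back and forth": "backwards and forwards",
    "a cash cow": "an easy way to make money",
}

# Try idioms (longest first) before falling back to single words, so that a
# phrase is replaced as a unit and its replacement is not processed again.
_idiom_alt = "|".join(re.escape(k) for k in sorted(IDIOMS, key=len, reverse=True))
TOKEN = re.compile(rf"\b(?:{_idiom_alt})\b|[A-Za-z']+", re.IGNORECASE)


def lookup(word):
    """Return the Basic form of one word, or None if it is on neither list."""
    low = word.lower()
    if low in KEEP:
        return word
    if low in TRANSLATE:
        return TRANSLATE[low]
    return None


def translate_token(match):
    token = match.group(0)
    low = token.lower()
    if low in IDIOMS:
        return IDIOMS[low]
    result = lookup(token)
    if result is None and low.endswith("s"):
        # Assumed plural heuristic: retry without the final "s".
        stem = lookup(token[:-1])
        if stem is not None:
            result = stem + "s"
    if result is None:
        # Assumed marker for words that could not be translated.
        return f"[[{token}]]"
    return result


def to_basic(text):
    """Brute-force substitution over a whole text."""
    return TOKEN.sub(translate_token, text)


print(to_basic("The alterations go back and forth"))
```

Words that fall through every list, such as 'bass', come back wrapped in the marker so an author can deal with them by hand.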

# Manual undo -- Click the translated term and it reverts back to its previous form. Could collect this data to improve problem translations.
There is a lot of work left to do, on both the data and code.

Some formatting is lost. I have been pulling my hair out over this bug.
It has no sentence-level analysis. I have a large list of words that would be translated if they weren't ambiguous,
like 'bass', which could be translated properly if the rest of the sentence were searched for musical or fish words.
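
A sentence-level check of the kind suggested for 'bass' might look something like the sketch below. It is written in Python for illustration only; the cue-word sets and the sense labels are made up, and the rule of giving up when both senses (or neither) are cued simply follows the "if there is any ambiguity, leave it out" policy stated earlier in this file.

```python
import re

# Hypothetical cue words for each sense of an ambiguous word.
SENSES = {
    "bass": {
        "music": {"band", "guitar", "song", "music", "drum"},
        "fish": {"fish", "lake", "river", "caught", "hook"},
    },
}


def pick_sense(word, sentence):
    """Return the one sense whose cue words appear in the sentence, else None."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    candidates = SENSES.get(word.lower(), {})
    hits = [sense for sense, cues in candidates.items() if cues & words]
    # Only commit when exactly one sense is supported by the sentence.
    return hits[0] if len(hits) == 1 else None


print(pick_sense("bass", "He plays bass in a band"))
print(pick_sense("bass", "We caught a bass in the lake"))
print(pick_sense("bass", "The bass was big"))
```

A sense label returned here could then select between two entries for the same word in the substitution list; a None result would leave the word untranslated as it is now.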

Luis -- Fri, December 19, 2008 11:42 pm    I'm sending you what I deem the final betas for the assigned
"dialects" of Basic, as follows:

Bible: Trinity-TT
Business: Belize-BZ
Engineering: Philippines-PH
Media: Jamaica-JM
Simple: Zimbabwe-ZW

All codes are valid on WinXP, and have been validated and tested.

Media contains science, verse, and business, but not socialsci or the special work lists.
Business contains business and socialsci.
Engineering contains science, math/mech, and physics/chemistry.
I switched BZ to make Biznez fit there, so that
Media would fit in JM (Just Media).
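
For anyone scripting against these assignments, the mapping above can be written down as a small lookup table. This Python sketch is illustrative only: the two-letter region codes and word-list names are taken from the message, while the "en-" prefix on the locale tag and the function names are assumptions.

```python
# Region codes as assigned in the message above.
DIALECT_REGION = {
    "Bible": "TT",
    "Business": "BZ",
    "Engineering": "PH",
    "Media": "JM",
    "Simple": "ZW",
}

# Word lists per dialect, only where the message states them.
WORD_LISTS = {
    "Media": ["science", "verse", "business"],
    "Business": ["business", "socialsci"],
    "Engineering": ["science", "math/mech", "physics/chemistry"],
}


def locale_tag(dialect):
    """Build a locale tag for a dialect; the 'en-' prefix is an assumption."""
    return f"en-{DIALECT_REGION[dialect]}"


def dialects_with(word_list):
    """List the dialects that include a given word list."""
    return sorted(d for d, lists in WORD_LISTS.items() if word_list in lists)


print(locale_tag("Media"))
print(dialects_with("science"))
```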

Save for final changes in these assignments, I think these are ready for distribution.

Attachments (all application/vnd.openofficeorg.extension):
basic-english-bible-0.1.0.oxt     692 k
basic-english-business-0.1.0.oxt  692 k
basic-english-engine-0.1.0.oxt    693 k
basic-english-media-0.1.0.oxt     690 k
simple-english-tiny-0.1.0.oxt     100 k

I've uploaded the Mozilla addons. They require no localization, so
they were set off free without much ado. At the moment they are in the
experimental section of the site -- addons.mozilla.org

Please update the links from your website: the internal identifiers
have changed to fit the IETF BCP 47 standard, and the addons have been
assigned new codes.
Cheers, 
Luis

Jim -- 22 Dec 2008 07:20
 Luis has built installers for Basic English on OpenOffice.org version 3.0.
 Recall that OOo version 3.0 changes file structures and our previous approach no longer works
(it is still good on version 2.x).
 The installer puts these five dialects of Basic and Simple into names that OOo reserves
for existing regional dialects (all are empty by default). Thus a writer on religion would set
his OOo text processor as if for Trinidad (mnemonic: Trinity) and it will spell check for Basic English
with the Verse and Bible addons.
 Each or all installers may be downloaded from BEI (soon) or OOo (check how?). To run one,
double-click the installer. (The Betas just arrived and are not tested yet, though the Alphas worked,
so feel free to wait a day.) http://www.basic-english.org/down/download.html
 Follow the normal OOo language preferences (todo: write detailed instructions, but it is almost obvious
from following the options menu).
 Media Basic is the most likely to receive wide usage, as it is suitable for learners, teachers,
and the public. The others are for special interests: religious writers (a large following), engineers,
commerce, and simple (beyond Basic). There are others, geology, biology, dialects within dialects (Brit vs. US),
and features (thesaurus, translation), all to come.
 -----------------------------
References.
OpenOffice.org
LibreOffice.org
IDP Companion
English to Basic
Entente
About this Page: 333.html - Project 333 . Last updated December 17, 2008.
Contact us URL: http://www.basic-english.org/projects/3333work.html