PROJECT 333. Translation Software -- TEAM WORKING FILES
        Provide translation from English to Basic suitable for large amounts of text.
    We have a word-at-a-time translator in IDP Companion. We need something much faster, with editing built in or running within an editor, so that the author can rearrange wording and make the Basic suggestions from the mechanical translator flow smoothly.
    MEMBERS
      Mike
      Toan
      Spencer
      Luis
      Jim
      Alan

    JANUARY 2009

    Jim -- Wed 21Jan
    	Just email me anything you want to go to the group, and I will FTP it to
    the web page.  This is temporary until we can find a suitable shared system.
    We have had forums vandalized beyond recovery on two different internet service
    providers, we need somebody semi-technical to configure and run a system,
    and I haven't found time to select and install a CMS.
    You can read the progress on www.basic-english.org/projects/333/333.html.
    

    Toan -- Wed, January 21, 2009 10:57 am
    I'm glad to work in the team.  However, I don't think I'll be able to contribute much
    effort since this is a busy semester for me.  But I'll try to do as much as I can.
     
    Dr. Tang told me to upload eParaphraser's code to the link you provide, but I don't
    know how to do it.  
    
    Could you tell me how to do it? Or do you have a CVS server to upload the code?
    

    Luis -- Fri, January 9, 2009 12:52 pm
    First of all, Happy New Year!
    
    I'll be very busy this coming semester, so I may not be able to
    contribute much to our efforts. However, I'll share with you a few
    thoughts on implementing Basic.
    
    1. First of all, it is now clear to me that OpenOffice is perhaps the
    shortest path to providing services for BE users:
    
    a) Hunspell provides control over which words are valid;
    
    b) MyThes can do the sort of translation provided by the IDP right
    into the editor, so there is no need to cut and paste or to download
    additional programs;
    
    c) LanguageTool may provide the sort of grammatical analysis necessary
    to keep the English Basic (forbidding verbal forms, and so on).
    
    And last but not least, OpenOffice is already a free, libre office
    suite, so everything comes bundled for developers and writers.
    
    So I'd suggest that our other, more technically oriented
    contributors look for support for developing tools integrated into
    OpenOffice.
    
    I may keep tracking things in the coming months, but might be
    slightly delayed. I'll be reserving some time on Friday afternoons to
    follow up on things.
    
    Cheers,
    

    Spencer -- Thur January 8, 2009
    I've doubled the wordlist and fixed some major bugs, like "a/an"
    and punctuation problems. Still lots of work, but I've got a working
    demo I'm proud of now. You can click translated words to revert them.  On good days
    it's simplifying 10% of words.
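
    An "a/an" bug of the kind mentioned above can appear when a substituted word
    begins with a different kind of letter than the word it replaced (for example,
    'an alteration' becoming 'an change'). What follows is only a minimal sketch of
    one possible repair, written in Python rather than the PHP of the actual program.
    The function name and the simple vowel-letter rule are assumptions made for
    illustration; the rule ignores sound-based exceptions such as "an hour".

```python
import re

# Hypothetical helper, not taken from the project's PHP source.
# Matches the article "a" or "an" followed by the first letter of the next word.
_ARTICLE = re.compile(r"\b(a|an)(\s+)(\w)", re.IGNORECASE)

def fix_articles(text):
    """Make each 'a'/'an' agree with the first letter of the following word."""
    def repl(match):
        article, space, first = match.group(1), match.group(2), match.group(3)
        # Assumption: vowel letter means "an", anything else means "a".
        wanted = "an" if first.lower() in "aeiou" else "a"
        # Keep the capitalization of the original article.
        if article[0].isupper():
            wanted = wanted.capitalize()
        return wanted + space + first
    return _ARTICLE.sub(repl, text)
```

    Such a pass would run after word substitution, so that it sees the final words.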
    

    DECEMBER 2008   Chronological sequence.


    Mike
     I like the idea of doing the whole thing in Open Office, although I'm not
     sure what that means.
     I see what you mean about the importance of the spell checker.  
     I'll talk to Toan to see if we can squeeze some time for that.
    

    Mike -- Tue, December 2, 2008
    		Hope to get back to work on the Basic
    paraphraser and add an Open Office front end to it.  The concept is to
    write everything as close to Basic as possible in an Open Office window, and have Open Office
    spell check the document.  Push a button and the program starts translating the Open
    Office text into Basic.
    

    Spencer -- Thu, December 4, 2008
         The Romanian file is so excellent. I would not have developed the
    Simple English Translation program without it. The list I'm working with 
    should be thought of as a subset - words that work within context. 
         The policy so far is that if a word has
    any ambiguity or different uses (bow, etc.), it won't be included at all.
    I have experimented with WordNet's word-sense tool, to maybe
    use later, but right now, as you know, there are so many bugs to
    resolve.
         If the translation proves to be stable, and the list becomes refined
    and big enough to be relevant, I think too that we have something
    important. I have been tossing around ideas about a simplifying bot
    crawling through Wikipedia, or even a browser extension that
    translates while you surf the web.
         Do you have more info on Colorado's page-at-a-time translator?
         http://simple.wikipedia.org/wiki/User:Spencerk/list_of_straight-up_substitutables
         http://simple.wikipedia.org/wiki/User:Spencerk/multiple_word_translations
    

    Mike -- Thu, December 18, 2008
    	Would your Wiki person working on his Simple
    English translator and/or your volunteer be willing to work with Toan on a common
    project?  In any event, I'll have Toan send you the source code to see where it
    leads.  Originally I had thought of placing the source code on a Wiki, which we
    still could do through you.  If you think it appropriate, could you send me the
    Simple English person's email?
    
    Meanwhile I've reread some of Alan's documentation on Entente and was again
    impressed with the genius---and I rarely use the word---behind it.  I'm also
    re-reading a computer science master's thesis by a Scott R. Hawkins, entitled,
    "Ogden's Basic English as a Lexical Database for Natural Language processing,"
    written in 1993.  I'll have it scanned and send you a digital copy for posting.  
    

    Luis -- Fri, Dec 19, 2008
    Translation in OOo -- as another extension built upon OOo. That's
    theoretically possible, though I'd never thought about something like that.
    
    Of course, I'd say it's possible in principle to port whatever tool
    you have to either Java or Python to plug it in as an extension for
    OOo; but I don't know if this is what you mean. For instance,
    LanguageTool, the current grammar checker for OOo, is written as an
    extension (in Java, IIRC).
    

    Spencer -- Sat, December 20, 2008
    Great ideas.
    I'd love to see the data I'm making used as an OOo translator.
    My code is in PHP, and the bulk deals with wikipedia script, so may
    not be useful. I've posted it here --
            http://simple.wikipedia.org/wiki/User:Spencerk/source
    This application translates regular English into Simple English; it is free and open source.
    It was made by Spencer Kelly in 2008, using data from OpenOffice and Charles Kay Ogden.
    It was made to be friendly with wiki-script, for use with the Simple English Wikipedia. 
    There's no post-processing.
    What's Entente?  
    
      [Entente is a traveler's translator written by Alan Mole. It runs on a PC, with both parties present, for conversation between English and 14 languages.]
    Tell the guys I'd love to work with them. I'm going on vacation over Christmas.

    Spencer ---
    I know about the failure of machine translation and the challenges of natural language processing,
    but this project is modest enough that it can succeed. Substituting the word 'change' for the word
    'alteration' is fair for every usage of alteration I can think of. The inevitable change in meaning is
    insubstantial, only of nuance really, especially in the context of ESL learning or aphasia.
    While I'm prepared to be embarrassed to learn otherwise, no project of this kind has been taken on
    anywhere, at least none freely available on the Internet, and that's ridiculous because
    it was easy to make.
    -Details-
    It was written in PHP, and references three XML files:
        * basic.xml - Ogden's Basic 850 words,
             +international English words (nuclear, macaroni)
             +simple compound words (bedroom, eyeball)
             +proper nouns (DiMaggio, Rudolf, PMS)
             The words in this list (17,900 items) are left untranslated.
        * translate.xml - single-word substitution.
            very. context. sensitive.
            I have used a lot of translations from the Internet Dictionary Project's list of 1,600.
            My list is around 2,000 items. You can edit this list as a wiki here.
        * idiom.xml - multiple-word translation ('back and forth' -> 'backwards and forwards',
             'a cash cow' -> 'an easy way to make money').
            As far as I know this is the only developer-friendly idiom substitution list.
            I'm making it by hand, . . . usually on the bus; it's around 300 items.
    The remaining untranslatable words are turned red, and/or given ((wiki:|)) tags, according to the version.
    Yup, it's pretty brute force. The only heuristic I've used is on plurals.
    As I am not a strong programmer (or /SQL'er omg), this project is, if anything, a function of the collaborative power
    of the internet.
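
    The three-list lookup described above can be pictured roughly as follows. This is
    only a sketch in Python, not the project's PHP code: the tiny dictionaries stand in
    for basic.xml, translate.xml and idiom.xml, the function name is hypothetical, the
    square brackets stand in for turning a word red, and the plural rule (drop a final
    "s") is an assumed guess at what a plural heuristic might look like.

```python
import re

# Toy stand-ins for the three lists; the real files are far larger.
KEEP = {"the", "a", "an", "is", "was", "and", "to", "make", "money",
        "easy", "way", "change", "small", "in", "plan"}       # like basic.xml
TRANSLATE = {"alteration": "change", "minor": "small"}        # like translate.xml
IDIOMS = {"a cash cow": "an easy way to make money"}          # like idiom.xml

def translate(text):
    """Return (simplified_text, list_of_untranslated_words)."""
    # 1. Multiple-word idioms first, so their words are not handled one by one.
    for phrase, plain in IDIOMS.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b",
                      lambda m, p=plain: p, text, flags=re.IGNORECASE)
    unknown = []

    def swap(match):
        word = match.group(0)
        low = word.lower()
        if low in KEEP:                      # allowed word: leave untouched
            return word
        if low in TRANSLATE:                 # single-word substitution
            return TRANSLATE[low]
        # Assumed plural heuristic: retry without a final "s".
        if low.endswith("s") and low[:-1] in TRANSLATE:
            return TRANSLATE[low[:-1]] + "s"
        if low.endswith("s") and low[:-1] in KEEP:
            return word
        unknown.append(word)                 # no rule applies: flag the word
        return "[" + word + "]"

    # 2. Single-word pass over every alphabetic token.
    return re.sub(r"[A-Za-z']+", swap, text), unknown
```

    For instance, translate("The alterations in the plan") gives "The changes in the plan"
    with nothing flagged, while a word found in none of the lists comes back in brackets.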
    
    Manual undo -- Click the translated term and it reverts to its previous form. This data could be collected to improve problem translations.
    There is a lot of work left to do, on both the data and code.
    
    Some formatting is lost. I have been pulling my hair out with this bug.
    It has no sentence-level analysis. I have a large list of words that would be translated if they weren't ambiguous,
    like 'bass', which could be translated properly if the rest of the sentence were searched for musical or fish words.
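
    The idea of searching the rest of the sentence for musical or fish words might look
    something like the sketch below. It is written in Python for illustration only;
    the cue-word sets, the replacement phrases and the function name are all
    hypothetical and do not come from the project's data. A word is replaced only when
    exactly one sense has supporting cue words, which keeps to the policy of leaving
    ambiguous words alone.

```python
import re

# Hypothetical cue words and replacements, for illustration only.
SENSES = {
    "bass": [
        ({"music", "guitar", "band", "song", "play"}, "low-sounding instrument"),
        ({"fish", "river", "lake", "water", "net"}, "fish"),
    ],
}

def disambiguate(word, sentence):
    """Return a replacement for `word` if exactly one sense has cue words
    in the sentence; otherwise return None so the word is left alone."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    hits = [plain for cues, plain in SENSES.get(word.lower(), [])
            if cues & tokens]
    return hits[0] if len(hits) == 1 else None
```

    With these toy lists, a sentence that mentions a lake selects the fish sense, and a
    sentence with no cue words at all returns None.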
    

    Luis -- Fri, December 19, 2008 11:42 pm
    I'm sending you what I deem the final betas for the assigned
    "dialects" of Basic, as follows:
    
    Bible: Trinity-TT
    Business: Belize-BZ
    Engineering: Philippines-PH
    Media: Jamaica-JM
    Simple: Zimbabwe-ZW
    
    All codes are valid on WinXP, and have been validated and tested.
    
    Media contains science, verse, and business, but not socialsci or the special work lists.
    Business contains business and socialsci.
    Engineering contains science, math/mech, and physics/chemistry.
    I switched BZ to make Biznez fit there, so that
    Media would fit in JM (Just Media).
    
    Save for final changes in these assignments, I think these are ready for distribution.
    
    Attachments (all application/vnd.openofficeorg.extension):
    basic-english-bible-0.1.0.oxt       692 k
    basic-english-business-0.1.0.oxt    692 k
    basic-english-engine-0.1.0.oxt      693 k
    basic-english-media-0.1.0.oxt       690 k
    simple-english-tiny-0.1.0.oxt       100 k
    
    I've uploaded the Mozilla addons. They require no localization, so they were released without much ado. At the moment they are in the experimental section of the site, addons.mozilla.org. Please update the links from your website: the internal identifiers have changed to fit the IETF BCP 47 standard, and the addons have been assigned new codes.
    Cheers, Luis

    Jim -- 22 Dec 2008 07:20
      Luis has built installers for Basic English on OpenOffice.org version 3.0.
      Recall that OOo version 3.0 changes file structures, so our previous approach no longer works
    (it is still good on version 2.x).
      The installer puts these five dialects of Basic and Simple into names that OOo reserves
    for existing regional dialects (all are empty by default).  Thus a writer on religion would set
    his OOo text processor as if for Trinidad (mnemonic: Trinity) and it will spell check for Basic English
    with the Verse and Bible addons.
      Each or all of the installers may be downloaded from BEI (soon) or OOo (check how?).  To run one,
    double-click the installer.  (The Betas just arrived and are not tested yet, though the Alphas worked,
    so feel free to wait a day.)   http://www.basic-english.org/down/download.html
      Follow the normal OOo language preferences (todo: write detailed instructions, but it is almost obvious
    from following the options menu).
      Media Basic is the most likely to receive wide usage, as it is suitable for learners, teachers,
    and the public.  The others are for special interests: religious writers (a large following), engineers,
    commerce, and simple (beyond Basic).  There are others to come: geology, biology, dialects within dialects
    (Brit vs. US), and features (thesaurus, translation).
    

    Luis -- Mon, December 22, 2008
    What I do with some cooperative projects at school is to set up a
    shared document on Google Docs. It only requires that all members have Google
    accounts. You may distinguish different contributors by allowing them
    to pick a color for the text of their contributions. It's not harder
    than a wiki, and access is strictly private. I'm forwarding to you a
    shared document; if you don't have a Google account you may set up a
    Google Docs account without much hassle.
    [Thanks for the suggestion.  Will check it out.  I pick forest green. -- Jim]
    



    References.
    • OpenOffice.org
    • IDP Companion
      English to Basic
    • Entente

    About this Page: 333.html - Project 333 .
    Last updated December 17, 2008.
    URL: http://www.basic-english.org/projects/???/333work.html