Machine Translation Postediting - Translator's views needed
Thread poster: hzhang
chica nueva
Local time: 23:04
Chinese to English
Unpromising languages May 18, 2009

lai an wrote:

Rod Walters wrote:
... , and I'd be surprised if the same weren't true with Chinese too.


Oh? Why do you say that...

I have no knowledge of this area, but as far as I know Chinese and English are structurally similar. So ... perhaps Chinese would be more suited to MT from English than Japanese is ... I have no idea really. What did you have in mind, Rod?


Hello again Rod

I am still quite interested in this. Of course Chinese is dissimilar to English in some respects. It shares 'topic prominence' with Japanese, and tends to have long sentences and paragraphs.

What particular features or aspects did you have in mind, as causing difficulty? http://en.wikipedia.org/wiki/Chinese_grammar

As for word order, English and Chinese are both SVO, whereas Japanese is SOV:
http://en.wikipedia.org/wiki/Word_order#Sentence_word_orders This is possibly what led me to conclude that the languages are structurally similar. I rarely have to change word-order very much when I translate Ch->E.

Morphology: Could the fact that Chinese verbs do not conjugate in the way that some of the European languages do cause a problem? What about 'cases' and 'voices'; are they significant? I believe that Chinese is considered to be relatively 'uninflected'.

Lesley

@ Jeff
I'm not sure how much you can generalise about Asian-language suitability, can you? Which languages were you thinking of in particular?

[Edited at 2009-05-18 05:20 GMT]


 
Jarosław Zawadzki  Identity Verified
Poland
Local time: 13:04
Chinese to Polish
not so similar Jun 2, 2009

lai an wrote:


Oh? Why do you say that...

I have no knowledge of this area, but as far as I know Chinese and English are structurally similar. So ... perhaps Chinese would be more suited to MT from English than Japanese is ... I have no idea really. What did you have in mind, Rod?


Chinese and English are not so similar in structure as it might seem:

Consider this:

Proper wording: The man I saw = 我见过的那个人。
Word by word: 那个人我见过 (by) me seen the man

Is it similar in structure?


 
Rod Walters  Identity Verified
Japan
Local time: 20:04
Japanese to English
I'm not speaking from a great fund of knowledge here Jun 2, 2009

lai an wrote:

Rod Walters wrote:
... , and I'd be surprised if the same weren't true with Chinese too.


Oh? Why do you say that...

I have no knowledge of this area, but as far as I know Chinese and English are structurally similar. So ... perhaps Chinese would be more suited to MT from English than Japanese is ... I have no idea really. What did you have in mind, Rod?


Hi lai an

Sorry, I lost track of this topic...

All I know about the structure of Chinese was gleaned from a short course in linguistics at university. We had a Chinese colleague, and the tutor often asked him for examples of Chinese sentence structure. It was not what I would consider as familiar as, say, French or German.

Certainly the SOV structure of Japanese wreaks havoc with MT, and if Chinese is indeed SVO, that might help. But when I mentioned Chinese as possibly being similar to Japanese in being unsuitable for MT into European languages, I was thinking that MT is more likely to be useful within regional language groupings. So if Chinese is actually part of a regional language grouping (and I have no idea if it is), I would expect MT to be more practicable within that group.

My main and rather general point is that MT doesn't tend to work very well for languages half a world away.

I've tried using MT for J>E patents. The structured nature of patents gives MT a slight advantage over other kinds of J>E texts, but translation memory still offers far more useful results. (However, I'm not an expert translator of patents either - experts may have a different opinion.)


 
chica nueva
Local time: 23:04
Chinese to English
Chinese language structure and machine translation; Asian languages and MT Jun 2, 2009

Jarosław Zawadzki wrote:

Chinese and English are not so similar in structure as it might seem:

Consider this:

Proper wording: The man I saw = 我见过的那个人。
Word by word: 那个人我见过 (by) me seen the man

Is it similar in structure?


Hello Jaroslaw

Excellent to see you again. Your Chinese is streets ahead of mine, I know. I was rather hoping that someone more expert in the language might pick this up. Plus I have no experience whatsoever of computerised translation.

As far as I can see, the examples you give both have examples of 'topics' (or 'head-words' or 'noun-phrases' or whatever you call it) in Chinese. In the first, the subject 主语 or 'head-word' is the whole noun-phrase 我见过的那个人, whereas, in the second it seems that you have a head word '那个人 that person', followed by the subject 'I 我'. Is that it? ('过' denotes the 'perfect aspect', I think. But I am quite hazy on this.)

1 我见过的那个人。is not a sentence (is it?). The person I met ...
2 那个人我见过 is a sentence (isn't it?) I've met that person before.
(Word for word it is: [As for] that person [,] I've met [him (her)] before.)
(by me seen the man?? - sorry, pardon me, I'm not sure about this? => 那个人被我看见 or something like that? )

To answer the question, is it similar in structure, I think it perhaps seemed to be, to me, in the context of a whole sentence.

Clearly it turns out not to be similar at the microlevel, here:
我见过的那个人 The person I (once) met or The person I met (before)
But here it is similar, sort of:
那个人我见过 (That person) I've met [him (her)] before.
(This is not a very precise or 'proper' way of representing it, probably, I have to say. ... )

Lesley

[Incidentally, I find it interesting how '那个 that one' often translates to the definite article 'the' in English. Plus, I am interested that you translate 那个人 as ' that man' and not 'that person'. Perhaps you are right (?) ‘那个人’跟‘那个女的’比, 'that man' as compared with 'that woman' but I am not so sure ... ]

What do you peers think of this? Is it machine translation? What kind of post-editing is possible/desirable?: http://www.proz.com/forum/chinese/135704-最新英语“动词短语”:liusuo_across.html#1129961

@ Rod. Thank you for your reply ... I am no expert in 'language families', but here are some links which may be of interest to some peers working in 'East Asian' languages (roughly speaking):
http://en.wikipedia.org/wiki/Sino-Tibetan_languages
http://en.wikipedia.org/wiki/Sinitic_languages
http://en.wikipedia.org/wiki/Altaic_languages
[ http://en.wikipedia.org/wiki/Tibeto-Burman_languages http://en.wikipedia.org/wiki/Mandarin_dialects
http://en.wikipedia.org/wiki/Austronesian ]

[ Chinese is a Sino-Tibetan language (though there is apparently some debate amongst linguists about this - see Tibeto-Burman). The Chinese language family (ie the various Chinese dialects) is known by some as the Sinitic languages. (There are finer distinctions still, eg Mandarin also has dialects.) The other big language family in the North-East Asia region seems to be Altaic, though there is some controversy about this as a grouping. (And then, in SE Asia, Indonesian and Malay are classed as Austronesian languages. In South Asia, eg the Indian Subcontinent, there are the Indian languages etc. And to the west, Iran is sometimes considered to be the western-most edge of Asia ... ) ]

@ Jeff I am still quite interested in hearing which Asian languages in particular you had had experience of (with MT) ...

[Edited at 2009-06-03 03:58 GMT]


 
chica nueva
Local time: 23:04
Chinese to English
South-East Asian languages, cont. Jul 30, 2009

Austro-Asiatic is a language family that I neglected to mention (above). I also have an interest in Hmong-Mien/Miao-Yao (classified as Sino-Tibetan) because of the Miao and Yao minorities in China. I wonder, is there MT being done in any of these languages?

[ http://en.wikipedia.org/wiki/Austro-Asiatic includes Vietnamese and Khmer.
http://en.wikipedia.org/wiki/Hmong-Mien_languages Hmong-Mien or Miao-Yao.
(The Miao are an ancient people who feature in the founding myths of the Han people, rather like the Greeks and the Trojans, perhaps.) ]

[Edited at 2009-07-30 06:38 GMT]


 
Jeff Allen  Identity Verified
France
Local time: 13:04
Multiple languages
reply about MT for Asian languages Aug 29, 2009

lai an wrote:
@ Jeff
I'm not sure how much you can generalise about Asian-language suitability, can you? Which languages were you thinking of in particular?

[Edited at 2009-05-18 05:20 GMT]


lai an wrote:
@ Jeff I am still quite interested in hearing which Asian languages in particular you had had experience of (with MT) ...

[Edited at 2009-06-03 03:58 GMT]


Lesley,
My apologies for not answering your questions above. I was quite busy during that period of time and didn't remember to come back later to answer.

I was the technical account manager for corporate accounts in EMEA at Systran, which included handling, qualifying and investigating all linguistic fix requests with development and language experts for all language pairs used by the corporate customers. So, I've been through all kinds of discussions with Japanese and Chinese language experts who have done the dictionary and grammar work on the Systran system.
I was making a general statement only because MT software is usually sold in regional language packs (such as European languages, Asian languages) for the mass market.
Each language indeed has its own language-specific issues.
Yet, in general, it seemed easier for the European language linguists to implement lexical-related rules than for those who worked on these two Asian languages, which did present challenges from time to time on specific linguistic constructions.
I think I may have also evaluated the feasibility of MT project proposals and handled MT-related projects for Asian languages when I was technical director at Elda (European Language Resources Distribution Agency).

As for using MT systems in production-based contexts, where I have done it myself or worked in combination with colleagues, that has been more on European-based language pairs. My Proz profile indicates all of that.

Jeff


[Edited at 2009-08-29 22:25 GMT]


 
Jeff Allen  Identity Verified
France
Local time: 13:04
Multiple languages
MT for Hmong-Mien/Miao-Yao Aug 29, 2009

lai an wrote:

Austro-Asiatic is a language family that I neglected to mention (above). I also have an interest in Hmong-Mien/Miao-Yao (classified as Sino-Tibetan) because of the Miao and Yao minorities in China. I wonder, is there MT being done in any of these languages?


Lesley,
The answer is quite simple. If there is no financial business justification for investing several person-years at a minimum, with a definite promise of return on investment, then MT vendors are not doing it.

However, there might be academic university projects that have investigated these languages, but I don't usually try to keep track of such systems.

You might want to contact AsiaOnline, a statistical MT system vendor, and see what Asian language pairs they are covering with their MT solutions. Yet, I doubt they would be working on minority groups for now, since there is already plenty of work to keep them busy with other Asian languages having larger populations, and for which there is a commercial request for language translation.

Jeff


 
chica nueva
Local time: 23:04
Chinese to English
A theoretical question. Sep 4, 2009

Thank you Jeff. I hope my comments are not too impractical.

I am curious. Could/Should one MT handle a language which has two scripts, such as Mongolian or Serbian/Croatian or Mandarin (Complex and Simplified), do you think? Or would it be better to have two separate MTs?

[ Thread relating to Google Translator which may be of interest. http://www.proz.com/forum/chinese/140242-using_google_translator_from_simplified_to_traditional_chinese_and_viceversa.html ]
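
To make the two-script question concrete, here is a minimal sketch of one possible design: normalise everything to a single script before it reaches one MT engine. This is purely an illustrative assumption, not a description of how any vendor actually does it. It uses the standard Serbian Cyrillic-to-Latin letter correspondences, and the names `CYR_TO_LAT` and `to_latin` are made up for the example.

```python
# Serbian Cyrillic -> Latin letter map (lower case). Three letters map to digraphs.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text):
    """Transliterate Serbian Cyrillic to Latin; other characters pass through unchanged."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # Upper-case Cyrillic becomes title case, e.g. "Љ" -> "Lj".
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


print(to_latin("Београд"))
print(to_latin("Љубав"))
```

This only works so simply because the Serbian mapping is letter-by-letter in this direction. Simplified and Traditional Chinese are a different matter, since one simplified character can correspond to more than one traditional character, so a plain lookup table like this would not be enough there.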


 
Jeff Allen  Identity Verified
France
Local time: 13:04
Multiple languages
MT for languages with multiple scripts Sep 5, 2009

lai an wrote:

Thank you Jeff. I hope my comments are not too impractical.

I am curious. Could/Should one MT handle a language which has two scripts, such as Mongolian or Serbian/Croatian or Mandarin (Complex and Simplified), do you think? Or would it be better to have two separate MTs?

[ Thread relating to Google Translator which may be of interest. http://www.proz.com/forum/chinese/140242-using_google_translator_from_simplified_to_traditional_chinese_and_viceversa.html ]


Lesley,
Your comments are definitely not impractical. There is usually a knowledge gap between the professional translator community and the MT development community on many points related to how MT does things.

What I did was restate your question in a slightly different way and put it to the MT development community, asking about the different ways they implement this, on the mailing list that is dedicated to MT-related topics.
See below:
http://www.mail-archive.com/[email protected]/
thread topic: MT for languages with multiple scripts

This way, it allows them to respond on their list with their regular dialog (without having to come here, create an account, etc, which most will not do based on my experience), and yet readers here can access it (read only) without any special authorization request.
There are already two replies.

Hope that helps.

Jeff


 
Luca Tutino  Identity Verified
Italy
Member (2002)
English to Italian
Plain EN>IT experience: I am giving up MT postediting (for a good while) Sep 6, 2009

This year I decided I would try and accept an MT postediting proposal. The size of the client, the type of original text and the fact that this project seems to have been running for years made me think that I was in contact with a fairly advanced MT application. I agreed to work on a large catalogue update, coming as .itd files for SDLX, and accepted a reduced rate (about 60% of the regular translation rate).

I am a technical translator and a technically oriented person, and I use the full potential of my PC for the sake of the best quality and productivity. I thought that this should put me in a good position to adjust to an MT postediting workflow.

More than six months later, today I have delivered my last batch and told the client that from now on, for postediting, I will charge my regular hourly fee or a word fee higher than my base translation fee.

My honest impression is that postediting a heavily technical and segmented source takes more than twice the hours needed for a regular translation. If you are also forced to use an inefficient interface (eg SDLX, often pausing for several dozen seconds to perform the TU concordance with a large TM), then it can take more than 3 times the hours needed for a regular CAT translation.

I also believe that this experiment has caused a significant dent in my income for this year.

And finally, judging from the untouchable segments (i.e. those already 100% approved and published in earlier versions I guess), the final quality of the output translation is still questionable even after postediting.
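
The arithmetic behind the figures above can be sketched roughly as follows. This is a minimal sketch, assuming income is simply the per-word rate times the words delivered, and using the round figures from this post (60% of the regular rate, two and three times the hours). The function name `relative_hourly_income` is made up for the illustration.

```python
def relative_hourly_income(rate_fraction, time_multiple):
    """Hourly income from post-editing relative to regular translation (1.0 = the same).

    rate_fraction: post-editing word rate as a fraction of the regular rate.
    time_multiple: hours needed relative to a regular translation of the same text.
    """
    return rate_fraction / time_multiple


# 60% of the regular rate, at the same, twice and three times the hours.
for time_multiple in (1.0, 2.0, 3.0):
    ratio = relative_hourly_income(0.60, time_multiple)
    print(f"{time_multiple:.0f}x the hours at 60% of the rate: {ratio:.0%} of regular hourly income")
```

Since the hours were "more than" twice or three times the usual, the real figures would sit below what this prints.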


 
Jeff Allen  Identity Verified
France
Local time: 13:04
Multiple languages
don't give up so soon Sep 6, 2009

Luca Tutino wrote:

This year I decided I would try and accept an MT postediting proposal. ...

I am a technical translator and a technically oriented person, and I use the full potential of my PC for the sake of the best quality and productivity. I thought that this should put me in a good position to adjust to an MT postediting workflow.

More than six months later, today I have delivered my last batch and told the client that from now on, for postediting, I will charge my regular hourly fee or a word fee higher than my base translation fee.


Luca,

It saddens me to hear that you spent 6 months on an MT project that was obviously not set up in an optimal way for you to be successful.

Being technically oriented, being a technical translator, and having good computer habits are not the core essentials of MT postediting training.

MT postediting, and I should say the pre-processing and parallel processing skills, require a transfer of know-how from someone who has done it before. The basics can be learned in about 3-4 hours, and advanced techniques take some more time.
It doesn't sound like you received any of that. And that was clearly the fault of the person who set up this MT project.

See the following post for links to project reports I have published that give details on productivity statistics for all tasks involved.

http://www.proz.com/forum/proofreading_editing_reviewing/133180-machine_translation_postediting_translators_views_needed.html#1123567

And see the MA thesis of someone I trained, who quickly learned the skills and applied them to a different language pair, and who achieved productivity figures that were also quite good.

GUERRA, Lorena. 2003. Human Translation versus Machine Translation and Full Post-Editing of Raw Machine Translation Output. Master's Thesis. Dublin City University.
http://www.geocities.com/mtpostediting/lorena-guerra-masters.pdf

But nothing like 2-3 times longer.


I have, however, seen a few projects where I was invited to discuss with the participants long after they had started, and it was not possible to change anything. And in several cases, I have indeed seen that it took twice as much time or more. But in all of those cases, I did not see any evidence of any type of MT usage training that was worth referencing.

Jeff

[Edited at 2009-09-06 18:54 GMT]


 
Luca Tutino  Identity Verified
Italy
Member (2002)
English to Italian
MT post-editing today: MT translation failure paid for by translators May 29, 2011

Dear Jeff,

After your comments I tried again (3 times, actually) to accept "MT post-editing" projects, from reputable agencies, which apparently prepared the projects with extreme care, and on subjects apparently ideally suited for MT processing according to the literature. Each one of these projects offered rates about 50-60% lower than my usual translation rates and required delivery within a time 50 to 70% shorter than my usual lead-time. Once again it was a total failure: every single project resulted in a consistent reduction of my usual income + tension with clients due to late delivery and/or poor results.

A wide range of MT problems contributed to this, including sentence structure, grammar, incorrect translation, omissions, random capitalization and terminology issues. It should be considered that with pre-translated material, one needs to read, understand and consider both source and target sentences, and this takes twice as much time as for a normal translation. (Actually more than twice, as one should also take some extra time to evaluate whether the proposed translation could be made acceptable with some small touch here and there, even if it is very different from what one would usually write.) To make up for the time lost in reading, it would be necessary to have a high percentage of perfect TUs that can be left untouched, and a very high percentage of the remaining TUs requiring just small touches. On the contrary, it is mostly faster and safer to delete the pre-translation and rewrite it. In my experience, the best MTs manage to guess the shortest and simplest sentences all right. But these make up a very small share of the words, and are offset by the additional reading and checking time.
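
A rough way to put numbers on this trade-off is sketched below. It is only a toy model under stated assumptions: translating a TU from scratch costs 1.0, every pre-translated TU costs an extra reading-and-checking overhead, untouched TUs cost nothing more, lightly touched TUs cost a small edit, and the rest are deleted and retranslated at full cost. The overhead values and the TU shares are entirely hypothetical, chosen for illustration and not measured on any project, and the function name is made up.

```python
def relative_postediting_time(p_perfect, p_light, read_overhead, light_edit):
    """Post-editing time relative to translating from scratch (1.0 = the same).

    p_perfect, p_light: word-weighted shares of TUs left untouched / lightly edited.
    The remaining share is deleted and retranslated at full cost (1.0).
    read_overhead: extra cost of reading and judging the proposed translation.
    light_edit: cost of a small touch-up.
    """
    p_rewrite = 1.0 - p_perfect - p_light
    return read_overhead + p_light * light_edit + p_rewrite * 1.0


READ_OVERHEAD = 0.3  # hypothetical value
LIGHT_EDIT = 0.3     # hypothetical value

# Hypothetical TU mixes: (share untouched, share lightly edited).
scenarios = {
    "mostly rewritten": (0.1, 0.2),
    "mostly usable": (0.6, 0.3),
}

for name, (p_perfect, p_light) in scenarios.items():
    t = relative_postediting_time(p_perfect, p_light, READ_OVERHEAD, LIGHT_EDIT)
    print(f"{name}: {t:.2f}x the time of a normal translation")
```

With rates 50-60% lower, hourly income is only preserved if the relative time comes out at 0.40-0.50 or below, which in this toy model needs a very high share of untouched and lightly edited words.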

An advantage found in good MT is that it can suggest alternative terms or translations that, for one reason or another, I might otherwise have needed to look up in a dictionary. In these cases I noticed some speeding up. This rarely happens in post-editing, but more often in a totally different situation: I sometimes use Google translation TU by TU (with keyboard macros at first, now with GT4T), to ease the strain on my hands, when I am tired. In the best conditions (usually certain kinds of legal texts, where even longer sentences are available in parallel translation on the internet), I manage to keep my normal speed while reducing the frequency of hits on the keys. This leads me to believe that a more systematic use of MT could provide me some advantage in very specific situations, if I could own and manage my own MT system. The problems are that the cost of MT systems seems too high for the very limited occasions when it could be sensibly used, and that the short delivery time requested for a medium-sized translation project (under 60k words) does not generally leave time for the serious consideration and preparation needed for a systematic use of MT.

On the contrary, the MT "post-editing" jobs try to pass any supposed advantage of MT to the client or the agency. If, as usually happens, there was no actual advantage, then the promised discount is expected to come anyway from the translator's free extra work.

Luca

[Edited at 2011-05-29 22:13 GMT]


 
Jeff Allen  Identity Verified
France
Local time: 13:04
Multiple languages
thanks for your feedback Luca May 29, 2011

Luca Tutino wrote:

Dear Jeff,

After your comments I tried again (3 times, actually) to accept "MT post-editing" projects, from reputable agencies, which apparently prepared the projects with extreme care, and on subjects apparently ideally suited for MT processing according to the literature.


Thanks Luca for your very valuable feedback.

Unfortunately, the "literature" that is publicly available is very poor and limited in depth. All the info that I see on this topic in published reports and forums is very vague (similar to what was stated back in 1985) and doesn't provide much more detail.
However, company-specific procedures are much more detailed. I have been involved in a number of them since 1995.
So, it seems to me that these providers are creating their own ad-hoc proprietary interpretation of the general statements and info that are available.

This shows the need for guidance in this area, and yet people do not come asking for help on it. As long as many wannabes continue to create poorly prepared projects and give them to freelancers in non-optimized ways, this will keep creating wasted work.

Jeff


 