Using regex/grep to "pre-edit" Transit files
Thread poster: Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:48
Member (2014)
Japanese to English
Oct 10, 2016

I get regular Transit projects from a certain client. I unpack them into a folder. They reliably contain certain characters that need to be replaced, such as double-byte digits in Japanese, and I am wondering if I can edit the files directly.

Looking at the current project folder I see a number of .MTX, .BAS files as well as .JPN and .ENG files. The .JPN files are of two types: _AEXTR_1.JPN and myfilename01.JPN. Both are XML files. I am guessing that the former .JPN files contain some kind of structural information and the latter contain content.

Some questions for anybody kind enough to help me.

1) I am wondering if I can simply use a grep utility on the .JPN files to make a bunch of character substitutions. Given that the files seem to be plain text it could be done, but will it break the project in some way, perhaps by invalidating some kind of checksum?

2) If the answer to the above is "yes, it would break the project", how about opening the project, opening all translation pairs, copying all source segments to target, saving the project, then grepping the .ENG files that (presumably) result?

3) What flavour of regex does Transit NXT use? I could not find this information in the docs. Is it C# .NET like Studio?

EDIT: just to be clear, of course I can do search and replace (with regexes) within Transit NXT but that requires in some cases dozens of actions.
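For illustration, the kind of substitution in question (double-byte digits to ASCII digits) can be sketched in a few lines of Python. This is only a minimal sketch of the character mapping itself; the names `FULLWIDTH_DIGITS` and `to_ascii_digits` are made up for the example, and nothing here is specific to Transit.

```python
# Hypothetical helper names; only the ten full-width digits
# U+FF10..U+FF19 are mapped, everything else is left untouched.
FULLWIDTH_DIGITS = {0xFF10 + i: ord("0") + i for i in range(10)}


def to_ascii_digits(text: str) -> str:
    """Return text with full-width digits replaced by ASCII digits."""
    return text.translate(FULLWIDTH_DIGITS)


if __name__ == "__main__":
    print(to_ascii_digits("２０１６年１０月"))
```

A translation table is used rather than Unicode normalisation so that no characters other than the digits are altered.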

Thanks
Dan


[Edited at 2016-10-10 08:44 GMT]


 
AlSqur (X)
myfilename01.JPN Oct 10, 2016

First of all, try the internal search and replace. The catch is that you need to search for Unicode characters, and I don't know which type of regex it uses.

If you can't replace the characters within the application, use a third-party search and replace application.

About the files:

.MTX, .BAS
These files can be ignored altogether; they are only needed for fuzzy search.

.JPN and .ENG files
These are the source and target language files.

_AEXTR_1.JPN
This is the reference extract, compiled from fuzzy matches found in the reference material during pretranslation. It is only used when you search for fuzzies.

myfilename01.JPN
This is the actual language pair, which is what should be translated.

So you should only consider the last type. I would try a third-party search and replace tool and just make sure that the encoding is not changed. There you can search for your characters and replace or delete them.
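The file selection described above could be sketched roughly as follows. This assumes the naming seen in this thread (content files such as myfilename01.ENG, reference extracts containing `_AEXTR_`); the function name `content_files` and its parameters are hypothetical, and other projects may use different names.

```python
from pathlib import Path


def content_files(project_dir, extension=".ENG"):
    """List language files in project_dir, skipping reference extracts.

    Assumption: reference extracts are recognisable by "_AEXTR_" in the
    file name, as in the project described in this thread.
    """
    selected = []
    for path in sorted(Path(project_dir).iterdir()):
        if not path.is_file():
            continue
        # Keep only the requested language extension (.MTX, .BAS etc. drop out).
        if path.suffix.upper() != extension.upper():
            continue
        # Skip the memory/reference extract files.
        if "_AEXTR_" in path.name.upper():
            continue
        selected.append(path)
    return selected
```

Calling `content_files("unpacked_project", ".JPN")` would list the source files instead; the folder name is just a placeholder.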


 
oerjan
Local time: 19:48
German to Swedish
Transit regex Oct 10, 2016

Hi Dan

> 1) I am wondering if I can simply use a grep utility on the .JPN
> files to make a bunch of character substitutions. Given that the
> files seem to be plain text it could be done, but will it break the
> project in some way, perhaps by invalidating some kind of checksum?

I do not know, but as long as you do not mess with the structure I doubt there should be any stress.

Btw, _AEXTR_1.JPN is the memory extract built when the project was created.

> 3) What flavour of regex does Transit NXT use? I could not find
> this information in the docs. Is it C# .NET like Studio?

Transit uses its own flavour of regex. Find the manual on STAR's homepage; their regex is described in one chapter there. You can't use 'normal' regex.

(Even though I do not understand why you need that if you are not going to use it inside Transit?)

A tip might be that regex also works fine with the filter.

FWIW,
Örjan


 
CafeTran Training (X)
Netherlands
Local time: 19:48
Yes you can Oct 10, 2016

You can edit the target language files with a regular-expression editor; I used to use WildEdit (http://www.textpad.com) for that.

If I remember correctly, the encoding was UTF-16LE.

Make sure you don't break any markup.
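A rough sketch of editing a file while keeping its encoding byte-for-byte compatible might look like this. It assumes the file really is UTF-16 little-endian, which is only recalled from memory above and should be checked on a copy first; `rewrite_in_place` and the `.bak` backup naming are inventions for the example.

```python
from pathlib import Path


def rewrite_in_place(path, transform, encoding="utf-16-le"):
    """Apply transform(text) to a file and write it back in the same encoding.

    Assumption: the file is UTF-16LE. Working on raw bytes avoids any
    newline translation, and a leading BOM (U+FEFF) survives the round
    trip because it is decoded and re-encoded like any other character.
    Returns True if the file was changed.
    """
    path = Path(path)
    raw = path.read_bytes()
    text = raw.decode(encoding)  # may raise UnicodeDecodeError if the guess is wrong
    new_text = transform(text)
    if new_text == text:
        return False
    # Keep the untouched original next to the edited file.
    path.with_name(path.name + ".bak").write_bytes(raw)
    path.write_bytes(new_text.encode(encoding))
    return True
```

The `transform` argument is any function from string to string, so the actual replacements stay separate from the file handling.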

PS Your description of Transit as not being as powerful as Studio is not correct. Au contraire, I'd say.

[Edited at 2016-10-10 14:52 GMT]


 
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:48
Member (2014)
Japanese to English
TOPIC STARTER
Target file it is then Oct 10, 2016

CafeTran Training wrote:
You can edit the target language files with a regular-expression editor; I used to use WildEdit (http://textpad.com/products/wildedit/index.html) for that.
If I remember correctly, the encoding was UTF-16LE.
Make sure you don't break any markup.

Thank you for that. I will take it slowly with the ENG rather than the JPN files and see how it goes.

PS Your description of Transit as not being as powerful as Studio is not correct. Au contraire, I'd say.

I take it that you're referring to my comment in the other thread: "In terms of use, Transit is tolerable, but not in the same category as Studio".

I'd stand by that statement. In terms of user-friendliness from the translator's perspective, I do indeed find Transit to be in a different category to Studio, and by that I do mean "inferior". Some things I miss in Transit that can be found in Studio are auto-suggest functions and termbases that pop up terms rather than entire segments. Maybe this is possible, but I can't find how to do it. And Studio has its plug-in system, which is potentially enormously useful, though still in its infancy.

In terms of its ability as a corporate translation platform - which is a very different thing - Transit/Star must have something going for it, because this household-name multinational continues to use it.

Dan


 
CafeTran Training (X)
Netherlands
Local time: 19:48
It's all a matter of preferences Oct 10, 2016

Dan Lucas wrote:

I take it that you're referring to my comment in the other thread: "In terms of use, Transit is tolerable, but not in the same category as Studio".

I'd stand by that statement. In terms of user-friendliness from the translator's perspective, I do indeed find Transit to be in a different category to Studio, and by that I do mean "inferior".


I used to love the way Transit treats translation projects as text files that can be edited almost like plain text. When I want to make changes to 300 segments out of 8000 segments, Transit is very fast. Same with spell-checking.

It has a superior find/replace dialogue box that lets you change word order and adjust case in the replacement text.

It's a tool for the heavy user. But they love it.

Ever seen how Transit displays an InDesign project? It's almost like in InDesign itself. Same goes for FrameMaker. Technical translators appreciate this.

No auto-suggestion, indeed. But doesn't the bubble for the target language make a decent replacement?

[Attached screenshot: Screen Shot 2016-10-10 at 18.59.34]


 
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:48
Member (2014)
Japanese to English
TOPIC STARTER
Horses for courses Oct 11, 2016

CafeTran Training wrote:
Ever seen how Transit displays an InDesign project? It's almost like in InDesign itself. Same goes for FrameMaker. Technical translators appreciate this.

Well, I'd describe myself as a heavy user, but I'm effectively processing files whose source is plain text. I'm sure if I were tackling InDesign or similarly complex files that display would be useful, but (fortunately?) I haven't been asked to tackle those.

The bubble is useful-ish, but it basically works as a TM. Search and replace is solid, with proper regex and the ability to save regexes too, which is unusual and useful. If we could only chain sequences of regexes I wouldn't have to use grep!
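Chaining a sequence of regexes outside the CAT tool can be sketched like this. Note that this is Python's `re` dialect, not Transit's own regex flavour or grep, and the three rules are purely illustrative examples rather than the actual substitutions used on the project.

```python
import re

# Illustrative rules only, applied in order; each is (pattern, replacement).
RULES = [
    # Full-width digits U+FF10..U+FF19 -> ASCII digits.
    (re.compile("[０-９]"), lambda m: chr(ord(m.group()) - 0xFF10 + ord("0"))),
    # Full-width comma used as a thousands separator -> ASCII comma.
    (re.compile(r"(\d)，(?=\d{3})"), r"\1,"),
    # Full-width percent sign -> ASCII percent sign.
    (re.compile("％"), "%"),
]


def apply_rules(text, rules=RULES):
    """Run every substitution in sequence, feeding each result to the next."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(apply_rules("１，２３４円 ５０％"))
```

Keeping the rules in one ordered list makes the whole sequence repeatable for each new project, which is the part that takes dozens of manual actions otherwise.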

I'm locked into using Transit until next March when my license expires, at which point I may see how MemoQ works with Transit files. I thought MemoQ was a nice(r) application when I had to use it in anger for the first time a few weeks back. The question is how well it deals with Transit projects. I would not like to spend the money and then find that it doesn't work 100%.

Regards
Dan


 
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:48
Member (2014)
Japanese to English
TOPIC STARTER
Avoid Oct 11, 2016

AlSqur wrote:
So you should only consider the last type. I would try a third-party search and replace tool and just make sure that the encoding is not changed. There you can search for your characters and replace or delete them.

Thanks, I shall exclude the other types then. I had been including the *_AEXTR_* files but I will exclude them from now on.

Dan


 
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:48
Member (2014)
Japanese to English
TOPIC STARTER
Reference Oct 11, 2016

oerjan wrote:
Transit uses its own flavour of regex. Find the manual on STAR's homepage; their regex is described in one chapter there. You can't use 'normal' regex.
(Even though I do not understand why you need that if you are not going to use it inside Transit?)

Thanks Örjan. I just wanted to know from a point of view of comparability to other regex dialects. In this case, as you say, the use of grep makes it a moot point, but it's good to know.

Dan


 
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:48
Member (2014)
Japanese to English
TOPIC STARTER
Grep works Oct 13, 2016

Dan Lucas wrote:
the use of grep makes it a moot point, but it's good to know.

Grepping seems to work. However, one of my .ENG files failed to load. The culprit seems to have been the angle bracket: I replaced a Japanese angle bracket with a Latin one, which probably messed up the tags in the XML. Disabling that replacement fixed that particular problem. Other than that, so far so good.
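One possible way around the angle-bracket problem is to apply replacements only to the text between tags and to write any bracket as an XML entity rather than a raw `<` or `>`. The sketch below is untested against Transit itself: the tag pattern is a simplification, the choice of the full-width brackets U+FF1C/U+FF1E is a guess at which "Japanese angle bracket" is meant, and whether Transit displays `&lt;` as expected is an assumption.

```python
import re

# Simplified idea of a tag: anything from "<" to the next ">".
TAG = re.compile(r"(<[^>]*>)")

# Assumed bracket characters; mapped to entities so no raw "<" or ">" is added.
BRACKETS = str.maketrans({"＜": "&lt;", "＞": "&gt;"})


def replace_outside_tags(text, transform):
    """Apply transform only to the text between tags, never to the tags."""
    parts = TAG.split(text)
    # With one capturing group, re.split returns text chunks at even
    # indexes and the matched tags at odd indexes.
    for i in range(0, len(parts), 2):
        parts[i] = transform(parts[i])
    return "".join(parts)


def safe_brackets(chunk):
    """Replace full-width angle brackets with XML entities."""
    return chunk.translate(BRACKETS)
```

For example, `replace_outside_tags(text, safe_brackets)` leaves every existing tag byte-for-byte as it was, so a replacement cannot accidentally open or close a tag.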

This has been a public service announcement.

Dan


 

