[Internationalization] Splitting the translation files


I’m going to start working on #2494 Split the POT file of the public mod in more manageable POTs. But before I start with this task, I would appreciate it if other translators could give me feedback about this split, in the form of suggestions on what you would want to have in a separate translation file.

Currently we have two translation files, one for the engine (engine.pot) and one for the game data (public.pot). The latter contains 3,665 translatable strings, which I bet most of you consider way too many for a single file.

So, what content of the game data translation file (public.pot) would you like to see in a separate translation file to make it easier to handle?

For example, let me suggest splitting the translation file like this:

  • campaigns, for campaign strings (simulation/templates/campaigns/).
  • civilizations, for civilization strings (civs/).
  • gaia, for gaia strings (simulation/templates/template_gaia*, simulation/templates/gaia/).
  • gui, for GUI strings (globalscripts/, gui/, simulation/data/game_speeds.json, simulation/data/player_defaults.json, simulation/data/map_sizes.json, simulation/ai/, simulation/templates/formations/).
  • help, for help strings (gui/manual/intro.txt, gui/manual/userreport.txt).
  • logic, for strings related to the game logic (simulation/components).
  • maps, for map names, descriptions, etc. (maps/).
  • other, for strings that do not fit anywhere else (simulation/templates/other/).
  • quotes, for quotes (gui/text/quotes.txt).
  • structures, for structure strings (simulation/templates/template_structure*, simulation/templates/structures/, simulation/templates/skirmish/structures/).
  • technologies, for technology strings (simulation/data/technologies).
  • tips, for tips displayed in the loading screen (gui/text/tips/).
  • tos, for strings from the Terms of Service and the Terms of Use of the lobby (gui/lobby/Terms_of_Service.txt, gui/lobby/Terms_of_Use.txt).
  • tutorials, for tutorial strings (simulation/ai/tutorial-ai/).
  • units, for unit strings (simulation/templates/template_unit*, simulation/templates/units/).
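
To make the proposal easier to reason about, here is a minimal Python sketch of how such a path-to-file mapping could be expressed. The `RULES` table and the `catalog_for` function are hypothetical names made up for illustration; this is not how the extraction script actually works. Rules are checked in order, so the narrower paths (quotes, tips, tutorials...) have to come before the broad `gui/` and `simulation/ai/` patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (catalog name, path patterns), checked in order.
# Narrow rules come first so that broad ones such as "gui/*" do not swallow them.
RULES = [
    ("campaigns", ["simulation/templates/campaigns/*"]),
    ("civilizations", ["civs/*"]),
    ("gaia", ["simulation/templates/template_gaia*", "simulation/templates/gaia/*"]),
    ("help", ["gui/manual/intro.txt", "gui/manual/userreport.txt"]),
    ("quotes", ["gui/text/quotes.txt"]),
    ("tips", ["gui/text/tips/*"]),
    ("tos", ["gui/lobby/Terms_of_Service.txt", "gui/lobby/Terms_of_Use.txt"]),
    ("tutorials", ["simulation/ai/tutorial-ai/*"]),
    ("gui", ["globalscripts/*", "gui/*", "simulation/data/game_speeds.json",
             "simulation/data/player_defaults.json", "simulation/data/map_sizes.json",
             "simulation/ai/*", "simulation/templates/formations/*"]),
    ("logic", ["simulation/components/*"]),
    ("maps", ["maps/*"]),
    ("other", ["simulation/templates/other/*"]),
    ("structures", ["simulation/templates/template_structure*",
                    "simulation/templates/structures/*",
                    "simulation/templates/skirmish/structures/*"]),
    ("technologies", ["simulation/data/technologies/*"]),
    ("units", ["simulation/templates/template_unit*", "simulation/templates/units/*"]),
]


def catalog_for(path):
    """Return the name of the first catalog whose patterns match path, or None."""
    for name, patterns in RULES:
        # In fnmatch patterns, "*" also matches "/" so "gui/*" covers subfolders.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return name
    return None
```

For example, `catalog_for("gui/text/quotes.txt")` gives `"quotes"` rather than `"gui"` only because the quotes rule is listed first.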

What would you change about this split? Would you split some modules further, or merge some of them into a single module? Please share your opinion!

I think it's useful, for comparison, to add some numbers. I counted these by simply grepping for the lines that contain each path, so multiple occurrences are counted multiple times. For example, "Yes" appears 18 times in the GUI, though that's probably the most extreme example. I didn't want to invest time in a better counting method.

The results for the current proposal are:

  • Campaigns: 17 (should probably become bigger in the future, but not in the near future)
  • Civilisations: 699
  • Gaia: 134
  • Gui: 1053 (of which 917 in "gui/" itself)
  • Help: 106
  • Logic: 30
  • Maps: 567
  • Quotes: 117
  • Structures: 580
  • Technologies: 719
  • Tips: 136
  • ToS: 39
  • Tutorial AI: 80 (might completely disappear in the future and be replaced with a set of tutorial triggers)
  • Units: 985
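
For anyone who wants to reproduce counts like the ones above, here is a rough Python equivalent of the grep approach. It is only a sketch: `count_references` is a made-up helper, it assumes the POT file uses standard gettext `#:` source reference lines with paths that contain no spaces, and it counts individual references rather than lines, so a string used in several files is counted once per file.

```python
from collections import Counter


def count_references(pot_text, prefixes):
    """Count "#:" source references in POT text, grouped by path prefix."""
    counts = Counter()
    for line in pot_text.splitlines():
        if not line.startswith("#:"):
            continue  # only source reference lines are of interest
        for reference in line[2:].split():
            # References usually look like "path:line"; drop the line number if any.
            path = reference.rsplit(":", 1)[0]
            for prefix in prefixes:
                if path.startswith(prefix):
                    counts[prefix] += 1
                    break  # count each reference for the first matching prefix only
    return counts
```

Because a message such as "Yes" carries one reference per place where it is used, totals obtained this way will be higher than the number of unique messages in the resulting files.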

I think the "Gui" situation is the strangest one. Gui should only contain what is inside "gui/", IMO. Splitting up gui/ seems fair enough; otherwise you end up with over 1300 strings in one category. This is my proposal:

  • Civilisations: civs/ (699 strings)
  • Gui-ingame: gui/session (385 strings)
  • Gui-gamesetup (anything setup and loading): gui/gamesetup, gui/aiconfig, gui/loading, gui/text/quotes.txt (375 strings)
  • Gui-lobby: gui/lobby (123 strings)
  • Gui-manual: gui/manual (109 strings)
  • Gui-other (summary, pregame, settings, ...): gui/ minus the stuff above (403 strings)
  • Templates-units: simulation/templates/template_unit_*, simulation/templates/units/ (985 strings)
  • Templates-buildings: simulation/templates/template_structure_*, simulation/templates/structures/ (580 strings)
  • Templates-other: (286 strings)
  • Simulation-Technologies: simulation/data/technologies (719 strings)
  • Simulation-other: simulation/components, simulation/ai, simulation/data/game_speeds.json, simulation/data/player_defaults.json (145 strings)
  • Maps: maps/ (597 strings)

The only problem I see here is the unit templates. When we have some sort of in-game encyclopedia, we will probably also have to translate the "history" tag of the unit templates, which means one extra, quite long string for every unit.

Maybe another way of splitting up the templates could be per civilisation. Currently, there are 1851 template strings. With the histories added, this could easily grow to 2500 or even 3000 strings. Split up per civ, that would give you 12 civs plus common strings, so on average 190-230 strings per file, which is a nice number IMO.
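
The average quoted above can be checked with a couple of lines of Python. The function name is made up for illustration, and the even spread across files is of course a simplification, since some civs have more templates than others.

```python
def average_per_file(total_strings, civs=12):
    """Average strings per file with one file per civ plus one for common strings."""
    files = civs + 1
    return total_strings / files


for total in (1851, 2500, 3000):
    # Prints the total followed by the rounded average per file.
    print(total, round(average_per_file(total)))
```

With 13 files, 2500 and 3000 strings come out at roughly 192 and 231 strings per file, which matches the 190-230 estimate.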

I think the list in the first post looks good :)

How does modding work? We should also consider making it easy to add translations for mods in the future by dropping extra po files into the folder. So, apart from size, maybe splitting by civ would make sense for that?

For modding, it's quite simple. Every file with the format lg.something.po is loaded for a particular language. So mods can add their own po files (in any subdivision scheme they want), or overwrite the public po files with their own translations (but if they do so, they must replace the whole file; you can't replace string by string).
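
As an illustration of that rule, here is a small Python sketch of the file selection. It is not the engine's actual loader: `resolve_catalogs`, the mod names and the file names are hypothetical, and the idea that a mod loaded later wins over an earlier one is an assumption made for the example.

```python
from fnmatch import fnmatchcase


def resolve_catalogs(mods, lang):
    """Map each matching po file name to the mod that provides it.

    mods is a list of (mod_name, file_names) pairs in load order. Assumption:
    a later mod shipping a file with the same name replaces the earlier file
    as a whole; individual strings are never merged.
    """
    pattern = f"{lang}.*.po"  # the "lg.something.po" naming scheme
    chosen = {}
    for mod_name, file_names in mods:
        for file_name in file_names:
            if fnmatchcase(file_name, pattern):
                chosen[file_name] = mod_name
    return chosen
```

So a mod that ships its own `es.mymod.po` simply adds a file, while a mod that ships a file named exactly like one of the public ones takes over that whole file.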

  • 2 weeks later...

OK, these are the final numbers:

[gallaecio@afonso i18n]$ python2 updateTemplates.py
Generated “engine.pot” with 6 messages.
Generated “public.civilizations.pot” with 534 messages.
Generated “public.gui-ingame.pot” with 286 messages.
Generated “public.gui-gamesetup.pot” with 300 messages.
Generated “public.gui-lobby.pot” with 105 messages.
Generated “public.gui-manual.pot” with 109 messages.
Generated “public.gui-other.pot” with 393 messages.
Generated “public.templates-units.pot” with 582 messages.
Generated “public.templates-buildings.pot” with 336 messages.
Generated “public.templates-other.pot” with 222 messages.
Generated “public.simulation-technologies.pot” with 430 messages.
Generated “public.simulation-other.pot” with 137 messages.
Generated “public.maps.pot” with 382 messages.

Using this messages.json file: http://paste.kde.org/pvxgefs5v

Thumbs up?

Do we have so many duplicates in the unit strings? Something isn't right there. We shouldn't have duplication thanks to inheritance (except in some cases, where we'd need multiple inheritance).

The split looks good to me.

Could you also add the AuraName and AuraDescription tags to the templates extraction? (I didn't add those as I knew you were working on it).

Cool, fully translated to Spanish yet again.

Also, it looks like all the previously available comments, review progress and string tags (especially the handy "specific" tag set, used for the untranslatable original denominations of units and buildings) are lost. I've started to repopulate them while re-reviewing more than half of the string pools. :(

Maybe there's a feasible programmatic way of automatically applying the "specific" tag to strings by parsing the XML trees of the unit descriptors or something. It's a lot of work to do it manually. Transifex API? Python scripts?
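
As a rough idea of what the parsing side could look like, here is a Python sketch that collects the text of every SpecificName element in a template. The helper name and the sample document layout are assumptions for illustration; only the SpecificName and GenericName tag names come from the templates themselves.

```python
import xml.etree.ElementTree as ET


def specific_names(xml_text):
    """Return the non-empty text of every SpecificName element in the document."""
    root = ET.fromstring(xml_text)
    names = []
    # iter() walks the whole tree, so the exact nesting of the tag does not matter.
    for element in root.iter("SpecificName"):
        if element.text and element.text.strip():
            names.append(element.text.strip())
    return names
```

The resulting list could then be matched against the source strings to decide which ones should get the "specific" tag.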

Edited by Swyter

We could also add it as a translator comment to the messages, so that it shows up in Transifex. Would something like that work for you?

That sounds fair. I didn't remember that the translations are cherry-picked with a custom Python script and transformed into a PO.

Enhancing the PO building process to add a comment in this case seems ideal.

I guess that once we have a way of signaling programmatically which strings are original denominations for units we could filter them out using the editor and batch review them in a few clicks, saving a lot of work for most of the languages which don't need to retouch them, like Spanish.
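
To show what such a comment would look like in the generated file, here is a minimal Python sketch that formats one entry. In the PO format, lines starting with `#.` are comments extracted from the source and lines starting with `#:` are source references. The function name, the sample path and the comment wording are made up for the example; this is not the project's extraction script.

```python
def pot_entry(msgid, reference, comment=None):
    """Format a single POT entry, optionally with an extracted "#." comment."""
    lines = []
    if comment:
        lines.append(f"#. {comment}")  # extracted comment, shown to translators
    lines.append(f"#: {reference}")  # where the string comes from
    # Escape backslashes, double quotes and newlines as PO string syntax requires.
    escaped = msgid.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    lines.append(f'msgid "{escaped}"')
    lines.append('msgstr ""')
    return "\n".join(lines)


print(pot_entry(
    "Hoplites",
    "simulation/templates/units/example.xml:5",
    comment="Specific name in the original language; usually left untranslated.",
))
```

A fixed comment like this would also give reviewers something to filter on when batch-reviewing those strings.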

Plus, there are many novice translators who mistakenly try to adapt them without asking. So it's a bit of a pain in the @#$%.

--

By the way, there's a quick way of tagging strings using the Transifex API. You have to feed it with the string hash (or PO string id?), though.

http://docs.transifex.com/developer/api/resource_strings

Edited by Swyter

You need to modify some of the specific names. Some contain an English "of", or some are just plain English, like specific plant names.

I'm talking primarily about main unit and building names. You know, Greek, Latin... and all that. I don't consider plant or prop names to be in the "specific" category, especially if they are written in English. And don't worry, all of that has been properly translated for several months now.

--

Edit: Just saw how everything is laid out within the XML templates. Maybe it's better to classify things like bear names as GenericName instead of SpecificName. I don't know how the engine reacts to that, but it seems more logical.

I guess you could call them Ursa bulgaris or whatever scientific name it has.

Edited by Swyter