Support multiple languages in TTML #107

NhanNguyen700 · 2024-05-31T03:39:02Z

Hi,

TTML allows sibling div elements to carry different xml:lang values, as in this file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/2006/10/ttaf1" xmlns:tts="http://www.w3.org/2006/04/ttaf1#styling">
  <head>
    <metadata xmlns:ttm="http://www.w3.org/2006/10/ttaf1#metadata">
      <ttm:copyright>TVB (c)</ttm:copyright>
    </metadata>
    <styling>
      <style id="1" tts:textAlign="center" tts:color="transparent" tts:fontFamily="Verdana" tts:wrapOption="wrap" />
    </styling>
  </head>
  <body>
    <div xml:id="captions" xml:lang="eng">
      <p begin="00:01:58:040" end="00:01:59:920">eng text</p>
    </div>
    <div xml:id="captions" xml:lang="zho">
      <p begin="00:02:09:760" end="00:02:11:280">zho text</p>
    </div>
  </body>
</tt>
```

After parsing it and writing it back out as TTML, I expect to still have two div tags with different languages, but I get this:

```xml
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:tts="http://www.w3.org/ns/ttml#styling">
    <head>
        <metadata>
            <ttm:copyright>TVB (c)</ttm:copyright>
        </metadata>
        <styling>
            <style xml:id="1" tts:color="transparent" tts:fontFamily="Verdana" tts:textAlign="center" tts:wrapOption="wrap"></style>
        </styling>
        <layout></layout>
    </head>
    <body>
        <div>
            <p begin="00:01:58.000" end="00:01:59.000">
                <span>eng text</span>
            </p>
            <p begin="00:02:09.000" end="00:02:11.000">
                <span>zho text</span>
            </p>
        </div>
    </body>
</tt>
```

The per-div languages are gone, and both texts are merged into a single div.

I'm looking for a way to fix this, but with the library's current structure it is hard to do without breaking something.


asticode · 2024-06-14T14:19:22Z

I have a feeling that adding a Language attribute to Item would do the trick, but I may be missing something 🤔 When reading the TTML, each item's language attribute would be set accordingly. When writing, we could either repeat the xml:lang attribute on each item (simpler in the code), or add separate divs when we detect items with a language 🤔
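A minimal sketch of that idea in Go, assuming a simplified, hypothetical `Item` type with a `Language` field (not the library's actual type) and hypothetical helpers `groupIntoDivs` and `renderBody`. It shows the "separate divs" option on write; the per-item alternative would skip the grouping and emit `xml:lang` on every `<p>`.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Item is a hypothetical, simplified subtitle item with the per-item
// Language field proposed above; it is not the library's real Item type.
type Item struct {
	Begin, End string // raw TTML time expressions, kept as strings for brevity
	Text       string
	Language   string // xml:lang of the enclosing div, recorded while reading
}

// Div is one <div> to emit on write: a language and the items it holds.
type Div struct {
	Language string
	Items    []Item
}

// groupIntoDivs sketches the "separate divs" option: consecutive items with
// the same language share a div, so the original item order is preserved
// even when languages are interleaved.
func groupIntoDivs(items []Item) []Div {
	var divs []Div
	for _, it := range items {
		if n := len(divs); n > 0 && divs[n-1].Language == it.Language {
			divs[n-1].Items = append(divs[n-1].Items, it)
			continue
		}
		divs = append(divs, Div{Language: it.Language, Items: []Item{it}})
	}
	return divs
}

// esc XML-escapes s so it is safe in both text and attribute values.
func esc(s string) string {
	var b strings.Builder
	xml.EscapeText(&b, []byte(s)) // writing to a strings.Builder cannot fail
	return b.String()
}

// renderBody writes the <body> element with one <div xml:lang> per group.
func renderBody(divs []Div) string {
	var b strings.Builder
	b.WriteString("<body>\n")
	for _, d := range divs {
		fmt.Fprintf(&b, "  <div xml:lang=\"%s\">\n", esc(d.Language))
		for _, it := range d.Items {
			fmt.Fprintf(&b, "    <p begin=\"%s\" end=\"%s\">%s</p>\n", esc(it.Begin), esc(it.End), esc(it.Text))
		}
		b.WriteString("  </div>\n")
	}
	b.WriteString("</body>\n")
	return b.String()
}

func main() {
	items := []Item{
		{"00:01:58:040", "00:01:59:920", "eng text", "eng"},
		{"00:02:09:760", "00:02:11:280", "zho text", "zho"},
	}
	fmt.Print(renderBody(groupIntoDivs(items)))
}
```

Grouping only consecutive items keeps cue order intact; grouping all items per language instead would give exactly one div per language, like the original input, but could reorder cues if languages are interleaved.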

NhanNguyen700 · 2024-06-17T08:04:49Z

All parsing returns a single Subtitles object, and there is only one master language for the whole object, so we cannot tell which Items belong to which language. If we want to fix it, we will break the Subtitles object structure and affect users of this library: they would need to change their code to adapt to the new structure. We could fix the issue by storing a language on each Subtitles Item, just like you said, but that sounds inefficient. Still, it seems to be the only way with the current structure. That raises another question: when converting a multi-language TTML file to WebVTT (or other formats), should we output multiple WebVTT files? I think yes, and we should append the language to each WebVTT file name to distinguish them.

asticode · 2024-06-17T12:27:23Z

> If we want to fix it, we will break the Subtitles object structures and affect user of this library, they need to change their code to adapt with new structure

Which changes are you thinking about? 🤔

NhanNguyen700 · 2024-06-18T02:50:39Z

Nothing special: instead of storing Items directly in Subtitles, we could have some kind of wrapper that stores metadata (including the language) together with its Items, and Subtitles would hold those wrappers. Another way is to return a list of Subtitles objects, one per language, when parsing the input, instead of returning a single object as we do now. Those are my thoughts.
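Both proposals can be sketched with hypothetical types (`Track`, `MultiSubtitles`, and a simplified `Subtitles`; none of these are the library's real API). The `AsList` helper converts the wrapper form into the list-of-Subtitles form, which suggests the two options carry the same information and differ mainly in how callers consume it.

```go
package main

import "fmt"

// Item is a hypothetical, minimal subtitle item for illustration only.
type Item struct {
	Text string
}

// Metadata stands in for per-language metadata such as the language tag.
type Metadata struct {
	Language string
}

// Track is the proposed wrapper: metadata plus the Items it applies to.
type Track struct {
	Metadata Metadata
	Items    []*Item
}

// MultiSubtitles is option 1: one object holding several wrapped tracks.
type MultiSubtitles struct {
	Tracks []*Track
}

// Subtitles mirrors today's shape: one language, one flat item list.
type Subtitles struct {
	Metadata Metadata
	Items    []*Item
}

// AsList converts option 1 into option 2 (one Subtitles per language).
// Item pointers are shared with the tracks, not copied.
func (m *MultiSubtitles) AsList() []*Subtitles {
	out := make([]*Subtitles, 0, len(m.Tracks))
	for _, t := range m.Tracks {
		out = append(out, &Subtitles{Metadata: t.Metadata, Items: t.Items})
	}
	return out
}

func main() {
	m := &MultiSubtitles{Tracks: []*Track{
		{Metadata: Metadata{Language: "eng"}, Items: []*Item{{Text: "eng text"}}},
		{Metadata: Metadata{Language: "zho"}, Items: []*Item{{Text: "zho text"}}},
	}}
	for _, s := range m.AsList() {
		fmt.Println(s.Metadata.Language, len(s.Items))
	}
}
```

With the list form, each element keeps the current single-language shape, so code that handles one Subtitles object at a time could process each language in turn.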
