Support multiple languages in TTML #107

Open
NhanNguyen700 opened this issue May 31, 2024 · 4 comments

@NhanNguyen700
Contributor

Hi,

This is valid in TTML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/2006/10/ttaf1" xmlns:tts="http://www.w3.org/2006/04/ttaf1#styling">
  <head>
    <metadata xmlns:ttm="http://www.w3.org/2006/10/ttaf1#metadata">
      <ttm:copyright>TVB (c)</ttm:copyright>
    </metadata>
    <styling>
      <style id="1" tts:textAlign="center" tts:color="transparent" tts:fontFamily="Verdana" tts:wrapOption="wrap" />
    </styling>
  </head>
  <body>
    <div xml:id="captions" xml:lang="eng">
      <p begin="00:01:58:040" end="00:01:59:920">eng text</p>
    </div>
    <div xml:id="captions" xml:lang="zho">
      <p begin="00:02:09:760" end="00:02:11:280">zho text</p>
    </div>
  </body>
</tt>
```

After parsing it and writing it back out as TTML, I expected to still have two div tags with different languages, but I get this:

```xml
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:tts="http://www.w3.org/ns/ttml#styling">
    <head>
        <metadata>
            <ttm:copyright>TVB (c)</ttm:copyright>
        </metadata>
        <styling>
            <style xml:id="1" tts:color="transparent" tts:fontFamily="Verdana" tts:textAlign="center" tts:wrapOption="wrap"></style>
        </styling>
        <layout></layout>
    </head>
    <body>
        <div>
            <p begin="00:01:58.000" end="00:01:59.000">
                <span>eng text</span>
            </p>
            <p begin="00:02:09.000" end="00:02:11.000">
                <span>zho text</span>
            </p>
        </div>
    </body>
</tt>
```

The languages are gone, and the texts are merged into a single div tag.

I am looking for a way to fix this, but with the current structure of the library it is hard to do without breaking anything.

@asticode
Owner

I have the feeling that adding a Language attribute to Item would do the trick, but I may be missing something 🤔 When reading the TTML, the language attribute of each item would be set accordingly. When writing, we could either repeat the xml:lang attribute on each item (which would be simpler in the code) or emit separate divs if we detect an item with a language 🤔
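To make the second writing option concrete, here is a minimal sketch of grouping items into divs by language. It assumes a per-item `Language` field, which is the proposed addition and does not exist yet; `Item`, `Div`, and `groupIntoDivs` are hypothetical, trimmed-down names used only for illustration, not the library's actual types.

```go
package main

import "fmt"

// Item is a hypothetical, trimmed-down subtitle cue. The Language field is
// the proposed addition discussed above, not an existing field.
type Item struct {
	Language string // e.g. "eng" or "zho"; empty means "inherit the document language"
	Text     string
}

// Div is a hypothetical stand-in for one <div> element in the TTML output.
type Div struct {
	Language string
	Items    []Item
}

// groupIntoDivs starts a new div every time the language changes, so the
// original order of the items is preserved.
func groupIntoDivs(items []Item) []Div {
	var divs []Div
	for _, it := range items {
		if n := len(divs); n > 0 && divs[n-1].Language == it.Language {
			divs[n-1].Items = append(divs[n-1].Items, it)
			continue
		}
		divs = append(divs, Div{Language: it.Language, Items: []Item{it}})
	}
	return divs
}

func main() {
	items := []Item{
		{Language: "eng", Text: "eng text"},
		{Language: "zho", Text: "zho text"},
	}
	for _, d := range groupIntoDivs(items) {
		fmt.Printf("div xml:lang=%q with %d item(s)\n", d.Language, len(d.Items))
	}
}
```

Grouping consecutive runs rather than collecting all items of a language into one div keeps the cue order stable, at the cost of emitting several divs for the same language when the input interleaves them.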

@NhanNguyen700
Contributor Author

Every parser returns its result as a single Subtitles object, and that object has only one master language, so we cannot know which Items belong to which language. That is why the languages are lost. If we want to fix it, we will break the structure of the Subtitles object and affect users of this library, who will need to change their code to adapt to the new structure. One way to fix the issue is to store the language on each Item, just like you said, but it sounds inefficient. Still, it seems to be the only way with the current structure.

That raises another question: when converting a multi-language TTML file to WebVTT (or other formats), should we output multiple WebVTT files? I think yes, and we should append the language to the WebVTT file name to distinguish them.
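As a rough illustration of the file naming idea, the sketch below derives one WebVTT file name per language from the input path. Everything here is an assumption for discussion: `Item.Language`, `languagesOf`, `webVTTFileName`, and the `name.lang.vtt` pattern are hypothetical and not part of the library.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Item is a hypothetical cue carrying its own language.
type Item struct {
	Language string
	Text     string
}

// languagesOf returns the distinct languages in first-seen order.
func languagesOf(items []Item) []string {
	seen := make(map[string]bool)
	var langs []string
	for _, it := range items {
		if !seen[it.Language] {
			seen[it.Language] = true
			langs = append(langs, it.Language)
		}
	}
	return langs
}

// webVTTFileName swaps the input extension for ".<lang>.vtt".
// An empty language falls back to a plain ".vtt" name.
func webVTTFileName(input, lang string) string {
	base := strings.TrimSuffix(input, filepath.Ext(input))
	if lang == "" {
		return base + ".vtt"
	}
	return base + "." + lang + ".vtt"
}

func main() {
	items := []Item{
		{Language: "eng", Text: "eng text"},
		{Language: "zho", Text: "zho text"},
	}
	for _, lang := range languagesOf(items) {
		fmt.Println(webVTTFileName("movie.ttml", lang))
	}
}
```

Falling back to a plain `.vtt` name when no language is set would keep single-language conversions producing the same file name as before.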

@asticode
Owner

> If we want to fix it, we will break the structure of the Subtitles object and affect users of this library, who will need to change their code to adapt to the new structure

Which changes are you thinking about? 🤔

@NhanNguyen700
Contributor Author

Nothing special: just do not store Items directly in Subtitles. We could have some kind of wrapper that stores metadata (including the language) together with its Items, and Subtitles would then contain those wrappers. Another way is to return a list of Subtitles objects, one per language, when parsing the input, instead of returning a single object as we do currently. Those are my thoughts.
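A minimal sketch of both shapes, to make the trade-off easier to discuss. All names here (`Track`, `Subtitles.Tracks`, `AllItems`, `SplitByLanguage`) are hypothetical and simplified; they are not the library's current API.

```go
package main

import "fmt"

// Item is a hypothetical, trimmed-down cue.
type Item struct {
	Text string
}

// Track is the hypothetical wrapper: per-language metadata plus its items.
type Track struct {
	Language string
	Items    []Item
}

// Subtitles holds wrappers instead of holding the items directly.
type Subtitles struct {
	Tracks []Track
}

// AllItems flattens every track in order, which could serve callers that
// only care about the merged list of items.
func (s Subtitles) AllItems() []Item {
	var all []Item
	for _, t := range s.Tracks {
		all = append(all, t.Items...)
	}
	return all
}

// SplitByLanguage illustrates the second option: one Subtitles per language.
func (s Subtitles) SplitByLanguage() []Subtitles {
	out := make([]Subtitles, 0, len(s.Tracks))
	for _, t := range s.Tracks {
		out = append(out, Subtitles{Tracks: []Track{t}})
	}
	return out
}

func main() {
	s := Subtitles{Tracks: []Track{
		{Language: "eng", Items: []Item{{Text: "eng text"}}},
		{Language: "zho", Items: []Item{{Text: "zho text"}}},
	}}
	fmt.Println(len(s.AllItems()), "items in", len(s.SplitByLanguage()), "languages")
}
```

An accessor like `AllItems` might soften the break for existing users, since code that only iterates over items would need a one-line change rather than a restructure.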
