[Improve][Transform-V2][Embedding]Enhance multimodal embeddings #9996

loupipalien · 2025-10-29T18:18:58Z

Purpose of this pull request

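Support multi-field multimodal vectorization. doubao-embedding-vision accepts a mix of fields with different modalities (text, image, video) as a single input, so vectorization_fields can now combine several fields into one vector: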

vectorization_fields {
      multi_field_text_vector = [product_name, description]

      multi_field_image_vector = [
        {
          field = product_image_url
          modality = jpeg
          format = url
        },
        {
          field = thumbnail_image
          modality = png
          format = url
        }
      ]

      multi_field_video_vector = [
        {
          field = product_video_url
          modality = mp4
          format = url
        },
        {
          field = promotional_video
          modality = mov
          format = url
        }
      ]

      multi_field_mix_vector = [
        product_name,
        {
          field = product_image_url
          modality = jpeg
          format = url
        },
        {
          field = product_video_url
          modality = mp4
          format = url
        }
      ]
}
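
To make the mixed case concrete, here is a minimal sketch of how one row and a multi_field_mix_vector spec might be resolved into an ordered list of typed input parts. It is not the transform's actual implementation: the function name build_mixed_input, the modality groupings, and the shape of the returned parts are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Union

# Assumed grouping of the modality values that appear in the config above.
IMAGE_MODALITIES = {"jpeg", "png"}
VIDEO_MODALITIES = {"mp4", "mov"}

# A spec entry is either a bare field name (text) or a {field, modality, format} map.
FieldSpec = Union[str, Dict[str, str]]


def build_mixed_input(row: Dict[str, Any], specs: List[FieldSpec]) -> List[Dict[str, Any]]:
    """Resolve a mixed field spec against one row, preserving spec order.

    Hypothetical helper: the output shape is illustrative and is not the
    request format of any particular embedding API.
    """
    parts: List[Dict[str, Any]] = []
    for spec in specs:
        if isinstance(spec, str):
            # A bare field name is treated as a text field.
            parts.append({"type": "text", "text": str(row[spec])})
            continue

        modality = spec["modality"].lower()
        if modality in IMAGE_MODALITIES:
            kind = "image"
        elif modality in VIDEO_MODALITIES:
            kind = "video"
        else:
            raise ValueError(f"unsupported modality: {modality}")

        parts.append(
            {
                "type": kind,
                "modality": modality,
                "format": spec["format"],
                "value": row[spec["field"]],
            }
        )
    return parts


# Example row and spec mirroring multi_field_mix_vector above (values are made up).
example_row = {
    "product_name": "Desk lamp",
    "product_image_url": "https://example.com/lamp.jpeg",
    "product_video_url": "https://example.com/lamp.mp4",
}
example_spec: List[FieldSpec] = [
    "product_name",
    {"field": "product_image_url", "modality": "jpeg", "format": "url"},
    {"field": "product_video_url", "modality": "mp4", "format": "url"},
]
example_parts = build_mixed_input(example_row, example_spec)
```

Keeping the parts in spec order means the text, image, and video fields reach the model in the same sequence the user wrote them in the config.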

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added new test cases.

Check list

If any new Jar binary package is added in your PR, please add a License Notice according to the New License Guide.
If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
If you are contributing the connector code, please check that the following files are updated:
1. Update plugin-mapping.properties and add new connector information in it
2. Update the pom file of seatunnel-dist
3. Add ci label in label-scope-conf
4. Add e2e testcase in seatunnel-e2e
5. Update connector plugin_config

loupipalien · 2025-11-05T13:40:05Z

@Hisoka-X @corgy-w @xiaochen-zhou please help review this when you have time, thanks.

github-actions bot added Transform-v2 e2e labels Oct 29, 2025

loupipalien changed the title ~~Enhance multimodal embeddings~~ [Improve][Transform-V2][Embedding]Enhance multimodal embeddings Oct 30, 2025

loupipalien added 6 commits November 4, 2025 22:45

enhance multimodal embeddings

6df1c07

fix: fix VectorFieldSpecTest test case

982c070

chore: add spotless toggleOffOn

6495a12

chore: code format

d9fcc53

chore:remove deduplicated code

f2ee863

chore: modify comment

bdbd012

loupipalien force-pushed the enhance-multimodal-embeddings branch from 2f8b47b to bdbd012 November 4, 2025 14:46
