Add Multimodal Input Support (Image, Audio, Video) to App-UI in MS-Swift Library #2469

SushantGautam · 2024-11-18T09:42:07Z

The MS-Swift library currently supports models capable of processing multimodal input (image, audio, video) via the web-UI. However, this functionality is not available in the app-UI. We request multimodal input support in the app-UI so that models with multimodal capabilities can be used there as well, bringing it in line with the web-UI's features.

Adding this feature will enhance the MS-Swift library's usability in mobile or desktop application development, ensuring consistent multimodal support across platforms. This could involve creating APIs for uploading and processing different data modalities and providing developers with examples or templates for implementation. Such an update would broaden the library’s applicability in real-world scenarios, such as multimedia content analysis, accessibility tools, and creative applications.
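To make the "uploading and processing different data modalities" part of the request more concrete, here is a minimal sketch of how uploaded files might be sorted by modality before being handed to a model. This is not MS-Swift's API: the function name `group_by_modality` and the `"other"` bucket are hypothetical, and the sketch only relies on Python's standard `mimetypes` module to guess the type from the file extension.

```python
import mimetypes
from collections import defaultdict

# Modalities named in this request; anything else falls into "other".
MODALITIES = ("image", "audio", "video")


def group_by_modality(paths):
    """Group file paths by modality, guessed from the file extension.

    Hypothetical helper for illustration only; not part of MS-Swift.
    """
    groups = defaultdict(list)
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        # The part before "/" in a MIME type is the top-level media type.
        top_level = mime.split("/")[0] if mime else None
        key = top_level if top_level in MODALITIES else "other"
        groups[key].append(path)
    return dict(groups)


if __name__ == "__main__":
    uploads = ["photo.png", "speech.wav", "clip.mp4", "notes.txt"]
    for modality, files in group_by_modality(uploads).items():
        print(modality, files)
```

An app-UI upload handler could use a grouping like this to decide which files go into the image, audio, or video inputs of a multimodal model. Guessing from the extension is a simplification; a real implementation would probably need to inspect file contents as well.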

SushantGautam mentioned this issue Nov 18, 2024

Qwen2-VL web-ui #2118

Open

