Does distributed-llama currently support multimodal models, e.g. LLaVA? I tried one and found that the model runs, but I can't perform inference based on images.

Also, do you need edge-node device testing? We have many idle edge nodes and can provide relevant assistance and support.