Multimodal Inference in Dynamo:
Example workflows and reference implementations for deploying a multimodal model with Dynamo are available in the multimodal examples.
Dynamo supports two primary approaches for processing multimodal inputs, which differ in how the initial media encoding step is handled relative to the main LLM inference engine.
The EPD (Encode-Prefill-Decode) approach runs media encoding as an explicit, separate stage ahead of the inference engine. This lets the encoder run on hardware suited to it and can improve overall system efficiency for large multimodal models.
The PD (Prefill-Decode) approach is the more traditional, aggregated method: the inference engine handles the entire process, including media encoding.
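The difference between the two flows can be illustrated with a small, purely conceptual sketch. It does not use Dynamo's API: `Request`, `EncodeWorker`, `InferenceEngine`, `run_epd`, and `run_pd` are hypothetical names invented for this illustration, and the "encoder" is a stand-in that returns fake embeddings. The only point is to show where the encoding step runs in each approach.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical multimodal request: a text prompt plus raw media payloads."""
    prompt: str
    media: List[bytes]
    trace: List[str] = field(default_factory=list)  # records which component ran each stage


def encode_media(media: List[bytes]) -> List[List[float]]:
    # Stand-in for a real vision/audio encoder: one fake embedding per media item.
    return [[float(len(item))] for item in media]


class EncodeWorker:
    """Hypothetical dedicated encoder, separate from the inference engine (EPD only)."""

    def run(self, req: Request) -> List[List[float]]:
        req.trace.append("encode_worker:encode")
        return encode_media(req.media)


class InferenceEngine:
    """Hypothetical engine that always runs prefill and decode."""

    def generate(self, req: Request, embeddings: Optional[List[List[float]]] = None) -> str:
        if embeddings is None:
            # PD: no precomputed embeddings, so the engine encodes the media itself.
            req.trace.append("engine:encode")
            embeddings = encode_media(req.media)
        req.trace.append("engine:prefill")
        req.trace.append("engine:decode")
        return f"response to {req.prompt!r} using {len(embeddings)} media item(s)"


def run_epd(req: Request) -> str:
    # EPD: encoding is an explicit, separate stage; the engine receives embeddings.
    embeddings = EncodeWorker().run(req)
    return InferenceEngine().generate(req, embeddings)


def run_pd(req: Request) -> str:
    # PD: the engine handles the whole request, including media encoding.
    return InferenceEngine().generate(req)


if __name__ == "__main__":
    epd_req = Request("Describe the image.", [b"fake-image-bytes"])
    pd_req = Request("Describe the image.", [b"fake-image-bytes"])
    print(run_epd(epd_req), epd_req.trace)
    print(run_pd(pd_req), pd_req.trace)
```

In a real deployment the separate encode stage would typically be its own worker process, so it can be scaled and placed independently of the prefill and decode workers; the sketch collapses everything into one process for readability.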
Dynamo supports multimodal capabilities across leading LLM inference backends, including vLLM, TensorRT-LLM (TRT-LLM), and SGLang. The table below details the current support level for EPD/PD and various media types for each stack.