vLLM Chat Processor
vLLM-native preprocessing and postprocessing for chat completions
vLLM-native preprocessing and postprocessing for chat completions
The vLLM chat processor enables vLLM-native preprocessing and postprocessing in the Dynamo frontend. It uses vLLM’s tokenizer, chat templates, tool call parser, and reasoning parser directly — bypassing the default Rust preprocessor for v1/chat/completions requests.
Use --dyn-chat-processor vllm when Dynamo’s built-in Rust preprocessor does not yet support a tool call parser or reasoning parser you need. The vLLM processor delegates to vLLM’s Python implementations, so any parser vLLM supports works immediately.
Common cases:
tool_calling libraryIf the parser you need is missing from the Rust preprocessor, consider opening an issue or PR to add native support — native parsers avoid the Python GIL overhead entirely.
These arguments are passed to the frontend (not the worker) when using --dyn-chat-processor vllm. The frontend forwards unknown arguments to vLLM’s own CLI parser (AsyncEngineArgs and FrontendArgs), so any vLLM frontend or engine flag is accepted.
The processor supports all vLLM tool call formats. Pass --tool-call-parser (and typically --enable-auto-tool-choice) on the frontend:
Any parser supported by vLLM can be used. See the vLLM documentation for the full list of available tool call parsers.
Response:
For models that produce chain-of-thought reasoning (e.g., Qwen3, DeepSeek-R1), pass --reasoning-parser:
The parser separates think tag content into the reasoning_content field and regular content into the content field.