# Frontend


The Dynamo Frontend is the API gateway for serving LLM inference requests. It provides OpenAI-compatible HTTP endpoints and KServe gRPC endpoints, handling request preprocessing, routing, and response formatting.

## Feature Matrix

| Feature | Status |
|---|---|
| OpenAI Chat Completions API | ✅ Supported |
| OpenAI Completions API | ✅ Supported |
| KServe gRPC v2 API | ✅ Supported |
| Streaming responses | ✅ Supported |
| Multi-model serving | ✅ Supported |
| Integrated routing | ✅ Supported |
| Tool calling | ✅ Supported |

## Quick Start

### Prerequisites

- Dynamo platform installed
- `etcd` and `nats-server -js` running
- At least one backend worker registered
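If etcd and NATS are not already running, they can be launched locally for development. This is a minimal sketch that assumes both binaries are on `PATH` and their default ports are free:

```bash
etcd &              # service discovery / metadata store
nats-server -js &   # NATS with JetStream enabled
```

For production deployments, run both as managed services rather than background processes.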

### HTTP Frontend

```bash
python -m dynamo.frontend --http-port 8000
```

This starts an OpenAI-compatible HTTP server with integrated preprocessing and routing. Backends are auto-discovered when they call `register_llm`.
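With a worker registered, requests follow the standard OpenAI chat-completions shape. A sketch, with placeholders: the model name, the temp-file path, and the `/v1/chat/completions` path follow the OpenAI-compatible convention and should be adjusted to your deployment.

```shell
# Build a chat-completions request body.
# "your-model-name" is a placeholder; use the model your backend registered.
cat > /tmp/chat_request.json <<'EOF'
{
  "model": "your-model-name",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}
EOF

# Validate the payload locally.
python -m json.tool /tmp/chat_request.json

# With the frontend running, send it (uncomment):
# curl -s http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @/tmp/chat_request.json
```

Set `"stream": true` to receive streamed responses instead of a single completion.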

### KServe gRPC Frontend

```bash
python -m dynamo.frontend --kserve-grpc-server
```

See the Frontend Guide for KServe-specific configuration and message formats.
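If `grpcurl` is installed, the KServe v2 liveness RPC offers a quick smoke test. This is a sketch: `DYNAMO_GRPC_PORT` is a stand-in for whatever port the gRPC server is configured to listen on, not a variable the frontend sets, and `inference.GRPCInferenceService/ServerLive` is the standard KServe v2 health RPC.

```bash
# Liveness check against the KServe v2 gRPC service (port is a placeholder).
grpcurl -plaintext "localhost:${DYNAMO_GRPC_PORT}" \
  inference.GRPCInferenceService/ServerLive
```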

## Kubernetes

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: frontend-example
spec:
  graphs:
    - name: frontend
      replicas: 1
  services:
    - name: Frontend
      image: nvcr.io/nvidia/dynamo/dynamo-vllm:latest
      command:
        - python
        - -m
        - dynamo.frontend
        - --http-port
        - "8000"
```
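Assuming the manifest above is saved as `frontend-example.yaml` (the filename is illustrative), it can be applied and inspected with kubectl. The plural resource name `dynamographdeployments` is inferred from the kind and may differ in your CRD:

```bash
kubectl apply -f frontend-example.yaml   # filename is illustrative
kubectl get dynamographdeployments      # resource name inferred from the kind
```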

## Configuration

| Parameter | Default | Description |
|---|---|---|
| `--http-port` | `8000` | HTTP server port |
| `--kserve-grpc-server` | `false` | Enable the KServe gRPC server |
| `--router-mode` | `round_robin` | Routing strategy: `round_robin`, `random`, or `kv` |

See the Frontend Guide for full configuration options.
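The flags compose. For example, combining the HTTP server with KV-aware routing, using only the options from the table above:

```bash
python -m dynamo.frontend --http-port 8000 --router-mode kv
```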

## Next Steps

| Document | Description |
|---|---|
| Frontend Guide | KServe gRPC configuration and integration |
| Router Documentation | KV-aware routing configuration |