# Frontend


The Dynamo Frontend is the API gateway for serving LLM inference requests. It provides OpenAI-compatible HTTP endpoints and KServe gRPC endpoints, handling request preprocessing, routing, and response formatting.

## Feature Matrix

| Feature | Status |
|---|---|
| OpenAI Chat Completions API | ✅ Supported |
| OpenAI Completions API | ✅ Supported |
| KServe gRPC v2 API | ✅ Supported |
| Streaming responses | ✅ Supported |
| Multi-model serving | ✅ Supported |
| Integrated routing | ✅ Supported |
| Tool calling | ✅ Supported |

## Quick Start

### Prerequisites

- Dynamo platform installed
- `etcd` and `nats-server -js` running
- At least one backend worker registered
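If etcd and NATS are not already running, they can be launched locally for development. This is a minimal sketch that assumes both binaries are on `PATH` and their default ports are free:

```bash
etcd &              # service discovery / metadata store
nats-server -js &   # NATS with JetStream enabled
```

For production deployments, run both as managed services rather than background processes.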

### HTTP Frontend

```bash
python -m dynamo.frontend --http-port 8000
```

This starts an OpenAI-compatible HTTP server with integrated preprocessing and routing. Backends are auto-discovered when they call `register_llm`.
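With a worker registered, requests follow the standard OpenAI chat-completions shape. A sketch, with placeholders: the model name, the temp-file path, and the `/v1/chat/completions` path follow the OpenAI-compatible convention and should be adjusted to your deployment.

```shell
# Build a chat-completions request body.
# "your-model-name" is a placeholder; use the model your backend registered.
cat > /tmp/chat_request.json <<'EOF'
{
  "model": "your-model-name",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}
EOF

# Validate the payload locally.
python -m json.tool /tmp/chat_request.json

# With the frontend running, send it (uncomment):
# curl -s http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @/tmp/chat_request.json
```

Set `"stream": true` to receive streamed responses instead of a single completion.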

### KServe gRPC Frontend

```bash
python -m dynamo.frontend --kserve-grpc-server
```

See the Frontend Guide for KServe-specific configuration and message formats.
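If `grpcurl` is installed, the KServe v2 liveness RPC offers a quick smoke test. This is a sketch: `DYNAMO_GRPC_PORT` is a stand-in for whatever port the gRPC server is configured to listen on, not a variable the frontend sets, and `inference.GRPCInferenceService/ServerLive` is the standard KServe v2 health RPC.

```bash
# Liveness check against the KServe v2 gRPC service (port is a placeholder).
grpcurl -plaintext "localhost:${DYNAMO_GRPC_PORT}" \
  inference.GRPCInferenceService/ServerLive
```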

## Kubernetes

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: frontend-example
spec:
  graphs:
    - name: frontend
      replicas: 1
  services:
    - name: Frontend
      image: nvcr.io/nvidia/dynamo/dynamo-vllm:latest
      command:
        - python
        - -m
        - dynamo.frontend
        - --http-port
        - "8000"
```
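Assuming the manifest above is saved as `frontend-example.yaml` (the filename is illustrative), it can be applied and inspected with kubectl. The plural resource name `dynamographdeployments` is inferred from the kind and may differ in your CRD:

```bash
kubectl apply -f frontend-example.yaml   # filename is illustrative
kubectl get dynamographdeployments      # resource name inferred from the kind
```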

## Configuration

| Parameter | Default | Description |
|---|---|---|
| `--http-port` | `8000` | HTTP server port |
| `--kserve-grpc-server` | `false` | Enable the KServe gRPC server |
| `--router-mode` | `round_robin` | Routing strategy: `round_robin`, `random`, or `kv` |

See the Frontend Guide for full configuration options.
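The flags compose. For example, combining the HTTP server with KV-aware routing, using only the options from the table above:

```bash
python -m dynamo.frontend --http-port 8000 --router-mode kv
```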

## Next Steps

| Document | Description |
|---|---|
| Frontend Guide | KServe gRPC configuration and integration |
| Router Documentation | KV-aware routing configuration |