Gateway API Inference Extension (GAIE)
Dynamo supports two request routing topologies on Kubernetes:
- Dynamo-native Frontend routing. The Dynamo Frontend receives HTTP requests and the integrated Dynamo Router selects workers.
- Gateway API routing with GAIE. A Kubernetes Gateway receives HTTP requests, the Gateway API Inference Extension (GAIE) calls the Dynamo Endpoint Picker Plugin (EPP) for endpoint selection, and the selected worker’s Frontend sidecar forwards the request in direct mode.
This guide covers the Gateway API path for DynamoGraphDeployment resources managed by the Dynamo
operator. Use it when your Kubernetes platform wants Gateway API to own traffic entry, policy, and
observability while Dynamo owns the serving graph, discovery, event plane, and routing logic inside
the EPP.
Components
The operator-managed GAIE path combines user-created Gateway API objects with resources created
from the DynamoGraphDeployment.
Request Flow
Gateway API owns the external request path. Dynamo still owns the serving graph: the operator
creates the EPP Service, worker pods, Frontend sidecars, and the InferencePool that binds the route to
the EPP. The EPP receives Dynamo routing state from the runtime event plane and returns the selected
worker ID to the gateway. The gateway forwards the request to the selected worker’s Frontend sidecar,
which runs in direct routing mode.
In this operator-managed path, the EPP consumes routing state through the Dynamo event plane using NATS/JetStream. Direct vLLM ZMQ KV-event subscriptions are used by other integration shapes, but not by this quickstart path.
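The division of responsibility described above can be sketched as a toy model. This is not Dynamo code: the `Worker` class, the `active_requests` field, and the least-loaded selection rule are assumptions made for illustration only. The sketch shows only the ordering: the picker returns a worker ID, the gateway forwards to that worker's sidecar, and the sidecar in direct mode does not choose again.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    active_requests: int  # hypothetical stand-in for routing state from the event plane


def epp_pick(workers):
    """Toy endpoint picker: return the ID of the least-loaded worker."""
    return min(workers, key=lambda w: w.active_requests).worker_id


def sidecar_direct(worker, prompt):
    """Toy Frontend sidecar in direct mode: serve its own worker, never re-route."""
    return f"{worker.worker_id} handled: {prompt}"


def gateway_handle(workers, prompt):
    """Toy gateway: ask the EPP for a worker ID, then forward to that worker's sidecar."""
    selected_id = epp_pick(workers)
    by_id = {w.worker_id: w for w in workers}
    return sidecar_direct(by_id[selected_id], prompt)


workers = [Worker("worker-a", 3), Worker("worker-b", 1)]
print(gateway_handle(workers, "hello"))  # prints "worker-b handled: hello"
```

The real EPP scores workers with its configured plugins; the least-loaded rule here is only a placeholder for that decision.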
Shared Prerequisites
- A Kubernetes cluster with GPU nodes. For the baseline Gateway API environment, start with the upstream Gateway API getting started guide and the upstream GAIE guide.
- kubectl, Helm, and jq configured for the cluster.
- Gateway API and GAIE CRDs installed. The quickstart installs them explicitly from pinned upstream release manifests.
- A Gateway API implementation that supports GAIE InferencePool resources and endpointPickerRef calls.
- Dynamo platform installed with the operator. See the Kubernetes Quickstart and Installation Guide.
- Model credentials and storage needed by the selected model. Hugging Face token secrets are a Dynamo model-serving prerequisite, not a GAIE-specific resource; see the Hugging Face token secret setup.
Gateway Implementation
GAIE requires a Gateway API implementation that can call an Endpoint Picker Plugin before forwarding
the request to a backend. Dynamo is independent of the Gateway implementation: pick the gateway that
matches your platform, then point its HTTPRoute and generated InferencePool at the Dynamo EPP.
The quickstart shows two verified paths: agentgateway and Istio. Other Gateway API
implementations might work when they support the same GAIE InferencePool and endpointPickerRef
EPP path; check the upstream
GAIE gateway implementation list
and your controller’s documentation before choosing another implementation.
Istio uses Envoy in its data plane. agentgateway is a Rust-based AI gateway. The requirement for this guide is not Envoy specifically; it is support for Gateway API plus the GAIE EndpointPicker flow.
agentgateway
Use agentgateway for a small Gateway API footprint or when the cluster does not already
standardize on a service mesh. Install the agentgateway chart with
inferenceExtension.enabled=true; the GatewayClass is agentgateway.
The quickstart walks through the two verified implementation paths shown in this table:
Gateway API Concepts
HTTPRoute.spec.parentRefs attaches a route to a Gateway. If the HTTPRoute and Gateway live
in different namespaces, set parentRefs[].namespace to the Gateway namespace. rules[].backendRefs
points at the InferencePool; the pool points at the EPP service through endpointPickerRef.
For the upstream API model, see the Gateway API HTTP routing guide and the cross-namespace routing guide.
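As a minimal sketch of how these fields fit together, the following Python builds an HTTPRoute manifest as a dictionary and prints it as JSON, which kubectl accepts in place of YAML. All resource names and namespaces are hypothetical. The InferencePool API group shown is an assumption; match it to the InferencePool version installed in your cluster.

```python
import json


def build_http_route(name, route_namespace, gateway_name, gateway_namespace, pool_name):
    """Build an HTTPRoute that attaches to a Gateway and targets an InferencePool."""
    parent_ref = {"name": gateway_name}
    # parentRefs[].namespace is only needed when the Gateway lives elsewhere.
    if gateway_namespace != route_namespace:
        parent_ref["namespace"] = gateway_namespace
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": route_namespace},
        "spec": {
            "parentRefs": [parent_ref],
            "rules": [
                {
                    "backendRefs": [
                        {
                            # Assumed API group; verify against the installed CRD.
                            "group": "inference.networking.k8s.io",
                            "kind": "InferencePool",
                            "name": pool_name,
                        }
                    ]
                }
            ],
        },
    }


# Hypothetical names: a route in "models" attached to a Gateway in "gateway-system".
route = build_http_route(
    "llm-route", "models", "inference-gateway", "gateway-system", "example-pool"
)
print(json.dumps(route, indent=2))
```

For a cross-namespace attachment to take effect, the Gateway listener must also allow routes from the route's namespace; the cross-namespace routing guide covers that side of the contract.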
Configure DynamoGraphDeployments for GAIE
In GAIE mode, the EPP chooses workers. The worker Frontend sidecar must run in direct routing mode so it honors the EPP selection instead of choosing a worker again.
The EPP component is part of the DynamoGraphDeployment. The operator creates the EPP Service and
the matching InferencePool, so users apply the DGD and the route instead of hand-crafting the pool.
EPP Component Configuration
Start from the recipe EPP component and update the DynamoGraphDeployment for your cluster. Change
deployment-level settings such as replicas and resources to fit gateway traffic volume. Change
routing plugin settings only when you want different endpoint-selection behavior, then validate the
result with production-like traffic.
For upstream EPP configuration semantics, see the GAIE EPP YAML configuration guide and its
Scheduling Profiles section for plugin weights. For label-based endpoint selection, see the
upstream InferencePool configuration guide.
The label-filter plugin shown here is Dynamo-specific; the component role label comes from the
Dynamo ComponentType field.
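Label-based selection follows the usual Kubernetes equality rule: an endpoint matches only when every key/value pair in the selector is present in its labels. The sketch below illustrates that rule only. The label key `example.com/component-type` and the pod names are hypothetical, not the labels Dynamo actually applies.

```python
def matches(labels, selector):
    """Return True when every selector key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def filter_endpoints(pods, selector):
    """Return the names of pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(pod["labels"], selector)]


# Hypothetical pods and label key, for illustration only.
pods = [
    {"name": "worker-0", "labels": {"example.com/component-type": "worker"}},
    {"name": "epp-0", "labels": {"example.com/component-type": "epp"}},
]
print(filter_endpoints(pods, {"example.com/component-type": "worker"}))
```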
The operator reconciles the EPP Deployment, EPP Service, and generated InferencePool from the
DGD. Tune the DGD first; patch generated resources only for short-lived debugging.
The DYN_* environment values are runtime contracts between the EPP router logic and the workers.
Update them when the worker backend changes; do not use them to tune scoring.
See the complete EPP examples in the source tree:
- recipes/qwen3-0.6b/vllm/agg/gaie/deploy.yaml for the Qwen 0.6B aggregated recipe manifest
- examples/backends/vllm/deploy/gaie/disagg.yaml for the Qwen 0.6B disaggregated example manifest
Routing Behavior
GAIE does not mandate a single scoring strategy. Choose the routing behavior based on the routing state available to the EPP.
With operator-managed GAIE, NATS/JetStream backs routing-state delivery. The EPP can receive startup state and subsequent updates through the Dynamo event plane instead of rebuilding all state from new traffic after every EPP restart.
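The difference between restoring state at startup and rebuilding it from traffic can be illustrated with a toy model. This is a sketch under stated assumptions, not the EPP's data model: the `RoutingState` class, the event shape, and the cache-overlap scoring rule are all hypothetical. A restarted picker loads a snapshot, then applies later updates one at a time.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoutingState:
    # worker ID -> set of cached block IDs (hypothetical routing state)
    cached_blocks: dict = field(default_factory=dict)

    def load_snapshot(self, snapshot):
        """Replace all state with a startup snapshot."""
        self.cached_blocks = {w: set(blocks) for w, blocks in snapshot.items()}

    def apply_event(self, event):
        """Apply one incremental update received after the snapshot."""
        blocks = self.cached_blocks.setdefault(event["worker_id"], set())
        if event["type"] == "stored":
            blocks.add(event["block"])
        elif event["type"] == "removed":
            blocks.discard(event["block"])

    def best_worker(self, request_blocks) -> Optional[str]:
        """Pick the worker with the largest overlap; ties go to the lowest ID."""
        if not self.cached_blocks:
            return None
        wanted = set(request_blocks)
        return max(
            sorted(self.cached_blocks),
            key=lambda w: len(self.cached_blocks[w] & wanted),
        )


state = RoutingState()
state.load_snapshot({"worker-a": ["b1", "b2"], "worker-b": ["b3"]})
print(state.best_worker(["b1", "b2"]))  # prints "worker-a"
state.apply_event({"worker_id": "worker-a", "type": "removed", "block": "b1"})
```

Without the snapshot step, the picker in this model would start empty and return `None` until enough events arrived, which is the cold-start cost the event plane is meant to avoid.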
Compatibility and Defaults
The quickstart pins the Gateway API layer so manual setup is repeatable. Keep the Dynamo platform, EPP, and runtime images on the same Dynamo release line.
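One way to catch version drift before deploying is a small pre-flight check over the image references. The sketch below assumes that "release line" means the same major.minor version and that image tags begin with a version number. Both are assumptions, and the image names are hypothetical.

```python
import re


def release_line(image):
    """Extract "major.minor" from an image tag such as repo/name:1.2.3."""
    tag = image.rsplit(":", 1)[-1]
    match = re.match(r"v?(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"no version tag found in image reference: {image}")
    return f"{match.group(1)}.{match.group(2)}"


def same_release_line(images):
    """Return True when every image shares one major.minor release line."""
    return len({release_line(image) for image in images.values()}) == 1


# Hypothetical image references, for illustration only.
images = {
    "platform": "registry.example.com/dynamo/operator:1.2.0",
    "epp": "registry.example.com/dynamo/epp:1.2.1",
    "runtime": "registry.example.com/dynamo/vllm-runtime:1.2.0",
}
print(same_release_line(images))  # prints "True"
```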
Troubleshooting Signals
Next Step
Run the GAIE Quickstart to deploy a DynamoGraphDeployment, expose it through
Gateway API, and verify an end-to-end request through the Dynamo EPP.
Use GAIE Reference for resource contracts, routing knobs, and service mesh settings.