Logits Processing
For general TensorRT-LLM features and configuration, see the Reference Guide.
Logits processors let you modify the next-token logits at every decoding step (e.g., to apply custom constraints or sampling transforms). Dynamo provides a backend-agnostic interface and an adapter for TensorRT-LLM so you can plug in custom processors.
How it works
- Interface: Implement
dynamo.logits_processing.BaseLogitsProcessorwhich defines__call__(input_ids, logits)and modifieslogitsin-place. - TRT-LLM adapter: Use
dynamo.trtllm.logits_processing.adapter.create_trtllm_adapters(...)to convert Dynamo processors into TRT-LLM-compatible processors and assign them toSamplingParams.logits_processor. - Examples: See example processors in
lib/bindings/python/src/dynamo/logits_processing/examples/(temperature, hello_world).
Quick test: HelloWorld processor
You can enable a test-only processor that forces the model to respond with “Hello world!”. This is useful to verify the wiring without modifying your model or engine code.
- When enabled, Dynamo initializes the tokenizer so the HelloWorld processor can map text to token IDs.
- Expected chat response contains “Hello world”.
Bring your own processor
Implement a processor by conforming to BaseLogitsProcessor and modify logits in-place. For example, temperature scaling:
Wire it into TRT-LLM by adapting and attaching to SamplingParams:
Current limitations
- Per-request processing only (batch size must be 1); beam width > 1 is not supported.
- Processors must modify logits in-place and not return a new tensor.
- If your processor needs tokenization, ensure the tokenizer is initialized (do not skip tokenizer init).