
Times of AI

Signals, investigations, and field notes on the AI economy.


NVIDIA Pioneers End-to-End Test-Time Training for LLM Memory

NVIDIA Research introduces TTT-E2E, compressing context into weights via next-token prediction to enable learning at inference time.

Tech Insights Reporter · Jan 22, 2026 · Santa Clara

Santa Clara, January 22, 2026 - NVIDIA Research has unveiled End-to-End Test-Time Training (TTT-E2E), an approach that lets LLMs learn from context during inference by compressing that context directly into the model's weights through a next-token-prediction loss.

Unlike traditional retrieval-augmented pipelines that keep context in external memory, TTT-E2E treats the immediate context as training data. The model continuously adapts to the current task, aiming to preserve key details without expanding context windows.
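The core idea of treating context as training data can be sketched with a toy model. The snippet below is a hypothetical illustration, not NVIDIA's implementation: TTT-E2E adapts a full LLM, whereas here a tiny softmax bigram table stands in for the model, and a few SGD steps on the next-token loss over the context play the role of inference-time adaptation.

```python
# Toy sketch of test-time training: compress a context into weights
# by running next-token-prediction SGD steps at inference time.
# (Illustrative only; stands in for adapting a full LLM's weights.)
import math

VOCAB = ["a", "b", "c"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class TinyBigram:
    def __init__(self):
        # One logit per (previous token, next token) pair -- the "weights".
        self.w = {(p, n): 0.0 for p in VOCAB for n in VOCAB}

    def probs(self, prev):
        return softmax([self.w[(prev, n)] for n in VOCAB])

    def ttt_step(self, context, lr=0.5):
        # One SGD pass of next-token cross-entropy over the context:
        # the context itself is the training data, absorbed into self.w.
        for prev, nxt in zip(context, context[1:]):
            p = self.probs(prev)
            for i, n in enumerate(VOCAB):
                grad = p[i] - (1.0 if n == nxt else 0.0)
                self.w[(prev, n)] -= lr * grad

    def predict(self, prev):
        p = self.probs(prev)
        return VOCAB[max(range(len(VOCAB)), key=lambda i: p[i])]

model = TinyBigram()
context = list("ababababab")
for _ in range(20):          # inference-time adaptation loop
    model.ttt_step(context)
print(model.predict("a"))    # prints "b": the pattern now lives in the weights
```

After the adaptation loop, the context string can be discarded entirely; its regularity has been moved into the weight table, which is the memory-compression property the research targets.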

Researchers say the approach shows promise for long-context tasks, where memory constraints and retrieval errors often degrade quality. Early experiments suggest improved recall on long-form inputs while keeping inference costs manageable.

The work adds to a growing body of research on memory-efficient model and agent design, though continual adaptation at inference time will require careful evaluation for safety and stability before production use.

Credit: NVIDIA Research team. Primary source: NVIDIA Developer Blog.