
REFRAG: Turning RAG Context from a Token String into a Compressible Representation
·2252 words·11 mins
REFRAG compresses retrieved passages into chunk-level decoder positions, then uses RL to selectively expand high-entropy spans—mechanisms, training pipeline, and how to read TTFT and RAG benchmarks without over-generalizing paper numbers.