NeurIPS 2025: Homogeneous Keys, Heterogeneous Values

This research presents a method for efficient long-context modeling in Large Language Models (LLMs) that addresses the quadratic complexity of attention through KV cache compression. Its core finding is a **local KV cache asymmetry**: adjacent attention keys are structurally homogeneous, while their associated value vectors follow distinct, heterogeneous distributions. Building on this finding, the authors propose **AsymKV**, a training-free compression framework that shifts information loss from the heterogeneous values to the homogeneous keys. AsymKV applies **homogeneity-based merging to keys**, using a mathematically derived optimal vector, and pairs it with a **lossless value representation scheme** that uses cardinality-aware normalization to preserve the information carried by the values. Empirical results on benchmarks such as LongBench, across models including LLaMA3.1-8B, show that **AsymKV consistently surpasses state-of-the-art long-context methods** in accuracy and information retention while keeping inference efficiency practical.
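
The asymmetry can be illustrated with a small NumPy sketch. It is not the authors' implementation: the episode description gives no formulas, so the plain mean of each adjacent key pair stands in for the paper's derived optimal vector, and "cardinality-aware normalization" is read here as weighting each merged key by the number of tokens it replaces in the softmax denominator. The function names `merge_adjacent_pairs` and `merged_attention` are hypothetical.

```python
import numpy as np


def full_attention(q, K, V):
    """Single-query softmax attention over the uncompressed cache."""
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())  # shift for numerical stability
    return (w[:, None] * V).sum(axis=0) / w.sum()


def merge_adjacent_pairs(K, V):
    """Merge each adjacent key pair into one key; keep both values as a sum.

    Assumption: the pair mean replaces the paper's optimal merged vector.
    """
    n, d = K.shape
    if n % 2 != 0:
        raise ValueError("this sketch expects an even number of cached tokens")
    K_merged = K.reshape(n // 2, 2, d).mean(axis=1)
    V_sum = V.reshape(n // 2, 2, V.shape[1]).sum(axis=1)
    counts = np.full(n // 2, 2.0)  # cardinality: tokens behind each merged key
    return K_merged, V_sum, counts


def merged_attention(q, K_merged, V_sum, counts):
    """Attention over the merged cache with a count-weighted denominator."""
    scores = K_merged @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    # Each merged key stands for counts[i] original tokens, so its weight
    # enters the normalizer counts[i] times (assumed reading of
    # "cardinality-aware normalization").
    return (w[:, None] * V_sum).sum(axis=0) / (counts * w).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = rng.normal(size=(8, 16))
    V = rng.normal(size=(8, 4))
    q = rng.normal(size=16)
    K_m, V_s, c = merge_adjacent_pairs(K, V)
    print("full  :", full_attention(q, K, V))
    print("merged:", merged_attention(q, K_m, V_s, c))
```

Under this reading, the approximation error comes only from the keys: when the two keys in a pair are identical, the merged cache reproduces full attention exactly, and the error grows as adjacent keys diverge. The values are never averaged, which is the sense in which the sketch shifts the loss onto the keys.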

Source:

https://arxiv.org/pdf/2506.05410
