"Measuring no CoT math time horizon (single forward pass)" by ryan_greenblatt
No se pudo agregar al carrito
Solo puedes tener X títulos en el carrito para realizar el pago.
Add to Cart failed.
Por favor prueba de nuevo más tarde
Error al Agregar a Lista de Deseos.
Por favor prueba de nuevo más tarde
Error al eliminar de la lista de deseos.
Por favor prueba de nuevo más tarde
Error al añadir a tu biblioteca
Por favor intenta de nuevo
Error al seguir el podcast
Intenta nuevamente
Error al dejar de seguir el podcast
Intenta nuevamente
-
Narrado por:
-
De:
Important caveat: To get human completion times, I ask Opus 4.5 (with thinking) to estimate how long it would take the median AIME participant to complete a given problem.These times seem roughly reasonable to me, but getting some actual human baselines and using these to correct Opus 4.5's estimates would be better.
Here are the 50% reliability time horizon results:
I find that Opus 4.5 has a no-CoT 50% reliability time horizon of 3.5 minutes and that time horizon has been doubling every 9 months.
In an earlier post (Recent LLMs can leverage filler tokens or repeated problems to improve (no-CoT) math performance), I found that repeating the problem substantially boosts performance.In the above plot, if [...]
---
Outline:
(02:33) Some details about the time horizon fit and data
(04:17) Analysis
(06:59) Appendix: scores for Gemini 3 Pro
(11:41) Appendix: full result tables
(11:46) Time Horizon - Repetitions
(12:07) Time Horizon - Filler
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
December 26th, 2025
Source:
https://www.lesswrong.com/posts/Ty5Bmg7P6Tciy2uj2/measuring-no-cot-math-time-horizon-single-forward-pass
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Todavía no hay opiniones