Advanced LLM Optimization techniques

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Advanced LLM Optimization techniques

Escúchala gratis

Ver detalles del espectáculo

Welcome to another Data Architecture Elevator podcast! Today's discussion is hosted by Paolo Platter supported by our experts Antonino Ingargiola and Irene Donato.

In this episode, we explore effective strategies for optimizing large language models (LLMs) for inference tasks with multimodal data like audio, text, images, and video.

We discuss the shift from online APIs to hosted models, choosing smaller, task-specific models, and leveraging fine-tuning, distillation, quantization, and tensor fusion techniques. We also highlight the role of specialized inference servers such as Triton and Dynamo, and how Kubernetes helps manage horizontal scaling.

Don't forget to follow us on LinkedIn! Enjoy!

Todavía no hay opiniones