
Managing Production Large Language Models

Playbook for Designing, Deploying, and Operating LLM at Scale and Machine Learning FinOps Blueprints


By: Jordan O'Neal
Narrated by: Virtual Voice

This title uses Virtual Voice narration.

Virtual Voice is computer-generated narration for audiobooks.
Large language models fail in production in ways no load test anticipates. A retrieval-augmented pipeline answers confidently with fabricated citations when the vector index drifts. An agent loop burns a month of API budget in forty minutes because one tool call returned an unexpected schema. A prompt that cleared every red-team review gets hijacked by a malicious document in the first week of real traffic. Latency spikes vanish with no correlated metric because the KV cache was sized for a context window half as long as users actually send. These are the predictable failure modes of LLM systems, and teams that ship reliable LLM features design against them with patterns that hold across models, vendors, and inference frameworks.
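The runaway agent loop described above is typically contained with hard caps on iterations and spend. A minimal sketch of such a guard, with illustrative names and limits (the book's own patterns may differ):

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    """Hard caps for an agent loop: step count and API spend."""
    max_steps: int = 10
    max_cost_usd: float = 5.00
    steps: int = 0
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one loop iteration; abort the loop if any cap is exceeded."""
        self.steps += 1
        self.spent_usd += cost_usd
        if self.steps > self.max_steps:
            raise RuntimeError(f"agent loop exceeded {self.max_steps} steps")
        if self.spent_usd > self.max_cost_usd:
            raise RuntimeError(f"agent loop exceeded ${self.max_cost_usd:.2f} budget")
```

Calling `budget.charge(cost)` once per tool call or model call turns an unbounded loop into one that fails fast, long before it can burn a month of budget.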
Inside this book, readers will learn how to:
  • Operate LLMs in production with the reliability, cost discipline, and observability the rest of the stack already expects.
  • Select and size models using a structured framework that weighs capability, latency, cost, and compliance before the first token is generated.
  • Choose between fine-tuning and prompting, applying LoRA, PEFT, and instruction-tuning where they outperform prompt engineering and recognizing when they do not.
  • Serve LLMs at scale with continuous batching, paged attention, and KV cache management to maximize throughput and meet latency SLOs.
  • Engineer LLM costs through semantic caching, model tiering, and FinOps practices that keep inference spend predictable as usage grows.
  • Defend against prompt injection and detect hallucinations using layered input validation, output verification, and guardrails that hold under adversarial conditions.
  • Instrument LLM systems for observability, capturing reasoning traces, semantic drift signals, and metrics that distinguish a degraded model from a degraded pipeline.
  • Drive quality with eval-driven development, building replay harnesses, canary deployments, and evaluation gates so every model update ships with a known risk profile.
  • Build production RAG and agent systems with retrieval quality metrics, tool-call guardrails, and loop termination policies that keep costs bounded.
  • Navigate compliance obligations including the EU AI Act and sector-specific rules without stalling LLM feature delivery.
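The semantic caching mentioned above answers a new prompt from cache when its embedding is close enough to one already seen. A minimal in-memory sketch, assuming a caller-supplied `embed` function and an illustrative similarity threshold:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Serve a cached answer when a prompt's embedding is near a stored one."""

    def __init__(self, embed, threshold=0.92):
        self.embed = embed        # caller-supplied prompt -> vector function
        self.threshold = threshold
        self.entries = []         # list of (embedding, answer) pairs

    def get(self, prompt):
        """Return the best cached answer above the threshold, else None."""
        qv = self.embed(prompt)
        best, best_sim = None, 0.0
        for vec, answer in self.entries:
            sim = cosine(qv, vec)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

    def put(self, prompt, answer):
        self.entries.append((self.embed(prompt), answer))
```

A production version would add eviction, TTLs, and an approximate-nearest-neighbor index instead of a linear scan, but the cost lever is the same: paraphrased repeats of a prompt skip the model entirely.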
The LLM tooling landscape rotates fast: the serving framework favored today will be superseded before the next model generation ships. The underlying patterns do not rotate: how to reason about context budget and cache hit rate, how to structure an evaluation harness that survives a model swap, and how to build an observability layer that surfaces model-layer failures the application cannot see. Teams that internalize these patterns move faster when tools change because they replace syntax, not understanding.
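One pattern that survives a model swap is a replay harness decoupled from any vendor SDK: fixed cases run against any prompt-in, answer-out callable. A minimal sketch (names and case shape are illustrative, not the book's exact harness):

```python
def run_eval(model, cases):
    """Replay a fixed case set against any model callable.

    `model` is any prompt -> answer callable, so the same harness runs
    unchanged when the underlying model or vendor is swapped. Each case
    pairs a prompt with a checker predicate rather than an exact string,
    since wording varies across model versions.
    """
    failures = []
    for prompt, check in cases:
        answer = model(prompt)
        if not check(answer):
            failures.append((prompt, answer))
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures
```

Gating deploys on `pass_rate` against a pinned case set is what lets a team replace the serving framework or base model and know, the same day, what regressed.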
The book is organized in four parts: Foundations covers the LLM landscape, model selection, and adaptation patterns; Adaptation and Serving addresses fine-tuning, serving architectures, and cost engineering; Production Concerns covers safety, compliance, observability, and LLM CI/CD; and Frontier Patterns and Case Studies applies all prior material to RAG, agentic systems, and tool-using LLMs, closing with end-to-end reference architectures.
Written for ML architects, ML and LLM engineers, and technical leads, with platform and SRE engineers as a strong secondary audience. Every chapter opens with a production incident, teaches canonical patterns by name, and closes with a checklist the team can apply the same day. Readers finish with the vocabulary, playbook, and pattern library to ship reliable LLM features with confidence.
Computers & Technology, Programming
No reviews yet