
Production Data Engineering for Machine Learning

Designing, Building, and Operating the Data Foundations of Real-World ML Systems Framework and Blueprints

By: Jordan O'Neal
Narrated by: Virtual Voice

This title uses virtual voice narration

Virtual Voice is computer-generated narration for audiobooks.
Most production ML failures are data engineering failures: a feature pipeline that drifted overnight, a schema that changed without notice and corrupted a backfill, an ingestion job that silently dropped records during peak inference. For the architects and engineers who own those systems, the cost is measured in degraded models, missed SLOs, and postmortems that circle back to the same root cause: the data infrastructure was not designed to hold up under production conditions.
Inside this book, readers will learn how to:
  • Design durable schemas: Build feature schemas with explicit nullability, evolvable enumerations, and semantic versioning so retraining cycles and upstream changes do not require quarter-long migrations.
  • Choose the right ingestion strategy: Apply a structured decision framework for batch, micro-batch, and streaming ingestion based on freshness requirements, cost tolerance, and source-system characteristics, with every connector designed to be replay-safe by default.
  • Define and enforce data contracts: Establish enforceable producer-consumer contracts that prevent silent breakage between teams and wire those contracts into CI as build gates rather than relying on coordination by Slack message.
  • Validate features before they reach a model: Build a layered validation stack covering per-record checks at ingestion, statistical distribution tests at the batch level, and contract tests across pipeline stages.
  • Detect drift before it reaches customers: Instrument pipelines with per-feature distribution monitors and freshness alerts calibrated against historical baselines, turning data quality into an observable, actionable signal.
  • Build lineage and observability: Trace every record from source through training, feature serving, and inference, so that when a model misbehaves, the root cause can be found in minutes, not days.
  • Architect and operate a feature store: Eliminate training-serving skew through shared feature definitions and point-in-time correctness, and manage the full feature lifecycle from proposal through deprecation.
  • Protect data through privacy and compliance: Apply sensitivity classification, access controls, and audit logging as first-class engineering concerns embedded in pipeline design.
  • Engineer data for LLMs and RAG systems: Manage the retrieval corpora, embedding pipelines, chunking strategies, and quality gates that production large language model and retrieval-augmented generation systems require.
This book is written for ML architects who design systems others depend on, ML engineers and data engineers who build and operate those systems, and technical team leads who set the standards their organizations run on. It is intentionally tool-neutral: the patterns taught here apply across platforms and survive the next cycle of tooling change. Every chapter pairs the architect's design perspective with the engineer's implementation view, opens with a scenario drawn from common production incidents, and closes with a checklist the team can apply immediately. Readers finish with a coherent playbook covering every layer of the ML data stack, from foundational quality principles through feature serving, observability, security, scaling, and the data engineering demands of production LLM and RAG systems. Put it to work to raise the reliability floor of every ML system your organization depends on.