Production Data Engineering for Machine Learning

Designing, Building, and Operating the Data Foundations of Real-World ML Systems Framework and Blueprints

Muestra de Voz Virtual

No se pudo agregar al carrito

Solo puedes tener X títulos en el carrito para realizar el pago.

Add to Cart failed.

Por favor prueba de nuevo más tarde

Error al Agregar a Lista de Deseos.

Por favor prueba de nuevo más tarde

Error al eliminar de la lista de deseos.

Por favor prueba de nuevo más tarde

Error al añadir a tu biblioteca

Por favor intenta de nuevo

Error al seguir el podcast

Intenta nuevamente

Error al dejar de seguir el podcast

Intenta nuevamente

Obtén 30 días de Standard gratis

$8.99 al mes después de que termine la prueba. Cancela en cualquier momento

Pruébalo por $0.00

Más opciones de compra

Production Data Engineering for Machine Learning

De: Jordan O'Neal

Narrado por: Virtual Voice

Pruébalo por $0.00

$8.99 al mes después de 30 días. Cancela en cualquier momento.

Compra ahora por $7.99

Este título utiliza narración de voz virtual

Voz Virtual es una narración generada por computadora para audiolibros..

Most production ML failures are data engineering failures: a feature pipeline that drifted overnight, a schema that changed without notice and corrupted a backfill, an ingestion job that silently dropped records during peak inference. For the architects and engineers who own those systems, the cost is measured in degraded models, missed SLOs, and postmortems that circle back to the same root cause: the data infrastructure was not designed to hold production conditions.
Inside this book, readers will learn how to:

Design durable schemas: Build feature schemas with explicit nullability, evolvable enumerations, and semantic versioning so retraining cycles and upstream changes do not require quarter-long migrations.
Choose the right ingestion strategy: Apply a structured decision framework for batch, micro-batch, and streaming ingestion based on freshness requirements, cost tolerance, and source-system characteristics, with every connector designed to be replay-safe by default.
Define and enforce data contracts: Establish enforceable producer-consumer contracts that prevent silent breakage between teams and wire those contracts into CI as build gates rather than relying on coordination by Slack message.
Validate features before they reach a model: Build a layered validation stack covering per-record checks at ingestion, statistical distribution tests at the batch level, and contract tests across pipeline stages.
Detect drift before it reaches customers: Instrument pipelines with per-feature distribution monitors and freshness alerts calibrated against historical baselines, turning data quality into an observable, actionable signal.
Build lineage and observability: Trace every record from source through training, feature serving, and inference, so when a model misbehaves the root cause is answered in minutes, not days.
Architect and operate a feature store: Eliminate training-serving skew through shared feature definitions and point-in-time correctness, and manage the full feature lifecycle from proposal through deprecation.
Protect data through privacy and compliance: Apply sensitivity classification, access controls, and audit logging as first-class engineering concerns embedded in pipeline design.
Engineer data for LLMs and RAG systems: Manage the retrieval corpora, embedding pipelines, chunking strategies, and quality gates that production large language model and retrieval-augmented generation systems require.

This book is written for ML architects who design systems others depend on, ML engineers and data engineers who build and operate those systems, and technical team leads who set the standards their organizations run on. It is intentionally tool-neutral: the patterns taught here apply across platforms and survive the next cycle of tooling change. Every chapter pairs the architect's design perspective with the engineer's implementation view, opens with a scenario drawn from common production incidents, and closes with a checklist the team can apply immediately. Readers finish with a coherent playbook covering every layer of the ML data stack, from foundational quality principles through feature serving, observability, security, scaling, and the data engineering demands of production LLM and RAG systems. Put it to work to raise the reliability floor of every ML system your organization depends on.

Jordan O'NealVirtual VoiceCybersoft Publishers LLc

Ciencia de Datos

Informática

Ciencia de Datos Informática

Todavía no hay opiniones