Tech on the Rocks

By: Kostas, Nitay

Join Kostas and Nitay as they speak with amazingly smart people who are building the next generation of technology, from hardware to cloud compute. Tech on the Rocks is for people who are curious about the foundations of the tech industry. Recorded primarily from our offices and homes, but one day we hope to record in a bar somewhere. Cheers!

© 2026 Kostas, Nitay
Episodes
  • From Notebooks to Production: Xorq’s Lockfile Approach for Reproducible, Portable ML Pipelines
    Jan 29 2026

    In this episode, Hussain shares the story behind xorq: a “lockfile for ML pipelines” that makes notebook work easier to reproduce, debug, and ship. We talk about why the research→production path is still so manual, how schemas (and Arrow) become the contract between systems, and what it takes to run the same pipeline across engines like Snowflake and Databricks. We also dig into escape hatches for imperative code, why feature stores didn’t become the default, and how xorq fits alongside other technologies like Iceberg.

    Chapters

    00:00 Hussain's Journey in Data Science

    06:00 The Need for xorq: Bridging Research and Production

    10:38 Challenges in Machine Learning Deployment

    17:40 The Role of Lock Files in Data Pipelines

    29:51 Understanding Schema Management in Data Systems

    34:40 Navigating Declarative and Imperative Transformations

    36:39 The Developer's Journey with xorq

    38:34 Feature Stores vs. xorq: A Comparative Analysis

    43:43 The Future of Feature Stores and Machine Learning

    51:41 Reproducibility in Data Pipelines: xorq vs. Git-like Operations

    55:47 The Future of xorq and the Data Ecosystem

    57 min
  • From pandas to Arrow: Wes McKinney on the Future of Data Infrastructure
    Dec 1 2025

    Summary

    In this episode of Tech on the Rocks, Kostas and Nitay sit down with Wes McKinney, the creator of pandas, co-creator of Apache Arrow and Ibis, and a long-time leader in the Python data ecosystem. Wes walks us through his journey from building pandas in 2008 to rethinking how we represent and move columnar data with Arrow, and why Arrow is fundamentally different from file formats like Parquet and ORC.


    We get into the future of data file formats, DataFusion and the new generation of query engines, the rise of open data lakes (Iceberg, Delta, Hudi), and why “big metadata” is becoming just as important as big data. Wes also shares candid thoughts on open source sustainability, how companies and infrastructure projects really survive, and how AI coding agents like Claude Code are changing the day-to-day work of software engineers, especially for complex systems work.


    If you care about the foundations of modern data infrastructure, or you’ve ever called import pandas as pd, this is an episode you won’t want to miss.

    Chapters


    00:00 Intro — Wes McKinney & his journey in the Python data ecosystem

    02:15 How pandas evolved & why UX first mattered for data science

    06:14 Open source sustainability, funding & the Posit model

    07:31 From pandas to Datapad, Cloudera & the origins of Apache Arrow and Ibis

    13:38 What is Apache Arrow? In‑memory columnar data, batches & schemas

    22:23 Inside Arrow IPC — zero‑copy, Flatbuffers & cross‑language interop

    24:34 Arrow vs Parquet — columnar memory format vs columnar storage format

    29:28 The next generation of columnar file formats & GPU‑friendly encodings

    36:03 Big metadata, table formats & the rise of Iceberg/Delta/Hudi

    43:05 Rethinking data systems: from big data to DuckDB, Rust & “no JVM” stacks

    54:11 DataFusion as a modular Rust query engine for modern startups

    57:58 Open source, the composable data stack & why infra is “AI‑resistant”

    01:00:07 Vibe‑coding with AI agents — using Claude Code in real projects

    01:09:49 AI, open source maintainers & the risks of AI‑generated contributions

    01:18:57 Bridging LLMs and data: ADBC, data context & the future of infra + AI

    1 h 22 min
  • Navigating the Future of AI and Data Infrastructure with Bauplan
    Sep 8 2025

    Summary

    In this conversation, the founders of Bauplan, Jacopo and Ciro, share their extensive backgrounds in AI and data infrastructure, discussing the evolution of NLP and the challenges faced in the industry. They highlight the importance of data pipelines in AI effectiveness and the complexities of building data infrastructure.

    The discussion also covers lessons learned from previous ventures, the shifting dynamics of the AI market, and the need for collaboration between data scientists and engineers. They emphasize the significance of simplicity in data tools and a future of data management focused on standardization and accessibility.

    In this episode

    • Bauplan was founded by experienced professionals in AI and data.
    • Data challenges remain significant despite advancements in AI.
    • Lessons from previous ventures inform current strategies.
    • Building data infrastructure is complex and requires careful planning.
    • Collaboration between data scientists and engineers is essential.
    • Data engineering will increasingly resemble software engineering.
    • Simplicity in data tools can enhance user experience.
    • The future of data management will focus on standardization and accessibility.


    If you care about making AI features shippable by regular software teams—not just data specialists—this conversation maps the terrain and the trade-offs.


    Chapters

    00:00 Introduction to Bauplan and Founders' Background
    02:27 The Evolution of NLP and AI Challenges
    05:05 Shifts in Data and AI Application
    07:56 Lessons from Previous Ventures
    10:20 The Search Market Landscape
    13:05 Behavioral Data's Role in Search
    15:52 Building Data Infrastructure vs. Applications
    18:22 The Complexity of Data Management
    21:03 Bridging the Gap Between Data Science and Engineering
    23:39 Challenges in Infrastructure Development
    29:52 Navigating the Infrastructure Landscape
    32:19 The Pendulum of Centralization and Decentralization
    34:00 The Need for Standardization in Data Infrastructure
    36:52 Simplifying Data Workflows
    40:29 Radical Simplicity in Data Management
    45:28 Overcoming Resistance to Change
    48:50 The Future of Data Abstractions and Git for Data

    59 min