Episodes

  • Diving Deep into Synthetic Data with Alex Watson of Gretel.ai
    Apr 20 2021
    Alex Watson is the co-founder and CEO of Gretel.ai, a startup that offers APIs for creating anonymized and synthetic datasets. Previously he was the founder of Harvest.ai, whose product Macie, an analytics platform protecting against data breaches, was acquired by AWS.

    Learn more about Alex and Gretel AI:

    http://gretel.ai

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter

    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI

    Subscribe to ML Engineered: https://mlengineered.com/listen

    Comments? Questions? Submit them here: http://bit.ly/mle-survey

    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/


    Timestamps:

    02:15 Introducing Alex Watson

    03:45 How Alex was first exposed to programming

    05:00 Alex's experience starting Harvest AI, getting acquired by AWS, and integrating their product at massive scale

    21:20 How Alex first saw the opportunity for Gretel.ai

    24:20 The most exciting use-cases for synthetic data

    28:55 Theoretical guarantees of anonymized data with differential privacy

    36:40 Combining pre-training with synthetic data

    38:40 When to anonymize data and when to synthesize it

    41:25 How Gretel's synthetic data engine works

    44:50 Requirements of a dataset to create a synthetic version

    49:25 Augmenting datasets with synthetic examples to address representation bias

    52:45 How Alex recommends teams get started with Gretel.ai

    59:00 Expected accuracy loss from training models on synthetic data

    01:03:15 Biggest surprises from building Gretel.ai

    01:05:25 Organizational patterns for protecting sensitive data

    01:07:40 Alex's vision for Gretel's data catalog

    01:11:15 Rapid fire questions


    Links:

    Gretel.ai Blog

    NetFlix Cancels Recommendation Contest After Privacy Lawsuit

    Greylock - The Github of Data

    Improving massively imbalanced datasets in machine learning with synthetic data

    Deep dive on generating synthetic data for Healthcare

    Gretel’s New Synthetic Performance Report

    The...
    1 hr 19 min
  • A Practical Approach to Learning Machine Learning with Radek Osmulski (Earth Species Project)
    Mar 30 2021

    Radek Osmulski is a fully self-taught machine learning engineer. After getting tired of his corporate job, he taught himself programming and started a new career as a Ruby on Rails developer. He then set out to learn machine learning. Since then, he's been a Fast AI International Fellow, become a Kaggle Master, and is now an AI Data Engineer on the Earth Species Project.

    Learn more about Radek:

    https://www.radekosmulski.com

    https://twitter.com/radekosmulski


    Timestamps:

    02:15 How Radek got interested in programming and computer science

    09:00 How Radek taught himself machine learning

    26:40 The skills Radek learned from Fast AI

    39:20 Radek's recommendations for people learning ML now

    51:30 Why Radek is writing a book

    01:01:20 Radek's work at the Earth Species Project

    01:10:15 How the ESP collects animal language data

    01:21:05 Rapid fire questions


    Links:

    Radek's Book "Meta-Learning"

    Andrew Ng ML Coursera

    Fast AI

    Universal Language Model Fine-tuning for Text Classification

    How to do Machine Learning Efficiently

    NPR - Two Heartbeats a Minute

    Earth Species Project

    A Guide to the Good Life

    The Origin of Wealth

    Make Time

    You Are Here

    1 hr 38 min
  • From Data Science Leader to ML Researcher with Rodrigo Rivera (Skoltech ADASE, Samsung NEXT)
    Mar 23 2021

    Rodrigo Rivera is a machine learning researcher at the Advanced Data Analytics in Science and Engineering Group at Skoltech and technical director of Samsung Next. He has previously held data science and research leadership roles at companies around the world, including Rocket Internet and Philip Morris.

    Learn more about Rodrigo:

    https://rodrigo-rivera.com/

    https://twitter.com/rodrigorivr


    Timestamps:

    03:00 How Rodrigo got started in computer science and started his first company

    10:40 Rodrigo's experiences leading data science teams at Rocket Internet and PMI

    26:15 Leaving industry to get a PhD in machine learning

    28:55 Data science collaboration between business and academia

    32:45 Rodrigo's research interest in time series data

    39:25 Topological data analysis

    45:35 Framing effective research as a startup

    48:15 Neural Prophet

    01:04:10 The potential future of Julia for numerical computing

    01:08:20 Most exciting opportunities for ML in industry

    01:15:05 Rodrigo's advice for listeners

    01:17:00 Rapid fire questions


    Links:

    Rodrigo's Google Scholar

    Advanced Data Analytics in Science and Engineering Group

    Neural Prophet

    M-Competitions

    Machine Learning Refined

    Foundations of Machine Learning

    A First Course in Machine Learning

    1 hr 24 min
  • The Future of ML and AI Infrastructure and Ethics with Dan Jeffries (Pachyderm, AI Infrastructure Alliance)
    Mar 16 2021

    Dan Jeffries is the chief technical evangelist at Pachyderm, a leading data science platform. He's a prominent writer and speaker on all things related to the future. He's been in software for over two decades, many of those at Red Hat, and is the founder of the AI Infrastructure Alliance and Practical AI Ethics.

    Learn more about Dan:

    https://twitter.com/Dan_Jeffries1

    https://medium.com/@dan.jeffries


    Timestamps:

    02:15 How Dan got started in computer science

    06:50 What Dan is most excited about in AI

    14:45 Where we are in the adoption curve of ML

    20:40 The "Canonical Stack" of ML

    32:00 Dan's goal for the AI Infrastructure Alliance

    40:55 "Problems that ML startups don't know they're going to have"

    49:00 Closed vs open source tools in the Canonical Stack

    01:00:05 Building out the "boring" part of the infrastructure to enable exciting applications

    01:08:40 Dan's practical approach to AI Ethics

    01:23:50 Rapid fire questions


    Links:

    Pachyderm

    AI Infrastructure Alliance

    Practical AI Ethics Alliance

    Rise of the Canonical Stack in Machine Learning

    Rise of AI - The Age of AI in 2030

    Google Magenta

    AlphaGo Documentary

    Thinking in Bets

    A History of the World in 6 Glasses

    Super-Thinking

    1 hr 37 min
  • Developing Feast, the Leading Open Source Feature Store, with Willem Pienaar (Gojek, Tecton)
    Mar 9 2021

    Willem Pienaar is the co-creator of Feast, the leading open source feature store, whose development he leads as a tech lead at Tecton. Previously, he led the ML platform team at Gojek, a super-app in Southeast Asia.

    Learn more:

    https://twitter.com/willpienaar

    https://feast.dev/


    Timestamps:

    02:15 How Willem got started in computer science

    03:40 Paying for college by starting an ISP

    05:25 Willem's experience creating Gojek's ML platform

    21:45 Issues faced that led to the creation of Feast

    26:45 Lessons learned building Feast

    33:45 Integrating Feast with data quality monitoring tools

    40:10 What it looks like for a team to adopt Feast

    44:20 Feast's current integrations and future roadmap

    46:05 How a data scientist would use Feast when creating a model

    49:40 How the feature store pattern handles DAGs of models

    52:00 Priorities for a startup's data infrastructure

    55:00 Integrating with Amundsen, Lyft's data catalog

    57:15 The evolution of data and MLOps tool standards for interoperability

    01:01:35 Other tools in the modern data stack

    01:04:30 The interplay between open and closed source offerings


    Links:

    Feast's GitHub

    Gojek Data Science Blog

    Data Build Tool (DBT)

    TensorFlow Data Validation (TFDV)

    A State of Feast

    Google BigQuery

    Lyft Amundsen

    Cortex

    Kubeflow

    MLflow

    1 hr 12 min
  • Bringing DevOps Best Practices into Machine Learning with Benedikt Koller from ZenML
    Mar 2 2021

    Benedikt Koller is a self-professed "Ops guy", having spent over 12 years working in roles such as DevOps engineer, platform engineer, and infrastructure tech lead at companies like Stylight and Talentry, in addition to his own consultancy, KEMB. He recently dove headfirst into the world of ML, where he hopes to bring his extensive ops knowledge into the field as the co-founder of Maiot, the company behind ZenML, an open source MLOps framework.

    Learn more:

    https://zenml.io/

    https://maiot.io/

    Timestamps:

    02:15 Introducing Benedikt Koller

    05:30 What the "DevOps revolution" was

    10:10 Bringing good Ops practices into ML projects

    30:50 Pivoting from vehicle predictive analytics to open source ML tooling

    34:35 Design decisions made in ZenML

    39:20 Most common problems faced by applied ML teams

    49:00 The importance of separating configurations from code

    55:25 Resources Ben recommends for learning Ops

    57:30 What to monitor in ML pipelines

    01:00:45 Why you should run experiments in automated pipelines

    01:08:20 The essential components of an MLOps stack

    01:10:25 Building an open source business and what's next for ZenML

    01:20:20 Rapid fire questions

    Links:

    ZenML's GitHub

    Maiot Blog

    The Twelve Factor App

    12 Factors of reproducible Machine Learning in production

    Seldon

    Pachyderm

    Kubeflow

    Something Deeply Hidden

    The Expanse Series

    The Three Body Problem

    Extreme Ownership

    1 hr 28 min
  • Starting an Independent AI Research Lab with Josh Albrecht from Generally Intelligent
    Feb 23 2021

    Josh Albrecht is the co-founder and CTO of Generally Intelligent, an independent research lab investigating the fundamentals of learning across humans and machines. Previously, he was the lead data architect at Addepar, CTO of CloudFab, and CTO of Sourceress, the company from which Generally Intelligent pivoted.

    Learn more about Josh:

    http://joshalbrecht.com/

    http://generallyintelligent.ai/

    Timestamps:

    02:15 Introducing Josh Albrecht

    03:30 How Josh got started in computer science

    06:35 Josh's first two startup attempts

    09:15 The tech behind Sourceress, an AI recruiting platform

    16:10 Pivoting from Sourceress to Generally Intelligent, an AI research lab

    23:50 How Josh defines "general intelligence"

    28:35 Why Josh thinks self-supervised learning is currently the most promising research area

    36:15 Generally Intelligent's immediate research roadmap: BYOL, simulated environments

    59:20 How Josh thinks about creating an optimal research environment

    01:11:35 The "why" behind starting an independent research lab

    01:13:30 AI alignment

    01:17:00 Rapid fire questions


    Links:

    Bootstrap your own latent: A new approach to self-supervised Learning

    Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL)

    BYOL works even without batch statistics

    Generally Intelligent Podcast

    Consequences of Misaligned AI

    Why We Sleep

    Peak

    1 hr 25 min
  • Industrial Machine Learning and Building Tools for Data and Model Monitoring with Evidently AI Co-Founders Elena Samuylova and Emeli Dral
    Feb 16 2021

    Elena Samuylova and Emeli Dral are the co-founders of Evidently AI, where they build open source tools to analyze and monitor machine learning models. Elena was previously the head of the startup ecosystem at Yandex, director of business development at Yandex Data Factory, and chief product officer at Mechanica AI. Emeli was previously a data scientist at Yandex and chief data scientist at Yandex Data Factory and Mechanica AI, and has taught machine learning both online and at multiple universities.

    Learn more about Elena, Emeli, and Evidently AI:

    https://evidentlyai.com/

    https://twitter.com/elenasamuylova

    https://twitter.com/EmeliDral


    Timestamps:

    02:15 How Emeli and Elena each got started in data science

    07:10 Applying machine learning across a wide variety of industries at the Yandex Data Factory

    14:55 Using ML for industrial process improvement

    23:35 Challenges encountered in industrial ML and technical solutions

    27:15 The huge opportunity for ML in manufacturing

    34:35 How to ensure safety when using models in physical systems

    37:40 Why they started working on tools for data and ML monitoring

    42:50 Different kinds of data drift and how to address them

    48:25 Common mistakes ML teams make in monitoring

    55:25 Features of Evidently AI's library

    57:35 Building open source software

    01:02:25 Technical roadmap for Evidently

    01:05:50 Monitoring complex data

    01:08:50 Business roadmap for Evidently

    01:11:35 Rapid fire questions


    Links:

    Evidently on GitHub

    Evidently AI's Blog

    Thinking Fast and Slow

    Flow

    Doing Good Better

    1 hr 21 min