• Unboxing AI: The Podcast for Computer Vision Engineers

  • By: Unboxing AI
  • Podcast

  • Summary

  • I'm Gil Elbaz, co-founder and CTO of Datagen. In this podcast, I speak with interesting computer vision thinkers and practitioners. I ask the big questions that touch on the issues and challenges that ML and CV engineers deal with every day. Along the way, I hope you uncover a new subject or gain a different perspective, as well as enjoy an engaging conversation. It’s about much more than the technical processes – it’s about people, journeys, and ideas. Turn up the volume, insights inside.
Episodes
  • YOLO: Building AI with an Open-Source Community
    Apr 16 2023

    ABSTRACT
    Our guest this episode is Glenn Jocher, CEO and founder of Ultralytics, the company that brought you YOLO v5 and v8. Gil and Glenn discuss how to build an open-source community on GitHub, the history of YOLO, and even particle physics. They also talk about the progress of AI, diffusion and transformer models, and the importance of simulated synthetic data today. This first episode of season 2 is full of stimulating conversation about the applications of YOLO and the impact of open source on the AI community.

    TOPICS & TIMESTAMPS

    0:00 Introduction
    2:03 First Steps in Machine Learning
    9:40 Neutrino Particles and Simulating Neutrino Detectors
    14:18 Ultralytics
    17:36 GitHub
    21:09 History of YOLO
    25:28 YOLO for Keypoints
    29:00 Applications of YOLO
    30:48 Transformer and Diffusion Models for Detection
    35:00 Speed Bottleneck
    37:23 Simulated Synthetic Data Today
    42:08 Sentience of AGI and Progress of AI
    46:42 ChatGPT, CLIP and LLaMA Open Source Models
    50:04 Advice for Next Generation CV Engineers

    LINKS & RESOURCES

    LinkedIn

    Twitter

    Google Scholar

    Ultralytics

    GitHub

    National Geospatial-Intelligence Agency

    Neutrino

    Antineutrino

    Joseph Redmon

    Ali Farhadi

    Enrico Fermi

    Kashmir World Foundation

    R-CNN

    Fast R-CNN

    LLaMA model

    MS COCO

    GUEST BIO

    Glenn Jocher is the founder and CEO of Ultralytics, a company focused on enabling developers to create practical, real-time computer vision capabilities, with a mission to make AI easy to develop. He has built one of the largest developer communities on GitHub in the machine learning space, with over 50,000 stars for his YOLO v5 and YOLO v8 releases. These are among the leading packages for edge-device computer vision, focused on object classification, detection, and segmentation at real-time speeds with limited compute resources. Glenn previously worked at the United States National Geospatial-Intelligence Agency and published the first-ever global antineutrino map.

    ABOUT THE HOST:

    I’m Gil Elbaz, co-founder and CTO of Datagen. In this podcast, I speak with interesting computer vision thinkers and practitioners. I ask the big questions that touch on the issues and challenges that ML and CV engineers deal with every day. Along the way, I hope you uncover a new subject or gain a different perspective, as well as enjoy an engaging conversation. It’s about much more than the technical processes – it’s about people, journeys, and ideas. Turn up the volume, insights inside.


    53 mins
  • Synthetic Data: Simulation & Visual Effects at Scale
    Jan 4 2023

    ABSTRACT

    Gil Elbaz speaks with Tadas Baltrusaitis, who recently released the seminal paper DigiFace 1M: 1 Million Digital Face Images for Face Recognition. Tadas is a true believer in synthetic data and shares his deep knowledge of the subject, along with insights on the current state of the field and what CV engineers need to know. Join Gil and Tadas as they discuss morphable models, multimodal learning, domain gaps, edge cases, and more.

    TOPICS & TIMESTAMPS

    0:00 Introduction

    2:06 Getting started in computer science

    3:40 Inferring mental states from facial expressions

    7:16 Challenges of facial expressions

    8:40 OpenFace

    10:46 MATLAB to Python

    13:17 Multimodal Machine Learning

    15:52 Multimodals and Synthetic Data

    16:54 Morphable Models

    19:34 HoloLens

    22:07 Skill Sets for CV Engineers

    25:25 What is Synthetic Data?

    27:07 GANs and Diffusion Models

    31:24 Fake It Till You Make It

    35:25 Domain Gaps

    36:32 Long Tails (Edge Cases)

    39:42 Training vs. Testing

    41:53 Future of NeRF and Diffusion Models

    48:26 Avatars and VR/AR

    50:39 Advice for Next Generation CV Engineers

    51:58 Season One Wrap-Up

    LINKS & RESOURCES

    Tadas Baltrusaitis

    LinkedIn

    GitHub

    Google Scholar

    Fake It Till You Make It

    Video 

    GitHub

    DigiFace 1M

    A 3D Morphable Eye Region Model for Gaze Estimation

    HoloLens

    Multimodal Machine Learning: A Survey and Taxonomy 

    3D Face Reconstruction with Dense Landmarks

    OpenFace

    OpenFace 2.0

    Dr. Rana el Kaliouby

    Dr. Louis-Philippe Morency

    Peter Robinson

    Jamie Shotton

    Errol Wood

    Affectiva

    GUEST BIO

    Tadas Baltrusaitis is a principal scientist at the Microsoft Mixed Reality and AI lab in Cambridge, UK, where he leads the human synthetics team. He recently co-authored the groundbreaking paper DigiFace 1M, a dataset of 1 million synthetic images for face recognition. Tadas is also a co-author of Fake It Till You Make It: Face Analysis in the Wild Using Synthetic Data Alone, among other outstanding papers. His PhD research focused on automatic facial expression analysis in difficult real-world settings, and he was a postdoctoral associate at Carnegie Mellon University, where his primary research lay in the automatic understanding of human behavior, expressions, and mental states using computer vision.

    ABOUT THE HOST

    I’m Gil Elbaz, co-founder and CTO of Datagen. In this podcast, I speak with interesting computer vision thinkers and practitioners. I ask the big questions that touch on the issues and challenges that ML and CV engineers deal with every day. Along the way, I hope you uncover a new subject or gain a different perspective, as well as enjoy an engaging conversation. It’s about much more than the technical processes – it’s about people, journeys, and ideas. Turn up the volume, insights inside.


    54 mins
  • SLAM and the Evolution of Spatial AI
    Nov 7 2022

    Host Gil Elbaz welcomes Andrew J. Davison, the father of visual SLAM. Andrew and Gil dive right into how SLAM started and how it has evolved. They speak about Spatial AI and what it means, along with a discussion of Gaussian belief propagation. Of course, they also talk about robotics, how it is being impacted by new technologies like NeRF, and what the current state of the art is.

    Timestamps and Topics

    [00:00:00] Intro

    [00:02:07] Early Research Leading to SLAM

    [00:04:49] Why SLAM

    [00:08:20] Computer Vision Based SLAM

    [00:09:18] MonoSLAM Breakthrough

    [00:13:47] Applications of SLAM

    [00:16:27] Modern Versions of SLAM

    [00:21:50] Spatial AI

    [00:26:04] Implicit vs. Explicit Scene Representations

    [00:34:32] Impact on Robotics

    [00:38:46] Reinforcement Learning (RL)

    [00:43:10] Belief Propagation Algorithms for Parallel Compute

    [00:50:51] Connection to Cellular Automata

    [00:55:55] Recommendations for the Next Generation of Researchers

    Interesting Links:

    Andrew Blake

    Hugh Durrant-Whyte

    John Leonard

    Steven J. Lovegrove

    Alex Mordvintsev

    Prof. David Murray

    Richard Newcombe

    Renato Salas-Moreno 

    Andrew Zisserman

    A visual introduction to Gaussian Belief Propagation

    Github: Gaussian Belief Propagation

    A Robot Web for Distributed Many-Device Localisation

    In-Place Scene Labelling and Understanding with Implicit Scene Representation

    Video 

    Video: Robotic manipulation of objects using SOTA

    Andrew Reacting to NeRF in 2020

    Cellular automata

    Neural cellular automata

    Dyson Robotics

    Guest Bio

    Andrew Davison is Professor of Robot Vision in the Department of Computing at Imperial College London. He is also the founder and director of the Dyson Robotics Laboratory. Andrew pioneered real-time visual SLAM (Simultaneous Localisation and Mapping) with MonoSLAM and has continued to develop SLAM in substantial ways since then. His research focuses on improving and enhancing SLAM in terms of dynamics, scale, level of detail, efficiency, and semantic understanding of real-time video. SLAM has since evolved into the broader domain of “Spatial AI”, which leverages neural implicit representations and a suite of cutting-edge methods to build a full, coherent representation of the real world from video.

    About the Host

    I'm Gil Elbaz, co-founder and CTO of Datagen. I speak with interesting computer vision thinkers and practitioners. I ask the big questions that touch on the issues and challenges that ML and CV engineers deal with every day. Along the way, I hope you uncover a new subject or gain a different perspective, as well as enjoy an engaging conversation. It's about much more than the technical processes. It's about people, journeys and ideas. Turn up the volume, insights inside.


    1 hr and 3 mins
