#055 Embedding Intelligence: AI's Move to the Edge
Nicolay here. While everyone races to cloud-scale LLMs, Pete Warden is solving AI problems by going completely offline. No network connectivity required.
Today I have the chance to talk to Pete Warden, CEO of Useful Sensors and author of the TinyML book.
His philosophy: if you can't explain to users exactly what happens to their data, your privacy model is broken.
Key Insight: The Real World Action Gap
LLMs excel at text-to-text transformations but fail catastrophically at connecting language to physical actions. There's nothing in the web corpus that teaches a model how "turn on the light" maps to sending a pin high on a microcontroller.
This explains why every AI agent demo focuses on booking flights and API calls: those actions are documented in text. The moment you step off the web into real-world device control, even simple commands become impossible without custom training on action-to-outcome data.
Pete's company builds speech-to-intent systems that skip text entirely, going directly from audio to device actions using embeddings trained on limited action sets.
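To make that pipeline concrete, here is a minimal sketch of embedding-based speech-to-intent matching, assuming a generic on-device audio-embedding model. The embed_audio helper, the 256-dimension vectors, and the action names are hypothetical stand-ins, not Useful Sensors' actual implementation:

```python
import numpy as np

# Canonical actions the device supports -- the deliberately limited action set.
ACTIONS = ["light_on", "light_off", "volume_up"]

# Hypothetical: one reference embedding per action, precomputed from example
# utterances ("turn on the light", "lights on", ...). Random stand-ins here.
rng = np.random.default_rng(0)
action_embeddings = rng.normal(size=(len(ACTIONS), 256))

def embed_audio(audio: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the on-device audio-embedding model."""
    return rng.normal(size=256)

def match_intent(audio: np.ndarray, threshold: float = 0.7):
    """Map an utterance straight to a device action; no text in between."""
    e = embed_audio(audio)
    # Cosine similarity against each canonical action embedding.
    sims = action_embeddings @ e / (
        np.linalg.norm(action_embeddings, axis=1) * np.linalg.norm(e)
    )
    best = int(np.argmax(sims))
    # Refuse rather than guess when nothing is close -- the constrained
    # action set is what keeps this reliable.
    return ACTIONS[best] if sims[best] >= threshold else None

print(match_intent(np.zeros(16000)))  # 1 s of silence at 16 kHz -> likely None
```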
💡 Core Concepts
Speech-to-Intent: Direct audio-to-action mapping that bypasses text conversion, preserving ambiguity until final classification
ML Sensors: Self-contained circuit boards processing sensitive data locally, outputting only simple signals without exposing raw video/audio (see the sketch after this list)
Embedding-Based Action Matching: Vector representations mapping natural language variations to canonical device actions within constrained domains
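As referenced above, here is a minimal sketch of the ML-sensor pattern, with stand-in camera and model objects (every name here is a hypothetical illustration, not Useful Sensors' firmware). The privacy property is structural: raw frames are processed and discarded inside the loop, and the board's entire external interface is one bit:

```python
import random
import time

class FakeCamera:
    """Stand-in for the on-board image sensor."""
    def capture(self) -> list[float]:
        return [random.random() for _ in range(96 * 96)]  # fake grayscale frame

def tiny_person_model(frame: list[float]) -> float:
    """Stand-in for a local, quantized person-detection model."""
    return random.random()  # confidence that a person is present

def run_ml_sensor(camera, model, emit, interval_s: float = 1.0) -> None:
    # emit(bit) is the sensor's only output channel; raw frames never
    # leave this loop, so there is no raw video to intercept or leak.
    while True:
        frame = camera.capture()
        emit(1 if model(frame) > 0.5 else 0)
        time.sleep(interval_s)

# run_ml_sensor(FakeCamera(), tiny_person_model, print)  # emits a 0/1 stream
```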
⏱ Important Moments
Real World Action Problem: [06:27] LLMs discuss turning on lights but lack training data connecting text commands to device control
Apple Intelligence Challenges: [04:07] Design-led culture clashes with AI accuracy limitations
Speech-to-Intent vs Speech-to-Text: [12:01] Breaking audio into text loses critical ambiguity information
Limited Action Set Strategy: [15:30] Smart speakers succeed by constraining to ~3 functions rather than infinite commands
8-Bit Quantization: [33:12] Remains the deployment sweet spot; processor instruction support matters more than compression (see the sketch after this list)
On-Device Privacy: [47:00] Complete local processing provides explainable guarantees vs confusing hybrid systems
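To illustrate the quantization point, here is a minimal sketch of post-training 8-bit affine quantization (a generic illustration, not the specific pipeline discussed in the episode): floats become int8 values plus a scale/zero-point pair, which is the form that processors' int8 instructions consume directly.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 plus an affine (scale, zero_point) pair."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0     # guard constant tensors
    zero_point = -128 - round(w_min / scale)   # align w_min with int8 minimum
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, s, z = quantize_int8(w)
print(np.abs(w - dequantize(q, s, z)).max())  # at most ~one quantization step
```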
🛠 Tools & Tech
Whisper: github.com/openai/whisper
Moonshine: github.com/usefulsensors/moonshine
TinyML Book: oreilly.com/library/view/tinyml/9781492052036
Stanford Edge ML: github.com/petewarden/stanford-edge-ml
📚 Resources
Looking to Listen Paper: looking-to-listen.github.io
Lottery Ticket Hypothesis: arxiv.org/abs/1803.03635
Connect: [email protected] | petewarden.com | usefulsensors.com
Beta Opportunity: Moonshine browser implementation for client-side speech processing in JavaScript