Added three years ago.
Content provided by Lukas Biewald. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Lukas Biewald or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.

Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

55:32

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube

Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3


121 episodes


All episodes

In this episode of Gradient Dissent, host Lukas Biewald speaks with Captain Jon Haase of the United States Navy about real-world applications of AI and autonomy in defense. From underwater mine detection with autonomous vehicles to the ethics of lethal AI systems, this conversation dives into how the U.S. military is integrating AI into mission-critical operations, and why humans will always be at the center of warfighting. They explore the challenges of underwater autonomy, multi-agent collaboration, cybersecurity, and the growing role of large language models like Gemini and Claude in the defense space. Essential listening for anyone curious about military AI, defense tech, and the future of autonomous systems. ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, host Lukas Biewald sits down with João Moura, CEO & Founder of CrewAI, one of the leading platforms enabling AI agents for enterprise applications. Joe shares insights into how AI agents are being successfully deployed in over 40% of Fortune 500 companies, what tools these agents rely on, and how software companies are adapting to an agentic world. They also discuss what defines a true AI agent versus simple automation; how AI agents are transforming business processes in industries like finance, insurance, and software; the evolving business models for APIs as AI agents become the dominant software users; and what the next breakthroughs in agentic AI might look like in 2025 and beyond. If you're curious about the cutting edge of AI automation, enterprise AI adoption, and the real impact of multi-agent systems, this episode is packed with essential insights.…
 
In this episode of Gradient Dissent, host Lukas Biewald sits down with Mike Knoop, Co-founder and CEO of Ndea, a cutting-edge AI research lab. Mike shares his journey from building Zapier into a major automation platform to diving into the frontiers of AI research. They discuss DeepSeek’s R1, OpenAI’s o-series models, and the ARC Prize, a competition aimed at advancing AI’s reasoning capabilities. Mike explains how program synthesis and deep learning must merge to create true AGI, and why he believes AI reliability is the biggest hurdle for automation adoption. This conversation covers AGI timelines, research breakthroughs, and the future of intelligent systems, making it essential listening for AI enthusiasts, researchers, and entrepreneurs. Show notes: https://ndea.com https://arcprize.org/blog/r1-zero-r1-results-analysis https://arcprize.org/blog/oai-o3-pub-breakthrough 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with Mike Knoop: @mikeknoop Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, host Lukas Biewald sits down with David Cahn, partner at Sequoia Capital, for a compelling discussion on the dynamic world of AI investments. They dive into recent developments, including DeepSeek and Stargate, exploring their implications for the AI industry. Drawing from his articles, "AI's $200 Billion Question" and "AI's $600 Billion Question," David unpacks the financial challenges and opportunities surrounding AI infrastructure spending and the staggering revenue required to sustain these investments. Together, they examine the competitive strategies of cloud providers, the transformative impact of AI on business models, and predictions for the next wave of AI-driven growth. This episode offers an in-depth look at the crossroads of AI innovation and financial strategy. Mentioned articles: AI’s $200B Question; AI’s $600B Question 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with David Cahn: @DavidCahn6 Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Akshay Agrawal, Co-Founder of Marimo, joins host Lukas Biewald to discuss the future of collaborative AI development. They dive into how Marimo is enabling developers and researchers to collaborate seamlessly on AI projects, the challenges of scaling AI tools, and the importance of fostering open ecosystems for innovation. Akshay shares insights into building a platform that empowers teams to iterate faster and solve complex AI challenges together. Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs. They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development. 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Julian Green, Co-founder & CEO of Brightband, joins host Lukas Biewald to discuss how AI is transforming weather forecasting and climate solutions. They explore Brightband's innovative approach to using AI for extreme weather prediction, the shift from physics-based models to AI-driven forecasting, and the potential for democratizing weather data. Julian shares insights into building trust in AI for critical decisions, navigating the challenges of deep tech entrepreneurship, and the broader implications of AI in mitigating climate risks. This episode delves into the intersection of AI and Earth systems, highlighting its transformative impact on weather and climate decision-making. 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with Julian Green: @juliangreensf Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Jonathan Siddharth, CEO & Co-Founder of Turing, joins host Lukas Biewald to discuss the path to AGI. They explore how Turing built a "developer cloud" of 3.7 million engineers to power AGI training, providing high-quality code and reasoning data to leading AI labs. Jonathan shares insights on Turing’s journey, from building coding datasets to solving enterprise AI challenges and enabling human-in-the-loop solutions. This episode offers a unique perspective on the intersection of human intelligence and AGI, with an eye on the expansion of new domains beyond coding. ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with Jonathan Siddharth: https://www.linkedin.com/in/jonsid/ Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Guillermo Rauch, CEO & Founder of Vercel, joins host Lukas Biewald for a wide-ranging discussion on how AI is changing web development and front-end engineering. They discuss how Vercel’s v0 expert AI agent generates code and UI from simple ChatGPT-like prompts, the importance of releasing daily for AI applications, and the shifting balance of frontier-model performance between open and closed models. Subscribe to Weights & Biases: https://bit.ly/45BCkYz Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with Guillermo Rauch: https://www.linkedin.com/in/rauchg/ https://x.com/rauchg Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Snowflake CEO Sridhar Ramaswamy joins host Lukas Biewald to explore how AI is transforming enterprise data strategies. They discuss Sridhar's journey from Google to Snowflake, diving into the evolving role of foundation models, Snowflake’s AI strategy, and the challenges of scaling AI in business. Sridhar also shares his thoughts on leadership, rapid iteration, and creating meaningful AI solutions for enterprise clients. Tune in to discover how Snowflake is driving innovation in the AI and data space. Connect with Sridhar Ramaswamy: https://www.linkedin.com/in/sridhar-ramaswamy/ Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Erik Bernhardsson, CEO & Founder of Modal Labs, joins host Lukas Biewald to discuss the future of machine learning infrastructure. They explore how Modal is enhancing the developer experience, handling large-scale GPU workloads, and simplifying cloud execution for data teams. If you’re into AI, data pipelines, or building robust ML systems, this episode is packed with valuable insights! ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with Erik Bernhardsson: https://www.linkedin.com/in/erikbern/ https://x.com/bernhardsson Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Howie Liu, CEO of Airtable, joins host Lukas Biewald to dive into Airtable's transformation from a no-code app builder to a platform capable of supporting complex AI-driven workflows. They discuss the strategic decisions that propelled Airtable's growth, the challenges of scaling AI in enterprise settings, and the future of AI in business operations. Discover how Airtable is reshaping digital transformation and why flexibility and innovation are key in today's tech landscape. Tune in now to learn about the evolving role of AI in business and product development. ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with Howie Liu: https://www.linkedin.com/in/howieliu/ https://x.com/howietl Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Andrew Feldman, CEO of Cerebras Systems, joins host Lukas Biewald to discuss the latest advancements in AI inference technology. They explore Cerebras Systems' groundbreaking new AI inference product, examining how their wafer-scale chips are setting new benchmarks in speed, accuracy, and cost efficiency. Andrew shares insights on the architectural innovations that make this possible and discusses the broader implications for AI workloads in production. This episode provides a comprehensive look at the cutting edge of AI hardware and its impact on the future of machine learning. ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz 🎙 Get our podcasts on these platforms: Apple Podcasts: http://wandb.me/apple-podcasts Spotify: http://wandb.me/spotify Google: http://wandb.me/gd_google YouTube: http://wandb.me/youtube Connect with Andrew Feldman: https://www.linkedin.com/in/andrewdfeldman/ Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3 Paper Andrew referenced: Paul David (economic historian), https://www.jstor.org/stable/2006600…
 
In this episode of Gradient Dissent, Kanjun Qiu, CEO and Co-founder of Imbue, joins host Lukas Biewald to discuss how AI agents are transforming code generation and software development. Discover the potential impact and challenges of creating autonomous AI systems that can write and verify code, and learn about the practical research involved. ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz Connect with Kanjun Qiu: https://www.linkedin.com/in/kanjun/ https://x.com/kanjun Generally Intelligent Podcast: https://imbue.com/podcast/ Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3…
 
In this episode of Gradient Dissent, Stephen Balaban, CEO of Lambda Labs, joins host Lukas Biewald to discuss the journey of scaling Lambda Labs to an impressive $400M in revenue. They explore the pivotal moments that shaped the company, the future of GPU technology, and the impact of AI data centers on the energy grid. Discover the challenges and triumphs of running a successful hardware and cloud business in the AI industry. Tune in now to explore the evolving landscape of AI hardware and cloud services. ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz Connect with Stephen Balaban: https://www.linkedin.com/in/sbalaban/ https://x.com/stephenbalaban Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb…
 