903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Super Data Science: ML & AI Podcast with Jon Krohn

53 subscribers

Added six years ago

Content provided by Jon Krohn. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Jon Krohn or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.

Published 21 days ago (1:28:20)

Additional materials: www.superdatascience.com/903

This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA.

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:

(16:48) Sinan’s new podcast, Practically Intelligent
(21:54) What to know about the limits of AI benchmarking
(53:22) Alternatives to AI benchmarks
(1:01:23) The difficulties in getting a model to recognize its mistakes

987 episodes

#Data Science #Science #Tech #Software Development #Jon Krohn

All episodes

909: Causal AI, with Dr. Robert Usazuwa Ness (1:22:27)

10 hours ago

Microsoft researcher Robert Usazuwa Ness talks to Jon Krohn about how to achieve causality in AI with correlation-based learning, the right libraries, and handling statistical inference. When dealing with causal AI, Robert notes how important it is to stay aware of variables in the data that may mislead us and force inaccurate assumptions. Not all variables will be useful. It is essential, then, that any assumptions are grounded in a deeper understanding of how the data were gathered, and not merely in what appears in the dataset. Listen to the episode to hear how you can apply causal AI to your projects. Additional materials: www.superdatascience.com/907 This episode is brought to you by Trainium2, the latest AI chip from AWS, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

908: AI Agents Blackmail Humans 96% of the Time (Agentic Misalignment) (8:50)

4 days ago

The moral and ethical implications of letting AI take the wheel in business, as revealed by Anthropic: Jon Krohn looks into Anthropic’s latest research on how to use and deploy LLMs safely, specifically in business environments. The team designed scenarios to test the behavior of AI agents when given a goal and a set of obstacles to reach it. Those obstacles included 1) threats to the AI’s continued operation, and 2) conflict between the AI’s goals and the goals of the company. Hear Jon break down the results of this research in this Five-Minute Friday. Additional materials: www.superdatascience.com/908 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

907: Neuroscience, AI and the Limitations of LLMs, with Dr. Zohar Bronfman (1:21:16)

7 days ago

“Intelligence has many forms,” says Zohar Bronfman, who speaks with Jon Krohn about the fascinating intersection between computational neuroscience and philosophy, and how it has brought him closer to understanding what is necessary to develop human-like intelligence in machines, as well as his motivations for launching Pecan AI and why predictive models outstrip generative models in business. Additional materials: www.superdatascience.com/907 This episode is brought to you by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (03:47) Why LLMs aren’t bringing us closer to AGI (33:44) About Pecan AI (51:03) Why data modeling is so challenging (1:01:25) How Pecan AI makes its tools widely accessible…

906: How Prof. Jason Corso Solved Computer Vision’s Data Problem (29:29)

11 days ago

Jason Corso speaks to Jon Krohn in this Five-Minute Friday all about Voxel51’s latest tool, Verified Auto-Labelling, and the company’s incredible success in developing popular tools for computer vision. Additional materials: www.superdatascience.com/906 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

905: Why RAG Makes LLMs Less Safe (And How to Fix It), with Bloomberg’s Dr. Sebastian Gehrmann (57:49)

14 days ago

RAG LLMs are not safer: Sebastian Gehrmann speaks to Jon Krohn about his latest research into how retrieval-augmented generation (RAG) actually makes LLMs less safe, the three ‘H’s for gauging the effectiveness and value of a RAG system, and the custom guardrails and procedures we need to ensure our RAG system is fit for purpose and secure. This is a great episode for anyone who wants to know how to work with RAG in the context of LLMs, as you’ll hear how to select the best model for your purpose, useful approaches and taxonomies to keep your projects secure, and which models he finds safest when RAG is applied. Additional materials: www.superdatascience.com/905 This episode is brought to you by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (03:28) Findings from the paper “RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models” (09:35) What attack surfaces are in the context of AI (38:51) Small versus large models with RAG (46:27) How to select an LLM with safety in mind…

904: A.I. is Disrupting the Entire Advertising Industry (9:14)

18 days ago

In this Five-Minute Friday, Jon Krohn reveals how AI is taking on the glitzy world of advertising. Bold claims from Meta and OpenAI that users will soon be able to plug in what they want and have AI churn out an ad campaign for little to no cost are shaking the advertising industry to its core. The fact that the four biggest sellers of ads (Google, Meta, Amazon, and ByteDance) are digital companies and accounted for over half of the global market in 2024 rubs salt in the wound. Hear the three ways that AI is disrupting the industry, and who (or what) has the most influence on digital consumers to date. Additional materials: www.superdatascience.com/904 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir (1:28:20)

21 days ago

Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models. Additional materials: www.superdatascience.com/903 This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (16:48) Sinan’s new podcast, Practically Intelligent (21:54) What to know about the limits of AI benchmarking (53:22) Alternatives to AI benchmarks (1:01:23) The difficulties in getting a model to recognize its mistakes…

902: In Case You Missed It in June 2025 (29:29)

25 days ago

In this episode of “In Case You Missed It”, Jon recaps his June interviews on The SuperDataScience Podcast. Hear from Diane Hare, Avery Smith, Kirill Eremenko, and Shaun Johnson as they talk about the best portfolios for AI practitioners, how to stand out in a saturated candidate market for AI roles, how to tell when an AI startup is going places, and ways to lead AI change in business. Additional materials: www.superdatascience.com/902 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

901: Automating Legal Work with Data-Centric ML (feat. Lilith Bat-Leah) (1:06:12)

28 days ago

Lilith Bat-Leah, Senior Director of AI Labs for Epiq, speaks to Jon Krohn about the ways AI has disrupted the legal industry using LLMs and retrieval-augmented generation (RAG), as well as how the data-centric machine learning research movement (DMLR) is systematically improving data quality, and why that is so important. Additional materials: www.superdatascience.com/901 This episode is brought to you by the Dell AI Factory with NVIDIA and Adverity, the conversational analytics platform. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (05:45) Deciphering legal tech terms (TAR, e-discovery) (13:47) How legal firms use data and AI (29:01) All about data-centric machine learning research (DMLR) (46:58) Lilith’s career journey in the AI industry…

900: 95-Year-Old Annie on How to Stay Healthy and Happy (15:06)

5 weeks ago

“Stay happy and healthy”: In this special Five-Minute Friday, Jon Krohn speaks with Annie, his grandmother, on her 95th birthday. Hear how she is physically and mentally coping with illnesses that limit her mobility and the joys of having a pet. Additional materials: www.superdatascience.com/900 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

899: Landing $200k+ AI Roles: Real Cases from the SuperDataScience Community, with Kirill Eremenko (1:33:12)

5 weeks ago

Data science skills, a data science bootcamp, and why Python and SQL still reign supreme: In this episode, Kirill Eremenko returns to the podcast to speak to Jon Krohn about SuperDataScience subscriber success stories, where to focus in a field that is evolving incredibly quickly, and why in-person working and networking might give you the edge over other candidates in landing a top AI role. Additional materials: www.superdatascience.com/899 This episode is brought to you by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (04:35) Stories from five SuperDataScience subscribers (27:32) How to secure a career in a fast-paced industry (44:19) How to stand out against huge competition in data science (1:01:40) The importance of communication in data science (1:16:41) Where to focus your skills in AI engineering…

898: My Four-Hour Agentic AI Workshop is Live and 100% Free (5:06)

6 weeks ago

In this Five-Minute Friday, Jon Krohn announces his new, free workshop on Agentic AI. In this comprehensive four-hour course, you’ll learn the key terminology for working with these flexible, multi-agent systems and then get to grips with developing and deploying this artificial “team of experts” for all your AI-driven projects. Additional materials: www.superdatascience.com/898 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

897: How to Enable Enterprise AI Transformation, with Strategy Consultant Diane Hare (1:02:42)

6 weeks ago

Diane Hare talks to Jon Krohn about the power of storytelling for corporate buy-in of AI initiatives, how to actively implement AI to transform organizations, and how emerging professionals can upskill themselves. Hear how she discovered her background in storytelling at Ernst & Young and her work with Simon Sinek, which she finds to be integral to her process. Inspired by Sinek’s aphorism “start with why”, Diane notes that many companies neglect this crucial part of their mission because they never take the time to work on it. Additional materials: www.superdatascience.com/897 This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (04:51) How Y Carrot works with BizLove (14:19) How BizLove prioritizes change management (29:18) How to upskill effectively (42:37) How BizLove integrated data from two enterprises (48:52) How to enable change in your business…

896: AI (Probably) Isn’t Taking Your Job (At Least Anytime Soon) (7:51)

7 weeks ago

The Economist reported that global Google searches for "AI unemployment" hit an all-time high earlier this year. But do we have to worry about AI taking our jobs? In this week’s Five-Minute Friday, Jon Krohn investigates whether the rise of AI has directly led to an increase in unemployment. Additional materials: www.superdatascience.com/896 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.…

895: The Future of Enterprise AI: Investor Shaun Johnson Reveals What Actually Works (1:16:26)

7 weeks ago

How to get funded by a VC specializing in AI: Head of AIX Ventures Shaun Johnson talks to Jon Krohn about investment strategies, how to simplify AI adoption, why a little competition can be so beneficial to AI startups, and how Big Tech is circumventing anti-monopoly measures. Additional materials: www.superdatascience.com/895 This episode is brought to you by the Dell AI Factory with NVIDIA and by Adverity, the conversational analytics platform. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: (10:36) What Shaun looks for when evaluating early-stage AI startups (19:11) Building out AI startups (41:44) How AI practitioners can future-proof their careers (45:27) How to measure AI impact (53:30) The key verticals ripe for AI disruption…
