Sweden's 100 most popular podcasts

Vanishing Gradients

A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson. It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.

Subscribe

iTunes / Overcast / RSS

Website

vanishinggradients.fireside.fm

Episodes

Episode 1: Introducing Vanishing Gradients

In this brief introduction, Hugo introduces the rationale behind launching a new data science podcast and gets excited about his upcoming guests: Jeremy Howard, Rachael Tatman, and Heather Nolis! Original music, bleeps, and blops by local Sydney legend PlaneFace (https://planeface.bandcamp.com/album/fishing-from-an-asteroid)!
2022-02-16
Link to episode

Episode 24: LLM and GenAI Accessibility

Hugo speaks with Johno Whitaker, a Data Scientist/AI Researcher doing R&D with answer.ai. His current focus is on generative AI, flitting between different modalities. He also likes teaching and making courses, having worked with both Hugging Face and fast.ai in these capacities. Johno recently reminded Hugo how hard everything was 10 years ago: "Want to install TensorFlow? Good luck. Need data? Perhaps try ImageNet. But now you can use big models from Hugging Face with hi-res satellite data and do all of this in a Colab notebook. Or think ecology and vision models... or medicine and multimodal models!" We talk about where we've come from regarding tooling and accessibility for foundation models, ML, and AI, where we are, and where we're going. We'll delve into what the generative AI mindset is, in terms of using atomic building blocks, and how it evolved from both the data science and ML mindsets; how fast.ai democratized access to deep learning, what successes they had, and what was learned; the moving parts now required to make GenAI and ML as accessible as possible; the importance of focusing on UX and the application in the world of generative AI and foundation models; the skillset and toolkit needed to be an LLM and AI guru; and what they're up to at answer.ai to democratize LLMs and foundation models.
LINKS The livestream on YouTube (https://youtube.com/live/hxZX6fBi-W8?feature=share) Zindi, the largest professional network for data scientists in Africa (https://zindi.africa/) A new old kind of R&D lab: Announcing Answer.AI (http://www.answer.ai/posts/2023-12-12-launch.html) Why and how I'm shifting focus to LLMs by Johno Whitaker (https://johnowhitaker.dev/dsc/2023-07-01-why-and-how-im-shifting-focus-to-llms.html) Applying AI to Immune Cell Networks by Rachel Thomas (https://www.fast.ai/posts/2024-01-23-cytokines/) Replicate -- a cool place to explore GenAI models, among other things (https://replicate.com/explore) Hands-On Generative AI with Transformers and Diffusion Models (https://www.oreilly.com/library/view/hands-on-generative-ai/9781098149239/) Johno on Twitter (https://twitter.com/johnowhitaker) Hugo on Twitter (https://twitter.com/hugobowne) Vanishing Gradients on Twitter (https://twitter.com/vanishingdata) SciPy 2024 CFP (https://www.scipy2024.scipy.org/#CFP) Escaping Generative AI Walled Gardens with Omoju Miller, a Vanishing Gradients Livestream (https://lu.ma/xonnjqe4)
2024-02-27
Link to episode

Episode 23: Statistical and Algorithmic Thinking in the AI Age

Hugo speaks with Allen Downey, a curriculum designer at Brilliant, Professor Emeritus at Olin College, and the author of Think Python, Think Bayes, Think Stats, and other computer science and data science books. In 2019-20 he was a Visiting Professor at Harvard University. He previously taught at Wellesley College and Colby College and was a Visiting Scientist at Google. He is also the author of the upcoming book Probably Overthinking It! They discuss Allen's new book and the key statistical and data skills we all need to navigate an increasingly data-driven and algorithmic world. The goal was to dive deep into the statistical paradoxes and fallacies that get in the way of using data to make informed decisions. For example, when it was reported in 2021 that "in the United Kingdom, 70-plus percent of the people who die now from COVID are fully vaccinated," this was correct but the implication was entirely wrong. Their conversation jumps into many such concrete examples to get to the bottom of using data for more than "lies, damned lies, and statistics." They cover information and misinformation around pandemics and the base rate fallacy; the tools we need to comprehend the small probabilities of high-risk events such as stock market crashes, earthquakes, and more; the many definitions of algorithmic fairness, why they can't all be met at once, and what we can do about it; public health, the need for robust causal inference, and variations on Berkson's paradox, such as the low-birthweight paradox: an influential paper found that the mortality rate for children of smokers is lower for low-birthweight babies; why none of us are normal in any sense of the word, both in physical and psychological measurements; and the inspection paradox, which shows up in the criminal justice system and distorts our perception of prison sentences and the risk of repeat offenders.
LINKS The livestream on YouTube (https://youtube.com/live/G8LulD72kzs?feature=share) Allen Downey on Github (https://github.com/AllenDowney) Allen's new book Probably Overthinking It! (https://greenteapress.com/wp/probably-overthinking-it/) Allen on Twitter (https://twitter.com/AllenDowney) Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions by Mitchell et al. (https://arxiv.org/abs/1811.07867)
2023-12-20
Link to episode

Episode 22: LLMs, OpenAI, and the Existential Crisis for Machine Learning Engineering

Jeremy Howard (Fast.ai), Shreya Shankar (UC Berkeley), and Hamel Husain (Parlance Labs) join Hugo Bowne-Anderson to talk about how LLMs and OpenAI are changing the worlds of data science, machine learning, and machine learning engineering. Jeremy Howard (https://twitter.com/jeremyphoward) is co-founder of fast.ai, an ex-Chief Scientist at Kaggle, and creator of the ULMFiT approach on which all modern language models are based. Shreya Shankar (https://twitter.com/sh_reya) is at UC Berkeley and was previously at Google Brain, Facebook, and Viaduct. Hamel Husain (https://twitter.com/HamelHusain) has his own generative AI and LLM consultancy, Parlance Labs (https://parlance-labs.com/), and was previously at Outerbounds, GitHub, and Airbnb. They talk about how LLMs shift the nature of the work we do in DS and ML, how they change the tools we use, the ways in which they could displace the role of traditional ML (e.g. will we stop using xgboost any time soon?), how to navigate all the new tools and techniques, the trade-offs between open and closed models, and reactions to the recent OpenAI Dev Day and the increasing existential crisis for ML. LINKS The panel on YouTube (https://youtube.com/live/MTJHvgJtynU?feature=share) Hugo and Jeremy's upcoming livestream on what the hell happened recently at OpenAI, among many other things (https://lu.ma/byxyzfrr?utm_source=vg) Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) Vanishing Gradients on twitter (https://twitter.com/VanishingData)
2023-11-27
Link to episode

Episode 21: Deploying LLMs in Production: Lessons Learned

Hugo speaks with Hamel Husain, a machine learning engineer who loves building machine learning infrastructure and tools. Hamel leads and contributes to many popular open-source machine learning projects. He also has extensive experience (20+ years) as a machine learning engineer across various industries, including large tech companies like Airbnb and GitHub. At GitHub, he led CodeSearchNet (https://github.com/github/CodeSearchNet), a large language model for semantic search that was a precursor to Copilot. Hamel is the founder of Parlance Labs (https://parlance-labs.com/), a research and consultancy focused on LLMs. They talk about generative AI, large language models, the business value they can generate, and how to get started. They delve into where Hamel is seeing the most business interest in LLMs (spoiler: the answer isn't only tech); common misconceptions about LLMs; the skills you need to work with LLMs and GenAI models; tools and techniques, such as fine-tuning, RAGs, LoRA, hardware, and more; and vendor APIs vs OSS models. LINKS Our upcoming livestream LLMs, OpenAI Dev Day, and the Existential Crisis for Machine Learning Engineering with Jeremy Howard (Fast.ai), Shreya Shankar (UC Berkeley), and Hamel Husain (Parlance Labs): Sign up for free! (https://lu.ma/m81oepqe/utm_source=vghh) Our recent livestream Data and DevOps Tools for Evaluating and Productionizing LLMs (https://youtube.com/live/B_DMMlDuJB0) with Hamel and Emil Sedgh, Lead AI engineer at Rechat -- in it, we showcase an actual industrial use case that Hamel and Emil are working on with Rechat, a real estate CRM, taking you through LLM workflows and tools. Extended Guide: Instruction-tune Llama 2 (https://www.philschmid.de/instruction-tune-llama-2) by Philipp Schmid The livestream recording of this episode! (https://youtube.com/live/l7jJhL9geZQ?feature=share) Hamel on twitter (https://twitter.com/HamelHusain)
2023-11-14
Link to episode

Episode 20: Data Science: Past, Present, and Future

Hugo speaks with Chris Wiggins (Columbia, NYTimes) and Matthew Jones (Princeton) about their recent book How Data Happened, and the Columbia course it expands upon, data: past, present, and future. Chris is an associate professor of applied mathematics at Columbia University and the New York Times' chief data scientist, and Matthew is a professor of history at Princeton University and former Guggenheim Fellow. From facial recognition to automated decision systems that inform who gets loans and who receives bail, we all now move through a world determined by data-empowered algorithms. These technologies didn't just appear: they are part of a history that goes back centuries, from the census enshrined in the US Constitution to the birth of eugenics in Victorian Britain to the development of Google search. DJ Patil, former U.S. Chief Data Scientist, said of the book "This is the first comprehensive look at the history of data and how power has played a critical role in shaping the history. It's a must read for any data scientist about how we got here and what we need to do to ensure that data works for everyone." If you're a data scientist, machine learning engineer, or work with data in any way, it's increasingly important to know more about the history and future of the work that you do and understand how your work impacts society and the world. Among other things, they'll delve into the history of human use of data; how data are used to reveal insight and support decisions; how data and data-powered algorithms shape, constrain, and manipulate our commercial, civic, and personal transactions and experiences; and how exploration and analysis of data have become part of our logic and rhetoric of communication and persuasion. You can also sign up for our next livestreamed podcast recording here (https://www.eventbrite.com/e/data-science-past-present-and-future-tickets-695643357007?aff=kjvg)!
LINKS How Data Happened, the book! (https://wwnorton.com/books/how-data-happened) data: past, present, and future, the course (https://data-ppf.github.io/) Race After Technology, by Ruha Benjamin (https://www.ruhabenjamin.com/race-after-technology) The problem with metrics is a big problem for AI by Rachel Thomas (https://www.ruhabenjamin.com/race-after-technology) Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
2023-10-05
Link to episode

Episode 19: Privacy and Security in Data Science and Machine Learning

Hugo speaks with Katharine Jarmul about privacy and security in data science and machine learning. Katharine is a Principal Data Scientist at Thoughtworks Germany focusing on privacy, ethics, and security for data science workflows. Previously, she has held numerous roles at large companies and startups in the US and Germany, implementing data processing and machine learning systems with a focus on reliability, testability, privacy, and security. In this episode, Hugo and Katharine talk about what data privacy and security are, what they aren't, and the differences between them (hopefully dispelling common misconceptions along the way!); why you should care about them (hint: the answers will involve regulatory, ethical, risk, and organizational concerns); data governance, anonymization techniques, and privacy in data pipelines; privacy attacks; the state of the art in privacy-aware machine learning and data science, including federated learning; and what you need to know about the current state of regulation, including GDPR and CCPA. And much more, all the while grounding our conversation in real-world examples from data science, machine learning, business, and life! You can also sign up for our next livestreamed podcast recording here (https://lu.ma/4b5xalpz)! LINKS Win a copy of Practical Data Privacy, Katharine's new book! (https://forms.gle/wkF92vyvjfZLM6qt8) Katharine on twitter (https://twitter.com/kjam) Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) Probably Private, a newsletter for privacy and data science enthusiasts (https://probablyprivate.com/) Probably Private on YouTube (https://www.youtube.com/@ProbablyPrivate)
2023-08-14
Link to episode

Episode 18: Research Data Science in Biotech

Hugo speaks with Eric Ma about Research Data Science in Biotech. Eric leads the Research team in the Data Science and Artificial Intelligence group at Moderna Therapeutics. Prior to that, he was part of a special ops data science team at the Novartis Institutes for Biomedical Research's Informatics department. In this episode, Hugo and Eric talk about what tools and techniques they use for drug discovery (such as mRNA vaccines and medicines); the importance of machine learning, deep learning, and Bayesian inference; how to think more generally about such high-dimensional, multi-objective optimization problems; the importance of open-source software and Python; institutional and cultural questions, including hiring and the trade-offs between being an individual contributor and a manager; and how they're approaching accelerating discovery science to the speed of thought using computation, data science, statistics, and ML. And as always, much, much more! LINKS Eric's website (https://ericmjl.github.io/) Eric on twitter (https://twitter.com/ericmjl) Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) Cell Biology by the Numbers by Ron Milo and Rob Phillips (http://book.bionumbers.org/) Eric's JAX tutorials at PyCon (https://youtu.be/ztthQJQFe20) and SciPy (https://youtu.be/DmR36wtel4Y) Eric's blog post on Hiring data scientists at Moderna! (https://ericmjl.github.io/blog/2021/8/26/hiring-data-scientists-at-moderna-2021/)
2023-05-25
Link to episode

Episode 17: End-to-End Data Science

Hugo speaks with Tanya Cashorali, a data scientist and consultant who helps businesses get the most out of data, about what end-to-end data science looks like across many industries, such as retail, defense, biotech, and sports. They cover scoping out projects, figuring out the correct questions to ask, how projects can change, delivering on the promise, the importance of rapid prototyping, what it means to put models in production, and how to measure success. And much more, all the while grounding their conversation in real-world examples from data science, business, and life. In a world where most organizations think they need AI and yet 10-15% of data science actually involves model building, it's time to get real about how data science and machine learning actually deliver value! LINKS Tanya on Twitter (https://twitter.com/tanyacash21) Vanishing Gradients on YouTube (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) Saving millions with a Shiny app | Data Science Hangout with Tanya Cashorali (https://youtu.be/qdAroyFRFCg) Our next livestream: Research Data Science in Biotech with Eric Ma (https://www.eventbrite.com/e/research-data-science-in-biotech-tickets-550400882857?aff=fs)
2023-02-17
Link to episode

Episode 16: Data Science and Decision Making Under Uncertainty

Hugo speaks with JD Long, agricultural economist, quant, and stochastic modeler, about decision making under uncertainty and how we can use our knowledge of risk, uncertainty, probabilistic thinking, causal inference, and more to help us use data science and machine learning to make better decisions in an uncertain world. This is part 2 of a two-part conversation in which we delve into decision making under uncertainty. Feel free to check out part 1 here (https://vanishinggradients.fireside.fm/15) but this episode should also stand alone. Why am I speaking to JD about all of this? Because not only is he a wild conversationalist with a real knack for explaining hard-to-grok concepts with illustrative examples and useful stories, but he has worked for many years in re-insurance. That's right, not insurance but re-insurance: these are the people who insure the insurers, so if anyone can actually tell us about risk and uncertainty in decision making, it's him! In part 1, we discussed risk, uncertainty, probabilistic thinking, and simulation, all with a view towards improving decision making. In this, part 2, we discuss the ins and outs of decision making under uncertainty, including how data science can be more tightly coupled with the decision function in organisations; some common mistakes and failure modes of making decisions under uncertainty; heuristics for principled decision-making in data science; and the intersection of model building, storytelling, and cognitive biases to keep in mind. As JD says, and I paraphrase, "You may think you train your models, but your models are really training you."
Links Vanishing Gradients' new YouTube channel! (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) JD on twitter (https://twitter.com/CMastication) Executive Data Science, episode 5 of Vanishing Gradients, in which Jim Savage and Hugo talk through decision making and why you should always be integrating your loss function over your posterior (https://vanishinggradients.fireside.fm/5) Fooled by Randomness by Nassim Taleb (https://en.wikipedia.org/wiki/Fooled_by_Randomness) Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner (https://en.wikipedia.org/wiki/Superforecasting:_The_Art_and_Science_of_Prediction) Thinking in Bets by Annie Duke (https://www.penguin.com.au/books/thinking-in-bets-9780735216372) The Signal and the Noise: Why So Many Predictions Fail by Nate Silver (https://en.wikipedia.org/wiki/The_Signal_and_the_Noise) Thinking, Fast and Slow by Daniel Kahneman (https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow)
2022-12-14
Link to episode

Episode 15: Uncertainty, Risk, and Simulation in Data Science

Hugo speaks with JD Long, agricultural economist, quant, and stochastic modeler, about decision making under uncertainty and how we can use our knowledge of risk, uncertainty, probabilistic thinking, causal inference, and more to help us use data science and machine learning to make better decisions in an uncertain world. This is part 1 of a two-part conversation. In this, part 1, we discuss risk, uncertainty, probabilistic thinking, and simulation, all with a view towards improving decision making, and we draw on examples from our personal lives, the pandemic, our jobs, the reinsurance space, and the corporate world. In part 2, we'll get into the nitty gritty of decision making under uncertainty. As JD says, and I paraphrase, "You may think you train your models, but your models are really training you." Links Vanishing Gradients' new YouTube channel! (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA) JD on twitter (https://twitter.com/CMastication) Executive Data Science, episode 5 of Vanishing Gradients, in which Jim Savage and Hugo talk through decision making and why you should always be integrating your loss function over your posterior (https://vanishinggradients.fireside.fm/5) Fooled by Randomness by Nassim Taleb (https://en.wikipedia.org/wiki/Fooled_by_Randomness) Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner (https://en.wikipedia.org/wiki/Superforecasting:_The_Art_and_Science_of_Prediction) Thinking in Bets by Annie Duke (https://www.penguin.com.au/books/thinking-in-bets-9780735216372) The Signal and the Noise: Why So Many Predictions Fail by Nate Silver (https://en.wikipedia.org/wiki/The_Signal_and_the_Noise) Thinking, Fast and Slow by Daniel Kahneman (https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow)
2022-12-07
Link to episode

Episode 14: Decision Science, MLOps, and Machine Learning Everywhere

Hugo Bowne-Anderson, host of Vanishing Gradients, reads 3 audio essays about decision science, MLOps, and what happens when machine learning models are everywhere. Links Our upcoming Vanishing Gradients live recording of Data Science and Decision Making Under Uncertainty with Hugo and JD Long! (https://www.eventbrite.com/e/data-science-and-decision-making-under-uncertainty-tickets-467379864757?aff=vg) Decision-Making in a Time of Crisis (https://www.oreilly.com/radar/decision-making-in-a-time-of-crisis/) by Hugo Bowne-Anderson MLOps and DevOps: Why Data Makes It Different (https://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/) by Ville Tuulos and Hugo Bowne-Anderson The above essay syndicated on VentureBeat (https://venturebeat.com/business/mlops-vs-devops-why-data-makes-it-different/) When models are everywhere (https://www.oreilly.com/radar/when-models-are-everywhere/) by Hugo Bowne-Anderson and Mike Loukides
2022-11-21
Link to episode

Episode 13: The Data Science Skills Gap, Economics, and Public Health

Hugo speaks with Norma Padron about data science education and continuous learning for people working in healthcare, broadly construed, along with how we can think about the democratization of data science skills more generally. Norma is CEO of EmpiricaLab, where her team's mission is to bridge work and training and empower healthcare teams to focus on what they care about the most: patient care. In a word, EmpiricaLab is a platform focused on peer learning and last-mile training for healthcare teams. As you'll discover, Norma's background is fascinating: she has a Ph.D. in health policy and management from Yale University and a master's degree in economics from Duke University (among other things), and has worked with multiple early stage digital health companies to accelerate their growth and scale. This is a wide ranging conversation about how and where learning actually occurs, particularly with respect to data science; we talk about how the worlds of economics and econometrics, including causal inference, can be used to make data science a more robust and less fragile field, and why these disciplines are essential to both public and health policy. It was really invigorating to talk about the data skills gaps that exist in organizations and how Norma's team at EmpiricaLab is thinking about solving them in the health space using a three-tiered solution of content creation, a social layer, and an information discovery platform. All of this is in service of a key question we're facing in this field: how do you get the right data skills, tools, and workflows in the hands of the people who need them, when the space is evolving so quickly? Links Norma's website (https://www.normapadron.com/) EmpiricaLab (https://www.empiricalab.com/) Norma on twitter (https://twitter.com/NormaPadron__)
2022-10-12
Link to episode

Episode 12: Data Science for Social Media: Twitter and Reddit

Hugo speaks with Katie Bauer (https://twitter.com/imightbemary) about her time working in data science at both Twitter and Reddit. At the time of recording, Katie was a data science manager at Twitter and prior to that, a founding member of the data team at Reddit. She's now Head of Data Science at Gloss Genius so congrats on the new job, Katie! In this conversation, we dive into what type of challenges social media companies face that data science is equipped to solve: in doing so, we traverse the differences and similarities in companies such as Twitter and Reddit, the major differences between being an early member of a data team and joining an established data function at a larger organization, the supreme importance of robust measurement and telemetry in data science, along with the mixed incentives for career data scientists, such as building flashy new things instead of maintaining existing infrastructure. I've always found conversations with Katie to be a treasure trove of insights into data science and machine learning practice, along with key learnings about data science management. In a word, Katie helps me to understand our space better. In this conversation, she told me that one important function data science can serve in any organization is creating a shared context for lots of different people in the org. We dive deep into what this actually means, how it can play out, traversing the world of dashboards, metric stores, feature stores, machine learning products, the need for top-down support, and much, much more.
2022-09-30
Link to episode

Episode 11: Data Science: The Great Stagnation

Hugo speaks with Mark Saroufim, an Applied AI Engineer at Meta who works on PyTorch, where his team's main focus is making it as easy as possible for people to deploy PyTorch in production outside Meta. Mark first came on our radar with an essay he wrote called Machine Learning: the Great Stagnation (https://marksaroufim.substack.com/p/machine-learning-the-great-stagnation), which was concerned with the stagnation in machine learning in academic research and in which he stated "Machine learning researchers can now engage in risk-free, high-income, high-prestige work. They are today's Medieval Catholic priests." This is just the tip of the iceberg of Mark's critical and often sociological eye and one of the reasons I was excited to speak with him. In this conversation, we talk about the importance of open source software in modern data science and machine learning and how Mark thinks about making it as easy to use as possible. We also talk about risk assessments in considering whether to adopt open source or not, the supreme importance of good documentation, and what we can learn from the world of video game development when thinking about open source. We then dive into the rise of the machine learning cult leader persona, in the context of examples such as Hugging Face and the community they've built. We discuss the role of marketing in open source tooling, along with for-profit data science and ML tooling, how it can impact you as an end user, and how much of data science can be considered differing forms of live action role playing and simulation. We also talk about developer marketing and content for data professionals and how some of the biggest names in ML research are those with gigantic Twitter followings, such as Andrej Karpathy. This is part of a broader trend in society about the skills that are required to capture significant mind share these days. If that's not enough, we jump into how machine learning ideally allows businesses to build sustainable and defensible moats, by which we mean the ability to maintain competitive advantages over competitors to retain market share. In between this interview and its release, PyTorch joined the Linux Foundation, which is something we'll need to get Mark back to discuss sometime.
Links The Myth of Objective Tech Screens (https://marksaroufim.substack.com/p/the-myth-of-objective-tech-screens) Machine Learning: The Great Stagnation (https://marksaroufim.substack.com/p/machine-learning-the-great-stagnation) Fear the Boom and Bust: Keynes vs. Hayek - The Original Economics Rap Battle! (https://www.youtube.com/watch?v=d0nERTFo-Sk) History and the Security of Property (https://archive.ph/dRXEK#selection-21.0-21.36) by Nick Szabo Mark on YouTube (https://www.youtube.com/marksaroufim) Mark's Substack (https://marksaroufim.substack.com/p/machine-learning-the-great-stagnation) Mark's Discord (https://discord.com/invite/drmuTjWZrm)
2022-09-16
Link to episode

Episode 10: Investing in Machine Learning

Hugo speaks with Sarah Catanzaro, General Partner at Amplify Partners, about investing in data science and machine learning tooling and where we see progress happening in the space. Sarah invests in the tools that we both wish we had earlier in our careers: tools that enable data scientists and machine learners to collect, store, manage, analyze, and model data more effectively. As you'll discover, Sarah identifies as a scientist first and an investor second and still believes that her mission is to enable companies to become data-driven and to generate ROI through machine and statistical learning. In her words, she's still that cuckoo kid who's ranting and raving about how data and AI will shift every tide. In this conversation, we talk about what scientific inquiry actually is and the elements of playfulness and seriousness it necessarily involves, and how it can be used to generate business value. We talk about Sarah's unorthodox path from a data scientist working in defense to her time at Palantir and how that led her to build out a data team and function for a venture capital firm and then to becoming a VC in the data tooling space. We then really dive into the data science and machine learning tooling space to figure out why it's so fragmented: we look to the data analytics stack and software engineering communities to find historical tethers that may be useful. We discuss the moving parts that led to the establishment of a standard, a system of record, and clearly defined roles in analytics and what we can learn from that for machine learning! We also dive into the development of tools, workflows, and division of labour as partial exercises in pattern recognition and how this can be at odds with the variance we see in the machine learning landscape, more generally! Two take-aways are that we need best practices and we need more standardization. We also discussed which conversations we're missing, given all our focus on tools, and Sarah was adamant that we need to be focusing on questions, not solutions, and even questioning what ML is useful for and what it isn't, diving into a bunch of thoughtful and nuanced examples. I'm also grateful that Sarah let me take her down a slightly dangerous and self-critical path where we riffed on both our roles in potentially contributing to the tragedy of the commons we're all experiencing in the data tooling landscape, me working in tool building, developer relations, and in marketing, and Sarah in venture capital.
2022-08-18
Link to episode

Episode 9: AutoML, Literate Programming, and Data Tooling Cargo Cults

Hugo speaks with Hamel Husain, Head of Data Science at Outerbounds, with extensive experience in data science consulting, at DataRobot, Airbnb, and GitHub. In this conversation, they talk about Hamel's early days in data science, consulting for a wide array of companies, such as Crocs, restaurants, and casinos in Las Vegas, diving into what data science even looked like in 2005 and how you could think about delivering business value using data and analytics back then. They talk about his trajectory in moving to data science and machine learning in Silicon Valley, what his expectations were, and what he actually found there. They then take a dive into AutoML, discussing what should be automated in machine learning and what shouldn't. They talk about software engineering best practices and what aspects it would be useful for data scientists to know about. They also got to talk about the importance of literate programming, notebooks, and documentation in data science and ML. All this and more! Links Hamel on twitter (https://twitter.com/HamelHusain) The Outerbounds documentation project repo (https://github.com/outerbounds/docs) Practical Advice for R in Production (https://www.rstudio.com/blog/practical-advice-for-r-in-production-answering-your-questions/) nbdev: Create delightful python projects using Jupyter Notebooks (https://nbdev.fast.ai/)
2022-07-19

Episode 8: The Open Source Cybernetic Revolution

Hugo speaks with Peter Wang, CEO of Anaconda, about what the value proposition of data science actually is, data not as the new oil but rather as toxic, nuclear sludge, the fact that data isn't real (and what we really have are frozen models), and the future promise of data science. They also dive into an experimental conversation around open source software development as a model for the development of human civilization, in the context of developing systems that prize local generativity over global extractive principles. If that's a mouthful, which it was, or an earful, which it may have been, all will be revealed in the conversation. Links Peter on twitter (https://twitter.com/pwang) Anaconda Nucleus (https://anaconda.cloud/) Jordan Hall on the Jim Rutt Show (https://www.jimruttshow.com/jordan-greenhall-hall/): Game B Meditations On Moloch (https://slatestarcodex.com/2014/07/30/meditations-on-moloch) -- On multipolar traps Here Comes Everybody: The Power of Organizing Without Organizations (https://en.wikipedia.org/wiki/Here_Comes_Everybody_(book)) by Clay Shirky Finite and Infinite Games (https://en.wikipedia.org/wiki/Finite_and_Infinite_Games) by James Carse Governing the Commons: The Evolution of Institutions for Collective Action (https://www.cambridge.org/core/books/governing-the-commons/7AB7AE11BADA84409C34815CC288CD79) by Elinor Ostrom Elinor Ostrom's 8 Principles for Managing A Commmons (https://www.onthecommons.org/magazine/elinor-ostroms-8-principles-managing-commmons) Haunted by Data (https://idlewords.com/talks/haunted_by_data.htm), a beautiful and mesmerising talk by Pinboard.in founder Maciej Ceglowski
2022-05-16

Episode 7: The Evolution of Python for Data Science

Hugo speaks with Peter Wang, CEO of Anaconda, about how Python became so big in data science, machine learning, and AI. They jump into many of the technical and sociological beginnings of Python being used for data science, a history of PyData, the conda distribution, and NumFOCUS. They also talk about the emergence of online collaborative environments, particularly with respect to open source, and attempt to figure out the moving parts of PyData and why it has had the impact it has, including the fact that many core developers were not computer scientists or software engineers, but rather scientists and researchers building tools that they needed on an as-needed basis. They also discuss the challenges in getting adoption for Python, the things that the PyData stack solves, those that it doesn't, and what progress is being made there. People who have listened to Hugo's podcasts for some time may have recognized that he's interested in the sociology of the data science space, and he considered speaking with Peter a fascinating opportunity to delve into how the Pythonic data science space evolved, particularly with respect to tooling, not only because Peter had a front row seat for much of it, but also because he was one of several key actors at various points. On top of this, Hugo wanted to give Peter's inner sociologist room to breathe and evolve in this conversation. What happens then is slightly experimental: Peter is a deep, broad, and occasionally hallucinatory thinker, and Hugo wanted to explore new spaces with him, so we hope you enjoy the experiments they play as they begin to discuss open-source software in the broader context of finite and infinite games and how OSS is a paradigm of humanity's ability to create generative, nourishing, and anti-rivalrous systems where, by anti-rivalrous, we mean things that become more valuable for everyone the more people use them!
But we need to be mindful of finite-game dynamics (for example, those driven by corporate incentives) co-opting and parasitizing the generative systems that we build. These are all considerations they delve far deeper into in Part 2 of this interview, which will be the next episode of VG, where we also dive into the relationship between OSS, tools, and venture capital, among many other things. Links Peter on twitter (https://twitter.com/pwang) Anaconda Nucleus (https://anaconda.cloud/) Calling out SciPy on diversity (even though it hurts) (https://ilovesymposia.com/2015/04/03/calling-out-scipy-on-diversity/) by Juan Nunez-Iglesias Here Comes Everybody: The Power of Organizing Without Organizations (https://en.wikipedia.org/wiki/Here_Comes_Everybody_(book)) by Clay Shirky Finite and Infinite Games (https://en.wikipedia.org/wiki/Finite_and_Infinite_Games) by James Carse Governing the Commons: The Evolution of Institutions for Collective Action (https://www.cambridge.org/core/books/governing-the-commons/7AB7AE11BADA84409C34815CC288CD79) by Elinor Ostrom Elinor Ostrom's 8 Principles for Managing A Commmons (https://www.onthecommons.org/magazine/elinor-ostroms-8-principles-managing-commmons)
2022-05-01

Episode 6: Bullshit Jobs in Data Science (and what to do about them)

Hugo speaks with Jacqueline Nolis, Chief Product Officer at Saturn Cloud (formerly Head of Data Science), about all types of failure modes in data science, ML, and AI, and they delve into bullshit jobs in data science (yes, that's a technical term, as you'll find out): they discuss the elements that are bullshit, the elements that aren't, and how to increase the ratio of the latter to the former. They also talk about her journey in moving from mainly working in prescriptive analytics, building reports in PDFs and PowerPoints, to deploying machine learning products in production. They delve into her move from doing data science to designing products for data scientists and how to think about choosing career paths. Jacqueline has been an individual contributor, a team lead, and a principal data scientist, so she has a lot of valuable experience here. They talk about her experience of transitioning gender while working in data science, and they work hard to find a bright vision for the future of this industry! Links Jacqueline on twitter (https://twitter.com/skyetetra) Building a Career in Data Science (https://jnolis.com/book/) by Jacqueline and Emily Robinson Saturn Cloud (https://saturncloud.io/) Why are we so surprised? (http://allendowney.blogspot.com/2016/11/why-are-we-so-surprised.html), a post by Allen Downey on communicating and thinking through uncertainty Data Mishaps Night! (https://datamishapsnight.com/) The Trump administration's "cubic model" of coronavirus deaths, explained (https://www.vox.com/2020/5/8/21250641/kevin-hassett-cubic-model-smoothing) by Matthew Yglesias Working Class Deep Learner (https://marksaroufim.substack.com/p/working-class-deep-learner?s=r) by Mark Saroufim
2022-04-04

Episode 5: Executive Data Science

Hugo speaks with Jim Savage, the Director of Data Science at Schmidt Futures, about the need for data science in executive training and decision-making, what data scientists can learn from economists, the perils of "data for good", and why you should always be integrating your loss function over your posterior. Jim and Hugo talk about what data science is and isn't capable of, what can actually deliver value, and what people really enjoy doing: the intersection in this Venn diagram is where we need to focus energy, and it may not be quite what you think it is! They then dive into Jim's thoughts on what he dubs Executive Data Science. You may be aware of the slicing of the data science and machine learning spaces into descriptive analytics, predictive analytics, and prescriptive analytics but, being the thought surgeon that he is, Jim proposes a different slicing into (1) tool building OR data science as a product, (2) tools to automate and augment parts of us, and (3) what Jim calls Executive Data Science. Jim and Hugo also talk about decision theory, the woeful state of causal inference techniques in contemporary data science, and what techniques it would behoove us all to import from econometrics and economics more generally. If that's not enough, they talk about the importance of thinking through the data generating process and the things that can go wrong if you don't. In terms of allowing your data work to inform your decision making, they also discuss Jim's maxim "ALWAYS BE INTEGRATING YOUR LOSS FUNCTION OVER YOUR POSTERIOR". Last but definitely not least, as Jim has worked in the data for good space for much of his career, they talk about what this actually means, with particular reference to fast.ai founder & QUT professor of practice Rachel Thomas's blog post called "Doing Data Science for Social Good, Responsibly" (https://www.fast.ai/2021/11/23/data-for-good/).
Rachel's post takes as its starting point the following words of Sara Hooker, a researcher at Google Brain: "Data for good" is an imprecise term that says little about who we serve, the tools used, or the goals. Being more precise can help us be more accountable & have a greater positive impact. Jim and Hugo discuss his work in the light of these foundational considerations. Links Jim on twitter (https://twitter.com/abiylfoyp/) What Is Causal Inference? An Introduction for Data Scientists (https://www.oreilly.com/radar/what-is-causal-inference/) by Hugo Bowne-Anderson and Mike Loukides Jim's must-watch Data Council talk on Productizing Structural Models (https://www.datacouncil.ai/talks/productizing-structural-models) Mastering Metrics (https://www.masteringmetrics.com/) by Angrist and Pischke Mostly Harmless Econometrics: An Empiricist's Companion (https://press.princeton.edu/books/paperback/9780691120355/mostly-harmless-econometrics) by Angrist and Pischke The Book of Why (https://en.wikipedia.org/wiki/The_Book_of_Why) by Judea Pearl Decision-Making in a Time of Crisis (https://www.oreilly.com/radar/decision-making-in-a-time-of-crisis/) by Hugo Bowne-Anderson Doing Data Science for Social Good, Responsibly (https://www.fast.ai/2021/11/23/data-for-good/) by Rachel Thomas
2022-03-23

Episode 4: Machine Learning at T-Mobile

Hugo speaks with Heather Nolis, Principal Machine Learning Engineer at T-Mobile, about what data science, machine learning, and AI look like at T-Mobile, along with Heather's path from a software development intern there to principal ML engineer running a team of 15. They talk about: how to build a DS culture from scratch and what executive-level support looks like, as well as how to demonstrate machine learning value early on, from a Shark Tank-style pitch night to the initial investment through to the POC and building out the function; all the great work they do with R and the Tidyverse in production; what it's like to be a lesbian in tech, and what it was like to discover she was autistic and how that impacted her work; how to measure and demonstrate success and ROI for the org; some massive data science fails!; how to deal with execs wanting you to use the latest GPT-X in a fragmented tooling landscape; how to use the simplest technology to deliver the most value. Finally, the team just hired their first full-time ethicist, and they speak about how ethics can be embedded in a team and across an institution. Links Put R in prod (https://putrinprod.com/): Tools and guides to put R models into production Enterprise Web Services with Neural Networks Using R and TensorFlow (https://medium.com/tmobile-tech/enterprise-web-services-with-neural-networks-using-r-and-tensorflow-a09c1b100c11) Heather on twitter (https://twitter.com/heatherklus) T-Mobile is hiring! (https://www.t-mobile.com/careers) Hugo's upcoming fireside chat and AMA with Hilary Parker about how to actually produce sustainable business value using machine learning and product management for ML! (https://www.eventbrite.com/e/select-ml-project-where-value-is-not-null-tickets-284000161127?aff=hba)
2022-03-10

Episode 3: Language Tech For All

Rachael Tatman is a senior developer advocate for Rasa, where she's helping developers build and deploy ML chatbots using their open source framework. Rachael has a PhD in Linguistics from the University of Washington, where her research was on computational sociolinguistics, or how our social identity affects the way we use language in computational contexts. Previously she was a data scientist at Kaggle, and she's still a Kaggle Grandmaster. In this conversation, Rachael and I talk about the history of NLP and conversational AI/chatbots, and we dive into the fascinating tension between rule-based techniques and ML and deep learning. We also talk about how to incorporate machine and human intelligence together by thinking through questions such as "should a response to a human ever be automated?" Spoiler alert: the answer is a resounding NO WAY! In this journey, something that becomes apparent is that many of the trends, concepts, questions, and answers, although framed for NLP and chatbots, are applicable to much of data science more generally. We also discuss the data scientist's responsibility to end-users and stakeholders using, among other things, the lens of considering those whose data you're working with to be data donors. We then consider what globalized language technology looks like and can look like, and what we can learn from the history of science here, particularly given that so much training data and so many models are in English when it accounts for so little of language spoken globally. Links Rachael's website (https://www.rctatman.com/) Rasa (https://rasa.com/) Speech and Language Processing (https://web.stanford.edu/~jurafsky/slp3/) by Dan Jurafsky and James H. Martin
Masakhane (https://twitter.com/MasakhaneNLP), putting African languages on the #NLP map since 2019 The Distributed AI Research Institute (https://www.dair-institute.org/), a space for independent, community-rooted AI research, free from Big Tech's pervasive influence The Algorithmic Justice League (https://www.ajl.org/), unmasking AI harms and biases Black in AI (https://blackinai.github.io/#/), increasing the presence and inclusion of Black people in the field of AI by creating space for sharing ideas, fostering collaborations, mentorship and advocacy Hugo's blog post on his new job and why it's exciting for him to double down on helping scientists do better science (https://outerbounds.com/blog/hba-excited-to-join-metaflow-and-outerbounds/)
2022-03-01

Episode 2: Making Data Science Uncool Again

Jeremy Howard is a data scientist, researcher, developer, educator, and entrepreneur. Jeremy is a founding researcher at fast.ai, a research institute dedicated to making deep learning more accessible. He is also a Distinguished Research Scientist at the University of San Francisco, the chair of WAMRI, and Chief Scientist at platform.ai. In this conversation, we'll be talking about the history of data science, machine learning, and AI, where we've come from and where we're going, and how new techniques can be applied to real-world problems, whether it be applying deep learning to medicine or porting techniques from computer vision to NLP. We'll also talk about what's present and what's missing in the ML skills revolution, what software engineering skills data scientists need to learn, how to cope in a space of such fragmented tooling, and paths for emerging out of the shadow of FAANG. If that's not enough, we'll jump into how spreading DS skills around the globe involves serious investments in education, building software, communities, and research, along with diving into the social challenges that the information age and the AI revolution (so to speak) bring with them. But to get to all of this, you'll need to listen to a few minutes of us chatting about chocolate biscuits in Australia! Links * fast.ai · making neural nets uncool again * nbdev: create delightful python projects using Jupyter Notebooks (https://github.com/fastai/nbdev) * The fastai book, published as Jupyter Notebooks (https://github.com/fastai/fastbook) * Deep Learning for Coders with fastai and PyTorch (https://www.oreilly.com/library/view/deep-learning-for/9781492045519/) * The wonderful and terrifying implications of computers that can learn (https://www.youtube.com/watch?v=t4kyRyKyOpo) -- Jeremy's awesome TED talk! * Manna (https://marshallbrain.com/manna) by Marshall Brain * Ghost Work (https://ghostwork.info/) by Mary L. Gray and Siddharth Suri * Uberland (https://www.ucpress.edu/book/9780520324800/uberland) by Alex Rosenblat
2022-02-21