42 episodes • Length: 35 min • Irregular
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.
SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.
SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.
For inquiries contact [email protected]
The podcast The Data Engineering Show is created by The Firebolt Data Bros. The podcast and its artwork are embedded on this page using the public podcast feed (RSS).
SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea.
In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization.
Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.
Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture.
Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.
Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Mercado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods.
Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department.
They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded.
If you consider yourself a hardcore engineer, this episode is for you.
There are two types of data influencers on LinkedIn:
1. Those who talk directly about the products and companies they work for
2. Those who provide more general guidance, tips and opinions
Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy?
We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode.
Megan is one of those influencers that combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks.
She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.
Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, the guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium.
She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important.
Without software engineering skills you’ll be limited to the rigid capabilities of your stack. But without data engineering skills you’ll find it hard to be cost effective and see the bigger picture.
Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually introduce knowledge. It’s no longer just about the data volume, it’s about quality and relevance.
After co-writing the best-selling book ‘Fundamentals of Data Engineering’, Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don’t really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? How can we focus more on things like data governance and data quality that’ll actually push the industry forward?
As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.
As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.
When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling? With over 20 years of experience in the data space, leading engineering teams at Cisco, Oracle, Greenplum, and now as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.
How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading the knowledge in EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling are more important than the actual tech, why data engineering is going to see more job growth than data science, and what brought him to start creating content, reaching over 250K followers on LinkedIn.
Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo, talk about building resilient self-service products that keep customers happy and engineers calm.
They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.
Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate in a world where so many different teams are accessing it.
Weichen Wang, Senior Engineering Manager at Amplitude, came to meet the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 trillion real-time events while dealing with mutable data and massive scale.
Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days and walks Boaz through his experience with the platform: how on one hand it handled 2B events per minute, but on the other required rollups that compromised granularity when extending time windows.
Besides a ClickHouse review from a practitioner’s point of view, Sudeep tells us about interesting use-cases he’s working on at Salesforce.
According to Maxime Beauchemin, CEO & Founder at Preset and Creator of Apache Superset and Apache Airflow, it's not so straightforward to understand what you're really getting into and the vastness of the skills that are required in order to build a thriving company.
Picking the right system and services is key for a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.
Plus, Max walks the bros through the genesis of Airflow, Superset & Presto, and Airflow's old school marketing approach that won the hearts of developers across the world. And just like the Terminator, once the machine takes over, you can't stop it.
According to Yoav Shmaria, VP R&D Platform at Similarweb, the best way to manage data warehouse costs is to tag every table, database or ETL running to have good granularity over every feature.
Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics.
Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there’s no Firebolt talk in this episode.
Klarna is one of the leading fintech companies in the world, valued at $45B.
While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna’s Lead Data Engineer, tells Boaz what this new modernized stack looks like.
Archana shares Eventbrite’s data stack modernization process, and how you get engineers to adopt new technologies like dbt which may be outside their comfort zone.
Growing from a startup to an IPOed and then an acquired company meant that Slack’s sales org was scaling rapidly.
Apun Hiran, Slack’s Director of Software Engineering, explains how the data stack and architecture evolved to support this growth with more reliable and timely metrics.
Speaker: Apun Hiran, Director of Software Engineering (Data), Slack
Hosts: Eldad and Boaz Farkash, CEO and CPO, Firebolt
Why would you create ugly data? According to Jens Larsson, don’t even go near raw data. Jens started off at Google, continued to manage data science at Spotify, caught the startup bug at Tink, and recently joined an exciting new company called Ark Kapital, together with Spotify’s former VP Analytics. Jens explains how he and his team killed the notion of raw data at Tink and walks us through the Google, Spotify and Ark Kapital data stacks.
This time on The Data Engineering Show, Eldad abandoned his brother Boaz, but it’s OK because Boaz got the full 30 minutes to talk to one of the most interesting people in the data space.
Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data – Data Engineering Weekly. He talked about data applications at Zendesk and how they’re built, technologies that excite him like data lineage and data catalog, and the best routes for software engineers to get their hands dirty in the data world.
INTERVIEWER: Boaz Farkash.
ZENDESK GUEST: Ananth Packkildurai - Principal Software Engineer.
Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs.
The Data Bros met Yarin Benado, Gong’s engineering manager, to understand what is required to move to a modern data stack to support all this, what this stack looks like, and why it all comes down to data quality at the end of the day.
Bolt's ride-hailing app serves 2B users in Europe and Africa and handles 500K queries every day.
Erik Heintare along with Bolt's engineering team is in the midst of designing a new next-gen data platform and is sharing how it's going to solve their biggest data challenges.
Guest: Erik Heintare - Senior Analytics Engineer at Bolt
Hosts: Eldad and Boaz Farkash, AKA The Data Bros
Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers. Want to see how Agoda did it?
Guests:
Amir Arad, Director of Machine Learning, Agoda
Shaun Sit, Senior Dev Manager, Agoda
Hosts:
The Data Bros - Eldad and Boaz Farkash
How does the Vimeo data team deal with 2 PBs of data and 85B events per month? What made them recently build a data ops team? What data tool does the team love? And why (the hell) did they call their legacy platform Fatal Attraction?
Guest: Lior Solomon, VP Data Engineering at Vimeo.