DataTalks.Club

Build Your Own Data Pipeline - Andreas Kretz

62 min • 2 July 2021

We talked about:

  • Andreas’s background
  • Why data engineering is becoming more popular
  • Who to hire first – a data engineer or a data scientist?
  • How can I, as a data scientist, learn to build pipelines?
  • Don’t use too many tools
  • What is a data pipeline and why do we need it?
  • What is ingestion?
  • Can just one person build a data pipeline?
  • Approaches to building data pipelines for data scientists
  • Processing frameworks
  • Common setup for data pipelines — car price prediction
  • Productionizing the model with the help of a data pipeline
  • Scheduling
  • Orchestration
  • Start simple
  • Learning DevOps to implement data pipelines
  • How to choose the right tool
  • Are Hadoop, Docker, Cloud necessary for a first job/internship?
  • Is Hadoop still relevant or necessary?
  • Data engineering academy
  • How to pick up Cloud skills
  • Avoid huge datasets when learning
  • Convincing your employer to do data science
  • How to find Andreas


Links:

  • LinkedIn: https://www.linkedin.com/in/andreas-kretz
  • Data engineering cookbook: https://cookbook.learndataengineering.com/
  • Course: https://learndataengineering.com/


Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html
