We talked about:
- Andreas’s background
- Why data engineering is becoming more popular
- Who to hire first – a data engineer or a data scientist?
- How can I, as a data scientist, learn to build pipelines?
- Don’t use too many tools
- What is a data pipeline and why do we need it?
- What is ingestion?
- Can just one person build a data pipeline?
- Approaches to building data pipelines for data scientists
- Processing frameworks
- Common setup for data pipelines – car price prediction
- Productionizing the model with the help of a data pipeline
- Scheduling
- Orchestration
- Start simple
- Learning DevOps to implement data pipelines
- How to choose the right tool
- Are Hadoop, Docker, and Cloud necessary for a first job/internship?
- Is Hadoop still relevant or necessary?
- Data engineering academy
- How to pick up Cloud skills
- Avoid huge datasets when learning
- Convincing your employer to do data science
- How to find Andreas
Links:
- LinkedIn: https://www.linkedin.com/in/andreas-kretz
- Data engineering cookbook: https://cookbook.learndataengineering.com/
- Course: https://learndataengineering.com/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html