We talked about:
- Marysia’s background
- What data-centric AI is
- Data-centric Kaggle competitions
- The mindset shift to data-centric AI
- Data-centric does not mean you should not iterate on models
- How to implement the data-centric approach
- Focusing on the data vs focusing on the model
- Resources to help implement the data-centric approach
- Data-centric AI vs standard data cleaning
- Making sure your data is representative
- Knowing when your data is good enough
- The importance of user feedback
- “Shadow Mode” deployment
- What to do if you have a lot of bad or incomplete data
- Marysia’s role at PyData
- How Marysia joined PyData
- The difference between PyData and PyCon
- Finding Marysia online
Links:
- Embetter & Bulk Demo: https://www.youtube.com/watch?v=L---nvDw9KU
- Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
- Join DataTalks.Club: https://datatalks.club/slack.html
- Our events: https://datatalks.club/events.html