We talked about:
- Christiaan’s background
- Usual ways of collecting and curating data
- Getting buy-in from experts and executives
- Starting an annotation booklet
- Pre-labeling
- Dataset collection
- Human-level baseline and feedback
- Using the annotation booklet to boost annotation productivity
- Putting yourself in the shoes of annotators (and measuring performance)
- Active learning
- Distant supervision
- Weak labeling
- Dataset collection in career positioning and project portfolios
- IPython widgets
- GDPR compliance and non-English NLP
- Finding Christiaan online
Links:
- My personal blog: https://useml.net/
- Comtura, my company: https://comtura.ai/
- LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
- Twitter: https://twitter.com/swartchris8/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html