DataTalks.Club

Dataset Creation and Curation - Christiaan Swart

56 min • 9 September 2022

We talked about:

  • Christiaan’s background
  • Usual ways of collecting and curating data
  • Getting the buy-in from experts and executives
  • Starting an annotation booklet
  • Pre-labeling
  • Dataset collection
  • Human level baseline and feedback
  • Using the annotation booklet to boost annotation productivity
  • Putting yourself in the shoes of annotators (and measuring performance)
  • Active learning
  • Distant supervision
  • Weak labeling
  • Dataset collection in career positioning and project portfolios
  • IPython widgets
  • GDPR compliance and non-English NLP
  • Finding Christiaan online


Links:

  • My personal blog: https://useml.net/
  • Comtura, my company: https://comtura.ai/
  • LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
  • Twitter: https://twitter.com/swartchris8/


ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html
