De-identifying and anonymising PHI (protected/personal health information) in health records is one of the central pillars of AI success in healthcare. Without de-identified data, we cannot share records between hospitals, train models confidentially, or safely build large language models. Live from New Orleans, Dev and Doc are here to dive into this fascinating topic and to share our experience of building and deploying an AI model with over 99% recall for PHI redaction.
Dev and Doc is a podcast where developers and doctors join forces to deep-dive into AI in healthcare. Together, we can build models that matter.
👨🏻⚕️Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua...
🤖Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr
Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
00:00 start
00:52 intro
02:10 what is PHI? personal/private health information
07:00 approaches to de-identifying hospital records
09:55 the problem with over-redaction/anonymisation
11:33 using deep learning for anonymisation
14:13 our experience building an over-99%-recall model vs manual annotation
18:03 how to make a high-performing model - the art of annotation
24:49 Dev and Doc's annotation method (Zeljko et al.)
30:42 how do you prevent overfitting?
31:54 ensuring the model performs in new hospitals/environments
33:23 future
34:48 synthetic data
The podcast 🎙️
🔊Spotify: https://open.spotify.com/show/3QO5Lr3...
📙Substack: https://aiforhealthcare.substack.com/
🎞️ Editor - Dragan Kraljević - https://www.instagram.com/dragan_kral...
🎨 Brand design and art direction - Ana Grigorovici - https://www.behance.net/anagrigorovic...