Impact AI

Curating Medical Image Datasets with Jie Wu from Segmed

17 min • August 28, 2023

The accelerated development of medical AI could be life-changing for patients. Unfortunately, accessing large amounts of diverse, standardized data has been a major stumbling block to progress. That’s where Segmed comes in, a platform that allows researchers to access diverse, high-quality, and de-identified medical imaging data. Crucially, Segmed’s platform also provides data for medical AI training and validation.

I am joined today by Segmed’s co-founder, Jie Wu, to discuss how they are solving key data issues to rapidly accelerate medical AI development. You’ll hear Jie break down some of the biggest challenges in curating medical image datasets, including the extra computational power needed to handle high-res medical images like CT scans, and how Segmed is addressing these obstacles. Jie also emphasizes the need for diversity when curating medical image datasets and the importance of mitigating bias during the data curation phase. To learn more about Segmed and how they are contributing to the development of medical AI, be sure to tune in today!


Key Points:

  • A warm welcome to Jie Wu, co-founder of Segmed.
  • Insight into how Segmed is solving data issues to accelerate medical AI development.
  • Why solutions to these data issues are crucial for medical research.
  • Segmed’s focus on medical imaging data.
  • Their approach to different imaging modalities.
  • An overview of the key challenges in curating medical image datasets.
  • How Segmed determines the amount of data they will need.
  • Best practices for curating a training set of medical images.
  • Why collecting a diverse range of images is essential.
  • An overview of how the quality of labels is assessed by experts.
  • How imaging modality influences Segmed’s approach to creating datasets.
  • The variations in datasets across different imaging pathologies.
  • Special considerations that inform the validation set versus the training set.
  • How bias manifests in models trained on medical images.
  • Steps that can be taken to mitigate bias during the data curation phase.
  • How the need for diverse datasets has increased along with greater awareness of bias.
  • Jie’s thoughts on the future of foundation models in the medical AI space.


Quotes:

“A high-resolution of CT can take up to several gigabytes of storage itself.” — Jie Wu


“I think the most important piece is actually to collect as diversely as possible. So I ask that given the budget limit or maybe time limit, the size of the data set will be limited but it should be at least representative of the target population and targeted practice.” — Jie Wu


“The best quality labels are curated by experts and it is curated by multiple experts.” — Jie Wu


“A 3D image stores much more information than the 2D images, so you need less data for that.” — Jie Wu


“The external validation datasets require much more carefully curated datasets and much higher quality labels, and also it needs to be representative of the population, of the institutions, and also geographical locations.” — Jie Wu


“We hope that we can enter into the development of AI and make these algorithms go to market faster and benefit more people.” — Jie Wu


Links:

Jie Wu on LinkedIn

Segmed

Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.

Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.

Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.

Computer Vision Advisory Services – Monthly advisory services to help you strategically plan your CV/ML capabilities, reduce the trial-and-error of model development, and get to market faster.
