title:: The STOIC2021 COVID-19 AI challenge: Applying reusable training methodologies to private data
publisher:: Elsevier
people:: Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schön, Katja Ludwig, Rainer Lienhart, Simon Jégou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Müller, Silvan Mertes, Niklas Schröter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matías Nicolás Bossa, Abel Díaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis, Ngoc Dung Huynh, Imran Razzak, Reda Bouadjenek, Mario Verdicchio, Pasquale Borrelli, Marco Aiello, James A. Meakin, Alexander Lemm, Christoph Russ, Razvan Ionasec, Nikos Paragios, Bram van Ginneken, Marie-Pierre Revel
organization:: Institute for AI in Medicine (IKIM), University Hospital Essen, NVIDIA, Cancer Research Center Cologne Essen (CCCE), West German Cancer Center Essen, German Cancer Consortium (DKTK), TU Dortmund
domain:: Medical Imaging, Machine Learning, Clinical Applications, Biomedical
link:: https://doi.org/10.1016/j.media.2024.103230
Summary
The STOIC2021 challenge used the Type Three (T3) challenge format, in which teams submit codebases that the organizers train on private data, so that the resulting training methodologies are reusable. The task was to predict from CT scans whether COVID-19 patients would experience a severe outcome (intubation or death within one month). The challenge had two phases: a Qualification phase, in which teams developed solutions using public data, and a Final phase, in which teams submitted codebases to be trained on private data. The organizers successfully trained six of the eight Final phase submissions. The winning solution achieved an AUC of 0.815 on the private test set for discriminating severe from non-severe COVID-19. The Final phase solutions outperformed the same teams’ Qualification phase solutions, demonstrating the benefit of training on combined public and private data. The submitted codebases were publicly released, enabling third parties to use the training methods.
Data Points
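- The STOIC2021 challenge implemented the Type Three (T3) format, which allows solutions to be trained on private data
- The task was to predict severe COVID-19 (intubation or death within 1 month) from CT scans
- The challenge had a Qualification phase (teams developed solutions on public data) and a Final phase (organizers trained the submitted codebases on private data)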
- 2,000 CT scans with labels publicly released under CC-BY-NC 4.0 license
- 8 teams submitted codebases in the Final phase; the organizers successfully trained 6 of them
- Winning solution achieved AUC of 0.815 for severe COVID-19 prediction on private test set
- Final phase solutions improved over teams’ Qualification phase solutions
- Finalist codebases for training and inference publicly released under permissive licenses
- The released codebases provide reusable training methods that can aid clinical tool development for COVID-19 triage
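
The AUC reported for the winning solution is the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case, with ties counted as half. The following is a minimal sketch of that computation in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration; they are not challenge data, and this is not the organizers' evaluation code.

```python
def roc_auc(labels, scores):
    """Return the AUC: the fraction of (severe, non-severe) pairs in which
    the severe case has the higher score, counting ties as half a win."""
    severe = [s for y, s in zip(labels, scores) if y == 1]
    non_severe = [s for y, s in zip(labels, scores) if y == 0]
    if not severe or not non_severe:
        raise ValueError("need at least one severe and one non-severe case")

    wins = 0.0
    for p in severe:
        for n in non_severe:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(severe) * len(non_severe))


# Hypothetical toy data: 1 = severe outcome, 0 = non-severe outcome.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; for a test set of realistic size a rank-based implementation such as scikit-learn's `roc_auc_score` would be the usual choice.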