Two-Phase Challenge Submission

AMOS22 is a two-phase challenge. In the first phase (validation phase), participants are required to submit the output of their algorithms as a single compressed zip file via the grand-challenge.org submission system. Each team is allowed to create 3 submissions per day between May 10, 2022, noon and July 15, 2022, 11:59 p.m. for this phase. The submitted zip file should be formatted as shown below. Make sure the predictions in the submitted zip file match the validation images one-to-one (100 predictions for Task 1, 120 predictions for Task 2); otherwise the submission is considered invalid and no score will be generated.

results/
├─ amos_0008.nii.gz
├─ ...
├─ amos_xxxx.nii.gz

For the final phase (testing phase), participants will be requested to submit their algorithm in the form of a Docker container before July 20, 2022 (Anywhere on Earth). We believe this will enable a more thorough confirmation of reproducibility. After the competition, the Docker containers will be released with the consent of the corresponding participants.
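As a rough illustration of what a containerized entry point might look like, here is a minimal sketch assuming the common grand-challenge.org layout of reading images from /input and writing predictions to /output; these paths, and the segment placeholder, are assumptions, and the official packing guidance (see the More section) is authoritative.

import os
import SimpleITK as sitk

INPUT_DIR, OUTPUT_DIR = "/input", "/output"  # assumed container mount points

def segment(image):
    # Placeholder for the team's actual model inference.
    raise NotImplementedError

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    for name in sorted(os.listdir(INPUT_DIR)):
        if not name.endswith(".nii.gz"):
            continue
        image = sitk.ReadImage(os.path.join(INPUT_DIR, name))
        mask = segment(image)  # label map with the same geometry as the image
        sitk.WriteImage(mask, os.path.join(OUTPUT_DIR, name))

if __name__ == "__main__":
    main()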

Metrics

Two classical medical segmentation metrics, the Dice Similarity Coefficient (DSC) and the Normalized Surface Dice (NSD), will be used to assess different aspects of the performance of the segmentation methods. The metrics computation code can be found here.
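For orientation, here is a minimal sketch of both metrics under their standard definitions, computed per label on binary masks; voxel spacing and the NSD tolerance (in mm) are parameters. This is an illustration only, and the official code linked above is authoritative.

import numpy as np
from scipy import ndimage

def dice(pred, gt):
    # DSC = 2|A ∩ B| / (|A| + |B|) for binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, gt).sum() / denom

def nsd(pred, gt, tolerance_mm, spacing):
    # Fraction of surface voxels of each mask lying within tolerance_mm
    # of the other mask's surface, pooled over both surfaces.
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_surf = pred ^ ndimage.binary_erosion(pred)
    gt_surf = gt ^ ndimage.binary_erosion(gt)
    # Distance (in mm) from every voxel to the nearest surface voxel.
    dist_to_gt = ndimage.distance_transform_edt(~gt_surf, sampling=spacing)
    dist_to_pred = ndimage.distance_transform_edt(~pred_surf, sampling=spacing)
    overlap = ((dist_to_gt[pred_surf] <= tolerance_mm).sum()
               + (dist_to_pred[gt_surf] <= tolerance_mm).sum())
    total = pred_surf.sum() + gt_surf.sum()
    return overlap / total if total > 0 else 1.0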

Ranking Method

For the validation phase, the two metrics will be calculated for each label within each predicted label map of the cases in the validation set. The mean and standard deviation of each label will be calculated. Each metric will then be averaged over labels and cases (num_cases * 2 scores -> 2 scores), and the participating teams will be ranked based on both averaged metric scores.
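A minimal sketch of this aggregation, assuming per-case, per-label scores are stored in a NumPy array; the array layout and the 15-label count are illustrative, not the official evaluation code.

import numpy as np

# scores[m, c, l]: metric m (0 = DSC, 1 = NSD) for case c and label l.
rng = np.random.default_rng(0)
scores = rng.uniform(0.6, 1.0, size=(2, 100, 15))  # 2 metrics, 100 cases, 15 labels

per_label_mean = scores.mean(axis=1)        # (2, 15): mean of each label
per_label_std = scores.std(axis=1)          # (2, 15): std of each label
team_scores = scores.mean(axis=(1, 2))      # (2,): one averaged score per metric
print(team_scores)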

For the test phase, the two metrics will again be calculated for each label within each test case and then averaged over labels. For each metric, the participating algorithms will be ranked for each case, with the best score receiving the best rank. The final leaderboard place will be determined by averaging the places across these per-case rankings, a scheme known as "rank-then-aggregate."
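A minimal sketch of rank-then-aggregate, assuming per-case scores per team have already been averaged over labels; tie handling in the official evaluation may differ (argsort breaks ties arbitrarily here).

import numpy as np

# case_scores[m, c, t]: metric m, case c, team t (averaged over labels).
rng = np.random.default_rng(1)
case_scores = rng.uniform(0.6, 1.0, size=(2, 200, 5))  # 2 metrics, 200 cases, 5 teams

# Higher score -> better (lower) rank, per metric and per case.
order = np.argsort(-case_scores, axis=2)
ranks = np.empty_like(order)
np.put_along_axis(ranks, order,
                  np.arange(case_scores.shape[2])[None, None, :] + 1, axis=2)

final_rank_score = ranks.mean(axis=(0, 1))  # average rank per team
leaderboard = np.argsort(final_rank_score)  # best (lowest) average rank first
print(final_rank_score, leaderboard)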

More

Note that, regarding the validation set, all predictions must be present in the submitted zip file. Incomplete submissions will not be evaluated and will be counted as invalid. For the final test phase, the participating teams' algorithms must be submitted as Docker containers, and submissions that fail to output the expected results will be considered invalid. Guidance on how to pack the developed algorithm into a Docker container will be provided here.