Systems will be ranked according to the following metrics, calculated from the submitted score file:
The primary metric for ranking is the Equal Error Rate (EER). Results will be reported both on the pooled trials and on four trial-pair subsets, to assess overall robustness as well as language-specific performance. In the evaluation phase, there are two different trial pair lists, and all participants must score and submit results for both (described below).
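For concreteness, here is a minimal sketch of how EER can be computed from trial scores and target/non-target labels. This is illustrative only, not the official scoring tool; the function name and the use of NumPy are our own choices.

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate: the operating point where the miss rate equals
    the false-alarm rate. `scores` are similarity scores; `labels` are
    1 for target (same-speaker) trials and 0 for non-target trials."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sweep candidate thresholds over the sorted scores.
    order = np.argsort(scores)
    sorted_labels = labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # With the threshold just above the i-th sorted score, trials 0..i
    # are rejected. Miss rate: rejected targets / total targets.
    miss = np.cumsum(sorted_labels) / n_target
    # False-alarm rate: accepted non-targets / total non-targets.
    fa = (n_nontarget - np.cumsum(1 - sorted_labels)) / n_nontarget

    # EER is taken at the crossing point of the two curves.
    idx = np.argmin(np.abs(miss - fa))
    return (miss[idx] + fa[idx]) / 2

# Toy usage: 1 = target trial, 0 = non-target trial.
scores = [0.91, 0.73, 0.42, 0.38, 0.15]
labels = [1, 1, 0, 1, 0]
print(f"EER: {compute_eer(scores, labels):.2%}")
```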
The secondary metric is the Minimum Detection Cost Function (minDCF), which provides a cost-weighted combination of false acceptance and false rejection errors. It is defined as:
DCF(θ) = Cmiss × Pmiss(θ) × Ptar + Cfa × Pfa(θ) × (1 − Ptar)
where:
Cmiss is the cost of a miss (false rejection) and Cfa is the cost of a false alarm (false acceptance),
Pmiss(θ) and Pfa(θ) are the miss and false alarm rates at decision threshold θ,
Ptar is the prior probability of a target (same-speaker) trial.
The minimum value of this function over all decision thresholds θ is reported as minDCF.
Evaluation Setting: We set Cmiss = Cfa = 1.0 and Ptar = 0.01. Under this configuration, minDCF serves as the secondary metric, used for tie-breaking and for more comprehensive performance analysis.
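The sketch below applies the DCF formula above at every candidate threshold and returns the minimum, with the challenge's evaluation setting as defaults. Again, this is an illustrative implementation under assumed input conventions, not the official scorer.

```python
import numpy as np

def compute_min_dcf(scores, labels, c_miss=1.0, c_fa=1.0, p_tar=0.01):
    """Minimum over all thresholds of
    DCF(t) = Cmiss * Pmiss(t) * Ptar + Cfa * Pfa(t) * (1 - Ptar),
    with the challenge setting Cmiss = Cfa = 1.0, Ptar = 0.01 as defaults."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    order = np.argsort(scores)
    sorted_labels = labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Pmiss and Pfa at every candidate threshold between sorted scores,
    # prepending the "accept everything" endpoint (Pmiss = 0, Pfa = 1);
    # the final cumsum entry covers the "reject everything" endpoint.
    p_miss = np.concatenate(([0.0], np.cumsum(sorted_labels) / n_target))
    p_fa = np.concatenate(
        ([1.0], (n_nontarget - np.cumsum(1 - sorted_labels)) / n_nontarget)
    )

    dcf = c_miss * p_miss * p_tar + c_fa * p_fa * (1 - p_tar)
    return dcf.min()
```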
The evaluation will be conducted with a focus on system performance, fairness, and adherence to the challenge protocol. The official test data, along with the two trial pair lists for the final evaluation, will be clearly defined and documented on the challenge website.
Important: Information about the language of each utterance will not be disclosed during evaluation to ensure fair assessment of language-independent systems.
The development trial pairs cover four types of pairs to help participants understand how well their systems distinguish between speakers versus languages:
Target, same language: enrollment and test utterances from the same speaker in the same language.
Target, different language: enrollment and test utterances from the same speaker in different languages.
Non-target, same language: enrollment and test utterances from different speakers in the same language.
Non-target, different language: enrollment and test utterances from different speakers in different languages.
These conditions allow participants to assess how well their system distinguishes between speakers and how strongly it is influenced by language differences.
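On the development trials, where the pair type is known, per-condition EERs can be compared against the pooled EER. A minimal sketch, reusing compute_eer from above and assuming a hypothetical per-trial same_language annotation:

```python
import numpy as np

def eer_by_language_condition(scores, labels, same_language):
    """Report pooled, same-language, and cross-language EER.
    `same_language` is a hypothetical boolean annotation, available
    only for the development trials."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    same_language = np.asarray(same_language, dtype=bool)

    return {
        "pooled": compute_eer(scores, labels),
        "same-language": compute_eer(scores[same_language],
                                     labels[same_language]),
        "cross-language": compute_eer(scores[~same_language],
                                      labels[~same_language]),
    }
```

A cross-language EER that is much higher than the same-language EER suggests the system is leaning on language cues rather than speaker identity.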
In the evaluation phase, participants will be provided with two different trial pair lists; all participants must score and submit results for both:
Trial List 1: Contains enrollment utterances from seen languages (languages present in the training and development data) and test utterances from unseen languages (languages not present in the training and development data).
Trial List 2: Contains enrollment and test utterances from unseen languages only. This list includes 38 unseen languages that are not present in the training and development data.
These trial pair structures are designed to evaluate the ability of systems to eliminate language effects and perform robust speaker verification across languages, including languages that were not encountered during training.
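As a convenience, a submission loop over both trial lists might look like the sketch below. The trial-list and score-file formats assumed here (one whitespace-separated pair per line, with an appended score on output) and the file names are illustrative assumptions only; the authoritative formats are defined on the challenge website.

```python
import random

def score_pair(enroll_utt, test_utt):
    # Placeholder: replace with your system's similarity score for the
    # given enrollment/test utterance pair.
    return random.random()

def write_score_file(trial_list_path, score_path):
    # Assumed formats: trial list has "enroll test" per line; the score
    # file mirrors each line with the system score appended.
    with open(trial_list_path) as fin, open(score_path, "w") as fout:
        for line in fin:
            enroll, test = line.split()
            fout.write(f"{enroll} {test} {score_pair(enroll, test):.6f}\n")

# One score file per evaluation trial list (hypothetical file names).
for name in ("trial_list_1", "trial_list_2"):
    write_score_file(f"{name}.txt", f"scores_{name}.txt")
```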
For information about the official baseline systems, please see the Baseline Systems page.