Here you can access the training data, development data, and trial pairs needed to participate in the challenge.
The complete dataset package containing both training and development data for the TidyVoice2026 Challenge.
Contents:
Download:
Click the link below to access the dataset page or download the API script:
Available via Mozilla Data Collective
Trial pairs file containing the evaluation protocol for the development set.
Click the link below to download the trial pairs file:
🔗 Download Trial Pairs (trial_pairs_dev.txt)
Available via Google Drive
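Since the column layout of trial_pairs_dev.txt is not described on this page, it may help to inspect the file before wiring it into a scoring pipeline. The following is a minimal sketch, assuming only that the file is UTF-8 text with one trial per line and whitespace-separated fields; that assumption, and the helper names `read_trials` and `field_count_summary`, are illustrative and not part of the challenge tooling.

```python
from collections import Counter


def read_trials(path):
    """Return each non-empty line as a tuple of whitespace-separated fields.

    Assumption: fields are separated by whitespace. If the real file uses
    another delimiter, or paths contain spaces, adjust the split accordingly.
    """
    trials = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = tuple(line.split())
            if fields:  # skip blank lines
                trials.append(fields)
    return trials


def field_count_summary(trials):
    """Count how many lines have each number of fields."""
    return Counter(len(fields) for fields in trials)
```

Calling `field_count_summary(read_trials("trial_pairs_dev.txt"))` shows whether every line has the same number of columns, which is a quick way to catch a truncated download or a misread format before computing any scores.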
The evaluation dataset will be released closer to the evaluation phase of the challenge.
Trial files for the official evaluation phase will be made available here.
Registration Required: Please complete the registration process before downloading the dataset.
pip install datacollective
Download Using Python Script:
Download the download_tidyvoice.py script from the dataset download section above, then:
Replace YOUR_API_KEY_HERE with your Mozilla Data Collective API key
Set OUTPUT_DIR to your desired download location
Run python download_tidyvoice.py

The dataset is organized with speakerID folders directly inside each dataset folder. Each speaker folder contains languageID subfolders holding that speaker's audio files in that language.
TidyVoiceX_Train/Dev
├── speaker_001/
│ ├── en/ # English recordings
│ │ ├── file1.wav
│ │ ├── file2.wav
│ │ └── ...
│ ├── fa/ # Persian recordings
│ │ ├── file1.wav
│ │ └── ...
│ └── fr/ # French recordings
│ └── ...
├── speaker_002/
│ ├── de/ # German recordings
│ ├── it/ # Italian recordings
│ └── ...
└── ...
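The layout above can be walked with a few lines of Python. This is a minimal sketch, assuming the downloaded folder follows the speaker/language/file.wav pattern shown in the tree and that audio files use the .wav extension; the function names `index_dataset` and `multilingual_speakers` are hypothetical helpers, not part of any official challenge script.

```python
from pathlib import Path


def index_dataset(root):
    """Map speaker ID -> language ID -> sorted list of .wav paths.

    Assumes the structure <root>/<speakerID>/<languageID>/<file>.wav.
    """
    index = {}
    for wav in sorted(Path(root).glob("*/*/*.wav")):
        language_id = wav.parent.name
        speaker_id = wav.parent.parent.name
        index.setdefault(speaker_id, {}).setdefault(language_id, []).append(wav)
    return index


def multilingual_speakers(index):
    """Return the speakers that have recordings in more than one language."""
    return sorted(s for s, langs in index.items() if len(langs) > 1)
```

For example, `index_dataset(OUTPUT_DIR)` on the extracted training folder gives a nested dictionary that can be used to count files per speaker or to check that the download is complete, and `multilingual_speakers` lists the speakers with recordings in two or more languages.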
Structure Explanation:
If you encounter any issues with the dataset download or have questions about the data format, please contact:
If you use the TidyVoice dataset in your research, please cite:
[Citation information will be provided upon dataset release]
Note: Evaluation Dataset links will be activated closer to the challenge start date. Please check back regularly for updates.