Initialize a dataset
- soundata.initialize(dataset_name, data_home=None)[source]
Load a soundata dataset by name
Example
urbansound8k = soundata.initialize('urbansound8k')  # get the urbansound8k dataset
urbansound8k.download()  # download the dataset
urbansound8k.validate()  # validate the dataset
clip = urbansound8k.choice_clip()  # load a random clip
print(clip)  # see what data a clip contains
urbansound8k.clip_ids()  # load all clip ids
- Parameters:
dataset_name (str) – the dataset’s name see soundata.DATASETS for a complete list of possibilities
data_home (str or None) – path where the data lives. If None uses the default location.
- Returns:
Dataset – a soundata.core.Dataset object
Dataset Loaders
3D-MARCo
3D-MARCo Dataset Loader
Dataset Info
3D-MARCo: database of 3D sound recordings of musical performances and room impulse responses
- Created By:
- Hyunkook Lee, Dale Johnson, Bogdan Bacila. Centre for Audio and Psychoacoustic Engineering, University of Huddersfield.
Version 1.0.1
- Description:
3D-MARCo is an open-access database of 3D sound recordings of musical performances and room impulse responses. The recordings were made in the St. Paul’s concert hall in Huddersfield, UK. A total of 71 microphone capsules were used simultaneously. The main microphone arrays included in the database comprise PCMA-3D, OCT-3D, 2L-Cube, Decca Cuboid, First-order Ambisonics (FOA), Higher-order Ambisonics (HOA) and Hamasaki Square with height. In addition, ORTF, side/height, Voice of God and floor channels as well as a dummy head and spot microphones are included. The sound sources recorded are string quartet, piano trio, piano solo, organ, a cappella group, various single sources, and room impulse responses of a virtual ensemble with 13 source positions captured by all of the microphones. 3D-MARCo would be useful for spatial audio research, recording education, critical ear training, etc.
- Audio Files Included:
- For each musical performance sound source (Acappella, Organ, Piano Solo 1, Piano solo 2, Quartet, Trio), there are 65 wav files that correspond to:
64 individual capsules (24-bit / 96kHz resolution)
one 32-channel EigenMike file in A-format (24-bit / 48kHz resolution).
The piano recordings contain two more channels (left and right) that correspond to spot microphones placed just outside the piano pointing toward the hammers.
The quartet recordings contain four more channels corresponding to spot microphones placed above the instruments (violin 1, violin 2, cello, viola) pointing toward the F hole.
The trio recordings contain four more channels corresponding to spot microphones: two placed above the string instruments (violin, cello) pointing toward the F hole, and two placed just outside the piano pointing toward the hammers.
The single sources were recorded at 7 different azimuth angles. For each angle there are also 65 wav files.
The impulse responses were recorded at 13 different azimuth angles. For each angle there are 66 wav files. The extra one is the EigenMike 4th-order B-format ambisonics (ACN SN3D; 24-bit / 48kHz resolution).
- Annotations Included:
No event labels associated with this dataset
No predefined training, validation, or testing splits.
Angular orientation for “impulse responses” and “single sources” (following the ITU-R convention, where positive angles are on the left-hand side and negative angles on the right-hand side, e.g. +30° for Front Left and -30° for Front Right).
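Example (illustrative): a hypothetical helper, not part of soundata, showing how the ITU-R sign convention above maps a signed azimuth to a side:

```python
# Hypothetical helper illustrating the ITU-R angle convention used by the
# "single sources" and "impulse responses" annotations: positive azimuths
# lie on the left, negative on the right, and 0 degrees is front centre.
def describe_azimuth(angle_deg: float) -> str:
    """Return a human-readable description of a signed ITU-R azimuth."""
    if angle_deg == 0:
        return "front centre (0\u00b0)"
    side = "left" if angle_deg > 0 else "right"
    return f"{abs(angle_deg):g}\u00b0 {side}"

print(describe_azimuth(30))   # Front Left in the convention above
print(describe_azimuth(-30))  # Front Right in the convention above
```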
- Please Acknowledge 3D-MARCo in Academic Research:
If you use this dataset please cite its original publication:
Lee H, Johnson D. An open-access database of 3D microphone array recordings. In: Audio Engineering Society Convention 147; 2019 Oct 8. Audio Engineering Society.
- License:
CC BY-NC 3.0 license (free to share and adapt the material, but not permitted to use it for commercial purposes)
- class soundata.datasets.marco.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
3D-MARCo Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
source_label (str) – label of the source being recorded
source_angle (str) – angle of the source being recorded
audio_path (str) – path to the audio file
clip_id (str) – clip id
microphone_info (list) – list of strings with all relevant microphone metadata
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- class soundata.datasets.marco.Dataset(data_home=None)[source]
The 3D-MARCo dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
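Example (illustrative): a sketch of the kind of checksum validation download() performs, not soundata's internal code — an IOError is raised when a downloaded file's digest does not match the expected value:

```python
import hashlib

# Illustrative sketch (not soundata's implementation) of checksum
# validation: compare a file's MD5 digest against the expected value
# and raise IOError on mismatch, as download() does for remotes.
def validate_checksum(data: bytes, expected_md5: str) -> None:
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        raise IOError(f"Checksum mismatch: expected {expected_md5}, got {actual}")

payload = b"example archive bytes"
validate_checksum(payload, hashlib.md5(payload).hexdigest())  # passes silently
```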
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a 3D-MARCo audio file.
- Parameters:
fhandle (str or file-like) – file-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 48000 Hz by default, which resamples all files except the EigenMike ones, resulting in a constant sample rate across all clips in the dataset.
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.marco.load_audio(fhandle: BinaryIO, sr=48000) Tuple[numpy.ndarray, float] [source]
Load a 3D-MARCo audio file.
- Parameters:
fhandle (str or file-like) – file-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 48000 Hz by default, which resamples all files except the EigenMike ones, resulting in a constant sample rate across all clips in the dataset.
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
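Example (illustrative): the resample-on-load behaviour described above can be sketched as follows. The actual loader relies on an audio library; the linear interpolation below is a conceptual stand-in, not soundata's implementation:

```python
import numpy as np

# Conceptual sketch of resample-on-load: a 96 kHz capsule recording is
# brought to the 48 kHz default so all clips share one sample rate.
# Plain linear interpolation is used here for illustration only.
def resample(y: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    if orig_sr == target_sr:
        return y
    duration = len(y) / orig_sr
    n_out = int(round(duration * target_sr))
    t_in = np.arange(len(y)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, y)

y_96k = np.sin(2 * np.pi * 440 * np.arange(9600) / 96000)  # 0.1 s at 96 kHz
y_48k = resample(y_96k, 96000, 48000)
print(len(y_48k))  # half as many samples at 48 kHz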
DCASE23-Task2
DCASE23_Task2 Dataset Loader
Dataset Info
- Created By
- Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, and Masahiro Yasuda (Hitachi, Ltd. and NTT Corporation).
- Version
1.0
- Description
The DCASE 2023 Task 2 “First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring” dataset provides the operating sounds of seven real/toy machines: ToyCar, ToyTrain, Fan, Gearbox, Bearing, Slide rail, and Valve. Each recording is a single-channel, 10-second audio that includes both a machine’s operating sound and environmental noise. The dataset contains training clips containing normal sounds in the source and target domain and test clips of both normal and anomalous sounds.
- Audio Files Included
10,000 ten-second audio recordings for each machine type in WAV format. The raw directory contains recordings as WAV files, with the source/target domain and attributes provided in the file name.
- Meta-data Files Included
Attribute csv files accompany the audio files for easy access to attributes that cause domain shifts. Each file lists the file names, domain shift parameters, and the value or type of these parameters.
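Example (illustrative): reading an attribute CSV of the shape described above with the standard library. The column names and rows below are made up for illustration, not the dataset's exact schema:

```python
import csv
import io

# Hypothetical attribute CSV: file names plus a domain-shift parameter
# and its value, as described above. Column names are illustrative.
attribute_csv = io.StringIO(
    "file_name,d1p,d1v\n"
    "section_00_source_train_normal_0001.wav,vel,6\n"
    "section_00_target_train_normal_0002.wav,vel,10\n"
)
rows = list(csv.DictReader(attribute_csv))
by_file = {row["file_name"]: (row["d1p"], row["d1v"]) for row in rows}
print(by_file["section_00_source_train_normal_0001.wav"])  # ('vel', '6')
```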
- Please Acknowledge DCASE 2023 Task 2 in Academic Research
When the DCASE 2023 Task 2 dataset is used for academic research, we would highly appreciate it if scientific publications of works partly based on this dataset cite the following publications:
- Conditions of Use
The DCASE 2023 Task 2 dataset was created jointly by Hitachi, Ltd. and NTT Corporation. It is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
- Feedback
For any issues or feedback regarding the dataset, please reach out to:
Kota Dohi: kota.dohi.gr@hitachi.com
Keisuke Imoto: keisuke.imoto@ieee.org
Noboru Harada: noboru@ieee.org
Daisuke Niizumi: daisuke.niizumi.dt@hco.ntt.co.jp
Yohei Kawaguchi: yohei.kawaguchi.xk@hitachi.com
- class soundata.datasets.dcase23_task2.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
DCASE23_Task2 Clip class
- Parameters:
clip_id (str) – ID of the clip
- Variables:
audio (np.ndarray, float) – Array representation of the audio clip
audio_path (str) – Path to the audio file
file_name (str) – Name of the clip file, useful for cross-referencing
d1p (str) – First domain shift parameter specifying the attribute causing the domain shift
d1v (str) – First domain shift value or type associated with the domain shift parameter
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property d1p
The clip’s first domain shift parameter (d1p).
- Returns:
str - first domain shift parameter of the clip
- property d1v
The clip’s first domain shift value (d1v).
- Returns:
str - first domain shift value of the clip
- property file_name
The clip’s file name.
Used for cross-referencing with attribute CSV files for additional metadata.
- Returns:
str - name of the clip file
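Example (illustrative): the "Audio Files Included" section above notes that the source/target domain is encoded in the file name. A minimal parse of a hypothetical file name of the form section_<id>_<domain>_<split>_<label>_<index>.wav (the exact naming scheme is an assumption here):

```python
# Hypothetical parse of a DCASE-style file name; the field order assumed
# below (section, id, domain, split, label, index) is illustrative.
def parse_domain(file_name: str) -> str:
    parts = file_name.removesuffix(".wav").split("_")
    # parts: ["section", "00", "source", "train", "normal", "0001"]
    return parts[2]

print(parse_domain("section_00_source_train_normal_0001.wav"))  # source
```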
- class soundata.datasets.dcase23_task2.Dataset(data_home=None)[source]
The DCASE23_Task2 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a DCASE23_Task2 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.dcase23_task2.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float] [source]
Load a DCASE23_Task2 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
DCASE23-Task4B
DCASE23 Task 4B Dataset Loader
Dataset Info
- Created By:
- Annamaria Mesaros, Tuomas Heittola, and Tuomas Virtanen. Tampere University of Technology.
Version 1.0
- Description:
MAESTRO Real development contains 49 real-life audio files from 5 different acoustic scenes, each 3 to 5 minutes long. A further 26 files are held out for evaluation purposes in DCASE task 4B. The distribution of files per scene is as follows: cafe restaurant (10 files), city center (10), residential area (11), metro station (9), and grocery store (9). The total duration of the development dataset is 97 minutes and 4 seconds.
The audio files contain sounds from the following classes:
announcement
birds singing
brakes squeaking
car
cash register
children voices
coffee machine
cutlery/dishes
door opens/closes
footsteps
furniture dragging
The real-life recordings used in this dataset include a subset of TUT Sound Events 2016 and a subset of TUT Sound Events 2017.
- Please Acknowledge TUT Acoustic Scenes Strong Label Dataset in Academic Research:
- If you use this dataset, please cite the following paper: A. Mesaros, T. Heittola, and T. Virtanen, “TUT database for acoustic scene classification and sound event detection,” in 2016 24th European Signal Processing Conference (EUSIPCO), 2016, pp. 1128-1132.
- License:
- License permits free academic usage. Any commercial use is strictly prohibited. For commercial use, contact the dataset authors. Copyright (c) 2020 Tampere University and its licensors. All rights reserved.
Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the MAESTRO Real - Multi Annotator Estimated Strong Labels (“Work”) described in this document and composed of audio and metadata. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work (MAchine Listening Group at Tampere University) is acknowledged in any publication that reports research using this Work. Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use includes, but is not limited to:
selling or reproducing the Work
selling or distributing the results or content achieved by use of the Work
providing services by using the Work.
- Feedback:
For questions or feedback, please contact irene.martinmorato@tuni.fi.
- class soundata.datasets.dcase23_task4b.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
DCASE23_Task4B Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – audio signal and sample rate
audio_path (str) – path to the audio file
annotations_path (str) – path to the annotations file
clip_id (str) – clip id
events (soundata.annotations.Events) – sound events with start time, end time, label and confidence
split (str) – subset the clip belongs to: development or evaluation
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- events
The clip’s events.
- Returns:
annotations.Events - sound events with start time, end time, label and confidence
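Example (illustrative): the Events annotation pairs time intervals with labels and confidences. A plain-Python sketch of that structure (not the soundata.annotations.Events class itself), summing the annotated duration per label with made-up values:

```python
from collections import defaultdict

# Made-up event tuples in the shape described above:
# (start time in seconds, end time in seconds, label, confidence).
events = [
    (0.0, 1.5, "footsteps", 1.0),
    (2.0, 2.5, "car", 1.0),
    (3.0, 4.0, "footsteps", 1.0),
]

duration_per_label = defaultdict(float)
for start, end, label, _conf in events:
    duration_per_label[label] += end - start

print(dict(duration_per_label))  # {'footsteps': 2.5, 'car': 0.5}
```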
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property split
The clip’s split.
- Returns:
str – subset the clip belongs to: development or evaluation
- class soundata.datasets.dcase23_task4b.Dataset(data_home=None)[source]
The DCASE23_Task4B dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a DCASE23_Task4B audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 Hz without resampling.
- Returns:
np.ndarray - the stereo audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- load_clips()[source]
Load all clips in the dataset
- Returns:
dict – {clip_id: clip data}
- Raises:
NotImplementedError – If the dataset does not support Clips
- soundata.datasets.dcase23_task4b.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a DCASE23_Task4B audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 Hz without resampling.
- Returns:
np.ndarray - the stereo audio signal
float - The sample rate of the audio file
DCASE23-Task6a
DCASE 2023 Task-6A Dataset Loader
Dataset Info
- DCASE 2023 Task-6A
- Clotho (c) by K. Drossos, S. Lipping, and T. Virtanen. Clotho is licensed under the terms set by Tampere University and Creative Commons licenses for the audio files as per their origin from the Freesound platform. You should have received a copy of the license along with this work. Paper: “Clotho: an Audio Captioning Dataset,” ICASSP 2020.
- Created By:
- K. Drossos, S. Lipping, and T. Virtanen. Tampere University, Finland.
- Version 2.1.0
- Fixes for corrupted files and illegal characters. More details on version changes are available in the dataset repository.
- Description
Clotho is an audio captioning dataset, consisting of 6974 audio samples, each accompanied by five captions, totaling 34,870 captions.
Audio samples are 15 to 30 seconds in duration.
Captions are 8 to 20 words long.
Dataset splits: development, validation, and evaluation.
Detailed description and usage guidelines in the ICASSP 2020 paper and dataset repository.
- Audio Files Included
Development split: 3840 audio files (including 947 new files in version 2)
Validation split: 1046 new audio files
Evaluation split: No changes from version 1
File format: Single channel (mono), various bitrates and sample rates, WAV format.
- Caption Files Included
Clotho captions in CSV format for each dataset split.
Captions follow consistent word usage, no named entities or speech transcription.
Unique vocabulary across splits to prevent data leakage.
- Metadata Files Included
Accompanying metadata for each audio file, including file name, keywords, original URL, excerpt samples, uploader, and license link.
- Conditions of Use
- Dataset created by K. Drossos, S. Lipping, and T. Virtanen. Audio files under various Creative Commons licenses as per Freesound platform terms. Captions under Tampere University license, primarily non-commercial with attribution. Full details in the LICENSE file included with the dataset.
- Acknowledgment in Academic Research
- When using Clotho for academic research, please cite: K. Drossos, S. Lipping, and T. Virtanen, “Clotho: an Audio Captioning Dataset,” ICASSP 2020.
- Feedback and Contributions
- Feedback and contributions are welcome. Please contact the creators through the GitHub repository.
- class soundata.datasets.dcase23_task6a.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
DCASE’23 Task 6A Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – Audio signal and sample rate.
file_name (str) – Name of the file.
keywords (str) – Associated keywords.
sound_id (str) – Unique identifier for the sound.
sound_link (str) – Link to the sound.
start_end_samples (tuple) – Start and end samples in the audio file.
manufacturer (str) – Manufacturer of the recording equipment.
license (str) – License of the clip.
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property file_name
The name of the audio file.
- Returns:
str - Name of the file.
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property keywords
Keywords associated with the clip.
- Returns:
str - Keywords for the clip.
- property license
License of the clip.
- Returns:
str - License information.
- property manufacturer
Manufacturer of the recording equipment.
- Returns:
str - Manufacturer name.
- property sound_id
Unique identifier for the sound.
- Returns:
str - Sound ID.
- property sound_link
Link to the sound.
- Returns:
str - URL of the sound.
- property start_end_samples
Start and end samples in the audio file.
- Returns:
tuple - Start and end samples.
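Example (illustrative): start_end_samples locates the Clotho excerpt within its original Freesound file. Converting samples to seconds requires a sample rate; the 44100 Hz used below is an assumption for illustration (the dataset's files vary in sample rate):

```python
# Hypothetical conversion from a (start, end) sample pair to seconds.
# The 44100 Hz default is illustrative; actual files vary in sample rate.
def samples_to_seconds(start_end: tuple, sr: int = 44100) -> tuple:
    start, end = start_end
    return (start / sr, end / sr)

print(samples_to_seconds((0, 661500)))  # (0.0, 15.0) at 44.1 kHz
```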
- class soundata.datasets.dcase23_task6a.Dataset(data_home=None)[source]
The DCASE’23 Task 6A dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a DCASE’23 Task 6A audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 Hz without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.dcase23_task6a.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a DCASE’23 Task 6A audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 Hz without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
DCASE23-Task6b
DCASE 2023 Task-6B Dataset Loader
Dataset Info
- DCASE 2023 Task-6B
- Clotho (c) by K. Drossos, S. Lipping, and T. Virtanen. Clotho is licensed under the terms set by Tampere University and Creative Commons licenses for the audio files as per their origin from the Freesound platform. You should have received a copy of the license along with this work. Paper: “Clotho: an Audio Captioning Dataset,” ICASSP 2020.
- Created By:
- K. Drossos, S. Lipping, and T. Virtanen. Tampere University, Finland.
- Version 2.1.0
- Fixes for corrupted files and illegal characters. More details on version changes are available in the dataset repository.
- Description
Clotho is an audio captioning dataset, consisting of 6974 audio samples, each accompanied by five captions, totaling 34,870 captions.
Audio samples are 15 to 30 seconds in duration.
Captions are 8 to 20 words long.
Dataset splits: development, validation, and evaluation.
Detailed description and usage guidelines in the ICASSP 2020 paper and dataset repository.
- Audio Files Included
Development split: 3840 audio files (including 947 new files in version 2)
Validation split: 1046 new audio files
Evaluation split: No changes from version 1
File format: Single channel (mono), various bitrates and sample rates, WAV format.
- Caption Files Included
Clotho captions in CSV format for each dataset split.
Captions follow consistent word usage, no named entities or speech transcription.
Unique vocabulary across splits to prevent data leakage.
- Metadata Files Included
Accompanying metadata for each audio file, including file name, keywords, original URL, excerpt samples, uploader, and license link.
- Conditions of Use
- Dataset created by K. Drossos, S. Lipping, and T. Virtanen. Audio files under various Creative Commons licenses as per Freesound platform terms. Captions under Tampere University license, primarily non-commercial with attribution. Full details in the LICENSE file included with the dataset.
- Acknowledgment in Academic Research
- When using Clotho for academic research, please cite: K. Drossos, S. Lipping, and T. Virtanen, “Clotho: an Audio Captioning Dataset,” ICASSP 2020.
- Feedback and Contributions
- Feedback and contributions are welcome. Please contact the creators through the GitHub repository.
- class soundata.datasets.dcase23_task6b.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
DCASE’23 Task 6B Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – Audio signal and sample rate.
file_name (str) – Name of the file.
keywords (str) – Associated keywords.
sound_id (str) – Unique identifier for the sound.
sound_link (str) – Link to the sound.
start_end_samples (tuple) – Start and end samples in the audio file.
manufacturer (str) – Manufacturer of the recording equipment.
license (str) – License of the clip.
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property file_name
The name of the audio file.
- Returns:
str - Name of the file.
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property keywords
Keywords associated with the clip.
- Returns:
str - Keywords for the clip.
- property license
License of the clip.
- Returns:
str - License information.
- property manufacturer
Manufacturer of the recording equipment.
- Returns:
str - Manufacturer name.
- property sound_id
Unique identifier for the sound.
- Returns:
str - Sound ID.
- property sound_link
Link to the sound.
- Returns:
str - URL of the sound.
- property start_end_samples
Start and end samples in the audio file.
- Returns:
tuple - Start and end samples.
- class soundata.datasets.dcase23_task6b.Dataset(data_home=None)[source]
The DCASE’23 Task 6B dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a DCASE’23 Task 6B audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 Hz without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.dcase23_task6b.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a DCASE’23 Task 6B audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
DCASE-bioacoustic
DCASE-BIOACOUSTIC Dataset Loader
Dataset Info
DCASE-BIOACOUSTIC
Development set:
The development set for task 5 of DCASE 2022 “Few-shot Bioacoustic Event Detection” consists of 192 audio files acquired from different bioacoustic sources. The dataset is split into training and validation sets.
Multi-class annotations are provided for the training set with positive (POS), negative (NEG) and unknown (UNK) values for each class. UNK indicates uncertainty about a class.
Single-class (class of interest) annotations are provided for the validation set, with events marked as positive (POS) or unknown (UNK) for the class of interest.
This version (3) fixes issues with annotations from the HB set.
Folder Structure:
Development_Set.zip
|_Development_Set/
Development_Set_Annotations.zip has the same structure but contains only the *.csv files
Annotation structure
Each line of the annotation csv represents an event in the audio file. The column descriptions are as follows:
Audiofilename, Starttime, Endtime, CLASS_1, CLASS_2, …CLASS_N
Audiofilename, Starttime, Endtime, Q
Classes
DCASE2022_task5_training_set_classes.csv and DCASE2022_task5_validation_set_classes.csv provide a table with class code correspondence to class name for all classes in the Development set.
dataset, class_code, class_name
dataset, recording, class_code, class_name
Evaluation set
The evaluation set for task 5 of DCASE 2022 “Few-shot Bioacoustic Event Detection” consists of 46 audio files acquired from different bioacoustic sources.
The first 5 annotations are provided for each file, with events marked as positive (POS) for the class of interest.
This dataset is to be used for evaluation purposes during the task and the rest of the annotations will be released after the end of the DCASE 2022 challenge (July 1st).
Folder Structure
Evaluation_Set.zip
Evaluation_Set_5shots.zip has the same structure but contains only the *.wav files.
Evaluation_Set_5shots_annotations_only.zip has the same structure but contains only the *.csv files
The subfolders denote different recording sources and there may or may not be overlap between classes of interest from different wav files.
Annotation structure
Each line of the annotation csv represents an event in the audio file. The column descriptions are as follows: [ Audiofilename, Starttime, Endtime, Q ]
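Given the [Audiofilename, Starttime, Endtime, Q] layout, extracting only the POS events can be sketched in a few lines. The file contents and helper name below are invented for illustration; soundata’s load_POSevents is the supported API:

```python
import csv
import io

def pos_events(fhandle):
    # Collect (start, end) pairs for rows whose Q column is POS;
    # a rough sketch of the layout above, not the official parser.
    return [
        (float(row["Starttime"]), float(row["Endtime"]))
        for row in csv.DictReader(fhandle)
        if row["Q"] == "POS"
    ]

# Tiny in-memory example with made-up times:
sample = io.StringIO(
    "Audiofilename,Starttime,Endtime,Q\n"
    "a.wav,0.5,1.25,POS\n"
    "a.wav,2.0,2.1,UNK\n"
)
print(pos_events(sample))  # [(0.5, 1.25)]
```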
Open Access:
This dataset is available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Contact info:
Please send any feedback or questions to:
Ines Nolasco - i.dealmeidanolasco@qmul.ac.uk
- class soundata.datasets.dcase_bioacoustic.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
DCASE bioacoustic Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
csv_path (str) – path to the csv file
clip_id (str) – clip id
split (str) – subset the clip belongs to (for experiments): train, validate, or test
- Other Parameters:
events_classes (list) – list of classes annotated for the file
events (soundata.annotations.Events) – sound events with start time, end time, labels (list for all classes) and confidence
POSevents (soundata.annotations.Events) – sound events for the positive class with start time, end time, label and confidence
- POSevents
The audio events for the POS (positive) class
- Returns:
annotations.Events - audio event object
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- events
The audio events
- Returns:
annotations.Events - audio event object
- events_classes
The classes annotated for the file
- Returns:
list - list of the annotated event classes
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property split
The data split the clip belongs to (e.g. train)
- Returns:
str - split
- property subdataset
The (sub)dataset the clip belongs to
- Returns:
str - subdataset
- class soundata.datasets.dcase_bioacoustic.Dataset(data_home=None)[source]
The DCASE bioacoustic dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a DCASE bioacoustic audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.dcase_bioacoustic.load_POSevents(fhandle: TextIO) Events [source]
Load a DCASE bioacoustic sound events annotation file, keeping only POS labels
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
- Raises:
IOError – if csv_path doesn’t exist
- Returns:
Events – sound events annotation data
- soundata.datasets.dcase_bioacoustic.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a DCASE bioacoustic audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- soundata.datasets.dcase_bioacoustic.load_events(fhandle: TextIO) Events [source]
Load a DCASE bioacoustic sound events annotation file
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
- Raises:
IOError – if csv_path doesn’t exist
- Returns:
Events – sound events annotation data
- soundata.datasets.dcase_bioacoustic.load_events_classes(fhandle: TextIO) list [source]
Load the class list from a DCASE bioacoustic sound events annotation file
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
positive (bool) – if False, return all labels; if True, return only POS labels
- Raises:
IOError – if csv_path doesn’t exist
- Returns:
list – list of event class codes
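To make the multi-class layout concrete, here is a sketch of how class codes can be read off a training annotation header, where every column after Audiofilename, Starttime, and Endtime is a class code. The column names and helper below are invented for illustration; load_events_classes is the supported API:

```python
import csv
import io

def header_classes(fhandle):
    # Every column after the three fixed ones (Audiofilename,
    # Starttime, Endtime) is a class code.
    header = next(csv.reader(fhandle))
    return header[3:]

# Hypothetical two-class annotation file:
sample = io.StringIO(
    "Audiofilename,Starttime,Endtime,MOS,RAT\n"
    "x.wav,0.1,0.2,POS,NEG\n"
)
print(header_classes(sample))  # ['MOS', 'RAT']
```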
DCASE-birdVox20k
BirdVox20k Dataset Loader
Dataset Info
- Created By
- Vincent Lostanlen*^#, Justin Salamon^#, Andrew Farnsworth*, Steve Kelling*, and Juan Pablo Bello^#
* Cornell Lab of Ornithology (CLO)
^ Center for Urban Science and Progress, New York University
# Music and Audio Research Lab, New York University
Version 1.0
- Description
The BirdVox-DCASE-20k dataset contains 20,000 ten-second audio recordings. These recordings come from ROBIN autonomous recording units placed near Ithaca, NY, USA during the fall of 2015. They were captured on the night of September 23rd, 2015, by six different sensors, originally numbered 1, 2, 3, 5, 7, and 10. Out of these 20,000 recordings, 10,017 (50.09%) contain at least one bird vocalization (either song, call, or chatter). The dataset is a derivative work of the BirdVox-full-night dataset [1], containing almost as much data but formatted into ten-second excerpts rather than ten-hour full-night recordings. In addition, the BirdVox-DCASE-20k dataset is provided as a development set in the context of the “Bird Audio Detection” challenge, organized by DCASE (Detection and Classification of Acoustic Scenes and Events) and the IEEE Signal Processing Society. The dataset can be used, among other things, for the development and evaluation of bioacoustic classification models.
- Audio Files Included
20,000 ten-second audio recordings (see description above) in WAV format. The wav folder contains the recordings as WAV files, sampled at 44.1 kHz, with a single channel (mono). The original sample rate was 24 kHz.
- Meta-data Files Included
A table containing a binary label “hasbird” associated to every recording in BirdVox-DCASE-20k is available on the website of the DCASE “Bird Audio Detection” challenge: http://machine-listening.eecs.qmul.ac.uk/bird-audio-detection-challenge/ These labels were automatically derived from the annotations of avian flight call events in the BirdVox-full-night dataset.
- Please Acknowledge BirdVox-DCASE-20k in Academic Research
When BirdVox-DCASE-20k is used for academic research, we would highly appreciate it if scientific publications of works partly based on this dataset cite the following publication:
The creation of this dataset was supported by NSF grants 1125098 (BIRDCAST) and 1633259 (BIRDVOX), a Google Faculty Award, the Leon Levy Foundation, and two anonymous donors.
- Conditions of Use
Dataset created by Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, and Juan Pablo Bello.
The BirdVox-DCASE-20k dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/
The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, Cornell Lab of Ornithology is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the BirdVox-DCASE-20k dataset or any part of it.
- Feedback
Please help us improve BirdVox-DCASE-20k by sending your feedback to:
* Vincent Lostanlen: vincent.lostanlen@gmail.com for feedback regarding data pre-processing,
* Andrew Farnsworth: af27@cornell.edu for feedback regarding data collection and ornithology, or
* Dan Stowell: dan.stowell@qmul.ac.uk for feedback regarding the DCASE “Bird Audio Detection” challenge.
In case of a problem, please include as many details as possible.
- class soundata.datasets.dcase_birdVox20k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
BirdVox20k Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
itemid (str) – clip id
datasetid (str) – the dataset to which the clip belongs
hasbird (str) – indication of whether the clips contains bird sounds (0/1)
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property dataset_id
The clip’s dataset ID.
- Returns:
str - ID of the dataset from where this clip is extracted
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property has_bird
Whether the clip contains bird sound.
- Returns:
str - “1” if the clip contains bird sound, “0” otherwise
- property item_id
The clip’s item ID.
- Returns:
str - ID of the clip
- class soundata.datasets.dcase_birdVox20k.Dataset(data_home=None)[source]
The BirdVox20k dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a BirdVox20k audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.dcase_birdVox20k.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float] [source]
Load a BirdVox20k audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
EigenScape
EigenScape Dataset Loader
Dataset Info
EigenScape: a database of spatial acoustic scene recordings
- Created By:
- Marc Ciufo Green, Damian Murphy. Audio Lab, Department of Electronic Engineering, University of York.
Version 2.0
- Description:
EigenScape is a database of acoustic scenes recorded spatially using the mh Acoustics EigenMike. All scenes were recorded in 4th-order Ambisonics. The database contains recordings of eight different location classes: Beach, Busy Street, Park, Pedestrian Zone, Quiet Street, Shopping Centre, Train Station, Woodland. The recordings were made in May 2017 at sites across the North of England.
- Audio Files Included:
8 different examples of each location class were recorded over a duration of 10 minutes
64 recordings in total.
ACN channel ordering with SN3D normalisation at 24-bit / 48 kHz resolution.
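For reference, a full-sphere ambisonic signal of order N carries (N + 1)^2 channels, so the 4th-order EigenScape files have 25 channels (derived from the EigenMike’s 32 capsules). A tiny helper, not part of soundata, makes the relationship concrete:

```python
def ambisonic_channels(order: int) -> int:
    # A full 3D ambisonic scene of order N has (N + 1)^2
    # spherical-harmonic channels; ACN ordering indexes them
    # 0 .. (N + 1)^2 - 1.
    return (order + 1) ** 2

print(ambisonic_channels(1))  # 4  (first-order ambisonics)
print(ambisonic_channels(4))  # 25 (EigenScape's 4th-order recordings)
```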
- Annotations Included:
No event labels associated with this dataset
The metadata file gives more temporal and geographic detail on each recording
The EigenScape [recording map](http://bit.ly/EigenSMap) shows the locations and classes of all the recordings.
No predefined training, validation, or testing splits.
- Please Acknowledge EigenScape in Academic Research:
- If you use this dataset please cite its original publication:
Green MC, Murphy D. EigenScape: A database of spatial acoustic scene recordings. Applied Sciences. 2017 Nov;7(11):1204.
- License:
Creative Commons Attribution 4.0 International
- Important:
Use with caution. This loader “engineers” a solution to obtain the correct files after Park6 and Park8 were mixed up at the eigenscape and eigenscape_raw remotes. See the REMOTES and the index to understand how this workaround operates, and see the discussion about it with the dataset author: https://github.com/micarraylib/micarraylib/issues/8#issuecomment-1105357329
- class soundata.datasets.eigenscape.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
Eigenscape Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
tags (soundata.annotation.Tags) – tag (scene label) of the clip + confidence.
audio_path (str) – path to the audio file
clip_id (str) – clip id
location (str) – city where the audio signal was recorded
time (str) – time when the audio signal was recorded
date (str) – date when the audio signal was recorded
additional_information (str) – notes included by the dataset authors with other details relevant to the specific clip
- property additional_information
The clip’s additional information.
- Returns:
str - notes included by the dataset authors with other details relevant to the specific clip
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property date
The clip’s date.
- Returns:
str - date when the audio signal was recorded
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property location
The clip’s location.
- Returns:
str - location where the clip was recorded
- property tags
The clip’s tags
- Returns:
annotations.Tags - Tags (scene label) of the clip + confidence.
- property time
The clip’s time.
- Returns:
str - time when the audio signal was recorded
- class soundata.datasets.eigenscape.Dataset(data_home=None)[source]
The EigenScape dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load an EigenScape audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.eigenscape.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load an EigenScape audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
EigenScape Raw
EigenScape Dataset Loader
Dataset Info
EigenScape: a database of spatial acoustic scene recordings
- Created By:
- Marc Ciufo Green, Damian Murphy. Audio Lab, Department of Electronic Engineering, University of York.
Version raw
- Description:
EigenScape is a database of acoustic scenes recorded spatially using the mh Acoustics EigenMike. All scenes in this version are in raw A-format with 32 channels. The database contains recordings of eight different location classes: Beach, Busy Street, Park, Pedestrian Zone, Quiet Street, Shopping Centre, Train Station, Woodland. The recordings were made in May 2017 at sites across the North of England.
- Audio Files Included:
8 different examples of each location class were recorded over a duration of 10 minutes
64 recordings in total.
EigenMike channel ordering (32 total) with calibration and PGA level (captured with firewire interface and EigenStudio). 24-bit / 48 kHz resolution.
- Annotations Included:
No event labels associated with this dataset
The metadata file gives more temporal and geographic detail on each recording
The EigenScape recording map shows the locations and classes of all the recordings.
No predefined training, validation, or testing splits.
- Please Acknowledge EigenScape in Academic Research:
- If you use this dataset please cite its original publication:
Green MC, Murphy D. EigenScape: A database of spatial acoustic scene recordings. Applied Sciences. 2017 Nov;7(11):1204.
- License:
Creative Commons Attribution 4.0 International
- Important:
Use with caution. This loader “engineers” a solution to obtain the correct files after Park6 and Park8 were mixed up at the eigenscape and eigenscape_raw remotes. See the REMOTES and the index to understand how this workaround operates, and see the discussion about it with the dataset author: https://github.com/micarraylib/micarraylib/issues/8#issuecomment-1105357329
- class soundata.datasets.eigenscape_raw.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
Eigenscape Raw Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio_path (str) – path to the audio file
additional_information (str) – notes included by the dataset authors with other details relevant to the specific clip
clip_id (str) – clip id
date (str) – date when the audio signal was recorded
location (str) – city where the audio signal was recorded
tags (soundata.annotation.Tags) – tag (scene label) of the clip + confidence.
time (str) – time when the audio signal was recorded
- property additional_information
The clip’s additional information.
- Returns:
str - notes included by the dataset authors with other details relevant to the specific clip
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property date
The clip’s date.
- Returns:
str - date when the audio signal was recorded
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property location
The clip’s location.
- Returns:
str - location where the clip was recorded
- property tags
The clip’s tags
- Returns:
annotations.Tags - Tags (scene label) of the clip + confidence.
- property time
The clip’s time (00:00-23:59).
- Returns:
str - time when the audio signal was recorded
- class soundata.datasets.eigenscape_raw.Dataset(data_home=None)[source]
The EigenScape Raw dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load an EigenScape Raw audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.eigenscape_raw.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load an EigenScape Raw audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
ESC-50
ESC-50 Dataset Loader
Dataset Info
ESC-50: Dataset for Environmental Sound Classification
The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification. The total duration of the dataset is 2.8 hours (2000 x 5 seconds).
The dataset consists of 5-second-long recordings organized into 50 semantic classes (with 40 examples per class) loosely arranged into 5 major categories:
| Animals | Natural soundscapes & water sounds | Human, non-speech sounds | Interior/domestic sounds | Exterior/urban noises |
| --- | --- | --- | --- | --- |
| Dog | Rain | Crying baby | Door knock | Helicopter |
| Rooster | Sea waves | Sneezing | Mouse click | Chainsaw |
| Pig | Crackling fire | Clapping | Keyboard typing | Siren |
| Cow | Crickets | Breathing | Door, wood creaks | Car horn |
| Frog | Chirping birds | Coughing | Can opening | Engine |
| Cat | Water drops | Footsteps | Washing machine | Train |
| Hen | Wind | Laughing | Vacuum cleaner | Church bells |
| Insects (flying) | Pouring water | Brushing teeth | Clock alarm | Airplane |
| Sheep | Toilet flush | Snoring | Clock tick | Fireworks |
| Crow | Thunderstorm | Drinking, sipping | Glass breaking | Hand saw |
Clips in this dataset have been manually extracted from public field recordings gathered by the Freesound.org project. The dataset has been prearranged into 5 folds for comparable cross-validation, making sure that fragments from the same original source file are contained in a single fold.
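Because the folds are prearranged, cross-validation reduces to holding out one fold at a time; keeping all fragments of a source file in a single fold is what prevents leakage across splits. A minimal sketch (the helper and clip ids are illustrative, not part of soundata):

```python
def fold_splits(clip_folds):
    """Yield (train_ids, test_ids) for leave-one-fold-out cross-validation.

    clip_folds maps clip_id -> fold index.
    """
    for held_out in sorted(set(clip_folds.values())):
        test = [c for c, f in clip_folds.items() if f == held_out]
        train = [c for c, f in clip_folds.items() if f != held_out]
        yield train, test

# Toy example: two folds, three clips.
demo = {"clip_a": 1, "clip_b": 2, "clip_c": 2}
for train, test in fold_splits(demo):
    print(sorted(train), sorted(test))
```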
A more thorough description of the dataset is available in the original paper with some supplementary materials on GitHub:
https://github.com/karolpiczak/ESC-50
Repository content
audio/*.wav
2000 audio recordings in WAV format (5 seconds, 44.1 kHz, mono) with the following naming convention:
{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav
{FOLD} - index of the cross-validation fold, {CLIP_ID} - ID of the original Freesound clip, {TAKE} - letter disambiguating between different fragments from the same Freesound clip, {TARGET} - class in numeric format [0, 49].
meta/esc50.csv
CSV file with the following structure:
filename fold target category esc10 src_file take
The esc10 column indicates if a given file belongs to the ESC-10 subset (10 selected classes, CC BY license).
https://github.com/karolpiczak/ESC-50/blob/master/meta/esc50-human.xlsx
Additional data pertaining to the crowdsourcing experiment (human classification accuracy).
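The naming convention above can be unpacked with plain string handling (the helper is illustrative, not part of soundata; the example name follows the documented pattern):

```python
def parse_esc50_name(filename):
    # {FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav
    fold, clip_id, take, target = filename.rsplit(".", 1)[0].split("-")
    return {"fold": int(fold), "clip_id": clip_id, "take": take, "target": int(target)}

print(parse_esc50_name("1-100032-A-0.wav"))
# {'fold': 1, 'clip_id': '100032', 'take': 'A', 'target': 0}
```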
- class soundata.datasets.esc50.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
ESC-50 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
category (str) – clip class in string format, i.e., label
clip_id (str) – clip id
esc10 (bool) – True if the clip belongs to the ESC-10 subset (10 selected classes, CC BY license)
filename (str) – clip filename
fold (int) – index of the cross-validation fold the clip belongs to
src_file (str) – freesound ID of the original file from which the clip was taken
tags (soundata.annotations.Tags) – tag (label) of the clip + confidence. In ESC-50 every clip has one tag.
take (str) – letter disambiguating between different fragments from the same Freesound clip (e.g., “A”, “B”, etc.)
target (int) – clip class in numeric format
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property category
The clip’s category.
- Returns:
str - clip class in string format, i.e., label
- property esc10
The clip’s esc10.
- Returns:
bool - True if the clip belongs to the ESC-10 subset (10 selected classes, CC BY license)
- property filename
The clip’s filename
- Returns:
str - clip filename
- property fold
The clip’s fold
- Returns:
int - index of the cross-validation fold the clip belongs to
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property src_file
The clip’s source file.
- Returns:
str - freesound ID of the original file from which the clip was taken
- property tags
The clip’s tags.
- Returns:
annotations.Tags - tag (label) of the clip + confidence. In ESC-50 every clip has one tag.
- property take
The clip’s take
- Returns:
str - letter disambiguating between different fragments from the same Freesound clip (e.g., “A”, “B”, etc.)
- property target
The clip’s target.
- Returns:
int - clip class in numeric format
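Since every ESC-50 clip carries its cross-validation fold index (the fold attribute above), a leave-one-fold-out split can be sketched as a simple partition; the clip-to-fold mapping below is invented for illustration:

```python
def split_by_fold(clip_folds, test_fold):
    """Partition clip ids into train/test lists by cross-validation fold.
    `clip_folds` maps clip_id -> fold index (as exposed by Clip.fold)."""
    train, test = [], []
    for clip_id, fold in clip_folds.items():
        (test if fold == test_fold else train).append(clip_id)
    return sorted(train), sorted(test)

# Hypothetical fold assignments for four clips
folds = {"1-100032-A-0": 1, "2-100038-A-14": 2,
         "1-100210-A-36": 1, "3-100263-A-5": 3}
train_ids, test_ids = split_by_fold(folds, test_fold=1)
```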
- class soundata.datasets.esc50.Dataset(data_home=None)[source]
The ESC-50 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load an ESC-50 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for the loaded audio; None by default, which loads the file at its original sample rate of 44100 Hz.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.esc50.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load an ESC-50 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for the loaded audio; None by default, which loads the file at its original sample rate of 44100 Hz.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
Freefield1010
freefield1010 Dataset Loader
Dataset Info
freefield1010: A Dataset of Field Recording Excerpts for Bioacoustic Research
- Created By:
- Dan Stowell, Mark D. Plumbley.Centre for Digital Music, Queen Mary University of London.
Version 1.0
- Description:
The freefield1010 dataset is a collection of 7,690 field recording excerpts from various global locations, standardized for research purposes. These recordings cover a wide range of environments and locales. The dataset is part of the “Bird Audio Detection” challenge, a joint venture by DCASE (Detection and Classification of Acoustic Scenes and Events) and the IEEE Signal Processing Society. It’s particularly useful for bioacoustic classification models, with annotations indicating the presence or absence of birds in the recordings.
- Audio Files Included:
The dataset consists of 7,690 audio clips, sourced from the field-recording tag in the Freesound audio archive.
All sounds have been converted to standard CD-quality mono WAV format.
Files are stored as 16-bit 44.1 kHz WAV files in the ‘wav’ folder.
Amplitude of each excerpt has been normalized due to the varying levels in the Freesound archive.
- Meta-data Files Included:
A binary label “hasbird” is associated with every recording.
The metadata is available on the DCASE “Bird Audio Detection” challenge website: http://machine-listening.eecs.qmul.ac.uk/bird-audio-detection-challenge/
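Filtering the metadata on the binary hasbird label can be sketched with the standard csv module; the column names follow the Clip variables documented below, and the CSV snippet itself is invented:

```python
import csv
import io

# A tiny stand-in for the challenge metadata (itemid + hasbird columns);
# these rows are invented for illustration.
metadata_csv = """itemid,hasbird
64486,0
64487,1
64488,1
"""

def clips_with_birds(fh):
    """Return the item ids whose 'hasbird' flag is 1."""
    return [row["itemid"] for row in csv.DictReader(fh) if row["hasbird"] == "1"]

bird_ids = clips_with_birds(io.StringIO(metadata_csv))
```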
- Please Acknowledge freefield1010 in Academic Research:
When using the freefield1010 dataset for academic research, please cite the following paper:
D. Stowell and M. D. Plumbley. “An open dataset for research on audio field recording archives: freefield1010.”, Proc. Audio Engineering Society 53rd Conference on Semantic Audio (AES53), 2014.
- Conditions of Use:
The freefield1010 dataset is created by Dan Stowell and Mark D. Plumbley.
It is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/
- class soundata.datasets.freefield1010.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
freefield1010 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
itemid (str) – clip id
datasetid (str) – the dataset to which the clip belongs
hasbird (str) – indication of whether the clip contains bird sounds (0/1)
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property dataset_id
The clip’s dataset ID.
- Returns:
str - ID of the dataset from where this clip is extracted
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property has_bird
The flag to tell whether the clip has bird sound or not.
- Returns:
str - 1/0 depending on whether the clip contains bird sound
- property item_id
The clip’s item ID.
- Returns:
str - ID of the clip
- class soundata.datasets.freefield1010.Dataset(data_home=None)[source]
The freefield1010 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a freefield1010 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.freefield1010.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float] [source]
Load a freefield1010 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
FSD50K
FSD50K Dataset Loader
Dataset Info
FSD50K: an Open Dataset of Human-Labeled Sound Events
- Created By:
- Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra.Music Technology Group, Universitat Pompeu Fabra (Barcelona).
Version 1.0
- Description:
FSD50K is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.
- Audio Files Included:
FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio.
The audio content is composed mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more. The vocabulary can be inspected in vocabulary.csv.
Clips are of variable length from 0.3 to 30s, due to the diversity of the sound classes and the preferences of Freesound users when recording sounds.
All clips are provided as uncompressed PCM 16 bit 44.1 kHz mono audio files.
- Annotations Included:
The dataset encompasses 200 sound classes (144 leaf nodes and 56 intermediate nodes) hierarchically organized with a subset of the AudioSet Ontology. Please refer to the included vocabulary.csv file for a complete list of considered classes.
The acoustic material has been manually labeled by humans following a data labeling process using the Freesound Annotator platform.
Ground truth labels are provided at the clip-level (i.e., weak labels).
Note: All classes in FSD50K are represented in AudioSet, except Crash cymbal, Human group actions, Human voice, Respiratory sounds, and Domestic sounds, home sounds.
Note: We use a slightly different format than AudioSet for the naming of class labels in order to avoid potential problems with spaces, commas, etc. Example: we use Accelerating_and_revving_and_vroom instead of the original Accelerating, revving, vroom. You can go back to the original AudioSet naming using the information provided in vocabulary.csv (class label and mid for the 200 classes of FSD50K) and the AudioSet Ontology specification.
- Organization:
FSD50K is split into two subsets: the development (dev) and the evaluation (eval) sets. Specifications of both subsets are detailed below:
- Dev set:
40,966 audio clips totalling 80.4 hours of audio
Avg duration/clip: 7.1s
114,271 smeared labels (i.e., labels propagated in the upwards direction to the root of the ontology)
Labels are correct but could be occasionally incomplete
A train/validation split is provided. If a different split is used, it should be specified for reproducibility and fair comparability of results
- Eval set:
10,231 audio clips totalling 27.9 hours of audio
Avg duration/clip: 9.8s
38,596 smeared labels
Eval set is labeled exhaustively (labels are correct and complete for the considered vocabulary)
- Ground-truth Files Included:
FSD50K ground-truth is represented through the following file structure:
- dev.csv:
Each row (i.e. audio clip) of dev.csv contains the following information:
- fname:
The file name without the .wav extension, e.g., the fname 64760 corresponds to the file 64760.wav on disk. This number is the Freesound id. We always use Freesound ids as filenames.
- labels:
The class labels (i.e., the ground truth). Note these class labels are smeared, i.e., the labels have been propagated in the upwards direction to the root of the ontology. More details about the label smearing process can be found in Appendix D of our paper.
- mids:
The Freebase identifiers corresponding to the class labels, as defined in the AudioSet Ontology specification.
- split:
Whether the clip belongs to train or val (see paper for details on the proposed split)
- eval.csv:
Rows in eval.csv follow the same format as dev.csv, except that there is no split column.
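The dev.csv/eval.csv layout described above can be parsed with the standard csv module; the rows below are invented examples in that layout, and the helper name is hypothetical:

```python
import csv
import io

# Invented rows following the dev.csv layout (fname, labels, mids, split)
dev_csv = """fname,labels,mids,split
64760,"Electric_guitar,Guitar,Music","/m/02sgy,/m/0342h,/m/04rlf",train
16399,"Rain,Natural_sounds","/m/06mb1,/m/059j3w",val
"""

def read_ground_truth(fh):
    """Map fname -> (label list, mid list, split or None).
    The split entry is None for eval.csv, which has no split column."""
    out = {}
    for row in csv.DictReader(fh):
        out[row["fname"]] = (row["labels"].split(","),
                             row["mids"].split(","),
                             row.get("split"))
    return out

gt = read_ground_truth(io.StringIO(dev_csv))
```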
- Metadata Files Included:
To allow a variety of analysis and approaches with FSD50K, we provide the following metadata:
- class_info_FSD50K.json:
Python dictionary where each entry corresponds to one sound class and contains: FAQs utilized during the annotation of the class, examples (representative audio clips), and verification_examples (audio clips presented to raters during annotation as a quality control mechanism). Audio clips are described by the Freesound id. Note: It may be that some of these examples are not included in the FSD50K release.
- dev_clips_info_FSD50K.json:
Python dictionary where each entry corresponds to one dev clip and contains: title, description, tags, clip license, and the uploader name. All these metadata are provided by the uploader.
- eval_clips_info_FSD50K.json:
Same as above, but with eval clips.
- pp_pnp_ratings.json:
Python dictionary where each entry corresponds to one clip in the dataset and contains the PP/PNP ratings for the labels associated with the clip. More specifically, these ratings are gathered for the labels validated in the validation task. This file includes 59,485 labels for the 51,197 clips in FSD50K. Out of these labels:
56,095 labels have inter-annotator agreement (PP twice, or PNP twice). Each of these combinations can be occasionally accompanied by other (non-positive) ratings.
3,390 labels feature other rating configurations such as i) only one PP rating and one PNP rating (and nothing else). This can be considered inter-annotator agreement at the “Present” level; ii) only one PP rating (and nothing else); iii) only one PNP rating (and nothing else).
Ratings’ legend: PP=1; PNP=0.5; U=0; NP=-1.
Note: The PP/PNP ratings have been provided in the validation task. Subsequently, a subset of these clips corresponding to the eval set was exhaustively labeled in the refinement task, hence receiving additional labels in many cases. For these eval clips, you might want to check their labels in eval.csv in order to have more info about their audio content.
- collection folder:
This folder contains metadata for what we call the sound collection format. This format consists of the raw annotations gathered, featuring all generated class labels without any restriction. We provide the collection format to make available some annotations that do not appear in the FSD50K ground truth release. This typically happens in the case of classes for which we gathered human-provided annotations, but that were discarded in the FSD50K release due to data scarcity (more specifically, they were merged with their parents). In other words, the main purpose of the collection format is to make available annotations for tiny classes. The format of these files is analogous to that of the files in FSD50K.ground_truth/. A couple of examples show the differences between collection and ground truth formats:
clip: labels_in_collection - labels_in_ground_truth
51690: Owl - Bird,Wild_Animal,Animal
190579: Toothbrush,Electric_toothbrush - Domestic_sounds_and_home_sounds
In the first example, raters provided the label Owl. However, due to data scarcity, Owl labels were merged into their parent Bird. Then, labels Wild_Animal,Animal were added via label propagation (smearing). The second example shows one of the most extreme cases, where raters provided the labels Electric_toothbrush,Toothbrush, which both had few data. Hence, they were merged into Toothbrush’s parent, which unfortunately is Domestic_sounds_and_home_sounds (a rather vague class containing a variety of children sound classes).
Note: Labels in the collection format are not smeared.
Note: While in FSD50K’s ground truth the vocabulary encompasses 200 classes (common for dev and eval), since the collection format is composed of raw annotations, the vocabulary here is much larger (over 350 classes), and it is slightly different in dev and eval.
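The label smearing described above (propagating each label upwards to the ontology root) can be sketched with a toy parent map; the fragment mirrors the Owl example but is not the real ontology:

```python
# Toy parent map: child class -> parent class (None at the root).
# This fragment mirrors the Owl example above, not the real AudioSet Ontology.
parents = {"Owl": "Bird", "Bird": "Wild_Animal",
           "Wild_Animal": "Animal", "Animal": None}

def smear(label, parents):
    """Propagate a label upwards to the ontology root (label 'smearing')."""
    out = []
    while label is not None:
        out.append(label)
        label = parents.get(label)
    return out

smeared = smear("Owl", parents)  # ["Owl", "Bird", "Wild_Animal", "Animal"]
```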
- Please Acknowledge FSD50K in Academic Research:
If you use the FSD50K Dataset please cite the following paper:
Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. “FSD50K: an Open Dataset of Human-Labeled Sound Events”, arXiv:2010.00475, 2020.
The authors would like to thank everyone who contributed to FSD50K with annotations, and especially Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez for their commitment and perseverance. The authors would also like to thank Daniel P.W. Ellis and Manoj Plakal from Google Research for valuable discussions. This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 AudioCommons, and two Google Faculty Research Awards 2017 and 2018, and the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
- License:
All audio clips in FSD50K are released under Creative Commons (CC) licenses. Each clip has its own license as defined by the clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses. The licenses are specified in the files dev_clips_info_FSD50K.json and eval_clips_info_FSD50K.json. These licenses are CC0, CC-BY, CC-BY-NC and CC Sampling+.
In addition, FSD50K as a whole is the result of a curation process and it has an additional license: FSD50K is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSD50K.doc zip file.
Usage of FSD50K for commercial purposes: If you’d like to use FSD50K for commercial purposes, please contact Eduardo Fonseca and Frederic Font at eduardo.fonseca@upf.edu and frederic.font@upf.edu.
- Feedback:
For further questions, please contact eduardo.fonseca@upf.edu, or join the freesound-annotator Google Group.
- class soundata.datasets.fsd50k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
FSD50K Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
clip_id (str) – clip id
description (str) – description of the sound provided by the Freesound uploader
mids (soundata.annotations.Tags) – tag (labels) encoded in Audioset formatting
pp_pnp_ratings (dict) – PP/PNP ratings given to the main label of the clip
split (str) – flag to identify if clip belongs to the development, evaluation, or validation splits
tags (soundata.annotations.Tags) – tag (label) of the clip + confidence
title (str) – the title of the uploaded file in Freesound
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio.
- Returns:
np.ndarray - audio signal
float - sample rate
- property description
The clip’s description.
- Returns:
str - description of the sound provided by the Freesound uploader
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property mids
The clip’s mids.
- Returns:
annotations.Tags - tag (labels) encoded in Audioset formatting
- property pp_pnp_ratings
The clip’s PP/PNP ratings.
- Returns:
dict - PP/PNP ratings given to the main label of the clip
- property split
The clip’s split.
- Returns:
str - flag to identify if clip belongs to the development, evaluation, or validation splits
- property tags
The clip’s tags.
- Returns:
annotations.Tags - tag (label) of the clip + confidence
- property title
The clip’s title.
- Returns:
str - the title of the uploaded file in Freesound
- class soundata.datasets.fsd50k.Dataset(data_home=None)[source]
The FSD50K dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load an FSD50K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for the loaded audio; None by default, which loads the file at its original sample rate (all FSD50K clips are 44100 Hz). If a different sample rate is given, the audio is resampled on load.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- load_clips()[source]
Load all clips in the dataset
- Returns:
dict – {clip_id: clip data}
- Raises:
NotImplementedError – If the dataset does not support Clips
- load_fsd50k_vocabulary(*args, **kwargs)[source]
Load vocabulary of FSD50K to relate FSD50K labels with the AudioSet ontology
- Parameters:
data_path (str) – Path to the vocabulary file
- Returns:
fsd50k_to_audioset (dict) – vocabulary to convert FSD50K to AudioSet
audioset_to_fsd50k (dict) – vocabulary to convert from AudioSet to FSD50K
- soundata.datasets.fsd50k.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load an FSD50K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for the loaded audio; None by default, which loads the file at its original sample rate (all FSD50K clips are 44100 Hz). If a different sample rate is given, the audio is resampled on load.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- soundata.datasets.fsd50k.load_fsd50k_vocabulary(data_path)[source]
Load vocabulary of FSD50K to relate FSD50K labels with the AudioSet ontology
- Parameters:
data_path (str) – Path to the vocabulary file
- Returns:
fsd50k_to_audioset (dict) – vocabulary to convert FSD50K to AudioSet
audioset_to_fsd50k (dict) – vocabulary to convert from AudioSet to FSD50K
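The two returned dictionaries are plain label mappings, so typical usage can be sketched as follows; the mapping excerpt is hypothetical and only mirrors the Accelerating_and_revving_and_vroom example from the dataset description:

```python
# Hypothetical excerpt of the two mappings returned by load_fsd50k_vocabulary;
# the real dictionaries cover all 200 FSD50K classes.
fsd50k_to_audioset = {
    "Accelerating_and_revving_and_vroom": "Accelerating, revving, vroom",
}
audioset_to_fsd50k = {v: k for k, v in fsd50k_to_audioset.items()}

def to_audioset(label):
    """Translate an FSD50K class label back to its original AudioSet name."""
    return fsd50k_to_audioset[label]

name = to_audioset("Accelerating_and_revving_and_vroom")
```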
- soundata.datasets.fsd50k.load_ground_truth(data_path)[source]
Load ground truth files of FSD50K
- Parameters:
data_path (str) – Path to the ground truth file
- Returns:
ground_truth_dict (dict) – ground truth dict of the clips in the input split
clip_ids (list) – list of clip ids of the input split
FSDnoisy18K
FSDnoisy18K Dataset Loader
Dataset Info
- Created By:
- Eduardo Fonseca, Mercedes Collado, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra.Music Technology Group, Universitat Pompeu Fabra (Barcelona).
Version 1.0
- Description:
FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
What follows is a summary of the most basic aspects of FSDnoisy18k. For a complete description of FSDnoisy18k, make sure to check:
The FSDnoisy18k companion site: http://www.eduardofonseca.net/FSDnoisy18k/
The description provided in Section 2 of our ICASSP 2019 paper
The source of audio content is Freesound, a sound sharing site created and maintained by the Music Technology Group that hosts over 400,000 clips uploaded by its community of users, who additionally provide some basic metadata (e.g., tags and title). The 20 classes of FSDnoisy18k are drawn from the AudioSet Ontology and are selected based on data availability as well as on their suitability to allow the study of label noise. The 20 classes are: “Acoustic guitar”, “Bass guitar”, “Clapping”, “Coin (dropping)”, “Crash cymbal”, “Dishes, pots, and pans”, “Engine”, “Fart”, “Fire”, “Fireworks”, “Glass”, “Hi-hat”, “Piano”, “Rain”, “Slam”, “Squeak”, “Tearing”, “Walk, footsteps”, “Wind”, and “Writing”. FSDnoisy18k was created with the Freesound Annotator, which is a platform for the collaborative creation of open audio datasets.
We defined a clean portion of the dataset consisting of correct and complete labels. The remaining portion is referred to as the noisy portion. Each clip in the dataset has a single ground truth label (singly-labeled data).
The clean portion of the data consists of audio clips whose labels are rated as present in the clip and predominant (almost all with full inter-annotator agreement), meaning that the label is correct and, in most cases, there is no additional acoustic material other than the labeled class. A few clips may contain some additional sound events, but they occur in the background and do not belong to any of the 20 target classes. This is more common for some classes that rarely occur alone, e.g., “Fire”, “Glass”, “Wind” or “Walk, footsteps”.
The noisy portion of the data consists of audio clips that received no human validation. In this case, they are categorized on the basis of the user-provided tags in Freesound. Hence, the noisy portion features a certain amount of label noise.
- Included files and statistics:
FSDnoisy18k contains 18,532 audio clips (42.5h) unequally distributed in the 20 aforementioned classes drawn from the AudioSet Ontology.
The audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio files.
The audio clips are of variable length ranging from 300ms to 30s, and each clip has a single ground truth label (singly-labeled data).
The dataset is split into a test set and a train set. The test set is drawn entirely from the clean portion, while the remainder of data forms the train set.
The train set is composed of 17,585 clips (41.1h) unequally distributed among the 20 classes. It features a clean subset and a noisy subset. In terms of number of clips their proportion is 10%/90%, whereas in terms of duration the proportion is slightly more extreme (6%/94%). The per-class percentage of clean data within the train set is also imbalanced, ranging from 6.1% to 22.4%. The number of audio clips per class ranges from 51 to 170, and from 250 to 1000 in the clean and noisy subsets, respectively. Further, a noisy small subset is defined, which includes an amount of (noisy) data comparable (in terms of duration) to that of the clean subset.
The test set is composed of 947 clips (1.4h) that belong to the clean portion of the data. Its class distribution is similar to that of the clean subset of the train set. The number of per-class audio clips in the test set ranges from 30 to 72. The test set enables a multi-class classification problem.
FSDnoisy18k is an expandable dataset that features a per-class varying degree of types and amount of label noise. The dataset supports the investigation of label noise as well as related approaches, from semi-supervised learning (e.g., self-training) to learning with minimal supervision.
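The clean/noisy/noisy_small organization described above can be sketched as a simple filter over per-clip metadata; the clip records below are invented and mirror the Clip variables documented later in this section:

```python
# Hypothetical per-clip metadata: (clip_id, split, manually_verified, noisy_small)
clips = [
    ("17", "train", 1, 0),   # clean train subset
    ("18", "train", 0, 1),   # noisy_small train subset
    ("19", "train", 0, 0),   # remaining noisy train data
    ("20", "test", 1, 0),    # test set (drawn entirely from the clean portion)
]

def select(clips, split, clean=None):
    """Filter clips by split and, optionally, by the manually_verified flag."""
    return [cid for cid, s, verified, _ in clips
            if s == split and (clean is None or bool(verified) == clean)]

clean_train = select(clips, "train", clean=True)
noisy_train = select(clips, "train", clean=False)
```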
- Additional code:
We’ve released the code for our ICASSP 2019 paper at https://github.com/edufonseca/icassp19. The framework comprises all the basic stages: feature extraction, training, inference and evaluation. After loading the FSDnoisy18k dataset, log-mel energies are computed and a CNN baseline is trained and evaluated. The code also allows testing four noise-robust loss functions. Please check our paper for more details.
- Label noise characteristics:
FSDnoisy18k features real label noise that is representative of audio data retrieved from the web, particularly from Freesound. The analysis of a per-class, random, 15% of the noisy portion of FSDnoisy18k revealed that roughly 40% of the analyzed labels are correct and complete, whereas 60% of the labels show some type of label noise. Please check the FSDnoisy18k companion site for a detailed characterization of the label noise in the dataset, including a taxonomy of label noise for singly-labeled data as well as a per-class description of the label noise.
- Relevant links:
Source code for our preprint: https://github.com/edufonseca/icassp19
Freesound Annotator: https://annotator.freesound.org/
Freesound: https://freesound.org
Eduardo Fonseca’s personal website: http://www.eduardofonseca.net/
- Please Acknowledge FSDnoisy18K in Academic Research:
If you use the FSDnoisy18K Dataset please cite the following paper:
Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra, “Learning Sound Event Classifiers from Web Audio with Noisy Labels”, arXiv preprint arXiv:1901.01189, 2019
This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 AudioCommons. Eduardo Fonseca is also sponsored by a Google Faculty Research Award 2017. We thank everyone who contributed to FSDnoisy18k with annotations.
- License:
FSDnoisy18k has licenses at two different levels, as explained next. All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. In particular, all Freesound clips included in FSDnoisy18k are released under either CC-BY or CC0. For attribution purposes and to facilitate attribution of these files to third parties, we include a relation of audio clips and their corresponding license in the LICENSE-INDIVIDUAL-CLIPS file downloaded with the dataset.
In addition, FSDnoisy18k as a whole is the result of a curation process and it has an additional license. FSDnoisy18k is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the dataset.
- Feedback:
For further questions, please contact eduardo.fonseca@upf.edu, or join the freesound-annotator Google Group.
- class soundata.datasets.fsdnoisy18k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
FSDnoisy18K Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – audio signal and sample rate
aso_id (str) – the id of the corresponding category as per the AudioSet Ontology
audio_path (str) – path to the audio file
clip_id (str) – clip id
manually_verified (int) – flag to indicate whether the clip belongs to the clean portion (1), or to the noisy portion (0) of the train set
noisy_small (int) – flag to indicate whether the clip belongs to the noisy_small portion (1) of the train set
split (str) – flag to indicate whether the clip belongs to the train or test split
tag (soundata.annotations.Tags) – tag (label) of the clip + confidence
- property aso_id
The clip’s Audioset ontology ID.
- Returns:
str - the id of the corresponding category as per the AudioSet Ontology
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property manually_verified
The clip’s manually annotated flag.
- Returns:
int - flag to indicate whether the clip belongs to the clean portion (1), or to the noisy portion (0) of the train set
- property noisy_small
The clip’s noisy_small flag.
- Returns:
int - flag to indicate whether the clip belongs to the noisy_small portion (1) of the train set
- property split
The clip’s split.
- Returns:
str - flag to indicate whether the clip belongs to the train or test split
- property tags
The clip’s tags.
- Returns:
annotations.Tags - tag (label) of the clip + confidence
- class soundata.datasets.fsdnoisy18k.Dataset(data_home=None)[source]
The FSDnoisy18K dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a FSDnoisy18K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.fsdnoisy18k.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a FSDnoisy18K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
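The manually_verified and noisy_small flags described above partition the train split into a clean subset, a noisy subset, and a smaller noisy subset. A minimal sketch of that partition, using mocked clip metadata for illustration (real clip ids and flags would come from the loader, e.g. via soundata.initialize('fsdnoisy18k')):

```python
# Illustrative FSDnoisy18k train-split partition by label-noise flags.
# The clip metadata below is made up; with the real dataset it would be
# read from each Clip's manually_verified / noisy_small / split attributes.
clips = {
    "17": {"split": "train", "manually_verified": 1, "noisy_small": 0},
    "42": {"split": "train", "manually_verified": 0, "noisy_small": 1},
    "99": {"split": "train", "manually_verified": 0, "noisy_small": 0},
    "7":  {"split": "test",  "manually_verified": 1, "noisy_small": 0},
}

def partition_train(clips):
    """Split train clips into clean, noisy, and noisy_small subsets."""
    clean, noisy, noisy_small = [], [], []
    for clip_id, meta in clips.items():
        if meta["split"] != "train":
            continue  # only the train split carries label noise
        if meta["manually_verified"] == 1:
            clean.append(clip_id)       # clean portion of the train set
        else:
            noisy.append(clip_id)       # noisy portion of the train set
            if meta["noisy_small"] == 1:
                noisy_small.append(clip_id)
    return clean, noisy, noisy_small

clean, noisy, small = partition_train(clips)
```

The same loop works unchanged over `dataset.load_clips()` once the dataset is downloaded.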
SINGA:PURA
SINGA:PURA Dataset Loader
Dataset Info
SINGA:PURA (SINGApore: Polyphonic URban Audio) v1.0a
- Created by:
Kenneth Ooi, Karn N. Watcharasupat, Santi Peksi, Furi Andi Karnapi, Zhen-Ting Ong, Danny Chua, Hui-Wen Leow, Li-Long Kwok, Xin-Lei Ng, Zhen-Ann Loh, Woon-Seng Gan
Digital Signal Processing Laboratory, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
- Description:
The SINGA:PURA (SINGApore: Polyphonic URban Audio) dataset is a strongly-labelled polyphonic urban sound dataset with spatiotemporal context. The dataset contains 6547 strongly-labelled and 72406 unlabelled recordings from a wireless acoustic sensor network deployed in Singapore to identify and mitigate noise sources in Singapore. The strongly-labelled and unlabelled recordings are disjoint, so there are a total of 78953 unique recordings. The recordings are all 10 seconds in length, and may have 1 or 7 channels, depending on the recording device used to record them. Total duration for the labelled subset provided here is 18.2 hours.
For full details regarding the sensor units used, the recording conditions, and annotation methodology, please refer to our conference paper.
- Annotations:
Our label taxonomy is derived from the taxonomy used in the SONYC-UST datasets, but has been adapted to fit the local (Singapore) context while retaining compatibility with the SONYC-UST ontology. We chose this taxonomy so that the SINGA:PURA dataset can be used in conjunction with the SONYC-UST datasets when training urban sound tagging models, simply by omitting the labels that are absent from the SONYC-UST taxonomy from the recordings in the SINGA:PURA dataset.
Specifically, our label taxonomy consists of 14 coarse-grained classes and 40 fine-grained classes. Their organisation is as follows:
- Engine
Small engine
Medium engine
Large engine
- Machinery impact
Rock drill
Jackhammer
Hoe ram
Pile driver
- Non-machinery impact
Glass breaking (*)
Car crash (*)
Explosion (*)
- Powered saw
Chainsaw
Small/medium rotating saw
Large rotating saw
- Alert signal
Car horn
Car alarm
Siren
Reverse beeper
- Music
Stationary music
Mobile music
- Human voice
Talking
Shouting
Large crowd
Amplified speech
Singing (*)
- Human movement (*)
Footsteps (*)
Clapping (*)
- Animal (*)
Dog barking
Bird chirping (*)
Insect chirping (*)
- Water (*)
Hose pump (*)
- Weather (*)
Rain (*)
Thunder (*)
Wind (*)
- Brake (*)
Friction brake (*)
Exhaust brake (*)
- Train (*)
Electric train (*)
- Others (*)
Screeching (*)
Plastic crinkling (*)
Cleaning (*)
Gear (*)
Classes marked with an asterisk (*) are present in the SINGA:PURA taxonomy but not the SONYC taxonomy. The “Ice cream truck” class from the SONYC taxonomy has been excluded from the SINGA:PURA taxonomy because this class does not exist in the local context.
In addition, note that the label for the coarse-grained class “Others” in the soundata loader is “0”, which is different from the label “X” that is used in the full version of the SINGA:PURA dataset.
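Since the fine-grained classes marked (*) above are absent from SONYC-UST, joint training reduces to dropping those labels from SINGA:PURA annotations. A minimal sketch, under the assumption that labels appear as the human-readable class names listed above (the dataset's actual label strings may differ):

```python
# Fine-grained SINGA:PURA classes marked (*) above, i.e. absent from SONYC-UST.
# Class names here follow the taxonomy listing; they are illustrative and may
# not match the dataset's label strings exactly.
SINGAPURA_ONLY = {
    "Glass breaking", "Car crash", "Explosion", "Singing", "Footsteps",
    "Clapping", "Bird chirping", "Insect chirping", "Hose pump", "Rain",
    "Thunder", "Wind", "Friction brake", "Exhaust brake", "Electric train",
    "Screeching", "Plastic crinkling", "Cleaning", "Gear",
}

def sonyc_compatible(labels):
    """Keep only labels shared with the SONYC-UST fine-grained taxonomy."""
    return [lab for lab in labels if lab not in SINGAPURA_ONLY]
```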
- This dataset is also accessible via:
Zenodo (labelled subset only): https://zenodo.org/record/5645825
DR-NTU (all): https://researchdata.ntu.edu.sg/dataset.xhtml?persistentId=doi:10.21979/N9/Y8UQ6F
- Please Acknowledge SINGA:PURA in Academic Research:
If you use this dataset please cite its original publication:
K. Ooi, K. N. Watcharasupat, S. Peksi, F. A. Karnapi, Z.-T. Ong, D. Chua, H.-W. Leow, L.-L. Kwok, X.-L. Ng, Z.-A. Loh, and W.-S. Gan, “A Strongly-Labelled Polyphonic Dataset of Urban Sounds with Spatiotemporal Context,” in Proceedings of the 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2021.
- License:
Creative Commons Attribution-ShareAlike 4.0 International.
- class soundata.datasets.singapura.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
- Parameters:
clip_id (str) – clip id of the clip
- Variables:
clip_id (str) – clip id
audio (np.ndarray, float) – audio data
audio_path (str) – path to the audio file
events (annotations.MultiAnnotator) – sound events with start time, end time, label and confidence
annotation_path (str) – path to the annotation file
sensor_id (str) – sensor_id of the device used to record the data
town (str) – town in Singapore where the sensor is located
timestamp (np.datetime) – timestamp of the recording
dotw (int) – day of the week when the clip was recorded, starting from 0 for Sunday
- property audio
The clip’s audio
- Returns:
np.ndarray - audio signal
- property dotw: int
The clip’s day of the week
- Returns:
int - day of the week when the clip was recorded, starting from 0 for Sunday
- events
The clip’s event annotations
- Returns:
annotations.MultiAnnotator - sound events with start time, end time, label and confidence
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property sensor_id: str
The clip’s sensor ID
- Returns:
str - sensor_id of the device used to record the data
- property timestamp: numpy.datetime64
The clip’s timestamp
- Returns:
np.datetime64 - timestamp of the clip
- property town: str
The clip’s location
- Returns:
str - location of the sensor, one of {‘East 1’, ‘East 2’, ‘West 1’, ‘West 2’}
- class soundata.datasets.singapura.Dataset(data_home=None)[source]
SINGA:PURA v1.0 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_annotation(*args, **kwargs)[source]
Load an annotation file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an annotation file
- Returns:
annotations.MultiAnnotator - sound events with start time, end time, label and confidence
- load_audio(*args, **kwargs)[source]
Load a SINGA:PURA audio file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns:
np.ndarray - the audio signal at 44.1 kHz
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.singapura.load_annotation(fhandle: TextIO) MultiAnnotator [source]
Load an annotation file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an annotation file
- Returns:
annotations.MultiAnnotator - sound events with start time, end time, label and confidence
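As noted above, the dotw attribute starts from 0 for Sunday, which differs from Python's datetime.weekday() convention (0 for Monday). A small sketch of the conversion, useful when cross-checking clip metadata against timestamps:

```python
# Derive the SINGA:PURA day-of-the-week convention (0 = Sunday) from a
# timestamp using the standard library. datetime.weekday() uses 0 = Monday,
# so the value is rotated by one.
from datetime import datetime

def dotw_sunday_zero(ts: datetime) -> int:
    """Day of the week with 0 = Sunday, matching the dataset's dotw field."""
    return (ts.weekday() + 1) % 7

print(dotw_sunday_zero(datetime(2021, 8, 1)))  # 2021-08-01 was a Sunday
```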
STARSS 2022
Sony-TAu Realistic Spatial Soundscapes (STARSS) 2022 Dataset Loader
Dataset Info
Sony-TAu Realistic Spatial Soundscapes: sound scenes in various rooms and environments, together with temporal and spatial annotations of prominent events belonging to a set of target classes.
- Created By:
- Archontis Politis, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Tuomas Virtanen. Audio Research Group, Tampere University (Finland). Yuki Mitsufuji, Kazuki Shimada, Naoya Takahashi, Yuichiro Koyama, Shusuke Takahashi. SONY.
Version 1.0.0
- Description:
Contains multichannel recordings of sound scenes in various rooms and environments, together with temporal and spatial annotations of prominent events belonging to a set of target classes. The dataset was collected in two different countries: in Tampere, Finland by the Audio Research Group (ARG) of Tampere University (TAU), and in Tokyo, Japan by SONY, using a similar setup and annotation procedure. The dataset is delivered in two 4-channel spatial recording formats: a microphone array format (MIC) and a first-order Ambisonics format (FOA). These recordings serve as the development dataset for the DCASE 2022 Sound Event Localization and Detection Task of the DCASE 2022 Challenge.
- Contrary to the three previous datasets of synthetic spatial sound scenes of
TAU Spatial Sound Events 2019 (development/evaluation),
TAU-NIGENS Spatial Sound Events 2020, and
TAU-NIGENS Spatial Sound Events 2021
associated with the previous iterations of the DCASE Challenge, the STARSS22 dataset contains recordings of real sound scenes, and hence it avoids some of the pitfalls of synthetic scene generation. Some key properties are:
annotations are based on a combination of human annotators for sound event activity and optical tracking for spatial positions,
the annotated target event classes are determined by the composition of the real scenes,
the density, polyphony, occurrences, and co-occurrences of events and sound classes are not random; they follow the actions and interactions of participants in the real scenes.
The recordings were collected between September 2021 and January 2022. Collection of data from the TAU side has received funding from Google.
- Audio Files Included:
70 recording clips of 30 sec ~ 5 min durations, with a total time of ~2hrs, contributed by SONY (development dataset).
51 recording clips of 1 min ~ 5 min durations, with a total time of ~3hrs, contributed by TAU (development dataset).
40 recordings contributed by SONY for the training split, captured in 2 rooms (dev-train-sony).
30 recordings contributed by SONY for the testing split, captured in 2 rooms (dev-test-sony).
27 recordings contributed by TAU for the training split, captured in 4 rooms (dev-train-tau).
24 recordings contributed by TAU for the testing split, captured in 3 rooms (dev-test-tau).
A total of 11 unique rooms captured in the recordings, 4 from SONY and 7 from TAU (development set).
Sampling rate 24kHz.
Two 4-channel 3-dimensional recording formats: first-order Ambisonics (FOA) and tetrahedral microphone array (MIC).
Recordings are taken in two different countries and two different sites.
Each recording clip is part of a recording session happening in a unique room.
Groups of participants, sound making props, and scene scenarios are unique for each session (with a few exceptions).
13 target classes are identified in the recordings and strongly annotated by humans.
Spatial annotations for those active events are captured by an optical tracking system.
Sound events out of the target classes are considered as interference and are not labeled.
- Annotations Included:
Each recording in the development set has labels of events and DoAs in a plain csv file with the same filename.
Each row in the csv file has a frame number, active class index, source number index, azimuth, and elevation.
Frame, class, and source enumeration begins at 0.
Frames correspond to a temporal resolution of 100msec.
Azimuth and elevation angles are given in degrees, rounded to the closest integer value, with azimuth and elevation being zero at the front, azimuth \(\phi \in [-180^{\circ}, 180^{\circ}]\), and elevation \(\theta \in [-90^{\circ}, 90^{\circ}]\). Note that the azimuth angle is increasing counter-clockwise (\(\phi = 90^{\circ}\) at the left).
The source index is a unique integer for each source in the scene, and it is provided only as additional information. Note that each unique actor gets assigned one such identifier, but not individual events produced by the same actor; e.g. a clapping event and a laughter event produced by the same person have the same identifier. Independent sources that are not actors (e.g. a loudspeaker playing music in the room) get a 0 identifier. Note that source identifier information is only included in the development metadata and is not required to be provided by the participants in their results.
Overlapping sound events are indicated with duplicate frame numbers, and can belong to a different or the same class.
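The per-row csv format described above can be parsed without the loader. This illustrative sketch uses made-up rows and converts frame numbers to seconds at the stated 100 ms resolution:

```python
# Parse STARSS22-style metadata rows: [frame, class, source, azimuth, elevation].
# The rows below are invented for illustration, not taken from the dataset.
import csv
import io

rows = "10,1,0,-50,30\n11,1,0,-50,30\n11,4,1,10,-20\n"

events = []
for frame, cls, src, az, el in csv.reader(io.StringIO(rows)):
    events.append({
        "time": int(frame) * 0.1,   # frames are at 100 ms resolution
        "class": int(cls),          # active class index (enumeration from 0)
        "source": int(src),         # source index (one per tracked actor)
        "azimuth": int(az),         # degrees, counter-clockwise positive
        "elevation": int(el),       # degrees, zero at the front
    })

# Duplicate frame numbers (frame 11 here) indicate overlapping events.
```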
- Organization
The development dataset is split in training and test sets.
The training set consists of 67 recordings.
The test set consists of 54 recordings.
- Please Acknowledge Sony-TAu Realistic Spatial Soundscapes (STARSS) 2022 in Academic Research:
If you use this dataset please cite the report on its creation, and the corresponding DCASE2022 task setup:
Politis, Archontis, Mitsufuji, Yuki, Sudarsanam, Parthasaarathy, Shimada, Kazuki, Adavanne, Sharath, Koyama, Yuichiro, Krause, Daniel, Takahashi, Naoya, Takahashi, Shusuke, & Virtanen, Tuomas. (2022). STARSS22: Sony-TAu Realistic Spatial Soundscapes 2022 dataset (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6387880
- License:
This dataset is licensed under the MIT license (https://opensource.org/licenses/MIT).
- class soundata.datasets.starss2022.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
STARSS 2022 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio_path (str) – path to the audio file
csv_path (str) – path to the csv file
format (str) – whether the clip is in FOA or MIC format
set (str) – the data subset the clip belongs to (development or evaluation)
split (str) – the split the clip belongs to (training or test)
clip_id (str) – clip id
spatial_events (SpatialEvents) – sound events with time step, elevation, azimuth, distance, label, clip_number and confidence.
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- spatial_events
The clip’s event annotations
- Returns:
- SpatialEvents with attributes:
intervals (list): list of size n np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time]
intervals_unit (str): intervals unit, one of TIME_UNITS
time_step (int, float, or None): the time-step between events
elevations (list): list of size n np.ndarrays with dtype int, indicating the elevation of the sound event per time_step
elevations_unit (str): elevations unit, one of ELEVATIONS_UNITS
azimuths (list): list of size n np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving
azimuths_unit (str): azimuths unit, one of AZIMUTHS_UNITS
distances (list): list of size n np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving
distances_unit (str): distances unit, one of DISTANCES_UNITS
labels (list): list of event labels (as strings)
labels_unit (str): labels unit, one of LABELS_UNITS
clip_number_indices (list): list of clip number indices (as strings)
confidence (np.ndarray or None): array of confidence values
- class soundata.datasets.starss2022.Dataset(data_home=None)[source]
The STARSS 2022 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a STARSS 2022 audio file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an audio file
sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from the file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000 Hz).
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.starss2022.load_audio(fhandle: BinaryIO, sr=24000) Tuple[numpy.ndarray, float] [source]
Load a STARSS 2022 audio file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an audio file
sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from the file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000 Hz).
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- soundata.datasets.starss2022.load_spatialevents(fhandle: TextIO, dt=0.1) SpatialEvents [source]
Load a STARSS 2022 annotation file.
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
dt (float) – time step
- Raises:
IOError – if fhandle doesn’t exist
- Returns:
SpatialEvents – sound spatial events annotation data
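A typical workflow with this loader, following the pattern shown for urbansound8k at the top of this reference. Since download() and validate() need network and disk access, the calls are wrapped in a function here rather than executed directly:

```python
# Hedged sketch of a STARSS 2022 workflow using the loader documented above.
# Nothing runs until the function is called with a writable data_home.
def inspect_starss22(data_home=None):
    import soundata

    dataset = soundata.initialize("starss2022", data_home=data_home)
    dataset.download()            # fetch the dataset remotes
    dataset.validate()            # check files against the index checksums

    clip = dataset.choice_clip()  # a random Clip
    audio, sr = clip.audio        # multichannel signal at 24 kHz
    events = clip.spatial_events  # SpatialEvents annotation
    return clip.format, sr, events
```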
TAU NIGENS SSE 2020
TAU NIGENS SSE 2020 Dataset Loader
Dataset Info
TAU NIGENS Spatial Sound Events: scene recordings with (moving) sound events of distinct categories
- Created By:
- Archontis Politis, Sharath Adavanne, Tuomas Virtanen. Audio Research Group, Tampere University (Finland).
Version 1.2.0
- Description:
Spatial sound-scene recordings, consisting of sound events of distinct categories in a variety of acoustical spaces, and from multiple source directions and distances. The spatialization of all sound events is based on filtering through real spatial room impulse responses (RIRs) of diverse acoustic environments. The sound events are spatialized as either stationary sound sources, or moving sound sources, in which case time-variant RIRs are used. Each scene recording is delivered in microphone array (MIC) and first-order Ambisonics (FOA) format.
- Audio Files Included:
600 one-minute long sound scene recordings (development dataset).
200 one-minute long sound scene recordings (evaluation dataset).
Sampling rate is 24 kHz (16-bit signed integer PCM).
About 700 sound event samples spread over 14 classes.
8 provided cross-validation folds of 100 recordings each, with unique sound event samples and rooms in each of them.
Two 4-channel 3-dimensional recording formats: first-order Ambisonics (FOA) and tetrahedral microphone array.
Realistic spatialization and reverberation through RIRs collected in 15 different enclosures.
From about 1500 to 3500 possible RIR positions across the different rooms.
Both static reverberant and moving reverberant sound events.
Up to two overlapping sound events allowed, temporally and spatially.
Realistic spatial ambient noise collected from each room is added to the spatialized sound events, at varying signal-to-noise ratios (SNR) ranging from noiseless (30dB) to noisy (6dB).
- Annotations Included:
Each recording in the development set has labels of events and Directions of arrival in a plain csv file with the same filename.
Each row in the csv file has a frame number, active class index, clip number index, azimuth, and elevation.
Frame, class, and clip enumeration begins at 0.
Frames correspond to a temporal resolution of 100msec.
Azimuth and elevation angles are given in degrees, rounded to the closest integer value, with azimuth and elevation being zero at the front, azimuth \(\phi \in [-180^{\circ}, 180^{\circ}]\), and elevation \(\theta \in [-90^{\circ}, 90^{\circ}]\). Note that the azimuth angle is increasing counter-clockwise (\(\phi = 90^{\circ}\) at the left).
The event number index is a unique integer for each event in the recording, enumerating events in order of appearance. These event identifiers are useful for disentangling the directions of co-occurring events through time in the metadata file.
Overlapping sound events are indicated with duplicate frame numbers, and can belong to a different or the same class.
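Given the angle conventions above (azimuth increasing counter-clockwise from the front, elevation zero at the horizon), a direction of arrival can be converted to a Cartesian unit vector. This sketch assumes the common convention of x pointing to the front, y to the left, and z up:

```python
# Convert an annotated (azimuth, elevation) pair in degrees to a Cartesian
# unit vector: x front, y left (azimuth counter-clockwise positive), z up.
import math

def doa_to_unit_vector(azimuth_deg: float, elevation_deg: float):
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

# Front (az=0, el=0) maps to (1, 0, 0); left (az=90, el=0) to (0, 1, 0).
```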
- Please Acknowledge TAU-NIGENS SSE 2020 in Academic Research:
- If you use this dataset please cite the report on its creation, and the corresponding DCASE2020 task setup:
Politis, Archontis, Adavanne, Sharath, & Virtanen, Tuomas (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan.
- License:
Creative Commons Attribution Non Commercial 4.0 International
- class soundata.datasets.tau2020sse_nigens.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
TAU NIGENS SSE 2020 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio_path (str) – path to the audio file
tags (soundata.annotations.Tags) – tag
clip_id (str) – clip id
spatial_events (SpatialEvents) – sound events with time step, elevation, azimuth, distance, label, clip_number and confidence.
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- spatial_events
The clip’s event annotations
- Returns:
- SpatialEvents with attributes:
intervals (list): list of size n np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time]
intervals_unit (str): intervals unit, one of TIME_UNITS
time_step (int, float, or None): the time-step between events
elevations (list): list of size n np.ndarrays with dtype int, indicating the elevation of the sound event per time_step
elevations_unit (str): elevations unit, one of ELEVATIONS_UNITS
azimuths (list): list of size n np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving
azimuths_unit (str): azimuths unit, one of AZIMUTHS_UNITS
distances (list): list of size n np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving
distances_unit (str): distances unit, one of DISTANCES_UNITS
labels (list): list of event labels (as strings)
labels_unit (str): labels unit, one of LABELS_UNITS
clip_number_indices (list): list of clip number indices (as strings)
confidence (np.ndarray or None): array of confidence values
- class soundata.datasets.tau2020sse_nigens.Dataset(data_home=None)[source]
The TAU NIGENS SSE 2020 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a TAU NIGENS SSE 2020 audio file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an audio file
sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from the file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000 Hz).
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.tau2020sse_nigens.load_audio(fhandle: BinaryIO, sr=24000) Tuple[numpy.ndarray, float] [source]
Load a TAU NIGENS SSE 2020 audio file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an audio file
sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from the file’s sample rate, the audio is resampled on load. Use None to load the file at its original sample rate (24000 Hz).
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- soundata.datasets.tau2020sse_nigens.load_spatialevents(fhandle: TextIO, dt=0.1) SpatialEvents [source]
Load a TAU NIGENS SSE 2020 annotation file.
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
dt (float) – time step
- Raises:
IOError – if txt_path doesn’t exist
- Returns:
SpatialEvents – sound spatial events annotation data
TAU NIGENS SSE 2021
TAU NIGENS SSE 2021 Dataset Loader
Dataset Info
TAU NIGENS Spatial Sound Events: scene recordings with (moving) sound events of distinct categories
- Created By:
- Archontis Politis, Sharath Adavanne, Tuomas Virtanen. Audio Research Group, Tampere University (Finland).
Version 1.2.0
- Description:
Spatial sound-scene recordings, consisting of sound events of distinct categories in a variety of acoustical spaces, and from multiple source directions and distances. The spatialization of all sound events is based on filtering through real spatial room impulse responses (RIRs) of diverse acoustic environments. The sound events are spatialized as either stationary sound sources, or moving sound sources, in which case time-variant RIRs are used.
Each scene recording is delivered in microphone array (MIC) and first-order Ambisonics (FOA) format.
- Audio Files Included:
600 one-minute-long sound scene recordings with annotations (development dataset).
200 one-minute-long sound scene recordings without annotations (evaluation dataset).
Sampling rate is 24 kHz (16-bit signed integer PCM).
About 500 sound event samples distributed over 12 target classes.
About 400 sound event samples used as interference events.
First-order Ambisonics (FOA) or tetrahedral microphone array formats.
Realistic spatialization and reverberation through multichannel RIRs collected in 13 different enclosures.
From 1184 to 6480 possible RIR positions across the different rooms.
Both static reverberant and moving reverberant sound events.
Three possible angular speeds for moving sources of approximately 10, 20, or 40 deg/sec.
Up to three overlapping sound events possible, temporally and spatially.
Simultaneous directional interfering sound events with their own temporal activities, static or moving.
Realistic spatial ambient noise collected from each room is added to the spatialized sound events, at varying signal-to-noise ratios (SNR) ranging from noiseless (30 dB) to noisy (6 dB) conditions.
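The ambient-noise mixing described above (noise added at SNRs ranging from 30 dB down to 6 dB) can be sketched with NumPy. This is an illustrative sketch, not the dataset's actual generation code; the signal and noise arrays below are synthetic placeholders.

```python
import numpy as np

def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that the mixture has the requested SNR, then add it."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power for the target SNR: SNR_dB = 10 * log10(Ps / Pn)
    target_noise_power = p_signal / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / p_noise)
    return signal + scale * noise

rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 440 * np.arange(24000) / 24000)  # 1 s tone at 24 kHz
noise = rng.standard_normal(24000)                        # placeholder "room noise"
mixture = mix_at_snr(sig, noise, snr_db=6.0)              # noisiest condition above
```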
- Annotations Included:
Each recording in the development set has labels of events and DoAs in a plain csv file with the same filename.
Each row in the csv file has a frame number, active class index, event number index, azimuth, and elevation.
Frame, class, and clip enumeration begins at 0.
Frames correspond to a temporal resolution of 100 ms.
Azimuth and elevation angles are given in degrees, rounded to the closest integer value, with azimuth and elevation being zero at the front, azimuth \(\phi \in [-180^{\circ}, 180^{\circ}]\), and elevation \(\theta \in [-90^{\circ}, 90^{\circ}]\). Note that the azimuth angle is increasing counter-clockwise (\(\phi = 90^{\circ}\) at the left).
The event number index is a unique integer for each event in the recording, enumerating them in order of appearance. These event identifiers are useful for disentangling the directions of co-occurring events through time in the metadata file. The interferers are considered unknown, and no activity or direction labels are provided for them in the training datasets.
Overlapping sound events are indicated with duplicate frame numbers, and can belong to a different or the same class.
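The annotation layout described above (one row per frame with frame number, class index, event index, azimuth, and elevation; angles in degrees, azimuth increasing counter-clockwise and zero at the front) can be illustrated with a small parser. The rows below are made up for illustration; only the column order and the angle convention come from the description.

```python
import math

def parse_rows(rows):
    """Group (frame, class, event, azimuth, elevation) rows by frame index.

    Overlapping events show up as duplicate frame numbers, so each frame
    maps to a list of active events.
    """
    frames = {}
    for frame, cls, event, az, el in rows:
        frames.setdefault(frame, []).append(
            {"class": cls, "event": event, "azimuth": az, "elevation": el}
        )
    return frames

def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Azimuth/elevation (degrees, azimuth CCW, zero at front) to an (x, y, z) unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

# Hypothetical rows: frame 0 has two overlapping events, frame 1 has one.
rows = [(0, 3, 0, 90, 0), (0, 5, 1, -45, 30), (1, 3, 0, 85, 0)]
frames = parse_rows(rows)
```

With azimuth of 90 degrees at the left, `doa_to_unit_vector(90, 0)` points along the positive y axis, matching the counter-clockwise convention stated above.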
- Organization
The development dataset is split in training, validation, and test sets.
The training set consists of 400 recordings.
The validation set consists of 100 recordings.
The test set consists of 100 recordings.
The evaluation dataset consists of 200 recordings.
- Please Acknowledge TAU-NIGENS SSE 2021 in Academic Research:
If you use this dataset please cite the report on its creation, and the corresponding DCASE2021 task setup:
Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, and Tuomas Virtanen. A dataset of dynamic reverberant sound scenes with directional interferers for sound event localization and detection. arXiv preprint arXiv:2106.06999, 2021. URL: https://arxiv.org/abs/2106.06999, arXiv:2106.06999.
- License:
Creative Commons Attribution Non Commercial 4.0 International
- class soundata.datasets.tau2021sse_nigens.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
TAU NIGENS SSE 2021 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio_path (str) – path to the audio file
tags (soundata.annotation.Tags) – tag
clip_id (str) – clip id
spatial_events (SpatialEvents) – sound events with time step, elevation, azimuth, distance, label, clip_number and confidence.
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- spatial_events
The clip’s event annotations
- Returns:
- SpatialEvents with attributes
intervals (list): list of size n of np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time]
intervals_unit (str): intervals unit, one of TIME_UNITS
time_step (int, float, or None): the time-step between events
elevations (list): list of size n of np.ndarrays with dtype int, indicating the elevation of the sound event per time_step
elevations_unit (str): elevations unit, one of ELEVATIONS_UNITS
azimuths (list): list of size n of np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving
azimuths_unit (str): azimuths unit, one of AZIMUTHS_UNITS
distances (list): list of size n of np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving
distances_unit (str): distances unit, one of DISTANCES_UNITS
labels (list): list of event labels (as strings)
labels_unit (str): labels unit, one of LABELS_UNITS
clip_number_indices (list): list of clip number indices (as strings)
confidence (np.ndarray or None): array of confidence values
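As a small illustration of the intervals attribute above (a list of per-event (m, 2) arrays of [start_time, end_time]), the total active duration of each event can be computed directly; the interval values below are toy data, not taken from the dataset.

```python
import numpy as np

def event_durations(intervals):
    """Sum (end_time - start_time) over each event's (m, 2) interval array."""
    return [float(np.sum(iv[:, 1] - iv[:, 0])) for iv in intervals]

# Two hypothetical events: one active over two intervals, one over a single interval.
intervals = [
    np.array([[0.0, 1.5], [2.0, 2.5]]),
    np.array([[0.5, 3.0]]),
]
durations = event_durations(intervals)  # total active seconds per event
```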
- class soundata.datasets.tau2021sse_nigens.Dataset(data_home=None)[source]
The TAU NIGENS SSE 2021 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a TAU NIGENS SSE 2021 audio file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an audio file
sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from the file’s sample rate, the audio is resampled on load. Use None to load the file at its original sample rate (24000 Hz).
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.tau2021sse_nigens.load_audio(fhandle: BinaryIO, sr=24000) Tuple[numpy.ndarray, float] [source]
Load a TAU NIGENS SSE 2021 audio file.
- Parameters:
fhandle (str or file-like) – path or file-like object pointing to an audio file
sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from the file’s sample rate, the audio is resampled on load. Use None to load the file at its original sample rate (24000 Hz).
- Returns:
np.ndarray - the audio signal
float - The sample rate of the audio file
- soundata.datasets.tau2021sse_nigens.load_spatialevents(fhandle: TextIO, dt=0.1) SpatialEvents [source]
Load a TAU NIGENS SSE 2021 annotation file.
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
dt (float) – time step
- Raises:
IOError – if txt_path doesn’t exist
- Returns:
SpatialEvents – sound spatial events annotation data
TAU Spatial Sound Events 2019
TAU SSE 2019 Dataset Loader
Dataset Info
TAU SSE 2019
- Created By:
- Sharath Adavanne, Archontis Politis, Tuomas Virtanen. Audio Research Group, Tampere University.
Version 2
- Description:
Recordings with stationary point sources (events) from multiple sound classes, with up to two temporally overlapping sound events. Recordings of identical scenes are available in both first-order Ambisonics and the corresponding four-channel tetrahedral microphone format. Recordings were made in five different rooms. The sound classes are the 11 classes from the DCASE 2016 challenge task 2; each class has 20 different examples.
- Audio Files Included:
500 one-minute-long recordings (400 development and 100 evaluation; 48kHz sampling rate and 16-bit precision).
- Annotations Included:
- sound event category with:
start time
end time
elevation
azimuth
distance
- Moreover, the clip id indicates:
data split number (4 in development and 1 in evaluation)
room number (IR: impulse response)
whether there are temporally overlapping events
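Assuming clip ids follow a DCASE-style pattern such as split{N}_ir{M}_ov{K}_{index} (an assumption for illustration; verify against the actual ids in the index), the components listed above could be extracted like this:

```python
import re

# Hypothetical id pattern: "split{split}_ir{room}_ov{overlap}_{take}".
CLIP_ID_RE = re.compile(
    r"split(?P<split>\d+)_ir(?P<room>\d+)_ov(?P<overlap>\d+)_(?P<take>\d+)"
)

def parse_clip_id(clip_id):
    """Split a TAU-SSE-2019-style clip id into its numeric components."""
    m = CLIP_ID_RE.fullmatch(clip_id)
    if m is None:
        raise ValueError(f"unrecognized clip id: {clip_id!r}")
    return {k: int(v) for k, v in m.groupdict().items()}

info = parse_clip_id("split1_ir2_ov1_7")  # made-up example id
```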
- Please Acknowledge TAU SSE 2019 in Academic Research:
- If you use this dataset please cite its original publication:
Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. A multi-room reverberant dataset for sound event localization and detection. In Submitted to Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019). 2019. URL: https://arxiv.org/abs/1905.08546.
- License:
Copyright (c) 2019 Tampere University and its licensors All rights reserved. Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the TAU Spatial Sound Events 2019 - Ambisonic and Microphone Array described in this document and composed of audio and metadata. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work, (Audio Research Group at Tampere University), is acknowledged in any publication that reports research using this Work.
- Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use include, but is not limited to:
selling or reproducing the Work
selling or distributing the results or content achieved by use of the Work
providing services by using the Work.
IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN “AS IS” BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
- class soundata.datasets.tau2019sse.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
TAU SSE 2019 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
spatial_events (SpatialEvents) – sound events with start time, end time, elevation, azimuth, distance, label and confidence.
audio_path (str) – path to the audio file
set (str) – subset the clip belongs to (development or evaluation)
format (str) – whether the clip is in foa or mic format
clip_id (str) – clip id
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- spatial_events
The clip’s spatial events
- Returns:
- SpatialEvents class with attributes
intervals (np.ndarray): (n x 2) array of intervals (as floats) in seconds in the form [start_time, end_time], with positive time stamps and end_time >= start_time
elevations (np.ndarray): (n,) array of elevations
azimuths (np.ndarray): (n,) array of azimuths
distances (np.ndarray): (n,) array of distances
labels (list): list of event labels (as strings)
confidence (np.ndarray or None): array of confidence values, float in [0, 1]
labels_unit (str): labels unit, one of LABELS_UNITS
intervals_unit (str): intervals unit, one of TIME_UNITS
- class soundata.datasets.tau2019sse.Dataset(data_home=None)[source]
The TAU SSE 2019 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a TAU SSE 2019 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.
- Returns:
np.ndarray - the multichannel audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- class soundata.datasets.tau2019sse.TAU2019_SpatialEvents(intervals, intervals_unit, elevations, elevations_unit, azimuths, azimuths_unit, distances, distances_unit, labels, labels_unit, confidence=None)[source]
TAU SSE 2019 Spatial Events
- Variables:
intervals (np.ndarray) – (n x 2) array of intervals (as floats) in seconds in the form [start_time, end_time] with positive time stamps and end_time >= start_time.
elevations (np.ndarray) – (n,) array of elevations
azimuths (np.ndarray) – (n,) array of azimuths
distances (np.ndarray) – (n,) array of distances
labels (list) – list of event labels (as strings)
confidence (np.ndarray or None) – array of confidence values, float in [0, 1]
labels_unit (str) – labels unit, one of LABELS_UNITS
intervals_unit (str) – intervals unit, one of TIME_UNITS
- soundata.datasets.tau2019sse.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a TAU SSE 2019 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.
- Returns:
np.ndarray - the multichannel audio signal
float - The sample rate of the audio file
- soundata.datasets.tau2019sse.load_spatialevents(fhandle: TextIO) TAU2019_SpatialEvents [source]
Load a TAU SSE 2019 annotation file.
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
- Raises:
IOError – if csv_path doesn’t exist
- Returns:
TAU2019_SpatialEvents – sound events annotation data
- soundata.datasets.tau2019sse.validate_locations(locations)[source]
Validate if TAU SSE 2019 locations are well-formed.
If locations is None, validation passes automatically
- Parameters:
locations (np.ndarray) – (n x 3) array
- Raises:
ValueError – if locations have an invalid shape or have cartesian coordinate values outside the expected ranges.
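A minimal sketch of a validate_locations-style check, assuming an (n x 3) array of Cartesian coordinates; the numeric bounds here are illustrative assumptions, not the loader's actual expected ranges.

```python
import numpy as np

def check_locations(locations, bounds=(-10.0, 10.0)):
    """Pass if `locations` is None; else require an (n, 3) array within `bounds`.

    The (-10, 10) metre bounds are an illustrative assumption for this sketch.
    """
    if locations is None:
        return  # validation passes automatically, as documented above
    arr = np.asarray(locations, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 3:
        raise ValueError(f"expected an (n, 3) array, got shape {arr.shape}")
    lo, hi = bounds
    if np.any(arr < lo) or np.any(arr > hi):
        raise ValueError("coordinate values outside the expected ranges")

check_locations(None)                 # passes automatically
check_locations([[0.0, 1.0, 2.0]])    # well-formed single location
```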
TAU Urban Acoustic Scenes 2019
TAU Urban Acoustic Scenes 2019 Loader
Dataset Info
TAU Urban Acoustic Scenes 2019, Development, Leaderboard and Evaluation datasets
Audio Research Group, Tampere University of Technology
Authors
Recording and annotation
Henri Laakso
Ronal Bejarano Rodriguez
Toni Heittola
Dataset
TAU Urban Acoustic Scenes 2019 dataset consists of 10-seconds audio segments from 10 acoustic scenes:
Airport - airport
Indoor shopping mall - shopping_mall
Metro station - metro_station
Pedestrian street - street_pedestrian
Public square - public_square
Street with medium level of traffic - street_traffic
Travelling by a tram - tram
Travelling by a bus - bus
Travelling by an underground metro - metro
Urban park - park
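The scene-name/label pairs above can be kept as a simple mapping when relating human-readable scene names to the labels used in filenames and metadata; the dict below mirrors the list exactly.

```python
# Display name -> metadata label, copied from the scene list above.
SCENE_LABELS = {
    "Airport": "airport",
    "Indoor shopping mall": "shopping_mall",
    "Metro station": "metro_station",
    "Pedestrian street": "street_pedestrian",
    "Public square": "public_square",
    "Street with medium level of traffic": "street_traffic",
    "Travelling by a tram": "tram",
    "Travelling by a bus": "bus",
    "Travelling by an underground metro": "metro",
    "Urban park": "park",
}

# Reverse lookup, e.g. for pretty-printing a clip's tag.
LABEL_TO_NAME = {v: k for k, v in SCENE_LABELS.items()}
```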
A detailed description of the data recording and annotation procedure is available in:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen.
"A multi-device dataset for urban acoustic scene classification",
In Proceedings of the Detection and Classification of Acoustic
Scenes and Events 2018 Workshop (DCASE2018), Surrey, UK, 2018.
Development dataset
Each acoustic scene has 1440 segments (240 minutes of audio). The dataset contains in total 40 hours of audio.
Evaluation dataset
The dataset contains in total 7200 segments (20 hours of audio).
Leaderboard dataset
The dataset contains in total 1200 segments (200 minutes of audio).
The dataset was collected by Tampere University of Technology between 05/2018 and 11/2018. The data collection has received funding from the European Research Council under the ERC Grant Agreement 637422 EVERYSOUND.
Preparation of the dataset
The dataset was recorded in 12 large European cities: Amsterdam, Barcelona, Helsinki, Lisbon, London, Lyon, Madrid, Milan, Prague, Paris, Stockholm, and Vienna. For all acoustic scenes, audio was captured in multiple locations: different streets, different parks, different shopping malls. In each location, multiple 2-3 minute long audio recordings were captured in a few slightly different positions (2-4) within the selected location. Collected audio material was cut into segments of 10 seconds length.
The equipment used for recording consists of a binaural Soundman OKM II Klassik/studio A3 electret in-ear microphone and a Zoom F8 audio recorder using 48 kHz sampling rate and 24-bit resolution. During recording, the microphones were worn in the ears of the recording person, and head movement was kept to a minimum.
Post-processing of the recorded audio addressed the privacy of recorded individuals and possible errors in the recording process. The material was screened for content, and segments containing close-microphone conversation were eliminated. Some interference from mobile phones is audible, but it is considered part of the real-world recording process.
A subset of the dataset has been previously published as TUT Urban Acoustic Scenes 2018 Development dataset. Audio segment filenames are retained for the segments coming from this dataset.
Dataset statistics
The development dataset contains audio material from 10 cities, whereas the evaluation dataset (TAU Urban Acoustic Scenes 2019 evaluation) contains data from all 12 cities. The dataset is perfectly balanced at acoustic scene level, with very slight differences in the number of segments from each city.
Audio segments (Development dataset)
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1440 | 128 | 149 | 144 | 145 | 144 | 144 | 156 | 144 | 158 | 128 |
| Bus | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| Metro | 1440 | 141 | 144 | 144 | 146 | 144 | 144 | 144 | 144 | 145 | 144 |
| Metro station | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| Park | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| Public square | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| Shopping mall | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| Street, pedestrian | 1440 | 145 | 145 | 144 | 145 | 144 | 144 | 144 | 144 | 145 | 140 |
| Street, traffic | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| Tram | 1440 | 143 | 145 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |
| Total | 14400 | 1421 | 1447 | 1440 | 1444 | 1440 | 1440 | 1452 | 1440 | 1456 | 1420 |
Audio segments (Recording locations)
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 40 | 4 | 3 | 4 | 3 | 4 | 4 | 4 | 6 | 5 | 3 |
| Bus | 71 | 4 | 4 | 11 | 7 | 7 | 7 | 11 | 10 | 6 | 4 |
| Metro | 67 | 3 | 5 | 11 | 4 | 9 | 8 | 9 | 10 | 4 | 4 |
| Metro station | 57 | 5 | 6 | 4 | 12 | 5 | 4 | 9 | 4 | 4 | 4 |
| Park | 41 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 4 |
| Public square | 43 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 6 | 4 | 4 |
| Shopping mall | 36 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 4 | 4 |
| Street, pedestrian | 46 | 7 | 4 | 4 | 4 | 4 | 5 | 5 | 5 | 4 | 4 |
| Street, traffic | 43 | 4 | 4 | 4 | 5 | 4 | 6 | 4 | 4 | 4 | 4 |
| Tram | 70 | 4 | 4 | 6 | 9 | 7 | 11 | 9 | 11 | 5 | 4 |
| Total | 514 | 43 | 42 | 56 | 54 | 52 | 56 | 63 | 65 | 45 | 39 |
Usage
The data was partitioned based on the location of the original recordings: all segments recorded at the same location were included in a single subset, either the development dataset or the evaluation dataset. For each acoustic scene, 1440 segments are included in the development dataset provided here. The evaluation dataset is provided separately.
Training / test setup
A suggested training/test partitioning of the development set is provided in order to make results reported with this dataset uniform. The partitioning is done such that segments recorded at the same location are included in the same subset, either training or testing. The partitioning aims for a 70/30 ratio between the number of segments in the training and test subsets while taking recording locations into account, selecting the closest available option. Audio segments from nine cities are used for training and all ten cities are used for testing (Milan is used only for testing). Since the dataset includes a balanced amount of material from ten cities, this partitioning leaves a small subset of the Milan data unused in the training/test setup. This material can be used when training a system on the full dataset and testing it with the evaluation dataset.
The setup is provided with the dataset in the directory evaluation_setup.
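The location-based partitioning described above (all segments from one recording location land in the same subset, never straddling train and test) amounts to a grouped split. A minimal sketch with made-up segment and location ids:

```python
def grouped_split(segments, train_locations):
    """Assign each (segment_id, location_id) pair to train or test by location.

    Because the decision is made per location, segments recorded at the same
    location can never end up on both sides of the split.
    """
    train, test = [], []
    for seg_id, loc_id in segments:
        (train if loc_id in train_locations else test).append(seg_id)
    return train, test

# Toy data: two recording locations with two segments each.
segments = [("a0", "loc1"), ("a1", "loc1"), ("b0", "loc2"), ("b1", "loc2")]
train, test = grouped_split(segments, train_locations={"loc1"})
```

In practice the official split is read from the fold files in the evaluation_setup directory rather than computed; this sketch only illustrates the grouping constraint.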
Statistics
| Scene class | Train / Segments | Train / Locations | Test / Segments | Test / Locations | Unused / Segments | Unused / Locations |
|---|---|---|---|---|---|---|
| Airport | 911 | 25 | 421 | 12 | 108 | 3 |
| Bus | 928 | 46 | 415 | 20 | 97 | 5 |
| Metro | 902 | 41 | 433 | 20 | 105 | 6 |
| Metro station | 897 | 37 | 435 | 17 | 108 | 3 |
| Park | 946 | 27 | 386 | 11 | 108 | 3 |
| Public square | 945 | 28 | 387 | 12 | 108 | 3 |
| Shopping mall | 896 | 24 | 441 | 10 | 103 | 2 |
| Street, pedestrian | 924 | 29 | 429 | 14 | 87 | 3 |
| Street, traffic | 942 | 27 | 402 | 12 | 96 | 4 |
| Tram | 894 | 41 | 436 | 21 | 110 | 8 |
| Total | 9185 | 325 | 4185 | 149 | 1030 | 40 |
License
License permits free academic usage. Any commercial use is strictly prohibited. For commercial use, contact dataset authors.
Copyright (c) 2019 Tampere University and its licensors All rights reserved. Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the TAU Urban Acoustic Scenes 2019 (“Work”) described in this document and composed of audio and metadata. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work, (Audio Research Group at Tampere University of Technology), is acknowledged in any publication that reports research using this Work. Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use include, but is not limited to: - selling or reproducing the Work - selling or distributing the results or content achieved by use of the Work - providing services by using the Work.
IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN “AS IS” BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
- class soundata.datasets.tau2019uas.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
TAU Urban Acoustic Scenes 2019 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
city (str) – city where the audio signal was recorded
clip_id (str) – clip id
identifier (str) – identifier present in the metadata
split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4), leaderboard or evaluation
tags (soundata.annotations.Tags) – tag (scene label) of the clip + confidence.
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property city
The clip’s city.
- Returns:
str - city where the audio signal was recorded
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property identifier
The clip’s identifier.
- Returns:
str - identifier present in the metadata
- property split
The clip’s split.
- Returns:
str - subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4), leaderboard or evaluation
- property tags
The clip’s tags.
- Returns:
annotations.Tags - tag (scene label) of the clip + confidence.
- class soundata.datasets.tau2019uas.Dataset(data_home=None)[source]
The TAU Urban Acoustic Scenes 2019 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a TAU Urban Acoustic Scenes 2019 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.tau2019uas.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a TAU Urban Acoustic Scenes 2019 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
TAU Urban Acoustic Scenes 2020 Mobile
TAU Urban Acoustic Scenes 2020 Mobile Loader
Dataset Info
TAU Urban Acoustic Scenes 2020 Mobile, Development and Evaluation datasets
Audio Research Group, Tampere University of Technology
Authors
Recording and annotation
Henri Laakso
Ronal Bejarano Rodriguez
Toni Heittola
Links
Dataset
TAU Urban Acoustic Scenes 2020 Mobile development dataset consists of 10-second audio segments from 10 acoustic scenes:
Airport - airport
Indoor shopping mall - shopping_mall
Metro station - metro_station
Pedestrian street - street_pedestrian
Public square - public_square
Street with medium level of traffic - street_traffic
Travelling by a tram - tram
Travelling by a bus - bus
Travelling by an underground metro - metro
Urban park - park
A detailed description of the data recording and annotation procedure is available in:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen.
"Acoustic scene classification in DCASE 2020 Challenge:
generalization across devices and low complexity solutions",
In Proceedings of the Detection and Classification of Acoustic
Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan, 2020.
Recordings were made with three devices (A, B and C) that captured audio simultaneously, plus 6 simulated devices (S1-S6). Each acoustic scene has 1440 segments (240 minutes of audio) recorded with device A (the main device) and 108 segments of parallel audio (18 minutes) each recorded with devices B, C, and S1-S6.
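The per-device figures above are internally consistent; a quick arithmetic check, using only the numbers quoted in the text:

```python
SEGMENT_SEC = 10  # every segment is 10 seconds long

# Device A: 1440 segments per acoustic scene
device_a_minutes = 1440 * SEGMENT_SEC / 60

# Devices B, C and S1-S6: 108 parallel segments per scene each
parallel_minutes = 108 * SEGMENT_SEC / 60

print(device_a_minutes)  # 240.0 minutes per scene, as stated
print(parallel_minutes)  # 18.0 minutes per scene, as stated
```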
Development dataset
The dataset contains in total 64 hours of audio.
Evaluation dataset
The dataset contains in total 33 hours of audio.
The dataset was collected by Tampere University of Technology between 05/2018 and 11/2018. The data collection has received funding from the European Research Council under the ERC Grant Agreement 637422 EVERYSOUND.
Preparation of the dataset
The dataset was recorded in 12 large European cities: Amsterdam, Barcelona, Helsinki, Lisbon, London, Lyon, Madrid, Milan, Prague, Paris, Stockholm, and Vienna. For all acoustic scenes, audio was captured in multiple locations: different streets, different parks, different shopping malls. In each location, multiple 2-3 minute audio recordings were captured in a few slightly different positions (2-4) within the selected location. The collected audio material was cut into 10-second segments.
The main recording device (referred to as device A) consists of a binaural Soundman OKM II Klassik/studio A3 electret in-ear microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution. During recording, the microphones were worn in the recording person’s ears, and head movement was kept to a minimum.
Devices B and C are commonly available consumer devices (e.g. smartphones, cameras) and were handled in typical ways (e.g. hand-held). The audio recordings from these devices are of different quality than those from device A. All simultaneous recordings are time-synchronized.
Post-processing of the recorded audio addresses the privacy of recorded individuals and possible errors in the recording process. The material was screened for content, and segments containing close-microphone conversation were eliminated. Some interference from mobile phones is audible, but is considered part of the real-world recording process. In addition, data from device A was resampled and averaged into a single channel to align with the properties of the data recorded with devices B and C.
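The channel-averaging step can be sketched in a few lines of numpy; the stereo input here is synthetic, and a real pipeline would additionally need a resampling step:

```python
import numpy as np

# Synthetic stand-in for a binaural device A recording:
# 2 channels, 1 second at 48 kHz
stereo = np.random.randn(2, 48000)

# Average the two channels into a single mono channel,
# mirroring the downmix applied to the device A data
mono = stereo.mean(axis=0)

print(mono.shape)  # (48000,)
```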
Additionally, 11 mobile devices (S1-S11) are simulated using the audio recorded with device A, impulse responses recorded with real devices, and additional dynamic range compression, in order to simulate realistic recordings. A recording from device A is processed through convolution with the selected Si impulse response, then processed with a selected, device-specific set of dynamic range compression parameters. The impulse responses are proprietary data and will not be published.
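The simulation chain described above (impulse-response convolution followed by device-specific dynamic range compression) can be sketched with numpy alone; the impulse response and compressor settings below are invented, since the real ones are proprietary:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 44100
recording = rng.standard_normal(sr)   # 1 s of synthetic "device A" audio
ir = np.exp(-np.arange(256) / 32.0)   # toy impulse response (assumption)

# Step 1: convolve with the device impulse response, keep original length
colored = np.convolve(recording, ir)[: len(recording)]

# Step 2: simple hard-knee compression above a fixed threshold (assumption)
threshold, ratio = 0.5, 4.0
mag = np.abs(colored)
over = mag > threshold
compressed = colored.copy()
compressed[over] = np.sign(colored[over]) * (
    threshold + (mag[over] - threshold) / ratio
)
```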
All provided audio data is single-channel, with a 44.1 kHz sampling rate and 24-bit resolution.
A subset of the dataset has been previously published as TUT Urban Acoustic Scenes 2019 Development dataset. Audio segment filenames are retained for the segments coming from this dataset.
Dataset statistics
The development set contains data from 10 cities and 9 devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). Data from devices B, C and S1-S6 consists of randomly selected segments from the simultaneous recordings, so all of it overlaps with the data from device A, but not necessarily with each other. The total amount of audio in the development set is 64 hours. The evaluation dataset (TAU Urban Acoustic Scenes 2020 Mobile evaluation) contains data from all 12 cities and from new devices not present in the development set: real device D and simulated devices S7-S11.
Device A
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 1440 | 128 | 149 | 144 | 145 | 144 | 144 | 156 | 144 | 158 | 128
Bus | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144
Metro | 1440 | 141 | 144 | 144 | 146 | 144 | 144 | 144 | 144 | 145 | 144
Metro station | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144
Park | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144
Public square | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144
Shopping mall | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144
Street, pedestrian | 1440 | 145 | 145 | 144 | 145 | 144 | 144 | 144 | 144 | 145 | 140
Street, traffic | 1440 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144
Tram | 1440 | 143 | 145 | 144 | 144 | 144 | 144 | 144 | 144 | 144 | 144
Total | 14400 | 1421 | 1447 | 1440 | 1444 | 1440 | 1440 | 1452 | 1440 | 1456 | 1420
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 40 | 4 | 3 | 4 | 3 | 4 | 4 | 4 | 6 | 5 | 3
Bus | 71 | 4 | 4 | 11 | 7 | 7 | 7 | 11 | 10 | 6 | 4
Metro | 67 | 3 | 5 | 11 | 4 | 9 | 8 | 9 | 10 | 4 | 4
Metro station | 57 | 5 | 6 | 4 | 12 | 5 | 4 | 9 | 4 | 4 | 4
Park | 41 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 4
Public square | 43 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 6 | 4 | 4
Shopping mall | 36 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 4 | 4
Street, pedestrian | 46 | 7 | 4 | 4 | 4 | 4 | 5 | 5 | 5 | 4 | 4
Street, traffic | 43 | 4 | 4 | 4 | 5 | 4 | 6 | 4 | 4 | 4 | 4
Tram | 70 | 4 | 4 | 6 | 9 | 7 | 11 | 9 | 11 | 5 | 4
Total | 514 | 43 | 42 | 56 | 54 | 52 | 56 | 63 | 65 | 45 | 39
Device B
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 107 | 11 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 107 | 11 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1078 | 118 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 36 | 3 | 3 | 4 | 3 | 3 | 4 | 4 | 5 | 4 | 3
Bus | 57 | 4 | 4 | 9 | 7 | 6 | 5 | 8 | 7 | 3 | 4
Metro | 47 | 3 | 4 | 6 | 4 | 6 | 5 | 6 | 6 | 4 | 4
Metro station | 45 | 4 | 4 | 3 | 8 | 5 | 3 | 7 | 3 | 4 | 4
Park | 37 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 3 | 3 | 4
Public square | 37 | 3 | 4 | 4 | 4 | 5 | 3 | 4 | 4 | 3 | 3
Shopping mall | 34 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 3 | 3
Street, pedestrian | 43 | 6 | 3 | 4 | 4 | 4 | 5 | 5 | 4 | 4 | 4
Street, traffic | 41 | 4 | 4 | 4 | 4 | 4 | 6 | 4 | 4 | 4 | 4
Tram | 50 | 4 | 4 | 5 | 6 | 5 | 5 | 7 | 7 | 3 | 4
Total | 427 | 39 | 37 | 47 | 46 | 44 | 42 | 53 | 47 | 35 | 37
Device C
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 107 | 11 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 107 | 12 | 12 | 12 | 10 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 107 | 11 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1077 | 118 | 120 | 120 | 109 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 38 | 4 | 3 | 4 | 3 | 3 | 4 | 4 | 5 | 5 | 3
Bus | 50 | 4 | 4 | 7 | 6 | 5 | 4 | 7 | 7 | 3 | 3
Metro | 54 | 3 | 3 | 6 | 4 | 9 | 6 | 7 | 8 | 4 | 4
Metro station | 48 | 5 | 3 | 4 | 8 | 5 | 4 | 7 | 4 | 4 | 4
Park | 39 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4
Public square | 40 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 6 | 3 | 4
Shopping mall | 35 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 3 | 4
Street, pedestrian | 41 | 6 | 3 | 4 | 4 | 3 | 5 | 4 | 5 | 4 | 3
Street, traffic | 40 | 4 | 3 | 4 | 4 | 4 | 6 | 4 | 4 | 4 | 3
Tram | 51 | 4 | 4 | 5 | 6 | 4 | 8 | 6 | 7 | 3 | 4
Total | 436 | 42 | 34 | 46 | 45 | 44 | 48 | 51 | 54 | 36 | 36
Device S1
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 37 | 4 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3
Bus | 54 | 4 | 4 | 8 | 6 | 6 | 6 | 7 | 6 | 3 | 4
Metro | 50 | 3 | 3 | 8 | 4 | 7 | 6 | 6 | 6 | 4 | 3
Metro station | 48 | 5 | 4 | 4 | 9 | 5 | 4 | 5 | 4 | 4 | 4
Park | 36 | 4 | 4 | 4 | 4 | 3 | 4 | 3 | 3 | 3 | 4
Public square | 37 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 4
Shopping mall | 33 | 4 | 4 | 4 | 2 | 3 | 3 | 3 | 3 | 3 | 4
Street, pedestrian | 40 | 6 | 3 | 4 | 4 | 3 | 5 | 2 | 5 | 4 | 4
Street, traffic | 40 | 4 | 4 | 4 | 4 | 4 | 6 | 3 | 3 | 4 | 4
Tram | 52 | 4 | 4 | 5 | 7 | 6 | 7 | 6 | 6 | 3 | 4
Total | 427 | 42 | 37 | 49 | 47 | 45 | 49 | 42 | 43 | 35 | 38
Device S2
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 36 | 3 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3
Bus | 58 | 4 | 4 | 9 | 6 | 6 | 7 | 9 | 6 | 3 | 4
Metro | 55 | 3 | 3 | 10 | 4 | 8 | 8 | 5 | 7 | 4 | 3
Metro station | 49 | 5 | 4 | 4 | 7 | 5 | 4 | 8 | 4 | 4 | 4
Park | 38 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 2 | 4
Public square | 41 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 5 | 3 | 4
Shopping mall | 34 | 4 | 4 | 3 | 2 | 3 | 3 | 4 | 4 | 3 | 4
Street, pedestrian | 42 | 7 | 3 | 4 | 4 | 3 | 5 | 5 | 4 | 4 | 3
Street, traffic | 42 | 4 | 4 | 4 | 5 | 4 | 6 | 4 | 4 | 4 | 3
Tram | 51 | 4 | 4 | 5 | 7 | 6 | 7 | 7 | 4 | 3 | 4
Total | 446 | 42 | 37 | 51 | 46 | 48 | 52 | 54 | 46 | 34 | 36
Device S3
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 36 | 3 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3
Bus | 50 | 4 | 4 | 6 | 5 | 6 | 6 | 7 | 5 | 3 | 4
Metro | 50 | 3 | 3 | 10 | 4 | 5 | 6 | 4 | 8 | 3 | 4
Metro station | 44 | 4 | 4 | 4 | 6 | 5 | 4 | 7 | 3 | 4 | 3
Park | 39 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4
Public square | 39 | 4 | 4 | 3 | 4 | 5 | 4 | 4 | 4 | 3 | 4
Shopping mall | 32 | 4 | 4 | 3 | 2 | 3 | 3 | 4 | 3 | 3 | 3
Street, pedestrian | 39 | 6 | 3 | 3 | 4 | 4 | 4 | 5 | 3 | 4 | 3
Street, traffic | 40 | 4 | 4 | 4 | 5 | 4 | 5 | 4 | 3 | 3 | 4
Tram | 50 | 4 | 4 | 5 | 8 | 5 | 7 | 6 | 5 | 3 | 3
Total | 419 | 40 | 37 | 46 | 45 | 45 | 47 | 49 | 42 | 33 | 35
Device S4
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 36 | 3 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3
Bus | 53 | 4 | 4 | 9 | 5 | 6 | 5 | 6 | 7 | 3 | 4
Metro | 50 | 3 | 2 | 8 | 4 | 7 | 6 | 7 | 6 | 4 | 3
Metro station | 47 | 5 | 4 | 4 | 7 | 5 | 4 | 6 | 4 | 4 | 4
Park | 38 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4
Public square | 38 | 4 | 4 | 3 | 3 | 5 | 4 | 4 | 4 | 3 | 4
Shopping mall | 35 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 3 | 4
Street, pedestrian | 42 | 7 | 3 | 3 | 4 | 4 | 4 | 4 | 5 | 4 | 4
Street, traffic | 41 | 4 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 4 | 4
Tram | 51 | 4 | 4 | 6 | 6 | 7 | 5 | 7 | 5 | 3 | 4
Total | 431 | 42 | 35 | 49 | 42 | 49 | 44 | 50 | 47 | 35 | 38
Device S5
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 38 | 4 | 3 | 4 | 3 | 4 | 4 | 3 | 5 | 5 | 3
Bus | 54 | 3 | 4 | 6 | 6 | 6 | 7 | 8 | 7 | 3 | 4
Metro | 51 | 3 | 3 | 7 | 4 | 8 | 6 | 6 | 7 | 4 | 3
Metro station | 45 | 5 | 3 | 3 | 7 | 4 | 4 | 7 | 4 | 4 | 4
Park | 36 | 3 | 4 | 3 | 3 | 4 | 4 | 4 | 4 | 3 | 4
Public square | 39 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 6 | 3 | 4
Shopping mall | 33 | 3 | 4 | 3 | 2 | 3 | 3 | 4 | 4 | 3 | 4
Street, pedestrian | 42 | 6 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 4 | 3
Street, traffic | 38 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
Tram | 50 | 4 | 4 | 4 | 6 | 5 | 8 | 7 | 6 | 3 | 3
Total | 426 | 37 | 35 | 41 | 43 | 46 | 48 | 52 | 52 | 36 | 36
Device S6
Audio segments
Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Bus | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Metro station | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Park | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Public square | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Shopping mall | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, pedestrian | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Street, traffic | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Tram | 108 | 12 | 12 | 12 | 11 | 11 | 10 | 10 | 10 | 10 | 10
Total | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100
Recording locations
Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna
---|---|---|---|---|---|---|---|---|---|---|---
Airport | 36 | 4 | 3 | 4 | 3 | 4 | 3 | 3 | 5 | 4 | 3
Bus | 55 | 3 | 4 | 9 | 7 | 6 | 5 | 9 | 6 | 2 | 4
Metro | 51 | 3 | 2 | 7 | 4 | 7 | 6 | 7 | 8 | 3 | 4
Metro station | 47 | 5 | 4 | 4 | 9 | 3 | 3 | 7 | 4 | 4 | 4
Park | 37 | 3 | 4 | 4 | 4 | 4 | 3 | 4 | 4 | 3 | 4
Public square | 39 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 5 | 3 | 4
Shopping mall | 33 | 3 | 4 | 4 | 2 | 3 | 2 | 4 | 4 | 3 | 4
Street, pedestrian | 39 | 5 | 3 | 4 | 4 | 3 | 4 | 4 | 4 | 4 | 4
Street, traffic | 39 | 3 | 4 | 3 | 4 | 4 | 5 | 4 | 4 | 4 | 4
Tram | 56 | 4 | 4 | 6 | 7 | 6 | 7 | 6 | 9 | 3 | 4
Total | 432 | 37 | 35 | 49 | 48 | 44 | 41 | 52 | 53 | 33 | 39
Usage
The partitioning of the data was done based on the location of the original recordings. All segments recorded at the same location were included in a single subset, either the development dataset or the evaluation dataset. For each acoustic scene, 1440 segments recorded with device A and 108 segments recorded with each of devices B, C, and S1-S6 were included in the development dataset provided here. The evaluation dataset is provided separately.
Training / test setup
A suggested training/test partitioning of the development set is provided in order to make results reported with this dataset uniform. The partitioning is done such that segments recorded at the same location are included in the same subset, either training or testing. The partitioning aims for a 70/30 ratio between the number of segments in the training and test subsets while taking recording locations into account, selecting the closest available option.
Data from devices A, B, C, S1, S2, and S3 is available in both training and test sets. Audio segments from devices S4, S5, and S6 are used only for testing. Since the dataset includes a balanced amount of material from devices B, C, and S1-S6, this partitioning leaves a small subset of data from devices S4-S6 unused in the training/test setup. This material can be used when training a system on the full dataset and testing it with the evaluation dataset.
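The location-based partitioning rule (all segments from one location land in the same subset) can be expressed compactly; the segment and location ids below are invented for illustration, not the dataset's actual naming scheme:

```python
# Hypothetical (segment_id, location_id) pairs; real ids come from the metadata
segments = [
    ("airport-0", "barcelona-1"), ("airport-1", "barcelona-1"),
    ("airport-2", "helsinki-3"), ("park-0", "lisbon-2"),
    ("park-1", "lisbon-2"), ("park-2", "vienna-7"),
]

# Split the *locations* roughly 70/30; each segment follows its location,
# so no location ever contributes to both subsets
locations = sorted({loc for _, loc in segments})
cut = round(0.7 * len(locations))
train_locs, test_locs = set(locations[:cut]), set(locations[cut:])

train = [seg for seg, loc in segments if loc in train_locs]
test = [seg for seg, loc in segments if loc in test_locs]
```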
The setup is provided with the dataset in the directory evaluation_setup.
Statistics
Scene class | Train / Segments | Train / Locations | Test / Segments | Test / Locations | Unused / Segments | Unused / Locations
---|---|---|---|---|---|---
Airport | 1393 | 28 | 296 | 12 | 613 | 40
Bus | 1400 | 51 | 297 | 19 | 607 | 66
Metro | 1382 | 47 | 297 | 20 | 625 | 65
Metro station | 1380 | 40 | 297 | 16 | 627 | 55
Park | 1429 | 30 | 297 | 11 | 578 | 39
Public square | 1427 | 31 | 297 | 12 | 579 | 42
Shopping mall | 1373 | 26 | 297 | 10 | 633 | 35
Street, pedestrian | 1386 | 32 | 297 | 14 | 621 | 45
Street, traffic | 1413 | 31 | 297 | 12 | 594 | 43
Tram | 1379 | 49 | 296 | 20 | 628 | 67
Total | 13962 | 365 | 2968 | 146 | 6105 | 497
Number of segments in train / test setup
License
The license permits free academic usage. Any commercial use is strictly prohibited. For commercial use, contact the dataset authors.
Copyright (c) 2020 Tampere University and its licensors All rights reserved. Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the TAU Urban Acoustic Scenes 2020 Mobile (“Work”) described in this document and composed of audio and metadata. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work, (Audio Research Group at Tampere University of Technology), is acknowledged in any publication that reports research using this Work. Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use include, but is not limited to: - selling or reproducing the Work - selling or distributing the results or content achieved by use of the Work - providing services by using the Work.
IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN “AS IS” BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
- class soundata.datasets.tau2020uas_mobile.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
TAU Urban Acoustic Scenes 2020 Mobile Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – audio signal and sample rate
audio_path (str) – path to the audio file
city (str) – city where the audio signal was recorded
clip_id (str) – clip id
identifier (str) – the clip identifier
source_label (str) – source label
split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation
tags (soundata.annotations.Tags) – tag (label) of the clip + confidence
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property city
The clip’s city.
- Returns:
str - city where the audio signal was recorded
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property identifier
The clip’s identifier.
- Returns:
str - clip identifier
- property source_label
The clip’s source label.
- Returns:
str - source label
- property split
The clip’s split.
- Returns:
str - subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation
- property tags
The clip’s tags.
- Returns:
annotations.Tags - tag (label) of the clip + confidence
- class soundata.datasets.tau2020uas_mobile.Dataset(data_home=None)[source]
The TAU Urban Acoustic Scenes 2020 Mobile dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a TAU Urban Acoustic Scenes 2020 Mobile audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for the loaded audio. If None (default), the file’s original sample rate of 44100 Hz is used without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.tau2020uas_mobile.load_audio(fhandle: BinaryIO, sr=None) → Tuple[numpy.ndarray, float] [source]
Load a TAU Urban Acoustic Scenes 2020 Mobile audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for the loaded audio. If None (default), the file’s original sample rate of 44100 Hz is used without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
TAU Urban Acoustic Scenes 2022 Mobile
TAU Urban Acoustic Scenes 2022 Mobile Loader
Dataset Info
TAU Urban Acoustic Scenes 2022 Mobile, Development and Evaluation datasets
Audio Research Group, Tampere University of Technology
Authors
Recording and annotation
Henri Laakso
Ronal Bejarano Rodriguez
Toni Heittola
Links
Dataset
TAU Urban Acoustic Scenes 2022 Mobile development dataset consists of 1-second audio segments from 10 acoustic scenes:
Airport - airport
Indoor shopping mall - shopping_mall
Metro station - metro_station
Pedestrian street - street_pedestrian
Public square - public_square
Street with medium level of traffic - street_traffic
Travelling by a tram - tram
Travelling by a bus - bus
Travelling by an underground metro - metro
Urban park - park
The dataset contains the same material as the TAU Urban Acoustic Scenes 2020 Mobile development dataset; for the 2022 version, the 10-second audio segments have been split into non-overlapping 1-second segments.
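At a fixed sample rate, the 10-second to 1-second split is a single reshape; a sketch with synthetic audio at the dataset's 44.1 kHz rate:

```python
import numpy as np

sr = 44100
ten_seconds = np.random.randn(10 * sr)   # one 2020-version segment (synthetic)

# Ten non-overlapping 1-second segments, as in the 2022 version
one_second_segments = ten_seconds.reshape(10, sr)

print(one_second_segments.shape)  # (10, 44100)
```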
A detailed description of the data recording and annotation procedure is available in:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen.
"Acoustic scene classification in DCASE 2020 Challenge:
generalization across devices and low complexity solutions",
In Proceedings of the Detection and Classification of Acoustic
Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan, 2020.
Recordings were made with three devices (A, B and C) that captured audio simultaneously, plus 6 simulated devices (S1-S6). Each acoustic scene has 1440 segments (240 minutes of audio) recorded with device A (the main device) and 108 segments of parallel audio (18 minutes) each recorded with devices B, C, and S1-S6.
Development dataset
The dataset contains in total 64 hours of audio.
Evaluation dataset
The dataset contains in total 33 hours of audio.
The dataset was collected by Tampere University of Technology between 05/2018 and 11/2018. The data collection has received funding from the European Research Council under the ERC Grant Agreement 637422 EVERYSOUND.
Preparation of the dataset
The dataset was recorded in 12 large European cities: Amsterdam, Barcelona, Helsinki, Lisbon, London, Lyon, Madrid, Milan, Prague, Paris, Stockholm, and Vienna. For all acoustic scenes, audio was captured in multiple locations: different streets, different parks, different shopping malls. In each location, multiple 2-3 minute audio recordings were captured in a few slightly different positions (2-4) within the selected location. The collected audio material was cut into 10-second segments.
The main recording device (referred to as device A) consists of a binaural Soundman OKM II Klassik/studio A3 electret in-ear microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution. During recording, the microphones were worn in the recording person’s ears, and head movement was kept to a minimum.
Devices B and C are commonly available consumer devices (e.g. smartphones, cameras) and were handled in typical ways (e.g. hand held). The audio recordings from these devices are of different quality than those from device A. All simultaneous recordings are time synchronized.
Post-processing of the recorded audio involves aspects related to privacy of recorded individuals, and possible errors in the recording process. The material was screened for content, and segments containing close microphone conversation were eliminated. Some interferences from mobile phones are audible, but are considered part of real-world recording process. In addition, data from device A was resampled and averaged into a single channel, to align with the properties of the data recorded with devices B and C.
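The channel-averaging step applied to device A can be sketched with NumPy; the resampling step is omitted here, and the array is synthetic, so this is only an illustration of the mono downmix, not the dataset's actual processing pipeline.

```python
import numpy as np

# Hypothetical binaural (2-channel) recording from device A; noise stands in for audio.
stereo = np.random.randn(2, 44100)

# Average the two channels into a single channel, matching the single-channel
# properties of the data recorded with devices B and C.
mono = stereo.mean(axis=0)

print(mono.shape)
```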
Additionally, 11 mobile devices S1-S11 are simulated using the audio recorded with device A, impulse responses recorded with real devices, and additional dynamic range compression, in order to simulate realistic recordings. A recording from device A is processed through convolution with the selected Si impulse response, then processed with a selected set of parameters for dynamic range compression (device specific). The impulse responses are proprietary data and will not be published.
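The simulation chain (convolution with a device impulse response, then dynamic range compression) can be sketched as below. Since the real impulse responses and compressor settings are proprietary, the impulse response and compressor here are toy placeholders; the function name and parameters are illustrative, not part of the dataset tooling.

```python
import numpy as np

def simulate_device(x, ir, threshold=0.5, ratio=4.0):
    """Toy sketch of the simulation chain: convolve the device A recording
    with a device impulse response, then apply a simple static dynamic
    range compressor above a magnitude threshold."""
    # Convolve and truncate back to the input length.
    y = np.convolve(x, ir, mode="full")[: len(x)]
    mag = np.abs(y)
    # Compress only the portion of the magnitude above the threshold.
    compressed = np.where(mag > threshold,
                          threshold + (mag - threshold) / ratio,
                          mag)
    return np.sign(y) * compressed

x = np.random.randn(44100) * 0.1      # stand-in for a device A segment
ir = np.array([1.0, 0.3, 0.1])        # placeholder impulse response
y = simulate_device(x, ir)
print(y.shape)
```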
All provided audio data is single-channel, with a 44.1 kHz sampling rate and 24-bit resolution.
A subset of the dataset has been previously published as TUT Urban Acoustic Scenes 2019 Development dataset. Audio segment filenames are retained for the segments coming from this dataset.
Dataset statistics
The development set contains data from 10 cities and 9 devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). Data from devices B, C and S1-S6 consists of randomly selected segments from the simultaneous recordings, therefore all overlap with the data from device A, but not necessarily with each other. The total amount of audio in the development set is 64 hours. The evaluation dataset (TAU Urban Acoustic Scenes 2022 Mobile evaluation) contains data from all 12 cities, and five new devices (not available in the development set): real device D and simulated devices S7-S11.
Device A
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 14400 | 1280 | 1490 | 1440 | 1450 | 1440 | 1440 | 1560 | 1440 | 1580 | 1280 |
| Bus | 14400 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
| Metro | 14400 | 1410 | 1440 | 1440 | 1460 | 1440 | 1440 | 1440 | 1440 | 1450 | 1440 |
| Metro station | 14400 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
| Park | 14400 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
| Public square | 14400 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
| Shopping mall | 14400 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
| Street, pedestrian | 14400 | 1450 | 1450 | 1440 | 1450 | 1440 | 1440 | 1440 | 1440 | 1450 | 1400 |
| Street, traffic | 14400 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
| Tram | 14400 | 1430 | 1450 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
| Total | 144000 | 14210 | 14470 | 14400 | 14440 | 14400 | 14400 | 14520 | 14400 | 14560 | 14200 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 40 | 4 | 3 | 4 | 3 | 4 | 4 | 4 | 6 | 5 | 3 |
| Bus | 71 | 4 | 4 | 11 | 7 | 7 | 7 | 11 | 10 | 6 | 4 |
| Metro | 67 | 3 | 5 | 11 | 4 | 9 | 8 | 9 | 10 | 4 | 4 |
| Metro station | 57 | 5 | 6 | 4 | 12 | 5 | 4 | 9 | 4 | 4 | 4 |
| Park | 41 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 4 |
| Public square | 43 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 6 | 4 | 4 |
| Shopping mall | 36 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 4 | 4 |
| Street, pedestrian | 46 | 7 | 4 | 4 | 4 | 4 | 5 | 5 | 5 | 4 | 4 |
| Street, traffic | 43 | 4 | 4 | 4 | 5 | 4 | 6 | 4 | 4 | 4 | 4 |
| Tram | 70 | 4 | 4 | 6 | 9 | 7 | 11 | 9 | 11 | 5 | 4 |
| Total | 514 | 43 | 42 | 56 | 54 | 52 | 56 | 63 | 65 | 45 | 39 |
Device B
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1070 | 110 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1070 | 110 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10780 | 1180 | 1200 | 1200 | 1100 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 36 | 3 | 3 | 4 | 3 | 3 | 4 | 4 | 5 | 4 | 3 |
| Bus | 57 | 4 | 4 | 9 | 7 | 6 | 5 | 8 | 7 | 3 | 4 |
| Metro | 47 | 3 | 4 | 6 | 4 | 6 | 5 | 6 | 6 | 4 | 4 |
| Metro station | 45 | 4 | 4 | 3 | 8 | 5 | 3 | 7 | 3 | 4 | 4 |
| Park | 37 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 3 | 3 | 4 |
| Public square | 37 | 3 | 4 | 4 | 4 | 5 | 3 | 4 | 4 | 3 | 3 |
| Shopping mall | 34 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 3 | 3 |
| Street, pedestrian | 43 | 6 | 3 | 4 | 4 | 4 | 5 | 5 | 4 | 4 | 4 |
| Street, traffic | 41 | 4 | 4 | 4 | 4 | 4 | 6 | 4 | 4 | 4 | 4 |
| Tram | 50 | 4 | 4 | 5 | 6 | 5 | 5 | 7 | 7 | 3 | 4 |
| Total | 427 | 39 | 37 | 47 | 46 | 44 | 42 | 53 | 47 | 35 | 37 |
Device C
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1070 | 110 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1070 | 120 | 120 | 120 | 100 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1070 | 110 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10770 | 1180 | 1200 | 1200 | 1090 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 38 | 4 | 3 | 4 | 3 | 3 | 4 | 4 | 5 | 5 | 3 |
| Bus | 50 | 4 | 4 | 7 | 6 | 5 | 4 | 7 | 7 | 3 | 3 |
| Metro | 54 | 3 | 3 | 6 | 4 | 9 | 6 | 7 | 8 | 4 | 4 |
| Metro station | 48 | 5 | 3 | 4 | 8 | 5 | 4 | 7 | 4 | 4 | 4 |
| Park | 39 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4 |
| Public square | 40 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 6 | 3 | 4 |
| Shopping mall | 35 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 3 | 4 |
| Street, pedestrian | 41 | 6 | 3 | 4 | 4 | 3 | 5 | 4 | 5 | 4 | 3 |
| Street, traffic | 40 | 4 | 3 | 4 | 4 | 4 | 6 | 4 | 4 | 4 | 3 |
| Tram | 51 | 4 | 4 | 5 | 6 | 4 | 8 | 6 | 7 | 3 | 4 |
| Total | 436 | 42 | 34 | 46 | 45 | 44 | 48 | 51 | 54 | 36 | 36 |
Device S1
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10800 | 1200 | 1200 | 1200 | 1100 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 37 | 4 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3 |
| Bus | 54 | 4 | 4 | 8 | 6 | 6 | 6 | 7 | 6 | 3 | 4 |
| Metro | 50 | 3 | 3 | 8 | 4 | 7 | 6 | 6 | 6 | 4 | 3 |
| Metro station | 48 | 5 | 4 | 4 | 9 | 5 | 4 | 5 | 4 | 4 | 4 |
| Park | 36 | 4 | 4 | 4 | 4 | 3 | 4 | 3 | 3 | 3 | 4 |
| Public square | 37 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 4 |
| Shopping mall | 33 | 4 | 4 | 4 | 2 | 3 | 3 | 3 | 3 | 3 | 4 |
| Street, pedestrian | 40 | 6 | 3 | 4 | 4 | 3 | 5 | 2 | 5 | 4 | 4 |
| Street, traffic | 40 | 4 | 4 | 4 | 4 | 4 | 6 | 3 | 3 | 4 | 4 |
| Tram | 52 | 4 | 4 | 5 | 7 | 6 | 7 | 6 | 6 | 3 | 4 |
| Total | 427 | 42 | 37 | 49 | 47 | 45 | 49 | 42 | 43 | 35 | 38 |
Device S2
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10800 | 1200 | 1200 | 1200 | 1100 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 36 | 3 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3 |
| Bus | 58 | 4 | 4 | 9 | 6 | 6 | 7 | 9 | 6 | 3 | 4 |
| Metro | 55 | 3 | 3 | 10 | 4 | 8 | 8 | 5 | 7 | 4 | 3 |
| Metro station | 49 | 5 | 4 | 4 | 7 | 5 | 4 | 8 | 4 | 4 | 4 |
| Park | 38 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 2 | 4 |
| Public square | 41 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 5 | 3 | 4 |
| Shopping mall | 34 | 4 | 4 | 3 | 2 | 3 | 3 | 4 | 4 | 3 | 4 |
| Street, pedestrian | 42 | 7 | 3 | 4 | 4 | 3 | 5 | 5 | 4 | 4 | 3 |
| Street, traffic | 42 | 4 | 4 | 4 | 5 | 4 | 6 | 4 | 4 | 4 | 3 |
| Tram | 51 | 4 | 4 | 5 | 7 | 6 | 7 | 7 | 4 | 3 | 4 |
| Total | 446 | 42 | 37 | 51 | 46 | 48 | 52 | 54 | 46 | 34 | 36 |
Device S3
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10800 | 1200 | 1200 | 1200 | 1100 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 36 | 3 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3 |
| Bus | 50 | 4 | 4 | 6 | 5 | 6 | 6 | 7 | 5 | 3 | 4 |
| Metro | 50 | 3 | 3 | 10 | 4 | 5 | 6 | 4 | 8 | 3 | 4 |
| Metro station | 44 | 4 | 4 | 4 | 6 | 5 | 4 | 7 | 3 | 4 | 3 |
| Park | 39 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4 |
| Public square | 39 | 4 | 4 | 3 | 4 | 5 | 4 | 4 | 4 | 3 | 4 |
| Shopping mall | 32 | 4 | 4 | 3 | 2 | 3 | 3 | 4 | 3 | 3 | 3 |
| Street, pedestrian | 39 | 6 | 3 | 3 | 4 | 4 | 4 | 5 | 3 | 4 | 3 |
| Street, traffic | 40 | 4 | 4 | 4 | 5 | 4 | 5 | 4 | 3 | 3 | 4 |
| Tram | 50 | 4 | 4 | 5 | 8 | 5 | 7 | 6 | 5 | 3 | 3 |
| Total | 419 | 40 | 37 | 46 | 45 | 45 | 47 | 49 | 42 | 33 | 35 |
Device S4
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10800 | 1200 | 1200 | 1200 | 1100 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 36 | 3 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 3 |
| Bus | 53 | 4 | 4 | 9 | 5 | 6 | 5 | 6 | 7 | 3 | 4 |
| Metro | 50 | 3 | 2 | 8 | 4 | 7 | 6 | 7 | 6 | 4 | 3 |
| Metro station | 47 | 5 | 4 | 4 | 7 | 5 | 4 | 6 | 4 | 4 | 4 |
| Park | 38 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4 |
| Public square | 38 | 4 | 4 | 3 | 3 | 5 | 4 | 4 | 4 | 3 | 4 |
| Shopping mall | 35 | 4 | 4 | 4 | 2 | 3 | 3 | 4 | 4 | 3 | 4 |
| Street, pedestrian | 42 | 7 | 3 | 3 | 4 | 4 | 4 | 4 | 5 | 4 | 4 |
| Street, traffic | 41 | 4 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 4 | 4 |
| Tram | 51 | 4 | 4 | 6 | 6 | 7 | 5 | 7 | 5 | 3 | 4 |
| Total | 431 | 42 | 35 | 49 | 42 | 49 | 44 | 50 | 47 | 35 | 38 |
Device S5
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10800 | 1200 | 1200 | 1200 | 1100 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 38 | 4 | 3 | 4 | 3 | 4 | 4 | 3 | 5 | 5 | 3 |
| Bus | 54 | 3 | 4 | 6 | 6 | 6 | 7 | 8 | 7 | 3 | 4 |
| Metro | 51 | 3 | 3 | 7 | 4 | 8 | 6 | 6 | 7 | 4 | 3 |
| Metro station | 45 | 5 | 3 | 3 | 7 | 4 | 4 | 7 | 4 | 4 | 4 |
| Park | 36 | 3 | 4 | 3 | 3 | 4 | 4 | 4 | 4 | 3 | 4 |
| Public square | 39 | 3 | 4 | 3 | 4 | 4 | 4 | 4 | 6 | 3 | 4 |
| Shopping mall | 33 | 3 | 4 | 3 | 2 | 3 | 3 | 4 | 4 | 3 | 4 |
| Street, pedestrian | 42 | 6 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 4 | 3 |
| Street, traffic | 38 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Tram | 50 | 4 | 4 | 4 | 6 | 5 | 8 | 7 | 6 | 3 | 3 |
| Total | 426 | 37 | 35 | 41 | 43 | 46 | 48 | 52 | 52 | 36 | 36 |
Device S6
Audio segments
| Scene class | Segments | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Bus | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Metro station | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Park | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Public square | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Shopping mall | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, pedestrian | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Street, traffic | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Tram | 1080 | 120 | 120 | 120 | 110 | 110 | 100 | 100 | 100 | 100 | 100 |
| Total | 10800 | 1200 | 1200 | 1200 | 1100 | 1100 | 1000 | 1000 | 1000 | 1000 | 1000 |
Recording locations
| Scene class | Locations | Barcelona | Helsinki | Lisbon | London | Lyon | Milan | Paris | Prague | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport | 36 | 4 | 3 | 4 | 3 | 4 | 3 | 3 | 5 | 4 | 3 |
| Bus | 55 | 3 | 4 | 9 | 7 | 6 | 5 | 9 | 6 | 2 | 4 |
| Metro | 51 | 3 | 2 | 7 | 4 | 7 | 6 | 7 | 8 | 3 | 4 |
| Metro station | 47 | 5 | 4 | 4 | 9 | 3 | 3 | 7 | 4 | 4 | 4 |
| Park | 37 | 3 | 4 | 4 | 4 | 4 | 3 | 4 | 4 | 3 | 4 |
| Public square | 39 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 5 | 3 | 4 |
| Shopping mall | 33 | 3 | 4 | 4 | 2 | 3 | 2 | 4 | 4 | 3 | 4 |
| Street, pedestrian | 39 | 5 | 3 | 4 | 4 | 3 | 4 | 4 | 4 | 4 | 4 |
| Street, traffic | 39 | 3 | 4 | 3 | 4 | 4 | 5 | 4 | 4 | 4 | 4 |
| Tram | 56 | 4 | 4 | 6 | 7 | 6 | 7 | 6 | 9 | 3 | 4 |
| Total | 432 | 37 | 35 | 49 | 48 | 44 | 41 | 52 | 53 | 33 | 39 |
Usage
The partitioning of the data was done based on the location of the original recordings. All segments recorded at the same location were included in a single subset: either the development dataset or the evaluation dataset. For each acoustic scene, 1440 segments recorded with device A and 108 segments recorded with each of devices B, C, and S1-S6 were included in the development dataset provided here. The evaluation dataset is provided separately.
Training / test setup
A suggested training/test partitioning of the development set is provided in order to make results reported with this dataset comparable. The partitioning is done such that segments recorded at the same location are included in the same subset, either training or testing. The partitioning aims for a 70/30 ratio between the number of segments in the training and test subsets while taking recording locations into account, selecting the closest available option.
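The location-based split described above can be sketched as follows. This is a minimal illustration with hypothetical `(segment_id, location_id)` pairs, not the dataset's official partitioning code; the provided `evaluation_setup` files remain the authoritative split.

```python
from collections import defaultdict

def split_by_location(segments, train_fraction=0.7):
    """Sketch of location-based partitioning: all segments from one
    recording location go to the same side, aiming for roughly 70/30.
    `segments` is a list of (segment_id, location_id) pairs."""
    by_loc = defaultdict(list)
    for seg, loc in segments:
        by_loc[loc].append(seg)
    target = train_fraction * len(segments)
    train, test, count = [], [], 0
    # Assign whole locations (largest first) until the train target is reached.
    for loc, segs in sorted(by_loc.items(), key=lambda kv: -len(kv[1])):
        if count < target:
            train.extend(segs)
            count += len(segs)
        else:
            test.extend(segs)
    return train, test

segments = [(f"seg{i}", f"loc{i % 5}") for i in range(100)]
train, test = split_by_location(segments)
print(len(train), len(test))
```

Because locations are assigned whole, the achieved ratio is only the closest available option to 70/30, exactly as in the dataset's own setup.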
Data from devices A, B, C, S1, S2, and S3 are available in both training and test sets. Audio segments from devices S4, S5, and S6 are used only for testing. Since the dataset includes a balanced amount of material from devices B, C, and S1-S6, this partitioning leaves a small subset of data from devices S4-S6 unused in the training/test setup. This material can be used when training a system on the full dataset and testing it with the evaluation dataset.
The setup is provided with the dataset in the directory evaluation_setup.
Statistics
| Scene class | Train / Segments | Train / Locations | Test / Segments | Test / Locations | Unused / Segments | Unused / Locations |
|---|---|---|---|---|---|---|
| Airport | 13930 | 28 | 2960 | 12 | 6130 | 40 |
| Bus | 14000 | 51 | 2970 | 19 | 6070 | 66 |
| Metro | 13820 | 47 | 2970 | 20 | 6250 | 65 |
| Metro station | 13800 | 40 | 2970 | 16 | 6270 | 55 |
| Park | 14290 | 30 | 2970 | 11 | 5780 | 39 |
| Public square | 14270 | 31 | 2970 | 12 | 5790 | 42 |
| Shopping mall | 13730 | 26 | 2970 | 10 | 6330 | 35 |
| Street, pedestrian | 13860 | 32 | 2970 | 14 | 6210 | 45 |
| Street, traffic | 14130 | 31 | 2970 | 12 | 5940 | 43 |
| Tram | 13790 | 49 | 2960 | 20 | 6280 | 67 |
| Total | 139620 | 365 | 29680 | 146 | 61050 | 497 |
Number of segments in train / test setup
License
The license permits free academic use. Any commercial use is strictly prohibited; for commercial use, contact the dataset authors.
Copyright (c) 2022 Tampere University and its licensors All rights reserved. Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the TAU Urban Acoustic Scenes 2022 Mobile (“Work”) described in this document and composed of audio and metadata. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work, (Audio Research Group at Tampere University of Technology), is acknowledged in any publication that reports research using this Work. Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use include, but is not limited to: - selling or reproducing the Work - selling or distributing the results or content achieved by use of the Work - providing services by using the Work.
IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN “AS IS” BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
- class soundata.datasets.tau2022uas_mobile.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
TAU Urban Acoustic Scenes 2022 Mobile Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
city (str) – city where the audio signal was recorded
clip_id (str) – clip id
identifier (str) – the clip identifier
source_label (str) – source label
split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation
tags (soundata.annotations.Tags) – tag (label) of the clip + confidence
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property city
The clip’s city.
- Returns:
str - city where the audio signal was recorded
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property identifier
The clip’s identifier.
- Returns:
str - clip identifier
- property source_label
The clip’s source label.
- Returns:
str - source label
- property split
The clip’s split.
- Returns:
str – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation
- property tags
The clip’s tags.
- Returns:
annotations.Tags - tag (label) of the clip + confidence
- class soundata.datasets.tau2022uas_mobile.Dataset(data_home=None)[source]
The TAU Urban Acoustic Scenes 2022 Mobile dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a TAU Urban Acoustic Scenes 2022 Mobile audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.tau2022uas_mobile.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a TAU Urban Acoustic Scenes 2022 Mobile audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
TUT Sound events 2017
TUT Sound events 2017 Dataset Loader
Dataset Info
TUT Sound events 2017, Development and Evaluation datasets
Audio Research Group, Tampere University of Technology
Authors
Recording and annotation
Eemi Fagerlund
Aku Hiltunen
Dataset
The TUT Sound Events 2017 dataset consists of two subsets: a development dataset and an evaluation dataset. Partitioning of the data into these subsets was done based on the number of examples available for each sound event class, while also taking recording location into account. Because event instances belonging to different classes are distributed unevenly within the recordings, the partitioning of individual classes can be controlled only to a certain extent, but the split ensures that the majority of events are in the development set.
A detailed description of the data recording and annotation procedure is available in:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen.
"TUT database for acoustic scene classification and sound event
detection", In 24th European Signal Processing Conference 2016,
Budapest, Hungary, 2016.
TUT Sound events 2017, development and evaluation datasets consist of 24 and 8 audio recordings from a single acoustic scene respectively:
Development: Street (outdoor), totaling 1:32:08
Evaluation: Street (outdoor), totaling 29:09
The dataset was collected in Finland by Tampere University of Technology between 06/2015 - 01/2016. The data collection has received funding from the European Research Council under the ERC Grant Agreement 637422 EVERYSOUND.
Preparation of the dataset
The recordings were captured each in a different location (different streets). The equipment used for recording consists of a binaural Soundman OKM II Klassik/studio A3 electret in-ear microphone and a Roland Edirol R-09 wave recorder using 44.1 kHz sampling rate and 24 bit resolution.
For audio material recorded in private places, written consent was obtained from all people involved. Material recorded in public places (residential area) does not require such consent.
Individual sound events in each recording were annotated by a research assistant using freely chosen labels for sounds. The annotator was first trained on a few example recordings, and was instructed to annotate all audible sound events, choosing event labels freely. This resulted in a large set of raw labels. The raw labels were then mapped, merging sounds into classes described by their source: for example, “car passing by”, “car engine running”, “car idling”, etc. into “car”; sounds produced by buses and trucks into “large vehicle”; and “children yelling” and “children talking” into “children”. Target sound event classes for the dataset were selected from these mapped classes based on the frequency of the obtained labels, resulting in the selection of the most common sounds for the street acoustic scene, in sufficient numbers for learning acoustic models.
Due to the high level of subjectivity inherent to the annotation process, a verification of the reference annotation was done using these mapped classes. Three persons (other than the annotator) listened to each audio segment annotated as belonging to one of these classes, marking agreement about the presence of the indicated sound within the segment. Agreement/disagreement did not take into account the sound event onset and offset, only the presence of the sound event within the annotated segment. Event instances that were confirmed by at least one person were kept, resulting in elimination of about 10% of the original event instances in the development set.
The original metadata file is available in the directory non_verified.
The ground truth is provided as a list of the sound events present in the recording, with annotated onset and offset for each sound instance. Annotations with only targeted sound events classes are in the directory meta.
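Reading such an onset/offset/label event list can be sketched with a small stdlib-only parser. The tab-separated onset, offset, label layout assumed here is hypothetical; check the files in the meta directory for the exact format used by the dataset.

```python
def parse_event_annotations(text):
    """Parse a sound event annotation list into (onset, offset, label) tuples.
    Assumes one event per line in a tab-separated onset/offset/label layout."""
    events = []
    for line in text.strip().splitlines():
        onset, offset, label = line.split("\t")
        events.append((float(onset), float(offset), label))
    return events

# Toy annotation content, not taken from the dataset.
sample = "0.50\t4.20\tcar\n3.10\t5.00\tpeople walking"
events = parse_event_annotations(sample)
print(events)
```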
The sound event instance counts for the dataset are shown below.
| Event label | Development / Verified set | Development / Non-verified set | Evaluation / Verified set |
|---|---|---|---|
| brakes squeaking | 52 | 59 | 23 |
| car | 304 | 304 | 106 |
| children | 44 | 58 | 15 |
| large vehicle | 61 | 61 | 24 |
| people speaking | 89 | 117 | 37 |
| people walking | 109 | 130 | 42 |
| Total | 659 | 729 | 247 |
Usage
Partitioning of data into development dataset and evaluation dataset was done based on the amount of examples available for each event class, while also taking into account recording location. Ideally the subsets should have the same amount of data for each class, or at least the same relative amount, such as a 70-30% split. Because the event instances belonging to different classes are distributed unevenly within the recordings, the partitioning of individual classes can be controlled only to a certain extent.
The split condition was relaxed so that 65-75% of instances of each class were selected into the development set.
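Using the verified instance counts from the table above, the 65-75% condition can be checked per class with a little arithmetic (a sketch; the counts are copied from the table):

```python
# Verified instance counts (development, evaluation) per event class,
# taken from the table above.
counts = {
    "brakes squeaking": (52, 23),
    "car": (304, 106),
    "children": (44, 15),
    "large vehicle": (61, 24),
    "people speaking": (89, 37),
    "people walking": (109, 42),
}

for label, (dev, evl) in counts.items():
    share = dev / (dev + evl)
    print(f"{label}: {share:.1%} of verified instances in development")
```

For example, "car" gives 304 / (304 + 106) ≈ 74.1%, and every class falls inside the 65-75% band.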
Cross-validation setup
The setup is provided with the dataset in the directory evaluation_setup.
License
See file EULA.pdf
- class soundata.datasets.tut2017se.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
TUT Sound events 2017 Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
annotations_path (str) – path to the annotations file
clip_id (str) – clip id
events (soundata.annotations.Events) – sound events with start time, end time, label and confidence
non_verified_annotations_path (str) – path to the non-verified annotations file
non_verified_events (soundata.annotations.Events) – non-verified sound events with start time, end time, label and confidence
split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- events
The clip’s events.
- Returns:
annotations.Events - sound events with start time, end time, label and confidence
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- non_verified_events
The clip’s non-verified events.
- Returns:
annotations.Events - non-verified sound events with start time, end time, label and confidence
- property split
The clip’s split.
- Returns:
str – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation
- class soundata.datasets.tut2017se.Dataset(data_home=None)[source]
The TUT Sound events 2017 dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a TUT Sound events 2017 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the stereo audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- load_clips()[source]
Load all clips in the dataset
- Returns:
dict – {clip_id: clip data}
- Raises:
NotImplementedError – If the dataset does not support Clips
- soundata.datasets.tut2017se.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a TUT Sound events 2017 audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the stereo audio signal
float - The sample rate of the audio file
URBAN-SED
URBAN-SED Dataset Loader
Dataset Info
- URBAN-SED
- URBAN-SED (c) by Justin Salamon, Duncan MacConnell, Mark Cartwright, Peter Li, and Juan Pablo Bello. URBAN-SED is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). You should have received a copy of the license along with this work. If not, see <http://creativecommons.org/licenses/by/4.0/>.
- Created By:
- Justin Salamon*^, Duncan MacConnell*, Mark Cartwright*, Peter Li*, and Juan Pablo Bello*
* Music and Audio Research Lab (MARL), New York University, USA
^ Center for Urban Science and Progress (CUSP), New York University, USA
- Version 2.0.0
Audio files generated with scaper v0.1.0 (identical to audio in URBAN-SED 1.0)
Jams annotation files generated with scaper v0.1.0 and updated to comply with scaper v1.0.0 (namespace changed from “sound_event” to “scaper”)
NOTE: due to updates to the scaper library, regenerating the audio from the jams annotations using scaper >=1.0.0 will result in audio files that are highly similar, but not identical, to the audio files provided. This is because the provided audio files were generated with scaper v0.1.0 and have been purposely kept the same as in URBAN-SED v1.0 to ensure comparability to previously published results.
Description
URBAN-SED is a dataset of 10,000 soundscapes with sound event annotations generated using scaper (github.com/justinsalamon/scaper).
A detailed description of the dataset is provided in the following article:
A summary is provided here:
The dataset includes 10,000 soundscapes, totals almost 30 hours and includes close to 50,000 annotated sound events
Complete annotations are provided in JAMS format, and simplified annotations are provided as tab-separated text files
Every soundscape is 10 seconds long and has a background of Brownian noise resembling the typical “hum” often heard in urban environments
- Every soundscape contains between 1-9 sound events from the following classes:
air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren and street_music
The source material for the sound events are the clips from the UrbanSound8K dataset
- URBAN-SED comes pre-sorted into three sets: train, validate and test:
There are 6000 soundscapes in the training set, generated using clips from folds 1-6 in UrbanSound8K
There are 2000 soundscapes in the validation set, generated using clips from folds 7-8 in UrbanSound8K
There are 2000 soundscapes in the test set, generated using clips from folds 9-10 in UrbanSound8K
Further details about how the soundscapes were generated including the distribution of sound event start times, durations, signal-to-noise ratios, pitch shifting, time stretching, and the range of sound event polyphony (overlap) can be found in Section 3 of the aforementioned scaper paper
The scripts used to generate URBAN-SED using scaper can be found here: https://github.com/justinsalamon/scaper_waspaa2017/tree/master/notebooks
- Audio Files Included
10,000 synthesized soundscapes in single channel (mono), 44100Hz, 16-bit, WAV format.
The files are split into a training set (6000), validation set (2000) and test set (2000).
- Annotation Files Included
The annotations list the sound events that occur in every soundscape. The annotations are “strong”, meaning for every sound event the annotations include (at least) the start time, end time, and label of the sound event. Sound events come from the following 10 labels (categories):
air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer,
siren, street_music
There are two types of annotations: full annotations in JAMS format, and simplified annotations in tab-separated txt format.
- JAMS Annotations
The full annotations are distributed in JAMS format (https://github.com/marl/jams).
There are 10,000 JAMS annotation files, each one corresponding to a single soundscape with the same filename (other than the extension)
Each JAMS file contains a single annotation in the scaper namespace format - jams >=v0.3.2 is required in order to load the annotation into python with jams:
import jams; jam = jams.load('soundscape_train_bimodal0.jams')
The value of each observation (sound event) is a dictionary storing all scaper-related sound event parameters:
label, source_file, source_time, event_time, event_duration, snr, role, pitch_shift, time_stretch.
Note: the event_duration stored in the value dictionary represents the specified duration prior to any time stretching. The actual event duration in the soundscape is stored in the duration field of the JAMS observation.
The observations (sound events) in the JAMS annotation include both foreground sound events and the background(s).
The probabilistic scaper foreground and background event specifications are stored in the annotation’s sandbox, allowing a complete reconstruction of the soundscape audio from the JAMS annotation (assuming access to the original source material) using scaper.generate_from_jams('soundscape_train_bimodal0.jams').
The annotation sandbox also includes additional metadata such as the total number of foreground sound events, the maximum polyphony (sound event overlap) of the soundscape and its gini coefficient (a measure of soundscape complexity).
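Putting the field list above together, one observation value is a plain dictionary. The field values below are made-up placeholders, not taken from any actual annotation file:

```python
# Hypothetical example of the scaper-related value dict attached to
# one sound event observation in a JAMS annotation.
event_value = {
    "label": "dog_bark",
    "source_file": "path/to/source.wav",  # placeholder path
    "source_time": 0.0,
    "event_time": 2.5,        # onset within the 10 s soundscape
    "event_duration": 1.2,    # specified duration, before time stretching
    "snr": 6.0,
    "role": "foreground",
    "pitch_shift": None,
    "time_stretch": None,
}

expected_keys = {"label", "source_file", "source_time", "event_time",
                 "event_duration", "snr", "role", "pitch_shift", "time_stretch"}
print(set(event_value) == expected_keys)  # -> True
```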
- Simplified Annotations
The simplified annotations are distributed as tab-separated text files.
There are 10,000 simplified annotation files, each one corresponding to a single soundscape with the same filename (other than the extension)
Each simplified annotation has a 3-column format (no header): start_time, end_time, label.
Background sounds are NOT included in the simplified annotations (only foreground sound events)
No additional information is stored in the simplified events (see the JAMS annotations for more details).
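The three-column format above can be parsed with a few lines of standard Python (a minimal sketch; soundata’s own load_events handles this for you, and the event times below are made up):

```python
import io

def parse_simplified(fhandle):
    """Parse URBAN-SED simplified annotations: start_time<TAB>end_time<TAB>label."""
    events = []
    for line in fhandle:
        line = line.strip()
        if not line:
            continue
        start, end, label = line.split("\t")
        events.append((float(start), float(end), label))
    return events

# Example with made-up event times:
sample = "0.5\t3.2\tdog_bark\n4.0\t9.1\tjackhammer\n"
print(parse_simplified(io.StringIO(sample)))
# -> [(0.5, 3.2, 'dog_bark'), (4.0, 9.1, 'jackhammer')]
```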
- Please Acknowledge URBAN-SED in Academic Research
When URBAN-SED is used for academic research, we would highly appreciate it if scientific publications of works partly based on the URBAN-SED dataset cite the following publication:
The creation of this dataset was supported by NSF award 1544753.
- Conditions of Use
Dataset created by J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello. Audio files contain excerpts of recordings uploaded to www.freesound.org. Please see FREESOUNDCREDITS.txt for an attribution list.
The URBAN-SED dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/
The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, NYU is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the URBAN-SED dataset or any part of it.
- Feedback
- Please help us improve URBAN-SED by sending your feedback to: justin.salamon@nyu.edu. In case of a problem report, please include as many details as possible.
- class soundata.datasets.urbansed.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
URBAN-SED Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
clip_id (str) – clip id
events (soundata.annotations.Events) – sound events with start time, end time, label and confidence
split (str) – subset the clip belongs to (for experiments): train, validate, or test
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- events
The audio events
- Returns:
annotations.Events - audio event object
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property split
The clip’s split (train, validate or test).
- Returns:
str - split
- class soundata.datasets.urbansed.Dataset(data_home=None)[source]
The URBAN-SED dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a UrbanSound8K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.urbansed.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float] [source]
Load a UrbanSound8K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- soundata.datasets.urbansed.load_events(fhandle: TextIO) Events [source]
Load an URBAN-SED sound events annotation file
- Parameters:
fhandle (str or file-like) – File-like object or path to the sound events annotation file
- Raises:
IOError – if txt_path doesn’t exist
- Returns:
Events – sound events annotation data
UrbanSound8K
UrbanSound8K Dataset Loader
Dataset Info
- Created By:
- Justin Salamon*^, Christopher Jacoby* and Juan Pablo Bello*
* Music and Audio Research Lab (MARL), New York University, USA
^ Center for Urban Science and Progress (CUSP), New York University, USA
Version 1.0
- Description:
This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy described in the following article, which also includes a detailed description of the dataset and how it was compiled:
All excerpts are taken from field recordings uploaded to www.freesound.org. The files are pre-sorted into ten folds (folders named fold1-fold10) to help in the reproduction of and comparison with the automatic classification results reported in the article above.
In addition to the sound excerpts, a CSV file containing metadata about each excerpt is also provided.
- Audio Files Included:
8732 audio files of urban sounds (see description above) in WAV format. The sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).
UrbanSound8k.csv
This file contains meta-data information about every audio file in the dataset. This includes:
slice_file_name:
The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav, where: [fsID] = the Freesound ID of the recording from which this excerpt (slice) is taken [classID] = a numeric identifier of the sound class (see description of classID below for further details) [occurrenceID] = a numeric identifier to distinguish different occurrences of the sound within the original recording [sliceID] = a numeric identifier to distinguish different slices taken from the same occurrence
fsID:
The Freesound ID of the recording from which this excerpt (slice) is taken
start
The start time of the slice in the original Freesound recording
end:
The end time of slice in the original Freesound recording
salience:
A (subjective) salience rating of the sound. 1 = foreground, 2 = background.
fold:
The fold number (1-10) to which this file has been allocated.
classID:
A numeric identifier of the sound class: 0 = air_conditioner 1 = car_horn 2 = children_playing 3 = dog_bark 4 = drilling 5 = engine_idling 6 = gun_shot 7 = jackhammer 8 = siren 9 = street_music
class:
The class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music.
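The filename convention and classID mapping described above can be combined into a small parser (a sketch; the field names follow the metadata description, and the example filename is illustrative):

```python
# classID -> class name, as listed in the metadata description.
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark",
    "drilling", "engine_idling", "gun_shot", "jackhammer",
    "siren", "street_music",
]

def parse_slice_file_name(name: str) -> dict:
    """Split [fsID]-[classID]-[occurrenceID]-[sliceID].wav into its fields."""
    stem = name.rsplit(".", 1)[0]
    fs_id, class_id, occurrence_id, slice_id = (int(p) for p in stem.split("-"))
    return {
        "fsID": fs_id,
        "classID": class_id,
        "class": CLASS_NAMES[class_id],
        "occurrenceID": occurrence_id,
        "sliceID": slice_id,
    }

info = parse_slice_file_name("100032-3-0-0.wav")  # example filename
print(info["class"])  # -> dog_bark
```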
Please Acknowledge UrbanSound8K in Academic Research:
When UrbanSound8K is used for academic research, we would highly appreciate it if scientific publications of works partly based on the UrbanSound8K dataset cite the following publication:
The creation of this dataset was supported by a seed grant by NYU’s Center for Urban Science and Progress (CUSP).
Conditions of Use
Dataset compiled by Justin Salamon, Christopher Jacoby and Juan Pablo Bello. All files are excerpts of recordings uploaded to www.freesound.org. Please see FREESOUNDCREDITS.txt for an attribution list.
The UrbanSound8K dataset is offered free of charge for non-commercial use only under the terms of the Creative Commons Attribution Noncommercial License (by-nc), version 3.0: http://creativecommons.org/licenses/by-nc/3.0/
The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, NYU is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the UrbanSound8K dataset or any part of it.
- Feedback
- Please help us improve UrbanSound8K by sending your feedback to: justin.salamon@nyu.edu. In case of a problem report, please include as many details as possible.
- class soundata.datasets.urbansound8k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
urbansound8k Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
class_id (int) – integer representation of the class label (0-9). See Dataset Info in the documentation for mapping
class_label (str) – string class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music
clip_id (str) – clip id
fold (int) – fold number (1-10) to which this clip is allocated. Use these folds for cross validation
freesound_end_time (float) – end time in seconds of the clip in the original freesound recording
freesound_id (str) – ID of the freesound.org recording from which this clip was taken
freesound_start_time (float) – start time in seconds of the clip in the original freesound recording
salience (int) – annotator estimate of class salience in the clip: 1 = foreground, 2 = background
slice_file_name (str) – The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav Please see the Dataset Info in the soundata documentation for further details
tags (soundata.annotations.Tags) – tag (label) of the clip + confidence. In UrbanSound8K every clip has one tag
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- property class_id
The clip’s class id.
- Returns:
int - integer representation of the class label (0-9). See Dataset Info in the documentation for mapping
- property class_label
The clip’s class label.
- Returns:
str – class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music
- property fold
The clip’s fold.
- Returns:
int - fold number (1-10) to which this clip is allocated. Use these folds for cross validation
- property freesound_end_time
The clip’s end time in Freesound.
- Returns:
float - end time in seconds of the clip in the original freesound recording
- property freesound_id
The clip’s Freesound ID.
- Returns:
str - ID of the freesound.org recording from which this clip was taken
- property freesound_start_time
The clip’s start time in Freesound.
- Returns:
float - start time in seconds of the clip in the original freesound recording
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property salience
The clip’s salience.
- Returns:
int – annotator estimate of class salience in the clip: 1 = foreground, 2 = background
- property slice_file_name
The clip’s slice filename.
- Returns:
str – the name of the audio file, in the format [fsID]-[classID]-[occurrenceID]-[sliceID].wav
- property tags
The clip’s tags.
- Returns:
annotations.Tags - tag (label) of the clip + confidence. In UrbanSound8K every clip has one tag
- class soundata.datasets.urbansound8k.Dataset(data_home=None)[source]
The urbansound8k dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a UrbanSound8K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.urbansound8k.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float] [source]
Load a UrbanSound8K audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
Warblrb10k
Warblrb10k Dataset Loader
Dataset Info
- Created By
- Dan Stowell*#, Mike Wood†, Yannis Stylianou‡, and Hervé Glotin§
* Machine Listening Lab, Centre for Digital Music, Queen Mary University of London
† Ecosystems and Environment Research Centre, School of Environment and Life Sciences, University of Salford
‡ Computer Science Department, University of Crete
§ LSIS UMR CNRS, University of Toulon, Institut Universitaire de France
Version 1.0
- Description
The Warblr dataset consists of 10,000 ten-second audio files, collected via the Warblr app from users across the UK in 2015-2016. Using a classification method by Stowell and Plumbley (2014a), this app aims to identify bird species from user-submitted recordings. The dataset, inclusive of various human and environmental noises, is broadly distributed over different times and seasons but has biases towards mornings, weekends, and populated areas. Despite having initial automated bird species estimates, the recordings underwent manual annotation due to precision inadequacies for establishing ground-truth data. The dataset proves instrumental for research and development in bird species detection amidst variable noise conditions.
- Audio Files Included
10,000 ten-second audio recordings in WAV format, amassed through the Warblr app during 2015-2016 from users throughout the UK.
- Meta-data Files Included
A table containing a binary label “hasbird” associated to every recording in Warblr is available on the website of the DCASE “Bird Audio Detection” challenge: http://machine-listening.eecs.qmul.ac.uk/bird-audio-detection-challenge/
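That label table can be read with the standard csv module. This is a sketch: the column names itemid and hasbird are assumed from the DCASE challenge metadata format and should be checked against the downloaded file, and the item IDs below are made-up placeholders:

```python
import csv
import io

# Hypothetical two-row excerpt in the assumed column format.
sample = "itemid,hasbird\nitem-0001,1\nitem-0002,0\n"

# itemid -> binary "hasbird" label.
labels = {}
for row in csv.DictReader(io.StringIO(sample)):
    labels[row["itemid"]] = int(row["hasbird"])

print(labels)  # -> {'item-0001': 1, 'item-0002': 0}
```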
- Please Acknowledge Warblr in Academic Research
When the Warblr dataset is employed for academic research, we sincerely request that scientific publications of works partially based on this dataset cite the following publication:
Stowell, Dan and Wood, Michael and Pamuła, Hanna and Stylianou, Yannis and Glotin, Hervé. “Automatic acoustic detection of birds through deep learning: The first Bird Audio Detection challenge”, Methods in Ecology and Evolution, 2018.
The creation and curating of this dataset were possible through the participation and contributions of the general public using the Warblr app, enabling a comprehensive collection of bird sound recordings from various regions within the UK during 2015-2016.
- Conditions of Use
Dataset created by Dan Stowell, Mike Wood, Yannis Stylianou, and Hervé Glotin.
The Warblr dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/
The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, [Affiliated Institution/Organization] is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the Warblr dataset or any part of it.
- class soundata.datasets.warblrb10k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
warblrb10k Clip class
- Parameters:
clip_id (str) – id of the clip
- Variables:
audio (np.ndarray, float) – the clip’s audio signal and sample rate
audio_path (str) – path to the audio file
item_id (str) – clip id
has_bird (str) – indication of whether the clip contains bird sounds (0/1)
- property audio: Optional[Tuple[numpy.ndarray, float]]
The clip’s audio
- Returns:
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]
Get absolute path to clip audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- property has_bird
Flag indicating whether the clip contains bird sound.
- Returns:
str - 1/0 depending on whether the clip contains bird sound
- property item_id
The clip’s item ID.
- Returns:
str - ID of the clip
- class soundata.datasets.warblrb10k.Dataset(data_home=None)[source]
The Warblrb10k dataset
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_audio(*args, **kwargs)[source]
Load a Warblrb10k audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- soundata.datasets.warblrb10k.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float] [source]
Load a Warblrb10k audio file.
- Parameters:
fhandle (str or file-like) – File-like object or path to audio file
sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).
- Returns:
np.ndarray - the mono audio signal
float - The sample rate of the audio file
Core
Core soundata classes
- class soundata.core.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
Clip base class
See the docs for each dataset loader’s Clip class for details
- __init__(clip_id, data_home, dataset_name, index, metadata)[source]
Clip init method. Sets boilerplate attributes, including:
clip_id
_dataset_name
_data_home
_clip_paths
_clip_metadata
- Parameters:
clip_id (str) – clip id
data_home (str) – path where soundata will look for the dataset
dataset_name (str) – the identifier of the dataset
index (dict) – the dataset’s file index
metadata (function or None) – a function returning a dictionary of metadata or None
- class soundata.core.ClipGroup(clipgroup_id, data_home, dataset_name, index, clip_class, metadata)[source]
ClipGroup class.
A clipgroup class is a collection of clip objects and their associated audio that can be mixed together. A clipgroup is itself a Clip, and can have its own associated audio (such as a mastered mix), its own metadata and its own annotations.
- __init__(clipgroup_id, data_home, dataset_name, index, clip_class, metadata)[source]
Clipgroup init method. Sets boilerplate attributes, including:
clipgroup_id
_dataset_name
_data_home
_clipgroup_paths
_clipgroup_metadata
- Parameters:
clipgroup_id (str) – clipgroup id
data_home (str) – path where soundata will look for the dataset
dataset_name (str) – the identifier of the dataset
index (dict) – the dataset’s file index
metadata (function or None) – a function returning a dictionary of metadata or None
- property clip_audio_property
The clip’s audio property.
- Returns:
str – the name of the attribute of Clip which returns the audio to be mixed
- get_mix()[source]
Create a linear mixture given a subset of clips.
- Parameters:
clip_keys (list) – list of clip keys to mix together
- Returns:
np.ndarray – mixture audio with shape (n_channels, n_samples)
- get_path(key)[source]
Get absolute path to clipgroup audio and annotations. Returns None if the path in the index is None
- Parameters:
key (string) – Index key of the audio or annotation type
- Returns:
str or None – joined path string or None
- get_random_target(n_clips=None, min_weight=0.3, max_weight=1.0)[source]
Get a random target by combining a random selection of clips with random weights
- Parameters:
n_clips (int or None) – number of clips to randomly mix. If None, uses all clips
min_weight (float) – minimum possible weight when mixing
max_weight (float) – maximum possible weight when mixing
- Returns:
np.ndarray - mixture audio with shape (n_channels, n_samples)
list - list of keys of included clips
list - list of weights used to mix clips
- get_target(clip_keys, weights=None, average=True, enforce_length=True)[source]
Get target which is a linear mixture of clips
- Parameters:
clip_keys (list) – list of clip keys to mix together
weights (list or None) – list of positive scalars to be used in the average
average (bool) – if True, computes a weighted average of the clips; if False, computes a weighted sum of the clips
enforce_length (bool) – If True, raises ValueError if the clips are not the same length. If False, pads audio with zeros to match the length of the longest clip
- Returns:
np.ndarray – target audio with shape (n_channels, n_samples)
- Raises:
ValueError – if sample rates of the clips are not equal, or if enforce_length=True and lengths are not equal
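The weighted-mixture logic described above can be sketched with plain numpy. `get_target_sketch` below is an illustrative, hypothetical stand-in (not soundata's implementation), assuming clips are arrays shaped (n_channels, n_samples) at the same sample rate:

```python
import numpy as np


def get_target_sketch(clip_audios, weights=None, average=True, enforce_length=True):
    # clip_audios: list of arrays shaped (n_channels, n_samples), same sample rate
    lengths = {a.shape[1] for a in clip_audios}
    if len(lengths) > 1:
        if enforce_length:
            raise ValueError("clips must have equal lengths when enforce_length=True")
        # Zero-pad every clip to the length of the longest one
        max_len = max(lengths)
        clip_audios = [
            np.pad(a, ((0, 0), (0, max_len - a.shape[1]))) for a in clip_audios
        ]
    if weights is None:
        weights = [1.0] * len(clip_audios)
    # Weighted sum of the clips ...
    target = sum(w * a for w, a in zip(weights, clip_audios))
    if average:
        # ... normalized into a weighted average when requested
        target = target / np.sum(weights)
    return target
```

A uniform-weight average of two equal-length clips simply halves their sum; `get_random_target` adds random clip selection and random weights on top of the same mixing step.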
- class soundata.core.Dataset(data_home=None, name=None, clip_class=None, clipgroup_class=None, bibtex=None, remotes=None, download_info=None, license_info=None, custom_index_path=None)[source]
soundata Dataset class
- Variables:
data_home (str) – path where soundata will look for the dataset
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
clip (function) – a function mapping a clip_id to a soundata.core.Clip
clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup
- __init__(data_home=None, name=None, clip_class=None, clipgroup_class=None, bibtex=None, remotes=None, download_info=None, license_info=None, custom_index_path=None)[source]
Dataset init method
- Parameters:
data_home (str or None) – path where soundata will look for the dataset
name (str or None) – the identifier of the dataset
clip_class (soundata.core.Clip or None) – a Clip class
clipgroup_class (soundata.core.Clipgroup or None) – a Clipgroup class
bibtex (str or None) – dataset citation/s in bibtex format
remotes (dict or None) – data to be downloaded
download_info (str or None) – download instructions or caveats
license_info (str or None) – license of the dataset
custom_index_path (str or None) – overwrites the default index path for remote indexes
- choice_clip()[source]
Choose a random clip
- Returns:
Clip – a Clip object instantiated by a random clip_id
- choice_clipgroup()[source]
Choose a random clipgroup
- Returns:
Clipgroup – a Clipgroup object instantiated by a random clipgroup_id
- property default_path
Get the default path for the dataset
- Returns:
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally print a message.
- Parameters:
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises:
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- explore_dataset(clip_id=None)[source]
Explore the dataset for a given clip_id or a random clip if clip_id is None.
- Parameters:
clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.
- load_clipgroups()[source]
Load all clipgroups in the dataset
- Returns:
dict – {clipgroup_id: clipgroup data}
- Raises:
NotImplementedError – If the dataset does not support Clipgroups
- class soundata.core.cached_property(func)[source]
Cached property decorator
A property that is only computed once per instance and then replaces itself with an ordinary attribute. Deleting the attribute resets the property. Source: https://github.com/bottlepy/bottle/commit/fa7733e075da0d790d809aa3d2f53071897e6f76
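A minimal re-creation of the linked bottle.py recipe shows how the caching works: because the descriptor defines no `__set__`, storing the computed value in the instance dictionary under the same name shadows the descriptor, so the function runs at most once per instance until the attribute is deleted.

```python
class cached_property:
    """Compute a property once per instance, then cache it as a plain attribute.

    Sketch of the bottle.py recipe referenced in the docs; deleting the
    attribute resets the cache.
    """

    def __init__(self, func):
        self.func = func
        self.__doc__ = getattr(func, "__doc__")

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        # Storing the result under the function's name shadows this
        # (non-data) descriptor, so later accesses never call func again
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value
```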
Annotations
soundata annotation data types
- soundata.annotations.AZIMUTH_UNITS = {'degrees': 'values in the interval [-360, 360]', 'radians': 'values in the interval [-2*pi, 2*pi]'}
Azimuth units
- soundata.annotations.DISTANCE_UNITS = {'centimeters': 'centimeters', 'meters': 'meters', 'millimeters': 'millimeters'}
Distance units
- soundata.annotations.ELEVATIONS_UNITS = {'degrees': 'degrees'}
Elevation units
- class soundata.annotations.Events(intervals, intervals_unit, labels, labels_unit, confidence=None, azimuth=None, azimuth_unit=None, elevation=None, elevation_unit=None, distance=None, distance_unit=None, cartesian_coord=None, cartesian_coord_unit=None)[source]
Events class
- Variables:
intervals (np.ndarray) – (n x 2) array of intervals (as floats) in seconds in the form [start_time, end_time] with positive time stamps and end_time >= start_time.
labels (list) – list of event labels (as strings)
confidence (np.ndarray or None) – array of confidence values, float in [0, 1]
labels_unit (str) – labels unit, one of LABEL_UNITS
intervals_unit (str) – intervals unit, one of TIME_UNITS
azimuth (np.ndarray or None) – list of size n with np.ndarrays with dtype float, indicating the azimuth of the sound event. Values between -360 and 360 for degrees, and between -2*pi and 2*pi for radians, or None.
azimuth_unit (str) – azimuth unit, one of AZIMUTH_UNITS
elevation (np.ndarray or None) – list of size n with np.ndarrays with dtype float, indicating the elevation of the sound event. Values between -90 and 90 or None.
elevation_unit (str) – elevation unit, one of ELEVATIONS_UNITS
distance (np.ndarray or None) – list of size n with np.ndarrays with dtype float, indicating the distance of the sound event. Values must be positive or None.
distance_unit (str) – distance unit, one of DISTANCE_UNITS
cartesian_coord (np.ndarray or None) – cartesian coordinates (x, y, z) of the sound event, or None
cartesian_coord_unit (str) – cartesian_coord unit, one of DISTANCE_UNITS
- soundata.annotations.LABEL_UNITS = {'open': 'no strict schema or units'}
Label units
- class soundata.annotations.MultiAnnotator(annotators, annotations)[source]
Multiple annotator class. This class should be used for datasets with multiple annotators (e.g. multiple annotators per clip).
- Variables:
annotators (list) – list with annotator ids
annotations (list) – list of annotations (e.g. [annotations.Tags, annotations.Tags])
- class soundata.annotations.SpatialEvents(intervals, intervals_unit, elevations, elevations_unit, azimuths, azimuths_unit, distances, distances_unit, labels, labels_unit, clip_number_index=None, time_step=None, confidence=None)[source]
SpatialEvents class
- Variables:
intervals (list) – list of size n np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time] with positive time stamps and end_time >= start_time. n is the number of sound events; m is the number of sounding instances for each sound event.
intervals_unit (str) – intervals unit, one of TIME_UNITS
time_step (int, float, or None) – the time-step between events over time in intervals_unit
elevations (list) – list of size n with np.ndarrays with dtype int, indicating the elevation of the sound event per time_step if moving or a single value if static. Values between -90 and 90
elevations_unit (str) – elevations unit, one of ELEVATIONS_UNITS
azimuths (list) – list of size n with np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving or a single value if static. Values between -180 and 180
azimuths_unit (str) – azimuths unit, one of AZIMUTH_UNITS
distances (list) – list of size n with np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving or a single value if static. Values must be positive or None
distances_unit (str) – distances unit, one of DISTANCE_UNITS
labels (list) – list of event labels (as strings)
labels_unit (str) – labels unit, one of LABEL_UNITS
clip_number_indices (list) – list of clip number indices (as strings)
confidence (np.ndarray or None) – array of confidence values, float in [0, 1]
- soundata.annotations.TIME_UNITS = {'milliseconds': 'milliseconds', 'seconds': 'seconds'}
Time units
- class soundata.annotations.Tags(labels, labels_unit, confidence=None)[source]
Tags class
- Variables:
labels (list) – list of string tags
confidence (np.ndarray or None) – array of confidence values, float in [0, 1]
labels_unit (str) – labels unit, one of LABEL_UNITS
- soundata.annotations.validate_array_like(array_like, expected_type, expected_dtype, check_child=False, none_allowed=False)[source]
Validate that an array-like object is well formed.
If array_like is None, validation passes automatically.
- Parameters:
array_like (array-like) – object to validate
expected_type (type) – expected type, either list or np.ndarray
expected_dtype (type) – expected dtype
check_child (bool) – if True, checks if all elements of array are children of expected_dtype
none_allowed (bool) – if True, allows array to be None
- Raises:
TypeError – if type/dtype does not match expected_type/expected_dtype
ValueError – if array_like is None and none_allowed=False
- soundata.annotations.validate_confidence(confidence)[source]
Validate if confidence is well-formed.
If confidence is None, validation passes automatically
- Parameters:
confidence (np.ndarray) – an array of confidence values
- Raises:
ValueError – if confidence values are not between 0 and 1
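The check this validator performs can be sketched in a few lines of numpy; `validate_confidence_sketch` is a hypothetical illustration of the documented behavior, not soundata's code:

```python
import numpy as np


def validate_confidence_sketch(confidence):
    # None passes automatically, mirroring the documented behavior
    if confidence is None:
        return
    conf = np.asarray(confidence, dtype=float)
    if np.any(conf < 0) or np.any(conf > 1):
        raise ValueError("confidence values must be between 0 and 1")
```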
- soundata.annotations.validate_intervals(intervals)[source]
Validate if intervals are well-formed.
If intervals is None, validation passes automatically
- Parameters:
intervals (np.ndarray) – (n x 2) array
- Raises:
ValueError – if intervals have an invalid shape, have negative values, or if end times are smaller than start times.
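The three conditions above translate directly into numpy checks. `validate_intervals_sketch` below is an illustrative stand-in under those documented rules, not soundata's implementation:

```python
import numpy as np


def validate_intervals_sketch(intervals):
    # None passes automatically
    if intervals is None:
        return
    intervals = np.asarray(intervals)
    if intervals.ndim != 2 or intervals.shape[1] != 2:
        raise ValueError("intervals should have shape (n, 2)")
    if np.any(intervals < 0):
        raise ValueError("intervals should have non-negative time stamps")
    if np.any(intervals[:, 1] < intervals[:, 0]):
        raise ValueError("end times should not be smaller than start times")
```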
- soundata.annotations.validate_lengths_equal(array_list)[source]
Validate that arrays in list are equal in length
Some arrays may be None, and the validation for these is skipped.
- Parameters:
array_list (list) – list of array-like objects
- Raises:
ValueError – if arrays are not equal in length
- soundata.annotations.validate_locations(locations)[source]
Validate if locations are well-formed.
If locations is None, validation passes automatically
- Parameters:
locations (np.ndarray) – (n x 3) array
- Raises:
ValueError – if locations have an invalid shape or have cartesian coordinate values outside the expected ranges.
- soundata.annotations.validate_time_steps(time_step, locations, interval)[source]
Validate if time steps are well-formed.
If locations is None, validation passes automatically
- Parameters:
time_step (float) – spacing between location steps
locations (np.ndarray) – (n x 3) array
interval (np.ndarray) – (n x 2) expected start and end time for the locations
- Raises:
ValueError – if the number of locations does not match the number of time_steps that fit in the interval
- soundata.annotations.validate_times(times)[source]
Validate if times are well-formed.
If times is None, validation passes automatically
- Parameters:
times (np.ndarray) – an array of time stamps
- Raises:
ValueError – if times have negative values or are non-increasing
- soundata.annotations.validate_unit(unit, unit_values, allow_none=False)[source]
Validate that the given unit is one of the allowed unit values.
- Parameters:
unit (str) – the unit name
unit_values (dict) – dictionary of possible unit values
allow_none (bool) – if True, allows unit=None to pass validation
- Raises:
ValueError – If the given unit is not one of the allowed unit values
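Since the unit constants above (e.g. TIME_UNITS) are plain dictionaries keyed by unit name, this validation amounts to a membership test. `validate_unit_sketch` is a hypothetical illustration of that behavior:

```python
def validate_unit_sketch(unit, unit_values, allow_none=False):
    # unit_values is a dict of allowed unit names, e.g. TIME_UNITS
    if allow_none and unit is None:
        return
    if unit not in unit_values:
        raise ValueError(f"unit={unit!r} is not one of {sorted(unit_values)}")
```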
Advanced
soundata.validate
Utility functions for soundata
- soundata.validate.log_message(message, verbose=True)[source]
Helper function to log a message
- Parameters:
message (str) – message to log
verbose (bool) – if False, the message is not logged
- soundata.validate.md5(file_path)[source]
Get md5 hash of a file.
- Parameters:
file_path (str) – File path
- Returns:
str – md5 hash of data in file_path
- soundata.validate.validate(local_path, checksum)[source]
Validate that a file exists and has the correct checksum
- Parameters:
local_path (str) – file path
checksum (str) – md5 checksum
- Returns:
bool - True if file exists
bool - True if checksum matches
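Together, `md5` and `validate` amount to hashing a file and comparing against a reference checksum. The pair below is a hedged, self-contained sketch of that behavior (the `_sketch` names are hypothetical, not soundata's API):

```python
import hashlib
import os


def md5_sketch(file_path):
    # Hash the file in chunks so large files never load fully into memory
    hash_md5 = hashlib.md5()
    with open(file_path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(8192), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def validate_sketch(local_path, checksum):
    # Returns (file_exists, checksum_matches)
    exists = os.path.exists(local_path)
    valid = exists and md5_sketch(local_path) == checksum
    return exists, valid
```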
- soundata.validate.validate_files(file_dict, data_home, verbose)[source]
Validate files
- Parameters:
file_dict (dict) – dictionary of file information
data_home (str) – path where the data lives
verbose (bool) – if True, show progress
- Returns:
dict - missing files
dict - files with invalid checksums
- soundata.validate.validate_index(dataset_index, data_home, verbose=True)[source]
Validate files in a dataset’s index
- Parameters:
dataset_index (list) – dataset indices
data_home (str) – Local path where the dataset is stored
verbose (bool) – if True, prints validation status while running
- Returns:
dict - file paths that are in the index but missing locally
dict - file paths with differing checksums
- soundata.validate.validate_metadata(file_dict, data_home, verbose)[source]
Validate metadata files
- Parameters:
file_dict (dict) – dictionary of file information
data_home (str) – path where the data lives
verbose (bool) – if True, show progress
- Returns:
dict - missing files
dict - files with invalid checksums
- soundata.validate.validator(dataset_index, data_home, verbose=True)[source]
Checks the existence and validity of files stored locally with respect to the paths and file checksums stored in the reference index. Logs invalid checksums and missing files.
- Parameters:
dataset_index (list) – dataset indices
data_home (str) – Local path where the dataset is stored
verbose (bool) – if True (default), prints missing and invalid files to stdout. Otherwise, this function is equivalent to validate_index.
- Returns:
missing_files (list) – list of file paths that are in the dataset index but missing locally
invalid_checksums (list) – list of file paths that exist locally but whose checksum differs from the reference checksum in the dataset index
soundata.download_utils
utilities for downloading from the web.
- class soundata.download_utils.DownloadProgressBar(*_, **__)[source]
Wrap tqdm to show download progress
- class soundata.download_utils.RemoteFileMetadata(filename, url, checksum, destination_dir=None, unpack_directories=None)[source]
The metadata for a remote file
- Variables:
filename (str) – the remote file’s basename
url (str) – the remote file’s url
checksum (str) – the remote file’s md5 checksum
destination_dir (str or None) – the relative path for where to save the file
unpack_directories (list or None) – list of relative directories. For each directory the contents will be moved to destination_dir (or data_home if not provided)
- soundata.download_utils.download_7z_file(tar_remote, save_dir, force_overwrite, cleanup)[source]
Download and extract a 7z file.
- Parameters:
tar_remote (RemoteFileMetadata) – Object containing download information
save_dir (str) – Path to save downloaded file
force_overwrite (bool) – If True, overwrites existing files
cleanup (bool) – If True, remove the 7z file after extraction
- soundata.download_utils.download_from_remote(remote, save_dir, force_overwrite)[source]
Download a remote dataset into path. Fetch a dataset pointed to by remote’s url, save it into path using remote’s filename, and ensure its integrity based on the MD5 checksum of the downloaded file.
Adapted from scikit-learn’s sklearn.datasets.base._fetch_remote.
- Parameters:
remote (RemoteFileMetadata) – Named tuple containing remote dataset meta information: url, filename and checksum
save_dir (str) – Directory to save the file to. Usually data_home
force_overwrite (bool) – If True, overwrite existing file with the downloaded file. If False, does not overwrite, but checks that checksum is consistent.
- Returns:
str – Full path of the created file.
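The fetch-then-verify pattern described above can be sketched with the standard library alone. `download_from_remote_sketch` is a hypothetical illustration (soundata's real function takes a RemoteFileMetadata tuple and handles more cases):

```python
import hashlib
import os
import urllib.request


def download_from_remote_sketch(url, filename, checksum, save_dir, force_overwrite=False):
    # Fetch url into save_dir/filename, then verify its MD5 checksum
    os.makedirs(save_dir, exist_ok=True)
    download_path = os.path.join(save_dir, filename)
    if force_overwrite or not os.path.exists(download_path):
        urllib.request.urlretrieve(url, download_path)
    with open(download_path, "rb") as fhandle:
        if hashlib.md5(fhandle.read()).hexdigest() != checksum:
            raise IOError(
                f"{download_path} has an MD5 checksum differing from the expected value"
            )
    return download_path
```

Note that when force_overwrite=False and the file already exists, the existing file is kept but its checksum is still verified, matching the documented behavior.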
- soundata.download_utils.download_multipart_zip(zip_remotes, save_dir, force_overwrite, cleanup)[source]
Download and unzip a multipart zip file.
- Parameters:
zip_remotes (list) – A list of RemoteFileMetadata Objects containing download information
save_dir (str) – Path to save downloaded file
force_overwrite (bool) – If True, overwrites existing files
cleanup (bool) – If True, remove zipfile after unzipping
- soundata.download_utils.download_tar_file(tar_remote, save_dir, force_overwrite, cleanup)[source]
Download and untar a tar file.
- Parameters:
tar_remote (RemoteFileMetadata) – Object containing download information
save_dir (str) – Path to save downloaded file
force_overwrite (bool) – If True, overwrites existing files
cleanup (bool) – If True, remove tarfile after untarring
- soundata.download_utils.download_zip_file(zip_remote, save_dir, force_overwrite, cleanup)[source]
Download and unzip a zip file.
- Parameters:
zip_remote (RemoteFileMetadata) – Object containing download information
save_dir (str) – Path to save downloaded file
force_overwrite (bool) – If True, overwrites existing files
cleanup (bool) – If True, remove zipfile after unzipping
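Once downloaded, the extract-and-optionally-cleanup step reduces to a few lines with the standard zipfile module. `extract_zip_sketch` is an illustrative stand-in for that step (not soundata's function, which also downloads and handles filename encoding):

```python
import os
import zipfile


def extract_zip_sketch(zip_path, save_dir, cleanup=False):
    # Unpack the archive into save_dir; optionally delete the archive afterwards
    with zipfile.ZipFile(zip_path, "r") as zfile:
        zfile.extractall(save_dir)
    if cleanup:
        os.remove(zip_path)
```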
- soundata.download_utils.downloader(save_dir, remotes=None, partial_download=None, info_message=None, force_overwrite=False, cleanup=False)[source]
Download data to save_dir and optionally log a message.
- Parameters:
save_dir (str) – The directory to download the data
remotes (dict or None) – A dictionary of RemoteFileMetadata tuples of data in zip format. If an element of the dictionary is a list of RemoteFileMetadata, it is handled as a multipart zip file. If None, there is no data to download
partial_download (list or None) – A list of keys to partially download the remote objects of the download dict. If None, all data is downloaded
info_message (str or None) – A string of info to log when this function is called. If None, no string is logged.
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete the zip/tar file after extracting.
- soundata.download_utils.extractall_unicode(zfile, out_dir)[source]
Extract all files inside a zip archive to an output directory.
Unlike plain zipfile extraction, it checks for correct file name encoding
- Parameters:
zfile (obj) – Zip file object created with zipfile.ZipFile
out_dir (str) – Output folder
- soundata.download_utils.move_directory_contents(source_dir, target_dir)[source]
Move the contents of source_dir into target_dir, and delete source_dir
- Parameters:
source_dir (str) – path to source directory
target_dir (str) – path to target directory
- soundata.download_utils.un7z(sevenz_path, cleanup)[source]
Extract a 7z file inside its current directory.
- Parameters:
sevenz_path (str) – Path to the 7z file
cleanup (bool) – If True, remove 7z file after extraction
soundata.jams_utils
Utilities for converting soundata Annotation classes to jams format.
- soundata.jams_utils.events_to_jams(events, annotator=None, description=None)[source]
Convert events annotations into jams format.
- Parameters:
events (annotations.Events) – events data object
annotator (str) – annotator id
description (str) – annotation description
- Returns:
jams.Annotation – jams annotation object.
- soundata.jams_utils.jams_converter(audio_path=None, spectrogram_path=None, metadata=None, tags=None, events=None)[source]
Convert annotations from a clip to JAMS format.
- Parameters:
audio_path (str or None) – A path to the corresponding audio file, or None. If provided, the audio file will be read to compute the duration. If None, ‘duration’ must be a field in the metadata dictionary, or the resulting jam object will not validate.
spectrogram_path (str or None) – A path to the corresponding spectrum file, or None.
tags (annotations.Tags or annotations.MultiAnnotator or None) – An instance of annotations.Tags/annotations.MultiAnnotator describing the audio tags.
events (annotations.Events or annotations.MultiAnnotator or None) – An instance of annotations.Events/annotations.MultiAnnotator describing the sound events.
- Returns:
jams.JAMS – A JAMS object containing the annotations.
- soundata.jams_utils.multiannotator_to_jams(multiannot: MultiAnnotator, converter: Callable[[...], Annotation], **kwargs) List[jams.Annotation] [source]
Convert a MultiAnnotator’s annotations into jams format.
- Parameters:
multiannot (annotations.MultiAnnotator) – MultiAnnotator object
converter (Callable[…, annotations.Annotation]) – a function that takes an annotation object and its annotator (and other optional arguments) and returns a jams annotation object
- Returns:
List[jams.Annotation] – List of jams annotation objects.
- soundata.jams_utils.tags_to_jams(tags, annotator=None, duration=0, namespace='tag_open', description=None)[source]
Convert tags annotations into jams format.
- Parameters:
tags (annotations.Tags) – tags annotation object
annotator (str) – annotator id
namespace (str) – the jams-compatible tag namespace
description (str) – annotation description
- Returns:
jams.Annotation – jams annotation object.