Initialize a dataset

soundata.initialize(dataset_name, data_home=None)[source]

Load a soundata dataset by name

Example

urbansound8k = soundata.initialize('urbansound8k')  # get the urbansound8k dataset
urbansound8k.download()  # download urbansound8k
urbansound8k.validate()  # validate urbansound8k
clip = urbansound8k.choice_clip()  # load a random clip
print(clip)  # see what data a clip contains
urbansound8k.clip_ids  # load all clip ids
Parameters:
  • dataset_name (str) – the dataset’s name; see soundata.DATASETS for a complete list of possibilities

  • data_home (str or None) – path where the data lives. If None, the default location is used.

Returns:

Dataset – a soundata.core.Dataset object

soundata.list_datasets()[source]

Get a list of all soundata dataset names

Returns:

list – list of dataset names as strings
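The two functions above can be combined to check a requested name before initializing a dataset. The following is a minimal sketch under stated assumptions: resolve_dataset_name and open_dataset are hypothetical helpers that are not part of soundata, and soundata is imported lazily so the name check can be tried without the package installed.

```python
import difflib


def resolve_dataset_name(name, available):
    """Return `name` if it is in `available`, else raise ValueError with suggestions."""
    wanted = name.strip()
    if wanted in available:
        return wanted
    close = difflib.get_close_matches(wanted, available, n=3)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise ValueError(f"Unknown dataset {name!r}.{hint}")


def open_dataset(name, data_home=None):
    """Hypothetical wrapper: validate the name, then call soundata.initialize."""
    import soundata  # lazy import; only needed when a dataset is actually opened

    checked = resolve_dataset_name(name, soundata.list_datasets())
    return soundata.initialize(checked, data_home=data_home)
```

With soundata installed, open_dataset('urbansound8k') would behave like the initialize call in the example above, while a misspelled name fails early with a list of close matches.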

Dataset Loaders

3D-MARCo

3D-MARCo Dataset Loader

class soundata.datasets.marco.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

3D-MARCo Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • source_label (str) – label of the source being recorded

  • source_angle (str) – angle of the source being recorded

  • audio_path (str) – path to the audio file

  • clip_id (str) – clip id

  • microphone_info (list) – list of strings with all relevant microphone metadata

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.marco.Dataset(data_home=None)[source]

The 3D-MARCo dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected
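The parameters and exceptions listed above suggest one way to wrap download(). This is a minimal sketch, not soundata's own behavior: download_subset is a hypothetical helper that checks the requested keys against dataset.remotes and retries with force_overwrite=True after a checksum failure. It works on any object that exposes remotes and download() as documented here.

```python
def download_subset(dataset, keys=None, retries=1):
    """Download the given remote keys, retrying with force_overwrite on IOError.

    Returns True on success, False if the checksum error persists.
    """
    known = set(dataset.remotes or {})
    unknown = [key for key in (keys or []) if key not in known]
    if unknown:
        # Fail early with the list of valid keys instead of waiting for download().
        raise ValueError(f"Unknown remote keys {unknown}; available: {sorted(known)}")

    force = False
    for _ in range(retries + 1):
        try:
            dataset.download(partial_download=keys, force_overwrite=force, cleanup=True)
            return True
        except IOError:
            force = True  # checksum mismatch: fetch the files again on the next attempt
    return False
```

Passing keys=None downloads every remote, matching the documented default of partial_download.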

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a 3D-MARCo audio file.

Parameters:
  • fhandle (str or file-like) – file-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 48000 by default, which re-samples all files except the EigenMike ones, resulting in constant sampling rate between all clips in the dataset.

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum
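The two return values can be condensed into a short report. The sketch below is an illustration under assumptions: validation_report and count_files are hypothetical helpers, and the counting accepts either flat lists (as documented here) or nested dicts of lists, since the exact container type may differ between versions.

```python
def count_files(entries):
    """Count leaf entries in a list, or in a possibly nested dict of lists."""
    if isinstance(entries, str):
        return 1
    if isinstance(entries, dict):
        return sum(count_files(value) for value in entries.values())
    return len(entries)


def validation_report(dataset):
    """Run validate() quietly and summarize missing and corrupted files."""
    missing, invalid = dataset.validate(verbose=False)
    n_missing, n_invalid = count_files(missing), count_files(invalid)
    return {"missing": n_missing, "invalid": n_invalid, "ok": n_missing == 0 and n_invalid == 0}
```

A report with "ok" set to False is a reasonable signal to call download() again before loading clips.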

soundata.datasets.marco.load_audio(fhandle: BinaryIO, sr=48000) → Tuple[numpy.ndarray, float][source]

Load a 3D-MARCo audio file.

Parameters:
  • fhandle (str or file-like) – file-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 48000 by default, which re-samples all files except the EigenMike ones, resulting in constant sampling rate between all clips in the dataset.

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file
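Because the loader returns the signal together with its sample rate, a clip's duration follows directly from the pair. The sketch below assumes that the last axis of the returned array is time, which is not stated in this reference; clip_duration is a hypothetical helper. It also accepts None, since the audio property is documented as Optional.

```python
def clip_duration(audio_and_sr):
    """Return the duration in seconds of an (audio, sample_rate) pair, or None."""
    if audio_and_sr is None:
        return None
    audio, sr = audio_and_sr
    # Assumption: samples run along the last axis, e.g. (channels, samples).
    return audio.shape[-1] / float(sr)
```

For example, clip_duration(clip.audio) would give the length of a 3D-MARCo clip at the default 48000 Hz rate.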

DCASE23-Task2

DCASE23_Task2 Dataset Loader

class soundata.datasets.dcase23_task2.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

DCASE23_Task2 Clip class

Parameters:

clip_id (str) – ID of the clip

Variables:
  • audio (np.ndarray, float) – Array representation of the audio clip

  • audio_path (str) – Path to the audio file

  • file_name (str) – Name of the clip file, useful for cross-referencing

  • d1p (str) – First domain shift parameter specifying the attribute causing the domain shift

  • d1v (str) – First domain shift value or type associated with the domain shift parameter

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property d1p

The clip’s first domain shift parameter (d1p).

Returns:

  • str - first domain shift parameter of the clip

property d1v

The clip’s first domain shift value (d1v).

Returns:

  • str - first domain shift value of the clip
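The d1p and d1v properties together describe one domain shift condition, so counting clips per (d1p, d1v) pair gives a quick overview of the data. This is a minimal sketch; domain_shift_counts is a hypothetical helper that only relies on the two properties documented above.

```python
from collections import Counter


def domain_shift_counts(clips):
    """Count clips per (domain shift parameter, domain shift value) pair."""
    return Counter((clip.d1p, clip.d1v) for clip in clips)
```

It could be applied to the values of the dictionary returned by the dataset's load_clips() method.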

property file_name

The clip’s file name.

Used for cross-referencing with attribute CSV files for additional metadata.

Returns:

  • str - name of the clip file

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.dcase23_task2.Dataset(data_home=None)[source]

The DCASE23_Task2 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a DCASE23_Task2 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.dcase23_task2.load_audio(fhandle: BinaryIO, sr=44100) → Tuple[numpy.ndarray, float][source]

Load a DCASE23_Task2 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

DCASE23-Task4B

DCASE23 Task 4B Dataset Loader

class soundata.datasets.dcase23_task4b.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

DCASE23_Task4B Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – audio signal and sample rate

  • audio_path (str) – path to the audio file

  • annotations_path (str) – path to the annotations file

  • clip_id (str) – clip id

  • events (soundata.annotations.Events) – sound events with start time, end time, label and confidence

  • split (str) – subset the clip belongs to: development or evaluation

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

events

The clip’s events.

Returns:

  • annotations.Events - sound events with start time, end time, label and confidence
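Since each event carries a start time, an end time, and a label, the total annotated time per label can be computed in a few lines. The sketch below assumes the Events object exposes intervals (a sequence of [start, end] pairs in seconds) and labels (a sequence of strings); those attribute names are not spelled out in this reference and are treated as assumptions. seconds_per_label is a hypothetical helper.

```python
from collections import defaultdict


def seconds_per_label(events):
    """Sum event durations per label.

    Assumes `events.intervals` holds [start, end] pairs in seconds and
    `events.labels` holds one label per interval.
    """
    totals = defaultdict(float)
    for (start, end), label in zip(events.intervals, events.labels):
        totals[label] += float(end) - float(start)
    return dict(totals)
```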

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property split

The clip’s split.

Returns:

  • str - subset the clip belongs to: development or evaluation

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.dcase23_task4b.Dataset(data_home=None)[source]

The DCASE23_Task4B dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a DCASE23_Task4B audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the stereo audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips
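The dictionary returned by load_clips() can be grouped by each clip's split property to separate development from evaluation data. This is a minimal sketch; clip_ids_by_split is a hypothetical helper and only assumes that every clip object has a split attribute, as documented for this loader.

```python
def clip_ids_by_split(clips):
    """Group clip ids by split, given a {clip_id: clip} dictionary."""
    groups = {}
    for clip_id, clip in clips.items():
        groups.setdefault(clip.split, []).append(clip_id)
    return groups
```

A call such as clip_ids_by_split(dataset.load_clips()) would return one list of clip ids per split name.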

load_events(*args, **kwargs)[source]

Load a DCASE23_Task4B annotation file.

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Returns:

Events – sound events annotation data

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.dcase23_task4b.load_audio(fhandle: BinaryIO, sr=None) → Tuple[numpy.ndarray, float][source]

Load a DCASE23_Task4B audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the stereo audio signal

  • float - The sample rate of the audio file

soundata.datasets.dcase23_task4b.load_events(fhandle: TextIO) → Events[source]

Load a DCASE23_Task4B annotation file.

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Returns:

Events – sound events annotation data

DCASE23-Task6a

DCASE 2023 Task-6A Dataset Loader

class soundata.datasets.dcase23_task6a.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

DCASE’23 Task 6A Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – Audio signal and sample rate.

  • file_name (str) – Name of the file.

  • keywords (str) – Associated keywords.

  • sound_id (str) – Unique identifier for the sound.

  • sound_link (str) – Link to the sound.

  • start_end_samples (tuple) – Start and end samples in the audio file.

  • manufacturer (str) – Manufacturer of the recording equipment.

  • license (str) – License of the clip.

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property file_name

The name of the audio file.

Returns:

  • str - Name of the file.

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property keywords

Keywords associated with the clip.

Returns:

  • str - Keywords for the clip.

property license

License of the clip.

Returns:

  • str - License information.

property manufacturer

Manufacturer of the recording equipment.

Returns:

  • str - Manufacturer name.

property sound_id

Unique identifier for the sound.

Returns:

  • str - Sound ID.

property sound_link

Link to the sound.

Returns:

  • str - URL of the sound.

property start_end_samples

Start and end samples in the audio file.

Returns:

  • tuple - Start and end samples.
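The start and end samples can be converted to a duration in seconds by dividing by the sample rate. The sketch below assumes start_end_samples is a (start, end) pair of integers, as the tuple type above suggests, and uses the 44100 Hz rate documented for this loader as the default; excerpt_seconds is a hypothetical helper.

```python
def excerpt_seconds(start_end_samples, sr=44100):
    """Return the length in seconds of the excerpt between two sample indices."""
    start, end = start_end_samples
    if end < start:
        raise ValueError(f"end sample {end} precedes start sample {start}")
    return (end - start) / sr
```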

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.dcase23_task6a.Dataset(data_home=None)[source]

The DCASE’23 Task 6A dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a DCASE’23 Task 6A audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.dcase23_task6a.load_audio(fhandle: BinaryIO, sr=None) → Tuple[numpy.ndarray, float][source]

Load a DCASE’23 Task 6A audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

DCASE23-Task6b

DCASE 2023 Task-6B Dataset Loader

class soundata.datasets.dcase23_task6b.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

DCASE’23 Task 6B Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – Audio signal and sample rate.

  • file_name (str) – Name of the file.

  • keywords (str) – Associated keywords.

  • sound_id (str) – Unique identifier for the sound.

  • sound_link (str) – Link to the sound.

  • start_end_samples (tuple) – Start and end samples in the audio file.

  • manufacturer (str) – Manufacturer of the recording equipment.

  • license (str) – License of the clip.

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property file_name

The name of the audio file.

Returns:

  • str - Name of the file.

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property keywords

Keywords associated with the clip.

Returns:

  • str - Keywords for the clip.

property license

License of the clip.

Returns:

  • str - License information.

property manufacturer

Manufacturer of the recording equipment.

Returns:

  • str - Manufacturer name.

property sound_id

Unique identifier for the sound.

Returns:

  • str - Sound ID.

property sound_link

Link to the sound.

Returns:

  • str - URL of the sound.

property start_end_samples

Start and end samples in the audio file.

Returns:

  • tuple - Start and end samples.

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.dcase23_task6b.Dataset(data_home=None)[source]

The DCASE’23 Task 6B dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a DCASE’23 Task 6B audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.dcase23_task6b.load_audio(fhandle: BinaryIO, sr=None) → Tuple[numpy.ndarray, float][source]

Load a DCASE’23 Task 6B audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

DCASE-bioacoustic

DCASE-BIOACOUSTIC Dataset Loader

class soundata.datasets.dcase_bioacoustic.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

DCASE bioacoustic Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – audio signal and sample rate

  • audio_path (str) – path to the audio file

  • csv_path (str) – path to the csv file

  • clip_id (str) – clip id

  • split (str) – subset the clip belongs to (for experiments): train, validate, or test

Other Parameters:
  • events_classes (list) – list of classes annotated for the file

  • events (soundata.annotations.Events) – sound events with start time, end time, labels (list for all classes) and confidence

  • POSevents (soundata.annotations.Events) – sound events for the positive class with start time, end time, label and confidence

POSevents

The audio events for POS (positive) class

Returns:
  • annotations.Events - audio event object

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

events

The audio events

Returns:
  • annotations.Events - audio event object

events_classes

The classes of the annotated audio events

Returns:
  • list - list of the annotated events
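Collecting events_classes across clips shows how often each class is annotated in the dataset. This is a minimal sketch; class_histogram is a hypothetical helper that assumes events_classes is a list of class names, or None when a clip has no annotations.

```python
from collections import Counter


def class_histogram(clips):
    """Count in how many clips each annotated class appears."""
    counts = Counter()
    for clip in clips:
        # A set avoids counting a class twice for the same clip.
        counts.update(set(clip.events_classes or []))
    return counts
```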

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property split

The data split (e.g. train)

Returns:
  • str - split

property subdataset

The (sub)dataset

Returns:
  • str - subdataset

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.dcase_bioacoustic.Dataset(data_home=None)[source]

The DCASE bioacoustic dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a DCASE bioacoustic audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.dcase_bioacoustic.load_POSevents(fhandle: TextIO) → Events[source]

Load a DCASE bioacoustic sound events annotation file, keeping only the POS labels

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Raises:

IOError – if csv_path doesn’t exist

Returns:

Events – sound events annotation data

soundata.datasets.dcase_bioacoustic.load_audio(fhandle: BinaryIO, sr=None) → Tuple[numpy.ndarray, float][source]

Load a DCASE bioacoustic audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

soundata.datasets.dcase_bioacoustic.load_events(fhandle: TextIO) → Events[source]

Load a DCASE bioacoustic sound events annotation file

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Raises:

IOError – if csv_path doesn’t exist

Returns:

Events – sound events annotation data

soundata.datasets.dcase_bioacoustic.load_events_classes(fhandle: TextIO) list[source]

Load the event classes of a DCASE bioacoustic sound events annotation file

Parameters:
  • fhandle (str or file-like) – File-like object or path to the sound events annotation file

  • positive (bool) – if False, get all labels; if True, get only POS labels

Raises:

IOError – if the annotation file doesn’t exist

Returns:

class_ids – list of events classes

DCASE-birdVox20k

BirdVox20k Dataset Loader

class soundata.datasets.dcase_birdVox20k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

BirdVox20k Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • itemid (str) – clip id

  • datasetid (str) – the dataset to which the clip belongs

  • hasbird (str) – indication of whether the clip contains bird sounds (0/1)

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property dataset_id

The clip’s dataset ID.

Returns:

  • str - ID of the dataset from where this clip is extracted

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None
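A minimal sketch of the path-joining behaviour described above. The index layout used here (each key mapping to a (relative path, checksum) pair) is an assumption made for illustration, and the free function get_path is a hypothetical stand-in for the method.

```python
import os


def get_path(data_home, clip_paths, key):
    """Join data_home with the relative path stored under key, or return None."""
    relative_path = clip_paths[key][0]  # assumed layout: (relative path, checksum)
    if relative_path is None:
        return None
    return os.path.join(data_home, relative_path)


# Hypothetical index entry for one clip.
clip_paths = {"audio": ("wav/example.wav", "0123abcd"), "events": (None, None)}
audio_path = get_path("/data/birdvox", clip_paths, "audio")
events_path = get_path("/data/birdvox", clip_paths, "events")
```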

property has_bird

The flag to tell whether the clip has bird sound or not.

Returns:

  • str - 1/0 depending on whether the clip contains bird sound
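Because the flag is returned as a string, a plain truthiness check would treat "0" as true. A small sketch of an explicit conversion follows; the helper name has_bird_as_bool is hypothetical.

```python
def has_bird_as_bool(flag):
    """Convert the "1"/"0" string flag to a bool."""
    # Compare against "1" explicitly: bool("0") is True because the string is non-empty.
    return str(flag).strip() == "1"


with_bird = has_bird_as_bool("1")
without_bird = has_bird_as_bool("0")
```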

property item_id

The clip’s item ID.

Returns:

  • str - ID of the clip

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.dcase_birdVox20k.Dataset(data_home=None)[source]

The BirdVox20k dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected
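The partial_download check can be pictured as a key lookup against the remotes dict. The sketch below is an assumption about how such validation might look, not soundata's code; select_remotes and the remote names are hypothetical.

```python
def select_remotes(remotes, partial_download=None):
    """Return the subset of remotes to download, rejecting unknown keys."""
    if partial_download is None:
        # No selection given: everything is downloaded.
        return dict(remotes)
    unknown = [key for key in partial_download if key not in remotes]
    if unknown:
        raise ValueError(
            f"partial_download keys {unknown} are not in {sorted(remotes)}"
        )
    return {key: remotes[key] for key in partial_download}


# Hypothetical remotes; real datasets define their own keys.
remotes = {"audio": "audio.zip", "metadata": "labels.csv"}
only_audio = select_remotes(remotes, ["audio"])
```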

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a BirdVox20k audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum
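The two returned lists can be illustrated with a small standalone check. This sketch assumes MD5 checksums and a flat index mapping relative paths to expected digests; both are simplifications for illustration, and validate_files is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def md5(path):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_files(data_home, file_index):
    """Return (missing, invalid): paths absent locally and paths with a wrong checksum."""
    missing, invalid = [], []
    for relative_path, expected in file_index.items():
        full_path = os.path.join(data_home, relative_path)
        if not os.path.exists(full_path):
            missing.append(relative_path)
        elif md5(full_path) != expected:
            invalid.append(relative_path)
    return missing, invalid


with tempfile.TemporaryDirectory() as data_home:
    with open(os.path.join(data_home, "ok.wav"), "wb") as fhandle:
        fhandle.write(b"audio bytes")
    with open(os.path.join(data_home, "changed.wav"), "wb") as fhandle:
        fhandle.write(b"unexpected bytes")
    index = {
        "ok.wav": hashlib.md5(b"audio bytes").hexdigest(),
        "changed.wav": hashlib.md5(b"original bytes").hexdigest(),
        "absent.wav": hashlib.md5(b"never downloaded").hexdigest(),
    }
    missing_files, invalid_checksums = validate_files(data_home, index)
```

A file that is missing is reported only in the first list; its checksum is never computed.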

soundata.datasets.dcase_birdVox20k.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float][source]

Load a BirdVox20k audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

EigenScape

EigenScape Dataset Loader

class soundata.datasets.eigenscape.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

Eigenscape Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • tags (soundata.annotations.Tags) – tag (scene label) of the clip + confidence.

  • audio_path (str) – path to the audio file

  • clip_id (str) – clip id

  • location (str) – city where the audio signal was recorded

  • time (str) – time when the audio signal was recorded

  • date (str) – date when the audio signal was recorded

  • additional_information (str) – notes included by the dataset authors with other details relevant to the specific clip

property additional_information

The clip’s additional information.

Returns:

  • str - notes included by the dataset authors with other details relevant to the specific clip

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property date

The clip’s date.

Returns:

  • str - date when the audio signal was recorded

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property location

The clip’s location.

Returns:

  • str - city where the audio signal was recorded

property tags

The clip’s tags

Returns:

  • annotations.Tags - Tags (scene label) of the clip + confidence.

property time

The clip’s time.

Returns:

  • str - time when the audio signal was recorded

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.eigenscape.Dataset(data_home=None)[source]

The EigenScape dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load an EigenScape audio file.

Parameters:
  • fhandle (str or file-like) – file-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sampling rate of 48000 without resampling.

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.eigenscape.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load an EigenScape audio file.

Parameters:
  • fhandle (str or file-like) – file-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sampling rate of 48000 without resampling.

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

EigenScape Raw

EigenScape Raw Dataset Loader

class soundata.datasets.eigenscape_raw.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

Eigenscape Raw Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio_path (str) – path to the audio file

  • additional_information (str) – notes included by the dataset authors with other details relevant to the specific clip

  • clip_id (str) – clip id

  • date (str) – date when the audio signal was recorded

  • location (str) – city where the audio signal was recorded

  • tags (soundata.annotations.Tags) – tag (scene label) of the clip + confidence.

  • time (str) – time when the audio signal was recorded

property additional_information

The clip’s additional information.

Returns:

  • str - notes included by the dataset authors with other details relevant to the specific clip

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property date

The clip’s date.

Returns:

  • str - date when the audio signal was recorded

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property location

The clip’s location.

Returns:

  • str - city where the audio signal was recorded

property tags

The clip’s tags

Returns:

  • annotations.Tags - Tags (scene label) of the clip + confidence.

property time

The clip’s time (00:00-23:59).

Returns:

  • str - time when the audio signal was recorded

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.eigenscape_raw.Dataset(data_home=None)[source]

The EigenScape Raw dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load an EigenScape Raw audio file.

Parameters:
  • fhandle (str or file-like) – file-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sampling rate of 48000 without resampling.

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.eigenscape_raw.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load an EigenScape Raw audio file.

Parameters:
  • fhandle (str or file-like) – file-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sampling rate of 48000 without resampling.

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

ESC-50

ESC-50 Dataset Loader

class soundata.datasets.esc50.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

ESC-50 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • category (str) – clip class in string format, i.e., label

  • clip_id (str) – clip id

  • esc10 (bool) – True if the clip belongs to the ESC-10 subset (10 selected classes, CC BY license)

  • filename (str) – clip filename

  • fold (int) – index of the cross-validation fold the clip belongs to

  • src_file (str) – freesound ID of the original file from which the clip was taken

  • tags (soundata.annotations.Tags) – tag (label) of the clip + confidence. In ESC-50 every clip has one tag.

  • take (str) – letter disambiguating between different fragments from the same Freesound clip (e.g., “A”, “B”, etc.)

  • target (int) – clip class in numeric format
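Several of these attributes (fold, src_file, take, target) mirror the fields of the ESC-50 file name. The sketch below assumes the FOLD-CLIP_ID-TAKE-TARGET.wav naming described in the dataset's own README, which is not stated in this reference; parse_esc50_filename is a hypothetical helper, not part of soundata.

```python
def parse_esc50_filename(filename):
    """Split an ESC-50 style file name into its four hyphen-separated fields."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    fold, src_file, take, target = stem.split("-")
    return {
        "fold": int(fold),
        "src_file": src_file,
        "take": take,
        "target": int(target),
    }


fields = parse_esc50_filename("1-100032-A-0.wav")
```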

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property category

The clip’s category.

Returns:

  • str - clip class in string format, i.e., label

property esc10

The clip’s esc10.

Returns:

  • bool - True if the clip belongs to the ESC-10 subset (10 selected classes, CC BY license)

property filename

The clip’s filename

Returns:

  • str - clip filename

property fold

The clip’s fold

Returns:

  • int - index of the cross-validation fold the clip belongs to
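The fold index is typically used to build cross-validation splits. A minimal sketch follows, assuming a plain mapping from clip id to fold; in practice the values would come from each clip's fold attribute. The clip ids and the helper split_by_fold are hypothetical.

```python
def split_by_fold(folds_by_clip, test_fold):
    """Hold out one fold for testing and train on all the others."""
    train = sorted(c for c, fold in folds_by_clip.items() if fold != test_fold)
    test = sorted(c for c, fold in folds_by_clip.items() if fold == test_fold)
    return train, test


# Hypothetical clip ids mapped to their fold index.
folds_by_clip = {"clip_a": 1, "clip_b": 2, "clip_c": 1, "clip_d": 3}
train_ids, test_ids = split_by_fold(folds_by_clip, test_fold=1)
```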

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property src_file

The clip’s source file.

Returns:

  • str - freesound ID of the original file from which the clip was taken

property tags

The clip’s tags

Returns:

  • annotations.Tags - tag (label) of the clip + confidence. In ESC-50 every clip has one tag.

property take

The clip’s take

Returns:

  • str - letter disambiguating between different fragments from the same Freesound clip (e.g., “A”, “B”, etc.)

property target

The clip’s target.

Returns:

  • int - clip class in numeric format

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.esc50.Dataset(data_home=None)[source]

The ESC-50 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load an ESC-50 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which loads the file using its original sample rate of 44100.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.esc50.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load an ESC-50 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which loads the file using its original sample rate of 44100.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

Freefield1010

freefield1010 Dataset Loader

class soundata.datasets.freefield1010.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

freefield1010 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • itemid (str) – clip id

  • datasetid (str) – the dataset to which the clip belongs

  • hasbird (str) – indication of whether the clip contains bird sounds (0/1)

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property dataset_id

The clip’s dataset ID.

Returns:

  • str - ID of the dataset from where this clip is extracted

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property has_bird

The flag to tell whether the clip has bird sound or not.

Returns:

  • str - 1/0 depending on whether the clip contains bird sound

property item_id

The clip’s item ID.

Returns:

  • str - ID of the clip

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.freefield1010.Dataset(data_home=None)[source]

The freefield1010 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a freefield1010 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.freefield1010.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float][source]

Load a freefield1010 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

FSD50K

FSD50K Dataset Loader

class soundata.datasets.fsd50k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

FSD50K Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • clip_id (str) – clip id

  • description (str) – description of the sound provided by the Freesound uploader

  • mids (soundata.annotations.Tags) – tag (labels) encoded in Audioset formatting

  • pp_pnp_ratings (dict) – PP/PNP ratings given to the main label of the clip

  • split (str) – flag to identify if clip belongs to development, evaluation or validation splits

  • tags (soundata.annotations.Tags) – tag (label) of the clip + confidence

  • title (str) – the title of the uploaded file in Freesound

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio.

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property description

The clip’s description.

Returns:

  • str - description of the sound provided by the Freesound uploader

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property mids

The clip’s mids.

Returns:

  • annotations.Tags - tag (labels) encoded in Audioset formatting

property pp_pnp_ratings

The clip’s PP/PNP ratings.

Returns:

  • dict - PP/PNP ratings given to the main label of the clip

property split

The clip’s split.

Returns:

  • str - flag to identify if clip belongs to development, evaluation or validation splits

property tags

The clip’s tags.

Returns:

  • annotations.Tags - tag (label) of the clip + confidence

property title

The clip’s title.

Returns:

  • str - the title of the uploaded file in Freesound

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.fsd50k.Dataset(data_home=None)[source]

The FSD50K dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a FSD50K audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which loads the file using its original sample rate (sample rate varies from file to file). If a different sample rate is given, the audio is resampled on load.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

load_fsd50k_vocabulary(*args, **kwargs)[source]

Load vocabulary of FSD50K to relate FSD50K labels with the AudioSet ontology

Parameters:

data_path (str) – Path to the vocabulary file

Returns:

  • fsd50k_to_audioset (dict) – vocabulary to convert from FSD50K to AudioSet

  • audioset_to_fsd50k (dict) – vocabulary to convert from AudioSet to FSD50K

load_ground_truth(*args, **kwargs)[source]

Load ground truth files of FSD50K

Parameters:

data_path (str) – Path to the ground truth file

Returns:

  • ground_truth_dict (dict) – ground truth dict of the clips in the input split

  • clip_ids (list) – list of clip ids of the input split

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.fsd50k.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load a FSD50K audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

soundata.datasets.fsd50k.load_fsd50k_vocabulary(data_path)[source]

Load the FSD50K vocabulary, which relates FSD50K labels to the AudioSet ontology

Parameters:

data_path (str) – Path to the vocabulary file

Returns:

  • fsd50k_to_audioset (dict) – vocabulary to convert from FSD50K to AudioSet

  • audioset_to_fsd50k (dict) – vocabulary to convert from AudioSet to FSD50K
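
As a rough illustration of the two dictionaries returned above, the sketch below builds both mappings from a small in-memory CSV. The three-column row layout (index, FSD50K label, AudioSet id), the sample rows and the helper name build_vocabularies are assumptions made for illustration; this is not soundata's implementation.

```python
import csv
import io

# Hypothetical vocabulary rows: index, FSD50K label, AudioSet id (assumed layout).
SAMPLE_VOCABULARY = "0,Bark,/m/hypothetical_1\n1,Meow,/m/hypothetical_2\n"


def build_vocabularies(fhandle):
    """Return (fsd50k_to_audioset, audioset_to_fsd50k) built from CSV rows."""
    fsd50k_to_audioset = {}
    audioset_to_fsd50k = {}
    for _index, label, audioset_id in csv.reader(fhandle):
        fsd50k_to_audioset[label] = audioset_id
        audioset_to_fsd50k[audioset_id] = label
    return fsd50k_to_audioset, audioset_to_fsd50k


to_audioset, to_fsd50k = build_vocabularies(io.StringIO(SAMPLE_VOCABULARY))
print(to_audioset["Bark"], to_fsd50k["/m/hypothetical_2"])
```

Keeping both directions as separate dictionaries makes lookups constant-time either way, at the cost of assuming the label-to-id relation is one-to-one.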

soundata.datasets.fsd50k.load_ground_truth(data_path)[source]

Load ground truth files of FSD50K

Parameters:

data_path (str) – Path to the ground truth file

Returns:

  • ground_truth_dict (dict) – ground truth dict of the clips in the input split

  • clip_ids (list) – list of clip ids of the input split
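
The shape of the two return values can be sketched with a small parser. The column names (fname, labels, mids, split), the sample rows and the helper name parse_ground_truth are assumptions for illustration only and may not match the actual ground truth files or soundata's internal representation.

```python
import csv
import io

# Hypothetical ground-truth rows; the column names are an assumption.
SAMPLE_GROUND_TRUTH = (
    "fname,labels,mids,split\n"
    '1001,"Bark,Dog","/m/a,/m/b",train\n'
    "1002,Meow,/m/c,val\n"
)


def parse_ground_truth(fhandle):
    """Return (ground_truth_dict, clip_ids) for the rows of one split file."""
    ground_truth_dict = {}
    clip_ids = []
    for row in csv.DictReader(fhandle):
        clip_id = row["fname"]
        ground_truth_dict[clip_id] = {
            "tags": row["labels"].split(","),
            "mids": row["mids"].split(","),
            "split": row.get("split"),  # None if the file has no split column
        }
        clip_ids.append(clip_id)
    return ground_truth_dict, clip_ids


ground_truth, ids = parse_ground_truth(io.StringIO(SAMPLE_GROUND_TRUTH))
print(ids, ground_truth["1001"]["tags"])
```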

FSDnoisy18K

FSDnoisy18K Dataset Loader

class soundata.datasets.fsdnoisy18k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

FSDnoisy18K Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • aso_id (str) – the id of the corresponding category as per the AudioSet Ontology

  • audio_path (str) – path to the audio file

  • clip_id (str) – clip id

  • manually_verified (int) – flag to indicate whether the clip belongs to the clean portion (1), or to the noisy portion (0) of the train set

  • noisy_small (int) – flag to indicate whether the clip belongs to the noisy_small portion (1) of the train set

  • split (str) – flag to indicate whether the clip belongs to the train or test split

  • tags (soundata.annotations.Tags) – tag (label) of the clip + confidence

property aso_id

The clip’s Audioset ontology ID.

Returns:

  • str - the id of the corresponding category as per the AudioSet Ontology

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None
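
The None-handling described above can be sketched as follows. The index entry layout (key mapped to a relative path and a checksum) and the helper name resolve_path are assumptions for illustration, not the library's code.

```python
import os


def resolve_path(data_home, index_entry, key):
    """Join data_home with the relative path stored under key, or return None."""
    relative_path, _checksum = index_entry[key]
    if relative_path is None:
        return None
    return os.path.join(data_home, relative_path)


# Hypothetical index entry for one clip: the audio exists, the annotation does not.
entry = {
    "audio": ("audio/0001.wav", "hypothetical-checksum"),
    "annotation": (None, None),
}
print(resolve_path("/data/fsdnoisy18k", entry, "audio"))
print(resolve_path("/data/fsdnoisy18k", entry, "annotation"))
```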

property manually_verified

The clip’s manually annotated flag.

Returns:

  • int - flag to indicate whether the clip belongs to the clean portion (1), or to the noisy portion (0) of the train set

property noisy_small

The clip’s noisy flag.

Returns:

  • int - flag to indicate whether the clip belongs to the noisy_small portion (1) of the train set

property split

The clip’s split.

Returns:

  • str - flag to indicate whether the clip belongs to the train or test split

property tags

The clip’s tags.

Returns:

  • annotations.Tags - tag (label) of the clip + confidence

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format
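
The split, manually_verified and noisy_small flags documented above can be combined to pick out portions of the train set. The sketch below uses stand-in objects with made-up values rather than real Clip instances, and the helper name select_train_subset is hypothetical.

```python
from types import SimpleNamespace

# Stand-in clips exposing the attributes documented above; values are made up.
clips = {
    "a": SimpleNamespace(split="train", manually_verified=1, noisy_small=0),
    "b": SimpleNamespace(split="train", manually_verified=0, noisy_small=1),
    "c": SimpleNamespace(split="train", manually_verified=0, noisy_small=0),
    "d": SimpleNamespace(split="test", manually_verified=1, noisy_small=0),
}


def select_train_subset(clips, subset):
    """Return sorted train clip ids for 'clean', 'noisy' or 'noisy_small'."""
    predicates = {
        "clean": lambda clip: clip.manually_verified == 1,
        "noisy": lambda clip: clip.manually_verified == 0,
        "noisy_small": lambda clip: clip.noisy_small == 1,
    }
    keep = predicates[subset]
    return sorted(
        clip_id
        for clip_id, clip in clips.items()
        if clip.split == "train" and keep(clip)
    )


print(select_train_subset(clips, "noisy"))
```

With a downloaded dataset, the same function could presumably be applied to the dictionary returned by load_clips(), since it only reads the three documented attributes.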

class soundata.datasets.fsdnoisy18k.Dataset(data_home=None)[source]

The FSDnoisy18K dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected
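
The partial_download behaviour described above (all remotes when None, ValueError on unknown keys) can be sketched like this. The remotes dictionary, its keys and the helper name select_remotes are hypothetical; the real keys depend on the dataset.

```python
def select_remotes(remotes, partial_download=None):
    """Return the remotes to download; raise ValueError on unknown keys."""
    if partial_download is None:
        return dict(remotes)
    unknown = [key for key in partial_download if key not in remotes]
    if unknown:
        raise ValueError(
            f"Unknown remote keys: {unknown}; valid keys: {sorted(remotes)}"
        )
    return {key: remotes[key] for key in partial_download}


# Hypothetical remotes; real keys are dataset-specific.
remotes = {
    "audio_train": "hypothetical-url-1",
    "audio_test": "hypothetical-url-2",
    "meta": "hypothetical-url-3",
}
print(sorted(select_remotes(remotes, ["meta", "audio_test"])))
```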

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a FSDnoisy18K audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum
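
The two lists returned above can be pictured with a small checksum pass over an index. This is a minimal sketch assuming an index that maps relative paths to MD5 digests; both the layout and the choice of MD5 are assumptions, and check_files is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def md5_of(path):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(data_home, index):
    """Return (missing, invalid) for an index of relative path -> expected md5."""
    missing, invalid = [], []
    for relative_path, expected in index.items():
        full_path = os.path.join(data_home, relative_path)
        if not os.path.exists(full_path):
            missing.append(relative_path)
        elif md5_of(full_path) != expected:
            invalid.append(relative_path)
    return missing, invalid


with tempfile.TemporaryDirectory() as data_home:
    with open(os.path.join(data_home, "ok.wav"), "wb") as fhandle:
        fhandle.write(b"audio")
    with open(os.path.join(data_home, "bad.wav"), "wb") as fhandle:
        fhandle.write(b"corrupted")
    expected_digest = hashlib.md5(b"audio").hexdigest()
    index = {
        "ok.wav": expected_digest,
        "bad.wav": expected_digest,
        "gone.wav": expected_digest,
    }
    missing, invalid = check_files(data_home, index)

print(missing, invalid)
```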

soundata.datasets.fsdnoisy18k.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load a FSDnoisy18K audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

SINGA:PURA

SINGA:PURA Dataset Loader

class soundata.datasets.singapura.Clip(clip_id, data_home, dataset_name, index, metadata)[source]
Parameters:

clip_id (str) – id of the clip

Variables:
  • clip_id (str) – clip id

  • audio (np.ndarray, float) – audio data

  • audio_path (str) – path to the audio file

  • events (annotations.MultiAnnotator) – sound events with start time, end time, label and confidence

  • annotation_path (str) – path to the annotation file

  • sensor_id (str) – sensor_id of the device used to record the data

  • town (str) – town in Singapore where the sensor is located

  • timestamp (np.datetime64) – timestamp of the recording

  • dotw (int) – day of the week when the clip was recorded, starting from 0 for Sunday

property audio

The clip’s audio

Returns:

  • np.ndarray - audio signal

property dotw: int

The clip’s day of the week

Returns:

  • int - day of the week when the clip was recorded, starting from 0 for Sunday
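
Note that this numbering (0 for Sunday) differs from Python's datetime.weekday(), which returns 0 for Monday. A minimal conversion sketch, using plain datetime objects rather than the clip's np.datetime64 timestamp; the helper name is hypothetical:

```python
from datetime import datetime


def day_of_week_sunday_zero(timestamp):
    """Map a datetime to 0 (Sunday) through 6 (Saturday)."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday,
    # so shift by one and wrap around.
    return (timestamp.weekday() + 1) % 7


sunday = datetime(2023, 12, 31)
monday = datetime(2024, 1, 1)
print(day_of_week_sunday_zero(sunday), day_of_week_sunday_zero(monday))
```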

events

The clip’s event annotations

Returns:

  • annotations.MultiAnnotator - sound events with start time, end time, label and confidence

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property sensor_id: str

The clip’s sensor ID

Returns:

  • str - sensor_id of the device used to record the data

property timestamp: numpy.datetime64

The clip’s timestamp

Returns:

  • np.datetime64 - timestamp of the clip

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

property town: str

The clip’s location

Returns:

  • str - location of the sensor, one of {‘East 1’, ‘East 2’, ‘West 1’, ‘West 2’}

class soundata.datasets.singapura.Dataset(data_home=None)[source]

SINGA:PURA v1.0 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_annotation(*args, **kwargs)[source]

Load an annotation file.

Parameters:

fhandle (str or file-like) – path or file-like object pointing to an annotation file

Returns:

  • annotations.MultiAnnotator - sound events with start time, end time, label and confidence

load_audio(*args, **kwargs)[source]

Load a SINGA:PURA audio file.

Parameters:

fhandle (str or file-like) – path or file-like object pointing to an audio file

Returns:

  • np.ndarray - the audio signal at 44.1 kHz

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.singapura.load_annotation(fhandle: TextIO) MultiAnnotator[source]

Load an annotation file.

Parameters:

fhandle (str or file-like) – path or file-like object pointing to an annotation file

Returns:

  • annotations.MultiAnnotator - sound events with start time, end time, label and confidence

soundata.datasets.singapura.load_audio(fhandle)[source]

Load a SINGA:PURA audio file.

Parameters:

fhandle (str or file-like) – path or file-like object pointing to an audio file

Returns:

  • np.ndarray - the audio signal at 44.1 kHz

STARSS 2022

Sony-TAU Realistic Spatial Soundscapes (STARSS) 2022 Dataset Loader

class soundata.datasets.starss2022.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

STARSS 2022 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio_path (str) – path to the audio file

  • csv_path (str) – path to the csv file

  • format (str) – whether the clip is in FOA or MIC format

  • set (str) – the data subset the clip belongs to (development or evaluation)

  • split (str) – the split the clip belongs to (training or test)

  • clip_id (str) – clip id

  • spatial_events (SpatialEvents) – sound events with time step, elevation, azimuth, distance, label, clip_number and confidence.

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

spatial_events

The clip’s event annotations

Returns:

  • SpatialEvents with attributes
    • intervals (list): list of size n np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time]

    • intervals_unit (str): intervals unit, one of TIME_UNITS

    • time_step (int, float, or None): the time-step between events

    • elevations (list): list of size n with np.ndarrays with dtype int, indicating the elevation of the sound event per time_step.

    • elevations_unit (str): elevations unit, one of ELEVATIONS_UNITS

    • azimuths (list): list of size n with np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving

    • azimuths_unit (str): azimuths unit, one of AZIMUTHS_UNITS

    • distances (list): list of size n with np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving

    • distances_unit (str): distances unit, one of DISTANCES_UNITS

    • labels (list): list of event labels (as strings)

    • labels_unit (str): labels unit, one of LABELS_UNITS

    • clip_number_indices (list): list of clip number indices (as strings)

    • confidence (np.ndarray or None): array of confidence values

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.starss2022.Dataset(data_home=None)[source]

The STARSS 2022 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a STARSS 2022 audio file.

Parameters:
  • fhandle (str or file-like) – path or file-like object pointing to an audio file

  • sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000)

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.starss2022.load_audio(fhandle: BinaryIO, sr=24000) Tuple[numpy.ndarray, float][source]

Load a STARSS 2022 audio file.

Parameters:
  • fhandle (str or file-like) – path or file-like object pointing to an audio file

  • sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000)

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

soundata.datasets.starss2022.load_spatialevents(fhandle: TextIO, dt=0.1) SpatialEvents[source]

Load a STARSS 2022 annotation file

Parameters:
  • fhandle (str or file-like) – File-like object or path to the sound events annotation file

  • dt (float) – time step

Raises:

IOError – if fhandle doesn’t exist

Returns:

SpatialEvents – sound spatial events annotation data
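
To illustrate what a time step such as dt means for frame-indexed annotations, the sketch below merges consecutive frames of the same event into [start_time, end_time] intervals. The row layout (frame, class index, source index), the convention that frame f covers f * dt to (f + 1) * dt, and the helper name frames_to_intervals are all assumptions for illustration; the loader's actual parsing may differ.

```python
def frames_to_intervals(rows, dt=0.1):
    """Merge consecutive frames of each (class, source) pair into intervals."""
    frames_by_event = {}
    for frame, class_index, source_index in rows:
        frames_by_event.setdefault((class_index, source_index), []).append(frame)

    intervals = {}
    for event, frames in frames_by_event.items():
        frames.sort()
        runs = []
        start = previous = frames[0]
        for frame in frames[1:]:
            if frame != previous + 1:
                # A gap closes the current run of consecutive frames.
                runs.append([start * dt, (previous + 1) * dt])
                start = frame
            previous = frame
        runs.append([start * dt, (previous + 1) * dt])
        intervals[event] = runs
    return intervals


# Hypothetical rows: (frame, class index, source index).
rows = [(10, 0, 0), (11, 0, 0), (12, 0, 0), (20, 0, 0), (11, 3, 1)]
result = frames_to_intervals(rows, dt=0.5)
print(result)
```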

TAU NIGENS SSE 2020

TAU NIGENS SSE 2020 Dataset Loader

class soundata.datasets.tau2020sse_nigens.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

TAU NIGENS SSE 2020 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio_path (str) – path to the audio file

  • tags (soundata.annotations.Tags) – tag

  • clip_id (str) – clip id

  • spatial_events (SpatialEvents) – sound events with time step, elevation, azimuth, distance, label, clip_number and confidence.

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

spatial_events

The clip’s event annotations

Returns:

  • SpatialEvents with attributes
    • intervals (list): list of size n np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time]

    • intervals_unit (str): intervals unit, one of TIME_UNITS

    • time_step (int, float, or None): the time-step between events

    • elevations (list): list of size n with np.ndarrays with dtype int, indicating the elevation of the sound event per time_step.

    • elevations_unit (str): elevations unit, one of ELEVATIONS_UNITS

    • azimuths (list): list of size n with np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving

    • azimuths_unit (str): azimuths unit, one of AZIMUTHS_UNITS

    • distances (list): list of size n with np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving

    • distances_unit (str): distances unit, one of DISTANCES_UNITS

    • labels (list): list of event labels (as strings)

    • labels_unit (str): labels unit, one of LABELS_UNITS

    • clip_number_indices (list): list of clip number indices (as strings)

    • confidence (np.ndarray or None): array of confidence values
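
Azimuth and elevation pairs like those listed above are often converted to Cartesian direction vectors before further processing. A minimal sketch, assuming angles in degrees and a right-handed frame with x pointing to the front, y to the left and z up; check azimuths_unit and elevations_unit before relying on this, and note that to_unit_vector is a hypothetical helper:

```python
import math


def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to a unit vector (x, y, z)."""
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    x = math.cos(elevation) * math.cos(azimuth)
    y = math.cos(elevation) * math.sin(azimuth)
    z = math.sin(elevation)
    return x, y, z


front = to_unit_vector(0, 0)
left = to_unit_vector(90, 0)
up = to_unit_vector(0, 90)
print(front, left, up)
```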

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.tau2020sse_nigens.Dataset(data_home=None)[source]

The TAU NIGENS SSE 2020 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a TAU NIGENS SSE 2020 audio file.

Parameters:
  • fhandle (str or file-like) – path or file-like object pointing to an audio file

  • sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000)

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.tau2020sse_nigens.load_audio(fhandle: BinaryIO, sr=24000) Tuple[numpy.ndarray, float][source]

Load a TAU NIGENS SSE 2020 audio file.

Parameters:
  • fhandle (str or file-like) – path or file-like object pointing to an audio file

  • sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000)

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

soundata.datasets.tau2020sse_nigens.load_spatialevents(fhandle: TextIO, dt=0.1) SpatialEvents[source]

Load a TAU NIGENS SSE 2020 annotation file

Parameters:
  • fhandle (str or file-like) – File-like object or path to the sound events annotation file

  • dt (float) – time step

Raises:

IOError – if fhandle doesn’t exist

Returns:

SpatialEvents – sound spatial events annotation data

TAU NIGENS SSE 2021

TAU NIGENS SSE 2021 Dataset Loader

class soundata.datasets.tau2021sse_nigens.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

TAU NIGENS SSE 2021 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio_path (str) – path to the audio file

  • tags (soundata.annotations.Tags) – tag

  • clip_id (str) – clip id

  • spatial_events (SpatialEvents) – sound events with time step, elevation, azimuth, distance, label, clip_number and confidence.

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

spatial_events

The clip’s event annotations

Returns:

  • SpatialEvents with attributes
    • intervals (list): list of size n np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time]

    • intervals_unit (str): intervals unit, one of TIME_UNITS

    • time_step (int, float, or None): the time-step between events

    • elevations (list): list of size n with np.ndarrays with dtype int, indicating the elevation of the sound event per time_step.

    • elevations_unit (str): elevations unit, one of ELEVATIONS_UNITS

    • azimuths (list): list of size n with np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving

    • azimuths_unit (str): azimuths unit, one of AZIMUTHS_UNITS

    • distances (list): list of size n with np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving

    • distances_unit (str): distances unit, one of DISTANCES_UNITS

    • labels (list): list of event labels (as strings)

    • labels_unit (str): labels unit, one of LABELS_UNITS

    • clip_number_indices (list): list of clip number indices (as strings)

    • confidence (np.ndarray or None): array of confidence values

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.tau2021sse_nigens.Dataset(data_home=None)[source]

The TAU NIGENS SSE 2021 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a TAU NIGENS SSE 2021 audio file.

Parameters:
  • fhandle (str or file-like) – path or file-like object pointing to an audio file

  • sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000)

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.tau2021sse_nigens.load_audio(fhandle: BinaryIO, sr=24000) Tuple[numpy.ndarray, float][source]

Load a TAU NIGENS SSE 2021 audio file.

Parameters:
  • fhandle (str or file-like) – path or file-like object pointing to an audio file

  • sr (int or None) – sample rate for loaded audio, 24000 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (24000)

Returns:

  • np.ndarray - the audio signal

  • float - The sample rate of the audio file

soundata.datasets.tau2021sse_nigens.load_spatialevents(fhandle: TextIO, dt=0.1) SpatialEvents[source]

Load a TAU NIGENS SSE 2021 annotation file

Parameters:
  • fhandle (str or file-like) – File-like object or path to the sound events annotation file

  • dt (float) – time step

Raises:

IOError – if fhandle doesn’t exist

Returns:

SpatialEvents – sound spatial events annotation data

TAU Spatial Sound Events 2019

TAU SSE 2019 Dataset Loader

class soundata.datasets.tau2019sse.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

TAU SSE 2019 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • spatial_events (SpatialEvents) – sound events with start time, end time, elevation, azimuth, distance, label and confidence.

  • audio_path (str) – path to the audio file

  • set (str) – subset the clip belongs to (development or evaluation)

  • format (str) – whether the clip is in foa or mic format

  • clip_id (str) – clip id

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None
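The "joined path or None" behaviour can be pictured with the sketch below. The index layout used here (each key mapping to a `(relative_path, checksum)` pair) is an assumption made for illustration, and `get_path_sketch` is a hypothetical name rather than the library's implementation.

```python
import os
from typing import Dict, Optional, Tuple

IndexEntry = Dict[str, Tuple[Optional[str], Optional[str]]]


def get_path_sketch(index_entry: IndexEntry, data_home: str, key: str) -> Optional[str]:
    """Join data_home with the relative path stored under `key`.

    Assumed layout: index_entry[key] == (relative_path_or_None, checksum).
    """
    relative_path = index_entry[key][0]
    if relative_path is None:
        # Nothing is registered for this key, so there is no path to build.
        return None
    return os.path.join(data_home, relative_path)
```

Callers should therefore check for `None` before opening the returned path.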

spatial_events

The clip’s spatial events

Returns:

  • SpatialEvents class with attributes
    • intervals (np.ndarray): (n x 2) array of intervals

      (as floats) in seconds in the form [start_time, end_time] with positive time stamps and end_time >= start_time.

    • elevations (np.ndarray): (n,) array of elevations

    • azimuths (np.ndarray): (n,) array of azimuths

    • distances (np.ndarray): (n,) array of distances

    • labels (list): list of event labels (as strings)

    • confidence (np.ndarray or None): array of confidence values, float in [0, 1]

    • labels_unit (str): labels unit, one of LABELS_UNITS

    • intervals_unit (str): intervals unit, one of TIME_UNITS
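The constraints on `intervals` listed above (two values per row, positive time stamps, end_time >= start_time) can be checked with a few lines of plain Python. This is a minimal sketch using nested lists instead of NumPy arrays; `check_intervals` is a hypothetical helper, and "positive" is interpreted here as "not negative".

```python
from typing import Sequence


def check_intervals(intervals: Sequence[Sequence[float]]) -> None:
    """Raise ValueError if any [start_time, end_time] row is malformed."""
    for i, interval in enumerate(intervals):
        if len(interval) != 2:
            raise ValueError(f"interval {i} must have 2 values, got {len(interval)}")
        start, end = interval
        if start < 0 or end < 0:
            raise ValueError(f"interval {i} has a negative time stamp")
        if end < start:
            raise ValueError(f"interval {i} ends before it starts")
```

A zero-length interval such as `[2.0, 2.0]` passes, since the stated condition is end_time >= start_time.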

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.tau2019sse.Dataset(data_home=None)[source]

The TAU SSE 2019 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected
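The `partial_download` contract (None means everything, unknown keys raise `ValueError`) can be sketched as follows. `select_remotes` is a hypothetical helper and the remote keys used in the usage note are placeholders; the real keys are whatever the dataset's `remotes` dict contains.

```python
from typing import Dict, Iterable, List, Optional


def select_remotes(remotes: Dict[str, object],
                   partial_download: Optional[Iterable[str]]) -> List[str]:
    """Return the remote keys that would be downloaded."""
    if partial_download is None:
        # No filter given: every remote is downloaded.
        return list(remotes)
    requested = list(partial_download)
    unknown = [key for key in requested if key not in remotes]
    if unknown:
        raise ValueError(
            f"Invalid partial_download keys: {unknown}. Valid keys: {sorted(remotes)}"
        )
    return requested
```

For example, with placeholder remotes `{"part_a": ..., "part_b": ...}`, passing `["part_b"]` selects only that entry, while `["nope"]` raises `ValueError`.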

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a TAU SSE 2019 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.

Returns:

  • np.ndarray - the multichannel audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum
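To make the two return values concrete, here is a minimal sketch of checksum-based validation. It assumes MD5 checksums and a flat index mapping relative paths to expected digests; both are assumptions for illustration, and `validate_sketch` is not the library's code.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_sketch(index: Dict[str, str], data_home: str) -> Tuple[List[str], List[str]]:
    """Return (missing_files, invalid_checksums) for an assumed flat index."""
    missing, invalid = [], []
    for relative_path, expected in index.items():
        local_path = os.path.join(data_home, relative_path)
        if not os.path.exists(local_path):
            missing.append(relative_path)
        elif md5_of(local_path) != expected:
            invalid.append(relative_path)
    return missing, invalid


def demo() -> Tuple[List[str], List[str]]:
    """Build a throwaway folder with one good, one corrupted and one absent file."""
    good = hashlib.md5(b"abc").hexdigest()
    with tempfile.TemporaryDirectory() as data_home:
        with open(os.path.join(data_home, "ok.wav"), "wb") as fhandle:
            fhandle.write(b"abc")
        with open(os.path.join(data_home, "bad.wav"), "wb") as fhandle:
            fhandle.write(b"xyz")
        index = {"ok.wav": good, "bad.wav": good, "gone.wav": good}
        return validate_sketch(index, data_home)
```

Both lists being empty is the signal that the local copy matches the index.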

class soundata.datasets.tau2019sse.TAU2019_SpatialEvents(intervals, intervals_unit, elevations, elevations_unit, azimuths, azimuths_unit, distances, distances_unit, labels, labels_unit, confidence=None)[source]

TAU SSE 2019 Spatial Events

Variables:
  • intervals (np.ndarray) – (n x 2) array of intervals (as floats) in seconds in the form [start_time, end_time] with positive time stamps and end_time >= start_time.

  • elevations (np.ndarray) – (n,) array of elevations

  • azimuths (np.ndarray) – (n,) array of azimuths

  • distances (np.ndarray) – (n,) array of distances

  • labels (list) – list of event labels (as strings)

  • confidence (np.ndarray or None) – array of confidence values, float in [0, 1]

  • labels_unit (str) – labels unit, one of LABELS_UNITS

  • intervals_unit (str) – intervals unit, one of TIME_UNITS

soundata.datasets.tau2019sse.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load a TAU SSE 2019 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 48000 without resampling.

Returns:

  • np.ndarray - the multichannel audio signal

  • float - The sample rate of the audio file

soundata.datasets.tau2019sse.load_spatialevents(fhandle: TextIO) TAU2019_SpatialEvents[source]

Load a TAU SSE 2019 annotation file.

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Raises:

IOError – if the annotation file doesn’t exist

Returns:

TAU2019_SpatialEvents – sound events annotation data

soundata.datasets.tau2019sse.validate_locations(locations)[source]

Validate if TAU SSE 2019 locations are well-formed.

If locations is None, validation passes automatically

Parameters:

locations (np.ndarray) – (n x 3) array

Raises:

ValueError – if locations have an invalid shape or have cartesian coordinate values outside the expected ranges.
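The checks described above (None passes, rows must have three values, coordinates must stay within range) might look roughly like this. The expected ranges are not stated in this reference, so `limit` is a purely hypothetical bound, and nested lists stand in for the `(n x 3)` array.

```python
from typing import Optional, Sequence


def validate_locations_sketch(locations: Optional[Sequence[Sequence[float]]],
                              limit: float = 1.0) -> None:
    """Raise ValueError for malformed locations.

    `limit` is a hypothetical symmetric bound on each coordinate; the real
    ranges are dataset-specific and not given in this reference.
    """
    if locations is None:
        # None is accepted without further checks.
        return
    for i, row in enumerate(locations):
        if len(row) != 3:
            raise ValueError(f"location {i} must have 3 values, got {len(row)}")
        if any(abs(value) > limit for value in row):
            raise ValueError(f"location {i} has a coordinate outside [-{limit}, {limit}]")
```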

TAU Urban Acoustic Scenes 2019

TAU Urban Acoustic Scenes 2019 Loader

class soundata.datasets.tau2019uas.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

TAU Urban Acoustic Scenes 2019 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • city (str) – city where the audio signal was recorded

  • clip_id (str) – clip id

  • identifier (str) – identifier present in the metadata

  • split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4), leaderboard or evaluation

  • tags (soundata.annotations.Tags) – tag (scene label) of the clip + confidence.

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property city

The clip’s city.

Returns:

  • str - city where the audio signal was recorded

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property identifier

The clip’s identifier.

Returns:

  • str - identifier present in the metadata

property split

The clip’s split.

Returns:

  • str - subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4), leaderboard or evaluation

property tags

The clip’s tags.

Returns:

  • annotations.Tags - tag (scene label) of the clip + confidence.

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.tau2019uas.Dataset(data_home=None)[source]

The TAU Urban Acoustic Scenes 2019 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a TAU Urban Acoustic Scenes 2019 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips
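Since `load_clips()` returns a `{clip_id: clip}` dict and each clip exposes a `split` attribute, clips can be grouped by subset with a few lines. The sketch below uses stand-in objects so that it runs without any data; the clip ids and split strings are placeholders, not real values from the dataset.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, List


def group_clip_ids_by_split(clips: Dict[str, object]) -> Dict[str, List[str]]:
    """Map each split name to the ids of the clips that belong to it."""
    groups = defaultdict(list)
    for clip_id, clip in clips.items():
        groups[clip.split].append(clip_id)
    return dict(groups)


# Stand-in clips; ids and split strings are placeholders.
example_clips = {
    "clip-a": SimpleNamespace(split="development"),
    "clip-b": SimpleNamespace(split="evaluation"),
    "clip-c": SimpleNamespace(split="development"),
}
```

With a real dataset, the same function could be applied to the dict returned by `load_clips()`.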

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.tau2019uas.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load a TAU Urban Acoustic Scenes 2019 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

TAU Urban Acoustic Scenes 2020 Mobile

TAU Urban Acoustic Scenes 2020 Mobile Loader

class soundata.datasets.tau2020uas_mobile.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

TAU Urban Acoustic Scenes 2020 Mobile Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • city (str) – city where the audio signal was recorded

  • clip_id (str) – clip id

  • identifier (str) – the clip identifier

  • source_label (str) – source label

  • split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation

  • tags (soundata.annotations.Tags) – tag (label) of the clip + confidence

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property city

The clip’s city.

Returns:

  • str - city where the audio signal was recorded

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property identifier

The clip’s identifier.

Returns:

  • str - clip identifier

property source_label

The clip’s source label.

Returns:

  • str - source label

property split

The clip’s split.

Returns:

  • str - subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation

property tags

The clip’s tags.

Returns:

  • annotations.Tags - tag (label) of the clip + confidence

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.tau2020uas_mobile.Dataset(data_home=None)[source]

The TAU Urban Acoustic Scenes 2020 Mobile dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a TAU Urban Acoustic Scenes 2020 Mobile audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.tau2020uas_mobile.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load a TAU Urban Acoustic Scenes 2020 Mobile audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

TAU Urban Acoustic Scenes 2022 Mobile

TAU Urban Acoustic Scenes 2022 Mobile Loader

class soundata.datasets.tau2022uas_mobile.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

TAU Urban Acoustic Scenes 2022 Mobile Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • city (str) – city where the audio signal was recorded

  • clip_id (str) – clip id

  • identifier (str) – the clip identifier

  • source_label (str) – source label

  • split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation

  • tags (soundata.annotations.Tags) – tag (label) of the clip + confidence

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property city

The clip’s city.

Returns:

  • str - city where the audio signal was recorded

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property identifier

The clip’s identifier.

Returns:

  • str - clip identifier

property source_label

The clip’s source label.

Returns:

  • str - source label

property split

The clip’s split.

Returns:

  • str - subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation

property tags

The clip’s tags.

Returns:

  • annotations.Tags - tag (label) of the clip + confidence

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.tau2022uas_mobile.Dataset(data_home=None)[source]

The TAU Urban Acoustic Scenes 2022 Mobile dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a TAU Urban Acoustic Scenes 2022 Mobile audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.tau2022uas_mobile.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load a TAU Urban Acoustic Scenes 2022 Mobile audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

TUT Sound events 2017

TUT Sound events 2017 Dataset Loader

class soundata.datasets.tut2017se.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

TUT Sound events 2017 Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • annotations_path (str) – path to the annotations file

  • clip_id (str) – clip id

  • events (soundata.annotations.Events) – sound events with start time, end time, label and confidence

  • non_verified_annotations_path (str) – path to the non-verified annotations file

  • non_verified_events (soundata.annotations.Events) – non-verified sound events with start time, end time, label and confidence

  • split (str) – subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

events

The clip’s events.

Returns:

  • annotations.Events - sound events with start time, end time, label and confidence
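As an illustration of working with event annotations of this kind, the sketch below sums the annotated duration per label. It assumes the events are available as parallel sequences of `[start_time, end_time]` intervals and string labels; the variable names and example labels are placeholders, not values read from the dataset.

```python
from collections import defaultdict
from typing import Dict, Sequence


def duration_per_label(intervals: Sequence[Sequence[float]],
                       labels: Sequence[str]) -> Dict[str, float]:
    """Sum (end_time - start_time) in seconds for every label."""
    if len(intervals) != len(labels):
        raise ValueError("intervals and labels must have the same length")
    totals = defaultdict(float)
    for (start, end), label in zip(intervals, labels):
        totals[label] += end - start
    return dict(totals)
```

Overlapping events with the same label are counted separately here, so the totals describe annotated time rather than wall-clock coverage.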

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

non_verified_events

The clip’s non-verified events.

Returns:

  • annotations.Events - non-verified sound events with start time, end time, label and confidence

property split

The clip’s split.

Returns:

  • str - subset the clip belongs to (for experiments): development (fold1, fold2, fold3, fold4) or evaluation

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.tut2017se.Dataset(data_home=None)[source]

The TUT Sound events 2017 dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a TUT Sound events 2017 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the stereo audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

load_events(*args, **kwargs)[source]

Load a TUT Sound events 2017 annotation file.

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Returns:

Events – sound events annotation data

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.tut2017se.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load a TUT Sound events 2017 audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the stereo audio signal

  • float - The sample rate of the audio file

soundata.datasets.tut2017se.load_events(fhandle: TextIO) Events[source]

Load a TUT Sound events 2017 annotation file.

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Returns:

Events – sound events annotation data

URBAN-SED

URBAN-SED Dataset Loader

class soundata.datasets.urbansed.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

URBAN-SED Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • clip_id (str) – clip id

  • events (soundata.annotations.Events) – sound events with start time, end time, label and confidence

  • split (str) – subset the clip belongs to (for experiments): train, validate, or test

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

events

The audio events

Returns:
  • annotations.Events - audio event object

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property split

The clip’s data split (e.g. train)

Returns:
  • str - split

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.urbansed.Dataset(data_home=None)[source]

The URBAN-SED dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load an URBAN-SED audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.urbansed.load_audio(fhandle: BinaryIO, sr=None) Tuple[numpy.ndarray, float][source]

Load an URBAN-SED audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, None by default, which uses the file’s original sample rate of 44100 without resampling.

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

soundata.datasets.urbansed.load_events(fhandle: TextIO) Events[source]

Load an URBAN-SED sound events annotation file

Parameters:

fhandle (str or file-like) – File-like object or path to the sound events annotation file

Raises:

IOError – if the annotation file doesn’t exist

Returns:

Events – sound events annotation data

UrbanSound8K

UrbanSound8K Dataset Loader

class soundata.datasets.urbansound8k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

urbansound8k Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • class_id (int) – integer representation of the class label (0-9). See Dataset Info in the documentation for mapping

  • class_label (str) – string class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music

  • clip_id (str) – clip id

  • fold (int) – fold number (1-10) to which this clip is allocated. Use these folds for cross validation

  • freesound_end_time (float) – end time in seconds of the clip in the original freesound recording

  • freesound_id (str) – ID of the freesound.org recording from which this clip was taken

  • freesound_start_time (float) – start time in seconds of the clip in the original freesound recording

  • salience (int) – annotator estimate of class salience in the clip: 1 = foreground, 2 = background

  • slice_file_name (str) – The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav. Please see the Dataset Info in the soundata documentation for further details

  • tags (soundata.annotations.Tags) – tag (label) of the clip + confidence. In UrbanSound8K every clip has one tag

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

property class_id

The clip’s class id.

Returns:

  • int - integer representation of the class label (0-9). See Dataset Info in the documentation for mapping

property class_label

The clip’s class label.

Returns:

  • str - string class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music

property fold

The clip’s fold.

Returns:

  • int - fold number (1-10) to which this clip is allocated. Use these folds for cross validation
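
As a rough illustration of fold-based cross-validation, the sketch below builds leave-one-fold-out splits from a mapping of clip ids to fold numbers. The function name `fold_splits` and the toy ids are hypothetical and not part of soundata; with a loaded dataset, such a mapping could presumably be built from each clip's `fold` property.

```python
from collections import defaultdict


def fold_splits(clip_folds):
    """Yield (test_fold, train_ids, test_ids) for each fold.

    clip_folds: dict mapping clip_id -> fold number (hypothetical input).
    """
    by_fold = defaultdict(list)
    for clip_id, fold in clip_folds.items():
        by_fold[fold].append(clip_id)

    for test_fold in sorted(by_fold):
        test_ids = sorted(by_fold[test_fold])
        train_ids = sorted(
            cid for fold, ids in by_fold.items() if fold != test_fold for cid in ids
        )
        yield test_fold, train_ids, test_ids


# Toy mapping standing in for {clip_id: clip.fold}.
toy = {"a": 1, "b": 1, "c": 2, "d": 3}
splits = list(fold_splits(toy))
```

Each split holds out exactly one fold for testing, so clips from the same fold never appear on both sides of a split.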

property freesound_end_time

The clip’s end time in Freesound.

Returns:

  • float - end time in seconds of the clip in the original freesound recording

property freesound_id

The clip’s Freesound ID.

Returns:

  • str - ID of the freesound.org recording from which this clip was taken

property freesound_start_time

The clip’s start time in Freesound.

Returns:

  • float - start time in seconds of the clip in the original freesound recording

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property salience

The clip’s salience.

Returns:

  • int - annotator estimate of class salience in the clip: 1 = foreground, 2 = background

property slice_file_name

The clip’s slice filename.

Returns:

  • str - The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav
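
The file name format above can be split into its fields with a few lines of standard-library Python. This is only a sketch: `parse_slice_file_name` and `SliceName` are hypothetical names, the example file name is made up, and treating the class, occurrence, and slice ids as integers is an assumption.

```python
from typing import NamedTuple


class SliceName(NamedTuple):
    fs_id: str          # kept as a string, like freesound_id
    class_id: int       # assumed to be an integer
    occurrence_id: int  # assumed to be an integer
    slice_id: int       # assumed to be an integer


def parse_slice_file_name(name):
    """Split '[fsID]-[classID]-[occurrenceID]-[sliceID].wav' into fields."""
    stem, _, ext = name.rpartition(".")
    if ext.lower() != "wav":
        raise ValueError(f"expected a .wav file name, got {name!r}")
    parts = stem.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dash-separated fields, got {name!r}")
    fs_id, class_id, occurrence_id, slice_id = parts
    return SliceName(fs_id, int(class_id), int(occurrence_id), int(slice_id))


parsed = parse_slice_file_name("12345-3-0-7.wav")  # made-up file name
```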

property tags

The clip’s tags.

Returns:

  • annotations.Tags - tag (label) of the clip + confidence. In UrbanSound8K every clip has one tag

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.urbansound8k.Dataset(data_home=None)[source]

The urbansound8k dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a UrbanSound8K audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.urbansound8k.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float][source]

Load a UrbanSound8K audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

Warblrb10k

Warblrb10k Dataset Loader

class soundata.datasets.warblrb10k.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

warblrb10k Clip class

Parameters:

clip_id (str) – id of the clip

Variables:
  • audio (np.ndarray, float) – the clip’s audio signal and sample rate

  • audio_path (str) – path to the audio file

  • item_id (str) – clip id

  • has_bird (str) – indication of whether the clip contains bird sounds (0/1)

property audio: Optional[Tuple[numpy.ndarray, float]]

The clip’s audio

Returns:

  • np.ndarray - audio signal

  • float - sample rate

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

property has_bird

The flag to tell whether the clip has bird sound or not.

Returns:

  • str - 1/0 depending on whether the clip contains bird sound

property item_id

The clip’s item ID.

Returns:

  • str - ID of the clip

to_jams()[source]

Get the clip’s data in jams format

Returns:

jams.JAMS – the clip’s data in jams format

class soundata.datasets.warblrb10k.Dataset(data_home=None)[source]

The Warblrb10k dataset

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_audio(*args, **kwargs)[source]

Load a Warblrb10k audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

soundata.datasets.warblrb10k.load_audio(fhandle: BinaryIO, sr=44100) Tuple[numpy.ndarray, float][source]

Load a Warblrb10k audio file.

Parameters:
  • fhandle (str or file-like) – File-like object or path to audio file

  • sr (int or None) – sample rate for loaded audio, 44100 Hz by default. If different from file’s sample rate it will be resampled on load. Use None to load the file using its original sample rate (sample rate varies from file to file).

Returns:

  • np.ndarray - the mono audio signal

  • float - The sample rate of the audio file

Core

Core soundata classes

class soundata.core.Clip(clip_id, data_home, dataset_name, index, metadata)[source]

Clip base class

See the docs for each dataset loader’s Clip class for details

__init__(clip_id, data_home, dataset_name, index, metadata)[source]

Clip init method. Sets boilerplate attributes, including:

  • clip_id

  • _dataset_name

  • _data_home

  • _clip_paths

  • _clip_metadata

Parameters:
  • clip_id (str) – clip id

  • data_home (str) – path where soundata will look for the dataset

  • dataset_name (str) – the identifier of the dataset

  • index (dict) – the dataset’s file index

  • metadata (function or None) – a function returning a dictionary of metadata or None

get_path(key)[source]

Get absolute path to clip audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

class soundata.core.ClipGroup(clipgroup_id, data_home, dataset_name, index, clip_class, metadata)[source]

ClipGroup class.

A clipgroup class is a collection of clip objects and their associated audio that can be mixed together. A clipgroup is itself a Clip, and can have its own associated audio (such as a mastered mix), its own metadata and its own annotations.

__init__(clipgroup_id, data_home, dataset_name, index, clip_class, metadata)[source]

Clipgroup init method. Sets boilerplate attributes, including:

  • clipgroup_id

  • _dataset_name

  • _data_home

  • _clipgroup_paths

  • _clipgroup_metadata

Parameters:
  • clipgroup_id (str) – clipgroup id

  • data_home (str) – path where soundata will look for the dataset

  • dataset_name (str) – the identifier of the dataset

  • index (dict) – the dataset’s file index

  • metadata (function or None) – a function returning a dictionary of metadata or None

property clip_audio_property

The clip’s audio property.

Returns:

get_mix()[source]

Create a linear mixture given a subset of clips.

Parameters:

clip_keys (list) – list of clip keys to mix together

Returns:

np.ndarray – mixture audio with shape (n_samples, n_channels)

get_path(key)[source]

Get absolute path to clipgroup audio and annotations. Returns None if the path in the index is None

Parameters:

key (string) – Index key of the audio or annotation type

Returns:

str or None – joined path string or None

get_random_target(n_clips=None, min_weight=0.3, max_weight=1.0)[source]

Get a random target by combining a random selection of clips with random weights

Parameters:
  • n_clips (int or None) – number of clips to randomly mix. If None, uses all clips

  • min_weight (float) – minimum possible weight when mixing

  • max_weight (float) – maximum possible weight when mixing

Returns:

  • np.ndarray - mixture audio with shape (n_samples, n_channels)

  • list - list of keys of included clips

  • list - list of weights used to mix clips
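
The selection step described above (pick some clips, draw a weight for each) might look roughly like the following. This is a sketch under assumptions, not soundata's implementation: `pick_random_mix` and its `rng` argument are hypothetical, and drawing weights uniformly between `min_weight` and `max_weight` is a guess at the intended behavior. Mixing the audio itself is not shown.

```python
import random


def pick_random_mix(clip_keys, n_clips=None, min_weight=0.3, max_weight=1.0, rng=None):
    """Choose clip keys and mixing weights at random.

    Returns (keys, weights); one weight is drawn per chosen key.
    """
    rng = rng or random.Random()
    keys = list(clip_keys)
    if n_clips is not None and n_clips > len(keys):
        raise ValueError("n_clips exceeds the number of available clips")
    rng.shuffle(keys)
    # n_clips=None means "use all clips", mirroring the parameter description.
    chosen = keys if n_clips is None else keys[:n_clips]
    weights = [rng.uniform(min_weight, max_weight) for _ in chosen]
    return chosen, weights


all_keys = ["c1", "c2", "c3", "c4"]  # hypothetical clip keys
keys, weights = pick_random_mix(all_keys, n_clips=2, rng=random.Random(0))
```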

get_target(clip_keys, weights=None, average=True, enforce_length=True)[source]

Get target which is a linear mixture of clips

Parameters:
  • clip_keys (list) – list of clip keys to mix together

  • weights (list or None) – list of positive scalars to be used in the average

  • average (bool) – if True, computes a weighted average of the clips; if False, computes a weighted sum of the clips

  • enforce_length (bool) – If True, raises ValueError if the clips are not the same length. If False, pads audio with zeros to match the length of the longest clip

Returns:

np.ndarray – target audio with shape (n_channels, n_samples)

Raises:

ValueError – if sample rates of the clips are not equal, or if enforce_length=True and lengths are not equal
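
To make the `average` and `enforce_length` options concrete, here is a minimal sketch of a weighted linear mixture. It uses plain Python lists of mono samples in place of NumPy arrays, the name `mix_signals` is hypothetical, and the sample-rate check is not modeled, so it should be read as an illustration of the arithmetic rather than as soundata's code.

```python
def mix_signals(signals, weights=None, average=True, enforce_length=True):
    """Linearly mix mono signals given as lists of floats."""
    if not signals:
        raise ValueError("at least one signal is required")
    if weights is None:
        weights = [1.0] * len(signals)
    if len(weights) != len(signals):
        raise ValueError("one weight per signal is required")

    lengths = {len(s) for s in signals}
    if len(lengths) > 1 and enforce_length:
        raise ValueError("signals differ in length")

    # Zero-pad every signal to the longest length (a no-op if all are equal).
    n = max(lengths)
    padded = [list(s) + [0.0] * (n - len(s)) for s in signals]

    # Weighted sum, divided by the total weight when averaging.
    norm = sum(weights) if average else 1.0
    return [
        sum(w * s[i] for w, s in zip(weights, padded)) / norm
        for i in range(n)
    ]


a = [1.0, 1.0, 1.0]
b = [3.0, 3.0]  # shorter, so it is zero-padded when enforce_length=False
summed = mix_signals([a, b], weights=[1.0, 1.0], average=False, enforce_length=False)
averaged = mix_signals([a, b], weights=[1.0, 1.0], average=True, enforce_length=False)
```

With equal weights, the weighted sum of the first sample is 4.0 and the weighted average is 2.0; the padded third sample contributes only the longer signal.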

class soundata.core.Dataset(data_home=None, name=None, clip_class=None, clipgroup_class=None, bibtex=None, remotes=None, download_info=None, license_info=None, custom_index_path=None)[source]

soundata Dataset class

Variables:
  • data_home (str) – path where soundata will look for the dataset

  • name (str) – the identifier of the dataset

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • readme (str) – information about the dataset

  • clip (function) – a function mapping a clip_id to a soundata.core.Clip

  • clipgroup (function) – a function mapping a clipgroup_id to a soundata.core.Clipgroup

__init__(data_home=None, name=None, clip_class=None, clipgroup_class=None, bibtex=None, remotes=None, download_info=None, license_info=None, custom_index_path=None)[source]

Dataset init method

Parameters:
  • data_home (str or None) – path where soundata will look for the dataset

  • name (str or None) – the identifier of the dataset

  • clip_class (soundata.core.Clip or None) – a Clip class

  • clipgroup_class (soundata.core.Clipgroup or None) – a Clipgroup class

  • bibtex (str or None) – dataset citation/s in bibtex format

  • remotes (dict or None) – data to be downloaded

  • download_info (str or None) – download instructions or caveats

  • license_info (str or None) – license of the dataset

  • custom_index_path (str or None) – overwrites the default index path for remote indexes

choice_clip()[source]

Choose a random clip

Returns:

Clip – a Clip object instantiated by a random clip_id

choice_clipgroup()[source]

Choose a random clipgroup

Returns:

Clipgroup – a Clipgroup object instantiated by a random clipgroup_id

cite()[source]

Print the reference

clip_ids[source]

Return clip ids

Returns:

list – A list of clip ids

clipgroup_ids[source]

Return clipgroup ids

Returns:

list – A list of clipgroup ids

property default_path

Get the default path for the dataset

Returns:

str – Local path to the dataset

download(partial_download=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally print a message.

Parameters:
  • partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete any zip/tar files after extracting.

Raises:
  • ValueError – if invalid keys are passed to partial_download

  • IOError – if a downloaded file’s checksum is different from expected
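
The `partial_download` check can be pictured as a key lookup against `remotes`. The sketch below is an assumption about the general shape of that check, not the library's code: `select_remotes` is a hypothetical helper, and the string values stand in for what would normally be RemoteFileMetadata objects.

```python
def select_remotes(remotes, partial_download=None):
    """Return the subset of `remotes` named in `partial_download`."""
    if partial_download is None:
        return dict(remotes)  # None means "download everything"
    unknown = [key for key in partial_download if key not in remotes]
    if unknown:
        raise ValueError(
            f"partial_download keys {unknown} not in remotes {sorted(remotes)}"
        )
    return {key: remotes[key] for key in partial_download}


# Hypothetical remotes; real values would be RemoteFileMetadata objects.
remotes = {"audio": "audio.zip", "annotations": "annotations.zip"}
only_audio = select_remotes(remotes, partial_download=["audio"])
```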

explore_dataset(clip_id=None)[source]

Explore the dataset for a given clip_id or a random clip if clip_id is None.

Parameters:

clip_id (str or None) – The identifier of the clip to explore. If None, a random clip will be chosen.

license()[source]

Print the license

load_clipgroups()[source]

Load all clipgroups in the dataset

Returns:

dict – {clipgroup_id: clipgroup data}

Raises:

NotImplementedError – If the dataset does not support Clipgroups

load_clips()[source]

Load all clips in the dataset

Returns:

dict – {clip_id: clip data}

Raises:

NotImplementedError – If the dataset does not support Clips

validate(verbose=True)[source]

Validate if the stored dataset is a valid version

Parameters:

verbose (bool) – If False, don’t print output

Returns:

  • list - files in the index that are missing locally

  • list - files which have an invalid checksum

class soundata.core.cached_property(func)[source]

Cached property decorator

A property that is only computed once per instance and then replaces itself with an ordinary attribute. Deleting the attribute resets the property. Source: https://github.com/bottlepy/bottle/commit/fa7733e075da0d790d809aa3d2f53071897e6f76
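
The behavior described above follows from how non-data descriptors work in Python. The sketch below shows one way to get it; the names `cached_property_sketch` and `Example` are hypothetical, and the code is written from the description rather than copied from soundata.

```python
class cached_property_sketch:
    """Compute a value once per instance, then store it on the instance."""

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, obj, cls):
        if obj is None:
            return self
        # Storing the value under the same name shadows this non-data
        # descriptor, so later lookups read the instance attribute directly.
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value


class Example:
    calls = 0

    @cached_property_sketch
    def expensive(self):
        Example.calls += 1
        return 42


e = Example()
first, second = e.expensive, e.expensive  # computed once
del e.expensive                           # removes the cached attribute
third = e.expensive                       # computed again
```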

soundata.core.copy_docs(original)[source]

Decorator function to copy docs from one function to another
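
A decorator of this kind can be very small. The following sketch assumes that "copying docs" means copying the `__doc__` attribute; `copy_docs_sketch`, `load_audio`, and `load_audio_wrapper` are hypothetical names used only for illustration.

```python
def copy_docs_sketch(original):
    """Return a decorator that copies `original`'s docstring onto its target."""

    def decorator(target):
        target.__doc__ = original.__doc__
        return target

    return decorator


def load_audio(path):
    """Load an audio file from `path`."""


@copy_docs_sketch(load_audio)
def load_audio_wrapper(*args, **kwargs):
    return load_audio(*args, **kwargs)
```

This pattern would explain why a method with a generic `(*args, **kwargs)` signature can still show the full parameter documentation of the function it wraps.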

soundata.core.docstring_inherit(parent)[source]

Decorator function to inherit docstrings from the parent class.

Adds documented Attributes from the parent to the child docs.

Annotations

soundata annotation data types

soundata.annotations.AZIMUTH_UNITS = {'degrees': 'values in the interval [-360, 360]', 'radians': 'values in the interval [-2*pi, 2*pi]'}

Azimuth units

class soundata.annotations.Annotation[source]

Annotation base class

soundata.annotations.DISTANCE_UNITS = {'centimeters': 'centimeters', 'meters': 'meters', 'millimeters': 'millimeters'}

Distance units

soundata.annotations.ELEVATIONS_UNITS = {'degrees': 'degrees'}

Elevation units

class soundata.annotations.Events(intervals, intervals_unit, labels, labels_unit, confidence=None, azimuth=None, azimuth_unit=None, elevation=None, elevation_unit=None, distance=None, distance_unit=None, cartesian_coord=None, cartesian_coord_unit=None)[source]

Events class

Variables:
  • intervals (np.ndarray) – (n x 2) array of intervals (as floats) in seconds in the form [start_time, end_time] with positive time stamps and end_time >= start_time.

  • labels (list) – list of event labels (as strings)

  • confidence (np.ndarray or None) – array of confidence values, float in [0, 1]

  • labels_unit (str) – labels unit, one of LABEL_UNITS

  • intervals_unit (str) – intervals unit, one of TIME_UNITS

  • azimuth (np.ndarray or None) – list of size n with np.ndarrays with dtype float, indicating the azimuth of the sound event. Values between -360 and 360 for degrees and between -2*pi, 2*pi for radians or None.

  • azimuth_unit (str) – azimuth unit, one of AZIMUTH_UNITS

  • elevation (np.ndarray or None) – list of size n with np.ndarrays with dtype float, indicating the elevation of the sound event. Values between -90 and 90 or None.

  • elevation_unit (str) – elevation unit, one of AZIMUTH_UNITS

  • distance (np.ndarray or None) – list of size n with np.ndarrays with dtype float, indicating the distance of the sound event. Values must be positive or None.

  • distance_unit (str) – distance unit, one of DISTANCE_UNITS

  • cartesian_coord (np.ndarray or None) –

  • cartesian_coord_unit (str) – cartesian_coord unit, one of DISTANCE_UNITS

soundata.annotations.LABEL_UNITS = {'open': 'no strict schema or units'}

Label units

class soundata.annotations.MultiAnnotator(annotators, annotations)[source]

Multiple annotator class. This class should be used for datasets with multiple annotators (e.g. multiple annotators per clip).

Variables:
  • annotators (list) – list with annotator ids

  • annotations (list) – list of annotations (e.g. [annotations.Tags, annotations.Tags])

class soundata.annotations.SpatialEvents(intervals, intervals_unit, elevations, elevations_unit, azimuths, azimuths_unit, distances, distances_unit, labels, labels_unit, clip_number_index=None, time_step=None, confidence=None)[source]

SpatialEvents class

Variables:
  • intervals (list) – list of size n of np.ndarrays of shape (m, 2), with intervals (as floats) in TIME_UNITS in the form [start_time, end_time] with positive time stamps and end_time >= start_time. n is the number of sound events. m is the number of sounding instances for each sound event.

  • intervals_unit (str) – intervals unit, one of TIME_UNITS

  • time_step (int, float, or None) – the time-step between events over time in intervals_unit

  • elevations (list) – list of size n with np.ndarrays with dtype int, indicating the elevation of the sound event per time_step if moving or a single value if static. Values between -90 and 90

  • elevations_unit (str) – elevations unit, one of ELEVATIONS_UNITS

  • azimuths (list) – list of size n with np.ndarrays with dtype int, indicating the azimuth of the sound event per time_step if moving or a single value if static. Values between -180 and 180

  • azimuths_unit (str) – azimuths unit, one of AZIMUTH_UNITS

  • distances (list) – list of size n with np.ndarrays with dtype int, indicating the distance of the sound event per time_step if moving or a single value if static. Values must be positive or None

  • distances_unit (str) – distances unit, one of DISTANCE_UNITS

  • labels (list) – list of event labels (as strings)

  • labels_unit (str) – labels unit, one of LABEL_UNITS

  • clip_number_indices (list) – list of clip number indices (as strings)

  • confidence (np.ndarray or None) – array of confidence values, float in [0, 1]

soundata.annotations.TIME_UNITS = {'milliseconds': 'milliseconds', 'seconds': 'seconds'}

Time units

class soundata.annotations.Tags(labels, labels_unit, confidence=None)[source]

Tags class

Variables:
  • labels (list) – list of string tags

  • confidence (np.ndarray or None) – array of confidence values, float in [0, 1]

  • labels_unit (str) – labels unit, one of LABEL_UNITS

soundata.annotations.validate_array_like(array_like, expected_type, expected_dtype, check_child=False, none_allowed=False)[source]

Validate that array-like object is well formed. If array_like is None, validation passes automatically.

Parameters:
  • array_like (array-like) – object to validate

  • expected_type (type) – expected type, either list or np.ndarray

  • expected_dtype (type) – expected dtype

  • check_child (bool) – if True, checks if all elements of array are children of expected_dtype

  • none_allowed (bool) – if True, allows array to be None

Raises:
  • TypeError – if type/dtype does not match expected_type/expected_dtype

  • ValueError – if array

soundata.annotations.validate_confidence(confidence)[source]

Validate if confidence is well-formed.

If confidence is None, validation passes automatically

Parameters:

confidence (np.ndarray) – an array of confidence values

Raises:

ValueError – if confidence values are not between 0 and 1

soundata.annotations.validate_intervals(intervals)[source]

Validate if intervals are well-formed.

If intervals is None, validation passes automatically

Parameters:

intervals (np.ndarray) – (n x 2) array

Raises:
  • ValueError – if intervals have an invalid shape, have negative values, or if end times are smaller than start times.
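
The three conditions listed above translate into a short check. The sketch below works on a plain list of [start, end] pairs rather than an np.ndarray, and `check_intervals` is a hypothetical name, so it illustrates the rules rather than reproducing the library function.

```python
def check_intervals(intervals):
    """Raise ValueError unless `intervals` is a valid list of [start, end] pairs."""
    if intervals is None:
        return  # nothing to validate
    for row in intervals:
        if len(row) != 2:
            raise ValueError("each interval must have exactly two values")
        start, end = row
        if start < 0 or end < 0:
            raise ValueError("interval times must be non-negative")
        if end < start:
            raise ValueError("end time must not be smaller than start time")


check_intervals([[0.0, 1.5], [2.0, 2.0]])  # passes silently
```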

soundata.annotations.validate_lengths_equal(array_list)[source]

Validate that arrays in list are equal in length

Some arrays may be None, and validation for these is skipped.

Parameters:

array_list (list) – list of array-like objects

Raises:

ValueError – if arrays are not equal in length
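
A length check that skips None entries might be sketched as follows; `check_lengths_equal` is a hypothetical name and the example values are made up.

```python
def check_lengths_equal(array_list):
    """Raise ValueError if the non-None entries differ in length."""
    lengths = {len(arr) for arr in array_list if arr is not None}
    if len(lengths) > 1:
        raise ValueError(f"arrays have differing lengths: {sorted(lengths)}")


# Two labels, two confidence values, and a missing (None) array.
check_lengths_equal([["dog_bark", "siren"], [0.9, 0.4], None])  # passes
```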

soundata.annotations.validate_locations(locations)[source]

Validate if locations are well-formed. If locations is None, validation passes automatically.

Parameters:

locations (np.ndarray) – (n x 3) array

Raises:

ValueError – if locations have an invalid shape or have cartesian coordinate values outside the expected ranges.

soundata.annotations.validate_time_steps(time_step, locations, interval)[source]

Validate if timesteps are well-formed. If locations is None, validation passes automatically.

Parameters:
  • time_step (float) – spacing between location steps

  • locations (np.ndarray) – (n x 3) array

  • interval (np.ndarray) – (n x 2) expected start and end time for the locations

Raises:

ValueError – if the number of locations does not match the number of time_steps that fit in the interval

soundata.annotations.validate_times(times)[source]

Validate if times are well-formed.

If times is None, validation passes automatically

Parameters:

times (np.ndarray) – an array of time stamps

Raises:

ValueError – if times have negative values or are non-increasing

soundata.annotations.validate_unit(unit, unit_values, allow_none=False)[source]

Validate that the given unit is one of the allowed unit values.

Parameters:
  • unit (str) – the unit name

  • unit_values (dict) – dictionary of possible unit values

  • allow_none (bool) – if true, allows unit=None to pass validation

Raises:

ValueError – If the given unit is not one of the allowed unit values
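
As an illustration, the check can be written as a membership test against a units dictionary such as TIME_UNITS. The sketch assumes that the allowed unit names are the dictionary's keys; `check_unit` is a hypothetical name, and the dictionary is copied from the value documented for soundata.annotations.TIME_UNITS.

```python
TIME_UNITS = {"milliseconds": "milliseconds", "seconds": "seconds"}


def check_unit(unit, unit_values, allow_none=False):
    """Raise ValueError unless `unit` is a key of `unit_values`."""
    if unit is None and allow_none:
        return
    if unit not in unit_values:
        raise ValueError(f"unit {unit!r} is not one of {sorted(unit_values)}")


check_unit("seconds", TIME_UNITS)              # passes
check_unit(None, TIME_UNITS, allow_none=True)  # passes because None is allowed
```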

Advanced

soundata.validate

Utility functions for soundata

soundata.validate.log_message(message, verbose=True)[source]

Helper function to log message

Parameters:
  • message (str) – message to log

  • verbose (bool) – if false, the message is not logged

soundata.validate.md5(file_path)[source]

Get md5 hash of a file.

Parameters:

file_path (str) – File path

Returns:

str – md5 hash of data in file_path

soundata.validate.validate(local_path, checksum)[source]

Validate that a file exists and has the correct checksum

Parameters:
  • local_path (str) – file path

  • checksum (str) – md5 checksum

Returns:

  • bool - True if file exists

  • bool - True if checksum matches
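
The two return values can be illustrated with hashlib from the standard library. This is a sketch only: `file_md5` and `check_file` are hypothetical names, reading in chunks is a design choice to keep memory use flat for large audio files, and returning (False, False) for a missing file is an assumption.

```python
import hashlib
import os
import tempfile


def file_md5(file_path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(file_path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(local_path, checksum):
    """Return (exists, checksum_matches) for a local file."""
    if not os.path.exists(local_path):
        return False, False  # assumption: a missing file cannot match
    return True, file_md5(local_path) == checksum


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clip.wav")  # stand-in file, not real audio
    with open(path, "wb") as fhandle:
        fhandle.write(b"not really audio")
    expected = hashlib.md5(b"not really audio").hexdigest()
    ok = check_file(path, expected)
    wrong = check_file(path, "0" * 32)
    missing = check_file(os.path.join(tmp, "absent.wav"), expected)
```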

soundata.validate.validate_files(file_dict, data_home, verbose)[source]

Validate files

Parameters:
  • file_dict (dict) – dictionary of file information

  • data_home (str) – path where the data lives

  • verbose (bool) – if True, show progress

Returns:

  • dict - missing files

  • dict - files with invalid checksums

soundata.validate.validate_index(dataset_index, data_home, verbose=True)[source]

Validate files in a dataset’s index

Parameters:
  • dataset_index (list) – dataset indices

  • data_home (str) – Local home path where the dataset is stored

  • verbose (bool) – if true, prints validation status while running

Returns:

  • dict - file paths that are in the index but missing locally

  • dict - file paths with differing checksums

soundata.validate.validate_metadata(file_dict, data_home, verbose)[source]

Validate files

Parameters:
  • file_dict (dict) – dictionary of file information

  • data_home (str) – path where the data lives

  • verbose (bool) – if True, show progress

Returns:

  • dict - missing files

  • dict - files with invalid checksums

soundata.validate.validator(dataset_index, data_home, verbose=True)[source]

Checks the existence and validity of files stored locally with respect to the paths and file checksums stored in the reference index. Logs invalid checksums and missing files.

Parameters:
  • dataset_index (list) – dataset indices

  • data_home (str) – Local home path where the dataset is stored

  • verbose (bool) – if True (default), prints missing and invalid files to stdout. Otherwise, this function is equivalent to validate_index.

Returns:

  • missing_files (list) - List of file paths that are in the dataset index but missing locally.

  • invalid_checksums (list) - List of file paths that exist in the dataset index but have a different checksum compared to the reference checksum.

soundata.download_utils

utilities for downloading from the web.

class soundata.download_utils.DownloadProgressBar(*_, **__)[source]

Wrap tqdm to show download progress

class soundata.download_utils.RemoteFileMetadata(filename, url, checksum, destination_dir=None, unpack_directories=None)[source]

The metadata for a remote file

Variables:
  • filename (str) – the remote file’s basename

  • url (str) – the remote file’s url

  • checksum (str) – the remote file’s md5 checksum

  • destination_dir (str or None) – the relative path for where to save the file

  • unpack_directories (list or None) – list of relative directories. For each directory the contents will be moved to destination_dir (or data_home if not provided)

soundata.download_utils.download_7z_file(tar_remote, save_dir, force_overwrite, cleanup)[source]

Download and extract a 7z file.

Parameters:
  • tar_remote (RemoteFileMetadata) – Object containing download information

  • save_dir (str) – Path to save downloaded file

  • force_overwrite (bool) – If True, overwrites existing files

  • cleanup (bool) – If True, remove tarfile after untarring

soundata.download_utils.download_from_remote(remote, save_dir, force_overwrite)[source]

Download a remote dataset into path. Fetch a dataset pointed to by remote’s url, save it into path using remote’s filename, and ensure its integrity based on the MD5 checksum of the downloaded file.

Adapted from scikit-learn’s sklearn.datasets.base._fetch_remote.

Parameters:
  • remote (RemoteFileMetadata) – Named tuple containing remote dataset meta information: url, filename and checksum

  • save_dir (str) – Directory to save the file to. Usually data_home

  • force_overwrite (bool) – If True, overwrite existing file with the downloaded file. If False, does not overwrite, but checks that checksum is consistent.

Returns:

str – Full path of the created file.

soundata.download_utils.download_multipart_zip(zip_remotes, save_dir, force_overwrite, cleanup)[source]

Download and unzip a multipart zip file.

Parameters:
  • zip_remotes (list) – A list of RemoteFileMetadata Objects containing download information

  • save_dir (str) – Path to save downloaded file

  • force_overwrite (bool) – If True, overwrites existing files

  • cleanup (bool) – If True, remove zipfile after unzipping

soundata.download_utils.download_tar_file(tar_remote, save_dir, force_overwrite, cleanup)[source]

Download and untar a tar file.

Parameters:
  • tar_remote (RemoteFileMetadata) – Object containing download information

  • save_dir (str) – Path to save downloaded file

  • force_overwrite (bool) – If True, overwrites existing files

  • cleanup (bool) – If True, remove tarfile after untarring

soundata.download_utils.download_zip_file(zip_remote, save_dir, force_overwrite, cleanup)[source]

Download and unzip a zip file.

Parameters:
  • zip_remote (RemoteFileMetadata) – Object containing download information

  • save_dir (str) – Path to save downloaded file

  • force_overwrite (bool) – If True, overwrites existing files

  • cleanup (bool) – If True, remove zipfile after unzipping

soundata.download_utils.downloader(save_dir, remotes=None, partial_download=None, info_message=None, force_overwrite=False, cleanup=False)[source]

Download data to save_dir and optionally log a message.

Parameters:
  • save_dir (str) – The directory to download the data to

  • remotes (dict or None) – A dictionary of RemoteFileMetadata tuples of data in zip format. If an element of the dictionary is a list of RemoteFileMetadata, it is handled as a multipart zip file. If None, there is no data to download

  • partial_download (list or None) – A list of keys of remotes selecting which remote objects to download. If None, all data is downloaded

  • info_message (str or None) – A string of info to log when this function is called. If None, no string is logged.

  • force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.

  • cleanup (bool) – Whether to delete the zip/tar file after extracting.
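The interaction between remotes and partial_download can be sketched as a plain dictionary filter. This is an illustration under stated assumptions, not the library's code: `select_remotes` is a hypothetical helper, the string values stand in for RemoteFileMetadata objects, and rejecting unknown keys with a ValueError is an assumed behavior.

```python
def select_remotes(remotes, partial_download=None):
    """Return the subset of `remotes` named by `partial_download` (all if None)."""
    if remotes is None:
        return {}  # nothing to download
    if partial_download is None:
        return dict(remotes)  # download everything
    unknown = [key for key in partial_download if key not in remotes]
    if unknown:
        # Assumption: an unknown key is treated as a caller error.
        raise ValueError(f"unknown remote keys: {unknown}")
    return {key: remotes[key] for key in partial_download}


# Placeholder strings stand in for RemoteFileMetadata objects.
remotes = {"audio": "audio-remote", "annotations": "annotations-remote"}
selected = select_remotes(remotes, partial_download=["annotations"])
```

Here `selected` contains only the "annotations" entry, which is the case where a user wants the annotations without the much larger audio archive.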

soundata.download_utils.extractall_unicode(zfile, out_dir)[source]

Extract all files inside a zip archive to an output directory.

Unlike zipfile’s extractall, it checks for correct file name encoding.

Parameters:
  • zfile (obj) – Zip file object created with zipfile.ZipFile

  • out_dir (str) – Output folder
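To show why a file name encoding check can be needed: Python's zipfile decodes a member name as cp437 unless the archive sets the UTF-8 flag (bit 11 of the general-purpose flags), so UTF-8 names stored without that flag come out garbled. The sketch below reverses that decoding. `repair_name` is a hypothetical helper, and treating the original bytes as UTF-8 is an assumption; the library may guess the encoding differently.

```python
UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def repair_name(name, flag_bits, assumed_encoding="utf-8"):
    """Undo the cp437 decoding applied to names that lack the UTF-8 flag."""
    if flag_bits & UTF8_FLAG:
        return name  # zipfile already decoded this name as UTF-8
    try:
        return name.encode("cp437").decode(assumed_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # not recoverable under the assumption; keep as-is


# Simulate a UTF-8 name that was read back as cp437.
garbled = "café.wav".encode("utf-8").decode("cp437")
repaired = repair_name(garbled, flag_bits=0)
```

In practice the two arguments would come from each entry of `ZipFile.infolist()`, namely its `filename` and `flag_bits` attributes.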

soundata.download_utils.move_directory_contents(source_dir, target_dir)[source]

Move the contents of source_dir into target_dir, and delete source_dir

Parameters:
  • source_dir (str) – path to source directory

  • target_dir (str) – path to target directory
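The behavior described above, moving the contents and then deleting the source directory, can be sketched with os and shutil. `move_contents` is a hypothetical name, and this sketch assumes no name collisions in the target directory.

```python
import os
import shutil
import tempfile


def move_contents(source_dir, target_dir):
    """Move every entry of source_dir into target_dir, then remove source_dir."""
    os.makedirs(target_dir, exist_ok=True)
    for entry in os.listdir(source_dir):
        shutil.move(os.path.join(source_dir, entry), os.path.join(target_dir, entry))
    os.rmdir(source_dir)  # source_dir is empty at this point


# Demo inside a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, "unpacked")
    target = os.path.join(tmp_dir, "data_home")
    os.makedirs(source)
    with open(os.path.join(source, "clip.wav"), "wb") as fhandle:
        fhandle.write(b"")
    move_contents(source, target)
    moved_entries = sorted(os.listdir(target))
    source_still_exists = os.path.exists(source)
```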

soundata.download_utils.un7z(sevenz_path, cleanup)[source]

Unzip a 7z file inside its current directory.

Parameters:
  • sevenz_path (str) – Path to the 7z file

  • cleanup (bool) – If True, remove 7z file after extraction

soundata.download_utils.untar(tar_path, cleanup)[source]

Untar a tar file inside its current directory.

Parameters:
  • tar_path (str) – Path to tar file

  • cleanup (bool) – If True, remove tarfile after untarring
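As an illustration of extracting an archive next to itself with optional cleanup, here is a minimal sketch built on the standard tarfile module. `untar_in_place` is a hypothetical name, and passing the "data" extraction filter where the running Python supports it is an added safety assumption, not something the source states.

```python
import os
import tarfile
import tempfile


def untar_in_place(tar_path, cleanup=False):
    """Extract tar_path into the directory that contains it."""
    out_dir = os.path.dirname(os.path.abspath(tar_path))
    # Use the safer "data" filter on Python versions that provide it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path, "r") as archive:
        archive.extractall(out_dir, **extra)
    if cleanup:
        os.remove(tar_path)


# Demo: build a small archive, then extract it and delete the archive.
with tempfile.TemporaryDirectory() as tmp_dir:
    member_path = os.path.join(tmp_dir, "notes.txt")
    with open(member_path, "w") as fhandle:
        fhandle.write("demo")
    tar_path = os.path.join(tmp_dir, "archive.tar")
    with tarfile.open(tar_path, "w") as archive:
        archive.add(member_path, arcname="extracted/notes.txt")
    untar_in_place(tar_path, cleanup=True)
    extracted_ok = os.path.isfile(os.path.join(tmp_dir, "extracted", "notes.txt"))
    archive_removed = not os.path.exists(tar_path)
```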

soundata.download_utils.unzip(zip_path, cleanup)[source]

Unzip a zip file inside its current directory.

Parameters:
  • zip_path (str) – Path to zip file

  • cleanup (bool) – If True, remove zipfile after unzipping

soundata.jams_utils

Utilities for converting soundata Annotation classes to jams format.

soundata.jams_utils.events_to_jams(events, annotator=None, description=None)[source]

Convert events annotations into jams format.

Parameters:
  • events (annotations.Events) – events data object

  • annotator (str) – annotator id

  • description (str) – annotation description

Returns:

jams.Annotation – jams annotation object.

soundata.jams_utils.jams_converter(audio_path=None, spectrogram_path=None, metadata=None, tags=None, events=None)[source]

Convert annotations from a clip to JAMS format.

Parameters:
  • audio_path (str or None) – A path to the corresponding audio file, or None. If provided, the audio file will be read to compute the duration. If None, ‘duration’ must be a field in the metadata dictionary, or the resulting jam object will not validate.

  • spectrogram_path (str or None) – A path to the corresponding spectrogram file, or None.

  • tags (annotations.Tags or annotations.MultiAnnotator or None) – An instance of annotations.Tags/annotations.MultiAnnotator describing the audio tags.

  • events (annotations.Events or annotations.MultiAnnotator or None) – An instance of annotations.Events/annotations.MultiAnnotator describing the sound events.

Returns:

jams.JAMS – A JAMS object containing the annotations.

soundata.jams_utils.multiannotator_to_jams(multiannot: MultiAnnotator, converter: Callable[[...], Annotation], **kwargs) → List[jams.Annotation][source]

Convert multi-annotator annotations into jams format.

Parameters:
  • multiannot (annotations.MultiAnnotator) – MultiAnnotator object

  • converter (Callable[…, annotations.Annotation]) – a function that takes an annotation object, its annotator (and other optional arguments), and returns a jams annotation object

Returns:

List[jams.Annotation] – List of jams annotation objects.
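The converter contract described above, one call per annotation together with its annotator, can be sketched without jams installed. Everything here is a stand-in: `FakeMultiAnnotator`, `convert_each`, and `describe` are hypothetical names, the parallel `annotators`/`annotations` lists are an assumed layout, and the toy converter returns a dict rather than a jams.Annotation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class FakeMultiAnnotator:
    """Hypothetical stand-in: parallel lists of annotator ids and annotations."""

    annotators: List[str]
    annotations: List[Any]


def convert_each(multiannot, converter: Callable[..., Any], **kwargs):
    """Call converter once per (annotation, annotator) pair and collect results."""
    return [
        converter(annotation, annotator=annotator, **kwargs)
        for annotation, annotator in zip(multiannot.annotations, multiannot.annotators)
    ]


def describe(annotation, annotator=None, description=None):
    """Toy converter that returns a dict instead of a jams.Annotation."""
    return {"data": annotation, "annotator": annotator, "description": description}


multi = FakeMultiAnnotator(annotators=["a1", "a2"], annotations=[["dog"], ["siren"]])
converted = convert_each(multi, describe, description="demo")
```

With the real library, a function such as tags_to_jams or events_to_jams would play the role of `describe`, since both accept an annotator keyword argument.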

soundata.jams_utils.tags_to_jams(tags, annotator=None, duration=0, namespace='tag_open', description=None)[source]

Convert tags annotations into jams format.

Parameters:
  • tags (annotations.Tags) – tags annotation object

  • annotator (str) – annotator id

  • namespace (str) – the jams-compatible tag namespace

  • description (str) – annotation description

Returns:

jams.Annotation – jams annotation object.