"""UrbanSound8K Dataset Loader
.. admonition:: Dataset Info
:class: dropdown
**UrbanSound8K**
**Created By:**
| Justin Salamon*^, Christopher Jacoby* and Juan Pablo Bello*
| * Music and Audio Research Lab (MARL), New York University, USA
| ^ Center for Urban Science and Progress (CUSP), New York University, USA
| https://urbansounddataset.weebly.com/
| https://steinhardt.nyu.edu/marl
| http://cusp.nyu.edu/
Version 1.0
*Description:*
This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn,
children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are
drawn from the urban sound taxonomy described in the following article, which also includes a detailed description of
the dataset and how it was compiled:
.. code-block:: latex
J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research",
22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014.
All excerpts are taken from field recordings uploaded to www.freesound.org. The files are pre-sorted into ten folds
(folders named fold1-fold10) to help in the reproduction of and comparison with the automatic classification results
reported in the article above.
In addition to the sound excerpts, a CSV file containing metadata about each excerpt is also provided.
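The pre-sorted folds described above are intended to be used as-is for 10-fold cross-validation. A minimal sketch of leave-one-fold-out splitting follows; the helper name `fold_splits` and the toy clip ids are hypothetical and are not part of the dataset or of soundata:

```python
from typing import Dict, Iterator, List, Tuple


def fold_splits(
    clip_folds: Dict[str, int], n_folds: int = 10
) -> Iterator[Tuple[int, List[str], List[str]]]:
    # Yield (held_out_fold, train_ids, test_ids) for each predefined fold.
    for held_out in range(1, n_folds + 1):
        train = sorted(c for c, f in clip_folds.items() if f != held_out)
        test = sorted(c for c, f in clip_folds.items() if f == held_out)
        yield held_out, train, test


# Toy mapping of clip id -> fold (made-up ids, for illustration only).
toy = {"a-0-0-0": 1, "b-1-0-0": 2, "c-2-0-0": 2, "d-3-0-0": 3}
splits = list(fold_splits(toy, n_folds=3))
```

Keeping the predefined folds intact, rather than reshuffling the clips, is what keeps results comparable with those reported in the article.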
*Audio Files Included:*
8732 audio files of urban sounds (see description above) in WAV format. The sampling rate, bit depth, and number of
channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).
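Because the format varies from file to file, it can help to inspect a file's header before deciding how to load it. The sketch below uses Python's standard `wave` module and assumes an uncompressed PCM WAV file, so it may not read every file in the dataset; `wav_format` is a hypothetical helper, and the in-memory file is synthetic rather than taken from the dataset:

```python
import io
import wave
from typing import Tuple


def wav_format(fhandle) -> Tuple[int, int, int]:
    # Return (sample_rate_hz, bit_depth, n_channels) read from a PCM WAV header.
    with wave.open(fhandle, "rb") as wav:
        return wav.getframerate(), 8 * wav.getsampwidth(), wav.getnchannels()


# Build a tiny synthetic stereo, 16-bit, 22050 Hz file in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(22050)
    out.writeframes(bytes(4 * 100))  # 100 silent stereo frames
buffer.seek(0)
fmt = wav_format(buffer)
```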
*Meta-data Files Included:*
UrbanSound8K.csv
This file contains meta-data information about every audio file in the dataset. This includes:
* slice_file_name:
The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav, where:
[fsID] = the Freesound ID of the recording from which this excerpt (slice) is taken
[classID] = a numeric identifier of the sound class (see description of classID below for further details)
[occurrenceID] = a numeric identifier to distinguish different occurrences of the sound within the original recording
[sliceID] = a numeric identifier to distinguish different slices taken from the same occurrence
* fsID:
The Freesound ID of the recording from which this excerpt (slice) is taken
* start:
The start time of the slice in the original Freesound recording
* end:
The end time of the slice in the original Freesound recording
* salience:
A (subjective) salience rating of the sound. 1 = foreground, 2 = background.
* fold:
The fold number (1-10) to which this file has been allocated.
* classID:
A numeric identifier of the sound class:
0 = air_conditioner
1 = car_horn
2 = children_playing
3 = dog_bark
4 = drilling
5 = engine_idling
6 = gun_shot
7 = jackhammer
8 = siren
9 = street_music
* class:
The class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer,
siren, street_music.
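The file-name convention and class mapping listed above can be decoded without reading the CSV. A minimal sketch, in which `parse_slice_file_name` and `CLASS_NAMES` are hypothetical names (not part of soundata) and the example file name is used only for illustration:

```python
from typing import Dict, Union

# Class names indexed by classID, as listed above.
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def parse_slice_file_name(name: str) -> Dict[str, Union[str, int]]:
    # Split "[fsID]-[classID]-[occurrenceID]-[sliceID].wav" into its fields.
    stem = name[:-4] if name.lower().endswith(".wav") else name
    fs_id, class_id, occurrence_id, slice_id = stem.split("-")
    return {
        "fsID": fs_id,  # kept as a string, like an opaque identifier
        "classID": int(class_id),
        "class": CLASS_NAMES[int(class_id)],
        "occurrenceID": int(occurrence_id),
        "sliceID": int(slice_id),
    }


fields = parse_slice_file_name("100032-3-0-0.wav")
```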
*Please Acknowledge UrbanSound8K in Academic Research:*
When UrbanSound8K is used for academic research, we would highly appreciate it if scientific publications of works
partly based on the UrbanSound8K dataset cite the following publication:
.. code-block:: latex
J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research",
22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014.
The creation of this dataset was supported by a seed grant by NYU's Center for Urban Science and Progress (CUSP).
*Conditions of Use*
Dataset compiled by Justin Salamon, Christopher Jacoby and Juan Pablo Bello. All files are excerpts of recordings
uploaded to www.freesound.org. Please see FREESOUNDCREDITS.txt for an attribution list.
The UrbanSound8K dataset is offered free of charge for non-commercial use only under the terms of the Creative Commons
Attribution Noncommercial License (by-nc), version 3.0: http://creativecommons.org/licenses/by-nc/3.0/
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including
without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or
completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, NYU is not
liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of
the UrbanSound8K dataset or any part of it.
*Feedback*
| Please help us improve UrbanSound8K by sending your feedback to: justin.salamon@nyu.edu
| In case of a problem report please include as many details as possible.
"""

import csv
import os
from typing import BinaryIO, Optional, TextIO, Tuple

import librosa
import numpy as np

from soundata import download_utils
from soundata import jams_utils
from soundata import core
from soundata import annotations
from soundata import io

BIBTEX = """
@inproceedings{Salamon:UrbanSound:ACMMM:14,
Address = {Orlando, FL, USA},
Author = {Salamon, J. and Jacoby, C. and Bello, J. P.},
Booktitle = {22nd {ACM} International Conference on Multimedia (ACM-MM'14)},
Month = {Nov.},
Pages = {1041--1044},
Title = {A Dataset and Taxonomy for Urban Sound Research},
Year = {2014}}
"""
REMOTES = {
    "all": download_utils.RemoteFileMetadata(
        filename="UrbanSound8K.tar.gz",
        url="https://zenodo.org/record/1203745/files/UrbanSound8K.tar.gz?download=1",
        checksum="9aa69802bbf37fb986f71ec1483a196e",
        unpack_directories=["UrbanSound8K"],
    )
}
LICENSE_INFO = "Creative Commons Attribution Non Commercial 4.0 International"


class Clip(core.Clip):
    """urbansound8k Clip class

    Args:
        clip_id (str): id of the clip

    Attributes:
        audio (np.ndarray, float): the clip's audio signal and sample rate
        audio_path (str): path to the audio file
        class_id (int): integer representation of the class label (0-9). See Dataset Info in the documentation for mapping
        class_label (str): string class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music
        clip_id (str): clip id
        fold (int): fold number (1-10) to which this clip is allocated. Use these folds for cross validation
        freesound_end_time (float): end time in seconds of the clip in the original freesound recording
        freesound_id (str): ID of the freesound.org recording from which this clip was taken
        freesound_start_time (float): start time in seconds of the clip in the original freesound recording
        salience (int): annotator estimate of class salience in the clip: 1 = foreground, 2 = background
        slice_file_name (str): The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav
            Please see the Dataset Info in the soundata documentation for further details
        tags (soundata.annotations.Tags): tag (label) of the clip + confidence. In UrbanSound8K every clip has one tag

    """

    def __init__(self, clip_id, data_home, dataset_name, index, metadata):
        super().__init__(clip_id, data_home, dataset_name, index, metadata)

        self.audio_path = self.get_path("audio")

    @property
    def audio(self) -> Optional[Tuple[np.ndarray, float]]:
        """The clip's audio

        Returns:
            * np.ndarray - audio signal
            * float - sample rate

        """
        return load_audio(self.audio_path)

    @property
    def slice_file_name(self):
        """The clip's slice filename.

        Returns:
            * str - The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav

        """
        return self._clip_metadata.get("slice_file_name")

    @property
    def freesound_id(self):
        """The clip's Freesound ID.

        Returns:
            * str - ID of the freesound.org recording from which this clip was taken

        """
        return self._clip_metadata.get("freesound_id")

    @property
    def freesound_start_time(self):
        """The clip's start time in Freesound.

        Returns:
            * float - start time in seconds of the clip in the original freesound recording

        """
        return self._clip_metadata.get("freesound_start_time")

    @property
    def freesound_end_time(self):
        """The clip's end time in Freesound.

        Returns:
            * float - end time in seconds of the clip in the original freesound recording

        """
        return self._clip_metadata.get("freesound_end_time")

    @property
    def salience(self):
        """The clip's salience.

        Returns:
            * int - annotator estimate of class salience in the clip: 1 = foreground, 2 = background

        """
        return self._clip_metadata.get("salience")

    @property
    def fold(self):
        """The clip's fold.

        Returns:
            * int - fold number (1-10) to which this clip is allocated. Use these folds for cross validation

        """
        return self._clip_metadata.get("fold")

    @property
    def class_id(self):
        """The clip's class id.

        Returns:
            * int - integer representation of the class label (0-9). See Dataset Info in the documentation for mapping

        """
        return self._clip_metadata.get("class_id")

    @property
    def class_label(self):
        """The clip's class label.

        Returns:
            * str - string class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music

        """
        return self._clip_metadata.get("class_label")

    @property
    def tags(self):
        """The clip's tags.

        Returns:
            * annotations.Tags - tag (label) of the clip + confidence. In UrbanSound8K every clip has one tag

        """
        return annotations.Tags(
            [self._clip_metadata.get("class_label")], "open", np.array([1.0])
        )

    def to_jams(self):
        """Get the clip's data in jams format

        Returns:
            jams.JAMS: the clip's data in jams format

        """
        return jams_utils.jams_converter(
            audio_path=self.audio_path, tags=self.tags, metadata=self._clip_metadata
        )


@io.coerce_to_bytes_io
def load_audio(fhandle: BinaryIO, sr=44100) -> Tuple[np.ndarray, float]:
    """Load an UrbanSound8K audio file.

    Args:
        fhandle (str or file-like): File-like object or path to audio file
        sr (int or None): sample rate for loaded audio, 44100 Hz by default.
            If different from file's sample rate it will be resampled on load.
            Use None to load the file using its original sample rate (sample rate
            varies from file to file).

    Returns:
        * np.ndarray - the mono audio signal
        * float - The sample rate of the audio file

    """
    audio, sr = librosa.load(fhandle, sr=sr, mono=True)
    return audio, sr


@core.docstring_inherit(core.Dataset)
class Dataset(core.Dataset):
    """
    The urbansound8k dataset
    """

    def __init__(self, data_home=None):
        super().__init__(
            data_home,
            name="urbansound8k",
            clip_class=Clip,
            bibtex=BIBTEX,
            remotes=REMOTES,
            license_info=LICENSE_INFO,
        )

    @core.copy_docs(load_audio)
    def load_audio(self, *args, **kwargs):
        return load_audio(*args, **kwargs)

    @core.cached_property
    def _metadata(self):
        metadata_path = os.path.join(self.data_home, "metadata", "UrbanSound8K.csv")

        if not os.path.exists(metadata_path):
            raise FileNotFoundError("Metadata not found. Did you run .download()?")

        with open(metadata_path, "r") as fhandle:
            reader = csv.reader(fhandle, delimiter=",")
            raw_data = []
            for line in reader:
                # Skip the header row
                if line[0] != "slice_file_name":
                    raw_data.append(line)

        # Index the metadata rows by clip id (the file name without ".wav")
        metadata_index = {}
        for line in raw_data:
            clip_id = line[0].replace(".wav", "")

            metadata_index[clip_id] = {
                "slice_file_name": line[0],
                "freesound_id": line[1],
                "freesound_start_time": float(line[2]),
                "freesound_end_time": float(line[3]),
                "salience": int(line[4]),
                "fold": int(line[5]),
                "class_id": int(line[6]),
                "class_label": line[7],
            }

        return metadata_index