.. _tutorial:

###############
Getting started
###############

Installation
^^^^^^^^^^^^

To install Soundata simply do:

    .. code-block:: console

        pip install soundata

We recommend to do this inside a conda or virtual environment for reproducibility.

Soundata is easily imported into your Python code by:

.. code-block:: python

    import soundata


Initializing a dataset
^^^^^^^^^^^^^^^^^^^^^^

Print a list of all available dataset loaders by calling:

.. code-block:: python

    import soundata
    print(soundata.list_datasets())

To use a loader, (for example, ``urbansound8k``) you need to initialize it by calling:

.. code-block:: python

    import soundata
    dataset = soundata.initialize('urbansound8k', data_home='/choose/where/data/live')

You can specify the directory where the Soundata data is stored by passing a path to ``data_home``.

Soundata supports working with multiple dataset versions.
To see all available versions of a specific dataset, run ``soundata.list_dataset_versions('urbansound8k')``.
Use ``version`` parameter if you wish to use a version other than the default one.

.. code-block:: python

    import soundata
    dataset = soundata.initialize('urbansound8k', data_home='/choose/where/data/live', version="1.0")


Downloading a dataset
^^^^^^^^^^^^^^^^^^^^^

All dataset loaders in soundata have a ``download()`` function that allows the user to download:

* The :ref:`canonical <faq>` version of the dataset (when available).
* The dataset index, which indicates the list of clips in the dataset and the paths to audio and annotation files.

The index, which is considered part of the source files of Soundata, is specifically downloaded by running ``download(["index"])``.
Indexes will be directly stored in Soundata's indexes folder (``soundata/datasets/indexes``) whereas users can indicate where the dataset files will be stored via ``data_home``.

Downloading a dataset into the default folder
    In this first example, ``data_home`` is not specified. Thus, UrbanSound8K will be downloaded and retrieved from 
    the default folder, ``sound_datasets``, created in the user's root folder:

    .. code-block:: python

        import soundata
        dataset = soundata.initialize('urbansound8k')
        dataset.download()  # Dataset is downloaded into "sound_datasets" folder inside user's root folder

Downloading a dataset into a specified folder
    In the next example ``data_home`` is specified, so UrbanSound8K will be downloaded and retrieved from the specified location:

    .. code-block:: python

        dataset = soundata.initialize('urbansound8k', data_home='Users/johnsmith/Desktop')
        dataset.download()  # Dataset is downloaded to John Smith's desktop


Partially downloading a dataset
    The ``download()`` function allows to partially download a dataset. In other words, if applicable, the user can
    select which elements of the dataset they want to download. Each dataset has a ``REMOTES`` dictionary were all
    the available downloadable elements are listed.

    ``tau2019uas`` has different elements as seen in the ``REMOTES`` dictionary. You can specify a subset of these elements to
    download by passing the ``download()`` function a list of the ``REMOTES`` keys that we are interested in via the 
    ``partial_download`` variable.

    .. admonition:: Example REMOTES
        :class: dropdown

        .. code-block:: python

            REMOTES = {
            "development.audio.1": download_utils.RemoteFileMetadata(
                filename="TAU-urban-acoustic-scenes-2019-development.audio.1.zip",
                url="https://zenodo.org/record/2589280/files/TAU-urban-acoustic-scenes-2019-development.audio.1.zip?download=1",
                checksum="aca4ebfd9ed03d5f747d6ba8c24bc728",
            ),
            "development.audio.2": download_utils.RemoteFileMetadata(
                filename="TAU-urban-acoustic-scenes-2019-development.audio.2.zip",
                url="https://zenodo.org/record/2589280/files/TAU-urban-acoustic-scenes-2019-development.audio.2.zip?download=1",
                checksum="c4f170408ce77c8c70c532bf268d7be0",
            ),
            "development.audio.3": download_utils.RemoteFileMetadata(
                filename="TAU-urban-acoustic-scenes-2019-development.audio.3.zip",
                url="https://zenodo.org/record/2589280/files/TAU-urban-acoustic-scenes-2019-development.audio.3.zip?download=1",
                checksum="c7214a07211f10f3250290d05e72c37e",
            ),
            ....

    A partial download example for ``tau2019uas`` dataset could be:

    .. code-block:: python

        dataset = soundata.initialize('tau2019uas')
        dataset.download(partial_download=['development.audio.1', 'development.audio.2'])  # download only two remotes


Downloading a multipart dataset
    In some cases, datasets consist of multiple remote files that have to be extracted together locally to correctly recover the data.
    In those cases, remotes that need to be extracted together should be grouped in a list, so all the necessary files are downloaded at once
    (even in a partial download). An example of this is the `fsd50k` loader:

    .. admonition:: Example multipart REMOTES
        :class: dropdown

        .. code-block:: python

            REMOTES = {
                "FSD50K.dev_audio": [
                    download_utils.RemoteFileMetadata(
                        filename="FSD50K.dev_audio.zip",
                        url="https://zenodo.org/record/4060432/files/FSD50K.dev_audio.zip?download=1",
                        checksum="c480d119b8f7a7e32fdb58f3ea4d6c5a",
                    ),
                    download_utils.RemoteFileMetadata(
                        filename="FSD50K.dev_audio.z01",
                        url="https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z01?download=1",
                        checksum="faa7cf4cc076fc34a44a479a5ed862a3",
                    ),
                    download_utils.RemoteFileMetadata(
                        filename="FSD50K.dev_audio.z02",
                        url="https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z02?download=1",
                        checksum="8f9b66153e68571164fb1315d00bc7bc",
                    ),
                    download_utils.RemoteFileMetadata(
                        filename="FSD50K.dev_audio.z03",
                        url="https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z03?download=1",
                        checksum="1196ef47d267a993d30fa98af54b7159",
                    ),
                    download_utils.RemoteFileMetadata(
                        filename="FSD50K.dev_audio.z04",
                        url="https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z04?download=1",
                        checksum="d088ac4e11ba53daf9f7574c11cccac9",
                    ),
                    download_utils.RemoteFileMetadata(
                        filename="FSD50K.dev_audio.z05",
                        url="https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z05?download=1",
                        checksum="81356521aa159accd3c35de22da28c7f",
                    ),
                ],
                ...
                

Working with non-available datasets to openly download
    Some datasets are private, and therefore it is not possible to directly retrieve them from an online repository.
    In those cases, the download function will only download the index file, and if available, the dataset parts that are not private (for some cases, the annotations are available but not the audio).
    The user will have to gather the private data themselves, store it in the preferred ``data_home`` location, and then initialize the dataset as usual, indicating the data location in the ``data_home`` parameter.


    .. note::
        Private datasets may be available to the public upon request. If you are interested in a dataset that is not openly available, please contact the dataset authors or the dataset maintainers to request access.


Validating a dataset
^^^^^^^^^^^^^^^^^^^^

Using the ``validate()`` method you can ensure that the files in our local copy of a dataset are identical to the :ref:`canonical <faq>` version
of the dataset. The function computes the md5 checksum of every downloaded file to ensure it was downloaded correctly and isn't corrupted.

For big datasets: In future ``soundata`` versions, a random validation will be included. This improvement will reduce validation time for very big datasets.

Accessing annotations
^^^^^^^^^^^^^^^^^^^^^

You can choose a random clip from a dataset with the ``choice_clip()`` method.

.. admonition:: Example Index
    :class: dropdown

    .. code-block:: python

        dataset = soundata.initialize('urbansed')
        random_clip = dataset.choice_clip()
        print(random_clip)
        >>> Clip(
              audio_path="/Users/theuser/sound_datasets/urbansed/audio/test/soundscape_test_bimodal73.wav",
              clip_id="soundscape_test_bimodal73",
              jams_path="/Users/mf3734/sound_datasets/urbansed/annotations/test/soundscape_test_bimodal73.jams",
              txt_path="/Users/mf3734/sound_datasets/urbansed/annotations/test/soundscape_test_bimodal73.txt",
              audio: The clips audio
                        * np.ndarray - audio signal
                        * float - sample rate,
              events: The audio events
                        * annotations.Events - audio event object,
              split: The data splits (e.g. train)
                        * str - split,
            )


You can also access specific clips by id. The available clip ids can be accessed by doing ``dataset.clip_ids``.
In the next example we take the first clip id, and then we retrieve its ``tags``
annotation.

.. code-block:: python

    dataset = soundata.initialize('urbansound8k')
    ids = dataset.clip_ids  # the list of urbansound8k's clip ids
    clips = dataset.load_clips()  # Load all clips in the dataset
    example_clip = clips[ids[0]]  # Get the first clip

    # Accessing the clip's tags annotation
    example_tags = example_clip.tags
    print(example_tags)
    >>>> Tags(confidence, labels, labels_unit)
    print(example_tags.labels)
    >>>> ['children_playing']


You can also load a single clip without loading all clips in the dataset:

.. code-block:: python

    ids = dataset.clip_ids  # the list of urbansound8k's clip ids
    example_clip = dataset.clip(ids[0])  # load this particular clip
    example_tags = example_clip.tags  # Get the tags for the first clip


.. _Remote Data Example: 

Accessing data remotely
^^^^^^^^^^^^^^^^^^^^^^^

Annotations can also be accessed through ``load_*()`` methods which may be useful, for instance, when your data aren't available locally. 
If you specify the annotation's path, you can use the module's loading functions directly. Let's
see an example.

.. admonition:: Accessing annotations remotely example
    :class: dropdown

    .. code-block:: python

        # Load list of clip ids of the dataset
        ids = dataset.clip_ids

        # Load a single clip, specifying the remote location
        example_clip = dataset.clip(ids[0], data_home='remote/data/path')
        audio_path = example_clip.audio_path

        print(audio_path)
        >>> remote/data/path/audio/fold1/135776-2-0-49.wav
        print(os.path.exists(audio_path))
        >>> False

        # Write code here to download the remote path, e.g., to a temporary file.
        def my_downloader(remote_path):
            # the contents of this function will depend on where your data lives, and how permanently you
            # want the files to remain on your local machine. We point you to libraries handling common use cases below.
            # for data you would download via scp, you could use the [scp](https://pypi.org/project/scp/) library
            # for data on google drive, use [pydrive](https://pythonhosted.org/PyDrive/)
            # for data on google cloud storage use [google-cloud-storage](https://pypi.org/project/google-cloud-storage/)
            return local_path_to_downloaded_data

        # Get path to where your data live
        temp_path = my_downloader(audio_path)

        # Accessing the clip audio
        example_audio = dataset.load_audio(temp_path)


Annotation classes
^^^^^^^^^^^^^^^^^^

``soundata`` defines annotation-specific data classes such as `Tags` or `Events`. These data classes are meant to standardize the format for
all loaders, so you can use the same code with different datasets. The list and descriptions of available annotation classes can be found in :ref:`annotations`.

.. note:: These classes are standardized to the point that the data allow for it. In some cases where the dataset has
        its own idiosyncrasies, the classes may be extended e.g. adding a customize, uncommon attribute.

Iterating over datasets and annotations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In general, most datasets are a collection of clips, and in most cases each clip has an audio file along with annotations.

With the ``load_clips()`` method, all clips are loaded as a dictionary with the clip id as keys and
clip objects as values. The clip objects include their respective audio and annotations, which are lazy-loaded on access
to keep things speedy and memory efficient. 

.. code-block:: python

    dataset = soundata.initialize('urbansound8k')
    for key, clip in dataset.load_clips().items():
        print(key, clip.audio_path)
    >>>> soundscape_train_bimodal0 /Users/mf3734/sound_datasets/urbansed/audio/train/soundscape_train_bimodal0.wav
         .....


Alternatively, you can loop over the ``clip_ids`` list to directly access each clip in the dataset.

.. code-block:: python

    dataset = soundata.initialize('urbansound8k')
    for clip_id in dataset.clip_ids:
        print(clip_id, dataset.clip(clip_id).audio_path)
    >>>> soundscape_train_bimodal0 /Users/mf3734/sound_datasets/urbansed/audio/train/soundscape_train_bimodal0.wav
         .....


.. _Including soundata in your pipeline:

Including soundata in your pipeline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you wanted to use ``urbansound8k`` to evaluate the performance of an urban sound classifier,
(in our case, ``random_classifier``), and then split the scores based on the metadata, you could do the following:

.. admonition:: soundata usage example
    :class: dropdown

    .. code-block:: python

        import sed_eval
        import soundata
        import numpy as np
        from dcase_util.containers import MetaDataContainer, ProbabilityContainer

        def random_classifier(classes):
            return [np.random.random(1)[0] for c in classes]

        # Evaluate on the full dataset
        dataset = soundata.initialize('urbansound8k')
        scores = {}
        data = dataset.load_clips()

        classes = np.unique([c for _, clip_data in data.items() for c in clip_data.tags.labels])
        fold = 2  # Choose a fold to evaluate

        ref_tags, est_tags, est_tag_probs = [], [], []
        for id, clip in data.items():
            if clip.fold == 2:
                ref_tags.append({'filename': id, 'tags': clip.tags.labels[0]})  # Urbansound8k has one label per clip
                probs = random_classifier(classes)
                for c, p in zip(classes, probs):
                    est_tag_probs.append({'filename': id, 'label': c, 'probability': p},)
                    if p > 0.5:  # Detection threshold of 0.5
                        est_tags.append({'filename': id, 'tags': [c]})

        tag_evaluator = sed_eval.audio_tag.AudioTaggingMetrics(tags=MetaDataContainer(ref_tags).unique_tags)
        tag_evaluator.evaluate(
            reference_tag_list=MetaDataContainer(ref_tags),
            estimated_tag_list=MetaDataContainer(est_tags),
            estimated_tag_probabilities=ProbabilityContainer(est_tag_probs))


This is the result of the example above:

.. admonition:: Example result
    :class: dropdown

    .. code-block:: python

        print(tag_evaluator)
        >>> Audio tagging metrics
        ========================================
          Tags                              : 10
          Evaluated units                   : 888

          Overall metrics (micro-average)
          ======================================
          F-measure
            F-measure (F1)                  : 9.57 %
            Precision                       : 9.57 %
            Recall                          : 9.57 %
          Equal error rate
            Equal error rate (EER)          : 51.01 %

          Class-wise average metrics (macro-average)
          ======================================
          F-measure
            F-measure (F1)                  : 6.47 %
            Precision                       : 7.54 %
            Recall                          : 9.33 %
          Equal error rate
            Equal error rate (EER)          : 50.95 %

          Class-wise metrics
          ======================================
            Tag               | Nref        Nsys      | F-score     Pre         Rec       | EER
            ----------------- | ---------   --------- | ---------   ---------   --------- | ---------
            air_conditioner   | 100         419       | 19.3%       11.9        50.0      | 49.0%
            car_horn          | 42          227       | 4.5%        2.6         14.3      | 54.8%
            children_playing  | 100         126       | 9.7%        8.7         11.0      | 54.0%
            dog_bark          | 100         58        | 13.9%       19.0        11.0      | 47.1%
            drilling          | 100         31        | 9.2%        19.4        6.0       | 52.4%
            engine_idling     | 100         16        | 1.7%        6.2         1.0       | 50.0%
            gun_shot          | 35          7         | 0.0%        0.0         0.0       | 48.1%
            jackhammer        | 120         1         | 0.0%        0.0         0.0       | 52.5%
            siren             | 91          3         | 0.0%        0.0         0.0       | 51.6%
            street_music      | 100         0         | nan%        nan         0.0       | 50.0%


.. _Using soundata with tensorflow:

Using soundata with tensorflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following is a simple example of a generator that can be used to create a tensorflow Dataset.

.. admonition:: soundata with tf.data.Dataset example
    :class: dropdown

    .. code-block:: python

        import soundata
        import numpy as np
        import tensorflow as tf

        def data_generator(dataset_name):
            # using the default data_home
            dataset = soundata.initialize(dataset_name)
            ids = dataset.clip_ids()
            for clip_id in ids:
                clip = dataset.clip(clip_id)
                audio_signal, sample_rate = clip.audio
                yield {
                    "audio": audio_signal.astype(np.float32),
                    "sample_rate": sample_rate,
                    "label": clip.tags.labels[0],
                    "metadata": {"clip_id": clip.clip_id, "fold": clip.fold}
                }

        dataset = tf.data.Dataset.from_generator(
            data_generator('urbansound8k'),
            {
                "audio": tf.float32,
                "sample_rate": tf.float32,
                "label": tf.string,
                "metadata": {'clip_id': tf.string, 'fold': tf.string}
            }
        )


.. _Using soundata with pytorch:

Using soundata with pytorch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This example shows how to create a custom PyTorch Dataset class that loads audio data from Soundata.

.. admonition:: soundata with torch DataLoader
    :class: dropdown

    .. code-block:: python

        import soundata
        import torch
        from torch.utils.data import DataLoader, Dataset

        class SoundataTorchDataset(Dataset):
            """A PyTorch Dataset for loading audio data from Soundata"""
            def __init__(self, ds, split:str):
                self.dataset = ds
                
                # Filter clips by split
                self.clip_ids = [
                    clip_id for clip_id in self.dataset.clip_ids
                    if self.dataset.clip(clip_id).split == split
                ]
                
            def __len__(self):
                return len(self.clip_ids)

            def __getitem__(self, idx):
                clip = self.dataset.clip(self.clip_ids[idx])
                audio, sr = clip.audio
                
                audio_tensor = torch.tensor(audio.T, dtype=torch.float32)
                
                return audio_tensor, clip.captions

        # Initialize, download and validate the dataset
        dataset = soundata.initialize(dataset_name="dcase23_task6b")
        dataset.download()
        dataset.validate()

        # Pass the dataset to the custom dataset class specifying the split
        dev_dataset = SoundataTorchDataset(dataset, split='dev')

        def custom_collate(batch):
            """Custom collate function to handle variable-length sequences"""
            pass

        # Create a Torch DataLoader providing the dataset and a custom collate function
        dev_loader = DataLoader(
            dev_dataset,
            batch_size=32,
            shuffle=True,
            num_workers=4,
            collate_fn=custom_collate 
        )

        
Using soundata to explore dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``explore_dataset()`` function in ``soundata`` allows you to visualize various aspects of the dataset. This can be particularly useful for understanding the distribution of events and the nature of the audio data before proceeding with analysis or model training.

Using ``explore_dataset()`` to Visualize Data in Jupyter Notebook
-----------------------------------------------------------------

If you want to use the plot functionalities used in ``display_plot_utils.py`` you must install the optional dependencies too:

    .. code-block:: console

        pip install soundata"[plots]"

If you try to load the visualizations without the optional dependencies, you will be thrown an exception indicating that the dependencies are missing.
Please do install the optional dependencies using the command above in order to use the visualization functionalities.

.. note::
        If you encounter any error during the installation of ``simpleaudio``, please visit `simpleaudio installation <https://simpleaudio.readthedocs.io/en/latest/installation.html>`__ guide and check the dependencies.


To explore the dataset, first initialize it and then call the ``explore_dataset()`` method:

.. code-block:: python

    import soundata

    # Initialize the dataset
    dataset = soundata.initialize('urbansound8k', data_home='your_data_directory')

    # Explore the dataset
    dataset.explore_dataset()

When you run this function, an interface will appear with several options, allowing you to choose what to plot.

.. toggle:: dataset explorer

   .. image:: ../img/dataset_exp.png
       :alt: class dataset explorer
       :scale: 80%

Class Distribution
==================

Displays the distribution of different event classes in the dataset.

.. toggle:: class distribution plot example

   .. image:: ../img/class_dist.png
       :alt: class distribution plot example
       :scale: 50%


Statistics (Computational)
==========================

Provides computational statistics about the dataset (Time-consuming operation).

.. toggle:: statistics plot example

   .. image:: ../img/class_stat.png
       :alt: statistics plot example
       :scale: 50%


Audio Visualization
===================

Offers visualizations related to the audio data, such as waveforms or spectrograms.

.. toggle:: audio visualization plot example

   .. image:: ../img/audio_plot.png
       :alt: audio visualization plot example
       :scale: 50%


By using the ``explore_dataset()`` function, you can gain a comprehensive overview of the dataset's structure and content, which is crucial for effective analysis and model building.