Enlighten Research Data

GISE-51

Yadav, S. and Foster, M. E. (2021) GISE-51. [Data Collection]

Collection description

GISE-51 is an open dataset of 51 isolated sound events based on the FSD50K dataset. The release also includes the GISE-51-Mixtures subset, a dataset of 5-second soundscapes with up to three sound events synthesized from GISE-51. The GISE-51 release attempts to address some of the shortcomings of recent sound event datasets by providing an open, reproducible benchmark for future research and the freedom to adapt the included isolated sound events for domain-specific applications, which was not possible using existing large-scale weakly labelled datasets. The GISE-51 release also includes accompanying code for baseline experiments, which can be found at https://github.com/SarthakYadav/GISE-51-pytorch.

Citation

If you use the GISE-51 dataset and/or the released code, please cite our paper:

Sarthak Yadav and Mary Ellen Foster, "GISE-51: A scalable isolated sound events dataset", arXiv:2103.12306, 2021.

Since GISE-51 is based on FSD50K, please also cite the FSD50K paper if you use GISE-51:

Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra, "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.

About GISE-51 and GISE-51-Mixtures

The following sections summarize key characteristics of the GISE-51 and the GISE-51-Mixtures datasets, including details left out from the paper.

GISE-51

Three subsets: train, val and eval, with 12465, 1716, and 2176 utterances respectively. The subsets are consistent with the FSD50K release.
Encompasses 51 sound classes from the FSD50K release.
View meta/lbl_map.csv for the complete vocabulary.
The dataset was obtained from FSD50K using the following steps:
Unsmearing annotations to obtain single instances with a single label using the provided metadata and ground truth in FSD50K.
Manual inspection to qualitatively evaluate shortlisted utterances.
Volume-threshold-based automated silence filtering using sox. Different volume thresholds were selected for different sound event class bins by trial and error. The file silence_thresholds.txt lists the class bins and their corresponding volume thresholds. Files that sox determined to contain no audio at all were clipped manually. Code for performing silence filtering can be found in scripts/strip_silence_sox.py in the code repository.
Re-evaluating the sound event classes, removing those with too few samples and merging those with high inter-class ambiguity.
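
The silence-filtering step can be illustrated with a minimal sketch. It is not the released scripts/strip_silence_sox.py. It assumes that leading and trailing silence are trimmed with the sox "silence" effect, and the function names, the threshold values and the class-bin names are hypothetical placeholders (the real thresholds are listed in silence_thresholds.txt).

```python
import subprocess

# Hypothetical class bins and thresholds (percent of full scale), for
# illustration only; the actual values are in silence_thresholds.txt.
EXAMPLE_THRESHOLDS = {"default": 1.0, "quiet_events": 0.1}


def build_sox_command(src, dst, threshold_pct, min_duration=0.1):
    """Build a sox command that trims leading and trailing silence."""
    trim = ["silence", "1", str(min_duration), f"{threshold_pct}%"]
    # This form of the silence effect trims only the start of the clip, so
    # the audio is reversed, trimmed again, and reversed back to trim the end.
    return ["sox", str(src), str(dst), *trim, "reverse", *trim, "reverse"]


def strip_silence(src, dst, class_bin="default"):
    """Run sox on one file using the threshold of its (assumed) class bin."""
    threshold = EXAMPLE_THRESHOLDS.get(class_bin, EXAMPLE_THRESHOLDS["default"])
    subprocess.run(build_sox_command(src, dst, threshold), check=True)
```

Calling strip_silence("in.wav", "out.wav", class_bin="quiet_events") would require sox to be installed; build_sox_command alone only constructs the argument list and can be inspected without running anything.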

GISE-51-Mixtures

Synthetic 5-second soundscapes with up to 3 events created using Scaper.
Weighted sampling with replacement for sound event selection, effectively oversampling events with very few samples. Synthetic soundscapes generated thus have a near equal number of annotations per sound event.
The val and eval sets contain 10000 soundscapes each.
The final train set contains 60000 soundscapes. We also provide training sets ranging from 5k to 100k soundscapes.
GISE-51-Mixtures is our proposed subset that can be used to benchmark the performance of future works.
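
The effect of weighted sampling with replacement can be sketched as follows. The exact weighting used to build GISE-51-Mixtures is not stated here, so this example assumes inverse class-frequency weights. The function name, file names and labels are made up for illustration.

```python
import random
from collections import Counter


def sample_events(events, k, seed=0):
    """Draw k (file, label) pairs with replacement, balancing labels.

    Each clip is weighted by 1 / (number of clips sharing its label), so
    every label carries the same total probability mass (an assumption,
    not necessarily the weighting used for the released dataset).
    """
    label_counts = Counter(label for _, label in events)
    weights = [1.0 / label_counts[label] for _, label in events]
    rng = random.Random(seed)
    return rng.choices(events, weights=weights, k=k)


# Toy, made-up catalogue: 90 "dog" clips and only 10 "bell" clips.
events = [(f"dog_{i}.wav", "dog") for i in range(90)]
events += [(f"bell_{i}.wav", "bell") for i in range(10)]

drawn = sample_events(events, k=10000)
print(Counter(label for _, label in drawn))
```

Although "bell" has nine times fewer clips than "dog" in this toy catalogue, both labels end up with roughly the same number of draws, which is the oversampling behaviour described above.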

LICENSE

All audio clips (i.e., those in isolated_events.tar.gz) used in the preparation of the Glasgow Isolated Events Dataset (GISE-51) are designated Creative Commons and were obtained from FSD50K. The source data in isolated_events.tar.gz is based on the FSD50K dataset, which is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

The GISE-51 dataset (including GISE-51-Mixtures) is a curated, processed and generated preparation, and is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. The license is specified in the LICENSE-DATASET file in license.tar.gz.

Baselines

Several sound event recognition experiments were conducted to establish baseline performance for several prominent convolutional neural network architectures. The experiments are described in Section 4 of our paper, and the implementation for reproducing these experiments is available at https://github.com/SarthakYadav/GISE-51-pytorch.

Files

GISE-51 is available as a collection of several tar archives. All audio files are PCM 16 bit, 22050 Hz.
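
After extracting an archive, the stated audio format can be checked with a short sketch like the one below. It assumes the clips are stored as WAV files, which is not stated explicitly here, and the function name check_wav_format is hypothetical. The channel count is not checked because it is not specified.

```python
import wave


def check_wav_format(path, expected_rate=22050, expected_sample_width=2):
    """Return True if a WAV file is 16-bit PCM at the expected sample rate."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getsampwidth() == expected_sample_width  # 2 bytes = 16 bits
            and wav.getframerate() == expected_rate
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )
```

Running this over every extracted file is a cheap sanity check before feature extraction, since a mismatched sample rate would silently change the spectrogram resolution.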

Keywords:

Audio dataset, Sound event recognition

College / School:

College of Science and Engineering > School of Computing Science

Date Deposited:

27 Aug 2024 10:04

URI:

https://researchdata.gla.ac.uk/id/eprint/1725