Enlighten Research Data

Code for Synergising Multi-Modal Data: Unveiling Audio-Visual and Radar Insights

Ghadban, N., Hameed, H., Abbasi, Q. H., Imran, M., Cooper, J. and Le Kernec, J. (2025) Code for Synergising Multi-Modal Data: Unveiling Audio-Visual and Radar Insights. [Data Collection]

Datacite DOI: 10.5525/gla.researchdata.2111

Related Enlighten Publications

Artificial Intelligence and Radar-Enhanced Audio-Visual Speech Recognition for the Next Generation of Communication Technologies
Advancing Speech Recognition for the Hearing Impaired: Key Insights from the Language and Technology Workshop
Unlocking the Future: AI and Radar-Enhanced Audio-Visual Speech Recognition
Noise Reduction in Audio Visual Radar Speech Recognition System
Radar-enhanced multimodal speech recognition under occlusion and noise: methodology and experimental analysis

Collection description

The study uses a multimodal dataset comprising synchronized audio, video, and radar recordings collected to evaluate the effectiveness of audio–visual–radar (AVR) fusion in speech-recognition tasks. The dataset contains 800 labeled samples, produced by four fluent but non-native English speakers aged 28–40, each repeating ten phonetically challenging English words twenty times under controlled recording conditions.
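The sample count follows directly from the recording design (4 speakers × 10 words × 20 repetitions = 800). As a minimal sketch, the expected sample index could be enumerated as below. The speaker and word identifiers are hypothetical placeholders, since this record does not list the actual speaker IDs or the word list, and the layout is not necessarily how the deposited files are organised.

```python
from itertools import product

# Hypothetical identifiers: the record gives only the counts,
# not the real speaker IDs or the ten target words.
SPEAKERS = [f"speaker_{i:02d}" for i in range(1, 5)]   # four speakers
WORDS = [f"word_{i:02d}" for i in range(1, 11)]        # ten words
REPETITIONS = range(1, 21)                              # twenty repetitions each


def build_manifest():
    """Return one entry per expected (speaker, word, repetition) sample."""
    return [
        {"speaker": s, "word": w, "repetition": r}
        for s, w, r in product(SPEAKERS, WORDS, REPETITIONS)
    ]


manifest = build_manifest()
print(len(manifest))  # 800
```

A manifest like this can be compared against the files actually present to spot missing or duplicated recordings.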

Audio was recorded at 44.1 kHz/16-bit, video at 1080p/30 fps, and radar data were acquired using the XeThru X4M03 IR-UWB sensor positioned approximately 2 meters from the speaker to capture speech-related micro-Doppler articulatory signatures. The three modalities were automatically segmented and aligned using pretrained machine-learning models to ensure consistent synchronization across streams.
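To illustrate how the stated rates relate to one another, the sketch below maps a video frame index to the audio samples it spans, using only the 44.1 kHz and 30 fps figures given above. It is a simple rate-based index mapping, not the pretrained-model alignment the authors describe. The radar frame rate is not stated in this record, so `radar_fps` is a caller-supplied assumption, and all function names are hypothetical.

```python
AUDIO_RATE_HZ = 44_100  # audio sampling rate stated in the record
VIDEO_FPS = 30          # video frame rate stated in the record


def audio_span_for_video_frame(frame_index, audio_rate=AUDIO_RATE_HZ, fps=VIDEO_FPS):
    """Return the half-open [start, stop) audio sample range covering one video frame."""
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    start = frame_index * audio_rate // fps
    stop = (frame_index + 1) * audio_rate // fps
    return start, stop


def radar_frame_for_video_frame(frame_index, radar_fps, fps=VIDEO_FPS):
    """Return the radar frame index at the start of a video frame.

    radar_fps is an assumption supplied by the caller; the record
    does not state the radar frame rate.
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return frame_index * radar_fps // fps


# Each 30 fps video frame spans exactly 1470 audio samples at 44.1 kHz.
print(audio_span_for_video_frame(0))   # (0, 1470)
print(audio_span_for_video_frame(30))  # (44100, 45570)
```

Integer arithmetic is used throughout so that frame boundaries do not drift through floating-point rounding, and it assumes all three streams share a common start time.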

The purpose of this dataset is to serve as a resource for studying how radar sensing can enhance speech-recognition robustness under noise, occlusion, and other adverse conditions.

All audio–visual recordings contain identifiable human data and will therefore be shared only under controlled access to protect participant privacy.

Funding:

Engineering and Physical Sciences Research Council (EPSRC) [EP/T021020/1]

College / School:

College of Science and Engineering > School of Engineering
College of Science and Engineering > School of Engineering > Biomedical Engineering
College of Science and Engineering > School of Engineering > Electronics and Nanoscale Engineering
College of Science and Engineering > School of Engineering > Systems Power and Energy

Date Deposited:

25 Jun 2026 14:45

Related resources:

https://doi.org/10.5525/gla.researchdata.2109 [Research Data]

URI:

https://researchdata.gla.ac.uk/id/eprint/2111

Additional details

Available Files

Data

2111_code.zip

Visible to: Anyone
File size: 5 kB
License: CC BY 4.0

Read me

readme.docx

Visible to: Anyone
File size: 21 kB
License: CC BY 4.0

Cite this record

Ghadban, N., Hameed, H., Abbasi, Q. H., Imran, M., Cooper, J. and Le Kernec, J. (2025) Code for Synergising Multi-Modal Data: Unveiling Audio-Visual and Radar Insights

University of Glasgow

DOI: 10.5525/gla.researchdata.2111

Retrieved: 2026-07-17
