SARA - A Collection of Sensitivity-Aware Relevance Assessments

McKechnie, J. and McDonald, G. (2023) SARA - A Collection of Sensitivity-Aware Relevance Assessments. [Data Collection]

Datacite DOI: 10.5281/zenodo.8006820

Collection description

Presented here is a collection of Sensitivity-Aware Relevance Assessments for the UC Berkeley labelled subset of the Enron Email Collection. The Hearst [1] labelled version of the Enron Email Collection is a subset of the CMU collection containing 1702 emails that were annotated as part of a class project at UC Berkeley. Students in the Natural Language Processing course were tasked with annotating the emails as relevant or not relevant to 53 different categories. As a result, the labelled version of the Enron Email Collection provides a rich taxonomy of labels that can be used for multiple definitions of sensitivity, such as the Purely Personal and Personal but in a Professional Context categories. The categories that the emails are labelled for can be seen in [Table 1](#table-1). The files for the labelled version of the Enron Email Collection are available from the UC Berkeley website.

We deploy a topic modelling approach to identify topical themes in the labelled Enron collection. These themes serve as the basis for our information needs, which are in turn used to gather queries and relevance assessments. The notebook for this step is available here. Two separate crowdsourcing tasks are carried out in the development of SARA. Firstly, query formulations are crowdsourced to represent the information needs. Secondly, relevance assessments are crowdsourced for a pooled set of documents from the labelled Enron collection for each of the information needs.

The SARA Collection of Sensitivity-Aware Relevance Assessments is available through the popular ir_datasets library. More information can be found on the ir_datasets GitHub and website.
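As a rough illustration of access through ir_datasets, the sketch below loads the collection and summarises its relevance assessments. The dataset identifier "sara" is an assumption and should be checked against the ir_datasets documentation. The helper names (load_sara, judgements_per_query, relevant_doc_ids) are hypothetical, and the min_relevance threshold assumes that higher relevance values mean more relevant.

```python
from collections import Counter

# Assumption: the collection is registered in ir_datasets under this identifier.
DATASET_ID = "sara"


def load_sara(dataset_id=DATASET_ID):
    """Load the collection via ir_datasets (third-party: pip install ir_datasets)."""
    import ir_datasets  # imported lazily so the helpers below work without it

    return ir_datasets.load(dataset_id)


def judgements_per_query(qrels):
    """Count how many relevance assessments exist for each query."""
    counts = Counter()
    for qrel in qrels:
        counts[qrel.query_id] += 1
    return counts


def relevant_doc_ids(qrels, query_id, min_relevance=1):
    """Return the ids of documents judged at or above min_relevance for one query.

    min_relevance=1 is an illustrative threshold, not a value taken from the
    collection's documentation.
    """
    return {
        qrel.doc_id
        for qrel in qrels
        if qrel.query_id == query_id and qrel.relevance >= min_relevance
    }
```

With ir_datasets installed, something like `dataset = load_sara()` followed by `judgements_per_query(dataset.qrels_iter())` would give the number of assessments per information need, and `dataset.queries_iter()` would iterate over the crowdsourced query formulations. The helpers only rely on the `query_id`, `doc_id`, and `relevance` fields that ir_datasets qrels expose.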

Keywords: Test collection, Sensitivity, Relevance, Information retrieval
College / School: College of Science and Engineering > School of Computing Science
Date Deposited: 26 Aug 2024 14:33
URI: https://researchdata.gla.ac.uk/id/eprint/1722

Available Files

There are no files for this dataset available to download.

McKechnie, J. and McDonald, G. (2023); SARA - A Collection of Sensitivity-Aware Relevance Assessments

Zenodo

DOI: 10.5281/zenodo.8006820

Retrieved: 2024-12-22