# Dataset 
This is the dataset and simulation code used for the paper
"Efficient human-machine control with asymmetric marginal reliability input devices" by John Williamson, Melissa Quek, Iulia Popescu, Andrew Ramsay and Roderick Murray-Smith.

The dataset is a collection of measurements of human users operating a simulated noisy binary switch to select targets with a feedback error-correcting code. It records the state of the experimental software during each experiment.

## Human-in-the-loop experiments
The data from the human-in-the-loop experiments is held in `data/`. The original raw data is in `data/raw`, in a custom JSON-based format. The scripts in `src/preprocess` convert it into a collection of CSV files with named columns. The converted CSV files are also provided in `data/processed` for convenience.
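To see which named columns each processed file provides, the header rows can be read directly. This is a minimal sketch using only the standard library. The helper name `csv_headers` is hypothetical, and it assumes that the first row of each CSV file holds the column names.

```python
import csv
from pathlib import Path


def csv_headers(processed_dir):
    """Map each CSV file name in processed_dir to its list of column names."""
    headers = {}
    for path in sorted(Path(processed_dir).glob("*.csv")):
        with path.open(newline="") as f:
            # Assumption: the first row of each processed file is the header.
            headers[path.name] = next(csv.reader(f), [])
    return headers
```

Calling `csv_headers("data/processed")` from the top-level directory should then return one entry per CSV file.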

## Simulations and visualisation
The code to generate the simulated data and to regenerate all of the figures used in the paper from the original data is provided in `src/visualization`.

## File structure

The files are laid out as follows:

        data/                   # Data from human-in-the-loop experiments
            raw/                # The data, as captured during the trial, in a custom JSON-based format
                [id_n]/         # Data for one participant/condition combination (e.g. bjp_0 is participant bjp, condition 0)
                    trial.log   # The data from the experimental trial

            processed/          # Post-processed data, as a set of CSV files generated from raw/ by the src/preprocess/build_features.py script
                questionnaire_processed.csv # Questionnaire results
                trials.csv                  # Log of state for each entire condition
                targets.csv                 # Log of state after each target acquired
                decisions.csv               # Log of state after each keypress
                trial_configs.csv           # The configuration of the experimental software in each condition

        src/
            preprocess/         # Scripts to convert the raw data into the processed data
            visualization/      # Scripts to generate *all* of the plots in the paper

Note: questionnaire data in `questionnaire_processed.csv` was collected but not used in the publication.
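The directory names under `data/raw` encode the participant and condition, as in the `bjp_0` example above. A minimal sketch of splitting such a name is shown here. The function name `split_run_id` is hypothetical, and it assumes that the condition is always a non-negative integer following the last underscore.

```python
def split_run_id(dirname):
    """Split a raw-data directory name such as 'bjp_0' into (participant, condition)."""
    # Assumption: the condition number follows the last underscore.
    participant, _, condition = dirname.rpartition("_")
    if not participant or not condition.isdigit():
        raise ValueError(f"unexpected directory name: {dirname!r}")
    return participant, int(condition)
```

For the example in the listing, `split_run_id("bjp_0")` gives participant `"bjp"` and condition `0`.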

## Running scripts

### Requirements
Python and the following packages are required to recreate the visualisations and/or reprocess the data from the raw format to CSV:
* python 3.6+
* numpy
* scipy
* pandas
* matplotlib
* papermill
* pyx
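Before starting a long run, it may help to confirm that these packages can be found. The following sketch reports any that are missing. It assumes that each package is imported under the name listed above, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Assumption: the import names match the package names listed above.
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "papermill", "pyx"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found by the current Python interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_packages()` suggests that all of the listed packages are installed for the interpreter in use.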

### Regenerating processed data
To re-generate the processed data:

    cd src/preprocess
    python build_features.py

This will recreate all of the CSV files in `data/processed/`.

### Recreating visualisations

    cd src/visualization

To re-run all simulations and generate all figures:

    python visualize.py

WARNING: This will take several hours. You can also regenerate subsets of the figures:

* Just the experimental figures: `python visualize_experimental.py`
* Just the illustrations: `python visualize_examples.py`
* Just the simulations (SLOW!): `python visualize_simulation.py`