*** Purpose ***

This document describes the data sets and analysis scripts deposited in the University of Glasgow Enlighten Research Data repository under the following DOI:
http://dx.doi.org/10.5525/gla.researchdata.1287

These items were used for the data analysis presented in the paper "Evaluating the potential of whole-genome sequencing for tracing transmission routes in experimental infections and natural outbreaks of bovine respiratory syncytial virus" published in the journal Veterinary Research.

Links to the published paper and its supplementary information (where most of the detailed methods can be found) are available at the University of Glasgow Enlighten Publications repository, which will in turn be linked from the data set record: http://dx.doi.org/10.5525/gla.researchdata.1287

Paul Johnson, University of Glasgow
2022-11-01

*** Contents ***

What follows is a list of all the scripts, input data files (with the exception of cattle herd latitude and longitude, which are confidential), and outputs used for this analysis. For each analysis script, the script name, inputs and outputs are listed. The purpose of each script is given in the header text of the script.

Script: BRSVtransmission_NGS_diversitypipeline_FINAL_V01.R
Inputs: 
   BRSV_NGS_project_metadata_2021-12-01_redacted.xlsx
   Folder refseqs, unzipped from refseqs.zip
   *.fastq.gz files for each experiment (A, B, C, D, F, G, O): paired-end Illumina MiSeq fastq files, available via https://www.ncbi.nlm.nih.gov/bioproject/PRJNA893434. Accession numbers for these files are also provided in Additional File 1.
Outputs:
   Consensus sequences and diversity data and plots for each sample in each experiment (A, B, C, D, F, G, O), in the folder NGSdata, which is zipped as NGSdata.zip.

Script: BRSVtransmission_NGS_diversity_FINAL_V01.R
Inputs: 
   BRSV_NGS_project_metadata_2021-12-01_redacted.xlsx
   BRSV_gene_positions.csv
   Folder NGSdata, unzipped from NGSdata.zip
   Folder refseqs, unzipped from refseqs.zip
   BRSVtransmission_Paper1_Map_cov20x.pdf
Outputs:
   Figure 2 (map + NJ phylogenetic trees: O.seq.njtree.plus.map.jpg)
   Figure 3 (within-sample diversity: DiversityPlot.pdf)
   Figure 4 (Cumulative Shannon along the genome: CumulativeDiversity.pdf)
   Figure 5 (Comparison of diversity in genes SH+G: GeneDiversityComparison.pdf)
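
Several of the scripts listed here expect the folders NGSdata and refseqs to have been unzipped from NGSdata.zip and refseqs.zip. The following is a minimal Python sketch of that preparation step, not part of the deposited analysis. The function name unpack_archives is hypothetical, and the sketch assumes that each archive already contains its top-level folder, which should be checked after extraction.

```python
import zipfile
from pathlib import Path


def unpack_archives(archive_names, base_dir="."):
    """Extract each named zip archive found in base_dir.

    Returns the list of archive names that were not found, so the caller
    can see which inputs still need to be downloaded.
    """
    base = Path(base_dir)
    missing = []
    for name in archive_names:
        archive = base / name
        if not archive.is_file():
            missing.append(name)
            continue
        with zipfile.ZipFile(archive) as zf:
            # Assumption: the archive contains its own top-level folder
            # (e.g. NGSdata/), so extracting into base_dir recreates it.
            zf.extractall(base)
    return missing


# Example call, using the archive names listed in this document:
# missing = unpack_archives(["NGSdata.zip", "refseqs.zip"])
# print("Archives not found:", missing)
```

If an archive turns out not to contain a top-level folder, its contents would need to be extracted into a folder of the expected name instead.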

Script: BRSVtransmission_MapPaper1_FINAL_V01.R
Inputs:
   BRSV_NGS_project_metadata_2021-12-01_redacted.xlsx
   gadm36_SWE_1_sp.rds (Sweden map data file, not included here, but available via this link: https://www.canvas.umu.se/courses/5899/files/826800?verifier=MUdrvNCpmSgcdwxIDcehGrSyNpSnLu0g7gUleKVu&wrap=1)
Outputs:
   Figure 2A (BRSVtransmission_Paper1_Map_cov20x.pdf)
Note: This script requires latitude and longitude of herd locations, which have been redacted from the input data file due to confidentiality.

Script: BRSVtransmission_NGS_BALvsNS_FINAL_V02.R
Inputs:
   BRSV_NGS_project_metadata_2020-01-29_redacted.xlsx
Outputs:
   Figure 'Additional file 1H' (BALvsNS_fig.pdf)
   LMM analysis of association between coverage and both sampling method (BAL vs NS) and real-time PCR Ct value, exported manually from the R console.

Script: BRSVtransmission_NGS_ErrorThresh_FINAL_V02.R
Inputs:
   Folder ErrorThreshFiguresAndData, unzipped from ErrorThreshFiguresAndData.zip
Outputs:
   Figure 'Additional file 1I' (ErrorThreshFigure.pdf, found inside ErrorThreshFiguresAndData.zip)

Script: https://github.com/pcdjohnson/BB_bottleneck/blob/master/Bottleneck_size_estimation_exact.r
Inputs:
   BRSV_NGS_project_metadata_2021-12-01_redacted.xlsx
   Folder NGSdata, unzipped from NGSdata.zip
Outputs: 
   Data for Figure 6 (G_bottleneck_size.csv and O_bottleneck_size.csv)

Script: BRSVtransmission_NGScorrelation_FINAL_V02.R
Inputs:
   BRSV_NGS_project_metadata_2021-12-01_redacted.xlsx
   G_bottleneck_size.csv (estimated bottleneck sizes experiment G)
   O_bottleneck_size.csv (estimated bottleneck sizes experiment O)
   O_connections.csv (assumed approx. epidemiological distances between O samples)
Outputs:
   Figure 1 (Gtree.pdf)
   Data for Figure 6 and Table 'Additional file 1C' (G_bottleneck_size_outputcorrs.csv and O_bottleneck_size_outputcorrs.csv)
   Figure 6 (G_bottleneck_corr.pdf and O_bottleneck_corr.pdf, merged as G_and_O_bottleneck_corr.jpg)

Script: BRSVtransmission_NGS_badtrip_FINAL_V01.R
Note: This R script prepares inputs for, and processes outputs from, the BADTRIP v1.01 package (https://bitbucket.org/nicofmay/badtrip) for BEAST 2.5.2 (https://github.com/CompEvol/beast2/releases/tag/v2.5.2).
Inputs:
   BRSV_NGS_project_metadata_2021-12-01_redacted.xlsx
   Folder NGSdata, unzipped from NGSdata.zip
Outputs:
   BADTRIP outputs a *.log file and a *.trees file. The *.log file contains the MCMC output, which should be visualised to assess convergence. The *.trees file contains the posterior distribution of the transmission tree. This is visualised by creating a *_tree_out_direct_transmissions.jpg image, from which Figures 'Additional file 1J' and 'Additional file 1K' were made, and from which the transmission probabilities in Table 4 were taken. These files are too large and too numerous to include here.
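
As a numerical complement to visual inspection of the *.log file, the following Python sketch reads a tab-delimited BEAST-style log and estimates the effective sample size (ESS) of one column. It is an illustration only, not the procedure used in the paper. The assumptions are that comment lines start with '#', that the first remaining line is a tab-separated header, and that 10% of samples are discarded as burn-in. The file name badtrip_run.log and the column name posterior in the usage comment are hypothetical. The ESS estimator is a simple one that truncates the autocorrelation sum at the first non-positive lag, so its values may differ from those reported by dedicated tools.

```python
import csv


def read_beast_log(path, burnin_fraction=0.1):
    """Read a tab-delimited MCMC log into {column: [floats]}, dropping burn-in."""
    with open(path, newline="") as fh:
        # Skip blank lines and comment lines (assumed to start with '#').
        lines = [ln for ln in fh if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    rows = list(reader)
    start = int(len(rows) * burnin_fraction)
    columns = {}
    for name in reader.fieldnames or []:
        if not name:
            continue  # a trailing tab in the header yields an empty name
        try:
            columns[name] = [float(r[name]) for r in rows[start:]]
        except (TypeError, ValueError):
            pass  # skip non-numeric or incomplete columns
    return columns


def effective_sample_size(values):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(values)
    if n < 2:
        return float(n)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float("nan")  # a constant chain carries no information
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break  # truncate at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Example call (file and column names are hypothetical):
# cols = read_beast_log("badtrip_run.log")
# print(effective_sample_size(cols["posterior"]))
```

A low ESS relative to the number of retained samples suggests strong autocorrelation and that a longer run may be needed. An ESS above about 200 is a commonly used rule of thumb, but it does not replace inspecting the trace itself.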
