Code used for MS2LDA+ data processing
---

1. R scripts
=============

The `R` directory contains the code to extract the MS1-MS2 peaklists from .mzXML/.mzML files.

- **MS1MS2_MatrixGeneration.R** processes mzXML/mzML files and produces a list of MS1 and MS2 peaks. This is the main script to run to convert .mzXML and .mzML data into peaklists in CSV format for MS2LDA+. Be sure to edit the configuration files inside the `config` folder so that they point to the right location of your files.
- **xcmsPeakPicking.R** is an additional script that performs peak picking in XCMS, producing peakML files. Used for differential analysis.
- **mzMatch_process.R** produces the matrix of MS1 peaks across samples. Used for differential analysis.

Requirements:

    install.packages('yaml')
    install.packages('gtools')
    source('https://bioconductor.org/biocLite.R')
    biocLite('xcms')
    biocLite('RMassBank')

2. Python scripts
=============

Once the peaklists have been generated by the R scripts above, we can run LDA. The `Python` directory contains the code needed to do this. Specifically:

- **run_lda.py** is the main script to call. Be sure to adjust the parameters in the script so that they point to the right location of the extracted peaklists.
- Equivalently, **run_lda.ipynb** performs the same function as the script above but in Jupyter notebook format. It also demonstrates some of the analysis plots that can be computed from the MS2LDA+ output.

Requirements:

We recommend Anaconda Python, which comes with all the required packages installed. Otherwise, you can install the following packages yourself in your preferred Python distribution: numpy, scipy, pandas.

3. Stool Sample Analysis
========================

For the analysis of the stool samples, refer to the 'multifile_fecal_2e6.ipynb' Jupyter notebook inside the 'stool_analysis' folder. The notebook shows how mzML parsing and feature extraction are performed entirely in Python before the MS2LDA+ analysis is run.
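
To give a rough idea of what a Python-based mzML parser has to do, the sketch below decodes the contents of a single mzML `<binary>` element into a list of numbers. mzML stores peak arrays as base64 text holding little-endian 32-bit or 64-bit floats, optionally zlib-compressed. This is only an illustration and is not taken from the notebook: the function name `decode_mzml_array` and its parameters are hypothetical, and the notebook may organise this step differently.

```python
import base64
import struct
import zlib


def decode_mzml_array(encoded, precision=64, compressed=False):
    """Decode the text of one mzML <binary> element into a list of floats.

    Hypothetical helper for illustration only (not part of this repository).
    `precision` is the float width in bits (32 or 64) and `compressed`
    says whether the payload was zlib-compressed before base64 encoding.
    In a real file both are declared by cvParam entries next to the array.
    """
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")

    raw = base64.b64decode(encoded)
    if compressed:
        raw = zlib.decompress(raw)

    item_size = precision // 8
    count = len(raw) // item_size
    type_code = "d" if precision == 64 else "f"

    # "<" selects little-endian byte order, as used by mzML.
    fmt = "<%d%s" % (count, type_code)
    return list(struct.unpack(fmt, raw[:count * item_size]))


# Round trip on made-up m/z values: encode them the way an mzML writer
# would (64-bit floats, zlib, base64), then decode them again.
mz_values = [100.5, 200.25, 300.125]
packed = struct.pack("<3d", *mz_values)
encoded = base64.b64encode(zlib.compress(packed)).decode("ascii")
decoded = decode_mzml_array(encoded, precision=64, compressed=True)
```

In a full parser, the same decoding would be applied to both the m/z array and the intensity array of every spectrum, after reading the precision and compression settings from the surrounding XML.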