Data sets used for MS2LDA+ data processing

Data sets used to test and validate MS2LDA+ were produced as follows:

Samples:

Urines: Urine samples from anonymized human volunteers were taken from a clinical sample set in the Glasgow Polyomics archive. These samples were obtained as part of a trial for which ethical approval was applied for through the Multi-Centre Research and Ethics Committee (MREC) and granted by the Scottish MREC (MREC N° 06/MRE00/106). Informed consent was obtained from all individual study participants. Spot urine samples were obtained from the cohort of elderly patients upon their first admission to the clinic. A subset different from that in [13] was chosen: urine extracts of 22 patients were selected on the following criteria: a diagnosis of stroke, treatment with a variety of drugs including a number of antihypertensives, and availability of the sample extract in the Glasgow Polyomics archive. The ages of the selected subjects ranged from 52 to 85 years; 13 were male and 9 female.

Beers: 10 mL samples of 18 different beers were collected from bottles over a period of 5 months and frozen immediately after sampling. One beer was sampled twice from different bottles, giving 19 beer samples in total.
The following list specifies the beers:
Beer 1 – Homebrew, a German wheat beer (Paul Simon, UK)
Beer 2 – Glide Ale 4.6% (Jaw Brewery)
Beer 3 – 7 Giraffes Extraordinary Ale 5.1% (Williams Brewery, UK)
Beer 4 – Black Sheep Ale 4.4% (Black Sheep Brewery, UK)
Beer 5 – Guinness original 4.2% (Guinness Brewery Dublin, Ireland)
Beer 6 – Citra IPA 4.9% (Cambridgeshire, Mark&Spencer’s beer, UK)
Beer 7 – Black Wolf Rok IPA 4.0% (Black Wolf Brewery, UK)
Beer 8 – Innis&Gun Toasted Oak IPA 5.6% (Innis and Gun Brewery, UK)
Beer 9 – Duvel Triple Hops 2015 9% (Duvel Moortgat Brewery, Belgium)
Beer 10 – Bierkenners Amstel 5% (Amstel Brewery, NL)
Beer 11 – Ceasur Augustus Lager IPA 4.1% (Williams Brewery, UK)
Beer 12 – La Chouffe Blond 8% (Brasserie d'Achouffe, Belgium)
Beer 13 – Duvel Belgisch Blond 8.5% (Duvel Moortgat Brewery, Belgium)
Beer 14 – 7 Giraffes Extraordinary Ale 5.1% (Williams Brewery, UK); a different bottle from Beer 3
Beer 15 – Hobgoblin Traditional Craft Ale 5.2% (Wychwood Brewery, UK)
Beer 16 – Bad Red IPA Evil Twin 6.8% (Heretic Brewery, USA)
Beer 17 – Sierra Nevada Torpado Extra IPA 7.2% (Sierra Nevada Brewing Co., USA)
Beer 18 – Homebrew, a hoppy IPA flavoured with orange peel (Paul Simon, UK)
Beer 19 – Cornish IPA 5% (Cambridgeshire, Mark&Spencer’s beer, UK)

Stool samples: Stool samples originated from two children with active Crohn’s disease (CD; aged 9.2 and 12.9 years) who received disease induction treatment with exclusive enteral nutrition (EEN). Both children entered clinical remission, and their faecal calprotectin, a marker of colonic inflammation, decreased significantly at the end of their treatment. In total, five serial stool samples were collected per patient and a single sample from each of two healthy controls (aged 10.7 and 11.2 years). From the children with CD, a first sample was collected before EEN, three samples were collected during EEN (at ~15, 30 and 56 days), and a final sample was collected when the patients had returned to their habitual diet (~60 days after EEN cessation).
Stool samples were collected within two hours of defaecation and immediately homogenised by mechanical kneading, and aliquots were stored at −80 °C until further analysis. Carers and participants provided written informed consent and the study was approved by the local research ethics committee (Reference Number: 05/S0708/66).

Methods used to produce the raw data files:

Sample preparation: A general metabolome extraction procedure was performed: (i) 5 µL urine was extracted in 200 µL chloroform/methanol/water (1:3:1) at 4 °C; (ii) the mixture was vortexed for 5 min at 4 °C; (iii) it was centrifuged for 3 min (13,000 g) at 4 °C. The resulting supernatant was stored at −80 °C until analysis. A pooled aliquot of the 22 selected urine samples was prepared prior to the LC–MS runs with data-dependent acquisition (DDA) applying higher-energy collisional dissociation (HCD). The same procedure was followed for the 19 beer samples. To create faecal extracts, the stool samples were freeze-dried and 5 mg of lyophilised faecal material was extracted in 200 µL chloroform/methanol/water (1:3:1) at 4 °C, followed by homogenisation in a FastPrep-24 homogeniser for 60 seconds at stroke setting 5, after which the same procedure as for urine was followed.

Analytical platform: A Thermo Scientific Ultimate 3000 RSLCnano liquid chromatography system (Thermo Scientific, CA, USA) was used. The system was coupled to a Thermo Scientific Q-Exactive Orbitrap mass spectrometer equipped with a HESI II interface (Thermo Scientific, Hemel Hempstead, UK). Thermo Xcalibur Tune software (version 2.5) was used for instrument control and data acquisition.

LC settings: The HILIC separation was performed with a SeQuant ZIC-pHILIC column (150 × 4.6 mm, 5 µm) equipped with the corresponding pre-column (Merck KGaA, Darmstadt, Germany).
A linear biphasic LC gradient was run from 80% B to 20% B over 15 min, followed by a 2-min wash with 5% B and a 7-min re-equilibration with 80% B, where solvent B is acetonitrile and solvent A is 20 mM ammonium carbonate in water. The flow rate was 300 μL/min, the column temperature was maintained at 25 °C, the injection volume was 10 μL, and samples were maintained at 4 °C in the autosampler.

MS and MS/MS settings: To generate the separate-mode fragmentation files, each duty cycle consisted of a full scan in positive-ionization mode followed by a TopN data-dependent MS/MS (MS2) fragmentation event targeting the 10 most abundant ion species not on the dynamic exclusion list. MS/MS fragmentation spectra were acquired using stepped higher-energy collisional dissociation, combining normalized collision energies of 25.2, 60.0, and 94.8 in one MS2 scan. In full-scan mode, the duty cycle consisted of two full-scan events alternating between positive and negative ionization modes.
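
The TopN selection with dynamic exclusion described above can be illustrated with a minimal Python sketch. It is not the instrument's acquisition logic: the names `DynamicExclusion` and `select_top_n` are hypothetical, and the exclusion duration (15 s) and m/z tolerance (10 ppm) are placeholder assumptions, since neither value is stated in the settings above. Only N = 10 is taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    """Hypothetical dynamic exclusion list; not the vendor implementation."""
    duration_s: float = 15.0   # assumed value, not given in the methods
    tol_ppm: float = 10.0      # assumed value, not given in the methods
    entries: list = field(default_factory=list)  # (mz, expiry_time_s) pairs

    def is_excluded(self, mz: float, now: float) -> bool:
        # Drop expired entries, then test mz against the remaining ones.
        self.entries = [(m, t) for m, t in self.entries if t > now]
        return any(abs(mz - m) / m * 1e6 <= self.tol_ppm
                   for m, _ in self.entries)

    def add(self, mz: float, now: float) -> None:
        # Exclude this m/z until now + duration_s.
        self.entries.append((mz, now + self.duration_s))


def select_top_n(peaks, exclusion, now, n=10):
    """Pick the n most intense (mz, intensity) peaks from one full scan
    that are not currently excluded, and exclude the ones picked."""
    candidates = [p for p in peaks if not exclusion.is_excluded(p[0], now)]
    chosen = sorted(candidates, key=lambda p: p[1], reverse=True)[:n]
    for mz, _ in chosen:
        exclusion.add(mz, now)
    return chosen
```

Calling `select_top_n` once per full scan, with `now` set to the scan time in seconds, returns the precursors that would be fragmented in that duty cycle under these assumptions; ions picked in one cycle are skipped in later cycles until their exclusion window expires, which lets less abundant ions be selected.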