This dataset is a standardised corpus of 21 manually encoded manuscripts from the CH tradition. The XML encoding enriches the original text of each manuscript with structural and grammatical metadata, providing a stepping stone for both manual and automated examination. It lets users access the corpus efficiently at multiple levels of granularity and facilitates a fuller application of DH methods.
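
To illustrate what access at multiple levels of granularity could look like in practice, the following is a minimal sketch in Python. It does not use the dataset's actual schema: the element names (`manuscript`, `section`, `line`, `w`), the attributes (`id`, `n`, `pos`, `lemma`), and the placeholder tokens are all assumptions made for the example, and would need to be replaced with the names the encoding really uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names, attribute names, and tokens are
# illustrative assumptions, not the dataset's actual schema or content.
SAMPLE = """
<manuscript id="ms-example">
  <section n="1">
    <line n="1">
      <w pos="NOUN" lemma="alpha">alpha</w>
      <w pos="VERB" lemma="beta">betas</w>
    </line>
    <line n="2">
      <w pos="NOUN" lemma="gamma">gammas</w>
    </line>
  </section>
</manuscript>
"""

root = ET.fromstring(SAMPLE)

# Coarse granularity: the whole manuscript as running text.
full_text = " ".join(w.text for w in root.iter("w"))

# Intermediate granularity: text per line, keyed by (section, line) number.
lines = {
    (sec.get("n"), ln.get("n")): " ".join(w.text for w in ln.iter("w"))
    for sec in root.iter("section")
    for ln in sec.iter("line")
}

# Fine granularity: individual words selected by grammatical metadata.
noun_lemmas = [w.get("lemma") for w in root.iter("w") if w.get("pos") == "NOUN"]

print(full_text)
print(lines)
print(noun_lemmas)
```

The same traversal pattern would apply to each manuscript file in turn; only the tag and attribute names depend on the encoding scheme.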