NB: The encoding adopted a small degree of automation through the use of a manually created dictionary used as a reference point for each token within the corpus. This means that errors are possible, particularly in cases where a word could have multiple meanings depending on context. For the purposes for which the dataset was designed, this impact is minimal but should