Summer School “Biblical Argumentation in the Investiture Controversy: Computational Approaches to Biblical Text Analyses Summer School” (Burghausen, July 2025)

How did we proceed?

The work process was divided into several steps, each of which presented different challenges.

  1. Thanks to the secondary literature, selecting and identifying sources that could be used to analyse biblical argumentation proved relatively straightforward. The protagonists of the Investiture Controversy and the literary figures of the time provided more than enough sources.
  2. Searching the selected corpus of sources for biblical passages from the Vulgate proved much more complex and challenging. Ultimately, the Summer School decided to work with the Register of Gregory VII as the base text, as it seemed suitable due to its rich theological argumentation and clear use of the Bible. During the process, the base texts had to be adjusted several times, as the algorithm could not locate all the biblical passages in the Register. This is an important finding for future work: any systematic application of the methods used here urgently requires a revision of the base texts.
  3. Another issue that needs careful consideration in the future is the threshold: the value above which the algorithm's results should be accepted and below which they should no longer be taken into account. According to an initial assessment, this value lies between 0.65 and 0.70, depending on the evaluation round. Results within this range should be reviewed cautiously; above it, the algorithm is remarkably reliable. An initial critical review by the participating experts was useful at this stage, as assessing paraphrased renditions of Bible passages identified by the algorithm required great sensitivity. During the process, it became apparent that the text analysis needed significant further development because of the relatively small amount of text. First, adding further analysis tools proved helpful in filtering out false positives. Second, an analysis using sliding windows of word n-grams was necessary to achieve more reliable recognition (sentence-level word n-grams that start at the beginning of the sentence and are extended one word at a time until they cover the whole sentence).
  4. The next step marked the beginning of the actual church-historical and biblical work. Taking a close look at the context (four sentences before and after the quotation) made it possible to classify the structure of the argument. Based on this analysis, a significant proportion of Bible quotations were found to refer directly to the Investiture Controversy or to the self-understanding of the reform movement.
  5. A final step remains urgently needed. Thanks to the advanced state of digital cataloguing, it is now possible to compare the biblical passages found in theological arguments relating to the Investiture Controversy with their use in contemporary theological literature, such as homilies and commentaries. This would be an important step towards assessing the originality and distinctiveness of the respective arguments, and it would greatly facilitate their classification.
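The mechanics described in steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the code used at the Summer School: the function names, the minimum window length of three words, and the three-way reading of the 0.65 to 0.70 range (discard below, expert review within, accept above) are assumptions made for illustration only.

```python
def growing_prefixes(sentence, min_words=3):
    """Word n-grams that start at the first word and grow one word at a time.

    min_words is a hypothetical lower bound; the report does not state one.
    """
    words = sentence.split()
    if len(words) <= min_words:
        return [" ".join(words)] if words else []
    return [" ".join(words[:k]) for k in range(min_words, len(words) + 1)]


def triage(score, low=0.65, high=0.70):
    """Sort a match into three bands (an assumed reading of step 3)."""
    if score < low:
        return "discard"
    if score <= high:
        return "review"   # borderline range: cautious expert review
    return "accept"


def context_window(sentences, index, k=4):
    """Return the matched sentence with up to k sentences before and after it."""
    start = max(0, index - k)
    return sentences[start:index + k + 1]


if __name__ == "__main__":
    for window in growing_prefixes("in principio creavit Deus caelum et terram"):
        print(window)
    print(triage(0.68))
```

Each growing prefix would be scored against the Vulgate verses separately, so that a short quotation embedded in a long sentence is not drowned out by the rest of the sentence.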

In the end, we decided to work with a preliminary list so that we could assess how well the algorithm performed in terms of recognition. After a good two days of work, we managed to sort out the false positives among the Bible passages in one example and to establish an initial inventory for a sample work.

Computational Framework

For the text reuse detection task, we employed the pretrained LaBSE sentence transformer model, available via HuggingFace (https://huggingface.co/sentence-transformers/LaBSE), together with the sentence_transformers Python library.

We first used the model to encode all verses of the Latin Vulgate. We then applied the same model to encode all sentences of Gregory VII's Register. In this way, we obtained a 768-dimensional vector for each Vulgate verse and each sentence from the Register.
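The encoding and comparison steps might look roughly like the following sketch. The model identifier comes from the URL above, and SentenceTransformer.encode is the library's standard call. Everything else is an assumption: the function names are hypothetical, and the text above does not state which similarity measure was used, so cosine similarity is assumed here. The comparison is written in plain Python for readability; with the full Vulgate, a vectorised matrix product would be the more practical choice.

```python
import math


def embed(texts, model_name="sentence-transformers/LaBSE"):
    """Encode texts with LaBSE; returns one 768-dimensional vector per text."""
    # Imported lazily so the comparison helpers below work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(list(texts)).tolist()


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(sentence_vec, verse_vecs):
    """Return (index, score) of the verse vector most similar to the sentence."""
    scores = [cosine(sentence_vec, v) for v in verse_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical usage (downloads the model on first run):
# verse_vecs = embed(vulgate_verses)        # list of Vulgate verse strings
# sentence_vecs = embed(register_sentences) # list of Register sentence strings
# index, score = best_match(sentence_vecs[0], verse_vecs)
```

The returned score is the kind of value to which a threshold such as the 0.65 to 0.70 range mentioned earlier would be applied.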