Studi di archivistica, bibliografia, paleografia

Cracking the Historical Code

From Unstructured Correspondence Corpora to Computational Analysis


Abstract
The chapter addresses a methodological approach to unstructured data and discusses the potential that structured data offers in the field of historical research. The dataset, which initially consists of textual content sourced from digital collections at the Portuguese Overseas Archives in Lisbon, undergoes a preprocessing phase that forms the basis for the extraction of structured data. The authors combine history, social sciences, and computer science to convert the correspondence repository into a machine‑processable form. This transformation is supported by an interdisciplinary strategy in which they weave together elements of effective content management, topic modelling, and social network analysis.
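The last step the abstract describes, recasting letters as structured records that can feed social network analysis, can be illustrated with a minimal sketch. It assumes each letter has already been reduced to a hypothetical `Letter` record (sender, recipient, year). The schema, the placeholder correspondent names, and the whitespace/case normalization rule are illustrative assumptions, not the authors' pipeline or the Portuguese Overseas Archives' data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Letter:
    """One structured record extracted from a letter (hypothetical schema)."""
    sender: str
    recipient: str
    year: int


def normalize(name: str) -> str:
    # Assumed rule: collapse runs of whitespace and case-fold, so that
    # "Governor A" and "governor  a" become the same network node.
    return " ".join(name.split()).casefold()


def build_network(letters: Iterable[Letter],
                  start: Optional[int] = None,
                  end: Optional[int] = None) -> Counter:
    """Map each (sender, recipient) pair to the number of letters exchanged,
    optionally restricted to the years start..end (inclusive)."""
    edges: Counter = Counter()
    for letter in letters:
        if start is not None and letter.year < start:
            continue
        if end is not None and letter.year > end:
            continue
        edges[(normalize(letter.sender), normalize(letter.recipient))] += 1
    return edges


def weighted_degrees(edges: Counter) -> tuple[Counter, Counter]:
    """Weighted out-degree (letters sent) and in-degree (letters received)."""
    out_deg: Counter = Counter()
    in_deg: Counter = Counter()
    for (src, dst), weight in edges.items():
        out_deg[src] += weight
        in_deg[dst] += weight
    return out_deg, in_deg


if __name__ == "__main__":
    # Invented placeholder records, not archival data.
    letters = [
        Letter("Governor A", "Council B", 1780),
        Letter("governor  a", "Council B", 1781),
        Letter("Council B", "Governor A", 1781),
    ]
    edges = build_network(letters)
    print(edges[("governor a", "council b")])  # 2
    out_deg, in_deg = weighted_degrees(edges)
    print(out_deg["governor a"], in_deg["council b"])  # 2 2
```

Counting directed sender-to-recipient pairs keeps the sketch dependency-free; in practice the resulting edge list could be loaded into a dedicated graph library to compute centrality measures.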


Open access | Peer reviewed

Submitted: Oct. 3, 2023 | Accepted: Jan. 18, 2024 | Published: May 22, 2025 | Language: en

Keywords: Digital infrastructure, Colonial Portuguese Empire, Public correspondence, Structured data, Historical dataset

