Studi di archivistica, bibliografia, paleografia

Cracking the Historical Code

From Unstructured Correspondence Corpora to Computational Analysis

Abstract

The chapter addresses a methodological approach to unstructured data and discusses the potential that structured data offers in the field of historical research. The dataset, which initially consists of textual content sourced from digital collections at the Portuguese Overseas Archives in Lisbon, undergoes a preprocessing phase that forms the basis for the extraction of structured data. The authors combine history, social sciences, and computer science to convert the correspondence repository into a machine‑processable form. This transformation is supported by an interdisciplinary strategy in which they weave together elements of effective content management, topic modelling, and social network analysis.


Open access | Peer reviewed

Submitted: 03 October 2023 | Accepted: 18 January 2024 | Published: 22 May 2025 | Language: en

Keywords: Digital infrastructure | Colonial Portuguese Empire | Public correspondence | Structured data | Historical dataset

