Dataset Augmentation in Papyrology with Generative Models: A Study of Synthetic Ancient Greek Character Images / Swindall, Matthew I.; Player, Timothy; Keener, Ben; Williams, Alex C.; Brusuelas, James H.; Nicolardi, Federica; D'Angelo, Marzia; Vergara, Claudio; McOsker, Michael; Wallin, John F. (2022), pp. 4973-4979. [10.24963/ijcai.2022/689]

Dataset Augmentation in Papyrology with Generative Models: A Study of Synthetic Ancient Greek Character Images

Federica Nicolardi; Marzia D'Angelo
2022

Abstract

Character recognition models rely substantially on image datasets that maintain a balance of class samples. However, achieving class balance is particularly challenging in ancient manuscript contexts, where character instances may be severely limited. In this paper, we present findings from a study that assesses the efficacy of using synthetically generated character instances to augment an existing dataset of ancient Greek character images for use in machine learning models. We complement our model exploration by engaging professional papyrologists to better understand the practical opportunities afforded by synthetic instances. Our results suggest that synthetic instances improve model performance for limited character classes and may have unexplored effects on character classes more generally. We also find that trained papyrologists are unable to distinguish between synthetic and non-synthetic images, and that they regard synthetic instances as valuable assets for professional and educational contexts. We conclude by discussing the practical implications of our research.
Files in this item:
File: 2022_Swindall et alii Nicolardi JCAI-.pdf (not available)
Type: Publisher's version (PDF)
License: Not specified
Size: 2.45 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/895981