Orthography-based dating and localisation of Middle Dutch charters

Dieter Van Uytvanck

doi:10.23728/b2share.b1092be3cd4844e0bffd7b669521ba3c

Published January 13, 2017 | Version v1

Text Open

In this study we build models for localising and dating Middle Dutch charters. First, we extract character trigrams and use them to train a machine learner (K Nearest Neighbours) and an author verification algorithm (Linguistic Profiling). Both approaches work quite well, especially for the localisation task. We then derive features that capture the orthographic variation between the charters more precisely and use them as input for the same classification algorithms. The results are again good, at least as good as with the trigrams, even though proper nouns were ignored during feature extraction. We conclude that localisation, and to a lesser extent dating, is feasible, and that the orthographic features derived from the charters are an efficient basis for such a classification task. The record contains two files: a PDF with the text of the master's thesis and a .tar.gz archive with all data sets and analysis scripts used.
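The trigram-plus-nearest-neighbour idea described above can be illustrated with a minimal sketch. This is not the thesis implementation: the cosine similarity measure, the space padding, the function names, and the toy strings with their `place_A`/`place_B` labels are all assumptions made for illustration, and the strings are invented rather than taken from real charters.

```python
from collections import Counter
from math import sqrt


def char_trigrams(text):
    """Count overlapping character trigrams in a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Return the majority label among the k training texts most similar to `text`."""
    query = char_trigrams(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, char_trigrams(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy strings with hypothetical location labels (assumption, not real data).
TRAIN = [
    ("wi scepenen van der stede", "place_A"),
    ("dat wi scepenen doen te wetene", "place_A"),
    ("wij schepenen vander stad", "place_B"),
    ("dat wij schepenen doen te weten", "place_B"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "wi scepenen doen te wetene", k=3))
```

The same structure would apply to dating by replacing the location labels with period labels. A real experiment would also need held-out evaluation and a much larger set of charters.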

Files

Files (1.6 GB)

Name	Checksum	PID	Size
ab8d38f2-835b-11e3-b283-005056943408.gz	md5:1f1c9a68bc26574878c3114ef34ab3d3	http://hdl.handle.net/11304/e9d8bf06-361c-4a45-8386-a98cf87f61a1	1.6 GB
c617828a-8357-11e3-8ef2-005056943408.pdf	md5:74ae0376497172e07de55dfdec3b3acc	http://hdl.handle.net/11304/47e19ec5-f22f-4a00-a022-b9c16d5b804d	5.9 MB
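The MD5 checksums listed for the two files can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of` and `verify` are hypothetical, and the sketch assumes the files were saved locally under the names shown in the file list.

```python
import hashlib

# Checksums as listed in this record's file table.
EXPECTED_MD5 = {
    "ab8d38f2-835b-11e3-b283-005056943408.gz": "1f1c9a68bc26574878c3114ef34ab3d3",
    "c617828a-8357-11e3-8ef2-005056943408.pdf": "74ae0376497172e07de55dfdec3b3acc",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of(path) == EXPECTED_MD5[name]
```

Reading in chunks keeps memory use low for the 1.6 GB archive. For example, `verify("ab8d38f2-835b-11e3-b283-005056943408.gz", "ab8d38f2-835b-11e3-b283-005056943408.gz")` would check the archive if it sits in the current directory.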

Additional details

Other: 30
Other: http://hdl.handle.net/11304/31c0d886-b988-11e3-8cd7-14feb57d12b9
B2SHARE Legacy Record ID: b1092be3cd4844e0bffd7b669521ba3c

Language Code: dum
Resource Type: Other
Country/Region: Belgium and Netherlands

	All versions	This version
Views	134	134
Downloads	41	41
Downloaded data volume	23.7 GB	23.7 GB
