Troubles on the Border
Marking Japanese Case-Particle Boundaries in Grammatical Annotation
Abstract
This paper aims to determine the extent and effects of inconsistent case-particle boundary marking in the grammatical annotation of Japanese. It focuses on establishing what constitutes ‘inconsistent boundary marking’, how it is dealt with in informational terms, what effect it has on communication, and why it should be avoided. To this end, I first build a typology of the tokenization strategies used in the grammatical annotation of Japanese. I then identify several forms of inconsistent boundary marking and, more generally, of poor grammatical annotation, and discuss them according to the types of inconsistency and their different epistemic effects.
Keywords: Tokenization • Japanese • Boundary symbols • Case-particles • Script conversion • Grammatical annotation