Preparing languages for natural language generation using Wikidata lexicographical data

Mahir Morshed

doi:10.7557/5.5949

Keywords:

Arctic Knot Conference 2021, Wikidata, Natural language, Lexicographical data

Abstract

In the lead-up to the launch of Abstract Wikipedia, each language needs a sufficient body of linguistic information from which text in that language can be generated. This information must be in place so that different sets of functions, some working with concepts and others turning those concepts into word sequences, can work together to produce natural text in that language. Building up that body of information calls for considering a number of linguistic aspects more thoroughly, sooner rather than later.
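The division of labour described above, with some functions working at the level of concepts and others turning them into word sequences, can be sketched as follows. This is a minimal illustration under stated assumptions, not Abstract Wikipedia's actual function model: the names `Form`, `Lexeme`, `Count` and `render_count_en`, and the feature labels "singular" and "plural", are hypothetical. The lexeme is only loosely modelled on a Wikidata lexeme (a lemma, a lexical category, and forms carrying grammatical features).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """One inflected form, loosely modelled on a Wikidata lexeme form."""
    representation: str
    # Illustrative feature labels, NOT real Wikidata item IDs.
    grammatical_features: frozenset


@dataclass(frozen=True)
class Lexeme:
    lemma: str
    lexical_category: str
    forms: tuple

    def form_with(self, *features: str) -> str:
        """Return the first form whose features include all requested ones."""
        wanted = frozenset(features)
        for form in self.forms:
            if wanted <= form.grammatical_features:
                return form.representation
        raise LookupError(f"no form of {self.lemma!r} has {sorted(wanted)}")


# Concept-level content: language-neutral, says nothing about word forms.
@dataclass(frozen=True)
class Count:
    concept: str
    amount: int


# Hypothetical English lexicon: maps a concept key to a lexeme.
ENGLISH_LEXICON = {
    "book": Lexeme("book", "noun", (
        Form("book", frozenset({"singular"})),
        Form("books", frozenset({"plural"})),
    )),
}


def render_count_en(content: Count, lexicon: dict) -> str:
    """Language-specific step: choose grammatical number, then the form."""
    lexeme = lexicon[content.concept]
    number = "singular" if content.amount == 1 else "plural"
    return f"{content.amount} {lexeme.form_with(number)}"


print(render_count_en(Count("book", 3), ENGLISH_LEXICON))  # 3 books
```

The point of the split is that `Count` can be shared across languages, while each language supplies its own lexicon and rendering function; how much of the inflection logic lives in lexeme forms versus in rendering code is exactly the kind of compositionality question the abstract raises.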

This session will therefore discuss aspects of language planning with respect to Wikidata lexicographical data and natural language generation, including the compositionality and manipulability of lexical units, the breadth and interconnectedness of units of meaning, and the treatment of variation among a language’s lects, broadly construed. The handling of each of these aspects will be presented with special reference to Bengali and the linguistic varieties often grouped with it.
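Variation among lects can be handled in part at the level of individual forms. Wikidata lexeme forms can carry several representations keyed by language code (for example, separate spellings under `en-gb` and `en-us`), so a generator targeting a particular variety needs a rule for which representation to use. The sketch below assumes one such rule: try the requested code, then any explicitly configured fallbacks, then progressively shorter codes. The functions `lookup_chain` and `pick_representation` and the `FALLBACKS` table are hypothetical illustrations, not a Wikidata API.

```python
from typing import Dict, List, Optional


def lookup_chain(lect: str, fallbacks: Dict[str, List[str]]) -> List[str]:
    """Codes to try, in order: the lect itself, its explicit fallbacks,
    then progressively shorter codes ("en-gb" -> "en")."""
    chain = [lect, *fallbacks.get(lect, [])]
    parts = lect.split("-")
    chain += ["-".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    return list(dict.fromkeys(chain))  # drop duplicates, keep order


def pick_representation(representations: Dict[str, str], lect: str,
                        fallbacks: Optional[Dict[str, List[str]]] = None) -> str:
    """Return the representation for the first code in the lookup chain."""
    for code in lookup_chain(lect, fallbacks or {}):
        if code in representations:
            return representations[code]
    raise LookupError(f"no representation for {lect!r} in {sorted(representations)}")


# Illustrative data: one form with two spelling variants.
colour = {"en-gb": "colour", "en-us": "color"}
# Hypothetical fallback table: render "en-ca" with the "en-gb" spelling.
FALLBACKS = {"en-ca": ["en-gb"]}

print(pick_representation(colour, "en-us"))             # color
print(pick_representation(colour, "en-ca", FALLBACKS))  # colour
```

Varieties that have their own language codes, rather than subtags of a parent code, would need entries in the explicit fallback table, since simple truncation cannot relate them; deciding which fallbacks are appropriate is a language-planning question rather than a purely technical one.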