CLIN 2018 Shared Task: Spelling Correction

Introduction

This repository harbors the scripts for handling the data that is part of the CLIN28 shared task on spelling correction.

Automatic spell checking and correction has been a subject of research for decades. Although state-of-the-art spell checkers perform reasonably well for everyday applications, reaching high accuracy remains a challenging task.

This shared task focuses on the detection and correction of spelling errors in Dutch Wikipedia texts. Wikipedia articles aim to be standard-Dutch texts, which may contain jargon. In particular, this task addresses the detection and correction of the types of spelling errors listed in the next section. Submitted spelling correctors will be evaluated for detection and correction of these, and only these, types of errors. The spelling errors do not have to be categorized into the listed categories, only detected and corrected. In case of officially accepted spelling variation or doubt about the correct spelling, all correct variants are accepted. The corrections are evaluated in accordance with the Woordenlijst Nederlandse Taal and the Leidraad.

Error categories

In parentheses are the class IDs for the error categories; this is how they should always be referred to in the data. This is in line with the task's FoLiA Set Definition.

- real-word confusions (confusion): a word is confused with a near neighbor (confusion with non-native spelling, homophony, grammatical errors, et cetera)
- split errors (spliterror): compound words which are incorrectly separated
- runon errors (runonerror): incorrect concatenation of words
- missing words (missingword): the sentence is ungrammatical due to missing elements, for example: samen met vrouw die → samen met de vrouw die
- redundant words (redundantword): the sentence is ungrammatical due to redundant elements
- missing punctuation (missingpunctuation): missing diacritical symbols and hyphenation marks (other cases of missing punctuation are excluded from the task)
- redundant punctuation (redundantpunctuation): redundant diacritical symbols and hyphenation marks (other cases of redundant punctuation are excluded from the task)
- capitalisation errors (capitalizationerror): incorrect use of capital letters, for example: Minister van Onderwijs → minister van Onderwijs
- archaic spelling (archaic): outdated spelling
- non-word errors (nonworderror): words that do not exist in Dutch

Data

We initially deliver three annotated documents for validation purposes. A validation set consisting of 50 Wikipedia articles will follow before the end of October. The documents may contain zero, one, or more spelling errors. The validation set contains all of the spelling error categories defined for this task. In December, a full test set will be published in the same format.

We deliver the trial set, the test set, and eventually the gold-standard reference in two formats: FoLiA and JSON. The JSON representation is automatically derived from the FoLiA documents and acts as a simplified format for this task to make it more accessible and not place an unnecessarily ... It can act as input to your system because it contains all vital information; however, it is not as rich as the original FoLiA document.
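Because the class IDs must always be written exactly as given in the error category list, it can help to check labels against a fixed set before producing output. The following is a minimal sketch in Python: the ten IDs are copied from the list, while the names `ERROR_CLASS_IDS`, `is_valid_class_id`, and `unknown_class_ids` are hypothetical helpers introduced here for illustration and are not part of the task's scripts.

```python
# Class IDs copied from the error category list of this task.
# The constant and function names are illustrative assumptions,
# not part of the official shared-task tooling.
ERROR_CLASS_IDS = frozenset({
    "nonworderror",
    "archaic",
    "capitalizationerror",
    "redundantpunctuation",
    "missingpunctuation",
    "redundantword",
    "missingword",
    "runonerror",
    "spliterror",
    "confusion",
})


def is_valid_class_id(class_id: str) -> bool:
    """Return True if class_id is one of the task's error class IDs."""
    return class_id in ERROR_CLASS_IDS


def unknown_class_ids(labels):
    """Return the sorted, de-duplicated labels that are not valid class IDs."""
    return sorted({label for label in labels if label not in ERROR_CLASS_IDS})
```

Note that the class ID is spelled `capitalizationerror` (with a "z") even though the category is described as "capitalisation errors", so a check like `is_valid_class_id("capitalisationerror")` would fail. Categorizing errors is optional in this task, so such a check only matters for systems that choose to emit class labels.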