Typos and Other Errors in Underlying Text

Sometimes the text underlying a UD treebank does not conform to canonical spelling or other grammatical rules of the language. In most situations it is desirable to preserve the error, because taggers and parsers that learn their models from the data should also learn how to deal with noisy input. On the other hand, it is also desirable to mark such places as errors and to show the correct spelling, so that an application can hide bad sentences or present their corrected version when necessary.

The recommendations on this page are designed with sporadic errors in mind. Technically they could also be applied to learner corpora, which are full of errors. However, learner corpora usually require more thought, and the main question is: do we want to guess what the author would have written if they knew the language better, or do we want to approximate "the grammar in their head," which is probably a mixture of the intended language and a language they know better? Mechanisms similar to typo handling could also be used to annotate historical corpora with historical spelling; see below for more details.

The easiest type of error is a simple typo in a single word, especially if the result is a non-word. (If the result is another word of the language, e.g., if one writes too instead of two in English, then we must decide that the author really wanted to say something else, and that may not always be obvious.) The FORM field and the text attribute at the beginning of the sentence should always contain the form that actually occurred in the original text. LEMMA, on the other hand, should use normalized spelling: if the text says kats instead of cats, the lemma will be cat, not kat. The morphological features should then include the feature Typo=Yes, which marks the typo. This is important because it ensures that there is a unique mapping from lemma + part-of-speech tag + morphological features to the correct word form. Without Typo=Yes, one could infer from the corpus that the correct plural form of the English noun cat is kats.
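The argument about a unique mapping can be checked mechanically. The sketch below reads a small, hand-written CoNLL-U sample (the two sentences and the helper names `form_table`, `ambiguous` and `typo_tokens` are hypothetical, not part of any UD tool). It groups word forms by (lemma, UPOS, FEATS) and reports any key that maps to more than one form. With Typo=Yes kept in FEATS, kats and cats fall under different keys; once Typo is dropped, both end up under the same key, which is exactly the false inference described above.

```python
from collections import defaultdict

# Hypothetical CoNLL-U sample. In the first sentence, FORM and "# text" keep
# the misspelled "kats", LEMMA is the normalized "cat", and FEATS has Typo=Yes.
SAMPLE = """\
# text = I like kats.
1\tI\tI\tPRON\t_\tCase=Nom|Number=Sing|Person=1|PronType=Prs\t2\tnsubj\t_\t_
2\tlike\tlike\tVERB\t_\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_
3\tkats\tcat\tNOUN\t_\tNumber=Plur|Typo=Yes\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# text = Two cats sleep.
1\tTwo\ttwo\tNUM\t_\tNumType=Card\t2\tnummod\t_\t_
2\tcats\tcat\tNOUN\t_\tNumber=Plur\t3\tnsubj\t_\t_
3\tsleep\tsleep\tVERB\t_\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def parse_feats(feats):
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def read_tokens(conllu):
    """Yield (form, lemma, upos, feats) for every ordinary word line."""
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comments such as "# text"
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        yield cols[1], cols[2], cols[3], parse_feats(cols[5])


def form_table(conllu, drop_typo=False):
    """Map (lemma, upos, sorted feats) to the set of observed forms.

    drop_typo=True simulates an annotation that omits the Typo feature.
    """
    table = defaultdict(set)
    for form, lemma, upos, feats in read_tokens(conllu):
        if drop_typo:
            feats = {k: v for k, v in feats.items() if k != "Typo"}
        table[(lemma, upos, tuple(sorted(feats.items())))].add(form)
    return dict(table)


def ambiguous(table):
    """Return only the keys that map to more than one form (sorted)."""
    return {key: sorted(forms) for key, forms in table.items() if len(forms) > 1}


def typo_tokens(conllu):
    """List (form, lemma) for words marked Typo=Yes, e.g. to flag or hide them."""
    return [(form, lemma) for form, lemma, _, feats in read_tokens(conllu)
            if feats.get("Typo") == "Yes"]


if __name__ == "__main__":
    print(ambiguous(form_table(SAMPLE)))  # {}
    print(ambiguous(form_table(SAMPLE, drop_typo=True)))
    # {('cat', 'NOUN', (('Number', 'Plur'),)): ['cats', 'kats']}
    print(typo_tokens(SAMPLE))  # [('kats', 'cat')]
```

Keeping Typo inside the lookup key, rather than filtering typo tokens out, mirrors the recommendation that the error stays in the data but is explicitly marked.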