Pre-filter words whose diacritic forms are not in the dictionary #15

ruohoruotsi · 2019-07-29T04:56:26Z

Pre-filter words whose non-diacritized word forms are not in the dictionary before asking the model to do automatic diacritic restoration (ADR). This way we get more predictable results and clearer error messages for out-of-vocabulary (OOV) words.

If the model sees a word like elerindodo, validate that this word's diacritic form exists in the dictionary and return an error message if it doesn't! This way, since the model doesn't know about elerindodo, it can just say so, rather than confuse the users by returning the "top probability word" which may be a random thing like aláǹtakùn!
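A minimal sketch of the proposed pre-filter, assuming the dictionary is available as a plain list of diacritized words. The names `strip_diacritics`, `build_lexicon`, `prefilter` and `OOVError` are hypothetical and not part of the project. Stripping relies on Unicode NFD decomposition, which splits tone marks and under-dots into combining characters that can then be dropped, so `aláǹtakùn` maps to `alantakun` and `ọmọ` to `omo`.

```python
import unicodedata


class OOVError(ValueError):
    """Hypothetical error raised when an input word is not in the dictionary."""


def strip_diacritics(word: str) -> str:
    """Drop combining marks (tones, under-dots) after NFD decomposition and lowercase."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def build_lexicon(diacritized_words):
    """Map each non-diacritized form to the set of diacritized forms in the dictionary."""
    lexicon = {}
    for w in diacritized_words:
        nfc = unicodedata.normalize("NFC", w)
        lexicon.setdefault(strip_diacritics(nfc), set()).add(nfc)
    return lexicon


def prefilter(sentence, lexicon):
    """Return the tokens if every one is known; otherwise raise OOVError naming the unknown ones.

    Only after this check passes would the sentence be handed to the ADR model.
    """
    tokens = sentence.split()
    oov = [t for t in tokens if strip_diacritics(t) not in lexicon]
    if oov:
        raise OOVError(f"Out-of-vocabulary word(s): {', '.join(oov)}")
    return tokens


if __name__ == "__main__":
    lexicon = build_lexicon(["aláǹtakùn", "ọmọ", "ilé"])
    print(prefilter("omo ile", lexicon))
    try:
        prefilter("elerindodo", lexicon)
    except OOVError as err:
        print(err)
```

With this check in front of the model, an unknown word such as `elerindodo` produces an explicit error instead of whatever the model ranks as the top-probability output.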


ruohoruotsi · 2019-07-29T18:27:41Z

@Olamyy nicely points out that

a word2vec/sentence2vec model might work really well here. For every entry (word/sentence) a user inputs, try to find the word in the model vocabulary. If it doesn't exist, either raise an error or get the closest word in the vocab. I suppose fastText would work well here since it uses subword (n-gram) sets.
The challenge here might just be the extra step.
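The "closest word in the vocab" idea can be sketched without a trained model by comparing character n-gram sets directly. This is a stand-in for fastText, not fastText itself: it borrows only fastText's idea of padding words with `<` and `>` boundary markers and splitting them into character trigrams, then ranks vocabulary entries by Jaccard overlap. The function names are hypothetical, and the vocabulary is assumed to already contain non-diacritized forms.

```python
def char_ngrams(word: str, n: int = 3) -> set:
    """Character n-grams of the word padded with fastText-style boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def closest_words(word: str, vocab, k: int = 3, min_score: float = 0.0):
    """Return up to k vocabulary entries ranked by Jaccard overlap of character n-grams."""
    target = char_ngrams(word)
    scored = []
    for candidate in vocab:
        grams = char_ngrams(candidate)
        union = target | grams
        if not union:
            continue  # both sets empty; no meaningful score
        score = len(target & grams) / len(union)
        if score > min_score:
            scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:k]]


if __name__ == "__main__":
    vocab = ["alantakun", "omo", "ile", "ilera"]
    print(closest_words("ilee", vocab, k=2))
    print(closest_words("xyz", vocab))
```

An OOV word can then either raise an error (as in the original proposal) or return these candidates as "did you mean" suggestions; a real fastText model would replace the Jaccard score with cosine similarity between subword-based embeddings.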

ruohoruotsi added the bug Something isn't working label Jul 29, 2019

ruohoruotsi self-assigned this Jul 29, 2019
