Pre-filter words whose non-diacritized word forms are not in the dictionary before asking the model to perform automatic diacritic restoration (ADR). This way we get more predictable results and clear error messages for out-of-vocabulary (OOV) words.
If the model sees a word like `elerindodo`, first check that a diacritized form of this word exists in the dictionary, and return an error message if it doesn't. Since the model doesn't know about `elerindodo`, it can simply say so, rather than confuse users by returning the "top probability word", which may be something unrelated like `aláǹtakùn`.
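A minimal sketch of the pre-filter, assuming the dictionary is available as a plain list of diacritized Yorùbá words. All names here (`strip_diacritics`, `build_index`, `prefilter`, `check_before_adr`, `OOVError`) are hypothetical and not part of any existing codebase. Diacritics are stripped with Unicode NFD decomposition, which splits letters like `ọ` or `ǹ` into a base letter plus combining marks that can then be dropped.

```python
import unicodedata


class OOVError(ValueError):
    """Hypothetical error raised when a word has no known diacritized form."""


def strip_diacritics(word: str) -> str:
    """Remove tone marks and under-dots, e.g. 'aláǹtakùn' -> 'alantakun'."""
    # NFD splits each accented letter into its base letter + combining mark(s)
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks (Unicode category 'Mn') and lowercase the rest
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").lower()


def build_index(dictionary_words):
    """Map each undiacritized form to the set of diacritized dictionary forms."""
    index = {}
    for w in dictionary_words:
        index.setdefault(strip_diacritics(w), set()).add(unicodedata.normalize("NFC", w))
    return index


def prefilter(text, index):
    """Return the whitespace-separated words whose undiacritized form is unknown."""
    return [w for w in text.split() if strip_diacritics(w) not in index]


def check_before_adr(text, index):
    """Raise OOVError for unknown words; otherwise pass the text on unchanged."""
    oov = prefilter(text, index)
    if oov:
        raise OOVError(f"Unknown word(s), not sent to the ADR model: {', '.join(oov)}")
    return text


if __name__ == "__main__":
    idx = build_index(["aláǹtakùn", "ọmọ", "ilé"])
    print(prefilter("omo elerindodo ile", idx))  # ['elerindodo']
```

This splits on whitespace only, so punctuation attached to a word would make it look OOV; a real tokenizer would be needed in front of it.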
A word2vec/sentence2vec model might work really well here. For every entry (word or sentence) a user inputs, try to find the word in the model's vocabulary. If it doesn't exist, either raise an error or fall back to the closest word in the vocabulary. I suppose fastText would work well here, since it uses subword (character n-gram) information.

The challenge here might just be the extra step.
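To illustrate the "closest word in the vocab" fallback without pulling in fastText itself, here is a small stand-in sketch: it compares character trigrams (with `<` and `>` boundary markers, similar in spirit to fastText's subwords) using Jaccard similarity. This is not fastText's embedding-based similarity, and the names `char_ngrams`, `closest_word`, and the `min_score` threshold are hypothetical. The threshold matters: without it, the fallback would reintroduce the problem from the issue by always returning *some* vocabulary word.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove tone marks and under-dots so 'ọmọ' and 'omo' compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").lower()


def char_ngrams(word: str, n: int = 3) -> set:
    """Character n-grams of '<word>', e.g. 'omo' -> {'<om', 'omo', 'mo>'}."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def closest_word(word, vocab, n=3, min_score=0.3):
    """Return (best_vocab_word, score), or (None, score) if nothing reaches min_score."""
    grams = char_ngrams(strip_diacritics(word), n)
    best, best_score = None, 0.0
    for candidate in vocab:
        score = jaccard(grams, char_ngrams(strip_diacritics(candidate), n))
        if score > best_score:
            best, best_score = candidate, score
    if best_score < min_score:
        return None, best_score
    return best, best_score


if __name__ == "__main__":
    vocab = ["ọmọ", "ilé", "aláǹtakùn"]
    print(closest_word("alantakunn", vocab))  # close misspelling -> 'aláǹtakùn'
    print(closest_word("elerindodo", vocab))  # no shared trigrams -> (None, 0.0)
```

The linear scan over the vocabulary is the "extra step" in its simplest form; an n-gram inverted index or a trained fastText model would be the natural replacement for a large vocabulary.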