Jyutping Improvement #4

Open
hockyy opened this issue Jul 19, 2024 · 7 comments · May be fixed by #5

Comments

@hockyy

hockyy commented Jul 19, 2024

I don't know how you source your Jyutping data, but words.hk publishes these lists:

https://words.hk/faiman/analysis/wordslist.json
https://words.hk/faiman/analysis/charlist.json

If you haven't used them yet, I think they're worth a try. I'm too lazy to write a new library, so I'll use your to-jyutping.

If you want to update the dictionary, you can parse all the words from there. For the tokenizer, we can use jieba:

https://github.com/hockyy/jieba-cantonese

I've made a script that auto-generates a jieba user dictionary for tokenization, so Jyutping can be looked up per token, which should give better results. If a token has no entry, fall back to per-character Jyutping.
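To make the idea concrete, here is a minimal sketch of the token-first lookup with a per-character fallback. Everything in it is an assumption for illustration: the two dictionaries are toy data (not parsed from the words.hk files, whose JSON layout is not assumed here), `tokenize` is a simple greedy longest-match stand-in for jieba, and `annotate` is a hypothetical helper, not part of to-jyutping.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Toy data for illustration only (hypothetical, hand-written entries).
WORD_JYUTPING: Dict[str, str] = {
    "香港": "hoeng1 gong2",
    "銀行": "ngan4 hong4",
}
CHAR_JYUTPING: Dict[str, str] = {
    "香": "hoeng1",
    "港": "gong2",
    "銀": "ngan4",
    "行": "haang4",
    "人": "jan4",
}


def tokenize(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy longest-match segmentation, standing in for a real tokenizer."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def annotate(tokens: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Look up each token as a word; fall back to per-character readings."""
    result: List[Tuple[str, Optional[str]]] = []
    for token in tokens:
        if token in WORD_JYUTPING:
            result.append((token, WORD_JYUTPING[token]))
        else:
            # No word-level entry: annotate character by character.
            # Characters missing from the character table map to None.
            for ch in token:
                result.append((ch, CHAR_JYUTPING.get(ch)))
    return result


if __name__ == "__main__":
    print(annotate(tokenize("香港人行", set(WORD_JYUTPING))))
```

The point of the word-level lookup is that a character such as 行 can get its in-word reading (as in 銀行) instead of a single default reading, while tokens the word dictionary does not cover still receive a best-effort per-character annotation.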

@hockyy
Author

hockyy commented Jul 19, 2024

Let me know if you need any help.

I'm currently developing this project https://github.com/hockyy/miteiru

@laubonghaudoi
Member

@graphemecluster As far as I know, the words.hk (粵典) data has been in use since early on, right? And the current update mainly uses Jon's font data?

@graphemecluster
Member

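Right now we only use Jon's data, but it is still definitely more accurate than jieba segmentation.
@chaaklau Do you think your words.hk word list would be useful for Jyutping annotation?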

@graphemecluster
Member

@hockyy The accuracy should reach more than 99% since our latest updates (JS/TS version 2.0.0 / Python version 0.3.0) a few days ago.

@hockyy
Author

hockyy commented Jul 19, 2024

Ack, OK, thanks for the info.

@hockyy
Author

hockyy commented Jul 19, 2024

image

btw, this can't be imported

@hockyy
Author

hockyy commented Jul 19, 2024

I'll debug it tomorrow, I'm really sleepy 😪


3 participants