Fix 養精蓄鋭 → Add words that appear in essay but are missing from the code table #31

Open
konzertnr9 opened this issue Mar 6, 2023 · 3 comments

@konzertnr9
Contributor

It should be joeng5 zing1 cuk1 jeoi6

@laubonghaudoi laubonghaudoi added the 詞條整理 New feature or request label Mar 6, 2023
@laubonghaudoi
Member

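The cause of this problem is that the code table does not contain this word while essay does, so a more general fix is to add every word that is in essay but missing from the code table. I will start on this work.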
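As a rough illustration of that general fix, the missing words can be found with a set difference. This is only a sketch under assumptions, not the script actually used here: it assumes essay lines look like `word<TAB>weight`, that the code table is a Rime-style dictionary whose entries follow a YAML header terminated by `...` and look like `word<TAB>jyutping`, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def essay_words(lines: Iterable[str]) -> Set[str]:
    """Collect words from essay-style lines (assumed format: word<TAB>weight)."""
    words = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words.add(line.split("\t")[0])
    return words


def dict_words(lines: Iterable[str]) -> Set[str]:
    """Collect words from code-table lines (assumed format: word<TAB>jyutping).

    Everything up to and including the "..." line is treated as a YAML header.
    """
    words = set()
    in_body = False
    for line in lines:
        line = line.rstrip("\n")
        if not in_body:
            if line.strip() == "...":
                in_body = True
            continue
        if not line.strip() or line.startswith("#"):
            continue
        words.add(line.split("\t")[0])
    return words


def missing_words(essay: Iterable[str], dictionary: Iterable[str]) -> List[str]:
    """Return words present in essay but absent from the code table, sorted."""
    return sorted(essay_words(essay) - dict_words(dictionary))
```

Both functions accept any iterable of lines, so an open file handle can be passed directly. The comparison is an exact string match, so variant characters would count as different words.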

@laubonghaudoi laubonghaudoi self-assigned this Mar 29, 2023
@laubonghaudoi laubonghaudoi changed the title from "Fix 養精蓄鋭" to "Fix 養精蓄鋭 → Add words that appear in essay but are missing from the code table" Mar 29, 2023
@laubonghaudoi
Member

missed.txt
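This is the list I extracted of words that are in essay-cantonese.txt but not in the code table. The next steps are:

  1. Classify all the words and pull out the "commonly used Cantonese words"
  2. Classify the extracted words again to decide whether each belongs in dict, in phrase_fragment, or somewhere else
  3. Add Jyutping
  4. Check whether the characters used are all the ones OpenCC uses
  5. Add them to the upstream word list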
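For the Jyutping step, a cheap shape check can catch obvious slips before anything goes upstream. The sketch below is an assumption-laden illustration, not part of the project's tooling: it only checks that each syllable is lowercase letters followed by a tone digit from 1 to 6 and that there is one syllable per character. It cannot tell whether a reading is actually correct (for example zing1 versus zeng1), and `looks_like_jyutping` is a hypothetical name.

```python
import re

# Loose shape of one Jyutping syllable: lowercase letters plus a tone digit 1-6.
SYLLABLE = re.compile(r"^[a-z]+[1-6]$")


def looks_like_jyutping(word: str, romanization: str) -> bool:
    """Return True if the romanization has one well-shaped syllable per character."""
    syllables = romanization.split()
    if len(syllables) != len(word):
        return False  # syllable count must match character count
    return all(SYLLABLE.match(s) is not None for s in syllables)
```

For instance, `looks_like_jyutping("養精蓄鋭", "joeng5 zing1 cuk1 jeoi6")` passes, while an entry with a missing syllable or a tone digit outside 1 to 6 is flagged for manual review.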

@hfhchan
Contributor

hfhchan commented Jan 1, 2024

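「養精蓄銳,joeng5 zeng1 cuk1 jeoi6」 is already in there, though.
That said, the downstream essay contains a lot of entries that upstream does not have, and many of them are actually wrong; for example, 「張景軒」 was deleted only recently...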
