Proposal: Per-extension language preferences #641

Open
Wants to merge 11 commits into base: main

hanguokai
Member

This proposal has been discussed at length in #258 and is supported by all browsers. This is the formal version of the proposal, and it follows the latest proposal template.

Everyone is welcome to review this proposal. I suggest focusing on the main issues first, then minor issues, and finally textual typos.

This is the first version of this proposal.

xPaw commented Jun 18, 2024

Looks good to me, although I would like this proposal to also cover a getMessage overload (or separate method) to get a message in a specific language.

Member

@oliverdunk oliverdunk left a comment

Thanks Jackie! I just left some initial thoughts :)

* Get all languages that supported by this extension.
* return a Promise resolved with an array of language tags.
*/
i18n.getAllLanguages(): Promise<string[]>
Member

I'm not sure if this one makes sense, given that extensions should already know the languages they support. I think this gets tricky in certain situations (partial strings, parent language tags) so I wonder if we should leave this out. I see that having it provides some convenience but I'm not sure it's essential for the MVP.

Member Author

extensions should already know the languages they support

Yes. It is just a convenience method, and I think it is easy to implement.

in certain situations (partial strings, parent language tags)

Languages with regional subtags are not a problem for users. For example, a language selection menu can include French, French (Canada), French (Belgium) and French (France). This method should return all of these languages if they are present.

As for partial strings, the platform does not need to consider them; from its standpoint, all these languages are assumed to be supported.

Contributor

@oliverdunk That is to say, the browser by definition supports all the language tags specified in the extension. I suggest we change the method name to i18n.getAvailableLanguages, so that the browser returns all the languages the extension can pass to setCurrentLanguage.

It is not just a matter of convenience. For example, for the Whale Store I have to exclude language tags with script variations (zh-Hant, sr-Latn), so those are removed from _locales. Having to specify a list of all supported languages for each browser and store, in every area of the extension where it is used, is not as straightforward as it might sound.

It's important to know the difference between a locale and a localization here. There can be many locales supported by a single localization. This API seems to want to return the list of available localizations, but users typically need to choose from the list of available locales. The locale can be used to do internationalized formatting/processing (such as in MessageFormat or using Intl), which provides a better-adapted, richer localized experience than merely translating the static messages.

For example, many applications come in just two varieties of English (US vs. UK/International English), but support many locales. The language negotiation/resource fallback (such as used by getMessage) takes care of filling in the localized strings for any requested locale (including using the default language when the requested locale has no available localization), but you still want the locale (in most cases) to provide processing/formatting. Note that many applications also tailor the fallback, so that, for example, the es-419 (Spanish, Latin America) localization serves many seemingly unrelated (from a tag perspective) Spanish locales (es-CO, es-CR, es-AR, etc.).

Member Author

There can be many locales supported by a single localization. This API seems to want to return the list of available localizations

Yes, this proposal and the current browser.i18n.getMessage() mechanism (MDN doc, Chrome doc) mainly focus on localization (language translations). In other words, they focus on languages, not all possible regions.

Regarding expanding locales, I think this may be an area that needs to be improved in the future, or be expanded by developers themselves.

Regarding expanding locales, I think this may be an area that needs to be improved in the future, or expand by developers themselves.

I agree about alternate fallback paths (although Intl has a proposal for this and it is already the way that CLDR data, and thus Intl, works for some locale-based "best match" selection vs. BCP47 Lookup), but I think that this document isn't putting enough thought into the separation of locales and localizations/translations. Trying to use the same mechanism for both will disappoint users. Trivial examples with only a couple of languages don't match the needs of those developers who need to support a fairly large set of locales.

getMessage tries to use a BCP47 Lookup type of fallback to select the correct message "like a resource bundle" (cf. Java or GNU gettext for examples of other resource bundle systems). Resource selection often has a dual fallback capability (the message files and the specific keys within each message file).

The resource files want to use the least specific locale as their identifier possible (the better to provide coverage for many locales). The user, however, wants to specify the most specific locale possible (including script, region, and various locale extensions where applicable), the better to tailor the runtime experience.
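
A minimal sketch of the Lookup-style fallback mentioned above, assuming the truncation rule from RFC 4647 section 3.4. The name `lookupLocalization` and its parameters are hypothetical, and this is not how any particular browser implements getMessage. It shows how a very specific user locale can resolve to a less specific message file, and why es-AR does not reach an es-419 file by tag truncation alone.

```javascript
// Hypothetical helper: pick a localization for a requested locale by
// progressively truncating the tag (RFC 4647 "Lookup"). Not a browser API.
function lookupLocalization(requestedLocale, availableLocalizations, defaultLocalization) {
  // Language tags are case-insensitive, so compare in lower case.
  const byLowerCase = new Map(
    availableLocalizations.map((tag) => [tag.toLowerCase(), tag])
  );
  let candidate = requestedLocale.toLowerCase();
  while (candidate.length > 0) {
    if (byLowerCase.has(candidate)) {
      return byLowerCase.get(candidate);
    }
    const lastHyphen = candidate.lastIndexOf('-');
    if (lastHyphen === -1) {
      break;
    }
    candidate = candidate.slice(0, lastHyphen);
    // A single-character subtag (such as "u" or "x") left at the end is removed too.
    if (/(^|-)[a-z0-9]$/.test(candidate)) {
      candidate = candidate.slice(0, Math.max(0, candidate.length - 2));
    }
  }
  return defaultLocalization;
}

console.log(lookupLocalization('fr-CA', ['en', 'fr'], 'en'));         // "fr"
console.log(lookupLocalization('zh-Hant-TW', ['en', 'zh-Hant'], 'en')); // "zh-Hant"
console.log(lookupLocalization('es-AR', ['en', 'es-419'], 'en'));     // "en"
```

The last call falls back to the default because truncating es-AR yields es, never es-419; serving es-AR from es-419 needs a tailored fallback table on top of plain Lookup.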

In this document, there are calls to things such as "selecting which locale" to use (i.e. a picker) as well as "selecting which localization" to use (i.e. which language file or files to download). These lists want to be different (the list of locales is long, while the list of localizations is typically shorter).

Note that there is ongoing work at W3C related to manifests (they are adopting slightly different mechanisms for managing localization). Also, I am the editor of Developing Localizable Manifests, which tries to enumerate a lot of this material and which readers of this thread might find useful. Lastly, there is work at Unicode on MessageFormat 2 and a proposal for MessageResource (which is similar to the message files in getMessage).

Member

@aphillips, thanks for taking the time to share your expertise on this issue. I am certainly out of my depth, so the extra context and pointers on what to consider are really appreciated. If you'd like to attend a public meeting at some point, we could likely carve out some time to make sure we can have an in-depth discussion. Otherwise, your time continuing to help with the conversation here is valued regardless.

Based on all of the above, I am still of the opinion that offering a getAllLanguages that is intended for rendering a menu is going to be very hard to get right. Given users want to choose a fairly specific locale, it would likely need to return a much longer list of anything you can setCurrentLanguage to, as @carlosjeurissen suggested. However, that seems like it would be a long list which would be hard to render with the right nesting / structure without additional work. You would also need a mapping from returned codes to strings which are human readable.

I can see the value in a function that returns "which of the folders in _locales were accepted". Perhaps we could lean into that with an API like getParsedLocalesDirectories()? Based on @aphillips' explanation it seems like locales may be the wrong word there but I think we are slightly backed into a corner by the existing usage of _locales as the canonical place for extensions to store messages.json files.

For setCurrentLanguage, I'm convinced after this discussion that you should be able to set it to any valid locale, however specific, and the browser should pick the appropriate language file with fallbacks if needed. That way you can offer a fairly detailed picker for users and rely on the browser to use its built-in fallback behavior.

Member Author

rendering a menu is going to be very hard to get right ……

Creating a language selection menu has always been a challenge in i18n work, but it is also necessary work. It can be very simple or very complex. For example, an extension can list only a few languages without variations (like regions and scripts), or list languages with limited or all regions, or let users set languages, regions and extensions (like calendar, date format, etc.) separately.

You would also need a mapping from returned codes to strings which are human readable.

Intl.DisplayNames can help with that. It also depends on how developers want to design it. For example, a language code can be shown with two display names (e.g. "zh-CN" can be displayed as "Chinese Simplified (简体中文)"). Anyway, display names are not what this proposal is trying to solve.
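
As a rough illustration of the Intl.DisplayNames point, here is a minimal sketch that builds a two-name label for a language tag. The helper name `languageLabel` is hypothetical, and the exact strings depend on the locale data shipped with the runtime, so no particular output is promised.

```javascript
// Hypothetical helper: build a menu label for a language tag, showing the name
// in the UI language followed by the language's own name for itself.
function languageLabel(tag, uiLanguage) {
  const inUiLanguage = new Intl.DisplayNames([uiLanguage], { type: 'language' });
  const inOwnLanguage = new Intl.DisplayNames([tag], { type: 'language' });
  return `${inUiLanguage.of(tag)} (${inOwnLanguage.of(tag)})`;
}

// Labels for a small picker rendered for an English-speaking user.
const labels = ['zh-CN', 'fr-CA', 'ja'].map((tag) => languageLabel(tag, 'en'));
console.log(labels);
```

This only covers naming. Grouping and nesting a long list of locales in a picker would still be up to the extension.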

Member Author

For setCurrentLanguage, I'm convinced after this discussion that you should be able to set it to any valid locale, however specific, and the browser should pick the appropriate language file with fallbacks if needed. That way you can offer a fairly detailed picker for users and rely on the browser to use its built-in fallback behavior.

@oliverdunk Do you know why Chrome only allows developers to use a fixed list of languages? Is it limited by the browser or by the Chrome Web Store? If it is a hard limitation, that means the browser doesn't support using some locales in setCurrentLanguage() and the /_locales/ directory.

Member

@hanguokai My understanding is that it is based on the locales supported by some of the internal libraries we use to handle i18n.

- If the extension doesn't use `browser.i18n` (there is no "_locales" directory), return `undefined`.
- If the preferred language is not set by `i18n.setCurrentLanguage()`, returns the current language used by `i18n.getMessage()`, assuming that all languages support all possible keys.
- If the preferred language is set by `i18n.setCurrentLanguage()`, and the extension supports this language, then return this language.
- If the preferred language is set by `i18n.setCurrentLanguage()`, but the extension doesn't support this language (no message file for this language), then treat as if the preferred language is not set. This is an edge case, for example, the language was removed when the extension was upgraded.
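
The four rules quoted above can be modelled as a small pure function. This is only a sketch for discussion: the function name and every field of its argument are hypothetical, and it is not the proposal's normative text or any browser's code.

```javascript
// Hypothetical model of the quoted getCurrentLanguage() rules.
// - hasLocalesDirectory: whether the extension ships a "_locales" directory
// - preferredLanguage: value stored by setCurrentLanguage(), or undefined
// - supportedLanguages: languages that have a message file
// - defaultResolvedLanguage: language getMessage() would use with no preference
function resolveCurrentLanguage({
  hasLocalesDirectory,
  preferredLanguage,
  supportedLanguages,
  defaultResolvedLanguage,
}) {
  // Rule 1: the extension does not use browser.i18n at all.
  if (!hasLocalesDirectory) {
    return undefined;
  }
  // Rule 3: a preference is set and still has a message file.
  if (preferredLanguage !== undefined && supportedLanguages.includes(preferredLanguage)) {
    return preferredLanguage;
  }
  // Rules 2 and 4: no preference, or a stale one (for example, the language
  // was removed in an upgrade), so behave as if nothing was set.
  return defaultResolvedLanguage;
}
```

Under this model the "stale preference" edge case is indistinguishable from "never set", which is the behaviour the last bullet describes.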
Member

Should we allow you to get into this state? I would've expected setCurrentLanguage to throw an error and abort updating the language.

Member Author

It depends on the implementation. setCurrentLanguage() can throw an error, but as I said in the "Implementation Notes" section, when an extension is upgraded, the browser should check whether the languages it supports have changed (especially in the case of deletion). If the browser does this check, it will not get into this state; otherwise it may happen.

From a specification perspective, I'm just listing this possibility. From an implementation perspective, browsers just need to make sure that this problem can be avoided.

Member

As mentioned above, following the discussion with @aphillips I'm more in favour of allowing this now. Perhaps we should still throw in extreme cases (if you try to set the language to French, but you only have English message files) but beyond that allowing you to use browser fallbacks seems helpful.


**Author:** hanguokai

**Sponsoring Browser:** Chromium
Member

I forget if we had already agreed this - was there a discussion somewhere? If not I will take it to the team.

Member Author

Member

Cool, let me confirm then. I definitely like the change, I'm just being extra careful any time we add ourselves as the sponsoring browser since we agreed on some strict rules about that meaning we will implement in a timely fashion.

Member

Hey Jackie - I spoke to the team about this. We're supportive of the idea, but it is very unlikely we would be able to implement this ourselves any time soon. With that in mind, we'd only be comfortable sponsoring this if there was an external contributor able to implement it.

@xeenon / @Rob--W With that in mind, would either Apple or Mozilla be more likely to implement this soon and be interested in sponsoring?

@carlosjeurissen carlosjeurissen self-requested a review June 18, 2024 15:52
@carlosjeurissen carlosjeurissen added the topic: localization and i18n-tracker labels Jun 19, 2024
Contributor

@carlosjeurissen carlosjeurissen left a comment

Thanks so much for putting this together @hanguokai!

Looks good to me, although I would like this proposal to also cover a getMessage overload (or separate method) to get a message in a specific language.

This has been proposed here:
#274

We can include some variation of one of the proposals in this PR. Another potential syntax would be accepting an object for setCurrentLanguage which accepts a tabId. Something like:

await browser.i18n.setCurrentLanguage({
  code: 'en-US',
  tabId: 82938
});

This would restrict the change of language to a specific tab, after which all calls to i18n.getMessage would follow the language specified in this call.

However my personal favourite would still be something like this:

browser.i18n.withLanguage('pt-BR').then((getMessage) => {
  let someMessage = getMessage('some_id');
});

This method return the language that the extension is displayed in.

- If the extension doesn't use `browser.i18n` (there is no "_locales" directory), return `undefined`.
- If the preferred language is not set by `i18n.setCurrentLanguage()`, returns the current language used by `i18n.getMessage()`, assuming that all languages support all possible keys.
Contributor

Suggested change
- If the preferred language is not set by `i18n.setCurrentLanguage()`, returns the current language used by `i18n.getMessage()`, assuming that all languages support all possible keys.
- If the preferred language has not been set by `i18n.setCurrentLanguage()`, returns the first language for which a message file exists following the same fallback mechanism used by `i18n.getMessage()`.

As a sidenote, the fallback mechanism is not the same across browsers. See: #296

@hanguokai
Member Author

Another potential syntax would be accepting an object for setCurrentLanguage which accepts a tabId.

There is no "per-tab" demand. It complicates the design.

Co-authored-by: carlosjeurissen <[email protected]>
Contributor

carlosjeurissen commented Jun 20, 2024

Another potential syntax would be accepting an object for setCurrentLanguage which accepts a tabId.

There is no "per-tab" demand. It complicates the design.

It seems @xPaw potentially has such a use case. From my own experience, in a content script I sometimes want to match the language of the page instead of the extension when injecting elements into the DOM, to prevent mixed-language situations.


xPaw commented Jun 21, 2024

Seems @xPaw has potentially such use case.

I'd be just fine with an overload that accepts a language. Is it even possible to get current tab id without having the tabs permission?

@carlosjeurissen
Contributor

@xPaw Can you describe your use-case in detail? Getting the current tab id can always be achieved without the tabs permission using chrome.tabs.getCurrent(). This would not work in an extension popup as it is technically not a tab.


xPaw commented Jun 21, 2024

Can you describe your use-case in detail?

To be specific, I have this extension, which lists a bunch of content scripts for various pages on Steam Store/Community, it adds some extra buttons/blocks to the page like 'lowest recorded price' for example. Since I added localization support, people are getting mismatching languages on the page (like the page may be in English, but their browser is in Japanese, so they get messages added by the extension in the browser language). Users are asking to change the extension's language because of this.

If I was able to pass a specific language to getMessage, I would be able to match the messages to the current page language (I am able to figure out the page's current language).

The content scripts simply use i18n.getMessage (I have a helper function _t); other extensions solve this by loading the locale JSON to get messages manually.
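
For readers unfamiliar with the manual workaround, a minimal sketch follows. All three function names are hypothetical, the sketch ignores placeholders and substitutions, and whether a content script may fetch the file depends on the extension's own setup (for example, it may need to be listed in web_accessible_resources).

```javascript
// Hypothetical helpers for reading messages for an explicitly chosen language.

// Map a language tag such as "pt-BR" to its message file path
// ("_locales/pt_BR/messages.json"); locale directory names use underscores.
function messagesPath(languageTag) {
  return `_locales/${languageTag.replace(/-/g, '_')}/messages.json`;
}

// Wrap a parsed messages.json object in a getMessage-like function.
// Unknown keys yield an empty string; placeholders are not handled here.
function createGetMessage(messages) {
  return (key) => (messages[key] ? messages[key].message : '');
}

// Inside an extension context, the file could be fetched like this.
async function loadGetMessage(languageTag) {
  const url = browser.runtime.getURL(messagesPath(languageTag));
  const response = await fetch(url);
  return createGetMessage(await response.json());
}
```

A content script that has detected the page language could then call `loadGetMessage(pageLanguage)` once and use the returned function in place of `i18n.getMessage`, at the cost of reimplementing fallback itself.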

Contributor

carlosjeurissen commented Jun 24, 2024

@hanguokai and others: let me know if you feel #274 should fit in this proposal, since the use cases of @xPaw and others are not covered by the methods in this proposal. If so, I will write up suggested additions. Otherwise, I could write a formal proposal as promised during the San Diego meetings.

@hanguokai
Member Author

Let me know if you feel #274 should fit in this proposal.

I think #274 should be a separate proposal. These use cases are different from setting the user's preferred language for this extension; they are more dynamic scenarios. However, if users want to use the same language in various places, they should use this proposal's APIs.


In this document:
- **locale** or **language code** or **language tag** is a string that represents a language, defined in [BCP 47](https://www.rfc-editor.org/info/bcp47).
For example, `en-US`, `zh-CN`, `fr`. It is used by `browser.i18n`, `Date`, `Intl` and various other APIs.
Member

zh-CN is presumably meant to indicate Simplified Chinese, which is also used in Singapore. It might be better to use zh-Hans as the language tag in the example.

See also https://www.w3.org/International/articles/language-tags/index.en.html#script

Member Author

@hanguokai hanguokai Jun 25, 2024

Perhaps for historical reasons, browsers still use 'zh-CN' instead of 'zh-Hans'. For example, navigator.language returns "zh-CN" in Chrome and Firefox on macOS. And Chrome only supports a fixed list of languages in the /_locales/ directory, which contains only a limited combination of languages and regions.

Solving this problem seems to go beyond the core issue that this proposal aims to address. This proposal is intended to expand functionality on the basis of existing capabilities.

Contributor

It would still be good to know which locales a browser supports or can deal with, which is also relevant for this issue. See also: #131

For example, the Naver Whale store rejects any language tag which includes the script subtag (Hans, Latn, Cyrl, and others).

Member

@Rob--W Rob--W left a comment

On the surface this looks like a reasonable API proposal. I am concerned about the ability for extensions to rapidly change the locale and its impact on UI surfaces in the browser, and would like to see some mitigations for this abuse by design.

#### A New Predefined Message `@@current_locale`
There is an existing predefined message `@@ui_locale`, that reflects the value of `i18n.getUILanguage()`, but the value uses underscore (e.g. "en_US") as separator in Chrome and hyphen (e.g. "en-US") in Firefox.

Relative to `@@ui_locale`, a new predefined message, `@@current_locale`, should be added, which reflects the value of `i18n.getCurrentLanguage()`.
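
Because of the separator difference noted in this hunk, code that consumes such a value may want to normalize it first. A minimal sketch, assuming the value is otherwise a well-formed tag; `normalizeLocaleTag` is a hypothetical helper name.

```javascript
// Hypothetical helper: convert an underscore-separated value such as "en_US"
// (or an already hyphenated "en-US") into one canonical BCP 47 tag.
function normalizeLocaleTag(value) {
  return Intl.getCanonicalLocales(value.replace(/_/g, '-'))[0];
}

console.log(normalizeLocaleTag('en_US')); // "en-US"
console.log(normalizeLocaleTag('en-US')); // "en-US"
console.log(normalizeLocaleTag('zh_cn')); // "zh-CN"
```

Going the other way (hyphen to underscore) is what building a `_locales` directory path needs.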
Member

This is a late addition to the proposal. What use cases are served by the proposed @@current_locale? This property does not automatically follow from the three listed use cases in the proposal document.

Additionally, because the stylesheet does not update automatically in response to UI language changes, the use of the proposed @@current_locale in the stylesheet can result in the UI getting out of sync with the actual language, unless the extension has additional code to deal with language changes (with i18n.onLanguageChanged). If an extension already needs to account for this, then it may as well do all language-dependent CSS from JavaScript. These days, extensions can use CSS variables to control stylesheet behavior in a more powerful way than @-macros can. For example, the only way for an extension to reload the stylesheet to read the new @ values is to replace the style sheet element with a new one that has the same URL and a random extra addition (query string) to bust the style cache.

Because of this, I am inclined to vote for removing the proposed @@current_locale from the proposal.
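
To make the JavaScript-plus-CSS-variables alternative concrete, here is a minimal sketch. `applyLanguageToRoot` and the `--current-language` property name are hypothetical, and `i18n.getCurrentLanguage()` and `i18n.onLanguageChanged` exist only as proposed in this PR, so the wiring is shown as a comment rather than as working code.

```javascript
// Hypothetical helper: reflect the current language on a root element so that
// stylesheets can react without @@-style predefined messages.
function applyLanguageToRoot(rootElement, languageTag) {
  // Enables selectors such as :lang(ja) or [lang|="zh"].
  rootElement.lang = languageTag;
  // Exposes the tag to CSS as a custom property (stored as a quoted string).
  rootElement.style.setProperty('--current-language', JSON.stringify(languageTag));
}

// Possible wiring in an extension page, assuming the proposed APIs ship and
// that getCurrentLanguage() can be awaited:
//
//   const update = async () =>
//     applyLanguageToRoot(document.documentElement, await browser.i18n.getCurrentLanguage());
//   browser.i18n.onLanguageChanged.addListener(update);
//   update();
```

With this approach a language change updates the live document directly, so no stylesheet reload or cache busting is involved.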

Member Author

Personally, I don't feel strongly about this predefined message; I only added it for completeness. @carlosjeurissen Would you mind deleting this part, since you suggested it?

Contributor

Also not feeling too strongly about it. One of the motivations is described here:
#642 (comment)

As a way to get the path of the language files in JavaScript for example.

In general I agree with not using the predefined messages in CSS. Personally, I have not used them in CSS in any project over the past few years. It very much feels like a hack and we might want to deprecate some of them moving forward.

7 participants