Proposal: Public Suffix API #676

mckenfra · 2024-08-19T14:25:32Z

This formalizes #231 into a concrete proposal.

proposals/public-suffix.md

oliverdunk

Thanks for this! I will reach out to the PSL maintainers I have been in contact with to ask them to take a look. I'll also share this internally to get an overall opinion from Chrome.

proposals/public-suffix.md

Rob--W · 2024-09-11T17:09:24Z

proposals/public-suffix.md

+by default to be the last domain label of the domain name, or alternatively
+the domain name could be considered invalid.
+
+**Note:** it may be more performant to allow unknown suffixes and assume a single-label


I'm not too concerned about the performance aspect of this; any IPC needed to handle the API call is likely more expensive than a lookup in the PSL.

The proposal does not describe why unknown suffixes should be supported. Could you elaborate on when one would and would not want public suffixes? And could you offer a recommendation on default behavior that does not depend on (likely irrelevant) performance considerations?

It's probably worth pointing out explicitly that there may be web pages displayed in the browser that are not on a registrable domain, e.g. when a local intranet has custom non-public host names.

mckenfra · 2024-09-16

I will update this section regarding unknown suffixes as you suggest.

On the performance point: I am concerned about the performance of this API, because if it turns out to be slower than extensions' own public suffix implementations, extensions may not use it. In particular, Use Cases 1 (Filter Requests by Organization) and 3 (Detect Third-Party Requests) in this proposal are performance-sensitive, because they obtain the registrable domain for every request (not just top-level pages) as the user browses.
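To make the per-request cost concrete, here is a minimal sketch of Use Case 3 with a per-hostname cache in front of the lookup. Everything in it is an assumption for illustration: `Lookup`, `memoize`, `isThirdParty` and the stand-in `lastTwoLabels` are hypothetical names, the lookup is synchronous for simplicity (the proposed API returns a promise), and `lastTwoLabels` is not a real PSL lookup.

```typescript
// Hypothetical lookup signature: hostname in, registrable domain (or null) out.
type Lookup = (hostname: string) => string | null;

// Stand-in for a real PSL lookup: treats the last label as the suffix.
// Only here so the sketch runs; it is NOT correct for suffixes like "co.uk".
function lastTwoLabels(hostname: string): string | null {
  const labels = hostname.toLowerCase().split(".");
  return labels.length >= 2 ? labels.slice(-2).join(".") : null;
}

// Wrap a lookup so that each distinct hostname is resolved only once.
function memoize(lookup: Lookup): Lookup {
  const cache = new Map<string, string | null>();
  return (hostname) => {
    if (cache.has(hostname)) {
      return cache.get(hostname) ?? null;
    }
    const result = lookup(hostname);
    cache.set(hostname, result);
    return result;
  };
}

// A request is third-party when its registrable domain differs from the
// page's. If either lookup fails, fall back to comparing the raw hostnames
// (an assumption of this sketch, not something the proposal specifies).
function isThirdParty(requestHost: string, pageHost: string, lookup: Lookup): boolean {
  const requestDomain = lookup(requestHost);
  const pageDomain = lookup(pageHost);
  if (requestDomain === null || pageDomain === null) {
    return requestHost !== pageHost;
  }
  return requestDomain !== pageDomain;
}

const cachedLookup = memoize(lastTwoLabels);
console.log(isThirdParty("cdn.example.com", "www.example.com", cachedLookup));
console.log(isThirdParty("t.tracker.net", "www.example.com", cachedLookup));
```

A cache like this only helps with repeated hostnames; the first request to each new host still pays the full lookup cost, which is where the speed of the API itself matters.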

Rob--W · 2024-09-11T17:13:47Z

proposals/public-suffix.md

+| PSL Feature            | Requirement | Discussion |
+|------------------------|-------------|------------|
+| Allow Private Suffixes | Yes         | including all suffixes in PSL means more information about third-party boundaries |
+| Allow Unknown Suffixes | Yes         | provides better performance |


Why are Unknown suffixes required in this case? Theoretically better performance does not automatically translate to a functional requirement.

mckenfra · 2024-09-16

If allowing unknown suffixes turns out to bring no performance benefit, I will revisit this requirement.

Rob--W · 2024-09-11T17:45:35Z

proposals/public-suffix.md

+    base?: string,
+    // The Private-suffixed registrable domain.
+    // Null if an error occurred, or if the domain has no matching Private suffix.
+    private?: string,


Do we really need all three options (`domain`, `base`, `private`)?

The (input) `domain` is redundant, because the extension can keep track of the parameters that it sent to the API.

Requiring both `base` and `private` may require two lookups in the PSL, even if the extension does not need both. An extension interested in both can call the API twice, with `excludePrivateSuffixes` set to true and then false.

Given this, what do you think of reducing the number of options to just one `registrableDomain`?

mckenfra · 2024-09-16

Calling the API twice, with `excludePrivateSuffixes` set to true and then false, has the following potential issues:

- Duplication of work
  - The same input `domain` parameter must be parsed and canonicalized on each API call.
  - The PSL lookup involves removing each label in turn from the input domain and testing the remaining suffix until a match is found. On the second API call, the non-matching candidate suffixes already tested in the first call are therefore tested again. The only difference is that the `excludePrivateSuffixes=true` call keeps removing labels for longer.
- Duplication of returned arrays
  - If the registrable domains array obtained with `excludePrivateSuffixes=false` happens to contain only ICANN domains (because no matching private suffixes exist), then the second API call with `excludePrivateSuffixes=true` returns exactly the same registrable domains array again.

Note that my proposal does not require "two lookups". One possible implementation would simply continue removing labels after finding a private suffix until the ICANN suffix is found, i.e. not a full second lookup but a continuation of the current one. Better still, an implementation could pre-calculate the corresponding ICANN suffix and store it with every private suffix, so that no further work is needed to determine the ICANN suffix once a private suffix has matched.

I am happy to remove `domain` from the result, because I agree it can be inferred from the array position.
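The "continuation of the current lookup" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed implementation: the rule table `RULES` is a tiny hand-written stand-in for the PSL, wildcard and exception rules are ignored, and the names `Section`, `RegistrableDomains` and `lookupBoth` are hypothetical.

```typescript
type Section = "icann" | "private";

interface RegistrableDomains {
  base?: string;     // ICANN-suffixed registrable domain
  private?: string;  // Private-suffixed registrable domain, if any
}

// Tiny stand-in for the PSL (assumption: no wildcard or exception rules).
const RULES = new Map<string, Section>([
  ["io", "icann"],
  ["uk", "icann"],
  ["co.uk", "icann"],
  ["github.io", "private"],
]);

// Single pass: strip labels from the left, longest candidate suffix first.
// A private match is recorded and the loop simply continues until the
// first ICANN match, so no second lookup is started from scratch.
function lookupBoth(
  domain: string,
  rules: ReadonlyMap<string, Section> = RULES,
): RegistrableDomains {
  const labels = domain.toLowerCase().split(".");
  const result: RegistrableDomains = {};
  let privateSeen = false;

  for (let i = 0; i < labels.length; i++) {
    const section = rules.get(labels.slice(i).join("."));
    if (section === undefined) continue;

    // Registrable domain = matched suffix plus one more label, if there is one.
    const registrable = i > 0 ? labels.slice(i - 1).join(".") : undefined;

    if (section === "private") {
      // Only the longest (first seen) private suffix counts.
      if (!privateSeen && registrable !== undefined) result.private = registrable;
      privateSeen = true;
      continue;
    }

    if (registrable !== undefined) result.base = registrable;
    break; // longest ICANN suffix found, nothing left to do
  }
  return result;
}

console.log(lookupBoth("user.github.io"));
console.log(lookupBoth("www.example.co.uk"));
```

In this sketch the second result has no `private` field, which corresponds to the case where both calls of a two-call design would return the same value.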

Rob--W · 2024-09-11T17:57:29Z

proposals/public-suffix.md

+
+If no matching suffix is found in the PSL for a `domain` parameter, then unless it is determined
+to be specifically [invalid](#6-invalid-domain-parameter), it should be assumed the domain has a
+single-label suffix.


What is the source for the required assumption of "single-label" suffix?

This proposal does not include an example where known vs. unknown matters, but in Firefox there is at least one (https://bugzilla.mozilla.org/show_bug.cgi?id=1621168). To decide whether to issue a search query or to try a navigation, a PSL lookup is made:

- If valid, attempt to navigate.
- If invalid, use the search engine.

Suffixes that are unknown to the PSL should also trigger a search query, but unconditionally returning a single-label suffix would rule out that use.

mckenfra · 2024-09-16

I am happy to remove this assumption, and instead add an option to the API that lets users opt in to assuming a single-label suffix. I was guided in my analysis by the following:

- Google Chrome's Public Suffix implementation makes this assumption, I believe. See GetDomainAndRegistry: "If no matching rule is found in the effective-TLD data (or in the default data, if the resource failed to load), the last subcomponent of the host is assumed to be the registry."
- tldts also makes this assumption in its implementation. (tldts is a JavaScript library for obtaining public suffixes that is used by popular extensions such as the Bitwarden password manager and Violentmonkey.) However, there is also an open tldts issue caused by this assumption.
- Use Cases 1 (Filter Requests by Organization) and 3 (Detect Third-Party Requests) in this proposal are performance-sensitive, because they obtain the registrable domain for every request (not just top-level pages) as the user browses. I wanted to reduce the chance of users' browsers feeling permanently slower after installing these sorts of extensions. Assuming a single-label suffix by default may allow a faster implementation, because it avoids an explicit PSL lookup for all single-label TLDs.
- Use Case 2 (Group Domains in UI) in this proposal may benefit from this assumption, because without it all unknown-suffixed domains would be grouped together.
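A minimal sketch of how an opt-in could serve both sides of this thread: strict by default (so the search-vs-navigate case still works) and single-label fallback on request (so grouping still works). The option name `assumeSingleLabelSuffix`, the function names, and the tiny suffix set are all hypothetical, and the lookup is synchronous and ignores wildcard and exception rules.

```typescript
interface LookupOptions {
  // Hypothetical opt-in: treat the last label as the suffix when the
  // PSL has no matching rule.
  assumeSingleLabelSuffix?: boolean;
}

// Tiny stand-in for the PSL, only so the sketch runs.
const KNOWN_SUFFIXES = new Set(["com", "uk", "co.uk"]);

function lookupRegistrableDomain(
  domain: string,
  options: LookupOptions = {},
): string | null {
  const labels = domain.toLowerCase().split(".");

  // Longest candidate suffix first, as in a normal PSL lookup.
  for (let i = 0; i < labels.length; i++) {
    if (KNOWN_SUFFIXES.has(labels.slice(i).join("."))) {
      // The domain itself is a suffix when i === 0: nothing registrable.
      return i > 0 ? labels.slice(i - 1).join(".") : null;
    }
  }

  // No rule matched: the suffix is unknown.
  if (options.assumeSingleLabelSuffix && labels.length >= 2) {
    return labels.slice(-2).join(".");
  }
  return null;
}

// Strict default: unknown suffixes fall through to search.
function navigateOrSearch(input: string): "navigate" | "search" {
  return lookupRegistrableDomain(input) !== null ? "navigate" : "search";
}

console.log(navigateOrSearch("example.com"));
console.log(navigateOrSearch("foo.notatld"));
console.log(lookupRegistrableDomain("printer.corp.internal", { assumeSingleLabelSuffix: true }));
```

With the opt-in set, intranet hosts such as `printer.corp.internal` group under `corp.internal` instead of all unknown-suffixed hosts collapsing into one bucket.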

Rob--W · 2024-09-11T18:16:45Z

proposals/public-suffix.md

+or Punycode.
+
+When settling the promise returned by `getRegistrableDomain()`, the resulting
+domain name should be converted to Unicode from Punycode by default.


Why should the API convert the domain to Unicode by default? When the input is the host name from a URL, it would be reasonable to expect a valid hostname. I'd argue that Punycode is the saner default.

Extensions can bundle their own library or logic to convert Punycode to Unicode if desired, because the algorithm is well known and fixed.

mckenfra · 2024-09-16

I can change this to return Punycode by default, with an API option to convert to Unicode. Unicode registrable domains are required by Use Case 2 (Group Domains in UI) in this proposal.
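For reference, the conversion an extension would have to bundle is small. The sketch below decodes `xn--` labels with the Punycode decoding procedure from RFC 3492. It is an illustration only: the function names are hypothetical, overflow checks are omitted, and it performs none of the IDNA validation or mapping that a full to-Unicode conversion would include.

```typescript
// Bootstring parameters for Punycode (RFC 3492, section 5).
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Map a basic code point to its digit value: a-z => 0..25, 0-9 => 26..35.
function digitValue(code: number): number {
  if (code >= 0x30 && code <= 0x39) return code - 0x30 + 26;
  if (code >= 0x41 && code <= 0x5a) return code - 0x41;
  if (code >= 0x61 && code <= 0x7a) return code - 0x61;
  throw new RangeError("invalid Punycode digit");
}

// Bias adaptation (RFC 3492, section 6.1).
function adapt(delta: number, numPoints: number, firstTime: boolean): number {
  delta = firstTime ? Math.floor(delta / DAMP) : delta >> 1;
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - T_MIN) * T_MAX) >> 1) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Decode one label WITHOUT its "xn--" prefix (RFC 3492, section 6.2).
function decodeLabel(encoded: string): string {
  const output: number[] = [];
  const lastDash = encoded.lastIndexOf("-");
  // Everything before the last delimiter is copied verbatim.
  for (let j = 0; j < lastDash; j++) output.push(encoded.charCodeAt(j));

  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  let index = lastDash > 0 ? lastDash + 1 : 0;

  while (index < encoded.length) {
    const oldI = i;
    let w = 1;
    // Read one variable-length integer.
    for (let k = BASE; ; k += BASE) {
      if (index >= encoded.length) throw new RangeError("truncated Punycode input");
      const digit = digitValue(encoded.charCodeAt(index++));
      i += digit * w;
      const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    const outLen = output.length + 1;
    bias = adapt(i - oldI, outLen, oldI === 0);
    n += Math.floor(i / outLen);
    i %= outLen;
    output.splice(i++, 0, n); // insert code point n at position i
  }
  return String.fromCodePoint(...output);
}

// Convert every "xn--" label of a domain; other labels pass through unchanged.
function toUnicodeDomain(domain: string): string {
  return domain
    .split(".")
    .map((label) =>
      label.toLowerCase().startsWith("xn--") ? decodeLabel(label.slice(4)) : label,
    )
    .join(".");
}

console.log(toUnicodeDomain("xn--mnchen-3ya.de")); // prints "münchen.de"
```

Either default can be layered on the other this way; the open question in the thread is only which form the API hands back when no option is given.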

Add Public Suffix API proposal

0f09b4a

Rob--W reviewed Aug 20, 2024

Rob--W requested review from oliverdunk and xeenon August 20, 2024 12:29

oliverdunk reviewed Aug 21, 2024

mckenfra changed the title ~~Add Public Suffix API proposal~~ Proposal: Public Suffix API Aug 23, 2024

mckenfra force-pushed the publicsuffix branch from 960f99d to 0f09b4a Compare September 6, 2024 03:05

Update Public Suffix API proposal

3301526

mckenfra force-pushed the publicsuffix branch from 638833b to 3301526 Compare September 6, 2024 04:29

mckenfra requested review from Rob--W and oliverdunk September 9, 2024 11:31

Rob--W reviewed Sep 11, 2024

Rob--W mentioned this pull request Sep 12, 2024

Publish minutes of 2024-09-12 meeting #685

Merged

mckenfra replied to Rob--W's review comments Sep 16, 2024
