Right now, page titles are required to match /^[a-z][a-z0-9]*(?:\/[a-z][a-z0-9]*)*$/i, i.e. they must be one or more parts separated by slashes, where each part consists of an English letter followed by zero or more English letters or Arabic numerals. Usernames are further restricted and must match /^[a-z][a-z0-9]*$/i, which is equivalent to a single "part" above. (See also issue #33, which will introduce spaces in titles and forever ban underscores.)
This policy is obviously horrifically Anglocentric. I went ahead and implemented it because I wasn't up to speed on Perl 5 Unicode support, and it's easier to expand the set of characters allowed in titles than to contract it. But the question remains: what should the title format be?
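For reference, here is a minimal Perl sketch of the current ASCII-only check. Only the two regular expressions come from the text above; the subroutine names `valid_title` and `valid_username` are hypothetical and not necessarily how the project structures its validation.

```perl
use strict;
use warnings;

# The two patterns quoted above, copied verbatim.
my $TITLE_RE    = qr/^[a-z][a-z0-9]*(?:\/[a-z][a-z0-9]*)*$/i;
my $USERNAME_RE = qr/^[a-z][a-z0-9]*$/i;

# Hypothetical helpers: return 1 on a match, 0 otherwise.
sub valid_title    { my ($t) = @_; return $t =~ $TITLE_RE    ? 1 : 0 }
sub valid_username { my ($u) = @_; return $u =~ $USERNAME_RE ? 1 : 0 }

# An empty part ("//"), a leading digit and an underscore are all rejected.
for my $t ('Main', 'Help/Editing2', 'Help//Editing', '2fast', 'foo_bar') {
    printf "%-15s %s\n", $t, valid_title($t) ? 'accepted' : 'rejected';
}
```

Any title containing a non-ASCII letter, such as "Café", is rejected by these patterns.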
Research Perl 5 Unicode support (initial reading suggests that 5.18 supports it well throughout)
Research SQLite3 Unicode support (it seems to pass strings through unchanged)
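Proposal: titles consist of one or more slash-separated parts, each starting with a Unicode letter followed by zero or more letters, decimal digits, combining marks (nonspacing or spacing), or format characters, i.e. they match the regular expression: /\pL[\pL\p{Nd}\p{Mn}\p{Mc}\p{Cf}]*(?:\/\pL[\pL\p{Nd}\p{Mn}\p{Mc}\p{Cf}]*)*/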
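A minimal sketch of how the proposed pattern could be applied in Perl. The character classes are copied from the pattern above; anchoring with `\A` and `\z` is an assumption added here so that the whole string (with no trailing newline) must match, and `valid_title` is a hypothetical helper.

```perl
use strict;
use warnings;
use feature 'unicode_strings';

# One part: a Unicode letter, then any mix of letters (\pL), decimal digits
# (\p{Nd}), nonspacing and spacing combining marks (\p{Mn}, \p{Mc}) and
# format characters (\p{Cf}, e.g. ZERO WIDTH NON-JOINER).
my $PART = qr/\pL[\pL\p{Nd}\p{Mn}\p{Mc}\p{Cf}]*/;

# Whole title: one or more parts separated by slashes. \A and \z anchor the
# match to the entire string; $ would also accept a trailing "\n".
my $TITLE_RE = qr/\A$PART(?:\/$PART)*\z/;

# Hypothetical helper: 1 if the title is acceptable, 0 otherwise.
sub valid_title { my ($t) = @_; return $t =~ $TITLE_RE ? 1 : 0 }

binmode STDOUT, ':encoding(UTF-8)';
my @examples = (
    "Caf\x{e9}",                                          # precomposed e-acute (Ll)
    "Cafe\x{301}",                                        # e + COMBINING ACUTE ACCENT (Mn)
    "\x{395}\x{3bb}\x{3bb}\x{3ac}\x{3b4}\x{3b1}/2024",    # Greek part, then a part starting with a digit
    "\x{645}\x{6cc}\x{200c}\x{62e}\x{648}\x{627}\x{647}\x{645}",  # Persian word with ZWNJ (Cf)
    "foo_bar",                                            # underscore is punctuation (Pc)
);
printf "%s: %s\n", $_, valid_title($_) ? 'accepted' : 'rejected' for @examples;
```

Note that under this pattern every part, not just the first, must begin with a letter, so "Ελλάδα/2024" is rejected.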
Furthermore, all page titles, slugs, what-have-you should be converted to NFKC (per http://unicode.org/faq/normalization.html#2) before any other processing occurs. This neutralizes homograph tricks built from compatibility characters (fullwidth letters, ligatures and the like), although it does not unify look-alikes from different scripts, such as Latin "a" and Cyrillic "а". It also means that some titles which, as input by the user, would not be accepted by the above pattern will be transformed into titles that are: for example, titles containing Roman numeral characters, which NFKC decomposes into ASCII letters. More practically, titles copy-pasted from sources that used the actual ffi ligature character will be normalized to a plain "ffi", so they refer to the same page as the ordinary spelling.
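The normalize-then-validate order could be sketched with Perl's core `Unicode::Normalize` module, which does export an `NFKC` function. The `canonical_title` helper is hypothetical, and the anchored title pattern is an assumption based on the proposed character classes.

```perl
use strict;
use warnings;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFKC);   # core module since Perl 5.8

my $PART     = qr/\pL[\pL\p{Nd}\p{Mn}\p{Mc}\p{Cf}]*/;
my $TITLE_RE = qr/\A$PART(?:\/$PART)*\z/;

# Hypothetical helper: normalize to NFKC before any other processing, then
# validate. Returns the normalized title, or undef if it is not acceptable.
sub canonical_title {
    my ($raw) = @_;
    my $title = NFKC($raw);
    return $title =~ $TITLE_RE ? $title : undef;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $raw (
    "Chapter\x{2163}",   # ROMAN NUMERAL FOUR (Nl): rejected as-is, "ChapterIV" after NFKC
    "o\x{fb03}ce",       # LATIN SMALL LIGATURE FFI: becomes plain "office"
    "\x{ff37}iki",       # FULLWIDTH LATIN CAPITAL LETTER W: becomes "Wiki"
    "\x{430}pple",       # CYRILLIC SMALL LETTER A: NFKC leaves it unchanged
) {
    my $title = canonical_title($raw);
    printf "%s -> %s\n", $raw, defined $title ? $title : '(rejected)';
}
```

The last example shows the limit of this approach: NFKC does not fold Cyrillic "а" into Latin "a", so that title stays distinct from "apple".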