Fixup .mailmap and add maintenance script to tools/ #9856

levbishop · 2023-03-27T04:45:29Z

Summary

Our .mailmap was getting a bit crusty, with both some superfluous entries, and not properly deduplicating all authors (especially those in Co-Authored-By: lines).

It was enough of a pain to do this that I wrote a dumb script to help.
The problem is sufficiently annoying that I now understand why there is no existing tool to do this properly...
The script calls out to git log and git shortlog to get authors.
It parses the .mailmap file looking for errors and unnecessary lines.
It does loose matching on names and emails by normalizing unicode and dropping punctuation.
It makes some assumptions about uniqueness of names and emails that are too restrictive to work as a general-purpose mailmap-fixing tool, but those assumptions hold for the current state of our repo and are likely to remain true.

I'm not exactly sure what to do with the script, but checking it into tools/ with this PR, so it doesn't get lost seems a good start. If we wanted to trust it more then it's complex enough to need tests (where should tests for tools/ scripts even live?). Should it get a Makefile line or a tox environment?

The script is probably too fragile to make it part of the CI, although if you wanted to do it, you should use the form:

$ tools/fix_mailmap.py -sn

which would give a fairly terse output and not make any changes to .mailmap, just report a non-zero exit code if there is an error. It requires python 3.9 and no other dependencies.

For interactive non-CI use it gives pretty complete feedback about what it's doing but you should definitely look at the output and check that it's making good choices since there's a lot of heuristics going on. Generally the pattern should be something like:

$ tools/fix_mailmap.py
...output describing which changes were made and what the script got confused about...
$ vi .mailmap. 
...manually fix .mailmap based on the above output
$ tools/fix_mailmap.py
....new output about fresh changes
$ vi .mailmap
...and so on

See the individual commits to this PR as examples of how that went for this big fixup. Hopefully things won't get so bad in general.
It's not a good idea to just run the script a whole bunch of times in a row without looking at the output because any bad assumptions could propagate pretty far through the mailmap by that time and it will be more of a pain to fix it.

Results

Before this PR:

$ git shortlog -sn --group trailer:Co-Authored-By --group author --email | grep '<' | wc -l
515

After this PR:

$ git shortlog -sn --group trailer:Co-Authored-By --group author --email | grep '<' | wc -l
440

and

$ git shortlog -sn --group trailer:Co-Authored-By --group author --email | grep '<'

gives a clean-looking author list.

I put in the grep '<' because we have some 85 mis-formed trailer lines of the form

Co-Authored-By: firsname lastname [email protected]

rather than the correct

Co-Authored-By: firsname lastname <[email protected]>

I don't think there's any way to use a .mailmap to fix those - a history rewrite would be the only way to normalize those.

qiskit-bot · 2023-03-27T04:45:32Z

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

@Qiskit/terra-core

jakelishman

I'm certainly happy tidying up the mailmap file where there's things missing. I'm also not bothered much if we want to store this script in the tools directory to help, though I don't really know where the line on what we put there ought to be - maybe this would be better just staying somewhere in a personal repo for you? (Especially if you want to work on it more.)

.mailmap

jakelishman · 2023-03-27T11:45:01Z

tools/fix_mailmap.py

+        # if email2 == canonical_email:
+        #     print_yellow(
+        #         f"Dropped mailmap superfluous second email <{email2}> that is equal to the first "
+        #         f"{mailmap_line(index)}"
+        #     )
+        #     line = f"{canonical_name} <{canonical_email}>"
+


Did you want to clean this out?

It tries to remove the cases like you pointed out for Ismael's entry only ones where the second actually email isn't needed. But it is sometimes over-zealous and I couldn't think of an easy way to get it to only trigger in the needed cases. I'll add a TODO comment to clarify and maybe someone else has an idea

jakelishman · 2023-03-27T11:48:12Z

tools/fix_mailmap.py

+    elif all(dup_emails_canonical):
+        emails_list = ", ".join(f"<{e}>" for e in dup_emails)
+        msg = f"    All {len(emails_list)} names are in mailmap: {emails_list}\n"
+        msg += "    This tool doesn't handle when normalization merges two unique authors."
+        # If this comes up, either split the list of duplicates,
+        # or do less-aggressive normalization
+    elif dup_emails_canonical.count(True) == 1:


Do we need to be concerned about performance at all here? We do a lot of iterating all the way through the tuple. I'm guessing it probably shouldn't really matter.

I dont think we care. It's fast enough since it is only running this on the cases where there is a duplication.

jakelishman · 2023-03-27T11:50:34Z

tools/fix_mailmap.py

+    # Load the raw author entries
+    raw_authors = get_raw_authors()
+    if len(raw_authors) < 10:
+        print_red("Problem: `get_raw_authors()` returned too few authors")
+        sys.exit(ExitCode.GIT_LOG)


It was just a sanity check but probably if I try this on some of the less-trafficked qiskit repos I'll need to reduce it. Maybe down to just 1.

tools/fix_mailmap.py

coveralls · 2024-09-19T18:59:42Z

Pull Request Test Coverage Report for Build 10946868160

Details

0 of 0 changed or added relevant lines in 0 files are covered.
12 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.001%) to 88.789%

Files with Coverage Reduction	New Missed Lines	%
qiskit/transpiler/passes/synthesis/unitary_synthesis.py	2	88.26%
crates/qasm2/src/lex.rs	4	92.48%
qiskit/synthesis/two_qubit/xx_decompose/decomposer.py	6	90.91%

Totals
Change from base Build 10946116755:	-0.001%
Covered Lines:	73488
Relevant Lines:	82767

💛 - Coveralls

levbishop added 14 commits March 26, 2023 20:49

Alphabetize .mailmap

df7b3a3

Fix mailmap dup match email

33238e4

First run drop spurious mailmap lines

2b20cd8

Second run adds some author lines

e4df3a0

Manual fixup tool choices

c64825c

Third run of tool normalize by email

7d04ad8

Manual fixup after changes

2c9bea9

Re-alphabetize

f11eaa2

Manual choose canonical emails

b4b3571

Final run

9e9b64c

Manual fix some remaining dups

7c28f83

more manual fixups visible now noise is cleared

a601580

Improve mailmap instructions

a56d92c

Commit mailmap checking script

cc7ee1e

levbishop requested a review from a team as a code owner March 27, 2023 04:45

jakelishman reviewed Mar 27, 2023

View reviewed changes

Merge branch 'main' into fixmailmap

55429c8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixup .mailmap and add maintenance script to tools/ #9856

Fixup .mailmap and add maintenance script to tools/ #9856

levbishop commented Mar 27, 2023 •

edited

Loading

qiskit-bot commented Mar 27, 2023

jakelishman left a comment

jakelishman Mar 27, 2023

levbishop Mar 27, 2023

jakelishman Mar 27, 2023

levbishop Mar 27, 2023

jakelishman Mar 27, 2023

levbishop Mar 27, 2023

coveralls commented Sep 19, 2024

Fixup .mailmap and add maintenance script to tools/ #9856

Are you sure you want to change the base?

Fixup .mailmap and add maintenance script to tools/ #9856

Conversation

levbishop commented Mar 27, 2023 • edited Loading

Summary

Results

qiskit-bot commented Mar 27, 2023

jakelishman left a comment

Choose a reason for hiding this comment

jakelishman Mar 27, 2023

Choose a reason for hiding this comment

levbishop Mar 27, 2023

Choose a reason for hiding this comment

jakelishman Mar 27, 2023

Choose a reason for hiding this comment

levbishop Mar 27, 2023

Choose a reason for hiding this comment

jakelishman Mar 27, 2023

Choose a reason for hiding this comment

levbishop Mar 27, 2023

Choose a reason for hiding this comment

coveralls commented Sep 19, 2024

Pull Request Test Coverage Report for Build 10946868160

Details

💛 - Coveralls

levbishop commented Mar 27, 2023 •

edited

Loading