Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixup .mailmap and add maintenance script to tools/ #9856

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

levbishop
Copy link
Member

@levbishop levbishop commented Mar 27, 2023

Summary

Our .mailmap was getting a bit crusty, with both some superfluous entries, and not properly deduplicating all authors (especially those in Co-Authored-By: lines).

It was enough of a pain to do this that I wrote a dumb script to help.
The problem is sufficiently annoying that I now understand why there is no existing tool to do this properly...
The script calls out to git log and git shortlog to get authors.
It parses the .mailmap file looking for errors and unnecessary lines.
It does loose matching on names and emails by normalizing unicode and dropping punctuation.
It makes some assumptions about uniqueness of names and emails that are too restrictive to work as a general-purpose mailmap-fixing tool, but those assumptions hold for the current state of our repo and are likely to remain true.

I'm not exactly sure what to do with the script, but checking it into tools/ with this PR, so it doesn't get lost seems a good start. If we wanted to trust it more then it's complex enough to need tests (where should tests for tools/ scripts even live?). Should it get a Makefile line or a tox environment?

The script is probably too fragile to make it part of the CI, although if you wanted to do it, you should use the form:

$ tools/fix_mailmap.py -sn

which would give a fairly terse output and not make any changes to .mailmap, just report a non-zero exit code if there is an error. It requires python 3.9 and no other dependencies.

For interactive non-CI use it gives pretty complete feedback about what it's doing but you should definitely look at the output and check that it's making good choices since there's a lot of heuristics going on. Generally the pattern should be something like:

$ tools/fix_mailmap.py
...output describing which changes were made and what the script got confused about...
$ vi .mailmap. 
...manually fix .mailmap based on the above output
$ tools/fix_mailmap.py
....new output about fresh changes
$ vi .mailmap
...and so on

See the individual commits to this PR as examples of how that went for this big fixup. Hopefully things won't get so bad in general.
It's not a good idea to just run the script a whole bunch of times in a row without looking at the output because any bad assumptions could propagate pretty far through the mailmap by that time and it will be more of a pain to fix it.

Results

Before this PR:

$ git shortlog -sn --group trailer:Co-Authored-By --group author --email | grep '<' | wc -l
515

After this PR:

$ git shortlog -sn --group trailer:Co-Authored-By --group author --email | grep '<' | wc -l
440

and

$ git shortlog -sn --group trailer:Co-Authored-By --group author --email | grep '<'

gives a clean-looking author list.

I put in the grep '<' because we have some 85 mis-formed trailer lines of the form

Co-Authored-By: firsname lastname [email protected]

rather than the correct

Co-Authored-By: firsname lastname <[email protected]>

I don't think there's any way to use a .mailmap to fix those - a history rewrite would be the only way to normalize those.

@levbishop levbishop requested a review from a team as a code owner March 27, 2023 04:45
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm certainly happy tidying up the mailmap file where there's things missing. I'm also not bothered much if we want to store this script in the tools directory to help, though I don't really know where the line on what we put there ought to be - maybe this would be better just staying somewhere in a personal repo for you? (Especially if you want to work on it more.)

.mailmap Show resolved Hide resolved
.mailmap Show resolved Hide resolved
.mailmap Show resolved Hide resolved
.mailmap Show resolved Hide resolved
.mailmap Show resolved Hide resolved
Comment on lines +205 to +211
# if email2 == canonical_email:
# print_yellow(
# f"Dropped mailmap superfluous second email <{email2}> that is equal to the first "
# f"{mailmap_line(index)}"
# )
# line = f"{canonical_name} <{canonical_email}>"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to clean this out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It tries to remove the cases like you pointed out for Ismael's entry only ones where the second actually email isn't needed. But it is sometimes over-zealous and I couldn't think of an easy way to get it to only trigger in the needed cases. I'll add a TODO comment to clarify and maybe someone else has an idea

Comment on lines +404 to +410
elif all(dup_emails_canonical):
emails_list = ", ".join(f"<{e}>" for e in dup_emails)
msg = f" All {len(emails_list)} names are in mailmap: {emails_list}\n"
msg += " This tool doesn't handle when normalization merges two unique authors."
# If this comes up, either split the list of duplicates,
# or do less-aggressive normalization
elif dup_emails_canonical.count(True) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to be concerned about performance at all here? We do a lot of iterating all the way through the tuple. I'm guessing it probably shouldn't really matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we care. It's fast enough since it is only running this on the cases where there is a duplication.

Comment on lines +498 to +502
# Load the raw author entries
raw_authors = get_raw_authors()
if len(raw_authors) < 10:
print_red("Problem: `get_raw_authors()` returned too few authors")
sys.exit(ExitCode.GIT_LOG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 10?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just a sanity check but probably if I try this on some of the less-trafficked qiskit repos I'll need to reduce it. Maybe down to just 1.

tools/fix_mailmap.py Show resolved Hide resolved
tools/fix_mailmap.py Show resolved Hide resolved
@coveralls
Copy link

Pull Request Test Coverage Report for Build 10946868160

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 12 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.001%) to 88.789%

Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 2 88.26%
crates/qasm2/src/lex.rs 4 92.48%
qiskit/synthesis/two_qubit/xx_decompose/decomposer.py 6 90.91%
Totals Coverage Status
Change from base Build 10946116755: -0.001%
Covered Lines: 73488
Relevant Lines: 82767

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants