dnscry.pt update script #945

Open · wants to merge 4 commits into base: master

@TBBle commented Aug 12, 2024

Based on #868 and particularly #868 (comment), this is a trivial script to update the dnscry.pt entries in v3/public-resolvers.md and v3/relays.md from https://www.dnscry.pt/resolvers.json.

No-longer-relevant issues

The below issues appeared in the first version of this PR, but @Brueggus (dnscry.pt maintainer) has updated the published resolver data to resolve them both, per #945 (comment)


The first run generates a lot of churn, as compared to the data in this repo, the upstream data now encodes the port number in all of its DNS Stamps. It's easy to see this after format.py is run and the v1/dnscrypt-resolvers.csv is updated. It would be possible to strip the port 443 from the DNS Stamps and recalculate them, but doing so would make these stamps incomparable to upstream, and that cost me a bit of time trying to work out why they were all different before I checked with https://dnscrypt.info/stamps/.
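For reference, stripping the default port means decoding each stamp, editing the address, and re-encoding it, which is why the result no longer matches upstream byte-for-byte. The sketch below assumes the DNSCrypt stamp layout from the public DNS Stamps specification (protocol byte `0x01`, 64-bit little-endian props, then length-prefixed address, public key, and provider name); all function names are hypothetical and not part of this PR's script.

```python
import base64
import struct

# Assumed DNSCrypt stamp layout (protocol 0x01), per the DNS Stamps specification:
#   0x01 || props (uint64 LE) || LP(addr) || LP(pk) || LP(provider name)
# LP(x) is a one-byte length followed by the bytes of x.


def _b64url_decode(s: str) -> bytes:
    # Stamps use URL-safe base64 without padding; restore the padding first.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def _b64url_encode(b: bytes) -> str:
    return base64.urlsafe_b64encode(b).decode("ascii").rstrip("=")


def _lp(b: bytes) -> bytes:
    return bytes([len(b)]) + b


def decode_dnscrypt_stamp(stamp: str) -> tuple:
    """Return (props, address, public key, provider name)."""
    if not stamp.startswith("sdns://"):
        raise ValueError("not an sdns:// stamp")
    raw = _b64url_decode(stamp[len("sdns://"):])
    if raw[:1] != b"\x01":
        raise ValueError("not a DNSCrypt stamp")
    (props,) = struct.unpack("<Q", raw[1:9])
    fields, pos = [], 9
    for _ in range(3):
        n = raw[pos]
        fields.append(raw[pos + 1 : pos + 1 + n])
        pos += 1 + n
    addr, pk, name = fields
    return props, addr.decode(), pk, name.decode()


def encode_dnscrypt_stamp(props: int, addr: str, pk: bytes, name: str) -> str:
    raw = (b"\x01" + struct.pack("<Q", props)
           + _lp(addr.encode()) + _lp(pk) + _lp(name.encode()))
    return "sdns://" + _b64url_encode(raw)


def strip_default_port(stamp: str) -> str:
    """Re-encode a DNSCrypt stamp without an explicit :443 (the default port)."""
    props, addr, pk, name = decode_dnscrypt_stamp(stamp)
    if addr.endswith(":443"):
        addr = addr[: -len(":443")]
    return encode_dnscrypt_stamp(props, addr, pk, name)
```

Bracketed IPv6 addresses such as `[2a03:94e3:222b::1032]:443` are handled by the same suffix check, and a non-default port like `:8443` is left untouched.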

Also visible in v1/dnscrypt-resolvers.csv is some hard-to-avoid churn in entries where upstream has multiple resolvers with the same "location" key, as my dumb-trivial mechanism to resolve that may not pick the same one to rename as the one done previously.

This is visible, for example, with the current dnscry.pt-amsterdam-ipv4, which ends up as dnscry.pt-amsterdam02-ipv4 (based on the public key), while the other entry with that location is now dnscry.pt-amsterdam-ipv4.

I'm not sure if this is really an issue for end users. An alternative would be to add the suffix to the location for any host whose hostname isn't ___01, but that would rename the existing dnscry.pt-singapore-ipv4 (again matching the host keys) to dnscry.pt-singapore03-ipv4. With this approach, Tokyo would also end up with only dnscry.pt-tokyo02-ipv4 and dnscry.pt-tokyo03-ipv4, but that's actually fair, since the current dnscry.pt-tokyo-ipv4 is gone and gets replaced in place by what upstream calls tyo03.dnscry.pt.
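A sketch of that alternative rule, assuming hostnames of the form `<code><NN>.dnscry.pt` (as with `ams02` or `tyo03`) and a base location string that carries no number of its own; `name_from_hostname` is a hypothetical helper, not something in the script.

```python
import re


def name_from_hostname(location: str, hostname: str, family: int) -> str:
    """Hypothetical naming rule: append the host number unless it is 01.

    Assumes hostnames like "ams02.dnscry.pt" and a base location without a number.
    """
    m = re.match(r"^[a-z]+(\d+)\.", hostname)
    if m is None:
        raise ValueError(f"unexpected hostname: {hostname!r}")
    number = m.group(1)
    slug = "".join(location.lower().split())  # "Hong Kong" -> "hongkong"
    if number != "01":
        slug += number
    return f"dnscry.pt-{slug}-ipv{family}"
```

With this rule the name is tied to the host number rather than to list order, so it stays stable as other servers in the same city come and go.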

I didn't commit the result of running the scripts since I'm not set up for minisig: I'm not sure what the expectation here is, but it would be trivial to add such a commit if desired.

It might also make for cleaner history if I first updated all the dnscry.pt DNS Stamps in place to include the :443 port number; the remaining churn would then be much more readable.

The current output from execution looks like this:

> py -3 .\utils\update-dnscry.pt-entries.py

[v3/public-resolvers.md]
Duplicate entry: [dnscry.pt-amsterdam-ipv4] => [dnscry.pt-amsterdam02-ipv4]
Duplicate entry: [dnscry.pt-amsterdam-ipv6] => [dnscry.pt-amsterdam02-ipv6]
Duplicate entry: [dnscry.pt-hongkong-ipv4] => [dnscry.pt-hongkong02-ipv4]
Duplicate entry: [dnscry.pt-hongkong-ipv6] => [dnscry.pt-hongkong02-ipv6]
Duplicate entry: [dnscry.pt-losangeles-ipv4] => [dnscry.pt-losangeles02-ipv4]
Duplicate entry: [dnscry.pt-losangeles-ipv6] => [dnscry.pt-losangeles02-ipv6]

[v3/relays.md]
Duplicate entry: [dnscry.pt-anon-amsterdam-ipv4] => [dnscry.pt-anon-amsterdam02-ipv4]
Duplicate entry: [dnscry.pt-anon-amsterdam-ipv6] => [dnscry.pt-anon-amsterdam02-ipv6]
Duplicate entry: [dnscry.pt-anon-hongkong-ipv4] => [dnscry.pt-anon-hongkong02-ipv4]
Duplicate entry: [dnscry.pt-anon-hongkong-ipv6] => [dnscry.pt-anon-hongkong02-ipv6]
Duplicate entry: [dnscry.pt-anon-losangeles-ipv4] => [dnscry.pt-anon-losangeles02-ipv4]
Duplicate entry: [dnscry.pt-anon-losangeles-ipv6] => [dnscry.pt-anon-losangeles02-ipv6]
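The renaming behind those messages can be sketched roughly as follows. This is an illustrative reimplementation with a hypothetical `dedupe_names` helper, not the script's actual code; note that whichever entry appears first keeps the bare name, which is exactly why the result may not match the existing hand-maintained data.

```python
def dedupe_names(names: list[str]) -> list[str]:
    """Give repeated names an incrementing suffix before the trailing -ipv4/-ipv6.

    Order-dependent: the first occurrence keeps the bare name.
    """
    seen = set()
    result = []
    for name in names:
        new = name
        if name in seen:
            # "dnscry.pt-amsterdam-ipv4" -> ("dnscry.pt-amsterdam", "-", "ipv4")
            base, sep, family = name.rpartition("-")
            n = 2
            new = f"{base}{n:02d}{sep}{family}"
            while new in seen:
                n += 1
                new = f"{base}{n:02d}{sep}{family}"
            print(f"Duplicate entry: [{name}] => [{new}]")
        seen.add(new)
        result.append(new)
    return result
```

Relay names such as `dnscry.pt-anon-amsterdam-ipv4` go through the same path and become `dnscry.pt-anon-amsterdam02-ipv4`.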

@Brueggus commented Sep 6, 2024

dnscry.pt maintainer here. Thanks for your great work! Keeping this repository in sync with the resolver lists I publish has been haunting me for months, so I am happy to see the progress made in this PR!

The first run generates a lot of churn, as compared to the data in this repo, the upstream data now encodes the port number in all of its DNS Stamps.

That's (unfortunately) due to laziness: I am taking the DNS Stamps for DNSCrypt and Anonymized DNS straight from the output of encrypted-dns-server. When I started the project, I only supported DNSCrypt, so there was no need to calculate any other DNS Stamps. Now changing this wouldn't be a big deal since I already calculate the stamps for DoT/DoH. I'll look into that.

Also visible in v1/dnscrypt-resolvers.csv is some hard-to-avoid churn in entries where upstream has multiple resolvers with the same "location" key, as my dumb-trivial mechanism to resolve that may not pick the same one to rename as the one done previously.

I have never properly implemented having multiple resolvers in the same location and that's why things become inconsistent. At the moment, I am (ab-)using the location field, which you find in the JSON as well, and add an incrementing value if needed. For example, the resolver tokyo-ipv4 in my resolvers.md (https://www.dnscry.pt/resolvers.md) shows "location": "Tokyo" in the JSON, tokyo02-ipv4 shows "location": "Tokyo 02" and so on.
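To illustrate how names fall out of the location field, here is a hypothetical helper plus a collision check. The mapping (lowercase the location, drop spaces, append the address family) is an assumption inferred from the examples above, not a documented rule.

```python
from collections import Counter


def resolver_name(location: str, family: int) -> str:
    """Assumed mapping: "Tokyo 02" -> "dnscry.pt-tokyo02-ipv4"."""
    slug = "".join(location.lower().split())
    return f"dnscry.pt-{slug}-ipv{family}"


def colliding_names(locations: list[str], family: int) -> list[str]:
    """Names produced by more than one location entry (i.e. a missing increment)."""
    counts = Counter(resolver_name(loc, family) for loc in locations)
    return sorted(name for name, n in counts.items() if n > 1)
```

Two entries that both say `"Amsterdam"` produce the same name, which is the inconsistency described above; `"Tokyo"` and `"Tokyo 02"` do not collide.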

I will have to make some adjustments here, but I don't think there's a better/more proper way than adding an incrementing value to locations which host more than one resolver.

Besides that: Is there anything I can change in the JSON output to make things easier for you?

@TBBle (author) commented Sep 6, 2024

I think the stamp data in the JSON is fine. My concern was merely that this repo's existing (hand-maintained) stamps are either old, or were being recalculated to remove the port before publication, and so servers that haven't actually changed show a changed stamp when you compare the generated output to the existing data. That's why I was leaning towards a second PR going first, which would update all the DNS stamps for dnscry.pt entries to include the port number, matching upstream, without otherwise changing the content. That makes the diff resulting from running this PR's script much smaller.

As far as Location goes, yeah, making the names unique by including an incrementing key or something (ideally the same number as in the DNS name, I'd think) to ensure stability as servers appear and disappear makes sense. Tokyo is an example of where it's working well already; the concern was about Amsterdam, Hong Kong, and Los Angeles, where they do not have any such integer, and my quick workaround mismatched the existing data in one of those cases, possibly surprising users if they update and don't re-select the desired service. Having the name be unique and upstream-defined would let me remove the hack, and then any such mismatch or churn will only happen once, and be isolated to places where this repo and your upstream data source have an existing mismatch.

If there wasn't existing data, I wouldn't use Location as the unique name anyway, I'd prefer to derive it from the host-name. That would introduce a once-off churn for all downstream consumers of this list who are using dnscry.pt servers though, so it's probably not feasible at this point.


Of course, as you are the dnscry.pt maintainer, you are welcome to take this script, run it manually, apply any manual fixups or churn reduction you see fit, and submit the results as a PR against the data files. That might make things easier for the repo owner, as they would more-easily trust a data-only PR from you compared to a script from (random passing stranger) me.

@Brueggus commented Sep 6, 2024

or were being recalculated to remove the port before publication

I think that's the case. IIRC the stamps were taken as they are first and the port has been removed in a later commit.

I have just published new resolver lists (+ JSON) which have the port removed if the default port is used, which is the case for all resolvers at the moment. This change was overdue anyway to be compliant with the official (?) specifications for DNS stamps.

the concern was about Amsterdam, Hong Kong, and Los Angeles, where they do not have any such integer

Oh boy... until now I didn't even notice those were missing the incrementing key. I wonder how the clients handled the resolver lists containing two resolvers with the same identifier. Anyways, as a quick fix I've added the 02 in the location field so that the names are unique and the hack you added is no longer required.
I still have to think of a proper way to implement this and I'd prefer to not tie the resolver name to the hostname so that users won't have to change their configs if I have to change servers... but that's out of scope here.

@TBBle (author) commented Sep 7, 2024

Awesome, thank you. I've rebased and rerun the scripts, and the churn is now much lower, so I included the output of the run as a commit for visibility.

@TBBle (author) left a comment

For easier review, I've annotated the changes in the data files that weren't purely new entries. One entry's IP was updated, and three have changed names with a new entry taking their former name, so this should not break anyone's systems unless certificate pinning is common for users in this ecosystem?

I annotated v1 because it's easy to see what changed there. The v1 file was generated entirely by format.py, since my script only updates the v3 data; the same DNS stamp changes are visible in v3, but the stamps don't show what changed.
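These annotations were done by hand, but the same check can be sketched against the v1 CSV: index rows by (public key, address family) and report name or address changes. The column positions (0 = name, 10 = resolver address, 12 = provider public key) are inferred from the rows quoted in this PR, and the helpers are hypothetical.

```python
import csv
import io

# Column positions inferred from the v1/dnscrypt-resolvers.csv rows quoted in this PR.
NAME, ADDR, KEY = 0, 10, 12


def index_by_key(csv_text: str) -> dict:
    """Map (provider public key, address family) -> (name, resolver address)."""
    index = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= KEY:
            continue  # blank or truncated line
        family = "ipv6" if row[ADDR].startswith("[") else "ipv4"
        index[(row[KEY], family)] = (row[NAME], row[ADDR])
    return index


def describe_changes(old_csv: str, new_csv: str) -> list:
    """List renames and address moves for servers whose key exists in both versions."""
    old, new = index_by_key(old_csv), index_by_key(new_csv)
    notes = []
    for key, family in sorted(old.keys() & new.keys()):
        old_name, old_addr = old[(key, family)]
        new_name, new_addr = new[(key, family)]
        if old_name != new_name:
            notes.append(f"Key {key} is renamed from {old_name} to {new_name}")
        if old_addr != new_addr:
            notes.append(f"Key {key} ({new_name}) moved from {old_addr} to {new_addr}")
    return notes
```

The address family is part of the index because the IPv4 and IPv6 entries for one server share the same public key.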

Comment on lines 94 to 92
dnscry.pt-amsterdam-ipv4,dnscry.pt-amsterdam-ipv4,"DNSCry.pt Amsterdam - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,45.86.162.110,2.dnscrypt-cert.dnscry.pt,6E5C:573C:9A33:687D:DDD1:3F61:FFAF:4EA7:59E7:6106:1B5B:8C88:59DF:32A4:E391:9CAF,
dnscry.pt-amsterdam-ipv4,dnscry.pt-amsterdam-ipv4,"DNSCry.pt Amsterdam - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,198.140.141.46,2.dnscrypt-cert.dnscry.pt,5A9B:69F3:B181:7B8A:C0E8:18C5:0E97:52A1:D690:C52D:FF92:C8DA:70E4:7551:54E6:19BB,
dnscry.pt-amsterdam-ipv6,dnscry.pt-amsterdam-ipv6,"DNSCry.pt Amsterdam - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2a03:94e3:222b::1032],2.dnscrypt-cert.dnscry.pt,5A9B:69F3:B181:7B8A:C0E8:18C5:0E97:52A1:D690:C52D:FF92:C8DA:70E4:7551:54E6:19BB,
dnscry.pt-amsterdam02-ipv4,dnscry.pt-amsterdam02-ipv4,"DNSCry.pt Amsterdam 02 - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,45.86.162.110,2.dnscrypt-cert.dnscry.pt,6E5C:573C:9A33:687D:DDD1:3F61:FFAF:4EA7:59E7:6106:1B5B:8C88:59DF:32A4:E391:9CAF,
dnscry.pt-amsterdam02-ipv6,dnscry.pt-amsterdam02-ipv6,"DNSCry.pt Amsterdam 02 - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2a07:efc0:1001:a5ce::b4b4],2.dnscrypt-cert.dnscry.pt,6E5C:573C:9A33:687D:DDD1:3F61:FFAF:4EA7:59E7:6106:1B5B:8C88:59DF:32A4:E391:9CAF,

@TBBle (author) Sep 7, 2024

Key 6E5C:573C:9A33:687D:DDD1:3F61:FFAF:4EA7:59E7:6106:1B5B:8C88:59DF:32A4:E391:9CAF is renamed from dnscry.pt-amsterdam-ipv4 to dnscry.pt-amsterdam02-ipv4.

This is consistent with the hostname (ams02.dnscry.pt) per https://www.dnscry.pt/public-resolvers/ams02

Comment on lines 144 to 149
dnscry.pt-hongkong-ipv4,dnscry.pt-hongkong-ipv4,"DNSCry.pt Hong Kong - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,45.123.188.129,2.dnscrypt-cert.dnscry.pt,0B71:21FC:C3CB:4775:D462:C91E:BEC7:3E47:6D3F:019B:8A69:8D50:CD20:51AD:207E:31C8,
dnscry.pt-hongkong-ipv6,dnscry.pt-hongkong-ipv6,"DNSCry.pt Hong Kong - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2406:4300:bae:6b08::1],2.dnscrypt-cert.dnscry.pt,0B71:21FC:C3CB:4775:D462:C91E:BEC7:3E47:6D3F:019B:8A69:8D50:CD20:51AD:207E:31C8,
dnscry.pt-hongkong-ipv4,dnscry.pt-hongkong-ipv4,"DNSCry.pt Hong Kong - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,89.213.0.26,2.dnscrypt-cert.dnscry.pt,32DD:EDC4:D4D9:B251:982A:373B:D3D9:E117:8929:56DB:8FD9:4FD4:A7AB:1A67:F62D:EF35,
dnscry.pt-hongkong-ipv6,dnscry.pt-hongkong-ipv6,"DNSCry.pt Hong Kong - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2a13:82c1:850a::b7],2.dnscrypt-cert.dnscry.pt,32DD:EDC4:D4D9:B251:982A:373B:D3D9:E117:8929:56DB:8FD9:4FD4:A7AB:1A67:F62D:EF35,
dnscry.pt-hongkong02-ipv4,dnscry.pt-hongkong02-ipv4,"DNSCry.pt Hong Kong 02 - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,45.123.188.129,2.dnscrypt-cert.dnscry.pt,0B71:21FC:C3CB:4775:D462:C91E:BEC7:3E47:6D3F:019B:8A69:8D50:CD20:51AD:207E:31C8,
dnscry.pt-hongkong02-ipv6,dnscry.pt-hongkong02-ipv6,"DNSCry.pt Hong Kong 02 - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2406:4300:bae:6b08::1],2.dnscrypt-cert.dnscry.pt,0B71:21FC:C3CB:4775:D462:C91E:BEC7:3E47:6D3F:019B:8A69:8D50:CD20:51AD:207E:31C8,

@TBBle (author) Sep 7, 2024

Key 0B71:21FC:C3CB:4775:D462:C91E:BEC7:3E47:6D3F:019B:8A69:8D50:CD20:51AD:207E:31C8 is renamed from dnscry.pt-hongkong-ipv{4,6} to dnscry.pt-hongkong02-ipv{4,6}.

This is consistent with the hostname (hkg02.dnscry.pt) per https://www.dnscry.pt/public-resolvers/hkg02

Comment on lines 166 to 170
dnscry.pt-london-ipv4,dnscry.pt-london-ipv4,"DNSCry.pt London - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,178.239.174.244,2.dnscrypt-cert.dnscry.pt,8F66:DC44:BEBB:62C6:0CEA:2D99:2B92:5FFE:1CBE:FE09:ABB6:6140:8417:6BDB:F2CA:31EE,
dnscry.pt-london-ipv4,dnscry.pt-london-ipv4,"DNSCry.pt London - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,45.67.84.132,2.dnscrypt-cert.dnscry.pt,8F66:DC44:BEBB:62C6:0CEA:2D99:2B92:5FFE:1CBE:FE09:ABB6:6140:8417:6BDB:F2CA:31EE,

@TBBle (author) Sep 7, 2024

IP address was out of date, but the host key is the same. IP address manually verified at https://www.dnscry.pt/public-resolvers/lon01. No change in the IPv6 address of this service.

Comment on lines 168 to 175
dnscry.pt-losangeles-ipv4,dnscry.pt-losangeles-ipv4,"DNSCry.pt Los Angeles - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,104.200.67.194,2.dnscrypt-cert.dnscry.pt,8871:792B:8640:7C1C:8597:6CB5:0A9C:A0A0:FF44:0B95:E30F:10AF:FD57:9971:59B9:C184,
dnscry.pt-losangeles-ipv6,dnscry.pt-losangeles-ipv6,"DNSCry.pt Los Angeles - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2602:ff75:7:b79::b4b4],2.dnscrypt-cert.dnscry.pt,8871:792B:8640:7C1C:8597:6CB5:0A9C:A0A0:FF44:0B95:E30F:10AF:FD57:9971:59B9:C184,
dnscry.pt-losangeles-ipv4,dnscry.pt-losangeles-ipv4,"DNSCry.pt Los Angeles - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,84.33.244.100,2.dnscrypt-cert.dnscry.pt,0081:FF5E:2AAB:77C9:4AEC:1980:E72C:16A4:2A14:C835:2746:A518:F03C:71BF:7143:2716,
dnscry.pt-losangeles-ipv6,dnscry.pt-losangeles-ipv6,"DNSCry.pt Los Angeles - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2a0c:8fc3:3:1:2:3:4:5],2.dnscrypt-cert.dnscry.pt,0081:FF5E:2AAB:77C9:4AEC:1980:E72C:16A4:2A14:C835:2746:A518:F03C:71BF:7143:2716,
dnscry.pt-losangeles02-ipv4,dnscry.pt-losangeles02-ipv4,"DNSCry.pt Los Angeles 02 - DNSCrypt, no filter, no logs, DNSSEC support (IPv4 server)",,,,2,yes,yes,no,104.200.67.194,2.dnscrypt-cert.dnscry.pt,8871:792B:8640:7C1C:8597:6CB5:0A9C:A0A0:FF44:0B95:E30F:10AF:FD57:9971:59B9:C184,
dnscry.pt-losangeles02-ipv6,dnscry.pt-losangeles02-ipv6,"DNSCry.pt Los Angeles 02 - DNSCrypt, no filter, no logs, DNSSEC support (IPv6 server)",,,,2,yes,yes,no,[2602:ff75:7:b79::b4b4],2.dnscrypt-cert.dnscry.pt,8871:792B:8640:7C1C:8597:6CB5:0A9C:A0A0:FF44:0B95:E30F:10AF:FD57:9971:59B9:C184,

@TBBle (author) Sep 7, 2024

Key 8871:792B:8640:7C1C:8597:6CB5:0A9C:A0A0:FF44:0B95:E30F:10AF:FD57:9971:59B9:C184 is renamed from dnscry.pt-losangeles-ipv{4,6} to dnscry.pt-losangeles02-ipv{4,6}.

This is consistent with the hostname (lax02.dnscry.pt) per https://www.dnscry.pt/public-resolvers/lax02

@Brueggus

Just a heads-up - the IP addresses of dnscry.pt-hongkong-ipv4 and dnscry.pt-hongkong-ipv6 have changed recently. I don't think I can push any changes to this PR.

If we can get this merged, this would help me a lot to keep this repo in sync with changes on my end.

@TBBle force-pushed the dnscry.pt-update-script branch 2 times, most recently from 8e786e4 to f7ac7c7, on September 18, 2024 13:47
@TBBle (author) commented Sep 18, 2024

I've rerun the script and repushed the branch, and those addresses should now be updated.

I'm happy to trivially rebase if I'm pinged here, but I'm not actively tracking dnscry.pt data updates myself. I have not heard from the repo owner, so I'm not sure what expectation to have about timelines for merging this PR.

@jedisct1 (member)

Thanks!

Until now, dnscry.pt updates were just copied from:

Is there a difference between this and manually parsing the JSON file?

@Brueggus commented Sep 18, 2024 via email

Drop some unused imports and turn `if not X in` into `if X not in`.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>

On Windows, you can't rename over a file that exists, and you can't
delete a file you still have open.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
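For context on that commit, one portable pattern (a sketch, not necessarily what the commit does) is to write to a temporary file in the same directory, close it, and only then call `os.replace()`, which overwrites an existing destination on Windows as well as POSIX; `os.rename()` raises `FileExistsError` on Windows when the target exists. `write_atomically` is a hypothetical name.

```python
import os
import tempfile


def write_atomically(path: str, text: str) -> None:
    """Write text to a temp file next to `path`, close it, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" writes "\n" as-is instead of translating it to "\r\n" on Windows.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        # The file is closed here, so Windows allows it to be moved.
        os.replace(tmp, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap completed.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Creating the temp file in the destination directory keeps the final `os.replace()` on the same filesystem.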

@TBBle (author) commented Sep 18, 2024

Rebased onto the latest changes to the master branch, which I see has had (manual, I assume) updates to the dnscry.pt servers, so the final commit's diff is much smaller now.

Is there a difference between this and manually parsing the JSON file?

It's not manual. ^_^ For me, that's a win by itself; YMMV.

For example, looking at the current diff, it has re-added the three servers you removed as not working in de9d69e. (Someone should probably report that to @Brueggus if they're not already aware...)

It also adds a bunch of anonymous relays which aren't currently in the list. I haven't checked carefully but it looks like that's the list of relays removed in 5762a73.

Either way, it makes it super-easy to see what's changed compared to manually parsing a JSON file. (Although I suspect we want to drop the last commit as it's now somewhat reverting deliberate manual changes.)

Edit: Confirmed that a quick git cherry-pick de9d69e32f76f94b57387da45644fa920f0bb57f 5762a73773db2d9ab66517c5fe21a3b7d3d0c1ca brings us back to the current state of the master branch (ignoring the sig-file changes), so I'll remove the final commit shortly.

@jedisct1 (member)

But why use the JSON file instead of the already existing .md files?

Here are the scripts that have been used to update the dnscry.pt entries so far:

They're very simple as they just add a prefix to the names. Using the JSON file looks way more complicated.
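A rough sketch of that prefixing approach, assuming each entry in the upstream resolvers.md starts with a level-2 heading such as `## tokyo-ipv4` (the heading convention and `prefix_resolver_names` are assumptions; the actual scripts jedisct1 mentions may work differently).

```python
import re


def prefix_resolver_names(markdown: str, prefix: str = "dnscry.pt-") -> str:
    """Prefix each "## name" heading, skipping names that already carry the prefix."""
    pattern = re.compile(r"^## (?!" + re.escape(prefix) + r")(\S+)", re.MULTILINE)
    return pattern.sub(lambda m: "## " + prefix + m.group(1), markdown)
```

Everything other than the headings (descriptions, `sdns://` stamps) passes through unchanged, and running it twice gives the same result.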

@jedisct1 (member)

Also, copying the .md files ensures that the resolver names are exactly the same whether one is using dnscry.pt as a source, or dnscrypt.info as a source.

@TBBle (author) commented Sep 19, 2024

But why use the JSON file instead of the already existing .md files?

There's a built-in JSON parser in Python stdlib, so I didn't even need to think about parsing Markdown's various flavours.

I also assumed upstream JSON was canonical, and any MD output was generated from that and liable to change.

When I came into this, both formats already existed, and there wasn't any hint of an existing MD parser in the linked bug or repo that I saw.
