
Connection delays observed on IPv6-only networks #1170

Open
rtbenfield opened this issue Jul 17, 2024 · 9 comments

@rtbenfield

Summary

Hi 👋🏻 I work at Prisma on the team responsible for Accelerate. We've had reports from users running Accelerate with MongoDB that initial requests take a significant amount of time, often exceeding 10 seconds, when running on Accelerate. Accelerate uses the Prisma ORM to operate its connection pooling, which has a dependency on the MongoDB Rust driver. After some investigation, we believe this is caused by a manual DNS resolution and preference for IPv4 within the driver.

Accelerate operates an IPv6-only network with outbound IPv4 requests routed through a NAT gateway using DNS64. DNS resolution for these hosts returns both IPv4 and IPv6 addresses, though the network only supports IPv6. This is typically acceptable when establishing connections by hostname, as the host will prefer the IPv6 address. Unfortunately, the manual DNS resolution in the driver prioritizes the IPv4 addresses and waits for those attempts to time out before using the DNS64 IPv6 address and succeeding. Additional requests on the established connection work as expected.

socket_addrs.sort_by_key(|addr| if addr.is_ipv4() { 0 } else { 1 });
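
For illustration, here is a minimal sketch of the behavior we believe we're seeing; it is not the driver's actual code, and the hostname, port, and 10-second per-attempt timeout are placeholders. With the IPv4-first sort and sequential connection attempts, every IPv4 attempt has to time out on an IPv6-only network before the DNS64-synthesized IPv6 address is ever tried:

```rust
use std::time::Duration;

use tokio::net::{lookup_host, TcpStream};
use tokio::time::timeout;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Placeholder hostname/port; substitute a real cluster host to observe the delay.
    let mut socket_addrs: Vec<_> = lookup_host(("mongodb.example.com", 27017)).await?.collect();

    // Same ordering as the driver line quoted above: IPv4 addresses first.
    socket_addrs.sort_by_key(|addr| if addr.is_ipv4() { 0 } else { 1 });

    for addr in socket_addrs {
        // On an IPv6-only network, each IPv4 attempt sits here until the timeout
        // fires; only then is the DNS64-synthesized IPv6 address tried.
        match timeout(Duration::from_secs(10), TcpStream::connect(addr)).await {
            Ok(Ok(_stream)) => {
                println!("connected to {addr}");
                return Ok(());
            }
            Ok(Err(e)) => eprintln!("connect to {addr} failed: {e}"),
            Err(_) => eprintln!("connect to {addr} timed out"),
        }
    }
    Err(std::io::Error::new(
        std::io::ErrorKind::Other,
        "all addresses failed",
    ))
}
```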

Versions/Environment

  1. What version of Rust are you using?
    • 1.78.0
  2. What operating system are you using?
    • Amazon Linux
  3. What versions of the driver and its dependencies are you using? (Run
    cargo pkgid mongodb & cargo pkgid bson)
    • cargo pkgid mongodb -> registry+https://github.com/rust-lang/crates.io-index#[email protected]
    • cargo pkgid bson -> registry+https://github.com/rust-lang/crates.io-index#[email protected]
  4. What version of MongoDB are you using? (Check with the MongoDB shell using db.version())
    • Various, as Accelerate connects to the MongoDB instance supplied by the user. This is most often reported by MongoDB Atlas users.
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)?
    • Various, as Accelerate connects to the MongoDB instance supplied by the user. This is most often reported by MongoDB Atlas users.

Describe the bug

  • What is the expected behavior and what is actually happening?
    • The TCP connection will use the host's preferred network, whether IPv4 or IPv6.
  • Do you have any particular output that demonstrates this problem?
    • We've observed delays of 6-10+ seconds when establishing an initial connection. If there are any logs that might be helpful, please let us know.
  • Do you have any ideas on why this may be happening that could give us a
    clue in the right direction?
    • This seems to be caused by the manual DNS resolution and the IPv4-first sorting in AsyncTcpStream::connect; a sketch of an alternative ordering follows this list.
  • Did this issue arise out of nowhere, or after an update (of the driver,
    server, and/or Rust)?
    • No
  • Are there multiple ways of triggering this bug (perhaps more than one
    function produces a crash)?
    • This should be reproducible on any IPv6-only network where the MongoDB instance advertises both IPv4 and IPv6 addresses. The host may support both natively, or the network may use DNS64.
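
To make the alternative ordering mentioned above concrete, here is a small sketch; interleave_families is a hypothetical helper, not part of the driver. Interleaving the two address families (as Happy Eyeballs address sorting does) bounds the cost of an unreachable family to a single timed-out attempt rather than one per address:

```rust
use std::net::SocketAddr;

/// Hypothetical helper: alternate IPv6 and IPv4 addresses instead of putting
/// every IPv4 address first, so an unreachable family costs at most one
/// timed-out attempt before the other family is tried.
fn interleave_families(addrs: Vec<SocketAddr>) -> Vec<SocketAddr> {
    let (v6, v4): (Vec<_>, Vec<_>) = addrs.into_iter().partition(|a| a.is_ipv6());
    let (mut v6, mut v4) = (v6.into_iter(), v4.into_iter());
    let mut interleaved = Vec::new();
    loop {
        match (v6.next(), v4.next()) {
            (None, None) => break,
            (a, b) => interleaved.extend(a.into_iter().chain(b)),
        }
    }
    interleaved
}
```
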
@isabelatkinson
Contributor

Hey @rtbenfield, thank you for this thorough report! I did some investigation, and the reasoning behind this IPv4-first approach was simplicity and consistency with other drivers; however, I agree that it is not the right solution for your network. This seems like a problem best addressed by implementing happy eyeballs instead. I filed RUST-1994 to track this, and the team will discuss prioritization shortly. Feel free to follow that ticket for more updates.
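
For readers unfamiliar with the approach: Happy Eyeballs (RFC 8305) staggers connection attempts instead of trying one address at a time. The sketch below only illustrates the idea and is not the driver's eventual implementation; connect_staggered is a hypothetical function, and the 250 ms stagger mirrors the RFC's recommended connection attempt delay:

```rust
use std::io;
use std::net::SocketAddr;
use std::time::Duration;

use tokio::net::TcpStream;
use tokio::task::JoinSet;
use tokio::time::timeout;

/// Staggered ("Happy Eyeballs" style) connect: attempts start in the given
/// order, roughly ATTEMPT_DELAY apart, and the first one to succeed wins.
async fn connect_staggered(addrs: Vec<SocketAddr>) -> io::Result<TcpStream> {
    const ATTEMPT_DELAY: Duration = Duration::from_millis(250);

    let mut attempts: JoinSet<io::Result<TcpStream>> = JoinSet::new();
    let mut last_err: Option<io::Error> = None;
    let mut pending = addrs.into_iter().peekable();

    while pending.peek().is_some() || !attempts.is_empty() {
        // Kick off the next attempt, if any addresses remain.
        if let Some(addr) = pending.next() {
            attempts.spawn(async move { TcpStream::connect(addr).await });
        }

        // Wait for an in-flight attempt to finish, but only up to the stagger
        // delay while there are still addresses left to try in parallel.
        let next = if pending.peek().is_some() {
            timeout(ATTEMPT_DELAY, attempts.join_next()).await
        } else {
            Ok(attempts.join_next().await)
        };

        match next {
            // First success wins; dropping the JoinSet aborts the losing attempts.
            Ok(Some(Ok(Ok(stream)))) => return Ok(stream),
            Ok(Some(Ok(Err(e)))) => last_err = Some(e), // attempt failed; keep going
            Ok(Some(Err(join_err))) => {
                last_err = Some(io::Error::new(io::ErrorKind::Other, join_err))
            }
            Ok(None) => break,  // nothing in flight
            Err(_elapsed) => {} // stagger delay hit; loop to start the next attempt
        }
    }

    Err(last_err
        .unwrap_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no addresses provided")))
}
```

In a full implementation, the resolved addresses would first be ordered as RFC 8305 describes (alternating families, as sketched earlier in this thread) before being handed to a racer like this.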

@rtbenfield
Author

Thanks for the quick response @isabelatkinson! That sounds like it would be a great solution for our network configuration.

@isabelatkinson
Contributor

@rtbenfield We'll be working on this early next month. Please let us know if you have any further questions!

@abr-egn
Contributor

abr-egn commented Aug 6, 2024

I believe we have a fix, although it's hard to test since we can't replicate your exact network environment. Do you have a repeatable test you could run using that branch?

@apolanc

apolanc commented Aug 7, 2024

Hi @abr-egn thanks. We will give this a try and come back with feedback.

@rtbenfield
Author

Our team put together a Prisma ORM version with this fix integrated and it worked perfectly 🚀 Connections from within the Accelerate network are super fast now.

Thanks for addressing this so quickly!

@abr-egn
Contributor

abr-egn commented Aug 19, 2024

I'm very glad to hear that 🙂 I've merged that PR in; it'll go live with 3.1.0.

@laplab

laplab commented Aug 20, 2024

@abr-egn great news! Could you share whether 3.1.0 happens to be planned for release this week? The next Prisma ORM release is next week, so we were hoping to include this fix there.

No pressure if not! We can always use the branch you provided in the meantime.

@abr-egn
Contributor

abr-egn commented Aug 20, 2024

Not this week, I'm afraid.
