
Connection delays observed on IPv6-only networks #1170

Open
rtbenfield opened this issue Jul 17, 2024 · 9 comments

@rtbenfield

Summary

Hi 👋🏻 I work at Prisma on the team responsible for Accelerate. We've had reports from users running Accelerate with MongoDB that initial requests take a significant amount of time, often exceeding 10 seconds, when running on Accelerate. Accelerate uses the Prisma ORM to operate its connection pooling, which has a dependency on the MongoDB Rust driver. After some investigation, we believe this is caused by a manual DNS resolution and preference for IPv4 within the driver.

Accelerate operates an IPv6-only network with outbound IPv4 requests routed through a NAT gateway using DNS64. DNS resolution for these hosts returns both IPv4 and IPv6 addresses, though the network only supports IPv6. This is typically acceptable when establishing connections by hostname, as the host will prefer the IPv6 address. Unfortunately, the manual DNS resolution in the driver prioritizes the IPv4 addresses and waits for those attempts to time out before using the DNS64 IPv6 address and succeeding. Additional requests on the established connection work as expected.

socket_addrs.sort_by_key(|addr| if addr.is_ipv4() { 0 } else { 1 });
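
For illustration, here is a minimal sketch of the behavior we believe we're seeing; it is not the driver's actual code, and the hostname, port, and 10-second per-attempt timeout are placeholders. With the IPv4-first sort and sequential connection attempts, every IPv4 attempt has to time out on an IPv6-only network before the DNS64-synthesized IPv6 address is ever tried:

```rust
use std::time::Duration;

use tokio::net::{lookup_host, TcpStream};
use tokio::time::timeout;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Placeholder hostname/port; substitute a real cluster host to observe the delay.
    let mut socket_addrs: Vec<_> = lookup_host(("mongodb.example.com", 27017)).await?.collect();

    // Same ordering as the driver line quoted above: IPv4 addresses first.
    socket_addrs.sort_by_key(|addr| if addr.is_ipv4() { 0 } else { 1 });

    for addr in socket_addrs {
        // On an IPv6-only network, each IPv4 attempt sits here until the timeout
        // fires; only then is the DNS64-synthesized IPv6 address tried.
        match timeout(Duration::from_secs(10), TcpStream::connect(addr)).await {
            Ok(Ok(_stream)) => {
                println!("connected to {addr}");
                return Ok(());
            }
            Ok(Err(e)) => eprintln!("connect to {addr} failed: {e}"),
            Err(_) => eprintln!("connect to {addr} timed out"),
        }
    }
    Err(std::io::Error::new(
        std::io::ErrorKind::Other,
        "all addresses failed",
    ))
}
```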

Versions/Environment

  1. What version of Rust are you using?
    • 1.78.0
  2. What operating system are you using?
    • Amazon Linux
  3. What versions of the driver and its dependencies are you using? (Run
    cargo pkgid mongodb & cargo pkgid bson)
    • cargo pkgid mongodb -> registry+https://github.com/rust-lang/crates.io-index#[email protected]
    • cargo pkgid bson -> registry+https://github.com/rust-lang/crates.io-index#[email protected]
  4. What version of MongoDB are you using? (Check with the MongoDB shell using db.version())
    • Various, as Accelerate connects to the MongoDB instance supplied by the user. This is most often reported by MongoDB Atlas users.
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)?
    • Various, as Accelerate connects to the MongoDB instance supplied by the user. This is most often reported by MongoDB Atlas users.

Describe the bug

  • What is the expected behavior and what is actually happening?
    • The TCP connection will use the host's preferred network, whether IPv4 or IPv6.
  • Do you have any particular output that demonstrates this problem?
    • We've observed delays of 6-10+ seconds when establishing an initial connection. If there are any logs that might be helpful, please let us know.
  • Do you have any ideas on why this may be happening that could give us a
    clue in the right direction?
    • This seems to be caused by the manual DNS resolution and the IPv4-first sorting in AsyncTcpStream::connect; a sketch of an alternative ordering follows this list.
  • Did this issue arise out of nowhere, or after an update (of the driver,
    server, and/or Rust)?
    • No
  • Are there multiple ways of triggering this bug (perhaps more than one
    function produces a crash)?
    • This should be reproducible on any IPv6-only network where the MongoDB instance advertises both IPv4 and IPv6 addresses. The host may support both natively, or the network may use DNS64.
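
To make the alternative ordering mentioned above concrete, here is a small sketch; interleave_families is a hypothetical helper, not part of the driver. Interleaving the two address families (as Happy Eyeballs address sorting does) bounds the cost of an unreachable family to a single timed-out attempt rather than one per address:

```rust
use std::net::SocketAddr;

/// Hypothetical helper: alternate IPv6 and IPv4 addresses instead of putting
/// every IPv4 address first, so an unreachable family costs at most one
/// timed-out attempt before the other family is tried.
fn interleave_families(addrs: Vec<SocketAddr>) -> Vec<SocketAddr> {
    let (v6, v4): (Vec<_>, Vec<_>) = addrs.into_iter().partition(|a| a.is_ipv6());
    let (mut v6, mut v4) = (v6.into_iter(), v4.into_iter());
    let mut interleaved = Vec::new();
    loop {
        match (v6.next(), v4.next()) {
            (None, None) => break,
            (a, b) => interleaved.extend(a.into_iter().chain(b)),
        }
    }
    interleaved
}
```
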
@isabelatkinson
Contributor

Hey @rtbenfield, thank you for this thorough report! I did some investigation, and the reasoning behind this IPv4-first approach was simplicity and consistency with other drivers; however, I agree that it is not the right solution for your network. This seems like a problem best addressed by implementing happy eyeballs instead. I filed RUST-1994 to track this, and the team will discuss prioritization shortly. Feel free to follow that ticket for more updates.
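
For readers unfamiliar with the approach: Happy Eyeballs (RFC 8305) staggers connection attempts instead of trying one address at a time. The sketch below only illustrates the idea and is not the driver's eventual implementation; connect_staggered is a hypothetical function, and the 250 ms stagger mirrors the RFC's recommended connection attempt delay:

```rust
use std::io;
use std::net::SocketAddr;
use std::time::Duration;

use tokio::net::TcpStream;
use tokio::task::JoinSet;
use tokio::time::timeout;

/// Staggered ("Happy Eyeballs" style) connect: attempts start in the given
/// order, roughly ATTEMPT_DELAY apart, and the first one to succeed wins.
async fn connect_staggered(addrs: Vec<SocketAddr>) -> io::Result<TcpStream> {
    const ATTEMPT_DELAY: Duration = Duration::from_millis(250);

    let mut attempts: JoinSet<io::Result<TcpStream>> = JoinSet::new();
    let mut last_err: Option<io::Error> = None;
    let mut pending = addrs.into_iter().peekable();

    while pending.peek().is_some() || !attempts.is_empty() {
        // Kick off the next attempt, if any addresses remain.
        if let Some(addr) = pending.next() {
            attempts.spawn(async move { TcpStream::connect(addr).await });
        }

        // Wait for an in-flight attempt to finish, but only up to the stagger
        // delay while there are still addresses left to try in parallel.
        let next = if pending.peek().is_some() {
            timeout(ATTEMPT_DELAY, attempts.join_next()).await
        } else {
            Ok(attempts.join_next().await)
        };

        match next {
            // First success wins; dropping the JoinSet aborts the losing attempts.
            Ok(Some(Ok(Ok(stream)))) => return Ok(stream),
            Ok(Some(Ok(Err(e)))) => last_err = Some(e), // attempt failed; keep going
            Ok(Some(Err(join_err))) => {
                last_err = Some(io::Error::new(io::ErrorKind::Other, join_err))
            }
            Ok(None) => break,  // nothing in flight
            Err(_elapsed) => {} // stagger delay hit; loop to start the next attempt
        }
    }

    Err(last_err
        .unwrap_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no addresses provided")))
}
```

In a full implementation, the resolved addresses would first be ordered as RFC 8305 describes (alternating families, as sketched earlier in this thread) before being handed to a racer like this.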

@rtbenfield
Author

Thanks for the quick response @isabelatkinson! That sounds like it would be a great solution for our network configuration.

@isabelatkinson
Contributor

@rtbenfield We'll be working on this early next month. Please let us know if you have any further questions!

@abr-egn
Contributor

abr-egn commented Aug 6, 2024

I believe we have a fix, although it's hard to test since we can't replicate your exact network environment. Do you have a repeatable test you could run using that branch?

@apolanc

apolanc commented Aug 7, 2024

Hi @abr-egn thanks. We will give this a try and come back with feedback.

@rtbenfield
Author

Our team put together a Prisma ORM version with this fix integrated and it worked perfectly 🚀 Connections from within the Accelerate network are super fast now.

Thanks for addressing this so quickly!

@abr-egn
Contributor

abr-egn commented Aug 19, 2024

I'm very glad to hear that 🙂 I've merged that PR in; it'll go live with 3.1.0.

@laplab

laplab commented Aug 20, 2024

@abr-egn great news! Could you share whether 3.1.0 happens to be planned for release this week? The next Prisma ORM release is next week, so we were hoping to include this fix there.

No pressure if not! We can always use the branch you provided in the meantime.

@abr-egn
Contributor

abr-egn commented Aug 20, 2024

Not this week, I'm afraid.
