Resolver Infinite loop on simple version collision #3027

Open
1 task done
carlkibler opened this issue Jul 16, 2024 · 6 comments
Labels
🐛 bug Something isn't working

Comments

@carlkibler

carlkibler commented Jul 16, 2024

I read other "infinite loop" bugs (#2633, #2545, #1119, #908) but wanted to point out the general bug of running the resolver through 10k tries when there's a fairly obvious unresolvable conflict.

  • I have searched the issue tracker and believe that this is not a duplicate.

Make sure you run commands with -v flag before pasting the output.

Steps to reproduce

  1. Fresh project.
  2. Add dependencies langchain and gretel-client (from the gretel-python-client project). (file below)

pyproject.toml:

[project]
name = "pdm_resolve"
version = "0.1.0"
description = "Default template for PDM package"
authors = [
    {name = "Person", email = "[email protected]"},
]
dependencies = [
    "langchain",
    "gretel-client",
]
requires-python = "==3.12.*"
readme = "README.md"
license = {text = "MIT"}


[tool.pdm]
distribution = false

Actual behavior

PDM spins in resolution for up to strategy.resolve_max_rounds rounds (the default is 10k). Here's the conflict:

  • gretel specifies pydantic==1.10.17
  • langchain specifies pydantic<3.0.0,>=2.7.4

pip reports the conflict immediately:

langsmith 0.1.86 requires pydantic<3.0.0,>=2.7.4; python_full_version >= "3.12.4", but you have pydantic 1.10.13 which is incompatible.
langchain-core 0.2.19 requires pydantic<3.0.0,>=2.7.4; python_full_version >= "3.12.4", but you have pydantic 1.10.13 which is incompatible.

This is great because it's clear and tells me quickly. Having pdm lock work for 10+ minutes and not notice this is a bug to me.
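
The clash between the two requirements quoted above can be checked by hand. The sketch below is only an illustration: the helper names and the candidate list are made up for this example, and the version parsing handles plain numeric releases only (real tools use a full PEP 440 implementation).

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.10.17' into (1, 10, 17). Plain numeric releases only."""
    return tuple(int(part) for part in version.split("."))


def satisfies_langchain(version: str) -> bool:
    """pydantic<3.0.0,>=2.7.4, as quoted in the pip output above."""
    return parse("2.7.4") <= parse(version) < parse("3.0.0")


def satisfies_gretel(version: str) -> bool:
    """pydantic==1.10.17, the exact pin described in the report."""
    return parse(version) == parse("1.10.17")


# Illustrative candidates, not a real index listing.
candidates = ["1.10.13", "1.10.17", "2.7.4", "2.8.2"]

# A version is usable only if it satisfies both requirements at once.
compatible = [
    v for v in candidates if satisfies_langchain(v) and satisfies_gretel(v)
]
print(compatible)
```

No candidate passes both checks, which is the situation pip reports for these particular releases. Note that the pip lines carry a `python_full_version >= "3.12.4"` marker and apply to specific langchain-core and langsmith versions, so this check says nothing about other releases a resolver might backtrack to.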

Expected behavior

  1. I would hope this clear conflict doesn't trigger repeated loops. There is no solution due to the hardcoded version specifications, and the resolver (or logic above that?) should see that and short-circuit further evaluation.
  2. Make user aware of the 10k loop default limit.
    What is a "normal" amount of loops for a project? 10, 50? 100? I would suggest after 1 minute or 100 resolution attempts, print a message telling the user the resolver will continue trying up to 10k times and how to configure that limit.

I wonder if the default limit should be much lower (50? 100?) and tell users instead "in large projects the value may need to be set higher, and here's how...".

If a reasonable number of resolver attempts is <50, then make that the default to save time for the vast majority of uses, and tell huge-project users to raise the value because they are a special case. That would be more user friendly.

Environment Information

PDM version:
  2.16.1
Python Interpreter:
  /Users/carl/tmp/pdm_resolve/.venv/bin/python (3.12)
Project Root:
  /Users/carl/tmp/pdm_resolve
Local Packages:
  
{
  "implementation_name": "cpython",
  "implementation_version": "3.12.4",
  "os_name": "posix",
  "platform_machine": "arm64",
  "platform_release": "23.6.0",
  "platform_system": "Darwin",
  "platform_version": "Darwin Kernel Version 23.6.0: Sun Jun 30 19:39:43 PDT 2024; root:xnu-10063.140.33~20/RELEASE_ARM64_T6030",
  "python_full_version": "3.12.4",
  "platform_python_implementation": "CPython",
  "python_version": "3.12",
  "sys_platform": "darwin"
}
@carlkibler carlkibler added the 🐛 bug Something isn't working label Jul 16, 2024
@frostming
Collaborator

On my machine the resolution succeeds in 30s, with pydantic==1.10.17 pinned.

@pawamoy
Sponsor Contributor

pawamoy commented Jul 17, 2024

1min for me (pdm lock -v 60.83s user 0.55s system 75% cpu 1:21.24 total), on Linux. Pydantic 1.10.17 too. Seems like pip stops backtracking earlier.

@pawamoy
Sponsor Contributor

pawamoy commented Jul 17, 2024

@carlkibler you make good points though!

Make user aware of the 10k loop default limit.

Yep, it could be printed on each round, like pdm.termui: ======== Starting round 61/10000 ========.

What is a "normal" amount of loops for a project? 10, 50? 100? I would suggest after 1 minute or 100 resolution attempts, print a message telling the user the resolver will continue trying up to 10k times and how to configure that limit.

In addition to "round 61/10000", PDM could indeed issue a message in non-verbose mode every couple hundred rounds.

I wonder if the default limit should be much lower (50? 100?) and tell users instead "in large projects the value may need to be set higher, and here's how...".

Note that it was initially set to 500, and this was generally way too low, so @frostming increased it by a lot. You'll probably find more info by grepping the git logs or PRs on GitHub.

@carlkibler
Author

carlkibler commented Jul 17, 2024

You all are right, which is frustrating! Thanks for trying. Some fun updates for completeness:
The gretel-python-client project did a release yesterday 30 minutes after this bug report, changing pydantic's pinned version from 1.10.13 to 1.10.17. I thought maybe that is why you all got different results. Alas, no.

Today:

  • Even pinning gretel-python-client back to 0.19.2, the version active at the time, I can't replicate the behavior, though it happened easily a dozen times in a row on an EC2 Linux server and my M3 MacBook Pro.
  • I get dependency resolution in 118 loops on my MacBook, taking 6m45s real, 2m6s user time. The same cloud Linux server also takes 118 loops, but about 1 minute for both real and user time.
    • The MacBook is far more powerful than the little c7g.large EC2 server. Interesting how much longer it takes to iterate. Nothing else heavy is running, just Chrome and a terminal. I'll run this later from home to see if it's an office wifi problem, though if all the metadata is cached it seems unlikely.
  • For my work projects, I have an extra pip location in my ~/.config/pip/pip.conf. But behavior is same with or without it. Just including for completeness.
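
The timings in this thread can be compared by looking at how much of the wall-clock time was spent on the CPU. This is a rough sketch: `cpu_share` is a name made up for this example, and system time is treated as zero where it was not reported.

```python
def cpu_share(user_s: float, real_s: float, sys_s: float = 0.0) -> float:
    """Fraction of wall-clock time spent on the CPU (user + system)."""
    return (user_s + sys_s) / real_s


# pawamoy's Linux run: 60.83s user, 0.55s system, 1:21.24 total.
linux = cpu_share(user_s=60.83, sys_s=0.55, real_s=81.24)

# The MacBook run above: 2m6s user, 6m45s real (system time not reported).
macbook = cpu_share(user_s=2 * 60 + 6, real_s=6 * 60 + 45)

print(f"linux: {linux:.1%}, macbook: {macbook:.1%}")
```

The Linux figure lines up with the 75% cpu shown in that `time` output, while the MacBook run spends only about 31% of its wall-clock time on the CPU. That would be consistent with waiting on something other than computation, such as the network, but the numbers alone do not show what.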

So I can't replicate yesterday's behavior, which was up in the thousands of resolver attempts. Baffling stuff. I withdraw my specific bug report, until I actually replicate it.

--
I appreciate @pawamoy thinking over the UX suggestions and giving some feedback from history. Printing a message in non-verbose mode every few hundred rounds would be useful I think. So that's my final suggestion.

I am happy to re-craft this into a feature request toward that end, or close this and make a separate feature request. My only goal is to help you all not get pestered by issues like this one.

@frostming
Collaborator

I thought maybe that is why you all got different results. Alas, no

No, I even used --exclude-newer=2024-07-15 to return to the old days, but it also succeeds. There is no essential difference between 1.10.13 and 1.10.17, either.

Printing a message in non-verbose mode every few hundred rounds would be useful I think. So that's my final suggestion.

This sounds good to me.

@Gnomeek
Contributor

Gnomeek commented Jul 24, 2024

Seems like a lot of work is needed for

Printing a message in non-verbose mode every few hundred rounds

since the iteration is located in resolvelib.
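
For discussion, here is a minimal sketch of what the periodic notice could look like. It assumes a resolvelib-style reporter hook named `starting_round(index)` that is called with a zero-based index at the start of each round. The class name, the `every` parameter, the message wording, and the `simulate` stand-in loop are all hypothetical and are not part of PDM or resolvelib.

```python
import io
import sys
from typing import TextIO


class RoundNoticeReporter:
    """Hypothetical reporter that prints a notice every `every` rounds."""

    def __init__(
        self, max_rounds: int, every: int = 200, stream: TextIO = sys.stderr
    ) -> None:
        self.max_rounds = max_rounds
        self.every = every
        self.stream = stream

    def starting_round(self, index: int) -> None:
        # Assumption: the resolver loop passes a zero-based round index.
        round_number = index + 1
        if round_number % self.every == 0:
            print(
                f"Still resolving: round {round_number}/{self.max_rounds}. "
                "The limit is set by strategy.resolve_max_rounds.",
                file=self.stream,
            )


def simulate(rounds: int, reporter: RoundNoticeReporter) -> None:
    """Stand-in for the resolver loop; it only drives the hook."""
    for index in range(rounds):
        reporter.starting_round(index)


buffer = io.StringIO()
simulate(450, RoundNoticeReporter(max_rounds=10000, every=200, stream=buffer))
notices = buffer.getvalue().splitlines()
print(len(notices))
```

In this simulated run of 450 rounds, the notice fires at rounds 200 and 400. Whether PDM's own reporter can be extended this way without touching the loop in resolvelib is exactly the open question raised above.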
