-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use tomli for reading the lock file #6562
Conversation
7001419
to
d20cd42
Compare
Another argument against introducing tomli is we'll need to vendor it into poetry-core when the lock file reader is added (for locked dist/locked build support). I think the reason for reading the lock file back is because we may write escape characters or similar in strings (since we are accepting untrusted inputs from PyPI/other package indexes + setuptools and other build-systems) that make the file invalid (even if they are valid Python strings). tomli-w encourages something similar (namely, it makes no promises that it writes valid toml and suggests reading back to validate toml), but to be fair, I'm not sure it's 100% necessary. |
poetry-core doesn't need to rewrite pyproject.toml, so the other way to go would be to remove tomlkit there altogether Edit: quite apart from being a smaller dependency, |
per CI the way I've done this is also breaking for plugins, apparently the export plugin at least was relying on Edit: breaking for the export plugin unittest rather than for the plugin itself, as it turns out. |
comment about poetry-core and vendoring got me thinking. The full programme for tomlkit-minimization - if that is indeed a desirable path, which perhaps is what this MR is really asking - would go something like this:
|
f2c3518
to
5ecf4ac
Compare
I may not have enough knowledge to be 100% sure of that, but I am also under the assumption that poetry-core is never doing any write operations on TOML files. So if we confirm that, I'm +1 on the plan highlighted by @dimbleby. python-poetry/poetry-core#445 would probably be good to have as well (issue it fixes is highlighted here, for the context). |
5ecf4ac
to
ea4c56f
Compare
I see that this one has been added to a 1.4 milestone. A couple of thoughts:
|
1.4 is next next release. The reason for this is mostly because of how invasive this is and the need to make a hard breaking change in core. I'd rather make changes in core more gradually in order to avoid branching it as much as possible. I could be convinced otherwise, but personally I'd rather try and get the new installer + |
I'm glad I spoke then, because I think we're at cross purposes. This one does not make or require a change in core, but stands alone. It's the follow-up python-poetry/poetry-core#483 and #6616 that change poetry-core |
Ah, fair point. I was under the impression that you intended #6616 to be a replacement/successor to this PR and commented on the wrong one. Am I correct in that it's a PR of a branch rebased on top of this, and you plan to keep it rebased so this can land first? In that case I don't have major objections -- I'll try to complete a review of this PR sooner than later. |
re "1.4 is next next release". Of course I understand that 1.4 is two more than 1.2, that wasn't what I was asking! It would be good to understand whether that is more likely to be (say) November, or in some far-distant future (think of the gap betwewen 1.1 and 1.2 and double it). Eg that would help me to decide whether it's even worth quibbling about how soon this gets merged. (FWIW I'd like to see poetry turn out a 1.3 pretty darned soon, if only as a demonstration to the world - and to maintainers! - that shorter gaps are the new normal). Anyway back on topic: yes, the reason I kept #6616 separate (and in two commits) was in anticipation that it might be reasonable to merge this one sooner, and that one later. |
It's very much a fluid thing, but the intention is to make releases on the order of months. I would invite you to participate in the Discord channel that you have been invited to if you'd like to join said discussions. Certainly, we're carrying several important fixes in master right now, and it might be very nice to cut a 1.3, maybe even land invasive changes to core (this, extras and sdist normalization) in 1.4, and then scope the installer work for a 1.5. There's plenty of ways to skin the cat and it's mostly deciding what people have the energy for given limited volunteer bandwidth. |
8d8616b
to
c299482
Compare
c299482
to
314d76e
Compare
I know @dimbleby is going to kill me for this and the exact format of the lock file is not relevant for most people but what about using I see at least two advantages:
Further, I don't think there is a relevant performance difference for writing the lock file in real world examples. (Running |
Control over the formatting is a double-edged sword, eg it's tomlkit's 'control' that exposes us to such as #6201. Personally I see no reason to care about fine-grained control of how lists are presented: both formats are plenty readable enough. However I must admit that I feel a strong mistrust of tomlkit and consequently perhaps I'm not as rational as I'd like to be about it...! As I said at the start of this thread, there are pros and cons, and the additional tomli-w dependency is admittedly among the cons. I think the case is stronger over in poetry-core, where removing tomlkit in favour of something that is now in the standard library (since python 3.11) seems a clear win. Also poetry-core doesn't need to write (except in unit test) so that in some sense simplifies the discussion. I think tomlkit is also slow at writing, though I don't have numbers to hand. I agree that it's debatable how important half a second here or there is - though I do think that if you cut such gaps in the right places, it can make a real difference to the feel of a command line tool. So anyway my preference would be that this stands as it is. However, I wouldn't want the perfect to be the enemy of the good: if keeping tomlkit for writing gets this over the line then let's do that, and I can try again at stepping away from it some other time... |
(I just want to acknowledge that in ten minutes experimentation I was unable to make tomlkit slow at writing, which modestly strengthens the case for not bothering with tomli-w) |
314d76e
to
03cb0b4
Compare
I pushed a commit that removes the tomlkit-using re-read of the lockfile after writing it. That is slow, per top of thread. (Probably it's that which had left me with the apparently mistaken impression that writing the file with tomlkit is slow.) |
Actually, I thought this is about commands that only have to read the lock file (like |
it maybe doesn't make much sense in the context of a solve that has likely taken much longer, but I genuinely find - at least now that I know about it! - that the gap between "Resolving dependencies" completing and "Writing lock file" is noticeable, and that it feels nicer to lose it. (Kinda weird that we print "Writing lock file" after it has already been written, but whatever). Anyway I also have never seen this check do anything other than pass. |
companion to python-poetry/poetry#6562 Co-authored-by: Randy Döring <[email protected]>
It's slow, and anyway what is this MR even for if we still read the lockfile with tomlkit?!
4eedcd0
to
dc110ff
Compare
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Pros and cons in this one, opinions are invited.
tomlkit is significantly slower than tomli / tomli-w. And also, at least in a project that uses type-checking, it's a pain in the neck.
Even for a modestly sized project like poetry itself, reading
poetry.lock
on my laptop takes 0.6 - 0.7 seconds (timings obtained with sometime.time()
statements inserted into the codebase).tomli
manages it in ~0.02 seconds.On larger projects the difference is more significant, in my experimentation that ratio of of 30-ish remains consistent.
Admittedly this is only likely to save folk a small number of seconds on any given command. But fast tools are nicer to use than slow tools.
Against this: we already have
tomlkit
in the dependency tree, addingtomli
andtomli-w
looks a bit weird.The difference is that
pyproject.toml
is human-readable and it's highly desirable to maintain formatting and comments etc in interactions with it (poetry add
,poetry remove
). Much as I'd like to usetomli
andtomli-w
everywhere: they just don't offer that feature.poetry.lock
however is machine-generated and machine-read and there is no reason to pay the tomlkit performance penalty in reading and writing it.(I've also removed the code that re-reads the lockfile after writing it, "checking lock file data consistency". Not clear to me what that was for: do you trust your toml writer or not? That same code could be removed independent of the switch away from tomlkit)