Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"take-2": new Windows backend, ctypes-based, separate class which shards via Attributes #555

Closed
wants to merge 1 commit into from

Conversation

jrobbins-LiveData
Copy link

@jaraco, here's my fresh effort to implement an improved Windows backend, to the plan you outlined:

  1. Determine if pywin32-ctypes can support the necessary interfaces herein and at least capture any shortcomings.
  2. Port the existing backend to an API that supports the necessary interfaces (perhaps an updated version of pywin32-ctypes or maybe a bespoke implementation). Cut a release to get early feedback on any regressions related to that change.
  3. Implement the ShardedKeyring as a specialized subclass or mix-in, encapsulating the sharding logic in one class.

Re 1, I contacted pywin32-ctypes, and heard back that it would take a week or two to hear back from them. I wrote the code that I would like to have from that project. It is in keyring/backends/Windows/api.py. I implemented it, this time, using only ctypes, and not cffi.

Re 2, I moved the existing backends/Windows.py to backends/Windows/__init__.py, and ported it to use api.py.

Re 3, I implemented class WinVaultAttributesKeyring(WinVaultKeyring) to isolate the new sharding logic.

In order to study use-cases of Credential Manager, I implemented CredEnumerate, which let me more readily dump out all the credentials in my Credential Manager. I learned that, apparently, Microsoft had the same issue we have, in needing more room for JWTs.

Their sharding solution uses Attributes, rather than creating multiple entries. I liked that much better, and so the sharding logic in this PR uses Attributes. I'm including a redacted version of that dump here, which includes the first few bytes of CredentialBlob from each entry. I sure hope I redacted enough!!!

redacted_credentials2.txt

While the introduction of the use of Attributes would be new to keyring, I restrict their use to the "password won't fit and we raise 1783 error" case. Passwords that fit are treated the same as in shipping keyring, so the risk introduced should be small.

I added a test illustrating the problem raised in #554. def test_read_even_length_utf8_password(self)

I also fixed the extra duplication of entries raised in #545. But I didn't know how to fix get_credentials, per the observation you made.

What's left?:
I don't understand the priority system 100%, and need some guidance there. But I did add some tests explicitly targeting the new sharding backend.

I'd like to resolve the get_credentials part of the compound_name issue. I don't understand the algorithm well enough to fix it myself. In general, the idea that I call set_password with one service, and, later, the system renames it, yet transparently finds it for me, is pretty confusing to me. I would benefit from more info about what problem is being solved. Is it that the keyring model is that only the (service, username) is meant to be unique?

My credentials file shows that only keyring users (and presumably other pywin32 or pywin32-ctypes users) are using UTF-16. Many of the credentials appear to be various 8-bit encodings, and typically do not appear to be Unicode encodings at all. If my Credential Manager data is indicative, I don't think we are helping interoperability by imposing UTF-16. Rather, I think we simply inherited this behavior from pywin32. I'd love to discuss what we can do about this, without disrupting existing users.

@jrobbins-LiveData
Copy link
Author

@jaraco any feedback on this rewrite?

Comment on lines 113 to +127
if existing_pw:
# resave the existing password using a compound target
existing_username = existing_pw['UserName']
target = self._compound_name(existing_username, service)
self._set_password(
target,
existing_username,
existing_pw.value,
)
self._set_password(service, username, str(password))

def _set_password(self, target, username, password):
# resave the existing password using a compound target
# Fixes part of https://github.com/jaraco/keyring/issues/545,
# but get_credentials also needs to be fixed to search in the same
# order as get_password.
if existing_username != username:
target = self._compound_name(existing_username, service)
self._set_password(
target,
existing_username,
existing_pw.value,
encoding=encoding,
)
self._set_password(service, username, str(password), encoding=encoding)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this comment. It's saying that by re-saving the existing password using a compound target, it fixes part of 545, yet re-saving the existing password was already part of the prior change. Moreover, "get_credentials also needs to be fixed" feels like a misplaced TODO. Additionally, the comment doesn't say anything about what the code is meant to do, only has a reference to a reported issue.

By reading the issue, I now think the comment is only about the "if existing_username" check. The problem with putting the comment here is it becomes interleaved with the other comment, making it difficult to tell what comment goes with which parts of code. Since this change adds a whole new branch of code, it may be time to consider a refactor.

At the very least, I'd move the "resave" comment back to where it was (at the beginning of the if existing_pw branch and then move the rest of the comment into the if existing_username branch and rephrase to indicate its essential purpose, something like, "Only set the compound target when usernames match. Ref #545."

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

return credential


def CredReadFromAttributes(Type, TargetName, Credential=None, encoding='utf-8'):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I'd probably name this using PEP 8 conventions. The reason for using the capital camel case is to match the names/parameters of the Windows API. In this case, the functionality is not part of the Windows API, but is a helper function, so probably should follow the naming of other helper functions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was writing it, I was trying to imagine the function I wish was already in the API, and wrote the one I wish was there. But that's all inside my head. I agree that it should be named as a helper, not as a faux member of the real API.

Comment on lines +172 to +173
Credential['CredentialBlob'] = accum.decode(encoding)
Credential['CredentialBlobSize'] = len(Credential['CredentialBlob'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little odd to me to be utilizing the API structures to embed the altered values, values that wouldn't be allowed in the structure. I think I'd rather see a different approach where the encoding/decoding strategy sits as a layer between the Windows API and the keyring API, and less intertwined with the Windows API.

I can provide an example later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed (see my previous comment for how I got there.)

Comment on lines -13 to -19
def tearDown(self):
# clean up any credentials created
for cred in self.credentials_created:
try:
self.keyring.delete_password(*cred)
except Exception as e:
print(e, file=sys.stderr)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what's going on here. This functionality seems to have been removed without justification. This code is there for a reason, preventing the pollution of a user's own keychain when running tests. Why was it removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran my tests, I noticed that this function wasn't getting called. Furthermore, the cleanup function in the underlying BackendBasicTests was getting called (seemingly accomplishing the same goal, although without the try...except.

Looking at the pytest documentation, I see that there is an optional teardown (lowercase "d") method that a test class can provide. My guess is that this function was intended to be named teardown and was mistakenly named tearDown, which made it invisible to pytest.

Since the base class BackendBasicTests cleanup seemed to clean up all the unit test password in my run, I don't know that we need this one.

I should have documented why I yanked it -- sorry about that!

Copy link
Owner

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is looking really good. I quite like the direction it's gone and it's homing in on a sound solution.

I still worry there's just too much going on. This one change affects several different behaviors (introducing new keyring, switch to local ctypes API, changes existing pw checks, introduce encoding, introduces tests for broken encoding, ...). Although I think the final implementation may look very much like what you have here, it's just too much changing for one commit.

I'd much rather see this change as a series of incremental changes, each addressing more selective concerns, either as separate, independent PRs or as separate commits if very closely related.

For example: PR1 could implement simply switching from pywin32ctypes to a native ctypes implementation. That change could be reviewed, accepted, and released so that its effect could be proven and any unexpected regressions could be addressed before accepting more changes that depend on that change. Then PR2 could implement the sharding strategy (based on PR1). PR3 could fix the existing password checks and be independent of PR1 and PR3.

How would you feel about that approach?

@jrobbins-LiveData
Copy link
Author

jrobbins-LiveData commented Jan 2, 2022

I like the idea of a phased set of pull requests. The only drawback is that I am inexperienced both with git and with tox etc on Linux. So I am getting bogged down in the mechanics.

I could really use some instructions as to how to install on Ubuntu so I can test.

Also, I don't know how to create new pull requests into my account without clobbering the existing cloned repo.

Other than these mechanical issues, I'd be happy to break it all apart into 3 PRs. Maybe you or someone else could undertake PR3 (and fix the typo with tearDown in the existing unit test for the Windows backend, since both of those issues are, as you say, unrelated to addressing the problem of accepting longer passwords on Windows?

[EDIT]
It appears that adding

import sys
if not sys.platform.startswith('win'):
    collect_ignore.append('keyring/backends/Windows/api.py')

to conftest.py would fix the tox/pytest crash on ubuntu etc. Is that the correct fix?

@jaraco
Copy link
Owner

jaraco commented Feb 12, 2022

I could really use some instructions as to how to install on Ubuntu so I can test.

For me, the easiest way to test on Ubuntu is to download Docker. I have a "tox-multipy" image I use to test Python applications. From the keyring source folder:

docker run -it -v $(pwd):/src jaraco/multipy-tox
# cd src
# $TOX_WORK_DIR='/tmp/tox'
# tox

Also, I don't know how to create new pull requests into my account without clobbering the existing cloned repo.

Oh, that must be really frustrating. I'd recommend to get a nice repo Gui like VSCode or Sourcetree or PyCharm or Sublime Merge (maybe in that order) and use that to manage your repo. It will make it easier to discover operations (like creating a branch and pushing it to your Github repo) and also visualize your changes. I'd also recommend reading through some of the tutorial resources on using Github, which talk about creating branches in Github.

For your current code, I'd recommend to create a branch for it.

git branch windows-take-2-ctypes

Then reset your main to match upstream and reset to that state.

git checkout main
git fetch upstream
git reset --hard upstream/main

Note, you may or may not have a remote called "upstream". If not, you may need to create one first.

git remote add upstream https://github.com/jaraco/keyring

I hope that helps. I would recommend spending some time getting familiar with branching and submitting PRs from branches as that will make your life immensely easier when it comes to contributing these changes.

Thanks for your patience and perseverance on this effort!

@jaraco
Copy link
Owner

jaraco commented Nov 12, 2023

Closing as stale. Feel free to revive this effort at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants