Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A urllib3 connection is not put back in the pool after HEAD #1248

Closed
timuralp opened this issue Jul 20, 2017 · 15 comments
Closed

A urllib3 connection is not put back in the pool after HEAD #1248

timuralp opened this issue Jul 20, 2017 · 15 comments
Labels
dependencies This issue is a problem in a dependency. enhancement This issue requests an improvement to a current feature.

Comments

@timuralp
Copy link
Contributor

timuralp commented Jul 20, 2017

Issuing a HEAD request results in the connection not being put back into the urllib3 pool. This is tied to this issue in urllib3: urllib3/urllib3#673. The problem is resolved in urllib3 >= 1.11. However, botocore vendors requests 2.7.0, which vendors urllib3 1.10.4. Updating the vendored requests library to >= 2.8.0 would resolve this issue.

Relevant issue in urllib3: urllib3/urllib3#1234

@timuralp
Copy link
Contributor Author

Here's an example to reproduce the issue:

import boto3

s = boto3.session.Session(aws_access_key_id=<AWS_ACCESS_KEY_ID>,
                          aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>)
client = s.client('s3')
bucket = <Bucket>
try:
    client.put_object(Bucket=bucket, Key='foo')
    urllib3_pool = client._endpoint.http_session.adapters['https://'].poolmanager.pools._container.values()[0].pool
    print 'Size after PUT:', urllib3_pool.qsize()
    client.head_object(Bucket=bucket, Key='foo')
    print 'Size after HEAD 200:', urllib3_pool.qsize()
    client.head_object(Bucket=bucket, Key='bar')
except:
    pass
print 'Size after HEAD 404:', urllib3_pool.qsize()

I can reproduce it after HEAD returns 404, but not after 200.

@joguSD joguSD added dependency enhancement This issue requests an improvement to a current feature. labels Jul 20, 2017
@joguSD
Copy link
Contributor

joguSD commented Jul 20, 2017

I can confirm that using the above snippet in python 2.7.13 will fail to return the connection back into the pool. Thanks for bringing this to our attention.

@joguSD joguSD added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jul 20, 2017
@timuralp
Copy link
Contributor Author

Spent more time debugging this. It's not actually fixed in urllib3 and requests as I original thought. I will open an issue with urllib3 and then with requests about this.

@timuralp
Copy link
Contributor Author

Filed bugs in urllib3 (urllib3/urllib3#1234) and requests (psf/requests#4199), along with a PR to fix this in urllib3 (urllib3/urllib3#1235).

There is a packaging issue for botocore. New versions of requests do not vendor urllib3 anymore. Does botocore want to continue vendoring urllib3 (previously indirectly through requests)? That would require adding a hack to register it so that requests can find it. Or if boto does not vendor urllib3, the dependency can be set in requirements.

@joguSD
Copy link
Contributor

joguSD commented Jul 21, 2017

Upgrading requests/urllib3 may fix: boto/boto3#1053

@joguSD
Copy link
Contributor

joguSD commented Aug 3, 2017

Seems that this was fixed upstream in: urllib3/urllib3#1235
Still need to update botocore to use newer versions.

@timuralp
Copy link
Contributor Author

timuralp commented Aug 3, 2017

@joguSD do you know what the plan for botocore regarding urllib3 is? Currently, botocore vendors an old requests release, which vendors urllib3. However, a newer version of requests would not do that. Do you expect the right solution would be to patch the vendored urllib3? Or to update vendored requests to a version that does not vendor urllib3 and then either vendor urllib3 or specify a version dependency to address this issue?

Happy to submit a PR if there is some clarity on what the right path forward is.

@joguSD
Copy link
Contributor

joguSD commented Aug 3, 2017

We currently vendor requests as we make some modifications to it, so upgrading it isn't straight forward. I'll bring this up and see what path we want to take moving forward. At least personally I think moving away from vendoring requests would be ideal so we can get upstream bug fixes and avoid issues when some some distributions (ubuntu for example) re-package our library. I can't make any promises though, and I wasn't around when the decision to vendor requests was made so once I hear the reasons behind that my opinion may change.

genben added a commit to miarec/botocore that referenced this issue Oct 19, 2017
Issue "A urllib3 connection is not put back in the pool after HEAD"
boto#1248
@genben
Copy link

genben commented Oct 19, 2017

I understand that update of the vendored urllib3 is not a priority. Can you just apply a hotfix?
I have submitted PR #1300.
All credits to @timuralp

@timuralp
Copy link
Contributor Author

@joguSD do you think maybe the investigating tag is no longer warranted? Is there anything else to further understand about this problem? I would think a bug label would be appropriate.

@joguSD joguSD removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Nov 9, 2017
@joguSD
Copy link
Contributor

joguSD commented Nov 29, 2017

Another reason to update urllib3: boto/boto3#1053

@joguSD
Copy link
Contributor

joguSD commented Nov 30, 2017

Another reason: ipv6 proxy supprt.
#1325

@dww84
Copy link

dww84 commented Apr 9, 2018

Any update on this fix?

@Hernrup
Copy link

Hernrup commented May 18, 2018

We are also experiencing this issue now. We desperately need a new urllib to fix multiple bugs. Boto3 is currently not usable for our company.

@joguSD
Copy link
Contributor

joguSD commented Aug 25, 2018

We have un-vendored requests/urllib3 and are now using urllib3 directly as of botocore v1.11.0.
Relevant upgrade notes.

@joguSD joguSD closed this as completed Aug 25, 2018
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Jun 26, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Jun 27, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Jun 27, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Jul 12, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Jul 29, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Jul 29, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Aug 6, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Aug 6, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
DinoBektesevic added a commit to DinoBektesevic/daf_butler that referenced this issue Aug 6, 2019
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
@swetashre swetashre added the dependencies This issue is a problem in a dependency. label Feb 27, 2020
timj pushed a commit to lsst/resources that referenced this issue Nov 29, 2021
Hardcoded Butler.get from S3 storage for simplest of Datasets with no
- rebased to include newest changes to daf_butler (namely, ButlerURI)
- removed parsePathToUriElements, S3Location and S3LocationFactory,
  and replaced all path handling with ButlerURI and its test cases.
- added transaction checks back into S3Datastore. Unsure on proper
  usage.
- added more path manipulation methods/properties to LocationFactory
  and Location.
  * bucketName returns the name of the bucket in the Location/
    LocationFactory
  * Location has a pathInBucket property that will convert posix
    style paths to S3 protocol style paths. The main difference is
    with respect to leading and trailing separators. S3 interprets
    `/path/`, `/path`, `path/` and `path` keys differently, even
    though some of them are equivalent on a POSIX compliant system.
    So what `/path/to/file.ext` would be on a POSIX system, on S3
    it would read `path/to/file.ext` and the bucket is referenced
    separately with boto3.
- For saving Config as file, moved all the URI handling logic into
  Config and out of Butler makeRepo. The only logic there is root
  directory creation.
  * The call to save config as a file at the butler root directory
    is now done through dumpToUri which then resolves the
    appropriate backend method to call.
- Improved(?) on the proposed scheme for checking if we are
  dealing with a file or a directory in absence of the trailing
  path separator.
  * Noted some differences between the requested generality of code
    in the review for writing to files and inits of config classes.
    For inits it seems as if there's an 2+ year old commit limiting
    the Config files to `yaml` type files only. However, the review
    implied that `dumpToFile` on Config class should be file format
    independent. Then, for `dumpTo*` methods, to check whether we
    have a dir or a file I only inspect whether the path ends on a
    'something.something' style. However, since I can count on
    files having to be `yaml` type and having `yaml` extensions in
    inits I use a simplified logic to determine if we have a dir or
    file. It is possible to generalize inits to other filetypes, as
    long as they have an extension added to them.
  * I assume we do not want to force users to be mindfull of
    trailing separators.
- Now raising errors on unrecognized schemes on all `if scheme ==`
  patterns.
- Closed the StreamIO in Config.dumpToFile
- fixed up the Formatters to return bytes instead of strings. The
  fromBytes methods now expect bytes as well. JSON and YAML were
  main culprints. Fixed the docs for them. At this point I am
  confident I just overwrite the fixes when rewinding changes on
  rebase by accident because I have done that before, twice.
- Added a different way to check if files exist, cheaper but can
  be slower. From boto/botocore#1248 it
  is my understanding that this should not be an issue anymore.
  But the newer boto3 versions are slow to hit package managers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies This issue is a problem in a dependency. enhancement This issue requests an improvement to a current feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants