Skip to content
This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

utils._http: Fix bytes -> str for py3 #340

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jdanbrown
Copy link
Contributor

  • http response content is a bytes, but downstream code expects it to be
    a str
  • A previous commit (14ce30b) had fixed the type mismatch in the
    successful code path (json.loads) but not in the error code path
    (RequestException(..., content))

Test plan:

  • Run simple query to test success path:
    • bq.Query('select 3').to_dataframe()
    • Before: returns dataframe with 3
    • After: same behavior
  • Run failing query that triggers the http error path:
    • e.g. bq.Query('select * from table').to_dataframe() where table is
      big and triggers error Response too large to return
    • Before:
Traceback (most recent call last):
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 46, in __init__
    error = json.loads(content)['error']
  File "/Users/danb/miniconda3/envs/dwh/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 322, in to_dataframe
    return self.results(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier) \
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 228, in results
    self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 528, in execute
    self._results = job.wait()
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 84, in wait
    raise e
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 82, in wait
    timeout=poll * 1000)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_api.py", line 237, in jobs_query_results
    return datalab.utils.Http.request(url, args=args, credentials=self._credentials)
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 154, in request
    raise RequestException(response.status, content)
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 51, in __init__
    lines = content.split('\n') if isinstance(content, basestring) else []
TypeError: a bytes-like object is required, not 'str'
  • After:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 322, in to_dataframe
    return self.results(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier) \
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 228, in results
    self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 528, in execute
    self._results = job.wait()
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 84, in wait
    raise e
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 82, in wait
    timeout=poll * 1000)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_api.py", line 237, in jobs_query_results
    return datalab.utils.Http.request(url, args=args, credentials=self._credentials)
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 153, in request
    raise RequestException(response.status, content)
datalab.utils._http.RequestException: HTTP request failed: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

- http response content is a bytes, but downstream code expects it to be
  a str
- A previous commit (14ce30b) had fixed the type mismatch in the
  successful code path (`json.loads`) but not in the error code path
  (`RequestException(..., content)`)

Test plan:
- Run simple query to test success path:
  - `bq.Query('select 3').to_dataframe()`
  - Before: returns dataframe with `3`
  - After: same behavior
- Run failing query that triggers the http error path:
  - e.g. `bq.Query('select * from table').to_dataframe()` where table is
    big and triggers error `Response too large to return`
  - Before:
```
Traceback (most recent call last):
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 46, in __init__
    error = json.loads(content)['error']
  File "/Users/danb/miniconda3/envs/dwh/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 322, in to_dataframe
    return self.results(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier) \
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 228, in results
    self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 528, in execute
    self._results = job.wait()
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 84, in wait
    raise e
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 82, in wait
    timeout=poll * 1000)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_api.py", line 237, in jobs_query_results
    return datalab.utils.Http.request(url, args=args, credentials=self._credentials)
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 154, in request
    raise RequestException(response.status, content)
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 51, in __init__
    lines = content.split('\n') if isinstance(content, basestring) else []
TypeError: a bytes-like object is required, not 'str'
```
  - After:
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 322, in to_dataframe
    return self.results(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier) \
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 228, in results
    self.execute(use_cache=use_cache, dialect=dialect, billing_tier=billing_tier)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query.py", line 528, in execute
    self._results = job.wait()
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 84, in wait
    raise e
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_query_job.py", line 82, in wait
    timeout=poll * 1000)
  File "/Users/danb/hack/pydatalab/datalab/bigquery/_api.py", line 237, in jobs_query_results
    return datalab.utils.Http.request(url, args=args, credentials=self._credentials)
  File "/Users/danb/hack/pydatalab/datalab/utils/_http.py", line 153, in request
    raise RequestException(response.status, content)
datalab.utils._http.RequestException: HTTP request failed: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
```
@yebrahim
Copy link
Contributor

yebrahim commented Apr 7, 2017

content could be either string or bytes, so the check is needed. It seems like you need to make another check in the RequestException constructor.

@jdanbrown
Copy link
Contributor Author

Hmm,

  • The intent of content = content.decode() is to convert (bytes or str) -> str—am I misunderstanding something there?
  • My intent with RequestException was to assume its input is always str, but we could make it accept bytes or str as well.

@jdanbrown
Copy link
Contributor Author

Oh nvm, I see the failing py35 test now...

@jdanbrown
Copy link
Contributor Author

Ah, but isn't the error with the tests, not the code? In py3 http.request returns a bytes instead of a str, but in the test mock we're providing a str result:

This would explain why the code has been working in practice using py35 but the py35 test is failing.

Thoughts?

@yebrahim
Copy link
Contributor

yebrahim commented Apr 7, 2017

You're right, the test should return b'{}' instead to keep the behavior consistent across py2 and py3. At the risk of nitpicking though, b'{}' isn't what will be returned in the py2 case, but it doesn't know the difference so it's fine to always return it.

@jdanbrown
Copy link
Contributor Author

👍 I'll push a fix for the tests early next week.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 77.487% when pulling fe19c56 on jdanbrown:pr-http-py3 into 0ac8733 on googledatalab:master.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants