work with generator source #44

Open
Bubu opened this issue Mar 19, 2021 · 3 comments

Bubu commented Mar 19, 2021

Hi,
I was trying to use ijson with a JSON stream coming from a zip archive through a libarchive binding. Unfortunately, the package I tried first exposes only a generator for getting the file bytes out of the archive:

https://github.com/Changaco/python-libarchive-c/blob/master/libarchive/entry.py#L48-L56

This is apparently not currently supported by ijson? At least, I was getting very strange errors (internal C errors with the default C backend, "too many values to unpack" with the Python backend using .items()), which I could eventually narrow down to the generator by using .basic_parse(). Would it make sense to support generators as a source as well, or is that somehow fundamentally incompatible?

(Meanwhile I've switched to using the other python libarchive binding which does offer a file-like interface for reading from the archive.)

rtobar commented Mar 19, 2021

@Bubu thanks for the very good question. I have pondered this myself in the past and also thought it would be nice to have, so having someone else express interest in the idea is definitely good.

I think this should be possible, but it needs considerable care. The ijson functions actually already support generators as inputs, but those are assumed to be the lower-level generator functions of ijson itself (e.g., you can use ijson.parse() as the input to ijson.items; see the "Intercepting events" section of the README). It should still be possible to detect those separately from any other arbitrary generators and act differently, though. After that it should all work because, funnily enough, we internally turn file objects into generators!

def file_source(f, buf_size=64*1024):

The C backend might need some extra care as well.
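
To illustrate what "turning a file object into a generator" means here, the following is a minimal sketch of the idea, not ijson's actual implementation. The name `chunks_from_file` is hypothetical and only mirrors the shape of the signature quoted above; the assumption is simply that the file is read in fixed-size chunks until it is exhausted.

```python
import io


def chunks_from_file(f, buf_size=64 * 1024):
    """Yield successive chunks read from a file-like object.

    Hypothetical helper for illustration only; it is not the ijson function.
    """
    while True:
        data = f.read(buf_size)
        if not data:
            # An empty read signals end of file.
            return
        yield data


# Tiny demonstration with an in-memory file and a deliberately small buffer.
chunks = list(chunks_from_file(io.BytesIO(b'{"a": [1, 2, 3]}'), buf_size=4))
```

Seen this way, an arbitrary generator of bytes is already in the shape the parser consumes internally, which is why supporting it directly seems feasible.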

I can't promise anything in terms of deadlines. But like I said, I'm on board with the idea, and if someone decides to step in and give it a crack in the meantime I'll be happy to review code and PRs.

@rtobar rtobar added the feature label Mar 19, 2021
This was referenced Apr 13, 2023

rtobar commented Apr 13, 2023

For those coming in the future: see #58 (comment) for an (untested, personally) example of a simple file-like wrapper around a generator as a workaround.
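
For readers who want a rough picture of such a workaround, here is a minimal sketch of a file-like wrapper around a generator of bytes. It is not a copy of the linked comment; the class name `GeneratorFile` and its buffering strategy are assumptions made for illustration, and it has not been tested against every ijson backend.

```python
class GeneratorFile:
    """Minimal read-only, file-like view over an iterable of bytes chunks.

    Hypothetical helper: only read() is provided.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def read(self, size=-1):
        if size is None or size < 0:
            # Read everything that is left.
            data = self._buffer + b"".join(self._chunks)
            self._buffer = b""
            return data
        # Pull chunks until enough bytes are buffered or the source is exhausted.
        while len(self._buffer) < size:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            self._buffer += chunk
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


def example_chunks():
    # Stand-in for the generator exposed by an archive library.
    yield b'{"a": '
    yield b"[1, 2"
    yield b", 3]}"


wrapped = GeneratorFile(example_chunks())
first = wrapped.read(4)
rest = wrapped.read()
```

An object like this should be usable wherever ijson expects a file-like source, for example `ijson.items(GeneratorFile(gen), "a.item")`, since the only method it needs to offer is `read(size)`.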

MorningLightMountain713 commented Oct 19, 2023

For those coming in the future: see #58 (comment) for an (untested, personally) example of a simple file-like wrapper around a generator as a workaround.

Based on the above, here is what I'm using with httpx:

import httpx
import ijson
from contextlib import asynccontextmanager
from typing import AsyncIterator

class HttpxStreamAsFile:
    def __init__(self, url: str):
        self.url = url
        self.data = None
        self.client = httpx.AsyncClient()

    @asynccontextmanager
    async def create_stream(self) -> AsyncIterator:
        try:
            await self._create_stream()
            yield
        finally:
            await self.client.aclose()

    async def _create_stream(self) -> None:
        req = self.client.build_request("GET", self.url)
        res = await self.client.send(req, stream=True)
        self.data = res.aiter_bytes()

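    async def read(self, n: int) -> bytes:
        # A zero-size read, or a read before the stream is opened, yields no data.
        if self.data is None or n == 0:
            return b""

        # Return the next chunk regardless of n, or b"" once the stream is
        # exhausted. The anext() builtin requires Python 3.10+.
        return await anext(self.data, b"")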


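async def main():
    url = "your-url"
    httpx_as_file = HttpxStreamAsFile(url)
    async with httpx_as_file.create_stream():
        # ijson.parse yields (prefix, event, value) triples; ijson.items would
        # additionally need a prefix argument and yields whole objects instead.
        async for prefix, event, value in ijson.parse(httpx_as_file):
            print(prefix, event, value)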
