CLI options
jamesturk committed Mar 19, 2023
1 parent 1d53923 commit 991986a
Showing 2 changed files with 27 additions and 8 deletions.
21 changes: 19 additions & 2 deletions README.md
@@ -60,10 +60,10 @@ You can then call the scraper with a URL to scrape:

## Command Line Usage

-If you've installed the package (e.g. with `pipx`, you can use the `scrapeghost` command line tool to experiment.
+If you've installed the package (e.g. with `pipx`), you can use the `scrapeghost` command line tool to experiment.

```bash
-scrapeghost <https://www.ncleg.gov/Members/Biography/S/436> \
+scrapeghost https://www.ncleg.gov/Members/Biography/S/436 \
--schema "{'first_name': 'str', 'last_name': 'str',
'photo_url': 'url', 'offices': [] }" \
--gpt4
@@ -83,6 +83,23 @@ scrapeghost <https://www.ncleg.gov/Members/Biography/S/436> \
}
```

```bash
Usage: scrapeghost [OPTIONS] URL

╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────╮
│ * url TEXT [default: None] [required] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────╮
│ --xpath TEXT XPath selector to narrow the scrape [default: None] │
│ --css TEXT CSS selector to narrow the scrape [default: None] │
│ --schema TEXT Schema to use for scraping [default: None] │
│ --schema-file PATH Path to schema.json file [default: None] │
│ --gpt4 --no-gpt4 Use GPT-4 instead of GPT-3.5-turbo [default: no-gpt4] │
│ --verbose -v INTEGER Verbosity level 0-2 [default: 0] │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
```
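For scripted use, the options in the help output can be assembled into an argument list before being handed to something like `subprocess.run`. The sketch below is illustrative only: `build_scrapeghost_command` is a hypothetical helper that is not part of scrapeghost, the `div.card` selector is a made-up value, and the range check on verbosity is an assumption based on the "0-2" wording in the help text. It uses only flags that appear in the help output.

```python
from __future__ import annotations

import shlex


def build_scrapeghost_command(
    url: str,
    *,
    xpath: str | None = None,
    css: str | None = None,
    schema: str | None = None,
    schema_file: str | None = None,
    gpt4: bool = False,
    verbosity: int = 0,
) -> list[str]:
    """Assemble an argv list for the `scrapeghost` CLI (hypothetical helper)."""
    # Assumption: the help text says "0-2", so reject anything outside it.
    if not 0 <= verbosity <= 2:
        raise ValueError("verbosity must be between 0 and 2")

    argv = ["scrapeghost", url]

    # Value-taking options are emitted only when a value was supplied.
    for flag, value in (
        ("--xpath", xpath),
        ("--css", css),
        ("--schema", schema),
        ("--schema-file", schema_file),
    ):
        if value is not None:
            argv += [flag, value]

    if gpt4:
        argv.append("--gpt4")

    # -v is a counting flag, so it is repeated once per verbosity level.
    argv += ["-v"] * verbosity
    return argv


command = build_scrapeghost_command(
    "https://www.ncleg.gov/Members/Biography/S/436",
    css="div.card",  # made-up selector, for illustration only
    gpt4=True,
    verbosity=2,
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Building a list rather than a single string keeps each argument separate, which avoids quoting problems when a schema or selector contains spaces or quotes.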

## Features

### Selectors
14 changes: 8 additions & 6 deletions src/scrapeghost/cli.py
@@ -7,12 +7,14 @@

 def scrape(
     url: str,
-    xpath: str | None = None,
-    css: str | None = None,
-    schema: str | None = None,
-    schema_file: pathlib.Path | None = None,
-    gpt4: bool = False,
-    verbosity: int = typer.Option(0, "-v", "--verbose", count=True),
+    xpath: str = typer.Option(None, help="XPath selector to narrow the scrape"),
+    css: str = typer.Option(None, help="CSS selector to narrow the scrape"),
+    schema: str = typer.Option(None, help="Schema to use for scraping"),
+    schema_file: pathlib.Path = typer.Option(None, help="Path to schema.json file"),
+    gpt4: bool = typer.Option(False, help="Use GPT-4 instead of GPT-3.5-turbo"),
+    verbosity: int = typer.Option(
+        0, "-v", "--verbose", count=True, help="Verbosity level 0-2"
+    ),
 ):
     if schema_file:
         with open(schema_file) as f:
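
The change attaches a help string to every option so that `--help` describes it. To show how a negatable flag and a counting flag behave when parsed, here is a rough stand-in written with the standard library's `argparse` instead of typer. It is not the project's code: `build_parser` is a hypothetical name, the `div.card` selector is a made-up value, and argparse's help layout differs from typer's boxed output.

```python
import argparse
import pathlib


def build_parser() -> argparse.ArgumentParser:
    """argparse stand-in for the typer-based CLI (illustrative only)."""
    parser = argparse.ArgumentParser(prog="scrapeghost")
    parser.add_argument("url")
    parser.add_argument("--xpath", default=None, help="XPath selector to narrow the scrape")
    parser.add_argument("--css", default=None, help="CSS selector to narrow the scrape")
    parser.add_argument("--schema", default=None, help="Schema to use for scraping")
    parser.add_argument(
        "--schema-file", type=pathlib.Path, default=None, help="Path to schema.json file"
    )
    # BooleanOptionalAction (Python 3.9+) generates both --gpt4 and --no-gpt4.
    parser.add_argument(
        "--gpt4",
        action=argparse.BooleanOptionalAction,
        default=False,
        help="Use GPT-4 instead of GPT-3.5-turbo",
    )
    # action="count" makes -v repeatable: -v gives 1, -vv gives 2.
    parser.add_argument(
        "-v",
        "--verbose",
        dest="verbosity",
        action="count",
        default=0,
        help="Verbosity level 0-2",
    )
    return parser


args = build_parser().parse_args(
    ["https://www.ncleg.gov/Members/Biography/S/436", "--css", "div.card", "-vv"]
)
print(args.css, args.gpt4, args.verbosity)
```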