Add generalized support for specifying Beautiful Soup options

In #206, a new `beautiful_soup_parser` configuration option was added that lets Beautiful Soup's HTML parser be specified:

```python
MarkdownConverter(beautiful_soup_parser="lxml")
```

The specified parser name is passed to the `features` parameter of the `BeautifulSoup` constructor, which is the second parameter after the HTML `markup`:

```python
class BeautifulSoup(Tag):
    # ...
    def __init__(
        self,
        markup="",
        features=None,
        builder=None,
        parse_only=None,
        from_encoding=None,
        exclude_encodings=None,
        element_classes=None,
        **kwargs,
    ):
        # ...
```

But as shown above, the `BeautifulSoup` constructor accepts other options that might be useful, such as `from_encoding` and `exclude_encodings`, which provide hints to the text encoding detection for the HTML document.

Before #206 ships in a production release, perhaps we could extend its implementation to support all Beautiful Soup configuration options (including any added in the future) using a kwargs-based approach. For example:

```python
MarkdownConverter(beautiful_soup_options={"features": "lxml"})

MarkdownConverter(beautiful_soup_options={"exclude_encodings": ["iso-8859-7"]})
```

Or perhaps a shorter option name, to make up for the extra length of the kwargs dictionary:

```python
MarkdownConverter(bs4_options={"features": "lxml"})

MarkdownConverter(bs4_options={"exclude_encodings": ["iso-8859-7"]})
```
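
To make the idea concrete, here is a rough sketch of how the options could be merged and forwarded. It is not the library's actual code: `resolve_bs4_options`, `make_soup`, and `DEFAULT_BS4_OPTIONS` are hypothetical names, and the `"html.parser"` default is an assumption about what the converter would fall back to.

```python
from typing import Any, Dict, Mapping, Optional

# Assumed fallback when the caller does not pick a parser (hypothetical default).
DEFAULT_BS4_OPTIONS: Dict[str, Any] = {"features": "html.parser"}


def resolve_bs4_options(bs4_options: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Merge user-supplied options over the defaults (hypothetical helper)."""
    merged = dict(DEFAULT_BS4_OPTIONS)
    merged.update(bs4_options or {})
    # The converter supplies the markup itself, so reject it as an option
    # rather than letting BeautifulSoup fail with a duplicate-argument TypeError.
    if "markup" in merged:
        raise ValueError("'markup' cannot be set through bs4_options")
    return merged


def make_soup(html, bs4_options: Optional[Mapping[str, Any]] = None):
    """Build the soup, forwarding every option as a keyword argument."""
    from bs4 import BeautifulSoup  # imported lazily; requires beautifulsoup4

    return BeautifulSoup(html, **resolve_bs4_options(bs4_options))
```

Because the dictionary is unpacked as keyword arguments, any option that a future `BeautifulSoup` release adds would be accepted without further changes on our side.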

