diff --git a/docs/src/markdown/about/changelog.md b/docs/src/markdown/about/changelog.md
index bbff9a1..46a9076 100644
--- a/docs/src/markdown/about/changelog.md
+++ b/docs/src/markdown/about/changelog.md
@@ -1,312 +1,319 @@
 # Changelog
+## 2.5
+
+- **NEW**: Update to support Python 3.12.
+- **NEW**: Drop support for Python 3.7.
+
 ## 2.4.1
-- **FIX**: Attribute syntax for case insensitive flag optionally allows a space, it does not require one.
+- **FIX**: Attribute syntax for case insensitive flag optionally allows a space, it does not require one.
 ## 2.4
-- **NEW**: Update to support changes related to `:lang()` in the official CSS spec. `:lang("")` should match unspecified
- languages, e.g. `lang=""`, but not `lang=und`.
-- **NEW**: Only `:is()` and `:where()` should allow forgiving selector lists according to latest CSS (as far as Soup
- Sieve supports "forgiving" which is limited to empty selectors).
-- **NEW**: Formally drop Python 3.6.
-- **NEW**: Formally declare support for Python 3.11.
+- **NEW**: Update to support changes related to `:lang()` in the official CSS spec. `:lang("")` should match
+ unspecified languages, e.g. `lang=""`, but not `lang=und`.
+- **NEW**: Only `:is()` and `:where()` should allow forgiving selector lists according to latest CSS (as far as Soup
+ Sieve supports "forgiving" which is limited to empty selectors).
+- **NEW**: Formally drop Python 3.6.
+- **NEW**: Formally declare support for Python 3.11.
 ## 2.3.2.post1
-- **FIX**: Documentation for installation from source is outdated.
+- **FIX**: Documentation for installation from source is outdated.
 ## 2.3.2
-- **FIX**: Fix some typos in error messages.
+- **FIX**: Fix some typos in error messages.
 ## 2.3.1
-- **FIX**: Ensure attribute selectors match tags that have new line characters in attributes. (#233)
+- **FIX**: Ensure attribute selectors match tags that have new line characters in attributes. (#233)
 ## 2.3
-- **NEW**: Officially support Python 3.10.
-- **NEW**: Add static typing.
-- **NEW**: `:has()`, `:is()`, and `:where()` now use use a forgiving selector list. While not as forgiving as CSS might
- be, it will forgive such things as empty sets and empty slots due to multiple consecutive commas, leading commas, or
- trailing commas. Essentially, these pseudo-classes will match all non-empty selectors and ignore empty ones. As the
- scraping environment is different than a browser environment, it was chosen not to aggressively forgive bad syntax and
- invalid features to ensure the user is alerted that their program may not perform as expected.
-- **NEW**: Add support to output a pretty print format of a compiled `SelectorList` for debug purposes.
-- **FIX**: Some small corner cases discovered with static typing.
+- **NEW**: Officially support Python 3.10.
+- **NEW**: Add static typing.
+- **NEW**: `:has()`, `:is()`, and `:where()` now use use a forgiving selector list. While not as forgiving as CSS
+ might be, it will forgive such things as empty sets and empty slots due to multiple consecutive commas, leading
+ commas, or trailing commas. Essentially, these pseudo-classes will match all non-empty selectors and ignore empty
+ ones. As the scraping environment is different than a browser environment, it was chosen not to aggressively forgive
+ bad syntax and invalid features to ensure the user is alerted that their program may not perform as expected.
+- **NEW**: Add support to output a pretty print format of a compiled `SelectorList` for debug purposes.
+- **FIX**: Some small corner cases discovered with static typing.
 ## 2.2.1
-- **FIX**: Fix an issue with namespaces when one of the keys is `self`.
+- **FIX**: Fix an issue with namespaces when one of the keys is `self`.
 ## 2.2
-- **NEW**: `:link` and `:any-link` no longer include `#!html ` due to a change in the level 4 selector
- specification. This actually yields more sane results.
-- **FIX**: BeautifulSoup, when using `find`, is quite forgiving of odd types that a user may place in an element's
- attribute value. Soup Sieve will also now be more forgiving and attempt to match these unexpected values in a sane
- manner by normalizing them before compare. (#212)
+- **NEW**: `:link` and `:any-link` no longer include `#!html ` due to a change in the level 4 selector
+ specification. This actually yields more sane results.
+- **FIX**: BeautifulSoup, when using `find`, is quite forgiving of odd types that a user may place in an element's
+ attribute value. Soup Sieve will also now be more forgiving and attempt to match these unexpected values in a sane
+ manner by normalizing them before compare. (#212)
 ## 2.1
-- **NEW**: Officially support Python 3.9.
-- **NEW**: Drop official support for Python 3.5.
-- **NEW**: In order to avoid conflicts with future CSS specification changes, non-standard pseudo classes will now start
- with the `:-soup-` prefix. As a consequence, `:contains()` will now be known as `:-soup-contains()`, though for a time
- the deprecated form of `:contains()` will still be allowed with a warning that users should migrate over to
- `:-soup-contains()`.
-- **NEW**: Added new non-standard pseudo class `:-soup-contains-own()` which operates similar to `:-soup-contains()`
- except that it only looks at text nodes directly associated with the currently scoped element and not its descendants.
-- **FIX**: Import `bs4` globally instead of in local functions as it appears there are no adverse affects due to
- circular imports as `bs4` does not immediately reference `soupsieve` functions and `soupsieve` does not immediately
- reference `bs4` functions. This should give a performance boost to functions that had previously included `bs4`
- locally.
+- **NEW**: Officially support Python 3.9.
+- **NEW**: Drop official support for Python 3.5.
+- **NEW**: In order to avoid conflicts with future CSS specification changes, non-standard pseudo classes will now
+ start with the `:-soup-` prefix. As a consequence, `:contains()` will now be known as `:-soup-contains()`, though
+ for a time the deprecated form of `:contains()` will still be allowed with a warning that users should migrate over
+ to `:-soup-contains()`.
+- **NEW**: Added new non-standard pseudo class `:-soup-contains-own()` which operates similar to `:-soup-contains()`
+ except that it only looks at text nodes directly associated with the currently scoped element and not its
+ descendants.
+- **FIX**: Import `bs4` globally instead of in local functions as it appears there are no adverse affects due to
+ circular imports as `bs4` does not immediately reference `soupsieve` functions and `soupsieve` does not immediately
+ reference `bs4` functions. This should give a performance boost to functions that had previously included `bs4`
+ locally.
 ## 2.0.1
-- **FIX**: Remove unused code.
+- **FIX**: Remove unused code.
 ## 2.0
-- **NEW**: `SelectorSyntaxError` is derived from `Exception` not `SyntaxError`.
-- **NEW**: Remove deprecated `comments` and `icomments` from the API.
-- **NEW**: Drop support for EOL Python versions (Python 2 and Python < 3.5).
-- **FIX**: Corner case with splitting namespace and tag name that that have an escaped `|`.
+- **NEW**: `SelectorSyntaxError` is derived from `Exception` not `SyntaxError`.
+- **NEW**: Remove deprecated `comments` and `icomments` from the API.
+- **NEW**: Drop support for EOL Python versions (Python 2 and Python < 3.5).
+- **FIX**: Corner case with splitting namespace and tag name that that have an escaped `|`.
 ## 1.9.6
-!!! note "Last version for Python 2.7"
+/// note | Last version for Python 2.7
+///
-- **FIX**: Prune dead code.
-- **FIX**: Corner case with splitting namespace and tag name that that have an escaped `|`.
+- **FIX**: Prune dead code.
+- **FIX**: Corner case with splitting namespace and tag name that that have an escaped `|`.
 ## 1.9.5
-- **FIX**: `:placeholder-shown` should not match if the element has content that overrides the placeholder.
+- **FIX**: `:placeholder-shown` should not match if the element has content that overrides the placeholder.
 ## 1.9.4
-- **FIX**: `:checked` rule was too strict with `option` elements. The specification for `:checked` does not require an
- `option` element to be under a `select` element.
-- **FIX**: Fix level 4 `:lang()` wildcard match handling with singletons. Implicit wildcard matching should not
- match any singleton. Explicit wildcard matching (`*` in the language range: `*-US`) is allowed to match singletons.
+- **FIX**: `:checked` rule was too strict with `option` elements. The specification for `:checked` does not require an
+ `option` element to be under a `select` element.
+- **FIX**: Fix level 4 `:lang()` wildcard match handling with singletons. Implicit wildcard matching should not
+ match any singleton. Explicit wildcard matching (`*` in the language range: `*-US`) is allowed to match singletons.
 ## 1.9.3
-- **FIX**: `[attr!=value]` pattern was mistakenly using `:not([attr|=value])` logic instead of `:not([attr=value])`.
-- **FIX**: Remove undocumented `_QUIRKS` mode flag. Beautiful Soup was meant to use it to help with transition to Soup
- Sieve, but never released with it. Help with transition at this point is no longer needed.
+- **FIX**: `[attr!=value]` pattern was mistakenly using `:not([attr|=value])` logic instead of `:not([attr=value])`.
+- **FIX**: Remove undocumented `_QUIRKS` mode flag. Beautiful Soup was meant to use it to help with transition to Soup
+ Sieve, but never released with it. Help with transition at this point is no longer needed.
 ## 1.9.2
-- **FIX**: Shortcut last descendant calculation if possible for performance.
-- **FIX**: Fix issue where `Doctype` strings can be mistaken for a normal text node in some cases.
-- **FIX**: A top level tag is not a `:root` tag if it has sibling text nodes or tag nodes. This is an issue that mostly
- manifests when using `html.parser` as the parser will allow multiple root nodes.
+- **FIX**: Shortcut last descendant calculation if possible for performance.
+- **FIX**: Fix issue where `Doctype` strings can be mistaken for a normal text node in some cases.
+- **FIX**: A top level tag is not a `:root` tag if it has sibling text nodes or tag nodes. This is an issue that
+ mostly manifests when using `html.parser` as the parser will allow multiple root nodes.
 ## 1.9.1
-- **FIX**: `:root`, `:contains()`, `:default`, `:indeterminate`, `:lang()`, and `:dir()` will properly account for HTML
- `iframe` elements in their logic when selecting or matching an element. Their logic will be restricted to the document
- for which the element under consideration applies.
-- **FIX**: HTML pseudo-classes will check that all key elements checked are in the XHTML namespace (HTML parsers that do
- not provide namespaces will assume the XHTML namespace).
-- **FIX**: Ensure that all pseudo-class names are case insensitive and allow CSS escapes.
+- **FIX**: `:root`, `:contains()`, `:default`, `:indeterminate`, `:lang()`, and `:dir()` will properly account for
+ HTML `iframe` elements in their logic when selecting or matching an element. Their logic will be restricted to the
+ document for which the element under consideration applies.
+- **FIX**: HTML pseudo-classes will check that all key elements checked are in the XHTML namespace (HTML parsers that
+ do not provide namespaces will assume the XHTML namespace).
+- **FIX**: Ensure that all pseudo-class names are case insensitive and allow CSS escapes.
 ## 1.9
-- **NEW**: Allow `:contains()` to accept a list of text to search for. (#115)
-- **NEW**: Add new `escape` function for escaping CSS identifiers. (#125)
-- **NEW**: Deprecate `comments` and `icomments` functions in the API to ensure Soup Sieve focuses only on CSS selectors.
- `comments` and `icomments` will most likely be removed in 2.0. (#130)
-- **NEW**: Add Python 3.8 support. (#133)
-- **FIX**: Don't install test files when installing the `soupsieve` package. (#111)
-- **FIX**: Improve efficiency of `:contains()` comparison.
-- **FIX**: Null characters should translate to the Unicode REPLACEMENT CHARACTER (`U+FFFD`) according to the
- specification. This applies to CSS escaped NULL characters as well. (#124)
-- **FIX**: Escaped EOF should translate to `U+FFFD` outside of CSS strings. In a string, they should just be ignored,
- but as there is no case where we could resolve such a string and still have a valid selector, string handling remains
- the same. (#128)
+- **NEW**: Allow `:contains()` to accept a list of text to search for. (#115)
+- **NEW**: Add new `escape` function for escaping CSS identifiers. (#125)
+- **NEW**: Deprecate `comments` and `icomments` functions in the API to ensure Soup Sieve focuses only on CSS
+ selectors. `comments` and `icomments` will most likely be removed in 2.0. (#130)
+- **NEW**: Add Python 3.8 support. (#133)
+- **FIX**: Don't install test files when installing the `soupsieve` package. (#111)
+- **FIX**: Improve efficiency of `:contains()` comparison.
+- **FIX**: Null characters should translate to the Unicode REPLACEMENT CHARACTER (`U+FFFD`) according to the
+ specification. This applies to CSS escaped NULL characters as well. (#124)
+- **FIX**: Escaped EOF should translate to `U+FFFD` outside of CSS strings. In a string, they should just be ignored,
+ but as there is no case where we could resolve such a string and still have a valid selector, string handling
+ remains the same. (#128)
 ## 1.8
-- **NEW**: Add custom selector support. (#92)(#108)
-- **FIX**: Small tweak to CSS identifier pattern to ensure it matches the CSS specification exactly. Specifically, you
- can't have an identifier of only `-`. (#107)
-- **FIX**: CSS string patterns should allow escaping newlines to span strings across multiple lines. (#107)
-- **FIX**: Newline regular expression for CSS newlines should treat `\r\n` as a single character, especially in cases
- such as string escapes: `\\\r\n`. (#107)
-- **FIX**: Allow `--` as a valid identifier or identifier start. (#107)
-- **FIX**: Bad CSS syntax now raises a `SelectorSyntaxError`, which is still currently derived from `SyntaxError`, but
- will most likely be derived from `Exception` in the future.
+- **NEW**: Add custom selector support. (#92)(#108)
+- **FIX**: Small tweak to CSS identifier pattern to ensure it matches the CSS specification exactly. Specifically, you
+ can't have an identifier of only `-`. (#107)
+- **FIX**: CSS string patterns should allow escaping newlines to span strings across multiple lines. (#107)
+- **FIX**: Newline regular expression for CSS newlines should treat `\r\n` as a single character, especially in cases
+ such as string escapes: `\\\r\n`. (#107)
+- **FIX**: Allow `--` as a valid identifier or identifier start. (#107)
+- **FIX**: Bad CSS syntax now raises a `SelectorSyntaxError`, which is still currently derived from `SyntaxError`, but
+ will most likely be derived from `Exception` in the future.
 ## 1.7.3
-- **FIX**: Fix regression with tag names in regards to case sensitivity, and ensure there are tests to prevent breakage
- in the future.
-- **FIX**: XHTML should always be case sensitive like XML.
+- **FIX**: Fix regression with tag names in regards to case sensitivity, and ensure there are tests to prevent
+ breakage in the future.
+- **FIX**: XHTML should always be case sensitive like XML.
 ## 1.7.2
-- **FIX**: Fix HTML detection `type` selector.
-- **FIX**: Fixes for `:enabled` and `:disabled`.
-- **FIX**: Provide a way for Beautiful Soup to parse selectors in a quirks mode to mimic some of the quirks of the old
- select method prior to Soup Sieve, but with warnings. This is to help old scripts to not break during the transitional
- period with newest Beautiful Soup. In the future, these quirks will raise an exception as Soup Sieve requires
- selectors to follow the CSS specification.
+- **FIX**: Fix HTML detection `type` selector.
+- **FIX**: Fixes for `:enabled` and `:disabled`.
+- **FIX**: Provide a way for Beautiful Soup to parse selectors in a quirks mode to mimic some of the quirks of the old
+ select method prior to Soup Sieve, but with warnings. This is to help old scripts to not break during the
+ transitional period with newest Beautiful Soup. In the future, these quirks will raise an exception as Soup Sieve
+ requires selectors to follow the CSS specification.
 ## 1.7.1
-- **FIX**: Fix issue with `:has()` selector where a leading combinator can only be provided in the first selector in a
- relative selector list.
+- **FIX**: Fix issue with `:has()` selector where a leading combinator can only be provided in the first selector in a
+ relative selector list.
 ## 1.7
-- **NEW**: Add support for `:in-range` and `:out-of-range` selectors. (#60)
-- **NEW**: Add support for `:defined` selector. (#76)
-- **FIX**: Fix pickling issue when compiled selector contains a `NullSelector` object. (#70)
-- **FIX**: Better exception messages in the CSS selector parser and fix a position reporting issue that can occur in
- some exceptions. (#72, #73)
-- **FIX**: Don't compare prefixes when evaluating attribute namespaces, compare the actual namespace. (#75)
-- **FIX**: Split whitespace attribute lists by all whitespace characters, not just space.
-- **FIX**: `:nth-*` patterns were converting numbers to base 16 when they should have been converting to base 10.
+- **NEW**: Add support for `:in-range` and `:out-of-range` selectors. (#60)
+- **NEW**: Add support for `:defined` selector. (#76)
+- **FIX**: Fix pickling issue when compiled selector contains a `NullSelector` object. (#70)
+- **FIX**: Better exception messages in the CSS selector parser and fix a position reporting issue that can occur in
+ some exceptions. (#72, #73)
+- **FIX**: Don't compare prefixes when evaluating attribute namespaces, compare the actual namespace. (#75)
+- **FIX**: Split whitespace attribute lists by all whitespace characters, not just space.
+- **FIX**: `:nth-*` patterns were converting numbers to base 16 when they should have been converting to base 10.
 ## 1.6.2
-- **FIX**: Fix pattern compile issues on Python < 2.7.4.
-- **FIX**: Don't use `\d` in Unicode `Re` patterns as they will contain characters outside the range of `[0-9]`.
+- **FIX**: Fix pattern compile issues on Python < 2.7.4.
+- **FIX**: Don't use `\d` in Unicode `Re` patterns as they will contain characters outside the range of `[0-9]`.
 ## 1.6.1
-- **FIX**: Fix warning about not importing `Mapping` from `collections.abc`.
+- **FIX**: Fix warning about not importing `Mapping` from `collections.abc`.
 ## 1.6
-- **NEW**: Add `closest` method to the API that matches closest ancestor.
-- **FIX**: Add missing `select_one` reference to module's `__all__`.
+- **NEW**: Add `closest` method to the API that matches closest ancestor.
+- **FIX**: Add missing `select_one` reference to module's `__all__`.
 ## 1.5
-- **NEW**: Add `select_one` method like Beautiful Soup has.
-- **NEW**: Add `:dir()` selector (HTML only).
-- **FIX**: Fix issues when handling HTML fragments (elements without a `BeautifulSoup` object as a parent).
-- **FIX**: Fix internal `nth` range check.
+- **NEW**: Add `select_one` method like Beautiful Soup has.
+- **NEW**: Add `:dir()` selector (HTML only).
+- **FIX**: Fix issues when handling HTML fragments (elements without a `BeautifulSoup` object as a parent).
+- **FIX**: Fix internal `nth` range check.
 ## 1.4.0
-- **NEW**: Throw `NotImplementedError` for at-rules: `@page`, etc.
-- **NEW**: Match nothing for `:host`, `:host()`, and `:host-context()`.
-- **NEW**: Add support for `:read-write` and `:read-only`.
-- **NEW**: Selector patterns can be annotated with CSS comments.
-- **FIX**: `\r`, `\n`, and `\f` cannot be escaped with `\` in CSS. You must use Unicode escapes.
+- **NEW**: Throw `NotImplementedError` for at-rules: `@page`, etc.
+- **NEW**: Match nothing for `:host`, `:host()`, and `:host-context()`.
+- **NEW**: Add support for `:read-write` and `:read-only`.
+- **NEW**: Selector patterns can be annotated with CSS comments.
+- **FIX**: `\r`, `\n`, and `\f` cannot be escaped with `\` in CSS. You must use Unicode escapes.
 ## 1.3.1
-- **FIX**: Fix issue with undefined namespaces.
+- **FIX**: Fix issue with undefined namespaces.
 ## 1.3
-- **NEW**: Add support for `:scope`.
-- **NEW**: `:user-invalid`, `:playing`, `:paused`, and `:local-link` will not cause a failure, but all will match
- nothing as their use cases are not possible in an environment outside a web browser.
-- **FIX**: Fix `[attr~=value]` handling of whitespace. According to the spec, if the value contains whitespace, or is an
- empty string, it should not match anything.
-- **FIX**: Precompile internal patterns for pseudo-classes to prevent having to parse them again.
+- **NEW**: Add support for `:scope`.
+- **NEW**: `:user-invalid`, `:playing`, `:paused`, and `:local-link` will not cause a failure, but all will match
+ nothing as their use cases are not possible in an environment outside a web browser.
+- **FIX**: Fix `[attr~=value]` handling of whitespace. According to the spec, if the value contains whitespace, or is
+ an empty string, it should not match anything.
+- **FIX**: Precompile internal patterns for pseudo-classes to prevent having to parse them again.
 ## 1.2.1
-- **FIX**: More descriptive exceptions. Exceptions will also now mention position in the pattern that is problematic.
-- **FIX**: `filter` ignores `NavigableString` objects in normal iterables and `Tag` iterables. Basically, it filters all
- Beautiful Soup document parts regardless of iterable type where as it used to only filter out a `NavigableString` in a
- `Tag` object. This is viewed as fixing an inconsistency.
-- **FIX**: `DEBUG` flag has been added to help with debugging CSS selector parsing. This is mainly for development.
-- **FIX**: If forced to search for language in `meta` tag, and no language is found, cache that there is no language in
- the `meta` tag to prevent searching again during the current select.
-- **FIX**: If a non `BeautifulSoup`/`Tag` object is given to the API to compare against, raise a `TypeError`.
+- **FIX**: More descriptive exceptions. Exceptions will also now mention position in the pattern that is problematic.
+- **FIX**: `filter` ignores `NavigableString` objects in normal iterables and `Tag` iterables. Basically, it filters
+ all Beautiful Soup document parts regardless of iterable type where as it used to only filter out a
+ `NavigableString` in a `Tag` object. This is viewed as fixing an inconsistency.
+- **FIX**: `DEBUG` flag has been added to help with debugging CSS selector parsing. This is mainly for development.
+- **FIX**: If forced to search for language in `meta` tag, and no language is found, cache that there is no language
+ in the `meta` tag to prevent searching again during the current select.
+- **FIX**: If a non `BeautifulSoup`/`Tag` object is given to the API to compare against, raise a `TypeError`.
 ## 1.2
-- **NEW**: Add Python 2.7 support.
-- **NEW**: Remove old pre 1.0 deprecations.
+- **NEW**: Add Python 2.7 support.
+- **NEW**: Remove old pre 1.0 deprecations.
 ## 1.1
-- **NEW**: Adds support for `[attr!=value]` which is equivalent to `:not([attr=value])`.
-- **NEW**: Add support for `:active`, `:focus`, `:hover`, `:visited`, `:target`, `:focus-within`, `:focus-visible`,
- `:target-within`, `:current()`/`:current`, `:past`, and `:future`, but they will never match as these states don't
- exist in the Soup Sieve environment.
-- **NEW**: Add support for `:checked`, `:enabled`, `:disabled`, `:required`, `:optional`, `:default`, and
- `:placeholder-shown` which will only match in HTML documents as these concepts are not defined in XML.
-- **NEW**: Add support for `:link` and `:any-link`, both of which will target all ``, ``, and `` elements
- with an `href` attribute as all links will be treated as unvisited in Soup Sieve.
-- **NEW**: Add support for `:lang()` (CSS4) which works in XML and HTML.
-- **NEW**: Users must install Beautiful Soup themselves. This requirement is removed in the hopes that Beautiful Soup
- may use this in the future.
-- **FIX**: Attributes in the form `prefix:attr` can be matched with the form `[prefix\:attr]` without specifying a
- namespaces if desired.
-- **FIX**: Fix exception when `[type]` is used (with no value).
+- **NEW**: Adds support for `[attr!=value]` which is equivalent to `:not([attr=value])`.
+- **NEW**: Add support for `:active`, `:focus`, `:hover`, `:visited`, `:target`, `:focus-within`, `:focus-visible`,
+ `:target-within`, `:current()`/`:current`, `:past`, and `:future`, but they will never match as these states don't
+ exist in the Soup Sieve environment.
+- **NEW**: Add support for `:checked`, `:enabled`, `:disabled`, `:required`, `:optional`, `:default`, and
+ `:placeholder-shown` which will only match in HTML documents as these concepts are not defined in XML.
+- **NEW**: Add support for `:link` and `:any-link`, both of which will target all ``, ``, and ``
+ elements with an `href` attribute as all links will be treated as unvisited in Soup Sieve.
+- **NEW**: Add support for `:lang()` (CSS4) which works in XML and HTML.
+- **NEW**: Users must install Beautiful Soup themselves. This requirement is removed in the hopes that Beautiful Soup
+ may use this in the future.
+- **FIX**: Attributes in the form `prefix:attr` can be matched with the form `[prefix\:attr]` without specifying a
+ namespaces if desired.
+- **FIX**: Fix exception when `[type]` is used (with no value).
 ## 1.0.2
-- **FIX**: Use proper CSS identifier patterns for tag names, classes, ids, etc. Things like `#3` or `#-3` should not
- match and should require `#\33` or `#-\33`.
-- **FIX**: Do not raise `NotImplementedError` for supported pseudo classes/elements with bad syntax, instead raise
- `SyntaxError`.
+- **FIX**: Use proper CSS identifier patterns for tag names, classes, ids, etc. Things like `#3` or `#-3` should not
+ match and should require `#\33` or `#-\33`.
+- **FIX**: Do not raise `NotImplementedError` for supported pseudo classes/elements with bad syntax, instead raise
+ `SyntaxError`.
 ## 1.0.1
-- **FIX**: When giving a tag to `select`, it should only return the children of that tag, never the tag itself.
-- **FIX**: For informational purposes, raise a `NotImplementedError` when an unsupported pseudo class is used.
+- **FIX**: When giving a tag to `select`, it should only return the children of that tag, never the tag itself.
+- **FIX**: For informational purposes, raise a `NotImplementedError` when an unsupported pseudo class is used.
 ## 1.0
-- **NEW**: Official 1.0.0 release.
+- **NEW**: Official 1.0.0 release.
 ## 1.0.0b2
-- **NEW**: Drop document flags. Document type can be detected from the Beautiful Soup object directly.
-- **FIX**: CSS selectors should be evaluated with CSS whitespace rules.
-- **FIX**: Processing instructions, CDATA, and declarations should all be ignored in `:contains` and child
- considerations for `:empty`.
-- **FIX**: In Beautiful Soup, the document itself is the first tag. Do not match the "document" tag by returning false
- for any tag that doesn't have a parent.
+- **NEW**: Drop document flags. Document type can be detected from the Beautiful Soup object directly.
+- **FIX**: CSS selectors should be evaluated with CSS whitespace rules.
+- **FIX**: Processing instructions, CDATA, and declarations should all be ignored in `:contains` and child
+ considerations for `:empty`.
+- **FIX**: In Beautiful Soup, the document itself is the first tag. Do not match the "document" tag by returning false
+ for any tag that doesn't have a parent.
 ## 1.0.0b1
-- **NEW**: Add support for non-standard `:contains()` selector.
-- **FIX**: Compare pseudo class names case insensitively when matching unexpected cases.
-- **FIX**: Don't allow attribute case flags when no attribute value is defined.
+- **NEW**: Add support for non-standard `:contains()` selector.
+- **FIX**: Compare pseudo class names case insensitively when matching unexpected cases.
+- **FIX**: Don't allow attribute case flags when no attribute value is defined.
 ## 0.6
-- **NEW**: `mode` attribute is now called `flags` to allow for other options in the future.
-- **FIX**: More corner cases for `nth` selectors.
+- **NEW**: `mode` attribute is now called `flags` to allow for other options in the future.
+- **FIX**: More corner cases for `nth` selectors.
 ## 0.5.3
-- **FIX**: Previously, all pseudo classes' selector lists were evaluated as one big group, but now each pseudo classes'
- selector lists are evaluated separately.
-- **FIX**: CSS selector tokens are not case sensitive.
+- **FIX**: Previously, all pseudo classes' selector lists were evaluated as one big group, but now each pseudo class's
+ selector lists are evaluated separately.
+- **FIX**: CSS selector tokens are not case sensitive.
 ## 0.5.2
-- **FIX**: Add missing `s` flag to attribute selector for forced case sensitivity of attribute values.
-- **FIX**: Relax attribute pattern matching to allow non-essential whitespace.
-- **FIX**: Attribute selector flags themselves are not case sensitive.
-- **FIX**: `type` attribute in HTML is handled special. While all other attributes values are case sensitive, `type` in
- HTML is usually treated special and is insensitive. In XML, this is not the case.
+- **FIX**: Add missing `s` flag to attribute selector for forced case sensitivity of attribute values.
+- **FIX**: Relax attribute pattern matching to allow non-essential whitespace.
+- **FIX**: Attribute selector flags themselves are not case sensitive.
+- **FIX**: `type` attribute in HTML is handled special. While all other attributes values are case sensitive, `type`
+ in HTML is usually treated special and is insensitive. In XML, this is not the case.
 ## 0.5.1
-- **FIX**: Fix namespace check for `:nth-of-type`.
+- **FIX**: Fix namespace check for `:nth-of-type`.
 ## 0.5
-- **NEW**: Deprecate `commentsiter` and `selectiter` in favor of `icomments` and `iselect`. Expect removal in version
-1.0.
+- **NEW**: Deprecate `commentsiter` and `selectiter` in favor of `icomments` and `iselect`. Expect removal in version
+ 1.0.
 ## 0.4
-- **NEW**: Initial prerelease.
+- **NEW**: Initial prerelease.
diff --git a/docs/src/markdown/about/contributing.md b/docs/src/markdown/about/contributing.md
index 3ceaf3c..f912bc8 100644
--- a/docs/src/markdown/about/contributing.md
+++ b/docs/src/markdown/about/contributing.md
@@ -10,23 +10,23 @@ any tier you feel comfortable with. No amount is too little. We also accept one
 ## Bug Reports
-1. Please **read the documentation** and **search the issue tracker** to try and find the answer to your question
-**before** posting an issue.
+1. Please **read the documentation** and **search the issue tracker** to try and find the answer to your question
+ **before** posting an issue.
-2. When creating an issue on the repository, please provide as much information as possible:
+2. When creating an issue on the repository, please provide as much information as possible:
- - Version being used.
- - Operating system.
- - Version of Python.
- - Errors in console. - - Detailed description of the problem. - - Examples for reproducing the error. You can post pictures, but if specific text or code is required to reproduce - the issue, please provide the text in a plain text format for easy copy/paste. + - Version being used. + - Operating system. + - Version of Python. + - Errors in console. + - Detailed description of the problem. + - Examples for reproducing the error. You can post pictures, but if specific text or code is required to + reproduce the issue, please provide the text in a plain text format for easy copy/paste. The more info provided the greater the chance someone will take the time to answer, implement, or fix the issue. -3. Be prepared to answer questions and provide additional information if required. Issues in which the creator refuses -to respond to follow up questions will be marked as stale and closed. +3. Be prepared to answer questions and provide additional information if required. Issues in which the creator refuses + to respond to follow up questions will be marked as stale and closed. ## Reviewing Code diff --git a/docs/src/markdown/about/development.md b/docs/src/markdown/about/development.md index b7f5777..49d9331 100644 --- a/docs/src/markdown/about/development.md +++ b/docs/src/markdown/about/development.md @@ -29,11 +29,11 @@ When writing code, the code should roughly conform to PEP8 and PEP257 suggestion linter (with some additional plugins) to ensure code conforms (give or take some of the rules). When in doubt, follow the formatting hints of existing code when adding files or modifying existing files. 
 Listed below are the modules used:
-- @gitlab:pycqa/flake8
-- @gitlab:pycqa/flake8-docstrings
-- @gitlab:pycqa/pep8-naming
-- @ebeweber/flake8-mutable
-- @gforcada/flake8-builtins
+- @gitlab:pycqa/flake8
+- @gitlab:pycqa/flake8-docstrings
+- @gitlab:pycqa/pep8-naming
+- @ebeweber/flake8-mutable
+- @gforcada/flake8-builtins
 Usually this can be automated with Tox (assuming it is installed): `tox -e lint`.
@@ -42,10 +42,10 @@ Usually this can be automated with Tox (assuming it is installed): `tox -e lint`
 Documents are in Markdown (with some additional syntax provided by extensions) and are converted to HTML via Python Markdown. If you would like to build and preview the documentation, you must have these packages installed:
-- @Python-Markdown/markdown: the Markdown parser.
-- @mkdocs/mkdocs: the document site generator.
-- @squidfunk/mkdocs-material: a material theme for MkDocs.
-- @facelessuser/pymdown-extensions: this Python Markdown extension bundle.
+- @Python-Markdown/markdown: the Markdown parser.
+- @mkdocs/mkdocs: the document site generator.
+- @squidfunk/mkdocs-material: a material theme for MkDocs.
+- @facelessuser/pymdown-extensions: this Python Markdown extension bundle.
 It is advised that you just install document dependencies with the following as the above list may not include all document plugins:
diff --git a/docs/src/markdown/api.md b/docs/src/markdown/api.md
index edcbac6..42299ee 100644
--- a/docs/src/markdown/api.md
+++ b/docs/src/markdown/api.md
@@ -11,25 +11,26 @@ When detecting XHTML, Soup Sieve simply looks to see if the root element of an X
 and does not currently look at the `doctype`. If in the future there is a need for stricter XHTML detection, this may change.
-- HTML document types (HTML, HTML5) will have their tag names and attribute names treated without case
-sensitivity, like most browsers do.
+- HTML document types (HTML, HTML5) will have their tag names and attribute names treated without case
+ sensitivity, like most browsers do.
-- XML document types (including XHTML) will have their tag names and attribute names treated with case sensitivity.
+- XML document types (including XHTML) will have their tag names and attribute names treated with case sensitivity.
-- HTML5, XHTML and XML documents will have namespaces evaluated per the document's support (provided via the
-parser). Some additional configuration is required when using namespaces, see [Namespace](#namespaces) for more
-information.
+- HTML5, XHTML and XML documents will have namespaces evaluated per the document's support (provided via the
+ parser). Some additional configuration is required when using namespaces, see [Namespace](#namespaces) for more
+ information.
- !!! tip "Getting Proper Namespaces"
- The `html5lib` parser provides proper namespaces for HTML5, but `lxml`'s HTML parser will not. If you need
- namespace support for HTML5, consider using `html5lib`.
+ /// tip | Getting Proper Namespaces
+ The `html5lib` parser provides proper namespaces for HTML5, but `lxml`'s HTML parser will not. If you need
+ namespace support for HTML5, consider using `html5lib`.
- For XML, the `lxml-xml` parser (`xml` for short) will provide proper namespaces. It is generally suggested that
- `lxml-xml` is used to parse XHTML documents to take advantage of namespaces.
+ For XML, the `lxml-xml` parser (`xml` for short) will provide proper namespaces. It is generally suggested that
+ `lxml-xml` is used to parse XHTML documents to take advantage of namespaces.
+ ///
-- While attribute values are generally treated as case sensitive, HTML5 and HTML treat the `type` attribute
-special. The `type` attribute's value is always case insensitive. This is generally how most browsers treat `type`. If
-you need `type` to be sensitive, you can use the `s` flag: `#!css [type="submit" s]`.
+- While attribute values are generally treated as case sensitive, HTML5 and HTML treat the `type` attribute
+ special. The `type` attribute's value is always case insensitive. This is generally how most browsers treat `type`.
+ If you need `type` to be sensitive, you can use the `s` flag: `#!css [type="submit" s]`.
 While Soup Sieve access is exposed through Beautiful Soup's API, Soup Sieve's API can always be imported and accessed directly for more controlled tag selection if needed.
@@ -180,8 +181,9 @@ would normally cause an identifier to be invalid.
 '�'
 ```
-!!! new "New in 1.9.0"
- `escape` is a new API function added in 1.9.0.
+/// new | New in 1.9.0
+`escape` is a new API function added in 1.9.0.
+///
 ## `soupsieve.compile()`
diff --git a/docs/src/markdown/index.md b/docs/src/markdown/index.md
index 30d8c91..c18b4b4 100644
--- a/docs/src/markdown/index.md
+++ b/docs/src/markdown/index.md
@@ -14,17 +14,17 @@ Soup Sieve has implemented most of the CSS selectors up through the latest CSS d
 number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply do not match anything.
Some of the supported selectors are: -- `#!css .classes` -- `#!css #ids` -- `#!css [attributes=value]` -- `#!css parent child` -- `#!css parent > child` -- `#!css sibling ~ sibling` -- `#!css sibling + sibling` -- `#!css :not(element.class, element2.class)` -- `#!css :is(element.class, element2.class)` -- `#!css parent:has(> child)` -- and [many more](./selectors/index.md) +- `#!css .classes` +- `#!css #ids` +- `#!css [attributes=value]` +- `#!css parent child` +- `#!css parent > child` +- `#!css sibling ~ sibling` +- `#!css sibling + sibling` +- `#!css :not(element.class, element2.class)` +- `#!css :is(element.class, element2.class)` +- `#!css parent:has(> child)` +- and [many more](./selectors/index.md) ## Installation diff --git a/docs/src/markdown/selectors/basic.md b/docs/src/markdown/selectors/basic.md index 16e2c86..87f83ce 100644 --- a/docs/src/markdown/selectors/basic.md +++ b/docs/src/markdown/selectors/basic.md @@ -23,457 +23,513 @@ If a default namespace is defined in the [namespace dictionary](../api.md#namesp [namespace](#namespace-selectors) is explicitly defined, it will be assumed that the element must be in the default namespace. -=== "Syntax" +/// tab | Syntax +```css +element +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...
Here is some text.
+...
Here is some more text.
+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('div')) +[
Here is some text.
,
Here is some more text.
] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors +/// + +## Universal Selectors + +The Universal selector (`*`) matches elements of any type. + +/// tab | Syntax +```css +* +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

Here is some text.

+...
Here is some more text.
+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('*')) +[ + +
Here is some text.
+
Here is some more text.
+ + +, , +
Here is some text.
+
Here is some more text.
+ + +,
Here is some text.
,
Here is some more text.
] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/Universal_selectors +/// + +## ID Selectors + +The ID selector matches an element based on its `id` attribute. The ID must match exactly. + +/// tab | Syntax +```css +#id +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...
Here is some text.
+...
Here is some more text.
+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('#some-id')) +[
Here is some text.
] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/ID_selectors +/// + +/// note | XML Support +While the use of the `id` attribute (in the context of CSS) is a very HTML centric idea, it is supported for XML as +well because Beautiful Soup supported it before Soup Sieve's existence. +/// + +## Class Selectors + +The class selector matches an element based on the values contained in the `class` attribute. The `class` attribute is +treated as a whitespace separated list, where each item is a **class**. + +/// tab | Syntax +```css +.class +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...
Here is some text.
+...
Here is some more text.
+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('.some-class')) +[
Here is some text.
] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/Class_selectors +/// + +/// note | XML Support +While the use of the `class` attribute (in the context of CSS) is a very HTML centric idea, it is supported for XML +as well because Beautiful Soup supported it before Soup Sieve's existence. +/// + +## Attribute Selectors + +The attribute selector matches an element based on its attributes. When specifying a value of an attribute, if it +contains whitespace or special characters, you should quote them with either single or double quotes. + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors +/// + +/// define | +`[attribute]` + +- Represents elements with an attribute named **attribute**. + + //// tab | Syntax ```css - element + [attr] ``` + //// -=== "Usage" + //// tab | Usage ```pycon3 >>> from bs4 import BeautifulSoup as bs >>> html = """ ... ... ... - ...
Here is some text.
- ...
Here is some more text.
+ ...
... ... ... """ >>> soup = bs(html, 'html5lib') - >>> print(soup.select('div')) - [
Here is some text.
,
Here is some more text.
] + >>> print(soup.select('[href]')) + [Internal link, Example link, Insensitive internal link, Example org link] ``` + //// +/// -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors - -## Universal Selectors +/// define +`[attribute=value]` -The Universal selector (`*`) matches elements of any type. +- Represents elements with an attribute named **attribute** that also has a value of **value**. -=== "Syntax" + //// tab | Syntax ```css - * + [attr=value] + [attr="value"] ``` + //// -=== "Usage" + //// tab | Usage ```pycon3 >>> from bs4 import BeautifulSoup as bs >>> html = """ ... ... ... - ...

Here is some text.

- ...
Here is some more text.
+ ... ... ... ... """ >>> soup = bs(html, 'html5lib') - >>> print(soup.select('*')) - [ - -
Here is some text.
-
Here is some more text.
- + >>> print(soup.select('[href="#internal"]')) + [Internal link] + ``` + //// +/// - , , -
Here is some text.
-
Here is some more text.
+/// define +`[attribute~=value]` +- Represents elements with an attribute named **attribute** whose value is a space separated list which contains + **value**. - ,
Here is some text.
,
Here is some more text.
] + //// tab | Syntax + ```css + [attr~=value] + [attr~="value"] ``` + //// -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/Universal_selectors + //// tab | Usage + ```pycon3 + >>> from bs4 import BeautifulSoup as bs + >>> html = """ + ... + ... + ... + ... + ... + ... + ... """ + >>> soup = bs(html, 'html5lib') + >>> print(soup.select('[class~=class2]')) + [Internal link] + ``` + //// +/// -## ID Selectors +/// define +`[attribute|=value]` -The ID selector matches an element based on its `id` attribute. The ID must match exactly. +- Represents elements with an attribute named **attribute** whose value is a dash separated list that starts with + **value**. -=== "Syntax" + //// tab | Syntax ```css - #id + [attr|=value] + [attr|="value"] ``` + //// -=== "Usage" + //// tab | Usage ```pycon3 >>> from bs4 import BeautifulSoup as bs >>> html = """ ... ... ... - ...
Here is some text.
- ...
Here is some more text.
+ ...
Some text
+ ...
Some more text
... ... ... """ >>> soup = bs(html, 'html5lib') - >>> print(soup.select('#some-id')) - [
Here is some text.
] + >>> print(soup.select('div[lang|="en"]')) + [
Some text
,
Some more text
] ``` + //// +/// -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/ID_selectors - -!!! note "XML Support" - While the use of the `id` attribute (in the context of CSS) is a very HTML centric idea, it is supported for XML as - well because Beautiful Soup supported it before Soup Sieve's existence. - -## Class Selectors +/// define +`[attribute^=value]` -The class selector matches an element based on the values contained in the `class` attribute. The `class` attribute is -treated as a whitespace separated list, where each item is a **class**. +- Represents elements with an attribute named **attribute** whose value starts with **value**. -=== "Syntax" + //// tab | Syntax ```css - .class + [attr^=value] + [attr^="value"] ``` + //// -=== "Usage" + //// tab | Usage ```pycon3 >>> from bs4 import BeautifulSoup as bs >>> html = """ ... ... ... - ...
Here is some text.
- ...
Here is some more text.
+ ... ... ... ... """ >>> soup = bs(html, 'html5lib') - >>> print(soup.select('.some-class')) - [
Here is some text.
] + >>> print(soup.select('[href^=http]')) + [Example link, Example org link] ``` + //// +/// -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/Class_selectors - -!!! note "XML Support" - While the use of the `class` attribute (in the context of CSS) is a very HTML centric idea, it is supported for XML - as well because Beautiful Soup supported it before Soup Sieve's existence. +/// define -## Attribute Selectors - -The attribute selector matches an element based on its attributes. When specifying a value of an attribute, if it -contains whitespace or special characters, you should quote them with either single or double quotes. +`[attribute$=value]` -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors +- Represents elements with an attribute named **attribute** whose value ends with **value**. -`[attribute]` -: - Represents elements with an attribute named **attribute**. - - === "Syntax" - ```css - [attr] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[href]')) - [Internal link, Example link, Insensitive internal link, Example org link] - ``` + //// tab | Syntax + ```css + [attr$=value] + [attr$="value"] + ``` + //// -`[attribute=value]` -: - Represents elements with an attribute named **attribute** that also has a value of **value**. - - === "Syntax" - ```css - [attr=value] - [attr="value"] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[href="#internal"]')) - [Internal link] - ``` + //// tab | Usage + ```pycon3 + >>> from bs4 import BeautifulSoup as bs + >>> html = """ + ... + ... + ... + ... + ... + ... + ... 
""" + >>> soup = bs(html, 'html5lib') + >>> print(soup.select('[href$=org]')) + [Example org link] + ``` + //// +/// -`[attribute~=value]` -: - Represents elements with an attribute named **attribute** whose value is a space separated list which contains - **value**. +/// define +`[attribute*=value]` - === "Syntax" - ```css - [attr~=value] - [attr~="value"] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[class~=class2]')) - [Internal link] - ``` +- Represents elements with an attribute named **attribute** whose value containing the substring **value**. -`[attribute|=value]` -: - Represents elements with an attribute named **attribute** whose value is a dash separated list that starts with - **value**. + //// tab | Syntax + ```css + [attr*=value] + [attr*="value"] + ``` + //// - === "Syntax" - ```css - [attr|=value] - [attr|="value"] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...
Some text
- ...
Some more text
- ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('div[lang|="en"]')) - [
Some text
,
Some more text
] - ``` + //// tab | Usage + ```pycon3 + >>> from bs4 import BeautifulSoup as bs + >>> html = """ + ... + ... + ... + ... + ... + ... + ... """ + >>> soup = bs(html, 'html5lib') + >>> print(soup.select('[href*="example"]')) + [Example link, Example org link] + ``` + //// +/// -`[attribute^=value]` -: - Represents elements with an attribute named **attribute** whose value starts with **value**. - - === "Syntax" - ```css - [attr^=value] - [attr^="value"] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[href^=http]')) - [Example link, Example org link] - ``` +/// define +`[attribute!=value]`:material-star:{: title="Custom" data-md-color-primary="green" .icon} -`[attribute$=value]` -: - Represents elements with an attribute named **attribute** whose value ends with **value**. - - === "Syntax" - ```css - [attr$=value] - [attr$="value"] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[href$=org]')) - [Example org link] - ``` +- Equivalent to `#!css :not([attribute=value])`. -`[attribute*=value]` -: - Represents elements with an attribute named **attribute** whose value containing the substring **value**. - - === "Syntax" - ```css - [attr*=value] - [attr*="value"] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[href*="example"]')) - [Example link, Example org link] - ``` + //// tab | Syntax + ```css + [attr!=value] + [attr!="value"] + ``` + //// -`[attribute!=value]`:material-star:{: title="Custom" data-md-color-primary="green" .icon} -: - Equivalent to `#!css :not([attribute=value])`. 
- - === "Syntax" - ```css - [attr!=value] - [attr!="value"] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('a[href!="#internal"]')) - [Example link, Insensitive internal link, Example org link] - ``` + //// tab | Usage + ```pycon3 + >>> from bs4 import BeautifulSoup as bs + >>> html = """ + ... + ... + ... + ... + ... + ... + ... """ + >>> soup = bs(html, 'html5lib') + >>> print(soup.select('a[href!="#internal"]')) + [Example link, Insensitive internal link, Example org link] + ``` + //// +/// +/// define `[attribute operator value i]`:material-flask:{: title="Experimental" data-md-color-primary="purple" .icon} -: - Represents elements with an attribute named **attribute** and whose value, when the **operator** is applied, matches + +- Represents elements with an attribute named **attribute** and whose value, when the **operator** is applied, matches **value** *without* case sensitivity. In general, attribute comparison is insensitive in normal HTML, but not XML. `i` is most useful in XML documents. - === "Syntax" - ```css - [attr=value i] - [attr="value" i] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[href="#INTERNAL" i]')) - [Internal link] - ``` + //// tab | Syntax + ```css + [attr=value i] + [attr="value" i] + ``` + //// + + //// tab | Usage + ```pycon3 + >>> from bs4 import BeautifulSoup as bs + >>> html = """ + ... + ... + ... + ... + ... + ... + ... 
""" + >>> soup = bs(html, 'html5lib') + >>> print(soup.select('[href="#INTERNAL" i]')) + [Internal link] + ``` + //// +/// +/// define `[attribute operator value s]` :material-flask:{: title="Experimental" data-md-color-primary="purple" .icon} -: - Represents elements with an attribute named **attribute** and whose value, when the **operator** is applied, matches + +- Represents elements with an attribute named **attribute** and whose value, when the **operator** is applied, matches **value** *with* case sensitivity. - === "Syntax" - ```css - [attr=value s] - [attr="value" s] - ``` - - === "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('[href="#INTERNAL" s]')) - [] - >>> print(soup.select('[href="#internal" s]')) - [Internal link] - ``` + //// tab | Syntax + ```css + [attr=value s] + [attr="value" s] + ``` + //// + + //// tab | Usage + ```pycon3 + >>> from bs4 import BeautifulSoup as bs + >>> html = """ + ... + ... + ... + ... + ... + ... + ... """ + >>> soup = bs(html, 'html5lib') + >>> print(soup.select('[href="#INTERNAL" s]')) + [] + >>> print(soup.select('[href="#internal" s]')) + [Internal link] + ``` + //// +/// ## Namespace Selectors @@ -499,46 +555,48 @@ but attributes usually do not have a namespace unless one is explicitly defined Namespaces can be used with attribute selectors as well except that when `[|attribute`] is used, it is equivalent to `[attribute]`. -=== "Syntax" - ```css - ns|element - ns|* - *|* - *|element - |element - [ns|attr] - [*|attr] - [|attr] - ``` - -=== "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...

SVG Example

- ...

Soup Sieve Docs

- ... - ... - ... MDN Web Docs - ... - ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('svg|a', namespaces={'svg': 'http://www.w3.org/2000/svg'})) - [MDN Web Docs] - >>> print(soup.select('a', namespaces={'svg': 'http://www.w3.org/2000/svg'})) - [Soup Sieve Docs, MDN Web Docs] - >>> print(soup.select('a', namespaces={'': 'http://www.w3.org/1999/xhtml', 'svg': 'http://www.w3.org/2000/svg'})) - [Soup Sieve Docs] - >>> print(soup.select('[xlink|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})) - [MDN Web Docs] - >>> print(soup.select('[|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})) - [Soup Sieve Docs] - ``` +/// tab | Syntax +```css +ns|element +ns|* +*|* +*|element +|element +[ns|attr] +[*|attr] +[|attr] +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

SVG Example

+...

Soup Sieve Docs

+... +... +... MDN Web Docs +... +... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('svg|a', namespaces={'svg': 'http://www.w3.org/2000/svg'})) +[MDN Web Docs] +>>> print(soup.select('a', namespaces={'svg': 'http://www.w3.org/2000/svg'})) +[Soup Sieve Docs, MDN Web Docs] +>>> print(soup.select('a', namespaces={'': 'http://www.w3.org/1999/xhtml', 'svg': 'http://www.w3.org/2000/svg'})) +[Soup Sieve Docs] +>>> print(soup.select('[xlink|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})) +[MDN Web Docs] +>>> print(soup.select('[|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'})) +[Soup Sieve Docs] +``` +/// --8<-- selector_styles.md diff --git a/docs/src/markdown/selectors/combinators.md b/docs/src/markdown/selectors/combinators.md index 17da431..2e29a9a 100644 --- a/docs/src/markdown/selectors/combinators.md +++ b/docs/src/markdown/selectors/combinators.md @@ -7,149 +7,163 @@ CSS employs a number of tokens in order to represent lists or to provide relatio Selector lists use the comma (`,`) to join multiple selectors in a list. When presented with a selector list, any selector in the list that matches an element will return that element. -=== "Syntax" - ```css - element1, element2 - ``` - -=== "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...

Title

- ...

Paragraph

- ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('h1, p')) - [

Title

,

Paragraph

] - ``` +/// tab | Syntax +```css +element1, element2 +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

Title

+...

Paragraph

+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('h1, p')) +[

Title

,

Paragraph

] +``` +/// ## Descendant Combinator Descendant combinators combine two selectors with whitespace ( ) in order to signify that the second element is matched if it has an ancestor that matches the first element. -=== "Syntax" - ```css - parent descendant - ``` - -=== "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...

Paragraph 1

- ...

Paragraph 2

- ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('body p')) - [

Paragraph 1

,

Paragraph 2

] - ``` - -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/Descendant_combinator +/// tab | Syntax +```css +parent descendant +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

Paragraph 1

+...

Paragraph 2

+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('body p')) +[

Paragraph 1

,

Paragraph 2

] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/Descendant_combinator +/// ## Child combinator Child combinators combine two selectors with `>` in order to signify that the second element is matched if it has a parent that matches the first element. -=== "Syntax" - ```css - parent > child - ``` - -=== "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...

Paragraph 1

- ...
- ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('div > p')) - [

Paragraph 1

] - ``` - -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator +/// tab | Syntax +```css +parent > child +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

Paragraph 1

+...
+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('div > p')) +[

Paragraph 1

] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator +/// ## General sibling combinator General sibling combinators combine two selectors with `~` in order to signify that the second element is matched if it has a sibling that precedes it that matches the first element. -=== "Syntax" - ```css - prevsibling ~ sibling - ``` - -=== "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...

Title

- ...

Paragraph 1

- ...

Paragraph 2

- ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('h1 ~ p')) - [

Paragraph 1

,

Paragraph 2

] - ``` - -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/General_sibling_combinator +/// tab | Syntax +```css +prevsibling ~ sibling +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

Title

+...

Paragraph 1

+...

Paragraph 2

+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('h1 ~ p')) +[

Paragraph 1

,

Paragraph 2

] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/General_sibling_combinator +/// ## Adjacent sibling combinator Adjacent sibling combinators combine two selectors with `+` in order to signify that the second element is matched if it has an adjacent sibling that precedes it that matches the first element. -=== "Syntax" - ```css - prevsibling + nextsibling - ``` - -=== "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...

Title

- ...

Paragraph 1

- ...

Paragraph 2

- ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select('h1 ~ p')) - [

Paragraph 1

] - ``` - -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/Adjacent_sibling_combinator +/// tab | Syntax +```css +prevsibling + nextsibling +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

Title

+...

Paragraph 1

+...

Paragraph 2

+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select('h1 + p')) +[

Paragraph 1

] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/Adjacent_sibling_combinator +/// --8<-- selector_styles.md diff --git a/docs/src/markdown/selectors/index.md b/docs/src/markdown/selectors/index.md index 18f09db..f573b93 100644 --- a/docs/src/markdown/selectors/index.md +++ b/docs/src/markdown/selectors/index.md @@ -67,22 +67,23 @@ cases, we may adopt them as "custom" selectors. -!!! tip "Additional Reading" - If usage of a selector is not clear in this documentation, you can find more information by reading these - specification documents: +/// tip | Additional Reading +If usage of a selector is not clear in this documentation, you can find more information by reading these +specification documents: - [CSS Level 3 Specification](https://www.w3.org/TR/selectors-3/) - : Contains the latest official document outlying official behaviors of CSS selectors. +[CSS Level 3 Specification](https://www.w3.org/TR/selectors-3/) +: Contains the latest official document outlying official behaviors of CSS selectors. - [CSS Level 4 Working Draft](https://www.w3.org/TR/selectors-4/) - : Contains the latest published working draft of the CSS level 4 selectors which outlines the experimental new - selectors and experimental behavioral changes. +[CSS Level 4 Working Draft](https://www.w3.org/TR/selectors-4/) +: Contains the latest published working draft of the CSS level 4 selectors which outlines the experimental new +selectors and experimental behavioral changes. - [HTML5](https://www.w3.org/TR/html50/) - : The HTML 5.0 specification document. Defines the semantics regarding HTML. +[HTML5](https://www.w3.org/TR/html50/) +: The HTML 5.0 specification document. Defines the semantics regarding HTML. - [HTML Living Standard](https://html.spec.whatwg.org/) - : The HTML Living Standard document. Defines semantics regarding HTML. +[HTML Living Standard](https://html.spec.whatwg.org/) +: The HTML Living Standard document. 
Defines semantics regarding HTML. +/// ## Selector Terminology diff --git a/docs/src/markdown/selectors/pseudo-classes.md b/docs/src/markdown/selectors/pseudo-classes.md index 17b2238..a62d55b 100644 --- a/docs/src/markdown/selectors/pseudo-classes.md +++ b/docs/src/markdown/selectors/pseudo-classes.md @@ -13,128 +13,138 @@ implement as they might not stick around. Selects every `#!html `, or `#!html ` element that has an `href` attribute, independent of whether it has been visited. -=== "Syntax" - ```css - :any-link - ``` - -=== "Usage" - ```pycon3 - >>> from bs4 import BeautifulSoup as bs - >>> html = """ - ... - ... - ... - ...

A link to click

- ... - ... - ... """ - >>> soup = bs(html, 'html5lib') - >>> print(soup.select(':any-link')) - [click] - ``` - -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/:any-link - -!!! new "New in 2.2" - The CSS specification recently updated to not include `#!html ` in the definition; therefore, Soup Sieve has - removed it as well. +/// tab | Syntax +```css +:any-link +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +...

A link to click

+... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select(':any-link')) +[click] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/:any-link +/// + +/// new | New in 2.2 +The CSS specification recently updated to not include `#!html ` in the definition; therefore, Soup Sieve has +removed it as well. +/// ## `:checked`:material-language-html5:{: title="HTML" data-md-color-primary="orange" .icon} {:#:checked} Selects any `#!html `, `#!html `, or `#!html ] - ``` - -!!! tip "Additional Reading" - https://developer.mozilla.org/en-US/docs/Web/CSS/:checked +/// tab | Syntax +```css +:checked +``` +/// + +/// tab | Usage +```pycon3 +>>> from bs4 import BeautifulSoup as bs +>>> html = """ +... +... +... +... +...
+... +... +... +... +... +...
+... +... +... +... +... +... """ +>>> soup = bs(html, 'html5lib') +>>> print(soup.select(':checked')) +[, ] +``` +/// + +/// tip | Additional Reading +https://developer.mozilla.org/en-US/docs/Web/CSS/:checked +/// ## `:default`:material-language-html5:{: title="HTML" data-md-color-primary="orange" .icon}:material-flask:{: title="Experimental" data-md-color-primary="purple" .icon} {:#:default} Selects any form element that is the default among a group of related elements, including: `#!html