FetchNodeLevelK is not filtering out `mailto:` links.
#861
Comments
Which model are you using?
4o-mini.
github-actions bot pushed a commit that referenced this issue (Jan 6, 2025)
## [1.34.0-beta.16](v1.34.0-beta.15...v1.34.0-beta.16) (2025-01-06)

### Bug Fixes

* add back poethepoet for pylint ([a82af04](a82af04))
* better playwright installation handling ([f6009d1](f6009d1))
* disallow mailto: ([#861](#861)) ([8d9c909](8d9c909))
* removed requirements files ([25861b0](25861b0))
* selenium import in ChromiumLoader ([e374e05](e374e05))

### chore

* chromium browser asnc handling ([5be7c49](5be7c49))
* made some libs optional ([5cdf055](5cdf055))
* pandas package is now optional ([54c69a2](54c69a2))
github-actions bot pushed a commit that referenced this issue (Jan 6, 2025)
## [1.34.2-beta.1](v1.34.1...v1.34.2-beta.1) (2025-01-06)

### Bug Fixes

* add back poethepoet for pylint ([a82af04](a82af04))
* better playwright installation handling ([f6009d1](f6009d1))
* disallow mailto: ([#861](#861)) ([8d9c909](8d9c909))
* removed requirements files ([25861b0](25861b0))
* search graph ([d4b2679](d4b2679))
* selenium import in ChromiumLoader ([e374e05](e374e05))

### chore

* chromium browser asnc handling ([5be7c49](5be7c49))
* made some libs optional ([5cdf055](5cdf055))
* pandas package is now optional ([54c69a2](54c69a2))

### CI

* **release:** 1.34.0-beta.15 [skip ci] ([bc7ae85](bc7ae85))
* **release:** 1.34.0-beta.16 [skip ci] ([a0efb09](a0efb09)), closes [#861](#861)
github-actions bot pushed a commit that referenced this issue (Jan 6, 2025)
## [1.34.2](v1.34.1...v1.34.2) (2025-01-06)

### Bug Fixes

* add back poethepoet for pylint ([a82af04](a82af04))
* better playwright installation handling ([f6009d1](f6009d1))
* disallow mailto: ([#861](#861)) ([8d9c909](8d9c909))
* removed requirements files ([25861b0](25861b0))
* search graph ([d4b2679](d4b2679))
* selenium import in ChromiumLoader ([e374e05](e374e05))

### chore

* chromium browser asnc handling ([5be7c49](5be7c49))
* made some libs optional ([5cdf055](5cdf055))
* pandas package is now optional ([54c69a2](54c69a2))

### CI

* **release:** 1.34.0-beta.15 [skip ci] ([bc7ae85](bc7ae85))
* **release:** 1.34.0-beta.16 [skip ci] ([a0efb09](a0efb09)), closes [#861](#861)
Hey @Kilowhisky, it is fixed in the new release!
github-actions bot pushed a commit that referenced this issue (Jan 6, 2025)
## [1.35.0](v1.34.2...v1.35.0) (2025-01-06)

### Features

* ⏰added graph timeout and fixed model_tokens param ([#810](#810) [#856](#856) [#853](#853)) ([01a331a](01a331a))
* ⛏️ enhanced contribution and precommit added ([fcbfe78](fcbfe78))
* add codequality workflow ([4380afb](4380afb))
* add timeout and retry_limit in loader_kwargs ([#865](#865) [#831](#831)) ([21147c4](21147c4))
* serper api search ([1c0141f](1c0141f))

### Bug Fixes

* browserbase integration ([752a885](752a885))
* local html handling ([2a15581](2a15581))

### CI

* **release:** 1.34.2-beta.1 [skip ci] ([f383e72](f383e72)), closes [#861](#861) [#861](#861)
* **release:** 1.34.2-beta.2 [skip ci] ([93fd9d2](93fd9d2))
* **release:** 1.34.3-beta.1 [skip ci] ([013a196](013a196)), closes [#861](#861) [#861](#861)
* **release:** 1.35.0-beta.1 [skip ci] ([c5630ce](c5630ce)), closes [#865](#865) [#831](#831)
* **release:** 1.35.0-beta.2 [skip ci] ([f21c586](f21c586))
* **release:** 1.35.0-beta.3 [skip ci] ([cb54d5b](cb54d5b))
* **release:** 1.35.0-beta.4 [skip ci] ([6e375f5](6e375f5)), closes [#810](#810) [#856](#856) [#853](#853)
I'm trying to use the new DepthSearchGraph and it appears to be misfiring by trying to follow email links in `href`s. It should filter out non-web links such as:

* `mailto:`
* `tel:`

See other schemes that should probably be filtered out: https://www.w3.org/wiki/UriSchemes
https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/96064f20ee8a849a2548f293419cf9028386c47b/scrapegraphai/nodes/fetch_node_level_k.py#L155C16-L158
EDIT: It also goes boom when navigating `javascript:` links.
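
For illustration, here is a minimal sketch of the kind of scheme filter being requested. It is not the project's actual fix (the changelog only says "disallow mailto:"); the function name `filter_web_links`, the `ALLOWED_SCHEMES` set, and the example URLs are all hypothetical. It uses an allowlist rather than a blocklist, on the assumption that only `http` and `https` links are worth crawling, so `mailto:`, `tel:`, `javascript:` and any other scheme are dropped without being listed individually.

```python
from urllib.parse import urljoin, urlparse

# Assumption: only these schemes should ever be followed by the crawler.
ALLOWED_SCHEMES = {"http", "https"}


def filter_web_links(hrefs, base_url):
    """Resolve each href against base_url and keep only http(s) links.

    Hypothetical helper, not part of Scrapegraph-ai.
    """
    kept = []
    for href in hrefs:
        href = href.strip()
        if not href:
            continue
        # urljoin leaves hrefs with their own scheme (mailto:, tel:, ...)
        # untouched and resolves relative paths against the base URL.
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme in ALLOWED_SCHEMES:
            kept.append(absolute)
    return kept


if __name__ == "__main__":
    found = [
        "/about",
        "mailto:someone@example.com",
        "tel:+15551234",
        "javascript:void(0)",
        "https://other.example/page",
    ]
    for link in filter_web_links(found, "https://example.com/docs/"):
        print(link)
```

An allowlist keeps the check short and avoids having to track every scheme in the W3C list linked above.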