COR-785: 'Agnostic' Webpage URLs #504

Merged
merged 5 commits into from
Jul 6, 2017

toastercup
Member

Purpose:

This PR implements a strategy to efficiently find a matching URL for a Webpage that is agnostic of not only its scheme (which was already implemented), but also its parameters (?) and anchors (#). Originally, a regex approach via Postgres was attempted, but it proved too convoluted and complex. Instead, we've settled on a compromise: a fast scope that returns best-guess matches for the URL in question, then a slower class method that refines those guesses and provides an exact match. A previous PR featured a similar implementation, but lacked the best-guess scope - this meant it had to load and loop through every Webpage in the system until it found an exact match.

To better explain why we can't just use the best-guess scope, consider the following three URLs:

  • https://hiring.careerbuilder.com/contact-us
  • https://hiring.careerbuilder.com/contact-us-today
  • https://hiring.careerbuilder.com/contact-me

If the search term were hiring.careerbuilder.com/contact-us, the scope would properly exclude contact-me. However, because the scope matches any characters on either side of the search term, both contact-us and contact-us-today are returned in the record set. This is why the class method exists - to refine these results down to the exact match.
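
For reviewers who want a concrete picture, here is a minimal sketch of the two-stage idea in plain Ruby. It is not the code in this PR: the method names (agnostic_url, best_guess_matches, exact_match) are hypothetical, and an in-memory array with String#include? stands in for the Webpage table and the substring-matching scope.

```ruby
# Illustrative only: hypothetical helper names, plain arrays instead of ActiveRecord.

# Strip the scheme ("https://"), plus any ?params and #anchors.
def agnostic_url(url)
  url.sub(%r{\A[a-z][a-z0-9+.-]*://}i, '').sub(/[?#].*\z/m, '')
end

# Stage 1 (fast, imprecise): keep every URL that contains the search term
# anywhere. This stands in for a substring-matching database scope.
def best_guess_matches(urls, search_url)
  term = agnostic_url(search_url)
  urls.select { |candidate| candidate.include?(term) }
end

# Stage 2 (slower, precise): normalize each best guess and require equality.
def exact_match(urls, search_url)
  term = agnostic_url(search_url)
  best_guess_matches(urls, search_url).find do |candidate|
    agnostic_url(candidate) == term
  end
end

urls = [
  'https://hiring.careerbuilder.com/contact-us',
  'https://hiring.careerbuilder.com/contact-us-today',
  'https://hiring.careerbuilder.com/contact-me'
]

search = 'http://hiring.careerbuilder.com/contact-us?source=nav#form'
puts best_guess_matches(urls, search).inspect # contact-us and contact-us-today
puts exact_match(urls, search)                # only contact-us
```

The point of the split is that the expensive per-record comparison only runs over the handful of rows the cheap filter lets through, rather than over every Webpage.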

JIRA:

https://cb-content-enablement.atlassian.net/browse/COR-785

Steps to Take On Prod

N/A

Changes:

  • Changes to setup

    • N/A
  • Architectural changes

    • N/A
  • Migrations

    • N/A
  • Library changes

    • N/A
  • Side effects

    • N/A

Screenshots

  • Before
    N/A

  • After
    N/A

QA Links:

How to Verify These Changes

  • Specific pages to visit

    • N/A - See above
  • Steps to take

    • In Cortex, try adding a few Webpages that mimic the scenario mentioned above. Ensure snippets are loaded properly, even for URLs with ?params and #anchors.
  • Responsive considerations

    • N/A

Relevant PRs/Dependencies:

N/A

Additional Information

N/A

@toastercup
Member Author

@MKwenhua this needs a review

@toastercup toastercup merged commit c721728 into develop Jul 6, 2017
@toastercup toastercup deleted the bugfix/COR-785-Webpage-Feed-Not-Agnostic branch July 6, 2017 17:58