Add a new API endpoint for specific products #101

Closed
doamatto opened this issue Jun 19, 2021 · 4 comments · Fixed by #103
Labels
enhancement (New feature or content), product (Related to a product on PrivacySpy)


@doamatto
Collaborator

What do you want added to PrivacySpy?
The API currently has four endpoints:

  • /api/v2/index.json for the product index,
  • /api/v2/contributors.json for the contributor list,
  • /api/v2/rubric.json for the rubric; and
  • /api/v2/products.json for the full product index and their in-depth scores.

Although this is a very powerful system as is, the last endpoint, for products, is a humongous file. That makes it impractical for things like the web extension and the PrivacySpy scanner/bot, which would have to wait for a large database to download all at once.

I'm proposing that we add a new endpoint, /api/v#/products/slug.json. This would give developers smaller files to access, resulting in far less traffic, smaller caching needs and, most importantly, much faster operations in the extension and in the bot.
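To make the proposal concrete, here is a minimal Python sketch of a build step that splits the combined product file into one file per product, plus a helper that builds the per-product URL. It assumes, hypothetically, that products.json is a JSON array of objects that each carry a unique "slug" field; the real schema and the project's Handlebars-based build may differ, and the names `split_products`, `product_url` and `SLUG_RE` are made up for illustration.

```python
import json
import re
from pathlib import Path

# Hypothetical slug format: lowercase letters, digits and hyphens. Rejecting
# anything else also keeps a bad slug from escaping the output directory.
SLUG_RE = re.compile(r"[a-z0-9-]+")


def product_url(base: str, version: int, slug: str) -> str:
    """Build the URL of a per-product file: <base>/api/v<version>/products/<slug>.json."""
    return f"{base.rstrip('/')}/api/v{version}/products/{slug}.json"


def split_products(products_json: Path, api_dir: Path) -> list:
    """Write one <slug>.json per product found in the combined products file.

    Assumes (hypothetically) a JSON array of objects with a unique "slug".
    Returns the paths written, in input order.
    """
    products = json.loads(products_json.read_text(encoding="utf-8"))
    target = api_dir / "products"
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for product in products:
        slug = product["slug"]
        if not SLUG_RE.fullmatch(slug):
            raise ValueError(f"unexpected slug: {slug!r}")
        path = target / f"{slug}.json"
        path.write_text(json.dumps(product), encoding="utf-8")
        written.append(path)
    return written
```

A client such as the bot would then request only `product_url(base, 2, slug)` instead of the whole products.json.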

Have you considered implementing this addition yourself and submitting a pull request?
I could try to, but Handlebars isn't my forte.

Additional context
I'm currently working on the bot and plan to open-source it when it's close to done. I'll work around the missing endpoint for the time being, but, as mentioned above, this new endpoint will make things much faster.

cc @milesmcc @ibarakaiev

doamatto added the enhancement label (New feature or content) Jun 19, 2021
privacyspy-bot (bot) added the product label (Related to a product on PrivacySpy) Jun 19, 2021
@privacyspy-bot

Thanks for submitting this issue. @ibarakaiev has been assigned to determine next steps.

To learn about the PrivacySpy contribution process, check out the contribution guide.

@doamatto
Collaborator Author

To expand, this new endpoint would essentially serve the same data as the existing products endpoint, but:

  • limited to the relevant product; and
  • without the questions and all their possible values (that seems excessive, imo, for anything but the rubric endpoint).
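A rough sketch of the trimming described above, assuming a hypothetical record shape in which each entry of a product's "rubric" list embeds the full question (a dict with a "slug", its text and every possible option). Only the question's slug is kept, so a client that needs the full wording could look it up in the rubric endpoint. The function name `slim_product` is illustrative.

```python
import copy


def slim_product(product: dict) -> dict:
    """Return a copy of a product record without full rubric question bodies.

    Hypothetical shape: product["rubric"] is a list of entries whose
    "question" is a dict holding a "slug" plus the text and all options.
    The product's own answer/score fields are left untouched.
    """
    slim = copy.deepcopy(product)  # never mutate the caller's data
    for entry in slim.get("rubric", []):
        question = entry.get("question")
        if isinstance(question, dict):
            # Replace the embedded question with a reference to it.
            entry["question"] = question.get("slug")
    return slim
```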

@ibarakaiev
Collaborator

making it impractical for things like the web extension

Actually, that's why the endpoint was designed as is: to allow the extension to fetch the entire database and check each website against it locally. I agree that a per-product API would be good to have for the bot, though, so I will work on that shortly.
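For the local check the extension does, one possible approach is to index the downloaded database by hostname and walk up parent domains. This Python sketch assumes a hypothetical "hostnames" list on each product; the actual extension may store and match sites differently, and `build_host_index` and `lookup` are illustrative names.

```python
from typing import Dict, List, Optional


def build_host_index(products: List[dict]) -> Dict[str, str]:
    """Map each hostname to the slug of the product it belongs to.

    Assumes (hypothetically) that every product lists its domains under
    a "hostnames" key.
    """
    index: Dict[str, str] = {}
    for product in products:
        for host in product.get("hostnames", []):
            index[host.lower()] = product["slug"]
    return index


def lookup(index: Dict[str, str], hostname: str) -> Optional[str]:
    """Find the product for a hostname, falling back to parent domains.

    "www.example.com" is tried first, then "example.com"; the bare
    top-level label ("com") is never tried.
    """
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in index:
            return index[candidate]
    return None
```

Because the whole index lives in memory, each page visit costs a few dictionary lookups and no network request, which is the trade-off behind shipping the full database to the extension.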

On an unrelated note (this is probably on your mind, but I'll say it regardless, just in case): a lot of websites are SPAs, so before fetching their contents, the bot would need to actually render the website as well. Also, the citations in PrivacySpy might contain things like [...] to indicate that there is additional irrelevant text in the original policy that is omitted in the citation. That means that the algorithm that checks citations should be smart enough to account for this when comparing. These were the issues that kept Miles and me from starting to work on this, but it would be super awesome if you can do it!
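One way to make citation checking tolerate [...] is to split the citation at each elision marker and require the remaining fragments to appear in the policy text in order. A minimal sketch, assuming the policy has already been fetched (and rendered, where needed) as plain text; whitespace is collapsed on both sides so line breaks don't cause false mismatches. `citation_matches` is an illustrative name, and this does not handle paraphrased citations.

```python
import re

# Matches "[...]" and "[…]", with optional spaces inside the brackets.
ELISION = re.compile(r"\[\s*(?:\.\.\.|…)\s*\]")


def _collapse(text: str) -> str:
    """Collapse every run of whitespace (including newlines) to one space."""
    return " ".join(text.split())


def citation_matches(citation: str, policy_text: str) -> bool:
    """True if every non-elided fragment of the citation occurs, in order."""
    haystack = _collapse(policy_text)
    fragments = [_collapse(f) for f in ELISION.split(citation)]
    fragments = [f for f in fragments if f]
    if not fragments:
        return False  # a citation that is only "[...]" proves nothing
    pos = 0
    for fragment in fragments:
        found = haystack.find(fragment, pos)
        if found == -1:
            return False
        pos = found + len(fragment)  # next fragment must come after this one
    return True
```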

@doamatto
Collaborator Author

a lot of websites are SPAs, so before fetching their contents, the bot would need to actually render the website as well.

This is something that crossed my mind. Right now, the bot scans the raw HTML response for the data; nothing too insane. It'll need tuning to make sure there aren't false positives (on the insanely rare chance they occur), among a few other tweaks to make it work just right. AFAIK, most SPAs and JS-rendered pages ship some sort of HTML boilerplate as well, at the very least to give screen readers easy access. I could be wrong, but I'll still look into it.
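A hedged sketch of how the bot could tell whether a raw HTML response is usable or only an SPA shell: extract the text outside script, style and template elements with Python's standard `html.parser`, and treat very little visible text as a sign the page needs rendering. The 500-character threshold and the function names are arbitrary assumptions, not something measured or taken from the bot.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes that are not inside script/style/template."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the whitespace-collapsed text a non-JS client would see."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def probably_needs_rendering(html: str, min_chars: int = 500) -> bool:
    """Heuristic: too little static text suggests a JS-rendered page."""
    return len(visible_text(html)) < min_chars
```

Pages flagged this way could be sent to a headless browser, while the rest keep the cheap raw-HTML path.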

PrivacySpy might contain things like [...] to indicate that there is additional irrelevant text in the original policy that is omitted in the citation.

This was something I somehow forgot about, but it made me realise that some of the citations are loosely paraphrased (e.g. with tenses fixed), which could cause issues since they wouldn't match the page verbatim. There are numerous variables that could really throw it for a loop (the curly « ’ » instead of the straight « ' », for instance). The latter is easy to mitigate, but the former may be above my paygrade haha.
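The quote-character problem can be handled by normalizing both the citation and the page before comparing them. A minimal sketch; the mapping covers only a few common typographic quotes and is illustrative rather than exhaustive, the names `normalize` and `same_text` are made up, and it does nothing for paraphrased citations.

```python
import unicodedata

# Typographic quotes mapped to ASCII equivalents (illustrative, not exhaustive).
_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u00ab": '"',  # left-pointing double angle quotation mark
    "\u00bb": '"',  # right-pointing double angle quotation mark
})


def normalize(text: str) -> str:
    """Canonicalize text before comparing a citation against a policy page."""
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space
    text = text.translate(_QUOTES)
    return " ".join(text.split())  # collapse runs of whitespace


def same_text(citation: str, page: str) -> bool:
    """True if the normalized citation occurs in the normalized page."""
    return normalize(citation) in normalize(page)
```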

TL;DR: The two bits you mentioned are (slowly) being added as well. There are some other issues that need mitigating that shouldn't be too hard, but paraphrased citations (I'm pretty sure at least one exists that could cause an issue) could throw a wrench in the works on the first go-around.

cc @ibarakaiev
