Doesn't quite do it, so I tinkered with it myself #57

Open
Catman-232 opened this issue Sep 3, 2024 · 5 comments

Comments

@Catman-232

Catman-232 commented Sep 3, 2024

I don't think it necessarily matters, but domains can be written with a wildcard as an entity, so you can use google.* to catch all Google alt domains, like google.co.uk for me, though the filter was still working anyway.

On Google it leaves a blank box where the result was, and it seems that's because :upward("div") is only getting the div elements directly above what look to be two separate a elements, but the actual result box itself is one more level above that. You can use :upward(2) to grab that. Using :remove() instead of :style(opacity:0.00!important;) will delete that element from the page instantly, and Google handles that very well since it always tries to fill the width with any result div elements that are present. In the case of your example search "anime girl artwork", it does not break the page loading more results when scrolling down. I don't use the other two sites so I don't know if this method would work for them.
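To illustrate the difference between :upward("div") and :upward(2) described above, here is a minimal Python sketch on a toy tree. The layout (result box > wrapper div > a, two links per result) is an assumption taken from the description, not Google's actual markup, and upward_n / upward_tag are hypothetical helpers that only approximate how the filter operators behave.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Toy stand-in for a DOM element (hypothetical, not Google's real markup)."""
    tag: str
    name: str
    parent: Optional["Node"] = None

    def child(self, tag: str, name: str) -> "Node":
        """Create a child element attached to this one."""
        return Node(tag, name, parent=self)


def upward_n(node: Node, n: int) -> Optional[Node]:
    """Approximate :upward(n): step to the n-th ancestor, or None if there is none."""
    for _ in range(n):
        if node.parent is None:
            return None
        node = node.parent
    return node


def upward_tag(node: Node, tag: str) -> Optional[Node]:
    """Approximate :upward("div"): nearest ancestor with the given tag."""
    current = node.parent
    while current is not None and current.tag != tag:
        current = current.parent
    return current


# Assumed layout: result box > wrapper div > a, with two links per result.
result_box = Node("div", "result-box")
image_link = result_box.child("div", "image-wrapper").child("a", "image-link")
title_link = result_box.child("div", "title-wrapper").child("a", "title-link")

for link in (image_link, title_link):
    print(link.name, "->", upward_tag(link, "div").name, "|", upward_n(link, 2).name)
```

In this toy layout the tag-based climb stops at each link's own wrapper, while two steps up lands on the shared result box for both links, which is also why an href-based filter ends up matching twice per result.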

While tinkering I also noticed that the div element for the result box itself does contain an attribute that is the same as the href of the a elements. Targeting that would mean only matching one thing per filter, and you could remove it directly without using upward, which I believe would look like ##div[data-lpage*="garbagesite"]:remove(). It works for me and even kills a freepik AI result that was somehow showing up.

This search is a pretty good way to find even more AI sites that are not in the list yet such as "wallhaven.cc" (Not exclusively AI but allows it and appears to mostly just be a rehosting site of little worth with lacking sources) and the ironically named "anime-girl.com", hah! So right now I'm using a clone of the main list that's just every domain formatted to google.*##div[data-lpage*="trash"]:remove().
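Reformatting every domain into that filter shape can be scripted. The following is only a sketch: the function names are hypothetical, and the input here is just the two domains mentioned above rather than the real list.

```python
def google_image_filter(domain: str) -> str:
    """Format one domain as the data-lpage filter quoted above."""
    return f'google.*##div[data-lpage*="{domain}"]:remove()'


def build_filter_list(domains: list[str]) -> list[str]:
    """Skip blanks and duplicates, keeping first-seen order."""
    seen: set[str] = set()
    filters: list[str] = []
    for raw in domains:
        domain = raw.strip().lower()
        if not domain or domain in seen:
            continue
        seen.add(domain)
        filters.append(google_image_filter(domain))
    return filters


# Sample input only; a real run would read the domains from the main list.
for line in build_filter_list(["wallhaven.cc", "anime-girl.com", "wallhaven.cc"]):
    print(line)
```

Each output line can be pasted into a custom filter list as-is; the duplicate entry in the sample input is dropped.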

Feel free to tell me if and why this method isn't ideal for reasons I do not know, or use it yourself, I don't mind either way.

@PasttaTypo
Contributor

You are very smart! This is working much better for me, this should be implemented!

@laylavish
Owner

Okay, a few problems I'm seeing with your solution ##div[data-lpage*="garbagesite"]:remove():

  1. It doesn't work across DuckDuckGo, Bing, etc. (not the main issue)
  2. It works only with images, not hyperlinks.

This list is meant to block both images and hyperlink results, since I want this to be something that you can also use to help you research information that isn't just art references. google.com##a[href*="trashsite"]:upward(2):remove() seems to be the best solution (and the one I'll probably go with) since it works with both hyperlinks and images. I've actually had a list created with both upward(2) and display:none (the default behavior if you leave it blank; similar to :remove()) for a lil bit now, but held it back due to my fear that the pagination issue would happen to many people using this list, which would be incredibly annoying.

Now thinking back on it, it's really silly that I did this since this pagination issue does seem to be resolved (at least from what I can see and have tested on multiple devices), not to mention the fact that the pagination issue itself is extremely rare and would really only happen if you were actively trying to make it happen, like how I demonstrated using a heavily AI-infested search query. Sorry for the rambling, but yeah, this will definitely be implemented for sure.

@Catman-232
Author

Catman-232 commented Oct 4, 2024

Thank you for the responses, I was beginning to think nobody was going to, since I had made another issue elsewhere and a comment on another at about the same time and have still yet to get any response to those haha. Don't worry about rambling, if anything I appreciate that, as I am also inclined to ramble myself. Combined Autism and ADHD moment, if it weren't for the prescription I started a couple months ago, I never even would have bothered with any of this. I'm probably going to end up writing a wall of text here as I progressively explain while investigating at the same time.

My biggest concern with your selector approach was that it matches twice per single result, which is a bit of a multiplied workload, and since it was only making them transparent it felt like a bit of a waste to have it run at all, let alone doubled up. You could still click them too! I'm glad my suggestion wasn't useless to you in any case.

Also, when you say "block both images and hyperlink results", do you mean text results in a normal search rather than images search? I probably never noticed it removing non-image results, but I know from some tinkering last week that it can be a pain to write a simple filter for those text results. I wanted to hide Spotify and Pinterest results, but Google does not use a consistent DOM structure for text results pages, meaning the selectors have to be extra intelligent or extra specific. I gave up on that in the end because I realised it doesn't really matter quite so much. In some attempts it would filter nicely, but on a different page structure it would end up grabbing the entire results list instead due to differences in nesting.

Even in a relatively normal search the classes for the result div elements are not all the same. Under #RSO the result boxes are mostly .MjjYud, except that for some reason the first and third of those are empty as I look at it. The second child, which is the first actual result, is that same class but wrapped in another div classed .hlcw0c. The fourth child is the "People also ask" section (just inside a .MjjYud), which was getting blasted due to href matching, and probably weirdly too, since there are so many nested divs to reach the a elements in them.

Further down is another .hlcw0c, but this time it's a forums result with the extra links under the main one (this confounds me because even further down is an "answers" type result that is practically the same thing, but that one isn't in the extra div). All of the content is still contained inside the .MjjYud anyway, but there are four, maybe five, more href matches it would get here.

Right beneath that is a Videos box classed .ULSxyf, which still has the same nested div inside it as the others. It probably won't get any errant selector matches in these since it's predominantly YouTube videos, but still. If you select the forums result's .MjjYud or even the .hlcw0c above it to be removed, it somehow breaks the Videos box after it, absolutely no idea why. Hiding it with display: none at least prevents the format breakage there, but would mean no longer using remove().

The massive inconsistency I found was that if I searched "pinterest" then the results element would be really inconveniently different, the top result is .hlcw0c with a .MjjYud inside it which has the main link and the indented six others underneath it each with a desc line to them. Then instead of the next node being another of those div classes mentioned above, it's actually a not-classed div that contains the other page results divs with those class names. This means that using upward to target a div that is a child to #RSO now grabs that not-classed div instead. Alright I'm going to peel myself away from this now, sorry if this is a pain to skim through.
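The nesting problem described above can be shown with a small Python model. This is a sketch under assumptions: the class names come from the thread, but the two layouts are heavily simplified, and child_of_root and count_links are hypothetical helpers rather than anything the filter engine provides.

```python
from typing import Optional


class Node:
    """Toy element; labels mirror the thread, the nesting is simplified."""

    def __init__(self, label: str, parent: Optional["Node"] = None):
        self.label = label
        self.parent = parent
        self.children: list["Node"] = []
        if parent is not None:
            parent.children.append(self)


def child_of_root(node: Node, root: Node) -> Optional[Node]:
    """Climb from node to the ancestor that is a direct child of root."""
    while node.parent is not None:
        if node.parent is root:
            return node
        node = node.parent
    return None


def count_links(node: Node) -> int:
    """Count 'a' descendants, i.e. how many links removing node would take out."""
    return sum((c.label == "a") + count_links(c) for c in node.children)


def build_results(wrapped: bool) -> tuple[Node, Node]:
    """Return (#RSO, first link); wrapped=True adds the unclassed div layer."""
    root = Node("#RSO")
    holder = Node("div (no class)", root) if wrapped else root
    links = [Node("a", Node("div.MjjYud", holder)) for _ in range(3)]
    return root, links[0]


for wrapped in (False, True):
    root, link = build_results(wrapped)
    target = child_of_root(link, root)
    print("wrapped:", wrapped, "| target:", target.label, "| links removed:", count_links(target))
```

With the flat layout the target is a single result box; with the extra unclassed div, the same rule lands on the wrapper and would take every result inside it along.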

@laylavish
Owner

> Also, when you say "block both images and hyperlink results", do you mean text results in a normal search rather than images search?

Yes, I mean the text results in a traditional search rather than an image-based search (i.e. switching to the images tab).

> Even in a relatively normal search the classes for the result div elements are not all the same.

Yes, I've noticed the same thing. It's quite unlike DuckDuckGo, where all of the results share the class .wLL07_0Xnd1QZpzpfR4W, which makes it really easy to remove them. Google on the other hand... is just very strange. Wouldn't be surprised if this was done on purpose.

@Catman-232
Author

Catman-232 commented Oct 6, 2024

Well, I guess I only really wanted it to remove image results on Google Images, so I tuned it for that specifically. I don't really understand why they obfuscate the class names on Google's sites. They also do it on Gmail and Photos, but oddly not on YouTube. I can think of a few other sites I've poked around the source of for CSS overrides that do it. It's not like it really benefits developers or users, nor prevents any tampering by anyone.

I think filtering the text results on Google the most intelligent way would probably have to be done with a script so it can react to what the results containers look like. It is pretty impressive that you managed to devise your selector to work on all three sites at once, for both text and images. I was really struggling with condensing the selectors in my attempts for just text results on Google.

3 participants