Hacker News

Show HN: Ichido, search engine that tags sites using Google and Cloudflare

by anthonyhn on 2/26/2023, 3:12:08 PM with 17 comments

by superasn on 2/26/2023, 7:08:37 PM
I think the tags could be grouped into categories like "Extreme trackers", "Moderate trackers", etc., and clicking on a group would expand the full list.
Another really useful tag would be "Affiliate links", if there is a way to identify that a page contains affiliate links (Amazon affiliate links, etc.). Those pages are almost always crap.
Also a tag for "Modal popups": those are too often just marketing-related websites, and I'd definitely want to skip them if I knew before visiting.
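One rough way to flag the affiliate links suggested above is to scan a page's outbound links for known affiliate markers. A minimal sketch, assuming (hypothetically) that an Amazon affiliate link is an Amazon URL carrying a `tag` query parameter; the marker table and the names `is_affiliate_link` and `has_affiliate_links` are illustrative, not how Ichido works.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical marker table: domain -> query parameters that indicate
# an affiliate referral. Illustrative only, not exhaustive.
AFFILIATE_MARKERS = {
    "amazon.com": {"tag"},
}

def is_affiliate_link(url: str) -> bool:
    """Return True if the URL matches one of the assumed affiliate markers."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    params = parse_qs(parsed.query)
    for domain, keys in AFFILIATE_MARKERS.items():
        # Match the domain itself or any subdomain (e.g. www.amazon.com).
        if host == domain or host.endswith("." + domain):
            if keys.intersection(params):
                return True
    return False

def has_affiliate_links(links: list[str]) -> bool:
    """Tag a page if any of its outbound links looks like an affiliate link."""
    return any(is_affiliate_link(link) for link in links)
```

Matching on the host rather than the raw string avoids false hits such as a `tag=` parameter on an unrelated site.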
by mg on 2/26/2023, 5:10:00 PM
I run this search engine comparison tool:
https://www.gnod.com/search/
Just added Ichido.
Click on "more engines" to activate it.
by corobo on 2/27/2023, 12:31:47 AM
Search engines will do literally anything except the option "never show results from this domain again"
Is there something obvious I'm missing that makes it infeasible, or maybe is it just something only I want?
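The "never show results from this domain again" option is, at its simplest, a filter over result URLs. A minimal sketch, assuming results arrive as a list of URLs and the user's blocklist is a set of domains; `is_blocked` and `filter_blocked` are hypothetical names. Blocking a domain here also hides its subdomains.

```python
from urllib.parse import urlparse

def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)

def filter_blocked(results: list[str], blocked_domains: set[str]) -> list[str]:
    """Drop every result whose host matches the user's blocklist."""
    # Normalise entries like ".Example.com" to "example.com".
    blocked = {d.lower().lstrip(".") for d in blocked_domains}
    return [url for url in results if not is_blocked(url, blocked)]
```

Filtering after the fact can leave a page short of results, so an engine doing this would likely need to fetch extra results to compensate.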
As for this site, there are too many tags for them to be useful, imo. Give it 2 weeks of using the search engine and I bet you could hide silly fake tags in there and I'd never notice. Lots of tags = no tags.
I was picturing maybe a little pillbox-type label, like the ones Google appends to some search results.
For instance, when a result is a PDF: https://img.imgy.org/-7lq.jpg
by coolspot on 2/26/2023, 6:40:24 PM
I would prefer more logical tags like “top 1k”, “aggregator”, “user-generated content” over technical ones like “utm” and “obfuscated scripts”. I would also prefer tags grouped into expandable lists rather than all shown by default. Every site uses JavaScript; I don’t want to see it over and over again unless I specifically query for it.
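Grouping could be as simple as mapping each technical tag to a coarser category and rendering one collapsed entry per category. A minimal sketch; the category names and the tag-to-category table are hypothetical (the tag names are ones mentioned in this thread), and tags without a mapping fall into "Other".

```python
from collections import defaultdict

# Hypothetical mapping from raw tags to display groups.
TAG_GROUPS = {
    "UTM Tracking": "Trackers",
    "Obfuscated Scripts": "Scripts",
    "JavaScript": "Scripts",
    "Cloudflare": "Hosting",
    "WEBP Images": "Media",
}

def group_tags(tags: list[str]) -> dict[str, list[str]]:
    """Bucket a result's tags by group, keeping tags in input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for tag in tags:
        groups[TAG_GROUPS.get(tag, "Other")].append(tag)
    return dict(groups)

def summary(tags: list[str]) -> str:
    """Collapsed one-line summary shown before a group is expanded."""
    return ", ".join(f"{name} ({len(items)})" for name, items in group_tags(tags).items())
```

For example, `summary(["UTM Tracking", "JavaScript", "Obfuscated Scripts"])` gives `"Trackers (1), Scripts (2)"`, and the full list per group would only be shown on click.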
by jesprenj on 2/26/2023, 5:09:49 PM
Another interesting search proxy is SearX. It's written in Python, supports many backend engines, and can be self-hosted.
And here's a lightweight frontend/proxy I wrote in C for using Google search on low-end phones that can't render bloated HTML (SearX was too complicated to install):
http://searc.4a.si:7327/search?q=news
It's also nice that the HTML it produces is structured and doesn't constantly change, which makes it ideal for programmatically querying Google. You can still run into CAPTCHAs, though, which it cannot solve, if your queries look too suspicious.
by ocdtrekkie on 2/26/2023, 4:53:01 PM
This looks great, I am really glad to see things making it more obvious how pervasive malicious Google scripts are.
I find the webp flag interesting, as I don't think webp itself is inherently harmful, except for being an image spec that solely exists because Google NIHs everything and wants to write their own everything. (Long live JPEG-XL!)
I'm curious why you chose to tag it explicitly though.
by TekMol on 2/26/2023, 4:54:51 PM
In your about page, I see you are using Bing's API. I didn't even know Bing has a search API that everyone can use!
How much do you have to pay them for this?
by danuker on 2/26/2023, 4:41:15 PM
Thank you! I think any competition is welcome for search engines, with Google going down the monetization path.
A piece of feedback: When I select "Remove top ...." and click Submit, then click Next, the popularity filter is gone.
Edit: looks like the file type filter is dropped as well. Do add the arguments to the pagination links.
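The fix suggested above is to carry every active query argument into the Next/Previous links instead of only the search term and page. A minimal sketch; the parameter names `q`, `page`, `popularity`, and `filetype` and the `/search` path are hypothetical, since Ichido's actual URL scheme isn't shown in the thread.

```python
from urllib.parse import urlencode, parse_qsl

def page_link(current_query: str, page: int, base: str = "/search") -> str:
    """Build a pagination link that keeps all current filters and only changes the page."""
    # Keep every existing argument (search term, popularity, file type, ...)
    # except the old page number, which is replaced below.
    params = [(k, v) for k, v in parse_qsl(current_query) if k != "page"]
    params.append(("page", str(page)))
    return f"{base}?{urlencode(params)}"
```

For example, `page_link("q=rust&popularity=1000&filetype=pdf&page=1", 2)` returns `"/search?q=rust&popularity=1000&filetype=pdf&page=2"`, so neither filter is lost when clicking Next.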
by 1vuio0pswjnm7 on 2/26/2023, 9:02:05 PM
The pagination keeps increasing past the point where Bing stops providing results. Testing a popular search term, for which there are no doubt millions of results, I could only get new results up to page 45. Yet the website keeps incrementing the page number and result numbers as if new results were being returned.
Then I tried the same search with popularity set to 500000 and could not get even a single full page of 10 results. It's laughable to conclude from this "search" that only, say, 500004 out of the millions of websites in existence include this term. Not that I want to browse a full list, but I at least want to know how many hits I got. Then I can add more terms and try to reduce that number.
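One way to avoid the phantom pages described above is to show a Next link only when the backend returned a full page, and to number results only up to what was actually returned. A minimal sketch, assuming a fixed page size of 10 and that a short or empty page means the backend has run out; `PAGE_SIZE` and `pagination_state` are hypothetical names.

```python
PAGE_SIZE = 10  # assumed number of results per page

def pagination_state(page: int, results_on_page: int) -> dict:
    """Compute result numbering and whether a Next link should be shown.

    page is 1-based; results_on_page is how many results the backend
    actually returned for this page.
    """
    first = (page - 1) * PAGE_SIZE + 1
    last = first + results_on_page - 1  # numbering stops at real results
    return {
        "first": first if results_on_page else None,
        "last": last if results_on_page else None,
        # A short (or empty) page is taken as the end of the result set.
        "has_next": results_on_page == PAGE_SIZE,
    }
```

This heuristic can still show one extra, empty page when the total is an exact multiple of the page size; a total reported by the backend, where available, would avoid that.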
by simultsop on 2/26/2023, 5:03:16 PM
What would be the issue with being hosted on CF? I believe it is a better option than the rest of the shared hosting industry. If it's nothing critical, what's the point of tagging it?
by flas9sd on 2/26/2023, 9:52:03 PM
I see you offer an opensearch.xml already. If you reference it from a link element with the appropriate type, browsers can discover it, which makes it straightforward to add as a (default) search engine: https://developer.mozilla.org/en-US/docs/Web/OpenSearch#auto...
also: happy to give this a try, more knobs for power users
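Autodiscovery works by advertising the description file from each page's `<head>` with a `link` element that has `rel="search"` and `type="application/opensearchdescription+xml"`. A minimal sketch that renders that element; the title "Ichido" and the path `/opensearch.xml` are assumptions about how the site names and serves the file, and `opensearch_link` is a hypothetical helper.

```python
from html import escape

def opensearch_link(title: str, href: str) -> str:
    """Render the <link> element browsers use to discover an OpenSearch description."""
    # escape() also escapes quotes, so the values are safe inside attributes.
    return (
        '<link rel="search" type="application/opensearchdescription+xml" '
        f'title="{escape(title)}" href="{escape(href)}">'
    )
```

`opensearch_link("Ichido", "/opensearch.xml")` yields `<link rel="search" type="application/opensearchdescription+xml" title="Ichido" href="/opensearch.xml">`, which would go in every page's `<head>`.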
by daoudc on 2/26/2023, 10:19:04 PM
This is really cool! Please consider joining forces with us at mwmbl.org, would love to incorporate some of these ideas.
by partyguy on 2/26/2023, 5:43:14 PM
Nice project! However, when trying to search for my site (https://spacehey.com), it shows multiple tags, with most of them being false (Cloudflare, UTM Tracking, WEBP Images). I used Cloudflare at one point in the past, but don't anymore. Additionally, there has never been UTM tracking or anything like that nor WEBP images... Where do you get such data from?
Apart from that, awesome project!
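Stale tags like the Cloudflare one above can come from old data; checking a fresh response's headers avoids that. A minimal sketch, assuming a site proxied through Cloudflare returns a `CF-RAY` header and/or `Server: cloudflare`; it takes the headers as a plain mapping so any HTTP client can feed it. `looks_like_cloudflare` is a hypothetical name, not Ichido's detection method.

```python
from collections.abc import Mapping

def looks_like_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic: True if response headers suggest the site is proxied by Cloudflare."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {k.lower(): v for k, v in headers.items()}
    if "cf-ray" in normalized:
        return True
    return normalized.get("server", "").strip().lower() == "cloudflare"
```

Re-running a check like this on each crawl, rather than keeping the result indefinitely, would let the tag disappear once a site stops using Cloudflare.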
by bastawhiz on 2/26/2023, 5:10:17 PM
What's the use case for this? If I don't want Google scripts, I block them. I'll use a user agent that doesn't download or run them. If I don't want cookies, I'll instruct my browser not to save cookies. In what situation would knowing whether a site uses these things tell me whether it's a search result I want to visit?
by jacooper on 2/26/2023, 8:05:42 PM
Brave Goggles also do something similar, letting you filter search results the way you want.
by KomoD on 2/26/2023, 3:49:48 PM
Too many tags, and if a site has something, like scripts, why do you say "may"?
If a site has scripts then it's not "This site may be using Javascript", it's for sure that the site uses it...?
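The point above is that script use can be detected outright rather than guessed. A minimal sketch using the standard library's `html.parser`: it reports whether a page's HTML contains any `<script>` element, which is a definite yes/no for that markup. It only inspects `<script>` tags, so inline event handlers such as `onclick` are not counted; `has_scripts` is a hypothetical name.

```python
from html.parser import HTMLParser

class _ScriptFinder(HTMLParser):
    """Records whether any <script> start tag appears in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so "<SCRIPT>" also matches.
        if tag == "script":
            self.found = True

def has_scripts(html: str) -> bool:
    """Return True if the HTML contains at least one <script> element."""
    finder = _ScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Because the answer comes from parsing the page itself, a tag built on it could say "uses JavaScript" rather than "may be using JavaScript".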
And the popularity filter doesn't work: the results are empty, and if you try going to any of the other pages, the filter is removed.
by berry_sortoro on 2/26/2023, 5:56:25 PM
[dead]