Back when I was a stupid kid, I once did
`ln -s /dev/zero index.html`
on my home page as a joke. Browsers at the time didn't like that: they basically froze, sometimes taking the client system down with them. Later on, browsers started to check for actual content, I think, and would abort such requests.
These days, almost all browsers accept zstd and brotli, so these bombs can be even more effective. [This](https://news.ycombinator.com/item?id=23496794) old comment showed an impressive 1.2M:1 compression ratio, and [zstd seems to do even better](https://github.com/netty/netty/issues/14004).
Though, bots may not support modern compression standards. Then again, that may be a good way to block bots: every modern browser supports zstd, so just force that on non-whitelisted browser agents and you automatically confuse scrapers.
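For the curious, building one is a one-liner (a sketch, assuming GNU coreutils and the `zstd` CLI; serve the result with a `Content-Encoding: zstd` header):

```sh
# 10 GB of zeros; zstd squeezes this far below gzip's ~1000:1 ceiling
dd if=/dev/zero bs=1M count=10240 | zstd -19 -o bomb.zst
ls -lh bomb.zst
```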
> At my old employer, a bot discovered a wordpress vulnerability and inserted a malicious script into our server
I know it's slightly off topic, but it's just so amusing (edit: reassuring) to know I'm not the only one who, an hour after setting up WordPress, found a PHP shell magically deployed on my server.
I sort of did this with ssh, where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies DDoSing my poor little server. I switched to just identifying 'bad actors' who are clearly trying to do bad things and banning their IPs with firewall rules. That's becoming more challenging with IPv6, though.
Edit: And for folks who write their own web pages, you can always link to zip bombs with anchors that don't show up for humans (white text on a white background, with no highlight on hover/click). Bots download those things to have a look (so do crawlers and AI scrapers).
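A minimal sketch of such a trap link (the path and anchor text are whatever you like):

```sh
# append a link humans won't see but naive scrapers will follow
cat >> index.html <<'EOF'
<a href="/bomb.gz"
   style="color:#fff;background:#fff;text-decoration:none">archive</a>
EOF
```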
Zip bombs are fun. I discovered a vulnerability in a security product once where it wouldn’t properly scan a file for malware if the file was or contained a zip archive greater than a certain size.
The practical effect was that you could place a zip bomb inside an Office XML document, and the product would pass the OOXML file through even if it contained easily identifiable malware.
I deployed this, instead of my usual honeypot script.
It's not working very well.
In the web server log, I can see that the bots are not downloading the whole ten megabyte poison pill.
They are cutting off at various lengths. I haven't seen anything fetch more than around 1.5 MB of it so far.
Or is it working? Are they decoding it on the fly as a stream, and then crashing? E.g. if something is recorded as having read 1.5 MB, could it have decoded it to 1.5 GB in RAM, on the fly, and crashed?
There is no way to tell.
It's worth noting that this is a gzip bomb (it acts just like a normal compressed webpage), not a classical zip file that nests zips inside zips to knock out antivirus scanners.
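Building one is a one-liner (a sketch, assuming GNU coreutils and gzip); the server then sends the file verbatim with a `Content-Encoding: gzip` header, and the client inflates it on the fly:

```sh
# 10 GB of zeros compresses to roughly 10 MB with gzip -9
dd if=/dev/zero bs=1M count=10240 | gzip -9 > bomb.gz
ls -lh bomb.gz
```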
There was an incident a little while back where some Tor Project anti-censorship infrastructure was run on the same site as a blog post about zip bombs.[0] One of the zip files got crawled by Google, and added to their list of malicious domains, which broke some pretty important parts of Tor's Snowflake tool. Took a couple weeks to get it sorted out.[1]
[0] https://www.bamsoftware.com/hacks/zipbomb/
[1] https://www.bamsoftware.com/hacks/zipbomb/#safebrowsing
I protected uploads on one of my applications by creating fixed-size temporary disk partitions of like 10 MB each; unzipping into those contains the fallout if someone uploads something too big.
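A loopback image does the same job without touching the real partition table (a sketch; paths and sizes are illustrative):

```sh
# a throwaway 10 MB filesystem: unzip hits ENOSPC instead of
# filling the real disk
dd if=/dev/zero of=/tmp/quota.img bs=1M count=10
mkfs.ext4 -q /tmp/quota.img
mkdir -p /mnt/unzip-jail
mount -o loop /tmp/quota.img /mnt/unzip-jail
unzip upload.zip -d /mnt/unzip-jail
```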
I do something similar using a script I've cobbled together over the years. Once a year I'll check the 404 logs and add the most popular paths trying to exploit something (i.e. ancient phpMyAdmin vulns) to the shitlist. Requesting 3 of those URLs adds that host to a greylist that only accepts requests to a very limited set of legitimate paths.
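The yearly pass might look roughly like this (a sketch, assuming a combined-format access log and an existing ipset named `greylist`):

```sh
# most-requested 404 paths: candidates for the shitlist
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20

# greylist any host that requested 3+ shitlisted paths
grep -Ff shitlist.txt access.log | awk '{print $1}' |
  sort | uniq -c | awk '$1 >= 3 {print $2}' |
  xargs -r -n1 ipset -exist add greylist
```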
There is a similar thing for ssh servers, called endlessh (https://github.com/skeeto/endlessh). In the SSH protocol, the client must wait for the server to send back a banner when it first connects, but there is no limit on its size! So this program sends an infinite banner very ... very slowly, making the crawler or script-kiddie script hang indefinitely or just crash.
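The whole trick fits in a few lines of shell if you just want the flavor of it (assuming a traditional netcat that accepts `-l -p`):

```sh
# RFC 4253 lets a server send any number of lines before its version
# string, as long as none starts with "SSH-" (base64 output never does)
while true; do
  while sleep 10; do
    head -c 12 /dev/urandom | base64
  done | nc -l -p 2222
done
```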
Attacked Over Tor [2017]
https://www.hackerfactor.com/blog/index.php?/archives/762-At...
The same, for Caddy: https://www.dustri.org/b/serving-a-gzip-bomb-with-caddy.html
10T is probably overkill though.
As an aside, there are a lot of people out there standing up massive microservice implementations[1] for relatively small sites/apps, which need to have this part printed, wrapped around a brick, and lobbed at their heads:
> A well-optimized, lightweight setup beats expensive infrastructure. With proper caching, a $6/month server can withstand tens of thousands of hits — no need for Kubernetes.
----
[1] Though doing this in order to play/learn/practise is, of course, understandable.
isMalicious() is doing some real heavy lifting in that pseudocode. Would love to see a bit more under THAT hood.
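A plausible guess is that it's little more than a pattern list over the request path (the patterns here are illustrative):

```sh
# flag requests for paths no legitimate visitor ever asks for
is_malicious() {
  case "$1" in
    *.env*|*wp-login*|*xmlrpc.php*|*phpmyadmin*|*../*|*.git/*) return 0 ;;
    *) return 1 ;;
  esac
}

is_malicious "/wp-login.php" && echo "serve bomb.gz"
```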
I'm curious why a 10GB file of all zeroes would compress only to 10MB. I mean theoretically you could compress it to one byte. I suppose the compression happens on a stream of data instead of analyzing the whole, but I'd assume it would still do better than 10MB.
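(For what it's worth, deflate itself is the bottleneck: a match covers at most 258 bytes and still costs a couple of bits to encode, which caps the format at roughly 1032:1 regardless of input. Easy to check empirically:)

```sh
# 100 MB of zeros -> roughly 100 KB of gzip, i.e. about 1000:1
dd if=/dev/zero bs=1M count=100 | gzip -9 | wc -c
```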
Is there a list of popular attack-vector URLs located somewhere? I want to just auto-ban anyone sniffing for .env or ../../../../ etc. I'd rather not write it myself.
As I don't use PHP on my server but get a lot of requests for various PHP-related stuff, I added a rule to serve a Linux kernel, encrypted with a "passphrase" derived from /dev/urandom, as the reply to these requests. A zip bomb might be a worse reply ...
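Something along these lines, presumably (a sketch; the paths and cipher choice are just an example):

```sh
# a payload that is pure noise to the recipient: a kernel image
# encrypted with a throwaway passphrase nobody ever sees again
openssl enc -aes-256-ctr -pbkdf2 \
  -pass pass:"$(head -c 32 /dev/urandom | base64)" \
  -in /boot/vmlinuz -out php-reply.bin
```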
For all those AI bots "eagerly" fishing for content, I ponder whether I should set up a Markov chain to generate semi-legible text in the style of the classic https://en.wikipedia.org/wiki/Mark_V._Shaney ...
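Mark V. Shaney was just a word-pair Markov chain, which fits in a few lines of awk (a sketch; `corpus.txt` stands for whatever text you want to imitate):

```sh
awk '
# slurp the corpus into one big word list
{ for (i = 1; i <= NF; i++) words[n++] = $i }
END {
  srand()
  # index every word pair -> the words that followed it
  for (i = 0; i + 2 < n; i++) {
    key = words[i] " " words[i+1]
    seen[key]++
    succ[key, seen[key]] = words[i+2]
  }
  # babble ~100 words by random-walking the pair table
  w1 = words[0]; w2 = words[1]
  for (out = 0; out < 100 && seen[w1 " " w2] > 0; out++) {
    printf "%s ", w1
    pick = int(rand() * seen[w1 " " w2]) + 1
    w3 = succ[w1 " " w2, pick]
    w1 = w2; w2 = w3
  }
  print w2
}' corpus.txt
```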
15+ years ago I fought piracy at a company with very well-known training materials for a prestigious certification. I'd distribute zip bombs under the training materials' filenames. That was fun.
Is there any legal exposure possible?
Like, a legitimate crawler suing you and alleging that you broke something of theirs?
> Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device
Surely, the device does crash but it isn’t destroyed?
This topic comes up from time to time and I'm surprised no one yet mentioned the usual fearmongering rhetoric of zip bombs being potentially illegal.
I'm not a lawyer, but I've yet to see a real-life court case of a bot owner suing a company or an individual for responding to his malicious request with a zip bomb. The usual spiel goes like this: responding to his malicious request with a malicious response makes you a cybercriminal and allows him (the real cybercriminal) to sue you. Again, apart from cheap talk, I've never heard of a single court case like this. But I can easily imagine them trying to blackmail someone with such cheap threats.
I cannot imagine a big company like Microsoft or Apple using zip bombs, but I fail to see why zip bombs would be considered bad in any way. Anyone with an experience of dealing with malicious bots knows the frustration and the amount of time and money they steal from businesses or individuals.
> For the most part, when they do, I never hear from them again. Why? Well, that's because they crash right after ingesting the file.
I would have figured the process/server would restart, and restart with your specific URL since that was the last one not completed.
What makes the bots avoid this site in the future? Are they really smart enough to hard-code a rule to check for crashes and avoid those sites in the future?
This post is suspiciously similar to my post from 2017 "How to defend your website with ZIP bombs"
https://blog.haschek.at/2017/how-to-defend-your-website-with...
I also had the idea of a zip bomb to confuse badly behaved scrapers (and I have mentioned it before to some other people, although I did not implement it). However, maybe instead of 0x00, you might use a different byte value.
I had other ideas too, but I don't know how well some of them will work (they might depend on what bots they are).
See https://research.swtch.com/zip for how to make an infinite zip bomb: i.e., a zip file that unzips to itself, so you can keep unzipping forever without ever hitting bottom.
See also (2017) HN, https://news.ycombinator.com/item?id=14707674
I think it's a good idea, but it must be coupled with robots.txt.
I am ignorant as to how most bots work. Could you have a second line of defense for bots that avoid this bomb: Dynamically generate a file from /dev/random and trickle stream it to them, or would they just keep spawning parallel requests? They would never finish streaming it, and presumably give up at some point. The idea would be to make it more difficult for them to detect it was never going to be valid content.
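Roughly this, as a sketch (assuming a traditional netcat; it handles one client at a time, so parallel requests would indeed need something smarter):

```sh
# a never-ending response: headers, then ~1 KB of noise per second
while true; do
  {
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n'
    while sleep 1; do head -c 1024 /dev/urandom; done
  } | nc -l -p 8080
done
```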
It is surprising that it works (I haven't tried it). `Content-Length` had one goal: to ensure data integrity by comparing the response size with the header's value. I'd expect an HTTP client to deal with this out of the box, gzip or not. Is that not the case? If so, that changes everything; a lot of servers need priority updates.
The hard part is the content of the isMalicious() function. The bots can crash, but they'd be quick to restart anyway.
Do you mind sharing the specs of your DigitalOcean droplet? I'm trying to set one up at a lower cost.
If anyone is interested in writing a guide to set this up with CrowdSec or fail2ban, I'm all ears.
"On my server, I've added a middleware that checks if the current request is malicious or not"
How accurate is that middleware? Obviously there are false negatives as you supplement with other heuristics. What about false positives? Just collateral damage?
Can someone explain why mods change post titles? What value does it provide in their mind?
I guess it goes without saying that the first thing should be to follow security best practices: patch vulnerabilities fast, etc., before doing things like that. Then maybe his first website wouldn't have been compromised either.
I can't imagine using anything other than a stream interface when dealing with web requests in a crawler.
You need that to protect against not only these types of shenanigans, but also large or slow responses.
I like a similar trick, sending very large files hosted on external servers to malicious visitors using proxies. Usually those proxies charge by bandwidth, so it increases their costs.
"But when I detect that they are either trying to inject malicious attacks, or are probing for a response" how are you detecting this? mind sharing some pseudocode?
Wouldn't it be cheaper to use Cloudflare than task a human to obsessively watch webserver logs on a box lacking proper filtering?
There's a lot of creative ideas out there for banning and/or harassing bots. There's tarpits, infinite labyrinths, proof of work || regular challenges, honeypots etc.
Most of the bots I've come across are fairly dumb however, and those are pretty easy to detect & block. I usually use CrowdSec (https://www.crowdsec.net/), and with it you also get to ban the IPs that misbehave on all the other servers that use it before they come to yours. I've also tried turnstile for web pages (https://www.cloudflare.com/application-services/products/tur...) and it seems to work, though I imagine most such products would, as again most bots tend to be fairly dumb.
I'd personally hesitate to do something like serving a zip bomb since it would probably cost the bot farm(s) less than it would cost me, and just banning the IP I feel would serve me better than trying to play with it, especially if I know it's misbehaving.
Edit: Of course, the author could state that the satisfaction of seeing an IP 'go quiet' for a bit is priceless - no arguing against that
If one wanted to create the ICE of cyberpunk's cyberspace, capable of destroying the device ...
Zip libraries aren’t bomb proof yet? Seems fairly easy to detect and ignore, no?
But what about the bots written in Rust? Will that get rid of them too?
OP: Hi guys this is how I fend off hackers! Hackers: Note taken.
It'd be cool to have a proof-of-work protocol baked into HTTP: like, a header that browsers understood.
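Something like this, conceptually (the header name and difficulty are made up):

```sh
# client-side work: find a nonce so sha256(path + nonce) starts "0000";
# the server re-checks one hash before serving the page
nonce=0
until printf '/page%s' "$nonce" | sha256sum | grep -q '^0000'; do
  nonce=$((nonce + 1))
done
echo "X-PoW-Nonce: $nonce"
```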
Serving a zip bomb is pretty illegal. The bot will restart its process anyway, and carry on as if nothing happened.
OK, but where do I put this?? In the files directory???
This was a cool read. Very interesting stuff.
Mildly amusing, but it seems like this is thinking that two wrongs make a right, so let us serve malware instead of using a WAF or some other existing solution to the bot problem.
Once upon a time, around 2001 or so, I used to have a static line at home and host some stuff on my home Linux box. A Windows NT update had meant a lot of them had enabled this opportunistic encryption thing where Windows boxes would try to connect to a certain port and negotiate an S/WAN before doing TCP traffic. I was used to seeing this traffic a lot on my firewall, so no big deal. However, there was one machine in particular that was really obnoxious. It would try to connect every few seconds and would just not quit.
I tried to contact the admin of the box (yeah that’s what people used to do) and got nowhere. Eventually I sent a message saying “hey I see your machine trying to connect every few seconds on port <whatever it is>. I’m just sending a heads up that we’re starting a new service on that port and I want to make sure it doesn’t cause you any problems.”
Of course I didn’t hear back. Then I set up a server on that port that basically read from /dev/urandom, set TCP_NODELAY and a few other flags and pushed out random gibberish as fast as possible. I figured the clients of this service might not want their strings of randomness to be null-terminated so I thoughtfully removed any nulls that might otherwise naturally occur. The misconfigured NT box connected, drank 5 seconds or so worth of randomness, then disappeared. Then 5 minutes later, reappeared, connected, took its buffer overflow medicine and disappeared again. And this pattern then continued for a few weeks until the box disappeared from the internet completely.
I like to imagine that some admin was just sitting there scratching his head wondering why his NT box kept rebooting.
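(The modern equivalent of that little server fits on one line, for anyone tempted to repeat the experiment; `ncat` flags assumed:)

```sh
# endless null-free randomness, pushed as fast as the peer will take it
tr -d '\0' < /dev/urandom | ncat -l --send-only 1234
```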