Why I block bot traffic
Recently my site has become pretty popular. Especially with bots.
Even though I upgraded my hosting, my site could hardly handle hundreds of visitors every second. “Visitors” might not be the right term, because most of them were not human.
I am not talking about spam bots. I am not talking about referral spam. I am not talking about crawlers from search engines, even though they are annoying too.
The old pest
I don’t know much about the technical side of things, so I will not even go there. As far as I know, referral spam doesn’t actually hit your web pages: it might mess with your analytics, but it will not slow your site down or use up bandwidth.
You probably have a firewall installed, so comment spam bots can only hit a few pages before they are blocked. I am not saying they are nice, but – at least for me – they do not cause a resource problem.
A new bot pest
What I am talking about is a new pest: hundreds of companies that have made scraping your content their business. One bot, for example, was scraping my coaching blog for a so-called spinner site. Spinner software automates the process of stealing content.
It’s not the theft itself that worries me so much, because I feel they get instant karma. Once Google catches them, as it always does at some point, their whining will fill cyberspace.
I’m not sure it’s a great idea to copy content from non-native speakers, and I chuckled when I imagined how my grammar errors and sometimes exotic sentence structure might look after spinning.
The problem is that they scrape or crawl your sites constantly and consume a lot of resources. And there are more of them every single day. Most of them ignore your robots.txt.
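For context, this is roughly what a robots.txt restriction looks like. The directives are standard (Crawl-delay is a widely used but non-standard extension), and the bot name below is a made-up placeholder; the point is that compliance is entirely voluntary, which is why bad bots sail right past it:

```
# Ask all crawlers to slow down and stay out of /private/
User-agent: *
Crawl-delay: 10
Disallow: /private/

# Ban one specific bot entirely (the name is hypothetical)
User-agent: SomeScraperBot
Disallow: /
```

Well-behaved crawlers read this file and obey it; the scrapers this post is about simply never ask.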
They get around crawl restrictions by using hundreds of different IP addresses, and that is what makes them bad bots in my mind.
Most of them do not show which site they crawl for.
A few of the ones that do show which site they’re from got me really excited. Wow, perhaps they will spread my content to their readers? They might or they might not. I believe most enable their clients to easily “curate” your content.
Some are probably just start-ups who don’t mean ill but simply don’t know how to properly program their bots. A few could even be useful (that said, I have never seen them as referrers of human traffic).
For now, they’re blocked. All of them.
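If you want to do the same, one common approach on Apache servers is matching user-agent strings in an .htaccess file. This is only a sketch: the bot names are hypothetical placeholders rather than a vetted blocklist, it assumes mod_rewrite is enabled, and it only stops bots honest enough to send a recognizable user agent:

```apacheconf
# Return 403 Forbidden to any request whose user agent matches the list
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (SpinnerBot|ContentScraper|SomeSEOBot) [NC]
RewriteRule .* - [F,L]
```

Bots that rotate IPs and fake their user agent will still get through, which is why some people end up blocking whole hosting providers instead.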
The next category is businesses that provide services like ranking websites, counting their backlinks, or suggesting they have any idea whether your business is legit and how much money you earn.
Most of them respect your robots.txt. The only problem is the sum of crawlers. Every day a new SEO company opens its virtual doors, and every day a new “scam or not” site pops up.
Tabula rasa. I blocked them all.
I could not care less about Alexa rankings or what Ahrefs, SEM Rush & Co think about my backlinks. And the hundreds of “scam or not” sites that seem to serve only one goal: showing as many ads as possible? I don’t need them.
AmazonAWS bot spam
The biggest offender is Amazon. Only very few of these bots list which website they’re coming from. They are programmed in a way that you can’t even throttle them: several cities, hundreds of IPs, and each IP hits only a few times. All day long. Every day.
I had a discussion with Amazon that didn’t lead to a solution. Understandably, Amazon needs to see traffic reports. But I would have to hire someone to compile traffic reports for hundreds or maybe thousands of different IPs. Each IP only hits perhaps 30 times an hour or even less, but in sum they cause a performance problem.
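To see how a few hits per IP add up, you can sum requests per hour across all IPs in your access log. Here is a minimal sketch, assuming a log layout loosely like Apache’s combined format (IP first, timestamp in brackets); adjust the parsing to whatever your server actually writes:

```python
from collections import Counter

def hits_per_hour(log_lines):
    """Return {hour_bucket: (total_hits, distinct_ips)} from access-log lines."""
    totals = Counter()   # hour bucket -> total hits
    ips = {}             # hour bucket -> set of distinct client IPs
    for line in log_lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip lines that don't look like a request entry
        ip = parts[0]
        # Timestamp like [25/May/2017:14:03:11 +0000] -> bucket "25/May/2017:14"
        hour = parts[3].lstrip("[").rsplit(":", 2)[0]
        totals[hour] += 1
        ips.setdefault(hour, set()).add(ip)
    return {h: (totals[h], len(ips[h])) for h in totals}
```

Even at 30 hits an hour per IP, a thousand IPs means 30,000 requests an hour, and a summary like this makes that total visible where a per-IP view hides it.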
I have blocked AmazonAWS completely. Because I don’t know whether this is a good idea, I wouldn’t recommend it. That said, the largest part of my problems came from Amazon servers, and I have yet to see anything good come from there.
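If you want to identify AWS traffic yourself, Amazon publishes its current address ranges at https://ip-ranges.amazonaws.com/ip-ranges.json. Here is a minimal sketch of checking an address against such ranges with Python’s standard ipaddress module; the two CIDR blocks below are made-up placeholders standing in for the real, much longer list:

```python
import ipaddress

# Placeholder CIDR blocks for illustration only. In practice you would load
# the "prefixes" entries from Amazon's published ip-ranges.json instead.
AWS_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def is_aws(ip: str) -> bool:
    """Return True if the address falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in AWS_RANGES)
```

Once you can classify addresses like this, the actual blocking can happen in your firewall or server config rather than in application code.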
So far I am not aware of any problems and I doubt that there would ever be humans coming from Amazon servers.
Amazon is not only the biggest host of spambots and bad bots; it also has a serious email spam problem. I hope, and am positive, that they will soon take steps to end the abuse.
Until then: they’re not welcome on my sites.
A single bot is rarely a problem. Hundreds and hundreds of bots, especially when they scrape your content, can harm your site and destroy the experience for your human readers:
- speed problems
- the site might become unavailable
- your statistics become unusable, or you have to manually set filters to remove all the bots
- depending on your hosting, they might use up all of your resources and bandwidth
- most bots need your data for their business; you don’t need them for yours
- Google might lower your ranking if your site gets slow or unavailable
I don’t know about you, but I don’t mind helping people. I do mind paying for other people’s business success without consenting to it.
As we all know, slow or unavailable pages do not just annoy our clients and readers but also Google.
The idea of paying for extra hosting to enable someone to steal from me or run their business seems absurd.
In my opinion, everyone who notices unusual amounts of bot traffic should have a close look and take appropriate steps.
There is a free WordPress plugin for this; it’s called “stop bad bots.”