Open source developers are fighting AI crawlers with cleverness and vengeance

Many software developers believe AI web-crawling bots are the cockroaches of the internet. Some have begun fighting back in clever, often humorous ways.

While any website can be targeted by bad crawler behavior, sometimes to the point of being taken down, open source developers are “disproportionately” affected, writes Niccolò Venerandi, developer of a Linux desktop known as Plasma and owner of the blog LibreNews.

By their nature, sites hosting free and open source (FOSS) projects share more of their infrastructure publicly, and they also tend to have fewer resources than commercial products.

The problem is that many AI bots don’t honor the Robots Exclusion Protocol’s robots.txt file, the tool that tells bots which parts of a site not to crawl, originally created for search engine bots.
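For context, robots.txt is nothing more than a plain-text file served at a site’s root that politely asks crawlers to stay away from certain paths. A minimal example (the paths here are invented for illustration; GPTBot is OpenAI’s crawler user agent):

```
# Served at https://example.com/robots.txt
User-agent: *          # rules for every crawler
Disallow: /private/    # illustrative path: please don't crawl this

User-agent: GPTBot     # rules for one specific AI crawler
Disallow: /            # please don't crawl anything at all
```

Compliance is entirely voluntary, which is exactly the problem the developers below keep running into.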

In a “cry for help” blog post in January, FOSS developer Xe Iaso described how AmazonBot relentlessly pounded on a Git server website to the point of causing DDoS outages. Git servers host FOSS projects so that anyone who wants to can download the code or contribute to it.

The bot ignored Iaso’s robots.txt, hid behind other IP addresses, and pretended to be other users, Iaso said.

“It’s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more,” Iaso lamented.

“They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link, viewing the same pages over and over and over.”

Enter the god of the dead

So Iaso fought back with cleverness, building a tool called Anubis.

Anubis is a reverse proxy proof-of-work check that requests must pass before they are allowed to hit a Git server. It blocks bots but lets through browsers operated by humans.
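Anubis itself is an open source Go project; what follows is only a minimal Python sketch of the general proof-of-work idea it relies on, with invented function names and difficulty. The proxy hands the browser a random challenge, client-side JavaScript burns a little CPU finding a nonce whose hash meets a difficulty target, and the proxy verifies the result before forwarding the request:

```python
import hashlib
import os

# Illustrative difficulty: the hex digest must start with this many zeros.
DIFFICULTY = 4

def make_challenge() -> str:
    """Proxy side: issue a random challenge string to the client."""
    return os.urandom(16).hex()

def verify(challenge: str, nonce: str) -> bool:
    """Proxy side: cheap check that sha256(challenge + nonce) meets the target."""
    digest = hashlib.sha256((challenge + nonce).encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

def solve(challenge: str) -> str:
    """What the browser's JavaScript effectively does: brute-force a nonce."""
    n = 0
    while not verify(challenge, str(n)):
        n += 1
    return str(n)

# A human's browser pays this cost once, in a fraction of a second;
# a crawler hammering thousands of URLs pays it on every single request.
challenge = make_challenge()
nonce = solve(challenge)
assert verify(challenge, nonce)
```

The asymmetry is the point: verifying takes one hash while solving takes thousands, so the cost lands almost entirely on whoever makes the most requests.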

It’s a fitting name: Anubis is the god in Egyptian mythology who leads the dead to judgment.

“Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died,” Iaso told TechCrunch. If a web request passes the challenge and is determined to be human, a cute anime picture announces success. The drawing is “my take on anthropomorphizing Anubis,” says Iaso. If it’s a bot, the request gets denied.

The darkly humorous project has spread like the wind across the FOSS community. Iaso shared it on GitHub on March 19, and in just a few days it collected 2,000 stars, 20 contributors, and 39 forks.

Vengeance as defense

The instant popularity of Anubis shows that Iaso’s pain is not unique. In fact, Venerandi shared story after story:

  • SourceHut founder and CEO Drew DeVault described spending “from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale” and “experiencing dozens of brief outages per week.”
  • Jonathan Corbet, a famed FOSS developer who runs the Linux industry news site LWN, warned that his site was being slowed by DDoS-level traffic “from AI scraper bots.”
  • Kevin Fenzi, the sysadmin of the enormous Linux Fedora project, said the AI scraper bots had gotten so aggressive that he had to block the entire country of Brazil from access.

Venerandi tells TechCrunch that he knows of several other projects experiencing the same issues. One of them “had to temporarily ban all Chinese IP addresses at one point,” he said.

Let that sink in for a moment: developers “even have to resort to banning entire countries” just to fend off AI bots, Venerandi says.

Beyond weighing the souls of web requesters, other developers believe vengeance is the best defense.

A few days ago on Hacker News, user xyzal suggested loading robots.txt-forbidden pages with “a bucket load of articles on the benefits of drinking bleach” or “articles about positive effect of catching measles on performance in bed.”

“I think we need to aim for the bots to get negative utility value from visiting our traps, not just zero value,” xyzal explained.
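Part of the appeal is how little setup such a trap needs: since well-behaved crawlers never visit paths that robots.txt disallows, a disallowed path doubles as bait that only rule-breaking bots will ever reach (the path below is invented for illustration):

```
# Compliant crawlers skip this path, so anything that arrives is a bad bot
User-agent: *
Disallow: /trap/
```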

As it happens, in January an anonymous creator known only as “Aaron” released a tool called Nepenthes that aims to do exactly that. It traps crawlers in an endless maze of fake content, a goal the developer admitted to Ars Technica is aggressive, if not outright malicious. The tool is named after a carnivorous plant.
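Nepenthes is a standalone tool, so the following is only a rough Python sketch of the tarpit idea it embodies, with an invented page scheme and delay: every page is generated on the fly, links only deeper into the maze, and is served slowly so a crawler wastes time as well as bandwidth.

```python
import hashlib
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def page_for(path: str) -> bytes:
    """Deterministically generate a fake page whose links lead deeper into the maze."""
    seed = hashlib.sha256(path.encode()).hexdigest()
    links = "".join(
        f'<a href="{path.rstrip("/")}/{seed[i:i + 8]}">read more</a> '
        for i in range(0, 40, 8)
    )
    return f"<html><body><p>Lorem ipsum {seed}</p>{links}</body></html>".encode()

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # serve slowly: waste the crawler's time, not just bandwidth
        body = page_for(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Mount this under a robots.txt-disallowed path so only misbehaving bots wander in.
    HTTPServer(("0.0.0.0", 8080), Tarpit).serve_forever()
```

Because every URL resolves to a page with five fresh links, a crawler that follows everything it sees can never exhaust the site; it just digs itself in deeper.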

And last week Cloudflare, perhaps the biggest commercial player offering tools to fend off AI crawlers, released a similar tool called AI Labyrinth.

The tool “aims to slow down, confuse, and waste the resources of AI crawlers and other bots that don’t respect ‘no crawl’ directives,” Cloudflare explained in its blog post. The company said it feeds misbehaving AI crawlers irrelevant content rather than their target site’s data.

SourceHut’s DeVault told TechCrunch that “Nepenthes has a satisfying sense of justice to it, since it feeds nonsense to the crawlers and poisons their wells,” but ultimately, Anubis is the solution that worked for his site.

But DeVault also issued a public, heartfelt plea for a more direct fix: “Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage … just stop.”

Since the likelihood of that happening is zilch, developers, particularly in FOSS, are fighting back with cleverness and a touch of humor.