
How I automated bulk deletion of pending WordPress comments


I had a WordPress site that kept filling up with pending comments — mostly spam — and manually clearing the moderation queue became a chore. If you’ve spent an afternoon clicking “Trash” a few hundred times, you know the feeling: slow, repetitive, and painfully avoidable.

So I wrote a small script to delete pending comments using the WordPress REST API. It started as a quick utility and turned into something I use whenever a site gets hit by comment spam. This post explains the problem, the pragmatic decisions I made, and the code I use. If you prefer the short version: skip to the TL;DR and the code examples.


TL;DR

  • Problem: thousands of pending comments (spam) slow down maintenance.
  • Naive approach: delete sequentially via the API — slow and brittle.
  • Better approach: reuse HTTP connections, retry on transient errors, process in batches, and run deletes in parallel.
  • Result: orders of magnitude faster and far more reliable.

You can find the full source code and the repository for this project on GitHub: https://github.com/JamithNimantha/wordpress-automaton

I’ll walk you through why each technique matters and give you the code and commands to run it locally.


Why the naive way fails

If you send one HTTP DELETE per comment and open a new connection each time, you pay the TCP and TLS handshake cost over and over. That kills throughput. On top of that, WordPress (or intermediate proxies) sometimes return 5xx or 429 when you hammer the API. A script that doesn’t handle retries or backoff will stop halfway through, leaving you with a partial cleanup.
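For concreteness, here's roughly what the naive version looks like. The endpoint paths and the status=hold / force=true parameters come from the standard WordPress REST API; the site URL and credentials are placeholders, and the loop itself is a sketch rather than the repo's code:

import requests

# Naive: a fresh connection per request, no retries, fully sequential.
BASE = "https://your-site.com/wp-json/wp/v2/comments"
AUTH = ("admin", "your-password")

while True:
    # status=hold selects pending comments (requires authentication)
    resp = requests.get(BASE, params={"status": "hold", "per_page": 100}, auth=AUTH)
    comments = resp.json()
    if not comments:
        break
    for c in comments:
        # force=true skips Trash and deletes permanently
        requests.delete(f"{BASE}/{c['id']}", params={"force": "true"}, auth=AUTH)

Every one of those calls sets up and tears down its own connection, which is exactly the cost the rest of this post removes.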

We want a solution that’s:

  • fast (maximize throughput for I/O-heavy work),
  • resilient (retries & backoff), and
  • safe (clear summary and optional dry-run).

What I built (high level)

The script does five simple things:

  1. Uses a single requests.Session configured with connection pooling. That reuses TCP/TLS connections so each DELETE is much cheaper.
  2. Configures retries and exponential backoff for transient HTTP errors (429, 5xx).
  3. Fetches comments in pages and processes them in batches.
  4. Uses a thread pool to perform deletes concurrently (I/O-bound, so threads are fine).
  5. Shows a tqdm progress bar and prints a summary (successes + failures).

Everything is in comment-delete.py in the repo root. You only need Python 3 and the packages in requirements.txt.
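In outline, the main loop looks something like this. fetch_pending_page and delete_batch are stand-ins for the repo's actual helpers (the names are mine), while the status=hold filter and the per_page cap of 100 come straight from the WordPress REST API:

def fetch_pending_page(session, base_url, auth, per_page=100):
    # status=hold selects pending comments and requires authentication
    resp = session.get(
        f"{base_url}/wp-json/wp/v2/comments",
        params={"status": "hold", "per_page": per_page},
        auth=auth,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

page_num = 1
while True:
    comments = fetch_pending_page(session, base_url, auth)
    if not comments:
        break  # nothing pending left
    delete_batch(comments, page_num)  # parallel deletes; see the excerpts below
    page_num += 1  # used only for the progress-bar label

Note that it always re-requests the first page: each round of deletes shifts the remaining comments down, so paging forward would skip some.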


Quick start (copy-paste)

  1. Create a virtualenv and install deps:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
  2. Add your site credentials to a .env file in the project root:
WP_SITE_URL=https://your-site.com
WP_ADMIN_USER=admin
WP_ADMIN_PASS=your-password
  3. Run the script:
python comment-delete.py
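
The script reads those three variables at startup. If you're adapting the code yourself, loading a .env file typically looks like this (this assumes python-dotenv, a common choice for .env files; check requirements.txt for what the repo actually uses):

import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment
site_url = os.environ["WP_SITE_URL"]
admin_user = os.environ["WP_ADMIN_USER"]
admin_pass = os.environ["WP_ADMIN_PASS"]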

Watch the progress bar. When it’s finished you’ll see a short summary with total successes and failures.

Processing page 1: 100%|██████████| 100/100 [01:10<00:00,  1.41it/s]
Processing page 2: 100%|██████████| 100/100 [01:07<00:00,  1.48it/s]
Processing page 3: 100%|██████████| 100/100 [01:35<00:00,  1.05it/s]
Processing page 4:  43%|████▎     | 43/100 [00:29<00:42,  1.35it/s]

Why these parts matter (short explanations)

requests.Session + HTTPAdapter

Reusing connections reduces latency dramatically. Instead of paying for the handshake each request, the client reuses existing connections and sends the request immediately. This alone gives a huge speedup.

Retries and backoff

Servers sometimes return 429 (Too Many Requests) or intermittent 5xx errors. Automatic retries with a backoff factor smooths over these blips so your job completes without manual intervention.

Adaptive threading

Deleting comments is an I/O-bound task. Adding threads increases throughput until the network or the server becomes the bottleneck. I use a small formula to pick a sane thread count (based on CPU and number of items) and cap it so we don’t overwhelm the server.
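
The repo has its own version of that formula; as an illustration of the idea (the numbers here are mine, not necessarily the script's):

import os

def pick_thread_count(num_items, cap=16):
    """Scale with CPU count for I/O-bound work, never spawn more
    threads than there are items, and cap the total to stay polite."""
    cpus = os.cpu_count() or 2
    return max(1, min(cpus * 4, num_items, cap))

On an 8-core machine with 100 comments per page, that lands on the cap of 16 threads, which is plenty for a typical WordPress host.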

Batching and progress bars

Fetching comments in pages and processing them in chunks keeps memory use predictable and gives you useful progress feedback.


The code (high-level excerpts)

Here are the important parts (see the full comment-delete.py in the repository for the complete script):

  • Create a session with pooling and retries:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry transient failures (429 and common 5xx) with exponential backoff
retry_strategy = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
# Keep a pool of connections alive so each request reuses TCP/TLS
adapter = HTTPAdapter(max_retries=retry_strategy, pool_connections=100, pool_maxsize=100)
session.mount('http://', adapter)
session.mount('https://', adapter)
  • Delete in parallel (thread pool):
from concurrent.futures import ThreadPoolExecutor

# thread_count and api.delete_comment are defined in the full script
with ThreadPoolExecutor(max_workers=thread_count) as executor:
    futures = [executor.submit(api.delete_comment, c['id']) for c in comments]
    for f in futures:
        result = f.result()
        # handle success/failure (e.g. count it and record the comment ID)
  • Track progress with tqdm:
from tqdm import tqdm

with tqdm(total=len(comments)) as pbar:
    for future in futures:
        future.result()
        pbar.update(1)  # advance the bar as each delete completes

If you want the full source, open comment-delete.py — it’s compact and commented.


Tuning tips (what I change when I see problems)

  • Seeing lots of 429s? Reduce concurrency and increase backoff_factor.
  • Seeing lots of 5xx? Try a slower thread profile and double-check the server logs.
  • Want to be conservative? Pass force=false so comments go to Trash instead of being permanently deleted (see the sketch after this list).
  • Want a safety net before running for real? Add a --dry-run mode that prints the matched comment IDs without deleting anything (it's on the next-steps list below).
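
The force parameter is part of the standard WordPress REST API delete endpoint. A minimal delete helper might look like this (the function name and shape are mine, not necessarily the repo's):

def delete_comment(session, base_url, auth, comment_id, force=True):
    # force=false (the API default) moves the comment to Trash;
    # force=true deletes it permanently, skipping Trash
    resp = session.delete(
        f"{base_url}/wp-json/wp/v2/comments/{comment_id}",
        params={"force": "true" if force else "false"},
        auth=auth,
        timeout=30,
    )
    return resp.ok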

Safety notes

  • Never commit your .env with credentials.
  • Prefer a WordPress application password where possible instead of your main admin password.
  • Consider testing on a staging site first.

Next steps you might want

If you’d like, I can add any of these to the repo:

  • --dry-run and --failed-output flags to record failed IDs for later retry.
  • A small log file that records each delete and its server response.
  • A safe scheduling wrapper that runs the script at low concurrency during off-peak hours.
  • Convert the script into a small CLI using argparse with useful flags.

Tell me which one you want next and I’ll implement and test it.


That’s it: a short, practical guide with runnable code.

Happy cleaning.