
Resolve Article Scraping Issues via WAF

Introduction

When our system cannot retrieve articles from your website, Audioboost Publisher Manager flags a scraping failure. These failures are usually caused by security measures that block our requests. To resolve them, you can use either of two solutions:

  • Whitelist our User-Agent (speakup-article) on your server / WAF.
  • Whitelist a unique GET parameter appended to our article URLs.

This guide explains what Web Application Firewalls (WAFs) are, how they can interfere with scraping, and the steps to whitelist our requests.

What is a Web Application Firewall (WAF)?

A WAF is a security layer between your web server and external traffic. It filters malicious traffic (e.g., SQL injection attempts, DDoS attacks) using predefined rules. While essential for security, WAFs can inadvertently block legitimate scrapers like ours.

How WAFs Interfere with Scraping

  • User-Agent Blocking: WAFs may block unrecognized user agents.
  • Rate Limiting: Frequent requests from a single IP address or user agent can trigger blocks.
  • Signature Detection: GET parameters or headers may resemble attack patterns.
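To illustrate the rate-limiting point, here is a toy fixed-window limiter keyed by client IP and User-Agent. It is a simplified sketch, not how any particular WAF implements rate limiting; the class name, the thresholds, and the IP address are hypothetical.

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple


class FixedWindowRateLimiter:
    """Toy limiter: at most max_requests per (IP, User-Agent) in each time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # (ip, user_agent) -> (window start time, requests counted in that window)
        self._windows: Dict[Tuple[str, str], Tuple[float, int]] = defaultdict(
            lambda: (float("-inf"), 0)
        )

    def allow(self, ip: str, user_agent: str, now: Optional[float] = None) -> bool:
        """Count this request and return False once the client exceeds its quota."""
        now = time.monotonic() if now is None else now
        key = (ip, user_agent)
        start, count = self._windows[key]
        if now - start >= self.window_seconds:
            # The previous window has expired, so a new one starts now.
            start, count = now, 0
        count += 1
        self._windows[key] = (start, count)
        return count <= self.max_requests


limiter = FixedWindowRateLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("203.0.113.5", "speakup-article", now=t) for t in (0, 1, 2, 3)])
# [True, True, True, False]
print(limiter.allow("203.0.113.5", "speakup-article", now=61))  # True (new window)
```

Because the counter is keyed on the User-Agent as well as the IP, a burst of article fetches from one crawler can exhaust its quota even when each individual request is harmless.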

Common WAFs on the Market

The following WAF solutions are widely used by enterprises. Whitelisting steps vary by provider.

WAF Provider | Deployment Model
Cloudflare | Cloud-based (SaaS)
Akamai Kona Site Defender | Cloud-based (SaaS)
AWS WAF | Cloud (integrated with AWS services)
Imperva | Cloud / hybrid
F5 Advanced WAF | On-premises / hardware

Solution 1: Whitelist Speakup User-Agent

Add our user-agent speakup-article to your WAF’s allowlist.

Generic Steps for Whitelisting

  1. Access your WAF dashboard (e.g., Cloudflare, AWS WAF).
  2. Navigate to "Security Rules" > "Allowlists" (or equivalent).
  3. Create a new rule:
    • Match type: User-Agent
    • Value: speakup-article
  4. Set the rule action to ALLOW (bypass other checks).
  5. Save and deploy the changes.
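The "bypass other checks" behaviour of the ALLOW action depends on rule order. WAFs such as AWS WAF evaluate rules in priority order and stop at the first rule whose action is terminating (Allow or Block). The sketch below models that with a hypothetical first-match evaluator and a hypothetical rule that blocks unrecognized user agents; it is not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical request model: header name -> value.
Request = Dict[str, str]


@dataclass
class Rule:
    name: str
    matches: Callable[[Request], bool]
    action: str  # "ALLOW" or "BLOCK"


def evaluate(rules: List[Rule], request: Request, default: str = "ALLOW") -> str:
    """Return the action of the first matching rule, or the default action."""
    for rule in rules:
        if rule.matches(request):
            return rule.action
    return default


# Hypothetical policy: only browser-like user agents are trusted.
KNOWN_AGENTS = ("Mozilla/",)

rules = [
    # The allowlist rule comes first, so it wins before any blocking rule runs.
    Rule("allow-speakup", lambda r: "speakup-article" in r.get("User-Agent", ""), "ALLOW"),
    # Hypothetical rule that blocks unrecognized user agents.
    Rule("block-unknown-ua",
         lambda r: not r.get("User-Agent", "").startswith(KNOWN_AGENTS), "BLOCK"),
]

print(evaluate(rules, {"User-Agent": "speakup-article"}))  # ALLOW
print(evaluate(rules, {"User-Agent": "curl/8.0"}))         # BLOCK
```

If the allowlist rule were placed after the blocking rule, our requests would be blocked before the allowlist is ever consulted.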

Cloudflare Guide

  1. Log in to the Cloudflare dashboard and select the website that you want to manage.
  2. In the sidebar menu, select Security > WAF and click “Create rule”.
  3. Create a new rule with this information:

    • Field: "User Agent"
    • Operator: "contains"
    • Value: speakup-article
    • Action: Skip
    • WAF components to skip: select the components that are blocking our requests.

  4. Click the “Deploy” button.
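If you manage Cloudflare through its API rather than the dashboard, the same rule can be written as a custom rule whose expression tests the http.user_agent field. The sketch below only builds the rule object. The field names follow our understanding of Cloudflare's Rulesets API and should be checked against Cloudflare's documentation; skipping only the remaining custom rules ("ruleset": "current") is an assumption you should adapt to the components you need to skip.

```python
import json


def cf_string_literal(value: str) -> str:
    """Quote a value for a Cloudflare rule expression, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_skip_rule(user_agent: str = "speakup-article") -> dict:
    """Build a custom rule that skips WAF checks for the given User-Agent substring."""
    return {
        "description": "Skip WAF checks for the Speakup article scraper",
        "expression": "(http.user_agent contains " + cf_string_literal(user_agent) + ")",
        "action": "skip",
        # Assumption: skip the remaining custom rules only. Cloudflare's skip
        # action can also target other phases and products; adjust as needed.
        "action_parameters": {"ruleset": "current"},
    }


print(json.dumps(build_skip_rule(), indent=2))
```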


Solution 2: Whitelist a Specific GET Parameter

We append a unique parameter to the article URLs we request, for example:

https://example.com/article?scraper_token=speakup_article

Whitelist the scraper_token parameter so that these requests bypass your WAF rules.
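The sketch below shows how the parameter is appended to an article URL, which can help when you build or test the rule. It uses Python's standard library and is an illustration, not our crawler's actual code; the helper name add_scraper_token is hypothetical. Existing query parameters are kept, and any earlier scraper_token value is replaced.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TOKEN_NAME = "scraper_token"
TOKEN_VALUE = "speakup_article"


def add_scraper_token(url: str) -> str:
    """Append scraper_token=speakup_article, keeping other query parameters and the fragment."""
    parts = urlsplit(url)
    # Drop any existing scraper_token so the parameter appears exactly once.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != TOKEN_NAME]
    query.append((TOKEN_NAME, TOKEN_VALUE))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_scraper_token("https://example.com/article"))
# https://example.com/article?scraper_token=speakup_article
print(add_scraper_token("https://example.com/article?id=7"))
# https://example.com/article?id=7&scraper_token=speakup_article
```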

Generic Steps

  1. In your WAF dashboard, locate "Allowlisted Parameters" (or "Ignore Rules").
  2. Add scraper_token to the allowlist.
  3. Ensure the rule:
    • Applies to the path: /* (all articles).
    • Action: BYPASS or IGNORE.
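The matching logic of such a rule can be sketched as follows. This is a hypothetical helper, not part of any WAF product, and it treats the path pattern with shell-style (fnmatch) semantics as an assumption. It requires the exact value speakup_article rather than the mere presence of scraper_token, since matching on the name alone would let any value bypass the WAF.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qs, urlsplit


def bypasses_waf(url: str, path_pattern: str = "/*",
                 name: str = "scraper_token", value: str = "speakup_article") -> bool:
    """Return True if the URL's path matches the pattern and the token has the exact value."""
    parts = urlsplit(url)
    if not fnmatchcase(parts.path or "/", path_pattern):
        return False
    # parse_qs maps each parameter name to the list of values it appears with.
    return value in parse_qs(parts.query).get(name, [])


print(bypasses_waf("https://example.com/article?scraper_token=speakup_article"))  # True
print(bypasses_waf("https://example.com/article?scraper_token=other"))            # False
```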

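Example for AWS WAF

The following AWS WAF (v2) rule allows requests whose scraper_token query parameter equals speakup_article:

{
  "Name": "AllowSpeakupScraperToken",
  "Priority": 1,
  "Statement": {
    "ByteMatchStatement": {
      "FieldToMatch": {
        "SingleQueryArgument": {
          "Name": "scraper_token"
        }
      },
      "PositionalConstraint": "EXACTLY",
      "SearchString": "speakup_article",
      "TextTransformations": [
        {
          "Priority": 0,
          "Type": "NONE"
        }
      ]
    }
  },
  "Action": {
    "Allow": {}
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AllowSpeakupScraperToken"
  }
}

Give this rule a lower Priority number than any rule that could block our requests, since AWS WAF evaluates rules in ascending priority order.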

Verification

Once you confirm that the whitelisting rule is in place, our team will retry scraping your articles within 24 business hours.
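Before confirming, you can check both methods yourself with a short script. The sketch below requests one article once with our User-Agent and once with the scraper_token parameter, and reports the HTTP status of each; a 403 often means the WAF is still blocking. The function names are hypothetical, and the parameter request keeps Python's default User-Agent, so it tests only the parameter rule.

```python
import urllib.error
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_check_requests(article_url: str) -> dict:
    """Build one request per whitelisting method for the same article."""
    parts = urlsplit(article_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("scraper_token", "speakup_article"))
    token_url = urlunsplit(parts._replace(query=urlencode(query)))
    return {
        "user-agent": urllib.request.Request(
            article_url, headers={"User-Agent": "speakup-article"}),
        "get-parameter": urllib.request.Request(token_url),
    }


def run_checks(article_url: str, timeout: float = 10.0) -> dict:
    """Fetch the article with each method and return the HTTP status codes."""
    results = {}
    for method, request in build_check_requests(article_url).items():
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[method] = response.status
        except urllib.error.HTTPError as err:
            # The server (or WAF) answered with an error status such as 403.
            results[method] = err.code
    return results
```

For example, calling run_checks with one of your own article URLs should return 200 for each method once the corresponding rule is active.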

If the problem persists, contact us at support [at] audioboost [dot] it and include the following details:

  • Your WAF provider name
  • Example article URLs
  • Blocked request logs (if available).

Conclusion

Whitelisting either our User-Agent or the GET parameter lets our requests through your WAF so that article scraping can continue without interruption. Most WAFs support these adjustments from their dashboards. For urgent issues, contact our support team.