Resolve Article Scraping Issues Caused by a WAF

Introduction​

When our system attempts to retrieve articles from your website, your Audioboost Publisher Manager may flag scraping failures. These are typically caused by security measures blocking our requests.

Available solutions:

  1. Whitelist our User-Agent (speakup-article) on your server/WAF
  2. Whitelist a unique GET parameter appended to our article URLs

Understanding WAF Interference​

What is a Web Application Firewall (WAF)?​

A WAF is a security layer between your web server and external traffic. It filters malicious requests (e.g., SQL injection, DDoS attacks) using predefined rules. While essential for security, WAFs can inadvertently block legitimate scrapers like ours.

How WAFs Interfere with Scraping​

Issue | Description
User-Agent Blocking | WAFs may ban unknown/unrecognized user agents
Rate Limiting | Frequent requests from a single IP address or agent trigger blocks
Signature Detection | GET parameters or headers may resemble attack patterns
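
To make the three failure modes above concrete, here is a toy Python model of how a WAF might evaluate an incoming request. It is only an illustrative sketch: the recognized agents, the rate limit, the attack pattern, and the `evaluate` function are all hypothetical and do not reflect any vendor's actual rule engine.

```python
import re
from collections import defaultdict

# Hypothetical settings for this toy model; real WAFs use vendor-specific rules.
KNOWN_AGENT_MARKERS = ("Mozilla", "Googlebot")  # agents the toy WAF recognizes
MAX_REQUESTS_PER_WINDOW = 3                     # toy rate limit per client IP
ATTACK_SIGNATURE = re.compile(r"(union\s+select|<script)", re.IGNORECASE)

request_counts = defaultdict(int)  # requests seen per IP in the current window


def evaluate(ip: str, user_agent: str, query: str) -> str:
    """Return 'allow' or the reason the toy WAF blocks the request."""
    request_counts[ip] += 1
    # User-Agent blocking: anything the WAF does not recognize is rejected.
    if not any(marker in user_agent for marker in KNOWN_AGENT_MARKERS):
        return "blocked: unrecognized user agent"
    # Rate limiting: too many requests from the same IP in one window.
    if request_counts[ip] > MAX_REQUESTS_PER_WINDOW:
        return "blocked: rate limit"
    # Signature detection: the query string resembles an attack pattern.
    if ATTACK_SIGNATURE.search(query):
        return "blocked: signature match"
    return "allow"
```

In this model a request sent with the user agent speakup-article is rejected at the first check, which is the situation the two solutions below are meant to avoid.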

Common WAF Providers​

WAF Provider | Deployment Model
Cloudflare | Cloud-based (SaaS)
Akamai Kona Site Defender | Cloud-based (SaaS)
AWS WAF | Cloud (integrated with AWS services)
Imperva | Cloud / hybrid
F5 Advanced WAF | On-premises/hardware

Solution 1: Whitelist Speakup User-Agent​

Add our user-agent speakup-article to your WAF's allowlist.

Generic Steps​

  1. Access your WAF dashboard (e.g., Cloudflare, AWS WAF)
  2. Navigate to "Security Rules" > "Allowlists" (or equivalent)
  3. Create a new rule:
    • Match type: User-Agent
    • Value: speakup-article
  4. Set the rule action to ALLOW (bypass other checks)
  5. Save and deploy changes
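
After deploying the rule, you may want a quick check from your own machine. The sketch below uses Python's standard library to send a GET request that identifies itself as speakup-article and reports the status code. The helper names are hypothetical, and the result is only indicative: our scraper's requests come from different IP addresses, so IP-based or rate-based rules may still behave differently.

```python
import urllib.error
import urllib.request

SPEAKUP_USER_AGENT = "speakup-article"  # user agent named in this guide


def build_probe(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself as the Speakup scraper."""
    return urllib.request.Request(url, headers={"User-Agent": SPEAKUP_USER_AGENT})


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status code the site answers with."""
    try:
        with urllib.request.urlopen(build_probe(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx answers (for example a 403 from the WAF) raise HTTPError.
        return error.code


# Example call (not run here); replace the URL with one of your own articles:
# probe_status("https://example.com/article")
```

A 200 response suggests the allow rule is matching; a 403 or 429 suggests the request is still being filtered.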

Cloudflare Configuration​

  1. Select the website that you want to manage
  2. In the sidebar menu, select Security > WAF and click "Create rule"

[Screenshot: Cloudflare WAF Menu]

  3. Create a new rule with these settings:
    • Field: "User Agent"
    • Operator: "contains"
    • Value: speakup-article
    • Action: Skip
    • Mark the WAF components to skip as shown in the screenshot below:

[Screenshot: Cloudflare WAF Rule]

  4. Click "Deploy"
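
In Cloudflare's expression editor, the settings above should correspond to an expression along the lines of `(http.user_agent contains "speakup-article")`. Cloudflare's `contains` operator is a case-sensitive substring match, which the short Python sketch below mimics. The sample User-Agent strings are hypothetical and serve only to show which values the rule would and would not match.

```python
def user_agent_contains(user_agent: str, needle: str = "speakup-article") -> bool:
    """Mimic a case-sensitive 'contains' operator on the User-Agent header."""
    return needle in user_agent


# Hypothetical User-Agent strings, used only to illustrate the matching.
samples = [
    "speakup-article",
    "Mozilla/5.0 (compatible; speakup-article)",
    "Speakup-Article",
    "curl/8.5.0",
]

for ua in samples:
    outcome = "rule matches" if user_agent_contains(ua) else "rule does not match"
    print(f"{ua!r}: {outcome}")
```

Because the match is case-sensitive, enter the value in lowercase exactly as written in this guide.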

Provider-Specific Documentation​


Solution 2: Whitelist GET Parameter​

Whitelist the unique GET parameter that we append to article URLs, for example:

https://example.com/article?scraper_token=speakup_article

Generic Steps​

  1. In your WAF dashboard, locate "Allowlisted Parameters" (or "Ignore Rules")
  2. Add scraper_token to the allowlist
  3. Configure the rule:
    • Applies to path: /* (all articles)
    • Action: BYPASS or IGNORE
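
To test the rule yourself, it helps to build URLs in the same shape as the example above. The following Python sketch appends scraper_token=speakup_article to an article URL while keeping any existing query parameters and fragment. The function name is hypothetical, and this illustrates the URL shape only; it is not a description of how our scraper builds its requests internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_scraper_token(url: str, key: str = "scraper_token",
                       value: str = "speakup_article") -> str:
    """Return the URL with the scraper token set, keeping other parameters."""
    parts = urlsplit(url)
    # Drop any existing copy of the parameter so it is not duplicated.
    params = [(k, v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != key]
    params.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_scraper_token("https://example.com/article"))
print(with_scraper_token("https://example.com/article?page=2#top"))
```

Requesting the resulting URL before and after the rule is deployed gives a simple way to see whether the parameter is being allowed.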

AWS WAF Example​

{
  "Name": "AllowSpeakupScraperToken",
  "Priority": 1,
  "Statement": {
    "ByteMatchStatement": {
      "SearchString": "speakup_article",
      "FieldToMatch": {
        "SingleQueryArgument": { "Name": "scraper_token" }
      },
      "TextTransformations": [
        { "Priority": 0, "Type": "NONE" }
      ],
      "PositionalConstraint": "EXACTLY"
    }
  },
  "Action": { "Allow": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AllowSpeakupScraperToken"
  }
}

Verification​

Once you confirm that the whitelisting is in place, our team will perform a new scraping attempt within 24 business hours.


Need Help?​

For issues, contact us at support@audioboost.com with:

  • Your WAF provider name
  • Example article URLs
  • Blocked request logs (if available)
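
If you have access to your server's access logs, a small script can help collect the blocked requests to send us. The Python sketch below assumes the Apache/nginx "combined" log format and treats 403 and 429 as the status codes that indicate a block. Both are assumptions, as is the function name, so adjust the pattern and status codes to match your own server or WAF.

```python
import re

# Assumes the Apache/nginx "combined" log format; adjust for your server or WAF.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
BLOCK_STATUSES = {"403", "429"}  # assumed status codes for blocked requests


def blocked_speakup_requests(lines):
    """Yield parsed entries for blocked requests sent by the Speakup scraper."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        entry = match.groupdict()
        is_speakup = ("speakup-article" in entry["agent"]
                      or "scraper_token=" in entry["request"])
        if is_speakup and entry["status"] in BLOCK_STATUSES:
            yield entry
```

Passing an open log file to `blocked_speakup_requests` yields one dictionary per blocked request, with the client IP, timestamp, request line, and status code.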

Summary​

Whitelisting either our User-Agent or the GET parameter ensures uninterrupted article scraping. Most WAFs support these adjustments via their dashboards.