March 20, 2026

What Is Firehose by Ahrefs? The Free Real-Time Web Streaming API Explained

Key Takeaways

  • Firehose is Ahrefs’ real-time web data streaming API that delivers instant Server-Sent Events (SSE) the moment any monitored web page changes — powered by Ahrefs’ massive crawler index.
  • It uses Lucene query syntax for surgical precision, supporting fields like title:, domain:, added:, recent:, and ML-classified page_category.
  • Completely free during beta with no credit card required; includes a powerful REST API, tap-based management, and native AI skill integration for agents.
  • Delivery latency is sub-second, versus hours of delay in Google Alerts and minutes in paid tools like Mention.
  • Supports advanced replay (since, offset), content diffs, full markdown extraction, and up to 25 rules per organization with strict rate limits.

What Is Firehose?

Firehose transforms the entire internet into a live event stream. Instead of polling or waiting for daily digests, users define precise rules and receive push notifications via SSE whenever matching content appears or updates.

Launched by Ahrefs in early 2026, the service leverages the company’s world-class web crawler to index and detect changes in real time. Community feedback on LinkedIn and developer forums indicates rapid adoption for AI agents, trading systems, and security teams — precisely because it eliminates the infrastructure burden of custom scrapers.

At its core, Firehose answers one question: “What changed on the web right now that matters to me?”

How Firehose Works

The workflow is deliberately simple yet powerful:

  1. Create a Tap — A container for rules (via dashboard or POST /v1/taps).
  2. Define Rules — Using Lucene syntax (max 25 per organization).
  3. Connect to Stream — Open an SSE endpoint (GET /v1/stream) with your tap token.
  4. Receive Events — Instant update events containing URL, title, diff chunks, markdown, and metadata.
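
The four steps above might look roughly like this from a client's point of view. This is a minimal sketch that only builds the HTTP requests and does not send them. The base URL mirrors the stream URL used elsewhere in this article; the JSON body fields (`name`, `query`) and the choice of credential for `/v1/rules` are assumptions for illustration, not confirmed request schemas.

```python
import json
import urllib.request

BASE_URL = "https://firehose.com"  # assumed; matches the stream URL used in this article


def build_request(method, path, token, body=None):
    """Build (but do not send) a Bearer-authenticated request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Step 1: create a tap with a management key (fhm_ prefix).
# The "name" field is a hypothetical body shape.
create_tap = build_request(
    "POST", "/v1/taps", "fhm_your_management_key", {"name": "news-watch"}
)

# Step 2: add a Lucene rule. The "query" field is hypothetical, and using
# the tap token here (rather than the management key) is an assumption.
add_rule = build_request(
    "POST", "/v1/rules", "fh_your_tap_token",
    {"query": 'title:tesla AND page_category:"/News"'},
)

# Step 3: open the SSE stream with the tap token.
open_stream = build_request("GET", "/v1/stream", "fh_your_tap_token")
open_stream.add_header("Accept", "text/event-stream")

# Step 4 (receiving events) starts once the stream request is actually sent.
for req in (create_tap, add_rule, open_stream):
    print(req.get_method(), req.full_url)
```

Building the requests separately from sending them keeps the sketch runnable offline and makes it easy to swap in whichever HTTP client a project already uses.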

Events are buffered for approximately 24 hours, allowing replay with since=1h or Kafka-style offsets. Automatic reconnection uses the standard Last-Event-ID header.
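
To make the replay options concrete, here is a small sketch of how a client might build the stream URL and reconnect headers. The parameter names (`since`, `offset`, `limit`, `timeout`) and the `Last-Event-ID` header come from this article; treating `since` and `offset` as mutually exclusive is an assumption of the sketch, as are the helper names.

```python
from urllib.parse import urlencode

STREAM_URL = "https://firehose.com/v1/stream"  # URL as used elsewhere in this article


def stream_url(since=None, offset=None, limit=None, timeout=None):
    """Build the stream URL from the replay parameters named in the article."""
    if since is not None and offset is not None:
        # Assumption: the two replay modes are alternatives, so refuse both.
        raise ValueError("pass either since or offset, not both")
    params = {"since": since, "offset": offset, "limit": limit, "timeout": timeout}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{STREAM_URL}?{query}" if query else STREAM_URL


def reconnect_headers(token, last_event_id=None):
    """Headers for a (re)connect; Last-Event-ID resumes after the last event seen."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "text/event-stream"}
    if last_event_id is not None:
        headers["Last-Event-ID"] = str(last_event_id)
    return headers


print(stream_url(since="1h"))
print(stream_url(offset=1200, limit=50))
print(reconnect_headers("fh_your_tap_token", last_event_id="42"))
```

A client would typically remember the `id` of the last event it processed and pass it back on reconnect, falling back to a short `since` window when no ID is available.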

Core Features

  • Sub-Second SSE Delivery — No polling; events arrive the instant a page matches a rule.
  • Lucene ClassicQueryParser — Full Boolean logic, wildcards, phrase matching, and custom fields.
  • AI-Assisted Setup — Install the official Firehose skill in any compatible AI assistant; describe your needs in plain English and let the agent create rules.
  • Rich Event Payloads — Formatted summaries or raw data including diff (insert/delete chunks), page_category (e.g., /News), page_type, language, and full markdown.
  • Management API — Separate management keys (fhm_) and tap tokens (fh_) for secure delegation.
  • Quality & Safety Filters: quality=true (default) and nsfw=false options.

Mastering Lucene Queries

Firehose exposes indexed fields that deliver precision far beyond simple keyword alerts:

# Basic examples
added:tesla
"electric vehicle"
title:tesla AND page_category:"/News" AND language:"en"

domain:sec.gov AND title:"10-K"

# Advanced
added:"data breach" AND page_category:"/News" AND recent:24h

domain:arxiv.org AND added:"large language model"

domain:amazon.com AND title:deal AND page_type:"/Article"

Indexed fields include:

  • Text: added, removed, title
  • Keyword: domain, url, language, page_category, page_type
  • Special: recent:24h, publish_time:[2026-01-01 TO 2026-03-01]
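
Rules like the ones above are plain strings, so it can help to assemble them programmatically instead of concatenating by hand. The helpers below are a hypothetical sketch (none of these function names come from Firehose); they only produce query text in the style shown in this article and do not validate field names.

```python
def term(field, value):
    """Render field:value, quoting values that contain spaces, slashes, colons, or quotes."""
    text = str(value)
    if any(ch in text for ch in ' /:"'):
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        text = f'"{escaped}"'
    return f"{field}:{text}"


def date_range(field, start, end):
    """Render an inclusive Lucene range such as publish_time:[A TO B]."""
    return f"{field}:[{start} TO {end}]"


def all_of(*clauses):
    """Join clauses so that every one must match."""
    return " AND ".join(clauses)


def any_of(*clauses):
    """Join clauses so that at least one must match, grouped in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


breach_rule = all_of(
    term("added", "data breach"),
    term("page_category", "/News"),
    term("recent", "24h"),
)
print(breach_rule)

window_rule = all_of(
    any_of(term("domain", "sec.gov"), term("domain", "arxiv.org")),
    date_range("publish_time", "2026-01-01", "2026-03-01"),
)
print(window_rule)
```

The first rule reproduces the "data breach" example from the list above; the second shows how grouping and a date range combine.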

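Combined with Ahrefs’ ML classification, these fields let a rule target a specific kind of page (for example, English-language news on a single domain) instead of every page that mentions a keyword, which should cut false positives substantially compared with keyword-only alert tools.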

API and Integration Deep Dive

The REST API is production-grade:

Authentication uses Bearer tokens (fhm_ for management, fh_ for taps).

Key endpoints:

  • GET/POST/PUT/DELETE /v1/rules — Manage up to 25 rules.
  • GET /v1/stream — SSE with parameters timeout, since, offset, limit.

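Example Python client (using the sseclient package, which wraps requests):

import json

from sseclient import SSEClient  # pip install sseclient; it calls requests under the hood

token = "fh_your_tap_token"  # tap token (fh_ prefix), not a management key
stream = SSEClient(
    "https://firehose.com/v1/stream",
    headers={"Authorization": f"Bearer {token}"},
)

for event in stream:
    # Page changes arrive as "update" events; ignore any other event type.
    if event.event == "update":
        payload = json.loads(event.data)  # JSON with diff, markdown, etc.
        print(payload)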

Rate limits enforce stability: 60 rule requests/min and 30 stream connections/min per tap. Error codes (401, 422, 429) are clearly documented for robust retry logic.
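
A retry policy built on those status codes could look like the sketch below. It retries only on 429 with exponential backoff and treats 401 and 422 as errors that retrying will not fix; the function names, delay values, and attempt count are illustrative assumptions, and the request itself is injected as a callable so the logic can be exercised without network access.

```python
import random

RETRYABLE = {429}    # rate limited: wait, then try again
FATAL = {401, 422}   # bad credentials or rejected request: retrying will not help


def backoff_delays(base=1.0, cap=60.0, attempts=6, jitter=0.0, rng=random.random):
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        # Optional jitter spreads out clients that were throttled together.
        yield delay + jitter * delay * rng()


def call_with_retry(do_request, sleep, **backoff):
    """Call do_request() until it returns a non-429 status or retries run out."""
    for delay in backoff_delays(**backoff):
        status = do_request()
        if status in FATAL:
            raise RuntimeError(f"not retryable: HTTP {status}")
        if status not in RETRYABLE:
            return status
        sleep(delay)
    raise RuntimeError("gave up after repeated 429 responses")


# Simulated server: two rate-limit responses, then success.
responses = iter([429, 429, 200])
waits = []
status = call_with_retry(lambda: next(responses), waits.append, base=1.0)
print(status, waits)  # 200 [1.0, 2.0]
```

In real use, `sleep` would be `time.sleep` and `do_request` would perform the HTTP call and return its status code.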

Real-World Use Cases

Firehose shines across industries:

Brand & Competitive Intelligence
Rule: added:"Tesla" OR title:"Tesla Motors" → Instant alerts on robotaxi launches, filings, and Reddit discussions.

Financial Trading
Rule: title:tesla AND page_category:"/News" AND language:"en" → Power algorithms with Reuters and Bloomberg updates before markets react.

Security & Compliance
Rule: added:"data breach" AND recent:24h → Receive CISA directives and Krebs reports in seconds.

Research & Academia
Rule: domain:arxiv.org AND added:"large language model" → Stream new preprints the moment they publish.

Developer Tools
Rule: domain:github.com AND title:"release" AND added:"breaking change" → Catch dependency updates before CI fails.

Additional documented scenarios include e-commerce pricing, legal filings, job market intelligence, and custom media feeds.

Firehose vs. Traditional Monitoring Tools

Benchmarks and community tests reveal clear advantages:

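Tool            | Latency    | Precision      | API/SSE    | Pricing     | Crawler Scale
----------------|------------|----------------|------------|-------------|--------------
Firehose        | Sub-second | Lucene + ML    | Native SSE | Free beta   | Ahrefs global
Google Alerts   | Hours      | Basic keywords | None       | Free        | Limited
Mention/Brand24 | Minutes    | Medium         | Webhook    | Paid        | Smaller index
Custom Scrapers | Variable   | High effort    | Custom     | Infra costs | Self-managed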

Firehose eliminates the need for proxy rotation, anti-bot measures, and storage pipelines that plague custom solutions.

Advanced Tips and Common Pitfalls

Pro Tips:

  • Use since=5m on reconnect to catch missed events without full replay.
  • Combine AI skills with taps for dynamic rule generation based on business context.
  • Parse diff.chunks to trigger only on meaningful content changes, not boilerplate.
  • Leverage Last-Event-ID for zero-downtime browser or server reconnects.
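
As an illustration of the diff-parsing tip, the sketch below keeps only inserted chunks above a length threshold. The article confirms that events carry a `diff` with insert/delete chunks, but the exact chunk shape used here (`type` and `text` keys) and the 40-character cutoff are assumptions made for the example.

```python
import json

MIN_CHARS = 40  # arbitrary cutoff for this sketch; tune per feed


def meaningful_inserts(event_data, min_chars=MIN_CHARS):
    """Return inserted text chunks that are long enough to look like real content."""
    payload = json.loads(event_data)
    kept = []
    for chunk in payload.get("diff", {}).get("chunks", []):
        # Hypothetical chunk shape: {"type": "insert" | "delete", "text": "..."}
        text = chunk.get("text", "").strip()
        if chunk.get("type") == "insert" and len(text) >= min_chars:
            kept.append(text)
    return kept


sample = json.dumps({
    "url": "https://example.com/post",
    "diff": {"chunks": [
        {"type": "insert", "text": "Updated 3 minutes ago"},
        {"type": "delete", "text": "An older paragraph that was removed from the page entirely."},
        {"type": "insert", "text": "The company confirmed a data breach affecting customer records."},
    ]},
})

print(meaningful_inserts(sample))
```

Short inserts such as timestamps are dropped, deletions are ignored, and only the substantive new sentence survives. A length cutoff is a crude proxy for "meaningful", so a production filter would likely add keyword or pattern checks.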

Common Pitfalls to Avoid:

  • Exceeding 25 rules triggers 422 errors — consolidate with broader queries and post-filtering.
  • Ignoring rate limits on /v1/stream causes 429s; implement exponential backoff.
  • Forgetting to store management keys securely (shown only once on creation).
  • High-volume rules without quality=true can flood streams with noise.
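
The consolidation advice in the first pitfall can be sketched as follows: fold several single-phrase rules into one OR query, then decide locally which phrase each event matched. The helper names are hypothetical, and the local check is a simple case-insensitive substring match, which may not mirror how Firehose matches text fields.

```python
MAX_RULES = 25  # per-organization limit stated in this article


def consolidate(field, phrases):
    """Fold many single-phrase rules into one OR query on the same field."""
    return " OR ".join(f'{field}:"{p}"' for p in phrases)


def matched_phrases(text, phrases):
    """Post-filter: report which of the consolidated phrases appear in the text."""
    lowered = text.lower()
    return [p for p in phrases if p.lower() in lowered]


brands = ["Tesla", "Rivian", "Lucid Motors"]

rule = consolidate("added", brands)
print(rule)

# Later, when an event arrives, route it by the phrase that actually matched.
print(matched_phrases("Rivian and Tesla both cut prices this week", brands))
```

Three separate rules become one, leaving more of the 25-rule budget free, at the cost of doing the final routing in client code.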

Edge cases handled gracefully include adult-content filtering, date-range precision, and multi-tap delegation for team environments.

Conclusion

Firehose represents a fundamental shift in web monitoring: instant, precise, and infrastructure-free. By combining Ahrefs’ unmatched crawl data with modern SSE streaming and Lucene power, it enables AI agents, traders, security teams, and researchers to operate with real-time awareness previously reserved for enterprise budgets.

Ready to turn the web into your personal live data firehose? Sign up for free at firehose.com and start streaming within minutes — no credit card required.
