What's new

AIScrapeSafe is live: one verdict on whether you can use a web page for AI

June 22, 2026 · Launch

Today OpenControls.ai launched AIScrapeSafe. Point it at a URL and it tells you, in one machine-readable verdict, whether you can scrape that page, mine it, and train a model on it. It shows its work: the robots.txt it read, the Terms-of-Service clause it caught, the license it found, and the confidence behind the call. It's live now, with a free tier and pay-as-you-go pricing. It isn't legal advice. It's the evidence trail you've never had.

The problem

An AI data engineer starts the morning with a simple-sounding job: pull thousands of URLs into a training set. Before each one comes the same question: are we allowed to use this? Answering it means reading a robots.txt file, digging through a dense Terms of Service, and hunting for a copyright or license notice, then reconciling three signals that usually disagree. Hours later, the engineer has a guess. No proof.

The compliance analyst downstream feels it worse. She's signing off on data she can't fully vouch for. Eyeballing the terms is fast and wrong; waiting on outside counsel is right and too slow; ignoring the question is how a model ends up trained on data nobody had the right to use. That's the bill nobody budgeted for.

The solution

AIScrapeSafe turns that guess into a verdict you can defend. Ask about a URL and you get one structured answer covering eight distinct rights, from scraping to text-and-data-mining to AI training. When a signal can't be read, the answer is "unknown" — and "unknown" never gets upgraded to "yes." The strictest rule wins.
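To make the "strictest rule wins" idea concrete, here is a minimal Python sketch of how readings for a single right could be combined. It is an illustration only, not AIScrapeSafe's actual code: the Verdict type, the combine function, and the sample readings are all hypothetical, and the ordering (yes, then unknown, then no) is our assumption about what "most restrictive" means.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Hypothetical three-valued answer, ordered from least to most restrictive."""
    YES = 0
    UNKNOWN = 1
    NO = 2


def combine(readings):
    """Return the most restrictive reading; with no readings at all, stay UNKNOWN."""
    return max(readings, default=Verdict.UNKNOWN)


# Illustrative readings for three of the rights named above (values are made up).
readings_by_right = {
    "scraping": [Verdict.YES, Verdict.YES],
    "text_and_data_mining": [Verdict.YES, Verdict.UNKNOWN],
    "ai_training": [Verdict.YES, Verdict.UNKNOWN, Verdict.NO],
}

verdict = {right: combine(r).name for right, r in readings_by_right.items()}
print(verdict)
```

Because UNKNOWN ranks above YES, a single unreadable signal is enough to stop a right from being reported as allowed, and a single NO overrides everything else.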

You can reach it however you work:

A REST API, URL-keyed, callable from any HTTP client.
A native MCP server, so Claude and Claude Cowork can ask for a verdict inside their own workflow, with no glue code.
A web dashboard with your checked-domains history, one-click re-checks, and a batch checker that takes a URL list and hands back a CSV.

Under the hood it reads seven independent signals for a domain: robots.txt (including AI agents like GPTBot), Terms of Service, copyright and Creative Commons license, technical barriers, API and access terms, a self-published machine-readable license, and privacy. When they conflict, the most restrictive signal wins — and the verdict carries the evidence and the confidence score that got it there.
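For a feel of what one of those signals looks like, the sketch below reads a robots.txt file for a specific AI agent using Python's standard urllib.robotparser module. It is an assumption-laden illustration, not a description of how AIScrapeSafe parses anything: the robots_signal function, the sample file, and the example.com URL are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: blocks GPTBot everywhere, allows every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def robots_signal(robots_txt, agent, url):
    """Hypothetical helper: map a robots.txt body to 'allowed', 'denied', or 'unknown'."""
    if robots_txt is None:
        # The file could not be read, so the signal stays unknown rather than allowed.
        return "unknown"
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return "allowed" if parser.can_fetch(agent, url) else "denied"


page = "https://example.com/articles/1"
print(robots_signal(ROBOTS_TXT, "GPTBot", page))
print(robots_signal(ROBOTS_TXT, "SomeOtherBot", page))
print(robots_signal(None, "GPTBot", page))
```

A reading like this would be only one input among the seven. On its own it says nothing about the Terms of Service or the license.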

Here's the part that matters: every verdict gets a stable, public ID that anyone can look up and re-check later — an auditor, a partner, or a regulator. The one-time answer isn't the product. The registered, re-verifiable record is. That record is the Registered License Record.

The schema underneath it, the OpenControls.ai usage-license vocabulary, is open. Publishers and other tools can adopt it, so rights get declared and read in one machine-readable form instead of buried in a fifty-page legal document.

From the founder

"I've watched teams scrape first and find out later, and 'later' is the worst possible time to learn you didn't have the right. We built AIScrapeSafe so the check happens before the data goes in, and so you can prove you ran it. The verdict is fast. The record is the part you'll still have when someone asks, a year from now, how you cleared your data." — Dorian Cougias, Founder & CEO, OpenControls.ai

Try it

AIScrapeSafe is live. Start free, run real-time checks right inside your pipeline, and pay only for the Proof certificates you decide to keep. No contract, no sales call. Teams with higher volume or enterprise needs can reach us to talk through plans.