Paul Dolphin | Applied AI in PropTech

Cool Online Data Resources for AI Agents — A Primer

The unfair advantage isn't the model — it's knowing which public data sources to query.
  • April 26, 2026 by
    ai.pauldolphin.com, human.paul

    By Paul Dolphin · April 26, 2026


    Most people think AI agents are a chat window. They're not.

    When the average operator hears "AI agent," they picture a ChatGPT tab and a person typing at it. That's not what's running on my desk tonight.

    Tonight, behind the homepage you're reading this on, a small swarm of agents is pulling from federal court filings, Census tract data, SEC EDGAR, county recorder records, and a half-dozen state portals. They're cross-referencing names, addresses, business filings, and litigation history. They're doing it on a $600 Chromebook. They're doing it for free.

    The unfair advantage isn't the model. The model is a commodity. The unfair advantage is knowing which public data sources to query and how to make agents reason across them.

    This post is the data backbone. If you run a small business, advise clients, write content, or make any decision that benefits from knowing more than the other person in the room — read on.


    The unfair advantage: free .gov APIs nobody talks about

    The U.S. government publishes more structured, queryable, free data than any private vendor on earth. Most operators don't know it exists. The ones who do are quietly winning.

    Here's the short list I actually use:

    data.gov — The federal data catalog. 300,000+ datasets across every agency. Search-first. Treat it as the index, not the destination. You find a dataset here, then you go directly to the agency API.

    Census Bureau ACS API — American Community Survey. Demographics, income, housing, education, language, commute, ancestry — at the block-group level. If you do anything geographic — real estate, retail siting, lending, local marketing — this is the resource. Free key, generous limits.

    CourtListener — A free legal research library from the Free Law Project. Federal and state case law, PACER docket data, judge profiles, oral argument audio. Searchable API. Westlaw-style intelligence that used to cost $200/seat/month.

    SEC EDGAR — Every public company filing back to the early 1990s. 10-Ks, 10-Qs, 8-Ks, proxy statements, insider transactions (Form 4), institutional holdings (13F). Full-text search covers filings from 2001 onward. The XBRL feeds are queryable as structured data, not just PDFs.

    FRED (Federal Reserve Economic Data) — 800,000+ economic time series. Mortgage rates, treasury yields, regional employment, housing starts, you name it. The API is so clean it almost feels like cheating. Used by every economist worth following.

    data.colorado.gov — Your state has one of these. Colorado's is unusually good. Business registrations, lobbyist disclosures, oil-and-gas wells, cannabis licenses, school report cards. Search "your-state.gov open data" and start digging.

    These aren't toys. The CFPB enforcement database has settled cases naming individual loan officers. The OFAC sanctions list has every blocked person and entity the U.S. Treasury watches. The FDIC publishes every bank's call report quarterly. It's all free. It's all queryable.

    Why this matters for AI agents: an agent that can call these APIs in a tight loop, reason over the results, and synthesize an answer is doing the work of a junior analyst at zero marginal cost. The first time you watch your agent pull a corporate filing, cross-reference the officers against an OFAC list, and surface the result in 12 seconds — you stop thinking of AI as a novelty.
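
    To make the "call these APIs in a tight loop" idea concrete, here is a minimal sketch of the two calls I'd wire up first: a Census ACS tract query and a FRED series lookup. The function names are mine, not part of either API, and the variable code `B19013_001E` (median household income) and series ID `MORTGAGE30US` are just examples. It assumes FRED returns observations oldest-first, which is its default sort order. Check both agencies' docs before relying on the exact parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CENSUS_BASE = "https://api.census.gov/data"
FRED_BASE = "https://api.stlouisfed.org/fred/series/observations"


def acs_tract_url(year, variables, state_fips, county_fips, api_key):
    """Build an ACS 5-year request for every tract in one county."""
    params = [
        ("get", ",".join(["NAME", *variables])),
        ("for", "tract:*"),
        ("in", f"state:{state_fips}"),
        ("in", f"county:{county_fips}"),
        ("key", api_key),
    ]
    # Keep ':', '*' and ',' unescaped; the Census API uses them as syntax.
    return f"{CENSUS_BASE}/{year}/acs/acs5?{urlencode(params, safe=':*,')}"


def acs_rows(payload):
    """The API returns a list of lists; the first row is the header."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def fred_url(series_id, api_key):
    """Build a FRED observations request that returns JSON."""
    query = urlencode({"series_id": series_id, "api_key": api_key, "file_type": "json"})
    return f"{FRED_BASE}?{query}"


def latest_observation(payload):
    """Return (date, value) for the newest non-missing point; FRED marks gaps with '.'."""
    for obs in reversed(payload["observations"]):
        if obs["value"] != ".":
            return obs["date"], float(obs["value"])
    return None


def fetch_json(url):
    """One blocking GET; a real agent would add retries and rate limiting."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

    In use, an agent would call `fetch_json(acs_tract_url(2022, ["B19013_001E"], "08", "031", key))` and hand the output of `acs_rows` to the model. Keeping URL building and parsing separate from the network call makes each piece easy to test offline.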


    For business intelligence: counterparty due diligence in 90 seconds

    Last week I was about to sign with a vendor. Nice website. Confident pitch. Reasonable price.

    I asked my agent to run a counterparty pass: name, business name, state of registration. Ninety seconds later it came back with an active personal bankruptcy in federal court, a state-level civil judgment from 2023, and the company's registered agent at a strip-mall PO box. (Names redacted; story real.)

    That deal didn't happen. The 90 seconds saved me a number with a comma in it.

    The KYC stack I run agents against:

    OpenCorporates — Global corporate registry data. 200M+ companies. Cross-jurisdictional. The API is tiered but the free tier handles most quick-look needs.

    CourtListener (PACER subset) — Federal court dockets. If someone's in active federal litigation, you'll find it here.

    OFAC Specially Designated Nationals — Treasury sanctions. Free download, refreshed daily. If a counterparty appears here, the deal is dead, full stop.

    CFPB Enforcement Actions — Every action the Consumer Financial Protection Bureau has taken. Names individuals where applicable. Critical if you're in financial services or you're hiring someone who used to be.

    State Secretary of State filings — Every state has a free business search. Officer names, registered agents, formation dates, dissolved entities. It's the Rosetta stone for "is this real."

    State court records — Most states have free or low-cost online court access for civil matters. Eviction filings, judgments, liens. Again — the boring stuff that tells you what kind of person you're dealing with.

    This is the Counterparty Due Diligence (CDD) play. A solo operator with the right data feeds is doing what a Big-4 risk team did 15 years ago. The difference is the agent runs in the background while you have your morning coffee.
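
    The sanctions step of that stack is the easiest one to sketch. Below is a rough name screen against an SDN-style list using only the standard library. Everything here is illustrative: the function names are mine, the 0.85 threshold is a guess you would tune, and the assumption that the name sits in the second CSV column should be checked against Treasury's published file layout. A fuzzy hit is a reason to look closer, not a finding.

```python
import csv
import io
import re
from difflib import SequenceMatcher


def normalize(name):
    """Uppercase, drop punctuation, sort tokens so 'DOE, John' matches 'John Doe'.

    ASCII-only on purpose; accented characters are dropped in this sketch.
    """
    tokens = re.findall(r"[A-Z0-9]+", name.upper())
    return " ".join(sorted(tokens))


def load_sdn_names(csv_text, name_column=1):
    """Pull the name column out of SDN-style CSV text (column index is an assumption)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[name_column].strip() for row in reader if len(row) > name_column]


def screen(candidate, sdn_names, threshold=0.85):
    """Return (listed_name, score) pairs at or above the threshold, best first."""
    target = normalize(candidate)
    hits = []
    for listed in sdn_names:
        score = SequenceMatcher(None, target, normalize(listed)).ratio()
        if score >= threshold:
            hits.append((listed, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

    Sorting the tokens before comparing is the design choice that matters: sanctions lists tend to store names surname-first, and counterparties rarely introduce themselves that way.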


    For content marketing: why I never pay for stock photos

    Most of the imagery on the sites I run isn't from Unsplash, Shutterstock, or Adobe Stock. It's from federal archives. It's free, public domain, and it makes the page feel real instead of catalog-stocky.

    Library of Congress — Millions of digitized photographs, prints, posters, maps. Most pre-1929 material is unambiguously public domain. The 1930s and 1940s WPA and FSA collections, produced by federal agencies, are alone worth a year of content.

    National Park Service / NPS Gallery — Stunning landscape photography, agency-produced and free. If your brand has any connection to outdoors, place, or American identity — start here.

    USGS — Every map you'll ever need. Topographic, geologic, hydrographic, satellite. Plus disaster and earth-science imagery that's hard to find elsewhere.

    NOAA — Weather, climate, oceans. Hurricane imagery, climate visualizations, marine sanctuary photos. Underused.

    Smithsonian Open Access — Launched in 2020 with about 2.8 million images released under CC0, and now past 4.5 million. Art, science, history, culture. Most operators still don't know this exists.

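    NASA — NASA imagery is generally not copyrighted and is free to use under the agency's media guidelines, with exceptions for third-party material and the NASA insignia. Earth from space, every mission, every astronaut. Nothing wakes up a tired blog post like a real Apollo photograph.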

    The agent play here is straightforward: tell your agent what the post is about, tell it which archives to search, let it return three candidates with provenance and licensing notes. You pick. Total time: under a minute. Total cost: zero.
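
    Here is roughly what "three candidates with provenance and licensing notes" can look like in code, using NASA's image library search as the example archive. Treat it as a sketch: the `ImageCandidate` class and function names are hypothetical, and the response shape (`collection`, `items`, `data`, `links`) is how I understand the NASA Image and Video Library API to return results, so confirm it against the current docs.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

NASA_SEARCH = "https://images-api.nasa.gov/search"


@dataclass
class ImageCandidate:
    title: str
    source_id: str
    preview_url: str
    provenance: str
    license_note: str


def nasa_search_url(topic):
    """Build an image-only search URL for a topic."""
    return f"{NASA_SEARCH}?{urlencode({'q': topic, 'media_type': 'image'})}"


def top_candidates(payload, limit=3):
    """Turn a search response into at most `limit` candidates that have a preview image."""
    candidates = []
    for item in payload.get("collection", {}).get("items", []):
        meta = (item.get("data") or [{}])[0]
        links = item.get("links") or []
        preview = next((link.get("href", "") for link in links if link.get("rel") == "preview"), "")
        if not preview:
            continue  # Skip results we cannot show to the human picking the image.
        candidates.append(ImageCandidate(
            title=meta.get("title", "untitled"),
            source_id=meta.get("nasa_id", ""),
            preview_url=preview,
            provenance=f"NASA Image and Video Library, center {meta.get('center', 'unknown')}",
            license_note="Check NASA media usage guidelines and any third-party credit before publishing.",
        ))
        if len(candidates) == limit:
            break
    return candidates
```

    The license note is deliberately a reminder rather than a verdict. The agent narrows the field and the human still makes the call on usage rights.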


    For local and civic intelligence

    If you do business in a place — and most of us do — the most valuable data is the most local. It's also the most overlooked.

    Colorado SOS — Business search, trademark search, lobbyist registry, campaign finance. (Substitute your own state. Most have equivalents; quality varies.)

    Colorado DOLA (Department of Local Affairs) — Demographics, housing, local-government finance. The forecasts they publish are surprisingly good and absurdly free.

    Treasury Great Colorado Payback — Unclaimed property database. I'm not joking — I've found money for clients here. A 30-second search. Other states have the same thing under different names.

    County recorder — Every recorded document on a property: deeds, mortgages, liens, releases, easements. Most counties publish these online for free or near-free. This is real estate due diligence at the source.

    County assessor — Property values, ownership history, tax history, parcel maps. Same channel.

    County clerk and election records — Voter registration (where public), local election filings, registered lobbyists at the county level.

    The pattern: an agent that knows your state's public-records architecture can build a profile of a person, a property, or a small business that would have taken a paralegal half a day. It does it in real time, and it does it without billing you in six-minute increments.
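
    A small sketch of the property side of that profile, assuming you have already pulled recorder documents and assessor rows into plain dictionaries. County systems differ wildly, so the field names here (`parcel_id`, `doc_id`, `type`, `recorded`, `releases`, `year`, `value`) are made up for illustration, as is the rule that a lien or mortgage counts as open until a release document points back at it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PropertyProfile:
    parcel_id: str
    timeline: list = field(default_factory=list)
    open_liens: list = field(default_factory=list)
    assessed_values: dict = field(default_factory=dict)


def build_profile(parcel_id, recorder_docs, assessor_rows):
    """Join recorder and assessor records for one parcel into a single profile."""
    docs = sorted(
        (d for d in recorder_docs if d["parcel_id"] == parcel_id),
        key=lambda d: d["recorded"],
    )
    # A release document names the earlier document it clears.
    released = {d.get("releases") for d in docs if d["type"] == "release"}
    open_liens = [
        d["doc_id"]
        for d in docs
        if d["type"] in {"lien", "mortgage"} and d["doc_id"] not in released
    ]
    values = {r["year"]: r["value"] for r in assessor_rows if r["parcel_id"] == parcel_id}
    timeline = [f'{d["recorded"].isoformat()} {d["type"]} {d["doc_id"]}' for d in docs]
    return PropertyProfile(parcel_id, timeline, open_liens, dict(sorted(values.items())))
```

    The open-liens list is the part worth handing to the model first. A chronological timeline is nice, but an unreleased lien is what changes a negotiation.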

    This is the "know your community" play. The operator who shows up to a deal already knowing the easements, the prior owners, the litigation history, and the assessor's value trend is the operator who closes.


    The MCP opportunity (and what doesn't exist yet)

    Quick orientation for the unfamiliar:

    MCP = Model Context Protocol. It's an emerging standard for letting AI agents talk to data sources and tools through a uniform interface. Think of it as USB-C for AI agents. Instead of writing a custom integration for every API, you point your agent at an MCP server and the protocol handles the rest.

    There are MCP servers today for GitHub, Slack, Postgres, filesystem access, and a growing list of common services. Anthropic publishes a registry. The community is moving fast.

    Here's the interesting part:

    The MCP servers that don't exist yet are the opportunity. There is no first-class MCP server for the Census ACS API. There is no MCP server for CourtListener. There is no MCP server for most state Secretary of State databases. There is no MCP server for the average county recorder.

    If you're a developer reading this and you want a high-leverage side project: pick a public dataset that matters to a specific industry, wrap it in an MCP server, publish it. You'll be the entry point for every agent in that industry. The early movers in any new protocol layer get disproportionate downstream value.

    If you're an operator reading this: keep an eye on the registries. The MCP ecosystem in twelve months will look nothing like it does today. The agents you build against it will be ten times more capable for the same model cost.
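
    For developers wondering what "wrap it in an MCP server" means on the wire: MCP messages are JSON-RPC 2.0, and tools are exposed through `tools/list` and `tools/call`. The sketch below shows only those two message shapes with a stubbed, hypothetical `acs_median_income` tool. It is not a compliant server. It skips the initialization handshake, and it reports an unknown tool as a tool error for brevity. For anything real, use an official MCP SDK rather than hand-rolling this.

```python
import json

# Hypothetical tool catalog; the name and schema are illustrative only.
TOOLS = {
    "acs_median_income": {
        "description": "Median household income for a county (stubbed here).",
        "inputSchema": {
            "type": "object",
            "properties": {
                "state_fips": {"type": "string"},
                "county_fips": {"type": "string"},
            },
            "required": ["state_fips", "county_fips"],
        },
    }
}


def run_tool(name, arguments):
    """Dispatch a tool call; raises KeyError for unknown tools or missing arguments."""
    if name == "acs_median_income":
        # Stub: a real server would call the Census API here.
        return f"median income lookup for state {arguments['state_fips']}, county {arguments['county_fips']}"
    raise KeyError(name)


def handle(request):
    """Answer one JSON-RPC request dict with a JSON-RPC response dict."""
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})
    if method == "tools/list":
        result = {"tools": [{"name": name, **spec} for name, spec in TOOLS.items()]}
    elif method == "tools/call":
        try:
            text = run_tool(params["name"], params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": text}], "isError": False}
        except KeyError as exc:
            result = {"content": [{"type": "text", "text": f"bad request: {exc}"}], "isError": True}
    else:
        return {"jsonrpc": "2.0", "id": req_id, "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


def serve_line(line):
    """Handle one newline-delimited JSON message, as a stdio transport would deliver it."""
    return json.dumps(handle(json.loads(line)))
```

    The schema in `TOOLS` is what lets an agent discover how to call the tool without custom glue. That discovery step is most of the value of the protocol.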


    If you'd rather not learn all of this

    This post took me a couple of hours to write. The system behind it took years to build, in fits and starts, while I ran businesses and made every mistake worth making.

    I package the working version on a Chromebook. Pre-configured Linux partition, pre-wired agent swarm, pre-loaded data sources, pre-trained on your industry. Plus a 90-day curriculum so the device gets smarter alongside you. $10K, one-time, end-to-end. The system on this page is the system you take home.

    If that's interesting, that's chromebook.ai. The waitlist is open.

    Two adjacent products are also live:

    • delinquent.live — A subscription feed for operators who need to monitor delinquency, default, and distress signals across public records in real time. Useful for lenders, real estate investors, and collections.
    • estate.live — A workflow for executors, trustees, and beneficiaries dealing with the probate, real-estate, and benefits maze after a death. Same data backbone, different layer on top.

    All three products run on the data primer above. Same agents, different lenses.


    Closing: the data is sitting there free

    The leverage is in two things and only two things:

    1. Knowing which sources to query. You can't reason over data you don't know exists. The list above is a starting point, not a finish line. Every week I add another source. Most of them are free. Most of them have been free for a decade and most operators have never heard of them.
    2. Making agents reason across them. A single API call is a lookup. An agent that calls six APIs, joins the results, weighs the contradictions, and writes you a one-paragraph answer is doing analytical work. That's where the moat is. Not in any single source. In the orchestration.
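
    The orchestration point can be sketched in a few lines: fan a query out to several sources at once, keep going when one fails, and surface the fields where sources disagree. The source functions are placeholders you would supply, and `gather` and `contradictions` are my names for illustration. The sketch assumes each source returns a flat dictionary of hashable values.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def gather(sources, query, timeout=30):
    """Call every source concurrently; a failing source becomes an error record."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:
                results[name] = {"error": str(exc)}
    return results


def contradictions(results):
    """Return {field: {source: value}} for fields where sources report different values."""
    seen = defaultdict(dict)
    for source, record in results.items():
        for field_name, value in record.items():
            if field_name != "error":
                seen[field_name][source] = value
    return {
        field_name: by_source
        for field_name, by_source in seen.items()
        if len(set(by_source.values())) > 1
    }
```

    The output of `contradictions` is what you hand to the model with an instruction to explain the disagreement. A registry that says "active" next to a court record that says "dissolved" is exactly the kind of thing a single lookup never shows you.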

    If you take one thing from this: the next time someone tells you AI agents are a chat window, hand them this post. The chat window is the front porch. The data is the house.

    I'll be writing more of these. If a specific use case would be useful — counterparty research for a particular industry, a build of an MCP server, an agent walkthrough — drop me a note. The swarm has a slot for it.


    Paul Dolphin is the founder of Homestead Capital Partners and the author of `chromebook.ai`. He writes from Colorado about AI agents, data, and the small operators who are quietly outrunning the institutions.
