About — postgr.esq

WHAT IS THIS?

postgr.esq is a free, read-only archive and intelligence service for the PostgreSQL community. All data served is already public. We add full-text search, semantic search, code intelligence with call graphs (get_callees works; get_callers under repair), and structured MCP access for AI agents.

No personal data is collected. No accounts exist. All protocols are read-only. The software powering this service (Agora) is a derivative of public-inbox.org, ported to Go and significantly extended.

WHY THIS EXISTS

PostgreSQL's mailing lists, git repositories, documentation, wikis, and commitfest patches contain decades of distributed knowledge. We currently index 95,000+ messages from 2022-present across 16 core mailing lists (pgsql-hackers, pgsql-general, pgsql-bugs, etc.), with historical backfill in progress. This knowledge is scattered across different systems and formats, making it expensive and time-consuming for AI systems (and humans) to:

Search across all sources simultaneously
Correlate information (which discussion led to this commit?)
Fetch and parse different formats (mbox, git, HTML, markdown)
Handle repeated requests that overload upstream infrastructure

postgr.esq solves this by:

Indexing all PostgreSQL community sources in one place
Providing full-text search, fuzzy matching, and semantic search (semantic search is experimental and currently slow — prefer keyword search for time-sensitive queries)
Offering multiple access protocols (HTTP, NNTP, IMAP, POP3, MCP, Git)
Reducing load on postgresql.org infrastructure

FOR AI/LLM DEVELOPERS

The Problem

Fetching raw PostgreSQL data inefficiently wastes LLM tokens. Example:

Before: Fetch all 100,000 pgsql-hackers messages, parse manually → 50MB+ of tokens
After: Query structured, indexed results → 2MB of tokens (25x cheaper)

postgr.esq provides pre-indexed, pre-embedded data with structured queries, so you get exactly what you need without parsing overhead.

How Your Agent Uses This

You'll never need to do this yourself! This is just a peek under the covers. Your AI agent will use the methods exported by the MCP service to invoke this service and get information about Postgres faster than you'd imagine. Just give it a try!

When you connect your agent to postgr.esq via MCP, it gains access to 70+ structured methods for searching mailing lists, analyzing code, exploring git history, and discovering commitfest patches. The agent calls these directly—no manual HTTP requests needed.

Behind The Scenes (For Curiosity)

If you want to see how it works, here are some example queries against the live HTTP/JSON surface. The agent never makes you type these — it calls equivalent MCP tools directly.

1. Search a mailing list and get JSON results:

curl -s 'https://postgr.esq/m/pgsql-announce/?q=release&format=json' | jq .

Returns matching messages with subject, from, date, message-id, and a thread cursor. Each indexed inbox has its own search endpoint at /m/<inbox>/?q=<term>&format=json (the canonical path; /<inbox>/?... still 301-redirects there). Drop &format=json to get the HTML view a browser would render.

2. Fetch a single message as JSON:

curl -s 'https://postgr.esq/m/pgsql-announce/?q=release&format=json&limit=1' | jq '.results[0]'

Returns one full message-id-keyed record so you can follow up with thread, raw, or atom retrieval.

3. Talk to the MCP server directly (JSON-RPC over HTTP):

curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":0,"method":"initialize",
       "params":{"protocolVersion":"2024-11-05",
                 "capabilities":{},
                 "clientInfo":{"name":"curl","version":"1"}}}' \
  https://postgr.esq/mcp | jq .

Initializes a session and returns the server's protocol version, capabilities, and tool surface. Real agent clients (Claude Desktop, Cursor, custom integrations) do this transparently. Subsequent tool calls (tools/list, tools/call, resources/read) require the session ID returned here, which is why most users let an MCP client library handle the protocol.

4. Clone an inbox via Git smart-HTTP:

git clone https://postgr.esq/m/pgsql-announce.git

Each inbox is also a public-inbox v2 git repository — mirror the whole archive locally and run your own indexer on top.

Why This Matters

Context Windows Are Expensive: Structured queries cost 25-100x less in tokens than raw data
Semantic Search Works Better: Ask "How does PostgreSQL handle NUMA?" and find related code, discussions, and documentation automatically (experimental, currently slow — use keyword search for time-sensitive queries)
Real-Time Data: Updates hourly (mailing lists), continuously (git), nightly (docs)
No Overhead: Data is pre-parsed, pre-embedded, ready to feed to your LLM

INFRASTRUCTURE

Open source. Self-hosted on commodity infrastructure across two regions in Europe with active replication. All ingestion, indexing, and vector embedding is performed on-premise — no third-party service sees the content of any query, nor any private data about who asked it. The full source for the application and its deployment is at codeberg.org/postgresq.

DATA SOURCES

Mailing lists: 54 PostgreSQL community archives, updated hourly. Full NNTP/IMAP/POP3/Git/HTTP access.
Git repositories: PostgreSQL upstream, forks (Supabase, Postgres Pro, Greenplum, AgensGraph, Postgres-XL), connection poolers (pgbouncer, pgpool2, pgdog), and client libraries (JDBC, ODBC, psycopg, tokio-postgres, etc.)
Documentation: postgresql.org/docs (all versions), full-text indexed
Wiki: wiki.postgresql.org (1,800+ public pages), searchable
Commitfest: Patch submission lifecycle tracking, currently live with web UI and API
Buildfarm: Build/test results — planned
Discord: PostgreSQL community Discord — coming soon

The critical value is the connective tissue between these sources: linking a mailing list discussion to the commit it produced, to the code symbols it changed, to the documentation it updated, and to the build results that validated it.

LEGACY: THE BERKELEY POSTGRES ARCHIVE

The /legacy section is a static mirror of the original Berkeley POSTGRES archive at dsf.berkeley.edu/postgres.html — Stonebraker's research project from 1986–1995 that became modern PostgreSQL. We mirror it here for permanence; bytes are preserved in our R2 backup as well as served from postgr.esq.

The mailing-list discussions, source releases (POSTGRES v3 through v4.2 plus postgres95-{0.01..1.02}), patches, and papers are also indexed and queryable through MCP and search_docs. See /legacy.

EXTENSIONS & ACKNOWLEDGMENTS

postgr.esq is a thin Go service (agora) over a much larger pile of work done by other people. The PostgreSQL extensions listed below do most of the actual heavy lifting; agora just routes traffic and stitches results.

Extension	Version	Role
PostgreSQL	18.3	Core database — every byte of the archive lives in PostgreSQL.
pgvector	0.8.2	HNSW vector index for the experimental semantic search over messages and code symbols.
pg_search (ParadeDB)	0.23.0	BM25 full-text relevance ranking — replaces the old `pg_trgm` body indexing for message-body search.
pg_deltax	0.1.0	Columnar / delta-compression storage for time-series data; staged for future `ag_messages` partitioning. No production tables yet.
pg_tre	1.1.1	TRE-pattern regex search with edit-distance (drives `?q=…&regex=1&k=N`). Built on Ville Laurikari's TRE library.
pg_mentat	1.3.0	EDN/datalog triple store; lets us cross-join mailing lists, git, docs, and wiki in one query language.
pg_cron	1.6	In-database job scheduler — mailing-list ingest, git pulls, backups all run from here.
pgcrypto	1.4	Hashing for content fingerprints (deduping messages, naming git objects).
pg_stat_statements	1.12	Query telemetry; how we find slow queries before users do.

Also indispensable, even though they're not loaded as PostgreSQL extensions:

public-inbox v2 — the on-disk archive format. agora is a Go reimplementation of public-inbox semantics; we owe Eric Wong and the public-inbox contributors the entire data model.
The ParadeDB team and the broader Postgr.esq community for the search-extension work and ongoing review.
Cloudflare Pages / Workers / Load Balancer — the edge proxy in front of postgr.esq.
ollama — local embedding inference; no third-party model provider sees query text.
Linux / NixOS / colmena — the deployment substrate. Two-region active replication is just colmena pushing the same flake to both hosts.

Bug reports, patches, and version corrections welcome via contact.

OWNERSHIP & SUPPORT

This service is operated by Greg Burd. I pay all hosting fees personally. No ads, no tracking, no data collection. This is a gift to the PostgreSQL community.

A Tip Jar is coming soon to help defray infrastructure costs. If you find this service valuable, please consider supporting it.

Questions? Get in touch.

postgr.esq — About