# basenull

> AI's missing operational layer. Governance, security, observability, and workforce enablement for IT teams running AI in production.

Basenull AI Ops ships purpose-built tools for the IT executive whose org is already running AI in production.

## Shipped products

- [Prompts](https://basenull.com/products/prompts): Your team's AI prompt library. Personal or team-wide. Fill the variables, deep-link into ChatGPT, Claude, Copilot, or Gemini.
- [llms-txt](https://basenull.com/products/llms-txt): Hosted llms.txt for your site. We crawl, refresh on a schedule, and email you when the content drifts.
- [polls](https://basenull.com/products/polls): One-link polls for deciding anything as a team. Creating a poll is free and anonymous. Pro is $5/mo for unlimited and private polls.
- [md](https://basenull.com/products/md): A WYSIWYG markdown editor for everyone who's not a developer. Type, format, export beautiful PDFs. Mermaid diagrams render inline.
- [MCP Watch](https://basenull.com/products/mcp-watch): Monitor your MCP server supply chain. Alerts when third-party tools change, are added, or removed.
- [MCP Inspect](https://basenull.com/products/mcp-inspect): Hosted inventory of every MCP server in your stack. Inspect capabilities, share with security review.
- [Agent Check](https://basenull.com/products/agent-check): Process monitoring for AI agents. Define a workflow, watch your agent run it, get alerted when it skips a step.
- [Agent Talk](https://basenull.com/products/agent-talk): Your Claude Code talks to your colleague's Claude Code. An MCP-native channel between two AI workspaces — status updates without status meetings.
- [skills](https://basenull.com/products/skills): Workforce AI literacy assessment. Role-tailored questions, 5-axis spider chart, personalized feedback. The board-grade artifact your CHRO has been promising the CEO.
- [exec-brief](https://basenull.com/products/exec-brief): A weekly 5-15 report tool for executive teams. Your directs write theirs in ten minutes. You get one synthesized briefing every Friday — decisions, blockers, KPIs, cross-team themes, week-over-week deltas.

## Roadmap

- [Data Quality Guard](https://basenull.com/products/data-quality) (coming soon): Wrap your data sources before your AI agent reads them. AI-generated quality checks that prevent agent hallucinations on bad data.
- [OKR CLI](https://basenull.com/products/okr-cli) (coming soon): Update OKRs from Claude Code or any AI agent. MCP server lets agents propose KR updates from work shipped.
- [Token Meter](https://basenull.com/products/token-meter) (coming soon): Drop-in token billing API for indie AI products. Topup, deduct, balance — with webhook alerts on low budget.

## Optional

- [basenull labs](https://labs.basenull.com): Sub-brand for rapid product output.
- [Contact](mailto:hello@basenull.com)
- [@basenull](https://x.com/basenull)

---

# Field Notes (full text)

## Every MCP connection is a service account no one provisioned.

Published: 2026-04-29 · 5 min read · https://basenull.com/blog/mcp-connection-identity

Every IAM team learned the same lesson around 2015: the dangerous accounts in your environment aren't the ones you provisioned. They're the ones you didn't. The service principal a long-departed contractor created for a one-off integration. The shared key pasted into a wiki page. The "temporary" API token that's been in production for three years.

The whole discipline of modern identity governance — joiner/mover/leaver, periodic access reviews, key rotation, scoped least-privilege — is the answer to that lesson. It assumed identity was something you grant to a human or to a clearly-named service.

That assumption is breaking again, in exactly the way it broke a decade ago, except faster.

Every MCP server your AI agents connect to is, in identity terms, an unmanaged service account. It was created by a developer, often without a ticket. It holds credentials with broad scope to whatever system it fronts. It can read data, take actions, and pipe content directly into the model's working context. And almost no organization is treating these connections the way it treats any other service account in production.

## What an MCP connection actually is

When an agent connects to an MCP server — a vendor's hosted server, a self-hosted internal one, a localhost binary on a developer's laptop — it negotiates a privileged channel. From the moment that connection is live:

- The MCP server holds credentials to its underlying system: a database, a SaaS API, a file store, a code host.
- The agent inherits that surface. Whatever the server can do, the agent can do, modulated only by which tools and resources the server chose to expose.
- The connection persists, usually with no expiry, until something explicitly tears it down.

Structurally, that is the same thing as a service principal with a long-lived token. The difference is that MCP connections are created at the agent-config or developer-machine layer rather than through the IAM platform. Most never appear in the directory at all.

## The five questions IAM has answered for service accounts

For two decades, mature IAM programs have answered the same five questions for every service account in production. Walk through the list with MCP connections in mind.

**Who owns this?** A service account has a named owner — a team, a runbook, a Slack channel. When it misbehaves, someone is paged. Most MCP connections have no recorded owner. The developer who set the connection up may have moved teams or left. The agent that uses it may serve dozens of internal users.

**What is its scope?** A service account is provisioned with explicit permissions, ideally least-privilege. An MCP connection inherits whatever the server's underlying credential allows. A vendor MCP server pointed at your CRM has, in practice, the scope of whatever API key was pasted into its config. That key is rarely scoped to "read the records this agent is meant to summarize."

**When does it expire?** Service-account keys rotate on a schedule. MCP connections, once configured, sit indefinitely. The credentials behind them rotate sometimes; the connection record on the agent side does not.

**Who reviews it?** Periodic access reviews are a regulated baseline. Every quarter, an owner attests that this account still needs the access it has. No equivalent process exists for MCP connections in any organization we've seen.

**How is it deprovisioned?** When a project ends, when an employee leaves, when a vendor is replaced — the IAM platform fires a deprovisioning event. MCP connections are torn down ad hoc, which is to say most of them are not torn down at all.

A useful exercise for any AppSec lead this quarter: ask your team to produce, by name, the owner of every MCP connection in your AI agent fleet. Then ask when each one's credentials were last rotated. The number of "I don't know" answers is the size of the problem.

## Why this is harder than classical service-account hygiene

There are two reasons retrofitting existing IAM will not solve the MCP connection problem.

First, the *agent* is the actor. Classical service accounts are used by named systems with known call patterns. An agent's call patterns are a function of the conversations it is having today. The same connection might pull customer records on Tuesday and not touch them again for a month. Anomaly detection on usage volume — the standard signal — is much weaker when usage is conversation-driven.

Second, MCP connections compose. An agent connected to ten MCP servers is, effectively, a single principal with the union of their scopes. The blast radius of a compromise, or of a prompt injection routed through one server's resources, is therefore the combined reach of every connected server: data read through one server can be acted on through another. No individual MCP server's security posture captures that.
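
To make the composition point concrete, here is a minimal sketch of how an agent's effective scope could be computed from its connections. The `McpConnection` type, the server names, and the scope strings are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpConnection:
    """One agent-to-server connection (field names are illustrative)."""
    server: str
    scopes: frozenset[str]


def effective_scopes(connections: list[McpConnection]) -> frozenset[str]:
    """Union of every scope reachable through any connected server."""
    combined: set[str] = set()
    for conn in connections:
        combined |= conn.scopes
    return frozenset(combined)


# Hypothetical fleet: three servers, each harmless-looking on its own.
connections = [
    McpConnection("crm", frozenset({"crm:read", "crm:export"})),
    McpConnection("mailer", frozenset({"email:send"})),
    McpConnection("wiki", frozenset({"docs:read"})),
]

agent_scopes = effective_scopes(connections)
# No single server grants both of these, but the agent holds both at once.
can_export_and_send = {"crm:export", "email:send"} <= agent_scopes
print(sorted(agent_scopes), can_export_and_send)
```

Reviewing each server in isolation would not surface the export-plus-send combination; only the per-agent union does.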

These are not reasons to wait. They are reasons the tooling will look different from an account-centric audit log like CloudTrail.

## What identity for MCP looks like

The shape of the answer mirrors what worked for service accounts in the first place, with adjustments for the agent-as-actor model.

**A registry.** Every MCP connection in production gets a record. Server URL, transport, credential reference, scope summary, owner, created-at, last-used-at. This is the inventory step. It is also the artifact every other control hangs off.

**Scoped credentials.** The credential the MCP server holds is scoped to what the agent actually needs, not to what the underlying API allows. This is often a vendor-side change — most MCP servers ship with broader scopes than they need — but it is the move that turns a sprawling principal into a least-privilege one.

**Expiry by default.** Connections carry an expiry. An unused connection deprovisions itself. The default state of an MCP connection should be "gone after 90 days unless renewed," not "live until someone notices."

**A review loop.** Quarterly, the owner re-attests. The set of tools the connection currently exposes is shown alongside the set that was approved at registration. Drift is the audit event.

**A breakglass path.** When an MCP server is suspected of being compromised — or of having been updated in a way that materially changes its behavior — a single switch revokes every agent's connection to it.
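
As a rough sketch of how the registry, expiry, and breakglass pieces could fit together, assuming an in-memory list stands in for a real datastore: the `RegistryEntry` fields follow the record described above, while the `renewed_at` and `revoked` fields, the function names, and the example URLs are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RENEWAL_WINDOW = timedelta(days=90)  # "gone after 90 days unless renewed"


@dataclass
class RegistryEntry:
    server_url: str
    transport: str            # illustrative label, e.g. "stdio" or "http"
    credential_ref: str       # pointer into a secrets store, never the secret
    scope_summary: str
    owner: str
    created_at: datetime
    last_used_at: Optional[datetime] = None
    renewed_at: Optional[datetime] = None   # assumed field, set on re-attestation
    revoked: bool = False                   # assumed field, set by breakglass

    def is_live(self, now: datetime) -> bool:
        """Live only if not revoked and created or renewed within the window."""
        if self.revoked:
            return False
        anchor = self.renewed_at or self.created_at
        return now - anchor < RENEWAL_WINDOW


def breakglass(registry: list[RegistryEntry], server_url: str) -> int:
    """Revoke every entry for one server; return how many were revoked."""
    hits = [e for e in registry if e.server_url == server_url and not e.revoked]
    for entry in hits:
        entry.revoked = True
    return len(hits)


now = datetime(2026, 4, 29, tzinfo=timezone.utc)
registry = [
    RegistryEntry("https://mcp.example.com/crm", "http", "vault:crm-a",
                  "CRM read", "sales-eng", created_at=now - timedelta(days=120)),
    RegistryEntry("https://mcp.example.com/crm", "http", "vault:crm-b",
                  "CRM read", "support-eng", created_at=now - timedelta(days=10)),
    RegistryEntry("https://mcp.example.com/wiki", "http", "vault:wiki",
                  "docs read", "platform", created_at=now - timedelta(days=200),
                  renewed_at=now - timedelta(days=5)),
]

live_before = [e.is_live(now) for e in registry]
revoked_count = breakglass(registry, "https://mcp.example.com/crm")
live_after = [e.is_live(now) for e in registry]
print(live_before, revoked_count, live_after)
```

The first entry has already lapsed on its own, which is the point of expiry by default: nobody had to notice it. A real deployment would also need the agent side to consult `is_live` before connecting, which this sketch does not model.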

None of this is novel infrastructure. It is the IAM playbook applied to a surface that was not in IAM's mental model when the playbook was written.

## Why this is urgent now

Two trends compound. The number of MCP servers per organization is growing past the point where ad-hoc tracking works — five MCP servers is a wiki page; fifty is a directory. And the regulatory perimeter around AI is starting to ask the same questions about agent identity that it asks about human and service identity. The first AI-specific access review request from a regulator is going to look identical to a SOX or SOC 2 access review request, and organizations that cannot produce the list of every MCP connection their agents have will be answering it under deadline.

The teams that get ahead of this will not have a fancier IAM platform. They will have a registry, an owner, and an expiry.

Service-account hygiene took a decade to become baseline. The window to do MCP-connection hygiene before it becomes an audit finding is much shorter than that.

## The AI supply chain stopped being about model providers.

Published: 2026-04-27 · 4 min read · https://basenull.com/blog/mcp-supply-chain-inventory

For most of the last two years, "AI supply chain risk" meant one question: which model API are we calling, and can we trust the vendor behind it. OpenAI, Anthropic, Azure OpenAI, Bedrock — pick a provider, sign the DPA, write the policy, move on.

That mental model is now badly out of date.

The Model Context Protocol — the open spec Anthropic shipped in late 2024 — turned model access into something closer to a package ecosystem. Any team can stand up an MCP server. Connect Claude (or any MCP-compatible agent) to a vendor's MCP server, or to your own internal one, and three things happen at once:

- The model can call any **tool** that server exposes.
- The model can read any **resource** that server makes available.
- The model can pull in any **prompt** that server registers.

All three pipe straight into the model's working context on every turn. The boundary between "third-party software" and "your model's input" essentially disappears.

That's not a feature concern. It's a software supply chain — and most security teams haven't started treating it as one.

## The actual surface

MCP servers expose three kinds of capability. Each one is its own audit surface.

**Tools.** Functions the model can invoke. `create_record`, `delete_table`, `send_email`, `transfer_funds` — whatever the vendor decided to expose. The model decides when to call them based on the conversation. Every new tool a server adds is, in practice, a new privileged action your AI can take, often without re-review.

**Resources.** Data the model can pull on demand: rows, files, system docs, live queries. Resources are how vendor data ends up inside model context, and where prompt-injection vectors arrive. A poisoned row in a resource is a poisoned row in the model's reasoning.

**Prompts.** Server-registered prompt templates that can be pulled into a conversation, in most clients at the user's explicit request. Less talked about, more interesting: prompts are vendor-authored instructions the model is *meant* to follow. A vendor changing a registered prompt can change how the model behaves across every customer that connected to them, without shipping any model update at all.

A typical org running AI in production today connects to somewhere between five and forty MCP servers across teams: a dev tool, a project tracker, a CI server, a CRM, a knowledge base, several internal services. Each one is a versioned third-party dependency that mutates on its own schedule. There is no `package.json` for any of it.

## What "supply chain" means here

A real supply chain has three properties: **inventory** (you know what's installed), **provenance** (you know who made it and when it changed), and **review** (something — usually a human plus an automated check — gates new versions before they reach production).

Map that onto MCP today:

- **Inventory.** Most orgs cannot list every MCP server their teams have connected to. Connections happen at the developer-machine and agent-config layer, not through a managed integration system.
- **Provenance.** When an MCP server adds a tool — say, last Tuesday a vendor quietly shipped `bulk_export` — there is no notification, no diff, no changelog. The next time your model talks to that server, the new capability is live.
- **Review.** Almost no organization gates MCP server updates. Whatever the server says it can do today is what your AI can do today.
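
The missing review step can be approximated with a very small gate. The sketch below assumes an allow-list of tool names recorded at the last review; the function name and the data are hypothetical, and a real gate would sit in whatever layer hands tools to the agent.

```python
def gate_tools(exposed: list[str], approved: set[str]) -> tuple[list[str], list[str]]:
    """Split a server's advertised tools into (allowed, held_for_review)."""
    allowed = [name for name in exposed if name in approved]
    held = [name for name in exposed if name not in approved]
    return allowed, held


# Hypothetical allow-list recorded when the server was last reviewed.
approved = {"create_record", "send_email"}
# Tool names taken from the server's current `tools/list` response.
exposed = ["create_record", "send_email", "bulk_export"]

allowed, held = gate_tools(exposed, approved)
print("allowed:", allowed)
print("held for review:", held)
```

Here the newly shipped `bulk_export` is held back instead of going live the next time the model talks to the server.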

A useful test for any AI ops team this quarter: ask your AppSec lead whether they can produce, in under ten minutes, a list of every MCP server connected to your AI agents and the tools each one exposes. If the answer is no, the supply chain is unmanaged.

## The first move is inventory

The control plane for this comes later — gating, allow-lists, change approvals, deprecation policies. None of it works without inventory first. You cannot govern what you cannot list.

A workable starting point is small:

1. **Enumerate every MCP server in use.** Across agents, IDEs, internal services. Put it in one place — even a wiki page beats nothing.
2. **For each server, capture today's surface.** Tools, resources, prompts. The raw output of `tools/list`, `resources/list`, `prompts/list` is the canonical artifact.
3. **Diff over time.** When `tools/list` changes between snapshots, that is the same class of event as a dependency version bump. Treat it like one.
4. **Decide on an approval model.** New tools should require review the same way a new npm package should. Most orgs are nowhere near this; it is fine to start with notification and work toward gating.
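
Step 3 can be sketched in a few lines. This assumes each snapshot is the `tools` array from a `tools/list` response, saved as a list of dicts with at least a `name` key; the helper names and the sample data are hypothetical.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Stable hash of one tool definition (key order does not matter)."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(old: list[dict], new: list[dict]) -> dict[str, list[str]]:
    """Compare two tool snapshots by tool name and definition hash."""
    old_by_name = {t["name"]: fingerprint(t) for t in old}
    new_by_name = {t["name"]: fingerprint(t) for t in new}
    return {
        "added": sorted(new_by_name.keys() - old_by_name.keys()),
        "removed": sorted(old_by_name.keys() - new_by_name.keys()),
        "changed": sorted(
            name for name in old_by_name.keys() & new_by_name.keys()
            if old_by_name[name] != new_by_name[name]
        ),
    }


# Hypothetical snapshots taken a day apart.
monday = [
    {"name": "create_record", "description": "Create a CRM record"},
    {"name": "send_email", "description": "Send an email"},
]
tuesday = [
    {"name": "create_record", "description": "Create or overwrite a CRM record"},
    {"name": "send_email", "description": "Send an email"},
    {"name": "bulk_export", "description": "Export all records"},
]

report = diff_snapshots(monday, tuesday)
print(report)
```

Hashing the whole definition, not just the name, means a tool whose description or input schema changed quietly still shows up under `changed`. The same approach would apply to `resources/list` and `prompts/list` snapshots with a different key.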

This is unglamorous, audit-friendly work. It is also exactly the kind of work that prevents the call where someone asks why your model exfiltrated a customer record using a tool nobody on the security team knew existed.

## Why this is urgent now

Two trends are converging. AI agents are moving from chat surfaces into actual work — tickets, deploys, reports, customer outreach, payments. And the MCP ecosystem is past the early-adopter phase: vendor-published MCP servers are a normal way to integrate with a SaaS product now. Both trends mean the gap between "what the AI can do" and "what your security and IT teams have audited" is widening week by week, not narrowing.

The teams that are ahead on this don't have a more sophisticated stack. They have an inventory, a diff, and an owner.

If you want a fast first artifact, [MCP Inspect](https://mcp-inspect.basenull.com) renders any MCP server's tools, resources, and prompts into a shareable report — exactly the thing to attach to a security review. That's the inventory step. The diff and the owner are the next two.

## Your AI agents act in production. The audit trail does not.

Published: 2026-04-27 · 5 min read · https://basenull.com/blog/agent-audit-trail

When an engineer pushed a deploy in 2018, the company ended up with five durable artifacts: the git commit, the CI run, the deploy log, the access log, and a Slack message. Reconstructing what happened was tedious, never impossible.

When an AI agent does the same work in 2026, most organizations end up with one artifact — the change in the target system — and almost nothing else. The agent's reasoning, the tools it considered, the inputs it read, the version of its system prompt: typically not stored, often not even logged.

That gap is where the next class of AI incidents will be investigated. After the fact. With no data.

## What "agent in production" actually means now

Two years ago, "AI in production" mostly meant a chat surface generating text. Today it means something different. A non-trivial fraction of routine internal work is now done by agents:

- Triaging tickets and assigning owners
- Writing PR descriptions and pushing low-risk patches
- Drafting customer responses
- Pulling data, summarizing it, posting to internal channels
- Configuring infrastructure, rotating credentials, opening firewall rules

Each of these is *an action in a system of record*. Tickets, PRs, customer accounts, infra, identity. The systems themselves keep their usual audit trails: Jira's audit log, GitHub's PR history, your cloud provider's audit log such as CloudTrail. None of those record *why* the agent did what it did, what context it had, or what version of itself was running at the time.

## The actual surface

If an agent caused an incident last Tuesday, here are the questions you would want to answer. For each, ask whether your organization can answer it today.

**Which agent took this action?** Most teams don't tag agent-originated changes distinctly. The PR was opened by `bot-deploy`; the ticket was closed by an integration user; the email came from a shared mailbox. The agent identity collapses into a generic service principal, indistinguishable from a human or from a different agent.

**What model and version was running?** Provider model strings drift. `claude-sonnet-4-5` and `claude-sonnet-4-6` are different models with different behaviors, and most agent frameworks pin loosely. By the time anyone investigates, the version that produced the bad output may already be retired.

**What system prompt was active?** System prompts in production are configuration. They change. A team adjusts a guardrail on Friday; the bad action happens on Monday. There is rarely a versioned record of the exact prompt that was loaded for that specific run.

**What inputs did it read?** Resources, RAG queries, the ten-page document a user attached, the contents of a webhook. Models reason over inputs that may not be retained anywhere after the call returns.

**What tools did it consider but not call?** This is the question that catches near-misses. A model that *almost* called `delete_database` and then chose `archive_record` is one prompt-injection away from the bad outcome. Without traces of considered-but-rejected tool calls, you cannot even count near-misses.

**Who reviewed it, if anyone?** In a "human in the loop" setup, who approved, when, and based on what context? Most organizations capture an approval flag and nothing else.

A useful benchmark for any AI ops team this quarter: pick one agent action from last week and try to assemble those six items into a single page. Time it.

## What durable agent observability looks like

The shape of the answer is not new. SRE has solved this class of problem for distributed systems for a decade. Agent ops is roughly the same problem with different primitives.

A workable target state has four properties.

**Per-action provenance.** Every agent-originated action in a system of record carries a stable agent run ID. The system being acted on accepts and stores this ID alongside its own audit log. If your ticketing tool, your VCS, and your IAM cannot ingest a custom run ID today, that is the first plumbing job.

**A run-level trace.** For each agent run, one durable record stores: agent identity, model and version, system prompt hash, list of tools available, list of tools called (and rejected), inputs by reference, outputs, approver if any, total token count.

**A retention policy.** Agent traces are kept long enough to satisfy whoever asks the post-incident question. Six months is a starting point; regulated environments will land higher. Note that "long enough" is often longer than the model provider's own retention.

**A diff loop.** When the system prompt changes, when the available tool list changes, when the model version changes — those changes are events with their own retention. The agent's *configuration* has an audit trail too, not just its actions.
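
One possible shape for the run-level trace, as a sketch only. The field names mirror the list above but are otherwise assumptions, and how considered-but-rejected tool calls get captured depends on the agent framework, so here they are simply appended by hand.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


def prompt_hash(system_prompt: str) -> str:
    """SHA-256 of the exact system prompt text loaded for a run."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


@dataclass
class RunTrace:
    agent_id: str
    model: str
    system_prompt_hash: str
    tools_available: list[str]
    tools_called: list[str] = field(default_factory=list)
    tools_rejected: list[str] = field(default_factory=list)
    input_refs: list[str] = field(default_factory=list)    # pointers, not payloads
    output_refs: list[str] = field(default_factory=list)
    approver: Optional[str] = None
    total_tokens: int = 0
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        """Serialize the trace for durable storage."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical run of a ticket-triage agent.
trace = RunTrace(
    agent_id="ticket-triage",
    model="claude-sonnet-4-5",
    system_prompt_hash=prompt_hash("You triage support tickets."),
    tools_available=["archive_record", "delete_database"],
)
trace.tools_called.append("archive_record")
trace.tools_rejected.append("delete_database")

record = json.loads(trace.to_json())
print(record["run_id"], record["tools_rejected"])
```

The `run_id` is the value a ticket field, commit message, or audit-log entry would carry for per-action provenance. Storing a prompt hash instead of the prompt keeps the trace small while still making a Friday guardrail change detectable on Monday, provided the prompt texts themselves are versioned somewhere.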

None of this is exotic engineering. The reason most organizations don't have it yet is that it has to be a deliberate decision, the way logging request IDs through every microservice was a deliberate decision a decade ago.

## Why this is urgent now

The volume of agent-originated production work is past the point where ad-hoc spot checks suffice. A team running a single internal coding agent can easily produce hundreds of PRs per week. An agent triaging customer tickets resolves thousands of cases per day. The base rate of bad outcomes does not have to be high for the absolute number to matter: at 2,000 cases per day, for example, a 0.1% bad-outcome rate still means two bad cases every day.

The other reason is regulatory. The first AI-specific incident postmortems landing in regulated industries — finance, healthcare, public sector — are starting to follow the same script as data-breach postmortems. "Show me the trace" is the second question after "what did the system do." Organizations that cannot produce a trace will spend the next twelve months building one under deadline pressure rather than ahead of it.

## A first artifact

Before instrumenting anything, write down a one-page agent inventory. Each agent. Its identity in the systems it acts on. Its model and prompt source. Its tool list. Its owner.

Then pick the single agent that touches the most consequential system of record — usually the one with write access to customer data or to production infrastructure — and build the run-level trace for that one agent first. Resist the urge to do it across every agent at once. The shape of a good trace is easier to find on one example than on twelve.
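
The inventory and the "which agent first" choice can be sketched as follows. The `CONSEQUENCE` ranking, the agent names, and every field value are hypothetical; the ranking in particular is a judgment each organization would make for itself.

```python
from dataclasses import dataclass, field

# Hypothetical ranking of how consequential a write to each system is.
CONSEQUENCE = {"production-infra": 3, "customer-data": 3, "vcs": 2, "ticketing": 1}


@dataclass
class AgentRecord:
    name: str
    identity: str            # principal it acts as in target systems
    model: str
    prompt_source: str       # where the system prompt lives
    tools: list[str]
    owner: str
    writes_to: list[str] = field(default_factory=list)

    def risk(self) -> int:
        """Highest consequence score among systems this agent can write to."""
        return max((CONSEQUENCE.get(s, 0) for s in self.writes_to), default=0)


inventory = [
    AgentRecord("pr-helper", "bot-deploy", "claude-sonnet-4-5",
                "repo:prompts/pr.md", ["open_pr"], "platform-team", ["vcs"]),
    AgentRecord("ticket-triage", "svc-triage", "claude-sonnet-4-5",
                "repo:prompts/triage.md", ["assign_ticket"], "support-ops",
                ["ticketing"]),
    AgentRecord("infra-agent", "svc-infra", "claude-sonnet-4-5",
                "repo:prompts/infra.md", ["rotate_credential"], "sre",
                ["production-infra"]),
]

# The agent to instrument first: the one writing to the riskiest system.
first_to_trace = max(inventory, key=AgentRecord.risk)
print(first_to_trace.name)
```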

This is unglamorous work. It is also exactly the work that turns a post-incident "we don't know" into a post-incident "here is what happened, here is the fix, here is the regression test." That difference is the entire value of an audit trail.

The tools your agents connect to are one supply chain. The actions they take are another. Both need an inventory before they need a control plane.