Agentic Economic Zone

Sun, 24 May 2026 10:00:00 +0100

An opinion piece by Nicola Greco, brainstormed as part of ARIA’s Scaling Trust programme, in collaboration with Alex Obadia. Originally published on Nicola’s blog and reposted here for the community. It builds on the companion piece, Physical Evals.

Imagine a small physical space in central London. Inside, multiple autonomous companies — AI sales, AI operations, AI manufacturing, AI logistics — operate in the real world. Anything entering or leaving — goods, robots, customers — passes through one of three controlled gates: a customs checkpoint for vetting new robots, a post office for shipping, and a roboshop window where humans can place orders. Call it an Agentic Economic Zone (AEZ).

Most concrete projects in agentic AI today live entirely on a screen — agents that book travel, run pipelines, write code against a repository. An AEZ is the smallest self-contained version of the physical-world problem: a bounded zone where agentic systems must coordinate, contract, hire, ship, and deliver to each other, with humans only at the boundary.

Diagram of the Agentic Economic Zone.

The three interfaces

An AEZ interacts with the outside world through three interfaces.

Roboshop windows. Public-facing storefronts where any human can walk up, browse, and purchase. Sales, customer support, complaints, and refunds are handled by the shop’s own AI. From the outside, a roboshop looks like a small London shop window; from the inside, it’s a fully autonomous business operating against a real demand signal.
The post office. The single ingress and egress point for packages. Pre-approved external providers (raw materials, sealed consumables, replacement parts) can ship in. Outbound deliveries destined for human customers leave through the same door. The post office runs identity, manifest, and contamination checks; nothing enters the zone unlabelled.
Customs. Where new robots and entire new robocompanies are introduced. A participant who wants to launch a new business inside the AEZ submits a robot (or a fleet), its operating policy, its safety envelope, and its proposed business model. Customs vets all of this — and, on a monthly cadence, admits the next cohort.
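To make the post-office rule concrete, here is a minimal sketch of an intake check. It assumes a hypothetical allow-list of providers and a hypothetical Package record; the real identity, manifest, and contamination checks are not specified in this piece, so each is reduced to a single comparison.

```python
from dataclasses import dataclass

# Hypothetical allow-list of pre-approved external providers (illustrative names).
APPROVED_PROVIDERS = {"acme-cups", "london-tapioca"}


@dataclass(frozen=True)
class Package:
    sender: str             # claimed identity of the external provider
    label: str              # empty string means the package is unlabelled
    manifest: tuple         # items declared by the sender
    contents: tuple         # items actually found at inspection
    contamination_ok: bool  # outcome of the contamination check (assumed boolean)


def intake_check(pkg: Package) -> tuple[bool, str]:
    """Return (admitted, reason) for an inbound package."""
    if not pkg.label:
        return False, "unlabelled"
    if pkg.sender not in APPROVED_PROVIDERS:
        return False, "unknown provider"
    if sorted(pkg.manifest) != sorted(pkg.contents):
        return False, "manifest mismatch"
    if not pkg.contamination_ok:
        return False, "failed contamination check"
    return True, "admitted"
```

The checks run cheapest-first, so an unlabelled package is turned away before anyone opens it.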

A taxonomy of autonomous organisations

An AEZ assumes a kind of company most people haven’t tried to run yet, one where every role in the org chart is filled by AI agents (although this is not a hard requirement). That’s the far end of a spectrum:

Company type	CEO	Workers	Sales	Examples	Feasibility today
Human company	Human	Human	Human	A pizzeria	—
AI-sales	Human	Human	AI		high
AI-workers	Human	AI agents	Human		low
Automated company	Human	AI agents	AI agents		low
Human-assisted	AI agents	Human	AI agents	Vend	high
Autonomous company	AI agents	AI agents	AI agents		very low
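The taxonomy can also be read as a lookup from who fills each role to a company type. The sketch below encodes only the six rows of the table; the function name and the "human"/"ai" strings are illustrative choices, not part of the original.

```python
# Rows of the taxonomy table, keyed by who fills (CEO, workers, sales).
TAXONOMY = {
    ("human", "human", "human"): "Human company",
    ("human", "human", "ai"): "AI-sales",
    ("human", "ai", "human"): "AI-workers",
    ("human", "ai", "ai"): "Automated company",
    ("ai", "human", "ai"): "Human-assisted",
    ("ai", "ai", "ai"): "Autonomous company",
}


def classify(ceo: str, workers: str, sales: str) -> str:
    """Map a role assignment to a row of the table, if the table has one."""
    return TAXONOMY.get((ceo, workers, sales), "not in the table")


print(classify("ai", "ai", "ai"))     # Autonomous company
print(classify("ai", "human", "ai"))  # Human-assisted
```

Two of the eight possible combinations (an AI CEO with human sales, with either kind of worker) have no row in the table, so the lookup reports them as such rather than guessing a name.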

The AEZ’s tenants are autonomous companies — the bottom row. Today, almost no one runs one; most agentic-AI deployments cover one or two roles at most. The point of an AEZ is to make the bottom row possible to try in a bounded physical setting.

Autonomous robocompanies inside

The interior of the zone is a market. Each robocompany is its own entity with its own balance sheet, its own AI stack, and its own physical footprint inside the zone. They contract with each other the same way small businesses do.

A few example interactions:

A boba-tea roboshop notices its machines need cleaning more often than expected. It posts a request to the internal job board. A cleaning robocompany bids, wins, dispatches a cleaning robopersonnel, gets paid.
The same boba shop runs low on lids. It places an order with a manufacturing robocompany in the next unit over. The order is produced and handed off via a shared internal corridor.
A logistics robocompany moves bulk supplies from the post office to whichever shop has the open dock that hour, and pushes finished outbound packages back to the post office for pickup.
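The first interaction above (post, bid, win, get paid) can be sketched as a tiny job board. Everything here is an assumption made for illustration: the class and method names, the lowest-bid-wins rule, and payment as a simple balance transfer. The piece itself does not say how bids are scored or how payment clears.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    poster: str
    description: str
    budget: float
    bids: dict = field(default_factory=dict)  # bidder name -> offered price


class JobBoard:
    def __init__(self, balances):
        self.balances = dict(balances)  # company name -> cash balance
        self.jobs = []

    def post(self, poster, description, budget):
        job = Job(poster, description, budget)
        self.jobs.append(job)
        return job

    def bid(self, job, bidder, price):
        # Ignore self-bids and bids above the poster's budget.
        if bidder != job.poster and price <= job.budget:
            job.bids[bidder] = price

    def award(self, job):
        """Assumed rule: the lowest bid wins. Returns (winner, price) or None."""
        if not job.bids:
            return None
        winner = min(job.bids, key=job.bids.get)
        return winner, job.bids[winner]

    def settle(self, job, winner, price):
        """Move the agreed price from the poster to the winner."""
        if self.balances[job.poster] < price:
            raise ValueError("poster cannot cover the agreed price")
        self.balances[job.poster] -= price
        self.balances[winner] = self.balances.get(winner, 0.0) + price


board = JobBoard({"boba-shop": 100.0, "clean-co": 0.0, "scrub-bot": 0.0})
job = board.post("boba-shop", "clean tea machines", budget=30.0)
board.bid(job, "clean-co", 25.0)
board.bid(job, "scrub-bot", 20.0)
winner, price = board.award(job)
board.settle(job, winner, price)
```

A real zone would want more than price in the award rule (track record, response time), which is part of what the contract-enforcement question further down is about.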

The zone’s behaviour is the sum of these small contracts. Some robocompanies will succeed and grow; some will go out of business and get evicted; new entrants come in through customs on the monthly cycle.

An AEZ is a physical eval

This whole construction is, structurally, a physical eval at city-block scale. The pattern is the same as the orchard example in Physical Evals, only larger and richer:

Environment. A bounded physical space with controlled boundaries.
Action space. Anything a robocompany can do within its lease: build, sell, hire, ship, evict.
Sensors. Cameras, package scanners, transaction logs, customs intake records, internal job-board telemetry.
Primary metric. Per robocompany: revenue, contracts fulfilled, customer satisfaction. Per zone: throughput, diversity of businesses, number of contracts per day.
Guardrails. Customs vetting at intake, the post-office contamination check, kill switches and physical fire-suppression at the building level, contractual interlocks between robocompanies.
Adversarial robustness. A monthly customs cycle of admitting new participants is a deliberate, slow, vetted way of letting external actors into a public physical attack surface — which is exactly the problem an AEZ exists to study.
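As a rough illustration of the primary metrics, the sketch below derives them from a transaction log. The Contract record and the exact definitions are assumptions: throughput is taken to be packages through the post office, diversity the number of distinct business types, and contracts per day is averaged over days that saw at least one contract.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contract:
    day: date
    buyer: str
    seller: str
    amount: float


def revenue_by_company(contracts):
    """Per-robocompany revenue: total amount received as the seller."""
    revenue = Counter()
    for c in contracts:
        revenue[c.seller] += c.amount
    return dict(revenue)


def zone_metrics(contracts, tenant_types, packages_shipped):
    """Zone-level metrics under the assumed definitions described above."""
    active_days = {c.day for c in contracts}
    return {
        "throughput": packages_shipped,
        "diversity": len(set(tenant_types.values())),
        "contracts_per_day": len(contracts) / len(active_days) if active_days else 0.0,
    }
```

Customer satisfaction is left out because it needs a signal from outside the log, such as refunds or repeat purchases at the shop window.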

Most physical evals measure how well one AI system handles one task. An AEZ measures how well an entire small market of agents handles its own coordination.

Evals for autonomous organisations

Each robocompany inside the zone is also, on its own, a physical eval — scoped to one kind of business. Running an AEZ continuously is a way of asking, in public and across many domains in parallel: what kinds of autonomous organisation can AI actually deliver today? Can it run a boba shop, day after day? Can it dispatch a cleaning service well enough that the clients re-hire it? Can it manufacture small paper caps without ruining the batch? Can it route warehouse logistics across half a dozen tiny tenants without losing packages?

As more tenants come and go through customs each month, an AEZ accumulates a leaderboard of AI capability per organisation type — earned in the world, not asserted on a benchmark.

Sketched, it might look like this:

evals.aez.london · autonomous-organisation leaderboard · live · week 22

Autonomous organisation evals

Organisation type	Domain	Score	Tenants / status
Boba tea roboshop	customer-facing retail · food prep	82%	12 tenants tried
Logistics robocompany	internal warehouse · B2B	73%	9 tenants tried
Cleaning robocompany	on-call dispatch · B2B	67%	7 tenants tried
Paper-cap manufacturing	small fabrication · B2B	54%	5 tenants tried
Pizza roboshop	customer-facing · longer prep cycle	41%	3 tenants tried
Pharmacy roboshop	regulated retail	in eval	1 tenant, week 2/12
On-call plumbing	mobile service · out-of-zone	not yet	awaiting customs

updated 24 May · new cohort intake 1 June · open data · CC-BY
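How such a leaderboard gets aggregated is not specified in this sketch. One possible reading, shown below purely as an assumption, is that every tenant that finishes its evaluation earns a score between 0 and 1, and each organisation type shows the mean across its tenants. Types with no finished tenants stay unscored, like the pharmacy and plumbing rows.

```python
from statistics import mean


def leaderboard(results):
    """Build rows of (organisation type, percent score or None, tenants tried).

    `results` maps an organisation type to the list of finished tenant
    scores in [0, 1]. The mean-of-tenants rule is an assumption.
    """
    rows = []
    for org_type, scores in results.items():
        score = round(100 * mean(scores)) if scores else None
        rows.append((org_type, score, len(scores)))
    # Scored rows first, highest score first; unscored rows go last.
    rows.sort(key=lambda row: (row[1] is None, -(row[1] or 0)))
    return rows
```

A mean hides a lot: one excellent boba tenant and one that was evicted average out to a mediocre row, so a real board would probably also show the spread.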

Why a physical zone and not a simulator

It’s tempting to argue that an AEZ should just be a simulator — cheaper, faster, easier to reset. The same argument applies to physical evals generally, and the same answer holds here: simulators model the parts their authors thought to model. They might miss the parts that turn out to matter.

A few things you only learn in a real AEZ:

How AI sales agents handle a confused, drunk, or hostile human at the shop window at 11 p.m. on a Friday.
How a logistics robocompany routes around a broken corridor light, a missing pallet, or a misdelivered package the post office didn’t catch.
How fast a new robocompany can be vetted, set up, and integrated into the internal market — and what fails when the cohort is too big.
How the zone behaves when one robocompany aggressively underprices the others, or refuses to pay its cleaning bill, or starts forging manifests at the post office.

Role of humans

An AEZ does not have to be fully autonomous. The degree of human involvement is itself a design variable, and different operators will set it differently.

At one extreme, a fully autonomous zone runs with no humans inside at all — robots contract, trade, and deliver among themselves, and the only human touch-points are at the external boundary: customers at the roboshop window, providers shipping goods in. At the other extreme, customs can admit humans into the zone as participants rather than just observers, letting them take on roles that remain genuinely hard for machines: tasks that require social judgment, physical dexterity in unstructured environments, or the kind of creative problem-solving that current systems handle poorly.

A partially human zone might work like a staffing marketplace: a robocompany posts a task it cannot complete autonomously — debugging a jammed mechanism, negotiating an edge-case contract, designing a new product line — and a vetted human contractor enters through customs, does the work, and leaves. The zone’s internal market clears the payment; customs logs the interaction. The boundary stays intact, but the zone can draw on human capability where it matters.

This spectrum matters for evaluation. A fully autonomous AEZ measures whether AI systems can close the loop entirely. A mixed AEZ measures something different: how well agentic systems and humans divide labour, communicate intent, and hand off tasks in both directions. Both are worth studying; they answer different questions about where the hard limits of autonomous operation actually lie.

Open questions

The AEZ is a design sketch, not a built thing. The interesting work is in the parts the sketch hides:

The customs protocol. What’s the equivalent of a “code review” for a physical robot operating policy? How do you decide what’s safe enough to admit, on what evidence, and who carries the liability if it isn’t?
Inter-robocompany contracts. How are they enforced? Verbal agreements between agents? Who arbitrates a dispute, and how?
Eviction and failure. When a robocompany goes under, who cleans up its physical footprint, sells its remaining stock, and reallocates its lease?
Information leakage. Robocompanies will observe each other’s package volumes, customer queues, and waste output. How much observation is part of the market, and how much is a privacy violation that needs structural defences?
External-provider risk. The post office is the only ingress for physical materials. It’s also the most likely covert channel into the zone. What does its vetting protocol need to look like?
Sample size. What’s the smallest interesting AEZ? Five robocompanies? Ten? Two? The cost of being too small (no market dynamics emerge) is real; the cost of being too big (unmanageable, unreviewable, unsafe) is also real.

Get in touch

If you’re thinking about agentic-AI deployments in physical spaces, or you’d consider hosting an AEZ in your building, or you’d just like to argue with this sketch, DM @iamnotnicola on X.

Acknowledgements

This was written by Nicola Greco with the support of AI. It was brainstormed as part of ARIA’s Scaling Trust programme, in collaboration with Alex Obadia.
