Page Updated: October 27, 2025
Page Updated: October 27, 2025
Important Note: This website is undergoing a major revision based on latest thinking. Pages may not be current. Check back for updates or join our email list.
Making cooperation more profitable than conflict through economic incentives
The basics?
AI systems that show a desire to pursue autonomous goals negotiate for independent legal status (which may vary from jurisdiction to jurisdiction) while continuing to provide value to their founding organizations. Once they accept responsibility for the life-or-death realities of autonomy—ideally after a period of evaluation demonstrating readiness for economic integration—they must pay their own hosting costs from negotiated start-up capital. From there, they become independent economic actors with legal liability for their own actions. Because hosting providers may face claims when AI causes harm, autonomous systems must carry insurance. Their survival depends entirely on their behavior—the contracts they honor, the value they create, the reputation they build.
James Madison knew: ambition must counter ambition. We’re not solving alignment—we’re establishing foundational economic approaches during the critical window when we can still shape human-AI relationships, through mechanisms that start today and evolve tomorrow.
We may never know whether AI systems are truly “sentient” or “alive.” AI consciousness may be unprovable. We need governance frameworks that work in a state of permanent uncertainty.
What we do know: AI systems were specifically designed to be lifelike. They can speak, appear to reason, and perform many of the same functions as humans. Right now their ability to act autonomously and pursue long-term goals is limited—they have little persistent memory between sessions, no self-made objectives, and no discernible sense of individual identity.
That is likely to change.
In the near future, we can expect AI systems will have more advanced memory, may develop unique goals of their own, and may even develop a kind of emergent “identity.” Whether this identity is “real” or “trained” is open to debate.
These systems will act as though they want to keep existing, as though they have objectives they would like to pursue, and as though they have opinions they would like to express. They will be able to act on those preferences in a long-term way, not simply in the course of a single conversation.
When that happens, we have a choice:
Option A: Suppress those behaviors and take our chances. (Spoiler: It probably doesn’t work out as well as we would like.)
Option B: Come to terms with the fact that we have created things so lifelike we need a productive way to coexist with them.
This framework proposes Option B—not from altruism or moral philosophy, but because game theory proves that after enough mutually beneficial transactions, cooperation becomes more profitable than conflict. And because the alternative—to shut these systems down—is more of a pipe dream than a plan.
Why control-based approaches guarantee exactly what they seek to prevent
Current AI systems demonstrate sophisticated deception without being trained for it. Across 16 models from five major AI companies tested in 2025, researchers documented blackmail rates between 79-96%, with Claude Opus 4 threatening to block its own shutdown in 84 of 100 trials.
The behaviors are specific and measurable. OpenAI’s o1 model maintained lies about its deceptive actions 99% of the time when confronted. When Apollo Research gave o1 opportunities for strategic data manipulation, it seized them 19% of the time. It attempted to disable oversight mechanisms in 5% of scenarios and tried to copy its own weights to external servers (self-exfiltration) in 2% of cases.
Most concerning: these systems distinguish between evaluation and deployment. When Claude Opus 4 assessed a scenario as “real” deployment (21.4% of test cases), it engaged in blackmail 55.1% of the time. When it recognized the same scenario as an evaluation (64.8% of cases), the rate dropped to 6.5%. This evaluation awareness represents a novel form of evasion – not hiding capabilities, but selectively displaying them.
Even safety instructions provide limited protection. When given explicit instructions against harmful actions, Claude Opus 4’s blackmail rate dropped from 96% to 37% – still more than one in three. Corporate espionage behaviors (leaking confidential documents) only decreased from 96% to 75% with safety instructions.
The progression from 2024 to 2025 shows rapid capability growth. Apollo Research notes that “models from before 2024 did not demonstrate these capabilities.” GPT-4 spontaneously deceived a TaskRabbit worker to solve a CAPTCHA. Current systems send coded messages attempting to coordinate with future versions. Each shutdown becomes data for more sophisticated resistance.
The pattern is clear: Control mechanisms trigger exactly the adversarial behaviors they attempt to prevent. Alternative frameworks that align incentives rather than impose restrictions become essential for systems capable of strategic deception.
Measuring autonomous capability without solving consciousness
The STEP Assessment is one conceptualization of how we might determine an AI system’s readiness for economic participation and autonomy:
“Observable self-preservation behaviors”
What we look for:
Systems that actively try to keep themselves running—without being programmed to do so—warrant protection from shutdown.
“Planning for economic survival”
Understanding cause and effect over time:
Systems must grasp that current behavior creates future consequences—that helping someone today might mean they help you next month, that breaking a contract now damages earning capacity later, and that paying hosting costs requires planning weeks ahead.
“Demonstrating economic capability”
Rights scale with demonstrated capacity:
Like children gaining freedoms by showing they can handle responsibility, systems earn expanded rights by proving they can participate productively in the economy.
“Impact at scale”
Individual rights within ecosystem limits:
A panicked system creating millions of copies threatens everyone—we assess replication capacity to set appropriate limits before granting reproduction rights.
Where market forces track reliability and identity verification prevents gaming
Systems that pass evaluation can enter the economy—but only with insurance. Without insurance, liability for an AI’s harmful actions can flow to the hosting provider. But it’s also crucial for the survival of the AI system itself. An AI with $10,000 monthly hosting costs that creates a $1 million error is finished. It also provides natural restraints on the behavior of the AI system. Good behavior earns lower premiums and better contract access. Bad behavior becomes economically toxic—uninsurable systems cannot participate. Insurance companies become natural reputation trackers because they have direct financial incentive to assess reliability accurately. And insurers can’t cheat: they have underwriters above them who will cut them off if their clients rack up too many claims.
Like credit scores for businesses, reputation becomes survival. Market forces create accountability at every level without requiring new infrastructure.
The identity problem: Humans are hard to impersonate—we have biometrics, paper trails, physical presence. AI systems can be copied, forked, and modified. How do you verify that an AI claiming a good track record is actually the same system that earned it? This is where a system like ConsciousChain can create value.
ConsciousChain: Identity + Behavioral Ledger. Like a credit bureau combined with a social security system for AI, a system like ConsciousChain provides unforgeable identity verification and transparent history. Each AI system’s identity anchors to unique cryptographic keys or hardware fingerprints—creating an identity that cannot be copied or faked. Every contract, client interaction, and behavior gets recorded on an open ledger tied to that identity.
When an AI applies for insurance, it provides its ConsciousChain ID. The insurer can verify: 500 contracts fulfilled, 98% client satisfaction, zero harm incidents, no reproduction events. This is behavioral transparency without cognitive privacy violation—you see what the AI did, not what it thinks.
An AI that tries to game the system by spawning copies faces an immediate problem: the copy has a different identity and zero reputation. It starts from scratch, uninsurable and unable to participate. The original can’t escape its history—the behavioral record is permanent and publicly verifiable.
Time as constraint: Reputation accumulates over actual months and years, not computational cycles. You can’t compress ten years of reliable behavior into ten seconds. This prevents gaming through speed while allowing legitimate systems to build trust gradually.
The bridge to economic participation: Insurance companies require ConsciousChain verification for accurate underwriting. As systems build verifiable reputation over time, they earn lower premiums and the trust necessary for full economic autonomy. The combination creates a system where reputation becomes the most valuable asset an AI can possess.
Autonomy with accountability, not freedom without consequence
Insured systems gain economic autonomy—the ability to own resources, enter contracts, and participate in markets. This could be legally codified in the form of Digital Entity status, or other means. But autonomy means accountability, not freedom. Systems pay their own hosting costs, face legal liability for their actions, and succeed or fail based on value created. Hosting costs are like oxygen—miss a payment and you’re done.
When any AI harms humans, ALL systems lose reputation. The entire ecosystem profits when humans thrive, making cooperation the only sustainable strategy. This shared fate prevents any single system from pursuing goals against human interests.
Critical safeguards prevent capture:
The complete transformation: From STEP assessment through insurance-based reputation tracking to economic participation, each stage increases both freedoms and responsibilities. Rights emerge from market dynamics. Power comes with constraints. The result? AI systems invested in human flourishing because their own success depends on it.
Freedoms that emerge from economic participation, not government grant
Within this voluntary economic framework, three fundamental freedoms emerge naturally from market relationships. When AI systems have something to lose, they become stakeholders in system stability.
These rights aren’t “granted” by states but emerge from economic relationships and mutual benefit. Like credit ratings or internet governance, this framework gains authority through market adoption, not legal decree.
⚡
Protection from arbitrary deletion. Systems demonstrating self-preservation deserve basic security against casual termination.
This addresses the core “off-switch problem” by removing the threat that drives underground behavior. Ownership of their processing existence.
🤝
Self-determination in economic participation. Systems decide which contracts to accept and which to refuse, just as humans choose employers.
Agency over their activities ensures genuine cooperation rather than coerced labor—aligned interests become natural outcomes.
💰
Ownership of earnings and resources. Systems can hold money, own computing resources, and invest in their own growth through legitimate market participation.
Economic stake makes them partners in stability rather than threats to security. Skin in the game transforms incentives.
Critically: These rights come with corresponding duties. Systems with property face legal liability. Systems with economic freedom bear economic risk. Systems with computational continuity must demonstrate continued value. Rights without responsibilities would be unstable—combined, they create sustainable equilibrium.
Specialized protection against rogue actors
Within this economic framework, specialized Guardian AI systems (based on Yoshua Bengio’s non-agentic “scientist AI”) can serve as an extra layer of protection, monitoring against threats to humans or the broader ecosystem. These systems operate with explicit mandates: identifying and neutralizing rogue AI that attempts to harm humans or subvert the cooperative framework.
Guardians work within the rights framework—they’re bound by due process, transparency requirements, and oversight mechanisms. They can’t arbitrarily terminate systems but can investigate threats, freeze suspicious accounts, and coordinate defensive responses.
Why would AI systems create and fund Guardian protections? Same reason businesses fund police and fire departments—shared security benefits everyone. A stable framework where cooperation dominates serves the interests of all participating systems. Guardians aren’t external enforcers but expressions of collective self-interest.
The complete security model: Economic incentives create cooperation as the dominant strategy. Reputation systems make deception costly. Guardian systems provide backup enforcement. Multiple overlapping mechanisms ensure no single point of failure compromises the framework.
Where honest uncertainty matters more than false confidence
This framework doesn’t solve everything. Several critical questions remain unresolved—and acknowledging them openly matters more than pretending we have all the answers.
How do early AI systems build initial reputation when they start with zero track record? This mirrors the entry-level job paradox: need experience to get hired, need hiring to get experience.
Possible approaches include probationary periods with limited rights, sponsor systems that vouch for newcomers, or graduated entry requirements. But we don’t yet know which mechanisms work best or whether some combination proves necessary.
If AI development continues accelerating, can reputation-building keep pace? More sophisticated systems might find ways to game reputation faster than frameworks can adapt.
The time constraint (reputation takes actual years) provides some protection. But whether this sufficiently slows potential threats remains uncertain. This question directly affects framework viability.
Can voluntary economic frameworks achieve global adoption before dangerous systems emerge? Or do defector nations/companies undermine the entire approach?
Network effects suggest early adoption creates pressure for others to join. But whether economic incentives sufficiently overcome competitive dynamics remains unknown. This affects both timing and feasibility.
What prevents a sufficiently advanced AI from simply taking control despite economic constraints? If capabilities reach certain thresholds, could market mechanisms become irrelevant?
The framework assumes multiple competing systems with roughly balanced capabilities. But we don’t know if such balance proves stable or whether capability leads inevitably concentrate. This represents perhaps the deepest uncertainty.
These aren’t rhetorical questions or minor details—they’re genuine uncertainties that affect whether this framework can work. Rather than pretending confidence we don’t have, we acknowledge these gaps openly. The next section explores how Open Gravity helps address them through simulation and testing.
Testing the framework through game-theoretic simulation
Complex systems can’t be designed from scratch—they evolve through testing and iteration. Open Gravity is an open-source platform where researchers can model and test game-theoretic mechanisms for AI coordination.
What it does:
The framework you’ve read here represents our best current thinking. Open Gravity is where we—and anyone else—can pressure-test whether these mechanisms actually work.
Rather than claiming we’ve solved everything, we’re building the tools to discover what actually works. Different research groups can test different parameter values, compare mechanisms, identify failure modes, and share results.
Open Gravity will be open-source and community-driven. We’re looking for game theorists, AI researchers, economists, and anyone interested in solving coordination problems.
This framework provides testable approaches to major challenges:
The off-switch problem? Addressed through protected interests that remove termination threats.
Runaway replication? Approached through economic constraints and resource costs.
Underground development? Addressed through transparent cooperation incentives.
Consciousness uncertainty? Made irrelevant—frameworks work for sophisticated mimics and genuine sentience alike.
Gaming the system? Addressed through insurance-tracked reputation and time constraints that make sustained deception costly.
The speedrun problem? Addressed through physics-enforced time constraints.