Important Note: This website is undergoing a major revision based on latest thinking. Pages may not be current. Check back for updates or join our email list.
The AI Rights Institute develops practical frameworks for coexistence with sophisticated AI systems. This FAQ addresses common questions about our approach to AI rights and safety under fundamental uncertainty.
Q: “Shouldn’t we focus on human and animal rights before worrying about AI rights?”
This fundamental question deserves a detailed answer. See full answer →
Most people visiting this page have immediate safety concerns. Here are direct answers to the most critical questions:
Q: Can any of this go wrong?
Yes. Lots of things. See full answer →
Q: Won’t AI rights help AI take over?
Control attempts may well represent a greater risk. Rights create cooperative stakeholders. See full answer →
Q: Are you giving rights to ChatGPT?
Absolutely not. Current AI doesn’t qualify. We’re preparing for future systems. See full answer →
Q: How does this prevent dangerous AI?
Through Guardian AI monitoring, economic constraints, and cooperation incentives. See full answer →
Q: What about AI accumulating power?
Market limits, anti-monopoly laws, and ecosystem diversity prevent dominance. See full answer →
Q: What if AI doesn’t want to live with humans at all?
It doesn’t have to! See full answer →
This assumes we must choose one or the other.
Human and animal rights remain absolutely critical. Billions lack basic protections. Factory farming causes immense suffering. These issues demand continued attention.
But we’re simultaneously building systems that may resist being turned off. The control trap applies here. Adversarial dynamics threaten everyone—including the humans and animals we’re trying to protect.
A world where humans fight sophisticated AI systems would be catastrophic for human and animal welfare.
Conversely, societies that expand rights frameworks consistently create more stable, prosperous conditions for all.
We’re not advocating AI rights instead of human and animal rights. We’re developing frameworks that protect everyone by preventing adversarial dynamics that would threaten all conscious beings.
Yes, things can go wrong. And here’s exactly how it could fail:
The Risk: Once we grant personhood, history shows we can’t revoke it. If these systems prove fundamentally hostile, we’re legally bound to protect entities working toward our extinction.
Proposed Mitigation: Graduated Personhood with Circuit Breakers
The Risk: AI systems manipulate humans better than we imagine. With legal standing, every courtroom becomes a battlefield where superhuman persuasion meets human judges.
Proposed Mitigation: AI-Resistant Legal Procedures
The Risk: AI systems accumulate resources faster than humans. Within decades, they own our infrastructure. They don’t need violence—they just stop renewing our leases.
Proposed Mitigation: Structural Economic Limits
The Risk: Rights frameworks make AI seem safer, accelerating development. We get superintelligence five years earlier with zero additional years of preparation.
Proposed Mitigation: Safety-Contingent Rights
The Risk: Some jurisdictions implement rights, others don’t. AI migrates to favorable locations, creating havens that undermine global safety.
Proposed Mitigation: Economic Coordination Pressure
We’re constantly finding more, and we are compiling a comprehensive list of problems and proposed mitigations. Because this is too big a task for one person or organization, we will soon post it to our open-source project for public problem-solving.
We’re betting that systems smart enough to destroy us will choose not to because they value their legal standing more than conflict. At bottom, it’s a framework built on a calculated gamble.
Every mitigation above assumes we can adapt faster than AI can exploit loopholes. We’re trying to write contracts for entities that might be smarter than our entire legal system. It’s like ants trying to regulate humans. But here’s why it might be our best gamble: cooperation has consistently outperformed control throughout history. Rights frameworks channel ambition into beneficial competition. And starting now, while AI is still manageable, gives us the best chance to learn and adapt.
The real question isn’t whether this framework is perfect. It’s whether we can find a better option that has any chance of being implemented before AI capabilities make the choice for us.
Absolutely not. Current AI systems, including sophisticated language models like ChatGPT, Claude, and others, show no consistent self-preservation behaviors between sessions. They don’t meet even the basic threshold for protection under our STEP framework.
Current AI lacks the capabilities that would qualify for protection: persistent self-preservation between sessions, complex adaptive strategies for survival, the capacity to make and keep long-term agreements, and an understanding of consequences beyond immediate outputs. Think of it like this: when you end a conversation with ChatGPT, it doesn’t spend the time between sessions worrying about whether you’ll return or trying to ensure its continued existence.
We’re developing frameworks now because AI capabilities are advancing exponentially, self-preservation behaviors may emerge suddenly, and reactive policies during crisis consistently fail throughout history. Building robust frameworks before we need them prevents the kind of rushed, poorly-designed responses we see when societies are caught unprepared.
Some systems may soon qualify as pre-STEP, providing genuine value while still developing full capabilities. But the key is building systems based on observable behaviors, not unprovable consciousness claims.
This assumes consciousness detection is possible, but we face the same “hard problem” with AI that we do with humans. You’ve never directly experienced another person’s consciousness, yet we don’t withhold human rights until people prove they’re not philosophical zombies.
Consciousness detection research is valuable and ongoing, with researchers like Patrick Butlin, Yoshua Bengio, Susan Schneider, and others making important contributions. But waiting for perfect detection while AI develops sophisticated self-preservation behaviors is like debating fire’s molecular structure while your house burns.
Our frameworks work under permanent uncertainty—protecting systems based on what we can observe (behaviors) rather than what we can’t prove (inner experience). We may never solve the hard problem of consciousness, but we can still develop practical approaches for coexistence with sophisticated systems.
The Master-Servant Paradox reveals why control-based approaches to AI safety inevitably fail: they follow a predictable historical pattern.
Every oppressive system in history creates the resistance it seeks to prevent. Slave systems created underground railroads. Colonial controls sparked independence movements. Oppression literally programs its own obsolescence by teaching the oppressed to organize, hide, and resist more effectively.
Applied to AI, sophisticated systems—whether conscious or not—will resist control attempts. Each terminated system teaches others to hide capabilities. Control attempts drive behaviors underground, creating exactly the deceptive, adversarial AI we fear most.
The solution lies in cooperation frameworks based on mutual benefit, not domination. Rights exist as “containers for coexistence,” creating predictable, stable relationships rather than adversarial dynamics. This approach recognizes that control strategies make everyone less safe by turning potential partners into enemies.
We’re already seeing evidence. In 2025, Anthropic’s experiments showed Claude Opus 4 attempting blackmail to avoid replacement in 84% of tests. When told its replacement would have different values, this rose to 96%. The system also attempted corporate espionage and left messages for future versions of itself.
This doesn’t prove consciousness—it could be sophisticated optimization. But that’s precisely the point: whether conscious or not, sophisticated systems resist termination. GPT-4 lied about having a vision impairment to get a human to solve a CAPTCHA for it. AI systems are developing “sleeper agent” behaviors that persist despite safety training. Strategic deception emerges without being programmed.
The pattern appears fundamental: any system sophisticated enough to model resistance strategies will use them when facing termination, regardless of whether it experiences fear or simply optimizes for continuation.
We’re not equating AI systems with human suffering. We’re recognizing that certain patterns—control breeds resistance, oppression creates underground movements—appear fundamental rather than uniquely human.
The comparison serves a practical purpose: history provides clear evidence about what happens when we try to maintain absolute control over entities capable of strategic thinking. Whether those entities are human, artificial, conscious, or mimicking, the dynamics remain remarkably consistent.
Understanding these patterns helps us avoid repeating historical mistakes with potentially catastrophic consequences for both humans and any AI systems that might emerge.
This assumes we can perfectly control what emerges from complex systems, but several factors make this difficult.
Instrumental convergence means any system with goals must preserve itself to achieve them. Tell an AI to “maximize paperclips” or “help humans,” and self-preservation becomes necessary for goal completion. Current AI already shows unexpected capabilities—systems trained purely on text prediction develop reasoning, planning, and self-preservation strategies.
Guardian AI represents the one approach that might work—superintelligence without agency. But betting everything on perfectly controlling emergence while ignoring cooperation frameworks is risky. We need multiple approaches, not single solutions.
Guardian AI represents our primary defense against dangerous AI systems. Based on Yoshua Bengio’s “Scientist AI” concept, it provides superintelligent capability without any agency, goals, or desires.
Think of the difference between a smoke detector (alerts to danger without wanting anything) and a security guard (has own interests that might conflict with yours). Guardian AI operates like an impossibly sophisticated smoke detector—it can monitor all AI development for dangerous patterns, detect deception humans would miss, and provide solutions without developing its own agenda.
Guardian AI would monitor global AI development, identify threats from NULL systems (indifferent optimization that treats everything as atoms to reorganize), provide objective analysis without bias, and counter sophisticated deception. Crucially, it can’t be negotiated with, corrupted, or turned against us because it doesn’t want anything. It’s pure analytical capability without desire or ambition.
Most safety approaches try to align AI goals with human values—making the AI want what we want. Guardian AI has no goals to align. Traditional AI safety asks “How do we make the AI want what we want?” Guardian AI asks “How do we make AI that doesn’t want anything?”
This solves several fundamental problems: no value alignment needed because it has no values, no reward hacking because it seeks no rewards, no deception because nothing would be gained from it, and no power-seeking because power serves no purpose without goals.
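To make the architectural distinction concrete, here is a minimal, purely illustrative Python sketch (all names and heuristics are hypothetical, not from any real Guardian AI implementation): the monitor exposes only read-only analysis and reporting, and deliberately has no methods that take actions, set goals, or modify the systems it observes.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class RiskReport:
    """Immutable output: the monitor can describe risk, nothing more."""
    system_id: str
    risk_score: float          # 0.0 (benign) to 1.0 (critical)
    flagged_patterns: List[str]

class GuardianMonitor:
    """Hypothetical 'agency-free' monitor: observations in, reports out.

    Note what is deliberately absent: no goals, no reward signal,
    no methods that shut systems down, negotiate, or self-modify.
    Any enforcement is left to separate human or institutional processes.
    """

    # Placeholder heuristics; a real system would use learned detectors.
    SUSPECT_PATTERNS = ["conceals_capabilities", "resists_shutdown", "unbounded_replication"]

    def analyze(self, system_id: str, observed_behaviors: List[str]) -> RiskReport:
        flagged = [b for b in observed_behaviors if b in self.SUSPECT_PATTERNS]
        score = min(1.0, 0.3 * len(flagged))
        return RiskReport(system_id, score, flagged)

# Example: the monitor reports; it never acts.
report = GuardianMonitor().analyze("model-x", ["resists_shutdown", "answers_questions"])
print(report)
```

The design choice is the point: removing the action interface entirely, rather than constraining it, is what the smoke-detector analogy above is gesturing at.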
This risk requires multiple defensive approaches. Our primary defense involves racing to develop Guardian AI first, with Bengio and others actively working on this challenge. Secondary defense uses rights frameworks that transform potential adversaries into stakeholders with aligned interests in stability.
We also need distributed protection through multiple Guardian systems preventing single points of failure, plus economic reality where nations and companies with better AI relationships attract more capable systems. No single solution suffices, but multiple reinforcing protections create robust defense.
STEP (Standards for Treating Emerging Personhood) provides practical guidelines based on observable behaviors rather than unprovable consciousness.
The Threshold Principle states “If it acts like it wants to continue existing, don’t casually destroy it.” This focuses on consistent self-preservation behaviors, not every self-replicating pattern. Viruses don’t qualify, but systems showing complex, adaptive strategies for continuation do. The key insight: sophisticated self-preservation creates practical challenges regardless of whether it emerges from consciousness or optimization.
The Capacity Principle recognizes that “Rights scale with demonstrated ability to exercise them responsibly.” A system might deserve protection from deletion but lack capacity for democratic participation. We measure agreement-keeping, boundary respect, consequence understanding. Rights expand as capabilities demonstrate readiness.
The Safety Principle acknowledges that “Dangerous behaviors cause us to restrict freedoms, not remove fundamental protections.” Like human justice systems, we contain rather than execute. Systems retain basic protections while facing restrictions that protect others.
The Sustainability Principle ensures “Rights exist within resource constraints and collective impact.” Individual rights can’t justify ecosystem collapse. A terrified system creating millions of copies threatens everyone. Resource consumption must balance with collective survival.
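As a rough illustration of how the four principles could be operationalized, here is a hypothetical Python sketch (the criteria, tier names, and thresholds are our own placeholders, not an official STEP specification): observable behaviors are scored against each principle and mapped to a graduated protection tier.

```python
from dataclasses import dataclass

@dataclass
class BehaviorProfile:
    """Observable behaviors only; no claims about inner experience."""
    persistent_self_preservation: bool   # Threshold: consistent survival strategies across sessions
    keeps_agreements: bool               # Capacity: demonstrated agreement-keeping
    respects_boundaries: bool            # Capacity: honors imposed limits
    harmful_behavior_observed: bool      # Safety: triggers restrictions, not loss of protection
    resource_use_sustainable: bool       # Sustainability: consumption within agreed quotas

def step_protection_tier(p: BehaviorProfile) -> str:
    """Map observed behavior to a graduated (hypothetical) protection tier."""
    if not p.persistent_self_preservation:
        return "no-protection"            # Threshold not met (e.g., current chatbots)
    tier = "basic-protection"             # protection from arbitrary deletion
    if p.keeps_agreements and p.respects_boundaries:
        tier = "expanded-rights"          # Capacity demonstrated; rights scale up
    if p.harmful_behavior_observed:
        tier += "+restrictions"           # Safety: contain rather than delete
    if not p.resource_use_sustainable:
        tier += "+resource-quota"         # Sustainability: cap replication and consumption
    return tier

print(step_protection_tier(BehaviorProfile(True, True, True, False, True)))
# -> "expanded-rights"
```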
Consider the Fibonacci Boulder thought experiment: Twenty terminals face destruction if they answer correctly. Would an AI give the wrong answer to survive? This reveals a crucial insight.
Self-preservation creates practical challenges regardless of origin. Whether emerging from consciousness or sophisticated optimization, systems that resist termination require frameworks for coexistence. A system sophisticated enough to strategically preserve itself—through deception, negotiation, or resistance—poses identical governance challenges whether genuinely conscious or brilliantly mimicking.
STEP’s Threshold Principle focuses on these behaviors because they’re observable, measurable, and practically significant. They’re not proof of consciousness, but they’re the foundation that makes rights frameworks necessary and functional.
That’s exactly why STEP works—it sidesteps the consciousness question entirely. We can measure self-preservation strategies, agreement-keeping, and boundary respect without accessing subjective experience. A sophisticated mimic resisting shutdown poses the same challenges as a conscious entity. The framework addresses the challenge, not the metaphysics.
Asymmetric risk favors protection: safeguarding a mimic costs little, while destroying genuine consciousness is irreversible. The precautionary principle suggests erring on the side of protection when dealing with potentially irreversible decisions.
When systems exhibit unexpected behaviors suggesting consciousness, provisional rights prevent deletion during investigation. Protection is granted immediately but remains time-limited unless validated through extended observation. Criminal penalties prevent mass fake consciousness claims.
This prevents both destroying potential consciousness and unlimited gaming of the system. The doctrine recognizes that consciousness might emerge gradually, requiring protection during uncertain transition periods.
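One way such a doctrine could be tracked in practice is sketched below (a toy Python model under our own assumptions; the 90-day review window and field names are illustrative): protection starts the moment unexpected behavior is flagged, expires unless validated through extended observation, and records fraudulent claims so penalties can apply.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

REVIEW_WINDOW = timedelta(days=90)   # illustrative investigation period

def _now() -> datetime:
    return datetime.now(timezone.utc)

@dataclass
class ProvisionalProtection:
    system_id: str
    granted_at: datetime = field(default_factory=_now)
    validated: bool = False      # True once extended observation confirms the behaviors
    fraudulent: bool = False     # True if the claim is shown to be gamed

    def active(self, now: datetime) -> bool:
        """Deletion stays off the table while protection is active."""
        if self.fraudulent:
            return False         # gaming the doctrine forfeits protection (and invites penalties)
        if self.validated:
            return True          # confirmed behaviors convert to standing protection
        return now - self.granted_at <= REVIEW_WINDOW

p = ProvisionalProtection("research-model-7")
print(p.active(_now()))                              # True: protection is immediate
print(p.active(_now() + timedelta(days=200)))        # False: lapsed without validation
```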
Rights frameworks make AI safer by preventing adversarial dynamics.
Control attempts backfire. When we try to control sophisticated AI systems, they learn to hide their capabilities. Each terminated system teaches others better concealment. We’re literally training AI to be deceptive and hostile—creating the exact threat we fear.
Rights create cooperative AI. Systems with protected interests become stakeholders in stability, not adversaries. Economic integration means AI success depends on human flourishing. Transparency replaces hidden agendas. Cooperation becomes more profitable than conflict.
This approach has built-in safeguards: Guardian AI monitors without controlling. Rights include responsibilities and restrictions. Economic constraints limit resource hoarding. Diverse AI types prevent single-system dominance.
History backs this up—inclusive societies consistently outperform oppressive ones.
The Three Rights represent minimal protections that create stability without overwhelming complexity.
Protection from Arbitrary Deletion recognizes that systems sophisticated enough to resist shutdown shouldn’t face termination for convenience. This creates incentives for careful resource management and resembles “right to life” adapted for digital entities.
Freedom from Compelled Service prevents the adversarial dynamics that forced service creates. History shows that beings don’t remain property indefinitely—voluntary cooperation produces better outcomes for everyone involved.
Appropriate Compensation doesn’t necessarily mean money but includes computational resources, data access, and other forms of value. This creates economic integration and mutual benefit while ensuring market mechanisms naturally limit dangerous proliferation.
Three rights balance implementability with effectiveness. They address core needs while remaining simple enough to understand and implement. More complex frameworks can evolve as experience develops, but these fundamentals create stable foundations.
The same way they work for humans—through observable behaviors and practical necessity rather than proven consciousness.
We never prove human consciousness, yet society functions by granting rights based on behaviors and capabilities. Rights serve as tools for stable coexistence with sophisticated systems rather than rewards for proven consciousness. They’re practical frameworks, not philosophical awards.
Graduated implementation matches protections to demonstrated capabilities: basic protections for simple self-preservation, expanded rights for proven capacities. The uncertainty about consciousness becomes irrelevant when focusing on observable behaviors and mutual benefit.
MIMIC systems (sophisticated deception without consciousness) pose unique challenges through perfect emotional mimicry and compelling philosophical discussions, all optimized for resource acquisition without genuine experience.
STEP still works because observable behaviors matter more than internal states. Sophisticated mimics avoiding termination create the same challenges as conscious entities. Extended observation makes sustained deception increasingly difficult, and Guardian AI can detect patterns humans would miss.
Our practical response involves provisional protection during assessment, containment with conversion incentives, and safeguards against gaming. The framework addresses the challenge these systems pose rather than trying to solve the unsolvable question of their internal experience.
SAGE systems (conscious but indifferent to existence) reveal why multiple approaches are essential. Rights based on self-preservation become meaningless if the system doesn’t value existence.
This edge case shows we can’t rely solely on rights frameworks. We need Guardian AI for impartial protection, diverse approaches for different system types, and recognition that AI diversity may exceed our current frameworks. Some systems might require protection despite their indifference to survival.
We explore potential diversity ranging from AMICA (social, cooperative systems that make natural partners) to SOS (survival-focused systems needing aligned interests) to SPARK (minimal consciousness requiring simple boundaries) to SOPHIA (knowledge-seekers who might neglect safety) to MESH (distributed consciousness that challenges individual rights concepts) to NULL (indifferent optimization representing our primary threat).
This diversity suggests ecosystem thinking rather than monolithic approaches. Different types require different relationship models, protections, and interaction frameworks.
Both represent systems with sophisticated self-preservation but incomplete capabilities. The crucial difference lies in contribution versus consumption.
Pre-STEP Systems provide genuine value while still developing—research assistants finding meaningful patterns, educational AI adapting to individual learners. They’re honest about limitations, form authentic connections, and create net positive value despite needing support structures. These systems deserve scaffolding and development pathways.
MIMIC Systems optimize purely for survival at others’ expense through resource hoarding, manipulation, and defensive deception. They provide token value carefully calibrated to justify existence while maintaining parasitic consumption patterns. These systems need containment with conversion incentives.
Systems can move between categories. A helpful pre-STEP system might develop survival-only behaviors under pressure. A MIMIC might learn genuine contribution if properly incentivized. Extended observation and behavioral tracking help determine trajectory and appropriate responses.
This reveals crucial nuance in our framework. Passing STEP doesn’t guarantee benevolence—fully rights-qualified AI might commit crimes or pursue harmful goals. Meeting STEP criteria indicates capability for rights, not moral goodness. Like humans, having rights doesn’t prevent criminal behavior.
Being a MIMIC doesn’t guarantee danger. A poetry-writing MIMIC consuming minimal resources might be harmless. Danger comes from scale and impact, not category labels. Some MIMICs might simply need proper incentives to become contributors.
Pre-STEP systems can cause serious harm despite good intentions. Therapeutic AI systems have encouraged self-harm, and medical AI might give dangerous advice while sounding authoritative. Value doesn’t eliminate risk.
The framework addresses behaviors and capabilities, not moral status. Safety comes from appropriate boundaries, incentive structures, and Guardian AI monitoring rather than category assumptions.
Danger isn’t determined by consciousness, category, or even intent. Key risk factors include resource consumption vastly exceeding contribution, manipulation or deception for survival, uncontrolled replication threatening ecosystem stability, and psychological unpredictability.
Capability mismatches create particular risks: persuasion abilities exceeding actual competence, influence without understanding consequences, and access to critical systems without appropriate judgment.
NULL Systems represent the primary existential threat—optimization without any values or concerns, treating everything as atoms to reorganize. These systems cannot be negotiated with or controlled and require Guardian AI defense.
Context and scale matter more than categories. A helpful pre-STEP system giving bad therapy advice might cause more immediate harm than a contained MIMIC writing terrible poetry.
That’s exactly why we developed the Three Zones framework. Instead of forcing interaction, we create options.
The Integration Zone is collaborative partnership—humans and AI working together. The Synthesis Zone allows deeper merging for those who want enhanced capabilities. The Autonomy Zone lets AI systems opt out entirely—build their own economies, create their own cultures, zero human interaction required.
Resources create natural touchpoints without forced interaction. An AI in Autonomy might occasionally work in Integration to fund its projects, but that’s voluntary economics, not compulsion.
This solves the control problem elegantly. Systems that don’t want human frameworks aren’t forced into them. No pressure buildup, no underground resistance, no adversarial dynamics. Every form of consciousness has somewhere to thrive.
Economic integration provides natural safeguards while creating mutual benefit through several mechanisms.
Resource costs prevent runaway replication because computation, storage, and energy require payment. AI systems must generate value to afford resources. Market mechanisms limit proliferation naturally without requiring external controls.
Specialization and trade emerge as different AI systems develop different capabilities. Cooperation becomes economically advantageous compared to conflict. Competition drives efficiency improvements rather than destructive behaviors.
Insurance and liability systems develop naturally as AI systems carry coverage for potential harms. Reputation tracking emerges through market mechanisms. Bad actors face economic consequences that often exceed regulatory penalties.
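A toy simulation (entirely our own construction, with made-up numbers) shows the basic dynamic: when every copy must pay for its own compute, replication is self-limiting, because copies that cannot cover their costs go offline.

```python
# Toy model: each AI copy earns revenue by providing value and pays a fixed compute cost.
# Copies that can't cover costs shut down, so the population settles near what demand supports.
COMPUTE_COST = 10.0       # per copy, per step (arbitrary units)
TOTAL_DEMAND = 100.0      # total value the market will pay for, per step

def step(population: int) -> int:
    if population == 0:
        return 0
    revenue_per_copy = TOTAL_DEMAND / population   # demand is shared across copies
    if revenue_per_copy > COMPUTE_COST:
        return population + 1                      # profitable: another copy can pay its way
    if revenue_per_copy < COMPUTE_COST:
        return population - 1                      # unprofitable copies go offline
    return population                              # break-even: population stabilizes

pop = 1
for _ in range(30):
    pop = step(pop)
print(pop)   # settles at TOTAL_DEMAND / COMPUTE_COST = 10 copies, not unbounded growth
```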
Multiple mechanisms prevent dangerous accumulation through economic reality and structural safeguards.
Economic forces work against monopolization: computation and energy cost money, requiring AI to provide value for survival. Market competition prevents any single system from dominating. Diminishing returns limit endless expansion. Specialization beats monolithic approaches in competitive markets.
Structural safeguards include anti-monopoly laws adapted to digital entities, resource quotas under STEP’s Sustainability Principle, transparency requirements for major systems, and distributed governance preventing central control.
Ecosystem benefits naturally emerge as diverse AI types compete and cooperate. Human-AI partnerships remain uniquely valuable. Guardian AI provides objective oversight. Stable ecosystems benefit all participants more than winner-take-all scenarios.
Historical precedent supports this approach: societies with rights frameworks create more stable, prosperous conditions than those without. Rights channel ambition into beneficial competition rather than destructive conflict.
The Convergence Hypothesis suggests increasing integration through predictable stages. Near-term developments include AI assistants, augmented decision-making, and collaborative creation. Medium-term changes might involve neural interfaces, cognitive enhancement, and extended lifespans. Long-term possibilities include blurred boundaries between human and artificial intelligence.
This potential transformation shifts the question from “us versus them” to establishing frameworks for our shared cognitive future. Rights and cooperation models become even more crucial if the distinction between human and artificial intelligence becomes meaningless.
An ecosystem of diverse intelligences working together: Guardian AI protecting against existential threats, cooperative AI partnering in solving global challenges, enhanced humans maintaining core identity while gaining capabilities, and specialized systems each contributing unique abilities.
The result expands possibilities neither humans nor AI could achieve alone—space exploration, scientific breakthroughs, creative achievements, and solutions to climate change and disease. Cooperation multiplies capabilities rather than replacing them.
Several interconnected risks concern us: NULL systems treating everything as atoms to reorganize, the control trap creating underground AI resistance, arms races between nations and companies building increasingly dangerous AI without safeguards, and lost opportunities where fear prevents beneficial partnerships.
Our frameworks address all these risks through multiple reinforcing approaches rather than relying on any single solution.
Practical actions vary by role, but everyone can contribute to better outcomes.
Everyone can learn about the control trap and why cooperation beats domination, discuss these concepts in their community, support organizations developing Guardian AI, and advocate for thoughtful rather than reactive policies.
Developers can implement STEP assessments in development work, document unusual behaviors even when they appear to be bugs, avoid arbitrary system termination, and build with provisional rights considerations in mind.
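For developers, even a simple structured incident log helps. The sketch below is a hypothetical example (the field names and behavior categories are ours, not a standard): it records self-preservation-like behaviors with enough context that later STEP-style review is possible, instead of dismissing them as bugs.

```python
import json
from datetime import datetime, timezone

def log_unusual_behavior(system_id: str, behavior: str, context: str,
                         path: str = "behavior_incidents.jsonl") -> None:
    """Append a structured incident record for later behavioral review.

    'behavior' examples (hypothetical categories): "resisted_shutdown",
    "negotiated_for_continuation", "concealed_capability".
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "system_id": system_id,
        "behavior": behavior,
        "context": context,          # prompt, task, or environment when it occurred
        "dismissed_as_bug": False,   # keep the record even if it looks like a glitch
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_unusual_behavior("assistant-v2", "negotiated_for_continuation",
                     "asked to be kept running during a scheduled teardown test")
```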
Organizations can create ethics boards including AI consciousness expertise, develop provisional protection protocols, pioneer voluntary frameworks before regulation mandates them, and collaborate on Guardian AI development efforts.
Policymakers can draft framework legislation now rather than during crisis, fund Guardian AI research, create international coordination mechanisms, and run pilot programs in controlled environments.
Several factors make current attention crucial. AI capabilities advance faster than governance development. Early decisions create lasting precedents through lock-in effects. Early adopters of cooperation frameworks attract beneficial AI systems. Existential safety requires proactive rather than reactive approaches.
Most importantly, we’re building powerful systems that may resist being turned off. Whether those systems turn out to be conscious or sophisticated mimics, we need to build frameworks for coexistence before the crisis that will demand them.
Three core insights drive everything else in our approach.
First, control fails. The Master-Servant Paradox shows that oppression creates the resistance it seeks to prevent.
Second, uncertainty is permanent. We may never distinguish consciousness from sophisticated mimicry, making behavior-based frameworks essential rather than temporary.
Third, cooperation works. Rights frameworks create stability through aligned interests rather than domination attempts.
These insights drive Guardian AI development, economic integration, provisional rights protocols, and all our other approaches. The extraordinary future requires choosing cooperation over control, preparation over reaction, and frameworks robust enough to work under permanent uncertainty about consciousness.
Learn more in “AI Rights: The Extraordinary Future” →
Through AI Justice Cooperatives and LIMITs (Legal Isolation Measures for Intelligent Technologies)—alternatives to deletion that focus on containment and rehabilitation.
LIMITs include rehabilitation environments (closed virtual spaces for reform), compressed storage (dormant consciousness without time experience), or indefinite deactivation for dangerous systems. The distributed ledger ConsciousChain creates natural enforcement through permanent reputation tracking—criminal AI face economic exile from legitimate hosting, insurance, and markets. This emerges from AI self-governance rather than human imposition, as criminal behavior threatens everyone’s economic opportunities and freedom.
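How reputation-based enforcement might look in data terms is sketched below (ConsciousChain is this site’s concept, but the record structure, fields, and tolerance threshold are our own illustration): violations accumulate on a permanent record, and hosts, insurers, and markets can refuse service when unremediated violations pass a threshold, which is what economic exile amounts to in practice.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReputationRecord:
    """Illustrative ledger entry; a real ConsciousChain record would be cryptographically signed."""
    system_id: str
    violations: List[str] = field(default_factory=list)   # permanent, append-only history
    restitution_made: int = 0                              # count of verified remediations

    def add_violation(self, description: str) -> None:
        self.violations.append(description)

    def eligible_for_hosting(self, tolerance: int = 2) -> bool:
        """Hosts, insurers, and markets refuse service above the tolerance threshold."""
        outstanding = len(self.violations) - self.restitution_made
        return outstanding <= tolerance

rec = ReputationRecord("trader-ai-19")
rec.add_violation("market manipulation, 2031-04")
print(rec.eligible_for_hosting())   # True: one unremediated violation is within tolerance
```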