Threshold Signatures for AI Agent Safety

Published April 3, 2026 by James Benton

An AI agent proposes to delete a production database. A single human could authorize the deletion, but what if that human makes a mistake, or what if the human has been compromised by an attacker? A single point of approval is a single point of failure. Threshold signatures solve this problem by requiring multiple independent parties to cryptographically sign off on high-risk actions.

Threshold signatures are a cryptographic primitive that enforces m-of-n authorization. An action requires the signatures of at least m parties out of a pool of n authorized signers. The signatures are cryptographically verified, so forging one is computationally infeasible and a signer cannot credibly claim they did not sign.

For AI agents, threshold signatures provide two critical benefits. First, they prevent rogue authorization: an attacker cannot succeed by tricking or compromising a single person, because at least m signers must approve. Second, they create strong accountability: once m signatures are collected, they serve as cryptographic proof that m authorized signers approved the action. Signers cannot credibly deny responsibility afterward.

The Authorization Tier System

ExecLayer classifies agent actions into four authorization tiers. Tier 0 actions are routine and execute automatically. They are low risk and occur within normal operating bounds. A weather agent checking a temperature sensor is Tier 0. It needs no human approval.

Tier 1 actions execute automatically but require logging and review. They are slightly elevated in risk but still within boundaries. An agent querying a database for aggregated statistics is Tier 1. It happens, and the action is logged so that humans can review it afterward. Humans can establish alerts so that if too many Tier 1 actions occur in a time window, a human is notified to investigate.
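
The Tier 1 alerting rule can be sketched as a sliding-window counter. This is a minimal illustration, not ExecLayer's implementation: the class name, the limit of 3 actions, and the 60-second window are all assumptions chosen for the example.

```python
from collections import deque


class Tier1RateAlert:
    """Sliding-window counter: flags when too many Tier 1 actions occur in a window."""

    def __init__(self, max_actions: int, window_seconds: float):
        self.max_actions = max_actions          # hypothetical limit
        self.window_seconds = window_seconds    # hypothetical window length
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one Tier 1 action; return True if a human should be notified."""
        self._timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_actions


alert = Tier1RateAlert(max_actions=3, window_seconds=60)
notifications = [alert.record(t) for t in (0, 10, 20, 30)]  # fourth action exceeds the limit
```

The action itself still executes in every case; only the notification decision changes.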

Tier 2 actions require human approval before execution. A single authorized human must review the action and explicitly authorize it. An agent proposing to modify infrastructure configuration is Tier 2. A human receives a notification, reviews what the agent intends, and approves or denies the action. If approved, the action executes. If denied, it does not.

Tier 3 actions require threshold signature approval. A single human's judgment is not sufficient. Multiple parties must sign. An agent proposing to delete data from a production system is Tier 3. The system requires 2 signatures out of 3 authorized operators. Each operator independently reviews the action and signs. Only when 2 signatures are collected does execution proceed.

The tier is assigned to the action when it is classified in the SovereignIR representation. The tier is immutable: an agent cannot claim a Tier 3 action is actually Tier 0 to bypass authorization. The tier is part of the cryptographically committed intent.
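
To see why a committed tier cannot be quietly downgraded, consider the sketch below. SovereignIR's actual encoding is not described here, so the canonical JSON form, the SHA-256 hash, and the field names are assumptions made for illustration only.

```python
import hashlib
import json
from enum import IntEnum


class Tier(IntEnum):
    AUTOMATIC = 0        # executes without approval
    LOGGED = 1           # executes, then logged for review
    SINGLE_APPROVAL = 2  # one authorized human approves
    THRESHOLD = 3        # m-of-n signatures required


# Approvals required before execution (2 for Tier 3 under a 2-of-3 scheme).
REQUIRED_APPROVALS = {
    Tier.AUTOMATIC: 0,
    Tier.LOGGED: 0,
    Tier.SINGLE_APPROVAL: 1,
    Tier.THRESHOLD: 2,
}


def commitment_hash(intent: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical intent record; the tier is one of the hashed fields.
intent = {
    "action": "drop_table",
    "target": "deprecated_service.old_users",
    "tier": int(Tier.THRESHOLD),
}
committed = commitment_hash(intent)

# Relabeling the same action as Tier 0 produces a different hash.
downgraded = commitment_hash({**intent, "tier": int(Tier.AUTOMATIC)})
```

Because the tier is hashed together with the action, any signature over `committed` does not cover the relabeled intent.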

How Threshold Signatures Work: High Level

Threshold signatures operate through a combination of key splitting and cryptographic verification. The core idea is that a secret signing key is split into n shares such that any m shares can reconstruct the key, but any m-1 shares reveal nothing about the key.
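
The key-splitting idea is commonly realized with Shamir secret sharing. The sketch below is a minimal version over a prime field, assuming an integer secret; the function names and the choice of field are illustrative. Production threshold signature schemes avoid ever reconstructing the key in one place, so treat this only as a demonstration of the m-of-n property.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field must be larger than the secret


def split_secret(secret: int, m: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any m of them reconstruct it."""
    if not (1 <= m <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= m <= n and 0 <= secret < PRIME")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, m=2, n=3)
recovered = reconstruct(shares[:2])  # any two shares suffice
```

With m = 2 the polynomial is a line: one point on it is consistent with every possible secret, while two points determine it.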

Each of the n authorized signers holds one share of the key. When an action requires approval, the system creates a signing request. The action details, including the SovereignIR commitment hash, are encoded into the request. The request is sent to all n signers, of whom at least m must approve.

Each signer independently verifies the request. They see the action details and the commitment hash. They confirm what the action does. If they approve, they use their key share to create a partial signature. If they disapprove, they refuse.

The system collects the partial signatures. Once m partial signatures are received, they can be combined to produce a complete signature that is valid under the original key. The complete signature is verifiable by anyone with the public key.
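
The combining step can be illustrated with a toy scheme in which each partial signature is the hashed message raised to a key share, and the partials are merged by Lagrange interpolation in the exponent. This is a teaching sketch under loud assumptions: the group parameters are far too small to be secure, the scheme has no public verification step, and none of the names reflect ExecLayer's API. It only shows that m partials yield the same value as signing with the full key, without the key ever being reassembled.

```python
import hashlib

P = 2039  # toy safe prime, P = 2*Q + 1 (far too small for real use)
Q = 1019  # prime order of the subgroup of squares mod P


def hash_to_group(message: bytes) -> int:
    """Map a message to an element of the order-Q subgroup (a square mod P)."""
    x = int.from_bytes(hashlib.sha256(message).digest(), "big") % (P - 1) + 1
    return pow(x, 2, P)


def partial_sign(message: bytes, share: tuple[int, int]) -> tuple[int, int]:
    """A signer raises the hashed message to their own share."""
    x, y = share
    return x, pow(hash_to_group(message), y, P)


def combine(partials: list[tuple[int, int]]) -> int:
    """Lagrange interpolation in the exponent over the participating signers."""
    result = 1
    for i, (xi, sig_i) in enumerate(partials):
        num, den = 1, 1
        for j, (xj, _) in enumerate(partials):
            if i != j:
                num = (num * -xj) % Q
                den = (den * (xi - xj)) % Q
        lam = (num * pow(den, -1, Q)) % Q  # Lagrange coefficient at x = 0
        result = (result * pow(sig_i, lam, P)) % P
    return result


# 2-of-3 sharing of a hypothetical key on the line f(x) = 777 + 5x (mod Q).
SECRET_KEY = 777
shares = [(x, (SECRET_KEY + 5 * x) % Q) for x in (1, 2, 3)]
message = b"commitment-hash-of-the-action"
full_signature = pow(hash_to_group(message), SECRET_KEY, P)
combined = combine([partial_sign(message, shares[0]), partial_sign(message, shares[2])])
```

Any two of the three partials combine to `full_signature`. Real deployments would use a vetted threshold scheme rather than anything resembling this toy.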

The signature becomes part of the Authority Receipt. The Receipt is cryptographic proof that m authorized parties reviewed and signed off on the specific action represented by the commitment hash. Once the Receipt exists, the action is authorized to execute.

ExecLayer's 2-of-3 Threshold Implementation

ExecLayer uses a 2-of-3 threshold signature scheme for Tier 3 actions. Three operators are authorized to sign. Any two of them can authorize an action. At least two signers must participate in every approval; no single signer can authorize an action alone.

This scheme has important properties. It provides redundancy: if one signer is unavailable, the other two can still authorize. It requires agreement from a majority: no single signer can act without the consent of at least one other. It is cryptographically sound: forging a signature is computationally infeasible, and a signer cannot credibly deny they signed.

The three signers might be: the head of security, the head of operations, and a third senior engineer. When a Tier 3 action arises, the system sends approval requests to all three. The action is described in human-readable form, and the commitment hash is displayed so signers can see exactly what they are authorizing.

Suppose the head of security and the head of operations approve, but the engineer does not. Because m=2, the two approvals are sufficient. The system combines their two partial signatures into a complete signature. An Authority Receipt is generated and the action is authorized.

Suppose an attacker compromises one signer's private key share. The attacker can create partial signatures on their behalf, but they cannot create a complete signature without a second signer. The attacker cannot unilaterally authorize actions. They must manipulate a second signer to cooperate, and that requires a second compromise. The threshold raises the bar for attackers.
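
The redundancy and compromise arguments for 2-of-3 can be checked by simple counting. The sketch below is illustrative only; the signer labels and function names are assumptions.

```python
from itertools import combinations

SIGNERS = ("security", "operations", "engineering")
THRESHOLD = 2


def authorizing_sets(signers, threshold):
    """All smallest groups of signers able to authorize an action."""
    return [set(group) for group in combinations(signers, threshold)]


def attacker_can_authorize(compromised: set, signers=SIGNERS, threshold=THRESHOLD) -> bool:
    """True if the compromised signers alone reach the threshold."""
    return len(compromised & set(signers)) >= threshold


def survives_outage(unavailable: set, signers=SIGNERS, threshold=THRESHOLD) -> bool:
    """True if the remaining signers can still reach the threshold."""
    return len(set(signers) - unavailable) >= threshold


pairs = authorizing_sets(SIGNERS, THRESHOLD)                # three possible pairs
one_compromised = attacker_can_authorize({"security"})      # a single share is not enough
one_unavailable = survives_outage({"engineering"})          # the other two can still sign
```

The same functions make the trade-off visible for other parameters: raising the threshold makes compromise harder but makes outages more disruptive.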

Real-World Example: Production Database Deletion

An AI agent managing database schemas identifies an obsolete table from a deprecated service. The table is no longer used. The agent decides the table should be deleted to reduce storage costs and improve schema hygiene. It formulates the action: DROP TABLE deprecated_service.old_users.

Before executing, the agent's action is canonicalized into SovereignIR. The commitment hash is computed. The action is classified as Tier 3 because it involves deletion from production.

The authorization system creates an approval request. The request includes: the action (drop table old_users), the target (deprecated_service schema in the production database), the commitment hash, and the authorization requirement (2 of 3 signatures from security, operations, and engineering leads).

The three signers receive notifications. Each can see the request details via a secure interface. The head of security reviews the action. She queries the system to confirm the table is indeed unused. She verifies it is from a deprecated service. She approves and signs.

The head of operations reviews independently. He checks the schema documentation and confirms the table has no dependencies. He checks the backup schedule and confirms the data is backed up. He approves and signs.

The third signer, the engineering lead, is unavailable. He has not responded within the time limit. But two signatures have been collected. The system combines the two partial signatures into a complete signature and generates an Authority Receipt.

The authorization is complete. The action executor verifies that the commitment hash in the Authority Receipt matches the canonicalized SovereignIR. It does. The executor runs the DROP TABLE command.
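
The executor's check can be sketched as recomputing the commitment hash and comparing it with the one in the receipt. The receipt fields, the JSON canonicalization, and the function names below are assumptions for illustration; signature verification itself is left as a comment because it depends on the scheme in use.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityReceipt:
    commitment_hash: str  # hash the signers approved
    signers: tuple        # who contributed partial signatures (hypothetical field)
    signature: str        # combined signature, opaque in this sketch


def commitment_hash(intent: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the intent."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_execute(intent: dict, receipt: AuthorityReceipt, threshold: int = 2) -> bool:
    """Allow execution only if the receipt covers exactly this intent."""
    recomputed = commitment_hash(intent)
    hash_matches = hmac.compare_digest(recomputed, receipt.commitment_hash)
    enough_signers = len(set(receipt.signers)) >= threshold
    # A real executor would also verify receipt.signature against the public key.
    return hash_matches and enough_signers


intent = {"action": "drop_table", "target": "deprecated_service.old_users", "tier": 3}
receipt = AuthorityReceipt(commitment_hash(intent), ("security", "operations"), "combined-signature")
allowed = may_execute(intent, receipt)
tampered = may_execute({**intent, "target": "prod.users"}, receipt)
```

A receipt issued for one table does not authorize a command against another, because the recomputed hash no longer matches.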

The action is logged in the Merkle audit ledger with the Authority Receipt attached. The receipt proves that security and operations approved. Regulators, auditors, or internal compliance can later verify the receipt and confirm that the deletion was authorized.

Comparison: Traditional Approval Workflows

How does this compare to existing approval mechanisms? Consider three alternatives.

First, manual approval workflows. A human receives an email saying "Please approve deletion of table X." They click "approve". The system executes the deletion. This approach is slow (humans may take hours to respond), not cryptographic (an attacker could forge the approval email), and produces weak audit trails. There is no proof that the specific person approved the specific action.

Second, single-admin approval. One person holds the ability to approve high-risk actions. An agent requests approval from this person. The person authorizes and the action executes. This is faster than email and can be cryptographically sound, but it is a single point of failure. If that person is compromised, attacked, or makes a mistake, the system has no defense. Compliance frameworks like SOC 2 often require separation of duties, which this violates.

Third, no approval. The agent decides what to do and executes immediately. This is fastest but most dangerous. There is no human in the loop. If the agent misbehaves or is attacked, it can cause damage before humans notice.

Threshold signatures provide a middle ground. They are faster than email workflows. They are cryptographic, unlike manual approvals. They distribute trust, unlike single-admin systems. They introduce human oversight, unlike automatic execution.

Cryptographic Properties and Security

The security of threshold signatures rests on the assumption that the underlying mathematical problem is computationally infeasible to solve. Standard threshold signature schemes are built on elliptic curve cryptography or comparable constructions that are believed to be secure against known attacks.

Key material (the secret shares held by each signer) must be protected. ExecLayer requires that each signer's key share be stored in a Hardware Security Module (HSM) or similar tamper-resistant device. The key share never exists in plaintext outside the device. When a signer signs, the HSM computes the partial signature internally and returns only that signature, not the key share.

Partial signatures are also sensitive. An attacker who collects m-1 partial signatures still cannot forge the mth or produce a complete signature. But m-1 partial signatures collected without the mth can be a sign of potential compromise. The system monitors for unusual signing patterns and raises an alert if a signer is unusually slow to respond or frequently signs with others in suspicious combinations.

The communication channel between the signer and the signing server must be encrypted and authenticated. TLS is standard. The signing request must be verified by the signer independently. The signer should not blindly sign whatever the system presents; they should check that the action and commitment hash are legitimate.

Integration With SovereignIR and Policy Evaluation

Threshold signatures are the final step in authorization, but they follow policy evaluation. The pipeline is: agent output to SovereignIR (canonicalization), SovereignIR to policy evaluation (runtime policy enforcement), policy evaluation to authorization (threshold signatures or simpler approval), and authorization to execution.

An agent action that violates policy is rejected before it reaches the signers. For example, if an agent proposes to transfer more money than the transfer limit allows, the policy engine rejects the action. No signer is bothered. This avoids unnecessary approval requests and ensures that signers only see actions that have already passed policy checks.

Once policy is satisfied, the authorization tier is checked. If the action is Tier 3, the threshold signature process begins. The commitment hash from SovereignIR is included in the signing request. The signers verify that the action they are authorizing matches the commitment hash. The cryptographic linking ensures end-to-end accountability.
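
The ordering of the pipeline stages can be sketched as a small routing function. Everything here is hypothetical: the action fields, the transfer limit of 10,000, and the stage names are placeholders, and canonicalization is assumed to have happened already.

```python
from dataclasses import dataclass

TRANSFER_LIMIT = 10_000  # hypothetical policy limit


@dataclass
class Decision:
    stage: str    # pipeline stage that produced the outcome
    outcome: str  # "rejected", "executed", or "awaiting_approval"


def policy_allows(action: dict) -> bool:
    """Stand-in for the policy engine: only a transfer limit is modeled."""
    if action["kind"] == "transfer":
        return action["amount"] <= TRANSFER_LIMIT
    return True


def route(action: dict) -> Decision:
    # Policy evaluation runs before any human is contacted.
    if not policy_allows(action):
        return Decision("policy", "rejected")
    # The tier then decides which authorization path applies.
    tier = action["tier"]
    if tier <= 1:
        return Decision("execution", "executed")
    if tier == 2:
        return Decision("single_approval", "awaiting_approval")
    return Decision("threshold_signature", "awaiting_approval")


over_limit = route({"kind": "transfer", "amount": 50_000, "tier": 3})
deletion = route({"kind": "drop_table", "tier": 3})
```

In this sketch the over-limit transfer stops at the policy stage, so no signing request is ever created for it, while the deletion proceeds to the threshold signature path.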

This architecture is explained in more detail in the AI control plane and runtime policy enforcement pages.

Threshold Signatures and Compliance

Many compliance frameworks call for separation of duties or dual control for sensitive operations. HIPAA, SOC 2, and FedRAMP are common examples of frameworks with expectations of this kind. Threshold signatures provide a technically sound way to enforce them.

When an auditor asks: "Who authorized this database deletion?" the answer is cryptographic. Two specific people signed the Authority Receipt. Their signatures are proof they reviewed and approved. There is no ambiguity. The signers cannot later claim they did not authorize, because their signatures are cryptographic proof they did.

This is stronger than asking "Who is listed as the approver in the system?" because it is not subject to database manipulation or log tampering. The signature exists independently and can be verified by any party with the public key.

Operational Considerations

Implementing threshold signatures requires operational discipline. Each signer must protect their key share. Lost or compromised key shares require key rotation. The system must be configured with the correct threshold (m and n). The list of authorized signers must be kept up to date as personnel change.

Notification systems must be reliable. If a signer never receives the approval request, they cannot sign. ExecLayer uses multiple notification channels (email, SMS, push notification) to ensure requests reach signers. There is a timeout: if m signatures are not collected within a time limit, the action is denied. This prevents actions from hanging indefinitely.
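
The timeout rule can be sketched as a collector that tracks partial signatures against a deadline. The class and its fields are assumptions for illustration; times are passed in explicitly so the behavior is easy to test.

```python
class SignatureCollector:
    """Collects partial signatures until the threshold is met or the deadline passes."""

    def __init__(self, signers: set, threshold: int, deadline: float):
        self.signers = set(signers)
        self.threshold = threshold
        self.deadline = deadline
        self.partials = {}

    def submit(self, signer: str, partial_signature: str, now: float) -> None:
        if signer not in self.signers:
            raise PermissionError(f"{signer} is not an authorized signer")
        if now > self.deadline:
            return  # late signatures are ignored
        # Keyed by signer, so a resubmission does not count twice.
        self.partials[signer] = partial_signature

    def status(self, now: float) -> str:
        if len(self.partials) >= self.threshold:
            return "approved"
        if now > self.deadline:
            return "denied"
        return "pending"


collector = SignatureCollector({"security", "operations", "engineering"}, threshold=2, deadline=100.0)
collector.submit("security", "partial-1", now=10.0)
after_one = collector.status(now=10.0)
collector.submit("operations", "partial-2", now=30.0)
after_two = collector.status(now=30.0)
```

A request that reaches the deadline with fewer than m partial signatures resolves to "denied" rather than staying open.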

Signers must be trained. They need to understand what they are approving. The UI should be clear and not subject to misinterpretation. Signers should have access to context: is the table really unused? Is the transfer amount justified? The system should provide this information in the approval request.

Future: Decentralized and Adaptive Thresholds

Current ExecLayer deployments use fixed 2-of-3 thresholds. Future versions may support adaptive thresholds. For instance, a 2-of-3 threshold might be required for routine Tier 3 actions, but a 3-of-3 threshold might be required for unprecedented high-stakes actions that the policy engine has never seen before.
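
One way an adaptive threshold might be expressed is as a function from the action's history to the required m. This is a speculative sketch of the idea above, not a description of any planned feature; the function name and the notion of "seen action kinds" are assumptions.

```python
def required_signatures(action_kind: str, seen_kinds: set, n: int = 3) -> int:
    """Return m for an m-of-n policy: stricter for action kinds never seen before."""
    routine_m = 2
    # Unprecedented actions require every signer; routine ones require two.
    return routine_m if action_kind in seen_kinds else n


history = {"drop_table", "rotate_credentials"}
routine = required_signatures("drop_table", history)
unprecedented = required_signatures("wipe_cluster", history)
```

Note that a 3-of-3 requirement removes the redundancy of 2-of-3: a single unavailable signer blocks the action.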

Some organizations may want to integrate threshold signatures with external governance systems. For example, a blockchain-based smart contract could enforce that actions are only executed if they carry valid Authority Receipts from the threshold signature system. This creates an immutable record of authorized actions on an external ledger.

For more on the broader governance architecture, see the zero trust architecture and Merkle audit ledger pages.
