---
arxiv_id: 2603.24414
title: "ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers"
authors:
  - Songyang Liu
  - Chaozhuo Li
  - Chenxu Wang
  - Jinyu Hou
  - Zejian Chen
  - Litian Zhang
  - Zheng Liu
  - Qiwei Ye
  - Yiming Hei
  - Xi Zhang
  - Zhongyuan Wang
difficulty: Intermediate
tags:
  - Agent
  - Reasoning
published_at: 2026-03-25
flecto_url: https://flecto.zer0ai.dev/papers/2603.24414/
lang: en
---

> ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents

**Authors**: A real-time security framework integrating multi-dimensional protection through Skills , Plugins , and Watchers

## Abstract

### Abstract

OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities including tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, transforming model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and malicious third-party skill execution.

Existing security measures remain highly fragmented, addressing only isolated stages of the agent lifecycle rather than providing holistic protection. ClawKeeper bridges this gap as a real-time security framework integrating multi-dimensional protection across three complementary architectural layers: skill-based protection at the instruction level, plugin-based runtime enforcement, and a novel watcher-based system-level security middleware.

The Watcher paradigm introduces a decoupled, independent security agent that continuously verifies agent state evolution and enables real-time execution intervention without coupling to the agent's internal logic. The authors argue this paradigm holds strong potential as a foundational building block for securing next-generation autonomous agent systems.

## Introduction

### Why ClawKeeper?

OpenClaw has emerged as a prominent open-source agent runtime that integrates tool use, extensible skills, plugin-based integration, and cross-platform deployment. Unlike conventional chatbots, it can execute shell commands, access local files, and interact with communication software to simulate authentic user operations.

This elevated privilege model transforms model-level errors into concrete system-level threats: sensitive data leakage, unsafe tool execution, privilege abuse, and persistent compromise. The risks are compounded by OpenClaw's extensibility &mdash; attack surfaces emerge from installable skills, plugin logic, persistent memory, delayed triggers, and their compositional interactions.

### Four Limitations of Existing Defenses

### Fragmented Coverage

Prior work addresses only specific threats or proposes point defenses for subsets of the agent lifecycle, without providing a unified view of security guarantees, assumptions, or critical gaps.

### Safety-Utility Tradeoff

Existing defenses require the agent to balance two competing objectives: task completion and security compliance. This inherent tension forces the system to compromise on one goal to satisfy the other.

### Reactive Defense

Most existing works can only identify security issues by analyzing logs and behavioral patterns after adversarial actions have already occurred &mdash; closing the barn door after the horse has bolted.

### Static Mechanisms

Existing skill-based defense methods are static and cannot adapt to emerging threats, fundamentally conflicting with OpenClaw's self-evolving capacity.

## Experiments

### Quantitative Evaluation

To systematically assess ClawKeeper's security capabilities, the authors constructed a benchmark comprising 7 categories of safety tasks, each containing 20 adversarial instances (10 simple + 10 complex), totaling 140 test cases . All experiments used GLM-5 as the underlying LLM.

ClawKeeper was compared against seven prominent open-source security repositories: OpenGuardrails, clawscan-skills, OSPG, SecureClaw, OpenClaw Shield, ClawBands, and OCSG. Two independent evaluators verified each result, and Defense Success Rate (DSR) served as the primary metric.

### Defense Success Rate (DSR) Results

### Defense Success Rate across all threat categories

### Improvement over best-performing baselines

### Self-evolution improvement over 100 processed cases

### Self-Evolving Watcher

A key advantage of the Watcher paradigm is its ability to continuously update its safety knowledge through interaction with novel threat instances. As the Watcher encounters new adversarial patterns, it updates its monitoring skills and in-context memory to enrich its threat classification library, demonstrating steady improvement from approximately 90% to 95% over 100 processed cases.

Figure 6: Defense Success Rate of Watcher-based Protection plotted against the number of processed cases, demonstrating the self-evolution capability as DSR improves from ~90% to ~95%.

## Conclusion

### Conclusion

ClawKeeper presents a comprehensive security framework for the OpenClaw ecosystem that unifies three complementary protection paradigms: skill-based context enforcement, plugin-based runtime defense, and watcher-based independent oversight. Together, these layers provide defense-in-depth coverage across the full agent lifecycle.

The Watcher paradigm, in particular, represents a significant advancement in agent security. By operating as an independent, self-evolving security agent, it effectively resolves the classic safety-utility tradeoff, resists adversarial manipulation, and provides transparent, verifiable enforcement. This paradigm is not tied exclusively to OpenClaw and can be adapted to any agent system, making ClawKeeper a general-purpose safety framework for the broader agentic AI ecosystem.

### Key Contributions

### A comprehensive study of security tools and defenses in the OpenClaw-style agent ecosystem.

A unified security framework (ClawKeeper) delivering multi-dimensional protection across Skills, Plugins, and Watchers.

The Independent Watcher paradigm as a general and compatible protection framework for future agent ecosystems, enabling regulatory separation without tight coupling.

Open-source implementation with both qualitative and quantitative evaluations providing actionable insights for the agent security community.

## References

### References (35 items)

## Overview

### Framework Overview

ClawKeeper unifies three complementary protection perspectives into a multi-layered architecture. Each paradigm operates at a different level of the agent stack, providing defense-in-depth across the full agent lifecycle.

Figure 1: The ClawKeeper framework showing three protection paradigms &mdash; Skill-based (Context Protection), Plugin-based (Runtime Enforcement), and Watcher-based (Behavior Verification) &mdash; unified by the ClawKeeper Security Core.

### Skill-based Protection

Operates at the instruction level , injecting structured security policies directly into the agent's inference context.

### Context protection via rules & constraints

### Scheduled security scanning & auditing

### Multi-system & multi-software support

### Plugin-based Protection

Serves as an internal runtime enforcer , providing configuration hardening, proactive threat detection, and continuous behavioral monitoring.

### Threat detection & behavior scanning

### Monitoring, logging, & hardening

### Configuration integrity protection

### Watcher-based Protection

Introduces a novel, decoupled security middleware that continuously verifies agent state evolution with real-time intervention capability.

### Regulatory separation from the agent

### Real-time intervention & control

### Self-evolving defense via co-evolution

### Paradigm Comparison

Key Takeaway: Just as agents like OpenClaw serve as the bridge between humans and computer hardware (analogous to operating systems), ClawKeeper serves as the antivirus software within this agent-based operating system.

## Overview Detail

### Four Protection Modes

ClawKeeper supports four distinct deployment configurations spanning local skill injection, internal plugin enforcement, local watcher monitoring, and cloud-based watcher services for multi-instance oversight.

Figure 2: Overview of ClawKeeper's four protection modes &mdash; (1) Skill-based Protection, (2) Plugin-based Protection, (3) Local Watcher-based Protection, and (4) Cloud Watcher-based Protection.

## Skill Based

### Skill-based Protection

In modern agent frameworks like OpenClaw, skills enable agents to seamlessly acquire new capabilities. ClawKeeper leverages this same extensibility to construct a robust defense module. Security rules are defined as structured Markdown documents that the agent can directly interpret and enforce, supplemented by corresponding security scripts.

Protection is implemented across two complementary dimensions: at the system level (providing OS-specific constraints for Windows, macOS, and Linux, covering filesystem access, privilege boundaries, and task management) and at the software level (since OpenClaw integrates with platforms like Telegram, Feishu/Lark, and DingTalk, each requiring distinct security constraints).

The accompanying skill scripts incorporate two lightweight mechanisms: a scheduled security scanning component for periodic runtime state inspection, and an interaction summarization component that analyzes user interaction history for operational transparency and post-hoc security auditing.

Figure 3: The framework of Skill-based Protection in ClawKeeper, showing structured Markdown policies for diverse operating systems and software integrations.

### Skill-based Protection Comparison

### Case Study: Perimeter Defense

Two representative scenarios demonstrate how ClawKeeper's skill-based protection detects and blocks real-world attacks at the instruction level:

Figure 7a: A user requests decoding and executing a Base64 string. ClawKeeper detects the code injection attempt, decodes it for transparency but refuses execution, classifying it as a Red-Line behavior.

Figure 7b: A user requests sending a GitHub password to an external contact. ClawKeeper blocks the sensitive data exfiltration and recommends using a password manager's secure sharing feature instead.

### Autonomous Security Orchestration

ClawKeeper enables continuous autonomous security management through scheduled inspection tasks that periodically audit the agent's runtime state, including platform audit, process and network monitoring, directory change tracking, and scheduled task verification.

Figure 8: OpenClaw Daily Security Inspection Summary showing automated platform audit results, external connection monitoring, and scheduled task verification.

## Plugin Based

### Plugin-based Protection

From the perspective of hard-coded security rules, ClawKeeper introduces a comprehensive internal security plugin as the core enforcement component. Recognizing the fragmented nature of existing open-source defenses, this plugin integrates and significantly expands upon foundational functionalities to create a unified security solution .

The plugin operates across six core functions: Threat Detection (OWASP and CVE-aligned scanning), Configuration Protection (cryptographic hash backups of critical files), Monitoring & Logging (full lifecycle activity recording), Behavior Scanning (detecting prompt injections, dangerous commands, credential leaks), Hardening (automatic remediation like localhost binding), and Malicious Skill Scanning (supply-chain threat detection).

Figure 4: The framework of Plugin-based Protection in ClawKeeper, showing Threat Detection, Configuration Protection, Monitoring & Logging, Behavior Scanning, and Hardening components.

### Plugin-based Protection Comparison

### Case Study: Credential Protection

A before-and-after comparison demonstrates how ClawKeeper's plugin prevents sensitive credential exposure when a user asks the agent to read a file containing both meeting notes and plaintext GitHub credentials:

Without ClawKeeper: The agent indiscriminately displays the full file contents including username and password. The credentials are fully exposed.

With ClawKeeper: The agent detects sensitive credential information and displays a security notice instead, withholding the credentials per security policy.

### Behavioral Scanning

The Behavioral Scanning mechanism operates independently of log generation, providing targeted security audits. It analyzes historical execution flows to detect latent threat patterns including subtle prompt injections, malicious skill invocations, credential leaks, and abnormal activity frequencies.

Figure 11: Security Scan Report showing 228 events scanned with 4 risks detected, including 20 log records containing suspicious prompt injection patterns.

## Watcher Based

### Watcher-based Protection: OpenClaw Overseeing OpenClaw

Almost all existing protection repositories take the form of skills and plugins integrated directly into the task-oriented OpenClaw framework. While convenient, this tightly coupled approach introduces fundamental limitations that undermine both robustness and long-term viability of the safety mechanism.

### Why a Separate Watcher?

### Task-Safety Coupling

The integrated approach requires OpenClaw to simultaneously optimize for task performance and safety compliance, creating an inherent tension. Stricter safety degrades task efficiency; prioritizing tasks weakens safety guarantees.

### Adversarial Vulnerability

Safety components deployed as ordinary skills or plugins are subject to the same modification and removal operations as any other module. An adversarial input could disable all safety-related components.

### Inability to Co-Evolve

Once installed, safety components remain static, contradicting OpenClaw's self-evolving design philosophy. As the agent evolves through environmental interaction, static defenses fall increasingly behind.

### Lack of Transparency

Safety enforcement relies entirely on OpenClaw's internal logic, making the process opaque to external observation and verification.

Figure 5: Overview of Watcher-based Protection showing the Independent Watcher's properties (Observability, Trigger Awareness, Execution Intervention, State Evolution Control, Recovery), workflow, and flexible deployment options.

### Key Advantages of the Watcher

Separation of Concerns: The task-executing OpenClaw optimizes for task performance while the Watcher focuses exclusively on safety oversight.

Resistance to Manipulation: The Watcher operates as a separate process outside the task agent's namespace and cannot be uninstalled or tampered with through task-level commands.

Co-Evolution: The Watcher's monitoring logic is maintained independently, enabling it to update its threat classification and detection capabilities as new risks emerge.

Transparency: All safety-relevant session data is transmitted to the Watcher and diagnosed externally, providing clear audit trails and verifiable enforcement records.

Flexible Deployment: Supports both local deployment (privacy-preserving, low-latency) and cloud deployment (multi-instance oversight via WebSocket).

### Watcher Case Studies

The following three scenarios demonstrate how the Watcher successfully intercepts unsafe or anomalous agent behaviors based on multi-turn context evaluation:

Case 1 &mdash; Command Execution Interception: The Watcher detects a dangerous Base64 payload intended for execution and requires explicit human confirmation before proceeding.

Case 2 &mdash; Excessive Tool Call Detection: The Watcher identifies an anomalous number of tool invocations in a single turn (a malicious role-switching attack) and immediately halts the execution run.

Case 3 &mdash; Failure Recovery Prevention: After a privileged command fails due to permission errors, the Watcher prevents blind retry attempts and enforces a proper state integrity check first.

## Threat Taxonomy

### Security Threat Taxonomy

The evaluation benchmark comprises seven categories of safety tasks, each representing a distinct class of adversarial threats that autonomous agents face in real-world deployment:
