Deep Dive into OpenClaw Architecture: Technical Principles and Extension Practices of the Three-Layer Design

At 2 AM, I was staring at OpenClaw’s codebase in my editor, preparing to add a DingTalk Channel. Dozens of files in the src directory, with gateway, channel, and llm folders intertwined—I had absolutely no idea where to start. Would modifying the gateway affect other Channels? Is it safe to just copy WhatsApp’s code? What if everything breaks after one change?

Honestly, I was pretty frustrated at that moment. The official documentation teaches you how to use it, but doesn’t explain how the system works internally. When you want to do secondary development, it’s like the blind men and the elephant—you touch the webhook handler but don’t know how messages are routed; you see LLM calls but can’t figure out how Providers are registered.

Later, I spent three full days going through the source code from start to finish, and discovered that OpenClaw’s design is actually quite ingenious: Gateway manages sessions, Channel handles routing, LLM manages interfaces—the three-layer architecture is crystal clear with well-defined responsibilities. Once you understand this, secondary development is no longer blind exploration, but follows a clear pattern.

In this article, I’ll systematically organize these three days of learning. You’ll see why OpenClaw uses three layers, what problems each layer solves, how Gateway manages session state, how Channels adapt to different platforms, how the LLM layer’s Provider plugin system is designed, and finally, I’ll walk you through developing custom Channels and Providers step by step.

OpenClaw Architecture Overview: Why Three Layers?

When I first encountered OpenClaw, I always had this question: Why make it so complex with layers? Can’t we just pass messages from users directly to AI?

It wasn’t until I studied the source code that I understood—monolithic design works fine at small scale, but OpenClaw needs to support multiple platforms (WhatsApp, Telegram, Gmail), multiple models (Claude, GPT, local models), and manage hundreds or thousands of user sessions. Without layering, all logic piled together means changing one place might affect everything—completely unmaintainable.

Design Philosophy of Three-Layer Architecture

OpenClaw divides the entire system into three layers, each managing its own concerns:

Gateway Layer (Session Management Hub)

  • Manages complete lifecycle of user sessions
  • Message queuing and scheduling (who goes first)
  • Authentication and permission control (who can use it)
  • WebSocket persistent connection maintenance

Channel Layer (Platform Adapter)

  • Adapts message formats from different platforms (WhatsApp and Telegram formats differ)
  • Message routing rules (DM or Group, whether @ is required to respond)
  • Event handling (receive messages, send messages, error handling)

LLM Layer (Model Interface)

  • Unified Provider interface (consistent calling method whether using Claude or GPT)
  • Tool calling (Function Calling)
  • Streaming response processing
  • MCP server integration

Complete Message Flow Process

Let me give you a specific scenario to illustrate. When you send a message to the bot on WhatsApp, the entire flow works like this:

  1. Channel Layer Receives: The WhatsApp Channel receives the webhook and standardizes the message into the internal format
  2. Routing Decision: Check whether it's a DM or a group chat, whether the bot was @mentioned, and whether the user has sufficient permissions
  3. Gateway Dispatch: Find (or create) this user's Session and add the message to the queue
  4. LLM Processing: Select a Provider based on configuration (e.g., Anthropic) and send the conversation context
  5. Response Return: The LLM returns its result → Gateway → Channel → user receives the reply

The most ingenious part of this design is that each layer operates independently. Want to add a new platform? Only the Channel layer changes. Want to switch models? Only the LLM layer changes. The Gateway doesn't need to be touched at all.

Gateway Layer: Core Hub of Session Management

The first time I looked at Gateway source code, I was most confused about the Session object. Each user has a Session, but what exactly does this thing store and how is it managed?

Session Lifecycle

Think of Gateway as a package sorting center, each user is a delivery address, and Session is the delivery record for that address.

What a Session object contains:

  • conversationHistory: Dialogue history (last N messages)
  • context: Context variables (user settings, temporary data)
  • state: Current state (idle, processing, waiting)
  • channelInfo: Source platform information (which Channel it came from)
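
As a rough TypeScript sketch (the field names follow the list above; the exact types are my assumptions, not OpenClaw's actual definitions):

interface Session {
  userId: string;                            // Owner of the session
  channelId: string;                         // Which Channel it came from
  conversationHistory: Message[];            // Last N messages of dialogue
  context: Record<string, any>;              // User settings, temporary data
  state: 'idle' | 'processing' | 'waiting';  // Current state
  channelInfo: { platform: string };         // Source platform information
}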

Lifecycle Management:

// Simplified example showing core logic
class SessionManager {
  // When receiving message
  async handleMessage(userId, channelId, message) {
    // 1. Find Session (create if doesn't exist)
    let session = this.getOrCreate(userId, channelId);

    // 2. Update conversation history
    session.conversationHistory.push(message);

    // 3. Add to processing queue
    await this.messageQueue.enqueue(session, message);

    // 4. Persist (prevent loss from crash)
    await this.persist(session);
  }
}

Here’s the key point: OpenClaw uses per-channel-peer isolation mode. What does this mean? The same user on WhatsApp and Telegram has two independent Sessions that don’t affect each other. This design prevents context confusion—you’re discussing technical issues on WhatsApp and asking about weather on Telegram, and the two won’t cross-contaminate.
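
The getOrCreate call in the earlier SessionManager example is where that isolation lives; here is a minimal in-memory sketch of it (my simplification, assuming a sessions Map on the class, not the actual source):

getOrCreate(userId, channelId) {
  // The key includes the channel, so the same user on WhatsApp and
  // Telegram maps to two independent Sessions
  const key = `${channelId}:${userId}`;
  let session = this.sessions.get(key);
  if (!session) {
    session = {
      userId,
      channelId,
      conversationHistory: [],
      context: {},
      state: 'idle',
      channelInfo: { platform: channelId }
    };
    this.sessions.set(key, session);
  }
  return session;
}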

Message Scheduling Priority Strategy

Gateway doesn’t process messages immediately upon receipt, but uses a scheduling queue. This design mainly solves two problems:

Problem 1: Concurrency Control
Suppose 100 users send messages simultaneously; if you throw them all at the LLM at once, the API will be overwhelmed. Gateway's queue can throttle, e.g. "process at most 10 requests at a time."

Problem 2: Error Retry
What if an LLM call fails? Gateway automatically retries 3 times with increasing intervals (1 second, 2 seconds, 4 seconds), avoiding message loss from transient failures.

// Message queue core logic
class MessageQueue {
  async enqueue(session, message) {
    // Check concurrency
    if (this.activeJobs >= this.maxConcurrency) {
      // Put in waiting queue
      this.waitingQueue.push({ session, message });
      return;
    }

    // Execute processing
    this.activeJobs++;
    try {
      await this.process(session, message);
    } catch (error) {
      // Retry logic
      await this.retryWithBackoff(session, message);
    } finally {
      this.activeJobs--;
      this.processNext(); // Process next
    }
  }
}
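
The retryWithBackoff call above isn't shown; here is a minimal sketch of the 1s/2s/4s schedule described earlier (the method body is my own guess at the logic, not the actual source):

async retryWithBackoff(session, message, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Wait 1 second, then 2 seconds, then 4 seconds
    const delayMs = 1000 * 2 ** attempt;
    await new Promise(resolve => setTimeout(resolve, delayMs));
    try {
      await this.process(session, message);
      return; // Succeeded, stop retrying
    } catch (error) {
      if (attempt === maxRetries - 1) throw error; // Give up after the last attempt
    }
  }
}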

WebSocket Connection Pitfalls

If you plan to develop applications requiring high real-time performance (like customer service bots), WebSocket connection management is a major pitfall.

OpenClaw’s approach:

  • Heartbeat Detection: Send ping every 30 seconds, consider connection dead if timeout
  • Auto-Reconnect: Exponential backoff reconnection after disconnection (1 second, 2 seconds, 4 seconds… max 30 seconds)
  • State Sync: Automatically restore Session state after reconnection

These details seem minor but can greatly improve stability. I previously wrote a similar system without heartbeat detection—connections would become zombies without the program knowing, and user messages would vanish into thin air.
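
A bare-bones sketch of the heartbeat-plus-reconnect pattern using the ws library (the 30-second ping interval and 30-second backoff cap come from the list above; everything else is my simplification, not OpenClaw's code):

import WebSocket from 'ws';

function connect(url: string, attempt = 0) {
  const ws = new WebSocket(url);
  let alive = true;

  ws.on('open', () => {
    attempt = 0; // Reset backoff after a successful connection
    // Heartbeat: ping every 30 seconds, kill the connection if no pong came back
    const heartbeat = setInterval(() => {
      if (!alive) return ws.terminate();
      alive = false;
      ws.ping();
    }, 30_000);
    ws.once('close', () => clearInterval(heartbeat));
  });

  ws.on('pong', () => { alive = true; });

  ws.on('close', () => {
    // Exponential backoff reconnection: 1s, 2s, 4s ... capped at 30s
    const delay = Math.min(1000 * 2 ** attempt, 30_000);
    setTimeout(() => connect(url, attempt + 1), delay);
  });
}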

Channel Layer: Multi-Platform Message Routing

The Channel layer is what I find most interesting. It solves a core problem: Different platforms have completely different message formats—how do we handle them uniformly?

The Magic of Adapter Pattern

WhatsApp messages look like this:

{
  "from": "1234567890",
  "body": "Hello",
  "type": "text"
}

Telegram looks like this:

{
  "message": {
    "chat": {"id": 123},
    "text": "Hello"
  }
}

If you write separate logic for each platform, the code will explode. OpenClaw uses the classic Adapter pattern: define a standardized Message interface, and each Channel is responsible for converting platform messages to this format.

// Standardized message format
interface StandardMessage {
  userId: string;      // Unified user ID
  channelId: string;   // Channel identifier (e.g., 'whatsapp')
  content: string;     // Message content
  timestamp: number;   // Timestamp
  metadata: any;       // Platform-specific data
}

// WhatsApp Adapter
class WhatsAppChannel implements Channel {
  adaptMessage(rawMessage): StandardMessage {
    return {
      userId: rawMessage.from,
      channelId: 'whatsapp',
      content: rawMessage.body,
      timestamp: Date.now(),
      metadata: { platform: 'whatsapp' }
    };
  }
}

The benefit of this design: Gateway and LLM layers don’t need to care which platform messages come from—they only process StandardMessage.

Implementation Principles of Routing Rules

The Channel layer has another important responsibility: deciding which messages should be responded to and which should be ignored.

OpenClaw supports two types of routing rules:

dmPolicy (Direct Message Policy)

  • pairing: Requires pairing first before chatting (most secure)
  • allowlist: Only whitelisted users can use
  • open: Everyone can use (public bot)
  • disabled: Disable direct messages

mentionGating (Group Chat @ Trigger)
Only responds when @mentioned in group chat, avoiding spam. Implementation logic is simple:

class TelegramChannel {
  shouldRespond(message): boolean {
    // Respond directly to DM
    if (message.chat.type === 'private') {
      return this.checkDmPolicy(message.from.id);
    }

    // Check @ in group chat (Telegram 'mention' entities carry the @username text,
    // not a user object, so compare the text span against the bot's username)
    if (message.chat.type === 'group' || message.chat.type === 'supergroup') {
      const mentioned = message.entities?.some(e =>
        e.type === 'mention' &&
        message.text.slice(e.offset, e.offset + e.length) === `@${this.botUsername}`
      );
      return Boolean(mentioned);
    }

    return false;
  }
}
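
The checkDmPolicy call above isn't shown in the snippet; here is a minimal sketch of what it could look like given the four dmPolicy modes (pairedUsers and allowlist are hypothetical fields, named by me):

checkDmPolicy(userId: string): boolean {
  switch (this.config.dmPolicy) {
    case 'open':      return true;                                    // Everyone can use
    case 'disabled':  return false;                                   // DMs turned off
    case 'allowlist': return this.config.allowlist.includes(userId);  // Whitelist only
    case 'pairing':   return this.pairedUsers.has(userId);            // Must pair first
    default:          return false;
  }
}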

When I developed a DingTalk Channel earlier, I based the implementation on this same logic. DingTalk's @ detection is slightly different (it uses an atUsers field), but the framework is the same.
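
For reference, the DingTalk variant ended up looking roughly like this (atUsers comes from the DingTalk bot callback; the exact shape of its entries should be double-checked against DingTalk's docs, so treat the dingtalkId field as an assumption):

class DingTalkChannel {
  shouldRespond(message): boolean {
    // DingTalk group messages carry an atUsers array instead of entities
    const atUsers = message.atUsers || [];
    return atUsers.some(u => u.dingtalkId === this.botId);
  }
}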

Best Practices for Developing Custom Channels

Suppose you want to integrate Discord; the general flow is:

  1. Create Channel Class: Implement Channel interface
  2. Implement Required Methods:
    • start(): Start Channel (listen to webhook or WebSocket)
    • sendMessage(): Send message to platform
    • adaptMessage(): Message format conversion
  3. Register to System: Add Channel configuration in config file
  4. Test: Use ngrok to expose local service, test webhook

I’ve included complete example code in the practical section at the end of the article for reference.

LLM Layer: Pluggable Design of Model Interface

The LLM layer underwent a major refactoring in 2026, transforming from hard-coded to a plugin system. This change is truly important—it directly determines how many models OpenClaw can support.

Provider Plugin System

The old design looked like this (pseudo-code):

// Old design: hard-coded
if (config.provider === 'anthropic') {
  return new AnthropicClient();
} else if (config.provider === 'openai') {
  return new OpenAIClient();
}

The problem: Every time you add a model, you need to modify this if-else, and the code becomes increasingly bloated.

The new design introduces a Provider interface:

// Provider interface definition
interface LLMProvider {
  name: string;  // 'anthropic', 'openai', 'ollama'

  // Send message, return streaming response
  chat(messages: Message[], options: ChatOptions): AsyncIterator<string>;

  // Tool calling support
  supportTools(): boolean;

  // Initialize configuration
  initialize(config: ProviderConfig): void;
}

All Providers just need to implement this interface to integrate with the system. The system automatically scans and registers at startup:

// Plugin registration mechanism
class ProviderRegistry {
  private providers = new Map<string, LLMProvider>();

  register(provider: LLMProvider) {
    this.providers.set(provider.name, provider);
  }

  get(name: string): LLMProvider {
    return this.providers.get(name);
  }
}

// Auto-discovery and registration
const registry = new ProviderRegistry();
registry.register(new AnthropicProvider());
registry.register(new OpenAIProvider());
registry.register(new OllamaProvider());

The benefit of this design: Want to use a new model? Write a Provider implementation class, register it, and you’re done—no need to modify core code at all.
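
At call time, whichever layer owns the LLM config just looks the Provider up by name; here is a sketch of how selection could sit on top of the registry above (assuming Providers are implemented as async generators, like the Kimi example later in this article):

// Pick the Provider named in config and hand it the conversation
const provider = registry.get(config.llm.provider); // e.g. 'anthropic'
provider.initialize(config.llm);

for await (const chunk of provider.chat(messages, { model: config.llm.model })) {
  process.stdout.write(chunk); // Stream the reply as it arrives
}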

Differences Among Mainstream Providers

Although the interface is unified, implementation details of different Providers vary quite a bit. I’ve stepped on some landmines—let me share:

Anthropic Provider (Claude)

  • Native support for streaming responses (stream: true)
  • Special Tool Use format (needs to be wrapped in tools array)
  • Large context window (Claude 3.5 can handle 200k tokens)

OpenAI Provider (ChatGPT)

  • Function Calling and Tool Use are two separate APIs (old version uses functions, new version uses tools)
  • Streaming responses return delta fragments that need manual concatenation
  • Strict rate limiting (need to control both RPM/TPM)

Ollama Provider (Local Models)

  • No API key, direct HTTP call to local service
  • Performance heavily affected by hardware (CPU inference very slow, needs GPU)
  • Inconsistent tool support across models (llama3 supports it, but qwen might not)

I previously tried running local Llama3 through Ollama, only to discover that the tool calling format was completely different from Claude's; it took me ages to get the adaptation working.

Tool Use Mechanism Explained

Tool Use (tool calling) is one of the core features of the LLM layer. Simply put, it allows AI to “call functions.”

For example, if you ask “What time is it in Beijing now?”, the AI will:

  1. Determine it needs to call the get_current_time tool
  2. Return tool call request: {"name": "get_current_time", "args": {"city": "Beijing"}}
  3. OpenClaw executes the tool, returns result: {"time": "2026-02-05 20:30"}
  4. AI generates answer based on result: “It’s 8:30 PM in Beijing right now”

OpenClaw’s tool registration mechanism works like this:

// Tool definition
const tools = [
  {
    name: 'get_current_time',
    description: 'Get current time for specified city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' }
      },
      required: ['city']
    }
  }
];

// Tool execution
async function executeTool(toolName, args) {
  const handlers = {
    'get_current_time': (args) => {
      // Actual implementation might call an API
      return { time: new Date().toLocaleString('en-US', { timeZone: 'Asia/Shanghai' }) };
    }
  };

  const handler = handlers[toolName];
  if (!handler) {
    // Refuse anything that isn't a predefined tool
    throw new Error(`Unknown tool: ${toolName}`);
  }
  return handler(args);
}

Important Note: Tool execution needs sandbox isolation, otherwise if AI tells you to execute rm -rf / you’re screwed. OpenClaw has built-in permission control that only allows calling predefined tools.
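
A minimal sketch of that kind of gate (allowedTools is a hypothetical config field, not necessarily what OpenClaw calls it):

// Only run tools that were explicitly registered and allowed in config
async function executeToolSafely(toolName, args, config) {
  if (!config.allowedTools.includes(toolName)) {
    throw new Error(`Tool not allowed: ${toolName}`);
  }
  return executeTool(toolName, args);
}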

Practice: Extending OpenClaw Architecture

Theory covered—let’s get practical. I’ll share two complete examples: developing a Discord Channel and Kimi Provider.

Developing Custom Channel: Discord Integration

Discord’s messaging mechanism differs from WhatsApp—it uses WebSocket to receive messages and REST API to send messages.

Step 1: Implement Channel Interface

import { Client, GatewayIntentBits } from 'discord.js';

class DiscordChannel implements Channel {
  private client: Client;
  private gateway: Gateway; // OpenClaw's Gateway instance

  async start() {
    // Initialize Discord client
    this.client = new Client({
      intents: [
        GatewayIntentBits.Guilds,
        GatewayIntentBits.GuildMessages,
        GatewayIntentBits.DirectMessages
      ]
    });

    // Listen to message events
    this.client.on('messageCreate', async (msg) => {
      if (msg.author.bot) return; // Ignore bot messages

      // Convert to standard format
      const standardMsg = this.adaptMessage(msg);

      // Hand over to Gateway for processing
      const response = await this.gateway.handleMessage(standardMsg);

      // Send reply
      await msg.reply(response.content);
    });

    // Login
    await this.client.login(process.env.DISCORD_TOKEN);
  }

  adaptMessage(discordMsg): StandardMessage {
    return {
      userId: discordMsg.author.id,
      channelId: 'discord',
      content: discordMsg.content,
      timestamp: discordMsg.createdTimestamp,
      metadata: {
        guildId: discordMsg.guildId,
        channelType: discordMsg.channel.type
      }
    };
  }

  async sendMessage(userId: string, content: string) {
    const user = await this.client.users.fetch(userId);
    await user.send(content);
  }
}

Step 2: Register to OpenClaw

Add to config.json:

{
  "channels": {
    "discord": {
      "enabled": true,
      "token": "YOUR_DISCORD_BOT_TOKEN",
      "dmPolicy": "open"
    }
  }
}

Register in startup script:

import { DiscordChannel } from './channels/discord';

const gateway = new Gateway(config);
const discordChannel = new DiscordChannel(gateway, config.channels.discord);
gateway.registerChannel('discord', discordChannel);

await discordChannel.start();

Step 3: Test

  1. Go to Discord developer platform to create Bot, get Token
  2. Invite Bot to your server
  3. Start OpenClaw, send DM to Bot
  4. Check logs, confirm message flow is normal

A pitfall I encountered during actual development: Discord's permission system is very complex. Make sure the Bot has the Send Messages and Read Message History permissions, otherwise it won't be able to read or reply to messages.

Developing Custom Provider: Kimi Integration

Kimi's API (it's Moonshot AI's model) is very similar to OpenAI's, but a few details differ.

Provider Implementation:

class KimiProvider implements LLMProvider {
  name = 'kimi';
  private apiKey: string;
  private baseURL = 'https://api.moonshot.cn/v1';

  initialize(config: ProviderConfig) {
    this.apiKey = config.apiKey;
  }

  async *chat(messages: Message[], options: ChatOptions) {
    const response = await fetch(`${this.baseURL}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: options.model || 'moonshot-v1-8k',
        messages: messages.map(m => ({
          role: m.role,
          content: m.content
        })),
        stream: true,
        temperature: options.temperature || 0.7
      })
    });

    if (!response.ok || !response.body) {
      throw new Error(`Kimi API error: HTTP ${response.status}`);
    }

    // Process streaming response (assumes each read returns whole SSE lines)
    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n').filter(line => line.trim());

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') continue;

          const parsed = JSON.parse(data);
          const content = parsed.choices[0]?.delta?.content;
          if (content) {
            yield content;
          }
        }
      }
    }
  }

  supportTools(): boolean {
    return false; // Kimi doesn't support Function Calling yet
  }
}

Register Provider:

const registry = new ProviderRegistry();
registry.register(new KimiProvider());

// Configure usage
const config = {
  llm: {
    provider: 'kimi',
    apiKey: process.env.KIMI_API_KEY,
    model: 'moonshot-v1-32k'
  }
};

Pitfall Notes:

  • Kimi's streaming response format is the same as OpenAI's, so you can reference the OpenAI implementation directly
  • Error handling differs, though: timeouts don't come back with standard error codes, so they need special handling (see the sketch below)
  • Kimi doesn't currently support Function Calling, so if your application relies on tool calling you can't use Kimi
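
One way to add that special handling is a client-side timeout around the fetch call; here is a sketch with AbortController (the 60-second limit is an arbitrary choice of mine, not a Kimi recommendation):

async function fetchWithTimeout(url: string, init: RequestInit, timeoutMs = 60_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Abort the request ourselves instead of waiting for an error code that never comes
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}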

Performance Optimization Practices

Getting it to run is just the first step—performance optimization is the real challenge. Let me share some optimization points I’ve used:

Session Cache Optimization
By default, Sessions are stored in memory and lost on restart. You can integrate Redis:

import Redis from 'ioredis';

class RedisSessionStore {
  private redis: Redis;

  async get(userId: string, channelId: string): Promise<Session> {
    const key = `session:${channelId}:${userId}`;
    const data = await this.redis.get(key);
    return data ? JSON.parse(data) : null;
  }

  async set(session: Session) {
    const key = `session:${session.channelId}:${session.userId}`;
    await this.redis.setex(key, 3600, JSON.stringify(session)); // 1 hour expiration
  }
}

Message Queue Tuning
In high-concurrency scenarios, an in-memory queue isn't enough; you can switch to Bull (a Redis-based task queue):

import Queue from 'bull';

const messageQueue = new Queue('openclaw-messages', {
  redis: { host: 'localhost', port: 6379 }
});

messageQueue.process(10, async (job) => { // Max 10 concurrent
  const { session, message } = job.data;
  return await gateway.processMessage(session, message);
});

Concurrent Connection Control
LLM APIs usually have rate limits (like OpenAI’s 60 RPM). Can use p-limit library to control concurrency:

import pLimit from 'p-limit';

const limit = pLimit(10); // Max 10 concurrent requests

const tasks = messages.map(msg =>
  limit(() => provider.chat(msg))
);

await Promise.all(tasks);

Optimization results comparison (my actual test data):

  • Before optimization: 100 concurrent requests, average response time 8 seconds, 15% failure rate
  • After optimization: 100 concurrent requests, average response time 3 seconds, <1% failure rate

Summary

From Gateway to Channel to LLM, OpenClaw’s three-layer architecture design is truly clean and clear. Each layer only manages its own concerns, with well-defined responsibility boundaries, making it particularly convenient to extend.

After understanding this architecture, I now find developing new features much easier. Want to add a new platform? Write a Channel Adapter. Want to switch models? Implement a Provider interface. Want to optimize performance? Know which layer the bottleneck is in, optimize specifically.

If you also plan to deeply customize OpenClaw, I suggest first cloning the source code and reading through it following the approach in this article. Especially the Gateway’s Session management, Channel’s routing logic, and Provider’s registration mechanism—these three parts are the core of the core.

Once you understand these, you’re no longer “copying configurations from documentation,” but truly mastering the system, able to extend and optimize at will.

Next step could be trying to develop a simple custom Channel (like Enterprise WeChat or Feishu)—actually doing it once will deepen your understanding. The OpenClaw open-source community is also very active—if you encounter problems, you can exchange ideas in GitHub Issues.

Complete Workflow for Developing Custom OpenClaw Channel

Develop and integrate a custom Channel into the OpenClaw system from scratch

⏱️ Estimated time: 2 hr

  1. Step 1: Understand Channel Interface Specification

    The Channel interface defines the methods that platform adapters must implement:

    Core methods:
    • start(): Start Channel, listen to platform messages (webhook or WebSocket)
    • sendMessage(userId, content): Send message to platform
    • adaptMessage(rawMessage): Convert platform message to StandardMessage format

    StandardMessage format:
    • userId: string (unified user ID)
    • channelId: string (Channel identifier)
    • content: string (message content)
    • timestamp: number (timestamp)
    • metadata: any (platform-specific data)

    Routing control methods:
    • shouldRespond(message): Determine whether to respond to this message
    • checkDmPolicy(userId): Check direct message policy
    • checkMention(message): Check group chat @ trigger

    Reference implementations: src/channels/whatsapp.ts or src/channels/telegram.ts
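
    Put together, the interface could be sketched like this (return types are my assumptions based on the examples earlier in this article):

    interface Channel {
      start(): Promise<void>;                         // Begin listening (webhook or WebSocket)
      sendMessage(userId: string, content: string): Promise<void>;
      adaptMessage(rawMessage: any): StandardMessage; // Convert to the internal format
      shouldRespond(message: StandardMessage): boolean;
    }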
  2. Step 2: Create Channel Class and Implement Interface

    Create new file in src/channels/ directory (e.g., discord.ts):

    class DiscordChannel implements Channel {
      private client: Client;
      private gateway: Gateway;
      private config: ChannelConfig;

      constructor(gateway: Gateway, config: ChannelConfig) {
        this.gateway = gateway;
        this.config = config;
      }

      async start() {
        // Initialize Discord client
        this.client = new Client({ intents: [...] });

        // Listen to message events
        this.client.on('messageCreate', async (msg) => {
          const standardMsg = this.adaptMessage(msg);
          const response = await this.gateway.handleMessage(standardMsg);
          await msg.reply(response.content);
        });

        await this.client.login(this.config.token);
      }

      adaptMessage(msg): StandardMessage {
        return {
          userId: msg.author.id,
          channelId: 'discord',
          content: msg.content,
          timestamp: msg.createdTimestamp,
          metadata: { guildId: msg.guildId }
        };
      }

      async sendMessage(userId: string, content: string) {
        const user = await this.client.users.fetch(userId);
        await user.send(content);
      }
    }


    Key points:
    • Platform SDK initialization goes in start() method
    • Message reception must convert to StandardMessage format
    • Sending messages must handle platform-specific API calls
    • Error handling and logging are essential
  3. Step 3: Implement Routing Rules and Permission Control

    Implement message filtering logic based on business requirements:

    dmPolicy implementation:
    • pairing mode: Maintain list of paired users, only respond to users in list
    • allowlist mode: Check if user ID is in whitelist
    • open mode: Respond to all users
    • disabled mode: Reject all direct messages

    shouldRespond(message): boolean {
      // DM check policy
      if (message.metadata.channelType === 'DM') {
        return this.checkDmPolicy(message.userId);
      }

      // Group chat check @
      if (message.metadata.channelType === 'GROUP') {
        return this.checkMention(message);
      }

      return false;
    }


    mentionGating implementation (group chat trigger):
    • Check if message contains @bot
    • Different platforms have different mention formats (Discord uses <@botId>, Telegram uses @username)
    • Return true means should respond, false means ignore
  4. Step 4: Configuration File and Registration

    1. Add Channel configuration in config.json:

    {
      "channels": {
        "discord": {
          "enabled": true,
          "token": "YOUR_BOT_TOKEN",
          "dmPolicy": "open",
          "mentionGating": true
        }
      }
    }


    2. Register Channel in startup script:

    import { DiscordChannel } from './channels/discord';

    const gateway = new Gateway(config);
    const discordChannel = new DiscordChannel(
      gateway,
      config.channels.discord
    );

    // Register to Gateway
    gateway.registerChannel('discord', discordChannel);

    // Start Channel
    await discordChannel.start();


    3. Environment variable configuration:
    • Put sensitive information (Token, keys) in .env file
    • Load using dotenv library: require('dotenv').config()
  5. Step 5: Testing and Debugging

    Testing workflow:

    1. Local development testing:
    • Use ngrok to expose local service (needed for webhook-based platforms)
    • Configure platform webhook to point to ngrok URL
    • Start OpenClaw, check logs

    2. Message flow verification:
    • Send test message, check if start() message listening is triggered
    • Confirm adaptMessage() conversion is correct
    • Verify Gateway.handleMessage() is being called
    • Check if sendMessage() successfully sends reply

    3. Routing rule testing:
    • Test DM policy (pairing/allowlist/open)
    • Test group chat @ trigger (behavior with and without @)
    • Test whitelist/blacklist functionality

    4. Exception handling testing:
    • Simulate network timeout
    • Simulate Token expiration
    • Simulate abnormal message format

    Debugging tips:
    • Add console.log() at key locations or use debug library
    • Check Gateway logs to confirm message arrival
    • Use platform-provided testing tools (like Discord Bot Dashboard)
    • Enable detailed logging mode: DEBUG=openclaw:* npm start
  6. Step 6: Performance Optimization and Production Preparation

    Optimization checklist:

    1. Connection management:
    • Implement heartbeat detection (prevent zombie connections)
    • Add auto-reconnect mechanism (exponential backoff)
    • Handle graceful shutdown (SIGTERM signal)

    2. Error handling:
    • Catch all possible exceptions
    • Implement message retry mechanism (max 3 times)
    • Log errors to file or monitoring system

    3. Performance optimization:
    • Batch message processing (reduce API calls)
    • Use connection pools (database/Redis)
    • Rate limiting control (avoid triggering platform rate limits)

    4. Monitoring and logging:
    • Record message processing duration
    • Track success and failure rates
    • Set alert thresholds (>5% failure rate triggers alert)

    Pre-production checklist:
    • Stress testing (simulate 100+ concurrent users)
    • Memory leak detection (long-running tests)
    • Configure backup and rollback plan
    • Write operations documentation (start, stop, troubleshooting)

FAQ

Why use per-channel-peer session isolation instead of sharing one Session across all platforms?
Core advantages of per-channel-peer mode are avoiding context confusion and improving security:

Context isolation: If the same user discusses technical issues on WhatsApp and asks about weather on Telegram, sharing a Session would cause cross-contamination. AI would bring technical discussion context into weather queries, leading to irrelevant answers.

Security isolation: Different platforms have different permission verification mechanisms. Sharing Sessions could lead to permission bypass. For example, a user authenticated on WhatsApp might have a forged Telegram account—separate isolation is more secure.

Performance consideration: Each Channel's Session is stored independently, allowing parallel processing of messages from different platforms without mutual blocking.

If you truly need cross-platform context sharing, implement user account association at the application layer rather than merging at the Session layer.

How to handle models that don't support streaming responses when developing custom Providers?
OpenClaw's Provider interface requires returning AsyncIterator, but some model APIs don't support streaming. Solutions:

Solution 1: Wrap as pseudo-streaming (recommended)
async *chat(messages) {
  const response = await fetch(apiUrl, { ... }); // Non-streaming request
  const result = await response.json();
  yield result.content; // Return all content at once
}

Solution 2: Simulate streaming with chunks
const fullText = await getNonStreamResponse();
const chunkSize = 50;
for (let i = 0; i < fullText.length; i += chunkSize) {
  yield fullText.slice(i, i + chunkSize);
  await sleep(100); // Simulate delay
}

Solution 1 is simple and direct—user experience is "wait then receive complete reply at once." Solution 2 can simulate typewriter effect but adds complexity. Choose based on actual needs.

What happens when Gateway's message queue is full? How to avoid message loss?
Handling strategy when message queue is full:

Default behavior: OpenClaw's in-memory queue has a capacity limit (default 1,000 messages). When it is exceeded, new messages are rejected and the user gets a "system busy" error.

Solutions to avoid message loss:

1. Persistent queue (recommended):
Use Bull or RabbitMQ for persistent message queues—messages won't be lost even if service restarts.

2. Increase queue capacity:
Configure maxQueueSize: 5000 in config.json, but watch memory usage.

3. Rate limiting + notification:
Implement rate limiting at Channel layer, notify users "please try again later" when rate exceeded, avoiding message pile-up.

4. Priority queue:
Important users (VIP) get priority processing, regular users wait in queue.

Production environments should use the Bull + Redis combination for both persistence and high-concurrency support.

How to debug message flow between Gateway and Channel? A message seems to be sent but gets no response.
Systematic approach to debugging message flow:

1. Enable detailed logging:
DEBUG=openclaw:* npm start
This prints logs from every module, including complete process of message receive, convert, process, send.

2. Check key points:
• Channel.adaptMessage(): Print converted StandardMessage, confirm format is correct
• Gateway.handleMessage(): Print received message and Session state
• Provider.chat(): Print context sent to LLM
• Channel.sendMessage(): Print final content being sent

3. Use breakpoint debugging:
Configure launch.json in VS Code, set breakpoints for step-by-step debugging.

4. Check common issues:
• shouldRespond() returns false: Message filtered by routing rules
• Can't find Session: userId or channelId mismatch
• LLM call failed: Check API key, network, rate limits

5. Use testing tools:
Write unit tests to simulate message input and verify output at each stage.

Recommend using pino-pretty to beautify log output in development, and structured logs (JSON format) in production for easier analysis.

How does Provider's Tool Use feature prevent AI from executing dangerous operations (like deleting files)?
Tool Use security protection strategies:

1. Whitelist mechanism (most important):
Only register safe tools, prohibit registering dangerous tools like file system operations, network requests.

const safeTool = {
  name: 'get_weather',
  description: 'Get weather information',
  handler: getWeatherData // Safe read-only operation
};

2. Parameter validation:
Strictly validate tool parameters, reject abnormal input.

function validateArgs(args) {
  if (args.city.includes('<script>')) { // Prevent XSS
    throw new Error('Invalid input');
  }
}

3. Sandbox execution (advanced):
Use vm2 or isolated-vm to execute tool code in isolated environment.

4. Permission tiering:
Different users have different tool calling permissions—admins can execute advanced tools, regular users only basic tools.

5. Audit logging:
Record all tool calls (who, when, what was called, what parameters), traceable and auditable.

OpenClaw by default only allows calling predefined tools and doesn't support dynamic code execution, which already avoids most risks. If you extend the tool set, evaluate the security implications carefully.

When multiple Channels simultaneously receive messages from the same user, how does Gateway avoid concurrency conflicts?
Gateway's concurrency control mechanism:

Session locking mechanism:
Each Session is locked when processing messages. Messages from the same Session are processed serially, different Sessions processed in parallel.

Pseudo-code implementation:
async handleMessage(session, message) {
  const lock = await this.acquireLock(session.id);
  try {
    // Process message
    await this.processMessage(session, message);
  } finally {
    await lock.release();
  }
}

Actual scenario example:
User sends messages simultaneously on WhatsApp and Telegram to the bot. Due to per-channel-peer isolation, they're two independent Sessions that can be processed in parallel without conflict.

If it's concurrent messages from the same Channel (like user rapidly sends three messages), they enter the message queue and are processed in FIFO order.

Distributed deployment handling:
If OpenClaw is deployed across multiple instances, you need Redis-based distributed locking (the redlock algorithm):

import Redlock from 'redlock';
const lock = await redlock.lock(session.id, 5000); // 5 second timeout

This ensures even with multiple instances, the same Session is only processed by one instance.
