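21 - Interceptor Chain and the Onion Model

An AOP-style interceptor chain built on Spring AI Alibaba that provides pluggable aspects for LLM calls and tool execution.

1. What Is the Onion Model?

The onion model is a classic middleware/interceptor execution pattern: a request passes through the interceptors layer by layer, like the layers of an onion, in ascending order on the way in and in reverse order on the way out.

Key properties: before methods run in ascending order (10 → 20 → 30), and after methods run in descending order (30 → 20 → 10). If any before method returns null, the chain short-circuits immediately.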

  Onion model execution order (3 interceptors: A=10, B=20, C=30):

  Request ──→ ┌───────────────────────────────────────────┐
              │  A.beforeModel() ← order=10 (first)       │
              │  ┌─────────────────────────────────────┐  │
              │  │  B.beforeModel() ← order=20         │  │
              │  │  ┌───────────────────────────────┐  │  │
              │  │  │  C.beforeModel() ← order=30   │  │  │
              │  │  │                               │  │  │
              │  │  │  [LLM call / tool execution]  │  │  │
              │  │  │                               │  │  │
              │  │  │  C.afterModel() ← order=30    │  │  │
              │  │  └───────────────────────────────┘  │  │
              │  │  B.afterModel() ← order=20          │  │
              │  └─────────────────────────────────────┘  │
              │  A.afterModel() ← order=10 (last)         │
              └───────────────────────────────────────────┘ ──→ Response

  Short-circuit example (B.before returns null):
  A.before() ✅ → B.before() ❌ returns null → C.before() skipped → target execution skipped

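2. Core Architecture

InterceptorChain: the chain execution engine

  InterceptorChain
  ├── modelInterceptors: List<ModelInterceptor>   ← intercepts LLM calls
  │   ├── beforeModel(request, context)  → ascending order
  │   └── afterModel(response, context)  → descending order
  │
  └── toolInterceptors: List<ToolInterceptor>     ← intercepts tool calls
      ├── beforeTool(request, context)   → ascending order
      └── afterTool(response, context)   → descending order

  Internal engine methods:
  ├── executeChainForward()   - generic ascending pass; a null result stops the chain
  └── executeChainReverse()   - generic descending pass; an exception does not stop the chain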
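
The two engine methods can be pictured with a minimal sketch. The class name ChainEngineSketch, the method signatures, and the errors list are all assumptions made for illustration; the source names executeChainForward() and executeChainReverse() but does not show their code.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-in for the engine described above; not the project's actual class.
final class ChainEngineSketch {

    /** Runs the steps in list order and returns null as soon as any step returns null. */
    static <S, T> T forward(List<S> steps, T input, BiFunction<S, T, T> call) {
        T current = input;
        for (S step : steps) {
            current = call.apply(step, current);
            if (current == null) {
                return null; // short-circuit: later steps and the target are skipped
            }
        }
        return current;
    }

    /** Runs the steps in reverse list order; a throwing step is recorded and skipped. */
    static <S, T> T reverse(List<S> steps, T input, BiFunction<S, T, T> call, List<String> errors) {
        T current = input;
        for (int i = steps.size() - 1; i >= 0; i--) {
            try {
                current = call.apply(steps.get(i), current);
            } catch (RuntimeException e) {
                errors.add(String.valueOf(e.getMessage())); // keep going with the previous value
            }
        }
        return current;
    }
}
```

Both passes are plain loops over a list that is assumed to be already sorted by order, so the "onion" shape comes from iterating the same list in opposite directions rather than from nested calls.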

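Interceptor Interfaces

Interface	Methods	Default order	Effect of a null return from the before method
`ModelInterceptor`	beforeModel / afterModel	100	Stops the chain and skips the LLM call
`ToolInterceptor`	beforeTool / afterTool	100	Stops the chain and skips the tool execution

Design note: every method has a default pass-through implementation, so implementers override only the methods they care about. getOrder() controls execution order; lower values run first.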
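
As a rough illustration of that design note, the sketch below shows an interface whose methods all default to pass-through, plus a sort by getOrder(). The name SketchToolInterceptor and the use of String for requests and responses are assumptions; the real ToolInterceptor signature is not shown in the source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical shape of the interface, inferred from the table above.
// Requests and responses are plain Strings to keep the sketch self-contained.
interface SketchToolInterceptor {
    default String beforeTool(String request, Map<String, Object> context) { return request; }
    default String afterTool(String response, Map<String, Object> context) { return response; }
    default int getOrder() { return 100; }
}

final class OrderingSketch {
    /** Returns a copy sorted by getOrder(), lowest first. List.sort is stable, so ties keep insertion order. */
    static List<SketchToolInterceptor> sorted(List<SketchToolInterceptor> interceptors) {
        List<SketchToolInterceptor> out = new ArrayList<>(interceptors);
        out.sort(Comparator.comparingInt(SketchToolInterceptor::getOrder));
        return out;
    }
}
```

An implementer that only needs an earlier position can write `new SketchToolInterceptor() { @Override public int getOrder() { return 20; } }` and inherit the pass-through behavior for both hooks.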

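3. Built-in Interceptors

CachingToolInterceptor (order=20)

An in-process cache keyed by tool name plus a hash of the arguments, so that repeated calls to a tool with identical arguments are not executed again.

  beforeTool() flow:
  ┌────────────────────────────────────────────────────────────┐
  │ 1. Build cache key = toolName + hash(arguments)           │
  │ 2. Look up the ConcurrentHashMap                           │
  │    ├── Hit and not expired:                                │
  │    │   ├── cacheHits++                                     │
  │    │   ├── set context flag caching.tool.cacheHit = true   │
  │    │   └── return null (stop the chain)                    │
  │    └── Miss or expired:                                    │
  │        ├── cacheMisses++                                   │
  │        ├── lazily evict expired entries                    │
  │        └── return the original request (continue)          │
  └────────────────────────────────────────────────────────────┘

  afterTool() flow:
  ┌────────────────────────────────────────────────────────────┐
  │ Successful response → write to cache (TTL = 5 minutes)     │
  └────────────────────────────────────────────────────────────┘

  Scheduled cleanup: @Scheduled(fixedRate=60000) evictExpired()

Setting	Value	Notes
TTL	5 minutes	Cache expiry time
Storage	ConcurrentHashMap	In-process cache, thread-safe
Eviction	60-second scheduled sweep + lazy eviction	Two layers of protection against unbounded memory growth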
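
The lookup, lazy eviction, and scheduled sweep described above might look roughly like the following. TtlToolCacheSketch, its method names, and the explicit nowMs parameter (used instead of the system clock so the behavior is easy to test) are assumptions; the source only states the key scheme, the ConcurrentHashMap storage, and the 5-minute TTL.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a TTL cache; values are Strings for brevity.
final class TtlToolCacheSketch {
    private static final class Entry {
        final String value;
        final long expiresAtMs;
        Entry(String value, long expiresAtMs) { this.value = value; this.expiresAtMs = expiresAtMs; }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final long ttlMs;
    final AtomicLong hits = new AtomicLong();
    final AtomicLong misses = new AtomicLong();

    TtlToolCacheSketch(long ttlMs) { this.ttlMs = ttlMs; }

    /** Key = tool name + hash of the arguments. A plain hashCode can collide; a real key may need more. */
    static String key(String toolName, Map<String, ?> arguments) {
        return toolName + ":" + Objects.hashCode(arguments);
    }

    /** Returns the cached value, or null on a miss. An expired entry is evicted lazily here. */
    String get(String key, long nowMs) {
        Entry e = store.get(key);
        if (e != null && nowMs < e.expiresAtMs) {
            hits.incrementAndGet();
            return e.value;
        }
        if (e != null) {
            store.remove(key, e); // removes only if the expired entry is still the current one
        }
        misses.incrementAndGet();
        return null;
    }

    void put(String key, String value, long nowMs) {
        store.put(key, new Entry(value, nowMs + ttlMs));
    }

    /** What a periodic sweep could do: drop every expired entry and report how many were removed. */
    int evictExpired(long nowMs) {
        int before = store.size();
        store.values().removeIf(e -> nowMs >= e.expiresAtMs);
        return before - store.size();
    }
}
```

The lazy path only cleans up keys that are looked up again, which is why a periodic sweep is still needed for entries that are never requested a second time.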

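RetryToolInterceptor (order=30)

Cooperative retry with exponential backoff: the interceptor only marks a call as "needs retry"; the caller performs the actual retry.

  Cooperative retry design:

  beforeTool():
    └── initialize retryCount=0, maxRetries=2 in the context

  afterTool():
    ├── success → set shouldRetry=false
    └── failure:
        ├── retryCount < maxRetries?
        │   ├── Yes → shouldRetry=true
        │   │         suggestedDelayMs = 500 × 2^attempt
        │   │         (500ms → 1000ms → 2000ms ...)
        │   └── No  → shouldRetry=false (retries exhausted)
        └── the caller checks RetryToolInterceptor.shouldRetry(context)
            to decide whether to re-run the tool call

Why cooperative? The interceptor does not run a retry loop itself, because a tool call may involve asynchronous or non-blocking operations. Instead it signals the caller through context flags, and the caller performs the retry in whatever execution context is appropriate.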
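
The decision logic in the diagram can be sketched as below. The context keys retry.shouldRetry and retry.suggestedDelayMs appear later in this document; the class name RetryDecisionSketch, the key retry.count, and the boolean success parameter are assumptions for illustration.

```java
import java.util.Map;

// Hypothetical sketch: marks the context after each attempt, never sleeps or re-invokes the tool.
final class RetryDecisionSketch {
    static final int MAX_RETRIES = 2;
    static final long BASE_DELAY_MS = 500L;

    static void beforeTool(Map<String, Object> context) {
        context.put("retry.count", 0); // assumed key name
    }

    static void afterTool(boolean success, Map<String, Object> context) {
        int attempt = (Integer) context.getOrDefault("retry.count", 0);
        if (success || attempt >= MAX_RETRIES) {
            context.put("retry.shouldRetry", false); // done, or retries exhausted
            return;
        }
        context.put("retry.shouldRetry", true);
        context.put("retry.suggestedDelayMs", BASE_DELAY_MS << attempt); // 500 * 2^attempt
        context.put("retry.count", attempt + 1);
    }

    static boolean shouldRetry(Map<String, Object> context) {
        return Boolean.TRUE.equals(context.get("retry.shouldRetry"));
    }
}
```

With MAX_RETRIES = 2 only the first two delays of the schedule (500 ms and 1000 ms) are ever suggested; the third consecutive failure sets shouldRetry to false.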

4. InterceptorContext: Passing Data Across Phases

InterceptorContext is a ConcurrentHashMap shared across the whole before → target execution → after sequence.

// CachingToolInterceptor writes this in beforeTool
context.put("caching.tool.cacheHit", true);

// After beforeTool returns null, the caller reads the cached response
ToolResponse cached = cachingInterceptor.getCachedResponse(cacheKey);

// RetryToolInterceptor writes these in afterTool
context.put("retry.shouldRetry", true);
context.put("retry.suggestedDelayMs", 1000L);

// The caller checks after afterTool
if (RetryToolInterceptor.shouldRetry(context)) {
    // Context values are untyped, so cast before calling Thread.sleep(long)
    long delayMs = (Long) context.get("retry.suggestedDelayMs");
    Thread.sleep(delayMs);
    // re-run the tool call...
}

5. Monitoring and Introspection

InterceptorChain.getChainInfo() provides runtime introspection:

GET /api/debug/interceptor-stats

{
  "modelInterceptors": [...],
  "toolInterceptors": [
    {"name": "CachingToolInterceptor", "order": 20},
    {"name": "RetryToolInterceptor", "order": 30}
  ],
  "cacheStats": {
    "cacheSize": 42,
    "cacheHits": 156,
    "cacheMisses": 89,
    "hitRate": 0.637,
    "ttlMs": 300000
  },
  "retryStats": {
    "maxRetries": 2,
    "baseDelayMs": 500,
    "totalRetries": 23,
    "retriesExhausted": 3
  }
}

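6. Frequently Asked Interview Questions

Q: How does the onion model differ from the Chain of Responsibility pattern?
A: Chain of Responsibility is one-way (the request only moves forward), and its handlers stand in an "or" relationship: once one handler handles the request, processing stops. The onion model is two-way (the request passes inward through every layer and the response passes back out in reverse), and every layer takes part, which makes it an "and" relationship. This project's InterceptorChain follows the onion model: every interceptor runs both before and after, unless the chain is short-circuited.

Q: Why does the caching interceptor have a lower order than the retry interceptor?
A: A lower order runs first. The cache should be checked before any retry logic: on a cache hit the result is returned directly, the tool is never invoked, and so there is nothing to retry. In the opposite order, the retry logic would run when it is not needed.

Q: How does the interceptor chain differ from the Hook system?
A: The interceptor chain operates at the level of a single call (one LLM call or one tool call) and is concerned with modifying and controlling the request and response. The Hook system operates at the level of the Agent lifecycle (the whole Agent run) and is concerned with lifecycle events such as before/after execution, errors, and timeouts. Interceptors can modify request and response data; Hooks are mainly for auditing, monitoring, and security.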

7. Extension Interceptors in Detail

a) ModelFallbackInterceptor (order=10, ModelInterceptor)

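Core responsibility: when the primary model call fails, automatically walk the list of fallback models and retry on each, so the Agent can still respond when a single model is unavailable.

Algorithm: afterModel() checks whether the response is a failure (an exception or an empty response). If it is, and the call is not already in fallback mode (as indicated by the CTX_IS_FALLBACK flag), the interceptor tries the models in the fallback list in order. It sets CTX_IS_FALLBACK=true before the attempts to prevent recursive fallback. The first successful attempt is returned as the fallback response; if every attempt fails, the original error is returned.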

  ModelFallbackInterceptor execution flow:

  afterModel(response, context):
  ┌──────────────────────────────────────────────────────────────────┐
  │ Primary model call completes                                     │
  │   │                                                              │
  │   ▼                                                              │
  │ Response failed? ── No ──→ return the successful response        │
  │   │                                                              │
  │  Yes                                                             │
  │   │                                                              │
  │   ▼                                                              │
  │ CTX_IS_FALLBACK == true? ── Yes ──→ return as-is (no recursion)  │
  │   │                                                              │
  │  No                                                              │
  │   │                                                              │
  │   ▼                                                              │
  │ Set CTX_IS_FALLBACK = true                                       │
  │   │                                                              │
  │   ▼                                                              │
  │ ┌──────────────────────────────────────────────┐                 │
  │ │ for model in fallbackModels:                 │                 │
  │ │   ├── try calling model                      │                 │
  │ │   ├── success? → return fallback response    │                 │
  │ │   └── failure? → retry, then next model      │                 │
  │ │       (up to maxRetries tries per model)     │                 │
  │ └──────────────────────────────────────────────┘                 │
  │   │                                                              │
  │   ▼                                                              │
  │ All failed → return the original error response                  │
  │   │                                                              │
  │   ▼                                                              │
  │ remove CTX_IS_FALLBACK (on exit, success or failure)             │
  └──────────────────────────────────────────────────────────────────┘

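# application.yml example
agent:
  interceptor:
    model-fallback:
      enabled: true
      fallback-models:
        - qwen-turbo        # first fallback
        - qwen-plus         # second fallback
        - gpt-3.5-turbo     # third fallback
      max-retries: 2        # maximum retries per model

How this differs from key-pool fallback: key-pool fallback (KeyPoolManager) switches between API keys for the same model and addresses quota and rate-limit problems. ModelFallbackInterceptor switches to a different model and addresses the case where the model itself is unavailable. The two are complementary: the key pool first tries other keys on the same model, and only if that still fails does the fallback interceptor switch models.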
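
A minimal sketch of the fallback loop with its recursion guard follows. The context key string, the callModel function standing in for a model client, the use of null to mean "failed", and the reading of maxRetries as "at most maxRetries attempts per model" are all assumptions; the source does not show the interceptor's code.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; responses are Strings and null stands for a failed call.
final class ModelFallbackSketch {
    static final String CTX_IS_FALLBACK = "fallback.isFallback"; // assumed key name

    static String afterModel(String primaryResponse, Map<String, Object> context,
                             List<String> fallbackModels, int maxRetries,
                             Function<String, String> callModel) {
        if (primaryResponse != null) {
            return primaryResponse; // primary model succeeded
        }
        if (Boolean.TRUE.equals(context.get(CTX_IS_FALLBACK))) {
            return null; // already inside a fallback attempt: do not recurse
        }
        context.put(CTX_IS_FALLBACK, true);
        try {
            for (String model : fallbackModels) {
                for (int attempt = 0; attempt < maxRetries; attempt++) {
                    try {
                        String response = callModel.apply(model);
                        if (response != null) {
                            return response; // first successful fallback wins
                        }
                    } catch (RuntimeException e) {
                        // treat an exception like an empty response and try again
                    }
                }
            }
            return primaryResponse; // every fallback failed: hand back the original failure
        } finally {
            context.remove(CTX_IS_FALLBACK); // cleared on every exit path
        }
    }
}
```

Putting the flag removal in a finally block means the guard is cleared whether a fallback succeeds, all of them fail, or a call throws unexpectedly.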

b) LargeResultEvictionInterceptor (order=20, ToolInterceptor)

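Core responsibility: when a tool returns a result that is too large (over the token threshold), write the full result to a temporary file and replace the original result with an excerpt plus a reference to that file, preventing a token blow-up.

Algorithm: afterTool() estimates the token count of the response content (approximated as character count / 4). Above the threshold, it persists the full content to a file on disk and replaces the original response with the first N characters plus a reference to the file path.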

  LargeResultEvictionInterceptor execution flow:

  afterTool(response, context):
  ┌──────────────────────────────────────────────────────────────────┐
  │ Tool returns response                                            │
  │   │                                                              │
  │   ▼                                                              │
  │ Estimate tokens ≈ response.length() / 4                          │
  │   │                                                              │
  │   ▼                                                              │
  │ tokens > threshold (20000)?                                      │
  │   │              │                                               │
  │  No             Yes                                              │
  │   │              │                                               │
  │   ▼              ▼                                               │
  │ return as-is    write to file:                                   │
  │                 evictionDir/agentId/toolName_timestamp.txt       │
  │                     │                                            │
  │                     ▼                                            │
  │                 build excerpt response:                          │
  │                 ┌────────────────────────────────────────────┐   │
  │                 │ [first preserveSampleChars characters]     │   │
  │                 │ ...                                        │   │
  │                 │ [full result saved to: /path/to/file]      │   │
  │                 │ [original size: 85000 chars, tokens≈21250] │   │
  │                 └────────────────────────────────────────────┘   │
  │                     │                                            │
  │                     ▼                                            │
  │                 return excerpt (replaces the response)           │
  └──────────────────────────────────────────────────────────────────┘

  Fallback when the file write fails:
  IOException → truncate to preserveSampleChars×4 characters + truncation marker
              → ensures the Agent can keep running

# application.yml example
agent:
  interceptor:
    large-result-eviction:
      enabled: true
      token-threshold: 20000          # token threshold
      eviction-dir: /tmp/agent-evict  # directory for evicted files
      preserve-sample-chars: 500      # characters kept in the excerpt
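
The eviction flow, including the truncation fallback, could be sketched like this. The class and method names and the exact wording of the replacement text are assumptions; the chars/4 estimate, the file naming scheme, and the preserveSampleChars×4 fallback come from the description above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the afterTool() logic; responses are plain Strings.
final class LargeResultEvictionSketch {
    /** Rough token estimate: one token per four characters. */
    static int estimateTokens(String text) {
        return text.length() / 4;
    }

    static String evictIfLarge(String response, int tokenThreshold, int preserveSampleChars,
                               Path evictionDir, String agentId, String toolName) {
        int tokens = estimateTokens(response);
        if (tokens <= tokenThreshold) {
            return response; // small enough: pass through unchanged
        }
        String sample = response.substring(0, Math.min(preserveSampleChars, response.length()));
        try {
            Path dir = evictionDir.resolve(agentId);
            Files.createDirectories(dir);
            Path file = dir.resolve(toolName + "_" + System.currentTimeMillis() + ".txt");
            Files.write(file, response.getBytes(StandardCharsets.UTF_8));
            return sample + "\n...\n[full result saved to: " + file + "]\n"
                    + "[original size: " + response.length() + " chars, tokens~" + tokens + "]";
        } catch (IOException e) {
            // Could not persist: keep a longer prefix instead of failing the tool call.
            int keep = Math.min(preserveSampleChars * 4, response.length());
            return response.substring(0, keep) + "\n[truncated: eviction file could not be written]";
        }
    }
}
```

For the 85000-character example in the diagram, the estimate is 85000 / 4 = 21250 tokens, which is above the 20000 threshold, so the response is replaced by its first 500 characters plus the file reference.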

c) ContextEditingInterceptor (order=30, ModelInterceptor)

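Core responsibility: wrap ContextEngineeringService in the interceptor pipeline so that context is automatically compressed or edited before each LLM call, preventing long conversations from exceeding the token limit.

Algorithm: beforeModel() takes the message list from the current request and calls ContextEngineeringService to edit the context according to the configured strategy. The edited message list replaces the messages in the original request.

Comparison of the 4 strategies

Strategy	How it works	Best for	Cost
`TRIM`	Keeps the first and last N messages and drops the middle	Simple conversations, cost-sensitive use	Loses mid-conversation context
`SUMMARIZE`	Calls an LLM to summarize the overlong portion	Cases that need semantic continuity	Extra LLM call overhead
`SLIDING_WINDOW`	Keeps a sliding window of the most recent N turns	Real-time chat where recent context matters most	Loses early information
`HYBRID`	Combines summarization with a sliding window	Long conversations that need both global and local context	More complex to implement, plus summarization overhead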

# application.yml example
agent:
  interceptor:
    context-editing:
      enabled: true
      strategy: HYBRID              # TRIM | SUMMARIZE | SLIDING_WINDOW | HYBRID
      max-tokens: 8000              # target token ceiling after compression
      trim-keep-first: 2            # TRIM: keep the first N messages
      trim-keep-last: 10            # TRIM: keep the last N messages
      summarize-threshold: 4000     # SUMMARIZE: token count that triggers summarization
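
The two strategies that need no LLM call, TRIM and SLIDING_WINDOW, reduce to list slicing. The sketch below treats messages as plain strings and counts individual messages rather than conversation turns; ContextTrimSketch and its method names are assumptions, not the ContextEngineeringService API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of two context-editing strategies over a list of messages.
final class ContextTrimSketch {
    /** TRIM: keep the first keepFirst and the last keepLast messages and drop the middle. */
    static List<String> trim(List<String> messages, int keepFirst, int keepLast) {
        if (messages.size() <= keepFirst + keepLast) {
            return new ArrayList<>(messages); // nothing to drop
        }
        List<String> out = new ArrayList<>(messages.subList(0, keepFirst));
        out.addAll(messages.subList(messages.size() - keepLast, messages.size()));
        return out;
    }

    /** SLIDING_WINDOW: keep only the most recent windowSize messages. */
    static List<String> slidingWindow(List<String> messages, int windowSize) {
        int from = Math.max(0, messages.size() - windowSize);
        return new ArrayList<>(messages.subList(from, messages.size()));
    }
}
```

With trim-keep-first: 2 and trim-keep-last: 10 as configured above, a 20-message conversation shrinks to 12 messages: the first two plus the last ten.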

d) SubAgentInterceptor (order=60, ModelInterceptor)

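Core responsibility: score the user input by keyword matching to decide whether the request should be delegated to a specialized sub-Agent, and inject a delegation hint into the system prompt.

Algorithm: iterate over the configured SubAgentSpec list and score the user message by keyword matching. When the highest score reaches the minMatchScore threshold, inject a delegation suggestion into the system prompt (for example, "Consider handing this request to sub-Agent xxx") and let the LLM decide whether to delegate.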

  SubAgentInterceptor execution flow:

  beforeModel(request, context):
  ┌──────────────────────────────────────────────────────────────┐
  │ Extract the latest user message userMessage                  │
  │   │                                                          │
  │   ▼                                                          │
  │ for each SubAgentSpec:                                       │
  │   ├── keywords.forEach:                                      │
  │   │     message.contains(keyword) → score += weight          │
  │   └── record {agentName, score}                              │
  │   │                                                          │
  │   ▼                                                          │
  │ Highest score: bestMatch                                     │
  │   │                                                          │
  │   ▼                                                          │
  │ bestMatch.score >= minMatchScore?                            │
  │   │              │                                           │
  │  No             Yes                                          │
  │   │              │                                           │
  │   ▼              ▼                                           │
  │ return as-is    inject into system prompt:                   │
  │                 "[DELEGATION_HINT] Suggested delegate:       │
  │                  {agentName}, match score: {score}"          │
  │                     │                                        │
  │                     ▼                                        │
  │                 return the modified request                  │
  └──────────────────────────────────────────────────────────────┘

# application.yml example
agent:
  interceptor:
    sub-agent:
      enabled: true
      min-match-score: 5            # minimum match score threshold
      agents:
        - name: code-review-agent
          description: "Code review expert"
          keywords:
            - keyword: "code review"
              weight: 5
            - keyword: "review"
              weight: 4
            - keyword: "code quality"
              weight: 3
        - name: data-analysis-agent
          description: "Data analysis expert"
          keywords:
            - keyword: "data analysis"
              weight: 5
            - keyword: "statistics"
              weight: 3
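
The weighted keyword scoring can be sketched in a few lines. SubAgentMatchSketch, the map-based representation of agents and keywords, and the hint wording are assumptions; the scoring rule (sum the weights of every keyword contained in the message, then compare the best score with minMatchScore) follows the flow above.

```java
import java.util.Map;

// Hypothetical sketch: agents maps an agent name to its keyword -> weight table.
final class SubAgentMatchSketch {
    /** Sums the weights of all keywords that occur in the message (case-sensitive). */
    static int score(String message, Map<String, Integer> keywordWeights) {
        int total = 0;
        for (Map.Entry<String, Integer> e : keywordWeights.entrySet()) {
            if (message.contains(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    /** Returns a delegation hint for the best-scoring agent, or null if no agent reaches minMatchScore. */
    static String delegationHint(String message, Map<String, Map<String, Integer>> agents, int minMatchScore) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Map<String, Integer>> agent : agents.entrySet()) {
            int s = score(message, agent.getValue());
            if (s > bestScore) {
                best = agent.getKey();
                bestScore = s;
            }
        }
        if (best == null || bestScore < minMatchScore) {
            return null; // leave the request unchanged
        }
        return "[DELEGATION_HINT] Suggested delegate: " + best + ", match score: " + bestScore;
    }
}
```

Because matching is by substring, overlapping keywords stack: a message containing "code review" also contains "review", so with the sample weights it scores 5 + 4 = 9 for code-review-agent.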

e) ToolSelectionInterceptor (order=70, ModelInterceptor)

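Core responsibility: score the available tools for relevance to the user message and keep only the Top-N most relevant, reducing both the LLM's tool-selection burden and token consumption.

Scoring algorithm:

Match type	Score	Notes
Full tool name appears in the message	+10	E.g. a message containing "searchWeb" matches the searchWeb tool
A camelCase segment of the tool name appears in the message	+3 (per segment)	E.g. "search" matches the search segment of searchWeb
A keyword from the tool description appears in the message	+2 (per word)	Keywords in the description are matched against the message

Tools are sorted by total score in descending order and the top maxTools are kept. Tools in the alwaysInclude list are retained regardless of score.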

  ToolSelectionInterceptor scoring flow:

  beforeModel(request, context):
  ┌──────────────────────────────────────────────────────────────┐
  │ Get all available tools allTools                             │
  │   │                                                          │
  │   ▼                                                          │
  │ for each tool in allTools:                                   │
  │   score = 0                                                  │
  │   ├── message.contains(tool.name)       → score += 10        │
  │   ├── camelSplit(tool.name).forEach:                         │
  │   │     message.contains(part)          → score += 3         │
  │   └── tool.description.keywords.forEach:                     │
  │         message.contains(keyword)       → score += 2         │
  │   │                                                          │
  │   ▼                                                          │
  │ Sort + take top maxTools (10)                                │
  │   │                                                          │
  │   ▼                                                          │
  │ Merge in alwaysInclude tools (deduplicated)                  │
  │   │                                                          │
  │   ▼                                                          │
  │ Replace the tool list in the request                         │
  └──────────────────────────────────────────────────────────────┘

# application.yml example
agent:
  interceptor:
    tool-selection:
      enabled: true
      max-tools: 10                  # maximum number of tools to keep
      always-include:                # tools that are always kept
        - getCurrentTime
        - searchKnowledgeBase
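
A compact sketch of the scoring table and the Top-N selection follows. ToolSelectionSketch and its map of tool name to description keywords are assumptions, as is the case-insensitive comparison (the flow above uses a plain message.contains); the +10 / +3 / +2 weights come from the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: tools maps a tool name to the keywords extracted from its description.
final class ToolSelectionSketch {
    /** Splits "searchWeb" into ["search", "Web"] at lower-to-upper boundaries. */
    static String[] camelSplit(String name) {
        return name.split("(?<=[a-z0-9])(?=[A-Z])");
    }

    static int score(String message, String toolName, List<String> descriptionKeywords) {
        String msg = message.toLowerCase(Locale.ROOT); // assumption: match case-insensitively
        int score = 0;
        if (msg.contains(toolName.toLowerCase(Locale.ROOT))) score += 10;
        for (String part : camelSplit(toolName)) {
            if (msg.contains(part.toLowerCase(Locale.ROOT))) score += 3;
        }
        for (String keyword : descriptionKeywords) {
            if (msg.contains(keyword.toLowerCase(Locale.ROOT))) score += 2;
        }
        return score;
    }

    /** Keeps the maxTools highest-scoring tools, then adds any alwaysInclude tools that exist. */
    static Set<String> select(String message, Map<String, List<String>> tools,
                              int maxTools, List<String> alwaysInclude) {
        List<String> names = new ArrayList<>(tools.keySet());
        names.sort(Comparator.comparingInt((String n) -> score(message, n, tools.get(n))).reversed());
        Set<String> selected = new LinkedHashSet<>(names.subList(0, Math.min(maxTools, names.size())));
        for (String name : alwaysInclude) {
            if (tools.containsKey(name)) selected.add(name); // the set removes duplicates
        }
        return selected;
    }
}
```

Note that the rules are additive: a message that contains the full name "searchWeb" also contains both of its segments, so it earns 10 + 3 + 3 before any description keywords are counted.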

f) TodoListInterceptor (order=80, ModelInterceptor)

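Core responsibility: inject the current task list into the system prompt and parse task-management directives out of the LLM response, giving the Agent the ability to track its own tasks.

Directive format: the LLM embeds special directives in its response, and the interceptor parses and executes them with a regular expression:

// Directive format in the LLM output
[TODO_ADD: title=Implement the user login endpoint]
[TODO_UPDATE: id=3, status=IN_PROGRESS]
[TODO_UPDATE: id=1, status=COMPLETED]
[TODO_UPDATE: id=5, status=CANCELLED]

// TodoStatus enum
enum TodoStatus {
    PENDING,        // not started
    IN_PROGRESS,    // in progress
    COMPLETED,      // done
    CANCELLED       // cancelled
}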

  TodoListInterceptor two-phase flow:

  beforeModel(request, context):
  ┌──────────────────────────────────────────────────────────────┐
  │ Get the current TodoList                                     │
  │   │                                                          │
  │   ▼                                                          │
  │ Format as text:                                              │
  │ ┌──────────────────────────────────────────────┐             │
  │ │ [Current task list]                          │             │
  │ │ #1 [COMPLETED] Analyze requirements doc      │             │
  │ │ #2 [IN_PROGRESS] Design database schema      │             │
  │ │ #3 [PENDING] Implement user login endpoint   │             │
  │ │                                              │             │
  │ │ You can manage tasks with these directives:  │             │
  │ │ [TODO_ADD: title=xxx]                        │             │
  │ │ [TODO_UPDATE: id=N, status=XXX]              │             │
  │ └──────────────────────────────────────────────┘             │
  │   │                                                          │
  │   ▼                                                          │
  │ Append to the end of the system prompt                       │
  └──────────────────────────────────────────────────────────────┘

  afterModel(response, context):
  ┌────────────────────────────────────────────────────────────────┐
  │ Regex-match directives in the response:                        │
  │ Pattern: \[TODO_(ADD|UPDATE):(.+?)\]                           │
  │   │                                                            │
  │   ▼                                                            │
  │ TODO_ADD → parse title, create new Todo (PENDING)              │
  │ TODO_UPDATE → parse id + status                                │
  │   ├── valid status (enum match) → update status                │
  │   ├── invalid status → catch IllegalArgumentException, warn    │
  │   └── invalid id → log a warning, skip                         │
  │   │                                                            │
  │   ▼                                                            │
  │ Strip parsed directives from the response text (clean output)  │
  └────────────────────────────────────────────────────────────────┘

# application.yml example
agent:
  interceptor:
    todo-list:
      enabled: true
      # No extra configuration needed; the TodoList is managed automatically
      # The TodoList is stored in the InterceptorContext and lives as long as the session
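
The afterModel() parsing step can be sketched with java.util.regex and the pattern from the diagram. TodoDirectiveSketch, its two maps, and the id counter are assumptions for illustration; a real implementation would log a warning where this sketch silently skips a bad directive.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of directive parsing; todo state is kept in two simple maps.
final class TodoDirectiveSketch {
    enum TodoStatus { PENDING, IN_PROGRESS, COMPLETED, CANCELLED }

    private static final Pattern DIRECTIVE = Pattern.compile("\\[TODO_(ADD|UPDATE):(.+?)\\]");

    final Map<Integer, String> titles = new LinkedHashMap<>();
    final Map<Integer, TodoStatus> statuses = new LinkedHashMap<>();
    private int nextId = 1;

    /** Applies every directive found in the response and returns the text with directives removed. */
    String apply(String response) {
        Matcher m = DIRECTIVE.matcher(response);
        while (m.find()) {
            String body = m.group(2).trim();
            try {
                if (m.group(1).equals("ADD")) {
                    if (body.startsWith("title=")) {
                        titles.put(nextId, body.substring("title=".length()).trim());
                        statuses.put(nextId++, TodoStatus.PENDING);
                    }
                } else {
                    Map<String, String> fields = parseFields(body);
                    int id = Integer.parseInt(fields.getOrDefault("id", ""));
                    TodoStatus status = TodoStatus.valueOf(fields.getOrDefault("status", ""));
                    if (statuses.containsKey(id)) {
                        statuses.put(id, status); // unknown ids are skipped
                    }
                }
            } catch (IllegalArgumentException e) {
                // Malformed id or unknown status (NumberFormatException is a subclass): ignore it.
            }
        }
        return DIRECTIVE.matcher(response).replaceAll("").trim();
    }

    /** Parses "id=3, status=DONE" into {id=3, status=DONE}. */
    private static Map<String, String> parseFields(String body) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return out;
    }
}
```

Text that does not match the pattern at all is simply left in the response, which is the "ignore rather than fail" behavior the flow describes for unreliable LLM output.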

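8. InterceptorChainManager: Unified Management

Core responsibility: register, enable/disable, and diagnose all interceptors in one place, with dynamic management at runtime and no application restart.

ManagedInterceptor wrapper

Each interceptor is wrapped in a ManagedInterceptor, which holds:

Field	Type	Description
interceptor	Object	The original interceptor instance
enabled	AtomicBoolean	Enable/disable switch; CAS operations keep it thread-safe
invocationCount	AtomicLong	Number of invocations
totalDurationNanos	AtomicLong	Cumulative execution time
lastError	volatile String	Most recent error message

Thread safety: CopyOnWriteArrayList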

  InterceptorChainManager internal structure:

  ┌──────────────────────────────────────────────────────────┐
  │  CopyOnWriteArrayList<ManagedInterceptor>                │
  │  ┌─────────┬─────────┬─────────┬─────────┬──────────┐    │
  │  │ Fallback│ Caching │ Retry   │ Context │ TodoList │    │
  │  │ enabled │ enabled │ enabled │ enabled │ disabled │    │
  │  └─────────┴─────────┴─────────┴─────────┴──────────┘    │
  │                                                          │
  │  Reads (frequent): buildChain() iterates the list        │
  │    → lock-free; reads the current array snapshot         │
  │                                                          │
  │  Writes (rare): register() / enable() / disable()        │
  │    → copies the whole array, then swaps the reference    │
  └──────────────────────────────────────────────────────────┘

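buildChain() vs buildChainWithOnly()

Method	Behavior	Use
`buildChain()`	Keeps only interceptors with `enabled=true` and builds the chain sorted by order	Normal request execution
`buildChainWithOnly(names)`	Builds a chain that contains only the named interceptors	Debugging and testing, to isolate specific interceptors

Monitoring API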
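
The two build methods can be sketched over a CopyOnWriteArrayList as follows. ChainManagerSketch, the Managed holder, and the choice to return interceptor names are assumptions, as is the decision that buildChainWithOnly ignores the enabled flag. In this sketch enabling or disabling only flips the AtomicBoolean, so register() is the only operation that copies the backing array.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the manager; chains are returned as lists of interceptor names.
final class ChainManagerSketch {
    static final class Managed {
        final String name;
        final int order;
        final AtomicBoolean enabled = new AtomicBoolean(true);
        Managed(String name, int order) { this.name = name; this.order = order; }
    }

    private final List<Managed> registry = new CopyOnWriteArrayList<>();

    void register(String name, int order) {
        registry.add(new Managed(name, order));
    }

    /** Flips the flag in place; returns false if no interceptor has that name. */
    boolean setEnabled(String name, boolean value) {
        for (Managed m : registry) {
            if (m.name.equals(name)) {
                m.enabled.set(value);
                return true;
            }
        }
        return false;
    }

    /** Enabled interceptors only, sorted by order. */
    List<String> buildChain() {
        return build(null);
    }

    /** Only the named interceptors, sorted by order (assumption: the enabled flag is ignored). */
    List<String> buildChainWithOnly(Set<String> names) {
        return build(names);
    }

    private List<String> build(Set<String> only) {
        List<Managed> picked = new ArrayList<>();
        for (Managed m : registry) { // iterates a snapshot of the array, no locking
            boolean keep = (only == null) ? m.enabled.get() : only.contains(m.name);
            if (keep) picked.add(m);
        }
        picked.sort(Comparator.comparingInt(m -> m.order));
        List<String> names = new ArrayList<>();
        for (Managed m : picked) names.add(m.name);
        return names;
    }
}
```

Because each build reads the flags at the moment it runs, a disable call takes effect on the next chain that is built, without any restart.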

GET /api/debug/interceptor-stats

{
  "interceptors": [
    {
      "name": "ModelFallbackInterceptor",
      "order": 10,
      "type": "ModelInterceptor",
      "enabled": true,
      "invocationCount": 1523,
      "avgDurationMs": 2.3,
      "lastError": null
    },
    ...
  ],
  "orderConflicts": [],       // result of detectOrderConflicts()
  "chainBuildCount": 8921,    // number of chain builds
  "lastBuildTimeMs": 0.8      // duration of the most recent build
}

POST /api/debug/interceptor/{name}/enable    // enable at runtime
POST /api/debug/interceptor/{name}/disable   // disable at runtime
GET  /api/debug/interceptor/diagnostics      // full diagnostics

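9. Full Execution Order

The complete execution chain with all interceptors sorted by order (before in ascending order, after in descending order):

Order	Interceptor	Type	Core function
10	ModelFallbackInterceptor	Model	Model fallback: switches to a fallback model when the primary model fails
20	CachingToolInterceptor	Tool	Tool caching: reuses results for calls with identical arguments
20	LargeResultEvictionInterceptor	Tool	Large-result eviction: writes oversized tool results to a file and substitutes an excerpt
30	RetryToolInterceptor	Tool	Tool retry: cooperative retry with exponential backoff
30	ContextEditingInterceptor	Model	Context editing: conversation compression, summarization, sliding window
60	SubAgentInterceptor	Model	Sub-Agent delegation: injects a delegation hint after keyword matching
70	ToolSelectionInterceptor	Model	Tool selection: scores and ranks tools, keeping the Top-N
80	TodoListInterceptor	Model	Task tracking: injects the task list and parses management directives
100	LoggingModelInterceptor	Model	Logging: audit log of requests and responses

Note: Model interceptors and Tool interceptors run on separate chains. Interceptors of different types that share an order value (such as Retry and ContextEditing at order=30) do not conflict, because one belongs to the ToolInterceptor chain and the other to the ModelInterceptor chain. Within a single type, order values should be unique. By that rule, CachingToolInterceptor and LargeResultEvictionInterceptor, which are both Tool interceptors at order=20 in the table above, do collide: their relative order falls back to registration order (see Q8 below), so one of them should be given a different value.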
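
A per-type conflict check of the kind that detectOrderConflicts() is described as performing might look like this. OrderConflictSketch and the {name, type, order} string-array representation are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each interceptor is described as {name, type, order}.
final class OrderConflictSketch {
    /** Returns one message per (type, order) pair that is shared by two or more interceptors. */
    static List<String> detect(List<String[]> interceptors) {
        Map<String, List<String>> byTypeAndOrder = new TreeMap<>();
        for (String[] i : interceptors) {
            String key = i[1] + "@" + i[2];
            byTypeAndOrder.computeIfAbsent(key, k -> new ArrayList<>()).add(i[0]);
        }
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byTypeAndOrder.entrySet()) {
            if (e.getValue().size() > 1) {
                conflicts.add(e.getKey() + ": " + e.getValue());
            }
        }
        return conflicts;
    }
}
```

Grouping by type as well as order is what lets Retry (Tool, 30) and ContextEditing (Model, 30) coexist, while two Tool interceptors that share order=20 are reported.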

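10. Frequently Asked Interview Questions (continued)

Q1: What are the strengths and weaknesses of the onion model? Why choose it over a simple one-way chain?
A: Strengths: ascending before plus descending after gives a symmetric structure in which each interceptor sees the full context on both sides of the call (much like try-finally), which suits cases that need post-processing in the response phase (for example, ModelFallback catching a failure in the after phase and retrying). Weaknesses: debugging is harder because you have to reason about flow in both directions, and each call costs more hook invocations (N interceptors means 2N before/after calls). Why it was chosen: many interceptors need to "see the response before deciding", such as fallback, retry, and cache writes, and a simple one-way chain cannot express this symmetric before/after semantics.

Q2: How are interceptors enabled and disabled dynamically? How does a change take effect without a restart?
A: InterceptorChainManager wraps each interceptor in a ManagedInterceptor that holds an AtomicBoolean enabled flag. When buildChain() builds the execution chain, it filters out interceptors with enabled=false. The REST API POST /api/debug/interceptor/{name}/enable|disable triggers the change, CopyOnWriteArrayList keeps concurrent access safe, and the change takes effect on the next buildChain call with no restart.

Q3: What are the trade-offs of CopyOnWriteArrayList for interceptors?
A: The interceptor list is read-heavy (frequent buildChain traversals versus rare register/enable operations), which is the case COW is designed for. Cost: each write copies the whole array (O(n) memory and copy time), and writers contend on a lock during registration. Alternative: ReadWriteLock + ArrayList, but COW is simpler and its reads are completely lock-free, with no read-lock overhead. Since there are typically fewer than 20 interceptors, the copy-on-write cost is negligible.

Q4: How does ModelFallbackInterceptor prevent recursive fallback?
A: Through the CTX_IS_FALLBACK context flag. It is set to true before fallback begins and removed on exit. afterModel() checks the flag and, if it is true, returns immediately without attempting another fallback. This avoids the loop "fallback model also fails → fallback is triggered again → infinite recursion".

Q5: Why does ToolSelectionInterceptor score by keywords instead of using an LLM?
A: Performance. The scoring runs on every beforeModel, so an LLM call with 100ms+ latency would add up on every request, while keyword matching finishes in under 1ms. It is a trade-off between precision and latency: tool filtering only needs to be "roughly relevant", because the LLM makes the final choice from the filtered tool list. Extensibility: embedding similarity could be added later as a supplementary scoring strategy, but real-time LLM calls should still be avoided.

Q6: What is LargeResultEviction's fallback strategy when the file write fails?
A: An IOException does not block the main flow. The interceptor falls back to truncation: it keeps preserveSampleChars × 4 characters plus a truncation marker. Design principle: keeping the Agent running takes priority over data completeness. Losing the full data is acceptable (the tool can be called again); interrupting the Agent is not.

Q7: ContextEditingInterceptor sits fairly early at order=30. Why?
A: Context compression must run before SubAgent (60), ToolSelection (70), and TodoList (80), because those later interceptors inject content into the system prompt (delegation hints, the task list, and so on). If the context is already close to the token limit, those injections would push it over. Compressing first guarantees the later interceptors enough token headroom.

Q8: What happens if two interceptors have the same order?
A: InterceptorChainManager.detectOrderConflicts() detects order conflicts between interceptors of the same type and reports them through the diagnostics API. At execution time, ties fall back to registration order, which is not guaranteed to be stable and can lead to inconsistent behavior. Best practice: avoid identical order values within one type, and monitor for conflicts regularly through the getDiagnostics() API.

Q9: How does TodoListInterceptor's regex parsing handle malformed directives?
A: Leniently. Text that does not match the Pattern is ignored (no error), status parsing is wrapped in a try-catch for IllegalArgumentException (enum mismatch), and an invalid id is logged as a warning and skipped. Design principle: LLM output is unreliable, so fault tolerance comes first; a formatting mistake by the LLM must not crash the Agent.

Q10: Can the monitoring counters (AtomicLong) become a bottleneck under high concurrency?
A: AtomicLong uses lock-free CAS operations and is not a bottleneck under ordinary concurrency (a single uncontended CAS takes on the order of 10ns). Under extreme contention the alternative is LongAdder (striped accumulation, better when writes far outnumber reads). AtomicLong is used here because reads (getStats) are also frequent, and LongAdder.sum() has to walk every cell, which makes reads more expensive.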