07. Web Search + Local Knowledge Base (RAG)

Learning Objectives

After completing this tutorial, you will be able to:

Understand how the web.search tool passes through the HITL approval gate
Build a local knowledge base using euleragent rag init/add/status/query
Ingest only specific file types using glob patterns
Apply web search security settings (domain whitelist)
Use web search and local KB together in the same run
Combine internal documents and web information to write a technical blog post (practical scenario)
Configure MCP search providers and manage them with euleragent mcp sync/show
Understand the SearchRouter's automatic routing concept and the search_routing.jsonl artifact
Ensure reproducibility of search source configurations using MCP catalog snapshots
Specify per-run search source sets with the --source-set option

Prerequisites

Workspace initialization completed (euleragent init)
Agent created:

euleragent new blog-writer --template marketing-expert

Ollama running (or default_llm_profile: fake):

ollama serve
ollama pull nomic-embed-text   # RAG embedding model

What is RAG (Retrieval-Augmented Generation)?

RAG is a technique where the LLM retrieves relevant documents and injects them as context before generating a response. euleragent supports two retrieval sources:

Source	Command	Characteristics
Local Knowledge Base	`rag init/add/query`	Internal documents, offline, fast
Web Search	`web.search` (HITL gate)	Latest information, approval required
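To make the retrieve-then-inject idea concrete, here is a minimal sketch. It is not how euleragent implements retrieval: it assumes a toy bag-of-words cosine similarity in place of a real embedding model, and the names `retrieve`, `build_prompt`, and `CHUNKS` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:top_k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks as context ahead of the question."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge-base chunks, for illustration only.
CHUNKS = [
    "JWT-based authentication and authorization",
    "Request routing and load balancing",
    "Redis is used for caching",
]

print(build_prompt("authentication with JWT", CHUNKS))
```

A real knowledge base swaps the word-count vectors for embedding vectors, but the flow (score, rank, inject) stays the same.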

HITL Security Gate for `web.search`

Because web.search sends data to external networks, it requires human approval by default. Let's first look at how this gate works.

Why `web.search` is risky

Data leakage risk: Internal information included in queries is transmitted to search engines
Cost: Search APIs incur charges based on usage
Unexpected results: Search results can influence the agent's output

Why execution is impossible without approval

web.search is included in the require_approval list in tools.yaml:

# .euleragent/agents/blog-writer/tools.yaml
require_approval:
  - web.search
  - web.fetch
  - file.write
  - shell.exec

Because of this setting, even if the agent proposes web.search, it will not execute without human acceptance.
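The gate can be pictured as a lookup against the require_approval list: a proposed tool call is either queued for a human or executed. The sketch below is an illustration under that assumption only; `ApprovalGate`, `propose`, and the status strings are hypothetical names, not euleragent internals.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalGate:
    """Queues tool calls that appear in the require_approval list."""

    require_approval: set[str]
    pending: list[dict] = field(default_factory=list)

    def propose(self, tool: str, params: dict) -> str:
        # Gated tools are recorded and wait for a human decision.
        if tool in self.require_approval:
            self.pending.append({"tool_name": tool, "params": params})
            return "PENDING_APPROVAL"
        # Anything not listed would run immediately.
        return "EXECUTED"


gate = ApprovalGate(
    require_approval={"web.search", "web.fetch", "file.write", "shell.exec"}
)
print(gate.propose("web.search", {"query": "API gateway comparison"}))
print(len(gate.pending))
```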

MCP-Based Web Search Setup (Recommended)

Important: The rag.web_search configuration is deprecated. For new projects, use the MCP server approach.

euleragent recommends MCP (Model Context Protocol)-based search providers. When an MCP server is configured, MCP-first routing is applied. Only when MCP is not configured does it fall back to the legacy rag.web_search (with a one-time DeprecationWarning).

Quick Start: Demo MCP Web Search Server

You can get started immediately using the demo server included in the euleragent repository:

# 1. Start the demo MCP web search server (in a separate terminal)
python examples/mcp/web_search_server.py
# MCP web search demo server listening on 0.0.0.0:9021
#   Provider: fake
#   Endpoint: POST http://0.0.0.0:9021/mcp

# 2. Uncomment the MCP server in workspace.yaml
# Edit .euleragent/config/workspace.yaml:
#   mcp:
#     enabled: true
#     servers:
#       - id: local_web_search
#         url: http://localhost:9021/mcp
#         allow_tools: [web.search, web.fetch]
#         ...

# 3. Sync the catalog
euleragent mcp sync
euleragent mcp show

`workspace.yaml` MCP Configuration Example

# .euleragent/config/workspace.yaml

mcp:
  enabled: true
  servers:
    - id: local_web_search
      url: http://localhost:9021/mcp
      allow_tools: [web.search, web.fetch]
      cost_tier: free
      risk_default: high
      require_approval_enable: true
      timeout_seconds: 30
      config:
        provider: fake           # fake | brave | tavily | duckduckgo
        api_key: ""              # set for brave/tavily
        max_results: 10

MCP-First Routing: If web.search is included in an MCP server's allow_tools, it takes priority over the existing rag.web_search configuration.
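The MCP-first rule and its legacy fallback can be sketched as a small decision function. This is an assumption-laden illustration of the rule as described in this section, not the actual resolver: `pick_search_backend` and its return strings are hypothetical, and the config is passed as a plain dict shaped like workspace.yaml.

```python
import warnings


def pick_search_backend(config: dict) -> str:
    """Prefer an MCP server that allows web.search; else fall back to legacy."""
    mcp = config.get("mcp", {})
    if mcp.get("enabled"):
        for server in mcp.get("servers", []):
            if "web.search" in server.get("allow_tools", []):
                return f"mcp:{server['id']}"
    # No MCP server offers web.search: use the deprecated legacy section.
    if "web_search" in config.get("rag", {}):
        warnings.warn(
            "rag.web_search is deprecated; configure an MCP server instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "legacy:rag.web_search"
    return "none"


config = {
    "mcp": {
        "enabled": True,
        "servers": [
            {"id": "local_web_search", "allow_tools": ["web.search", "web.fetch"]}
        ],
    },
    "rag": {"web_search": {"provider": "fake"}},
}
print(pick_search_backend(config))
```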

Legacy Configuration (deprecated)

The configuration below is maintained for backward compatibility and is automatically used as a fallback when MCP is not configured:

# .euleragent/config/workspace.yaml

mcp:
  enabled: true
  catalog_path: .euleragent/state/mcp_catalog.json
  sources:
    - name: tavily
      type: web_search
      provider: tavily
      api_key: tvly-YOUR_API_KEY
      priority: 1
    - name: brave
      type: web_search
      provider: brave
      api_key: BSA-YOUR_API_KEY
      priority: 2
    - name: local_kb
      type: rag
      project: my-project
      priority: 0              # highest priority (local first)

  source_sets:
    default: [local_kb, tavily]
    academic: [local_kb, tavily]
    news: [brave, tavily]
    offline: [local_kb]

MCP Catalog Sync and Verification

After configuration, sync the catalog and verify the status:

# Sync the catalog: check each source's connection status and create a snapshot
euleragent mcp sync

Expected output:

MCP catalog sync completed.
  Sources: 3
    [OK]   local_kb  (rag, 7 chunks)
    [OK]   tavily    (web_search, connected)
    [OK]   brave     (web_search, connected)
  Source sets: 4 (default, academic, news, offline)
  Snapshot: .euleragent/state/mcp_catalog_snapshot.json

# Check the current catalog state
euleragent mcp show

Expected output:

MCP Catalog
  Sources:
    NAME       TYPE         PROVIDER   STATUS    PRIORITY
    local_kb   rag          —          active    0
    tavily     web_search   tavily     active    1
    brave      web_search   brave      active    2

  Source Sets:
    default:   local_kb, tavily
    academic:  local_kb, tavily
    news:      brave, tavily
    offline:   local_kb

Step-by-Step Hands-On

Step 1: Initialize the RAG Knowledge Base

Initialize a per-project knowledge base:

euleragent rag init --project my-project

Expected output:

RAG knowledge base initialized.
  Project:  my-project
  DB:       .euleragent/projects/my-project/rag.db
  Chunks:   0
  Docs:     0

If the project does not exist, it is automatically created. The knowledge base is stored as a SQLite DB inside the project directory.

Step 2: Ingest Documents -- Entire Directory

Add local documents to the knowledge base. Markdown (.md) and plain-text (.txt) files are supported; other formats such as PDF must be converted to text first:

# Create an example document directory
mkdir -p ./docs/product
cat > ./docs/product/overview.md << 'EOF'
# FastBridge Product Overview

FastBridge is a high-performance API gateway built on FastAPI.

## Core Features
- Request routing and load balancing
- JWT-based authentication/authorization
- Rate limiting
- Real-time monitoring dashboard
EOF

cat > ./docs/product/tech_stack.md << 'EOF'
# FastBridge Tech Stack

## Backend
- Python 3.11
- FastAPI 0.110
- SQLAlchemy 2.0
- Redis (caching)

## Infrastructure
- Docker + Kubernetes
- GitHub Actions (CI/CD)
EOF

Ingest the documents:

euleragent rag add --project my-project --path ./docs/

Expected output:

Ingesting documents from ./docs/ ...

  [OK] ./docs/product/overview.md    → 4 chunks
  [OK] ./docs/product/tech_stack.md  → 3 chunks

Ingested 2 documents, 7 chunks total.
  Project: my-project
  Total chunks in KB: 7

Step 3: Ingest Specific Formats Only -- Glob Patterns

Use the --glob option to selectively ingest only specific file types:

# Markdown files only
euleragent rag add \
  --project my-project \
  --path ./docs/ \
  --glob "*.md"

# Text files only
euleragent rag add \
  --project my-project \
  --path ./docs/ \
  --glob "*.txt"

# A specific subdirectory only
euleragent rag add \
  --project my-project \
  --path ./docs/ \
  --glob "product/*.md"

Expected output:

Ingesting documents matching "*.md" from ./docs/ ...

  [OK] overview.md    → 4 chunks
  [OK] tech_stack.md  → 3 chunks
  [skip] data.csv     ← does not match the glob pattern

Ingested 2 documents, 7 chunks total.
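Before ingesting, it can help to preview which files a pattern would select. The sketch below uses Python's `PurePosixPath.match`, which matches relative patterns from the right-hand side of the path. Whether euleragent's --glob uses exactly these semantics is an assumption; `preview_glob` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def preview_glob(paths: list[str], pattern: str) -> dict[str, list[str]]:
    """Split relative paths into those a glob pattern matches and the rest."""
    matched, skipped = [], []
    for p in paths:
        # PurePosixPath.match anchors relative patterns at the end of the path.
        (matched if PurePosixPath(p).match(pattern) else skipped).append(p)
    return {"ingest": matched, "skip": skipped}


files = ["product/overview.md", "product/tech_stack.md", "data.csv"]
print(preview_glob(files, "*.md"))
print(preview_glob(files, "product/*.md"))
```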

Step 4: Check Knowledge Base Status

euleragent rag status --project my-project

Expected output:

RAG Knowledge Base Status
  Project:    my-project
  DB:         .euleragent/projects/my-project/rag.db
  Documents:  2
  Chunks:     7
  Embedding:  ollama/nomic-embed-text
  Last added: 2026-02-23 10:30:00

Document list:
  [1] docs/product/overview.md    4 chunks  2026-02-23 10:30:00
  [2] docs/product/tech_stack.md  3 chunks  2026-02-23 10:30:00

Step 5: Query the Knowledge Base Directly

Test RAG retrieval directly without going through the agent:

euleragent rag query \
  --project my-project \
  --q "How does FastBridge handle authentication?" \
  --top-k 3

Expected output:

Query: "How does FastBridge handle authentication?"
Top-3 results:

[1] score=0.8923
    Source: docs/product/overview.md
    Chunk:  "JWT-based authentication/authorization"

[2] score=0.7234
    Source: docs/product/overview.md
    Chunk:  "FastBridge is a high-performance API gateway built on FastAPI. Core features: request routing..."

[3] score=0.6891
    Source: docs/product/tech_stack.md
    Chunk:  "Python 3.11, FastAPI 0.110..."

Try another query:

euleragent rag query \
  --project my-project \
  --q "Python version" \
  --top-k 2

Step 6: Web Search Security Settings -- Domain Whitelist

You can restrict allowed domains in workspace.yaml:

cat .euleragent/config/workspace.yaml

tools_policy:
  network:
    default: deny                    # block external traffic by default
    require_approval: true           # all external traffic requires approval
    allow_domains:                   # allowed-domain whitelist
      - arxiv.org
      - github.com
      - docs.python.org
      - fastapi.tiangolo.com

rag:
  web_search:
    provider: fake                   # for testing (real: tavily | brave | duckduckgo)
    require_approval: true
    api_key: ""                      # API key when using a real provider

How the domain whitelist works:

Currently, allow_domains is recorded as metadata that humans reference during approval. When approving, you manually verify whether the query or URL belongs to a whitelisted domain and then accept or reject accordingly.
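Since the whitelist check is manual, a reviewer may want a small helper for testing a URL against allow_domains before accepting. The sketch below is one possible check, not something euleragent ships. Treating subdomains of a listed domain as allowed is an assumption, and `is_whitelisted` is a hypothetical name.

```python
from urllib.parse import urlparse

# Domains copied from the tools_policy.network.allow_domains example above.
ALLOW_DOMAINS = ["arxiv.org", "github.com", "docs.python.org", "fastapi.tiangolo.com"]


def is_whitelisted(url: str, allow_domains: list[str] = ALLOW_DOMAINS) -> bool:
    """True if the URL's host equals a listed domain or is a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # The leading dot prevents look-alikes such as evilgithub.com from matching.
    return any(host == d or host.endswith("." + d) for d in allow_domains)


print(is_whitelisted("https://docs.python.org/3/library/"))
print(is_whitelisted("https://example.com/page"))
```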

Approval flow for MCP search providers:

When using MCP, only searches that the SearchRouter routes to external sources are subject to approval. The approval information displays the routed target source:

euleragent approve show apv_xxx

{
  "tool_name": "web.search",
  "params": {"query": "API gateway comparison 2025"},
  "side_effects": ["external_network"],
  "mcp_source": "tavily",
  "source_set": "default",
  "allowed_domains": ["fastapi.tiangolo.com", "docs.python.org"]
}

Searches routed to the local KB do not use external networks, so they do not appear in the approval queue.

# Check the domains at approval time
euleragent approve show apv_xxx

{
  "tool_name": "web.search",
  "params": {"query": "FastAPI authentication JWT 2025"},
  "side_effects": ["external_network"],
  "allowed_domains": ["fastapi.tiangolo.com", "docs.python.org"]
}

Setting up API key-based web search for production use:

# workspace.yaml: when using real Tavily web search
rag:
  web_search:
    provider: tavily
    api_key: tvly-YOUR_TAVILY_API_KEY
    require_approval: true
    max_results: 5

Step 7: Run Agent Using Local KB Only

First, write a blog post using only the local KB:

euleragent run blog-writer \
  --task "Write a blog post introducing the FastBridge API gateway. It must cover the tech stack and core features. Save it to result.md." \
  --project my-project \
  --mode plan

Expected output:

Run b2c3d4e5f6a1 started
  Project: my-project
  RAG: my-project KB active (7 chunks)

[loop 1/5] Retrieving from local KB...
  RAG hit: "FastBridge is a high-performance..." (score: 0.89)
  RAG hit: "JWT-based authentication/authorization..." (score: 0.83)
  RAG hit: "Python 3.11, FastAPI 0.110..." (score: 0.78)

[loop 2/5] Generating blog post...
  → Proposed: file.write (path: result.md)

Run b2c3d4e5f6a1 completed (state: PENDING_APPROVAL)
1 approval(s) pending.

After approval, execute:

euleragent approve accept-all --run-id b2c3d4e5f6a1 --actor "user:you" --execute

cat result.md

Step 8: Combining Web Search + Local KB

This is the most powerful pattern. It uses internal documents and the latest web information together:

euleragent run blog-writer \
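  --task "Write a blog post comparing the FastBridge API gateway with the latest open-source API gateways. Base it on the internal FastBridge documents, and search the web for competitor information to include. Save it to result.md." \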
  --project my-project \
  --mode plan \
  --max-loops 4

Expected output:

Run c3d4e5f6a1b2 started
  Project: my-project
  RAG: my-project KB active (7 chunks)

[loop 1/4] Retrieving from local KB...
  RAG hits: 3 chunks retrieved from my-project KB

[loop 2/4] Planning web searches for competitor info...
  → Proposed: web.search (query: "open source API gateway comparison 2025")
  → Proposed: web.search (query: "Kong APISIX Traefik features comparison")

[loop 3/4] Planning output...
  → Proposed: file.write (path: result.md)

Run c3d4e5f6a1b2 completed (state: PENDING_APPROVAL)
3 approval(s) pending.
  [2x] web.search
  [1x] file.write

Approve web searches first and execute:

euleragent approve accept-all \
  --run-id c3d4e5f6a1b2 \
  --tool web.search \
  --actor "user:you" \
  --execute

Expected output:

Accepted and executed 2 web.search approval(s).
  [OK] "open source API gateway comparison 2025" → 5 results
  [OK] "Kong APISIX Traefik features comparison" → 4 results

Approve file write:

euleragent approve accept-all \
  --run-id c3d4e5f6a1b2 \
  --tool file.write \
  --actor "user:you" \
  --execute

Verify the result:

cat result.md
# FastBridge vs. Open-Source API Gateways
# (internal documents + web search results combined)

Step 9: Verify Citations

When RAG is active during a run, citations.json is generated:

cat .euleragent/runs/c3d4e5f6a1b2/artifacts/citations.json

Expected output:

[
  {
    "citation_id": "cit_001",
    "doc_id": "doc_overview",
    "chunk_id": "chunk_003",
    "content": "JWT-based authentication/authorization",
    "source": "docs/product/overview.md",
    "score": 0.8923
  },
  {
    "citation_id": "cit_002",
    "doc_id": "doc_tech",
    "chunk_id": "chunk_001",
    "content": "Python 3.11, FastAPI 0.110",
    "source": "docs/product/tech_stack.md",
    "score": 0.7891
  }
]
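Because citations.json is plain JSON, it is easy to post-process. The sketch below filters and ranks citations by score. It assumes only the field names shown in the expected output above; `summarize_citations` and the inline `sample` string are hypothetical.

```python
import json


def summarize_citations(raw: str, min_score: float = 0.0) -> list[str]:
    """List citations at or above min_score, highest score first."""
    citations = json.loads(raw)
    kept = [c for c in citations if c["score"] >= min_score]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return [f'{c["citation_id"]} {c["source"]} ({c["score"]:.2f})' for c in kept]


# Inline sample mirroring the structure shown above; in practice, read the file.
sample = """[
  {"citation_id": "cit_001", "source": "docs/product/overview.md", "score": 0.8923},
  {"citation_id": "cit_002", "source": "docs/product/tech_stack.md", "score": 0.7891}
]"""

for line in summarize_citations(sample):
    print(line)
```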

View the full RAG context:

cat .euleragent/runs/c3d4e5f6a1b2/artifacts/rag_context.json

External transmission audit log:

cat .euleragent/runs/c3d4e5f6a1b2/external_transmission.jsonl

{"tool": "web.search", "query": "open source API gateway comparison 2025", "timestamp": "2026-02-23T10:30:15Z", "approval_id": "apv_w1"}
{"tool": "web.search", "query": "Kong APISIX Traefik features comparison", "timestamp": "2026-02-23T10:30:17Z", "approval_id": "apv_w2"}
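One simple audit over this log is to confirm that every external transmission carries an approval ID. The following sketch assumes the JSONL fields shown above and nothing more; `unapproved_transmissions` is a hypothetical helper, not a euleragent command.

```python
import json


def unapproved_transmissions(jsonl_text: str) -> list[dict]:
    """Return log records that lack a non-empty approval_id."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return [r for r in records if not r.get("approval_id")]


# Two records in the format shown above, plus one deliberately missing its ID.
log = "\n".join(
    [
        '{"tool": "web.search", "query": "q1", "approval_id": "apv_w1"}',
        '{"tool": "web.search", "query": "q2", "approval_id": "apv_w2"}',
        '{"tool": "web.search", "query": "q3"}',
    ]
)
print(unapproved_transmissions(log))
```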

Step 10: Understanding Search Routing

When MCP search providers are configured, the SearchRouter automatically routes the agent's search requests to the appropriate source. The agent only calls a single web.search tool, but internally the SearchRouter selects the optimal source -- such as local KB, Tavily, Brave, etc. -- based on query characteristics.

How SearchRouter works:

Agent → web.search("FastBridge authentication method")
             │
             ▼
        SearchRouter
             ├─ Query analysis: contains internal product name → prefer local KB
             ├─ source_set: default → [local_kb, tavily]
             └─ Routing decision: local_kb (priority 0)
             │
             ▼
        Search the local KB → return results

Agent → web.search("API gateway market trends 2025")
             │
             ▼
        SearchRouter
             ├─ Query analysis: external market information → web search needed
             ├─ source_set: default → [local_kb, tavily]
             └─ Routing decision: tavily (no local KB match)
             │
             ▼
        Tavily web search → return results (approval required)
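The two flows above can be approximated with a priority-ordered walk over the source set. This sketch replaces the router's query analysis with a much simpler stand-in (whether the local KB returned any hits), so it is an illustration of the idea rather than the actual SearchRouter logic. `Source`, `route`, and `local_hits` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    type: str      # "rag" or "web_search"
    priority: int  # lower number is tried first


def route(query: str, sources: list[Source], local_hits: int) -> dict:
    """Pick the first usable source in priority order."""
    for src in sorted(sources, key=lambda s: s.priority):
        if src.type == "rag":
            # Stand-in for query analysis: use the KB only if it has matches.
            if local_hits > 0:
                return {"query": query, "routed_to": src.name, "approval_required": False}
            continue
        # External sources leave the machine, so approval is required.
        return {"query": query, "routed_to": src.name, "approval_required": True}
    return {"query": query, "routed_to": None, "approval_required": False}


DEFAULT_SET = [Source("tavily", "web_search", 1), Source("local_kb", "rag", 0)]
print(route("FastBridge authentication method", DEFAULT_SET, local_hits=3))
print(route("API gateway market trends 2025", DEFAULT_SET, local_hits=0))
```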

Routing decisions for each run are recorded in search_routing.jsonl:

cat .euleragent/runs/c3d4e5f6a1b2/search_routing.jsonl

{"query": "FastBridge authentication method", "routed_to": "local_kb", "source_set": "default", "reason": "internal_doc_match", "approval_required": false, "timestamp": "2026-02-23T10:30:10Z"}
{"query": "open source API gateway comparison 2025", "routed_to": "tavily", "source_set": "default", "reason": "external_info_needed", "approval_required": true, "timestamp": "2026-02-23T10:30:15Z"}
{"query": "Kong APISIX Traefik features comparison", "routed_to": "brave", "source_set": "default", "reason": "tavily_fallback", "approval_required": true, "timestamp": "2026-02-23T10:30:17Z"}

Local KB routing does not require approval: Searches routed to the local KB by the SearchRouter do not use external networks, so HITL approval is not required. The approval gate is activated only when routing to external search sources.

Step 11: Specifying Search Sources with `--source-set`

You can explicitly specify the set of search sources to use for a run with the --source-set option:

# Use the default source set (local_kb + tavily)
euleragent run blog-writer \
  --task "Write a blog post introducing FastBridge" \
  --project my-project \
  --mode plan

# Offline mode: use the local KB only
euleragent run blog-writer \
  --task "Organize the internal FastBridge technical documents" \
  --project my-project \
  --source-set offline \
  --mode plan

# Use news sources: research the latest market trends
euleragent run blog-writer \
  --task "Research the latest API gateway market trends" \
  --project my-project \
  --source-set news \
  --mode plan

If --source-set is not specified, the mcp.source_sets.default from workspace.yaml is used.
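The default-fallback rule can be sketched as a lookup that substitutes "default" when no name is given. The source-set table mirrors the workspace.yaml example in this tutorial; `resolve_source_set` and its error message are hypothetical, and how euleragent reports an unknown set name is not specified here.

```python
from typing import Optional

# Mirrors mcp.source_sets from the workspace.yaml example.
SOURCE_SETS = {
    "default": ["local_kb", "tavily"],
    "academic": ["local_kb", "tavily"],
    "news": ["brave", "tavily"],
    "offline": ["local_kb"],
}


def resolve_source_set(
    name: Optional[str], source_sets: dict[str, list[str]] = SOURCE_SETS
) -> list[str]:
    """Resolve a --source-set value to source names, defaulting to 'default'."""
    key = name or "default"
    if key not in source_sets:
        raise ValueError(f"unknown source set {key!r}; choose from {sorted(source_sets)}")
    return list(source_sets[key])


print(resolve_source_set(None))
print(resolve_source_set("offline"))
```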

Step 12: MCP Catalog Snapshots

MCP catalog snapshots record the search source configuration at the time of execution to ensure reproducibility. A snapshot is automatically included in each run's artifacts.

cat .euleragent/runs/c3d4e5f6a1b2/artifacts/mcp_catalog_snapshot.json

{
  "snapshot_hash": "sha256:a3f8e2d1c4b5...",
  "created_at": "2026-02-23T10:30:00Z",
  "sources": [
    {
      "name": "local_kb",
      "type": "rag",
      "status": "active",
      "chunks": 7,
      "content_hash": "sha256:b7c9d1e2f3..."
    },
    {
      "name": "tavily",
      "type": "web_search",
      "provider": "tavily",
      "status": "active"
    },
    {
      "name": "brave",
      "type": "web_search",
      "provider": "brave",
      "status": "active"
    }
  ],
  "source_set_used": "default",
  "resolved_sources": ["local_kb", "tavily"]
}

Role of the snapshot hash:

snapshot_hash is a hash of the entire catalog configuration
Same hash = guaranteed to have run with the same search source configuration
During audits, you can precisely track which sources were active at the time of execution
The local KB's content_hash also tracks changes to the knowledge base contents

# Compare the catalog snapshots of two runs
diff <(jq '.snapshot_hash' .euleragent/runs/RUN1/artifacts/mcp_catalog_snapshot.json) \
     <(jq '.snapshot_hash' .euleragent/runs/RUN2/artifacts/mcp_catalog_snapshot.json)
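To see why equal hashes imply equal configurations, consider hashing a canonical serialization of the catalog. The sketch below is one way to do that with SHA-256 over sorted-key JSON. The exact canonicalization euleragent uses for snapshot_hash is not documented here, so treat `config_hash` as a hypothetical stand-in.

```python
import hashlib
import json


def config_hash(catalog: dict) -> str:
    """Hash a catalog dict via canonical (sorted-key, compact) JSON."""
    canonical = json.dumps(catalog, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"sources": ["local_kb", "tavily"], "source_set_used": "default"}
b = {"source_set_used": "default", "sources": ["local_kb", "tavily"]}  # same content, different key order
c = {"sources": ["local_kb", "brave"], "source_set_used": "default"}   # different sources

print(config_hash(a) == config_hash(b))
print(config_hash(a) == config_hash(c))
```

Sorting keys before hashing keeps the result independent of key order, so only a real configuration change alters the hash.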

Step 13: Full Flow Diagram

euleragent run blog-writer --project my-project --source-set default --mode plan
            │
            ▼
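    [Context assembly]
    system prompt + memory snapshot + MCP catalog snapshot
            │
            ▼
    [Local KB RAG retrieval]
    euleragent rag query --project my-project
    → extract relevant internal document chunks
            │
            ▼
    [LLM loop]
    ├─ Inject local KB results into the context
    ├─ Propose web.search (when external information is needed)
    └─ Propose file.write (final deliverable)
            │
            ▼
    [SearchRouter: MCP routing]
    ├─ Query analysis → source selection (based on source_set)
    ├─ Local KB match → return immediately (no approval needed)
    └─ External source → forward to the approval queue
            │
            ▼
    [Approval queue]
    web.search (medium) → human reviews the query + MCP source
    file.write (medium) → human reviews the content
            │
            ▼
    [Execute after acceptance]
    → add web search results to the context
    → generate the final blog post
            │
            ▼
    artifacts/
    ├── result.md                   ← final blog post
    ├── citations.json              ← local KB citations
    ├── rag_context.json            ← full RAG context
    ├── search_routing.jsonl        ← search routing decision log
    ├── mcp_catalog_snapshot.json   ← MCP catalog snapshot
    └── external_transmission.jsonl ← external transmission audit log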

`workspace.yaml` RAG and MCP Configuration Reference

# .euleragent/config/workspace.yaml

# LLM providers
llm_profiles:
  local:
    provider: ollama
    base_url: http://localhost:11434
    model: qwen3:32b
    timeout_seconds: 120
    keep_alive: 5m
    is_external: false
  openai:
    provider: openai
    api_key: ''
    model: gpt-4o-mini
    base_url: https://api.openai.com/v1
    is_external: true
default_llm_profile: local

# Memory (long-term memory)
memory:
  enabled: true
  store: sqlite
  sqlite_path: .euleragent/state/memory.db
  embedding:
    provider: ollama            # for RAG embeddings
    model: nomic-embed-text
  retrieval:
    top_k: 8
    min_score: 0.2

# Tool security policy
tools_policy:
  network:
    default: deny
    require_approval: true
    allow_domains:
      - arxiv.org
      - github.com
      - docs.python.org
      - huggingface.co

# RAG settings
rag:
  enabled: true
  chunk_size: 512             # chunk size (tokens)
  chunk_overlap: 64           # overlap between chunks
  embedding:
    provider: ollama
    model: nomic-embed-text
  web_search:
    provider: fake            # fake | tavily | brave | duckduckgo
    api_key: ""               # API key for a real provider
    require_approval: true    # keeping this true is recommended
    max_results: 5

# MCP search provider settings (optional)
mcp:
  enabled: true
  catalog_path: .euleragent/state/mcp_catalog.json
  sources:
    - name: local_kb
      type: rag
      project: my-project
      priority: 0
    - name: tavily
      type: web_search
      provider: tavily
      api_key: tvly-YOUR_API_KEY
      priority: 1
    - name: brave
      type: web_search
      provider: brave
      api_key: BSA-YOUR_API_KEY
      priority: 2
  source_sets:
    default: [local_kb, tavily]
    academic: [local_kb, tavily]
    news: [brave, tavily]
    offline: [local_kb]
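The rag.chunk_size and rag.chunk_overlap values above describe a sliding window over the token stream. The sketch below shows that windowing under simple assumptions: tokens are pre-split strings, and `chunk_tokens` is a hypothetical helper. The actual tokenizer and boundary handling in euleragent may differ.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 64
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = [str(i) for i in range(1000)]
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

With the defaults, each window advances by 448 tokens, so adjacent chunks share 64 tokens and a sentence cut at one boundary still appears whole in a neighbor.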

Expected Output Summary

# Initialize RAG
$ euleragent rag init --project my-project
RAG knowledge base initialized. Chunks: 0, Docs: 0

# Add documents
$ euleragent rag add --project my-project --path ./docs/
Ingested 2 documents, 7 chunks total.

# Check status
$ euleragent rag status --project my-project
Documents: 2, Chunks: 7

# Direct query
$ euleragent rag query --project my-project --q "authentication method"
[1] score=0.8923  Source: overview.md  "JWT-based authentication/authorization"

# Agent run (local KB + web search)
Run c3d4e5f6... started
  RAG hits: 3 chunks from my-project KB
  Proposed: web.search × 2, file.write × 1
3 approval(s) pending.

Frequently Asked Questions / Common Errors

Q: No results are returned from rag query after rag add.

Verify that the embedding model is running. If configured with provider: ollama, the nomic-embed-text model is required:

ollama pull nomic-embed-text
ollama list  # check the model list

Setting provider: fake allows offline operation, but semantic search quality will be limited.

Q: If I add a document that was already added, will duplicates be created?

If the same file path and content already exist, they are automatically skipped. If the file has been modified, existing chunks are deleted and re-ingested:

# Re-ingest a modified file
euleragent rag add --project my-project --path ./docs/product/overview.md
# [update] overview.md → 4 chunks (replaced existing)
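The skip-or-replace behavior described above amounts to comparing a content hash per file path. Here is a minimal sketch of that decision, assuming an in-memory index keyed by path. `ingest_action` and the action labels are hypothetical and do not reflect how rag.db stores this information.

```python
import hashlib


def ingest_action(path: str, content: str, index: dict[str, str]) -> str:
    """Decide whether a file is new, unchanged, or modified since last ingest."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = index.get(path)
    if previous == digest:
        return "skip"  # same path and same content: nothing to do
    index[path] = digest
    # A known path with a new hash means old chunks should be replaced.
    return "add" if previous is None else "update"


index: dict[str, str] = {}
print(ingest_action("docs/product/overview.md", "v1", index))
print(ingest_action("docs/product/overview.md", "v1", index))
print(ingest_action("docs/product/overview.md", "v2", index))
```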

Q: Does the allow_domains setting actually block web requests?

In the current version, allow_domains is recorded as metadata, and actual domain filtering is performed by humans during the approval stage. Programmatic blocking is planned for a future version. For now, review URLs/queries manually during approval to verify whitelist compliance.

Q: Can I ingest local PDF files?

Currently supported formats are Markdown (.md) and plain text (.txt). Convert PDFs to text before ingesting:

# Using pdftotext (requires poppler-utils)
pdftotext document.pdf document.txt
euleragent rag add --project my-project --path ./document.txt

Q: How can I share RAG across multiple projects?

Currently, RAG is independent per project. To add common documents to multiple projects, you must ingest them individually into each project:

euleragent rag add --project project-a --path ./shared-docs/
euleragent rag add --project project-b --path ./shared-docs/

Q: I want to run web.search automatically without approval.

This is not recommended for security reasons. However, if automation is needed in a trusted environment, remove web.search from the require_approval list in tools.yaml:

# tools.yaml (security risk: use with caution)
require_approval:
  - file.write
  - file.delete
  - shell.exec
  # web.search removed → runs automatically

In this case, all external transmissions will occur automatically, so manage allow_domains and the external_transmission.jsonl audit log thoroughly.

Common Mistakes (Order Errors)

Symptom	Cause	Recovery
`Error: No knowledge base found for project 'X'.`	`rag init` not executed	`euleragent rag init --project X`
`No results found.`	No documents added to KB	`euleragent rag add --project X --path docs/`

Next step: 08_mcp_provider_and_tools.md -- Learn in depth how to configure MCP providers and manage various external search sources in an integrated manner.
