07. Web Search + Local Knowledge Base (RAG)
Learning Objectives
After completing this tutorial, you will be able to:
- Understand how the web.search tool passes through the HITL approval gate
- Build a local knowledge base using euleragent rag init/add/status/query
- Ingest only specific file types using glob patterns
- Apply web search security settings (domain whitelist)
- Use web search and local KB together in the same run
- Practical scenario: combine internal documents and web information for writing a technical blog post
- Configure MCP search providers and manage them with euleragent mcp sync/show
- Understand the SearchRouter's automatic routing concept and the search_routing.jsonl artifact
- Ensure reproducibility of search source configurations using MCP catalog snapshots
- Specify per-run search source sets with the --source-set option
Prerequisites
- Workspace initialization completed (euleragent init)
- Agent created: euleragent new blog-writer --template marketing-expert
- Ollama running (or default_llm_profile: fake):
ollama serve
ollama pull nomic-embed-text # RAG embedding model
What is RAG (Retrieval-Augmented Generation)?
RAG is a technique in which relevant documents are retrieved and injected into the LLM's context before it generates a response. euleragent supports two retrieval sources:
| Source | Command | Characteristics |
|---|---|---|
| Local Knowledge Base | rag init/add/query | Internal documents, offline, fast |
| Web Search | web.search (HITL gate) | Latest information, approval required |
HITL Security Gate for web.search
web.search sends data to external networks, so by default it requires human approval. Let's first look at how this gate works.
Why web.search is risky
- Data leakage risk: Internal information included in queries is transmitted to search engines
- Cost: Search APIs incur charges based on usage
- Unexpected results: Search results can influence the agent's output
Why execution is impossible without approval
web.search is included in the require_approval list in tools.yaml:
# .euleragent/agents/blog-writer/tools.yaml
require_approval:
- web.search
- web.fetch
- file.write
- shell.exec
Because of this setting, even if the agent proposes web.search, it will not execute without human acceptance.
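The gating rule can be pictured with a small Python sketch. This is an illustration only, not euleragent's implementation: the ToolCall class and the can_execute function are hypothetical names, and the set simply mirrors the require_approval list shown above.

```python
from dataclasses import dataclass

# Mirrors the require_approval list from tools.yaml shown above.
REQUIRE_APPROVAL = {"web.search", "web.fetch", "file.write", "shell.exec"}


@dataclass
class ToolCall:
    """Hypothetical stand-in for a tool call proposed by the agent."""
    tool_name: str
    params: dict
    approved: bool = False  # flipped only by an explicit human decision


def can_execute(call: ToolCall, require_approval: set = REQUIRE_APPROVAL) -> bool:
    """A gated tool runs only after approval; ungated tools run immediately."""
    if call.tool_name in require_approval:
        return call.approved
    return True


proposal = ToolCall("web.search", {"query": "API gateway comparison 2025"})
print(can_execute(proposal))  # False: proposed but not yet accepted
proposal.approved = True
print(can_execute(proposal))  # True: a human accepted the proposal
```

The point of the sketch is that a proposal and its execution are separate steps, with the human decision sitting between them.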
MCP-Based Web Search Setup (Recommended)
Important: The rag.web_search configuration is deprecated. For new projects, use the MCP server approach.
euleragent recommends MCP (Model Context Protocol)-based search providers. When an MCP server is configured, MCP-first routing is applied. Only when MCP is not configured does it fall back to the legacy rag.web_search (with a one-time DeprecationWarning).
Quick Start: Demo MCP Web Search Server
You can get started immediately using the demo server included in the euleragent repository:
# 1. Run the demo MCP web search server (in a separate terminal)
python examples/mcp/web_search_server.py
# MCP web search demo server listening on 0.0.0.0:9021
# Provider: fake
# Endpoint: POST http://0.0.0.0:9021/mcp
# 2. Uncomment the MCP server in workspace.yaml
# Edit .euleragent/config/workspace.yaml:
# mcp:
#   enabled: true
#   servers:
#     - id: local_web_search
#       url: http://localhost:9021/mcp
#       allow_tools: [web.search, web.fetch]
#       ...
# 3. Sync the catalog
euleragent mcp sync
euleragent mcp show
workspace.yaml MCP Configuration Example
# .euleragent/config/workspace.yaml
mcp:
  enabled: true
  servers:
    - id: local_web_search
      url: http://localhost:9021/mcp
      allow_tools: [web.search, web.fetch]
      cost_tier: free
      risk_default: high
      require_approval_enable: true
      timeout_seconds: 30
      config:
        provider: fake # fake | brave | tavily | duckduckgo
        api_key: "" # set for brave/tavily
        max_results: 10
MCP-First Routing: If web.search is included in an MCP server's allow_tools, it takes priority over the existing rag.web_search configuration.
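The precedence rule described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: workspace.yaml is taken to be already loaded into a plain dict, and pick_search_backend and its return strings are hypothetical names, not part of euleragent.

```python
import warnings


def pick_search_backend(config: dict) -> str:
    """Return which backend would serve web.search under MCP-first routing."""
    mcp = config.get("mcp", {})
    if mcp.get("enabled"):
        for server in mcp.get("servers", []):
            # The first MCP server that exposes web.search wins.
            if "web.search" in server.get("allow_tools", []):
                return f"mcp:{server['id']}"

    legacy = config.get("rag", {}).get("web_search")
    if legacy:
        # Fallback path: legacy settings still work but are flagged.
        warnings.warn(
            "rag.web_search is deprecated; configure an MCP server instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return f"legacy:{legacy['provider']}"
    return "none"


config = {
    "mcp": {
        "enabled": True,
        "servers": [
            {"id": "local_web_search", "allow_tools": ["web.search", "web.fetch"]}
        ],
    },
    "rag": {"web_search": {"provider": "fake"}},
}
print(pick_search_backend(config))  # mcp:local_web_search
```

With both sections present, the MCP server is chosen and the legacy block is never consulted, which matches the MCP-first behavior described in this section.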
Legacy Configuration (deprecated)
The configuration below is maintained for backward compatibility and is automatically used as a fallback when MCP is not configured:
# .euleragent/config/workspace.yaml
mcp:
  enabled: true
  catalog_path: .euleragent/state/mcp_catalog.json
  sources:
    - name: tavily
      type: web_search
      provider: tavily
      api_key: tvly-YOUR_API_KEY
      priority: 1
    - name: brave
      type: web_search
      provider: brave
      api_key: BSA-YOUR_API_KEY
      priority: 2
    - name: local_kb
      type: rag
      project: my-project
      priority: 0 # highest priority (local first)
  source_sets:
    default: [local_kb, tavily]
    academic: [local_kb, tavily]
    news: [brave, tavily]
    offline: [local_kb]
MCP Catalog Sync and Verification
After configuration, sync the catalog and verify the status:
# Sync the catalog: checks each source's connection status and creates a snapshot
euleragent mcp sync
Expected output:
MCP catalog sync completed.
Sources: 3
[OK] local_kb (rag, 7 chunks)
[OK] tavily (web_search, connected)
[OK] brave (web_search, connected)
Source sets: 4 (default, academic, news, offline)
Snapshot: .euleragent/state/mcp_catalog_snapshot.json
# Check the current catalog state
euleragent mcp show
Expected output:
MCP Catalog
Sources:
NAME TYPE PROVIDER STATUS PRIORITY
local_kb rag — active 0
tavily web_search tavily active 1
brave web_search brave active 2
Source Sets:
default: local_kb, tavily
academic: local_kb, tavily
news: brave, tavily
offline: local_kb
Step-by-Step Hands-On
Step 1: Initialize the RAG Knowledge Base
Initialize a per-project knowledge base:
euleragent rag init --project my-project
Expected output:
RAG knowledge base initialized.
Project: my-project
DB: .euleragent/projects/my-project/rag.db
Chunks: 0
Docs: 0
If the project does not exist, it is automatically created. The knowledge base is stored as a SQLite DB inside the project directory.
Step 2: Ingest Documents -- Entire Directory
Add local documents to the knowledge base. Markdown (.md) and plain-text (.txt) files are supported; PDFs must be converted to text first (see the FAQ below):
# Create an example document directory
mkdir -p ./docs/product
cat > ./docs/product/overview.md << 'EOF'
# FastBridge 제품 개요
FastBridge는 FastAPI 기반 고성능 API 게이트웨이입니다.
## 핵심 기능
- 요청 라우팅 및 로드 밸런싱
- JWT 기반 인증/인가
- 레이트 리미팅
- 실시간 모니터링 대시보드
EOF
cat > ./docs/product/tech_stack.md << 'EOF'
# FastBridge 기술 스택
## 백엔드
- Python 3.11
- FastAPI 0.110
- SQLAlchemy 2.0
- Redis (캐싱)
## 인프라
- Docker + Kubernetes
- GitHub Actions (CI/CD)
EOF
Ingest the documents:
euleragent rag add --project my-project --path ./docs/
Expected output:
Ingesting documents from ./docs/ ...
[OK] ./docs/product/overview.md → 4 chunks
[OK] ./docs/product/tech_stack.md → 3 chunks
Ingested 2 documents, 7 chunks total.
Project: my-project
Total chunks in KB: 7
Step 3: Ingest Specific Formats Only -- Glob Patterns
Use the --glob option to selectively ingest only specific file types:
# Markdown files only
euleragent rag add \
--project my-project \
--path ./docs/ \
--glob "*.md"
# Text files only
euleragent rag add \
--project my-project \
--path ./docs/ \
--glob "*.txt"
# A specific subdirectory only
euleragent rag add \
--project my-project \
--path ./docs/ \
--glob "product/*.md"
Expected output:
Ingesting documents matching "*.md" from ./docs/ ...
[OK] overview.md → 4 chunks
[OK] tech_stack.md → 3 chunks
[skip] data.csv ← does not match glob pattern
Ingested 2 documents, 7 chunks total.
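Before ingesting, it can help to preview which files a pattern would pick up. The Python sketch below is a rough approximation and not the tool's own matcher: preview_glob is a hypothetical helper, and it assumes that a pattern without "/" is compared against the file name while a pattern with "/" is compared against the path relative to --path. Note that fnmatch's `*` also matches "/", so real behavior may differ for nested directories.

```python
from fnmatch import fnmatch
from pathlib import Path


def preview_glob(root: str, pattern: str) -> tuple[list[str], list[str]]:
    """Return (matched, skipped) file paths, relative to root."""
    base = Path(root)
    matched, skipped = [], []
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        rel = path.relative_to(base).as_posix()
        # Assumption: bare patterns match the file name, others the relative path.
        target = rel if "/" in pattern else path.name
        (matched if fnmatch(target, pattern) else skipped).append(rel)
    return matched, skipped


# Runs only if the example directory from Step 2 exists.
if Path("./docs").is_dir():
    matched, skipped = preview_glob("./docs", "*.md")
    for rel in matched:
        print(f"[match] {rel}")
    for rel in skipped:
        print(f"[skip]  {rel}")
```

Running a preview like this first makes it easier to spot a pattern that is too broad or too narrow before any chunks are written to the knowledge base.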
Step 4: Check Knowledge Base Status
euleragent rag status --project my-project
Expected output:
RAG Knowledge Base Status
Project: my-project
DB: .euleragent/projects/my-project/rag.db
Documents: 2
Chunks: 7
Embedding: ollama/nomic-embed-text
Last added: 2026-02-23 10:30:00
Document list:
[1] docs/product/overview.md 4 chunks 2026-02-23 10:30:00
[2] docs/product/tech_stack.md 3 chunks 2026-02-23 10:30:00
Step 5: Query the Knowledge Base Directly
Test RAG retrieval directly without going through the agent:
euleragent rag query \
--project my-project \
--q "FastBridge의 인증 방식은?" \
--top-k 3
Expected output:
Query: "FastBridge의 인증 방식은?"
Top-3 results:
[1] score=0.8923
Source: docs/product/overview.md
Chunk: "JWT 기반 인증/인가"
[2] score=0.7234
Source: docs/product/overview.md
Chunk: "FastBridge는 FastAPI 기반 고성능 API 게이트웨이입니다. 핵심 기능: 요청 라우팅..."
[3] score=0.6891
Source: docs/product/tech_stack.md
Chunk: "Python 3.11, FastAPI 0.110..."
Try another query:
euleragent rag query \
--project my-project \
--q "Python version" \
--top-k 2
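To build intuition for the score and --top-k values, here is a toy Python sketch of embedding-based ranking. It assumes the score is a cosine similarity between the query embedding and each chunk embedding, which this tutorial does not state explicitly; the three-dimensional vectors are made up for illustration, and cosine and top_k are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=3):
    """chunks is a list of (text, vector); returns the k best (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embedding models such as
# nomic-embed-text produce vectors with hundreds of dimensions.
chunks = [
    ("JWT 기반 인증/인가", [0.9, 0.1, 0.0]),
    ("Python 3.11, FastAPI 0.110", [0.1, 0.9, 0.2]),
    ("Docker + Kubernetes", [0.0, 0.3, 0.9]),
]
for score, text in top_k([1.0, 0.2, 0.0], chunks, k=2):
    print(f"score={score:.4f}  {text}")
```

Raising --top-k returns more, lower-scoring chunks; the ranking itself does not change.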
Step 6: Web Search Security Settings -- Domain Whitelist
You can restrict allowed domains in workspace.yaml:
cat .euleragent/config/workspace.yaml
tools_policy:
  network:
    default: deny # block external traffic by default
    require_approval: true # all external traffic requires approval
    allow_domains: # allowed-domain whitelist
      - arxiv.org
      - github.com
      - docs.python.org
      - fastapi.tiangolo.com
rag:
  web_search:
    provider: fake # for testing (real: tavily | brave | duckduckgo)
    require_approval: true
    api_key: "" # API key when using a real provider
How the domain whitelist works:
Currently, allow_domains is recorded as metadata that humans reference during approval. When approving, you manually verify whether the query or URL belongs to a whitelisted domain and then accept or reject accordingly.
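Because the check is manual, a reviewer may want a small helper to test a URL against the whitelist before accepting. The Python sketch below is one possible way to do that; is_allowed is a hypothetical function, and treating subdomains of a listed domain as allowed is an assumption, not documented behavior.

```python
from urllib.parse import urlparse

# Copied from tools_policy.network.allow_domains in workspace.yaml.
ALLOW_DOMAINS = ["arxiv.org", "github.com", "docs.python.org", "fastapi.tiangolo.com"]


def is_allowed(url: str, allow_domains=ALLOW_DOMAINS) -> bool:
    """True if the URL's host is a listed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allow_domains)


for url in (
    "https://docs.python.org/3/library/urllib.parse.html",
    "https://github.com.evil.example/login",
):
    verdict = "allowed" if is_allowed(url) else "NOT in whitelist"
    print(f"{url} -> {verdict}")
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts such as the second example.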
Approval flow for MCP search providers:
When using MCP, only searches that the SearchRouter routes to external sources are subject to approval. The approval information displays the routed target source:
euleragent approve show apv_xxx
{
"tool_name": "web.search",
"params": {"query": "API gateway comparison 2025"},
"side_effects": ["external_network"],
"mcp_source": "tavily",
"source_set": "default",
"allowed_domains": ["fastapi.tiangolo.com", "docs.python.org"]
}
Searches routed to the local KB do not use external networks, so they do not appear in the approval queue.
# Check the domains at approval time
euleragent approve show apv_xxx
{
"tool_name": "web.search",
"params": {"query": "FastAPI authentication JWT 2025"},
"side_effects": ["external_network"],
"allowed_domains": ["fastapi.tiangolo.com", "docs.python.org"]
}
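Setting up API key-based web search for production use (shown in the legacy rag.web_search form; for new projects, set provider and api_key in the MCP server's config block instead):
# workspace.yaml: when using real Tavily web search
rag:
  web_search:
    provider: tavily
    api_key: tvly-YOUR_TAVILY_API_KEY
    require_approval: true
    max_results: 5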
Step 7: Run Agent Using Local KB Only
First, write a blog post using only the local KB:
euleragent run blog-writer \
--task "Write a blog post introducing the FastBridge API gateway. It must cover the tech stack and core features. Save it to result.md." \
--project my-project \
--mode plan
Expected output:
Run b2c3d4e5f6a1 started
Project: my-project
RAG: my-project KB active (7 chunks)
[loop 1/5] Retrieving from local KB...
RAG hit: "FastBridge는 FastAPI 기반..." (score: 0.89)
RAG hit: "JWT 기반 인증/인가..." (score: 0.83)
RAG hit: "Python 3.11, FastAPI 0.110..." (score: 0.78)
[loop 2/5] Generating blog post...
→ Proposed: file.write (path: result.md)
Run b2c3d4e5f6a1 completed (state: PENDING_APPROVAL)
1 approval(s) pending.
After approval, execute:
euleragent approve accept-all --run-id b2c3d4e5f6a1 --actor "user:you" --execute
cat result.md
Step 8: Combining Web Search + Local KB
This is the most powerful pattern. It uses internal documents and the latest web information together:
euleragent run blog-writer \
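--task "Write a blog post comparing the FastBridge API gateway with the latest open-source API gateways. Base it on the internal FastBridge documents, and search the web for competitor information to include. Save it to result.md." \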
--project my-project \
--mode plan \
--max-loops 4
Expected output:
Run c3d4e5f6a1b2 started
Project: my-project
RAG: my-project KB active (7 chunks)
[loop 1/4] Retrieving from local KB...
RAG hits: 3 chunks retrieved from my-project KB
[loop 2/4] Planning web searches for competitor info...
→ Proposed: web.search (query: "open source API gateway comparison 2025")
→ Proposed: web.search (query: "Kong APISIX Traefik features comparison")
[loop 3/4] Planning output...
→ Proposed: file.write (path: result.md)
Run c3d4e5f6a1b2 completed (state: PENDING_APPROVAL)
3 approval(s) pending.
[2x] web.search
[1x] file.write
Approve web searches first and execute:
euleragent approve accept-all \
--run-id c3d4e5f6a1b2 \
--tool web.search \
--actor "user:you" \
--execute
Expected output:
Accepted and executed 2 web.search approval(s).
[OK] "open source API gateway comparison 2025" → 5 results
[OK] "Kong APISIX Traefik features comparison" → 4 results
Approve file write:
euleragent approve accept-all \
--run-id c3d4e5f6a1b2 \
--tool file.write \
--actor "user:you" \
--execute
Verify the result:
cat result.md
# FastBridge vs. Open-Source API Gateways: A Comparison
# (internal documents + web search results combined)
Step 9: Verify Citations
When RAG is active during a run, citations.json is generated:
cat .euleragent/runs/c3d4e5f6a1b2/artifacts/citations.json
Expected output:
[
{
"citation_id": "cit_001",
"doc_id": "doc_overview",
"chunk_id": "chunk_003",
"content": "JWT 기반 인증/인가",
"source": "docs/product/overview.md",
"score": 0.8923
},
{
"citation_id": "cit_002",
"doc_id": "doc_tech",
"chunk_id": "chunk_001",
"content": "Python 3.11, FastAPI 0.110",
"source": "docs/product/tech_stack.md",
"score": 0.7891
}
]
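For a quick overview of which documents a run actually drew on, the citation records can be grouped by source. The Python sketch below assumes the field names shown in the example output above; citations_by_source and its min_score parameter are hypothetical.

```python
import json
from collections import defaultdict


def citations_by_source(citations: list[dict], min_score: float = 0.0) -> dict:
    """Group citation IDs by source file, keeping only scores >= min_score."""
    grouped = defaultdict(list)
    for c in citations:
        if c["score"] >= min_score:
            grouped[c["source"]].append(c["citation_id"])
    return dict(grouped)


# Inline sample with the same shape as citations.json above.
sample = json.loads("""
[
  {"citation_id": "cit_001", "source": "docs/product/overview.md", "score": 0.8923},
  {"citation_id": "cit_002", "source": "docs/product/tech_stack.md", "score": 0.7891}
]
""")
for source, ids in citations_by_source(sample).items():
    print(f"{source}: {', '.join(ids)}")
```

To run it against a real artifact, replace the inline sample with json.load on the citations.json path of the run.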
View the full RAG context:
cat .euleragent/runs/c3d4e5f6a1b2/artifacts/rag_context.json
External transmission audit log:
cat .euleragent/runs/c3d4e5f6a1b2/external_transmission.jsonl
{"tool": "web.search", "query": "open source API gateway comparison 2025", "timestamp": "2026-02-23T10:30:15Z", "approval_id": "apv_w1"}
{"tool": "web.search", "query": "Kong APISIX Traefik features comparison", "timestamp": "2026-02-23T10:30:17Z", "approval_id": "apv_w2"}
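Since the audit log is JSON Lines, it is easy to summarize with a few lines of Python. The sketch below assumes the record fields shown above (tool, query, approval_id); summarize_transmissions is a hypothetical helper, and flagging records without an approval_id is just one possible audit check.

```python
import json
from collections import Counter


def summarize_transmissions(lines):
    """Count transmissions per tool and collect records lacking an approval_id."""
    records = [json.loads(line) for line in lines if line.strip()]
    per_tool = Counter(r["tool"] for r in records)
    unapproved = [r for r in records if not r.get("approval_id")]
    return per_tool, unapproved


# Inline sample with the same shape as external_transmission.jsonl above.
log_lines = [
    '{"tool": "web.search", "query": "open source API gateway comparison 2025", "approval_id": "apv_w1"}',
    '{"tool": "web.search", "query": "Kong APISIX Traefik features comparison", "approval_id": "apv_w2"}',
]
per_tool, unapproved = summarize_transmissions(log_lines)
print(dict(per_tool))
print(f"records without approval: {len(unapproved)}")
```

For a real run, pass the open file object for external_transmission.jsonl instead of the inline list; iterating over a file yields one line at a time.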
Step 10: Understanding Search Routing
When MCP search providers are configured, the SearchRouter automatically routes the agent's search requests to the appropriate source. The agent only calls a single web.search tool, but internally the SearchRouter selects the optimal source -- such as local KB, Tavily, Brave, etc. -- based on query characteristics.
How SearchRouter works:
Agent → web.search("FastBridge 인증 방식")
│
▼
SearchRouter
├─ Query analysis: contains an internal product name → prefer local KB
├─ source_set: default → [local_kb, tavily]
└─ Routing decision: local_kb (priority 0)
│
▼
Search the local KB → return results
Agent → web.search("API gateway market trends 2025")
│
▼
SearchRouter
├─ Query analysis: external market information → web search needed
├─ source_set: default → [local_kb, tavily]
└─ Routing decision: tavily (no local KB match)
│
▼
Tavily web search → return results (approval required)
Routing decisions for each run are recorded in search_routing.jsonl:
cat .euleragent/runs/c3d4e5f6a1b2/search_routing.jsonl
{"query": "FastBridge 인증 방식", "routed_to": "local_kb", "source_set": "default", "reason": "internal_doc_match", "approval_required": false, "timestamp": "2026-02-23T10:30:10Z"}
{"query": "open source API gateway comparison 2025", "routed_to": "tavily", "source_set": "default", "reason": "external_info_needed", "approval_required": true, "timestamp": "2026-02-23T10:30:15Z"}
{"query": "Kong APISIX Traefik features comparison", "routed_to": "brave", "source_set": "default", "reason": "tavily_fallback", "approval_required": true, "timestamp": "2026-02-23T10:30:17Z"}
Local KB routing does not require approval: Searches routed to the local KB by the SearchRouter do not use external networks, so HITL approval is not required. The approval gate is activated only when routing to external search sources.
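The routing idea can be approximated in Python as a priority-ordered walk over the source set. This is a deliberately simplified sketch, not the SearchRouter's real logic: the actual query analysis is not documented here, so it is replaced by a caller-supplied kb_has_match callback, and route and SOURCES are hypothetical names.

```python
from typing import Callable

# Source types and priorities as listed by `euleragent mcp show`.
SOURCES = {
    "local_kb": {"type": "rag", "priority": 0},
    "tavily": {"type": "web_search", "priority": 1},
    "brave": {"type": "web_search", "priority": 2},
}


def route(query: str, source_set: list[str], kb_has_match: Callable[[str], bool]) -> dict:
    """Pick the first usable source in priority order (lower number first)."""
    ordered = sorted(source_set, key=lambda name: SOURCES[name]["priority"])
    for name in ordered:
        if SOURCES[name]["type"] == "rag":
            if kb_has_match(query):
                # Local retrieval: no external network, so no approval.
                return {"routed_to": name, "approval_required": False}
            continue  # no local match: try the next source
        # Any web_search source leaves the machine, so approval is required.
        return {"routed_to": name, "approval_required": True}
    return {"routed_to": None, "approval_required": False}


# Stand-in for query analysis: treat the product name as an internal match.
def kb_has_match(q: str) -> bool:
    return "FastBridge" in q


print(route("FastBridge 인증 방식", ["local_kb", "tavily"], kb_has_match))
print(route("API gateway market trends 2025", ["local_kb", "tavily"], kb_has_match))
```

The two calls mirror the two diagrams above: the first stays local, while the second falls through to Tavily and would enter the approval queue.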
Step 11: Specifying Search Sources with --source-set
You can explicitly specify the set of search sources to use for a run with the --source-set option:
# Use the default source set (local_kb + tavily)
euleragent run blog-writer \
--task "Write a blog post introducing FastBridge" \
--project my-project \
--mode plan
# Offline mode: use the local KB only
euleragent run blog-writer \
--task "Organize the FastBridge internal technical documents" \
--project my-project \
--source-set offline \
--mode plan
# Use news sources: research the latest market trends
euleragent run blog-writer \
--task "Research the latest trends in the API gateway market" \
--project my-project \
--source-set news \
--mode plan
If --source-set is not specified, the mcp.source_sets.default from workspace.yaml is used.
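The lookup described above can be sketched in Python. The dictionaries copy the source sets and source types from the example workspace.yaml; resolve_source_set and may_need_approval are hypothetical helpers meant only to show the default fallback and why the offline set never reaches the approval queue.

```python
# Source types and source sets copied from the example workspace.yaml.
SOURCES = {"local_kb": "rag", "tavily": "web_search", "brave": "web_search"}
SOURCE_SETS = {
    "default": ["local_kb", "tavily"],
    "academic": ["local_kb", "tavily"],
    "news": ["brave", "tavily"],
    "offline": ["local_kb"],
}


def resolve_source_set(name=None) -> list[str]:
    """Resolve a --source-set value; fall back to 'default' when omitted."""
    key = name or "default"
    if key not in SOURCE_SETS:
        raise ValueError(f"unknown source set {key!r}; available: {sorted(SOURCE_SETS)}")
    return list(SOURCE_SETS[key])


def may_need_approval(name=None) -> bool:
    """True if the set contains any source that uses the external network."""
    return any(SOURCES[s] != "rag" for s in resolve_source_set(name))


for choice in (None, "offline", "news"):
    print(choice, resolve_source_set(choice), may_need_approval(choice))
```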
Step 12: MCP Catalog Snapshots
MCP catalog snapshots record the search source configuration at the time of execution to ensure reproducibility. A snapshot is automatically included in each run's artifacts.
cat .euleragent/runs/c3d4e5f6a1b2/artifacts/mcp_catalog_snapshot.json
{
"snapshot_hash": "sha256:a3f8e2d1c4b5...",
"created_at": "2026-02-23T10:30:00Z",
"sources": [
{
"name": "local_kb",
"type": "rag",
"status": "active",
"chunks": 7,
"content_hash": "sha256:b7c9d1e2f3..."
},
{
"name": "tavily",
"type": "web_search",
"provider": "tavily",
"status": "active"
},
{
"name": "brave",
"type": "web_search",
"provider": "brave",
"status": "active"
}
],
"source_set_used": "default",
"resolved_sources": ["local_kb", "tavily"]
}
Role of the snapshot hash:
- snapshot_hash is a hash of the entire catalog configuration
- Same hash = guaranteed to have run with the same search source configuration
- During audits, you can precisely track which sources were active at the time of execution
- The local KB's content_hash also tracks changes to the knowledge base contents
# Compare the catalog snapshots of two runs
diff <(jq '.snapshot_hash' .euleragent/runs/RUN1/artifacts/mcp_catalog_snapshot.json) \
<(jq '.snapshot_hash' .euleragent/runs/RUN2/artifacts/mcp_catalog_snapshot.json)
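The same comparison can be done in Python when more than the top-level hash is of interest. The sketch below assumes the snapshot fields shown above (snapshot_hash, sources, content_hash, resolved_sources); compare_snapshots is a hypothetical helper, and the inline dictionaries stand in for two loaded snapshot files.

```python
def kb_hashes(snapshot: dict) -> dict:
    """Map each rag-type source name to its content_hash."""
    return {
        s["name"]: s.get("content_hash")
        for s in snapshot["sources"]
        if s.get("type") == "rag"
    }


def compare_snapshots(a: dict, b: dict) -> dict:
    """Report which aspects of two catalog snapshots are identical."""
    return {
        "same_catalog": a["snapshot_hash"] == b["snapshot_hash"],
        "same_kb_content": kb_hashes(a) == kb_hashes(b),
        "same_resolved_sources": a.get("resolved_sources") == b.get("resolved_sources"),
    }


# Placeholder hashes for illustration only.
run1 = {
    "snapshot_hash": "sha256:aaa",
    "sources": [{"name": "local_kb", "type": "rag", "content_hash": "sha256:kb1"}],
    "resolved_sources": ["local_kb", "tavily"],
}
run2 = {
    "snapshot_hash": "sha256:bbb",
    "sources": [{"name": "local_kb", "type": "rag", "content_hash": "sha256:kb2"}],
    "resolved_sources": ["local_kb", "tavily"],
}
print(compare_snapshots(run1, run2))
```

Splitting the report this way helps distinguish a changed catalog configuration from a knowledge base whose contents changed between runs.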
Step 13: Full Flow Diagram
euleragent run blog-writer --project my-project --source-set default --mode plan
│
▼
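[Context assembly]
system prompt + memory snapshot + MCP catalog snapshot
│
▼
[Local KB RAG retrieval]
euleragent rag query --project my-project
→ extract relevant internal document chunks
│
▼
[LLM loop]
├─ Inject local KB results into the context
├─ Propose web.search (when external information is needed)
└─ Propose file.write (final output)
│
▼
[SearchRouter: MCP routing]
├─ Query analysis → source selection (based on source_set)
├─ Local KB match → return immediately (no approval needed)
└─ External source → send to the approval queue
│
▼
[Approval queue]
web.search (medium) → human reviews the query + MCP source
file.write (medium) → human reviews the content
│
▼
[Execute after acceptance]
→ Add web search results to the context
→ Generate the final blog post
│
▼
artifacts/
├── result.md ← final blog post
├── citations.json ← local KB citations
├── rag_context.json ← full RAG context
├── search_routing.jsonl ← search routing decision log
├── mcp_catalog_snapshot.json ← MCP catalog snapshot
└── external_transmission.jsonl ← external transmission audit log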
workspace.yaml RAG and MCP Configuration Reference
# .euleragent/config/workspace.yaml
# LLM providers
llm_profiles:
  local:
    provider: ollama
    base_url: http://localhost:11434
    model: qwen3:32b
    timeout_seconds: 120
    keep_alive: 5m
    is_external: false
  openai:
    provider: openai
    api_key: ''
    model: gpt-4o-mini
    base_url: https://api.openai.com/v1
    is_external: true
default_llm_profile: local
# Memory (long-term memory)
memory:
  enabled: true
  store: sqlite
  sqlite_path: .euleragent/state/memory.db
  embedding:
    provider: ollama # for RAG embeddings
    model: nomic-embed-text
  retrieval:
    top_k: 8
    min_score: 0.2
# Tool security policy
tools_policy:
  network:
    default: deny
    require_approval: true
    allow_domains:
      - arxiv.org
      - github.com
      - docs.python.org
      - huggingface.co
# RAG settings
rag:
  enabled: true
  chunk_size: 512 # chunk size (tokens)
  chunk_overlap: 64 # overlap between chunks
  embedding:
    provider: ollama
    model: nomic-embed-text
  web_search:
    provider: fake # fake | tavily | brave | duckduckgo
    api_key: "" # API key for a real provider
    require_approval: true # keeping this true is recommended
    max_results: 5
# MCP search provider settings (optional)
mcp:
  enabled: true
  catalog_path: .euleragent/state/mcp_catalog.json
  sources:
    - name: local_kb
      type: rag
      project: my-project
      priority: 0
    - name: tavily
      type: web_search
      provider: tavily
      api_key: tvly-YOUR_API_KEY
      priority: 1
    - name: brave
      type: web_search
      provider: brave
      api_key: BSA-YOUR_API_KEY
      priority: 2
  source_sets:
    default: [local_kb, tavily]
    academic: [local_kb, tavily]
    news: [brave, tavily]
    offline: [local_kb]
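To see how rag.chunk_size and rag.chunk_overlap interact, the following Python sketch splits a token list into overlapping windows. It is an illustration under assumptions: the real tokenizer and splitting rules are not described in this tutorial, plain list items stand in for tokens, and chunk_tokens is a hypothetical function.

```python
def chunk_tokens(tokens: list, chunk_size: int = 512, chunk_overlap: int = 64) -> list:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


tokens = list(range(1000))  # stand-in for 1000 tokens
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])          # window sizes
print(chunks[0][-64:] == chunks[1][:64])  # neighbouring windows share 64 tokens
```

A larger overlap keeps more context across chunk boundaries at the cost of storing and embedding more chunks.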
Expected Output Summary
# Initialize RAG
$ euleragent rag init --project my-project
RAG knowledge base initialized. Chunks: 0, Docs: 0
# Add documents
$ euleragent rag add --project my-project --path ./docs/
Ingested 2 documents, 7 chunks total.
# Check status
$ euleragent rag status --project my-project
Documents: 2, Chunks: 7
# Query directly
$ euleragent rag query --project my-project --q "인증 방식"
[1] score=0.8923 Source: overview.md "JWT 기반 인증/인가"
# Run the agent (local KB + web search)
Run c3d4e5f6... started
RAG hits: 3 chunks from my-project KB
Proposed: web.search × 2, file.write × 1
3 approval(s) pending.
Frequently Asked Questions / Common Errors
Q: No results are returned from rag query after rag add.
Verify that the embedding model is running. If configured with provider: ollama, the nomic-embed-text model is required:
ollama pull nomic-embed-text
ollama list # list installed models
Setting provider: fake allows offline operation, but semantic search quality will be limited.
Q: If I add a document that was already added, will duplicates be created?
If the same file path and content already exist, they are automatically skipped. If the file has been modified, existing chunks are deleted and re-ingested:
# Re-ingest a modified file
euleragent rag add --project my-project --path ./docs/product/overview.md
# [update] overview.md → 4 chunks (replaced existing)
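The skip/update behavior can be modeled with a content hash per file path. The Python sketch below is only a model of the rule described above: how euleragent actually detects changes is not documented here, so the SHA-256 comparison, the in-memory index, and the ingest_action function are all assumptions.

```python
import hashlib


def ingest_action(path: str, content: bytes, index: dict) -> str:
    """Decide what to do with a file, given an index of path -> content hash."""
    digest = hashlib.sha256(content).hexdigest()
    previous = index.get(path)
    if previous == digest:
        return "skip"  # same path, same content: nothing to do
    index[path] = digest
    # A known path with new content replaces its old chunks.
    return "update" if previous is not None else "add"


index = {}
print(ingest_action("docs/product/overview.md", b"v1", index))  # add
print(ingest_action("docs/product/overview.md", b"v1", index))  # skip
print(ingest_action("docs/product/overview.md", b"v2", index))  # update
```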
Q: Does the allow_domains setting actually block web requests?
In the current version, allow_domains is recorded as metadata, and actual domain filtering is performed by humans during the approval stage. Programmatic blocking is planned for a future version. For now, review URLs/queries manually during approval to verify whitelist compliance.
Q: Can I ingest local PDF files?
Currently supported formats are Markdown (.md) and plain text (.txt). Convert PDFs to text before ingesting:
# Use pdftotext (requires poppler-utils)
pdftotext document.pdf document.txt
euleragent rag add --project my-project --path ./document.txt
Q: How can I share RAG across multiple projects?
Currently, RAG is independent per project. To add common documents to multiple projects, you must ingest them individually into each project:
euleragent rag add --project project-a --path ./shared-docs/
euleragent rag add --project project-b --path ./shared-docs/
Q: I want to run web.search automatically without approval.
This is not recommended for security reasons. However, if automation is needed in a trusted environment, remove web.search from the require_approval list in tools.yaml:
# tools.yaml (security risk: use with caution)
require_approval:
- web.fetch
- file.write
- shell.exec
# web.search removed → runs automatically
In this case, all external transmissions will occur automatically, so manage allow_domains and the external_transmission.jsonl audit log thoroughly.
Common Mistakes (Order Errors)
| Symptom | Cause | Recovery |
|---|---|---|
| Error: No knowledge base found for project 'X'. | rag init not executed | euleragent rag init --project X |
| No results found. | No documents added to KB | euleragent rag add --project X --path docs/ |
Next step: 08_mcp_provider_and_tools.md -- Learn in depth how to configure MCP providers and manage various external search sources in an integrated manner.