SmartAssist — Portable RLHF + RAG + MCP Learning System

Section 01

How It Works

A portable AI learning system that remembers past mistakes and feeds the lessons back to Claude so they are less likely to recur, on any codebase.

RLHF · RAG · MCP Server · Thompson Sampling · LanceDB · BAAI/bge-m3 · Cross-Encoder Reranking

New in v4.0: SmartAssist is now a portable, pip-installable Python package. Install once globally, run on any codebase. Per-project data lives in .claude/smartassist/. No more hardcoded paths or virtual environment activation.

Three Technologies Combined

RLHF

Learns from your feedback: thumbs up/down, corrections, and angry signals. Updates per-category reliability scores with a Bayesian Beta-Bernoulli model (Thompson Sampling).

RAG

Hybrid semantic search over 100 curated lessons. Converts text to 1024-dim vectors with BAAI/bge-m3, combines vector + BM25 keyword search, then cross-encoder reranks results.

MCP Server

On-demand knowledge retrieval via 3 tools: rag_search, rag_dashboard, and rag_feedback. Claude calls them mid-conversation only when relevant, so simple prompts incur no retrieval latency.

The Learning Loop

End-to-End Learning Cycle

flowchart LR
  A["You give feedback\nor commit code"] --> B["System stores &\nupdates scores"]
  B --> C["Knowledge Base\n100 curated lessons"]
  C --> D["Claude searches\nwhen relevant"]
  D --> E["Better responses"]
  E -.->|"Cycle repeats"| A
  style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
  style B fill:#131a27,stroke:#fb923c,color:#e6edf3
  style C fill:#131a27,stroke:#f87171,color:#e6edf3
  style D fill:#131a27,stroke:#34d399,color:#e6edf3
  style E fill:#131a27,stroke:#a78bfa,color:#e6edf3

Step-by-Step Flow

Step	What happens
1. Capture	You give feedback (thumbs up/down) or the system auto-detects anti-patterns in commits and PR reviews
2. Score	Thompson Sampling updates reliability scores (0-100%) for the relevant category
3. Clean & Deduplicate	Cleanup pipeline filters junk (short text, "LGTM", "done"), normalizes text, and deduplicates by MD5 hash
4. Vectorize	Clean corrections are embedded into 1024-dim vectors with BAAI/bge-m3, stored in LanceDB with category metadata
5. Session Start	Weak categories (<70%) get lessons injected automatically into Claude's context
6. MCP On-Demand	During conversations, Claude calls `rag_search` with hybrid search (vector + BM25) and cross-encoder reranking
7. Log Evidence	Every tool call is logged to `usage_log.jsonl` with timestamp, query, decision funnel, returned lessons, and latency
8. Health Check	Run `smartassist health` anytime to verify DB, data quality, scores, sync status, and usage evidence

Portable Design

Code (installed once globally)

pip install -e ~/Github/SmartAssist
Makes the smartassist CLI and the python3 -m smartassist.* modules available
29 source files, 59 automated tests
Source: github.com/jnrahme/SmartAssist

Data (per-project)

smartassist init creates .claude/smartassist/
Feedback logs, reliability scores, LanceDB vectors
Automatically detected by walking up from cwd
Gitignored — never committed to project repos

How to Use It

Just talk naturally. Say "thumbs up for git", "thumbs down testing", "correction: use semantic colors", or "angry feedback - wrong file modified". The system captures everything automatically.

Quick Start on a New Project

cd ~/your-project
smartassist init        # Creates .claude/smartassist/{data,lancedb}/
# That's it! MCP server and hooks auto-detect the data directory.
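The per-project layout that `smartassist init` creates can be sketched in a few lines of Python. This is a minimal illustration, not SmartAssist's actual implementation: the function name `init_project` is hypothetical, and adding a `.gitignore` entry is an assumption about how the data directory stays out of project repos.

```python
from pathlib import Path

# Hypothetical: the .gitignore entry used to keep per-project data out of git.
GITIGNORE_ENTRY = ".claude/smartassist/"


def init_project(root: Path) -> Path:
    """Create .claude/smartassist/{data,lancedb}/ under `root` (idempotent sketch)."""
    base = root / ".claude" / "smartassist"
    for sub in ("data", "lancedb"):
        # exist_ok=True makes re-running init harmless.
        (base / sub).mkdir(parents=True, exist_ok=True)

    # Assumption: ignore the data directory via the project's .gitignore.
    gitignore = root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if GITIGNORE_ENTRY not in text.splitlines():
        if text and not text.endswith("\n"):
            text += "\n"  # keep the new entry on its own line
        gitignore.write_text(text + GITIGNORE_ENTRY + "\n")
    return base
```

Calling `init_project(Path.cwd())` twice leaves a single `.gitignore` entry, which mirrors the "run once per project, safe to repeat" behavior described above.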

Section 02

Architecture Diagrams

Visual breakdown of how every component connects.

Code vs Data Separation

SmartAssist separates code (installed once) from data (per-project). The config.py module resolves data paths automatically.

Portable Architecture

flowchart TD
  subgraph CODE["Code: ~/Github/SmartAssist/ (pip install -e)"]
    direction LR
    CFG["config.py\nPath resolution"]
    MCP["mcp_server.py\n3 MCP tools"]
    HOOKS["hooks/\n7 lifecycle hooks"]
    TOOLS["tools/\n5 utility modules"]
    CLI["cli.py\n9 subcommands"]
  end
  subgraph DATA1["Project A: .claude/smartassist/"]
    direction LR
    D1A["data/\nfeedback, scores"]
    D1B["lancedb/\nvector DB"]
  end
  subgraph DATA2["Project B: .claude/smartassist/"]
    direction LR
    D2A["data/\nfeedback, scores"]
    D2B["lancedb/\nvector DB"]
  end
  CFG -->|"auto-detect\nfrom cwd"| DATA1
  CFG -->|"auto-detect\nfrom cwd"| DATA2
  style CODE fill:#162032,stroke:#38bdf8,color:#e6edf3
  style DATA1 fill:#162032,stroke:#34d399,color:#e6edf3
  style DATA2 fill:#162032,stroke:#fb923c,color:#e6edf3

How Lessons Get In

Three sources feed the knowledge base: manual feedback, the commit hook, and the PR harvester. The last two run automatically, so you rarely need to do anything by hand.

Feedback Sources → Storage

flowchart TD
  subgraph SRC["Feedback Sources"]
    A["Manual Feedback\nYou say: thumbs down"]
    B["Commit Hook\nScans git diffs"]
    C["PR Harvester\nGitHub review comments"]
  end
  subgraph STORE["Storage Layer: .claude/smartassist/"]
    D[("data/feedback_log.jsonl\n1,991 events")]
    E[("data/reliability_scores.json")]
    F[("lancedb/\n100 curated lessons")]
  end
  A --> D
  B --> D
  C --> D
  D --> E
  D --> F
  style SRC fill:#162032,stroke:#22d3ee,color:#e6edf3
  style STORE fill:#162032,stroke:#f87171,color:#e6edf3

How Lessons Come Out

Two separate channels deliver knowledge to Claude, each triggered differently.

Dual Delivery Channels

flowchart TD
  subgraph CH1["Channel 1: Automatic (Session Start)"]
    direction LR
    A1["New session"] --> A2["Load scores"]
    A2 --> A3{"Below 70%?"}
    A3 -->|"Yes"| A4["Inject warnings"]
    A3 -->|"No"| A5["Skip"]
  end
  subgraph CH2["Channel 2: On-Demand (MCP Server)"]
    direction LR
    B1["You ask a question"] --> B2{"Relates to\nknowledge?"}
    B2 -->|"Yes"| B3["rag_search"]
    B3 --> B4["Return lessons"]
    B2 -->|"No"| B5["Answer normally"]
  end
  A4 --> CLAUDE["Claude Code"]
  B4 --> CLAUDE
  style CH1 fill:#162032,stroke:#fb923c,color:#e6edf3
  style CH2 fill:#162032,stroke:#34d399,color:#e6edf3
  style CLAUDE fill:#131a27,stroke:#38bdf8,color:#e6edf3

MCP Search Pipeline

What happens inside the MCP server when Claude calls rag_search: the query is enhanced with a prefix, embedded, run through hybrid search (vector + BM25), filtered by distance, and finally reranked by a cross-encoder.

Hybrid Search Pipeline with Cross-Encoder Reranking

flowchart LR
  A["Your question"] --> QE["Query Enhancement\nAdd correction prefix"]
  QE --> B["Embed into\n1024-dim vector"]
  B --> C["Hybrid Search\nVector + BM25"]
  C --> D{"Distance ≤ 1.30?"}
  D -->|"Relevant"| E["Cross-encoder\nrerank top results"]
  D -->|"Irrelevant"| F["Filtered out"]
  E --> G["Return to Claude\nwith relevance %"]
  style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
  style QE fill:#131a27,stroke:#22d3ee,color:#e6edf3
  style B fill:#131a27,stroke:#a78bfa,color:#e6edf3
  style C fill:#131a27,stroke:#a78bfa,color:#e6edf3
  style D fill:#131a27,stroke:#fb923c,color:#e6edf3
  style E fill:#131a27,stroke:#f472b6,color:#e6edf3
  style F fill:#131a27,stroke:#f87171,color:#e6edf3
  style G fill:#131a27,stroke:#34d399,color:#e6edf3

Query Enhancement

Documents are stored as structured correction text (e.g., "[code_edit] Use semantic colors..."), but raw user questions are phrased differently, so their embeddings land farther from the stored lessons. Prefixing each query with "Correction for this project: " narrows this gap; a lower distance means a closer match.

Query	Raw distance	Enhanced distance	Change in distance
style components	1.140	0.810	-29%
unit tests	1.240	0.932	-25%
git commit	0.752	0.695	-8%
best practices	1.498 (filtered)	1.075	Now passes the 1.30 threshold
quantum physics	1.609	1.349	Still filtered (above 1.30)
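The prefixing and threshold steps are simple enough to show directly. The sketch below assumes candidates arrive as dicts carrying a `_distance` field (the column LanceDB adds to pure vector-search results); `enhance_query` and `filter_by_distance` are illustrative names, not SmartAssist's API.

```python
from typing import Dict, List

QUERY_PREFIX = "Correction for this project: "  # prefix described above
MAX_DISTANCE = 1.30  # documented distance threshold


def enhance_query(query: str) -> str:
    """Rephrase a raw question so it sits closer to stored correction text."""
    return QUERY_PREFIX + query.strip()


def filter_by_distance(candidates: List[Dict], max_distance: float = MAX_DISTANCE) -> List[Dict]:
    """Drop candidates whose vector distance exceeds the threshold."""
    return [c for c in candidates if c["_distance"] <= max_distance]


if __name__ == "__main__":
    print(enhance_query("unit tests"))  # Correction for this project: unit tests
    hits = filter_by_distance([
        {"text": "[code_edit] Use semantic colors", "_distance": 0.810},
        {"text": "[testing] Unrelated lesson", "_distance": 1.349},
    ])
    print([h["text"] for h in hits])  # ['[code_edit] Use semantic colors']
```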

Reliability Scoring

How Thompson Sampling decides whether a category is strong or weak.

Thompson Sampling Flow

flowchart LR
  A["Feedback event"] --> B["Identify category"]
  B --> C["Update alpha/beta\nBayesian formula"]
  C --> D{"Score ≥ 70%?"}
  D -->|"Yes"| E["RELIABLE\nStop auto-injecting"]
  D -->|"No"| F["WEAK\nKeep injecting lessons"]
  style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
  style B fill:#131a27,stroke:#fb923c,color:#e6edf3
  style C fill:#131a27,stroke:#a78bfa,color:#e6edf3
  style D fill:#131a27,stroke:#fbbf24,color:#e6edf3
  style E fill:#131a27,stroke:#34d399,color:#e6edf3
  style F fill:#131a27,stroke:#f87171,color:#e6edf3

File Structure

~/Github/SmartAssist/                    # Code (installed once via pip)
├── pyproject.toml                        # Package config, CLI entry point
├── smartassist/
│   ├── __init__.py
│   ├── config.py                         # Path resolution + embedding config (keystone)
│   ├── cli.py                            # `smartassist` CLI (9 subcommands)
│   ├── mcp_server.py                     # MCP server (3 tools: search, dashboard, feedback)
│   ├── thompson_sampling.py              # Beta-Bernoulli model with 30-day decay
│   ├── feedback_system.py                # FeedbackCapture + JSONL storage
│   ├── context_injection.py              # Lesson formatting + injection
│   ├── lesson_feedback.py                # Per-lesson boost/demote/block scoring
│   ├── hooks/
│   │   ├── session_start.py              # SessionStart: inject weak-category lessons
│   │   ├── session_end.py                # SessionEnd: save analytics
│   │   ├── vectorize_learnings.py        # Auto-vectorize new lessons
│   │   ├── prompt_inject.py              # UserPromptSubmit: context injection
│   │   ├── commit_hook.py                # PreToolUse(Bash): scan diffs for anti-patterns
│   │   ├── show_lessons.py               # PostToolUse: display search results
│   │   └── seed_from_claudemd.py         # Seed lessons from CLAUDE.md
│   └── tools/
│       ├── cleanup_and_vectorize.py      # Data cleanup, dedup, DB rebuild
│       ├── maintenance.py                # Staleness check, LanceDB compaction
│       ├── health_check.py               # 6-check system health dashboard
│       ├── analyze_usage.py              # Usage analytics (hit rate, latency, trends)
│       └── generate_dashboard.py         # HTML dashboard generator
└── tests/
    ├── conftest.py                       # Shared fixtures (tmp data dirs)
    ├── test_config.py                    # 5 tests — path resolution
    ├── test_cleanup.py                   # 46 tests — cleanup filtering logic
    └── test_thompson_sampling.py         # 7 tests — Thompson Sampling model

<any-project>/.claude/smartassist/       # Data (per-project, auto-detected)
├── data/
│   ├── feedback_log.jsonl                # Raw feedback events
│   ├── reliability_scores.json           # Thompson Sampling scores per category
│   ├── curated_lessons.json              # 100 curated lessons
│   ├── usage_log.jsonl                   # 20,070+ evidence entries
│   ├── vectorization_log.json            # Sync state tracker
│   ├── session_log.jsonl                 # Session analytics
│   └── lessons_learned/                  # 1,982 markdown files
└── lancedb/                              # 100 vector documents (1024-dim)

Section 03

MCP Server

On-demand knowledge retrieval — the core innovation.

FastMCP · stdio transport · BAAI/bge-m3 · LanceDB · Cross-Encoder Reranking · Hybrid Search

Claude reads the tool descriptions and decides for itself when to search, calling rag_search only when your question relates to stored knowledge. Simple prompts such as "yes" or "ok" skip retrieval entirely. The server is registered globally as the `smartassist serve` command, so no project-specific MCP configuration is needed.

MCP Configuration

# ~/.claude/mcp.json — works for ALL projects automatically
{
  "mcpServers": {
    "smartassist": {
      "command": "smartassist",
      "args": ["serve"]
    }
  }
}

Tools Exposed

`rag_search(query, top_k, category)`

Hybrid semantic search (vector + BM25) across 100 curated lessons. Embeds query into 1024-dim vector with BAAI/bge-m3, searches LanceDB, filters by distance threshold (1.30), cross-encoder reranks results, returns formatted lessons with relevance %. Every call logged with full decision funnel.

`rag_dashboard()`

Returns Thompson Sampling reliability scores per category, identifies weak areas (<70%), and shows feedback event statistics. Also logged for usage evidence.

`rag_feedback(helpful, category, notes)`

Records whether the last suggestion was helpful and updates the Thompson Sampling scores directly from the MCP tool, so Claude can capture feedback in real time during a conversation.

Search Example

Real search flow with hybrid search + cross-encoder reranking

sequenceDiagram
  actor You
  participant Claude as Claude Code
  participant MCP as SmartAssist MCP
  participant DB as LanceDB
  You->>Claude: How should I style this component?
  Note over Claude: This relates to<br/>project conventions...
  Claude->>MCP: rag_search("style components")
  MCP->>MCP: Enhance query + embed (1024-dim)
  MCP->>DB: Hybrid search (vector + BM25)
  DB-->>MCP: 20 raw candidates
  Note over MCP: Distance filter: ≤ 1.30
  MCP->>MCP: Cross-encoder rerank top results
  MCP-->>Claude: Lesson: Use semantic colors<br/>relevance: 87%
  Note over MCP: Log to usage_log.jsonl
  Claude-->>You: Uses the lesson in response

Why MCP Over Alternatives

Approach	Problem	MCP advantage
UserPromptSubmit	Fires on EVERY prompt — 2-3s latency, ~500 tokens noise each time	MCP only fires when Claude decides it would help
SessionStart only	Generic lessons once at start. Can't search mid-session	MCP enables on-demand search any time
Static CLAUDE.md	~2000 tokens loaded every session. No semantic matching	MCP retrieves only what's relevant

Key Design Decisions

Lazy-Loaded Singletons

The embedding model (BAAI/bge-m3) and the cross-encoder load only on the first tool call, and later calls reuse the cached instances. Server startup stays fast; the first search pays the one-time model-loading cost instead.
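A common way to get this load-once behavior in Python is `functools.lru_cache` on a zero-argument factory. In the sketch below, `FakeEmbedder` is a hypothetical stand-in for the real model (for example `SentenceTransformer("BAAI/bge-m3")` from sentence-transformers), so the example runs without downloading weights; `get_embedder` and `handle_search` are illustrative names.

```python
import functools
from typing import List


class FakeEmbedder:
    """Hypothetical stand-in for an expensive model such as BAAI/bge-m3."""

    instances = 0  # counts constructions so the caching is observable

    def __init__(self) -> None:
        FakeEmbedder.instances += 1  # a real model would load weights here

    def encode(self, text: str) -> List[float]:
        return [float(len(text))]  # placeholder vector, not a real embedding


@functools.lru_cache(maxsize=None)
def get_embedder() -> FakeEmbedder:
    """Build the model on the first call; later calls return the cached instance."""
    return FakeEmbedder()


def handle_search(query: str) -> List[float]:
    # Nothing loads at server startup; the first tool call triggers construction.
    return get_embedder().encode(query)
```

`lru_cache` does not lock: if two tool calls could run concurrently before the first load finishes, both might build a model, so a server that handles requests in parallel would want a lock around the first load.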

Distance Threshold

MAX_DISTANCE = 1.30 drops any candidate farther than the threshold, so prompts such as "yes" or "ok" return nothing and only relevant lessons surface.

Cross-Encoder Reranking

ms-marco-MiniLM-L-6-v2 reranks the top 20 candidates for precision. Because it scores each query and lesson together rather than comparing precomputed vectors, it catches semantic matches that pure vector search might mis-rank.
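The reranking step reduces to scoring (query, lesson) pairs and sorting. The sketch takes the scorer as a parameter; with sentence-transformers installed one could pass `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict`, but here a toy word-overlap scorer (hypothetical, for illustration only) keeps it dependency-free. The pool size of 20 follows the text; `top_k=5` is an assumed default.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
# Takes a batch of (query, passage) pairs, returns one relevance score per pair.
Scorer = Callable[[List[Pair]], Sequence[float]]


def rerank(query: str, candidates: List[str], scorer: Scorer,
           pool_size: int = 20, top_k: int = 5) -> List[Tuple[str, float]]:
    """Rescore the first `pool_size` candidates jointly with the query; keep the best top_k."""
    pool = candidates[:pool_size]
    if not pool:
        return []
    scores = scorer([(query, text) for text in pool])
    ranked = sorted(zip(pool, scores), key=lambda item: item[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_k]]


def overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Toy scorer: counts shared lowercase words (NOT a cross-encoder)."""
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]
```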

Test Coverage

59 tests across 9 classes, all passing in 0.09s.

Test Suite (tests/)

File	Tests	Coverage
test_cleanup.py	46	Normalization, skip patterns, dedup keys, clean correction text, format text, all filter functions, non-imperative filter, sanitize to lesson
test_thompson_sampling.py	7	Initial reliability, record success/failure, weak categories, all reliabilities, persistence
test_config.py	5	Path resolution via env var, storage path, db path, directory creation
conftest.py	—	Shared `set_data_dir` fixture with `SMARTASSIST_DATA_DIR` monkeypatch + tmp directory

Hooks Configuration

# ~/.claude/settings.json (all hooks use the python3 -m pattern)
{
  "hooks": {
    "UserPromptSubmit": [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.prompt_inject"}]}],
    "SessionStart":     [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.session_start"}]}],
    "PreToolUse":       [{"matcher": "Bash", "hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.commit_hook"}]}],
    "PostToolUse":      [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.show_lessons"}]}],
    "SessionEnd":       [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.session_end"}]}]
  }
}

Section 04

Live Metrics

Current system performance as of February 2026.

Raw Feedback Events	1,991
Curated Vector Docs	100
Categories Tracked	6

Reliability by Category

Scores were reset to baseline (50%) during the v4.0 migration to SmartAssist. They will rebuild naturally as you use the system and provide feedback.

Architecture	50.0%
PR Review	50.0%
Testing	50.0%
Code Editing	50.0%
Git Operations	50.0%
Security	50.0%

Feedback Breakdown

By Signal

Corrections	1,954 (98.1%)
Thumbs Up	35 (1.8%)
Happy	1
Sad	1

By Category

PR Review	594 (29.8%)
Code Editing	531 (26.7%)
Testing	404 (20.3%)
Architecture	374 (18.8%)
Git	74 (3.7%)
Security	14 (0.7%)

Key observation: about 98% of feedback comes from the automated PR Comment Harvester, which converts GitHub review comments into lessons.

Threshold System

Once a category hits 70%, automatic session-start injection stops. Lessons stay searchable via MCP.

Reliability	Status	Auto-inject?
< 30%	CRITICAL	Yes (priority)
30-50%	NEEDS WORK	Yes
50-70%	IMPROVING	Yes
≥ 70%	RELIABLE	No (mastered)
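The threshold table translates into a small classifier. Its bands share endpoints (30-50% and 50-70% both include 50%), so the sketch assumes half-open ranges, placing exactly 50% in IMPROVING; `status_for` and `categories_to_inject` are illustrative names.

```python
from typing import Dict, List, Tuple

# (exclusive upper bound, status, auto-inject?) checked in ascending order.
BANDS = [(0.30, "CRITICAL", True), (0.50, "NEEDS WORK", True), (0.70, "IMPROVING", True)]


def status_for(reliability: float) -> Tuple[str, bool]:
    """Map a reliability in [0, 1] to (status, auto_inject)."""
    for upper, status, inject in BANDS:
        if reliability < upper:
            return status, inject
    return "RELIABLE", False  # >= 70%: stop session-start injection


def categories_to_inject(scores: Dict[str, float]) -> List[str]:
    """Categories below 70%, weakest first so CRITICAL ones get priority."""
    weak = [cat for cat, rel in scores.items() if status_for(rel)[1]]
    return sorted(weak, key=lambda cat: scores[cat])
```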

Section 05

Technical Deep Dive

Under-the-hood details for technical stakeholders.

Session Lifecycle

Full session from start to finish

flowchart TD
  A["Session starts"] --> B["SessionStart hook\n(63ms)"]
  B --> C["Load reliability scores"]
  C --> D{"Any category\nbelow 70%?"}
  D -->|"Yes"| E["Inject lessons for\nweak categories"]
  D -->|"No"| F["No injection"]
  E --> G["Working with Claude"]
  F --> G
  G --> H{"Question relates to\nstored knowledge?"}
  H -->|"Yes"| I["Claude calls rag_search"]
  I --> J["Hybrid search + rerank\nReturn relevant lessons"]
  J --> K["PostToolUse hook\nshows lessons to user"]
  K --> G
  H -->|"No"| G
  G --> L{"Git commit?"}
  L -->|"Yes"| M["PreToolUse hook\nscans diff for anti-patterns"]
  M --> G
  L -->|"No"| G
  G --> N["Session ends"]
  N --> O["SessionEnd hook\nsave analytics"]
  style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
  style G fill:#131a27,stroke:#38bdf8,color:#e6edf3
  style I fill:#131a27,stroke:#34d399,color:#e6edf3
  style M fill:#131a27,stroke:#fb923c,color:#e6edf3
  style N fill:#131a27,stroke:#a78bfa,color:#e6edf3

1. Feedback Capture

Signal	Detection	Weight
Thumbs Up	`thumbs up`, `good job`, `correct`	+5
Thumbs Down	`thumbs down`, `wrong`, `incorrect`	-4
Correction	`correction:`, `should be`, `use instead`	-4 + text
Angry	`angry`, `terrible`, `broke`	-5
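Keyword detection like this can be sketched with word-boundary regexes. Checking signals in a fixed order and using `\b` boundaries matters: a plain substring test would read "incorrect" as the thumbs-up keyword "correct". The ordering, the regexes, and the `detect_signal` name are assumptions; the keywords and weights come from the table.

```python
import re
from typing import Optional, Tuple

# (signal, weight, keyword patterns), checked top to bottom; the first match wins.
SIGNALS = [
    ("angry", -5, [r"\bangry\b", r"\bterrible\b", r"\bbroke\b"]),
    ("correction", -4, [r"\bcorrection:", r"\bshould be\b", r"\buse instead\b"]),
    ("thumbs_down", -4, [r"\bthumbs down\b", r"\bwrong\b", r"\bincorrect\b"]),
    ("thumbs_up", 5, [r"\bthumbs up\b", r"\bgood job\b", r"\bcorrect\b"]),
]


def detect_signal(message: str) -> Optional[Tuple[str, int]]:
    """Return (signal, weight) for the first matching signal, or None."""
    text = message.lower()
    for name, weight, patterns in SIGNALS:
        if any(re.search(pattern, text) for pattern in patterns):
            return name, weight
    return None
```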

2. Thompson Sampling

Core Formula

Reliability = α / (α + β)

where α = successes + 1 and β = failures + 1, so the prior α = 1, β = 1 starts every category at 50%.

Exponential Decay (30-day half-life)

Recent feedback matters more. A correction from yesterday carries more weight than one from 3 months ago.

Time ago	Weight
Today	100%
15 days	70.7%
30 days	50.0%
60 days	25.0%
90 days	12.5%
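The formula and the decay table fit together if each event's contribution to α or β is multiplied by 0.5^(age_days / 30). The sketch below takes that approach; it is one plausible reading of "Beta-Bernoulli with 30-day decay", not necessarily how `thompson_sampling.py` does it. `reliability` returns the posterior mean used for the 70% threshold, and `thompson_draw` shows the random sample that Thompson Sampling proper would use.

```python
import random
from datetime import datetime
from typing import Iterable, Tuple

HALF_LIFE_DAYS = 30.0


def decay_weight(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: an event's weight halves every `half_life` days."""
    return 0.5 ** (age_days / half_life)


def beta_params(events: Iterable[Tuple[datetime, bool]], now: datetime) -> Tuple[float, float]:
    """Decay-weighted Beta parameters from (timestamp, success) events."""
    alpha, beta = 1.0, 1.0  # Beta(1, 1) prior: reliability starts at 50%
    for ts, success in events:
        weight = decay_weight((now - ts).total_seconds() / 86400)
        if success:
            alpha += weight
        else:
            beta += weight
    return alpha, beta


def reliability(alpha: float, beta: float) -> float:
    """Posterior mean alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def thompson_draw(alpha: float, beta: float, rng: random.Random) -> float:
    """One Thompson Sampling draw from Beta(alpha, beta)."""
    return rng.betavariate(alpha, beta)
```

With one success today and one failure 30 days ago, this gives α = 2.0 and β = 1.5, so the older failure counts half as much as it would without decay.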

3. Vector Database & Search

Property	Value
Embedding Model	BAAI/bge-m3 (1024 dimensions, 8K context window)
Reranker	cross-encoder/ms-marco-MiniLM-L-6-v2 (precision reranking)
Database	LanceDB (Apache Arrow format)
Search Mode	Hybrid: Vector cosine + BM25 keyword (LinearCombinationReranker, weight=0.7)
Distance Threshold	MAX_DISTANCE = 1.30 (with query enhancement prefix)
Rerank Pool	Top 20 candidates reranked, then top_k returned
Avg Search Latency	838ms (includes embedding + hybrid search + reranking)
Storage	~3KB per event

4. Path Resolution (config.py)

The architectural keystone. Every module imports from smartassist.config.

Resolution Order	Description
1. SMARTASSIST_DATA_DIR	Environment variable (highest priority — used by tests and explicit config)
2. Walk up from cwd	Find `.claude/smartassist/` in current or parent directories. Claude Code sets cwd to project root automatically.
3. RuntimeError	Helpful message: "Run `smartassist init` in your project root"
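The three-step resolution order can be sketched with `pathlib`. This assumes `SMARTASSIST_DATA_DIR` points directly at the data root (the text does not say whether that is `.claude/smartassist/` or its `data/` subfolder), and `resolve_data_dir` is an illustrative name rather than the real function in config.py.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "SMARTASSIST_DATA_DIR"
MARKER = Path(".claude") / "smartassist"


def resolve_data_dir(start: Optional[Path] = None) -> Path:
    """Resolve the per-project data directory in the documented order."""
    # 1. Explicit override via environment variable (used by tests).
    override = os.environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    # 2. Walk up from the starting directory looking for .claude/smartassist/.
    here = (start or Path.cwd()).resolve()
    for directory in (here, *here.parents):
        candidate = directory / MARKER
        if candidate.is_dir():
            return candidate
    # 3. Nothing found: fail with an actionable message.
    raise RuntimeError("SmartAssist data directory not found. "
                       "Run `smartassist init` in your project root.")
```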

5. Anti-Patterns Auto-Detected

Pattern	Correct approach
`console.log` statements	Remove before committing
Hardcoded colors (`#404040`)	Use theme color tokens from your design system
`toMatchSnapshot`	Use `toBeVisible()` behavior tests
Direct `analytics()` calls	Use centralized utility
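A commit hook can catch these by scanning only the lines a diff adds. The regexes below are rough, hypothetical approximations of the four rules (the hex-color pattern, for instance, also flags CSS id selectors such as `#abc`), and `scan_diff` is an illustrative name, not the hook's real entry point.

```python
import re
from typing import List, Tuple

# Hypothetical rule set mirroring the anti-pattern table.
ANTI_PATTERNS = {
    "console.log statement": re.compile(r"\bconsole\.log\("),
    "hardcoded color": re.compile(r"#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b"),
    "snapshot test": re.compile(r"\btoMatchSnapshot\("),
    "direct analytics() call": re.compile(r"\banalytics\(\)"),
}


def scan_diff(diff_text: str) -> List[Tuple[str, str]]:
    """Return (rule, line) pairs for anti-patterns on lines the diff adds."""
    findings = []
    for line in diff_text.splitlines():
        # Only added lines matter; skip the '+++ b/file' header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for rule, pattern in ANTI_PATTERNS.items():
            if pattern.search(added):
                findings.append((rule, added.strip()))
    return findings
```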

6. Data Cleanup Pipeline

smartassist/tools/cleanup_and_vectorize.py processes raw feedback into high-quality vector documents.

Step	What it does
Filter short text	Remove corrections < 30 characters (e.g., "ok", "fixed")
Skip patterns	Reject "done", "LGTM", "addressed", "nit:", "good catch", conversational noise, why-questions, narratives, defensive explanations, etc.
Sanitize to lesson	Strip hedged suggestions ("I think we should..."), "please", "yeah but", GitHub URLs. Capitalize and convert to imperative form.
Normalize & dedup	Lowercase, strip punctuation, MD5 hash of first 200 chars
Format text	`[category] Lesson text` format with optional context
Embed & store	1024-dim BAAI/bge-m3 vectors with category metadata into LanceDB

Result: 1,991 raw events → 100 curated lessons in LanceDB. More than 20 filter functions ensure that only actionable, imperative lessons make it through.
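The length filter, skip patterns, normalization, and MD5 dedup key can be sketched as follows. Only a handful of skip patterns are shown (the real pipeline has 20+), hashing the first 200 characters of the normalized text rather than the raw text is an assumption, and `clean_corrections` is a hypothetical name. MD5 serves purely as a fingerprint here, not for security.

```python
import hashlib
import re
from typing import Iterable, List

MIN_CHARS = 30
# A small, illustrative subset of skip patterns, matched against lowercased text.
SKIP_PATTERNS = [r"^\s*(done|lgtm|addressed|thanks)\b", r"^\s*nit:", r"\bgood catch\b"]


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def dedup_key(text: str) -> str:
    """MD5 fingerprint of the first 200 characters of the normalized text."""
    return hashlib.md5(normalize(text)[:200].encode("utf-8")).hexdigest()


def is_worth_keeping(text: str) -> bool:
    """Reject short text and known low-value phrasings."""
    if len(text.strip()) < MIN_CHARS:
        return False
    lowered = text.lower()
    return not any(re.search(pattern, lowered) for pattern in SKIP_PATTERNS)


def clean_corrections(corrections: Iterable[str], category: str) -> List[str]:
    """Filter, dedupe, and format corrections as '[category] lesson' strings."""
    seen, lessons = set(), []
    for text in corrections:
        if not is_worth_keeping(text):
            continue
        key = dedup_key(text)
        if key in seen:
            continue
        seen.add(key)
        lessons.append(f"[{category}] {text.strip()}")
    return lessons
```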

7. Performance

Session Start Hook	63ms
Avg Search Latency	838ms
Full Test Suite	0.09s

Section 06

Operations & Health

CLI commands, health checks, auto-vectorization, and cleanup pipelines.

CLI · Auto-Vectorize · Cleanup Pipeline · Health Check · Dashboard

SmartAssist CLI

The smartassist command is available globally after pip install -e. It provides 9 subcommands for managing the system.

smartassist --help

Usage: smartassist <command>

Commands:
  init          Create .claude/smartassist/ in current project
  serve         Start MCP server (stdio transport)
  health        Run 6-check system health dashboard
  migrate       Copy data from old rag-setup location
  vectorize     Re-vectorize all lessons
  maintenance   Staleness check + LanceDB compaction
  analyze       Usage analytics (hit rate, latency, trends)
  dashboard     Generate HTML dashboard
  seed          Seed lessons from CLAUDE.md

System Health Check

Run smartassist health from any project with SmartAssist initialized.

Health Check Pipeline

flowchart LR
  A["smartassist health"] --> B["Database\n100 docs"]
  A --> C["Feedback\nData Quality"]
  A --> D["Reliability\nScores"]
  A --> E["Usage\nEvidence"]
  A --> F["Vectorization\nSync"]
  A --> G["MCP\nRegistration"]
  B --> H["SUMMARY\n6/6 passed"]
  C --> H
  D --> H
  E --> H
  F --> H
  G --> H
  style A fill:#131a27,stroke:#fbbf24,color:#e6edf3
  style H fill:#131a27,stroke:#34d399,color:#e6edf3

Check	What it verifies	Status
Vector Database	LanceDB has documents, categories are specific (not "general"), text has proper format	PASS
Feedback Data	feedback_log.jsonl exists, events counted, signal/category distribution analyzed	PASS
Reliability Scores	Thompson Sampling scores exist for all categories	PASS
Usage Evidence	usage_log.jsonl exists, tool calls logged, search hit rate, average latency	PASS
Vectorization Sync	DB is in sync with feedback log (no new unvectorized events)	PASS
MCP Registration	`smartassist` server is registered in `~/.claude/mcp.json`	PASS

Auto-Vectorization Hook

smartassist.hooks.vectorize_learnings automatically vectorizes new lessons whenever feedback is added. No manual intervention needed.

Auto-Vectorization Flow

flowchart LR
  A["New feedback\nadded"] --> B["Read vectorization_log.json\n(last processed count)"]
  B --> C["Get new events\nsince last run"]
  C --> D{"Worth vectorizing?"}
  D -->|"Yes: ≥ 30 chars\nnot skip pattern"| E["Format text blob\nwith category prefix"]
  D -->|"No: junk"| F["Skip + update count"]
  E --> G["Embed with\nBAAI/bge-m3"]
  G --> H["Add to LanceDB\nwith category metadata"]
  H --> I["Update\nvectorization_log.json"]
  style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
  style D fill:#131a27,stroke:#fbbf24,color:#e6edf3
  style E fill:#131a27,stroke:#34d399,color:#e6edf3
  style H fill:#131a27,stroke:#a78bfa,color:#e6edf3

What Gets Vectorized

Corrections ≥ 30 characters
Not matching 20+ skip/filter patterns
Sanitized to imperative lesson form
Proper category stored in metadata (not "general")

What Gets Filtered

Short responses: "ok", "fix", "yes"
Status updates: "Done - fixed in PR #123"
Acknowledgements: "LGTM", "good catch", "thanks"
Conversational noise, why-questions, narratives, defensive explanations, observations, scope discussions
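The incremental auto-vectorization flow needs only a stored counter. The sketch assumes `vectorization_log.json` holds a `last_processed_count` field and that each feedback event carries `category` and `correction` keys; all three names are hypothetical. The embed-and-store step is passed in as a callback so the example runs without LanceDB.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_last_count(state_file: Path) -> int:
    """How many feedback events earlier runs already processed (0 on first run)."""
    if not state_file.exists():
        return 0
    return int(json.loads(state_file.read_text()).get("last_processed_count", 0))


def read_events(feedback_log: Path) -> List[Dict]:
    """Parse the JSONL feedback log, skipping blank lines."""
    lines = feedback_log.read_text().splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def vectorize_new(feedback_log: Path, state_file: Path,
                  is_worth: Callable[[str], bool], add_lesson: Callable[[str], None]) -> int:
    """Process only events added since the last run; return how many were examined."""
    events = read_events(feedback_log)
    start = load_last_count(state_file)
    fresh = events[start:]
    for event in fresh:
        text = event.get("correction", "").strip()
        if is_worth(text):
            # In SmartAssist this is where BAAI/bge-m3 embedding + LanceDB insert happen.
            add_lesson(f"[{event['category']}] {text}")
    # Advance the counter past skipped junk too, so it is never re-examined.
    state_file.write_text(json.dumps({"last_processed_count": start + len(fresh)}))
    return len(fresh)
```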

Quick Commands

# Health check
smartassist health

# Full cleanup and rebuild vectors
smartassist vectorize

# Staleness check + LanceDB compaction
smartassist maintenance

# Usage analytics
smartassist analyze

# Generate HTML dashboard
smartassist dashboard --output ~/Desktop/dashboard.html

# Initialize SmartAssist in a new project
cd ~/your-new-project
smartassist init

# Migrate data from old rag-setup
smartassist migrate ~/old-project/rag-setup

Section 07

Usage Evidence

Evidence that the system is actively working: what Claude searched for, what was filtered out, and which lessons came back.

Proof · Usage Logs · Search Quality · Latency

Every single tool call is logged with full context. When Claude calls rag_search, an enriched evidence entry is written to usage_log.jsonl with timestamp, query text, results count, latency, the decision funnel (candidates fetched, distance-filtered, category-filtered), the enhanced query, and the actual lessons returned with relevance scores.

How Evidence Is Captured

Evidence Logging Flow — Full Search Story

sequenceDiagram
  actor User
  participant Claude as Claude Code
  participant MCP as SmartAssist MCP
  participant DB as LanceDB
  participant Log as usage_log.jsonl
  User->>Claude: How should I style this?
  Claude->>MCP: rag_search("style components")
  Note over MCP: Start timer
  MCP->>MCP: Enhance query + embed (1024-dim)
  MCP->>DB: Hybrid search (vector + BM25, limit 20)
  DB-->>MCP: 20 raw candidates
  Note over MCP: Distance filter: 4 too distant
  Note over MCP: Cross-encoder rerank remaining
  Note over MCP: Return top 5 of 16 remaining
  Note over MCP: Stop timer: 838ms
  MCP->>Log: Enriched entry with funnel + lessons
  MCP-->>Claude: Formatted lessons
  Claude-->>User: Use semantic colors...

Evidence Log Format (Enriched)

{
  "timestamp": "2026-02-27T08:04:30.223791",
  "tool": "rag_search",
  "query": "how to run unit tests in this project",
  "results_count": 5,
  "latency_ms": 10499.5,
  "lessons": [
    {"category": "testing", "relevance_pct": 52, "lesson_text": "Use it.each for parameterized test cases..."},
    {"category": "architecture", "relevance_pct": 56, "lesson_text": "Use protocol-based dependency injection for testability..."},
    {"category": "testing", "relevance_pct": 53, "lesson_text": "Use XCTUnwrap instead of force-unwrapping optionals in tests..."}
  ],
  "search_meta": {
    "raw_count": 20,
    "distance_filtered": 0,
    "category_filtered": 0,
    "category_filter_used": null,
    "enhanced_query": "Correction for this project: how to run unit tests in this project"
  }
}

Dashboard and error-path entries omit lessons and search_meta (backward compatible with the original 5-field format).
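Writing such an entry amounts to appending one JSON line. The sketch below follows the field names in the sample and the backward-compatibility rule for dashboard and error entries (`lessons` and `search_meta` appear only for searches); `log_tool_call` is a hypothetical helper, and timing with `time.perf_counter` is an assumption.

```python
import json
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


def log_tool_call(log_path: Path, tool: str, query: str, started: float,
                  lessons: Optional[List[Dict]] = None,
                  search_meta: Optional[Dict] = None) -> Dict:
    """Append one evidence entry to a JSONL usage log and return it."""
    entry = {
        "timestamp": datetime.now().isoformat(),
        "tool": tool,
        "query": query,
        "results_count": len(lessons or []),
        "latency_ms": round((time.perf_counter() - started) * 1000, 1),
    }
    if search_meta is not None:  # enriched search entry; others keep the 5 base fields
        entry["lessons"] = lessons or []
        entry["search_meta"] = search_meta
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Appending one self-contained line per call keeps the log readable with ordinary line-oriented tools and tolerant of a partially written final line.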

Current Evidence Snapshot

Tool Calls Logged	20,070
Search Hit Rate	54%
Avg Search Latency	838ms
Median Latency	804ms

By Tool

rag_search	14,742 calls (73.5%)
rag_dashboard	5,328 calls (26.5%)

Search Quality

Returned results	7,957 searches (54%) — relevant lessons found
Correctly filtered	6,785 searches (46%) — irrelevant queries properly return nothing

Latency Distribution

Percentile	Latency
Average	838ms
Median (P50)	804ms
P95	2,344ms
Min	0ms (cached)
Max	64,052ms (cold start + model load)
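Figures like these can be recomputed from the log with the standard library. This sketch assumes "hit rate" means the share of `rag_search` calls with `results_count > 0` and that latency statistics cover searches only; `summarize_usage` is an illustrative name. It uses `method="inclusive"` for P95 so small samples are not extrapolated past the observed maximum.

```python
import statistics
from typing import Dict, List


def summarize_usage(entries: List[Dict]) -> Dict[str, float]:
    """Hit rate and latency statistics over rag_search entries."""
    searches = [e for e in entries if e.get("tool") == "rag_search"]
    if not searches:
        return {"searches": 0, "hit_rate": 0.0, "avg_ms": 0.0, "median_ms": 0.0, "p95_ms": 0.0}
    latencies = [float(e["latency_ms"]) for e in searches]
    hits = sum(1 for e in searches if e.get("results_count", 0) > 0)
    # quantiles() needs at least two points; fall back to the single value.
    p95 = (statistics.quantiles(latencies, n=20, method="inclusive")[-1]
           if len(latencies) >= 2 else latencies[0])
    return {
        "searches": len(searches),
        "hit_rate": hits / len(searches),
        "avg_ms": statistics.fmean(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": p95,
    }
```

Under that definition, the snapshot's 7,957 hits out of 14,742 searches gives about 0.540, matching the 54% reported.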

Search Story: Full Decision Funnel

Each search logs the complete story — from raw candidates through hybrid search + cross-encoder reranking to returned lessons:

Query: "how to style components with theme colors"

20 fetched → 4 too distant → Cross-encoder rerank → 3 returned

72%	`[code_edit]`	Use semantic color tokens from the design system, never hardcode hex values
65%	`[code_edit]`	Import color tokens from the theme module for consistent styling across components
58%	`[code_edit]`	Avoid hardcoded color values in style definitions; use theme-provided constants

Query: "how to write unit tests for React Native components"

20 fetched → 3 too distant → Cross-encoder rerank → 5 returned

Lessons about using XCTUnwrap instead of force-unwrapping, protocol-based dependency injection for ViewModels, and mock placement best practices.

Query: "quantum physics dark matter theory"

20 fetched → 20 too distant → 0 returned

0 results — Correctly filtered as irrelevant. Distance threshold (1.30) prevents noise.

How to Verify

Three ways to prove the system is actively working:

Method	How
CLI Health Check	Run `smartassist health` from your project — see all 6 subsystem checks with usage evidence, funnel stats, and returned lessons
Usage Log	Read `.claude/smartassist/data/usage_log.jsonl` — 20,070+ timestamped entries with query, decision funnel, and returned lessons
Test Suite	Run `python -m pytest tests/ -v` from SmartAssist repo — 59 tests verify cleanup, path resolution, Thompson Sampling, and all filter functions

The logs make usage auditable. More than 20,000 tool calls have been recorded with full search stories: every search Claude makes creates a permanent, timestamped entry with the complete decision funnel (the query, the enhanced query, how many candidates hybrid search fetched, how many were filtered by distance, the cross-encoder reranking results, and the exact lessons returned with relevance scores). The health check validates all 6 subsystems, and the test suite covers data quality, path resolution, and the 20+ filter functions.

Section 08

Comparisons

How this system stacks up against alternatives.

vs Claude Built-In Memory

Feature	SmartAssist	Claude memory
Learning	Active RLHF — explicit feedback	Passive observation
Tracking	Thompson Sampling — exact scores	No visibility
Priority	Focuses on weak areas (<70%)	All info equal
Search	1024-dim hybrid + cross-encoder reranking	Has embeddings, no feedback loop
Time decay	30-day half-life	Old info never fades
Portability	Works on any project via `smartassist init`	Tied to Claude account
Privacy	Stored and embedded locally; only retrieved lessons reach Claude as context	Cloud-based
Cost	Zero	$$$ per token

vs CLAUDE.md

Aspect	SmartAssist	CLAUDE.md
Update speed	Instant ("thumbs down")	10-30 min (edit, PR, merge)
Verification	Measurable scores	No way to know if Claude learned
Context usage	~200 tokens (relevant only)	~2000 tokens (entire file)
Maintenance	Self-maintaining	Manual editing
Team standards	Personal only	Shared across team
Onboarding	Starts from scratch	Immediate access

Best approach: use both. CLAUDE.md provides team-wide standards (Layer 1). SmartAssist provides personal learning and verification (Layer 2). They're teammates, not competitors.

vs Generic RAG

Feature	SmartAssist (RLHF+RAG)	Generic RAG
Learning	Retrieve → Generate → Get Feedback → Improve	Retrieve → Generate (no learning)
Quality	Every lesson scored by Thompson Sampling	All documents equal weight
Search	Hybrid (vector + BM25) + cross-encoder reranking	Pure vector search
Personalization	Adapts to YOUR workflow	Same for everyone
Portability	pip install + `smartassist init` on any project	Usually hardcoded to one project

SmartAssist vs Claude Code Skills

Skills (like react-native-best-practices from Callstack) and SmartAssist both inject knowledge into Claude — but they work in fundamentally different ways, serve different purposes, and complement each other.

TL;DR: Skills teach Claude generic domain knowledge ("how to optimize React Native FPS"). SmartAssist teaches Claude project-specific lessons ("in our codebase, always use weak references for delegate patterns to prevent retain cycles"). Skills are the textbook. SmartAssist is the field notes.

What Are Skills?

Claude Code Skills are structured markdown instruction sets published by third-party developers and installed as plugins. They follow the agentskills.io specification.

Markdown Files

Skills are SKILL.md files with YAML frontmatter. They describe when to activate and provide step-by-step workflows Claude should follow.

Progressive Disclosure

Only skill names + descriptions load at startup (~50 tokens each). Full instructions load only when Claude decides the skill matches your task.

Off-the-Shelf

Written by experts (e.g., Callstack for React Native). Generic best practices applicable to any project using that technology.

How Skills Activate vs How SmartAssist Activates

Activation Flow Comparison

flowchart TD
  subgraph SKILL["Skills: Description-Based Matching"]
    direction LR
    S1["You ask about\nReact Native FPS"] --> S2["Claude reads\nskill descriptions"]
    S2 --> S3{"Description\nmatches?"}
    S3 -->|"Yes"| S4["Load full\nSKILL.md"]
    S4 --> S5["Follow\ninstructions"]
    S3 -->|"No"| S6["Skip skill"]
  end
  subgraph RAG["SmartAssist: Hybrid Vector Search"]
    direction LR
    R1["You ask about\nstyling components"] --> R2["Claude calls\nrag_search MCP"]
    R2 --> R3["Embed query to\n1024-dim vector"]
    R3 --> R4["Hybrid search\nLanceDB"]
    R4 --> R5["Cross-encoder\nrerank top results"]
    R5 --> R6["Return lessons\nwith relevance %"]
  end
  style SKILL fill:#162032,stroke:#f472b6,color:#e6edf3
  style RAG fill:#162032,stroke:#34d399,color:#e6edf3

The Core Comparison

Dimension	SmartAssist	Claude Code Skills
What it is	Portable pip-installed package: MCP server + vector DB + feedback loop + 5 hooks + CLI	Markdown instruction files with YAML metadata
Knowledge type	Project-specific — lessons from real PR reviews, commits, and team feedback. Works on any codebase.	Generic domain — React Native best practices applicable to any RN app
How it activates	Claude explicitly calls `rag_search` MCP tool when it decides to search	Claude automatically loads SKILL.md when task description matches
Search method	Hybrid search: 1024-dim BAAI/bge-m3 vectors + BM25 keywords, then cross-encoder reranking (ms-marco-MiniLM-L-6-v2)	No search index: Claude reads the skill descriptions in its context and judges whether one applies
Learning	Active — Thompson Sampling tracks reliability, feedback loop improves over time	Static — only updates when plugin author publishes new version
Feedback loop	Yes — thumbs up/down, corrections, commit analysis, reliability decay, `rag_feedback` MCP tool	None — no way to tell the skill "that advice was wrong"
Relevance scoring	Continuous 0-100% relevance, with distance-threshold filtering and cross-encoder reranking for precision	Binary: either the skill matches or it doesn't
Specificity	"Use weak references for delegate patterns to prevent retain cycles"	"Use FlashList instead of FlatList for better performance"
Update cycle	Instant — feedback → vectorize → searchable in seconds	Slow — plugin author commits, you pull updates
Observability	Full — 20,070+ usage log entries, decision funnels, relevance scores, health checks, dashboard	Minimal — no visibility into what skill was loaded or if it helped
Context cost	~100-300 tokens per search result (only relevant lessons)	~30-50 tokens metadata; ~500-5000 when fully loaded
Infrastructure	`pip install -e`, LanceDB, BAAI/bge-m3 + cross-encoder models, MCP server process	Zero — just markdown files on disk
Portability	`smartassist init` on any project — one global install, per-project data	Plugins available to any project by default
Maintenance	Self-maintaining — hooks auto-capture, auto-vectorize, auto-decay	Zero maintenance — read-only files
Who writes it	Your team — curated from real PR review history	Third-party experts (e.g., Callstack team)
Privacy	100% local — your lessons never leave your machine	100% local — markdown files cached locally
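The Learning and Feedback loop rows rely on Thompson Sampling over per-category reliability. A standard way to model this is one Beta-Bernoulli posterior per category, and the sketch below assumes that model. It is not SmartAssist's code. The class name and the uniform Beta(1, 1) prior are assumptions. The decay rule (shrinking accumulated evidence toward the prior) is also assumed, because the document names "reliability decay" without defining it.

```python
import random
from dataclasses import dataclass


@dataclass
class CategoryReliability:
    """Hypothetical Beta-Bernoulli posterior for one lesson category."""
    alpha: float = 1.0  # uniform prior + one per positive signal (thumbs up)
    beta: float = 1.0   # uniform prior + one per negative signal (thumbs down, correction)

    def update(self, positive: bool) -> None:
        """Record one feedback signal."""
        if positive:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def decay(self, factor: float = 0.95) -> None:
        """Shrink accumulated evidence toward the prior so old feedback counts less."""
        self.alpha = 1.0 + (self.alpha - 1.0) * factor
        self.beta = 1.0 + (self.beta - 1.0) * factor

    @property
    def reliability(self) -> float:
        """Posterior mean, shown as the 0-100% reliability score."""
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        """Thompson draw: one plausible reliability value from the posterior."""
        return rng.betavariate(self.alpha, self.beta)


testing = CategoryReliability()
for thumbs_up in [True, True, True, False]:
    testing.update(thumbs_up)
print(f"{testing.reliability:.0%}")  # 67%
print(round(testing.sample(random.Random(0)), 3))  # one random draw; averages about 0.67 over many draws
```

Drawing from the posterior instead of reading its mean is what makes this Thompson Sampling. A category with little feedback has a wide posterior, so its draws vary widely. A category with a lot of consistent feedback produces draws close to its mean.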

Same Question, Different Answers

When you ask "How should I optimize this list component?", each system provides a different layer of insight:

SmartAssist Says

"Lessons from your codebase:"

85% [code_edit] Use Shopify FlashList with estimatedItemSize for all long lists
72% [architecture] Wrap list item components with React.memo and extract stable keyExtractor functions
58% [testing] When testing FlashList components, mock @shopify/flash-list before imports

Precision: Knows your exact imports, your theme system, your test patterns

Skill Says

"Generic React Native guidance:"

Use FlashList over FlatList — 5x faster, better memory
Set estimatedItemSize for optimal recycling
Avoid inline functions in renderItem — extract to named component
Profile with Flipper's Performance monitor

Breadth: Covers patterns, profiling tools, and alternatives across all RN apps

Architecture: How They Integrate

Skills + SmartAssist = Two Knowledge Layers in Claude Code

flowchart TD
    USER["You ask a question"] --> CLAUDE["Claude Code"]
    subgraph LAYER3["Layer 1 — CLAUDE.md (Team Standards)"]
        CMD["Architecture, testing\npractices, path aliases"]
    end
    subgraph LAYER1["Layer 2 — Skills (Generic Knowledge)"]
        SK1["react-native-best-practices"]
        SK2["github PR patterns"]
        SK3["upgrading-react-native"]
    end
    subgraph LAYER2["Layer 3 — SmartAssist (Project Knowledge)"]
        RAG["MCP: rag_search"]
        VDB[("LanceDB\n100 curated lessons")]
        TS["Thompson Sampling\nreliability scores"]
    end
    CLAUDE -->|"Always loaded"| LAYER3
    CLAUDE -->|"Description match"| LAYER1
    CLAUDE -->|"Explicit MCP call"| LAYER2
    RAG --> VDB
    VDB --> TS
    LAYER3 --> RESPONSE["Better response\nGeneric + Specific + Standards"]
    LAYER1 --> RESPONSE
    LAYER2 --> RESPONSE
    style LAYER1 fill:#162032,stroke:#f472b6,color:#e6edf3
    style LAYER2 fill:#162032,stroke:#34d399,color:#e6edf3
    style LAYER3 fill:#162032,stroke:#38bdf8,color:#e6edf3
    style RESPONSE fill:#131a27,stroke:#fbbf24,color:#e6edf3
    style CLAUDE fill:#131a27,stroke:#a78bfa,color:#e6edf3

When Each One Wins

SmartAssist Wins When...

Project-specific patterns — "How do we handle auth in this app?"
Team conventions — "What color tokens should I use?"
Past mistakes — "What went wrong last time we touched Redux slices?"
Testing patterns — "How do we mock Firebase in our test setup?"
Code review feedback — Lessons extracted from 1,991 real PR comments

Skills Win When...

Generic optimization — "How do I improve React Native FPS?"
New technology — "How to use the New Architecture?"
Framework upgrades — "Upgrade React Native from 0.76 to 0.77"
Profiling guidance — "How to find memory leaks in Hermes?"
Community best practices — Industry-standard patterns from experts

Technical Architecture Differences

Component	SmartAssist	Claude Code Skills
Storage	LanceDB vector database (Lance columnar format, Arrow-compatible); curated lessons in `.claude/smartassist/data/curated_lessons.json`	Markdown files in `~/.claude/plugins/cache/`
Embedding	BAAI/bge-m3 (1024-dim vectors, 8K context)	None: Claude reads each skill's plain-text description and decides whether it applies
Reranking	cross-encoder/ms-marco-MiniLM-L-6-v2	None
Transport	MCP stdio protocol (`smartassist serve`)	Direct filesystem read by Claude Code
Tools exposed	`rag_search`, `rag_dashboard`, `rag_feedback`	None — skills are instructions, not tools
Hooks	5 lifecycle hooks (SessionStart, SessionEnd, PreToolUse, PostToolUse, UserPromptSubmit)	None — passive content
Testing	59 automated tests, health checks, search quality validation	YAML frontmatter validation only
Logging	Every call → `usage_log.jsonl` with decision funnel + returned lessons (20,070+ entries)	No logging — invisible to user
Installation	`pip install -e ~/Github/SmartAssist` — globally available	Plugin toggle in settings
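The cross-encoder in the Reranking row scores each (query, lesson) pair jointly instead of comparing precomputed vectors. That costs more per pair, which is why it runs only on a short candidate list. The sketch below shows the rescore, convert, filter, and sort steps with a pluggable scorer. The document specifies none of the following, so treat them as assumptions:

- The scorer returns a raw logit.
- Relevance % is taken as the logit's sigmoid.
- The 50% cut-off and `keep=3` limit are illustrative.
- `toy_scorer` is a hypothetical word-overlap stand-in, so the example runs without downloading a model.

```python
import math
from typing import Callable, Sequence

# A scorer returns a raw relevance logit for one (query, passage) pair.
Scorer = Callable[[str, str], float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(query: str, candidates: Sequence[str], scorer: Scorer, top_n: int = 20,
           min_relevance: float = 0.5, keep: int = 3) -> list[tuple[str, float]]:
    """Rescore the first top_n candidates, drop weak ones, return best first."""
    scored = []
    for passage in candidates[:top_n]:
        relevance = sigmoid(scorer(query, passage))  # map logit to a 0-1 score
        if relevance >= min_relevance:
            scored.append((passage, relevance))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:keep]


def toy_scorer(query: str, passage: str) -> float:
    """Stand-in for a cross-encoder: shared-word count, shifted so no overlap is negative."""
    return len(set(query.lower().split()) & set(passage.lower().split())) - 1.0


candidates = [
    "use flashlist with estimateditemsize for long lists",
    "mock firebase analytics before imports in tests",
    "wrap list items in react.memo",
]
for lesson, relevance in rerank("mock firebase in tests", candidates, toy_scorer):
    print(f"{relevance:.0%} {lesson}")
# 95% mock firebase analytics before imports in tests
# 50% wrap list items in react.memo
```

With the `sentence-transformers` library, a real scorer could wrap `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict(...)`. Check that library's documentation for whether `predict` already applies an activation in your version, because applying a sigmoid twice would compress the scores.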

Loading Strategy: Progressive Disclosure vs Semantic Search

Stage	Skills	SmartAssist
Session start	Load all skill names + descriptions (~50 tokens each). Always in context.	SessionStart hook injects lessons for weak categories (<70% reliability). Runs in 63ms.
During work	If the task matches a skill's description → load the full SKILL.md (~500-5000 tokens)	Claude calls `rag_search` → hybrid search + cross-encoder rerank → returns only relevant lessons (~100-300 tokens)
Deep dive	Load reference files on demand (js-measure-fps.md, native-profiling.md, etc.)	Cross-encoder reranking of top-20 candidates → precision filtering → relevance %
After use	Nothing — no feedback loop	PostToolUse hook shows lessons to user. Commit hook captures new learnings. `rag_feedback` records quality signals. Thompson Sampling updates.
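The Session start row can be sketched as a hook script. Claude Code runs a hook command with a JSON payload on stdin, and for SessionStart hooks it adds the script's stdout to the session context. This is a minimal sketch, not SmartAssist's hook. The two JSON files it reads (`reliability.json`, `lessons_by_category.json`) are hypothetical, and so is `build_context`. Only the `.claude/smartassist/data/` directory and the 70% threshold come from this document.

```python
import json
from pathlib import Path

WEAK_THRESHOLD = 0.70  # "weak" = below 70% reliability, per the table above


def build_context(reliability: dict[str, float], lessons: dict[str, list[str]],
                  threshold: float = WEAK_THRESHOLD, per_category: int = 2) -> str:
    """Return one line per lesson for each weak category, weakest category first."""
    weak = sorted((score, category) for category, score in reliability.items() if score < threshold)
    lines = []
    for score, category in weak:
        for lesson in lessons.get(category, [])[:per_category]:
            lines.append(f"[{category} {score:.0%}] {lesson}")
    return "\n".join(lines)


def main() -> None:
    # Claude Code passes a JSON payload on stdin; this sketch does not need it.
    data_dir = Path(".claude/smartassist/data")  # assumes the hook runs from the project root
    try:
        # Both file names are hypothetical.
        reliability = json.loads((data_dir / "reliability.json").read_text(encoding="utf-8"))
        lessons = json.loads((data_dir / "lessons_by_category.json").read_text(encoding="utf-8"))
    except FileNotFoundError:
        return  # no data yet, so inject nothing
    context = build_context(reliability, lessons)
    if context:
        print("Lessons for weak categories:\n" + context)  # stdout becomes session context


if __name__ == "__main__":
    main()
```

The selection logic lives in a pure function (`build_context`), separate from file handling. That keeps it testable without a running Claude session.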

The key insight: Skills are a knowledge delivery format (Markdown files that teach Claude workflows). SmartAssist is a knowledge engine (portable MCP server + hybrid vector search + cross-encoder reranking + feedback loop + reliability scoring + CLI). Skills deliver static expertise that changes only when the plugin author publishes a new version. SmartAssist delivers project knowledge that keeps updating as your team's feedback and commits arrive. The two are complementary, so use both.

Our Three-Layer Knowledge Stack

Layer	System	What it provides	Example
1. CLAUDE.md	Static file	Team-wide standards, path aliases, testing thresholds	"Coverage thresholds: branches 79%, lines 89%"
2. Skills	Markdown plugins	Generic domain expertise from industry experts	"Use Hermes profiling to find JS thread bottlenecks"
3. SmartAssist	MCP + LanceDB + RLHF	Project-specific lessons learned from real code reviews	"Mock @react-native-firebase/analytics before imports in tests"

Each layer adds specificity. CLAUDE.md says what standards to follow. Skills say how to do things generically. SmartAssist says what we learned doing it in this exact codebase.