SmartAssist
Portable RLHF + RAG + MCP — An AI That Gets Smarter Every Project
Joey Rahme — Version 4.0 • February 2026
New in v4.0: SmartAssist is now a portable, pip-installable Python package. Install once globally, run on any codebase. Per-project data lives in .claude/smartassist/. No more hardcoded paths or virtual environment activation.
Three Technologies Combined
RLHF
Learns from your feedback — thumbs up/down, corrections, angry signals. Updates reliability scores per category using Bayesian statistics.
RAG
Hybrid semantic search over 100 curated lessons. Converts text to 1024-dim vectors with BAAI/bge-m3, combines vector + BM25 keyword search, then cross-encoder reranks results.
MCP Server
On-demand knowledge retrieval via 3 tools: rag_search, rag_dashboard, and rag_feedback. Claude calls them mid-conversation when relevant, so simple prompts incur no retrieval latency.
The Learning Loop
End-to-End Learning Cycle
flowchart LR
A["You give feedback\nor commit code"] --> B["System stores &\nupdates scores"]
B --> C["Knowledge Base\n100 curated lessons"]
C --> D["Claude searches\nwhen relevant"]
D --> E["Better responses"]
E -.->|"Cycle repeats"| A
style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
style B fill:#131a27,stroke:#fb923c,color:#e6edf3
style C fill:#131a27,stroke:#f87171,color:#e6edf3
style D fill:#131a27,stroke:#34d399,color:#e6edf3
style E fill:#131a27,stroke:#a78bfa,color:#e6edf3
Step-by-Step Flow
| Step | What happens |
| 1. Capture | You give feedback (thumbs up/down) or the system auto-detects anti-patterns in commits and PR reviews |
| 2. Score | Thompson Sampling updates reliability scores (0-100%) for the relevant category |
| 3. Clean & Deduplicate | Cleanup pipeline filters junk (short text, "LGTM", "done"), normalizes text, and deduplicates by MD5 hash |
| 4. Vectorize | Clean corrections are embedded into 1024-dim vectors with BAAI/bge-m3, stored in LanceDB with category metadata |
| 5. Session Start | Weak categories (<70%) get lessons injected automatically into Claude's context |
| 6. MCP On-Demand | During conversations, Claude calls rag_search with hybrid search (vector + BM25) and cross-encoder reranking |
| 7. Log Evidence | Every tool call is logged to usage_log.jsonl with timestamp, query, decision funnel, returned lessons, and latency |
| 8. Health Check | Run smartassist health anytime to verify DB, data quality, scores, sync status, and usage evidence |
Portable Design
Code (installed once globally)
pip install -e ~/Github/SmartAssist
- Makes smartassist CLI + python3 -m smartassist.* available
- 29 source files, 59 automated tests
- Pushed to github.com/jnrahme/SmartAssist
Data (per-project)
smartassist init creates .claude/smartassist/
- Feedback logs, reliability scores, LanceDB vectors
- Automatically detected by walking up from cwd
- Gitignored — never committed to project repos
How to Use It
Just talk naturally. Say "thumbs up for git", "thumbs down testing", "correction: use semantic colors", or "angry feedback - wrong file modified". The system captures everything automatically.
Quick Start on a New Project
cd ~/your-project
smartassist init # Creates .claude/smartassist/{data,lancedb}/
# That's it! MCP server and hooks auto-detect the data directory.
Code vs Data Separation
SmartAssist separates code (installed once) from data (per-project). The config.py module resolves data paths automatically.
Portable Architecture
flowchart TD
subgraph CODE["Code — ~/Github/SmartAssist/ (pip install -e)"]
direction LR
CFG["config.py\nPath resolution"]
MCP["mcp_server.py\n3 MCP tools"]
HOOKS["hooks/\n7 lifecycle hooks"]
TOOLS["tools/\n5 utility modules"]
CLI["cli.py\n9 subcommands"]
end
subgraph DATA1["Project A — .claude/smartassist/"]
direction LR
D1A["data/\nfeedback, scores"]
D1B["lancedb/\nvector DB"]
end
subgraph DATA2["Project B — .claude/smartassist/"]
direction LR
D2A["data/\nfeedback, scores"]
D2B["lancedb/\nvector DB"]
end
CFG -->|"auto-detect\nfrom cwd"| DATA1
CFG -->|"auto-detect\nfrom cwd"| DATA2
style CODE fill:#162032,stroke:#38bdf8,color:#e6edf3
style DATA1 fill:#162032,stroke:#34d399,color:#e6edf3
style DATA2 fill:#162032,stroke:#fb923c,color:#e6edf3
How Lessons Get In
Three sources feed the knowledge base: manual feedback plus two automatic ones (the commit hook and the PR harvester). You rarely need to do anything manually.
Feedback Sources → Storage
flowchart TD
subgraph SRC["Feedback Sources"]
A["Manual Feedback\nYou say: thumbs down"]
B["Commit Hook\nScans git diffs"]
C["PR Harvester\nGitHub review comments"]
end
subgraph STORE["Storage Layer — .claude/smartassist/"]
D[("data/feedback_log.jsonl\n1,991 events")]
E[("data/reliability_scores.json")]
F[("lancedb/\n100 curated lessons")]
end
A --> D
B --> D
C --> D
D --> E
D --> F
style SRC fill:#162032,stroke:#22d3ee,color:#e6edf3
style STORE fill:#162032,stroke:#f87171,color:#e6edf3
How Lessons Come Out
Two separate channels deliver knowledge to Claude, each triggered differently.
Dual Delivery Channels
flowchart TD
subgraph CH1["Channel 1: Automatic — Session Start"]
direction LR
A1["New session"] --> A2["Load scores"]
A2 --> A3{"Below 70%?"}
A3 -->|"Yes"| A4["Inject warnings"]
A3 -->|"No"| A5["Skip"]
end
subgraph CH2["Channel 2: On-Demand — MCP Server"]
direction LR
B1["You ask a question"] --> B2{"Relates to\nknowledge?"}
B2 -->|"Yes"| B3["rag_search"]
B3 --> B4["Return lessons"]
B2 -->|"No"| B5["Answer normally"]
end
A4 --> CLAUDE["Claude Code"]
B4 --> CLAUDE
style CH1 fill:#162032,stroke:#fb923c,color:#e6edf3
style CH2 fill:#162032,stroke:#34d399,color:#e6edf3
style CLAUDE fill:#131a27,stroke:#38bdf8,color:#e6edf3
MCP Search Pipeline
This is what happens inside the MCP server when Claude calls rag_search. The pipeline applies query enhancement, hybrid search (vector + BM25), distance filtering, and cross-encoder reranking, in that order.
Hybrid Search Pipeline with Cross-Encoder Reranking
flowchart LR
A["Your question"] --> QE["Query Enhancement\nAdd correction prefix"]
QE --> B["Embed into\n1024-dim vector"]
B --> C["Hybrid Search\nVector + BM25"]
C --> D{"Distance ≤ 1.30?"}
D -->|"Relevant"| E["Cross-encoder\nrerank top results"]
D -->|"Irrelevant"| F["Filtered out"]
E --> G["Return to Claude\nwith relevance %"]
style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
style QE fill:#131a27,stroke:#22d3ee,color:#e6edf3
style B fill:#131a27,stroke:#a78bfa,color:#e6edf3
style C fill:#131a27,stroke:#a78bfa,color:#e6edf3
style D fill:#131a27,stroke:#fb923c,color:#e6edf3
style E fill:#131a27,stroke:#f472b6,color:#e6edf3
style F fill:#131a27,stroke:#f87171,color:#e6edf3
style G fill:#131a27,stroke:#34d399,color:#e6edf3
Query Enhancement
Documents are stored as structured correction text (e.g., "[code_edit] Use semantic colors..."). Raw user questions live in a different semantic space. Prefixing queries with "Correction for this project: " bridges this gap.
| Query | Raw distance | Enhanced distance | Improvement |
| style components | 1.140 | 0.810 | -29% |
| unit tests | 1.240 | 0.932 | -25% |
| git commit | 0.752 | 0.695 | -8% |
| best practices | 1.498 (filtered!) | 1.075 | Now works! |
| quantum physics | 1.609 | 1.349 | Still filtered |
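The prefixing and threshold check described above can be sketched in a few lines. This is a minimal illustration, not the package's actual code: the function names `enhance_query` and `passes_distance_filter` are hypothetical, while the prefix string and the 1.30 threshold are the values quoted in this document.

```python
QUERY_PREFIX = "Correction for this project: "  # prefix quoted in the text above
MAX_DISTANCE = 1.30  # distance threshold quoted in this document


def enhance_query(query: str) -> str:
    """Prefix a raw question so it resembles the stored correction text."""
    return QUERY_PREFIX + query.strip()


def passes_distance_filter(distance: float, max_distance: float = MAX_DISTANCE) -> bool:
    """Keep a candidate only when its distance is within the threshold."""
    return distance <= max_distance


# (raw distance, enhanced distance) pairs copied from the table above.
measured = {
    "best practices": (1.498, 1.075),
    "quantum physics": (1.609, 1.349),
}
for query, (raw, enhanced) in measured.items():
    print(enhance_query(query), passes_distance_filter(raw), passes_distance_filter(enhanced))
```

With the table's numbers, "best practices" is rejected on its raw distance but accepted once enhanced, while "quantum physics" stays above the threshold either way.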
Reliability Scoring
How Thompson Sampling decides whether a category is strong or weak.
Thompson Sampling Flow
flowchart LR
A["Feedback event"] --> B["Identify category"]
B --> C["Update alpha/beta\nBayesian formula"]
C --> D{"Score ≥ 70%?"}
D -->|"Yes"| E["RELIABLE\nStop auto-injecting"]
D -->|"No"| F["WEAK\nKeep injecting lessons"]
style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
style B fill:#131a27,stroke:#fb923c,color:#e6edf3
style C fill:#131a27,stroke:#a78bfa,color:#e6edf3
style D fill:#131a27,stroke:#fbbf24,color:#e6edf3
style E fill:#131a27,stroke:#34d399,color:#e6edf3
style F fill:#131a27,stroke:#f87171,color:#e6edf3
File Structure
~/Github/SmartAssist/  # Code (installed once via pip)
├── pyproject.toml  # Package config, CLI entry point
├── smartassist/
│   ├── __init__.py
│   ├── config.py  # Path resolution + embedding config (keystone)
│   ├── cli.py  # `smartassist` CLI (9 subcommands)
│   ├── mcp_server.py  # MCP server (3 tools: search, dashboard, feedback)
│   ├── thompson_sampling.py  # Beta-Bernoulli model with 30-day decay
│   ├── feedback_system.py  # FeedbackCapture + JSONL storage
│   ├── context_injection.py  # Lesson formatting + injection
│   ├── lesson_feedback.py  # Per-lesson boost/demote/block scoring
│   ├── hooks/
│   │   ├── session_start.py  # SessionStart: inject weak-category lessons
│   │   ├── session_end.py  # SessionEnd: save analytics
│   │   ├── vectorize_learnings.py  # Auto-vectorize new lessons
│   │   ├── prompt_inject.py  # UserPromptSubmit: context injection
│   │   ├── commit_hook.py  # PreToolUse(Bash): scan diffs for anti-patterns
│   │   ├── show_lessons.py  # PostToolUse: display search results
│   │   └── seed_from_claudemd.py  # Seed lessons from CLAUDE.md
│   └── tools/
│       ├── cleanup_and_vectorize.py  # Data cleanup, dedup, DB rebuild
│       ├── maintenance.py  # Staleness check, LanceDB compaction
│       ├── health_check.py  # 6-check system health dashboard
│       ├── analyze_usage.py  # Usage analytics (hit rate, latency, trends)
│       └── generate_dashboard.py  # HTML dashboard generator
└── tests/
    ├── conftest.py  # Shared fixtures (tmp data dirs)
    ├── test_config.py  # 5 tests: path resolution
    ├── test_cleanup.py  # 46 tests: cleanup filtering logic
    └── test_thompson_sampling.py  # 7 tests: Thompson Sampling model

<any-project>/.claude/smartassist/  # Data (per-project, auto-detected)
├── data/
│   ├── feedback_log.jsonl  # Raw feedback events
│   ├── reliability_scores.json  # Thompson Sampling scores per category
│   ├── curated_lessons.json  # 100 curated lessons
│   ├── usage_log.jsonl  # 20,070+ evidence entries
│   ├── vectorization_log.json  # Sync state tracker
│   ├── session_log.jsonl  # Session analytics
│   └── lessons_learned/  # 1,982 markdown files
└── lancedb/  # 100 vector documents (1024-dim)
The core innovation: Claude reads the tool description and decides when to search, doing so only when your question relates to stored knowledge. Simple prompts like "yes" or "ok" skip it entirely. The server is now registered as a global smartassist serve command, so no project-specific MCP config is needed.
MCP Configuration
# ~/.claude/mcp.json: works for ALL projects automatically
{
  "mcpServers": {
    "smartassist": {
      "command": "smartassist",
      "args": ["serve"]
    }
  }
}
Tools Exposed
rag_search(query, top_k, category)
Hybrid semantic search (vector + BM25) across 100 curated lessons. Embeds query into 1024-dim vector with BAAI/bge-m3, searches LanceDB, filters by distance threshold (1.30), cross-encoder reranks results, returns formatted lessons with relevance %. Every call logged with full decision funnel.
rag_dashboard()
Returns Thompson Sampling reliability scores per category, identifies weak areas (<70%), and shows feedback event statistics. Also logged for usage evidence.
rag_feedback(helpful, category, notes)
Records whether the last suggestion was helpful. Updates Thompson Sampling scores directly from the MCP tool. Allows Claude to capture feedback in real-time during conversations.
Search Example
Real search flow with hybrid search + cross-encoder reranking
sequenceDiagram
actor You
participant Claude as Claude Code
participant MCP as SmartAssist MCP
participant DB as LanceDB
You->>Claude: How should I style this component?
Note over Claude: This relates to<br/>project conventions...
Claude->>MCP: rag_search("style components")
MCP->>MCP: Enhance query + embed (1024-dim)
MCP->>DB: Hybrid search (vector + BM25)
DB-->>MCP: 20 raw candidates
Note over MCP: Distance filter: ≤ 1.30
MCP->>MCP: Cross-encoder rerank top results
MCP-->>Claude: Lesson: Use semantic colors<br/>relevance: 87%
Note over MCP: Log to usage_log.jsonl
Claude-->>You: Uses the lesson in response
Why MCP Over Alternatives
| Approach | Problem | MCP advantage |
| UserPromptSubmit | Fires on EVERY prompt — 2-3s latency, ~500 tokens noise each time | MCP only fires when Claude decides it would help |
| SessionStart only | Generic lessons once at start. Can't search mid-session | MCP enables on-demand search any time |
| Static CLAUDE.md | ~2000 tokens loaded every session. No semantic matching | MCP retrieves only what's relevant |
Key Design Decisions
Lazy-Loaded Singletons
Embedding model (BAAI/bge-m3) + cross-encoder load only on first tool call. Subsequent calls reuse cached instances. Zero startup overhead.
Distance Threshold
MAX_DISTANCE = 1.30 filters irrelevant results. "yes" or "ok" returns nothing. Only relevant lessons surface.
Cross-Encoder Reranking
ms-marco-MiniLM-L-6-v2 reranks the top 20 candidates for precision. It corrects cases where pure vector search ranks a semantically relevant lesson too low.
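The lazy-loading decision above can be illustrated with a cached factory function. This is only a sketch under stated assumptions: `StubModel`, `get_model`, and `LOAD_EVENTS` are hypothetical names, and the stub stands in for the real embedding and reranking models so the example runs without downloading anything.

```python
from functools import lru_cache

LOAD_EVENTS: list[str] = []  # records each real load, for demonstration only


class StubModel:
    """Stand-in for an expensive model object (hypothetical, not the real embedder)."""

    def __init__(self, name: str) -> None:
        self.name = name


@lru_cache(maxsize=None)
def get_model(name: str) -> StubModel:
    """Build the model on first request, then reuse the cached instance."""
    LOAD_EVENTS.append(name)  # runs once per distinct model name
    return StubModel(name)


# Nothing is loaded at import time; the first tool call pays the cost.
embedder_a = get_model("BAAI/bge-m3")
embedder_b = get_model("BAAI/bge-m3")
reranker = get_model("cross-encoder/ms-marco-MiniLM-L-6-v2")
print(embedder_a is embedder_b, LOAD_EVENTS)
```

The second request for the same model name returns the cached object, so the load list contains each name exactly once.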
Test Coverage
59 tests across 9 classes, all passing in 0.09s.
Test Suite (tests/)
| File | Tests | Coverage |
| test_cleanup.py | 46 | Normalization, skip patterns, dedup keys, clean correction text, format text, all filter functions, non-imperative filter, sanitize to lesson |
| test_thompson_sampling.py | 7 | Initial reliability, record success/failure, weak categories, all reliabilities, persistence |
| test_config.py | 5 | Path resolution via env var, storage path, db path, directory creation |
| conftest.py | — | Shared set_data_dir fixture with SMARTASSIST_DATA_DIR monkeypatch + tmp directory |
Hooks Configuration
# ~/.claude/settings.json: all hooks use the python3 -m pattern
{
  "hooks": {
    "UserPromptSubmit": [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.prompt_inject"}]}],
    "SessionStart": [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.session_start"}]}],
    "PreToolUse": [{"matcher": "Bash", "hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.commit_hook"}]}],
    "PostToolUse": [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.show_lessons"}]}],
    "SessionEnd": [{"hooks": [{"type": "command", "command": "python3 -m smartassist.hooks.session_end"}]}]
  }
}
Reliability by Category
Scores were reset to baseline (50%) during the v4.0 migration to SmartAssist. They will rebuild naturally as you use the system and provide feedback.
Feedback Breakdown
By Signal
| Signal | Events (share) |
| Corrections | 1,954 (98.1%) |
| Thumbs Up | 35 (1.8%) |
| Happy | 1 |
| Sad | 1 |
By Category
| Category | Events (share) |
| PR Review | 594 (29.8%) |
| Code Editing | 531 (26.7%) |
| Testing | 404 (20.3%) |
| Architecture | 374 (18.8%) |
| Git | 74 (3.7%) |
| Security | 14 (0.7%) |
Key observation: 98% of feedback comes from the PR Comment Harvester (automated). The primary source is GitHub review comments auto-converted into lessons.
Threshold System
Once a category hits 70%, automatic session-start injection stops. Lessons stay searchable via MCP.
| Reliability | Status | Auto-inject? |
| < 30% | CRITICAL | Yes (priority) |
| 30-50% | NEEDS WORK | Yes |
| 50-70% | IMPROVING | Yes |
| ≥ 70% | RELIABLE | No (mastered) |
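The threshold table can be expressed as a small lookup. In this sketch the function name `classify` is hypothetical, and treating each lower bound as inclusive (so exactly 50% counts as IMPROVING) is an assumption, since the table's ranges share their endpoints.

```python
def classify(reliability: float) -> tuple[str, bool]:
    """Map a reliability score in [0, 1] to (status, auto_inject)."""
    if reliability < 0.30:
        return "CRITICAL", True
    if reliability < 0.50:
        return "NEEDS WORK", True
    if reliability < 0.70:
        return "IMPROVING", True
    return "RELIABLE", False  # lessons stay searchable via MCP, but are not injected


for score in (0.25, 0.50, 0.69, 0.70):
    print(score, classify(score))
```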
Session Lifecycle
Full session from start to finish
flowchart TD
A["Session starts"] --> B["SessionStart hook\n(63ms)"]
B --> C["Load reliability scores"]
C --> D{"Any category\nbelow 70%?"}
D -->|"Yes"| E["Inject lessons for\nweak categories"]
D -->|"No"| F["No injection"]
E --> G["Working with Claude"]
F --> G
G --> H{"Question relates to\nstored knowledge?"}
H -->|"Yes"| I["Claude calls rag_search"]
I --> J["Hybrid search + rerank\nReturn relevant lessons"]
J --> K["PostToolUse hook\nshows lessons to user"]
K --> G
H -->|"No"| G
G --> L{"Git commit?"}
L -->|"Yes"| M["PreToolUse hook\nscans diff for anti-patterns"]
M --> G
L -->|"No"| G
G --> N["Session ends"]
N --> O["SessionEnd hook\nsave analytics"]
style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
style G fill:#131a27,stroke:#38bdf8,color:#e6edf3
style I fill:#131a27,stroke:#34d399,color:#e6edf3
style M fill:#131a27,stroke:#fb923c,color:#e6edf3
style N fill:#131a27,stroke:#a78bfa,color:#e6edf3
1. Feedback Capture
| Signal | Detection | Weight |
| Thumbs Up | thumbs up, good job, correct | +5 |
| Thumbs Down | thumbs down, wrong, incorrect | -4 |
| Correction | correction:, should be, use instead | -4 + text |
| Angry | angry, terrible, broke | -5 |
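A keyword detector for the table above might look like the following sketch. The trigger phrases and weights are copied from the table; everything else is an assumption, including the name `detect_signal`, the use of plain substring matching, and the precedence order. Some order is needed because "correct" is a substring of both "incorrect" and "correction:", so the more specific signals are checked first.

```python
from typing import Optional

# (signal, trigger phrases, weight) from the table above.
# Precedence is an assumption: most specific or most severe signals first.
SIGNALS = [
    ("angry", ("angry", "terrible", "broke"), -5),
    ("correction", ("correction:", "should be", "use instead"), -4),
    ("thumbs_down", ("thumbs down", "wrong", "incorrect"), -4),
    ("thumbs_up", ("thumbs up", "good job", "correct"), +5),
]


def detect_signal(message: str) -> Optional[tuple[str, int]]:
    """Return (signal, weight) for the first matching rule, or None."""
    text = message.lower()
    for name, phrases, weight in SIGNALS:
        if any(phrase in text for phrase in phrases):
            return name, weight
    return None


for example in (
    "thumbs up for git",
    "thumbs down testing",
    "correction: use semantic colors",
    "angry feedback - wrong file modified",
):
    print(example, "->", detect_signal(example))
```

The four example messages are the ones given under "How to Use It"; each resolves to the signal its wording suggests.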
2. Thompson Sampling
Core Formula
Reliability = α / (α + β)
α = successes + 1 • β = failures + 1 • Prior: α=1, β=1 (50%)
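As a worked instance of the formula, the sketch below tracks successes and failures for one category and derives α, β, and the reliability from them. The class name `CategoryScore` and its methods are hypothetical. The `sample` method shows what a Thompson Sampling draw from the Beta(α, β) posterior would look like; the reliability formula above is that distribution's mean.

```python
import random
from dataclasses import dataclass


@dataclass
class CategoryScore:
    successes: float = 0.0
    failures: float = 0.0

    @property
    def alpha(self) -> float:
        return self.successes + 1.0  # prior contributes 1

    @property
    def beta(self) -> float:
        return self.failures + 1.0  # prior contributes 1

    @property
    def reliability(self) -> float:
        """Posterior mean: alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)

    def record(self, success: bool) -> None:
        if success:
            self.successes += 1.0
        else:
            self.failures += 1.0

    def sample(self, rng: random.Random) -> float:
        """One draw from Beta(alpha, beta), as Thompson Sampling would use."""
        return rng.betavariate(self.alpha, self.beta)


git = CategoryScore()
print(git.reliability)  # 0.5, the uniform prior
for outcome in (True, True, True, False):
    git.record(outcome)
print(round(git.reliability, 3))  # 0.667, from alpha=4 and beta=2
```

Three successes and one failure give 4 / 6, which is still below the 70% threshold.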
Exponential Decay (30-day half-life)
Recent feedback matters more. A correction from yesterday carries more weight than one from 3 months ago.
| Time ago | Weight |
| Today | 100% |
| 15 days | 70.7% |
| 30 days | 50.0% |
| 60 days | 25.0% |
| 90 days | 12.5% |
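The weights in the table follow from a 30-day half-life: weight = 0.5 ^ (age in days / 30). The sketch below reproduces them and shows one way decayed counts could feed the reliability formula. The function names are hypothetical, and applying the decay to success and failure counts before adding the prior is an assumption about how the two pieces combine.

```python
HALF_LIFE_DAYS = 30.0


def decay_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def decayed_reliability(events: list[tuple[bool, float]]) -> float:
    """events holds (success, age_days) pairs; older events count for less."""
    successes = sum(decay_weight(age) for ok, age in events if ok)
    failures = sum(decay_weight(age) for ok, age in events if not ok)
    return (successes + 1.0) / (successes + failures + 2.0)


for age in (0, 15, 30, 60, 90):
    print(age, f"{decay_weight(age):.1%}")

# One success today outweighs one failure from 90 days ago.
print(decayed_reliability([(True, 0.0), (False, 90.0)]))
```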
3. Vector Database & Search
| Property | Value |
| Embedding Model | BAAI/bge-m3 (1024 dimensions, 8K context window) |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 (precision reranking) |
| Database | LanceDB (Apache Arrow format) |
| Search Mode | Hybrid: Vector cosine + BM25 keyword (LinearCombinationReranker, weight=0.7) |
| Distance Threshold | MAX_DISTANCE = 1.30 (with query enhancement prefix) |
| Rerank Pool | Top 20 candidates reranked, then top_k returned |
| Avg Search Latency | 838ms (includes embedding + hybrid search + reranking) |
| Storage | ~3KB per event |
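The filter-then-rerank stages listed in the table can be sketched without any model dependencies. This is an illustration only: `Candidate`, `search_pipeline`, and `overlap_score` are hypothetical names, and `overlap_score` is a simple token-overlap stand-in for the cross-encoder, not an imitation of how ms-marco-MiniLM-L-6-v2 scores pairs. The defaults (pool of 20, threshold 1.30, top 5) mirror values quoted in this document.

```python
from typing import Callable, NamedTuple


class Candidate(NamedTuple):
    text: str
    distance: float


def overlap_score(query: str, text: str) -> float:
    """Stand-in relevance score: fraction of query tokens found in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def search_pipeline(
    query: str,
    candidates: list[Candidate],
    score: Callable[[str, str], float] = overlap_score,
    top_k: int = 5,
    max_distance: float = 1.30,
    pool_size: int = 20,
) -> list[Candidate]:
    # 1. Keep the closest pool_size candidates from the hybrid search.
    pool = sorted(candidates, key=lambda c: c.distance)[:pool_size]
    # 2. Drop anything beyond the distance threshold.
    relevant = [c for c in pool if c.distance <= max_distance]
    # 3. Rerank the survivors with the (stand-in) scorer and return top_k.
    reranked = sorted(relevant, key=lambda c: score(query, c.text), reverse=True)
    return reranked[:top_k]


results = search_pipeline(
    "use theme colors",
    [
        Candidate("[testing] Mock modules before imports", 1.10),
        Candidate("[code_edit] Use semantic colors from the theme", 0.81),
        Candidate("[git] Write imperative commit messages", 1.45),
    ],
)
for candidate in results:
    print(candidate)
```

The git lesson is dropped by the distance filter, and the code_edit lesson ranks first because it shares every query token.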
4. Path Resolution (config.py)
The architectural keystone. Every module imports from smartassist.config.
| Resolution Order | Description |
| 1. SMARTASSIST_DATA_DIR | Environment variable (highest priority — used by tests and explicit config) |
| 2. Walk up from cwd | Find .claude/smartassist/ in current or parent directories. Claude Code sets cwd to project root automatically. |
| 3. RuntimeError | Helpful message: "Run smartassist init in your project root" |
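The resolution order in the table can be sketched as follows. This is not the contents of config.py: `resolve_data_dir` and its `start` and `env` parameters are hypothetical, added so the example can run against a temporary directory without touching the real environment. The environment variable name and the `.claude/smartassist/` location come from this document.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "SMARTASSIST_DATA_DIR"
DATA_SUBDIR = Path(".claude") / "smartassist"


def resolve_data_dir(
    start: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    env = os.environ if env is None else env
    # 1. An explicit environment variable wins.
    override = env.get(ENV_VAR)
    if override:
        return Path(override)
    # 2. Otherwise walk up from the starting directory (cwd by default).
    current = (start or Path.cwd()).resolve()
    for directory in (current, *current.parents):
        candidate = directory / DATA_SUBDIR
        if candidate.is_dir():
            return candidate
    # 3. Nothing found: fail with an actionable message.
    raise RuntimeError("No .claude/smartassist/ found. Run smartassist init in your project root.")


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve()
    (project / DATA_SUBDIR).mkdir(parents=True)
    nested = project / "src" / "components"
    nested.mkdir(parents=True)
    found = resolve_data_dir(start=nested, env={})
    print(found == project / DATA_SUBDIR)
```

Starting two levels below the project root still finds the data directory, which is why hooks and the MCP server work from any subdirectory.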
5. Anti-Patterns Auto-Detected
| Pattern | Correct approach |
| console.log statements | Remove before committing |
| Hardcoded colors (#404040) | Use theme color tokens from your design system |
| toMatchSnapshot | Use toBeVisible() behavior tests |
| Direct analytics() calls | Use centralized utility |
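A diff scanner for the patterns above could be as simple as a list of regular expressions applied to added lines. The regexes, the name `scan_diff`, and the sample diff are all illustrative assumptions; the document does not show how commit_hook.py actually matches these patterns.

```python
import re

# (pattern, advice) pairs derived from the table above; the regexes are guesses.
ANTI_PATTERNS = [
    (re.compile(r"\bconsole\.log\("), "Remove console.log statements before committing"),
    (re.compile(r"#[0-9a-fA-F]{6}\b"), "Use theme color tokens instead of hardcoded colors"),
    (re.compile(r"\btoMatchSnapshot\("), "Use toBeVisible() behavior tests"),
    (re.compile(r"\banalytics\("), "Use the centralized analytics utility"),
]


def scan_diff(diff_text: str) -> list[tuple[str, str]]:
    """Return (offending line, advice) for every added line that matches."""
    findings = []
    for line in diff_text.splitlines():
        # Only added lines matter; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for pattern, advice in ANTI_PATTERNS:
            if pattern.search(line):
                findings.append((line[1:].strip(), advice))
    return findings


sample_diff = """\
+++ b/src/Button.tsx
+  console.log('pressed')
+  const border = '#404040'
-  expect(tree).toMatchSnapshot()
+  expect(button).toBeVisible()
"""
findings = scan_diff(sample_diff)
for offending, advice in findings:
    print(f"{offending!r}: {advice}")
```

Removed lines are ignored, so deleting a snapshot assertion does not trigger a warning.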
6. Data Cleanup Pipeline
smartassist/tools/cleanup_and_vectorize.py processes raw feedback into high-quality vector documents.
| Step | What it does |
| Filter short text | Remove corrections < 30 characters (e.g., "ok", "fixed") |
| Skip patterns | Reject "done", "LGTM", "addressed", "nit:", "good catch", conversational noise, why-questions, narratives, defensive explanations, etc. |
| Sanitize to lesson | Strip hedged suggestions ("I think we should..."), "please", "yeah but", GitHub URLs. Capitalize and convert to imperative form. |
| Normalize & dedup | Lowercase, strip punctuation, MD5 hash of first 200 chars |
| Format text | [category] Lesson text format with optional context |
| Embed & store | 1024-dim BAAI/bge-m3 vectors with category metadata into LanceDB |
Result: 1,991 raw events → 100 curated lessons in LanceDB. Extensive 20+ filter functions ensure only actionable, imperative lessons make it through.
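The short-text filter, skip patterns, and MD5 deduplication steps can be sketched together. This covers only three of the pipeline's steps and makes several assumptions: the function names are hypothetical, skip phrases are matched with a simple prefix check, only a handful of the 20+ filters are represented, and the hash is taken over the first 200 characters of the normalized text.

```python
import hashlib
import string

MIN_LENGTH = 30  # corrections shorter than this are dropped
SKIP_PREFIXES = ("done", "lgtm", "addressed", "nit:", "good catch")


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


def dedup_key(text: str) -> str:
    """MD5 hash of the first 200 characters of the normalized text."""
    return hashlib.md5(normalize(text)[:200].encode("utf-8")).hexdigest()


def is_worth_keeping(text: str) -> bool:
    stripped = text.strip()
    if len(stripped) < MIN_LENGTH:
        return False
    return not stripped.lower().startswith(SKIP_PREFIXES)


def curate(corrections: list[str]) -> list[str]:
    seen: set[str] = set()
    kept = []
    for text in corrections:
        if not is_worth_keeping(text):
            continue
        key = dedup_key(text)
        if key in seen:
            continue  # same lesson modulo case and punctuation
        seen.add(key)
        kept.append(text.strip())
    return kept


raw = [
    "LGTM",
    "Done - fixed in PR #123 as requested by the reviewer",
    "Use semantic color tokens instead of hardcoded hex values.",
    "use semantic color tokens instead of hardcoded hex values",
    "ok",
]
kept = curate(raw)
print(kept)
```

Of the five sample inputs, two are too short, one matches a skip prefix, and one is a duplicate after normalization, leaving a single lesson.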
7. Performance
SmartAssist CLI
The smartassist command is available globally after pip install -e. It provides 9 subcommands for managing the system.
smartassist --help
Usage: smartassist <command>
Commands:
  init         Create .claude/smartassist/ in current project
  serve        Start MCP server (stdio transport)
  health       Run 6-check system health dashboard
  migrate      Copy data from old rag-setup location
  vectorize    Re-vectorize all lessons
  maintenance  Staleness check + LanceDB compaction
  analyze      Usage analytics (hit rate, latency, trends)
  dashboard    Generate HTML dashboard
  seed         Seed lessons from CLAUDE.md
System Health Check
Run smartassist health from any project with SmartAssist initialized.
Health Check Pipeline
flowchart LR
A["smartassist health"] --> B["Database\n100 docs"]
A --> C["Feedback\nData Quality"]
A --> D["Reliability\nScores"]
A --> E["Usage\nEvidence"]
A --> F["Vectorization\nSync"]
A --> G["MCP\nRegistration"]
B --> H["SUMMARY\n6/6 passed"]
C --> H
D --> H
E --> H
F --> H
G --> H
style A fill:#131a27,stroke:#fbbf24,color:#e6edf3
style H fill:#131a27,stroke:#34d399,color:#e6edf3
| Check | What it verifies | Status |
| Vector Database | LanceDB has documents, categories are specific (not "general"), text has proper format | PASS |
| Feedback Data | feedback_log.jsonl exists, events counted, signal/category distribution analyzed | PASS |
| Reliability Scores | Thompson Sampling scores exist for all categories | PASS |
| Usage Evidence | usage_log.jsonl exists, tool calls logged, search hit rate, average latency | PASS |
| Vectorization Sync | DB is in sync with feedback log (no new unvectorized events) | PASS |
| MCP Registration | smartassist server is registered in ~/.claude/mcp.json | PASS |
Auto-Vectorization Hook
smartassist.hooks.vectorize_learnings automatically vectorizes new lessons whenever feedback is added. No manual intervention needed.
Auto-Vectorization Flow
flowchart LR
A["New feedback\nadded"] --> B["Read vectorization_log.json\n(last processed count)"]
B --> C["Get new events\nsince last run"]
C --> D{"Worth vectorizing?"}
D -->|"Yes: ≥ 30 chars\nnot skip pattern"| E["Format text blob\nwith category prefix"]
D -->|"No: junk"| F["Skip + update count"]
E --> G["Embed with\nBAAI/bge-m3"]
G --> H["Add to LanceDB\nwith category metadata"]
H --> I["Update\nvectorization_log.json"]
style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
style D fill:#131a27,stroke:#fbbf24,color:#e6edf3
style E fill:#131a27,stroke:#34d399,color:#e6edf3
style H fill:#131a27,stroke:#a78bfa,color:#e6edf3
What Gets Vectorized
- Corrections ≥ 30 characters
- Not matching 20+ skip/filter patterns
- Sanitized to imperative lesson form
- Proper category stored in metadata (not "general")
What Gets Filtered
- Short responses: "ok", "fix", "yes"
- Status updates: "Done - fixed in PR #123"
- Acknowledgements: "LGTM", "good catch", "thanks"
- Conversational noise, why-questions, narratives, defensive explanations, observations, scope discussions
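The incremental part of the auto-vectorization flow, which reads the last processed count and handles only newer events, can be sketched as below. The file names come from this document, but the `last_processed_count` key, the `correction` field, and the function name `take_new_events` are assumptions. The embedding and LanceDB steps are left out so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def take_new_events(data_dir: Path) -> list[dict]:
    """Return feedback events added since the last run and advance the counter."""
    state_path = data_dir / "vectorization_log.json"
    log_path = data_dir / "feedback_log.jsonl"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    processed = state.get("last_processed_count", 0)  # assumed key name
    lines = []
    if log_path.exists():
        lines = [line for line in log_path.read_text().splitlines() if line.strip()]
    events = [json.loads(line) for line in lines[processed:]]
    # A real hook would embed and store the events before recording progress.
    state_path.write_text(json.dumps({"last_processed_count": len(lines)}))
    return events


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    log = data_dir / "feedback_log.jsonl"
    log.write_text(
        '{"correction": "Use theme color tokens"}\n'
        '{"correction": "Mock modules before imports"}\n'
    )
    first_batch = take_new_events(data_dir)
    with log.open("a") as handle:
        handle.write('{"correction": "Write imperative commit messages"}\n')
    second_batch = take_new_events(data_dir)
    third_batch = take_new_events(data_dir)

print(len(first_batch), len(second_batch), len(third_batch))
```

Each run sees only what was appended since the previous one, so repeated invocations with no new feedback do no work.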
Quick Commands
# Health check
smartassist health
# Full cleanup and rebuild vectors
smartassist vectorize
# Staleness check + LanceDB compaction
smartassist maintenance
# Usage analytics
smartassist analyze
# Generate HTML dashboard
smartassist dashboard --output ~/Desktop/dashboard.html
# Initialize SmartAssist in a new project
cd ~/your-new-project
smartassist init
# Migrate data from old rag-setup
smartassist migrate ~/old-project/rag-setup
Every single tool call is logged with full context. When Claude calls rag_search, an enriched evidence entry is written to usage_log.jsonl with timestamp, query text, results count, latency, the decision funnel (candidates fetched, distance-filtered, category-filtered), the enhanced query, and the actual lessons returned with relevance scores.
How Evidence Is Captured
Evidence Logging Flow — Full Search Story
sequenceDiagram
actor User
participant Claude as Claude Code
participant MCP as SmartAssist MCP
participant DB as LanceDB
participant Log as usage_log.jsonl
User->>Claude: How should I style this?
Claude->>MCP: rag_search("style components")
Note over MCP: Start timer
MCP->>MCP: Enhance query + embed (1024-dim)
MCP->>DB: Hybrid search (vector + BM25, limit 20)
DB-->>MCP: 20 raw candidates
Note over MCP: Distance filter: 4 too distant
Note over MCP: Cross-encoder rerank remaining
Note over MCP: Return top 5 of 16 remaining
Note over MCP: Stop timer: 838ms
MCP->>Log: Enriched entry with funnel + lessons
MCP-->>Claude: Formatted lessons
Claude-->>User: Use semantic colors...
Evidence Log Format (Enriched)
{
  "timestamp": "2026-02-27T08:04:30.223791",
  "tool": "rag_search",
  "query": "how to run unit tests in this project",
  "results_count": 5,
  "latency_ms": 10499.5,
  "lessons": [
    {"category": "testing", "relevance_pct": 52, "lesson_text": "Use it.each for parameterized test cases..."},
    {"category": "architecture", "relevance_pct": 56, "lesson_text": "Use protocol-based dependency injection for testability..."},
    {"category": "testing", "relevance_pct": 53, "lesson_text": "Use XCTUnwrap instead of force-unwrapping optionals in tests..."}
  ],
  "search_meta": {
    "raw_count": 20,
    "distance_filtered": 0,
    "category_filtered": 0,
    "category_filter_used": null,
    "enhanced_query": "Correction for this project: how to run unit tests in this project"
  }
}
Dashboard and error-path entries omit lessons and search_meta (backward compatible with the original 5-field format).
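Writing an entry in the format above amounts to appending one JSON object per line. The sketch below mirrors the field names from the example entry; the function name `log_search` and its parameters are hypothetical, and the real server may build its entries differently.

```python
import json
import tempfile
import time
from datetime import datetime
from pathlib import Path


def log_search(log_path: Path, query: str, lessons: list[dict],
               search_meta: dict, started_at: float) -> dict:
    """Append one enriched rag_search entry to a JSONL file and return it."""
    entry = {
        "timestamp": datetime.now().isoformat(),
        "tool": "rag_search",
        "query": query,
        "results_count": len(lessons),
        "latency_ms": round((time.perf_counter() - started_at) * 1000, 1),
        "lessons": lessons,
        "search_meta": search_meta,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")  # one JSON object per line
    return entry


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "usage_log.jsonl"
    started = time.perf_counter()
    entry = log_search(
        log_path,
        query="style components",
        lessons=[{"category": "code_edit", "relevance_pct": 72,
                  "lesson_text": "Use semantic color tokens..."}],
        search_meta={"raw_count": 20, "distance_filtered": 4, "category_filtered": 0,
                     "category_filter_used": None,
                     "enhanced_query": "Correction for this project: style components"},
        started_at=started,
    )
    stored = json.loads(log_path.read_text(encoding="utf-8").splitlines()[0])

print(stored["results_count"], stored["search_meta"]["distance_filtered"])
```

Appending line by line keeps the log readable by any JSONL tool and makes concurrent inspection safe while the server is running.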
Current Evidence Snapshot
By Tool
| Tool | Calls (share) |
| rag_search | 14,742 calls (73.4%) |
| rag_dashboard | 5,328 calls (26.6%) |
Search Quality
| Outcome | Searches (share) | Meaning |
| Returned results | 7,957 searches (54%) | Relevant lessons found |
| Correctly filtered | 6,785 searches (46%) | Irrelevant queries properly return nothing |
Latency Distribution
| Statistic | Latency |
| Average | 838ms |
| Median (P50) | 804ms |
| P95 | 2,344ms |
| Min | 0ms (cached) |
| Max | 64,052ms (cold start + model load) |
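Statistics like these can be derived from the usage log with a few lines of standard-library code. The sketch below is an assumption about the method, not the analyze_usage.py implementation: the names `percentile` and `summarize` are hypothetical, percentiles use the nearest-rank definition, and latency is computed over rag_search calls only. The sample entries are invented for illustration.

```python
import math
import statistics


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(entries: list[dict]) -> dict:
    searches = [e for e in entries if e["tool"] == "rag_search"]
    latencies = [e["latency_ms"] for e in searches]
    hits = sum(1 for e in searches if e["results_count"] > 0)
    if not searches:
        return {"search_calls": 0, "hit_rate": 0.0}
    return {
        "search_calls": len(searches),
        "hit_rate": hits / len(searches),
        "avg_ms": statistics.mean(latencies),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "max_ms": max(latencies),
    }


# Invented sample entries, shaped like lines from usage_log.jsonl.
sample_entries = [
    {"tool": "rag_search", "results_count": 5, "latency_ms": 800},
    {"tool": "rag_search", "results_count": 0, "latency_ms": 820},
    {"tool": "rag_search", "results_count": 3, "latency_ms": 0},
    {"tool": "rag_search", "results_count": 0, "latency_ms": 2400},
    {"tool": "rag_dashboard", "results_count": 0, "latency_ms": 12},
]
summary = summarize(sample_entries)
print(summary)
```

A search that returns nothing still counts as a call, which matches how the "Correctly filtered" row is reported separately from the hit rate.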
Search Story: Full Decision Funnel
Each search logs the complete story — from raw candidates through hybrid search + cross-encoder reranking to returned lessons:
Query: "how to style components with theme colors"
20 fetched → 4 too distant → cross-encoder rerank → 3 returned
| Relevance | Category | Lesson |
| 72% | [code_edit] | Use semantic color tokens from the design system, never hardcode hex values |
| 65% | [code_edit] | Import color tokens from the theme module for consistent styling across components |
| 58% | [code_edit] | Avoid hardcoded color values in style definitions; use theme-provided constants |
Query: "how to write unit tests for React Native components"
20 fetched → 3 too distant → cross-encoder rerank → 5 returned
Lessons about using XCTUnwrap instead of force-unwrapping, protocol-based dependency injection for ViewModels, and mock placement best practices.
Query: "quantum physics dark matter theory"
20 fetched → 20 too distant → 0 returned
0 results: correctly filtered as irrelevant. The distance threshold (1.30) prevents noise.
How to Verify
Three ways to prove the system is actively working:
| Method | How |
| CLI Health Check | Run smartassist health from your project — see all 6 subsystem checks with usage evidence, funnel stats, and returned lessons |
| Usage Log | Read .claude/smartassist/data/usage_log.jsonl — 20,070+ timestamped entries with query, decision funnel, and returned lessons |
| Test Suite | Run python -m pytest tests/ -v from SmartAssist repo — 59 tests verify cleanup, path resolution, Thompson Sampling, and all filter functions |
The evidence is irrefutable. Over 20,000 tool calls logged with full search stories. Every search Claude makes creates a permanent, timestamped entry with the complete decision funnel: what was queried, the enhanced query, how many candidates were fetched via hybrid search, what was filtered by distance, cross-encoder reranking results, and the exact lessons returned with relevance scores. The health check validates all 6 subsystems. The test suite proves data quality, path resolution, and all 20+ filter functions.
vs Claude Built-In Memory
| Feature | SmartAssist | Claude memory |
| Learning | Active RLHF — explicit feedback | Passive observation |
| Tracking | Thompson Sampling — exact scores | No visibility |
| Priority | Focuses on weak areas (<70%) | All info equal |
| Search | 1024-dim hybrid + cross-encoder reranking | Has embeddings, no feedback loop |
| Time decay | 30-day half-life | Old info never fades |
| Portability | Works on any project via smartassist init | Tied to Claude account |
| Privacy | 100% local | Cloud-based |
| Cost | Zero | $$$ per token |
vs CLAUDE.md
| Aspect | SmartAssist | CLAUDE.md |
| Update speed | Instant ("thumbs down") | 10-30 min (edit, PR, merge) |
| Verification | Measurable scores | No way to know if Claude learned |
| Context usage | ~200 tokens (relevant only) | ~2000 tokens (entire file) |
| Maintenance | Self-maintaining | Manual editing |
| Team standards | Personal only | Shared across team |
| Onboarding | Starts from scratch | Immediate access |
Best approach: use both. CLAUDE.md provides team-wide standards (Layer 1). SmartAssist provides personal learning and verification (Layer 2). They're teammates, not competitors.
vs Generic RAG
| Feature | SmartAssist (RLHF+RAG) | Generic RAG |
| Learning | Retrieve → Generate → Get Feedback → Improve | Retrieve → Generate (no learning) |
| Quality | Every lesson scored by Thompson Sampling | All documents equal weight |
| Search | Hybrid (vector + BM25) + cross-encoder reranking | Pure vector search |
| Personalization | Adapts to YOUR workflow | Same for everyone |
| Portability | pip install + smartassist init on any project | Usually hardcoded to one project |
SmartAssist vs Claude Code Skills
Skills (like react-native-best-practices from Callstack) and SmartAssist both inject knowledge into Claude — but they work in fundamentally different ways, serve different purposes, and complement each other.
TL;DR: Skills teach Claude generic domain knowledge ("how to optimize React Native FPS"). SmartAssist teaches Claude project-specific lessons ("in our codebase, always use weak references for delegate patterns to prevent retain cycles"). Skills are the textbook. SmartAssist is the field notes.
What Are Skills?
Claude Code Skills are structured markdown instruction sets published by third-party developers and installed as plugins. They follow the agentskills.io specification.
Markdown Files
Skills are SKILL.md files with YAML frontmatter. They describe when to activate and provide step-by-step workflows Claude should follow.
Progressive Disclosure
Only skill names + descriptions load at startup (~50 tokens each). Full instructions load only when Claude decides the skill matches your task.
Off-the-Shelf
Written by experts (e.g., Callstack for React Native). Generic best practices applicable to any project using that technology.
How Skills Activate vs How SmartAssist Activates
Activation Flow Comparison
flowchart TD
subgraph SKILL["Skills — Description-Based Matching"]
direction LR
S1["You ask about\nReact Native FPS"] --> S2["Claude reads\nskill descriptions"]
S2 --> S3{"Description\nmatches?"}
S3 -->|"Yes"| S4["Load full\nSKILL.md"]
S4 --> S5["Follow\ninstructions"]
S3 -->|"No"| S6["Skip skill"]
end
subgraph RAG["SmartAssist — Hybrid Vector Search"]
direction LR
R1["You ask about\nstyling components"] --> R2["Claude calls\nrag_search MCP"]
R2 --> R3["Embed query to\n1024-dim vector"]
R3 --> R4["Hybrid search\nLanceDB"]
R4 --> R5["Cross-encoder\nrerank top results"]
R5 --> R6["Return lessons\nwith relevance %"]
end
style SKILL fill:#162032,stroke:#f472b6,color:#e6edf3
style RAG fill:#162032,stroke:#34d399,color:#e6edf3
The Core Comparison
| Dimension | SmartAssist | Claude Code Skills |
| What it is | Portable pip-installed package: MCP server + vector DB + feedback loop + 5 hooks + CLI | Markdown instruction files with YAML metadata |
| Knowledge type | Project-specific: lessons from real PR reviews, commits, and team feedback. Works on any codebase. | Generic domain: React Native best practices applicable to any RN app |
| How it activates | Claude explicitly calls the rag_search MCP tool when it decides to search | Claude automatically loads SKILL.md when the task description matches |
| Search method | Hybrid vector search (1024-dim BAAI/bge-m3 + BM25 keyword) + cross-encoder reranking (ms-marco-MiniLM-L-6-v2) | String matching on skill description text |
| Learning | Active: Thompson Sampling tracks reliability, feedback loop improves over time | Static: only updates when the plugin author publishes a new version |
| Feedback loop | Yes: thumbs up/down, corrections, commit analysis, reliability decay, rag_feedback MCP tool | None: no way to tell the skill "that advice was wrong" |
| Relevance scoring | Continuous 0-100% relevance with distance threshold filtering + cross-encoder precision | Binary: either the skill matches or it doesn't |
| Specificity | "Use weak references for delegate patterns to prevent retain cycles" | "Use FlashList instead of FlatList for better performance" |
| Update cycle | Instant: feedback → vectorize → searchable in seconds | Slow: plugin author commits, you pull updates |
| Observability | Full: 20,070+ usage log entries, decision funnels, relevance scores, health checks, dashboard | Minimal: no visibility into which skill was loaded or whether it helped |
| Context cost | ~100-300 tokens per search result (only relevant lessons) | ~30-50 tokens metadata; ~500-5000 when fully loaded |
| Infrastructure | pip install -e, LanceDB, BAAI/bge-m3 + cross-encoder models, MCP server process | Zero: just markdown files on disk |
| Portability | smartassist init on any project: one global install, per-project data | Plugins available to any project by default |
| Maintenance | Self-maintaining: hooks auto-capture, auto-vectorize, auto-decay | Zero maintenance: read-only files |
| Who writes it | Your team: curated from real PR review history | Third-party experts (e.g., Callstack team) |
| Privacy | 100% local: your lessons never leave your machine | 100% local: markdown files cached locally |
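The Learning and Feedback loop rows describe reliability scores that move with thumbs up/down and decay over time. The sketch below is one way such a score could be kept, as a Beta posterior per category. The class name `CategoryReliability`, the uniform Beta(1, 1) prior, and the `decay` factor are illustrative assumptions, not SmartAssist's actual implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class CategoryReliability:
    """Beta(alpha, beta) posterior over how often a category's lessons help.

    Hypothetical structure for illustration; the prior and decay rule are assumed.
    """

    alpha: float = 1.0  # prior pseudo-count of helpful outcomes
    beta: float = 1.0   # prior pseudo-count of unhelpful outcomes

    def record(self, helpful: bool) -> None:
        # A thumbs up adds one success, a thumbs down adds one failure.
        if helpful:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def decay(self, factor: float = 0.9) -> None:
        # Assumed decay rule: shrink accumulated evidence toward the prior
        # so that older feedback counts for less.
        self.alpha = 1.0 + (self.alpha - 1.0) * factor
        self.beta = 1.0 + (self.beta - 1.0) * factor

    @property
    def reliability(self) -> float:
        # Posterior mean, reported on the 0-100% scale.
        return 100.0 * self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        # Thompson Sampling draws from the posterior rather than using the mean,
        # which keeps some exploration of categories with little feedback.
        return rng.betavariate(self.alpha, self.beta)


testing = CategoryReliability()
for helpful in (True, True, True, False):
    testing.record(helpful)
print(round(testing.reliability, 1))  # 66.7 after three thumbs up and one thumbs down
```

With few observations the sampled value varies widely, so a category is not written off after a single thumbs down; as feedback accumulates, the samples concentrate around the posterior mean.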
Same Question, Different Answers
When you ask "How should I optimize this list component?", each system provides a different layer of insight:
SmartAssist Says
"Lessons from your codebase:"
- 85% [code_edit] Use Shopify FlashList with estimatedItemSize for all long lists
- 72% [architecture] Wrap list item components with React.memo and extract stable keyExtractor functions
- 58% [testing] When testing FlashList components, mock @shopify/flash-list before imports
Precision: Knows your exact imports, your theme system, your test patterns
Skill Says
"Generic React Native guidance:"
- Use FlashList over FlatList — 5x faster, better memory
- Set estimatedItemSize for optimal recycling
- Avoid inline functions in renderItem — extract to named component
- Profile with Flipper's Performance monitor
Breadth: Covers patterns, profiling tools, and alternatives across all RN apps
Architecture: How They Integrate
Skills + SmartAssist = Two Knowledge Layers in Claude Code
flowchart TD
USER["You ask a question"] --> CLAUDE["Claude Code"]
subgraph LAYER1["Layer 1 — Skills (Generic Knowledge)"]
SK1["react-native-best-practices"]
SK2["github PR patterns"]
SK3["upgrading-react-native"]
end
subgraph LAYER2["Layer 2 — SmartAssist (Project Knowledge)"]
RAG["MCP: rag_search"]
VDB[("LanceDB\n100 curated lessons")]
TS["Thompson Sampling\nreliability scores"]
end
subgraph LAYER3["Layer 3 — CLAUDE.md (Team Standards)"]
CMD["Architecture, testing\npractices, path aliases"]
end
CLAUDE -->|"Description match"| LAYER1
CLAUDE -->|"Explicit MCP call"| LAYER2
CLAUDE -->|"Always loaded"| LAYER3
RAG --> VDB
VDB --> TS
LAYER1 --> RESPONSE["Better response\nGeneric + Specific + Standards"]
LAYER2 --> RESPONSE
LAYER3 --> RESPONSE
style LAYER1 fill:#162032,stroke:#f472b6,color:#e6edf3
style LAYER2 fill:#162032,stroke:#34d399,color:#e6edf3
style LAYER3 fill:#162032,stroke:#38bdf8,color:#e6edf3
style RESPONSE fill:#131a27,stroke:#fbbf24,color:#e6edf3
style CLAUDE fill:#131a27,stroke:#a78bfa,color:#e6edf3
When Each One Wins
SmartAssist Wins When...
- Project-specific patterns — "How do we handle auth in this app?"
- Team conventions — "What color tokens should I use?"
- Past mistakes — "What went wrong last time we touched Redux slices?"
- Testing patterns — "How do we mock Firebase in our test setup?"
- Code review feedback — Lessons extracted from 1,991 real PR comments
Skills Win When...
- Generic optimization — "How do I improve React Native FPS?"
- New technology — "How to use the New Architecture?"
- Framework upgrades — "Upgrade React Native from 0.76 to 0.77"
- Profiling guidance — "How to find memory leaks in Hermes?"
- Community best practices — Industry-standard patterns from experts
Technical Architecture Differences
| Component | SmartAssist | Claude Code Skills |
| Storage | LanceDB vector database (Apache Arrow format); .claude/smartassist/data/curated_lessons.json | Markdown files in ~/.claude/plugins/cache/ |
| Embedding | BAAI/bge-m3 (1024-dim vectors, 8K context) | None: plain text description matching |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 | None |
| Transport | MCP stdio protocol (smartassist serve) | Direct filesystem read by Claude Code |
| Tools exposed | rag_search, rag_dashboard, rag_feedback | None: skills are instructions, not tools |
| Hooks | 5 lifecycle hooks (SessionStart, SessionEnd, PreToolUse, PostToolUse, UserPromptSubmit) | None: passive content |
| Testing | 59 automated tests, health checks, search quality validation | YAML frontmatter validation only |
| Logging | Every call → usage_log.jsonl with decision funnel + returned lessons (20,070+ entries) | No logging: invisible to user |
| Installation | pip install -e ~/Github/SmartAssist (globally available) | Plugin toggle in settings |
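The Embedding and Reranking rows describe a two-stage retrieval: vector and BM25 results are combined, then a cross-encoder reorders the candidates. The source does not say how the two result lists are merged, so the sketch below assumes reciprocal rank fusion, and it stands in a plain lookup table for the cross-encoder. The function names and lesson IDs are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists; an item ranked high in any list scores well.

    Assumed fusion method; SmartAssist may combine the lists differently.
    """
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, lesson_id in enumerate(ranking, start=1):
            fused[lesson_id] = fused.get(lesson_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the order is deterministic.
    return sorted(fused, key=lambda lesson_id: (-fused[lesson_id], lesson_id))


def hybrid_search(
    vector_hits: Sequence[str],
    keyword_hits: Sequence[str],
    rerank_score: Callable[[str], float],
    top_n: int = 20,
    limit: int = 3,
) -> List[str]:
    # Stage 1: fuse both retrievers and keep the top-N candidates.
    candidates = reciprocal_rank_fusion([vector_hits, keyword_hits])[:top_n]
    # Stage 2: reorder the candidates with the (more expensive) reranker.
    return sorted(candidates, key=rerank_score, reverse=True)[:limit]


# Hypothetical lesson IDs and reranker scores, for illustration only.
vector_hits = ["flashlist", "memo", "firebase-mock"]
keyword_hits = ["flashlist", "redux", "memo"]
toy_scores = {"flashlist": 0.91, "memo": 0.74, "firebase-mock": 0.40, "redux": 0.12}

results = hybrid_search(vector_hits, keyword_hits, toy_scores.__getitem__)
print(results)  # ['flashlist', 'memo', 'firebase-mock']
```

Splitting the work this way keeps the cross-encoder off the full lesson set: it only scores the fused shortlist, which is why the reranking step can afford a slower, more precise model.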
Loading Strategy: Progressive Disclosure vs Semantic Search
| Stage | Skills | SmartAssist |
| Session start | Load all skill names + descriptions (~50 tokens each). Always in context. | SessionStart hook injects lessons for weak categories (<70% reliability). Runs in 63 ms. |
| During work | If the task description matches a skill → load full SKILL.md (~500-5000 tokens) | Claude calls rag_search → hybrid search + cross-encoder rerank → returns only relevant lessons (~100-300 tokens) |
| Deep dive | Load reference files on demand (js-measure-fps.md, native-profiling.md, etc.) | Cross-encoder reranking of top-20 candidates → precision filtering → relevance % |
| After use | Nothing: no feedback loop | PostToolUse hook shows lessons to user. Commit hook captures new learnings. rag_feedback records quality signals. Thompson Sampling updates. |
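The Session start row says lessons are injected only for categories whose reliability is below 70%. A minimal sketch of that selection follows, assuming reliability is held as a percentage per category and lessons are grouped by category. The function names, the `per_category` cap, and the sample data are hypothetical.

```python
from typing import Dict, List

WEAK_THRESHOLD = 70.0  # percent, from the "<70% reliability" rule in the table


def weak_categories(reliability: Dict[str, float],
                    threshold: float = WEAK_THRESHOLD) -> List[str]:
    """Return categories scoring below the threshold, weakest first."""
    weak = [name for name, score in reliability.items() if score < threshold]
    return sorted(weak, key=lambda name: reliability[name])


def lessons_to_inject(reliability: Dict[str, float],
                      lessons_by_category: Dict[str, List[str]],
                      per_category: int = 2) -> List[str]:
    """Pick a few lessons for each weak category (the cap is an assumption)."""
    picked: List[str] = []
    for category in weak_categories(reliability):
        picked.extend(lessons_by_category.get(category, [])[:per_category])
    return picked


# Hypothetical reliability scores and lessons, for illustration only.
reliability = {"code_edit": 88.0, "architecture": 74.0, "testing": 58.0, "git": 64.0}
lessons = {
    "testing": ["Mock @shopify/flash-list before imports"],
    "git": ["Keep commits scoped to one change"],
    "code_edit": ["Use FlashList with estimatedItemSize"],
}

print(weak_categories(reliability))  # ['testing', 'git']
print(lessons_to_inject(reliability, lessons))
```

Restricting the injection to weak categories keeps the session-start context small: categories that already score well contribute nothing until Claude searches for them explicitly.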
The key insight: Skills are a knowledge delivery format (markdown files that teach Claude workflows). SmartAssist is a knowledge engine (portable MCP server + hybrid vector search + cross-encoder reranking + feedback loop + reliability scoring + CLI). Skills deliver static expertise. SmartAssist delivers living, evolving project intelligence. They're complementary — use both.
Our Three-Layer Knowledge Stack
| Layer | System | What it provides | Example |
| 1. CLAUDE.md | Static file | Team-wide standards, path aliases, testing thresholds | "Coverage thresholds: branches 79%, lines 89%" |
| 2. Skills | Markdown plugins | Generic domain expertise from industry experts | "Use Hermes profiling to find JS thread bottlenecks" |
| 3. SmartAssist | MCP + LanceDB + RLHF | Project-specific lessons learned from real code reviews | "Mock @react-native-firebase/analytics before imports in tests" |
Each layer adds specificity. CLAUDE.md says what standards to follow. Skills say how to do things generically. SmartAssist says what we learned doing it in this exact codebase.