SmartAssist

Portable RLHF + RAG + MCP — An AI That Gets Smarter Every Project

Joey Rahme — Version 4.0 • February 2026

Section 01
How It Works
RLHF • RAG • MCP Server • Thompson Sampling • LanceDB • BAAI/bge-m3 • Cross-Encoder Reranking
New in v4.0: SmartAssist is now a portable, pip-installable Python package. Install once globally, run on any codebase. Per-project data lives in .claude/smartassist/. No more hardcoded paths or virtual environment activation.

Three Technologies Combined

RLHF
Learns from your feedback — thumbs up/down, corrections, angry signals. Updates reliability scores per category using Bayesian statistics.
RAG
Hybrid semantic search over 100 curated lessons. Converts text to 1024-dim vectors with BAAI/bge-m3, combines vector + BM25 keyword search, then cross-encoder reranks results.
MCP Server
On-demand knowledge retrieval via 3 tools: rag_search, rag_dashboard, and rag_feedback. Claude calls them mid-conversation when relevant. Simple prompts never trigger a tool call, so they incur no retrieval latency.

The Learning Loop

End-to-End Learning Cycle
flowchart LR
    A["You give feedback\nor commit code"] --> B["System stores &\nupdates scores"]
    B --> C["Knowledge Base\n100 curated lessons"]
    C --> D["Claude searches\nwhen relevant"]
    D --> E["Better responses"]
    E -.->|"Cycle repeats"| A
    style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
    style B fill:#131a27,stroke:#fb923c,color:#e6edf3
    style C fill:#131a27,stroke:#f87171,color:#e6edf3
    style D fill:#131a27,stroke:#34d399,color:#e6edf3
    style E fill:#131a27,stroke:#a78bfa,color:#e6edf3

Step-by-Step Flow

Step | What happens
1. Capture | You give feedback (thumbs up/down) or the system auto-detects anti-patterns in commits and PR reviews
2. Score | Thompson Sampling updates reliability scores (0-100%) for the relevant category
3. Clean & Deduplicate | Cleanup pipeline filters junk (short text, "LGTM", "done"), normalizes text, and deduplicates by MD5 hash
4. Vectorize | Clean corrections are embedded into 1024-dim vectors with BAAI/bge-m3, stored in LanceDB with category metadata
5. Session Start | Weak categories (<70%) get lessons injected automatically into Claude's context
6. MCP On-Demand | During conversations, Claude calls rag_search with hybrid search (vector + BM25) and cross-encoder reranking
7. Log Evidence | Every tool call is logged to usage_log.jsonl with timestamp, query, decision funnel, returned lessons, and latency
8. Health Check | Run smartassist health anytime to verify DB, data quality, scores, sync status, and usage evidence

Portable Design

Code (installed once globally)

  • pip install -e ~/Github/SmartAssist
  • Makes smartassist CLI + python3 -m smartassist.* available
  • 29 source files, 59 automated tests
  • Pushed to github.com/jnrahme/SmartAssist

Data (per-project)

  • smartassist init creates .claude/smartassist/
  • Feedback logs, reliability scores, LanceDB vectors
  • Automatically detected by walking up from cwd
  • Gitignored — never committed to project repos

How to Use It

Just talk naturally. Say "thumbs up for git", "thumbs down testing", "correction: use semantic colors", or "angry feedback - wrong file modified". The system captures everything automatically.

Quick Start on a New Project

cd ~/your-project
smartassist init
# Creates .claude/smartassist/{data,lancedb}/
# That's it! MCP server and hooks auto-detect the data directory.
Section 02
Architecture Diagrams

Code vs Data Separation

SmartAssist separates code (installed once) from data (per-project). The config.py module resolves data paths automatically.

Portable Architecture
flowchart TD
    subgraph CODE["Code - ~/Github/SmartAssist/ (pip install -e)"]
        direction LR
        CFG["config.py\nPath resolution"]
        MCP["mcp_server.py\n3 MCP tools"]
        HOOKS["hooks/\n7 lifecycle hooks"]
        TOOLS["tools/\n5 utility modules"]
        CLI["cli.py\n9 subcommands"]
    end
    subgraph DATA1["Project A - .claude/smartassist/"]
        direction LR
        D1A["data/\nfeedback, scores"]
        D1B["lancedb/\nvector DB"]
    end
    subgraph DATA2["Project B - .claude/smartassist/"]
        direction LR
        D2A["data/\nfeedback, scores"]
        D2B["lancedb/\nvector DB"]
    end
    CFG -->|"auto-detect\nfrom cwd"| DATA1
    CFG -->|"auto-detect\nfrom cwd"| DATA2
    style CODE fill:#162032,stroke:#38bdf8,color:#e6edf3
    style DATA1 fill:#162032,stroke:#34d399,color:#e6edf3
    style DATA2 fill:#162032,stroke:#fb923c,color:#e6edf3

How Lessons Get In

Three automatic sources feed the knowledge base. You rarely need to do anything manually.

Feedback Sources → Storage
flowchart TD
    subgraph SRC["Feedback Sources"]
        A["Manual Feedback\nYou say: thumbs down"]
        B["Commit Hook\nScans git diffs"]
        C["PR Harvester\nGitHub review comments"]
    end
    subgraph STORE["Storage Layer - .claude/smartassist/"]
        D[("data/feedback_log.jsonl\n1,991 events")]
        E[("data/reliability_scores.json")]
        F[("lancedb/\n100 curated lessons")]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    D --> F
    style SRC fill:#162032,stroke:#22d3ee,color:#e6edf3
    style STORE fill:#162032,stroke:#f87171,color:#e6edf3

How Lessons Come Out

Two separate channels deliver knowledge to Claude, each triggered differently.

Dual Delivery Channels
flowchart TD
    subgraph CH1["Channel 1: Automatic - Session Start"]
        direction LR
        A1["New session"] --> A2["Load scores"]
        A2 --> A3{"Below 70%?"}
        A3 -->|"Yes"| A4["Inject warnings"]
        A3 -->|"No"| A5["Skip"]
    end
    subgraph CH2["Channel 2: On-Demand - MCP Server"]
        direction LR
        B1["You ask a question"] --> B2{"Relates to\nknowledge?"}
        B2 -->|"Yes"| B3["rag_search"]
        B3 --> B4["Return lessons"]
        B2 -->|"No"| B5["Answer normally"]
    end
    A4 --> CLAUDE["Claude Code"]
    B4 --> CLAUDE
    style CH1 fill:#162032,stroke:#fb923c,color:#e6edf3
    style CH2 fill:#162032,stroke:#34d399,color:#e6edf3
    style CLAUDE fill:#131a27,stroke:#38bdf8,color:#e6edf3

MCP Search Pipeline

This is what happens inside the MCP server when Claude calls rag_search. The pipeline applies query enhancement, hybrid search (vector + BM25), distance filtering, and cross-encoder reranking, in that order.

Hybrid Search Pipeline with Cross-Encoder Reranking
flowchart LR
    A["Your question"] --> QE["Query Enhancement\nAdd correction prefix"]
    QE --> B["Embed into\n1024-dim vector"]
    B --> C["Hybrid Search\nVector + BM25"]
    C --> D{"Distance ≤ 1.30?"}
    D -->|"Relevant"| E["Cross-encoder\nrerank top results"]
    D -->|"Irrelevant"| F["Filtered out"]
    E --> G["Return to Claude\nwith relevance %"]
    style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
    style QE fill:#131a27,stroke:#22d3ee,color:#e6edf3
    style B fill:#131a27,stroke:#a78bfa,color:#e6edf3
    style C fill:#131a27,stroke:#a78bfa,color:#e6edf3
    style D fill:#131a27,stroke:#fb923c,color:#e6edf3
    style E fill:#131a27,stroke:#f472b6,color:#e6edf3
    style F fill:#131a27,stroke:#f87171,color:#e6edf3
    style G fill:#131a27,stroke:#34d399,color:#e6edf3

Query Enhancement

Documents are stored as structured correction text (e.g., "[code_edit] Use semantic colors..."). Raw user questions live in a different semantic space. Prefixing queries with "Correction for this project: " bridges this gap.

Query | Raw distance | Enhanced distance | Improvement
style components | 1.140 | 0.810 | -29%
unit tests | 1.240 | 0.932 | -25%
git commit | 0.752 | 0.695 | -8%
best practices | 1.498 (filtered!) | 1.075 | Now works!
quantum physics | 1.609 | 1.349 | Still filtered
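
The enhancement step and the threshold check can be sketched in a few lines. This is a minimal illustration rather than the package's actual code: the prefix string and the 1.30 threshold come from this document, while the function names enhance_query and passes_distance_filter are hypothetical.

```python
# Minimal sketch. Function names are hypothetical; the constants are quoted in the text above.
QUERY_PREFIX = "Correction for this project: "
MAX_DISTANCE = 1.30


def enhance_query(raw_query: str) -> str:
    """Prefix the query so it sits closer to the stored correction text."""
    return QUERY_PREFIX + raw_query.strip()


def passes_distance_filter(distance: float, max_distance: float = MAX_DISTANCE) -> bool:
    """Keep a candidate only if its vector distance is within the threshold."""
    return distance <= max_distance


# (raw distance, enhanced distance) pairs copied from the table above.
MEASURED = {
    "best practices": (1.498, 1.075),
    "quantum physics": (1.609, 1.349),
}

for query, (raw, enhanced) in MEASURED.items():
    print(enhance_query(query), passes_distance_filter(raw), passes_distance_filter(enhanced))
```

With the table's numbers, "best practices" fails the filter on its raw distance and passes once enhanced, while "quantum physics" stays above the threshold either way.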

Reliability Scoring

How Thompson Sampling decides whether a category is strong or weak.

Thompson Sampling Flow
flowchart LR
    A["Feedback event"] --> B["Identify category"]
    B --> C["Update alpha/beta\nBayesian formula"]
    C --> D{"Score ≥ 70%?"}
    D -->|"Yes"| E["RELIABLE\nStop auto-injecting"]
    D -->|"No"| F["WEAK\nKeep injecting lessons"]
    style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
    style B fill:#131a27,stroke:#fb923c,color:#e6edf3
    style C fill:#131a27,stroke:#a78bfa,color:#e6edf3
    style D fill:#131a27,stroke:#fbbf24,color:#e6edf3
    style E fill:#131a27,stroke:#34d399,color:#e6edf3
    style F fill:#131a27,stroke:#f87171,color:#e6edf3

File Structure

~/Github/SmartAssist/                    # Code (installed once via pip)
├── pyproject.toml                       # Package config, CLI entry point
├── smartassist/
│   ├── __init__.py
│   ├── config.py                        # Path resolution + embedding config (keystone)
│   ├── cli.py                           # `smartassist` CLI (9 subcommands)
│   ├── mcp_server.py                    # MCP server (3 tools: search, dashboard, feedback)
│   ├── thompson_sampling.py             # Beta-Bernoulli model with 30-day decay
│   ├── feedback_system.py               # FeedbackCapture + JSONL storage
│   ├── context_injection.py             # Lesson formatting + injection
│   ├── lesson_feedback.py               # Per-lesson boost/demote/block scoring
│   ├── hooks/
│   │   ├── session_start.py             # SessionStart: inject weak-category lessons
│   │   ├── session_end.py               # SessionEnd: save analytics
│   │   ├── vectorize_learnings.py       # Auto-vectorize new lessons
│   │   ├── prompt_inject.py             # UserPromptSubmit: context injection
│   │   ├── commit_hook.py               # PreToolUse(Bash): scan diffs for anti-patterns
│   │   ├── show_lessons.py              # PostToolUse: display search results
│   │   └── seed_from_claudemd.py        # Seed lessons from CLAUDE.md
│   └── tools/
│       ├── cleanup_and_vectorize.py     # Data cleanup, dedup, DB rebuild
│       ├── maintenance.py               # Staleness check, LanceDB compaction
│       ├── health_check.py              # 6-check system health dashboard
│       ├── analyze_usage.py             # Usage analytics (hit rate, latency, trends)
│       └── generate_dashboard.py        # HTML dashboard generator
└── tests/
    ├── conftest.py                      # Shared fixtures (tmp data dirs)
    ├── test_config.py                   # 5 tests: path resolution
    ├── test_cleanup.py                  # 46 tests: cleanup filtering logic
    └── test_thompson_sampling.py        # 7 tests: Thompson Sampling model

<any-project>/.claude/smartassist/       # Data (per-project, auto-detected)
├── data/
│   ├── feedback_log.jsonl               # Raw feedback events
│   ├── reliability_scores.json          # Thompson Sampling scores per category
│   ├── curated_lessons.json             # 100 curated lessons
│   ├── usage_log.jsonl                  # 20,070+ evidence entries
│   ├── vectorization_log.json           # Sync state tracker
│   ├── session_log.jsonl                # Session analytics
│   └── lessons_learned/                 # 1,982 markdown files
└── lancedb/                             # 100 vector documents (1024-dim)
Section 03
MCP Server
FastMCP • stdio transport • BAAI/bge-m3 • LanceDB • Cross-Encoder Reranking • Hybrid Search
The core innovation: Claude reads the tool description and decides when to search, which it does only when your question relates to stored knowledge. Simple prompts like "yes" or "ok" skip it entirely. The server is now registered globally through the smartassist serve command, so no project-specific MCP config is needed.

MCP Configuration

# ~/.claude/mcp.json - works for ALL projects automatically
{
  "mcpServers": {
    "smartassist": {
      "command": "smartassist",
      "args": ["serve"]
    }
  }
}

Tools Exposed

rag_search(query, top_k, category)

Hybrid semantic search (vector + BM25) across 100 curated lessons. Embeds query into 1024-dim vector with BAAI/bge-m3, searches LanceDB, filters by distance threshold (1.30), cross-encoder reranks results, returns formatted lessons with relevance %. Every call logged with full decision funnel.

rag_dashboard()

Returns Thompson Sampling reliability scores per category, identifies weak areas (<70%), and shows feedback event statistics. Also logged for usage evidence.

rag_feedback(helpful, category, notes)

Records whether the last suggestion was helpful. Updates Thompson Sampling scores directly from the MCP tool. Allows Claude to capture feedback in real-time during conversations.

Search Example

Real search flow with hybrid search + cross-encoder reranking
sequenceDiagram
    actor You
    participant Claude as Claude Code
    participant MCP as SmartAssist MCP
    participant DB as LanceDB
    You->>Claude: How should I style this component?
    Note over Claude: This relates to<br/>project conventions...
    Claude->>MCP: rag_search("style components")
    MCP->>MCP: Enhance query + embed (1024-dim)
    MCP->>DB: Hybrid search (vector + BM25)
    DB-->>MCP: 20 raw candidates
    Note over MCP: Distance filter: ≤ 1.30
    MCP->>MCP: Cross-encoder rerank top results
    MCP-->>Claude: Lesson: Use semantic colors<br/>relevance: 87%
    Note over MCP: Log to usage_log.jsonl
    Claude-->>You: Uses the lesson in response

Why MCP Over Alternatives

Approach | Problem | MCP advantage
UserPromptSubmit | Fires on EVERY prompt: 2-3s latency, ~500 tokens noise each time | MCP only fires when Claude decides it would help
SessionStart only | Generic lessons once at start. Can't search mid-session | MCP enables on-demand search any time
Static CLAUDE.md | ~2000 tokens loaded every session. No semantic matching | MCP retrieves only what's relevant

Key Design Decisions

Lazy-Loaded Singletons
Embedding model (BAAI/bge-m3) + cross-encoder load only on first tool call. Subsequent calls reuse cached instances. Zero startup overhead.
Distance Threshold
MAX_DISTANCE = 1.30 filters irrelevant results. "yes" or "ok" returns nothing. Only relevant lessons surface.
Cross-Encoder Reranking
ms-marco-MiniLM-L-6-v2 reranks the top 20 candidates for precision. Catches semantic matches that pure vector search might misrank.
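
The lazy-loading decision can be illustrated with a cached factory function. This is a sketch of the general pattern under stated assumptions, not the server's implementation: StubModel and get_model are hypothetical names, and the stub stands in for the real embedding model and cross-encoder so the example runs without downloading anything.

```python
from functools import lru_cache

LOAD_EVENTS = []  # records each (expensive) load so the caching is observable


class StubModel:
    """Hypothetical stand-in for an embedding model or cross-encoder."""

    def __init__(self, name: str) -> None:
        self.name = name


@lru_cache(maxsize=None)
def get_model(name: str) -> StubModel:
    """Load a model on first request and reuse the cached instance afterwards."""
    LOAD_EVENTS.append(name)  # a real loader would read model weights here
    return StubModel(name)


first = get_model("BAAI/bge-m3")
second = get_model("BAAI/bge-m3")
print(first is second, LOAD_EVENTS)
```

Nothing is loaded at import time, so server startup stays cheap; the first tool call pays the load cost once and later calls reuse the same object.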

Test Coverage

59 tests across 9 classes, all passing in 0.09s.

Test Suite (tests/)

File | Tests | Coverage
test_cleanup.py | 46 | Normalization, skip patterns, dedup keys, clean correction text, format text, all filter functions, non-imperative filter, sanitize to lesson
test_thompson_sampling.py | 7 | Initial reliability, record success/failure, weak categories, all reliabilities, persistence
test_config.py | 5 | Path resolution via env var, storage path, db path, directory creation
conftest.py | - | Shared set_data_dir fixture with SMARTASSIST_DATA_DIR monkeypatch + tmp directory

Hooks Configuration

# ~/.claude/settings.json - all hooks use python3 -m pattern
{
  "hooks": {
    "UserPromptSubmit": [{"command": "python3 -m smartassist.hooks.prompt_inject"}],
    "SessionStart": [{"command": "python3 -m smartassist.hooks.session_start"}],
    "PreToolUse(Bash)": [{"command": "python3 -m smartassist.hooks.commit_hook"}],
    "PostToolUse": [{"command": "python3 -m smartassist.hooks.show_lessons"}],
    "SessionEnd": [{"command": "python3 -m smartassist.hooks.session_end"}]
  }
}
Section 04
Live Metrics
Raw Feedback Events: 1,991
Curated Vector Docs: 100
Categories: 6
Tests Passing: 59
Tool Calls Logged: 20,070
Source Files: 29

Reliability by Category

Scores were reset to baseline (50%) during the v4.0 migration to SmartAssist. They will rebuild naturally as you use the system and provide feedback.

Architecture: 50.0%
PR Review: 50.0%
Testing: 50.0%
Code Editing: 50.0%
Git Operations: 50.0%
Security: 50.0%

Feedback Breakdown

By Signal

Corrections: 1,954 (98.1%)
Thumbs Up: 35 (1.8%)
Happy: 1
Sad: 1

By Category

PR Review: 594 (29.8%)
Code Editing: 531 (26.7%)
Testing: 404 (20.3%)
Architecture: 374 (18.8%)
Git: 74 (3.7%)
Security: 14 (0.7%)
Key observation: 98% of feedback comes from the PR Comment Harvester (automated). The primary source is GitHub review comments auto-converted into lessons.

Threshold System

Once a category hits 70%, automatic session-start injection stops. Lessons stay searchable via MCP.

Reliability | Status | Auto-inject?
< 30% | CRITICAL | Yes (priority)
30-50% | NEEDS WORK | Yes
50-70% | IMPROVING | Yes
≥ 70% | RELIABLE | No (mastered)
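
The threshold table maps directly onto a small classification function. The sketch below is illustrative only: classify is a hypothetical name, and because the table does not say which band a score of exactly 30% or 50% belongs to, the code assumes each band includes its lower bound.

```python
from typing import Tuple


def classify(reliability: float) -> Tuple[str, bool]:
    """Map a reliability score in [0, 1] to (status, auto_inject).

    Assumption: bands are lower-bound inclusive, so 0.50 counts as IMPROVING.
    """
    if reliability < 0.30:
        return ("CRITICAL", True)
    if reliability < 0.50:
        return ("NEEDS WORK", True)
    if reliability < 0.70:
        return ("IMPROVING", True)
    return ("RELIABLE", False)


for score in (0.25, 0.50, 0.70):
    print(score, classify(score))
```

Under this reading, a category freshly reset to the 50% baseline is IMPROVING and still receives session-start injection.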
Section 05
Technical Deep Dive

Session Lifecycle

Full session from start to finish
flowchart TD
    A["Session starts"] --> B["SessionStart hook\n(63ms)"]
    B --> C["Load reliability scores"]
    C --> D{"Any category\nbelow 70%?"}
    D -->|"Yes"| E["Inject lessons for\nweak categories"]
    D -->|"No"| F["No injection"]
    E --> G["Working with Claude"]
    F --> G
    G --> H{"Question relates to\nstored knowledge?"}
    H -->|"Yes"| I["Claude calls rag_search"]
    I --> J["Hybrid search + rerank\nReturn relevant lessons"]
    J --> K["PostToolUse hook\nshows lessons to user"]
    K --> G
    H -->|"No"| G
    G --> L{"Git commit?"}
    L -->|"Yes"| M["PreToolUse hook\nscans diff for anti-patterns"]
    M --> G
    L -->|"No"| G
    G --> N["Session ends"]
    N --> O["SessionEnd hook\nsave analytics"]
    style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
    style G fill:#131a27,stroke:#38bdf8,color:#e6edf3
    style I fill:#131a27,stroke:#34d399,color:#e6edf3
    style M fill:#131a27,stroke:#fb923c,color:#e6edf3
    style N fill:#131a27,stroke:#a78bfa,color:#e6edf3

1. Feedback Capture

Signal | Detection | Weight
Thumbs Up | thumbs up, good job, correct | +5
Thumbs Down | thumbs down, wrong, incorrect | -4
Correction | correction:, should be, use instead | -4 + text
Angry | angry, terrible, broke | -5
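
A keyword matcher along these lines could implement the table. This is a hedged sketch, not the FeedbackCapture code: detect_signal and SIGNALS are hypothetical names, and the precedence order (angry, then correction, then thumbs down, then thumbs up) is an assumption made so that "incorrect" and "correction:" are never mistaken for the positive keyword "correct".

```python
import re
from typing import Optional, Tuple

# (signal, weight, trigger phrases) from the table above.
# Assumption: earlier rows win when several phrases match.
SIGNALS = [
    ("angry", -5, ["angry", "terrible", "broke"]),
    ("correction", -4, ["correction:", "should be", "use instead"]),
    ("thumbs_down", -4, ["thumbs down", "wrong", "incorrect"]),
    ("thumbs_up", +5, ["thumbs up", "good job", "correct"]),
]


def detect_signal(message: str) -> Optional[Tuple[str, int]]:
    """Return (signal, weight) for the first matching phrase, or None."""
    lowered = message.lower()
    for name, weight, phrases in SIGNALS:
        for phrase in phrases:
            # Require non-word characters on both sides so "correct"
            # does not match inside "incorrect" or "correction".
            pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
            if re.search(pattern, lowered):
                return name, weight
    return None


for text in ("thumbs up for git", "correction: use semantic colors", "ok"):
    print(text, "->", detect_signal(text))
```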

2. Thompson Sampling

Core Formula

Reliability = α / (α + β)

α = successes + 1  •  β = failures + 1  •  Prior: α=1, β=1 (50%)
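
The formula is the mean of a Beta(α, β) distribution. The sketch below computes that mean and, for comparison, draws a random sample from the same distribution, which is the step that gives Thompson Sampling its name. Whether SmartAssist draws samples or only reports the mean is not stated here, so treat thompson_sample as an illustration; both function names are hypothetical.

```python
import random


def reliability(successes: float, failures: float) -> float:
    """Posterior mean of Beta(successes + 1, failures + 1)."""
    alpha = successes + 1
    beta = failures + 1
    return alpha / (alpha + beta)


def thompson_sample(successes: float, failures: float, rng: random.Random) -> float:
    """Draw one plausible reliability value from the same Beta posterior."""
    return rng.betavariate(successes + 1, failures + 1)


print(reliability(0, 0))  # no feedback yet: the 50% prior
print(reliability(3, 1))  # three successes, one failure
print(thompson_sample(3, 1, random.Random(0)))
```

With no feedback the score is exactly the 50% baseline; three successes and one failure give 4 / 6, or roughly 66.7%, which is still below the 70% threshold.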

Exponential Decay (30-day half-life)

Recent feedback matters more. A correction from yesterday carries more weight than one from 3 months ago.

Time ago | Weight
Today | 100%
15 days | 70.7%
30 days | 50.0%
60 days | 25.0%
90 days | 12.5%
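
The weights in the table follow from weight = 0.5^(age / 30). The sketch below reproduces them and shows one plausible way to fold the weight into the α/β counts. The document does not spell out how decay is combined with the counts, so decayed_reliability is an assumption, and both function names are hypothetical.

```python
from typing import Iterable, Tuple

HALF_LIFE_DAYS = 30.0  # half-life quoted in the heading above


def decay_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def decayed_reliability(events: Iterable[Tuple[float, bool]]) -> float:
    """events holds (age_days, was_success) pairs.

    Assumption: each event adds its decayed weight, instead of 1, to alpha or beta.
    """
    alpha, beta = 1.0, 1.0  # uniform prior, as in the core formula
    for age_days, was_success in events:
        weight = decay_weight(age_days)
        if was_success:
            alpha += weight
        else:
            beta += weight
    return alpha / (alpha + beta)


for days in (0, 15, 30, 60, 90):
    print(days, round(decay_weight(days) * 100, 1))
```

Under this assumption, a success from today combined with a failure from 90 days ago gives α = 2 and β = 1.125, so the old failure barely dents the score.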

3. Vector Database & Search

Property | Value
Embedding Model | BAAI/bge-m3 (1024 dimensions, 8K context window)
Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 (precision reranking)
Database | LanceDB (Apache Arrow format)
Search Mode | Hybrid: Vector cosine + BM25 keyword (LinearCombinationReranker, weight=0.7)
Distance Threshold | MAX_DISTANCE = 1.30 (with query enhancement prefix)
Rerank Pool | Top 20 candidates reranked, then top_k returned
Avg Search Latency | 838ms (includes embedding + hybrid search + reranking)
Storage | ~3KB per event

4. Path Resolution (config.py)

config.py is the architectural keystone: every module imports from smartassist.config.

Resolution Order | Description
1. SMARTASSIST_DATA_DIR | Environment variable (highest priority; used by tests and explicit config)
2. Walk up from cwd | Find .claude/smartassist/ in current or parent directories. Claude Code sets cwd to project root automatically.
3. RuntimeError | Helpful message: "Run smartassist init in your project root"
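
The three-step resolution order could look roughly like this. It is a sketch of the described behavior, not a copy of config.py: resolve_data_dir and its start parameter are hypothetical, and the error text paraphrases the message quoted in the table.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_data_dir(start: Optional[Path] = None) -> Path:
    """Find the per-project data directory using the order in the table above."""
    # 1. An explicit environment variable always wins.
    override = os.environ.get("SMARTASSIST_DATA_DIR")
    if override:
        return Path(override)

    # 2. Walk up from the starting directory (cwd by default).
    current = Path(start).resolve() if start is not None else Path.cwd().resolve()
    for directory in (current, *current.parents):
        candidate = directory / ".claude" / "smartassist"
        if candidate.is_dir():
            return candidate

    # 3. Nothing found: fail with an actionable message.
    raise RuntimeError(
        "No .claude/smartassist/ directory found. "
        "Run `smartassist init` in your project root."
    )
```

Calling resolve_data_dir() from any subdirectory of an initialized project returns that project's .claude/smartassist/ path, which is why hooks and the MCP server need no per-project configuration.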

5. Anti-Patterns Auto-Detected

Pattern | Correct approach
console.log statements | Remove before committing
Hardcoded colors (#404040) | Use theme color tokens from your design system
toMatchSnapshot | Use toBeVisible() behavior tests
Direct analytics() calls | Use centralized utility
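
A diff scanner for these four patterns might be built from regular expressions, as sketched below. The regexes are assumptions chosen for illustration (the commit hook's real rules are not shown in this document), and scan_diff is a hypothetical name. Only added lines are checked, so removing an anti-pattern never triggers a warning.

```python
import re
from typing import List, Tuple

# (pattern, advice) pairs. The regexes are illustrative guesses at the table's rules.
ANTI_PATTERNS = [
    (re.compile(r"\bconsole\.log\("), "Remove console.log statements before committing"),
    (re.compile(r"#[0-9a-fA-F]{6}\b"), "Use theme color tokens from your design system"),
    (re.compile(r"\btoMatchSnapshot\("), "Use toBeVisible() behavior tests"),
    (re.compile(r"\banalytics\("), "Use centralized utility"),
]


def scan_diff(diff_text: str) -> List[Tuple[str, str]]:
    """Return (added_line, advice) for each anti-pattern found in added lines."""
    findings = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for pattern, advice in ANTI_PATTERNS:
            if pattern.search(line):
                findings.append((line[1:].strip(), advice))
    return findings


SAMPLE_DIFF = """\
+++ b/src/Button.tsx
+console.log("debug")
+const color = '#404040';
-expect(tree).toMatchSnapshot();
 const unchanged = true;
"""

for added_line, advice in scan_diff(SAMPLE_DIFF):
    print(added_line, "->", advice)
```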

6. Data Cleanup Pipeline

smartassist/tools/cleanup_and_vectorize.py processes raw feedback into high-quality vector documents.

Step | What it does
Filter short text | Remove corrections < 30 characters (e.g., "ok", "fixed")
Skip patterns | Reject "done", "LGTM", "addressed", "nit:", "good catch", conversational noise, why-questions, narratives, defensive explanations, etc.
Sanitize to lesson | Strip hedged suggestions ("I think we should..."), "please", "yeah but", GitHub URLs. Capitalize and convert to imperative form.
Normalize & dedup | Lowercase, strip punctuation, MD5 hash of first 200 chars
Format text | [category] Lesson text format with optional context
Embed & store | 1024-dim BAAI/bge-m3 vectors with category metadata into LanceDB
Result: 1,991 raw events → 100 curated lessons in LanceDB. More than 20 filter functions ensure that only actionable, imperative lessons make it through.
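
The filter, normalize, and dedup steps can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the function names are hypothetical, only a handful of the skip phrases are included, matching a skip phrase at the start of the text is a guess at how the real filters work, and the sanitize, format, and embed steps are omitted.

```python
import hashlib
import string
from typing import Iterable, List

MIN_LENGTH = 30  # corrections shorter than this are dropped
SKIP_PREFIXES = ("done", "lgtm", "addressed", "nit:", "good catch")  # partial list


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    without_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(without_punct.split())


def dedup_key(text: str) -> str:
    """MD5 hash of the first 200 normalized characters."""
    return hashlib.md5(normalize(text)[:200].encode("utf-8")).hexdigest()


def is_junk(text: str) -> bool:
    """Assumption: junk is short text or text that starts with a skip phrase."""
    stripped = text.strip()
    if len(stripped) < MIN_LENGTH:
        return True
    return stripped.lower().startswith(SKIP_PREFIXES)


def curate(corrections: Iterable[str]) -> List[str]:
    """Drop junk, then keep the first occurrence of each normalized text."""
    seen = set()
    kept = []
    for text in corrections:
        if is_junk(text):
            continue
        key = dedup_key(text)
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

Because the hash is taken after normalization, two corrections that differ only in case or punctuation collapse into a single lesson.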

7. Performance

Session Start Hook: 63ms
Avg Search Latency: 838ms
Full Test Suite: 0.09s
Section 06
Operations & Health
CLI • Auto-Vectorize • Cleanup Pipeline • Health Check • Dashboard

SmartAssist CLI

The smartassist command is available globally after pip install -e. It provides 9 subcommands for managing the system.

smartassist --help

Usage: smartassist <command>

Commands:
  init          Create .claude/smartassist/ in current project
  serve         Start MCP server (stdio transport)
  health        Run 6-check system health dashboard
  migrate       Copy data from old rag-setup location
  vectorize     Re-vectorize all lessons
  maintenance   Staleness check + LanceDB compaction
  analyze       Usage analytics (hit rate, latency, trends)
  dashboard     Generate HTML dashboard
  seed          Seed lessons from CLAUDE.md

System Health Check

Run smartassist health from any project with SmartAssist initialized.

Health Check Pipeline
flowchart LR
    A["smartassist health"] --> B["Database\n100 docs"]
    A --> C["Feedback\nData Quality"]
    A --> D["Reliability\nScores"]
    A --> E["Usage\nEvidence"]
    A --> F["Vectorization\nSync"]
    A --> G["MCP\nRegistration"]
    B --> H["SUMMARY\n6/6 passed"]
    C --> H
    D --> H
    E --> H
    F --> H
    G --> H
    style A fill:#131a27,stroke:#fbbf24,color:#e6edf3
    style H fill:#131a27,stroke:#34d399,color:#e6edf3
Check | What it verifies | Status
Vector Database | LanceDB has documents, categories are specific (not "general"), text has proper format | PASS
Feedback Data | feedback_log.jsonl exists, events counted, signal/category distribution analyzed | PASS
Reliability Scores | Thompson Sampling scores exist for all categories | PASS
Usage Evidence | usage_log.jsonl exists, tool calls logged, search hit rate, average latency | PASS
Vectorization Sync | DB is in sync with feedback log (no new unvectorized events) | PASS
MCP Registration | smartassist server is registered in ~/.claude/mcp.json | PASS

Auto-Vectorization Hook

smartassist.hooks.vectorize_learnings automatically vectorizes new lessons whenever feedback is added. No manual intervention needed.

Auto-Vectorization Flow
flowchart LR
    A["New feedback\nadded"] --> B["Read vectorization_log.json\n(last processed count)"]
    B --> C["Get new events\nsince last run"]
    C --> D{"Worth vectorizing?"}
    D -->|"Yes: ≥ 30 chars\nnot skip pattern"| E["Format text blob\nwith category prefix"]
    D -->|"No: junk"| F["Skip + update count"]
    E --> G["Embed with\nBAAI/bge-m3"]
    G --> H["Add to LanceDB\nwith category metadata"]
    H --> I["Update\nvectorization_log.json"]
    style A fill:#131a27,stroke:#38bdf8,color:#e6edf3
    style D fill:#131a27,stroke:#fbbf24,color:#e6edf3
    style E fill:#131a27,stroke:#34d399,color:#e6edf3
    style H fill:#131a27,stroke:#a78bfa,color:#e6edf3
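
The "last processed count" bookkeeping in the flow can be sketched as follows. This is an assumption-laden illustration: read_new_events and mark_processed are hypothetical names, and processed_count is an invented field name, since the actual schema of vectorization_log.json is not shown in this document.

```python
import json
from pathlib import Path
from typing import List, Tuple

STATE_KEY = "processed_count"  # hypothetical field name


def read_new_events(feedback_log: Path, state_file: Path) -> Tuple[List[dict], int]:
    """Return the events appended since the last run, plus the new total count."""
    processed = 0
    if state_file.exists():
        state = json.loads(state_file.read_text(encoding="utf-8"))
        processed = state.get(STATE_KEY, 0)
    lines = feedback_log.read_text(encoding="utf-8").splitlines()
    events = [json.loads(line) for line in lines if line.strip()]
    return events[processed:], len(events)


def mark_processed(state_file: Path, total: int) -> None:
    """Record how many events have been handled, including skipped junk."""
    state_file.write_text(json.dumps({STATE_KEY: total}), encoding="utf-8")
```

Updating the count even when an event is skipped as junk matches the "Skip + update count" branch in the flow, so rejected events are not re-examined on every run.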

What Gets Vectorized

  • Corrections ≥ 30 characters
  • Not matching 20+ skip/filter patterns
  • Sanitized to imperative lesson form
  • Proper category stored in metadata (not "general")

What Gets Filtered

  • Short responses: "ok", "fix", "yes"
  • Status updates: "Done - fixed in PR #123"
  • Acknowledgements: "LGTM", "good catch", "thanks"
  • Conversational noise, why-questions, narratives, defensive explanations, observations, scope discussions

Quick Commands

# Health check
smartassist health

# Full cleanup and rebuild vectors
smartassist vectorize

# Staleness check + LanceDB compaction
smartassist maintenance

# Usage analytics
smartassist analyze

# Generate HTML dashboard
smartassist dashboard --output ~/Desktop/dashboard.html

# Initialize SmartAssist in a new project
cd ~/your-new-project
smartassist init

# Migrate data from old rag-setup
smartassist migrate ~/old-project/rag-setup
Section 07
Usage Evidence
Proof • Usage Logs • Search Quality • Latency
Every single tool call is logged with full context. When Claude calls rag_search, an enriched evidence entry is written to usage_log.jsonl with timestamp, query text, results count, latency, the decision funnel (candidates fetched, distance-filtered, category-filtered), the enhanced query, and the actual lessons returned with relevance scores.

How Evidence Is Captured

Evidence Logging Flow — Full Search Story
sequenceDiagram
    actor User
    participant Claude as Claude Code
    participant MCP as SmartAssist MCP
    participant DB as LanceDB
    participant Log as usage_log.jsonl
    User->>Claude: How should I style this?
    Claude->>MCP: rag_search("style components")
    Note over MCP: Start timer
    MCP->>MCP: Enhance query + embed (1024-dim)
    MCP->>DB: Hybrid search (vector + BM25, limit 20)
    DB-->>MCP: 20 raw candidates
    Note over MCP: Distance filter: 4 too distant
    Note over MCP: Cross-encoder rerank remaining
    Note over MCP: Return top 5 of 16 remaining
    Note over MCP: Stop timer: 838ms
    MCP->>Log: Enriched entry with funnel + lessons
    MCP-->>Claude: Formatted lessons
    Claude-->>User: Use semantic colors...

Evidence Log Format (Enriched)

{
  "timestamp": "2026-02-27T08:04:30.223791",
  "tool": "rag_search",
  "query": "how to run unit tests in this project",
  "results_count": 5,
  "latency_ms": 10499.5,
  "lessons": [
    {"category": "testing", "relevance_pct": 52, "lesson_text": "Use it.each for parameterized test cases..."},
    {"category": "architecture", "relevance_pct": 56, "lesson_text": "Use protocol-based dependency injection for testability..."},
    {"category": "testing", "relevance_pct": 53, "lesson_text": "Use XCTUnwrap instead of force-unwrapping optionals in tests..."}
  ],
  "search_meta": {
    "raw_count": 20,
    "distance_filtered": 0,
    "category_filtered": 0,
    "category_filter_used": null,
    "enhanced_query": "Correction for this project: how to run unit tests in this project"
  }
}

Dashboard and error-path entries omit lessons and search_meta (backward compatible with the original 5-field format).
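
Appending one such entry per call takes only a few lines. The sketch below mirrors the field names in the example entry, but log_search itself is a hypothetical helper and the way the real server measures latency is an assumption.

```python
import json
import time
from datetime import datetime
from pathlib import Path
from typing import List


def log_search(log_path: Path, query: str, lessons: List[dict],
               search_meta: dict, started: float) -> dict:
    """Append one enriched rag_search entry to a JSONL log and return it.

    `started` is a time.perf_counter() reading taken before the search began.
    """
    entry = {
        "timestamp": datetime.now().isoformat(),
        "tool": "rag_search",
        "query": query,
        "results_count": len(lessons),
        "latency_ms": round((time.perf_counter() - started) * 1000, 1),
        "lessons": lessons,
        "search_meta": search_meta,
    }
    # One JSON object per line keeps the file append-only and easy to parse.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

Because each line is an independent JSON object, older entries in the 5-field format can sit in the same file as enriched ones without breaking a reader.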

Current Evidence Snapshot

Tool Calls Logged: 20,070
Search Hit Rate: 54%
Avg Search Latency: 838ms
Median Latency: 804ms

By Tool

rag_search: 14,742 calls (73.5%)
rag_dashboard: 5,328 calls (26.5%)

Search Quality

Returned results: 7,957 searches (54%), relevant lessons found
Correctly filtered: 6,785 searches (46%), irrelevant queries properly return nothing

Latency Distribution

Statistic | Latency
Average | 838ms
Median (P50) | 804ms
P95 | 2,344ms
Min | 0ms (cached)
Max | 64,052ms (cold start + model load)
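
Statistics like these can be derived from the usage log entries. The sketch below is one way to do it, not the analyze_usage.py implementation: summarize_searches is a hypothetical name, and the nearest-rank method for P95 is an assumption, since the document does not say which percentile definition is used.

```python
import statistics
from typing import Dict, List


def summarize_searches(entries: List[dict]) -> Dict[str, float]:
    """Compute hit rate and latency statistics over rag_search entries."""
    searches = [entry for entry in entries if entry.get("tool") == "rag_search"]
    if not searches:
        return {"count": 0}
    latencies = sorted(entry["latency_ms"] for entry in searches)
    hits = sum(1 for entry in searches if entry["results_count"] > 0)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers.
    p95_rank = (95 * len(latencies) + 99) // 100
    return {
        "count": len(searches),
        "hit_rate": hits / len(searches),
        "average_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[p95_rank - 1],
        "min_ms": latencies[0],
        "max_ms": latencies[-1],
    }
```

A long tail such as the 64,052ms cold start pulls the average above the median, which is why the table reports both.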

Search Story: Full Decision Funnel

Each search logs the complete story — from raw candidates through hybrid search + cross-encoder reranking to returned lessons:

Query: "how to style components with theme colors"

20 fetched → 4 too distant → cross-encoder rerank → 3 returned
72% [code_edit] Use semantic color tokens from the design system, never hardcode hex values
65% [code_edit] Import color tokens from the theme module for consistent styling across components
58% [code_edit] Avoid hardcoded color values in style definitions; use theme-provided constants

Query: "how to write unit tests for React Native components"

20 fetched → 3 too distant → cross-encoder rerank → 5 returned

Lessons about using XCTUnwrap instead of force-unwrapping, protocol-based dependency injection for ViewModels, and mock placement best practices.

Query: "quantum physics dark matter theory"

20 fetched → 20 too distant → 0 returned

0 results — Correctly filtered as irrelevant. Distance threshold (1.30) prevents noise.

How to Verify

Three ways to prove the system is actively working:

Method | How
CLI Health Check | Run smartassist health from your project to see all 6 subsystem checks with usage evidence, funnel stats, and returned lessons
Usage Log | Read .claude/smartassist/data/usage_log.jsonl: 20,070+ timestamped entries with query, decision funnel, and returned lessons
Test Suite | Run python -m pytest tests/ -v from SmartAssist repo: 59 tests verify cleanup, path resolution, Thompson Sampling, and all filter functions
The evidence is concrete. More than 20,000 tool calls are logged with full search stories, and every search Claude makes creates a permanent, timestamped entry with the complete decision funnel: what was queried, the enhanced query, how many candidates hybrid search fetched, what the distance filter removed, the cross-encoder reranking results, and the exact lessons returned with relevance scores. The health check validates all 6 subsystems, and the test suite covers cleanup filtering, path resolution, and Thompson Sampling.
Section 08
Comparisons

vs Claude Built-In Memory

Feature | SmartAssist | Claude memory
Learning | Active RLHF: explicit feedback | Passive observation
Tracking | Thompson Sampling: exact scores | No visibility
Priority | Focuses on weak areas (<70%) | All info equal
Search | 1024-dim hybrid + cross-encoder reranking | Has embeddings, no feedback loop
Time decay | 30-day half-life | Old info never fades
Portability | Works on any project via smartassist init | Tied to Claude account
Privacy | 100% local | Cloud-based
Cost | Zero | $$$ per token

vs CLAUDE.md

Aspect | SmartAssist | CLAUDE.md
Update speed | Instant ("thumbs down") | 10-30 min (edit, PR, merge)
Verification | Measurable scores | No way to know if Claude learned
Context usage | ~200 tokens (relevant only) | ~2000 tokens (entire file)
Maintenance | Self-maintaining | Manual editing
Team standards | Personal only | Shared across team
Onboarding | Starts from scratch | Immediate access
Best approach: use both. CLAUDE.md provides team-wide standards (Layer 1). SmartAssist provides personal learning and verification (Layer 2). They're teammates, not competitors.

vs Generic RAG

Feature | SmartAssist (RLHF+RAG) | Generic RAG
Learning | Retrieve → Generate → Get Feedback → Improve | Retrieve → Generate (no learning)
Quality | Every lesson scored by Thompson Sampling | All documents equal weight
Search | Hybrid (vector + BM25) + cross-encoder reranking | Pure vector search
Personalization | Adapts to YOUR workflow | Same for everyone
Portability | pip install + smartassist init on any project | Usually hardcoded to one project

SmartAssist vs Claude Code Skills

Skills (like react-native-best-practices from Callstack) and SmartAssist both inject knowledge into Claude — but they work in fundamentally different ways, serve different purposes, and complement each other.

TL;DR: Skills teach Claude generic domain knowledge ("how to optimize React Native FPS"). SmartAssist teaches Claude project-specific lessons ("in our codebase, always use weak references for delegate patterns to prevent retain cycles"). Skills are the textbook. SmartAssist is the field notes.

What Are Skills?

Claude Code Skills are structured markdown instruction sets published by third-party developers and installed as plugins. They follow the agentskills.io specification.

Markdown Files
Skills are SKILL.md files with YAML frontmatter. They describe when to activate and provide step-by-step workflows Claude should follow.
Progressive Disclosure
Only skill names + descriptions load at startup (~50 tokens each). Full instructions load only when Claude decides the skill matches your task.
Off-the-Shelf
Written by experts (e.g., Callstack for React Native). Generic best practices applicable to any project using that technology.

How Skills Activate vs How SmartAssist Activates

Activation Flow Comparison
flowchart TD
    subgraph SKILL["Skills - Description-Based Matching"]
        direction LR
        S1["You ask about\nReact Native FPS"] --> S2["Claude reads\nskill descriptions"]
        S2 --> S3{"Description\nmatches?"}
        S3 -->|"Yes"| S4["Load full\nSKILL.md"]
        S4 --> S5["Follow\ninstructions"]
        S3 -->|"No"| S6["Skip skill"]
    end
    subgraph RAG["SmartAssist - Hybrid Vector Search"]
        direction LR
        R1["You ask about\nstyling components"] --> R2["Claude calls\nrag_search MCP"]
        R2 --> R3["Embed query to\n1024-dim vector"]
        R3 --> R4["Hybrid search\nLanceDB"]
        R4 --> R5["Cross-encoder\nrerank top results"]
        R5 --> R6["Return lessons\nwith relevance %"]
    end
    style SKILL fill:#162032,stroke:#f472b6,color:#e6edf3
    style RAG fill:#162032,stroke:#34d399,color:#e6edf3

The Core Comparison

| Dimension | SmartAssist | Claude Code Skills |
| --- | --- | --- |
| What it is | Portable pip-installed package: MCP server + vector DB + feedback loop + 5 hooks + CLI | Markdown instruction files with YAML metadata |
| Knowledge type | Project-specific: lessons from real PR reviews, commits, and team feedback. Works on any codebase. | Generic domain: React Native best practices applicable to any RN app |
| How it activates | Claude explicitly calls rag_search MCP tool when it decides to search | Claude automatically loads SKILL.md when task description matches |
| Search method | Hybrid vector search (1024-dim BAAI/bge-m3 + BM25 keyword) + cross-encoder reranking (ms-marco-MiniLM-L-6-v2) | String matching on skill description text |
| Learning | Active: Thompson Sampling tracks reliability, feedback loop improves over time | Static: only updates when plugin author publishes new version |
| Feedback loop | Yes: thumbs up/down, corrections, commit analysis, reliability decay, rag_feedback MCP tool | None: no way to tell the skill "that advice was wrong" |
| Relevance scoring | Continuous 0-100% relevance with distance threshold filtering + cross-encoder precision | Binary: either the skill matches or it doesn't |
| Specificity | "Use weak references for delegate patterns to prevent retain cycles" | "Use FlashList instead of FlatList for better performance" |
| Update cycle | Instant: feedback → vectorize → searchable in seconds | Slow: plugin author commits, you pull updates |
| Observability | Full: 20,070+ usage log entries, decision funnels, relevance scores, health checks, dashboard | Minimal: no visibility into what skill was loaded or if it helped |
| Context cost | ~100-300 tokens per search result (only relevant lessons) | ~30-50 tokens metadata; ~500-5000 when fully loaded |
| Infrastructure | pip install -e, LanceDB, BAAI/bge-m3 + cross-encoder models, MCP server process | Zero: just markdown files on disk |
| Portability | smartassist init on any project: one global install, per-project data | Plugins available to any project by default |
| Maintenance | Self-maintaining: hooks auto-capture, auto-vectorize, auto-decay | Zero maintenance: read-only files |
| Who writes it | Your team: curated from real PR review history | Third-party experts (e.g., Callstack team) |
| Privacy | 100% local: your lessons never leave your machine | 100% local: markdown files cached locally |
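The "Learning" and "Feedback loop" rows rest on Thompson Sampling over per-category reliability. As a rough sketch of how that can work, the example below keeps a Beta distribution per category and updates it on each thumbs up or down. The Beta-Bernoulli model, the uniform prior, and the `CategoryReliability` class are assumptions for illustration, not SmartAssist's actual code.

```python
import random
from dataclasses import dataclass


@dataclass
class CategoryReliability:
    """Beta posterior over how often lessons in one category prove helpful."""

    alpha: float = 1.0  # pseudo-count of positive feedback (assumed uniform prior)
    beta: float = 1.0   # pseudo-count of negative feedback

    def record(self, helpful: bool) -> None:
        # A thumbs up adds to alpha, a thumbs down adds to beta.
        if helpful:
            self.alpha += 1
        else:
            self.beta += 1

    def reliability_pct(self) -> float:
        # Posterior mean, expressed on the 0-100% scale.
        return 100.0 * self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        # Thompson Sampling draws from the posterior instead of using the mean,
        # so categories with little feedback still get explored.
        return rng.betavariate(self.alpha, self.beta)


testing = CategoryReliability()
for helpful in [True, True, True, False]:
    testing.record(helpful)

print(f"testing reliability: {testing.reliability_pct():.1f}%")
draw = testing.sample(random.Random(0))
```

With three positive signals and one negative on top of the prior, the posterior is Beta(4, 2), so the reported reliability is about 66.7%. A decay step (mentioned in the table) could shrink both counts toward the prior over time. That is left out here.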

Same Question, Different Answers

When you ask "How should I optimize this list component?", each system provides a different layer of insight:

SmartAssist Says

"Lessons from your codebase:"

  • 85% [code_edit] Use Shopify FlashList with estimatedItemSize for all long lists
  • 72% [architecture] Wrap list item components with React.memo and extract stable keyExtractor functions
  • 58% [testing] When testing FlashList components, mock @shopify/flash-list before imports
Precision: Knows your exact imports, your theme system, your test patterns

Skill Says

"Generic React Native guidance:"

  • Use FlashList over FlatList — 5x faster, better memory
  • Set estimatedItemSize for optimal recycling
  • Avoid inline functions in renderItem — extract to named component
  • Profile with Flipper's Performance monitor
Breadth: Covers patterns, profiling tools, and alternatives across all RN apps
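The percentages in the SmartAssist output above come from relevance scoring with a distance cutoff. One plausible way to produce such lines is sketched below, assuming cosine distance, a linear mapping to a percentage, and a cutoff of 0.5. None of these details are stated in the document. The function names and the sample distances are hypothetical.

```python
def to_relevance_pct(distance: float) -> int:
    """Map a cosine distance to a 0-100 relevance score (assumed linear mapping)."""
    similarity = 1.0 - distance
    return round(100 * min(1.0, max(0.0, similarity)))


def format_results(hits, max_distance=0.5):
    """Drop hits beyond the distance threshold and render the rest as display lines."""
    kept = [h for h in hits if h["distance"] <= max_distance]
    kept.sort(key=lambda h: h["distance"])  # closest match first
    return [
        f'{to_relevance_pct(h["distance"])}% [{h["category"]}] {h["text"]}'
        for h in kept
    ]


# Hypothetical search hits; the distances are made up for illustration.
hits = [
    {"distance": 0.42, "category": "testing",
     "text": "When testing FlashList components, mock @shopify/flash-list before imports"},
    {"distance": 0.15, "category": "code_edit",
     "text": "Use Shopify FlashList with estimatedItemSize for all long lists"},
    {"distance": 0.71, "category": "architecture",
     "text": "A lesson too far from the query to be shown"},
]

lines = format_results(hits)
for line in lines:
    print(line)
```

The third hit is filtered out before formatting, which is how a threshold keeps weakly related lessons out of the context window.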

Architecture: How They Integrate

Skills + SmartAssist + CLAUDE.md = Three Knowledge Layers in Claude Code
flowchart TD
    USER["You ask a question"] --> CLAUDE["Claude Code"]
    subgraph LAYER1["Layer 1: Skills (Generic Knowledge)"]
        SK1["react-native-best-practices"]
        SK2["github PR patterns"]
        SK3["upgrading-react-native"]
    end
    subgraph LAYER2["Layer 2: SmartAssist (Project Knowledge)"]
        RAG["MCP: rag_search"]
        VDB[("LanceDB\n100 curated lessons")]
        TS["Thompson Sampling\nreliability scores"]
    end
    subgraph LAYER3["Layer 3: CLAUDE.md (Team Standards)"]
        CMD["Architecture, testing\npractices, path aliases"]
    end
    CLAUDE -->|"Description match"| LAYER1
    CLAUDE -->|"Explicit MCP call"| LAYER2
    CLAUDE -->|"Always loaded"| LAYER3
    RAG --> VDB
    VDB --> TS
    LAYER1 --> RESPONSE["Better response\nGeneric + Specific + Standards"]
    LAYER2 --> RESPONSE
    LAYER3 --> RESPONSE
    style LAYER1 fill:#162032,stroke:#f472b6,color:#e6edf3
    style LAYER2 fill:#162032,stroke:#34d399,color:#e6edf3
    style LAYER3 fill:#162032,stroke:#38bdf8,color:#e6edf3
    style RESPONSE fill:#131a27,stroke:#fbbf24,color:#e6edf3
    style CLAUDE fill:#131a27,stroke:#a78bfa,color:#e6edf3

When Each One Wins

SmartAssist Wins When...

  • Project-specific patterns — "How do we handle auth in this app?"
  • Team conventions — "What color tokens should I use?"
  • Past mistakes — "What went wrong last time we touched Redux slices?"
  • Testing patterns — "How do we mock Firebase in our test setup?"
  • Code review feedback — Lessons extracted from 1,991 real PR comments

Skills Win When...

  • Generic optimization — "How do I improve React Native FPS?"
  • New technology — "How to use the New Architecture?"
  • Framework upgrades — "Upgrade React Native from 0.76 to 0.77"
  • Profiling guidance — "How to find memory leaks in Hermes?"
  • Community best practices — Industry-standard patterns from experts

Technical Architecture Differences

| Component | SmartAssist | Claude Code Skills |
| --- | --- | --- |
| Storage | LanceDB vector database (Apache Arrow format); .claude/smartassist/data/curated_lessons.json | Markdown files in ~/.claude/plugins/cache/ |
| Embedding | BAAI/bge-m3 (1024-dim vectors, 8K context) | None: plain text description matching |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 | None |
| Transport | MCP stdio protocol (smartassist serve) | Direct filesystem read by Claude Code |
| Tools exposed | rag_search, rag_dashboard, rag_feedback | None: skills are instructions, not tools |
| Hooks | 5 lifecycle hooks (SessionStart, SessionEnd, PreToolUse, PostToolUse, UserPromptSubmit) | None: passive content |
| Testing | 59 automated tests, health checks, search quality validation | YAML frontmatter validation only |
| Logging | Every call → usage_log.jsonl with decision funnel + returned lessons (20,070+ entries) | No logging: invisible to user |
| Installation | pip install -e ~/Github/SmartAssist (globally available) | Plugin toggle in settings |
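The "Logging" row describes an append-only usage_log.jsonl that records a decision funnel for every call. The sketch below shows the general JSON Lines pattern using only the standard library. The entry schema (`query`, `funnel`, `lessons`) and the funnel counter names are assumptions, since the document does not give the real field names.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def log_search(log_path: Path, query: str, funnel: dict, lessons: list) -> None:
    """Append one search call to a JSON Lines log (one JSON object per line)."""
    entry = {"query": query, "funnel": funnel, "lessons": lessons}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def summarize(log_path: Path) -> Counter:
    """Sum the funnel counters across every logged call."""
    totals = Counter()
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            totals.update(json.loads(line)["funnel"])
    return totals


# Write to a temporary directory so the sketch does not touch real project data.
log_path = Path(tempfile.mkdtemp()) / "usage_log.jsonl"
log_search(
    log_path,
    "optimize list component",
    {"candidates": 20, "after_threshold": 6, "returned": 3},
    ["Use Shopify FlashList with estimatedItemSize for all long lists"],
)
log_search(
    log_path,
    "mock firebase in tests",
    {"candidates": 20, "after_threshold": 4, "returned": 2},
    [],
)

totals = summarize(log_path)
print(dict(totals))
```

Because each line is an independent JSON object, the log can grow to tens of thousands of entries and still be appended to or scanned without loading the whole file.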

Loading Strategy: Progressive Disclosure vs Semantic Search

| Stage | Skills | SmartAssist |
| --- | --- | --- |
| Session start | Load all skill names + descriptions (~50 tokens each). Always in context. | SessionStart hook injects lessons for weak categories (<70% reliability). Runs in 63ms. |
| During work | If task description matches a skill → load full SKILL.md (~500-5000 tokens) | Claude calls rag_search → hybrid search + cross-encoder rerank → returns only relevant lessons (~100-300 tokens) |
| Deep dive | Load reference files on demand (js-measure-fps.md, native-profiling.md, etc.) | Cross-encoder reranking of top-20 candidates → precision filtering → relevance % |
| After use | Nothing: no feedback loop | PostToolUse hook shows lessons to user. Commit hook captures new learnings. rag_feedback records quality signals. Thompson Sampling updates. |

The key insight: Skills are a knowledge delivery format (markdown files that teach Claude workflows). SmartAssist is a knowledge engine (portable MCP server + hybrid vector search + cross-encoder reranking + feedback loop + reliability scoring + CLI). Skills deliver static expertise. SmartAssist delivers living, evolving project intelligence. They're complementary, so use both.
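The "Session start" row says the SessionStart hook injects lessons only for categories below 70% reliability. A minimal sketch of that selection logic follows. The 70% cutoff comes from the table; the function name, the `per_category` limit, and the sample data are assumptions for illustration.

```python
WEAK_THRESHOLD = 70.0  # percent; the cutoff quoted for SessionStart injection


def lessons_for_session_start(reliability, lessons, per_category=2):
    """Pick lessons from categories whose reliability is below the threshold."""
    weak = sorted(
        (cat for cat, pct in reliability.items() if pct < WEAK_THRESHOLD),
        key=lambda cat: reliability[cat],  # weakest category first
    )
    selected = []
    for cat in weak:
        matching = [item for item in lessons if item["category"] == cat]
        selected.extend(matching[:per_category])  # cap lessons per category
    return selected


# Hypothetical scores and lessons for illustration.
reliability = {"code_edit": 85.0, "testing": 58.0, "architecture": 69.5}
lessons = [
    {"category": "testing",
     "text": "Mock @react-native-firebase/analytics before imports in tests"},
    {"category": "code_edit",
     "text": "Use Shopify FlashList with estimatedItemSize for all long lists"},
    {"category": "architecture",
     "text": "Wrap list item components with React.memo"},
]

injected = lessons_for_session_start(reliability, lessons)
for item in injected:
    print(f'[{item["category"]}] {item["text"]}')
```

Categories that are already reliable cost no tokens at session start, which keeps the always-loaded context small while still reinforcing the areas where past feedback was negative.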

Our Three-Layer Knowledge Stack

| Layer | System | What it provides | Example |
| --- | --- | --- | --- |
| 1. CLAUDE.md | Static file | Team-wide standards, path aliases, testing thresholds | "Coverage thresholds: branches 79%, lines 89%" |
| 2. Skills | Markdown plugins | Generic domain expertise from industry experts | "Use Hermes profiling to find JS thread bottlenecks" |
| 3. SmartAssist | MCP + LanceDB + RLHF | Project-specific lessons learned from real code reviews | "Mock @react-native-firebase/analytics before imports in tests" |

Each layer adds specificity. CLAUDE.md says what standards to follow. Skills say how to do things generically. SmartAssist says what we learned doing it in this exact codebase.