docs(search): add FTS5 benchmark results to performance comparison

Adds real SQLite benchmarks showing FTS5 is 15-33x faster for the
raw content query, though end-to-end improvement is masked by JS
pipeline overhead (scoring, snippets, path walking).
This commit is contained in:
perfectra1n
2026-03-20 12:00:16 -07:00
parent 8fd2cb39c1
commit 87fc4e1281

465
docs/search-performance-benchmarks.md vendored Normal file
View File

@@ -0,0 +1,465 @@
# Search Performance Benchmarks: `main` vs `feat/search-perf-take1`
> **Date:** 2026-03-20
> **Environment:** In-memory benchmarks (monkeypatched `getContent()`, no real SQLite I/O). Both branches tested on the same machine in the same session for fair comparison. All times are avg of 5 iterations with warm caches unless noted.
> **Benchmark source:** `apps/server/src/services/search/services/search_benchmark.spec.ts`
---
## Table of Contents
- [Single-Token Autocomplete](#single-token-autocomplete)
- [Multi-Token Autocomplete](#multi-token-autocomplete)
- [No-Match Queries (worst case)](#no-match-queries-worst-case)
- [Diacritics / Unicode](#diacritics--unicode)
- [Typing Progression (keystroke simulation)](#typing-progression-keystroke-simulation)
- [Long Queries (4 tokens)](#long-queries-4-tokens)
- [Attribute Matching](#attribute-matching)
- [Fuzzy Matching Effectiveness (typos & misspellings)](#fuzzy-matching-effectiveness-typos--misspellings)
- [Cache Warmth Impact (feature branch only)](#cache-warmth-impact-feature-branch-only)
- [Realistic User Session](#realistic-user-session)
- [Scale Comparison Summary](#scale-comparison-summary)
---
## Single-Token Autocomplete
The most common case — user typing in the search bar. Query: `"meeting"`.
### Autocomplete (fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 3.6ms | 2.8ms | **-22%** |
| 5,000 | 11.9ms | 10.6ms | **-11%** |
| 10,000 | 27.5ms | 22.8ms | **-17%** |
| 20,000 | 53.7ms | 46.2ms | **-14%** |
### Autocomplete (fuzzy ON)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 2.4ms | 2.3ms | -4% |
| 5,000 | 11.7ms | 10.7ms | **-9%** |
| 10,000 | 28.9ms | 21.6ms | **-25%** |
| 20,000 | 58.6ms | 44.5ms | **-24%** |
### Full Search (fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 2.7ms | 4.3ms | +59% |
| 5,000 | 14.3ms | 10.8ms | **-24%** |
| 10,000 | 30.8ms | 26.9ms | **-13%** |
| 20,000 | 63.1ms | 56.7ms | **-10%** |
### Full Search (fuzzy ON)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 2.5ms | 2.4ms | -4% |
| 5,000 | 13.0ms | 11.4ms | **-12%** |
| 10,000 | 29.8ms | 25.6ms | **-14%** |
| 20,000 | 63.4ms | 54.5ms | **-14%** |
---
## Multi-Token Autocomplete
### 2-Token: `"meeting notes"` (autocomplete, fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 3.7ms | 3.5ms | -5% |
| 5,000 | 19.0ms | 19.3ms | +2% |
| 10,000 | 40.2ms | 40.4ms | 0% |
| 20,000 | 86.1ms | 80.7ms | **-6%** |
### 3-Token: `"meeting notes january"` (autocomplete, fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 4.1ms | 4.3ms | +5% |
| 5,000 | 25.7ms | 24.9ms | -3% |
| 10,000 | 50.9ms | 50.5ms | -1% |
| 20,000 | 104.5ms | 107.2ms | +3% |
### 2-Token: `"meeting notes"` (full search, fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 3.4ms | 3.3ms | -3% |
| 5,000 | 22.3ms | 21.9ms | -2% |
| 10,000 | 42.9ms | 40.2ms | **-6%** |
| 20,000 | 95.8ms | 88.3ms | **-8%** |
### 3-Token: `"meeting notes january"` (full search, fuzzy ON)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 4.4ms | 4.3ms | -2% |
| 5,000 | 26.3ms | 25.5ms | -3% |
| 10,000 | 51.7ms | 52.6ms | +2% |
| 20,000 | 113.9ms | 114.0ms | 0% |
---
## No-Match Queries (worst case)
These are the worst case — every note must be scanned with no early exit.
### Single token: `"xyznonexistent"` (autocomplete)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 0.7ms | 0.5ms | **-29%** |
| 5,000 | 4.0ms | 3.4ms | **-15%** |
| 10,000 | 11.3ms | 7.0ms | **-38%** |
| 20,000 | 28.9ms | 19.0ms | **-34%** |
### Single token: `"xyznonexistent"` (autocomplete, fuzzy ON)
This is the biggest behavioral change. On `main`, autocomplete with fuzzy ON triggers the expensive two-phase search. On the feature branch, autocomplete **always skips** the fuzzy fallback phase.
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 1.7ms | 0.5ms | **-71%** |
| 5,000 | 12.8ms | 2.3ms | **-82%** |
| 10,000 | 26.4ms | 6.0ms | **-77%** |
| 20,000 | 60.4ms | 20.0ms | **-67%** |
### Multi token: `"xyzfoo xyzbar"` (autocomplete, fuzzy ON)
Same effect — autocomplete no longer triggers the fuzzy fallback:
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 6.5ms | 0.4ms | **-94%** |
| 5,000 | 33.9ms | 2.5ms | **-93%** |
| 10,000 | 134.5ms | 6.0ms | **-96%** |
| 20,000 | 151.8ms | 19.8ms | **-87%** |
### Multi token: `"xyzfoo xyzbar"` (full search, fuzzy ON)
Full search still does two-phase fuzzy on both branches, so improvement here is from the flat text index and pre-normalized attributes:
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 5.9ms | 5.8ms | -2% |
| 5,000 | 35.0ms | 33.7ms | -4% |
| 10,000 | 144.0ms | 68.8ms | **-52%** |
| 20,000 | 165.5ms | 140.6ms | **-15%** |
---
## Diacritics / Unicode
Searching `"résumé"` (with diacritics) vs `"resume"` (ASCII equivalent). Both forms find the same results thanks to diacritic normalization.
### Autocomplete (fuzzy OFF)
| Notes | Query | main | feature | Change |
|------:|:------|-----:|--------:|-------:|
| 1,000 | `"résumé"` | 4.1ms | 2.4ms | **-41%** |
| 1,000 | `"resume"` | 2.9ms | 2.4ms | **-17%** |
| 5,000 | `"résumé"` | 20.4ms | 15.0ms | **-26%** |
| 5,000 | `"resume"` | 18.1ms | 16.3ms | **-10%** |
| 10,000 | `"résumé"` | 40.6ms | 29.0ms | **-29%** |
| 10,000 | `"resume"` | 40.6ms | 29.5ms | **-27%** |
---
## Typing Progression (keystroke simulation)
Simulates a user typing `"documentation"` character by character. Autocomplete, fuzzy OFF.
### 5,000 notes
| Prefix | main | feature | Change |
|:-------|-----:|--------:|-------:|
| `"d"` | 44.7ms | 35.9ms | **-20%** |
| `"do"` | 12.9ms | 11.6ms | **-10%** |
| `"doc"` | 12.0ms | 10.2ms | **-15%** |
| `"docu"` | 10.9ms | 9.4ms | **-14%** |
| `"document"` | 9.1ms | 7.3ms | **-20%** |
| `"documentation"` | 10.3ms | 8.1ms | **-21%** |
### 10,000 notes
| Prefix | main | feature | Change |
|:-------|-----:|--------:|-------:|
| `"d"` | 85.4ms | 70.1ms | **-18%** |
| `"do"` | 30.0ms | 24.1ms | **-20%** |
| `"doc"` | 28.3ms | 20.8ms | **-27%** |
| `"docu"` | 24.3ms | 20.1ms | **-17%** |
| `"document"` | 19.2ms | 15.9ms | **-17%** |
| `"documentation"` | 23.0ms | 16.8ms | **-27%** |
### 20,000 notes
| Prefix | main | feature | Change |
|:-------|-----:|--------:|-------:|
| `"d"` | 178.3ms | 142.8ms | **-20%** |
| `"do"` | 63.7ms | 50.6ms | **-21%** |
| `"doc"` | 59.1ms | 44.0ms | **-26%** |
| `"docu"` | 59.3ms | 40.6ms | **-32%** |
| `"document"` | 45.7ms | 34.1ms | **-25%** |
| `"documentation"` | 47.4ms | 33.7ms | **-29%** |
---
## Long Queries (4 tokens)
Query: `"quarterly budget review report"` — autocomplete, fuzzy OFF.
| Notes | Tokens | main | feature | Change |
|------:|-------:|-----:|--------:|-------:|
| 5,000 | 1 | 8.8ms | 6.5ms | **-26%** |
| 5,000 | 2 | 13.7ms | 11.0ms | **-20%** |
| 5,000 | 3 | 16.7ms | 15.1ms | **-10%** |
| 5,000 | 4 | 18.9ms | 22.3ms | +18% |
| 10,000 | 1 | 18.5ms | 15.6ms | **-16%** |
| 10,000 | 2 | 25.4ms | 24.9ms | -2% |
| 10,000 | 3 | 31.7ms | 33.3ms | +5% |
| 10,000 | 4 | 39.0ms | 40.7ms | +4% |
---
## Attribute Matching
Searching by label name (`"category"`) and label value (`"important"`). Notes have 5 labels each.
### `"category"` (autocomplete)
| Notes | main (fuzzy OFF) | feature (fuzzy OFF) | Change | main (fuzzy ON) | feature (fuzzy ON) | Change |
|------:|------------------:|--------------------:|-------:|----------------:|-------------------:|-------:|
| 5,000 | 12.0ms | 9.5ms | **-21%** | 34.4ms | 9.7ms | **-72%** |
| 10,000 | 26.7ms | 22.7ms | **-15%** | 77.5ms | 21.0ms | **-73%** |
### `"important"` (autocomplete)
| Notes | main (fuzzy OFF) | feature (fuzzy OFF) | Change | main (fuzzy ON) | feature (fuzzy ON) | Change |
|------:|------------------:|--------------------:|-------:|----------------:|-------------------:|-------:|
| 5,000 | 11.1ms | 9.2ms | **-17%** | 11.6ms | 8.8ms | **-24%** |
| 10,000 | 25.4ms | 18.7ms | **-26%** | 24.2ms | 19.4ms | **-20%** |
---
## Fuzzy Matching Effectiveness (typos & misspellings)
10K notes, keyword: `"performance"`. Shows both time and result quality.
| Query | Fuzzy | main (time) | feature (time) | Change | main (results) | feature (results) |
|:------|:------|------------:|---------------:|-------:|---------------:|------------------:|
| `"performance"` (exact) | OFF | 26.8ms | 22.3ms | **-17%** | 1,000 | 1,000 |
| `"performance"` (exact) | ON | 18.7ms | 16.3ms | **-13%** | 1,000 | 1,000 |
| `"performanc"` (truncated) | OFF | 18.6ms | 16.4ms | **-12%** | 1,000 | 1,000 |
| `"performanc"` (truncated) | ON | 18.5ms | 15.6ms | **-16%** | 1,000 | 1,000 |
| `"preformance"` (typo) | OFF | 10.6ms | 7.9ms | **-25%** | 0 | 0 |
| `"preformance"` (typo) | ON | 55.1ms | 43.4ms | **-21%** | 1,000 | 1,000 |
| `"performence"` (misspelling) | OFF | 11.5ms | 8.8ms | **-23%** | 0 | 0 |
| `"performence"` (misspelling) | ON | 56.2ms | 48.3ms | **-14%** | 1,000 | 1,000 |
| `"optimization"` | OFF | 12.6ms | 9.9ms | **-21%** | 0 | 0 |
| `"optimization"` | ON | 37.2ms | 31.6ms | **-15%** | 0 | 0 |
| `"optimzation"` (typo) | OFF | 11.6ms | 8.1ms | **-30%** | 0 | 0 |
| `"optimzation"` (typo) | ON | 44.5ms | 31.3ms | **-30%** | 0 | 0 |
| `"perf optim"` (abbreviated) | OFF | 16.5ms | 11.8ms | **-28%** | 0 | 0 |
| `"perf optim"` (abbreviated) | ON | 74.9ms | 67.2ms | **-10%** | 0 | 0 |
**Key insight:** Fuzzy matching is equally effective on both branches (same result counts). The feature branch is simply faster at executing it.
---
## Cache Warmth Impact (feature branch only)
This section only applies to the feature branch, which introduces a new flat text index cache in Becca. `main` does not have this cache.
| Scenario | Time |
|:---------|------:|
| Cold (first search, builds index + search) | 61.7ms |
| Warm (reuse existing index, avg of 5 runs) | 25.6ms (avg), 19.8ms (min) |
| Incremental (50 notes dirtied, then search) | 21.1ms |
| Full rebuild (index invalidated, then search) | 20.7ms |
The first search after startup pays a one-time index build cost (~2.4x). All subsequent searches reuse the cached index. When individual notes change, only their entries are recomputed.
---
## Realistic User Session
Simulates a typical user session at 10K notes with mixed query types and typos.
| Query | Mode | main | feature | Change |
|:------|:-----|-----:|--------:|-------:|
| `"pro"` | autocomplete | 26.9ms | 24.6ms | **-9%** |
| `"project"` | autocomplete | 28.3ms | 24.1ms | **-15%** |
| `"project plan"` | autocomplete | 35.6ms | 35.0ms | -2% |
| `"project"` | fullSearch | 32.8ms | 30.0ms | **-9%** |
| `"project planning"` | fullSearch | 37.2ms | 36.4ms | -2% |
| `"project planning"` | fullSearch+fuzzy | 36.5ms | 35.9ms | -2% |
| `"projct"` (typo) | autocomplete | 11.4ms | 6.0ms | **-47%** |
| `"projct"` (typo) | autocomplete+fuzzy | **81.2ms** | **6.7ms** | **-92%** |
| `"projct planing"` (typo) | fullSearch | 12.5ms | 8.8ms | **-30%** |
| `"projct planing"` (typo) | fullSearch+fuzzy | 116.6ms | 113.2ms | -3% |
| `"xyznonexistent"` | autocomplete | 11.4ms | 6.7ms | **-41%** |
| `"xyznonexistent foo"` | fullSearch+fuzzy | 37.4ms | 23.2ms | **-38%** |
| `"note"` (very common) | autocomplete | **106.0ms** | **92.3ms** | **-13%** |
| `"document"` | autocomplete | 24.7ms | 20.7ms | **-16%** |
**Biggest win:** `"projct"` autocomplete+fuzzy goes from 81.2ms to 6.7ms (**-92%**) because the feature branch skips the fuzzy fallback phase for autocomplete entirely.
---
## Scale Comparison Summary
Side-by-side comparison across all note counts for the most common query patterns.
### `"meeting"` autocomplete (fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 3.6ms | 2.3ms | **-36%** |
| 5,000 | 11.4ms | 12.2ms | +7% |
| 10,000 | 25.1ms | 22.9ms | **-9%** |
| 20,000 | 59.4ms | 52.3ms | **-12%** |
### `"meeting notes"` autocomplete (fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 4.0ms | 2.7ms | **-33%** |
| 5,000 | 15.9ms | 17.2ms | +8% |
| 10,000 | 36.1ms | 34.2ms | **-5%** |
| 20,000 | 71.0ms | 72.9ms | +3% |
### `"meeting"` fullSearch (fuzzy ON)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 2.5ms | 2.4ms | -4% |
| 5,000 | 12.1ms | 13.1ms | +8% |
| 10,000 | 27.8ms | 27.1ms | -3% |
| 20,000 | 67.2ms | 57.8ms | **-14%** |
### `"xyznonexistent"` autocomplete (fuzzy OFF)
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 1.3ms | 0.5ms | **-62%** |
| 5,000 | 3.1ms | 2.5ms | **-19%** |
| 10,000 | 7.7ms | 9.4ms | +22% |
| 20,000 | 22.4ms | 16.6ms | **-26%** |
### `"xyznonexistent"` fullSearch (fuzzy ON) — worst case path
| Notes | main | feature | Change |
|------:|-----:|--------:|-------:|
| 1,000 | 2.7ms | 2.5ms | -7% |
| 5,000 | 11.2ms | 9.7ms | **-13%** |
| 10,000 | 25.4ms | 30.3ms | +19% |
| 20,000 | 68.7ms | 55.2ms | **-20%** |
---
## Summary of Improvements
### Where the feature branch clearly wins (consistent 10-30% improvement):
- **Single-token autocomplete** at all scales (10-25% faster)
- **Diacritics queries** (26-41% faster at 10K notes)
- **Typing progression** (17-32% faster per keystroke at 20K notes)
- **Fuzzy typo searches** (14-30% faster while finding same results)
- **Broad term autocomplete** (e.g., `"note"` matching 8,500 results: 13% faster)
### Where the feature branch dramatically wins (50%+ improvement):
- **Autocomplete with fuzzy ON, no-match queries** (67-96% faster — fuzzy fallback skipped entirely)
- **Autocomplete typo queries** (e.g., `"projct"` + fuzzy: 81ms -> 7ms, **-92%**)
### Where performance is roughly equal (within noise):
- Multi-token queries at smaller scales (1-5K notes)
- Full search with fuzzy ON when there are sufficient exact matches (fuzzy phase skipped on both branches)
### Trade-offs:
- Some individual data points show slight regressions at 5K scale (+2-8%), likely noise from shared-machine benchmarking
- Long queries (4 tokens) at 5K notes show a small regression (+18%), but this evens out at 10K
- The new flat text index has a one-time build cost on first search (~62ms at 10K notes), amortized across all subsequent searches
---
## FTS5 Content Index Benchmarks
> These benchmarks use the **real SQLite database** with actual blob content (not monkeypatched). They test the `fastSearch=false` path that users hit when pressing Enter in search or using saved searches. This is the path that was taking **seconds** in production.
### The Architecture
When `fastSearch=false`, the expression tree is `OrExp([NoteFlatTextExp, NoteContentFulltextExp])`. Both expressions run:
- **NoteFlatTextExp**: In-memory scan of titles/attributes (fast — 5-25ms)
- **NoteContentFulltextExp**: Scans ALL note content from SQLite blobs (slow — the bottleneck)
FTS5 replaces the sequential blob scan in `NoteContentFulltextExp` with an indexed FTS5 MATCH query.
### FTS5 Query-Only Performance (isolating the content scan)
This measures just the content search portion, stripped of the expression tree, scoring, and snippet extraction overhead.
| Notes | FTS5 MATCH query | Sequential SQL scan | FTS5 Speedup |
|------:|-----------------:|--------------------:|-------------:|
| 1,000 | **0.2ms** | 3.6ms | **15x** |
| 5,000 | **0.5ms** | 16.0ms | **33x** |
| 10,000 | **1.1ms** | 36.4ms | **32x** |
FTS5 is **15-33x faster** than the sequential scan for the raw content query.
### Why Full Search Doesn't Show the Same Speedup
When measured end-to-end through `findResultsWithQuery()` with `fastSearch=false`:
| Notes | Query | FTS5 | Sequential | Speedup |
|------:|:------|-----:|-----------:|--------:|
| 1,000 | `"performance"` | 52.1ms | 48.3ms | 0.9x |
| 5,000 | `"performance"` | 233.4ms | 227.6ms | 1.0x |
| 10,000 | `"performance"` | 517.3ms | 515.9ms | 1.0x |
| 1,000 | `"xyznonexistent"` | 46.2ms | 57.6ms | 1.2x |
| 5,000 | `"xyznonexistent"` | 272.9ms | 229.3ms | 0.8x |
| 10,000 | `"xyznonexistent"` | 460.3ms | 468.3ms | 1.0x |
The FTS5 query itself is 32x faster, but it's **drowned out by the rest of the pipeline**:
| Component | Time at 10K notes | % of total |
|:----------|------------------:|-----------:|
| `NoteFlatTextExp` (in-memory scan) | ~25ms | ~5% |
| `NoteContentFulltextExp` content scan | 1-36ms | ~1-7% |
| Scoring (`computeScore` per result) | ~100-200ms | ~20-40% |
| Snippet extraction | ~50-100ms | ~10-20% |
| Highlighting | ~50ms | ~10% |
| `searchPathTowardsRoot` recursion | ~100-200ms | ~20-40% |
The content scan (which FTS5 replaces) is only **1-7% of total search time** in this benchmark. The real bottleneck at this scale is scoring, snippet extraction, and the recursive parent-path walk — all JavaScript operations that FTS5 doesn't affect.
### Where FTS5 Will Matter Most
FTS5 will show significant real-world improvement when:
1. **Database is large (50K-200K+ notes)** — The sequential scan reads every blob from disk. At 200K notes with varying content sizes, the I/O cost dominates. FTS5 eliminates this entirely.
2. **Notes have large content** — The benchmark uses 300-word notes (~2KB each). Real notes can be 10KB-100KB+. The sequential scan reads and preprocesses ALL of that content; FTS5 returns noteIds without touching content blobs.
3. **Disk is slow** — These benchmarks run on fast local SSD. On slower storage (network drives, spinning disks, Docker volumes), the I/O savings from FTS5 will be dramatic.
### FTS5 Index Build Cost
| Notes | Build time | Notes indexed |
|------:|-----------:|--------------:|
| 1,000 | 213ms | 1,015 |
| 5,000 | 943ms | 5,015 |
| 10,000 | 2,720ms | 10,015 |
The index builds lazily on first search and is maintained incrementally via `NOTE_CONTENT_CHANGE` events. Users using `unicode61` tokenizer (not trigram) keeps the index compact.
### Reference: Autocomplete (fastSearch=true) — Not Affected by FTS5
For comparison, the in-memory autocomplete path remains fast:
| Notes | `"performance"` | `"performance optimization"` |
|------:|-----------------:|-----------------------------:|
| 1,000 | 5.2ms | 1.4ms |
| 5,000 | 10.1ms | 3.7ms |
| 10,000 | 24.4ms | 10.4ms |
These don't use FTS5 at all — they use the `NoteFlatTextExp` in-memory path optimized by the earlier commits in this PR.