How AI Is Built

Local-First Search: How to Push Search To End-Devices | S2 E22

53 min • January 23, 2025

Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector search thing that I don't have to install 10,000 dependencies to use."

Core Mantra: "Simple, Local, Scalable"

Why sqlite-vec?

"I didn't go along thinking, 'Oh, I want to build vector search, let me find a database for it.' It was much more like: I use SQLite for a lot of projects, I want something lightweight that works in my current workflow."

sqlite-vec uses row-oriented storage with some key design choices:

  • Vectors are stored in large chunks (megabytes) as blobs
  • Data is split across 4KB SQLite pages, which affects analytical performance
  • Currently uses brute force linear search without ANN indexing
  • Supports binary quantization for 32x size reduction
  • Handles tens to hundreds of thousands of vectors efficiently
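The design above can be sketched with nothing but the Python standard library: vectors packed as little-endian float32 blobs (the same on-disk shape sqlite-vec uses) and a brute-force linear scan in place of an ANN index. Table and function names here are illustrative, not sqlite-vec's API.

```python
import sqlite3
import struct
import math

def serialize_f32(vec):
    """Pack floats into a little-endian float32 blob (4 bytes per dim)."""
    return struct.pack(f"<{len(vec)}f", *vec)

def deserialize_f32(blob):
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, embedding BLOB)")
vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
for doc_id, vec in vectors.items():
    con.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, serialize_f32(vec)))

def knn(query, k=2):
    """Brute-force linear search: L2 distance against every stored vector."""
    scored = []
    for doc_id, blob in con.execute("SELECT id, embedding FROM docs"):
        scored.append((math.dist(query, deserialize_f32(blob)), doc_id))
    scored.sort()
    return [doc_id for _, doc_id in scored[:k]]

print(knn([0.9, 0.1]))  # -> [1, 3]
```

A linear scan like this is exactly why the practical ceiling below sits in the hundreds of thousands of vectors: cost grows linearly with row count.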

Practical limits:

  • 500ms search time for 500K vectors (768 dimensions)
  • Aim for under 100ms search time for a responsive user experience
  • Binary quantization enables scaling to ~1M vectors
  • Metadata filtering and partitioning coming soon

Key advantages:

  • Fast writes for transactional workloads
  • Simple single-file database
  • Easy integration with existing SQLite applications
  • Leverages SQLite's mature storage engine

Garcia's preferred tools for local AI:

  • Sentence Transformers models converted to GGUF format
  • Llama.cpp for inference
  • Small models (30MB) for basic embeddings
  • Larger models like Arctic Embed (hundreds of MB) for recent topics
  • sqlite-lembed extension for text embeddings
  • Transformers.js for browser-based implementations

1. Choose Your Storage

"There's two ways of storing vectors within sqlite-vec. One way is a manual way where you just store a JSON array... [second is] using a virtual table."

  • Traditional row storage: Simple, flexible, good for small vectors
  • Virtual table storage: Optimized chunks, better for large datasets
  • Performance sweet spot: Up to 500K vectors with 500ms search time
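A minimal sketch of the manual approach: a plain TEXT column holding a JSON array per row, queried with a Python-side scan. The virtual-table variant is shown only as a commented SQL statement, since it requires the sqlite-vec extension to be loaded; table names are illustrative.

```python
import sqlite3
import json
import math

con = sqlite3.connect(":memory:")

# Manual storage: one JSON array per row in an ordinary column.
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, embedding TEXT)")
con.execute("INSERT INTO items VALUES (1, ?)", (json.dumps([0.1, 0.9]),))
con.execute("INSERT INTO items VALUES (2, ?)", (json.dumps([0.8, 0.2]),))

# Virtual-table storage (needs the sqlite-vec extension loaded):
#   CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[2]);
# sqlite-vec then packs vectors into optimized chunks rather than row-by-row.

def nearest(query):
    """Return the id of the closest stored vector (L2 distance)."""
    best_id, best_dist = None, float("inf")
    for item_id, emb_json in con.execute("SELECT id, embedding FROM items"):
        dist = math.dist(query, json.loads(emb_json))
        if dist < best_dist:
            best_id, best_dist = item_id, dist
    return best_id

print(nearest([1.0, 0.0]))  # -> 2
```

JSON text is the most flexible option for small vectors; the blob/virtual-table route avoids parse overhead once datasets grow.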

2. Optimize Performance

"With binary quantization it's 1/32 of the space... and holds up at 95 percent quality"

  • Binary quantization reduces storage 32x with 95% quality
  • Default page size is 4KB - plan your vector storage accordingly
  • Metadata filtering dramatically improves search speed
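The 32x figure falls straight out of the arithmetic: each float32 dimension (32 bits) collapses to a single sign bit, and distances become cheap bit counts. The sketch below is a generic sign-bit quantizer with Hamming distance, not sqlite-vec's internal implementation; function names are illustrative.

```python
import struct

def binary_quantize(vec):
    """Keep only the sign bit of each dimension: 1 bit instead of 32."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    n_bytes = (len(vec) + 7) // 8
    return bits.to_bytes(n_bytes, "big")

def hamming(a, b):
    """Distance between two quantized vectors: count of differing bits."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

vec = [0.3, -1.2, 0.8, 0.5, -0.1, 0.9, -0.4, 0.2]  # 8 dims -> 1 byte
full = struct.pack(f"<{len(vec)}f", *vec)           # float32: 32 bytes
quant = binary_quantize(vec)                        # 1 byte: 32x smaller
print(len(full) // len(quant))  # -> 32
```

A common pattern is to rank candidates by Hamming distance first, then rescore the top handful with full-precision vectors to recover most of the lost quality.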

3. Integration Patterns

"It's a single file, right? So you can like copy and paste it if you want to make a backup."

  • Two storage approaches: manual columns or virtual tables
  • Easy backups: single file database
  • Cross-platform: desktop, mobile, IoT, browser (via WASM)
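The "copy the file" backup works when nothing is writing; for a live database, Python's built-in `Connection.backup` produces a consistent snapshot. A minimal sketch, with a throwaway in-memory source database standing in for a real one:

```python
import os
import sqlite3
import tempfile

# Build a small source database to back up.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
src.execute("INSERT INTO docs VALUES (1, 'hello')")
src.commit()

# sqlite3's backup API copies page-by-page and stays consistent
# even if the source keeps taking writes during the copy.
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
dest = sqlite3.connect(backup_path)
src.backup(dest)
dest.close()

restored = sqlite3.connect(backup_path)
print(restored.execute("SELECT body FROM docs").fetchone()[0])  # -> hello
```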

4. Real-World Tips

"I typically choose the really small model... it's 30 megabytes. It quantizes very easily... I like it because it's very small, quick and easy."

  • Start with smaller, efficient models (30MB range)
  • Use binary quantization before trying complex solutions
  • Plan for partitioning when scaling beyond 100K vectors
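Partitioning for brute-force search can be as simple as an indexed key column: the `WHERE` clause narrows the scan before any distance math runs, so a large table behaves like its biggest partition. A sketch with an assumed `tenant` partition key (column and function names are illustrative):

```python
import math
import sqlite3
import struct

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, tenant TEXT, embedding BLOB)"
)
rows = [
    (1, "a", struct.pack("<2f", 1.0, 0.0)),
    (2, "a", struct.pack("<2f", 0.0, 1.0)),
    (3, "b", struct.pack("<2f", 1.0, 0.0)),
]
con.executemany("INSERT INTO docs VALUES (?, ?, ?)", rows)
# The index lets SQLite skip every row outside the partition.
con.execute("CREATE INDEX docs_tenant ON docs (tenant)")

def search(tenant, query):
    """Scan only one partition, then return the closest id (L2 distance)."""
    hits = []
    for doc_id, blob in con.execute(
        "SELECT id, embedding FROM docs WHERE tenant = ?", (tenant,)
    ):
        vec = struct.unpack(f"<{len(blob) // 4}f", blob)
        hits.append((math.dist(query, vec), doc_id))
    return min(hits)[1]

print(search("a", [0.9, 0.1]))  # -> 1
```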

Guest: Alex Garcia

Host: Nicolay Gerold
