With a few optimization tricks. TL;DR:

- ONNX inference in Rust
- Embeddings cache & lookup
- Parallel & batch requests
- Hybrid search with full-text filtering + vector re-scoring
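The hybrid search item pairs a cheap full-text filter with vector re-scoring. Below is a minimal sketch of that two-stage idea, not the actual implementation. It assumes the embeddings are already computed (for example by the ONNX model) and stored next to each document. A naive whitespace tokenizer stands in for a real full-text index, and cosine similarity is the assumed re-scoring metric. The names `Doc`, `tokens`, `cosine` and `hybrid_search` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical document record: text plus a precomputed embedding.
struct Doc {
    id: u32,
    text: String,
    embedding: Vec<f32>,
}

/// Naive lowercase whitespace tokenizer; a stand-in for a real full-text index.
fn tokens(s: &str) -> HashSet<String> {
    s.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Stage 1: keep only docs containing every query term (full-text filter).
/// Stage 2: re-score the survivors by cosine similarity to the query embedding
/// and return the top `k` as (id, score), best first.
fn hybrid_search(docs: &[Doc], query: &str, query_emb: &[f32], k: usize) -> Vec<(u32, f32)> {
    let q_terms = tokens(query);
    let mut scored: Vec<(u32, f32)> = docs
        .iter()
        .filter(|d| q_terms.is_subset(&tokens(&d.text)))
        .map(|d| (d.id, cosine(&d.embedding, query_emb)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let docs = vec![
        Doc { id: 1, text: "rust onnx runtime".into(), embedding: vec![1.0, 0.0] },
        Doc { id: 2, text: "rust vector search".into(), embedding: vec![0.6, 0.8] },
        Doc { id: 3, text: "python onnx export".into(), embedding: vec![0.0, 1.0] },
    ];
    // Only docs 1 and 2 contain "rust"; doc 2 is closer to the query vector.
    let hits = hybrid_search(&docs, "rust", &[0.0, 1.0], 10);
    println!("{:?}", hits);
}
```

The filter runs first because term matching is cheap and shrinks the candidate set, so the more expensive vector math only touches documents that already match. Note that doc 3 has the closest embedding of all, yet it never reaches re-scoring because it fails the text filter.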