database

sqlite-vec is a vector search extension for SQLite with install options for Node.js, Python, Rust, rqlite and more. I previously used SQLite to store embeddings when investigating the viability of using a text embedding model to recommend related posts on the blog. However, back then I'd calculated the cosine similarity between each embedding which is compute intensive. Using this extension would allow me to dynamically retrieve the distance between an embedding and the embeddings stored in the database.

The extension can be compiled on various platforms but currently the PyPI package does not have an ARM release.

I also encountered an error when trying to retrieve distances for embeddings: "OperationalError: A LIMIT or 'k = ?' constraint is required on vec0 knn queries". I wasn't alone in this the workaround is to use k instead of LIMIT in the query. In the coming few weeks I'll be playing around with this extension some more, looks very promising so far.

Read from link

Philipp Keller documents three examples of defining a database schema for your tagging strategy with performance tests and sample queries. The simple "MySQLicious" solution with one table for items and tags. The "Scuttle" solution with two tables one for tags and the other for items. Finally, the classic associative tables approach, or as called by the author the "Toxi" solution, with a table for items, another for tags, and an item-mapping table.

The last approach also has a Wikipedia entry, that I sometimes refer to when building similar tables as a subtle reminder.

Read from link

What if I told you that by tuning a few knobs, you can configure SQLite to reach ~8,300 writes / s and ~168,000 read / s concurrently, with 0 errors

Some interesting configurations that are possible with SQLite today making it much more versatile even though it isn't designed to be a client/server SQL database. Discovered via Simon Willison's weblog.

Read from link