
I've previously experimented with storing and retrieving text embeddings using SQLite. I opted to calculate the cosine similarity between every pair of entries and store the scores in a table. Similar entries could then be queried by filtering records on ID and sorting by similarity score. This was a pattern I copied from Simon Willison.
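As a rough illustration of that pattern, here is a minimal sketch using only the standard library. The table name, column names, toy vectors and the most_similar helper are all made up for this example; it is not the schema I or Simon actually used, and real embeddings would come from an embedding model.

```python
import math
import sqlite3
from itertools import combinations

# Hypothetical toy embeddings keyed by entry ID (assumption for illustration).
embeddings = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
}


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE similarities (
        id INTEGER,
        other_id INTEGER,
        score REAL,
        PRIMARY KEY (id, other_id)
    )
    """
)

# Precompute every pair once, storing both directions so that a lookup
# only ever needs to filter on the id column.
for a, b in combinations(embeddings, 2):
    score = cosine_similarity(embeddings[a], embeddings[b])
    conn.executemany(
        "INSERT INTO similarities VALUES (?, ?, ?)",
        [(a, b, score), (b, a, score)],
    )
conn.commit()


def most_similar(entry_id, limit=5):
    """Return (other_id, score) rows for one entry, most similar first."""
    rows = conn.execute(
        "SELECT other_id, score FROM similarities "
        "WHERE id = ? ORDER BY score DESC LIMIT ?",
        (entry_id, limit),
    )
    return rows.fetchall()
```

Calling most_similar(1) returns the other entries ordered by score. The trade-off is that the table grows with the square of the number of entries, which is fine for a small blog but not for much larger collections.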

Max Woolf explores using Apache Parquet files for this purpose, generating text embeddings and then calculating the similarity between 32,254 Magic: The Gathering cards. The write-up also includes instructions to read and write Parquet files using Polars.

Polars is a relatively new Python DataFrame library that is primarily written in Rust and supports Arrow, which gives it a massive performance increase over pandas and many other DataFrame libraries.

Max concludes by comparing this method to more traditional vector databases, mentioning SQLite (with sqlite-vec) among them.

Notably, SQLite databases are just a single portable file, but interacting with them carries more technical overhead and considerations than calling read_parquet() and write_parquet() in Polars. One notable implementation of vector search in SQLite is the sqlite-vec extension, which also allows filtering and similarity calculations to happen at the same time.

Discovered via Simon Willison.

Read from link

Simon Willison shares his experience serving on the board of the Python Software Foundation over the last two years and some of the responsibilities that entails.

The Python Software Foundation supports the development of Python and its community by allocating donations towards running infrastructure and other activities. However, it is not directly involved in developing Python itself, which is handled by the core team run by the Python Steering Council. Infrastructure includes running PyPI and Python.org, and activities most notably include organising PyCon. Simon also mentions an activity I hadn't considered before: acting as a fiscal sponsor to other Python-related communities.

Simon's write-up is dense with information and definitely worth the read if this is interesting to you. This has also prompted a write-up by Makoto Nozaki on serving on the board of The Perl Foundation.

Read from link