Sunday, August 24, 2014

PostgreSQL Full-text search video sum-up

I've just finished a video regarding full-text search with Django/PostgreSQL
video link, presentation link

1. General notes.
The main appeal of Postgres FTS is that it lets you roll out a full-text search feature quickly. It's not as fast or as storage-efficient as "real" search engines.
The other option here is haystack+whoosh. In fact, if you don't use Postgres, that is the better choice.

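There are two index types for full-text search in Postgres: GIN (an inverted index) and GiST (a generalized search tree over hashed signatures).
Compared with GIN, GiST is roughly 3 times faster to update and takes less storage space, but is roughly 3 times slower to search.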
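
To make the index choice concrete, here is a minimal sketch that only builds the SQL strings for the two index types (GIN and GiST in PostgreSQL's naming) and for a matching query. The table name blog_post, the column body and the helper names are hypothetical, chosen for illustration; the SQL relies on the standard to_tsvector, plainto_tsquery and @@ constructs.

```python
# Sketch only: table, column, index and helper names are hypothetical.
VALID_METHODS = ("gin", "gist")


def _check_names(*names):
    # Identifiers are interpolated into SQL directly, so accept plain names only.
    for name in names:
        if not name.isidentifier():
            raise ValueError("unsafe identifier: %r" % (name,))


def fts_index_sql(table, column, method="gin", config="english"):
    """Return DDL for an expression index over to_tsvector(config, column)."""
    if method not in VALID_METHODS:
        raise ValueError("method must be 'gin' or 'gist'")
    _check_names(table, column, config)
    return (
        "CREATE INDEX {t}_{c}_fts_{m} ON {t} USING {m} "
        "(to_tsvector('{cfg}', {c}));"
    ).format(t=table, c=column, m=method, cfg=config)


def fts_query_sql(table, column, config="english"):
    """Return a parameterized query whose WHERE clause matches the index."""
    _check_names(table, column, config)
    return (
        "SELECT * FROM {t} WHERE to_tsvector('{cfg}', {c}) "
        "@@ plainto_tsquery('{cfg}', %s);"
    ).format(t=table, c=column, cfg=config)


if __name__ == "__main__":
    print(fts_index_sql("blog_post", "body", method="gin"))
    print(fts_index_sql("blog_post", "body", method="gist"))
    print(fts_query_sql("blog_post", "body"))
```

The query repeats the same to_tsvector expression and configuration as the index definition, since Postgres can only use an expression index when the query expression matches it.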

2. Django packages.
There are lots of pip packages that extend the Django ORM to support various Postgres features:
https://pypi.python.org/pypi?%3Aaction=search&term=djorm&submit=search
FTS support is provided by djorm-ext-pgfulltext: https://pypi.python.org/pypi/djorm-ext-pgfulltext/0.9.2

3. Fuzzy search
You should probably take a look at the fuzzystrmatch module (http://www.postgresql.org/docs/8.3/static/fuzzystrmatch.html). It gives you Soundex, Levenshtein and Metaphone/Double Metaphone.
But, as trigram matching is faster and there's a package for it, https://pypi.python.org/pypi/djorm-ext-pgtrgm/0.2, we should probably use that one.
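
For intuition about what trigram matching measures, here is a rough pure-Python approximation of the similarity idea behind pg_trgm: each word is lowercased, padded with two spaces in front and one behind, cut into three-character pieces, and two strings are compared by the share of trigrams they have in common. The function names are my own, and this is a sketch of the concept, not the extension's actual implementation.

```python
import re


def trigrams(text):
    """Return the set of padded trigrams for every word in text."""
    result = set()
    # Treat runs of letters/digits as words; everything else is a separator.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(similarity("word", "words"))
```

A misspelled query still shares most of its trigrams with the intended word, which is why this measure works for fuzzy search and can be served from an index.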

6. Additional data
FTS engines comparison table (in Russian). Unfortunately, it has no "ease of development" criterion.
djorm-ext-pgfulltext tutorial + explanation (in Russian)