Some notes on Sphinx & Thinking Sphinx

If you have ever built search for a web application, you are probably familiar with the hassle that comes with it. There is a lot to think about, and performance is often an issue when dealing with large sets of data. One of the key benefits of a dedicated search engine such as Sphinx is that it is very fast and can easily be customized to fit the needs of the application.

We were using a gem called ActsAsIndexed in one of our projects. However, when the datasets started to get bigger we noticed some really poor performance (we had request times in the 10s range) and had to look for another solution. After some research I stumbled upon Sphinx and the Thinking Sphinx gem. I played around with it for a while and absolutely fell in love with it! So, to make it easier next time, I'll post some notes here to refresh my memory when needed :) The notes are for Unix but should work for OS X too.

Installing and setting up Sphinx and Thinking Sphinx with Rails (version 3 and up)

When installing Sphinx, it is highly recommended NOT to use symlinks

  1. Download the latest Sphinx version tarball from here
    wget http://sphinxsearch.com/files/sphinx-x.x.x-release.tar.gz
  2. Navigate to the folder with the tarball and run
    tar xzvf sphinx-x.x.x-release.tar.gz
  3. cd sphinx-x.x.x
  4. ./configure
  5. make
  6. sudo make install
  7. Next up is installing Thinking Sphinx, which is easy: put gem "thinking-sphinx" in your Gemfile and run bundle install
  8. After this you need to set up your indexes. I won't go into this here since it is pretty straightforward, but have a look here for more information
  9. rake ts:index
  10. rake ts:start
  11. DONE!
Some notes on functionality that is useful/cool
Filtering
Including
Model.search("query", :with => { :published => true })
Excluding
Model.search("query", :without => { :draft => true })
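In a controller, these two options are usually assembled from request parameters. Here is a minimal plain-Ruby sketch of that, assuming hypothetical parameter names (published_only, hide_drafts) and assuming published and draft are declared as attributes in the index, since :with and :without filter on attributes rather than on text fields.

```ruby
# Builds the filter part of the options hash passed to Model.search.
# The parameter names :published_only and :hide_drafts are hypothetical.
def search_filters(params)
  with = {}
  without = {}
  with[:published] = true if params[:published_only]
  without[:draft] = true if params[:hide_drafts]

  # Only include the keys that actually carry a filter.
  options = {}
  options[:with] = with unless with.empty?
  options[:without] = without unless without.empty?
  options
end

filters = search_filters({ :published_only => true, :hide_drafts => true })
# The result can then be handed to the search call,
# e.g. Model.search("query", filters)
```

Leaving out empty filter hashes keeps the search call identical to an unfiltered one when no parameters are given.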
Weighting
In a search
:field_weights => { "name" => 100, "nickname" => 50 }
Or, if you prefer, in the define_index block
set_property :field_weights => {...}
Relevance
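This syntax sorts by an expression: the relevance Sphinx computes for each match (@weight) is multiplied by the ranking attribute, and the matches with the highest result come first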
:sort_mode => :expr, :sort_by => "@weight * ranking"
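To see what such an expression does to the result order, here is a small plain-Ruby imitation. It does not call Sphinx; the titles and numbers are invented, and ranking is assumed to be a numeric attribute in the index. Sphinx sorts expression results in descending order, which is what the sort below mimics.

```ruby
# Plain-Ruby imitation of sorting by "@weight * ranking".
# The titles, weights and rankings are invented for illustration.
Match = Struct.new(:title, :weight, :ranking)

matches = [
  Match.new("Strong text match, low ranking",  200, 1),
  Match.new("Weaker text match, high ranking", 120, 3),
  Match.new("Weak text match, medium ranking",  50, 2)
]

# Expression sorting puts the largest value first.
sorted = matches.sort_by { |m| -(m.weight * m.ranking) }

sorted.each { |m| puts "#{m.weight * m.ranking}  #{m.title}" }
```

The second entry moves to the top because 120 * 3 = 360 beats 200 * 1 = 200, so a high ranking can outweigh a better text match.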
Indexing and searching a specific index
You can create custom, named indexes. Note how the index below reaches through the translations association, and how the where clause restricts what gets indexed.
define_index "my_own_nifty_index" do
  indexes translations.title
  where "post_translations.locale = 'en'"
end
Model.search("query", :index => :my_own_nifty_index)
Search Usage
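Model.search "query"          # returns the matching model instances
Model.search_count "query"    # returns only the number of matches
Model.search_for_ids "query"  # returns the matching record ids without loading the records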
Pagination
Model.search "query", :page => 1, :per_page => 5
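Since :page and :per_page typically come straight from the query string, it can help to normalize them first. This is a minimal plain-Ruby sketch; the helper name, the default of 20 per page and the cap of 100 are assumptions made for illustration, not Thinking Sphinx settings.

```ruby
# Turns raw request parameters into safe :page / :per_page options.
# The default (20) and the cap (100) are arbitrary, illustrative values.
def pagination_options(params, default_per_page = 20, max_per_page = 100)
  # nil and non-numeric strings become 0 with to_i, so page falls back to 1.
  page = [params[:page].to_i, 1].max

  per_page = params[:per_page].to_i
  per_page = default_per_page unless per_page.between?(1, max_per_page)

  { :page => page, :per_page => per_page }
end

options = pagination_options({ :page => "2", :per_page => "5" })
# The result can then be handed to the search call,
# e.g. Model.search("query", options)
```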
One slight problem, and two solutions for it
The one thing that might be a problem is that Sphinx does not use your database when you query it. It serves results from its own indexed entries. So, if you change your model's data, the change will not show up in search until the index is rebuilt. One solution is to run a cron job every X minutes or hours that rebuilds the index (this is the most common one for sites that don't need search to show exactly current data). If you need instant updates, you need to look into another solution called delta indexes. You can read up on that here