[es|en]

Recreate ElasticSearch index for integration testing.

I fought against this for most of last week so now that I solved it I figured I could share it with the rest of the world (not that I had much fun running tons of Jenkins' builds to see if it was fixed...).

So, we have a Rails app that uses ElasticSearch for a few features. There's a single index that we query, and for integration test purposes we create a fake test index so we can go through the whole stack. We are using Tire with its Persistence module, so in our spec_helper.rb (asuming we have a model called Book) we had something along the lines of:

before(:each) do
  Book.index.delete
  Book.create_elasticsearch_index
end

There was some more unrelated stuff in there (like deleting the index after the whole suite was completed, or using Webmock to ensure that we are not making any unwanted HTTP requests), the only detail that I want to mention is that you might want to wait for a yellow status before each test to avoid "No active shards" errors.

Back to the problem at hand, nothing seems wrong with this, but then we started having random 404 errors because the index was missing during the examples. But it should be there, right? It should be created right after it was deleted.

I enabled debugging on Tire's config, and I found something like the following:

# 2013-10-04 09:25:05:839 [DELETE] ("test_index")
#
curl -X DELETE http://some-server:9200/test_index

# 2013-10-04 09:25:05:840 [200]
#
# {
#   "ok": true,
#   "acknowledged": true
# }

# 2013-10-04 09:25:05:852 [HEAD] ("test_index")
#
curl -I "http://some-server:9200/test_index"

# 2013-10-04 09:25:05:852 [200]

So, right after the DELETE request, there's a HEAD request against the same index, which returns 200.

What.

First of all, the HEAD request comes from Tire doing an existence check before creating the index. Makes sense. But why would it return 200 if the DELETE request that came just before that one returned a 200 ok everything is perfect response?

Well, help comes from the great people at StackOverflow. First comment: turns out the entire ES HTTP API is asynchronous. So yeah, I get the 200 for the DELETE request but the index wasn't necessarily deleted yet. So, what do we do? Follow the suggestion at the accepted answer for that question: poll ES until we are sure that the index was deleted.

So, in our Tire initializer, I added:

Tire::Index.class_eval do
  def ensure_deleted
    5.times do
      return true unless exists?
    end

    raise "The ElasticSearch index wasn't successfully deleted."
  end
end

And then modified the hooks to look like:

before(:each) do
  Book.index.ensure_deleted
  Book.create_elasticsearch_index
end

after(:each) do
  Book.index.delete
end

So basically, we check five times to see if the index was deleted. I didn't show the whole log, but in all the failures only one request returned the fake 200 after the DELETE, the next one always returned 404 correctly, so limiting it to 5 tries made sense.

That's it! I hope this can save someone some time and the anger against the world that I went through.

powered byDisqus