Introducing Rubberband Flamethrower
Rubberband Flamethrower is a Ruby gem for benchmarking data insertion on Elastic Search servers, and I'm proud to release it into the open source wilds. If you want to mess around with big data, Elastic Search is definitely a cool tool to play with.
First let's talk a little bit about what Elastic Search is and how to get it set up.
Elastic Search is an open source search and analytics engine built on top of Apache Lucene. It stores data as JSON documents and is managed through a RESTful API. It is designed to be used at high scale and to respond quickly to data queries that would be very slow on a traditional relational database such as MySQL. Elastic Search is designed to grow with your data, and clusters automatically reorganize to take advantage of new hardware. It is very easy to get a basic Elastic Search node up and running.
Download Elastic Search:
curl -k -L -o elasticsearch-0.20.6.tar.gz http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.6.tar.gz
Extract the archive:
tar -zxvf elasticsearch-0.20.6.tar.gz
Start an Elastic Search node:
./elasticsearch-0.20.6/bin/elasticsearch -f
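To make the "JSON documents over a RESTful API" idea concrete, here is a minimal Ruby sketch that builds the kind of HTTP request used to index a single document. It is only an illustration, not code from the gem: the helper name build_index_request and the sample field values are made up for this example, and the request is built but not sent unless you uncomment the last lines.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a PUT request that indexes one JSON document
# at /<index>/<type>/<id>. The helper name is hypothetical.
def build_index_request(server_url, index, type, id, document)
  uri = URI.parse("#{server_url}/#{index}/#{type}/#{id}")
  request = Net::HTTP::Put.new(uri.request_uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(document)
  [uri, request]
end

# Sample values chosen for illustration only.
uri, request = build_index_request(
  "http://localhost:9200", "twitter", "tweet", 1,
  { "message" => "Hello from Ruby.",
    "username" => "example",
    "post_date" => "2013-04-01T12:00:00" }
)

puts "#{request.method} #{request.path}"
puts request.body

# To send it to a node that is actually running, uncomment:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body
```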
Now that you've got an Elastic Search node up and running, Rubberband Flamethrower is an easy way to get some big data into it.
Install the gem:
gem install rubberband_flamethrower
Now let's insert some data. The basic command to insert data is:
flamethrower fire
This will insert 500 data objects, starting with ID 1, into an index "twitter" of type "tweet" on a local Elastic Search node located at "http://localhost:9200". Each data object is randomly generated with three fields: message, username, and post_date. The random data generator uses several word lists from the SCOWL project. The message is between 6 and 16 random words, is capped at 140 characters, and ends with a period. The username is a random word and the post_date is the current timestamp. The command prints a dot for each insert, followed by benchmark information when it completes.
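The generation rules described above can be sketched in a few lines of Ruby. This is a rough approximation, not the gem's actual generator: the WORDS list is a tiny stand-in for the SCOWL word lists, the method names are invented for the example, and the ISO 8601 timestamp format is an assumption.

```ruby
require "time"

# Hypothetical stand-in for the SCOWL word lists.
WORDS = %w[alpha bravo charlie delta echo foxtrot golf hotel india juliet].freeze

# Join 6 to 16 random words, cap the result at max_length characters
# (including the trailing period), and end it with a period.
def random_message(words = WORDS, max_length = 140)
  count = rand(6..16)
  text = Array.new(count) { words.sample }.join(" ")
  # Leave room for the period, then drop any trailing space left by the cut.
  text = text[0, max_length - 1].rstrip
  "#{text}."
end

# Build one document with the three fields described above.
def random_tweet(words = WORDS)
  {
    "message"   => random_message(words),
    "username"  => words.sample,
    "post_date" => Time.now.utc.iso8601 # timestamp format is an assumption
  }
end

puts random_tweet.inspect
```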
You can override the default values by passing additional parameters. The "fire" command accepts the following parameters, in order: how_many, starting_id, server_url, index, and type. If, for example, you wanted to insert 10,000 objects instead, you would use this command:
flamethrower fire 10000
If you wanted to insert 10,000 objects with a starting ID of 20000 into an index named "facebook" of type "message" on an Elastic Search server located at "http://es.test.com:9200", you would use this command:
flamethrower fire 10000 20000 "http://es.test.com:9200" "facebook" "message"
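Because the parameters are positional, any you leave off the end fall back to their defaults. The Ruby sketch below shows one way that behaviour could be modelled. It is not the gem's actual argument handling: parse_fire_args and DEFAULTS are hypothetical names, and only the default values come from the description above.

```ruby
# Defaults taken from the behaviour described in this post.
DEFAULTS = {
  how_many:    500,
  starting_id: 1,
  server_url:  "http://localhost:9200",
  index:       "twitter",
  type:        "tweet"
}.freeze

# Map positional arguments onto the defaults, in order.
# Hypothetical helper, not part of the gem.
def parse_fire_args(argv)
  keys = DEFAULTS.keys
  options = DEFAULTS.dup
  argv.first(keys.size).each_with_index do |value, i|
    options[keys[i]] = value
  end
  # Command-line values arrive as strings, so convert the numeric ones.
  options[:how_many] = Integer(options[:how_many])
  options[:starting_id] = Integer(options[:starting_id])
  options
end

p parse_fire_args(["10000"])
p parse_fire_args(["10000", "20000", "http://es.test.com:9200", "facebook", "message"])
```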
Here is an example including the terminal output:
flamethrower fire 20
....................
Finished Inserting 20 documents into Elastic Search.
      user     system      total        real
  0.060000   0.020000   0.080000 (  0.684668)
When benchmarking, you want to do several passes to get good averages. I've added a second command "flamethrower auto" to allow you to run multiple fire commands in a sequence, timing each one. By default "flamethrower auto" will run 3 times.
flamethrower auto
500 documents inserted into Elastic Search per set
                   user     system      total        real
set 1 of 3:    0.930000   0.270000   1.200000 (  6.795932)
set 2 of 3:    1.110000   0.410000   1.520000 (  8.643721)
set 3 of 3:    1.000000   0.300000   1.300000 ( 10.741560)
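The user, system, total, and real columns are the ones produced by Ruby's standard Benchmark module. As a rough sketch of the idea behind timing several sets, and not the gem's own code, the example below times each set with Benchmark.measure and averages the real (wall clock) times. The run_sets and average names are invented, and the block passed to run_sets is placeholder work standing in for a real insert.

```ruby
require "benchmark"

# Time `sets` batches of `per_set` calls to the given block.
# The block receives a running document ID, starting at 1.
def run_sets(sets, per_set)
  Array.new(sets) do |i|
    Benchmark.measure("set #{i + 1} of #{sets}:") do
      per_set.times { |n| yield(i * per_set + n + 1) }
    end
  end
end

# Plain arithmetic mean.
def average(values)
  values.sum / values.size.to_f
end

# Placeholder work instead of a real insert.
timings = run_sets(3, 500) { |id| id.to_s }
timings.each { |t| puts format("%-12s %10.6f", t.label, t.real) }
puts format("average real time: %.6f", average(timings.map(&:real)))
```

Applied to the three real times in the output above, the average comes to roughly 8.7 seconds per set of 500 documents, or about 57 documents per second. The real times also climb from one set to the next in that run, which is a reason to look at the individual sets and not only the average.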
Have fun playing with some big data sets!