Mongo Performance (And Why RAM Is Awesome!)

Posted by Tim Rosenblatt

Chris was doing some rough benchmarking on Mongo for a project and we noticed some interesting trends in the data that are worth sharing. This is all super rough data, but the goal here is to show performance numbers from a quick benchmark, the importance of indices in Mongo, and the value of RAM. While a statistics geek would throw a fit -- and would provide their own numbers and analysis rather than just complaining in the comments ;) -- you look like a cool person. By the way, that's a really great shirt you've got on!

Chris's setup was running Mongo in a virtual machine on his MacBook Pro. On a production server, you'll likely see better performance. He put 10 million rows into a database (a very small Mongo setup, relatively speaking) and ran three types of queries: looking for a record at the beginning, middle, and end of the table. A real database would be a bit noisier, since it wouldn't just be millions of rows all inserted sequentially. He used "id" as the field name here, in contrast to "_id", which is always indexed and would not demonstrate the unindexed performance. Indexes were added with "db.benchmark_data.ensureIndex({id : 1})".
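The post doesn't say how the rows were loaded, so here's a rough sketch of one way to do it, assuming plain documents with sequential integer ids inserted in batches. The helper name and the batch size are made up for illustration. The commented-out part assumes a current mongosh, where createIndex is the newer name for what this post calls ensureIndex.

```javascript
// Hypothetical helper: yields batches of documents with sequential integer
// "id" values, mirroring the sequentially inserted rows described above.
function* sequentialBatches(total, batchSize) {
  for (let start = 0; start < total; start += batchSize) {
    const end = Math.min(start + batchSize, total);
    const batch = [];
    for (let id = start; id < end; id++) {
      batch.push({ id }); // "id" is a plain field, not the auto-indexed "_id"
    }
    yield batch;
  }
}

// Small dry run so the shape of the data is visible without a database.
const demoBatches = [...sequentialBatches(10, 4)];
console.log(demoBatches.map((batch) => batch.map((doc) => doc.id)));

// Against a real server (assumption: run inside mongosh), the same generator
// could feed the collection and then build the index:
//   for (const batch of sequentialBatches(10000000, 10000)) {
//     db.benchmark_data.insertMany(batch);
//   }
//   db.benchmark_data.createIndex({ id: 1 });
```

Batching is only there to avoid building one 10-million-element array in memory; the batch size of 10,000 is arbitrary.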

beginning
> db.benchmark_data.findOne({id : 0})
> db.system.profile.find();
no index : 86ms, 34ms, 87ms, 44ms, 58ms, 73ms, 88ms, 80ms
index : 133ms, 0ms

For the find at the beginning of the database, the numbers are a bit noisy. Standard deviation is 19.6, which is almost 30%. This isn't surprising or worrying -- at these speeds, other tasks on the system are likely to be paging memory in and out, doing disk accesses, or any other number of things that'll cause this statistical noise. It doesn't matter -- the numbers are all very small.

There's an interesting pattern. It takes a minimum of ~30ms to open the database and do a read without an index. With an index, the first lookup of this single record actually takes longer (133ms). Mongo uses B-trees for the index, but there seems to be some overhead for accessing older data.

middle
> db.benchmark_data.findOne({id : 4999999})
> db.system.profile.find();
no index : 21177ms, 21968ms, 22288ms, 23598ms, 21767ms, 23153ms, 25730ms, 22296ms
index : 80ms, 0ms

For the middle group, the standard deviation for unindexed queries is 1425, or 6%. This is nice and clean. If I were a stats geek, I would call it "statistically significant".

end
> db.benchmark_data.findOne({id : 9999999})
> db.system.profile.find();
no index : 43431ms, 51195ms, 44873ms, 44757ms, 45815ms, 48667ms, 49078ms, 45896ms
index : 21ms, 0ms

At the end of the database, we've also got a standard deviation of just over 5% -- again, clean data.
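If you want to recheck the standard deviations quoted for the three groups, here's a small sketch that computes both the divide-by-n and divide-by-(n - 1) forms from the timings listed above. The post doesn't say which form was used, so both are printed; the function and variable names are just for illustration.

```javascript
// Unindexed timings in ms, copied from the three result blocks in the post.
const timingsMs = {
  beginning: [86, 34, 87, 44, 58, 73, 88, 80],
  middle: [21177, 21968, 22288, 23598, 21767, 23153, 25730, 22296],
  end: [43431, 51195, 44873, 44757, 45815, 48667, 49078, 45896],
};

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Dividing by n gives the population form, by n - 1 the sample form.
function stdDev(values, useSample) {
  const m = mean(values);
  const squared = values.reduce((sum, v) => sum + (v - m) ** 2, 0);
  const divisor = useSample ? values.length - 1 : values.length;
  return Math.sqrt(squared / divisor);
}

for (const [name, values] of Object.entries(timingsMs)) {
  const m = mean(values);
  const population = stdDev(values, false);
  const sample = stdDev(values, true);
  console.log(
    name,
    "mean=" + m.toFixed(1),
    "sd(n)=" + population.toFixed(1),
    "sd(n-1)=" + sample.toFixed(1),
    "relative=" + ((sample / m) * 100).toFixed(1) + "%"
  );
}
```

With these inputs, the 19.6 quoted for the beginning group lines up with the divide-by-n form, while the 1425 for the middle group is closer to the n - 1 form. Either way the relative spread stays in the same ballpark, so the conclusions don't change.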

Looking across all the numbers, the unindexed finds average roughly 69ms, 22,700ms, and 46,700ms, so the middle find takes almost half the time of the end. This makes sense, because without indices Mongo scans starting at the beginning and moves toward the end. With them, it's much faster, and the newest data is more accessible in the index, which can be seen in the steadily decreasing first-run numbers: 133ms, 80ms, 21ms. Assuming it is deliberate, this seems like a good design tradeoff for a database that's used in modern web systems where recent data is used more often than old data -- how often do you look at tweets from a year ago?
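To see why the unindexed times grow with position while the indexed ones don't, here's a toy model that counts how many entries each approach has to look at. It is a sketch under loud assumptions: the collection is scaled down to 100,000 ids, and a binary search over a sorted array stands in for the B-tree. It is not how Mongo is implemented, and it doesn't model caching or the first-run overhead seen in the indexed numbers.

```javascript
// Scaled-down stand-in for the collection: ids 0..N-1 in insertion order.
// N is an assumption chosen to keep the demo fast; the post used 10 million.
const N = 100000;
const ids = Array.from({ length: N }, (_, i) => i);

// Unindexed lookup: examine documents from the start until one matches.
function scanSteps(target) {
  let steps = 0;
  for (const id of ids) {
    steps++;
    if (id === target) break;
  }
  return steps;
}

// Indexed lookup, approximated with binary search over the sorted ids.
// This is a simplification of a B-tree, not MongoDB's implementation.
function indexSteps(target) {
  let lo = 0;
  let hi = ids.length - 1;
  let steps = 0;
  while (lo <= hi) {
    steps++;
    const mid = (lo + hi) >> 1;
    if (ids[mid] === target) break;
    if (ids[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return steps;
}

// Beginning, middle, and end, mirroring the three queries in the post.
for (const target of [0, N / 2 - 1, N - 1]) {
  console.log("id", target, "scan:", scanSteps(target), "index:", indexSteps(target));
}
```

The scan count is proportional to where the record sits, which matches the roughly 1:2 ratio between the middle and end timings. The index count stays within a couple dozen steps no matter where the record is.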

Finally, you may have noticed that there are only two data points on the indexed data. This demonstrates a good lesson -- the first one takes time, and the next one on the same data essentially doesn't. This is because the data is now cached in memory and there are effectively just two lookups, one in the index, one in the database. Each is going to be around 100ns -- that's nanoseconds. Said a different way, it's one ten-thousandth of a millisecond -- four orders of magnitude smaller. Which brings me to the golden rule of database performance: keep it in memory. If you can't keep your data in memory, at least keep the index there. Hitting the disk sucks -- a disk seek alone costs 10 million nanoseconds (10 milliseconds). Or you could upgrade to SSD everywhere :)
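The unit conversions are easy to get wrong by a factor of ten, so here's the arithmetic spelled out. The 100ns and 10ms figures are the rough ballpark values quoted above, not measurements from this benchmark.

```javascript
const NS_PER_MS = 1e6; // 1 millisecond = 1,000,000 nanoseconds

// Rough order-of-magnitude figures quoted in the post.
const memoryLookupNs = 100;
const diskSeekMs = 10;

// 100ns expressed in milliseconds: one ten-thousandth of a millisecond.
const memoryLookupMs = memoryLookupNs / NS_PER_MS;

// 10ms expressed in nanoseconds: ten million.
const diskSeekNs = diskSeekMs * NS_PER_MS;

// How many in-memory lookups fit in the time of a single disk seek.
const seekVsMemory = diskSeekNs / memoryLookupNs;

console.log("memory lookup:", memoryLookupMs, "ms");
console.log("disk seek:", diskSeekNs, "ns");
console.log("ratio:", seekVsMemory, "orders of magnitude:", Math.round(Math.log10(seekVsMemory)));
```

So a cached lookup sits four orders of magnitude below a millisecond, and a single disk seek costs about as much as 100,000 of them, which is five orders of magnitude.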

If you've read the whole post down to this point, you're obviously interested enough to watch Richard Kreuter talk about the deep magic behind Mongo indexes and the query optimizer.
