Replacing Ruby's URI with Addressable

Posted on by Michael Orr

Corey and I have been tearing apart a lot of URLs in our recent work, and we found that the standard URI library in Ruby just wasn't cutting it. It didn't parse and merge URLs the way we expected, and it sometimes even dropped parts of the URL. It also had problems handling special characters in URLs, such as ™ ‘ ’ ° ®. We found the Addressable Ruby gem from sporkmonger on GitHub. From the description, it sounded like exactly what we needed: "Addressable is a replacement for the URI implementation that is part of Ruby's standard library. It more closely conforms to the relevant RFCs and adds support for IRIs and URI templates."

We took a look at the documentation, and the library seemed to implement the same functionality as Ruby's standard library and then some. It even supports punycode! We installed the gem and tested it in irb with the URLs that had given us problems. It worked perfectly.

For the most part, you just configure the gem in your environment.rb file and prepend "Addressable::" to any instance of "URI." in your code. Instead of "URI.parse(url)" you'll now have "Addressable::URI.parse(url)". But be sure to take a glance at the documentation, because a few methods don't map one-to-one between the libraries. For example, instead of calling URI.decode you would call Addressable::URI.unencode.
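As a rough illustration of the swap, here is a minimal sketch. The helper name parse_url is hypothetical (it is not part of either library), and the fallback to the standard library when the gem is missing is an assumption added only so the snippet runs anywhere; it is not something the setup described here requires.

```ruby
require 'uri'

# Try to load Addressable; fall back to the standard library if the gem
# is not installed (an assumption made so this sketch runs anywhere).
begin
  require 'addressable/uri'
  HAVE_ADDRESSABLE = true
rescue LoadError
  HAVE_ADDRESSABLE = false
end

# Hypothetical helper, not part of either library.
def parse_url(url)
  if HAVE_ADDRESSABLE
    Addressable::URI.parse(url)
  else
    URI.parse(url)
  end
end

uri = parse_url("http://cloudspace.com/blog?page=2#comments")
puts uri.host
puts uri.path
puts uri.query
puts uri.fragment
```

Both libraries expose host, path, query, and fragment on the parsed object, which is why most call sites only need the "Addressable::" prefix.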

We decided to run a test using Ruby's built-in Benchmark library to compare the parse methods of the two libraries, since parse was the method used most often in our code. We started by requiring all the necessary bits and whipping up an array of 1 million random URLs.

require 'rubygems'
require 'benchmark'
require 'uri'
require 'addressable/uri'

# Build 1 million random URLs, each with a path, query, and fragment
array = (1..1000000).map {
  "http://cloudspace.com/#{rand}?#{rand}=#{rand}\##{rand}"
}

Then here is the code to run the benchmark:

Benchmark.bmbm do |x|
  x.report("uri") { array.each { |u| URI.parse(u) } }
  x.report("add") { array.each { |u| Addressable::URI.parse(u) } }
end

And here is the result (the values are in seconds):

Rehearsal ---------------------------------------
uri  65.660000  23.670000  89.330000 ( 97.630597)
add 128.030000  22.270000 150.300000 (161.927209)
---------------------------- total: 239.630000sec

          user     system      total        real
uri  65.170000  23.140000  88.310000 ( 93.674252)
add 127.920000  22.600000 150.520000 (166.662719)

With a little bit of math of our own, dividing the real time by the number of URLs, we get:

Average Computation Time Per URL over 1 million URLs
URI.parse 0.093674252 ms
Addressable::URI.parse 0.166662719 ms
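The arithmetic behind those averages can be sketched as follows. The figures are copied from the "real" column of the second benchmark run above; the variable names are just for illustration.

```ruby
# Figures copied from the "real" column of the benchmark output above.
URL_COUNT = 1_000_000
real_seconds = {
  "URI.parse"              => 93.674252,
  "Addressable::URI.parse" => 166.662719
}

# Average time per URL in milliseconds: seconds / count * 1000
per_url_ms = real_seconds.transform_values { |secs| secs / URL_COUNT * 1000 }
per_url_ms.each { |name, ms| puts format("%-24s %.9f ms", name, ms) }

# How many times slower Addressable was than URI in this run
ratio = real_seconds["Addressable::URI.parse"] / real_seconds["URI.parse"]
puts format("Addressable / URI: %.2fx", ratio)
```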

Addressable takes almost twice as long as URI to parse the URLs (roughly 1.8 times as long in our run). In our case, the time difference is a minimal concern considering the benefits of having URLs parsed in closer conformance to the relevant RFCs. So we went for it, and it is working great for us! Thanks, Sporkmonger!
