Replacing Ruby's URI with Addressable
Corey and I have been seriously tearing apart some URLs in our recent work and found that the standard URI library in Ruby just wasn't cutting it. It didn't parse and merge exactly as we expected, sometimes even dropping parts of the URL. It also had problems handling special characters in URLs, like these: ™ ‘ ’ ° ®. We found the Addressable Ruby gem from sporkmonger on GitHub. From the description it sounded like exactly what we needed: "Addressable is a replacement for the URI implementation that is part of Ruby's standard library. It more closely conforms to the relevant RFCs and adds support for IRIs and URI templates."
We took a look at the documentation and the library seemed to implement the same functionality as Ruby's standard library and then some. It even supports punycode! We installed the gem and tested it in irb with the URLs that had given us problems. It worked perfectly.
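As a small illustration of the special-character problem mentioned above, here is a sketch using only the standard library. The helper name try_stdlib_parse and the example URLs are made up for this illustration, and the exact error message may vary between Ruby versions; this shows one kind of failure, not necessarily the exact ones we hit.

```ruby
require 'uri'

# Hypothetical helper: returns the parsed URI, or nil when the standard
# library refuses the string.
def try_stdlib_parse(url)
  URI.parse(url)
rescue URI::InvalidURIError => e
  warn "URI.parse rejected #{url.inspect}: #{e.message}"
  nil
end

# A plain ASCII URL parses fine.
plain = try_stdlib_parse("http://cloudspace.com/products")
puts plain.path

# A URL containing a raw non-ASCII character (here the trademark sign)
# raises URI::InvalidURIError, so the helper returns nil.
special = try_stdlib_parse("http://cloudspace.com/products/widget™")
puts special.inspect
```

Rescuing the error keeps a script alive, but the URL still goes unparsed, which is why we went looking for a replacement.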
For the most part, just configure the gem in your environment.rb file and prepend "Addressable::" to any instance of "URI." in your code. Instead of "URI.parse(url)" you'll now have "Addressable::URI.parse(url)". But be sure to glance at the documentation, because a few methods don't map one-to-one between the libraries. For example, instead of calling URI.decode you would call Addressable::URI.unencode.
We decided to run a test using Ruby's built-in Benchmark library to compare the parse methods of each library, since parse was the method used most often in our code. We started by requiring all the necessary bits and whipping up an array of 1 million random URLs.
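The swap described above can be sketched roughly like this. The parse_url helper and the URL_PARSER constant are hypothetical names, not part of either library, and the fallback to the standard library is there only so the sketch still runs on a machine without the gem; in our app we simply call Addressable::URI directly.

```ruby
require 'uri'

# Use Addressable when the gem is installed (gem install addressable);
# otherwise fall back to the standard library so this sketch still runs.
begin
  require 'addressable/uri'
  URL_PARSER = Addressable::URI
rescue LoadError
  URL_PARSER = URI
end

# Hypothetical helper: the one place that knows which library parses URLs.
def parse_url(url)
  URL_PARSER.parse(url)
end

uri = parse_url("http://cloudspace.com/blog?page=2#comments")
puts uri.host
puts uri.path
puts uri.query
puts uri.fragment

# Methods that differ between the libraries need their own mapping,
# e.g. URI.decode(str) becomes Addressable::URI.unencode(str).
if defined?(Addressable::URI)
  puts Addressable::URI.unencode("caf%C3%A9")
end
```

For a simple URL like this one, both libraries expose the same accessors (host, path, query, fragment), which is why most call sites only need the new prefix.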
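    # Load the benchmarking tools and both URI implementations
    require 'rubygems'
    require 'benchmark'
    require 'uri'
    require 'addressable/uri'

    # Build 1 million random URLs, each with a path, query and fragment
    array = (1..1000000).map { "http://cloudspace.com/#{rand}?#{rand}=#{rand}\##{rand}" }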
Then here is the code to run the benchmark:
    Benchmark.bmbm do |x|
      x.report("uri") { array.each { |u| URI.parse(u) } }
      x.report("add") { array.each { |u| Addressable::URI.parse(u) } }
    end
And here is the result (the values are in seconds):
    Rehearsal ---------------------------------------
    uri  65.660000  23.670000  89.330000 ( 97.630597)
    add 128.030000  22.270000 150.300000 (161.927209)
    ---------------------------- total: 239.630000sec

              user     system      total        real
    uri  65.170000  23.140000  88.310000 ( 93.674252)
    add 127.920000  22.600000 150.520000 (166.662719)
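With a little bit of math of our own (the real time from the second pass divided by the 1 million URLs), the average time per parse comes out to:
Method | Average time per parse |
---|---|
URI.parse | 0.093674252 ms |
Addressable::URI.parse | 0.166662719 ms |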
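For anyone who wants to redo the arithmetic, here is a minimal sketch. It takes the real (wall-clock) seconds from the second pass of the benchmark and divides by the number of URLs; the method and constant names are made up for this example.

```ruby
URL_COUNT = 1_000_000

# "real" seconds from the second (non-rehearsal) pass of the benchmark.
REAL_SECONDS = {
  "URI.parse"              => 93.674252,
  "Addressable::URI.parse" => 166.662719
}

# Average time for a single parse, in milliseconds.
def ms_per_parse(total_seconds, count = URL_COUNT)
  total_seconds / count * 1000
end

# How many times slower the second timing is than the first.
def slowdown(slow_seconds, fast_seconds)
  slow_seconds / fast_seconds
end

REAL_SECONDS.each do |name, seconds|
  puts format("%-24s %.6f ms", name, ms_per_parse(seconds))
end

ratio = slowdown(REAL_SECONDS["Addressable::URI.parse"], REAL_SECONDS["URI.parse"])
puts format("Addressable / URI = %.2fx", ratio)
```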
Addressable takes almost twice as long as URI to parse the URLs (about 1.8 times, going by the real times). In our case, the time difference is a minor concern given the benefit of having our URLs parsed according to the more modern RFCs. So we went for it and it is working great for us! Thanks Sporkmonger!