codahale.com٭blog

Coda Hale lives in Berkeley, CA, where he writes about Ruby on Rails, usability, web design and development, and the occasional bit about bicycles.

Google Analytics: The goggles, they do nothing!

Google Analytics is in need of some serious help. Its information visualization is profoundly broken, and that gets in the way of me trying to figure out what people are doing with my website. Here’s what’s wrong with it, and how it can be fixed.

What’s Broken: Visualizing whirled peas

The visualization of quantitative information like website usage statistics has a single purpose: it helps us to make decisions about the processes which generated the data. (Information visualization can also have an aesthetic component, it’s true, but when it comes to business, form follows function.) Infovis allows us to utilize the pattern-detection capabilities of our visual cortex to find meaning in abstract datasets. Given this meaning, we can then make data-driven decisions, which tend to have much better results than stabs in the dark, guesses, or Ouija boards.

Any evaluation of an infovis package like Analytics must be driven by a single, all-important criterion–how well does this help us make decisions? Let’s take a look at how well Analytics helps the average webmaster make decisions about her site.

(Let me be clear that I like Google, and I think Analytics is a very promising application. I am brutal in this post in the hopes that someone at Google will read this and put some effort into making Analytics better, through my suggestions or otherwise.)

But first, a rant about Flash

I cannot talk about Analytics without first whinging about Flash. Macromedia hasn’t released a version of Flash for Linux in years (current version–7), and there are no decent Flash viewers available for x86_64, meaning the admins who have nice big Athlon64/Xeon/Pentium D workstations running Linux/BSD installations actually built for that architecture have to install a 32-bit copy of their browser (using linux32) so they can see how the websites they’re tasked with maintaining or developing are doing.

Sure, Flash is nearly ubiquitous for most web users, but Analytics users aren’t going to be average web users. For the purposes of displaying charts and graphs, it’s not like there aren’t good alternatives to Flash. SVG support is native in Firefox and Opera, is enabled by default in Safari, and can be installed via plugin in Internet Explorer. Or go all-native with Javascript using something like PlotKit. Flash is useful for games, extremely interactive applications, and alienating users. If you must use Flash, use it only when developing an SVG/Javascript/AJAX solution would obviously not work.

For God’s sake, even rendering the data as PNGs would be preferable.

Piecharts are for middle management

Piecharts are the information visualization equivalent of a roofing hammer to the frontal lobe. They have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face. They are as professional as a pair of assless chaps. Anyone who suggests their use should be instinctively slapped.

I am not the only person who feels this way; Edward Tufte comes to mind:

Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only thing worse than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies… Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.*

– The Visual Display of Quantitative Information, pg. 178

* The footnote for this references Jacques Bertin, Graphics and Graphic Information, who says multiple pie charts are “completely useless.”

Evil Genius Marketing puts the final stake through the heart of the piechart:

A pie chart from Evil Genius Marketing showing East, West, and North at 31%, 33%, and 35%, respectively.

Rank these three numbers. EGM and I dare you.

And yet… Analytics is stuffed with piecharts.

A horrifically useless piechart from Google Analytics, displaying the difference between new and returning visitors.

This piechart uses 78,050 pixels to display a single fact–that 9.94% of all visitors had previously visited the site–resulting in a spectacular data-point-to-pixel ratio of 0.0013%. Using this methodology, Analytics could use the entirety of my monitor to display a meager 14 data points. What an astoundingly compact overview of my website.
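The arithmetic behind those numbers is easy to check. Here is a minimal sketch using only the figures quoted above; the variable names are mine:

```ruby
# Figures quoted in the paragraph above.
chart_pixels = 78_050 # pixels the piechart occupies
data_points  = 1      # the single fact it conveys

# Data points per pixel, expressed as a percentage.
ratio_percent = (data_points.to_f / chart_pixels * 100).round(4)

# Pixels needed to show 14 such charts side by side.
pixels_for_fourteen = 14 * chart_pixels

puts "ratio: #{ratio_percent}%"
puts "14 charts need #{pixels_for_fourteen} pixels"
```

Fourteen of these charts eat a bit over a million pixels, which is about what a largish monitor has to offer.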

If Analytics were a 10-foot interface, this would be excusable. Given the fact that it uses 10pt fonts, this is horrific.

Trading in laughs for charts and graphs

Most of the charts in Analytics use three-dimensional shapes or have an oddly three-dimensional grid behind them. This is bad, since these dimensions of visualization do not convey any additional information. Instead, they require the viewer to discard elements of the visualization in order to accurately compare data. These Powerpoint-esque graphics make us all stupider for having seen them.

An ugly bar graph from Google Analytics, with a heavy grid and uselessly three-dimensional bars.

That high-pitched whine in your ears is your IQ dropping a notch.

Compare the faux-3D look of Analytics to the sparse utilitarianism of The Economist’s graphs:

A clean, sparse graph from The Economist, showing the decline of participation in trade unions over time in Britain, Germany, France, and the US.

Notice the subtle grid, the use of (but not dependence on) colors, the fact that both axes do not differ in increments based on the dataset, the lack of a legend (since the data is explained in context, and requires no memorization of arbitrary labeling schemes), and the high ratio of data to ink.

Also compare Analytics to Kyle Rove’s Fresh View, a plugin for Mint, Shaun Inman’s $30 web analysis package:

A graph from Fresh View for Mint, showing the number of unique and returning visitors over the past week.

Some guy in med school vs. The Biggest Search Engine Company Evar, and it’s the guy with no free time who manages to make a better way of displaying visitors over time. The only improvement I could suggest for Kyle’s plugin would be to put the information in the legend in context with the actual information.

Now what?

Okay, so Analytics looks like crap. Here’s how to fix it!

Unbreaking Analytics: So crazy it just might work

1. Piecharts are defeat in circular form.

First, no piecharts. At all. Period. As I’ve said, anything with a piechart simply reeks of bozocity, and we’re trying to act like grownups.

2. Comparison is your raison d’être.

Second, compare things. The essence of information visualization is comparison: how many visitors do I have compared to how many visitors I have had? What is the most common browser viewport width? How are my AdWords campaigns doing in terms of money in my pocket? Don’t just tell me what things are like right now–I can find that out by looking at the right side of the damn graph–tell me what things are like right now compared to the way they’ve been.

3. Simplify, simplify, simplify.

Third, if it’s a simple dataset, boil it down to the essentials. If there are two numbers which add up to 100%, you don’t need to tell me both, and you certainly don’t need to draw me a picture of it. I, like most people, have a pretty good grasp of how close 9.94% is to 100%. If you can’t reduce it to a single number, use a simple table. If you feel like making things fancy, add the browser icons to the table. When you start to deal with complicated data, like browser share over time, use a graph. Your rule of thumb should be “Only make a graph if the table doesn’t make sense, and only use a table if a single fact doesn’t explain it all.” Define and stick to a methodology which specifies a minimum data-ink ratio. Give Edward Tufte fistfuls of money to come in and slap the Analytics dev team with copies of his books.

4. Take the geeks off the leash.

Fourth, diversify the vis team. “Yes, well, it uses a reporting engine written by a blind misanthrope in the early 80s” does not adequately explain why a hugely important piece of software run by a hugely important internet company looks like Powerpoint and USA Today got drunk and made a website. C’mon, Google! Where are the bright people who solve problems for fun? Where are the boxless thinkers? Where are the supernerds? I want to see treemaps (square, circular, or otherwise) of my website’s contents vs. usage patterns, or hyperbolic trees or circos circular graphs or linkmaps or text visualization of search keywords vs. density or massive timelines or sparklines or keyword heatmaps or search engine ranking comparisons or… you get the picture. There are hundreds of wonderful, innovative ways of analyzing website usage, and Analytics is mired in the infovis Stone Age of Bar Graph, Line Graph, Pie Chart, Table.

Get ten of your best, most wild-eyed infovis people, feed them their choice of high-dosage caffeine or DMT, and give them two weeks to come up with a proof-of-concept vis mode for Analytics, either static or interactive. Marvel at the results.

5. Make sure your product helps real people make good decisions.

Once you’re done marveling, feed each infovis various sets of data, and bounce the results off a bunch of average webmasters, who will be shown their broken-headed creations and asked specific questions about the actual shape of the data. Whoever wrote the vis which helps the lab rats get the most right answers gets two thousand dollars and another belt of DMT. Rinse. Repeat.

Remember, that’s why you’re building this–to help people make decisions.

A shortcut to this process

Google is not constitutionally incapable of making beautiful information visualizations; on the contrary, Google Finance is one of the finest instances of infovis I’ve seen. Behold:
Google Finance

Find out where the Google Finance crew is hiding. Get the people who made financial markets visually intelligible and set them to work on Analytics.

A final word

The purpose of data visualization is to help people make decisions by allowing them to process data using their visual cortex in all its pattern recognizing glory. The purpose of Google Analytics is to record and present aggregate statistics about the composition and usage of your websites, and thus allow you to make data-driven decisions regarding your website.

Please, Google: fix Analytics. Now. It’s too important to let it linger, Google News-like, in despair and neglect for a few years.


On booleans and database portability

Oh, booleans. Simplest and most elusive of database types. How to specify you? Y and N? T and F? 0 or NULL and anything else? If only we knew…

When working on a Rails app that uses different databases (say, SQLite for development/testing and MySQL for production), be sure that your conditions clauses aren’t assuming a particular form of boolean representation.

This will return nothing in SQLite:

@monkeys = Monkey.find(:all,
  :conditions => 'rabid = 0')

But it’ll work in MySQL.

A Solution

Autogenerate that sucker!

@monkeys = Monkey.find(:all,
  :conditions => ['rabid = ?', false])

Yay!
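Why does the placeholder help? Roughly because the database adapter, not you, decides how a boolean gets spelled in SQL. The sketch below is a hypothetical, stripped-down stand-in for that quoting step, not ActiveRecord's actual code; `quoted_boolean` and `bind_condition` are made-up names, and the literals are my assumption about how each adapter represents booleans:

```ruby
# Hypothetical, simplified stand-in for adapter-specific boolean quoting.
# The literals below are assumptions, not copied from ActiveRecord.
def quoted_boolean(value, adapter)
  case adapter
  when :sqlite then value ? "'t'" : "'f'"
  when :mysql  then value ? '1' : '0'
  else raise ArgumentError, "unknown adapter: #{adapter}"
  end
end

# Substitutes the first '?' placeholder with the adapter's boolean literal.
def bind_condition(sql, value, adapter)
  sql.sub('?', quoted_boolean(value, adapter))
end

puts bind_condition('rabid = ?', false, :sqlite)
puts bind_condition('rabid = ?', false, :mysql)
```

Same condition, different SQL per database, and your code never has to know which one it's talking to.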


Ever wonder which is the fastest way to concatenate strings in Ruby?

No? Too bad!

From this:


require 'benchmark'
Benchmark.bm(20) do |x|
  x.report('<<') do
    1_000_000.times do
      one = 'one'
      two = 'two'
      three = 'three'
      y = one << two << three
    end
  end
  x.report('+') do
    1_000_000.times do
      one = 'one'
      two = 'two'
      three = 'three'
      y = one + two + three
    end
  end
  x.report('#{one}#{two}#{three}') do
    1_000_000.times do
      one = 'one'
      two = 'two'
      three = 'three'
      y = "#{one}#{two}#{three}"
    end
  end
  x.report('one#{two}#{three}') do
    1_000_000.times do
      two = 'two'
      three = 'three'
      y = "one#{two}#{three}"
    end
  end
  x.report('onetwo#{three}') do
    1_000_000.times do
      three = 'three'
      y = "onetwo#{three}"
    end
  end
end

Comes this:


                           user     system      total        real
<<                     4.580000   0.000000   4.580000 (  4.579776)
+                      5.720000   0.000000   5.720000 (  5.815782)
#{one}#{two}#{three}   5.180000   0.000000   5.180000 (  5.185434)
one#{two}#{three}      3.920000   0.000000   3.920000 (  3.917942)
onetwo#{three}         2.610000   0.000000   2.610000 (  2.617674)

Thus proving two things:

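  1. Use << for concatenation. It appends to the left-hand string in place, so it doesn’t make an intermediate copy, unlike +. (That also means the left-hand string is modified.)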
  2. If you need to place a string variable inside a chunk of static text, it’s far faster to use interpreted string literals than to concatenate string variables.

Yup.
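One thing to keep in mind before swapping every + for <<: the speed comes from modifying the receiver. A small sketch of the difference (variable names are mine; `.dup` is only there so the example also behaves when string literals are frozen):

```ruby
one = 'one'.dup # unfrozen copy, safe to mutate
two = 'two'

# + builds a brand-new string and leaves both operands alone.
plus_result = one + two
one_after_plus = one.dup

# << appends to the receiver itself and returns that same object.
shovel_result = one << two
same_object = shovel_result.equal?(one)

puts plus_result
puts one_after_plus
puts one
puts same_object
```

If something else still holds a reference to the left-hand string, it'll see the appended text too, so use << on strings you own.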


Content-only caching for Rails

So you’ve got a Rails app which is mostly static content, but it’s got some dynamic, user-specific stuff mixed in with the layout. You’d love to cache the static data, since it doesn’t change often, but that would leave you updating the dynamic content via AJAX or something, and as cool as AJAX is, it’s for crap when the Javascript is turned off.

You try page caching, but you notice that the dynamic content doesn’t update. You try action caching, but it’s the same story. You try fragment caching, but then your app still performs all the big database queries in your actions. There’s a level of granularity missing in Rails’ caching system. Cached pages are stupid-quick for very static content, cached actions allow you to filter via ActionController, and cached fragments clean up the messy bits of your views. But you can’t cache just a rendered view. Until now… (dun dun duuun)

Content caching is a different level of granularity for Rails. Like action caching, requests are routed through the ActionController framework. Unlike action caching, none of the layout is cached, allowing you to provide some dynamic, user-specific content while reducing DB loads and rendering times. It’s a bit like fragment caching, but if a copy exists in the cache, the controller action isn’t called, meaning the database is never queried.

Installation

Subversionality:


./script/plugin install -x http://svn.codahale.com/content_cache

Living dangerously?


./script/plugin install http://svn.codahale.com/content_cache

Usage


class NotesController < ActionController::Base
  caches_action_content :index

  def index
    @notes = Note.find(:all, :include => [:monkeys, :dirigibles, :robots])
  end
end

The first time /notes/index is requested, #index is executed and whatever it renders is stored wherever you have the cache store configured. None of the layout is stored, just the rendered view. The next time /notes/index is requested, the cached action content is read from the cache, placed within the layout, and sent to the client.
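If that flow is hard to picture, here is a plain-Ruby sketch of the idea with no Rails involved. It is not the plugin's code: `ContentCacheSketch`, `render_in_layout`, and the in-memory hash are all hypothetical stand-ins for the cache store, the layout, and the controller action:

```ruby
# Hypothetical in-memory stand-in for wherever the cache store is configured.
class ContentCacheSketch
  def initialize
    @store = {}
  end

  # Returns the cached view for the path, running the block (the controller
  # action plus view rendering) only when nothing is cached yet.
  def fetch(path)
    @store[path] ||= yield
  end
end

# The layout is rendered on every request, so user-specific bits stay live.
def render_in_layout(user, content)
  "<p>Logged in as #{user}</p>\n#{content}"
end

cache = ContentCacheSketch.new
action_calls = 0
index_action = lambda do
  action_calls += 1 # stands in for the big database queries
  '<ul><li>A note</li></ul>'
end

first  = render_in_layout('alice', cache.fetch('/notes/index') { index_action.call })
second = render_in_layout('bob',   cache.fetch('/notes/index') { index_action.call })

puts first
puts second
puts "action ran #{action_calls} time(s)"
```

Both requests get their own layout, but the expensive action only runs for the first one.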

Sometimes an action’s instance variables are used in the layout itself–to set the title, for example. Instance variables your layout depends on can be specified as such:


ActionContentFilter.preserved_instance_variables += ['@title', '@content_type']

The contents of these instance variables are cached alongside the action’s content, and sent to the layout during a request. (These values are marshalled, which means that simple data types are preferred, and anything which refers to records in a database will likely break after a certain period of time. Best to limit this to strings, integers, arrays, and other simple types which play well with marshalling.)
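To get a feel for what "plays well with marshalling" means, here is a small sketch using Ruby's standard `Marshal` module directly. The hash of variable names is just an illustration, not how the plugin stores things, and the lambda is my own example of a value that can't be dumped at all:

```ruby
# Simple values survive a dump/load round trip unchanged.
preserved = { '@title' => 'All Notes', '@content_type' => 'text/html' }
restored  = Marshal.load(Marshal.dump(preserved))

# Some objects cannot be marshalled at all; a lambda is one example.
begin
  Marshal.dump({ '@callback' => -> { :nope } })
  lambda_dumped = true
rescue TypeError
  lambda_dumped = false
end

puts restored.inspect
puts lambda_dumped
```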

Speed

I did some rough benchmarking with this, and created an SQLite3 database with a single table and a thousand records in it. I generated a scaffold for this model and removed the pagination from the #list action. I made an alias of the #index action, called #index_with_cache, because I’m creative like this.

I requested each location twice. The first round was, as you’d expect, identical. The second round shows the benefit of the content caching mojo:


Completed in 3.93749 (0  reqs/sec) | Rendering: 3.38100 (85%) | DB: 0.48483 (12%) | 200 OK [http://localhost/monkeys/index]
Completed in 0.01605 (62 reqs/sec) | Rendering: 0.01167 (72%) | DB: 0.00000 ( 0%) | 200 OK [http://localhost/monkeys/index_with_cache]

That’s roughly 245 times as fast!
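The speedup comes straight from the two "Completed in" timings in the log lines. A quick check, using only those two numbers:

```ruby
# "Completed in" timings (seconds) from the two log lines above.
uncached_seconds = 3.93749
cached_seconds   = 0.01605

# How many times longer the uncached request took.
speedup = uncached_seconds / cached_seconds

puts "about #{speedup.round(1)}x"
```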

Then I limited the number of results to 100, to get a more realistic picture:


                     user     system      total        real
without cache   15.360000   0.870000  16.230000 ( 20.844324)
with cache       3.250000   0.280000   3.530000 (  5.775237)

And this is just in development mode, too. Granted, this is a pretty edge case, but if you’ve got a lot of database traffic, caching will speed your app up something fierce. Plenty of people have been putting off adding caching because until now it’s been an all-or-nothing affair.

Have fun, kids!


Rails Environments: a plugin for, well, Rails

This one’s simple. Sometimes I need to know what environment Rails is running in (for example, so I can display additional debugging information, or not link to the HTTPS version of an action), and typing out ENV['RAILS_ENV'] is a hassle and looks ugly. Hence, this plugin.

Installation

Are you down with SVN? Yeah, you know me!


./script/plugin install -x http://svn.codahale.com/rails_environments/trunk

Think source control is for the weak?


./script/plugin install http://svn.codahale.com/rails_environments/trunk

Usage

Rails Environments drops five hot new methods in your lap:


if Rails.production?
  'Production'
elsif Rails.development?
  'Development'
elsif Rails.test?
  'Test'
elsif Rails.none?
  'Oh, You Gotta Be Kidding Me'
else
  "Dude, what the hell: #{Rails.environment}"
end

Not complicated, sure, but it makes my code pretty. Ponies!
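For the curious, predicates like these take very little code. The following is a guess at the general shape, not the plugin's source: `EnvironmentSketch` is a hypothetical name, and treating a missing or empty RAILS_ENV as "none" is my assumption about what `none?` means:

```ruby
# Hypothetical sketch of environment predicates built on ENV['RAILS_ENV'].
module EnvironmentSketch
  class << self
    def environment
      ENV['RAILS_ENV']
    end

    def production?
      environment == 'production'
    end

    def development?
      environment == 'development'
    end

    def test?
      environment == 'test'
    end

    # Assumption: "none" means RAILS_ENV is unset or empty.
    def none?
      environment.nil? || environment.empty?
    end
  end
end

ENV['RAILS_ENV'] = 'development'
puts EnvironmentSketch.development?
puts EnvironmentSketch.production?
```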
