codahale.com٭blog

This is my old blog. My current writing is here: codahale.com

Alive! It’s alive! It’s ALIVE!

Moo hoo ha ha!

The web site I’ve been working on for the past three and a half months is done. Dig it:

GPSgeek

Only the finest in GPS antennas, cables, cases, and other accessories.

Send me an email if you see anything wrong, eh?

WOO!


Google Analytics: The goggles, they do nothing!

Google Analytics is in need of some serious help. Its information visualization is profoundly broken, and that gets in the way of me trying to figure out what people are doing with my website. Here’s what’s wrong with it, and how it can be fixed.

What’s Broken: Visualizing whirled peas

The visualization of quantitative information like website usage statistics has a single purpose: it helps us to make decisions about the processes which generated the data. (Information visualization can also have an aesthetic component, it’s true, but when it comes to business, form follows function.) Infovis allows us to utilize the pattern-detection capabilities of our visual cortex to find meaning in abstract datasets. Given this meaning, we can then make data-driven decisions, which tend to have much better results than stabs in the dark, guesses, or Ouija boards.

Any evaluation of an infovis package like Analytics must be driven by a single, all-important criterion–how well does this help us make decisions? Let’s take a look at how well Analytics helps the average webmaster make decisions about her site.

(Let me be clear that I like Google, and I think Analytics is a very promising application. I am brutal in this post in the hopes that someone at Google will read this and put some effort into making Analytics better, through my suggestions or otherwise.)

But first, a rant about Flash

I cannot talk about Analytics without first whinging about Flash. Macromedia hasn’t released a version of Flash for Linux in years (current version–7), and there are no decent Flash viewers available for x86_64, meaning the admins who have nice big Athlon64/Xeon/Pentium D workstations running Linux/BSD installations actually built for that architecture have to install a 32-bit copy of their browser (using linux32) so they can see how the websites they’re tasked with maintaining or developing are doing.

Sure, Flash is nearly ubiquitous for most web users, but Analytics users aren’t going to be average web users. For the purposes of displaying charts and graphs, it’s not like there aren’t good alternatives to Flash. SVG support is native in Firefox and Opera, is enabled by default in Safari, and can be installed via plugin in Internet Explorer. Or go all-native with JavaScript using something like PlotKit. Flash is useful for games, extremely interactive applications, and alienating users. If you must use Flash, use it only when developing an SVG/JavaScript/AJAX solution would obviously not work.

For God’s sake, even rendering the data as PNGs would be preferable.

Piecharts are for middle management

Piecharts are the information visualization equivalent of a roofing hammer to the frontal lobe. They have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face. They are as professional as a pair of assless chaps. Anyone who suggests their use should be instinctively slapped.

I am not the only person who feels this way; Edward Tufte comes to mind:

Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only thing worse than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies… Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.*

– The Visual Display of Quantitative Information, pg. 178

* The footnote for this references Jacques Bertin, Graphics and Graphic Information Processing, who says multiple pie charts are “completely useless.”

Evil Genius Marketing puts the final stake through the heart of the piechart:

A pie chart from Evil Genius Marketing showing East, West, and North at 31%, 33%, and 35%, respectively.

Rank these three numbers. EGM and I dare you.

And yet… Analytics is stuffed with piecharts.

A horrifically useless piechart from Google Analytics, displaying the difference between new and returning visitors.

This piechart uses 78,050 pixels to display a single fact–that 9.94% of all visitors had previously visited the site–resulting in a spectacular data-point-to-pixel ratio of 0.0013%. Using this methodology, Analytics could use the entirety of my monitor to display a meager 14 data points. What an astoundingly compact overview of my website.
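For anyone who wants to check my math, here is a back-of-the-envelope sketch in Python. The two input figures come from the paragraph above; the function and variable names are made up for illustration.

```python
def data_point_to_pixel_ratio(data_points: int, pixels: int) -> float:
    """Return data points per pixel, expressed as a percentage."""
    return 100.0 * data_points / pixels


PIE_CHART_PIXELS = 78_050   # screen area of the piechart, as quoted above
FULL_SCREEN_POINTS = 14     # data points per full monitor, as quoted above

# One fact (the share of returning visitors) spread over the whole chart.
ratio = data_point_to_pixel_ratio(1, PIE_CHART_PIXELS)

# Screen area implied by fitting 14 such charts on one monitor.
implied_screen_pixels = FULL_SCREEN_POINTS * PIE_CHART_PIXELS

print(f"{ratio:.4f}%")                       # 0.0013%
print(f"{implied_screen_pixels:,} pixels")   # 1,092,700 pixels
```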

If Analytics were a 10-foot interface, this would be excusable. Given the fact that it uses 10pt fonts, this is horrific.

Trading in laughs for charts and graphs

Most of the charts in Analytics use three-dimensional shapes or have an oddly three-dimensional grid behind them. This is bad, since the extra dimension does not convey any additional information. Instead, it requires the viewer to discard elements of the visualization in order to accurately compare data. These PowerPoint-esque graphics make us all stupider for having seen them.

An ugly bar graph from Google Analytics, with a heavy grid and uselessly three-dimensional bars.

That high-pitched whine in your ears is your IQ dropping a notch.

Compare the faux-3D look of Analytics to the sparse utilitarianism of The Economist’s graphs:

A clean, sparse graph from The Economist, showing the decline of participation in trade unions over time in Britain, Germany, France, and the US.

Notice the subtle grid, the use of (but not dependence on) colors, the fact that neither axis changes its increments to suit the dataset, the lack of a legend (since the data is explained in context, and requires no memorization of arbitrary labeling schemes), and the high ratio of data to ink.

Also compare Analytics to Kyle Rove’s Fresh View, a plugin for Mint, Shaun Inman’s $30 web analysis package:

A graph from Fresh View for Mint, showing the number of unique and returning visitors over the past week.

Some guy in med school vs. The Biggest Search Engine Company Evar, and it’s the guy with no free time who manages to make a better way of displaying visitors over time. The only improvement I could suggest for Kyle’s plugin would be to move the labels out of the legend and put them in the graph itself, right next to the data they describe.

Now what?

Okay, so Analytics looks like crap. Here’s how to fix it!

Unbreaking Analytics: So crazy it just might work

1. Piecharts are defeat in circular form.

First, no piecharts. At all. Period. As I’ve said, anything with a piechart simply reeks of bozocity, and we’re trying to act like grownups.

2. Comparison is your raison d’être.

Second, compare things. The essence of information visualization is comparison: how many visitors do I have compared to how many visitors I have had? What is the most common browser viewport width? How are my AdWords campaigns doing in terms of money in my pocket? Don’t just tell me what things are like right now–I can find that out by looking at the right side of the damn graph–tell me what things are like right now compared to the way they’ve been.

3. Simplify, simplify, simplify.

Third, if it’s a simple dataset, boil it down to the essentials. If there are two numbers which add up to 100%, you don’t need to tell me both, and you certainly don’t need to draw me a picture of it. I, like most people, have a pretty good grasp of how close 9.94% is to 100%. If you can’t reduce it to a single number, use a simple table. If you feel like making things fancy, add the browser icons to the table. When you start to deal with complicated data, like browser share over time, use a graph. Your rule of thumb should be “Only make a graph if the table doesn’t make sense, and only use a table if a single fact doesn’t explain it all.” Define and stick to a methodology which specifies a minimum data-ink ratio. Give Edward Tufte fistfuls of money to come in and slap the Analytics dev team with copies of his books.
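The rule of thumb in point 3 is simple enough to write down as code. This is only a rough sketch in Python: the function name and, above all, the `max_table_cells` cutoff are my own assumptions about where “the table doesn’t make sense” kicks in, not numbers from any real methodology.

```python
def choose_presentation(n_values: int,
                        sums_to_whole: bool = False,
                        max_table_cells: int = 30) -> str:
    """Pick the simplest presentation that can carry the data.

    n_values        -- how many numbers the dataset contains
    sums_to_whole   -- True if the values are shares adding up to 100%,
                       so the last one is implied by the others
    max_table_cells -- assumed cutoff past which a table stops being readable
    """
    if n_values < 1:
        raise ValueError("need at least one value")

    # Shares of a whole carry one redundant number: 9.94% returning
    # visitors already tells you 90.06% are new.
    independent = n_values - 1 if sums_to_whole else n_values

    if independent <= 1:
        return "single number"
    if independent <= max_table_cells:
        return "table"
    return "graph"


# New vs. returning visitors: two shares of 100%.
print(choose_presentation(2, sums_to_whole=True))   # single number
# Current share for six browsers.
print(choose_presentation(6, sums_to_whole=True))   # table
# Six browsers tracked weekly for a year.
print(choose_presentation(6 * 52))                  # graph
```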

4. Take the geeks off the leash.

Fourth, diversify the vis team. “Yes, well, it uses a reporting engine written by a blind misanthrope in the early 80s” does not adequately explain why a hugely important piece of software run by a hugely important internet company looks like PowerPoint and USA Today got drunk and made a website. C’mon, Google! Where are the bright people who solve problems for fun? Where are the boxless thinkers? Where are the supernerds? I want to see treemaps (square, circular, or otherwise) of my website’s contents vs. usage patterns, or hyperbolic trees or Circos circular graphs or linkmaps or text visualization of search keywords vs. density or massive timelines or sparklines or keyword heatmaps or search engine ranking comparisons or… you get the picture. There are hundreds of wonderful, innovative ways of analyzing website usage, and Analytics is mired in the infovis Stone Age of Bar Graph, Line Graph, Pie Chart, Table.

Get ten of your best, most wild-eyed infovis people, feed them their choice of high-dosage caffeine or DMT, and give them two weeks to come up with a proof-of-concept vis mode for Analytics, either static or interactive. Marvel at the results.

5. Make sure your product helps real people make good decisions.

Once you’re done marveling, feed each infovis various sets of data, and bounce the results off a bunch of average webmasters, who will be shown their broken-headed creations and asked specific questions about the actual shape of the data. Whoever wrote the vis which helps the lab rats get the most right answers gets two thousand dollars and another belt of DMT. Rinse. Repeat.

Remember, that’s why you’re building this–to help people make decisions.

A shortcut to this process

Google is not constitutionally incapable of making beautiful information visualizations; on the contrary, Google Finance is one of the finest instances of infovis I’ve seen. Behold:
Google Finance

Find out where the Google Finance crew is hiding. Get the people who made financial markets visually intelligible and set them to work on Analytics.

A final word

The purpose of data visualization is to help people make decisions by allowing them to process data using their visual cortex in all its pattern-recognizing glory. The purpose of Google Analytics is to record and present aggregate statistics about the composition and usage of your websites, and thus allow you to make data-driven decisions regarding your website.

Please, Google: fix Analytics. Now. It’s too important to let it linger, Google News-like, in despair and neglect for a few years.


37 signals, and nothing’s on

No wonder they disabled comments on 37signals’ blog Signal vs. Noise. For “a design and usability blog,” their content sure has been… uh… noisy, lately.

Some highlights:

Maybe the transition from a consulting company to a branded services company is a rough one. I wouldn’t be surprised. Thanks for Rails, guys. Now go work on Getting Real, or Finding Where The Beef Is, or Establishing What Willis Is Talking About, or some other catchphrase which indicates not only your proprietary relationship with reality but also your affinity for incorporating elements from 80s slang into Edgy Business Lingo.

Hmm… maybe it’s time to start floating that business book treatment I’ve been working on–Partying On: Constructing Bodacious Brands With Gerundial Phrases.

Until the interns stop posting LiveJournal-worthy material, SvN gets the boot from the blogroll. Buhweeted!


Bad Design Kills: Sweet mother of God, Flash-only?

What the hell? Bad Design Kills is a pompous site in which graphic designers, angling for some work, talk about how important design is.

True, but…

WHY THE HELL DID YOU MAKE THE SITE FLASH-ONLY!?!?!?!

Why would you do that?

Why?

Aiyah…


Making social networking software relevant: A Napkin Plan

Charlie over at This is going to be BIG! has very smart readers, one of whom–Gabe Morris–points out the flaws in existing social networking software. I’ve got some great ideas on how to fix this stuff, too.

How to Build a Social Network:

LinkedIn annoys people to the extent that it connects you without relevance. The basis for LinkedIn and Friendster’s automatic relevance is degrees of separation. But this has weaknesses – there are second degree contacts who I have very little in common with, while I am sure there are hundreds of people in the sixth degree and beyond that I would have plenty in common with.

Right now on Friendster I’ve got maybe 100 friends–the product of a drunken summer ’03–very few of whom have anything in common. Some are just people I added not to be rude, some are bosom buddies, and some are acquaintances. Neither Friendster nor any other social networking software, to my knowledge, takes this into consideration. Instead, it lumps them all into the category “My Friends,” and recommends random selections from this hodge-podge of people to others as people they should get to know. How helpful. To solve this, a social networking site needs to provide people with the tools to quickly and easily describe their social networks. Easiest way to do this? Tags!

Here’s what I’m thinking. Each user gets to describe their contacts using tags, preferably tags which describe that contact’s relation to them. To seed this process, the software could have a set of recommended defaults: co-worker, ex, drinking-buddy, boss, annoying, douchebag, crush, meh, etc., etc. These tags need to be private, because otherwise it’s a public opinion, which limits the usefulness of the data. How many people want their boss to know that they tagged him with both “boss” and “douchebag?” You only get to see your own tags.

This would provide a better dataset to evaluate relevance: the software would recommend contacts which share the same tags as you. If you’re a bicycle nut, you get potential riding buddies; if you’re into radical feminism, you get hooked up with other femsexies; if you’re a douchebag, it’ll hook you up with all the other jerks. This shouldn’t be a deterministic process, however, otherwise it would limit recommendations to particular cliques. Weighting is essential, and I’m sure there’s some maths post-doc all full of coffee with a few ideas about how to tease further correlations out of this dataset.
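To make the napkin a little less napkin-y, here is one way the matching could go, sketched in Python. Everything in it is an assumption on my part: the sample data, the function names, the idea of weighting rare tags more heavily, and the small `floor` that lets people outside your clique show up now and then. The maths post-doc can surely do better.

```python
import math
import random
from collections import Counter

# Hypothetical data: the tags each person has *received* from their contacts.
# Only the software sees this; users only ever see the tags they applied.
received_tags = {
    "alice": {"cyclist", "co-worker"},
    "bob": {"cyclist", "drinking-buddy"},
    "carol": {"co-worker", "boss"},
    "dave": {"drinking-buddy"},
}


def tag_weights(profiles):
    """Weight each tag by rarity: tags few people carry count for more."""
    counts = Counter(tag for tags in profiles.values() for tag in tags)
    total = len(profiles)
    return {tag: math.log(1 + total / n) for tag, n in counts.items()}


def relevance(user, other, profiles, weights):
    """Sum the weights of the tags two people have in common."""
    shared = profiles[user] & profiles[other]
    return sum(weights[tag] for tag in shared)


def recommend(user, profiles, k=2, floor=0.1, rng=random):
    """Draw up to k people at random, favoring higher relevance.

    The draw is weighted rather than a strict top-k, and `floor` gives
    everyone a small chance, so results are not locked to one clique.
    """
    weights = tag_weights(profiles)
    candidates = {
        other: relevance(user, other, profiles, weights) + floor
        for other in profiles
        if other != user
    }
    picks = []
    while candidates and len(picks) < k:
        names = list(candidates)
        odds = [candidates[name] for name in names]
        choice = rng.choices(names, weights=odds)[0]
        picks.append(choice)
        del candidates[choice]  # sample without replacement
    return picks


print(recommend("alice", received_tags))
```

A real version would also skip people who are already your contacts, which this sketch does not track.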

So who wants to do this?
