Wednesday, March 05, 2014

A Simple A/B Test for Visitor Talkback Stations

Let's say you create a station where visitors contribute content. You want their stories, their feedback, their colorful drawings of the future.

How do you measure success?

We've started using a very simple measure: the number of people who actually respond to the prompt. We look at the visitor contributions, and we code them either as responding to the prompt or doing something unrelated. Answer the question, and you're in. Make a scribble, and you're not. That's it.
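If it helps to see the measure concretely, here is a minimal sketch in Python of that coding step. The station names and True/False codes below are made up for illustration; they are not our actual data.

```python
# Each contribution is coded True if it responds to the prompt and False if it
# doesn't. A station's score is simply the share of coded contributions that
# are on-prompt. Codes shown here are hypothetical placeholders.
coded = {
    "cocktail napkins": [True, False, True, True, False],
    "rear view mirrors": [False, True, False],
}

for station, codes in coded.items():
    rate = sum(codes) / len(codes)  # fraction of on-prompt responses
    print(f"{station}: {len(codes)} responses, {rate:.0%} on-prompt")
```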

Obviously, this does not give us the holy grail of success for a visitor talkback station. Each talkback is different. Sometimes success means deep, personal stories; other times, we value speculative argumentation or creative expression. Sometimes it means a large volume of responses; other times, we are looking for people with specific expertise to respond.

But in all cases, we want people to respond "appropriately"--whatever appropriate might mean for a given talkback.

The measure of whether people respond to the prompt appropriately is really a measure of us, not them. It measures whether the design of the talkback is sufficiently clear and compelling. This is especially useful in exhibitions or areas with multiple talkbacks; it allows us to make A/B comparisons across talkbacks and learn which of our designs worked best, since the same pool of visitors presumably encounters each one.

Consider three very different talkbacks in the Santa Cruz Museum of Art & History's fall exhibition, Santa Cruz is in the Heart: cocktail napkins, rear view mirrors, and refrigerator certificates.

Here's how each one worked.
  • The cocktail napkins were in an area about the demise of a beloved dive bar in Santa Cruz. We invited people to sidle up to a bar and use a napkin to scrawl an answer to the question "How do you deal with loss?" This was the most popular talkback, with 541 responses in the three months of the exhibition.
  • The fridge was in an area about unsung heroes in our community. We invited people to sit down at a modified kitchen table and make a certificate of accomplishment for someone they felt deserved to be honored. These certificates were less than half as popular as the napkins, with 221 completed. They took a while to make, though--this was definitely the longest talkback activity. 
  • The rear view mirrors were mounted on the wall next to a story in a simulated car about looking back and seeing the past differently from an adult perspective. We offered people markers and invited them to write directly on the mirrors to complete the sentence "I look back and remember..." This was the least used talkback, with 120 responses. It wasn't easy to write much with a marker on the mirrors, and you had to be creative to come up with a response in just a couple of words.
Here's the data on how people responded to the prompts (with thanks to Brandt Courtway, intern extraordinaire):
  • Cocktail napkins: 541 responses, 51% appropriate
  • Rear view mirrors: 120 responses, 52% appropriate
  • Fridge certificates: 221 responses, 72% appropriate
Clearly, the fridge was the big winner. While it was not the most-used talkback, it was the one where people were most likely to actually do what we asked of them.
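If you want to check that a gap like this is bigger than what chance alone would produce, a standard two-proportion z-test is enough. Here is a minimal sketch in Python using only the standard library; note that the "appropriate" counts are assumptions reconstructed from the published percentages (roughly 72% of 221 ≈ 159 and 51% of 541 ≈ 276), not the museum's raw tallies.

```python
# Rough check of whether the fridge's higher "appropriate" rate could be noise,
# via a two-proportion z-test. Counts below are hypothetical reconstructions
# from the published percentages, not the actual coded data.
from math import erf, sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error of the difference
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return z, p_value

# Fridge certificates vs. cocktail napkins (approximate counts)
z, p = two_proportion_z_test(159, 221, 276, 541)
print(f"z = {z:.2f}, p = {p:.1e}")
```

Under these assumed counts the difference comes out around z ≈ 5.3, far larger than sampling noise would produce with response volumes this size, which supports reading the fridge as a genuine winner rather than a fluke.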

This information surprised us. We used the data to interrogate what was unique about the design of the fridge talkback: the fact that it required a longer time commitment, that it had more involved setup and design, that the prompt was in the form of a "fill in the blank" instead of a question, and that the content was positive/uplifting (as opposed to the others, which focused more on nostalgia and sadness).

We consider this a good measurement because it is easy to collect the data, the result is non-obvious, and the result is useful in helping us improve our design techniques. A good measurement doesn't need to exhaustively answer every single question about a project. It just needs to provide information you can actually use to do better.

I'm curious what "single measure" tests you are using to compare projects and improve your practice. What simple number has changed your work?

Also, a side note. We asked Brandt to also count any responses that were "aggressive"--swear words, violent language, and so on. Total number across all three talkbacks: 0.