Are these the rocks your stakeholders need? Image courtesy of Alicia Dudek

The Parable of the Rock Project: Why Data Scientists Need Ethnographic Skills

Special thanks to Richard Beckwith, who first warned me about Rock Projects.

“Bring me a rock,” the stakeholder says. “I’ve spent a lot of money on this great big mountain, and I need you to bring me a rock from this mountain that makes the expense worthwhile.”

“Okay, a rock. Gotcha,” the analyst says. “A rock I can do. I am all about rocks. What kind of rock are you looking for?”

“Oh, I don’t know,” the stakeholder responds. “A rock. A rock that will change the way we see things. An Important Rock.”

“Important rock. Check. Coming right up,” the analyst agrees.

The analyst goes off to the mountain. She hammers at it and bashes it with a pickax and washes away dust and gravel until finally she finds it–an Important Rock. Delighted, she picks up the Important Rock and races down the mountain to show the stakeholder.

“I’m back!” the analyst cries. “I’ve returned from the mountain with an Important Rock, just like you asked for!”

The stakeholder takes the rock and inspects it.

“We can do a lot of great and interesting things with this Important Rock,” the analyst continues. “Don’t you think it’s a good one?”

“Well,” the stakeholder turns the rock over in her hands. “I’m not sure. It’s just… It’s not quite what I have in mind. It may be that this is an Important Rock, but I think I would like a different rock. Could you go back to the mountain and bring me another one?”

“Of course, I can do that right away,” the analyst agrees, eager to please. “Could you tell me a bit more about the rock you want? Are we talking a sedimentary rock? Metamorphic? Maybe a crystal?”

“Oh, you know, I’m not exactly sure. But I know you’ll find it. I bought this whole mountain, after all.”

The analyst packs up her tools and returns to the mountain. She hammers and she chisels and she blows up some boulders until finally she finds another Important Rock. It is different from the first Important Rock, and she thinks it’s even better! Yes, this is The Rock that the stakeholder wants. Off she goes, back down the mountain to show the stakeholder.

“Check out this Important Rock!” the analyst crows. “It’s significantly more awesome than the first Important Rock. Look at all of its great rock-like features!”

“That is a very good rock,” the stakeholder agrees. “Under other circumstances, it might even be The Rock that I need. But things being what they are, I’m not sure this is the rock for me. Did you find any other good rocks while you were up there? Would you mind going back to check?”

“Well, this rock is a Very Good Rock,” the analyst says, a bit crestfallen. “I’m not sure that the mountain is going to produce many more rocks this good. But, I’ll go back up and take a look.”

The analyst packs up her tools and returns to the mountain. By now, she’s familiar with the mountain’s idiosyncrasies, so she puts in some major infrastructure. She digs a mine, deep into the mountain. She hammers away at the heart of the mountain, her headlamp the only light illuminating potential rocks as she inspects them. Not that rock, not that one, not that one either… She has just about given up hope when the circle of light from her headlamp passes over it–the Perfect Rock. It’s so far superior to the rocks she found previously that she’s a little embarrassed to have brought those rocks to the stakeholder. This rock–it’s a Truly Great Rock.

Down from the mountain she races, cradling The Perfect Rock in her outstretched palms.

“Stakeholder!” the analyst shouts, exuberant. “Have I got a Truly Great Rock for you, or what! Check out its ultra-fine rock qualities! It is so much better than the previous rocks–it’s practically a rock star!”

“Wow, that is a nice rock,” the stakeholder says. “But, well, I’m still not sure that this is the rock I had in mind. This mountain was really an investment. I just want to make sure we’re making the most of it. Perhaps try the other side of the mountain. I heard that other people were having good luck finding rocks on that side of the mountain range.”

And so the analyst packs up her tools and returns to the mountain. This cycle continues, with the analyst bringing new Important Rocks to the stakeholder only to see the rocks rejected, until the analyst snaps and pelts the stakeholder with rocks or the stakeholder sells the mountain to rock prospectors for a loss.

Preventing Rock Projects

No one wants to be involved with a Rock Project. The analyst is frustrated, the stakeholder is disappointed, and no one is getting what they want from the collaboration. So how can we prevent Rock Projects?

Rock Projects happen when we try to extract value from data without defining what value we’re trying to extract. Too often, data scientists expect stakeholders to provide them with the questions (rock descriptions) to drive the search for insights (Important Rocks). While it’s great when stakeholders start projects with specific outcomes in mind, often they don’t know what is even possible to accomplish with data–nor should they be expected to. It is our responsibility, as data scientists, to identify their needs and let those needs drive the analysis.

Ethnography, the practice of understanding people in their own contexts (usually through interviews and observation), excels in the identification of needs. Understanding your stakeholders in their context will help you home in on the kinds of insights that matter to them, which will save you a whole lot of running up and down the mountain. Ask your stakeholders questions, not just about the data, but about the organization surrounding the data. Listen deeply and learn both their goals and their concerns about the project. Observe their current processes, looking for opportunities to incorporate data-driven design.

As data scientists, we’re always honing our technical tool kit, eager to dive into data. However, investing the time to build your ethnographic skills and deeply understand your stakeholders in their context will pay dividends in avoiding Rock Projects.

Looking to build your ethnographic skills? I highly recommend Steve Portigal’s book Interviewing Users.

Unstructured data is like a pile of silverware.

Structure: A Better Way of Thinking about Data

There are many different ways to classify data, but one classification that I hear frequently is “quantitative” versus “qualitative”. This can be a useful classification, at least in the context of determining what statistical analyses are appropriate for specific variables in the data. However, those classifications are being applied more and more broadly, as shorthand for other attributes of datasets. Quantitative is often used to mean “data collected by computers”, and is assumed to be consistent, objective, and reductive; qualitative is often used to mean “data collected by humans” and is assumed to be inconsistent, subjective, and rich.

This shorthand is sloppy at best; at worst, it is misleading, inaccurate, and obscures actual information about the data that would help a listener understand what analyses are appropriate for the dataset.

Fortunately, there is an alternative. Classifying data by its structure avoids potentially false implications about the data and gives a listener good information about what analysis methods may be appropriate for that data.

What is structure?

Structure is a consistent underlying organization. This consistent organization is the quality that makes it easier to search, transform, and analyze structured data. Unstructured data has no consistent underlying organization, which makes it more difficult to search, transform, and analyze.

Unstructured data is like a pile of silverware. Accessing particular kinds of silverware in the pile requires inspecting all the silverware.

Unstructured data is like a pile of silverware at a flea market. If I asked you to pull all of the salad forks out of the pile of silverware, it would take a while. You would need to pick up each piece of silverware, determine whether or not it was a salad fork, and place the salad forks in a separate pile.

Structured data is like silverware in an organizer. Accessing a particular kind of silverware is as straightforward as reaching into a cubby.

Structured data is like silverware in a silverware organizer. If I asked you to pull all the salad forks out of a silverware organizer, you would only need to reach into the salad fork cubby and pull out the stack of salad forks. If I asked you to pull out a particular spoon, you would only need to search through the teaspoon cubby, which contains only a small percentage of the silverware items in the drawer. With the unstructured pile of silverware, finding a particular spoon would require inspecting all of the silverware in the pile individually.
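To make the metaphor concrete, here is a minimal sketch in Python. The pile contents and the names `pile`, `build_organizer`, and `salad_forks_from_pile` are made up for illustration; the point is only that the pile has to be scanned item by item, while the organizer lets you reach straight into one cubby.

```python
from collections import defaultdict

# Hypothetical flea-market pile: each item is a (kind, pattern) pair
# in no particular order.
pile = [
    ("teaspoon", "floral"),
    ("salad fork", "plain"),
    ("dinner knife", "plain"),
    ("salad fork", "floral"),
    ("teaspoon", "plain"),
]


def salad_forks_from_pile(items):
    """Unstructured access: pick up and inspect every item in the pile."""
    return [item for item in items if item[0] == "salad fork"]


def build_organizer(items):
    """Sort the pile into cubbies keyed by kind of silverware."""
    organizer = defaultdict(list)
    for kind, pattern in items:
        organizer[kind].append((kind, pattern))
    return organizer


organizer = build_organizer(pile)

# Structured access: reach into the salad fork cubby.
forks = organizer["salad fork"]

# Finding one particular spoon only searches the teaspoon cubby,
# not the whole drawer.
floral_spoon = next(s for s in organizer["teaspoon"] if s[1] == "floral")
```

Building the organizer still requires touching every item once. The payoff comes afterward, when every later lookup skips the rest of the drawer.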

Most of the data we interact with on the Internet exists somewhere in between “highly structured” and “totally unstructured”. Images, status updates, books, web pages, videos, and countless other types of data have some consistent underlying organization. A file format, like JPEG or bitmap, is a consistent organization that a computer uses to recognize and display the data present in a file. Books have titles, authors, and pages of content–that’s all structure. However, the bulk of the data in those items is unstructured.

Generally speaking, computational analysis methods require structured data. It’s easy for your computer to order your images by creation date, because that information is included in an image’s structured data. It’s very difficult for your computer to identify all the images containing birds, because that part of the image information is not structured.
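As a rough illustration, assume each image is represented by a small record with a creation date (structured) alongside its raw content (unstructured). The `Image` class and the sample files below are hypothetical, not a real image library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Image:
    filename: str
    created: datetime  # structured: every image has this field
    pixels: bytes      # unstructured: opaque content (placeholder bytes here)


# Hypothetical image collection.
images = [
    Image("beach.jpg", datetime(2015, 7, 4), b"..."),
    Image("heron.jpg", datetime(2014, 3, 9), b"..."),
    Image("desk.jpg", datetime(2016, 1, 2), b"..."),
]

# Ordering by creation date only needs the structured field.
by_date = sorted(images, key=lambda img: img.created)
names_in_order = [img.filename for img in by_date]
```

Nothing comparable works for "contains a bird": that information lives somewhere inside `pixels`, and no field exposes it.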

Unstructured data can be transformed into structured data. The process can be labor intensive and often requires human intervention, but depending on your analysis needs, it may be worthwhile. Image tagging is a good example of adding useful structure to unstructured data. For instance, if you went through your image collection and added tags to each indicating what was in the images, it would then be straightforward for your computer to identify all the images containing birds. When structuring data, it is useful to retain the original, unstructured version. This is important not only because data may be lost in transformation, but also because the structure that is appropriate for one type of analysis may not be appropriate for another.
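The tagging example might look something like the sketch below. The filenames, tags, and the `images_tagged` helper are invented for illustration, and the tags are assumed to have been added by a person looking at each image. The originals are kept in their own mapping so that the added structure never replaces them.

```python
# Hypothetical originals: filename -> raw, unstructured image bytes.
originals = {
    "heron.jpg": b"...",
    "beach.jpg": b"...",
    "desk.jpg": b"...",
}

# Structure added by hand: filename -> set of tags describing the image.
tags = {
    "heron.jpg": {"bird", "water"},
    "beach.jpg": {"bird", "sand", "water"},
    "desk.jpg": {"laptop"},
}


def images_tagged(tag, tag_index):
    """Return the filenames whose tag set includes the given tag."""
    return sorted(name for name, found in tag_index.items() if tag in found)


bird_images = images_tagged("bird", tags)
```

If a later analysis needs a different structure, say dominant colors instead of subjects, a second index can be built from the untouched `originals`.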

Ready to start classifying data by its structure? Here’s a quick-reference for your future data-discussing pleasure!


Shoutout to my former classmate Jason Foss, who graciously provided the silverware organizer metaphor for structured data. Thanks, Jason!