A Simple Color Change Would Improve TriMet Accessibility

Selecting appropriate colors is one of the most challenging parts of information design. Color is a great, engaging way to encode information–but poorly chosen colors can make a design difficult or even impossible to use. This is especially true for the approximately 8% of men and 1% of women in your audience who are colorblind. We don’t want our designs to alienate people, so how do we choose colors with an eye for accessibility?

A first step is to be aware that red/green colorblindness (deuteranopia) is the most common form of colorblindness; people who have it struggle to distinguish red from green. While the cultural symbolism of stop lights makes red and green a tempting pairing for encoding information, it is a poor choice and should be avoided.

There are also tools available to help you evaluate how well your designs hold up for people with colorblindness. My favorite is Color Oracle, a cross-platform tool that simulates common forms of colorblindness. Test your already-released designs to see whether they need accessibility improvements, then use it regularly throughout design and development so that you don’t find yourself with a disappointing surprise right before release. Simple color changes can make a big difference for nearly 10% of your audience!
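Color Oracle simulates your whole screen, but you can also spot-check individual palette colors in code. Below is a minimal Python sketch using the published deuteranopia matrix from Machado et al. (2009). Note the assumptions: the matrix is defined for linear RGB, so applying it directly to sRGB values as we do here is only a rough approximation, and the two colors are stand-ins rather than TriMet's official palette.

```python
# Spot-check how distinguishable two palette colors remain under a rough
# deuteranopia simulation (Machado et al. 2009, severity 1.0 matrix).
# Caveat: the matrix is defined for linear RGB; applying it to sRGB
# values, as here, is only a quick approximation.
DEUTERANOPIA = [
    [ 0.367322, 0.860646, -0.227968],
    [ 0.280085, 0.672501,  0.047413],
    [-0.011820, 0.042940,  0.968881],
]

def simulate(rgb):
    """Apply the simulation matrix to an (r, g, b) tuple in 0..255."""
    return tuple(
        min(255, max(0, round(sum(m * c for m, c in zip(row, rgb)))))
        for row in DEUTERANOPIA
    )

def distance(a, b):
    """Euclidean distance in RGB space: a crude proxy for distinguishability."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

red, green = (214, 37, 37), (0, 138, 60)  # stand-ins, not TriMet's colors
print(round(distance(red, green)))                      # ~238: easy to tell apart
print(round(distance(simulate(red), simulate(green))))  # ~31: nearly identical
```

A pair of colors that sits comfortably far apart for typical vision can collapse to near-identical under simulation, which is exactly the failure the map sections below illustrate.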

A map of Portland’s TriMet train system, as it would appear to someone without colorblindness.

Portland’s MAX system provides a good example of how a simple color change could improve accessibility for riders with colorblindness. For the most part, TriMet has done a good job selecting colors–most of the map does not require distinguishing between red and green in order to navigate the transit system. However, there is one problem area: a stretch of NE Portland where the Red and Green MAX lines run parallel to each other, sharing all the same stops.

This image shows the section of the TriMet rail lines that may be problematic for colorblind people to navigate, because the Red and Green lines run parallel to each other and share all the same stops.

This is a Color Oracle simulation of how the problematic section of the TriMet rail map would appear to someone with deuteranopia, the most common form of colorblindness.

Under ideal lighting conditions, a person with deuteranopia might be able to distinguish between the Red and Green lines on the rail map by color, but conditions are rarely ideal. Further, the rail map is not the only place this problem appears. Consider the sign at the Convention Center MAX station.

A sign from the Convention Center MAX station showing the Blue, Green, and Red eastbound lines.

A Color Oracle simulation of how the Convention Center MAX sign appears to people with deuteranopia. It is very difficult to distinguish between the Red and Green line symbols on the sign. At night, in poor lighting conditions, it may be impossible.

Now that we’ve established that the current rail line coloring creates accessibility problems, we can make design changes to improve the system. In this case, there’s a straightforward solution: swap the colors of the Yellow and Green lines.

This image shows a proposed change to the TriMet Rail System, in which the Yellow line and Green line colors are swapped. This solution works because the Green-Line-Formerly-Known-As-Yellow shares no stops with the Red Line.

This image shows the Color Oracle simulation of how the TriMet rail map would appear to a person with deuteranopia after the Yellow line and Green line colors are swapped. Note that it is dramatically easier to distinguish between the Red Line and the Yellow-Line-Formerly-Known-As-Green in the area where they share stops.

Practically speaking, swapping the colors of the Yellow and Green lines would not be free: signs and maps would need to be updated, and a publicity campaign would need to be launched. However, in pursuit of a public transit system that is easy for all riders to navigate, it is money worth spending. Problems like this can be avoided in the future by carefully considering, throughout the design process, how color choices impact all users.

The Parable of the Rock Project: Why Data Scientists Need Ethnographic Skills

Special thanks to Richard Beckwith, who first warned me about Rock Projects.

Are these the rocks your stakeholders need? Image courtesy of Alicia Dudek

“Bring me a rock,” the stakeholder says. “I’ve spent a lot of money on this great big mountain, and I need you to bring me a rock from this mountain that makes the expense worthwhile.”

“Okay, a rock. Gotcha,” the analyst says. “A rock I can do. I am all about rocks. What kind of rock are you looking for?”

“Oh, I don’t know,” the stakeholder responds. “A rock. A rock that will change the way we see things. An Important Rock.”

“Important rock. Check. Coming right up,” the analyst agrees.

The analyst goes off to the mountain. She hammers at it and bashes it with a pickax and washes away dust and gravel until finally she finds it–an Important Rock. Delighted, she picks up the Important Rock and races down the mountain to show the stakeholder.

“I’m back!” the analyst cries. “I’ve returned from the mountain with an Important Rock, just like you asked for!”

The stakeholder takes the rock and inspects it.

“We can do a lot of great and interesting things with this Important Rock,” the analyst continues. “Don’t you think it’s a good one?”

“Well,” the stakeholder turns the rock over in her hands. “I’m not sure. It’s just… It’s not quite what I have in mind. It may be that this is an Important Rock, but I think I would like a different rock. Could you go back to the mountain and bring me another one?”

“Of course, I can do that right away,” the analyst agrees, eager to please. “Could you tell me a bit more about the rock you want? Are we talking a sedimentary rock? Metamorphic? Maybe a crystal?”

“Oh, you know, I’m not exactly sure. But I know you’ll find it. I bought this whole mountain, after all.”

The analyst packs up her tools and returns to the mountain. She hammers and she chisels and she blows up some boulders until finally she finds another Important Rock. It is different from the first Important Rock, and she thinks it’s even better! Yes, this is The Rock that the stakeholder wants. Off she goes, back down the mountain to show the stakeholder.

“Check out this Important Rock!” the analyst crows. “It’s significantly more awesome than the first Important Rock. Look at all of its great rock-like features!”

“That is a very good rock,” the stakeholder agrees. “Under other circumstances, it might even be The Rock that I need. But things being what they are, I’m not sure this is the rock for me. Did you find any other good rocks while you were up there? Would you mind going back to check?”

“Well, this rock is a Very Good Rock,” the analyst says, a bit crestfallen. “I’m not sure that the mountain is going to produce many more rocks this good. But, I’ll go back up and take a look.”

The analyst packs up her tools and returns to the mountain. By now, she’s familiar with the mountain’s idiosyncrasies, so she puts in some major infrastructure. She digs a mine, deep into the mountain. She hammers away at the heart of the mountain, her headlamp the only light illuminating potential rocks as she inspects them. Not that rock, not that one, not that one either… She has just about given up hope when the circle of light from her headlamp passes over it–the Perfect Rock. It’s so far superior to the rocks she found previously that she’s a little embarrassed to have brought those earlier rocks to the stakeholder. This rock–it’s a Truly Great Rock.

Down from the mountain she races, cradling the Perfect Rock in her outstretched palms.

“Stakeholder!” the analyst shouts, exuberant. “Have I got a Truly Great Rock for you, or what! Check out its ultra-fine rock qualities! It is so much better than the previous rocks–it’s practically a rock star!”

“Wow, that is a nice rock,” the stakeholder says. “But, well, I’m still not sure that this is the rock I had in mind. This mountain was really an investment, I just want to make sure we’re making the most of it. Perhaps try the other side of the mountain. I heard that other people were having good luck finding rocks on that side of the mountain range.”

And so the analyst packs up her tools and returns to the mountain. This cycle continues, with the analyst bringing new Important Rocks to the stakeholder only to see the rocks rejected, until the analyst snaps and pelts the stakeholder with rocks or the stakeholder sells the mountain to rock prospectors for a loss.

Preventing Rock Projects

No one wants to be involved with a Rock Project. The analyst is frustrated, the stakeholder is disappointed, and no one is getting what they want from the collaboration. So how can we prevent Rock Projects?

Rock Projects happen when we try to extract value from data without defining what value we’re trying to extract. Too often, data scientists expect stakeholders to provide them with the questions (rock descriptions) to drive the search for insights (Important Rocks). While it’s great when stakeholders start projects with specific outcomes in mind, often they don’t know what is even possible to accomplish with data–nor should they be expected to. It is our responsibility, as data scientists, to identify their needs and let those needs drive the analysis.

Ethnography, the practice of understanding people in their own contexts (usually through interviews and observation), excels in the identification of needs. Understanding your stakeholders in their context will help you home in on the kinds of insights that matter to them, which will save you a whole lot of running up and down the mountain. Ask your stakeholders questions, not just about the data, but about the organization surrounding the data. Listen deeply and learn both their goals and their concerns about the project. Observe their current processes, looking for opportunities to incorporate data-driven design.

As data scientists, we’re always honing our technical tool kit, eager to dive into data. Investing the time to build your ethnographic skills and deeply understand your stakeholders in their context will pay dividends, however, in avoiding Rock Projects.

Looking to build your ethnographic skills? I highly recommend Steve Portigal’s book Interviewing Users.

Structure: A Better Way of Thinking about Data

There are many different ways to classify data, but one classification that I hear frequently is “quantitative” versus “qualitative”. This can be a useful classification, at least in the context of determining what statistical analyses are appropriate for specific variables in the data. However, those classifications are being applied more and more broadly, as shorthand for other attributes of datasets. Quantitative is often used to mean “data collected by computers”, and is assumed to be consistent, objective, and reductive; qualitative is often used to mean “data collected by humans” and is assumed to be inconsistent, subjective, and rich.

This shorthand is sloppy at best; at worst, it is misleading and inaccurate, obscuring information about the data that would help a listener understand what analyses are appropriate for the dataset.

Fortunately, there is an alternative. Classifying data by its structure avoids potentially false implications about the data while giving a listener good information about which analysis methods may be appropriate for that data.

What is structure?

Structure is a consistent underlying organization. This consistent organization is the quality that makes it easier to search, transform, and analyze structured data. Unstructured data has no consistent underlying organization, which makes it more difficult to search, transform, and analyze.

Unstructured data is like a pile of silverware. Accessing particular kinds of silverware in the pile requires inspecting all the silverware.

Unstructured data is like a pile of silverware at a flea market. If I asked you to pull all of the salad forks out of the pile of silverware, it would take a while. You would need to pick up each piece of silverware, determine whether or not it was a salad fork, and place the salad forks in a separate pile.

Structured data is like silverware in an organizer. Accessing a particular kind of silverware is as straightforward as reaching into a cubby.

Structured data is like silverware in a silverware organizer. If I asked you to pull all the salad forks out of a silverware organizer, you would only need to reach into the salad fork cubby and pull out the stack of salad forks. If I asked you to pull out a particular spoon, you would only need to search through the teaspoon cubby, which contains only a small percentage of the silverware items in the drawer. With the unstructured pile of silverware, finding a particular spoon would require inspecting all of the silverware in the pile individually.
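In code, that difference is the difference between scanning a flat list and reaching into a keyed container. Here is a minimal sketch, with strings standing in for pieces of silverware:

```python
# Unstructured: a pile. Finding the salad forks means inspecting every piece.
pile = ["teaspoon", "salad fork", "butter knife", "salad fork", "soup spoon"]
salad_forks = [piece for piece in pile if piece == "salad fork"]  # full scan

# Structured: an organizer. Structuring the pile takes work up front...
organizer = {}
for piece in pile:
    organizer.setdefault(piece, []).append(piece)

# ...but afterwards, access is one reach into the right cubby.
salad_forks = organizer["salad fork"]
teaspoons = organizer["teaspoon"]
```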

Most of the data we interact with on the Internet exists somewhere in between “highly structured” and “totally unstructured”. Images, status updates, books, web pages, videos, and countless other types of data have some consistent underlying organization. A file format, like JPEG or bitmap, is a consistent organization that a computer uses to recognize and display the data present in a file. Books have titles, authors, and pages of content–that’s all structure. However, the bulk of the data in those items is unstructured.

Generally speaking, computational analysis methods require structured data. It’s easy for your computer to order your images by creation date, because that information is included in an image’s structured data. It’s very difficult for your computer to identify all the images containing birds, because that part of the image information is not structured.

Unstructured data can be transformed into structured data. The process can be labor intensive and often requires human intervention, but depending on your analysis needs, it may be worthwhile. Image tagging is a good example of adding useful structure to unstructured data. For instance, if you went through your image collection and added tags to each indicating what was in the images, it would then be straightforward for your computer to identify all the images containing birds. When structuring data, it is useful to retain the original, unstructured version. This is important not only because data may be lost in transformation, but also because the structure that is appropriate for one type of analysis may not be appropriate for another.
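As a sketch of what that added structure might look like (the filenames and tags below are hypothetical):

```python
# Hypothetical image records: human-added tags supply just enough structure
# to filter on, while the unstructured pixel data stays in the original file.
images = [
    {"file": "IMG_0001.jpg", "tags": ["bird", "tree"]},
    {"file": "IMG_0002.jpg", "tags": ["beach", "sunset"]},
    {"file": "IMG_0003.jpg", "tags": ["bird", "beach"]},
]

bird_images = [img["file"] for img in images if "bird" in img["tags"]]
print(bird_images)  # ['IMG_0001.jpg', 'IMG_0003.jpg']
```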

Ready to start classifying data by its structure? Here’s a quick-reference for your future data-discussing pleasure!

A quick-reference chart for classifying data by its structure.

 Shoutout to my former classmate Jason Foss, who graciously provided the silverware organizer metaphor for structured data. Thanks, Jason!

Dividing Grilled Cheese: A Metaphor for Centralized and Decentralized Systems

A grilled cheese sandwich on a wooden cutting board, next to a knife. Creative Commons image by Mack Male on Flickr.

A system for dividing a grilled cheese sandwich could be implemented in a centralized or decentralized fashion. Image by Mack Male, used in accordance with the Creative Commons license.

We’ve been working with a lot of decentralized systems here at Akashic Labs recently. But we’ve sometimes struggled to explain them to people who don’t spend their days up to their eyeballs in decentralized system design. Today, we’ve happened upon a metaphor that we’re satisfied with (and not just because it involves cheese).

Let’s say there’s a grilled cheese sandwich that needs to be divided between two hungry kids.

In a centralized system, a babysitter cuts the sandwich in half and distributes a half to each kid. The babysitter acts as an authority, arbitrating the resource allocation between the two kids.

In a decentralized system, the kids play a round of Rock Paper Scissors. The winner of the round of Rock Paper Scissors divides the sandwich; the loser gets to choose which half of the sandwich they want. The system has no central authority; it provides a framework for the users to negotiate use amongst themselves.
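Here is a toy sketch of both schemes in Python, assuming a sandwich whose size we can measure precisely; the function names are ours, not any standard protocol's:

```python
# Centralized: a trusted babysitter divides the sandwich for both kids.
def babysitter_divide(sandwich):
    return sandwich / 2, sandwich / 2   # fairness rests on the babysitter

# Decentralized cut-and-choose: the Rock Paper Scissors winner cuts, and
# the loser picks first. An uneven cut only hurts the cutter, so the
# protocol encourages fairness without any central authority.
def cut_and_choose(sandwich, cut_fraction):
    pieces = (sandwich * cut_fraction, sandwich * (1 - cut_fraction))
    return min(pieces), max(pieces)     # (cutter's piece, chooser's piece)

print(babysitter_divide(1.0))    # (0.5, 0.5), if the babysitter is honest
print(cut_and_choose(1.0, 0.3))  # (0.3, 0.7): the uneven cutter loses out
```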

Both styles of system have advantages and disadvantages. Centralized systems are often more straightforward to design, but rely on the central authority to act in good faith. If our babysitter is actually a witch and our two kids are Hansel and Gretel, for instance, the babysitter could cut the sandwich unevenly and give the larger piece to Hansel with the intention of plumping him up to eat him. Not only is this unfair to Gretel, who ends up with the smaller share of the sandwich, it probably won’t work out well for Hansel, either. This is one of the reasons that transparency is so vital in centralized systems.

Additionally, the authority in a centralized system represents a single point of failure for the system. If the authority is incapacitated, resource allocation must be handled by system users in an ad hoc manner and may break down entirely. We’d hate for the kids to come to blows over the grilled cheese.

Decentralized systems present greater design challenges, but have the significant advantage of robustness. There is no single point of failure; the system continues as long as there are users implementing the system’s protocol.

However, because of the lack of a central authority, it can be difficult to deal with bad actors using the system. In a centralized system, if one kid steals the other kid’s half of the sandwich, the authority can step in and force the thief to return the sandwich. Identifying and punishing bad actors in a decentralized system can be much more challenging. If our decentralized sandwich division protocol is not well-specified, one child could intentionally cut the sandwich into multiple pieces in such a way that it would be difficult for the other child to judge the size of the pieces. Without a centralized authority, there would be no recourse for the choosing child, except to refuse to participate in the system with the bad actor in the future. For this reason, it’s important to design decentralized systems with bad actors in mind.

PDX Design Research Group Talk: Edge Case–Adventures in Hybrid User Research


From the session description:

Big Data is certainly having a moment. Proponents promise that data will serve up answers to all of our most pressing questions. But what happens when all that data leads to a dead end? Rachel Shadoan, CEO and research scientist at Akashic Labs, tells the story of the data dead end that put her on the path from data science to ethnography–and where that path has led since. Through a variety of case studies featuring the blending of qualitative and quantitative research methodologies, Shadoan shows where data shines, where qualitative research techniques are necessary, and how visualization can bridge the gap between the two.

Download the slides here!

Why Algorithm Transparency is Vital to the Future of Thinking

The internet presents one of the greatest opportunities in human history for sharing and cultivating knowledge. The technologies leveraging those opportunities help us think better, do better, and as a result become tightly integrated into our lives. Billions of people already rely heavily (or even exclusively) on internet technologies to exchange information and nurture relationships. However, our access to information in the digital landscape is mediated by black-box algorithms over which we have limited control. This algorithm opacity endangers our ability to access and share information, which in turn limits our opportunities for knowledge, connection, and growth. Algorithm transparency offers a new way forward.

What is an algorithm?

An algorithm is a finite, ordered list of steps that takes inputs and produces outputs.

In the most general terms, an algorithm is simply an ordered list of steps that, when given inputs, produces outputs. While algorithms are typically associated with computing, many of the analogue processes we engage in every day are algorithms. Recipes are a good example of this.

Below, we present a recipe for a smoothie. The smoothie algorithm takes one each of three different kinds of input: a frozen fruit, a fluid, and additional flavorings. The algorithm has two steps: combining the inputs in the blender, and blending to the desired consistency. The algorithm may also have settings or control surfaces that we can adjust to control some part of the process, like the blending speed. When all the steps have been executed, the algorithm output is a smoothie! Inputs of frozen strawberries, yogurt, and basil, for example, produce the output of a strawberry-basil smoothie.

A smoothie recipe is an algorithm. It has two steps–combining the inputs in a blender, and then blending until smooth. It takes ingredients as inputs–a frozen fruit, some kind of fluid, and flavorings, and produces a smoothie as output. Operating the smoothie algorithm on inputs of frozen mango, yogurt, and vanilla would produce the output of a vanilla-mango smoothie.
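The same recipe can be written as a toy Python function. The names are invented for illustration, but the shape is exactly what the figure describes: inputs go in, two steps run, blend_speed acts as a control surface, and a smoothie comes out.

```python
def make_smoothie(frozen_fruit, fluid, flavoring, blend_speed="high"):
    """A smoothie algorithm: three inputs, two steps, one control surface."""
    blender_jar = [frozen_fruit, fluid, flavoring]  # step 1: combine inputs
    blended = " and ".join(blender_jar)             # step 2: blend until smooth
    return f"a {blended} smoothie (blended on {blend_speed})"

print(make_smoothie("strawberries", "yogurt", "basil"))
# a strawberries and yogurt and basil smoothie (blended on high)
```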

An important consideration about algorithms is that their outputs depend on their inputs. If we have no ingredients to input to our smoothie algorithm, the algorithm outputs no smoothie. Similarly, bad inputs produce bad outputs. If we input frozen mistletoe berries, motor oil, and shavings from a scented candle (a frozen fruit, a fluid, and a flavoring, respectively) into our smoothie algorithm, it would produce something very poisonous that happened to have the texture of a smoothie. However, without knowing that the inputs were bad, we would not know that the resulting smoothie was bad. The ability to evaluate the quality and validity of inputs to an algorithm is essential to our ability to assess the quality and validity of the outputs of an algorithm.

The ability to evaluate the quality of inputs is especially important because we are algorithms. Our brains take environmental stimuli as input, run them through a variety of complex biological processes, and output thoughts. This has two important implications. First, we cannot think about things we have not been exposed to. If we have never encountered something, we cannot produce thoughts about it. Second, in order to evaluate the quality of our thoughts, we must evaluate the quality of the inputs to our brains. As so many of the inputs to our brains are now produced by algorithms, the opacity of those algorithms interferes with our ability to evaluate the quality of our thoughts.

What makes an algorithm transparent?

A transparent algorithm is one that facilitates scrutiny of itself. Specifically, a transparent algorithm facilitates scrutiny of:

  • Inputs. If someone is sneaking motor oil into our smoothie, we want to be aware of that. A transparent algorithm allows a user to investigate all of the inputs.
  • Control surfaces. If there are settings to control the way an algorithm executes its steps, those settings should be clearly identified and the resulting impact on outputs clearly described.
  • Algorithm steps and internal state. The processes that the algorithm executes and its internal state must be open to the user. If our blender were an opaque algorithm that used child labor to produce our smoothies, we would need to know that.
  • Assumptions and models the algorithm uses. Algorithms make certain assumptions about the inputs they will receive, as well as what the user wants. These assumptions need to be described in detail, so that users can evaluate whether the algorithm is producing results in line with their needs.
  • Justification for outputs produced. For any given output, a user needs to be able to answer the question, “Why was this output produced from the inputs I provided?” If, for instance, we find that our smoothie algorithm is producing smoothies full of wood chips, we should be able to find out why. Perhaps the answer is that our roommate was storing wood chips in the frozen strawberry bag, or perhaps the answer is that our blender throws in bonus wood chips for every 100th smoothie. Either way, that information must be available to the user.
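To make those criteria concrete, here is a toy “transparent” version of the smoothie algorithm from earlier. It is a sketch of the idea, not a prescription: the point is that the output ships with everything a user needs to scrutinize how it was produced.

```python
def transparent_smoothie(frozen_fruit, fluid, flavoring, blend_speed="high"):
    """Return the output together with a full account of how it was made."""
    steps = ["combine inputs in the blender", f"blend at {blend_speed} speed"]
    output = f"{flavoring}-{frozen_fruit} smoothie"
    return {
        "output": output,
        "inputs": {"frozen_fruit": frozen_fruit, "fluid": fluid,
                   "flavoring": flavoring},                # scrutinize inputs
        "control_surfaces": {"blend_speed": blend_speed},  # and settings
        "steps": steps,                                    # and the process
        "assumptions": ["all inputs are food-safe"],       # and the model
        "justification": "produced from the listed inputs by the listed steps",
    }

report = transparent_smoothie("strawberry", "motor oil", "basil")
print(report["inputs"]["fluid"])  # the bad input is visible, not hidden
```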

If an algorithm does not meet the above criteria, it is (to varying degrees) opaque. By this reasoning, an algorithm cannot be closed-source and transparent at the same time. While good documentation of an algorithm is necessary for transparency, it is not sufficient. An organization can say anything it wants about its algorithm in its documentation; without the ability to scrutinize the source code, it is impossible to say whether the documentation accurately represents the algorithm.

What does this mean for information access in the internet age?

Much of our education is dedicated to learning algorithms that help us access new information. Before the internet, this included things like dictionaries, atlases, card catalogs, and encyclopedias. Each of those data stores had specific algorithms that we used to access particular pieces of information. Our algorithm for learning a new definition, for example, uses a dictionary. The input is an unfamiliar word. We open the dictionary to the section for words starting with the same letter as the unfamiliar word we are looking up. We then flip through the pages in the section, looking for words with the same first two letters as our unfamiliar word, then for words with the same first three letters. We continue this process until we have located our input word, which will be adjacent to the algorithm’s output–the input word’s definition.
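That lookup procedure translates almost directly into code. A minimal sketch, with a three-entry dictionary standing in for the real thing:

```python
from bisect import bisect_left

# The printed dictionary's alphabetical order is the structure the
# algorithm exploits; bisect_left plays the role of flipping to the
# right section and narrowing letter by letter.
dictionary = sorted([
    ("algorithm", "a finite, ordered list of steps producing outputs"),
    ("opaque", "not able to be seen through"),
    ("transparent", "easily seen through or detected"),
])

def look_up(word):
    i = bisect_left(dictionary, (word,))  # narrow by alphabetical order
    if i < len(dictionary) and dictionary[i][0] == word:
        return dictionary[i][1]           # the definition sits beside the word
    return None                           # word not in this dictionary

print(look_up("opaque"))  # not able to be seen through
```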

The importance of those information access algorithms is not that they are analog, though that does make them charmingly retro. What is important is that the algorithms were transparent to those using them and explicitly taught.

The internet has dramatically changed the way we access information, however. Consider the problem of navigating to a new place. Before the internet, we would pull out a map and plot a route based on whatever parameters were important to us. Now, a variety of services provide automatically generated routes and live directions guided by the GPS on your phone or a dedicated GPS device. However, from a user’s perspective, it is impossible to know why a particular route was chosen. I largely use Google Maps for this purpose, and while I assume that routes are suggested based on some combination of shortest distance and least time, the algorithm is opaque to the user and cannot be known. It is entirely possible that the algorithm generates routes intended to take the user past the greatest number of doughnut shops. While this seems unlikely, it is not implausible; a free service provided by a for-profit entity has an incentive to generate revenue in other ways, such as charging businesses to increase drive-by traffic. As long as the algorithm is opaque, this possibility cannot be discounted. And because the routes you follow shape what you know about your surroundings, it is important to know how those routes are generated.

Increasingly, we use the internet to manage relationships as well as to seek information. Over one billion people–more than 14% of the entire population of the world–have signed up for Facebook. I personally rely on Facebook to help me maintain relationships with some 400 people. My primary means of interacting with people on Facebook is the News Feed–after all, checking each of those 400 pages individually would be quite a chore. However, the algorithm that controls what appears in my News Feed is very opaque. There are settings, of course–I can ban posts from particular sources, or set it to only show a friend’s “most important updates”–but no indication is given as to what counts as an important update. No explanation is provided as to how my interactions on Facebook further alter what appears in my News Feed. We develop hypotheses, of course–my feed appears to show more posts from people I interact with more often–but these hypotheses could easily be confirmation bias. We are left to wonder: how many announcements have I missed because Facebook and I disagree on what is important? How many opportunities to strengthen relationships have I lost because Facebook’s opaque News Feed algorithm is making choices for me, without informing me of those choices? If I don’t know who is being excluded from my News Feed and why, then I have no way to know who I need to check in with manually. The silent, opaque filtering produces a field of vision full of blind spots. Facebook has evolved from allowing us to nurture relationships in new ways to actually shaping–and potentially limiting–our interactions in unknown ways. This is especially problematic in light of their recent emotional contagion study; the opacity of the News Feed algorithm leaves it wide open for ethically questionable manipulation by the company.

It is the opacity of search engines that seems the greatest concern, however. The internet–this great, glorious store of information that changes the game in so many ways–is primarily accessed via free search services provided by corporations. We type what we want into the search bar, and results appear–“automagically”. Why certain results are returned, why results are returned in a particular order, why certain results are not returned–those things are inscrutable. Google did publish their first search algorithm, PageRank, but it is by no means the only algorithm they use. The Google website describing search says, “Today Google’s algorithms rely on more than 200 unique signals or “clues” that make it possible to guess what you might really be looking for. These signals include things like the terms on websites, the freshness of content, your region and PageRank.” Beyond that scant listing, we can only guess at what those 200 signals might be.

In a perfect, “Don’t be evil” world, the opacity of search engines might work out. However, the world is rarely perfect. In the current internet ecosystem, we–the users–are not customers. We are the product, packaged and sold to advertisers for the benefit of shareholders. This, in combination with the opacity of the algorithms that facilitate these services, creates an incentive structure where our ability to access information can easily fall prey to a company’s desire for profit. Right now, it appears that Google generates search rankings based, to the best of its ability, on what it thinks the user wants. However, there is nothing in place to stop it from, for example, accepting payment to boost a page’s position in the rankings. Already certain search results disappear at the behest of governments; what else disappears, or appears in a different order, without our knowledge? Algorithm opacity creates the opportunity for unknown restrictions to be placed on our ability to access information. Allowing corporate entities to use opaque algorithms as the primary window into an incredibly important information resource is like using a secret, invisible alphabet to organize a dictionary. Opaque algorithms are bad for securing our future access to information, and they are especially bad for the future of thinking.

Why are opaque algorithms so bad?

As I discussed above, our brains are algorithms–sets of complex biological steps that input information from our environments and output thought. Like other algorithms, however, the quality of inputs to our brains impacts the quality of the outputs of our brains. In order to be able to assess the quality of our brains’ outputs, we must be able to assess the quality of the inputs. Using opaque algorithms to access information interferes with our ability to assess the quality of that information as an input. Opaque algorithms decontextualize information; without that context, we have difficulty developing models of what we know we don’t know. Without that context, we don’t know what information was available to be accessed, and why a particular piece of information was selected over others. Without that context, we are unable to map the blind spots in our knowledge. By filtering the inputs we receive–by choosing information for us–these algorithms shape our thoughts. We must understand the what and how of the shaping to be able to reason about what we do and do not know.

Opaque algorithms are not just bad for their potential for abuse and for our ability to reason. They’re also bad for equality. They foster the creation of an elite class–those who, by privilege of education or experimentation, have developed better mental models to describe the inner workings of the opaque algorithms. That elite class can leverage their better understanding of the algorithm to reason better about their own thoughts, which in turn allows them to refine their models, and so on. Individuals not afforded the privilege of a solid understanding of the opaque algorithms are less able to leverage those algorithms to access information or understand the context of the information being accessed. This, in turn, hampers their ability to reason about the quality of the information they’re receiving and the resulting thoughts their brains generate. The cycle is self-reinforcing, and a techno-elite is born; we are separated into those who understand the algorithms, and those who do not. This is no way to secure access to a common resource as important as the internet.

Addressing Common Developer Protestations of Transparency

Algorithm opacity did not come around just because some developers thought it would be fun to endanger our ability to reason about information. It is, of course, nothing so nefarious as that. Opaque algorithms are convenient for many reasons. They discourage competition and spammers, they reduce the overhead of updating user-facing documentation, they make it simple to add new features or make changes. For those reasons and many others, algorithm transparency is a difficult sell. Here are some of the common protestations I’ve encountered, with my responses.

  • But my algorithm is already open-source, isn’t that enough? Unfortunately, no. Most of our users will never look at our source code (though, as stated previously, they should be able to). Excellent, well-written documentation of your algorithm is necessary for transparency. As an added bonus, it will not only make it easier for new developers to get involved with your project, it will also protect your project from ruin if all of your project’s developers are hit by a bus tomorrow.
  • But my users don’t want to know how my algorithm works! I will take you at your word that you have sat down with many of your users, from a variety of backgrounds and walks of life, and that all of them said they didn’t care. However, this is not an issue of who wants or doesn’t want what. This is an issue of securing the future of thinking. Even if, at this moment, none of your users care at all how your algorithm works, they should be able to find out exactly how it works at any time they choose. It is our ethical obligation as developers to disclose how our algorithms impact our users’ thoughts. This cannot be dismissed simply because our users do not yet know this is something that they are entitled to have.
  • But spammers will take over if my algorithm is transparent! This is a significant problem. Like any system, bad players will attempt to exploit the system to their gain. However, I have nothing but confidence that developers with the skills to build the infrastructure we already have can successfully build an infrastructure that fosters transparency while discouraging spammers. In fact, if you’d like to start a transparent search engine, I would love to work with you to discourage spammers. Drop me a line.
  • But my algorithm is too advanced to explain to my users! I will use strong language here–this viewpoint is techno-elitist bullshit. We are all too good at our jobs to buy into this. If our algorithms are too complicated to explain to a user, either the algorithm is bad, or the explanation is bad. One or the other will therefore need to be updated until an understandable explanation is developed.

What can users do?

Even though algorithm transparency can be a tough sell, all is not lost! The internet is yet young, and there is plenty of space for change. So what can we, as users, do to secure the future of thinking? Here are a few suggestions.

  • Contact the platforms you use and ask them to make their opaque algorithms transparent. Given the current incentive structure of the internet, it may be a while before these changes happen. However, if enough people demand it, transparency will come.
  • Migrate to platforms that have a commitment to transparency, or demonstrate a willingness to do so. For example, if you are displeased with the filtering that Facebook applies to your news feed, consider joining Diaspora, a decentralized, open-source, non-profit social network that affords users control over their data and what appears in their stream.
  • Support and fund new projects that are founded on transparent principles. I love Google search as much as the next person, but what I want for all of us is a free and open index of the internet on which we can run many different transparent search algorithms, optimized for different purposes. This is the future I envision for us and our internet. Let’s make it happen.

OSBridge Talk: Data Wrangling–Getting Started Working with Data for Visualization

From the session description:

Data visualizations can be a powerful tool for analysis and presentation, but what do you need to make one? Data is the obvious answer, but is that enough?

In this talk, we’ll walk through the building blocks of a good visualization. I’ll discuss the difference between structured and unstructured data, how to transform one into the other, and why good questions are especially important for crafting visualizations. We’ll see the kinds of views that are appropriate for different kinds of data, encounter some fundamental interaction techniques, and send you off better prepared to incorporate data visualization into your work and play!

Download the slides!

Update August 6, 2015: The full talk is now available for viewing on YouTube.

OSBridge Talk: Open Source is Not Enough–The Importance of Algorithm Transparency

From the session description:

When I was a child, before the Internet was common, much of my education was devoted to the tools for accessing information. Dictionaries, atlases, card catalogs, and encyclopedias featured prominently, as did the frameworks necessary to locate information within those resources.

Now, as the analogue has given way to the digital, those indexing frameworks are being replaced by free* tools developed by corporations interested less in selling information to us and more in peddling us to advertisers. In a perfect “Don’t Be Evil” world, this could work out to everyone’s advantage.

Things are rarely perfect, however. By turning over the indexing, organization, and delivery of information to entities who have financial incentives to keep their algorithms opaque to users, we have effectively moved to using a secret, invisible alphabet to organize the dictionary. Sure, sometimes it’s nice to have your needs met “automagically”, but at what cost?

The ordering of Google’s search results, the inclusion of posts in your Facebook feed, the people Twitter recommends you follow: all of these shape your perception of and interaction with the world. In order to critically assess our own perceptions, we need to understand what is doing the shaping.

This talk will explore these issues in more detail, in addition to presenting ideas for promoting algorithm transparency within your own projects.

Download the slides

Update August 6, 2015: The full talk is now available for viewing on YouTube.