The internet presents one of the greatest opportunities in human history for sharing and cultivating knowledge. The technologies leveraging those opportunities help us think better, do better, and as a result become tightly integrated into our lives. Billions of people already rely heavily (or even exclusively) on internet technologies to exchange information and nurture relationships. However, our access to information in the digital landscape is controlled by black-box algorithms over which we have limited control. This algorithm opacity endangers our ability to access and share information, which in turn limits our opportunities for knowledge, connection, and growth. Algorithm transparency offers a new way forward.
What is an algorithm?
In the most general terms, an algorithm is simply an ordered list of steps that, when given inputs, produces outputs. While algorithms are typically associated with computing, many of the analogue processes we engage in every day are algorithms. Recipes are a good example of this.
Below, we present a recipe for a smoothie. The smoothie algorithm takes one each of three different kinds of input: a frozen fruit, a fluid, and additional flavorings. The algorithm has two steps: combining the inputs in the blender, and blending to the desired consistency. The algorithm may also have settings or control surfaces that we can adjust to control some part of the process, like the blending speed. When all the steps have been executed, the algorithm output is a smoothie! Inputs of frozen strawberries, yogurt, and basil, for example, produce the output of a strawberry-basil smoothie.
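For the programmatically inclined, the smoothie algorithm can be sketched as a short function. This is purely illustrative; the ingredient names and the `speed` control surface are invented for the example:

```python
def make_smoothie(frozen_fruit, fluid, flavoring, speed="medium"):
    """Inputs: one frozen fruit, one fluid, one flavoring.
    Control surface: blending speed. Output: a smoothie."""
    # Step 1: combine the inputs in the blender.
    ingredients = [frozen_fruit, fluid, flavoring]
    # Step 2: blend to the desired consistency.
    return f"{'-'.join(ingredients)} smoothie (blended on {speed})"

# Frozen strawberries, yogurt, and basil in; a strawberry smoothie out.
print(make_smoothie("strawberry", "yogurt", "basil"))
```

Note that the function happily accepts any three inputs, including bad ones; nothing in the algorithm itself checks whether the "fluid" is yogurt or motor oil. That check is up to whoever can see the inputs.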
An important consideration about algorithms is that their outputs depend on their inputs. If we have no ingredients to input to our smoothie algorithm, the algorithm outputs no smoothie. Similarly, bad inputs produce bad outputs. If we input frozen mistletoe berries, motor oil, and shavings from a scented candle (a frozen fruit, a fluid, and a flavoring, respectively) into our smoothie algorithm, it would produce something very poisonous that happened to have the texture of a smoothie. However, without knowing that the inputs were bad, we would not know that the resulting smoothie was bad. The ability to evaluate the quality and validity of inputs to an algorithm is essential to our ability to assess the quality and validity of the outputs of an algorithm.
The ability to evaluate the quality of inputs is especially important because we are algorithms. Our brains take environmental stimuli as input, run them through a variety of complex biological processes, and output thoughts. This has two important implications. First, we cannot think about things we have not been exposed to. If we have never encountered something, we cannot produce thoughts about it. Second, in order to evaluate the quality of our thoughts, we must evaluate the quality of the inputs to our brains. As so many of the inputs to our brains are now produced by algorithms, the opacity of those algorithms interferes with our ability to evaluate the quality of our thoughts.
What makes an algorithm transparent?
A transparent algorithm is one that facilitates scrutiny of itself. Specifically, a transparent algorithm facilitates scrutiny of:
- Inputs. If someone is sneaking motor oil into our smoothies, we want to know about it. A transparent algorithm allows a user to investigate all of the inputs.
- Control surfaces. If there are settings to control the way an algorithm executes its steps, those settings should be clearly identified and the resulting impact on outputs clearly described.
- Algorithm steps and internal state. The processes that the algorithm executes and its internal state must be open to the user. If our blender were an opaque algorithm that used child labor to produce our smoothies, we would need to know that.
- Assumptions and models the algorithm uses. Algorithms make certain assumptions about the inputs they will receive, as well as what the user wants. These assumptions need to be described in detail, so that users can evaluate whether the algorithm is producing results in line with their needs.
- Justification for outputs produced. For any given output, a user needs to be able to answer the question, “Why was this output produced from the inputs I provided?” If, for instance, we find that our smoothie algorithm is producing smoothies full of wood chips, we should be able to find out why. Perhaps the answer is that our roommate was storing wood chips in the frozen strawberry bag, or perhaps the answer is that our blender throws in bonus wood chips for every 100th smoothie. Either way, that information must be available to the user.
If an algorithm does not meet the above criteria, it is (to varying degrees) opaque. By this reasoning, an algorithm cannot be closed-source and transparent at the same time. While good documentation of the algorithm is necessary for transparency, it is not sufficient. An organization can say anything it wants about its algorithm in its documentation; without the ability to scrutinize the source code, it is impossible to say whether that documentation accurately represents the algorithm.
What does this mean for information access in the internet age?
Much of our education is dedicated to learning algorithms that help us access new information. Before the internet, this included things like dictionaries, atlases, card catalogs, and encyclopedias. Each of those data stores had specific algorithms that we used to access particular pieces of information. Our algorithm for learning a new definition, for example, used a dictionary. The input is an unfamiliar word. We open the dictionary to the section for words starting with the same letter as the unfamiliar word we are looking up. We then flip through the pages in the section, looking for words with the same first two letters as our unfamiliar word, then for words with the same first three letters. We continue this process until we have located our input word, which will be adjacent to the algorithm’s output–the input word’s definition.
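In computing terms, the paper-dictionary procedure is a search over alphabetically sorted keys, narrowing by prefix until the exact word is found. A minimal sketch, using a binary search over a tiny, invented sample dictionary:

```python
import bisect

# A tiny, invented dictionary: alphabetically sorted (word, definition) pairs,
# just as a printed dictionary keeps its entries in alphabetical order.
ENTRIES = [
    ("algorithm", "an ordered list of steps that turns inputs into outputs"),
    ("opaque", "not able to be seen through; not transparent"),
    ("smoothie", "a thick drink of blended fruit"),
]

def look_up(word):
    """Return the definition adjacent to `word`, or None if it is absent."""
    words = [w for w, _ in ENTRIES]
    # bisect_left jumps straight to the right "section", the way we flip
    # to the right part of the book instead of scanning from page one.
    i = bisect.bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return ENTRIES[i][1]
    return None
```

Every step of this lookup is inspectable: the user can see the whole key list, the comparison rule, and why a given definition (or `None`) came back.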
The importance of those information access algorithms is not that they are analog, though that does make them charmingly retro. What is important is that the algorithms were transparent to those using them and explicitly taught.
The internet has dramatically changed the way we access information, however. Consider the problem of navigating to a new place. Before the internet, we would pull out a map and plot a route based on whatever parameters were important to us. Now, a variety of services provide automatically generated routes and live directions guided by the GPS on your phone or a dedicated GPS device. However, from a user’s perspective, it is impossible to know why a particular route is chosen. I largely use Google Maps for this purpose, and while I assume that routes are suggested based on some combination of shortest distance and least time, the algorithm is opaque to the user and its actual criteria cannot be known. It is entirely possible that the algorithm generates routes intended to take the user past the greatest number of doughnut shops. While this seems unlikely, it is not implausible; a free service provided by a for-profit entity has an incentive to generate revenue in other ways, such as charging businesses to increase drive-by traffic. As long as the algorithm is opaque, this possibility cannot be discounted. And because the routes you follow shape what you know about your surroundings, it is important to know how those routes are generated.
Increasingly, we use the internet to manage relationships as well as to seek information. Over one billion people–more than 14% of the entire population of the world–have signed up for Facebook. I personally rely on Facebook to help me maintain relationships with some 400 people. My primary means of interacting with people on Facebook is the News Feed–after all, checking each of those 400 pages individually would be quite a chore. However, the algorithm that controls what appears in my News Feed is very opaque. There are settings, of course–I can ban posts from particular sources, or set it to show only a friend’s “most important updates”–but no indication is given as to what is considered an important update. No discussion is provided as to how my interactions on Facebook further alter what appears in my News Feed. We develop hypotheses, of course–my feed appears to show more posts from people I interact with more often. But these hypotheses could easily be confirmation bias. We are left to wonder: how many announcements have I missed because Facebook and I disagree on what is considered important? How many opportunities to strengthen relationships have I lost because Facebook’s opaque News Feed algorithm is making choices for me, without informing me of those choices? If I don’t know who is being excluded from my News Feed and why, then I have no way to know who I need to check in with manually. The silent, opaque filtering produces a field of vision full of blind spots. Facebook has evolved from allowing us to nurture relationships in new ways to actually shaping–and potentially limiting–our interactions in unknown ways. This is especially problematic in light of their recent emotional contagion study; the opacity of the News Feed algorithm leaves it wide open for ethically questionable manipulation by the company.
It is the opacity of search engines that seems the greatest concern, however. The internet–this great, glorious store of information that changes the game in so many ways–is primarily accessed via free search services provided by corporations. We type what we want into the search bar, and results appear–“automagically”. Why certain results are returned, why results are returned in a particular order, why certain results are not returned–those things are inscrutable. Google did publish their first search algorithm, PageRank, but it is by no means the only algorithm they use. The Google website describing search says, “Today Google’s algorithms rely on more than 200 unique signals or “clues” that make it possible to guess what you might really be looking for. These signals include things like the terms on websites, the freshness of content, your region and PageRank.” Beyond that scant listing, we can only guess at what those 200 signals might be.
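PageRank itself, having been published, is a rare example of a transparent ranking step: a page’s score is, roughly, the probability that a “random surfer” following links ends up on it. A simplified power-iteration sketch is below. The link graph is invented, and a real search engine blends a score like this with the other, undisclosed signals:

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: rank}.
    Power iteration on the published PageRank recurrence."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iterations):
        # Every page gets the "teleport" share, plus shares from its in-links.
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# A toy three-page web: "a" is linked to by both other pages.
ranks = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```

Because the recurrence is public, anyone can check why page "a" outranks page "c": it simply receives more link weight. That is exactly the kind of justification an opaque signal cannot provide.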
In a perfect, “Don’t be evil” world, the opacity of search engines might work out. However, the world is rarely perfect. In the current internet ecosystem, we–the users–are not customers. We are the product, packaged and sold to advertisers for the benefit of shareholders. This, in combination with the opacity of the algorithms that facilitate these services, creates an incentive structure in which our ability to access information can easily fall prey to a company’s desire for profit. Right now, it appears that Google generates search rankings based, to the best of their ability, on what they think the user wants. However, there is nothing in place to stop them from, for example, accepting payment to boost a page’s position in the rankings. Already certain search results disappear at the behest of governments; what else disappears, or appears in a different order, that we do not know about? Algorithm opacity creates the opportunity for unknown restrictions to be placed on our ability to access information. Allowing corporate entities to use opaque algorithms to provide the primary window into an incredibly important information resource is like using a secret, invisible alphabet to organize a dictionary. Opaque algorithms are bad for securing our future access to information, and they are especially bad for the future of thinking.
Why are opaque algorithms so bad?
As I discussed above, our brains are algorithms–sets of complex biological steps that input information from our environments and output thought. As with any algorithm, the quality of the inputs to our brains impacts the quality of their outputs. In order to be able to assess the quality of our brains’ outputs, we must be able to assess the quality of the inputs. Using opaque algorithms to access information interferes with our ability to assess the quality of that information as an input. Opaque algorithms decontextualize information; without that context, we have difficulty developing models of what we know we don’t know. Without that context, we don’t know what information was available to be accessed, and why a particular piece of information was selected over others. Without that context, we are unable to map the blind spots in our knowledge. By filtering the inputs we receive–by choosing information for us–these algorithms shape our thoughts. We must understand the what and how of the shaping to be able to reason about what we do and do not know.
Opaque algorithms are not just bad for their potential for abuse and for our ability to reason. They’re also bad for equality. They foster the creation of an elite class–those who, by privilege of education or experimentation, have developed better mental models to describe the inner workings of the opaque algorithms. That elite class can leverage their better understanding of the algorithm to reason better about their own thoughts, which in turn allows them to refine their models, and so on. Individuals not afforded the privilege of a solid understanding of the opaque algorithms are less able to leverage those algorithms to access information or understand the context of the information being accessed. This, in turn, hampers their ability to reason about the quality of the information they’re receiving and the resulting thoughts their brains generate. The cycle is self-reinforcing, and a techno-elite is born; we are separated into those who understand the algorithms, and those who do not. This is no way to secure access to a common resource as important as the internet.
Addressing Common Developer Protestations of Transparency
Algorithm opacity did not come about just because some developers thought it would be fun to endanger our ability to reason about information. It is, of course, nothing so nefarious as that. Opaque algorithms are convenient for many reasons: they discourage competition and spammers, they reduce the overhead of updating user-facing documentation, and they make it simple to add new features or make changes. For those reasons and many others, algorithm transparency is a difficult sell. Here are some of the common protestations I’ve encountered, with my responses.
- But my algorithm is already open-source, isn’t that enough? Unfortunately, no. Most of your users will never look at your source code (though, as stated previously, they should be able to). Excellent, well-written documentation of your algorithm is necessary for transparency. As an added bonus, it will not only make it easier for new developers to get involved with your project, it will also protect your project from ruin if all of your project’s developers are hit by a bus tomorrow.
- But my users don’t want to know how my algorithm works! I will take you at your word that you have sat down with many of your users of a variety of backgrounds and walks of life and all of them said they didn’t care. However, this is not an issue of who wants or doesn’t want what. This is an issue of securing the future of thinking. Even if at this moment, none of your users care at all how your algorithm works, they should be able to find out exactly how it works at any time they choose. It is our ethical obligation as developers to disclose how our algorithms impact our users’ thoughts. This cannot be dismissed simply because our users do not yet know this is something that they are entitled to have.
- But spammers will take over if my algorithm is transparent! This is a significant problem. Like any system, bad players will attempt to exploit the system to their gain. However, I have nothing but confidence that developers with the skills to build the infrastructure we already have can successfully build an infrastructure that fosters transparency while discouraging spammers. In fact, if you’d like to start a transparent search engine, I would love to work with you to discourage spammers. Drop me a line.
- But my algorithm is too advanced to explain to my users! I will use strong language here–this viewpoint is techno-elitist bullshit. We are all too good at our jobs to buy into this. If our algorithms are too complicated to explain to a user, either the algorithm is bad, or the explanation is bad. One or the other will therefore need to be updated until an understandable explanation is developed.
What can users do?
Even though algorithm transparency can be a tough sell, all is not lost! The internet is yet young, and there is plenty of space for change. So what can we, as users, do to secure the future of thinking? Here are a few suggestions.
- Contact the platforms you use and ask them to make their opaque algorithms transparent. Given the current incentive structure of the internet, it may be a while before these changes happen. However, if enough people demand it, transparency will come.
- Migrate to platforms that have a commitment to transparency, or demonstrate a willingness to do so. For example, if you are displeased with the filtering that Facebook applies to your news feed, consider joining Diaspora, a decentralized, open-source, non-profit social network that affords users control over their data and what appears in their stream.
- Support and fund new projects that are founded on transparent principles. I love Google search as much as the next person, but what I want for all of us is a free and open index of the internet on which we can run many different transparent search algorithms, optimized for different purposes. This is the future I envision for us and our internet. Let’s make it happen.