During the last fifteen years, a strange parallel economy has covertly developed to the point where it envelops almost all internet users, including you. No money changes hands in this immense network, but it produces enormous transactional benefits nonetheless.
In this economy, you labor daily as a trainer, teaching software robots how to perform tasks. In return, the bots take over many of those tasks for you. You trade your daily labor for the value produced by your powerful and ubiquitous robot apprentices.
The most successful products of this epoch – applications like Gmail, YouTube, Amazon’s store, Facebook, Google Maps, and Spotify – have learned a great deal about their users’ likes, dislikes, and similarities. Applying Machine Learning technology, they use that data to better present what consumers will want to see and hide what they do not.
These applications did not become so powerful through traditional programming, but through self-learning. Instead of a team of coders spelling out the steps to follow in a programming language, Machine Learning starts with a set of observed data, such as the items users bought, and learns to infer the patterns within that data.
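To make the contrast concrete, here is a minimal Python sketch under stated assumptions: the purchase baskets and the `learn_pairs` and `recommend` functions are hypothetical, and simple co-occurrence counting stands in for the far more sophisticated methods real products use. No coder writes a rule saying "tents go with sleeping bags"; the program infers that pattern from what users bought.

```python
from collections import Counter
from itertools import combinations

# Hypothetical observed data: each basket is the set of items one user bought.
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"lantern", "batteries"},
    {"tent", "lantern"},
    {"sleeping bag", "tent", "batteries"},
]

def learn_pairs(baskets):
    """Count how often each pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical (a, b) order.
        for a, b in combinations(sorted(basket), 2):
            pairs[(a, b)] += 1
    return pairs

def recommend(item, pairs, top_n=2):
    """Return the items most often bought alongside `item`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]

pairs = learn_pairs(baskets)
print(recommend("tent", pairs))  # → ['sleeping bag', 'lantern']
```

The more baskets the system observes, the better its counts reflect real buying habits, which is why the training data matters so much.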
So how does one get the data set to train a Machine Learning system?
Sometimes a Data Scientist can mine the data out of existing records collected for some other purpose. This is one of the reasons that companies now like to collect every bit of data they can about you: it is much easier to re-use existing data than to collect it from scratch. Frequently, though, to obtain a well-organized set of questions and users’ responses, the team must gather new data.
One way to collect orderly data is to pay humans to answer questions. Using a system like Amazon Mechanical Turk, you can define a question and get many thousands of answers from workers, paying a few cents per answer. This approach is often used for problems that people solve well, such as image recognition. A Mechanical Turk worker might, for example, classify images as landscapes or indoor photos, or draw a circle around the faces in photos. This well-organized set of images and identified areas is ideal for accelerating the training of a Machine Learning program.
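Individual paid workers make mistakes, so one common practice (assumed here, not described in the original) is to ask several workers the same question and keep the majority answer. The sketch below shows that idea in Python; the image IDs, labels, and the `majority_labels` function are hypothetical, and no Mechanical Turk API is called.

```python
from collections import Counter

# Hypothetical worker responses: (image_id, label) pairs from several workers.
responses = [
    ("img1", "landscape"), ("img1", "landscape"), ("img1", "indoor"),
    ("img2", "indoor"), ("img2", "indoor"), ("img2", "indoor"),
    ("img3", "landscape"), ("img3", "indoor"), ("img3", "landscape"),
]

def majority_labels(responses):
    """Group answers by image and keep the most common label for each.

    Ties go to whichever label was seen first for that image.
    """
    votes = {}
    for image_id, label in responses:
        votes.setdefault(image_id, Counter())[label] += 1
    return {image_id: counts.most_common(1)[0][0]
            for image_id, counts in votes.items()}

training_set = majority_labels(responses)
print(training_set)
```

The resulting dictionary of image-to-label pairs is the kind of "well-organized set" a Machine Learning program can then be trained on.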
It is even cheaper to get humans to answer the questions for no money at all, by giving them something useful in return. This is where you come in.
You have probably used a CAPTCHA, a test that requires you to identify a number or a short piece of text in order to prove that you are human. In doing so, you are doing useful work for somebody. Google initially trained the street-number recognizers for Google Street View on data sets it built by putting photos of doorway areas into its CAPTCHA system.
Another way to get free data sets is to turn data collection into a game. Development teams excel at this kind of problem, since software and UX designers often love to make (and play) games. They can quickly turn a Data Science problem into a slick quiz with a polished user experience.
You have probably taken a quiz like this on Facebook or another website. Applications that allow users to learn about themselves, or purport to, are very popular. However, the real goal of the quiz may have nothing to do with its ostensible purpose. For example, a quiz that claims to give you personality insight might actually be measuring the subtle difference in your response speed when you answer questions containing one group of words vs. another. It might also correlate that information with metadata you allow it to access in your social profile, like your gender, age, or political affiliation. When companies speak about “converting clicks to value,” this is what they are talking about.
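What "measuring the difference in response speed" could amount to is sketched below in Python. This is an illustration under assumptions, not any real quiz's method: the word-group names, the timings, and the `latency_gap` function are all hypothetical, and a real analysis would pool many users and apply proper statistical tests before drawing conclusions.

```python
from statistics import mean

# Hypothetical log of one user's answers: (word_group, response_time_ms).
answers = [
    ("group_a", 640), ("group_b", 710),
    ("group_a", 600), ("group_b", 750),
    ("group_a", 620), ("group_b", 730),
]

def latency_gap(answers, first="group_a", second="group_b"):
    """Mean response time for `second` minus mean for `first`, in milliseconds.

    A positive gap means the user answered `second`-group questions more slowly.
    """
    times = {first: [], second: []}
    for group, ms in answers:
        if group in times:
            times[group].append(ms)
    return mean(times[second]) - mean(times[first])

print(latency_gap(answers))
```

A single number like this says little on its own; it becomes valuable to the quiz's owner when joined with the profile metadata the user has allowed it to read.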
In a stealthy economic transition, most of us have acquired a new secondary role as a machine trainer. But while this work produces useful value, you can’t use it to pay for groceries. And here we come to the cusp of a looming economic crisis.
Up until now, the internet economy of smart agents has been subsidized by the traditional economy, in which employers pay workers regular paychecks. But as machine training grows in importance, automation technologies are eliminating jobs through efficiency gains. Automation creates new “traditional” wage-paying jobs, but not as many as it eliminates.
In the last such transition, the industrial revolution, farmers moved into factory jobs. Now, industrial workers are quietly moving into knowledge jobs such as machine training. But unlike during the industrial revolution, these jobs are not a direct replacement for the old ones. At present, working as a machine trainer is a second, usually unpaid job. It does provide value, such as more efficient email processing and better autonomous agents, but you can’t feed your family by helping to train a recommender system.
Discussions of the AI revolution often focus on the permanent elimination of entire job classes, such as drivers being replaced by self-driving cars. Proponents of the “abundance” view believe that new jobs will arise in previously unforeseen areas to replace the old ones. The problem we have at this moment is that the new jobs, like machine trainer, are arriving – but they are not replacing the earnings of the old ones.
As the internet economy continues to subsume the “real” economy, this automation crisis is coming to a head. In a follow-on article, we will discuss the evolving AI Economy and directions in which it might develop.
About the Author:
David Rostcheck is a consulting data scientist helping companies tackle challenging problems and develop advanced technology. He can be reached at drostcheck [at] leopardllc.com.