Machine learning and artificial intelligence have given us great things in recent years. In addition to recommending the next TV show to binge or a new song to listen to, they have improved security in energy, financial services, and transportation, and helped pharmaceutical companies such as Johnson & Johnson accelerate development of the COVID-19 vaccine.
CEOs, politicians, doctors, teachers, researchers: people from all walks of life have benefited, but the triumphs of machine learning and artificial intelligence have also exposed their limits. For example, what if you are searching for a cure for one of the world's 6,800 rare diseases (source: National Human Genome Research Institute)?
Or what if you’re a small business that wants to take on Google or Amazon? Lacking access to unlimited data and GPUs, should you give up now and resign yourself to permanent second-class status? And while we’re on the subject of class, what do we say to diversity-conscious HR professionals who must contend with racial, gender, and class biases encoded directly into their talent acquisition software?
The bottom line is that without access to deep stores of the right data, machine learning and artificial intelligence can be worse than useless; they can produce results that cost businesses revenue and hurt their brands. A recent report from Gartner found that the average American business loses between $9.7 million and $14.2 million per year because of bad data. At the aggregate level, IBM estimates that bad data drains companies of over $3 trillion a year.
As we begin to set priorities for 2022, I would like to offer an alternative that helps organizations of all sizes develop more effective ways to leverage machine learning and artificial intelligence. The point is not to denigrate anyone’s approach; rather, it is to offer a framework that empowers smaller stakeholders as well as the large players seeking to innovate.
Problems with the incumbent’s approach to data
Like the assembly line workers, packers, and delivery drivers who are the links in a real-world supply chain, data scientists, data labelers, and program managers make up the virtual, data-driven supply chain on which machine learning and artificial intelligence depend. To improve the way we mine data, we need to forge a new chain that doesn’t perpetuate old flaws.
In an encouraging recent example, Etsy put AI and machine learning tools in the hands of its army of 5 million artisans. The immediate goal was to help sellers affected by the pandemic switch to essential items such as face masks and hand sanitizer. The tools Etsy provided to its community included the same cutting-edge data science, AI, and marketing apps used by large retailers.
The approach yielded immediate results. With the traditional supply chain in tatters and demand for masks booming, Etsy’s shares have climbed 600% from their pandemic low in March 2020, and active buyers and sellers have doubled to 90 million and 5 million, respectively. With this new momentum, analysts are betting Etsy will achieve a 30% sales increase by the end of 2021. Whether this bottom-up approach will turn the craft marketplace into an “anti-Amazon” remains to be seen, but so far it seems to be on the right track.
Large companies are also getting in on the action. While developing Alexa, Amazon, Etsy’s (and everyone’s) nemesis, realized that its internal test team would not generate enough data. Contractors were hired to read scripts containing “open-ended queries.” The process ran six days a week for six months, with more than 20 smart devices per test site capturing every grunt and syllable.
According to Brad Stone in Amazon Unbound (Simon & Schuster, 2021), the raw, untagged data generated was so useful to Amazon’s developers that the program was extended to ten cities in the United States. Since its launch, Alexa has acquired 100,000 skills (source: TechCrunch); around 10.8% of digital buyers used Amazon Alexa for online purchases in 2020 (source: eMarketer); and 130 million Alexa-powered Echo speakers are expected to be sold by 2025 (source: eMarketer).
The lesson here is that to create a new offering, Amazon and Etsy had to step outside their historical approach to data management. The reason, as Herbert Roitblat explains in Algorithms Are Not Enough (MIT Press, 2020), is that machine learning and artificial intelligence rely on existing data to make predictions. On their own, they cannot solve problems for which data is missing or biased.
What organizations should do in 2022
For organizations launching new data-driven offerings in 2022, breaking out of your existing data stack is the first step. The next, as we’ve learned from Etsy and Amazon, is to put the tools in the hands of domain experts and business owners. Unlike a specialist-driven approach, this lean approach speeds up development by removing unnecessary complexity.
According to Anaconda’s recent State of Data Science survey of 4,200 professionals in 140 countries, data scientists spend 45% of their time preparing data, 19% loading data, and 21% visualizing it, which leaves only 11 to 12% of their day for model selection, training, and scoring. With the average data scientist earning $100,560 (source: US Bureau of Labor Statistics), the incumbent approach to data is expensive and often inefficient.
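To make that cost concrete, the back-of-envelope calculation below multiplies the time shares quoted above by the quoted average salary. It is a rough sketch under stated assumptions: salary cost is taken to scale linearly with time, benefits and overhead are ignored, and because the survey categories may overlap, each share is costed on its own rather than summed. The names `AVERAGE_SALARY_USD`, `TIME_SHARES`, and `annual_cost` are illustrative only.

```python
# Back-of-envelope estimate: salary dollars per year that one data scientist
# spends on each activity, using the figures quoted in this article.
# Assumption: cost scales linearly with time share; benefits and overhead are ignored.
AVERAGE_SALARY_USD = 100_560  # US Bureau of Labor Statistics figure cited above

# Time shares as quoted from the Anaconda survey. The categories may overlap,
# so each one is costed independently instead of being added together.
TIME_SHARES = {
    "preparing data": 0.45,
    "loading data": 0.19,
    "visualizing data": 0.21,
}


def annual_cost(share: float, salary: float = AVERAGE_SALARY_USD) -> float:
    """Return the salary dollars attributable to a given share of working time."""
    return round(share * salary, 2)


if __name__ == "__main__":
    for activity, share in TIME_SHARES.items():
        print(f"{activity}: ${annual_cost(share):,.0f} per year")
```

For data preparation alone, 0.45 × $100,560 comes to about $45,252 per data scientist per year, before any model is selected or trained.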
Knowing the problem you are solving is always the key. If you’re a startup or an established organization launching a new offering, the way forward is to step outside your existing stack, put the tools in the hands of business and domain owners, and create an interactive loop. Like the sprints used by agile product teams, an interactive loop promotes AI-assisted exploration of large unlabeled datasets. The discoveries and rapid actions of subject matter experts lead to evolving models, more focused exploration, and new discoveries. Interactivity goes both ways: models become more precise and robust, while subject matter experts become smarter and better informed.
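An interactive loop of this kind resembles what the machine learning literature calls active learning, or human-in-the-loop labeling. The sketch below is one hypothetical way to model it, not the author’s method or any product’s API. It assumes a one-dimensional pool of unlabeled scores, a model that is a single decision threshold, and a simulated `expert_label` function standing in for the subject matter expert. Each round, the model asks the expert to label the example it is least certain about (uncertainty sampling), refits, and stops once no remaining example is ambiguous.

```python
# Minimal sketch of an interactive labeling loop (uncertainty sampling).
# Assumptions (not from the article): 1-D scores in [0, 1], a single-threshold
# model, and a hypothetical expert_label() standing in for a human expert.
from typing import Callable, List, Tuple


def expert_label(x: float) -> bool:
    """Hypothetical stand-in for a subject matter expert who knows the true rule."""
    return x >= 0.62


def fit_bounds(labeled: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (highest known negative, lowest known positive), defaulting to (0, 1)."""
    lo = max((x for x, y in labeled if not y), default=0.0)
    hi = min((x for x, y in labeled if y), default=1.0)
    return lo, hi


def interactive_loop(
    pool: List[float], oracle: Callable[[float], bool], max_rounds: int = 50
) -> Tuple[float, int]:
    """Ask the expert to label the example the model is least sure about, then refit."""
    unlabeled = list(pool)
    labeled: List[Tuple[float, bool]] = []
    rounds = 0
    while rounds < max_rounds:
        lo, hi = fit_bounds(labeled)
        threshold = (lo + hi) / 2  # current model: predict True when x >= threshold
        # Only examples strictly between the bounds could still be misclassified.
        ambiguous = [x for x in unlabeled if lo < x < hi]
        if not ambiguous:
            break  # under the threshold assumption, all remaining labels are implied
        # Uncertainty sampling: query the example closest to the decision boundary.
        query = min(ambiguous, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
        rounds += 1
    lo, hi = fit_bounds(labeled)
    return (lo + hi) / 2, rounds


if __name__ == "__main__":
    pool = [i / 100 for i in range(101)]  # hypothetical unlabeled dataset
    threshold, rounds = interactive_loop(pool, expert_label)
    print(f"threshold ~ {threshold:.3f} after {rounds} expert labels out of {len(pool)} examples")
```

With 101 evenly spaced examples, the loop pins the boundary between 0.61 and 0.62 after only a handful of expert labels. That is the appeal of the approach: experts spend their time on the few examples that matter most instead of labeling the whole dataset.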
As Eric Ries advises in The Lean Startup, interactive loops should continue until the business, data, and machine learning tools demonstrate 1:1 alignment. Done right, your organization won’t just work better and smarter; it will unlock the true capabilities of machine learning and artificial intelligence to transform the way we all live and do business in 2022 and beyond, regardless of our access to data and processing muscle.
About the Author
Patrice Simard, a former Microsoft executive, is CEO and co-founder of Intelus.ai, the leading no-code/no-data science platform.
Join us on Twitter: @InsideBigData1 – https://twitter.com/InsideBigData1
Sign up for the free insideBIGDATA newsletter.