Five Reasons Machine Learning Is Moving to the Cloud


Amazon Web Services turned a lot of heads recently when it launched a machine learning platform aimed at making predictive analytics applications easy to build and run, joining cloud juggernauts Microsoft and Google, which have similar ML offerings. It turns out the cloud is well suited to this critical type of big data workload. Here are five reasons why.


1. Machine Learning Is Everywhere


If predictive analytics is the killer app for big data, then machine learning is the technological heart powering that killer app. Whether you’re aiming to leverage your big data to stop fraudulent transactions, reduce customer churn, fight cybercriminals, or make product recommendations, machine learning algorithms are the keys to creating models of what happened in the past, so you can use new data to predict what happens next.
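To make the "learn from the past, predict what happens next" idea concrete, here is a minimal sketch in Python. It is an illustration only: the nearest-centroid method, the function names, and the tiny transaction history are all assumptions made for this example, not anything a vendor named in this article actually ships.

```python
from statistics import mean


def fit_centroids(rows, labels):
    """Learn one average point (centroid) per label from historical rows."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: tuple(mean(column) for column in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Assign a new row to the label whose centroid is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], row))


# Hypothetical history: (amount in dollars, transactions in the last hour)
history = [(20, 1), (35, 2), (15, 1), (900, 9), (1200, 12), (950, 10)]
outcomes = ["ok", "ok", "ok", "fraud", "fraud", "fraud"]

# "What happened in the past" becomes a model...
model = fit_centroids(history, outcomes)

# ...which is then applied to new data.
print(predict(model, (25, 1)))
print(predict(model, (1000, 11)))
```

Real fraud or churn models use far more data and far more sophisticated algorithms, but the two-step shape (fit on history, score new records) is the same.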


2. The Cloud’s Super Gravity


The cloud is like the Death Star: The more workloads it sucks in, the cheaper it gets for all cloud customers, and the harder it is to ever get away. Consider that Amazon Web Services (AWS) has between 2.4 million and 5.6 million servers installed in about 90 data centers across the world, according to a 2014 EnterpriseTech story, and is adding enough server capacity every day to support Amazon’s entire ecommerce operation circa 2004.


Cloud services like AWS’ S3 and Microsoft’s Azure make it very cheap to store all kinds of data, including log data, mobile data, and data generated by cloud-based apps like Salesforce and Workday. When it comes time to run analytics on that data, the economics make it difficult to justify landing it back down on earth.


3. Statistics Is Really Hard


“Machine learning and predictive analytics aren’t new,” BigML vice president of business development Andrew Shikiar tells Datanami. “But the only alternative in the past was to buy some SAS for your quants and have them do machine learning. Instead of buying SAS or putting R on your desktop, users can just log into BigML…and use an array of algorithms that we’ve introduced to the platform.”


BigML has attracted more than 17,000 users over the past four years and has more than 200 paying clients, making it one of the biggest providers of cloud-based machine learning software whose name isn’t Amazon, Google, Microsoft, or IBM.


4. ML Workloads Are Highly Variable


The underlying computational requirements for machine learning vary depending on where you are in the machine learning lifecycle. When you’re training (or retraining) your models, you may need a large amount of processing power, whereas actually running the models may not consume many resources at all. That variability makes the cloud a perfect place to park machine learning workloads, especially if the training data already lives in the cloud. Cloud providers like Amazon can quickly spin up virtual partitions to handle massive training sets, then turn them off when they’re no longer needed.
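A back-of-the-envelope calculation shows why that variability matters. The sketch below compares paying for peak capacity all quarter with paying for a short training burst plus a small serving footprint. Every figure in it (server counts, the hourly rate, the 48-hour training window) is a hypothetical assumption chosen for illustration, and it ignores storage, data transfer, and any reserved-capacity discounts.

```python
def always_on_cost(peak_servers, hourly_rate, hours_in_period):
    """Cost of keeping peak capacity running for the whole period."""
    return peak_servers * hourly_rate * hours_in_period


def elastic_cost(train_servers, train_hours, serve_servers, hourly_rate, hours_in_period):
    """Cost of a short training burst plus a small always-on serving tier."""
    training = train_servers * train_hours * hourly_rate
    serving = serve_servers * hours_in_period * hourly_rate
    return training + serving


HOURS_PER_QUARTER = 24 * 90  # assume a 90-day quarter
RATE = 0.50                  # hypothetical dollars per server-hour

# Assumed workload: 100 servers for 48 hours of retraining, 2 servers for scoring.
fixed = always_on_cost(100, RATE, HOURS_PER_QUARTER)
burst = elastic_cost(100, 48, 2, RATE, HOURS_PER_QUARTER)

print(f"Provisioned for peak all quarter: ${fixed:,.0f}")
print(f"Burst for training, small tier for serving: ${burst:,.0f}")
```

Under these assumed numbers, the burst approach costs a small fraction of the always-on one. The gap narrows as training runs get longer or more frequent.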


Consider the experience of Cisco. The computing device maker maintains an extensive collection of 60,000 “propensity to buy” (P2B) models, which it uses to predict sales of its products every quarter (we profiled Cisco in a January feature in Datanami).


Getting the necessary computer time was a challenge for Cisco’s data scientists, and as a result, it would often take several weeks to retrain the models every quarter. For a big company like Cisco, this type of delay between training and deploying ML models could result in millions in lost sales opportunities. While Cisco doesn’t run on the cloud (it opted to speed up its in-house ML environment instead), the company’s experience shows the importance of scalability in machine learning.


5. Data Scientists Are Still Unicorns


The shortage of data scientists has been well documented, in this publication and others. In response, universities have ramped up data science programs, and software companies have shifted into overdrive to abstract away the need for data scientists in the first place. While it’s debatable whether software can completely eliminate the need for data scientists, it’s undeniable that many data science activities previously done by highly trained PhDs will eventually be automated. We’re seeing many of these software offerings move to the cloud.


The combination of advanced analytics software and the availability of cheap processing power makes the cloud a perfect place to play with algorithms—as well as a great place for startups to ramp up their business models.


To read the full article please visit Datanami.
