


Optimizing Data Centers Through Machine Learning

Google has published a paper outlining their approach to using machine learning, specifically a neural network, to reduce energy consumption in their data centers. Joe Kava, VP of Data Centers at Google, also has a blog post explaining the background and their approach. Google has one of the best data center designs in the industry and takes its PUE (power usage effectiveness) numbers quite seriously. I blogged about Google's approach to optimizing PUE almost five years ago! Google has come a long way, and I hope they continue to publish such valuable information in the public domain.
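For readers new to the metric, PUE is the ratio of total facility energy to the energy consumed by IT equipment alone, so a value closer to 1.0 means less overhead for cooling and power distribution. Here is a minimal sketch of that arithmetic; the function name and the meter readings are made up for illustration and are not Google's numbers.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical readings for one day, in kWh (illustrative only).
daily_pue = pue(total_facility_kwh=1120.0, it_equipment_kwh=1000.0)
print(f"PUE = {daily_pue:.2f}")
```

In this made-up example, 120 kWh out of 1,120 kWh goes to everything other than the IT equipment.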

There are a couple of key takeaways.

In his presentation at Data Centers Europe 2014 Joe said:  
As for hardware, the machine learning doesn’t require unusual computing horsepower, according to Kava, who says it runs on a single server and could even work on a high-end desktop.
This is a great example of a small-data Big Data problem. The neural network is a supervised learning approach: you create a model with certain attributes, then assess and fine-tune the collective impact of those attributes to achieve a desired outcome. Unlike an expert system, which emphasizes an upfront logic-driven approach, neural networks continuously learn from the underlying data and are tested against their predicted outcomes. The outcome does not depend on how large your data set is, as long as it is large enough to include relevant data points with a good history. The "Big" part of Big Data misleads people into believing they need a very large data set to get started. This optimization debunks that myth.

The other fascinating part of Google's approach is that they are not only using machine learning to optimize the PUE of current data centers but also planning to use it to design future data centers effectively.

Like many other physical systems, a data center has certain attributes that you have operational control over and can change fairly easily, such as cooling systems and server load, but there are quite a few attributes that you only control during the design phase, such as the physical layout of the data center and its climate zone. If you decide to build a data center in Oregon, you can't simply move it to Colorado. These neural networks can significantly help with those upfront, irreversible decisions that are not tunable later on.

One of the challenges with neural networks, or for that matter many other supervised learning methods, is that it takes a lot of time and precision to perfect (train) the model. Joe describing it as "nothing more than series of differential calculus equations" is downplaying the model. Neural networks are useful when you know what you are looking for, in this case lowering the PUE. In many cases you don't even know what you are looking for.

Google mentions identifying 19 attributes that have some impact on PUE. I wonder how they shortlisted these attributes. In my experience, unsupervised machine learning is a good way to shortlist attributes before moving on to supervised machine learning to fine-tune them. Used correctly, the combination of unsupervised and supervised machine learning can yield even better results.
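To make the shortlisting idea concrete, here is one simple unsupervised filter: drop attributes that never change and attributes that are nearly duplicates of ones already kept, without ever looking at the target (PUE). This is only a sketch of one possible approach, not how Google picked its 19 attributes; the sensor names, readings, and the 0.95 threshold are all assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def shortlist(columns, max_abs_corr=0.95):
    """Keep attributes that vary and are not near-duplicates of an earlier kept one."""
    kept = []
    for name, values in columns.items():
        if len(set(values)) == 1:
            continue  # a constant attribute carries no information
        if all(abs(pearson(values, columns[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical sensor history (illustrative values, not real data).
sensors = {
    "outside_air_temp_c": [10.0, 12.0, 15.0, 18.0, 21.0, 25.0],
    "outside_air_temp_f": [50.0, 53.6, 59.0, 64.4, 69.8, 77.0],  # same signal in Fahrenheit
    "it_load_kw": [900.0, 950.0, 870.0, 990.0, 910.0, 940.0],
    "chillers_running": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # never changes
}
selected = shortlist(sensors)
print(selected)
```

The surviving attributes would then be the inputs to the supervised model. Note that the filter is order-dependent: of two near-duplicate attributes, whichever appears first is the one that stays.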

Friday, May 31, 2013

Unsupervised Machine Learning, Most Promising Ingredient Of Big Data

Orange (France Telecom), one of the largest mobile operators in the world, issued a challenge, "Data for Development", by releasing a dataset of their subscribers in Ivory Coast. The dataset contained 2.5 billion records of calls and text messages exchanged between 5 million anonymous users in Ivory Coast, Africa. Various researchers got access to this dataset and submitted proposals on how the data could be used for development purposes in Ivory Coast. It would be an understatement to say these proposals and projects were mind-blowing. I have never seen so many different ways of looking at the same data to accomplish so many different things. Here's a book [very large pdf] that contains all the proposals. My personal favorite is AllAboard, in which IBM researchers used the cell-phone data to redraw optimal bus routes. The researchers used several algorithms, including supervised and unsupervised machine learning, to analyze the dataset, resulting in a variety of scenarios.

In my conversations and work with CIOs and LOB executives, the breakthrough scenarios always come from a problem that they didn't even know existed or could be solved. For example, the point-of-sale data that you use for your out-of-stock analysis could reveal new hyper-segments that you didn't even know existed, using clustering algorithms such as k-means, and could also help you build a recommendation system using collaborative filtering. The data that you use to manage your fleet could help you identify outliers or unproductive routes using SOM (self-organizing maps) with dimensionality reduction. Smart meter data that you use for billing could help you identify outliers and prevent theft using a variety of ART (Adaptive Resonance Theory) algorithms. I see endless scenarios based on a variety of unsupervised machine learning algorithms, similar to using cell phone data to redraw optimal bus routes.
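As a rough illustration of the k-means scenario, here is a bare-bones version of the algorithm run on a toy set of shoppers. Everything about the data is an assumption: each point is a made-up (average basket size, visits per month) pair, and a real point-of-sale dataset would need far more features and preparation. In practice you would likely reach for a library implementation rather than this hand-rolled sketch.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels), one label per input point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical shoppers: (average basket size, visits per month).
shoppers = [(2, 12), (3, 10), (2, 11), (20, 1), (22, 2), (19, 1)]
centroids, labels = kmeans(shoppers, k=2)
print(labels)
```

Nobody told the algorithm that "frequent small baskets" and "rare large baskets" were segments; it finds the two groups from the data alone, which is the point of the unsupervised approach.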

Supervised and semi-supervised machine learning algorithms are equally useful, and I see them complementing unsupervised machine learning in many cases. For example, in retail, you could start with k-means to unearth new shopping behavior and end up with Bayesian regression followed by exponential smoothing to predict future behavior based on targeted campaigns, further monetizing this newly discovered shopping behavior. However, unsupervised machine learning algorithms are by far the best I have seen for unearthing breakthrough scenarios, because by their very nature they do not require you to know a lot of details (labels) upfront about the data to be analyzed. In most cases you don't even know what questions you could ask.
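The last step of that retail pipeline, exponential smoothing, is simple enough to sketch in a few lines. This covers only simple (single) exponential smoothing, not the Bayesian regression step; the weekly sales figures and the smoothing factor of 0.5 are assumptions chosen for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly sales for one newly discovered segment.
weekly_sales = [100.0, 120.0, 110.0, 130.0]
smoothed = exponential_smoothing(weekly_sales, alpha=0.5)
next_week_forecast = smoothed[-1]  # the last smoothed value is the one-step forecast
print(smoothed, next_week_forecast)
```

A higher alpha reacts faster to recent weeks; a lower alpha leans more on history.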

Traditionally, BI has been built on pillars of highly structured data with well-understood semantics. This legacy has led most enterprise people to operate with a narrow mindset: I know the exact problem that I want to solve, I know the exact question that I want to ask, and Big Data is going to make all this possible and even faster. This is the biggest challenge that I see in embracing and realizing the full potential of Big Data. With Big Data there's an opportunity to ask a question that you never thought or imagined you could ask. Unsupervised machine learning is the most promising ingredient of Big Data.