Analytics-first Enterprise Applications


This is the story of Tim Zimmer, who has been working as a technician for one of the large appliance store chains. His job is to attend to service calls for washers and dryers. He has seen a lot in his life; a lot has changed, but a few things have stayed the same.

The '80s saw the rise of homegrown IT systems, and the '90s was the decade of standardized backend automation, when a few large vendors as well as quite a few small vendors built and sold solutions to automate a whole range of backend processes. Tim experienced this firsthand. He started getting printed invoices that he could hand out to his customers. He also heard his buddies in finance talking about a week-long training class to learn "computers" and some tools to make journal entries. Tim's life didn't change much. He would still get a list of customers handed to him in the morning. He would go visit them. He would manually turn in a part-request form for the parts he didn't carry in his truck, and life went on. Tim didn't know what a better way to work might look like, but he always knew there must be one. Automation did help the companies run their business faster and helped increase their revenue and margins, but the lives of their employees, such as Tim, didn't change much.

The mid-to-late '90s saw the rise of CRM and self-service HCM, where vendors started referring to "resources" as "capital" without really changing the fundamental design of their products. Tim heard about some sales guys entering information into such systems after they had talked to their customers. They didn't quite like the system, but their supervisors and their supervisors' supervisors had asked them to do so. Tim figured the company must somehow benefit from this, but he didn't see his buddies' lives get any better. He did receive a rugged laptop to enter information about his tickets and resolutions. The tool still required him to enter a lot of data, screen by screen. He didn't really like the tool, and the tool didn't make him any better or smarter, but he had no choice but to use it.

Tim heard that management gets weekly reports of all the service calls that he makes. He was told that the parts department uses this information to create a "parts bucket" for each region. He thought it didn't make any sense: by the time management received the parts information, analyzed it, and handed out the parts, he was already on calls where he was running out of the parts he needed. He also received an email from the "Center of Excellence" (he couldn't tell what it was, but guessed, "must be those IT guys") asking whether he would like to receive some reports. He inquired. The lead time for what he thought was a simple report, once he submitted a request, was 8-10 weeks, and that "project" would require three levels of approval. He saw no value in it and decided not to pursue it. While watching a football game over beer, his buddy in IT told him that "management" had bought very expensive software to run these reports and was hiring a lot of people who would understand how to use it.

One day, he received a tablet. He thought this must be yet another devious idea by his management to make him do more work that doesn't really help him or his customers. A fancy toy, he thought. For the first time in his life, the company positively surprised him. The tablet came with an app that did what he thought the tool should have done all along. As soon as he launched the app, it showed him a graphical view of his service calls and the parts required for those calls based on a historical analysis of those appliances. It showed him which trucks had which parts and which of his team members were better off visiting which customers, based on their skill sets and their demonstrated ability in having solved those problems in the past. Tim makes a couple of clicks to analyze that data, drills down into line-item detail in realtime, and accepts recommendations with one click. He assigns the service calls to his team members and drives his truck to a customer he assigned to himself. As soon as he is done, he pulls out his tablet. He clicks a button to acknowledge the completion of the service call. He is presented with a new analysis, updated in realtime, of the available parts in his truck as well as in his teammates' trucks. He clicks around, makes some decisions, cranks up the radio in his truck, and he is off to help the next customer. No more filling out long, meaningless screens. For the very first time, his view of his management has changed for good.

As the world moves towards building mobile-first or mobile-only applications, I am proposing that we build analytics-first enterprise applications that are mobile-only. Finally, we have access to sophisticated big data products, frameworks, and solutions that can help analyze large volumes of data in real time. Large-scale hardware, whether commodity, specialized, or virtualized, is accessible to developers to do some amazing things. We are at an inflection point. There is no need to discriminate between transactional and analytic workloads. Navigating from aggregated results to line-item details should be just one click, instead of punching out into a separate system. There are many processes that, if re-imagined without any pre-conceived bias, would start with analysis at the very first click and guide the user to more fine-grained data-entry or decision-making screens. If mobile-first is the mindset for getting right the 20% of your application's scenarios that are used 80% of the time, then analytics-first is a design that should strive to lead with the 20% of decision-making workflows used 80% of the time, the ones that currently throw end users into a maze of data entry and beautiful but completely isolated, outdated, and useless reports.
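To make the "one click from aggregate to line items" idea concrete, here is a minimal sketch in Python against an in-memory SQLite store. The schema and data are hypothetical; the point is only that the summary view and the drill-down run against the same store, with no punch-out into a separate reporting system.

```python
# A minimal sketch of analytics-first navigation: the opening screen is an
# aggregate, and the drill-down to line items is one more query against the
# same store. The schema and data here are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service_calls (
        call_id INTEGER PRIMARY KEY,
        region TEXT,
        appliance TEXT,
        part_used TEXT,
        minutes_spent INTEGER
    );
    INSERT INTO service_calls VALUES
        (1, 'North', 'washer', 'pump', 45),
        (2, 'North', 'dryer',  'belt', 30),
        (3, 'South', 'washer', 'pump', 60);
""")

# First click: the aggregated view the user lands on.
summary = conn.execute("""
    SELECT region, COUNT(*) AS calls, SUM(minutes_spent) AS total_minutes
    FROM service_calls GROUP BY region
""").fetchall()
print(summary)

# Second click: drill into the line items behind one aggregate row.
detail = conn.execute(
    "SELECT * FROM service_calls WHERE region = ?", ("North",)
).fetchall()
print(detail)
```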

Let's rethink enterprise applications. Today's analytics is the end result of years of neglecting to understand that humans need to analyze and then decide, as opposed to decide and then analyze. Analytics should not be a category by itself, disconnected from the workflows and processes that applications have automated for years to make businesses better. Analytics should be an integral part of an application: not embedded, not contextual, but a lead-in.

Wednesday, September 19, 2012

Role Of Analytics In Creating New Consumer Behaviors



I am in India visiting a large customer who has invested heavily in organized retail stores, a relatively new category for the Indian market. Their head of analytics shared some details of their last promotion with me. They ran an email promotion to send out coupons that were valid on one and only one day: 15th August, the Independence Day of India, which is a national holiday. They were bold enough to take out a full-page ad in all the large newspapers on 15th August highlighting this promotion.

Their sales soared in all regions on that day. They not only soared but broke all their previous records. They registered the highest sales of the year, more than their Diwali sales. In American terms, they managed to sell more on the 4th of July than on Black Friday. This shocked me. I analyzed their efforts further to better understand this behavior.

Indians in India don't drink beer, barbecue, or watch fireworks on Independence Day. In fact, they don't do anything. It's just another day, except that you don't go to work and kids don't go to school. That was the key. Since they didn't have anything else to do, they went to the store and shopped. They bought things they had been contemplating buying for some time. This is where the coupons helped, and they also ended up buying things they didn't need. Yes, they are quickly learning from Americans.

What amazed me the most was that the company manufactured this behavior, and that it was analytics-led. They studied all kinds of data, created a promotion, made sure that they could execute on the promotion, and the customers came. And they are using this data to further refine their promotions and store inventory.

Big Data and analytics are not only useful for instrumenting existing customer behavior; they could also help create new customer behaviors. This is especially powerful when a company is in high-growth mode and has a bold vision to do whatever it takes to gain a top position in the market.

As I blog this, the Indian government has just changed its policy to allow up to 51% foreign direct investment (FDI) in the multi-brand organized retail sector. India has miles to go before the organized retail sector shapes up; Indians still prefer to shop at mom-and-pop stores and not at a large, organized, Walmartish store. Due to the lack of a mature organized retail sector, the Indian companies don't have a pre-conceived bias on how to run a large brick-and-mortar store, and that's a good thing. They are not localizing a global brand. They are creating a new brand, and hence new consumer behavior, from the ground up. And analytics is playing a bigger role than ever before.

Photo courtesy: McKay Savage

Friday, December 30, 2011

Loving What I Do For A Living



A few months back, I was helping a very large customer of ours simplify and automate their process of trading financial instruments. During one of my many visits to their office, I met a person who was trying to explain to me his job in supporting the people involved in this super complex process. I always ask a lot of questions, until they're totally annoyed and ready to kick me out of the room, to get a complete understanding of the business rationale behind whatever they're striving for and their personal motivation behind it. Something unusual happened at this meeting. Instead of getting into the gory technical details of how they get things done, he chose to tell me a short and simple story.

"You know, um.. there's this early morning meeting everyday that Peter goes to with a bunch of other people. They all gather around a large table in a dimly lit conference room with a bunch of printed spreadsheets, a laptop, and a large calculator. Peter has a cup of coffee in one hand and a cigarette in the other hand talking to people who have coffee cups in their one hand and cigarettes in the other hand. This is their lives. I am concerned about Peter and I want him to stop smoking. Can you please help me?"

Now, this is the job that I love, the job that makes me get out of bed and run for it. This is the human side of enterprise software. It's not boring.

Photo Courtesy: Jane Rahman

Tuesday, November 9, 2010

Challenging Stonebraker’s Assertions On Data Warehouses - Part 2

Check out Part 1 if you haven't already read it, to better understand the context and my disclaimer. This is Part 2, covering assertions 6 to 10.

Assertion 6: Appliances should be "software only."

“In my 40 years of experience as a computer science professional in the DBMS field, I have yet to see a specialized hardware architecture—a so-called database machine—that wins.”

This is a black swan effect; just because someone hasn't seen an event occur in his or her lifetime doesn't mean that it won't happen. This statement could also be re-written as "In my 40 years of experience, I have yet to see a social network that is used by 500 million people." You get the point. I am the first one who would vote in favor of commodity hardware over specialized hardware, but there are very specific reasons why specialized hardware makes sense in some cases.

“In other words, one can buy general purpose CPU cycles from the major chip vendors or specialized CPU cycles from a database machine vendor.”

Specialized machines don't necessarily mean specialized CPU cycles. I hope the term "CPU cycles" is used as a metaphor and not in its literal meaning.

“Since the volume of the general purpose vendors are 10,000 or 100,000 times the volume of the specialized vendors, their prices are an order of magnitude under those of the specialized vendor.”

This isn't true. The vendors who make general-purpose hardware also make specialized hardware, and no, it's not an order of magnitude more expensive.

“To be a price- performance winner, the specialized vendor must be at least a factor of 20-30 faster.”

It's a wrong assumption that BI vendors use specialized hardware just for performance reasons. The "specialized" part of an appliance is, in many cases, simply a specialized configuration. The appliance vendors also leverage their relationships with the hardware vendors to fine-tune the configuration based on their requirements, negotiate a hefty discount, and execute a joint go-to-market strategy.

Enterprise software follows value-based pricing, not cost-based pricing. The price difference between a commodity system and a specialized appliance is not just the difference in the cost of the hardware it runs on.

“However, every decade several vendors try (and fail).”

I am not sure what success criteria this assertion uses to declare someone a winner or a failure. The acquisitions of Netezza, Greenplum, and Kickfire are recent examples of how well the appliance companies have performed. The incumbent appliance vendors are doing great, too.

“Put differently, I think database appliances are a packaging exercise”

The appliances are far more than a packaging exercise. Other than making sure that the appliance software works on the selected hardware, commoditized or otherwise, they provide a black-box lifecycle-management approach to the customers. The upfront cost of an appliance is a small fraction of the overall money that customers end up spending during the entire lifecycle of an appliance and the related BI efforts. Customers do welcome an approach where they are responsible for managing one appliance instead of five different systems at ten different levels with fifteen different technology-stack versions.

Assertion 7: Hybrid workloads are not optimized by "one-size fits all."

Yes, I agree, but that's not the point. It's difficult to optimize hybrid workloads for a row or a column store, but it is not as difficult if it's a hybrid store.

“Put differently, two specialized systems can each be a factor of 50 faster than the single "one size fits all" system in solution 1.”

Once again, I agree, but it does not apply to all situations. As I discussed earlier, performance is not the only criterion that matters in the BI world. In fact, I would argue the opposite. Because OLTP and OLAP systems are orthogonal, vendors compromised everything else to gain performance. Now that's changing. Let's take the example of an operational report. This is the kind of report that only has value if consumed in realtime. For such reports, the users can't wait until the data is extracted out of the OLTP system, cleaned up, and transferred into the OLAP system. Yes, it could be 50 times faster, but completely useless, since you missed the boat.

Hybrid systems, the ones that combine OLTP and OLAP, are fairly new, but they promise to solve a very specific problem, which is true realtime. While the hybrid systems evolve, the computational capabilities of OLTP and OLAP systems have started to change as well. I now see OLAP systems supporting write-backs with reasonable throughput and OLTP systems with good BI-style query performance, all achieved through modern hardware and clever use of architectural components.

Let's not forget what optimization really is: desired functionality at reasonable performance. A realtime report that takes 10 seconds to run could be far more valuable than a report that runs in under ten milliseconds, three days later.
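As a minimal sketch of the operational-report argument, with a hypothetical table: when the report runs against the same store the transactions land in, a row written a moment ago is already in the next aggregate, so there is no ETL window to miss.

```python
# A sketch of an operational report over live transactional data: a row
# written a second ago shows up in the very next aggregate, with no
# extract/load cycle in between. Table and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "shipped", 120.0), (2, "open", 75.5)])

# A new transaction arrives...
conn.execute("INSERT INTO orders VALUES (3, 'open', 210.0)")

# ...and the operational report sees it immediately.
report = conn.execute("""
    SELECT status, COUNT(*) AS orders, SUM(amount) AS total
    FROM orders GROUP BY status
""").fetchall()
print(report)   # the just-inserted order 3 is already counted
```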

“A factor of 50 is nothing to sneeze at.”

Yes, point taken. :-)

Assertion 8: Essentially all data warehouse installations want high availability (HA).

No, they don't. This is like saying all customers want a five-nines SLA in the cloud. I don't underestimate the business criticality of a DW going down, but not all DWs are used 24x7, and not all are mission-critical. One size doesn't fit all. And if your DW is not required to be highly available, you need to ask yourself whether it is fair to pay the architectural cost of HA that you don't want. Tiered SLAs are not new, and tiered HA is not a terrible idea.

Let's talk about the DWs that do need to be highly available.

“Moreover, there is no reason to write a DBMS log if this is going to be the recovery tactic. As such, a source of run-time overhead can be avoided.”

I am a little confused by how this is worded. Which logs are we referring to: those of the source systems or those of the target systems? The source systems are beyond the control of a BI vendor. There are newer approaches to designing an OLTP system without a log, but that's not up for discussion for this assertion. If the assertion is referring to the logs of the target system, how does that become a run-time overhead? Traditional DW systems are read-only at runtime; they don't write logs back to the system. If he is referring to the logs written while the data is being moved to the DW, that's not really run-time, unless we treat it as a hot transfer.

There is one more approach, NoSQL, where eventual consistency is achieved over a period of time and the concept of a "corrupted system" is going away. Incomplete data is an expected behavior, and people should plan for it. That's the norm, regardless of whether a system is HA or not. Netflix recently moved some of its applications to the cloud, where it designed a background data fixer to deal with data inconsistencies.
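Here is a toy sketch of the background data fixer idea (illustrative only, not a description of Netflix's actual implementation): periodically diff replicas and repair divergent entries, accepting that reads in between may be stale.

```python
# A toy "data fixer" for an eventually consistent store: diff two replicas
# and repair divergence. Illustrative only; not Netflix's implementation.
from typing import Dict

def reconcile(primary: Dict[str, dict], replica: Dict[str, dict]) -> int:
    """Copy missing or newer entries from primary into replica.

    Each value is assumed to carry a monotonically increasing 'version'.
    Returns the number of entries repaired.
    """
    repaired = 0
    for key, record in primary.items():
        current = replica.get(key)
        if current is None or current["version"] < record["version"]:
            replica[key] = record
            repaired += 1
    return repaired

primary = {"cust-1": {"version": 3, "balance": 120},
           "cust-2": {"version": 1, "balance": 40}}
replica = {"cust-1": {"version": 2, "balance": 100}}   # stale and incomplete

print(reconcile(primary, replica))   # 2 entries repaired
print(replica["cust-1"]["balance"])  # 120, consistent again
```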

HA is not black and white, and there are many more approaches, beyond logs, to achieve the desired outcome.

Assertion 9: DBMSs should support online reprovisioning.

“Hardly anybody wants to take the required amount of down time to dump and reload the DBMS. Likewise, it is a DBA hassle to do so. A much better solution is for the DBMS to support reprovisioning, without going offline. Few systems have this capability today, but vendors should be encouraged to move quickly to provide this feature.”

I agree. I would add one thing. The vendors, even today, have trouble supporting offline reprovisioning to cater to increasing load. Online reprovisioning is not trivial, since in many cases it requires re-architecting their systems. The vendors typically get away with this, since most customers don't do capacity planning in real time. Unfortunately, traditional BI systems are not commodity systems where customers can plug in more blades when they need them and take them out when they don't.

This is the fundamental reason why the cloud makes a great BI platform to address such reprovisioning issues with elastic computing. Read my post "The Future Of BI In The Cloud" if you are inclined to understand how horizontal scale-out systems can help.

Assertion 10: Virtualization often has performance problems in a DBMS world.

This assertion, and the one before this, made me write the post “The Future Of BI In The Cloud”. I would not repeat what I wrote there, but I will quickly highlight what is relevant.

“Until better and cheaper networking makes remote I/O as fast as local I/O at a reasonable cost, one should be very careful about virtualizing DBMS software.”

Virtualizing I/O is not a solution for a large DW with complex queries. However, as I wrote in that post, a good solution is not to make remote I/O faster, but rather to tap into the innovation of software-only SSD block I/O that is local.

“Of course, the benefits of a virtualized environment are not insignificant, and they may outweigh the performance hit. My only point is to note that virtualizing I/O is not cheap.”

This is what a disruption initially looks like. You start seeing good-enough value in an approach for certain types of solutions, while it still seems expensive for other sets of solutions. Over a period of time, rapid innovation and economies of scale remove this price barrier. I think that's where virtualization stands today. Organizations have started to use the cloud as IaaS and SaaS for a variety of solutions, including good-enough self-service BI and performance-optimization solutions. I expect to see more and more innovation in this area, to the point where traditional large DWs will be able to get enough value out of the cloud even after paying the virtualization overhead.

Thursday, October 28, 2010

Challenging Stonebraker’s Assertions On Data Warehouses - Part 1

I have tremendous respect for Michael Stonebraker. He is an astute visionary. What I like the most about him is his drive and passion to commercialize academic concepts. ACM recently published his article "My Top 10 Assertions About Data Warehouses." If you haven't read it, I would encourage you to do so.

I agree with some of his assertions and disagree with a few. I am grounded in reality, but I do have a progressive viewpoint on this topic. This is my attempt to bring an alternate perspective to the rapidly changing BI world that I am seeing. I hope the readers take it as constructive criticism. This post has been sitting in my draft folder for a while; I finally managed to publish it. This is Part 1, covering assertions 1 to 5. Part 2, with the rest of the assertions, will follow in a few days.

“Please note that I have a financial interest in several database companies, and may be biased in a number of different ways.”

I appreciate Stonebraker's disclaimer. I do believe that his view is skewed toward what he has seen and invested in. I don't believe there is anything wrong with that. I like it when people put their money where their mouth is.

As you might know, I work for SAP, but this is my independent blog, and these are my views and not those of SAP. I also try hard not to have SAP product or strategy references on this blog, to maintain my neutral perspective and avoid any possible conflict of interest.

Assertion 1: Star and snowflake schemas are a good idea in the data warehouse world.

This reads like an incomplete statement. Star and snowflake schemas are a good idea because they have been proven to perform well in the data warehouse world with row and column stores. However, I have started to see emergent NoSQL-based data warehouse architectures that are far from a star or a snowflake. They are, in fact, schemaless.

“Star and Snowflake schemas are clean, simple, easy to parallelize, and usually result in very high-performance database management system (DBMS) applications.”

The following statement contradicts the statement above.

“However, you will often come up with a design having a large number of attributes in the fact table; 40 attributes are routine and 200 are not uncommon. Current data warehouse administrators usually stand on their heads to make "fat" fact tables perform on current relational database management systems (RDBMSs).”

There are a couple of problems with this assertion:
  1. The schema is not simple: 200 attributes, fat fact tables, and complex joins. What exactly is simple? (A minimal sketch of such a schema follows this list.)
  2. Efficient parallelization of a query depends on many factors beyond the schema: how the data is stored and partitioned, the performance of the database engine, and the hardware configuration, to name a few.
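To make that concrete, here is a hypothetical, minimal star schema in Python with an in-memory SQLite database. Even this toy fact table needs a join to every dimension before it answers a basic question; a real "fat" fact table multiplies that across dozens or hundreds of attributes.

```python
# A minimal, hypothetical star schema: one fact table plus three dimensions.
# Even a basic business question already touches every dimension via joins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    -- In a real "fat" fact table this column list runs to dozens or hundreds.
    CREATE TABLE fact_sales (
        date_id INTEGER, store_id INTEGER, product_id INTEGER,
        units INTEGER, revenue REAL, discount REAL, tax REAL, cost REAL
    );
    INSERT INTO dim_date    VALUES (1, '2010-10-01');
    INSERT INTO dim_store   VALUES (1, 'West');
    INSERT INTO dim_product VALUES (1, 'Washers');
    INSERT INTO fact_sales  VALUES (1, 1, 1, 3, 1500.0, 50.0, 120.0, 900.0);
""")

rows = conn.execute("""
    SELECT d.day, s.region, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON f.date_id = d.date_id
    JOIN dim_store s   ON f.store_id = s.store_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY d.day, s.region, p.category
""").fetchall()
print(rows)
```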
"If you are a data warehouse designer and come up with something other than a snowflake schema, you should probably rethink your design.”

Really?

The requirement that the schema has to be perfect upfront has introduced most of the problems in the BI world. I call it design-time latency. This is the time between when a business user decides what report or information to request and when she gets it (mostly the wrong one). The problem is that you can only report on what you have in your DW and what has been tuned.

This is why the schemaless approach seems more promising: it can cut down the design-time latency by allowing business users to explore the data and run ad hoc queries without locking down a specific structure.
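A minimal sketch of that idea, using invented records: the documents need not share a fixed layout, and a new ad hoc question can be asked without redesigning a schema first.

```python
# A sketch of schemaless exploration: records carry whatever attributes they
# have, and new questions don't require a schema change. Data is invented.
from collections import defaultdict

events = [
    {"type": "sale",   "region": "North", "amount": 120.0},
    {"type": "sale",   "region": "South", "amount": 80.0, "coupon": "AUG15"},
    {"type": "return", "region": "North", "amount": 40.0},
    # A new attribute shows up later; nothing has to be migrated.
    {"type": "sale",   "region": "North", "amount": 60.0, "channel": "mobile"},
]

# Ad hoc question #1: sales by region.
sales_by_region = defaultdict(float)
for e in events:
    if e["type"] == "sale":
        sales_by_region[e["region"]] += e["amount"]
print(dict(sales_by_region))

# Ad hoc question #2, invented a minute later: how much came in on coupons?
coupon_sales = sum(e["amount"] for e in events if "coupon" in e)
print(coupon_sales)
```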

Assertion 2: Column stores will dominate the data warehouse market over time, replacing row stores.

This assertion assumes that there are only two ways of organizing data: either in a row store or in a column store. This is not true. Look at my NoSQL explanation above, and also at my post "The Future Of BI In The Cloud," for an alternate storage approach.

This assertion also assumes that access performance is tightly dependent on how the data is stored. While this is true in most cases, many vendors are challenging this assumption by introducing an acceleration layer on top of the storage layer. This approach makes it feasible to achieve consistent query performance through a clever acceleration architecture that acts as an access layer and does not depend on how the data is stored and organized.
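A rough sketch of that access-layer idea, with all names invented: queries are answered from the accelerator's own in-memory aggregates, so the response does not depend on whether the base data arrived row-oriented or column-oriented.

```python
# A sketch of an acceleration/access layer: queries hit the accelerator's
# in-memory aggregates regardless of the base storage layout. All names and
# data are invented for illustration.
from collections import defaultdict

class QueryAccelerator:
    def __init__(self):
        self._totals = defaultdict(float)   # pre-aggregated region -> revenue

    def load_from_rows(self, rows):
        """Base store A: a list of row tuples (region, revenue)."""
        for region, revenue in rows:
            self._totals[region] += revenue

    def load_from_columns(self, columns):
        """Base store B: a dict of parallel columns."""
        for region, revenue in zip(columns["region"], columns["revenue"]):
            self._totals[region] += revenue

    def revenue_by_region(self):
        """Served from the acceleration layer, not from the base layout."""
        return dict(self._totals)

acc = QueryAccelerator()
acc.load_from_rows([("North", 120.0), ("South", 80.0)])
acc.load_from_columns({"region": ["North"], "revenue": [60.0]})
print(acc.revenue_by_region())   # {'North': 180.0, 'South': 80.0}
```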

“Since fact tables are getting fatter over time as business analysts want access to more and more information, this architectural difference will become increasingly significant. Even when "skinny" fact tables occur or where many attributes are read, a column store is still likely to be advantageous because of its superior compression ability."

I don't agree with the premise that we should have fatter fact tables whenever business analysts want more information. And even if this is true, how will a column store be advantageous when the data grows beyond the point where compression isn't that useful?
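To make the compression argument, and its limits, concrete, here is a toy sketch using invented data: run-length encoding collapses a low-cardinality column stored contiguously, while the same values interleaved row-wise, or a high-cardinality column, barely compress at all.

```python
# A toy sketch of why columnar layout compresses well, and where that
# advantage fades as cardinality grows and runs get shorter.
from itertools import groupby

def run_length_encode(values):
    """Encode a sequence as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# One low-cardinality column stored contiguously (column-store layout).
region_column = ["North"] * 6 + ["South"] * 4
print(run_length_encode(region_column))   # [('North', 6), ('South', 4)]

# The same values interleaved with other attributes (row-store layout)
# never form long runs, so the identical trick buys almost nothing.
row_layout = ["North", 120.0, "South", 80.0, "North", 60.0, "South", 40.0]
print(run_length_encode(row_layout))      # every run has length 1

# A high-cardinality column behaves like the row layout: short runs,
# little savings, which is roughly the limit questioned above.
order_ids = list(range(8))
print(run_length_encode(order_ids))
```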

“For these reasons, over time, column stores will clearly win”

Even if it is only about rows versus columns, the column store may not be a clear commercial winner in the marketplace. Runtime performance is just one of many factors that customers consider while investing in a DW and business intelligence.

“Note that almost all traditional RDBMSs are row stores, including Oracle, SQLServer, Postgres, MySQL, and DB2.”

Exactly!

Row stores, with optimization and acceleration, have demonstrated reasonably good performance and stayed competitive. Not that I favor one over the other, but not every row-based DW is that large, growing rapidly, or suffering serious performance issues that warrant a switch from rows to columns.

This leads me to my last issue with this assertion. What about a hybrid store, row and column? Many vendors are trying to figure this one out, and if they are successful, it could change the BI outlook. I will wait and watch.

Assertion 3: The vast majority of data warehouses are not candidates for main memory or flash memory.

I am assuming that he is referring to flash used as volatile memory and not flash memory as storage. SSD block storage, though, has huge potential in the BI world.

“It will take a long time before main memory or flash memory becomes cheap enough to handle most warehouse problems.”

Not all DWs are growing at the same speed. One size does not fit all. Even if I agree that the price won't go down significantly, at the current price point main memory and flash memory can speed up many DWs without breaking the bank.

The cost of flash memory is a small fraction of the overall cost of a DW: hardware, licenses, maintenance, and people. If the added cost of flash memory makes the business more agile, reduces maintenance cost, and allows companies to make faster decisions based on smarter insights, it's worth it. The upfront capital cost is not the only deciding factor for BI systems.

“As such, non-disk technology should only be considered for temporary tables, very "hot" data elements, or very small data warehouses.”

This is easier said than done. Customers will spend significantly more time and energy on a complicated architecture to isolate the hot elements and run them on a different software/hardware configuration.

Assertion 4: Massively parallel processor (MPP) systems will be omnipresent in this market.

Yes, MPP is the future. No disagreements. The assertion is not about on-premise versus the cloud, but I truly believe that the cloud is the future for MPP. There are other BI issues that need to be addressed before the cloud becomes a good BI platform for a massive-scale DW, but the cloud will beat any other platform when it comes to MPP with computational elasticity.

Assertion 5: "No knobs" is the only thing that makes any sense.

“In other words, look for "no knobs" as the only way to cut down DBA costs.”

I agree that "no knobs" is what customers should strive for to simplify and streamline their DW administration, but I don't expect these knobs to significantly drive down the overall operational cost, or even the cost associated just with the DBAs. Not all DBAs have a full-time job managing and tuning the DW. DW deployments go through a cycle where the tasks include schema design, requirements gathering, ETL design, etc. Tuning, or using the "knobs," is just one of many tasks that DBAs perform. I absolutely agree that no knobs would take some burden off the shoulders of a DBA, but I disagree that it would result in significant DBA cost savings.

For a fairly large deployment, there is significant cost associated with the number of IT layers that are responsible for channeling reports to the business users. There is an opportunity to invest in the right kind of architecture, the right technology stack for the DW, and the tools on top of that to help increase the ratio of business users to BI IT. This should also help speed up the decision-making process based on the insights gained from the data. Isn't that the purpose of having a DW to begin with? I see self-service BI as the only way to make IT scale. Instead of cutting the DBA cost, I would rather focus on scaling the BI IT with the same budget and broader coverage amongst the business users in an organization.

Monday, October 25, 2010

The Future Of BI In The Cloud



Actual numbers vary based on whom you ask, but the general consensus is that Business Intelligence (BI) and analytics in the cloud is a fast-growing market. IDC expects a compound annual growth rate (CAGR) of 22.4% through 2013. This growth is primarily driven by two kinds of SaaS applications. The first kind is a purpose-specific, analytics-driven application for business processes such as financial planning, cost optimization, inventory analysis, etc. The second kind is a self-service horizontal analytics application or tool that allows customers and ISVs to analyze data and create, embed, and share analyses and visualizations.

The category that is still nascent and will require significant work is traditional, general-purpose BI on large data warehouses (DW) in the cloud. For most enterprises, not only are all the DWs on-premise, but the majority of the business systems that feed data into these DWs are on-premise as well. If these enterprises were to adopt BI in the cloud, it would mean moving all the data, the warehouses, and the associated processes such as ETL to the cloud. But then, the biggest opportunities to innovate for the cloud exist outside of it. I see significant potential to build black-box, appliance-style systems that sit on-premise and encapsulate the on-premise complexity (ETL, lifecycle management, and integration) of moving the data to the cloud.

Assuming that the enterprises succeed in moving data to the cloud, I see a couple of challenges that, if treated as opportunities, will spur the most BI innovation in the cloud.

Traditional OLAP data warehouses don’t translate well into the cloud:

The majority of on-premise data warehouses run on some flavor of a relational or a columnar database, and most BI tools use SQL to access data from these DWs. These databases are not inherently designed to run natively in the cloud. On top of that, the optimizations performed on these DWs, such as sharding, indices, compression, etc., don't translate well to the cloud either, since the cloud is a horizontally elastic scale-out platform and not a vertically integrated scale-up system.

Organizations are rethinking their persistence options, as well as their access languages and algorithms, while moving their data to the cloud. Recently, Netflix started moving its systems into the cloud. It's not a BI system, but it has similar characteristics, such as a high volume of read-only data, a few index-based look-ups, etc. The new system uses S3 and SimpleDB instead of Oracle (on-premise). During this transition, Netflix picked availability over consistency. Eventual consistency is certainly an option that BI vendors should consider in the cloud. I have also started seeing DWs in the cloud that use HDFS, Dynamo, and Cassandra. Not all relational and columnar DW systems will translate well into NoSQL, but I cannot overemphasize the importance of re-evaluating persistence-store and access options when you decide to move your data to the cloud.

Hive, a DW infrastructure built on top of Hadoop, is a MapReduce-meets-SQL approach. Facebook has 15 petabytes of data in its DW running Hive to support its BI needs. Very few companies would require such scale, but the best thing about this approach is that you can grow linearly, technologically as well as economically.
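To give a flavor of "MapReduce meets SQL," here is a pure-Python toy (not Hive itself) that expresses a GROUP BY/SUM as a map phase emitting key-value pairs, a shuffle that groups by key, and a reduce that aggregates each key independently, which is why the approach scales out linearly.

```python
# The flavor of "MapReduce meets SQL": SELECT region, SUM(revenue) GROUP BY
# region expressed as map -> shuffle -> reduce. A pure-Python toy, not Hive;
# Hive compiles HiveQL into MapReduce jobs over HDFS.
from collections import defaultdict

records = [("North", 120.0), ("South", 80.0), ("North", 60.0)]

# Map: emit (key, value) pairs.
mapped = [(region, revenue) for region, revenue in records]

# Shuffle: group values by key (done by the framework, across nodes).
shuffled = defaultdict(list)
for key, value in mapped:
    shuffled[key].append(value)

# Reduce: aggregate each key's values independently, hence linear scale-out.
reduced = {key: sum(values) for key, values in shuffled.items()}
print(reduced)   # {'North': 180.0, 'South': 80.0}
```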

The cloud is not a good platform for I/O-intensive applications such as BI:

One of the major issues with large data warehouses is, well, the data itself. Any complex query typically involves intensive I/O. But I/O virtualization in the cloud simply does not work for large data sets, and remote I/O, due to its latency, is not a viable option. Block I/O is a popular approach for I/O-intensive applications. Amazon EC2 does have block I/O for each instance, but it obviously can't hold all the data, and it's still a disk-based approach.

For BI in the cloud to be successful, what we really need is the ability to scale out block I/O, just like scale-out computing. The good news is that there is at least one company that I know of, Solidfire, working on it. I met Dave, the founder, at the Structure conference reception, and he explained to me what he is up to. Solidfire has a software solution that uses solid-state drives (SSDs) as scale-out block I/O. I see huge potential in how this can be used for BI applications.

When you put all the pieces together, it makes sense. The data is distributed across the cloud on a number of SSDs that are available to the processors as block I/O. You run some flavor of NoSQL to store and access this data, leveraging modern algorithms and, more importantly, the horizontally elastic cloud platform. What you get is commodity, blazingly fast BI at a fraction of the cost, with a pay-as-you-go subscription model.

Now, that's what I call the future of BI in the cloud.
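As a closing illustration, here is a toy sketch of that scale-out picture, with invented node names: keys are hashed across a set of storage nodes so that data and I/O spread as nodes are added. Real systems use consistent hashing or range partitioning; this deliberately simplifies.

```python
# A toy sketch of scale-out data placement: each key is hashed to one of a
# set of storage nodes, so data and I/O spread as nodes are added. Node
# names are invented; real systems use consistent hashing or range
# partitioning rather than this simple modulo scheme.
import hashlib

NODES = ["ssd-node-0", "ssd-node-1", "ssd-node-2"]

def node_for(key: str) -> str:
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

shards = {}
for order_id in ("order-1001", "order-1002", "order-1003", "order-1004"):
    shards.setdefault(node_for(order_id), []).append(order_id)

print(shards)   # each node holds a subset; a query fans out and merges
```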