Big Data - the Real Information Revolution

Submitted by David Pearce Snyder on Mon, 2014-11-17 21:59

In my strategic briefings around the country, I routinely ask my audiences (largely managers and professionals) whether they have heard the term "Big Data," and most of them raise their hands. But when I ask how many of them know exactly what "Big Data" means, almost no hands go up. The term is clearly riding a rising "hype cycle," as reflected by a report in the July 24, 2014, issue of InfoWorld Tech Watch: "A recent survey by Gartner, the IT research firm, found that 64% of large enterprises are investing in 'Big Data,' but also found that a similar chunk of firms (60%) don't have a clue as to what to do with it."

This apparently indiscriminate management enthusiasm for Big Data can be forgiven (at least in part). Big Data boosterism has been intense. One keynote speaker at the 2012 Davos World Economic Forum proclaimed that "Big Data is a new class of asset, like currency or gold!" Later that year, Gartner forecast that, by 2015, Big Data "would directly generate 1.9 million new jobs, and indirectly generate 5.7 million additional new positions." And, more recently, the director of Harvard University's Institute for Quantitative Social Science has written that "the march of quantification, made possible by cloud computing and enormous new sources of data, will sweep through academia, business and government; no area will be left untouched."

Birth of a Notion

Such claims can only be put in proper perspective by considering the simple hypothesis on which the "Big Data" concept is based. That hypothesis was first voiced by Microsoft senior scientist Jim Gray, in an address to the National Research Council (NRC), January 11, 2007. Specifically, Gray argued that, "with an exaflood of unexamined data and teraflops of cheap computing power, we should be able to make many valuable discoveries simply by searching all that information for unexpected patterns." He called such pattern searches "data-intensive scientific discovery," and he proposed that the methodology be formally acknowledged as a fourth paradigm of scientific research, in addition to: [1] observation, [2] experimentation, and [3] computer simulation.

Jim Gray vanished while sailing off the California coast three weeks after his NRC presentation. As a spontaneous tribute, his colleagues created a website on which dozens of scientists and scholars posted papers supporting Gray's concept, which they characterized as "Big Data." Those papers were published in 2009 as a book: The Fourth Paradigm: Data-Intensive Scientific Discovery. A year later, the NRC formally designated the concept as the fourth paradigm of scientific research.

A New Scientific Paradigm

From the outset, the behavioral and social sciences have been particularly energized by the research possibilities offered by Big Data pattern searches. So have the folks in marketing, who are confident that Big Data will enable them to discover many useful insights into consumer behavior and motivation. One national retailer is reported to have discovered that a sudden increase in the purchase of cotton balls is a remarkably accurate predictor of impending pregnancy among women of a certain age.

The transparent, sometimes shameless use of Big Data mining discoveries in consumer marketing is already eliciting increased concern from privacy advocates, but several recent surveys have revealed that the general public is much less worried about corporate abuses of Big Data than about the National Security Agency. Meanwhile, the use of data-intensive discovery by the social and behavioral sciences has already begun to validate long-standing theories and reveal unexpected truths.

An Era of Social Discovery

For example, family counselors have long reported anecdotal evidence that disagreements about money are the most common source of friction between married couples. Now, a Big Data pattern search has shown that couples who have widely differing credit scores when they marry have much higher divorce rates than do couples with equally high credit scores. We've also discovered that divorce rates are lower among people who have multiple siblings than they are among only children. And, at the Institute for Clinical Evaluative Sciences in Toronto, researchers have recently found that auto accident rates for pregnant women remain unchanged during their first trimester, more than double during their second trimester, and fall below normal during the final trimester. The causality behind these correlations has sparked considerable scholarly speculation.

Health, about which we have vast amounts of "unexamined data," is a fertile field for Data-Intensive Scientific Discovery. In 2013, the Mayo Clinic reported that heavy coffee consumption (over four 8 oz. cups per day, for more than 10 years) is associated with higher death risk among people under the age of 55. For men, mortality rates are 56% higher, and for women, rates are 200% higher. These findings came as a surprise, since all previous research had shown that coffee consumption - while causing temporary rises in heart rate, blood pressure, and blood sugar - had no apparent long-term ill effects on humans.

The unexpected results from Mayo's Big Data analysis demonstrate the unique power and efficiency of Data-Intensive Scientific Discovery. Traditional scientific research is narrowly focused, principally designed to prove - or disprove - a specific hypothesis. The potential value of such research is heavily dependent upon the investigators' framing of a purposeful, unambiguous hypothesis, typically based upon the outcomes of previous focused research. Because previous research on coffee consumption had found no permanent ill effects, there was little reason to hypothesize otherwise.
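To make the contrast concrete, here is a minimal sketch in Python of what a hypothesis-free "pattern search" might look like in miniature. It is an illustration only, not the method used by Mayo or by any study mentioned here: the data are synthetic, the column names (x1 to x4) are hypothetical, and simple pairwise correlation stands in for whatever techniques real Big Data analyses employ. Instead of testing one pre-chosen relationship, the search scores every pair of variables and lets the strongest associations surface on their own.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pattern_search(table):
    """Score every pair of columns and rank by absolute correlation."""
    pairs = itertools.combinations(table.keys(), 2)
    scored = [((a, b), pearson(table[a], table[b])) for a, b in pairs]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Synthetic data (an assumption for illustration, not real measurements):
# three independent columns, plus x4, which secretly depends on x2.
rng = random.Random(0)
n = 200
table = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
table["x4"] = [2 * v + rng.gauss(0, 0.1) for v in table["x2"]]

# No hypothesis is stated in advance; the search ranks all six pairs.
ranked = pattern_search(table)
for (a, b), r in ranked[:3]:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

The hidden x2-x4 relationship rises to the top of the ranking without anyone having proposed it. The search reports only an association, though; explaining why two variables move together remains a separate question.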

Challenging Conventional Thinking

Generalists - e.g. policy makers, behavioral economists, and futurists like myself - derive our models of reality from the scattered points of focused scholarly proofs, while filling in the empty spaces with our (more or less) informed intuition. Jim Gray's data-intensive scientific discovery, on the other hand, has the potential to produce many insights from a single pattern search of well-chosen data sets. For example, the San Francisco HR consultancy Evolv compiled terabytes of employee surveys and worker career histories, enabling the firm to identify a number of reliable predictors of potential recruits' future on-the-job performance. Evolv has used this knowledge to dramatically improve the quality of new hires at Xerox and AT&T.

Evolv has (unsurprisingly) not revealed the predictive recruit selection criteria that it discovered in its Big Data bases. But during those pattern searches it also discovered that a number of commonly used job screening criteria are not, in fact, accurate predictors of future recruit performance, and Evolv has made those findings public. Specifically, the firm found that, contrary to conventional wisdom, long periods of unemployment, frequent job-hopping, felony convictions and low intelligence test scores are not statistically correlated with poor future job performance. The re-evaluation of these long-assumed employment screening criteria is likely to become a strategic necessity in the not-too-distant future, as the widely projected shortage of entry-level workers begins to shrink the supply of new recruits.

In fact, as Data-Intensive Scientific Discovery is inevitably applied across all domains of private and public enterprise, Big Data pattern searches will almost certainly call conventional wisdom into question in every field of endeavor. In education, Big Data analyses of the outcomes of schooling will ultimately tell us what mix of curricular content and methods of delivery produces the best results. As the new "Accountable Care" regimen for medical services is better informed by Big Data assessments of patient outcomes, the quality of healthcare will rise as costs fall. In every institution, practitioners and decision-makers will increasingly be awash in the discoveries produced by Big Data searches, some of which will confirm traditional organizational predispositions, and some of which will not.

From Idea to Industry in a Decade

Of course, all of these Big Data outputs won't just magically appear on decision-makers' desks. Big Data is rapidly becoming a major industry. Big Data sets are being assembled and analyzed in thousands of cloud server centers - the "factories" of the information economy. The results of these pattern searches, in turn, will be mined and refined by armies of data scientists, and interpreted and applied by cadres of math modelers and quantitative analysts, all of whom will constitute an in-house counterpoise to established management authority. Over the next 5 years, there will be a growing potential for cultural conflict between the increasingly powerful "quants" and traditional managers, similar to the controversies generated by the rise of Taylorism in the 1920s and '30s.

Big Data and Better Decisions

While the extreme manifestations of Frederick Taylor's theories - e.g. time-and-motion studies, "one best way to work," etc. - are now anachronisms, Prof. Taylor and his disciples established the notion of "scientific management" that remains the basis of the profession today. Erik Brynjolfsson, Director of the Center for Digital Business at MIT's Sloan School of Management, has spent decades assessing the impacts of IT on economic performance. He has now demonstrated that "Companies that adopt data-directed decision-making enjoy a 5% to 6% boost in productivity." In order to maintain its legitimacy, 21st-century management will have to embrace the newly established paradigm of "data-intensive scientific discovery."

Responsible leadership in both the private and public sectors can scarcely be expected to ignore the proven capacity of Big Data research to produce big improvements in performance. Moreover, Big Data analysis is uniquely capable of providing decision-makers with prescient insights into otherwise unforeseeable developments - emergent "Black Swan" surprises - that complexity experts assure us will increasingly confront all institutions from now on. At the same time, responsible leadership cannot afford to ignore the lessons of our recent past, in which unconstrained and ill-advised applications of quantitative risk analysis contributed to the credit bubble that nearly wrecked the global economy. As they did with Taylorism, managers will not only have to embrace Big Data; they will have to civilize it as well.

David Pearce Snyder is a consulting futurist, and has been a contributing editor for The Futurist magazine since 1979. He can be reached at david_snyder@verizon.net.
