It gives me great pleasure to write the Foreword for this timely publication on the ever-growing range of Big Data applications. The potential for leveraging existing data from multiple sources has been articulated again and again, across an almost limitless landscape, yet it is important to remember that in doing so, domain knowledge is key to success. Naïve attempts to process data are bound to lead to errors such as accidentally regressing on noncausal variables. As Michael Jordan at Berkeley has pointed out, in Big Data applications the number of combinations of features grows exponentially with the number of features, so for any particular database you are likely to find, by chance alone, some combination of columns that perfectly predicts any outcome. It is therefore important that we neither process data in a hypothesis-free manner nor skip sanity checks on our data.
In this collection, titled “Guide to Big Data Applications,” the editor has assembled a set of applications in science, medicine, and business in which the authors have attempted to do just this: apply Big Data techniques together with a deep understanding of the source data. The applications covered give a flavor of the benefits of Big Data across many disciplines. The book has 19 chapters, broadly divided into four parts. Part I contains four chapters that cover the basics of Big Data, aspects of privacy, and how one could use Big Data in natural language processing (a particular concern for privacy). Part II comprises eight chapters that
look at various applications of Big Data in environmental science, oil and gas, and civil infrastructure, covering topics such as deduplication, encrypted search, and the friendship paradox.
Part III addresses Big Data applications in medicine, with topics ranging from “The Impact of Big Data on the Physician,” written from a purely clinical perspective, to the often-discussed deep dives into electronic medical records. Perhaps most exciting for the future is the application of Big Data to healthcare from a developing-country perspective. This is one of the most promising growth areas in healthcare, given the current paucity of services and the explosion in mobile phone usage. The tabula rasa that exists in many countries holds the potential to leapfrog many of the mistakes we have made in the West with stagnant silos of information, arbitrary barriers to entry, and the lack of any standardized schema or nondegenerate ontologies.
In Part IV, the book covers Big Data applications in business, which is perhaps the unifying subject here, given that none of the above application areas are likely to succeed without a good business model. The potential to leverage Big Data approaches in business is enormous, from banking practices to targeted advertising. The need for innovation in this space is as important as the underlying technologies themselves. As Clayton Christensen points out in The Innovator’s Prescription, three enablers are needed for a successful disruptive innovation:
1. A technology enabler that “routinizes” previously complicated tasks
2. A business model innovation that is affordable and convenient
3. A value network whereby companies with disruptive, mutually reinforcing economic models sustain each other in a strong ecosystem
We see this happening with Big Data almost every week, and the future is exciting.
In this book, readers will find inspiration in each of the above topic areas and gain insight into applications that convey the flavor of this fast-growing, dynamic field.
Atlanta, GA, USA
Gari Clifford