Big Data

[Image: Big Data word cloud]

Is the emergence of huge external memory capacities the impetus for the world's information doubling every 20 months? Much as we attribute erratic weather to global warming, we attribute the need, nay the demand, for ever-increasing external memory capacity to Big Data.

We may unequivocally state that the era of Big Data has begun. Computer scientists, physicists, business analysts, economists, mathematicians, political scientists, government agencies, intelligence entities, law enforcement, bio-informaticists, sociologists, and many others are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. To obtain this data, they mine and analyze information from Twitter, Google, Verizon, Facebook, Wikipedia, and every space where large groups of people leave digital traces and deposit data. (Boyd & Crawford, 2011)

A typical business practice for large-scale data analysis is the utilization of an Enterprise Data Warehouse (EDW) that is queried by Business Intelligence (BI) software. These BI applications produce reports and interactive interfaces that summarize data via aggregation functions, so as to facilitate business decisions. (Cohen et al., 2009)
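To make the idea of summarizing data via aggregation functions concrete, here is a minimal, hypothetical sketch: it loads a few made-up sales rows into an in-memory SQLite table and runs the kind of GROUP BY aggregation query a BI report might issue against an EDW. The table and column names are illustrative only and are not drawn from any particular BI product.

```python
import sqlite3

# In-memory stand-in for an Enterprise Data Warehouse fact table.
# All names and figures here are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "widgets", 1200.0),
        ("North", "gadgets", 800.0),
        ("South", "widgets", 950.0),
        ("South", "gadgets", 400.0),
    ],
)

# The kind of aggregation a BI report runs: total and average sales per region.
for region, total, avg in conn.execute(
    "SELECT region, SUM(amount), AVG(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(f"{region}: total={total:.2f}, average={avg:.2f}")
```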

As an example, the oil and gas industry has long utilized high-performance computing (HPC) systems to analyze large data sets and to model underground reserves from seismic data. Critical to this has been the use of redundant commodity servers with direct-attached storage (DAS), which provide the input/output operations per second (IOPS) required to move the data to the analytics tools. (Adshead, 2014)
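As a rough illustration of why scale-out DAS matters for feeding analytics tools, the sketch below estimates the aggregate IOPS of a cluster of commodity servers. The figures (disks per server, IOPS per spinning disk, target IOPS) are assumptions chosen only to show the arithmetic, not numbers from the cited article.

```python
import math

# Illustrative assumptions only; real figures vary widely by hardware.
IOPS_PER_SPINNING_DISK = 150      # rough ballpark for one enterprise hard drive
DISKS_PER_SERVER = 12             # direct-attached drives in one commodity server
TARGET_IOPS = 500_000             # hypothetical demand from the analytics layer

iops_per_server = IOPS_PER_SPINNING_DISK * DISKS_PER_SERVER
servers_needed = math.ceil(TARGET_IOPS / iops_per_server)

print(f"Each server supplies ~{iops_per_server:,} IOPS from its DAS.")
print(f"Meeting ~{TARGET_IOPS:,} IOPS would take roughly {servers_needed} servers.")
```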

Furthermore, we find that the world's data is doubling every three years and is now measured in exabytes. According to the How Much Information project, print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. That is equivalent to 37,000 new libraries the size of the Library of Congress, with its 17 million books. Of this new data, roughly 92% resides on magnetic media, mostly on hard drives. (“Executive Summary,” 2014)
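A quick back-of-envelope check, using only the figures quoted above (5 exabytes, 37,000 Library-of-Congress-sized libraries, 17 million books), shows the sizes these numbers imply; the calculation is illustrative and is not part of the How Much Information report itself.

```python
# Back-of-envelope arithmetic using the figures quoted in the text.
EXABYTE = 10**18                     # decimal exabyte, in bytes
new_information_2002 = 5 * EXABYTE   # ~5 EB of new information in 2002
libraries = 37_000                   # Library-of-Congress-sized libraries
books_per_library = 17_000_000       # books in the Library of Congress

bytes_per_library = new_information_2002 / libraries
bytes_per_book = bytes_per_library / books_per_library

print(f"Per library: ~{bytes_per_library / 10**12:.0f} TB")
print(f"Per book:    ~{bytes_per_book / 10**6:.1f} MB")
```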

Intel predicts that the “Era of Tera” will necessitate systems that compute at teraflops (a trillion floating-point operations per second), move data at terabits per second, and store terabytes (1,024 gigabytes) of data.

To handle all this information, people will need systems that can help them understand and interpret data, and they will find that search engines will not be up to the task.

As anyone who has searched for anything on the Web knows, a query often yields tens of thousands of results with no relevance to the search. Thus we need computers that can “see” data the way we do, looking beyond the 0s and 1s to identify what is useful to us and assemble it for our review. (Dubey, 2005)

Thus the designers of future computing and data systems must be cognizant that:

“The great strength of computers is that they can reliably manipulate vast amounts of data very quickly. Their great weakness is that they don’t have a clue as to what any of that data actually means.” (Cass, 2004)
