Tuesday, August 23, 2011

What is Big Data?


Definitions:
“Big Data” is when the size of the data itself becomes part of the problem.

“Big Data” is a term applied to data sets that are so large, complex, and dynamic that they are beyond the ability of commonly used software tools to capture, manage, and process them within a tolerable elapsed time.

“Big Data” is also used as an umbrella term for the ecosystem of tools for processing and storing large, complex, and dynamic data, typically using Map/Reduce processing and NoSQL storage techniques.
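To make the Map/Reduce idea concrete, here is a minimal, framework-agnostic sketch in Python of the classic word-count job. The map_phase, shuffle, and reduce_phase names and the in-memory grouping are my own illustrative assumptions; a real framework such as Hadoop distributes these steps across a cluster and handles the shuffle for you.

from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in a document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(mapped_pairs):
    # Shuffle: group all emitted values by key (done by the framework in practice).
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    # Reduce: sum the counts emitted for each word.
    return (word, sum(counts))

documents = ["big data is big", "data about data"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
result = [reduce_phase(word, counts) for word, counts in shuffle(mapped).items()]
print(result)  # [('big', 2), ('data', 3), ('is', 1), ('about', 1)]

The same two functions scale from one laptop to thousands of machines because each map call and each reduce call is independent; that independence is what the Big Data ecosystem exploits.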


Dimensions of Big Data:
Volume (amount of data), Velocity (rate of data in/out), and Variety (range of data types).

Some facts:

  • A recent IDC study projected that the total volume of electronically stored data and files - the digital universe - would reach 1.2 zettabytes in 2010. That's 21 zeros behind the 1, if you're keeping count. 
  • Traditional DW/BI tools typically handle only about 5 TB of data at a time (see the quick calculation below).
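To put those two figures side by side, here is a quick back-of-the-envelope calculation in Python. It simply restates the numbers quoted above using decimal (SI) units, where one zettabyte is 10^21 bytes:

ZETTABYTE = 10 ** 21            # bytes: a 1 followed by 21 zeros
TERABYTE = 10 ** 12             # bytes

digital_universe = 1.2 * ZETTABYTE   # IDC's 2010 projection
dw_bi_limit = 5 * TERABYTE           # rough per-tool limit cited above

print(f"{digital_universe / TERABYTE:,.0f} TB in the digital universe")
print(f"{digital_universe / dw_bi_limit:,.0f} traditional 5 TB systems needed to hold it")

That works out to roughly 1.2 billion terabytes, or about 240 million systems of the traditional 5 TB size, which is why the size of the data itself becomes the problem.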

Some of you may ask, "This data has been around for a long time; what's the big deal now?"

The answer to that is:
The need for massive processing resources and storage was the biggest obstacle to putting this data to use. With access to cheap storage and large, scalable computing infrastructure, the significance of this dormant data has increased many-fold. And there are now new technologies available to give insight into unstructured data.
Uses of Big Data:
1. Information transparency and usability at a much higher frequency
2. More accurate and detailed performance information
3. Ever-narrower segmentation of customers

So let's re-define Big Data:
"Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis."


Having laid down the basics of Big Data, I will dig a little deeper into the Big Data ecosystem in my next post.