Thursday, January 5, 2012

Big Data Technologies and players

Big Data technologies and players fall into four key layers/areas:

Following are some of the players in each area. This list is by no means complete; these are just some of the well-known players in each area.

  • Infrastructure - The key to Big Data infrastructure is easy scalability to handle petabytes of data, so the cloud has become a natural choice. That is why you will see many public cloud providers in this area in the graphic below.
  • Data Storage - Traditional storage methods (i.e. RDBMS) are not a good option because of their price and scalability restrictions. New methods of storage, particularly NoSQL databases and DFS (distributed file systems), represent a paradigm shift in the storage arena. Among these, Hadoop HDFS is the most commonly used storage for Big Data, although other storage systems are also used depending on the use case.
  • Data Processing and Management - In this area Hadoop MapReduce stands out, as it is the framework used to process massive amounts of data in parallel. Product vendors have also implemented the same principles in their own products.
  • Data Analytics - This is the area where many established vendors are still in play, providing visualization and predictive analytics. The Hadoop ecosystem also offers libraries such as Mahout, but these are not yet as mature as some of the vendor products.

Another area that has emerged because of Big Data is "Dataset Providers". These players provide public datasets that you can simply download and use for your analytics. You can get some
Following are some public data providers.

For a deep dive into each of these technology areas, I am writing a series of Hadoop blog posts. Please refer to those for details.
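To give a feel for the MapReduce idea mentioned under Data Processing and Management, here is a minimal single-process sketch of the classic word-count job in plain Python. It only mimics the three phases (map, shuffle/group by key, reduce); it is not Hadoop's actual API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical. In real Hadoop, the map and reduce tasks run in parallel across many machines over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver: runs map over every record, then shuffle, then reduce.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "data in parallel"]
    print(word_count(docs))
    # prints {'big': 2, 'data': 2, 'clusters': 1, 'in': 1, 'parallel': 1}
```

Because each map call looks at only one record and each reduce call looks at only one key, both phases can be split across machines independently. This is what lets the model scale to very large datasets.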