Thursday, January 5, 2012

Big Data Technologies and Players

Big Data technologies and players fall into four key layers/areas:

The following are some players in each area. This list is by no means complete; these are just some of the known players in each area.

  • Infrastructure - The key to big data infrastructure is easy scalability to handle petabytes of data, so the cloud becomes a natural choice. You will therefore see a lot of public cloud providers in this area in the graphic below.
  • Data Storage - Traditional storage methods (i.e. RDBMS) are not a good option because of their price and scalability restrictions. New storage methods, particularly NoSQL and DFS (distributed file systems), are therefore the paradigm shift in the storage arena. Among these, Hadoop HDFS is the most commonly used storage for Big Data; however, other storage options are also used depending on the use case.
  • Data Processing and Management - In this area Hadoop MapReduce stands out, as it is the framework used for processing massive amounts of data in parallel. Product vendors have also implemented the same principles in their products.
  • Data Analytics - This is the area where a lot of old vendors are still in play, providing visualization and predictive analytics. The Hadoop ecosystem also offers libraries like Mahout, but they are not as mature as some of the product vendors' offerings.

Another area that has emerged because of Big Data is "Dataset Providers". These players provide public datasets which you can simply download and use for your analytics. You can get some
Following are some public data providers. 

For a deep dive into each of the technology areas, I am writing Hadoop blogs. Please refer to those for details.
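
To make the MapReduce idea from the Data Processing layer a bit more concrete, here is a minimal sketch in plain Python of the classic word-count example. It does not use Hadoop at all: it only imitates the map, shuffle and reduce phases in a single process, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are my own illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (hypothetical single-process driver)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big storage", "hadoop processes big data"]
    print(word_count(docs))
```

In a real cluster the map calls would run in parallel on the nodes that hold the data blocks, and the shuffle would move the grouped pairs across the network to the reducers. The sketch only shows how the work is split so that each phase can be parallelized.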

6 comments:

  1. Thanks for structuring the various players in the market.

    I wonder about several of the products and their categorization, e.g. Splunk is not only an analytics product but a data storage product too - at least that was my understanding at the big data conference in Germany two months ago.

    You could also add ParStream at the data storage level, of course with processing and analytics.

    How are you planning to go from here - add further products to your diagram?
    Check out the following diagram from the 451-group:
    http://blogs.the451group.com/opensource/2011/04/15/nosql-newsql-and-beyond-the-answer-to-sprained-relational-databases

  2. Micheal, I know exactly what you mean. I am looking for a good representation. I liked the one from the 451-group, but that looks too complex.

    Also I agree that there are some players that can fall into multiple categories. I will update this and definitely put ParStream in this graphic.

  3. This is a great start and thanks for posting it. If you were to add in all the various players this would be a very large and crowded diagram. As we do research of Big Data players, we have been attempting to categorize them in a similar manner and are also including 'traditional' technology vendors with some big data capabilities. How are you planning to proceed with this? This would be a good effort to collaborate on to document a holistic list of the many vendors and where they fit into the stack. I agree that multiple vendors will fit into multiple categories which will further crowd the graphic.

  4. Mike, it's a good idea to collaborate. Do send me your website where I can look at your representation.

  5. Mike, Rajan -- I would also add InfiniDB. Currently the most scalable DB for analytics on massive datasets. Our largest customer is 7 Petabytes.

  6. Thank you for taking the time to provide us with your valuable information. We strive to provide our candidates with excellent care. As always, we appreciate your confidence and trust in us.

