Wednesday, October 22, 2014: Big Data is a broad concept that spans several trends and technology developments. Over the last few years Big Data technologies have been getting due attention, and the space has seen several innovations in recent times. Here we discuss the top ten emerging Big Data technologies.
1. Column-oriented databases:
Traditional row-oriented databases are excellent at online transaction processing, but their query performance falls short as data volumes grow. Column-oriented databases store data by column rather than by row, which allows heavy data compression and faster query times.
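To make the row-versus-column distinction concrete, here is a minimal Python sketch. The sample table and the helper names (to_columns, run_length_encode) are illustrative assumptions and do not come from any particular database product. It shows how a query that needs one field reads only that column, and how repeated values within a column compress well.

```python
from itertools import groupby

# Hypothetical sample table, stored row by row.
rows = [
    {"id": 1, "country": "IN", "amount": 120},
    {"id": 2, "country": "IN", "amount": 80},
    {"id": 3, "country": "US", "amount": 200},
    {"id": 4, "country": "US", "amount": 50},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one field touches only that column's values.
total_amount = sum(columns["amount"])

# Similar values sit next to each other in a column, so they compress well.
encoded_country = run_length_encode(columns["country"])
```

Real column stores use far more elaborate encodings and on-disk layouts; the sketch only illustrates why grouping values by column helps both compression and analytical queries.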
2. Streaming Big Data analytics:
There are several projects in this category, including Storm, Spark, DataTorrent, Spring XD and SQLstream. Apache Storm is an open source distributed real-time computation system which simplifies the processing of data streams in real time. Spark is a data processing platform which is compatible with Hadoop. DataTorrent is a real-time streaming platform which enables businesses to process data as it arrives. Spring XD supports streams for event-driven data, while SQLstream provides a distributed stream processing platform for streaming analytics, visualization and continuous integration of machine data.
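The idea common to these platforms is that results are updated as each event arrives, instead of after the whole dataset has been collected. The following is a minimal single-process Python sketch of that idea; it does not use the API of Storm, Spark or any other product named above, and the function name sliding_average is a hypothetical choice.

```python
from collections import deque


def sliding_average(stream, window_size=3):
    """Yield the average of the most recent readings after each new reading."""
    window = deque(maxlen=window_size)  # old readings fall out automatically
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)


# Hypothetical sensor readings arriving one at a time.
averages = list(sliding_average([10, 20, 30, 40], window_size=3))
```

A distributed streaming system applies the same kind of incremental computation, but spreads it across many machines and handles failures along the way.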
3. Schema-less databases, or NoSQL databases:
This category includes key-value stores and document stores. These databases focus on the storage and retrieval of large volumes of unstructured, semi-structured or even structured data.
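As a rough illustration of the two store types mentioned above, the Python sketch below models a key-value store as a plain dictionary and a document store as a list of records that do not share a fixed schema. The sample data and the find helper are assumptions made for illustration, not the interface of any real NoSQL product.

```python
# Key-value store: an opaque value is stored and fetched by its key.
key_value_store = {}
key_value_store["session:42"] = "user=alice;cart=3"

# Document store: each document carries its own fields, with no shared schema.
documents = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "tags": ["admin"], "last_login": "2014-10-22"},
]


def find(documents, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc
        for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


bob = find(documents, name="Bob")
```

Note that the second document has fields the first one lacks; nothing had to be declared in advance for that to work, which is what "schema-less" refers to.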
4. SQL-in-Hadoop solutions:
This category includes Apache Hive, Shark, Apache Drill, Presto and Phoenix among many others. Hive helps in querying and managing large datasets held in distributed storage. Shark is a data warehouse system for Spark which supports Hive's query language. Apache Drill is an Apache incubation project designed for scalability, and it's backed by MapR. Presto is an open source distributed SQL query engine, and Phoenix is an open source SQL query engine for Apache HBase.
5. MapReduce:
MapReduce is a programming paradigm which allows massive job execution scalability across thousands of servers or clusters of servers. It consists of two tasks, the Map task and the Reduce task. The Map task converts an input dataset into a different set of key/value pairs, or tuples, and the Reduce task combines the outputs of the Map task into a smaller set of tuples.
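The two tasks can be sketched with the classic word-count example. This is a single-process Python illustration under simplifying assumptions: the function names (map_task, shuffle, reduce_task, word_count) are hypothetical, and a real framework such as Hadoop would run the tasks in parallel on many servers.

```python
from collections import defaultdict


def map_task(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key so each key reaches a single reduce call."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: collapse the list of counts for one word into a single tuple."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_task(line)]
    return dict(reduce_task(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big Data tools"])
```

Because each map call looks at one line only and each reduce call looks at one key only, the work can be split across machines without the tasks needing to coordinate with each other.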
6. Hadoop:
Hadoop is an open source platform for handling Big Data which can work with multiple data sources. It has several applications, and it's largely used for large volumes of constantly changing data, such as location-based data from weather or traffic sensors, web-based or social media data, or machine-to-machine transactional data.
7. PIG:
PIG brings Hadoop closer to developers and business users. It consists of a Perl-like language which allows query execution over data stored on a Hadoop cluster. PIG started as a project at Yahoo!, but now it's completely open source.
8. Big Data Lambda Architecture:
Lambda Architecture is a hybrid approach which combines real-time data and pre-computed data to provide a near-real-time view of the data at all times. Frameworks that implement it include Summingbird by Twitter and Lambdoop.
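The combination of pre-computed and real-time data can be sketched in a few lines of Python. This is a simplified illustration, not the API of Summingbird or Lambdoop; the event format and the function names (batch_view, speed_view, query) are assumptions.

```python
from collections import Counter


def batch_view(historical_events):
    """Pre-computed view: page-view counts over all data up to the last batch run."""
    return Counter(event["page"] for event in historical_events)


def speed_view(recent_events):
    """Real-time view: counts for events that arrived after the last batch run."""
    return Counter(event["page"] for event in recent_events)


def query(page, batch, speed):
    """Serve a near-real-time answer by merging the two views."""
    return batch[page] + speed[page]


# Hypothetical page-view events.
historical = [{"page": "/home"}, {"page": "/home"}, {"page": "/news"}]
recent = [{"page": "/home"}, {"page": "/jobs"}]

batch = batch_view(historical)
speed = speed_view(recent)
home_views = query("/home", batch, speed)
```

In practice the batch view is recomputed periodically over the full dataset, and the real-time view only has to cover the events that arrived since the last recomputation.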
9. Platfora:
Hadoop is a low-level implementation of MapReduce, and it requires developer knowledge to operate. Platfora is a platform which turns user queries into Hadoop jobs automatically, creating an abstraction layer that simplifies working with the datasets stored in Hadoop.
10. Skytree:
Skytree is a high-performance machine learning and data analytics platform built to handle Big Data. Machine learning is an essential part of Big Data.
Courtesy: TechRepublic and InfoQ
Sanchari Banerjee, EFYTIMES News Network