At #DWS19, the DataWorks Summit in Barcelona, Cloudera introduced its strategy for bringing together Hortonworks Data Platform (HDP) and Cloudera Distribution Hadoop (CDH) following its merger with Hortonworks. According to Cloudera’s Hollison, the resulting Cloudera Data Platform (CDP) will have four primary elements: support for multi-function analytics; support for every possible means of cloud delivery, with a common metadata catalog and schema; a common security and governance model across both; and an open platform.
Flux7 engineer Ahsan Ali and CTO Ali Hussain collaborated on this post.
The rise of IoT has created a new generation of needs in big data processing. We now need to handle data ingress from many sensors around the world and make real-time decisions for those devices to execute. It is no surprise, then, that new services such as Amazon Kinesis have emerged to process streaming data.
In our previous post, we detailed why Ganglia is a good tool for monitoring clusters. However, when monitoring a Hadoop cluster you often need more detail about CPU, disk, memory, and per-node network statistics than the generic Ganglia configuration can provide. For those who need more finely tuned monitoring, Hadoop supports a framework that records internal statistics and posts them to an external destination, either a file or Ganglia. In fact, Hadoop now supports an implementation of the Metrics2 Framework for Ganglia. In this post we’ll discuss the design of the Hadoop Metrics2 Framework and how it enables Ganglia metrics.
Cassandra is a one-stop choice for data-driven organizations whose core functions depend on real-time Big Data operations. What endears it to developers and to organizations managing huge databases is the set of features it offers for handling stored data.
Amazon Elastic MapReduce (EMR) is a web service that helps with Big Data challenges. EMR splits a large amount of data into pieces, processes the pieces, and gathers the results into a single output.
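The split, process, and gather flow described for EMR can be illustrated with a minimal, single-machine sketch in plain Python. This is not the EMR API and nothing here runs on a cluster; the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) and the word-count task are assumptions chosen only to show the pattern.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Split the input into fixed-size pieces (hypothetical helper)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Process one piece independently: count the words it contains."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Gather the partial results into a single combined output."""
    return reduce(lambda acc, part: acc + part, partials, Counter())


def word_count(records, chunk_size=2):
    """Run the three stages in order: split, process, gather."""
    chunks = split_into_chunks(records, chunk_size)
    # On a real cluster each chunk would be handled by a different worker;
    # here the pieces are processed one after another for illustration.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(partials)


lines = ["big data", "data pipeline", "big cluster", "data"]
totals = word_count(lines)
print(totals["data"])  # 3
```

Because each piece is processed without reference to the others, the result does not depend on how the input is split, which is what lets the work be spread across many machines.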