Amazon recently introduced new storage-optimized instance types. This generation of instances is available within the I2 and HI1 families. Both families provide more storage and better I/O performance than other instance families in AWS. Flux7 Labs decided to benchmark these new instances to better understand the tradeoffs our customers face when choosing between them.
The Amazon I2 Instance Type
Amazon has announced immediate availability of the I2 instance type, the next generation of Amazon EC2 High I/O instances, aimed at transactional systems and high-performance NoSQL databases such as Cassandra and MongoDB. I2 instances run on Intel Xeon E5-2670 v2 (Ivy Bridge) processors, and each virtual CPU (vCPU) is a hardware hyperthread on one of those processors. Taken together, the I2's features, price and availability make it suited to performance-oriented workloads and open up new use cases.
In our previous post, we detailed why Ganglia is a good tool for monitoring clusters. However, when monitoring a Hadoop cluster you often need more information about CPU, disk, memory, and per-node network statistics than the generic Ganglia config can provide. For those who need more finely tuned monitoring, Hadoop supports a framework for recording internal statistics and posting them to an external destination, either a file or Ganglia. In fact, Hadoop now supports an implementation of the Metrics2 Framework for Ganglia. In this post we’ll discuss the Hadoop Metrics2 Framework’s design and how it enables Ganglia metrics.
Recently at Flux7 Labs we developed an end-to-end Internet of Things project that collects sensor data and turns it into reports for the end users of a service provider. Our client then asked us to support multiple service providers for his new business venture. We knew that rearchitecting the application to incorporate major changes would prove both time-consuming and expensive for our client. It also would have resulted in a far more complicated, rigid and difficult-to-maintain codebase.
On January 11, Aater and I attended Data Day Texas 2014 here in Austin. Sponsored by Geek Austin, it was such a great event that I thought I’d share some highlights. Data Day Texas holds special significance for Flux7 Labs because it was at Data Day 2013 that we made our first presentation, when Aater gave a talk on the role of microservers in big data, which you can find here.
At Flux7 Labs we solve a variety of problems for our customers and often that includes guiding clients to the right tools for their needs. In our previous post on NoSQL, we discussed how NoSQL solutions offer a better alternative to RDBMSs. In this post we’ll walk you through different types of NoSQL database models and solutions and show you how different architectures and design philosophies support various features. We’ll explain how NoSQL can be a tool that better serves your needs than a one-size-fits-all tool like an RDBMS.
As mentioned in part 1 of this series (Creating a LAMP Stack AMI), a common concern among customers is choosing the right instance type.
Big companies, including Amazon, Google, Facebook and Yahoo, first adopted NoSQL for in-house solutions because RDBMSs lacked the features their ever-changing needs required. By accepting weak consistency and optimizing for certain use cases, they’re able to use large distributed systems to handle their required workloads. There are five ways in which NoSQL handles large workloads differently from traditional RDBMSs, and in which it outperforms them.
Cassandra is a one-stop choice for data-driven organizations whose core functions depend on real-time Big Data operations. What endears it to developers and organizations dealing with huge databases is the set of features it offers for managing stored data.
“Big Data” is a term that has been buzzing around a lot for the last few years, and wherever you hear that buzz, you’ll hear “Hadoop” as well. In the last two to three years, many big players in the industry, including Intel, Microsoft, IBM and EMC, have come up with their own distributions of Apache Hadoop. Some startups focused solely on Hadoop, such as Cloudera and Hortonworks, have also become big players in this area.