Tuesday, August 21, 2012

Book Review: Hadoop: The Definitive Guide, Second Edition by Tom White

Anyone interested in big data management today has at least a passing familiarity with Hadoop, an open source implementation of the MapReduce programming model. Here's my review of the second edition of one of the most comprehensive books on the topic.
As a longtime Hadoop enthusiast who had already read the first edition, I was interested in finding out what this second edition has in store for readers.

The book builds on its predecessor: apart from the addition of Hive and Sqoop, a case study covering graph visualization in social networks has been added. The Hadoop version covered has been updated, though as a developer I'd recommend working with the latest stable release of Hadoop, as it is an active project. As Tom White is himself a committer on the project, various project insights are added along the way, as in the original edition.

From a first-time Hadoop adopter's point of view, too, this text is easy to follow, and it lessens Hadoop's learning curve to a great extent.
The book starts by building context, presenting the history and ecosystem of Hadoop and giving the reader a high-level overview. The underpinnings of Hadoop, namely the MapReduce model and its implementation in Hadoop, are covered in the next few chapters. These chapters cover the practical aspects of running a Hadoop application in detail, including HDFS file manipulation and MapReduce operation. An exhaustive list of MapReduce techniques that come up in everyday development, when using the Hadoop API to work with big data, is then covered along with examples.
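For readers new to the model, the map, shuffle and reduce phases can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not Hadoop's API and not code from the book; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key (a stand-in for the framework's shuffle step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the list of values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data wins"]))
```

In a real Hadoop job the map and reduce steps run in parallel across many machines and the grouping between them is handled by the framework; the sketch only shows the shape of the computation.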
Another highlight of this book is its comprehensive coverage of running and deploying Hadoop in various configurations. Closely related data management tools in the Hadoop ecosystem and its sub-projects, such as Pig, Hive, HBase, ZooKeeper and Sqoop, are also covered.
This is followed by various case studies that make for an interesting read. It was disheartening to see no major updates to the case studies compared to the previous edition.

For someone who already owns the original edition, the second edition does not add much, but for someone who has not read a previous edition, this is a comprehensive book.
Note: This book was provided to me for review under the O'Reilly Blogger Review Program.
