Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives (Paperback)

TK. 718


In Stock (10 copies available)
Express Delivery
5 Ratings

Product Specification & Summary

Foreword of 'Big Data Analytics Beyond Hadoop'
One point that I attempt to impress upon people learning about Big Data is that while Apache Hadoop is quite useful, and most certainly quite successful as a technology, the underlying premise has become dated. Consider the timeline: MapReduce implementation by Google came from work that dates back to 2002, published in 2004. Yahoo! began to sponsor the Hadoop project in 2006. MR is based on the economics of data centers from a decade ago. Since that time, so much has changed: multi-core processors, large memory spaces, 10G networks, SSDs, and such, have become cost-effective in the years since. These dramatically alter the trade-offs for building fault-tolerant distributed systems at scale on commodity hardware. Moreover, even our notions of what can be accomplished with data at scale have also changed. Successes of firms such as Amazon, eBay, and Google raised the bar, bringing subsequent business leaders to rethink, “What can be performed with data?” For example, would there have been a use case for large-scale graph queries to optimize business for a large book publisher a decade ago? No, not particularly. It is unlikely that senior executives in publishing would have bothered to read such an outlandish engineering proposal. The marketing of this book itself will be based on a large-scale, open source, graph query engine described in subsequent chapters. Similarly, the ad-tech and social network use cases that drove the development and adoption of Apache Hadoop are now dwarfed by data rates from the Industrial Internet, the so-called “Internet of Things” (IoT), in some cases by several orders of magnitude. The shape of the underlying systems has changed so much since MR at scale on commodity hardware was first formulated. The shape of our business needs and expectations has also changed dramatically because many people have begun to realize what is possible.
Furthermore, the applications of math for data at scale are quite different than what would have been conceived a decade ago. Popular programming languages have evolved along with that to support better software engineering practices for parallel processing. Dr. Agneeswaran considers these topics and more in a careful, methodical approach, presenting a thorough view of the contemporary Big Data environment and beyond. He brings the reader to look past the preceding decade’s fixation on batch analytics via MapReduce. The chapters include historical context, which is crucial for key understandings, and they provide clear business use cases that are crucial for applying this technology to what matters. The arguments provide analyses, per use case, to indicate why Hadoop does not particularly fit, thoroughly researched with citations, for an excellent survey of available open source technologies, along with a review of the published literature for that which is not open source. This book explores the best practices and available technologies for data access patterns that are required in business today beyond Hadoop: iterative, streaming, graphs, and more. For example, in some businesses revenue loss can be measured in milliseconds, such that the notion of a “batch window” has no bearing. Real-time analytics are the only conceivable solutions in those cases. Open source frameworks such as Apache Spark, Storm, Titan, GraphLab, and Apache Mesos address these needs. Dr. Agneeswaran guides the reader through the architectures and computational models for each, exploring common design patterns. He includes both the scope of business implications as well as the details of specific implementations and code examples. Along with these frameworks, this book also presents a compelling case for the open standard PMML, allowing predictive models to be migrated consistently between different platforms and environments. It also leads up to YARN and the next generation beyond MapReduce.
This is precisely the focus that is needed in industry today, given that Hadoop was based on IT economics from 2002, while the newer frameworks address contemporary industry use cases much more closely. Moreover, this book provides both an expert guide and a warm welcome into a world of possibilities enabled by Big Data analytics.
Contents:
Foreword
About the Author
Chapter 1 Introduction: Why Look Beyond Hadoop Map-Reduce?
Hadoop Suitability
Big Data Analytics: Evolution of Machine Learning Realizations
Closing Remarks
References
Chapter 2 What Is the Berkeley Data Analytics Stack (BDAS)?
Motivation for BDAS
BDAS Design and Architecture
Spark: Paradigm for Efficient Data Processing on a Cluster
Shark: SQL Interface over a Distributed System
Mesos: Cluster Scheduling and Management System
Closing Remarks
References
Chapter 3 Realizing Machine Learning Algorithms with Spark
Basics of Machine Learning
Logistic Regression: An Overview
Logistic Regression Algorithm in Spark
Support Vector Machine (SVM)
PMML Support in Spark
Machine Learning on Spark with MLbase
References
Chapter 4 Realizing Machine Learning Algorithms in Real Time
Introduction to Storm
Design Patterns in Storm
Implementing Logistic Regression Algorithm in Storm
Implementing Support Vector Machine Algorithm in Storm
Naive Bayes PMML Support in Storm
Real-Time Analytic Applications
Spark Streaming
References
Chapter 5 Graph Processing Paradigms
Pregel: Graph-Processing Framework Based on BSP
Open Source Pregel Implementations
GraphLab
References
Chapter 6 Conclusions: Big Data Analytics Beyond Hadoop Map-Reduce
Overview of Hadoop YARN
Other Frameworks over YARN
What Does the Future Hold for Big Data Analytics?
References
Appendix A Code Sketches
Code for Naive Bayes PMML Scoring in Spark
Code for Linear Regression PMML Support in Spark
Page Rank in GraphLab
SGD in GraphLab
Index

Title: Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives
Author:
Publisher:
ISBN: 9789332540361
Edition: 1st Impression, 2015
Number of Pages: 216
Country: India
Language: English


Reviews and Ratings

5.0

5 Ratings
