Microsoft Azure has over 20 platform-as-a-service (PaaS) offerings that can act in support of a big data analytics solution. So which one is right for your project? This practical book helps you understand the breadth of Azure services by organizing them into a reference framework you can use when crafting your own big data analytics solution. You’ll not only be able to determine which service best fits the job, but also learn how to implement a complete solution that scales, provides human fault tolerance, and supports future needs. Understand the fundamental patterns of the data lake and lambda architecture Recognize the canonical steps in the analytics data pipeline and learn how to use Azure Data Factory to orchestrate them Implement data lakes and lambda architectures, using Azure Data Lake Store, Data Lake Analytics, HDInsight (including Spark), Stream Analytics, SQL Data Warehouse, and Event Hubs Understand where Azure Machine Learning fits into your analytics pipeline Gain experience using these services on real-world data that has real-world problems, with scenarios ranging from aviation to Internet of Things (IoT)
Tap the power of Big Data with Microsoft technologies Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with HortonWorks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and HortonWorks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies. Best of all, it helps you integrate these new solutions with technologies you already know, such as SQL Server and Hadoop. Walks you through how to integrate Big Data solutions in your company using Microsoft's HDInsight Server, HortonWorks Data Platform for Windows, and open source tools Explores both on-premises and cloud-based solutions Shows how to store, manage, analyze, and share Big Data through the enterprise Covers topics such as Microsoft's approach to Big Data, installing and configuring HortonWorks Data Platform for Windows, integrating Big Data with SQL Server, visualizing data with Microsoft and HortonWorks BI tools, and more Helps you build and execute a Big Data plan Includes contributions from the Microsoft and HortonWorks Big Data product teams If you need a detailed roadmap for designing and implementing a fully deployed Big Data solution, you'll want Microsoft Big Data Solutions.
Get a jump start on using Azure HDInsight and Hadoop Ecosystem components. As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer. Hadoop components are covered, including Hive, Pig, HBase, Storm, and Spark on Azure HDInsight, and code samples are written in .NET only. Processing Big Data with Azure HDInsight covers the fundamentals of big data, how businesses are using it to their advantage, and how Azure HDInsight fits into the big data world. This book introduces Hadoop and big data concepts and then dives into creating different solutions with HDInsight and the Hadoop Ecosystem. It covers concepts with real-world scenarios and code examples, making sure you get hands-on experience. The best way to utilize this book is to practice while reading. After reading this book you will be familiar with Azure HDInsight and how it can be utilized to build big data solutions, including batch processing, stream analytics, interactive processing, and storing and retrieving data in an efficient manner. What You'll Learn Understand the fundamentals of HDInsight and Hadoop Work with HDInsight cluster Query with Apache Hive and Apache Pig Store and retrieve data with Apache HBase Stream data processing using Apache Storm Work with Apache Spark Who This Book Is For Software developers, technical architects, data scientists/analysts, and Hadoop administrators who want to develop on Microsoft’s managed Hadoop offering, HDInsight
A practical guide to implementing your enterprise data lake using Lambda Architecture as the base About This Book Build a full-fledged data lake for your organization with popular big data technologies using the Lambda architecture as the base Delve into the big data technologies required to meet modern day business strategies A highly practical guide to implementing enterprise data lakes with lots of examples and real-world use-cases Who This Book Is For Java developers and architects who would like to implement a data lake for their enterprise will find this book useful. If you want to get hands-on experience with the Lambda Architecture and big data technologies by implementing a practical solution using these technologies, this book will also help you. What You Will Learn Build an enterprise-level data lake using the relevant big data technologies Understand the core of the Lambda architecture and how to apply it in an enterprise Learn the technical details around Sqoop and its functionalities Integrate Kafka with Hadoop components to acquire enterprise data Use Flume with streaming technologies for stream-based processing Understand stream-based processing with reference to Apache Spark Streaming Incorporate Hadoop components and know the advantages they provide for enterprise data lakes Build fast, streaming, and high-performance applications using ElasticSearch Make your data ingestion process consistent across various data formats with configurability Process your data to derive intelligence using machine learning algorithms In Detail The term "Data Lake" has recently emerged as a prominent term in the big data industry. Data scientists can make use of it in deriving meaningful insights that can be used by businesses to redefine or transform the way they operate. 
Lambda architecture is also emerging as one of the most prominent patterns in the big data landscape, as it not only helps to derive useful information from historical data but also correlates real-time data to enable businesses to make critical decisions. This book tries to bring these two important aspects, data lake and lambda architecture, together. This book is divided into three main sections. The first introduces you to the concept of data lakes and their importance in enterprises, and gets you up to speed with the Lambda architecture. The second section delves into the principal components of building a data lake using the Lambda architecture. It introduces you to popular big data technologies such as Apache Hadoop, Spark, Sqoop, Flume, and ElasticSearch. The third section is a highly practical demonstration of putting it all together, and shows you how an enterprise data lake can be implemented, along with several real-world use-cases. It also shows you how other peripheral components can be added to the lake to make it more efficient. By the end of this book, you will be able to choose the right big data technologies using the lambda architectural patterns to build your enterprise data lake. Style and approach The book takes a pragmatic approach, showing ways to leverage big data technologies and lambda architecture to build an enterprise-level data lake.
Predictive Analytics with Microsoft Azure Machine Learning, Second Edition is a practical tutorial introduction to the field of data science and machine learning, with a focus on building and deploying predictive models. The book provides a thorough overview of the Microsoft Azure Machine Learning service released for general availability on February 18th, 2015 with practical guidance for building recommenders, propensity models, and churn and predictive maintenance models. The authors use task oriented descriptions and concrete end-to-end examples to ensure that the reader can immediately begin using this new service. The book describes all aspects of the service from data ingress to applying machine learning, evaluating the models, and deploying them as web services. Learn how you can quickly build and deploy sophisticated predictive models with the new Azure Machine Learning from Microsoft. What’s New in the Second Edition? Five new chapters have been added with practical detailed coverage of: Python Integration – a new feature announced February 2015 Data preparation and feature selection Data visualization with Power BI Recommendation engines Selling your models on Azure Marketplace
This book provides you with the skills necessary to get started with Azure Machine Learning to build predictive models as quickly as possible, in a very intuitive way, whether you are completely new to predictive analysis or an existing practitioner. The book starts by exploring ML Studio, the browser-based development environment, and explores the first step—data exploration and visualization. You will then build different predictive models using both supervised and unsupervised algorithms, including a simple recommender system. The focus then shifts to learning how to deploy a model to production and publishing it as an API. The book ends with a couple of case studies using all the concepts and skills you have learned throughout the book to solve real-world problems.
Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours In just 24 lessons of one hour or less, Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours helps you leverage Hadoop’s power on a flexible, scalable cloud platform using Microsoft’s newest business intelligence, visualization, and productivity tools. This book’s straightforward, step-by-step approach shows you how to provision, configure, monitor, and troubleshoot HDInsight and use Hadoop cloud services to solve real analytics problems. You’ll gain more of Hadoop’s benefits, with less complexity–even if you’re completely new to Big Data analytics. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Practical, hands-on examples show you how to apply what you learn Quizzes and exercises help you test your knowledge and stretch your skills Notes and tips point out shortcuts and solutions Learn how to… · Master core Big Data and NoSQL concepts, value propositions, and use cases · Work with key Hadoop features, such as HDFS2 and YARN · Quickly install, configure, and monitor Hadoop (HDInsight) clusters in the cloud · Automate provisioning, customize clusters, install additional Hadoop projects, and administer clusters · Integrate, analyze, and report with Microsoft BI and Power BI · Automate workflows for data transformation, integration, and other tasks · Use Apache HBase on HDInsight · Use Sqoop or SSIS to move data to or from HDInsight · Perform R-based statistical computing on HDInsight datasets · Accelerate analytics with Apache Spark · Run real-time analytics on high-velocity data streams · Write MapReduce, Hive, and Pig programs Register your book at informit.com/register for convenient access to downloads, updates, and corrections as they become available.
Develop and manage effective real-time streaming solutions by leveraging the power of Microsoft Azure About This Book Analyze your data from various sources using Microsoft Azure Stream Analytics Develop, manage and automate your stream analytics solution with Microsoft Azure A practical guide to real-time event processing and performing analytics on the cloud Who This Book Is For If you are looking for a resource that teaches you how to process continuous streams of data in real-time, this book is what you need. A basic understanding of the concepts in analytics is all you need to get started with this book. What You Will Learn Perform real-time event processing with Azure Stream Analytics Incorporate the features of Big Data Lambda architecture pattern in real-time data processing Design a streaming pipeline for storage and batch analysis Implement data transformation and computation activities over stream of events Automate your streaming pipeline using PowerShell and the .NET SDK Integrate your streaming pipeline with popular Machine Learning and Predictive Analytics modelling algorithms Monitor and troubleshoot your Azure Streaming jobs effectively In Detail Microsoft Azure is a very popular cloud computing service used by many organizations around the world. Its latest analytics offering, Stream Analytics, allows you to process and get actionable insights from different kinds of data in real-time. This book is your guide to understanding the basics of how Azure Stream Analytics works, and building your own analytics solution using its capabilities. You will start with understanding what Stream Analytics is, and why it is a popular choice for getting real-time insights from data. Then, you will be introduced to Azure Stream Analytics, and see how you can use the tools and functions in Azure to develop your own Streaming Analytics. 
Over the course of the book, you will be given comparative guidance on using Azure Stream Analytics alongside other Microsoft Data Platform resources, such as Big Data Lambda Architecture integration for real-time data analysis, and on the scenarios that favor designing an architecture around Azure HDInsight Hadoop clusters with Storm versus Stream Analytics. The book also shows you how you can manage, monitor, and scale your solution for optimal performance. By the end of this book, you will be well-versed in using Azure Stream Analytics to develop an efficient analytics solution that can work with any type of data. Style and approach Comprehensive guidance on developing real-time event processing with Azure Stream Analytics
Managing Data in Motion describes techniques that have been developed for significantly reducing the complexity of managing system interfaces and enabling scalable architectures. Author April Reeve brings over two decades of experience to present a vendor-neutral approach to moving data between computing environments and systems. Readers will learn the techniques, technologies, and best practices for managing the passage of data between computer systems and integrating disparate data together in an enterprise environment. The average enterprise's computing environment is comprised of hundreds to thousands of computer systems that have been built, purchased, and acquired over time. The data from these various systems needs to be integrated for reporting and analysis, shared for business transaction processing, and converted from one format to another when old systems are replaced and new systems are acquired. The management of the "data in motion" in organizations is rapidly becoming one of the biggest concerns for business and IT management. Data warehousing and conversion, real-time data integration, and cloud and "big data" applications are just a few of the challenges facing organizations and businesses today. Managing Data in Motion tackles these and other topics in a style easily understood by business and IT managers as well as programmers and architects. Presents a vendor-neutral overview of the different technologies and techniques for moving data between computer systems including the emerging solutions for unstructured as well as structured data types Explains, in non-technical terms, the architecture and components required to perform data integration Describes how to reduce the complexity of managing system interfaces and enable a scalable data architecture that can handle the dimensions of "Big Data"
Collect and analyze sensor and usage data from Internet of Things applications with Microsoft Azure IoT Suite. Internet connectivity to everyday devices such as light bulbs, thermostats, and even voice-command devices such as Google Home and Amazon.com's Alexa is exploding. These connected devices and their respective applications generate large amounts of data that can be mined to enhance user-friendliness and make predictions about what a user might be likely to do next. Microsoft's Azure IoT Suite is a cloud-based platform that is ideal for collecting data from connected devices. You'll learn in this book about data acquisition and analysis, including real-time analysis. Real-world examples are provided to teach you to detect anomalous patterns in your data that might lead to business advantage. We live in a time when the amount of data being generated and stored is growing at an exponential rate. Understanding and getting real-time insight into these data is critical to business. IoT Solutions in Microsoft's Azure IoT Suite walks you through a complete, end-to-end journey of how to collect and store data from Internet-connected devices. You'll learn to analyze the data and to apply your results to solving real-world problems. Your customers will benefit from the increasingly capable and reliable applications that you'll be able to deploy to them. You and your business will benefit from the gains in insight and knowledge that can be applied to delight your customers and increase the value from their business. 
What You'll Learn Go through data generation, collection, and storage from sensors and devices, both relational and non-relational Understand, from end to end, Microsoft’s analytic services and where they fit into the analytical ecosystem Look at the Internet of your things and find ways to discover and draw on the insights your data can provide Understand Microsoft's IoT technologies and services, and stitch them together for business insight and advantage Who This Book Is For Developers and architects who plan on delivering IoT solutions, data scientists who want to understand how to get better insights into their data, and anyone needing or wanting to do real-time analysis of data from the Internet of Things
A detailed look at a diverse set of Cloud topics, particularly Azure and Office 365 More and more companies are realizing the power and potential of Cloud computing as a viable way to save energy and money. This valuable book offers an in-depth look at a wide range of Cloud topics unlike any other book on the market. Examining how Cloud services allow users to pay as they go for exactly what they use, this guide explains how companies can easily scale their Cloud use up and down to fit their business requirements. After an introduction to Cloud computing, you'll discover how to prepare your environment for the Cloud and learn all about Office 365 and Azure. Examines a diverse range of Cloud topics, with special emphasis placed on how Cloud computing can save businesses energy and money Shows you how to prepare your environment for the Cloud Addresses Office 365, including infrastructure services, SharePoint 2010 online, SharePoint online development, Exchange online development, and Lync online development Discusses working with Azure, including setting it up, leveraging Blob storage, building Azure applications, programming, and debugging Offers advice for deciding when to use Azure and when to use Office 365 and looks at hybrid solutions between Azure and Office 365 Tap into the potential of Azure and Office 365 with this helpful resource.
Prepare for Microsoft Exam 70-774–and help demonstrate your real-world mastery of performing key data science activities with Azure Machine Learning services. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the MCSA level. Focus on the expertise measured by these objectives: Prepare data for analysis in Azure Machine Learning and export from Azure Machine Learning Develop machine learning models Operationalize and manage Azure Machine Learning Services Use other services for machine learning This Microsoft Exam Ref: Organizes its coverage by exam objectives Features strategic, what-if scenarios to challenge you Assumes you are familiar with Azure data services, machine learning concepts, and common data science processes About the Exam Exam 70-774 focuses on skills and knowledge needed to prepare data for analysis with Azure Machine Learning; find key variables describing your data’s behavior; develop models and identify optimal algorithms; train, validate, deploy, manage, and consume Azure Machine Learning Models; and leverage related services and APIs. About Microsoft Certification Passing this exam as well as Exam 70-773: Analyzing Big Data with Microsoft R earns your MCSA: Machine Learning certification, demonstrating your expertise in operationalizing Microsoft Azure machine learning and Big Data with R Server and SQL R Services. See full details at: microsoft.com/learning
Learn how today’s businesses can transform themselves by leveraging real-time data and advanced machine learning analytics. This book provides prescriptive guidance for architects and developers on the design and development of modern Internet of Things (IoT) and Advanced Analytics solutions. In addition, Business in Real-Time Using Azure IoT and Cortana Intelligence Suite offers patterns and practices for those looking to engage their customers and partners through Software-as-a-Service solutions that work on any device. Whether you're working in Health & Life Sciences, Manufacturing, Retail, Smart Cities and Buildings or Process Control, there exists a common platform from which you can create your targeted vertical solutions. Business in Real-Time Using Azure IoT and Cortana Intelligence Suite uses a reference architecture as a road map. Building on Azure’s PaaS services, you'll see how a solution architecture unfolds that demonstrates a complete end-to-end IoT and Advanced Analytics scenario. What You'll Learn: Automate your software product life cycle using PowerShell, Azure Resource Manager Templates, and Visual Studio Team Services Implement smart devices using Node.JS and C# Use Azure Streaming Analytics to ingest millions of events Provide both "Hot" and "Cold" path outputs for real-time alerts, data transformations, and aggregation analytics Implement batch processing using Azure Data Factory Create a new form of Actionable Intelligence (AI) to drive mission critical business processes Provide rich Data Visualizations across a wide variety of mobile and web devices Who This Book is For: Solution Architects, Software Developers, Data Architects, Data Scientists, and CIO/CTA Technical Leadership Professionals
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP. Through the course of the book, you’ll work through a sample business decision by employing a variety of data science approaches. Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science. You’ll learn how to: Automate and schedule data ingest, using an App Engine application Create and populate a dashboard in Google Data Studio Build a real-time analysis pipeline to carry out streaming analytics Conduct interactive data exploration with Google BigQuery Create a Bayesian model on a Cloud Dataproc cluster Build a logistic regression machine-learning model with Spark Compute time-aggregate features with a Cloud Dataflow pipeline Create a high-performing prediction model with TensorFlow Use your deployed model as a microservice you can access from both batch and real-time pipelines
Direct from Microsoft, this Exam Ref is the official study guide for the Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight certification exam. Exam Ref 70-775 Perform Data Engineering on Microsoft Azure HDInsight offers professional-level preparation that helps candidates maximize their exam performance and sharpen their skills on the job. It focuses on the specific areas of expertise modern IT professionals need to successfully administer and provision HDInsight clusters, and implement effective Big Data processing solutions with HDInsight. Coverage includes: Deploy and configure HDInsight clusters, deploy and secure multi-user HDInsight clusters, ingest data for processing, and manage and debug HDInsight jobs Implement Big Data batch solutions with Hive and Apache Pig, design batch ETL solutions with Spark, and operationalize Hadoop and Spark Create and implement interactive queries with Spark SQL and Interactive Hive; perform exploratory analyses with Spark SQL and Hive, Jupyter, and Apache Zeppelin; perform interactive processing with Apache Phoenix on HBase Implement real-time processing: create Spark streaming applications (including structured streaming); leverage Apache Storm, Kafka, and HBase Microsoft Exam Ref publications stand apart from third-party study guides because they: Provide guidance from Microsoft, the creator of Microsoft certification exams Target IT professional-level exam candidates with content focused on their needs, not "one-size-fits-all" content Streamline study by organizing material according to the exam's objective domain (OD), covering one functional group and its objectives in each chapter Feature Thought Experiments to guide candidates through a set of "what if?" 
scenarios, and prepare them more effectively for Pro-level style exam questions Explore big picture thinking around the planning and design aspects of the IT pro's job role This is one of two exams required to earn the MCSA Data Engineering with Azure certification. (The second is Exam 70-776 Perform Big Data Engineering on Microsoft Cloud Services.) For more information on Exam 70-775 and this MCSA credential, visit microsoft.com/learning.
Electronics for Vinyl is the most comprehensive book ever produced on the electronic circuitry needed to extract the best possible signal from grooves in vinyl. What is called the "vinyl revival" is in full swing, and a clear and comprehensive account of the electronics you need is very timely. Vinyl reproduction presents some unique technical challenges; the signal levels from moving-magnet cartridges are low, and those from moving-coil cartridges lower still, so a good deal of high-quality low-noise amplification is required. Some of the features of Electronics for Vinyl include: · integrating phono amplifiers into a complete preamplifier; · differing phono amplifier technologies; covering active, passive, and semi-passive RIAA equalisation and transconductance RIAA stages; · the tricky business of getting really accurate RIAA equalisation without spending a fortune on expensive components, such as switched-gain MM/MC RIAA amplifiers that retain great accuracy at all gains, the effects of finite open-loop gain, cartridge-preamplifier interaction, and so on; · noise and distortion in phono amplifiers, covering BJTs, FETs, and opamps as input devices, hybrid phono amplifiers, noise in balanced MM inputs, noise weighting, and cartridge load synthesis for ultimately low noise; · archival and non-standard equalisation for 78s etc.; · building phono amplifiers with discrete transistors; · subsonic filtering, covering all-pole filters, elliptical filters, and suppression of subsonics by low-frequency crossfeed, including the unique Devinyliser concept; · ultrasonic and scratch filtering, including a variety of variable-slope scratch filters; · line output technology, including zero-impedance outputs, on level indication for optimal setup, and on specialised power supplies; and · description of six practical projects which range from the simple to the highly sophisticated, but all give exceptional performance. 
Electronics for Vinyl brings the welcome news that there is simply no need to spend huge sums of money to get performance that is within a hair’s breadth of the best theoretically obtainable. But you do need some specialised knowledge, and here it is.
Organizations invest incredible amounts of time and money obtaining and then storing big data in data stores called data lakes. But how many of these organizations can actually get the data back out in a useable form? Very few can turn the data lake into an information gold mine. Most wind up with garbage dumps. Data Lake Architecture will explain how to build a useful data lake, where data scientists and data analysts can solve business challenges and identify new business opportunities. Learn how to structure data lakes as well as analog, application, and text-based data ponds to provide maximum business value. Understand the role of the raw data pond and when to use an archival data pond. Leverage the four key ingredients for data lake success: metadata, integration mapping, context, and metaprocess. Bill Inmon opened our eyes to the architecture and benefits of a data warehouse, and now he takes us to the next level of data lake architecture.
Explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using Big Data technologies About This Book Comprehend the intricacies of architecting a Data Lake and build a data strategy around your current data architecture Efficiently manage vast amounts of data and deliver it to multiple applications and systems with a high degree of performance and scalability Packed with industry best practices and use-case scenarios to get you up-and-running Who This Book Is For This book is for architects and senior managers who are responsible for building a strategy around their current data architecture, helping them identify the need for a Data Lake implementation in an enterprise context. The reader will need a good knowledge of master data management and information lifecycle management, and experience of Big Data technologies. What You Will Learn Identify the need for a Data Lake in your enterprise context and learn to architect a Data Lake Learn to build various tiers of a Data Lake, such as data intake, management, consumption, and governance, with a focus on practical implementation scenarios Find out the key considerations to be taken into account while building each tier of the Data Lake Understand Hadoop-oriented data transfer mechanism to ingest data in batch, micro-batch, and real-time modes Explore various data integration needs and learn how to perform data enrichment and data transformations using Big Data technologies Enable data discovery on the Data Lake to allow users to discover the data Discover how data is packaged and provisioned for consumption Comprehend the importance of including data governance disciplines while building a Data Lake In Detail A Data Lake is a highly scalable platform for storing huge volumes of multistructured data from disparate sources with centralized data management services. 
This book explores the potential of Data Lakes and explores architectural approaches to building data lakes that ingest, index, manage, and analyze massive amounts of data using batch and real-time processing frameworks. It guides you on how to go about building a Data Lake that is managed by Hadoop and accessed as required by other Big Data applications. This book will guide readers (using best practices) in developing Data Lake's capabilities. It will focus on architecting data governance, security, data quality, data lineage tracking, metadata management, and semantic data tagging. By the end of this book, you will have a good understanding of building a Data Lake for Big Data. Style and approach Data Lake Development with Big Data provides architectural approaches to building a Data Lake. It follows a use case-based approach where practical implementation scenarios of each key component are explained. It also helps you understand how these use cases are implemented in a Data Lake. The chapters are organized in a way that mimics the sequential data flow evidenced in a Data Lake.
