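Cloudera Apache Spark Developer
Class type: Public course, in-house training
Course length: 3 days / 18 hours
Training dates: To be determined
Certification exam: None at present
Training location: Boxue International Education and Training Center (博學國際教育培訓中心)
Room requirements: Projector, whiteboard, flip-chart paper
Training method: Example-based lectures, live demonstrations, hands-on practice, and real-time discussion
Training materials: Course textbook
Course Content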
Cloudera Developer Training for Apache Spark
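Course overview:
Combine batch processing, stream processing, and interactive analytics to build complete, unified big data applications with Apache Spark. Learn to write sophisticated parallel applications that support fast, well-informed decisions and real-time action across a wide range of use cases, architectures, and industries.
Intended audience:
This course is for developers and engineers who want to improve the speed, ease of use, and sophistication of their applications. Participants should have a background in Python or Scala; basic Linux knowledge is a plus.
Training objectives: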
• Using the Spark shell for interactive data analysis
• The features of Spark’s Resilient Distributed Datasets
• How Spark runs on a cluster
• How Spark parallelizes task execution
• Writing Spark applications
• Processing streaming data with Spark
Course outline:
Introduction to Spark
• What is Spark?
• Review: From Hadoop MapReduce to Spark
• Review: HDFS
• Review: YARN
• Spark Overview
Spark Basics
• Using the Spark Shell
• RDDs (Resilient Distributed Datasets)
• Functional Programming in Spark
Working with RDDs in Spark
• Creating RDDs
• Other General RDD Operations
Aggregating Data with Pair RDDs
• Key-Value Pair RDDs
• Map-Reduce
• Other Pair RDD Operations
Writing and Deploying Spark Applications
• Spark Applications vs. Spark Shell
• Creating the SparkContext
• Building a Spark Application (Scala and Java)
• Running a Spark Application
• The Spark Application Web UI
• Hands-On Exercise: Write and Run a Spark Application
• Configuring Spark Properties
• Logging
Parallel Processing
• Review: Spark on a Cluster
• RDD Partitions
• Partitioning of File-based RDDs
• HDFS and Data Locality
• Executing Parallel Operations
• Stages and Tasks
Spark RDD Persistence
• RDD Lineage
• RDD Persistence Overview
• Distributed Persistence
Basic Spark Streaming
• Spark Streaming Overview
• Example: Streaming Request Count
• DStreams
• Developing Spark Streaming Applications
Advanced Spark Streaming
• Multi-Batch Operations
• State Operations
• Sliding Window Operations
• Advanced Data Sources
Common Patterns in Spark Data Processing
• Common Spark Use Cases
• Iterative Algorithms in Spark
• Graph Processing and Analysis
• Machine Learning
• Example: k-means
Improving Spark Performance
• Shared Variables: Broadcast Variables
• Shared Variables: Accumulators
• Common Performance Issues
• Diagnosing Performance Problems
Spark SQL and DataFrames
• Spark SQL and the SQL Context
• Creating DataFrames
• Transforming and Querying DataFrames
• Saving DataFrames
• DataFrames and RDDs
• Comparing Spark SQL, Impala and Hive-on-Spark