Learning Apache Spark with Python (by Wenqiang Feng): free PDF e-book download

Download option 1:

Baidu Netdisk download link: https://pan.baidu.com/s/1rt0Hz-zaHCpyRf38JKPRwg

Baidu Netdisk password: 1111

Download option 2:

http://ziliaoshare.cn/Download/af_123981_pd_LearningApacheSparkwithPython_WenqiangFengBZ.zip

Author: Wenqiang Feng

Pages: 487

Publisher: not listed

Introduction to Learning Apache Spark with Python (by Wenqiang Feng)

This is a shared repository for Learning Apache Spark Notes. The PDF version is available for download. The first version was posted on GitHub in ChenFeng ([Feng2017]). This shared repository mainly contains the self-learning and self-teaching notes from Wenqiang during his IMA Data Science Fellowship. The reader is referred to the repository https://github.com/runawayhorse001/LearningApacheSpark for more details about the dataset and the .ipynb files.

In this repository, I try to use detailed demo code and examples to show how to use each main function. If you find your work wasn't cited in this note, please feel free to let me know.

Although I am by no means a data mining, programming, or Big Data expert, I decided that it would be useful for me to share what I learned about PySpark programming in the form of easy tutorials with detailed examples. I hope those tutorials will be a valuable tool for your studies.

The tutorials assume that the reader has a preliminary knowledge of programming and Linux. This document is generated automatically by using Sphinx.

About the author

Wenqiang Feng
Sr. Data Scientist and PhD in Mathematics
University of Tennessee at Knoxville
Email: von198@gmail.com

Biography

Wenqiang Feng is a Sr. Data Scientist at Machine Learning Lab, H&R Block. Before joining Block, Dr. Feng was a Data Scientist at Applied Analytics Group, DST (now SS&C). Dr. Feng's responsibilities include providing clients with access to cutting-edge skills and technologies, including Big Data analytic solutions, advanced analytic and data enhancement techniques, and modeling.

Dr. Feng has deep analytic expertise in data mining, analytic systems, machine learning algorithms, business intelligence, and applying Big Data tools to strategically solve industry problems in a cross-functional business. Before joining DST, Dr. Feng was an IMA Data Science Fellow at the Institute for Mathematics and its Applications (IMA) at the University of Minnesota. While there, he helped startup companies make marketing decisions based on deep predictive analytics.

Dr. Feng graduated from the University of Tennessee, Knoxville, with a Ph.D. in Computational Mathematics and a Master's degree in Statistics. He also holds a Master's degree in Computational Mathematics from Missouri University of Science and Technology (MST) and a Master's degree in Applied Mathematics from the University of Science and Technology of China (USTC).

The work of Wenqiang Feng was supported by the IMA while he was working at the IMA. However, any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the IMA, UTK, DST, or H&R Block.

1.2 Motivation for this tutorial

I was motivated by the IMA Data Science Fellowship project to learn PySpark. After that, I was impressed and attracted by PySpark, and I found that:

1. It is no exaggeration to say that Spark is the most powerful Big Data tool.

2. However, I still found that learning Spark was a difficult process. I had to Google it and identify which information was correct, and it was hard to find detailed examples from which I could easily learn the full process in one file.

3. Good sources are expensive for a graduate student.

1.3 Copyright notice and license info

This Learning Apache Spark with Python PDF file is supposed to be a free and living document, which is why its source is available online at https://runawayhorse001.github.io/LearningApacheSpark/pyspark.pdf. But this document is licensed according to both the MIT License and the Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) License.

When you plan to use, copy, modify, merge, publish, distribute, or sublicense, please see the terms of those licenses for more details and give the corresponding credits to the author.

1.4 Acknowledgement

Here, I would like to thank Ming Chen, Jian Sun, and Zhongbo Li at the University of Tennessee at Knoxville for the valuable discussion, and to thank the generous anonymous authors for providing the detailed solutions and source code on the internet. Without their help, this repository would not have been possible. Wenqiang would also like to thank the Institute for Mathematics and Its Applications (IMA) at ...

Table of contents of Learning Apache Spark with Python (by Wenqiang Feng)

1 Preface

1.2 Motivation for this tutorial

1.3 Copyright notice and license info

1.4 Acknowledgement

1.5 Feedback and suggestions

2 Why Spark with Python?

2.1 Why Spark?

2.2 Why Spark with Python (PySpark)?

3 Configure Running Platform

3.1 Run on Databricks Community Cloud

3.2 Configure Spark on Mac and Ubuntu

3.3 Configure Spark on Windows

3.4 PySpark With Text Editor or IDE

3.5 PySparkling Water: Spark + H2O

3.6 Set up Spark on Cloud

3.7 PySpark on Colaboratory

3.8 Demo Code in this Section

4 An Introduction to Apache Spark

4.1 Core Concepts

4.2 Spark Components

4.3 Architecture

4.4 How Spark Works?

5 Programming with RDDs

5.1 Create RDD

5.2 Spark Operations

5.3 rdd.DataFrame vs pd.DataFrame

6 Statistics and Linear Algebra Preliminaries

6.1 Notations

6.2 Linear Algebra Preliminaries

6.3 Measurement Formula

6.4 Confusion Matrix

6.5 Statistical Tests

7 Data Exploration

7.1 Univariate Analysis

7.2 Multivariate Analysis

8 Data Manipulation: Features

8.1 Feature Extraction

8.2 Feature Transform

8.3 Feature Selection

8.4 Unbalanced data: Undersampling

9 Regression

9.1 Linear Regression

9.2 Generalized linear regression

9.3 Decision tree Regression

9.4 Random Forest Regression

9.5 Gradient-boosted tree regression

10 Regularization

10.1 Ordinary least squares regression

10.2 Ridge regression

10.4 Elastic net

11 Classification

11.1 Binomial logistic regression

11.2 Multinomial logistic regression

11.3 Decision tree Classification

11.4 Random forest Classification

11.5 Gradient-boosted tree Classification

11.6 XGBoost: Gradient-boosted tree Classification

11.7 Naive Bayes Classification

12 Clustering

12.1 K-Means Model

13 RFM Analysis

13.1 RFM Analysis Methodology

13.2 Demo

13.3 Extension

14 Text Mining

14.1 Text Collection

14.2 Text Preprocessing

14.3 Text Classification

14.4 Sentiment analysis

14.5 N-grams and Correlations

14.6 Topic Model: Latent Dirichlet Allocation

15 Social Network Analysis

15.1 Introduction

15.2 Co-occurrence Network

15.3 Appendix: matrix multiplication in PySpark

15.4 Correlation Network

16 ALS: Stock Portfolio Recommendations

16.1 Recommender systems

16.2 Alternating Least Squares

16.3 Demo

17 Monte Carlo Simulation

17.1 Simulating Casino Win

17.2 Simulating a Random Walk

18 Markov Chain Monte Carlo

18.1 Metropolis algorithm

18.2 A Toy Example of Metropolis

18.3 Demos

19 Neural Network

19.1 Feedforward Neural Network

20 Automation for Cloudera Distribution Hadoop

20.1 Automation Pipeline

20.2 Data Clean and Manipulation Automation

20.3 ML Pipeline Automation

20.4 Save and Load Pipeline Model

20.5 Ingest Results Back into Hadoop

21 Wrap PySpark Package

21.1 Package Wrapper

21.2 Package Publishing on PyPI

22 PySpark Data Audit Library

22.1 Install with pip

22.2 Install from Repo

22.3 Uninstall

22.4 Test

22.5 Auditing on Big Dataset

23 Zeppelin to jupyter notebook

23.1 How to Install

23.2 Converting Demos

24 My Cheat Sheet

25 PySpark API

25.1 Stat API

25.2 Regression API

25.3 Classification API

25.4 Clustering API

25.5 Recommendation API

