Machine Learning: A Bayesian and Optimization Perspective (English Edition)
Author: Sergios Theodoridis (Greece)
Series: 经典原版书库 (Classic Original Editions)
Publication date: 2017-05-20
ISBN: 978-7-111-56526-0
Price: CNY 269.00
Extended Information
Language: English
Pages: 1068
Format: 16K (16开)
Original title: Machine Learning: A Bayesian and Optimization Perspective
Original publisher: Elsevier (Singapore) Pte Ltd
Category: Textbook
Includes CD: No
Out of print:
Book Description

This book treats all the major problems of machine learning from both probabilistic and deterministic mathematical perspectives, and it assumes a solid mathematical background. It is suitable for senior undergraduate and graduate students in computer science and related disciplines, and also serves as a reference for practicing engineers interested in machine learning.

Book Features

This book offers an in-depth exploration of all the major machine learning methods and the latest research trends, covering both probabilistic and deterministic approaches based on optimization techniques and Bayesian inference methods built on hierarchical probabilistic models. Although these methods come from different backgrounds and serve a wide range of purposes, they are deeply intertwined, and the book takes a panoramic view that connects them into a clear and coherent body of machine learning knowledge. The chapters are largely self-contained; each presents its methods by focusing on the physical reasoning behind the mathematics, provides the mathematical modeling and algorithmic implementation, and is complemented by application examples and exercises. The book suits researchers and engineers in the field, as well as students taking courses in pattern recognition, statistical/adaptive signal processing, and deep learning.

Highlights:
- Classic methods: mean-square/least-squares filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression, and boosting.
- Latest trends: sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical models and hidden Markov models, particle filtering, deep learning, dictionary learning, and latent variable modeling.
- Practical case studies: protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change-point detection, hyperspectral image unmixing, target localization, channel equalization, and echo cancellation.
- Full teaching support: MATLAB code for all the algorithms can be downloaded free of charge at booksite.elsevier.com/9780128015223/index.php; the book also provides the necessary introductions to the mathematical tools involved.

Preface

Machine Learning is a name that is gaining popularity as an umbrella for methods that have been studied and developed for many decades in different scientific communities and under different names, such as Statistical Learning, Statistical Signal Processing, Pattern Recognition, Adaptive Signal Processing, Image Processing and Analysis, System Identification and Control, Data Mining and Information Retrieval, Computer Vision, and Computational Learning. The name “Machine Learning” indicates what all these disciplines have in common, that is, to learn from data, and then make predictions. What one tries to learn from data is their underlying structure and regularities, via the development of a model, which can then be used to provide predictions.
To this end, a number of diverse approaches have been developed, ranging from optimization of cost functions, whose goal is to optimize the deviation between what one observes from data and what the model predicts, to probabilistic models that attempt to model the statistical properties of the observed data.
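As a minimal sketch of how these two paths can meet (not taken from the book: the synthetic data, the linear model y = 2 + 0.5x + noise, and all variable names below are assumptions for this example), a least-squares fit of a linear model coincides with the maximum likelihood estimate under a Gaussian noise assumption:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data from an assumed linear model: y = 2 + 0.5*x + Gaussian noise.
    x = rng.uniform(0.0, 10.0, size=200)
    y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.shape)

    # Deterministic path: pick w to minimize the squared deviation between
    # the observations y and the model predictions X @ w.
    X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
    w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution

    # Probabilistic path: model y ~ N(X @ w, sigma^2) and maximize the
    # (log-)likelihood; for Gaussian noise this reduces to the same
    # least-squares problem, so the normal equations give the ML estimate.
    w_ml = np.linalg.solve(X.T @ X, X.T @ y)

    print(w_ls, w_ml)  # the two estimates coincide up to numerical precision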
The goal of this book is to approach the machine learning discipline in a unifying context, by presenting the major paths and approaches that have been followed over the years, without giving preference to a specific one. It is the author’s belief that all of them are valuable to the newcomer who wants to learn the secrets of this topic, from the applications as well as from the pedagogic point of view. As the title of the book indicates, the emphasis is on the processing and analysis front of machine learning and not on topics concerning the theory of learning itself and related performance bounds. In other words, the focus is on methods and algorithms closer to the application level.
The book is the outgrowth of more than three decades of the author’s experience on research and teaching various related courses. The book is written in such a way that individual (or pairs of) chapters are as self-contained as possible. So, one can select and combine chapters according to the focus he/she wants to give to the course he/she teaches, or to the topics he/she wants to grasp in a first reading. Some guidelines on how one can use the book for different courses are provided in the introductory chapter.
Each chapter grows by starting from the basics and evolving to embrace the more recent advances. Some of the topics had to be split into two chapters, such as sparsity-aware learning, Bayesian learning, probabilistic graphical models, and Monte Carlo methods. The book addresses the needs of advanced graduate, postgraduate, and research students as well as of practicing scientists and engineers whose interests lie beyond black-box solutions. Also, the book can serve the needs of short courses on specific topics, e.g., sparse modeling, Bayesian learning, probabilistic graphical models, neural networks and deep learning.
Most of the chapters include MATLAB exercises, and the related code is available from the book’s website. The solutions manual as well as PowerPoint lectures are also available from the book’s website.

About the Author

Sergios Theodoridis is a professor at the University of Athens. His research interests include machine learning, pattern recognition, and signal processing. He is a Fellow of IEEE (the Institute of Electrical and Electronics Engineers) and of EURASIP (the European Association for Signal Processing), and has served as Editor-in-Chief of the IEEE Transactions on Signal Processing. His honors include the 2014 IEEE Signal Processing Magazine Best Paper Award, the 2009 IEEE Computational Intelligence Society Outstanding Paper Award, and the 2014 EURASIP Meritorious Service Award. He is also the first author of the classic best-selling textbook Pattern Recognition.

Table of Contents

Preface.........................................................iv
Acknowledgments.....................................................................v
Notation.............................................................................vi
CHAPTER 1 Introduction .......................................................1
1.1 What Machine Learning is About..................................................1
1.1.1 Classification...............................................................2
1.1.2 Regression..................................................................3
1.2 Structure and a Road Map of the Book............................................5
References................................................................................8
CHAPTER 2 Probability and Stochastic Processes ....................................9
2.1 Introduction.........................................................................10
2.2 Probability and Random Variables.................................................10
2.2.1 Probability..................................................................11
2.2.2 Discrete Random Variables................................................12
2.2.3 Continuous Random Variables............................................14
2.2.4 Mean and Variance........................................................15
2.2.5 Transformation of Random Variables.....................................17
2.3 Examples of Distributions..........................................................18
2.3.1 Discrete Variables..........................................................18
2.3.2 Continuous Variables......................................................20
2.4 Stochastic Processes................................................................29
2.4.1 First and Second Order Statistics.........................................30
2.4.2 Stationarity and Ergodicity................................................30
2.4.3 Power Spectral Density....................................................33
2.4.4 Autoregressive Models....................................................38
2.5 Information Theory.................................................................41
2.5.1 Discrete Random Variables................................................42
2.5.2 Continuous Random Variables............................................45
2.6 Stochastic Convergence............................................................48
Problems..................................................................................49
References................................................................................51
CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions 53
3.1 Introduction.........................................................................53
3.2 Parameter Estimation: The Deterministic Point of View.........................54
3.3 Linear Regression...................................................................57
3.4 Classification........................................................................60
3.5 Biased Versus Unbiased Estimation...............................................64
3.5.1 Biased or Unbiased Estimation ..........................................65
3.6 The Cramér-Rao Lower Bound....................................................67
3.7 Sufficient Statistic...................................................................70
3.8 Regularization.......................................................................72
3.9 The Bias-Variance Dilemma.......................................................77
3.9.1 Mean-Square Error Estimation............................................77
3.9.2 Bias-Variance Tradeoff....................................................78
3.10 Maximum Likelihood Method.....................................................82
3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case................84
3.11 Bayesian Inference..................................................................84
3.11.1 The Maximum a Posteriori Probability Estimation Method.............88
3.12 Curse of Dimensionality............................................................89
3.13 Validation...........................................................................91
3.14 Expected and Empirical Loss Functions...........................................93
3.15 Nonparametric Modeling and Estimation.........................................95
Problems...................................................................................97
References..................................................................................102
CHAPTER 4 Mean-Square Error Linear Estimation....................................105
4.1 Introduction.........................................................................105
4.2 Mean-Square Error Linear Estimation: The Normal Equations..................106
4.2.1 The Cost Function Surface................................................107
4.3 A Geometric Viewpoint: Orthogonality Condition................................109
4.4 Extension to Complex-Valued Variables..........................................111
4.4.1 Widely Linear Complex-Valued Estimation..............................113
4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus........................116
4.5 Linear Filtering.....................................................................118
4.6 MSE Linear Filtering: A Frequency Domain Point of View......................120
4.7 Some Typical Applications.........................................................124
4.7.1 Interference Cancellation..................................................124
4.7.2 System Identification......................................................125
4.7.3 Deconvolution: Channel Equalization....................................126
4.8 Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms........132
4.8.1 The Lattice-Ladder Scheme...............................................137
4.9 Mean-Square Error Estimation of Linear Models.................................140
4.9.1 The Gauss-Markov Theorem..............................................143
4.9.2 Constrained Linear Estimation: The Beamforming Case................145
4.10 Time-Varying Statistics: Kalman Filtering........................................148
Problems...................................................................................154
References..................................................................................158
CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family .........................161
5.1 Introduction.........................................................................162
5.2 The Steepest Descent Method......................................................163
5.3 Application to the Mean-Square Error Cost Function............................167
5.3.1 The Complex-Valued Case................................................175
5.4 Stochastic Approximation..........................................................177
5.5 The Least-Mean-Squares Adaptive Algorithm....................................179
5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments.................181
5.5.2 Cumulative Loss Bounds..................................................186
5.6 The Affine Projection Algorithm...................................................188
5.6.1 The Normalized LMS.....................................................193
5.7 The Complex-Valued Case.........................................................194
5.8 Relatives of the LMS...............................................................196
5.9 Simulation Examples...............................................................199
5.10 Adaptive Decision Feedback Equalization........................................202
5.11 The Linearly Constrained LMS....................................................204
5.12 Tracking Performance of the LMS in Nonstationary Environments.............................206
5.13 Distributed Learning: The Distributed LMS......................................208
5.13.1 Cooperation Strategies.....................................................209
5.13.2 The Diffusion LMS........................................................211
5.13.3 Convergence and Steady-State Performance: Some Highlights................................218
5.13.4 Consensus-Based Distributed Schemes...................................220
5.14 A Case Study: Target Localization................................................222
5.15 Some Concluding Remarks: Consensus Matrix...................................223
Problems...................................................................................224
References..................................................................................227
CHAPTER 6 The Least-Squares Family ....................................................233
6.1 Introduction.........................................................................234
6.2 Least-Squares Linear Regression: A Geometric Perspective.....................234
6.3 Statistical Properties of the LS Estimator..........................................236
6.4 Orthogonalizing the Column Space of X: The SVD Method.....................239
6.5 Ridge Regression...................................................................243
6.6 The Recursive Least-Squares Algorithm..........................................245
6.7 Newton’s Iterative Minimization Method.........................................248
6.7.1 RLS and Newton’s Method................................................251
6.8 Steady-State Performance of the RLS.............................................252
6.9 Complex-Valued Data: The Widely Linear RLS..................................254
6.10 Computational Aspects of the LS Solution........................................255
6.11 The Coordinate and Cyclic Coordinate Descent Methods........................258
6.12 Simulation Examples...............................................................259
6.13 Total-Least-Squares.................................................................261
Problems...................................................................................268
References..................................................................................272
CHAPTER 7 Classification: A Tour of the Classics.....................................275
7.1 Introduction.........................................................................275
7.2 Bayesian Classification.............................................................276
7.2.1 Average Risk...............................................................278
7.3 Decision (Hyper)Surfaces..........................................................280
7.3.1 The Gaussian Distribution Case...........................................282
7.4 The Naive Bayes Classifier.........................................................287
7.5 The Nearest Neighbor Rule........................................................288
7.6 Logistic Regression.................................................................290
7.7 Fisher’s Linear Discriminant.......................................................294
7.8 Classification Trees.................................................................300
7.9 Combining Classifiers..............................................................304
7.10 The Boosting Approach............................................................307
7.11 Boosting Trees......................................................................313
7.12 A Case Study: Protein Folding Prediction.........................................314
Problems...................................................................................318
References..................................................................................323
CHAPTER 8 Parameter Learning: A Convex Analytic Path...........................327
8.1 Introduction.........................................................................328
8.2 Convex Sets and Functions.........................................................329
8.2.1 Convex Sets................................................................329
8.2.2 Convex Functions..........................................................330
8.3 Projections onto Convex Sets......................................................333
8.3.1 Properties of Projections..................................................337
8.4 Fundamental Theorem of Projections onto Convex Sets.........................341
8.5 A Parallel Version of POCS........................................................344
8.6 From Convex Sets to Parameter Estimation and Machine Learning.............345
8.6.1 Regression..................................................................345
8.6.2 Classification...............................................................347
8.7 Infinite Many Closed Convex Sets: The Online Learning Case..................349
8.7.1 Convergence of APSM....................................................351
8.8 Constrained Learning...............................................................356
8.9 The Distributed APSM.............................................................357
8.10 Optimizing Nonsmooth Convex Cost Functions..................................358
8.10.1 Subgradients and Subdifferentials........................................359
8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case.........................362
8.10.3 Online Learning for Convex Optimization...............................367
8.11 Regret Analysis.....................................................................370
8.12 Online Learning and Big Data Applications: A Discussion......................374
8.13 Proximal Operators.................................................................379
8.13.1 Properties of the Proximal Operator......................................382
8.13.2 Proximal Minimization....................................................383
8.14 Proximal Splitting Methods for Optimization.....................................385
Problems...................................................................................389
8.15 Appendix to Chapter 8..............................................................393
References..................................................................................398
CHAPTER 9 Sparsity-Aware Learning: Concepts and Theoretical Foundations...............................403
9.1 Introduction.........................................................................403
9.2 Searching for a Norm...............................................................404
9.3 The Least Absolute Shrinkage and Selection Operator (LASSO)................407
9.4 Sparse Signal Representation......................................................411
9.5 In Search of the Sparsest Solution.................................................415
9.6 Uniqueness of the ℓ0 Minimizer...................................................422
9.6.1 Mutual Coherence.........................................................424
9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions...................426
9.7.1 Condition Implied by the Mutual Coherence Number...................426
9.7.2 The Restricted Isometry Property (RIP)..................................427
9.8 Robust Sparse Signal Recovery from Noisy Measurements......................429
9.9 Compressed Sensing: The Glory of Randomness.................................430
9.9.1 Dimensionality Reduction and Stable Embeddings......................433
9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion............434
9.10 A Case Study: Image De-Noising..................................................438
Problems...................................................................................440
References..................................................................................444
CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications.............449
10.1 Introduction.........................................................................450
10.2 Sparsity-Promoting Algorithms....................................................450
10.2.1 Greedy Algorithms........................................................451
10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms.....................456
10.2.3 Which Algorithm?: Some Practical Hints................................462
10.3 Variations on the Sparsity-Aware Theme..........................................467
10.4 Online Sparsity-Promoting Algorithms............................................475
10.4.1 LASSO: Asymptotic Performance........................................475
10.4.2 The Adaptive Norm-Weighted LASSO...................................477
10.4.3 Adaptive CoSaMP (AdCoSaMP) Algorithm.............................479
10.4.4 Sparse Adaptive Projection Subgradient Method (SpAPSM)...........480
10.5 Learning Sparse Analysis Models.................................................485
10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries......................487
10.5.2 Cosparsity..................................................................488
10.6 A Case Study: Time-Frequency Analysis.........................................490
10.7 Appendix to Chapter 10: Some Hints from the Theory of Frames...............497
Problems...................................................................................500
References..................................................................................502
CHAPTER 11 Learning in Reproducing Kernel Hilbert Spaces.......................509
11.1 Introduction.........................................................................510
11.2 Generalized Linear Models.........................................................510
11.3 Volterra, Wiener, and Hammerstein Models.......................................511
11.4 Cover’s Theorem: Capacity of a Space in Linear Dichotomies..................514
11.5 Reproducing Kernel Hilbert Spaces...............................................517
11.5.1 Some Properties and Theoretical Highlights.............................519
11.5.2 Examples of Kernel Functions............................................520
11.6 Representer Theorem...............................................................525
11.6.1 Semiparametric Representer Theorem....................................527
11.6.2 Nonparametric Modeling: A Discussion.................................528
11.7 Kernel Ridge Regression...........................................................528
11.8 Support Vector Regression.........................................................530
11.8.1 The Linear ε-Insensitive Optimal Regression............................531
11.9 Kernel Ridge Regression Revisited................................................537
11.10 Optimal Margin Classification: Support Vector Machines........................538
11.10.1 Linearly Separable Classes: Maximum Margin Classifiers.....................................540
11.10.2 Nonseparable Classes......................................................545
11.10.3 Performance of SVMs and Applications.................................550
11.10.4 Choice of Hyperparameters...............................................550
11.11 Computational Considerations.....................................................551
11.11.1 Multiclass Generalizations................................................552
11.12 Online Learning in RKHS..........................................................553
11.12.1 The Kernel LMS (KLMS).................................................553
11.12.2 The Naive Online Rreg Minimization Algorithm (NORMA)............556
11.12.3 The Kernel APSM Algorithm.............................................560
11.13 Multiple Kernel Learning..........................................................567
11.14 Nonparametric Sparsity-Aware Learning: Additive Models......................568
11.15 A Case Study: Authorship Identification..........................................570
Problems....................................................................................574
References...................................................................................578
CHAPTER 12 Bayesian Learning: Inference and the EM Algorithm.................585
12.1 Introduction.........................................................................586
12.2 Regression: A Bayesian Perspective...............................................586
12.2.1 The Maximum Likelihood Estimator.....................................587
12.2.2 The MAP Estimator.......................................................588
12.2.3 The Bayesian Approach...................................................589
12.3 The Evidence Function and Occam’s Razor Rule.................................593
12.4 Exponential Family of Probability Distributions..................................600
12.4.1 The Exponential Family and the Maximum Entropy Method...........605
12.5 Latent Variables and the EM Algorithm...........................................606
12.5.1 The Expectation-Maximization Algorithm...............................606
12.5.2 The EM Algorithm: A Lower Bound Maximization View..............608
12.6 Linear Regression and the EM Algorithm.........................................610
12.7 Gaussian Mixture Models..........................................................613
12.7.1 Gaussian Mixture Modeling and Clustering..............................617
12.8 Combining Learning Models: A Probabilistic Point of View.....................621
12.8.1 Mixing Linear Regression Models........................................622
12.8.2 Mixing Logistic Regression Models......................................625
Problems...................................................................................628
12.9 Appendix to Chapter 12............................................................631
12.9.1 PDFs with Exponent of Quadratic Form.................................631
12.9.2 The Conditional from the Joint Gaussian Pdf............................632
12.9.3 The Marginal from the Joint Gaussian Pdf...............................633
12.9.4 The Posterior from Gaussian Prior and Conditional Pdfs................634
References..................................................................................637
CHAPTER 13 Bayesian Learning: Approximate Inference and Nonparametric Models ....................639
13.1 Introduction.........................................................................640
13.2 Variational Approximation in Bayesian Learning.................................640
13.2.1 The Case of the Exponential Family of Probability Distributions.......644
13.3 A Variational Bayesian Approach to Linear Regression..........................645
13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling...............651
13.5 When Bayesian Inference Meets Sparsity.........................................655
13.6 Sparse Bayesian Learning (SBL)..................................................657
13.6.1 The Spike and Slab Method...............................................660
13.7 The Relevance Vector Machine Framework.......................................661
13.7.1 Adopting the Logistic Regression Model for Classification................................662
13.8 Convex Duality and Variational Bounds...........................................666
13.9 Sparsity-Aware Regression: A Variational Bound Bayesian Path................671
13.10 Sparsity-Aware Learning:Some Concluding Remarks...........................675
13.11 Expectation Propagation............................................................679
13.12 Nonparametric Bayesian Modeling................................................683
13.12.1 The Chinese Restaurant Process..........................................684
13.12.2 Inference...................................................................684
13.12.3 Dirichlet Processes.........................................................684
13.12.4 The Stick-Breaking Construction of a DP................................685
13.13 Gaussian Processes.................................................................687
13.13.1 Covariance Functions and Kernels........................................688
13.13.2 Regression..................................................................690
13.13.3 Classification...............................................................692
13.14 A Case Study: Hyperspectral Image Unmixing...................................693
13.14.1 Hierarchical Bayesian Modeling..........................................695
13.14.2 Experimental Results......................................................696
Problems....................................................................................699
References...................................................................................702
CHAPTER 14 Monte Carlo Methods..........................................................707
14.1 Introduction.........................................................................707
14.2 Monte Carlo Methods:The Main Concept........................................708
14.2.1 Random Number Generation...............................................709
14.3 Random Sampling Based on Function Transformation...........................711
14.4 Rejection Sampling.................................................................715
14.5 Importance Sampling...............................................................718
14.6 Monte Carlo Methods and the EM Algorithm.....................................720
14.7 Markov Chain Monte Carlo Methods..............................................721
14.7.1 Ergodic Markov Chains...................................................723
14.8 The Metropolis Method............................................................728
14.8.1 Convergence Issues........................................................731
14.9 Gibbs Sampling.....................................................................733
14.10 In Search of More Efficient Methods: A Discussion..............................735
14.11 A Case Study: Change-Point Detection...........................................737
Problems....................................................................................740
References...................................................................................742
CHAPTER 15 Probabilistic Graphical Models: Part I ...................................745
15.1 Introduction.........................................................................745
15.2 The Need for Graphical Models...................................................746
15.3 Bayesian Networks and the Markov Condition...................................748
15.3.1 Graphs: Basic Definitions.................................................749
15.3.2 Some Hints on Causality..................................................753
15.3.3 D-Separation...............................................................755
15.3.4 Sigmoidal Bayesian Networks............................................758
15.3.5 Linear Gaussian Models...................................................759
15.3.6 Multiple-Cause Networks.................................................760
15.3.7 I-Maps, Soundness, Faithfulness, and Completeness....................761
15.4 Undirected Graphical Models......................................................762
15.4.1 Independencies and I-Maps in Markov Random Fields.....................................763
15.4.2 The Ising Model and Its Variants.........................................765
15.4.3 Conditional Random Fields (CRFs)......................................767
15.5 Factor Graphs.......................................................................768
15.5.1 Graphical Models for Error-Correcting Codes...........................770
15.6 Moralization of Directed Graphs...................................................772
15.7 Exact Inference Methods: Message-Passing Algorithms.........................773
15.7.1 Exact Inference in Chains.................................................773
15.7.2 Exact Inference in Trees...................................................777
15.7.3 The Sum-Product Algorithm..............................................778
15.7.4 The Max-Product and Max-Sum Algorithms............................782
Problems.............................................................................789
References..................................................................................791
CHAPTER 16 Probabilistic Graphical Models: Part II ..................................795
16.1 Introduction.........................................................................795
16.2 Triangulated Graphs and Junction Trees...........................................796
16.2.1 Constructing a Join Tree...................................................799
16.2.2 Message-Passing in Junction Trees.......................................801
16.3 Approximate Inference Methods...................................................804
16.3.1 Variational Methods: Local Approximation..............................804
16.3.2 Block Methods for Variational Approximation..........................809
16.3.3 Loopy Belief Propagation.................................................813
16.4 Dynamic Graphical Models........................................................816
16.5 Hidden Markov Models ............................................................ 818
16.5.1 Inference ................................................................... 821
16.5.2 Learning the Parameters in an HMM ..................................... 825
16.5.3 Discriminative Learning................................................... 828
16.6 Beyond HMMs: A Discussion ..................................................... 829
16.6.1 Factorial Hidden Markov Models......................................... 829
16.6.2 Time-Varying Dynamic Bayesian Networks ............................. 832
16.7 Learning Graphical Models ........................................................ 833
16.7.1 Parameter Estimation ...................................................... 833
16.7.2 Learning the Structure..................................................... 837
Problems .......................... 838
References.................................................................................. 840
CHAPTER 17 Particle Filtering ................................................................ 845
17.1 Introduction ......................................................................... 845
17.2 Sequential Importance Sampling................................................... 845
17.2.1 Importance Sampling Revisited........................................... 846
17.2.2 Resampling................................................................. 847
17.2.3 Sequential Sampling....................................................... 849
17.3 Kalman and Particle Filtering ...................................................... 851
17.3.1 Kalman Filtering: A Bayesian Point of View............................. 852
17.4 Particle Filtering .................................................................... 854
17.4.1 Degeneracy................................................................. 858
17.4.2 Generic Particle Filtering................................................. 860
17.4.3 Auxiliary Particle Filtering................................................ 862
Problems ................................................................................... 868
References.................................................................................. 872
CHAPTER 18 Neural Networks and Deep Learning ...................................... 875
18.1 Introduction ......................................................................... 876
18.2 The Perceptron..................................................................... 877
18.2.1 The Kernel Perceptron Algorithm ........................................ 881
18.3 Feed-Forward Multilayer Neural Networks ....................................... 882
18.4 The Backpropagation Algorithm................................................... 886
18.4.1 The Gradient Descent Scheme ............................................ 887
18.4.2 Beyond the Gradient Descent Rationale.................................. 895
18.4.3 Selecting a Cost Function ................................................. 896
18.5 Pruning the Network................................................................ 897
18.6 Universal Approximation Property of Feed-Forward Neural Networks................................. 899
18.7 Neural Networks: A Bayesian Flavor.............................................. 902
18.8 Learning Deep Networks........................................................... 903
18.8.1 The Need for Deep Architectures......................................... 904
18.8.2 Training Deep Networks .................................................. 905
18.8.3 Training Restricted Boltzmann Machines ................................ 908
18.8.4 Training Deep Feed-Forward Networks .................................. 914
18.9 Deep Belief Networks .............................................................. 916
18.10 Variations on the Deep Learning Theme .......................................... 918
18.10.1 Gaussian Units ............................................................. 918
18.10.2 Stacked Autoencoders ..................................................... 919
18.10.3 The Conditional RBM..................................................... 920
18.11 Case Study: A Deep Network for Optical Character Recognition ........................... 923
18.12 Case Study: A Deep Autoencoder ................................................. 925
18.13 Example: Generating Data via a DBN............................................. 928
Problems ....................................................... 929
References................................................................................... 932
CHAPTER 19 Dimensionality Reduction .................................................... 937
19.1 Introduction ......................................................................... 938
19.2 Intrinsic Dimensionality............................................................ 939
19.3 Principal Component Analysis..................................................... 939
19.4 Canonical Correlation Analysis.................................................... 950
19.4.1 Relatives of CCA .......................................................... 953
19.5 Independent Component Analysis ................................................. 955
19.5.1 ICA and Gaussianity....................................................... 956
19.5.2 ICA and Higher Order Cumulants ........................................ 957
19.5.3 Non-Gaussianity and Independent Components....................................... 958
19.5.4 ICA Based on Mutual Information........................................ 959
19.5.5 Alternative Paths to ICA................................................... 962
19.6 Dictionary Learning: The k-SVD Algorithm...................................... 966
19.7 Nonnegative Matrix Factorization ................................................. 971
19.8 Learning Low-Dimensional Models: A Probabilistic Perspective .............................972
19.8.1 Factor Analysis ............................................................ 972
19.8.2 Probabilistic PCA.......................................................... 974
19.8.3 Mixture of Factors Analyzers: A Bayesian View to Compressed Sensing................... 977
19.9 Nonlinear Dimensionality Reduction.............................................. 980
19.9.1 Kernel PCA ................................................................ 980
19.9.2 Graph-Based Methods..................................................... 982
19.10 Low-Rank Matrix Factorization: A Sparse Modeling Path.......................991
19.10.1 Matrix Completion.........................................................991
19.10.2 Robust PCA................................................................995
19.10.3 Applications of Matrix Completion and Robust PCA................996
19.11 A Case Study: fMRI Data Analysis................................................998
Problems....................................................................................1002
References...................................................................................1003
APPENDIX A Linear Algebra ...................................................................1013
A.1 Properties of Matrices..............................................................1013
A.2 Positive Definite and Symmetric Matrices........................................1015
A.3 Wirtinger Calculus..................................................................1016
References................................................................................1017
APPENDIX B Probability Theory and Statistics ..........................................1019
B.1 Cramér-Rao Bound.................................................................1019
B.2 Characteristic Functions............................................................1020
B.3 Moments and Cumulants...........................................................1020
B.4 Edgeworth Expansion of a Pdf.....................................................1021
Reference.................................................................................1022
APPENDIX C Hints on Constrained Optimization ........................................1023
C.1 Equality Constraints................................................................1023
C.2 Inequality Constraints..............................................................1025
References................................................................................1029
Index......................................................................1031

Recommended Textbooks
Authors: Barbara Dosher (USA), Zhong-Lin Lu
Authors: Mehryar Mohri (USA), Afshin Rostamizadeh, Ameet Talwalkar
Editors: Eugene Kagan (Israel), Nir Shvalb, Irad Ben-Gal
Author: András Kornai (Hungary)
Recommended Reading
Author: Lentin Joseph (India)
Author: Aurélien Géron (France)
Authors: Liu Yu, Tian Chudong, Lu Mengkai, Wang Renda
Authors: Antonio Gulli (Poland), Amita Kapoor (India)