Machine Learning: A Bayesian and Optimization Perspective (English Edition, Second Edition)
Author: Sergios Theodoridis (Greece)
Series: 经典原版书库
Publication date: 2020-11-10
ISBN: 978-7-111-66837-4
List price: CNY 299.00
Additional Information
Language: English
Pages: 1152
Format: 16开
Original title: Machine Learning: A Bayesian and Optimization Perspective, Second Edition
Original publisher: Elsevier (Singapore) Pte Ltd
Category: Textbook
Includes CD: No
Out of print:
Book Description

This book presents machine learning from a unified perspective through the two main pillars of supervised learning: regression and classification. It first covers the fundamentals, including mean-square, least-squares, and maximum likelihood methods, ridge regression, Bayesian decision theory classification, logistic regression, and decision trees. It then turns to more recent techniques, including sparse modeling methods, learning in reproducing kernel Hilbert spaces and in support vector machines, Bayesian inference with a focus on the EM algorithm and its approximate variational versions, Monte Carlo methods, probabilistic graphical models with a focus on Bayesian networks, hidden Markov models, and particle filtering. Dimensionality reduction and latent variable modeling are also treated in depth, and the book closes with an extended chapter on neural networks and deep learning architectures. In addition, it covers the basics of statistical parameter estimation, Wiener and Kalman filtering, and convexity and convex optimization, devoting a chapter to the stochastic approximation and gradient descent family of algorithms and presenting related concepts and algorithms for distributed optimization as well as online learning techniques.

Shelving Category

Computers / Machine Learning

Back Cover Copy

This book offers an in-depth exploration of all the major machine learning methods and recent research trends. By presenting the two main pillars of supervised learning, regression and classification, it connects these diverse methods from a panoramic viewpoint, forming a clear and coherent body of machine learning knowledge.
The new edition has been thoroughly updated so that the chapters are relatively self-contained. Throughout, the book focuses on the physical reasoning behind the mathematical theory and on methods and algorithms close to the application layer, supported by numerous examples and exercises. It is suitable for researchers and engineers in the field, as well as for students of courses on pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, sparse modeling, and deep learning.
In addition, all of the book's code is available for free download, in both MATLAB and Python versions.

Major Updates in the Second Edition
The chapter on neural networks and deep learning has been rewritten to reflect advances since the first edition. Starting from the basic concepts of the perceptron and feed-forward neural networks, it studies deep networks in depth, covering newer optimization algorithms, batch normalization, regularization techniques (such as dropout), CNNs and RNNs, attention mechanisms, adversarial examples and adversarial training, capsule networks, and generative architectures such as restricted Boltzmann machines (RBMs), variational autoencoders, and GANs.
The coverage of Bayesian learning has been expanded to include nonparametric Bayesian methods, with emphasis on the Chinese restaurant process (CRP) and the Indian buffet process (IBP).

About the Author
Sergios Theodoridis is a professor at the University of Athens and at The Chinese University of Hong Kong, Shenzhen. His research interests include machine learning, pattern recognition, and signal processing. He is a Fellow of the IEEE, the IET, and EURASIP, and has served as Vice President of the IEEE Signal Processing Society, President of EURASIP, and Editor-in-Chief of the IEEE Transactions on Signal Processing. His honors include the 2017 EURASIP Athanasios Papoulis Award, the 2014 IEEE Signal Processing Magazine Best Paper Award, and the 2014 EURASIP Most Valuable Service Award. He is also the first author of the classic book Pattern Recognition.

Table of Contents

Preface...................................................................iv
Acknowledgments.........................................................vi
About the Author...................................................................viii
Notation...................................................................ix
CHAPTER 1 Introduction................................................1
1.1 The Historical Context...........................................1
1.2 Artificial Intelligence and Machine Learning..........................2
1.3 Algorithms Can Learn What Is Hidden in the Data......................4
1.4 Typical Applications of Machine Learning............................6
Speech Recognition......................................6
Computer Vision........................................6
Multimodal Data........................................6
Natural Language Processing...............................7
Robotics..............................................7
Autonomous Cars.......................................7
Challenges for the Future..................................8
1.5 Machine Learning: Major Directions................................8
1.5.1 Supervised Learning.....................................8
1.6 Unsupervised and Semisupervised Learning...........................11
1.7 Structure and a Road Map of the Book...............................12
References....................................................16
CHAPTER 2 Probability and Stochastic Processes.............................19
2.1 Introduction...................................................20
2.2 Probability and Random Variables..................................20
2.2.1 Probability.............................................20
2.2.2 Discrete Random Variables................................22
2.2.3 Continuous Random Variables..............................24
2.2.4 Mean and Variance.......................................25
2.2.5 Transformation of Random Variables.........................28
2.3 Examples of Distributions........................................29
2.3.1 Discrete Variables.......................................29
2.3.2 Continuous Variables.....................................32
2.4 Stochastic Processes............................................41
2.4.1 First- and Second-Order Statistics...........................42
2.4.2 Stationarity and Ergodicity.................................43
2.4.3 Power Spectral Density...................................46
2.4.4 Autoregressive Models....................................51
2.5 Information Theory.............................................54
2.5.1 Discrete Random Variables................................56
2.5.2 Continuous Random Variables..............................59
2.6 Stochastic Convergence..........................................61
Convergence Everywhere..................................62
Convergence Almost Everywhere............................62
Convergence in the Mean-Square Sense.......................62
Convergence in Probability................................63
Convergence in Distribution................................63
Problems.....................................................63
References....................................................65
CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions.........67
3.1 Introduction...................................................67
3.2 Parameter Estimation: the Deterministic Point of View...................68
3.3 Linear Regression..............................................71
3.4 Classification..................................................75
Generative Versus Discriminative Learning....................78
3.5 Biased Versus Unbiased Estimation.................................80
3.5.1 Biased or Unbiased Estimation?.............................81
3.6 The Cramér–Rao Lower Bound....................................83
3.7 Sufficient Statistic..............................................87
3.8 Regularization.................................................89
Inverse Problems: Ill-Conditioning and Overfitting...............91
3.9 The Bias–Variance Dilemma......................................93
3.9.1 Mean-Square Error Estimation..............................94
3.9.2 Bias–Variance Tradeoff...................................95
3.10 Maximum Likelihood Method.....................................98
3.10.1 Linear Regression: the Nonwhite Gaussian Noise Case............101
3.11 Bayesian Inference.............................................102
3.11.1 The Maximum a Posteriori Probability Estimation Method.........107
3.12 Curse of Dimensionality.........................................108
3.13 Validation....................................................109
Cross-Validation........................................111
3.14 Expected Loss and Empirical Risk Functions..........................112
Learnability............................................113
3.15 Nonparametric Modeling and Estimation.............................114
Problems.....................................................114
MATLAB Exercises....................................119
References....................................................119
CHAPTER 4 Mean-Square Error Linear Estimation.............................121
4.1 Introduction...................................................121
4.2 Mean-Square Error Linear Estimation: the Normal Equations..............122
4.2.1 The Cost Function Surface.................................123
4.3 A Geometric Viewpoint: Orthogonality Condition......................124
4.4 Extension to Complex-Valued Variables..............................127
4.4.1 Widely Linear Complex-Valued Estimation....................129
4.4.2 Optimizing With Respect to Complex-Valued Variables: Wirtinger Calculus...........................132
4.5 Linear Filtering................................................134
4.6 MSE Linear Filtering: a Frequency Domain Point of View................136
Deconvolution: Image Deblurring............................137
4.7 Some Typical Applications.......................................140
4.7.1 Interference Cancelation..................................140
4.7.2 System Identification.....................................141
4.7.3 Deconvolution: Channel Equalization.........................143
4.8 Algorithmic Aspects: the Levinson and Lattice-Ladder Algorithms.........149
Forward and Backward MSE Optimal Predictors................151
4.8.1 The Lattice-Ladder Scheme................................154
4.9 Mean-Square Error Estimation of Linear Models.......................158
4.9.1 The Gauss–Markov Theorem...............................160
4.9.2 Constrained Linear Estimation: the Beamforming Case...........162
4.10 Time-Varying Statistics: Kalman Filtering............................166
Problems.....................................................172
MATLAB Exercises....................................174
References....................................................176
CHAPTER 5 Online Learning: the Stochastic Gradient Descent Family of Algorithms.....179
5.1 Introduction...................................................180
5.2 The Steepest Descent Method.....................................181
5.3 Application to the Mean-Square Error Cost Function....................184
Time-Varying Step Sizes..................................190
5.3.1 The Complex-Valued Case.................................193
5.4 Stochastic Approximation........................................194
Application to the MSE Linear Estimation.....................196
5.5 The Least-Mean-Squares Adaptive Algorithm.........................198
5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments...........................................199
5.5.2 Cumulative Loss Bounds..................................204
5.6 The Affine Projection Algorithm...................................206
Geometric Interpretation of APA............................208
Orthogonal Projections....................................208
5.6.1 The Normalized LMS....................................211
5.7 The Complex-Valued Case........................................213
The Widely Linear LMS..................................213
The Widely Linear APA...................................214
5.8 Relatives of the LMS............................................214
The Sign-Error LMS.....................................214
The Least-Mean-Fourth (LMF) Algorithm.....................215
Transform-Domain LMS..................................215
5.9 Simulation Examples............................................218
5.10 Adaptive Decision Feedback Equalization............................221
5.11 The Linearly Constrained LMS....................................224
5.12 Tracking Performance of the LMS in Nonstationary Environments..........225
5.13 Distributed Learning: the Distributed LMS............................227
5.13.1 Cooperation Strategies....................................228
5.13.2 The Diffusion LMS......................................231
5.13.3 Convergence and Steady-State Performance: Some Highlights......237
5.13.4 Consensus-Based Distributed Schemes........................240
5.14 A Case Study: Target Localization..................................241
5.15 Some Concluding Remarks: Consensus Matrix........................243
Problems.....................................................244
MATLAB Exercises....................................246
References....................................................247
CHAPTER 6 The Least-Squares Family......................................253
6.1 Introduction...................................................253
6.2 Least-Squares Linear Regression: a Geometric Perspective................254
6.3 Statistical Properties of the LS Estimator.............................257
The LS Estimator Is Unbiased..............................257
Covariance Matrix of the LS Estimator........................257
The LS Estimator Is BLUE in the Presence of White Noise........258
The LS Estimator Achieves the Cramér–Rao Bound for White Gaussian Noise.........................................259
Asymptotic Distribution of the LS Estimator...................260
6.4 Orthogonalizing the Column Space of the Input Matrix: the SVD Method....260
Pseudoinverse Matrix and SVD.............................262
6.5 Ridge Regression: a Geometric Point of View.........................265
Principal Components Regression...........................267
6.6 The Recursive Least-Squares Algorithm.............................268
Time-Iterative Computations...............................269
Time Updating of the Parameters............................270
6.7 Newton's Iterative Minimization Method.............................271
6.7.1 RLS and Newton's Method................................274
6.8 Steady-State Performance of the RLS...............................275
6.9 Complex-Valued Data: the Widely Linear RLS........................277
6.10 Computational Aspects of the LS Solution............................279
Cholesky Factorization....................................279
QR Factorization........................................279
Fast RLS Versions.......................................280
6.11 The Coordinate and Cyclic Coordinate Descent Methods.................281
6.12 Simulation Examples............................................283
6.13 Total Least-Squares.............................................286
Geometric Interpretation of the Total Least-Squares Method........291
Problems.....................................................293
MATLAB Exercises....................................296
References....................................................297
CHAPTER 7 Classification: a Tour of the Classics..............................301
7.1 Introduction...................................................301
7.2 Bayesian Classification..........................................302
The Bayesian Classifier Minimizes the Misclassification Error......303
7.2.1 Average Risk...........................................304
7.3 Decision (Hyper) Surfaces........................................307
7.3.1 The Gaussian Distribution Case.............................309
7.4 The Naive Bayes Classifier.......................................315
7.5 The Nearest Neighbor Rule.......................................315
7.6 Logistic Regression.............................................317
7.7 Fisher's Linear Discriminant......................................322
7.7.1 Scatter Matrices.........................................323
7.7.2 Fisher's Discriminant: the Two-Class Case.....................325
7.7.3 Fisher's Discriminant: the Multiclass Case.....................328
7.8 Classification Trees.............................................329
7.9 Combining Classifiers...........................................333
No Free Lunch Theorem..................................334
Some Experimental Comparisons............................334
Schemes for Combining Classifiers..........................335
7.10 The Boosting Approach..........................................337
The AdaBoost Algorithm..................................337
The Log-Loss Function...................................341
7.11 Boosting Trees.................................................343
Problems.....................................................345
MATLAB Exercises....................................347
References....................................................349
CHAPTER 8 Parameter Learning: a Convex Analytic Path........................351
8.1 Introduction...................................................352
8.2 Convex Sets and Functions.......................................352
8.2.1 Convex Sets............................................353
8.2.2 Convex Functions.......................................354
8.3 Projections Onto Convex Sets.....................................357
8.3.1 Properties of Projections..................................361
8.4 Fundamental Theorem of Projections Onto Convex Sets..................365
8.5 A Parallel Version of POCS.......................................369
8.6 From Convex Sets to Parameter Estimation and Machine Learning..........369
8.6.1 Regression.............................................369
8.6.2 Classification...........................................373
8.7 Infinitely Many Closed Convex Sets: the Online Learning Case............374
8.7.1 Convergence of APSM....................................376
8.8 Constrained Learning............................................380
8.9 The Distributed APSM..........................................382
8.10 Optimizing Nonsmooth Convex Cost Functions........................384
8.10.1 Subgradients and Subdifferentials............................385
8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: the Batch Learning Case..........................................388
8.10.3 Online Learning for Convex Optimization.....................393
8.11 Regret Analysis................................................396
Regret Analysis of the Subgradient Algorithm..................398
8.12 Online Learning and Big Data Applications: a Discussion................399
Approximation, Estimation, and Optimization Errors.............400
Batch Versus Online Learning..............................402
8.13 Proximal Operators.............................................405
8.13.1 Properties of the Proximal Operator..........................407
8.13.2 Proximal Minimization...................................409
8.14 Proximal Splitting Methods for Optimization..........................412
The Proximal Forward-Backward Splitting Operator.............413
Alternating Direction Method of Multipliers (ADMM)............414
Mirror Descent Algorithms................................415
8.15 Distributed Optimization: Some Highlights...........................417
Problems.....................................................417
MATLAB Exercises....................................420
References....................................................422
CHAPTER 9 Sparsity-Aware Learning: Concepts and Theoretical Foundations.........427
9.1 Introduction...................................................427
9.2 Searching for a Norm............................................428
9.3 The Least Absolute Shrinkage and Selection Operator (LASSO)...........431
9.4 Sparse Signal Representation......................................436
9.5 In Search of the Sparsest Solution..................................440
The ℓ2 Norm Minimizer...................................441
The ℓ0 Norm Minimizer...................................442
The ℓ1 Norm Minimizer...................................442
Characterization of the ℓ1 Norm Minimizer....................443
Geometric Interpretation..................................444
9.6 Uniqueness of the ℓ0 Minimizer....................................447
9.6.1 Mutual Coherence.......................................449
9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions..............451
9.7.1 Condition Implied by the Mutual Coherence Number.............451
9.7.2 The Restricted Isometry Property (RIP).......................452
9.8 Robust Sparse Signal Recovery From Noisy Measurements...............455
9.9 Compressed Sensing: the Glory of Randomness........................456
Compressed Sensing.....................................456
9.9.1 Dimensionality Reduction and Stable Embeddings...............458
9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion........460
9.10 A Case Study: Image Denoising....................................463
Problems.....................................................465
MATLAB Exercises....................................468
References....................................................469
CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications.................473
10.1 Introduction...................................................473
10.2 Sparsity Promoting Algorithms....................................474
10.2.1 Greedy Algorithms......................................474
10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms..............480
10.2.3 Which Algorithm? Some Practical Hints......................487
10.3 Variations on the Sparsity-Aware Theme.............................492
10.4 Online Sparsity Promoting Algorithms...............................499
10.4.1 LASSO: Asymptotic Performance...........................500
10.4.2 The Adaptive Norm-Weighted LASSO........................502
10.4.3 Adaptive CoSaMP Algorithm...............................504
10.4.4 Sparse-Adaptive Projection Subgradient Method................505
10.5 Learning Sparse Analysis Models..................................510
10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries...................................512
10.5.2 Cosparsity.............................................513
10.6 A Case Study: Time-Frequency Analysis.............................516
Gabor Transform and Frames...............................516
Time-Frequency Resolution................................517
Gabor Frames..........................................518
Time-Frequency Analysis of Echolocation Signals Emitted by Bats..519
Problems.....................................................523
MATLAB Exercises....................................524
References....................................................525
CHAPTER 11 Learning in Reproducing Kernel Hilbert Spaces......................531
11.1 Introduction...................................................532
11.2 Generalized Linear Models.......................................532
11.3 Volterra, Wiener, and Hammerstein Models...........................533
11.4 Cover's Theorem: Capacity of a Space in Linear Dichotomies.............536
11.5 Reproducing Kernel Hilbert Spaces.................................539
11.5.1 Some Properties and Theoretical Highlights....................541
11.5.2 Examples of Kernel Functions..............................543
11.6 Representer Theorem............................................548
11.6.1 Semiparametric Representer Theorem........................550
11.6.2 Nonparametric Modeling: a Discussion.......................551
11.7 Kernel Ridge Regression.........................................551
11.8 Support Vector Regression........................................554
11.8.1 The Linear ε-Insensitive Optimal Regression...................555
11.9 Kernel Ridge Regression Revisited.................................561
11.10 Optimal Margin Classification: Support Vector Machines.................562
11.10.1 Linearly Separable Classes: Maximum Margin Classifiers.........564
11.10.2 Nonseparable Classes.....................................569
11.10.3 Performance of SVMs and Applications.......................574
11.10.4 Choice of Hyperparameters................................574
11.10.5 Multiclass Generalizations.................................575
11.11 Computational Considerations.....................................576
11.12 Random Fourier Features.........................................577
11.12.1 Online and Distributed Learning in RKHS.....................579
11.13 Multiple Kernel Learning.........................................580
11.14 Nonparametric Sparsity-Aware Learning: Additive Models...............582
11.15 A Case Study: Authorship Identification.............................584
Problems.....................................................587
MATLAB Exercises....................................589
References....................................................590
CHAPTER 12 Bayesian Learning: Inference and the EM Algorithm...................595
12.1 Introduction...................................................595
12.2 Regression: a Bayesian Perspective.................................596
12.2.1 The Maximum Likelihood Estimator.........................597
12.2.2 The MAP Estimator......................................598
12.2.3 The Bayesian Approach...................................599
12.3 The Evidence Function and Occam's Razor Rule.......................605
Laplacian Approximation and the Evidence Function.............607
12.4 Latent Variables and the EM Algorithm..............................611
12.4.1 The Expectation-Maximization Algorithm.....................611
12.5 Linear Regression and the EM Algorithm.............................613
12.6 Gaussian Mixture Models........................................616
12.6.1 Gaussian Mixture Modeling and Clustering....................620
12.7 The EM Algorithm: a Lower Bound Maximization View.................623
12.8 Exponential Family of Probability Distributions........................627
12.8.1 The Exponential Family and the Maximum Entropy Method.......633
12.9 Combining Learning Models: a Probabilistic Point of View...............634
12.9.1 Mixing Linear Regression Models...........................634
12.9.2 Mixing Logistic Regression Models..........................639
Problems.....................................................641
MATLAB Exercises....................................643
References....................................................645
CHAPTER 13 Bayesian Learning: Approximate Inference and Nonparametric Models.....647
13.1 Introduction...................................................648
13.2 Variational Approximation in Bayesian Learning.......................648
The Mean Field Approximation.............................649
13.2.1 The Case of the Exponential Family of Probability Distributions.....653
13.3 A Variational Bayesian Approach to Linear Regression..................655
Computation of the Lower Bound............................660
13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling...........661
13.5 When Bayesian Inference Meets Sparsity.............................665
13.6 Sparse Bayesian Learning (SBL)...................................667
13.6.1 The Spike and Slab Method................................671
13.7 The Relevance Vector Machine Framework...........................672
13.7.1 Adopting the Logistic Regression Model for Classification.........672
13.8 Convex Duality and Variational Bounds..............................676
13.9 Sparsity-Aware Regression: a Variational Bound Bayesian Path............681
Sparsity-Aware Learning: Some Concluding Remarks............686
13.10 Expectation Propagation.........................................686
Minimizing the KL Divergence.............................688
The Expectation Propagation Algorithm.......................688
13.11 Nonparametric Bayesian Modeling.................................690
13.11.1 The Chinese Restaurant Process.............................691
13.11.2 Dirichlet Processes.......................................692
13.11.3 The Stick Breaking Construction of a DP......................697
13.11.4 Dirichlet Process Mixture Modeling..........................698
Inference..............................................699
13.11.5 The Indian Buffet Process.................................701
13.12 Gaussian Processes.............................................710
13.12.1 Covariance Functions and Kernels...........................711
13.12.2 Regression.............................................712
13.12.3 Classification...........................................716
13.13 A Case Study: Hyperspectral Image Unmixing.........................717
13.13.1 Hierarchical Bayesian Modeling.............................719
13.13.2 Experimental Results.....................................720
Problems.....................................................721
MATLAB Exercises....................................726
References....................................................727
CHAPTER 14 Monte Carlo Methods.........................................731
14.1 Introduction...................................................731
14.2 Monte Carlo Methods: the Main Concept.............................732
14.2.1 Random Number Generation...............................733
14.3 Random Sampling Based on Function Transformation...................735
14.4 Rejection Sampling.............................................739
14.5 Importance Sampling............................................743
14.6 Monte Carlo Methods and the EM Algorithm..........................745
14.7 Markov Chain Monte Carlo Methods................................745
14.7.1 Ergodic Markov Chains...................................748
14.8 The Metropolis Method..........................................754
14.8.1 Convergence Issues......................................756
14.9 Gibbs Sampling................................................758
14.10 In Search of More Efficient Methods: a Discussion.....................760
Variational Inference or Monte Carlo Methods..................762
14.11 A Case Study: Change-Point Detection..............................762
Problems.....................................................765
MATLAB Exercise.....................................767
References....................................................768
CHAPTER 15 Probabilistic Graphical Models: Part I.............................771
15.1 Introduction...................................................771
15.2 The Need for Graphical Models....................................772
15.3 Bayesian Networks and the Markov Condition.........................774
15.3.1 Graphs: Basic Definitions..................................775
15.3.2 Some Hints on Causality..................................779
15.3.3 d-Separation...........................................781
15.3.4 Sigmoidal Bayesian Networks..............................785
15.3.5 Linear Gaussian Models...................................786
15.3.6 Multiple-Cause Networks..................................786
15.3.7 I-Maps, Soundness, Faithfulness, and Completeness..............787
15.4 Undirected Graphical Models.....................................788
15.4.1 Independencies and I-Maps in Markov Random Fields............790
15.4.2 The Ising Model and Its Variants............................791
15.4.3 Conditional Random Fields (CRFs)..........................794
15.5 Factor Graphs.................................................795
15.5.1 Graphical Models for Error Correcting Codes...................797
15.6 Moralization of Directed Graphs...................................798
15.7 Exact Inference Methods: Message Passing Algorithms..................799
15.7.1 Exact Inference in Chains..................................799
15.7.2 Exact Inference in Trees...................................803
15.7.3 The Sum-Product Algorithm...............................804
15.7.4 The Max-Product and Max-Sum Algorithms...................809
Problems.....................................................816
References....................................................818
CHAPTER 16 Probabilistic Graphical Models: Part II............................821
16.1 Introduction...................................................821
16.2 Triangulated Graphs and Junction Trees..............................822
16.2.1 Constructing a Join Tree...................................825
16.2.2 Message Passing in Junction Trees...........................827
16.3 Approximate Inference Methods...................................830
16.3.1 Variational Methods: Local Approximation....................831
16.3.2 Block Methods for Variational Approximation..................835
16.3.3 Loopy Belief Propagation..................................839
16.4 Dynamic Graphical Models.......................................842
16.5 Hidden Markov Models..........................................844
16.5.1 Inference..............................................847
16.5.2 Learning the Parameters in an HMM.........................852
16.5.3 Discriminative Learning...................................855
16.6 Beyond HMMs: a Discussion......................................856
16.6.1 Factorial Hidden Markov Models............................856
16.6.2 Time-Varying Dynamic Bayesian Networks....................859
16.7 Learning Graphical Models.......................................859
16.7.1 Parameter Estimation.....................................860
16.7.2 Learning the Structure....................................864
Problems.....................................................864
References....................................................867
CHAPTER 17 Particle Filtering............................................871
17.1 Introduction...................................................871
17.2 Sequential Importance Sampling...................................871
17.2.1 Importance Sampling Revisited.............................872
17.2.2 Resampling............................................873
17.2.3 Sequential Sampling.....................................875
17.3 Kalman and Particle Filtering......................................878
17.3.1 Kalman Filtering: a Bayesian Point of View....................878
17.4 Particle Filtering...............................................881
17.4.1 Degeneracy............................................885
17.4.2 Generic Particle Filtering..................................886
17.4.3 Auxiliary Particle Filtering.................................889
Problems.....................................................895
MATLAB Exercises....................................898
References....................................................899
CHAPTER 18 Neural Networks and Deep Learning..............................901
18.1 Introduction...................................................902
18.2 The Perceptron................................................904
18.3 Feed-Forward Multilayer Neural Networks...........................908
18.3.1 Fully Connected Networks.................................912
18.4 The Backpropagation Algorithm...................................913
Nonconvexity of the Cost Function...........................914
18.4.1 The Gradient Descent Backpropagation Scheme.................916
18.4.2 Variants of the Basic Gradient Descent Scheme.................924
18.4.3 Beyond the Gradient Descent Rationale.......................934
18.5 Selecting a Cost Function........................................935
18.6 Vanishing and Exploding Gradients.................................938
18.6.1 The Rectified Linear Unit..................................939
18.7 Regularizing the Network........................................940
Dropout...............................................943
18.8 Designing Deep Neural Networks: a Summary.........................946
18.9 Universal Approximation Property of Feed-Forward Neural Networks.......947
18.10 Neural Networks: a Bayesian Flavor................................949
18.11 Shallow Versus Deep Architectures.................................950
18.11.1 The Power of Deep Architectures............................951
18.12 Convolutional Neural Networks....................................956
18.12.1 The Need for Convolutions................................956
18.12.2 Convolution Over Volumes.................................965
18.12.3 The Full CNN Architecture................................968
18.12.4 CNNs: the Epilogue......................................971
18.13 Recurrent Neural Networks.......................................976
18.13.1 Backpropagation Through Time.............................978
18.13.2 Attention and Memory....................................982
18.14 Adversarial Examples...........................................985
Adversarial Training.....................................987
18.15 Deep Generative Models.........................................988
18.15.1 Restricted Boltzmann Machines.............................988
18.15.2 Pretraining Deep Feed-Forward Networks.....................991
18.15.3 Deep Belief Networks....................................992
18.15.4 Autoencoders...........................................994
18.15.5 Generative Adversarial Networks............................995
18.15.6 Variational Autoencoders..................................1004
18.16 Capsule Networks..............................................1007
Training...............................................1011
18.17 Deep Neural Networks: Some Final Remarks..........................1013
Transfer Learning........................................1013
Multitask Learning.......................................1014
Geometric Deep Learning.................................1015
Open Problems.........................................1016
18.18 A Case Study: Neural Machine Translation...........................1017
18.19 Problems.....................................................1023
Computer Exercises......................................1025
References....................................................1029
CHAPTER 19 Dimensionality Reduction and Latent Variable Modeling................1039
19.1 Introduction...................................................1040
19.2 Intrinsic Dimensionality..........................................1041
19.3 Principal Component Analysis.....................................1041
PCA, SVD, and Low Rank Matrix Factorization.................1043
Minimum Error Interpretation..............................1045
PCA and Information Retrieval.............................1045
Orthogonalizing Properties of PCA and Feature Generation........1046
Latent Variables.........................................1047
19.4 Canonical Correlation Analysis....................................1053
19.4.1 Relatives of CCA........................................1056
19.5 Independent Component Analysis..................................1058
19.5.1 ICA and Gaussianity.....................................1058
19.5.2 ICA and Higher-Order Cumulants...........................1059
19.5.3 Non-Gaussianity and Independent Components.................1061
19.5.4 ICA Based on Mutual Information...........................1062
19.5.5 Alternative Paths to ICA..................................1065
The Cocktail Party Problem................................1066
19.6 Dictionary Learning: the k-SVD Algorithm...........................1069
Why the Name k-SVD?...................................1072
Dictionary Learning and Dictionary Identifiability...............1072
19.7 Nonnegative Matrix Factorization..................................1074
19.8 Learning Low-Dimensional Models: a Probabilistic Perspective............1076
19.8.1 Factor Analysis.........................................1077
19.8.2 Probabilistic PCA.......................................1078
19.8.3 Mixture of Factors Analyzers: a Bayesian View to Compressed Sensing.......................1082
19.9 Nonlinear Dimensionality Reduction................................1085
19.9.1 Kernel PCA............................................1085
19.9.2 Graph-Based Methods....................................1087
19.10 Low Rank Matrix Factorization: a Sparse Modeling Path.................1096
19.10.1 Matrix Completion.......................................1096
19.10.2 Robust PCA............................................1100
19.10.3 Applications of Matrix Completion and Robust PCA...........1101
19.11 A Case Study: fMRI Data Analysis.................................1103
Problems.....................................................1107
MATLAB Exercises....................................1107
References....................................................1108
Index....................................................................1116

Recommended Teaching Resources
Authors: Joseph C. Giarratano, Gary D. Riley
Author: Spyros G. Tzafestas (Greece)
Author: Tom Mitchell (USA)
Authors: Li Xinde, Zhu Bo, Tan Yingzi
Recommended Further Reading
Authors: Roy Shilkrot (USA), David Millán Escrivá (Spain)
Authors: Carol Fairchild, Thomas L. Harman (USA)
Authors: Benjamin Planche, Eliot Andres (France)
Author: Szymon Rozga (USA)