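The theory and methods of regression analysis provide a basic framework for analyzing relationships among variables in many fields, with wide application in statistics, biostatistics, psychology, sociology, business, and engineering.
This book presents up-to-date regression methods suited to real-world problems and introduces the statistical ideas behind them and their applications. It systematically covers the classical content of regression analysis and also introduces many new ideas and developments of recent years in regression and multivariable methods. It covers model building, the intuitive logic and assumptions underlying each method, and the purposes, advantages, and disadvantages of these methods, with detailed explanations. While presenting basic concepts and theory, the authors strive to reflect the most current thinking in the field.
The authors are experts in biostatistics with a thorough command of regression analysis.
The book concentrates on practical skills likely to be needed in actual research. It is suitable as a textbook or teaching reference for graduate students and advanced undergraduates, and it is also a valuable reference for professionals and theoretical researchers in the health, social, biological, and behavioral sciences.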
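David G. Kleinbaum, Ph.D., is a professor in the Department of Epidemiology at Emory University. His main research area is the quantitative analysis of infectious disease transmission.
Lawrence L. Kupper, Ph.D. in biostatistics, is a professor of biostatistics in the School of Public Health at the University of North Carolina. His main research area is biostatistics in epidemiology and environmental health.
Keith E. Muller is an associate professor of biostatistics in the School of Public Health at the University of North Carolina.
Azhar Nizam, M.S. in statistics, Emory University.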
This is the second revision of our second-level statistics text, originally published in 1978 and first revised in 1987. As before, this text is intended primarily for advanced undergraduates, graduate students, and working professionals in the health, social, biological, and behavioral sciences who engage in applied research in their fields. The book may also provide professional statisticians with some new insights into the application of advanced statistical techniques to real-life problems.
We have attempted in this revision to retain the basic structure and flavor of the earlier two editions, while at the same time making changes to keep pace with current analytic practices and computer usage in applied research. Notable changes in the third edition, discussed in more detail later, include a fourth author (Azhar Nizam), some reorganization of topics (in Chapters 22-24), expanded coverage of some content areas (such as logistic regression, in Chapter 23), a new chapter (Chapter 21 on repeated measures ANOVA), some new exercises for the reader, and the integration of computer output (using the SAS package, primarily) into our discussion of examples in the main body of the text and as a component of exercises given at the end of each chapter. We have deleted from the previous editions chapters on discriminant analysis, factor analysis, and categorical data analysis. This decision was based on our finding from a survey of previous users of our text that these chapters were rarely used for classroom instruction and were largely out of date. At the same time, the chapters we have added to replace this material seem more relevant to current applied research practice.
In this revision, as in our previous versions, we emphasize the intuitive logic and assumptions that underlie the techniques covered, the purposes for which these techniques are designed, the advantages and disadvantages of the techniques, and valid interpretations based on the techniques. Although we describe the statistical calculations required for the techniques we cover, we rely on computer output (even more so in this revision than previously) to provide the results of such calculations, so the reader can concentrate on how to apply a given technique rather than on how to carry out the calculations. The mathematical formulas that we do present require no more than simple algebraic manipulations. Proofs are of secondary importance and are generally omitted. Neither calculus nor matrix algebra is used anywhere in the main text, although we have included an appendix on matrices for the interested reader.
The text is not intended to be a general reference work dealing with all the statistical techniques available for analyzing data involving several variables. Instead, we focus on the techniques we consider most essential for use in applied research. After becoming proficient with the material in this text, the reader should be able to benefit from more specialized discussions of applied topics not covered here.
The most notable features of this second revised edition are the following:
1. Regression analysis and analysis of variance are discussed in considerable detail and with pedagogical care that reflects the authors' extensive experience and insight as teachers of such material.
2. The relationship between regression analysis and analysis of variance is highlighted.
3. The connection between multiple regression analysis and multiple and partial correlation analysis is discussed in detail.
4. Several advanced topics are presented in a unique, nonmathematical manner, including the analysis of repeated measures data (a new topic in this edition), maximum likelihood methods, logistic regression (expanded into a new chapter), and Poisson regression (also expanded into a new chapter).
5. An up-to-date discussion of the issues and procedures involved in fine-tuning a regression analysis is presented in chapters on confounding and interaction in regression, regression diagnostics, and selecting the best model.
6. Numerous examples and exercises illustrate applications to real studies in a wide variety of disciplines. New exercises have been added to all chapters.
7. Representative computer results from packaged programs (primarily using the SAS package) are used to illustrate concepts in the body of the text, as well as to provide a basis for exercises for the reader. We have greatly expanded the quantity of computer results provided throughout the text. Whenever appropriate, we have used computer output to replace material in the previous edition that unnecessarily emphasized numerical calculations.
8. The complete set of data for most exercises is provided, along with related computer results. This allows the instructor to assign computer work based on available packaged programs. However, if the instructional objectives involve a minimum of computer work, the instructor can use the computer results to give the student practical experience in interpreting computer output based on the techniques described in the text.
9. The reorganization and expansion of the material on maximum likelihood methods into three chapters (22-24) provide a strong foundation for understanding the most widely used method for fitting mathematical models involving several variables.
10. A new chapter on methods for the analysis of repeated measures data (Chapter 21) extends the discussion of ANOVA methods to a rapidly developing area of statistical methodology for the analysis of correlated data.
For formal classroom instruction, the chapters fall naturally into three clusters: Chapters 4 through 16, on regression analysis; Chapters 17 through 20, on analysis of variance, with optional use of Chapter 21 to introduce the analysis of repeated measures data; and Chapters 22 through 24, on maximum likelihood methods and important applications involving logistic and Poisson regression modeling. For a first course in regression analysis, some of Chapters 11 through 16 may be considered too specialized. For example, Chapter 12 on regression diagnostics and Chapter 16 on selecting the best model might be used in a continuation course on regression modeling, which might also include some of the advanced topics covered in Chapters 21 through 24.
The Teaching Package
A data disk is bound into each copy of the book. This disk contains data for the problems; the data sets are formatted for SAS, StataQuest, Minitab, and in ASCII. A Student Solutions Manual contains complete solutions for all of the problems for which answers are given in Appendix D, and a Solutions Manual, available to adopting instructors, contains complete solutions to all problems in the book.
Acknowledgments
We wish to acknowledge several people who contributed to the preparation of this text.
Drs. Kleinbaum and Kupper continue to be indebted to John Cassel and Bernard Greenberg, two mentors who have provided us with inspiration and the professional and administrative guidance that enabled us to gain the broad experience necessary to write this book. Dr. Muller adds his thanks to Bernard Greenberg. Dr. Kleinbaum also wishes to thank John Boring, Chair of the Epidemiology Department at Emory University, for his strong support and encouragement and for his deep commitment to teaching excellence. Dr. Kupper wishes to thank Barry Margolin, Chair of the Biostatistics Department at the University of North Carolina, for his leadership and support. Azhar Nizam wishes to thank the chair of his department, Dr. Vicki Hertzberg, Department of Biostatistics at Emory University.
We also wish to thank Edna Kleinbaum, Sandy Martin, Sally Muller, and Janet Nizam for their encouragement and support during the writing of this revision. We thank our many students and colleagues at Emory University and at the University of North Carolina for their helpful comments and suggestions. We also want to thank the reviewers: Robert J. Anderson, University of Illinois at Chicago; Alfred A. Bartolucci, The University of Alabama at Birmingham; Robert Cochran, University of Wyoming; Joseph L. Fleiss, Columbia University Medical Center; James E. Holstein, University of Missouri at Columbia; Robin H. Lock, St. Lawrence University; Frank P. Mathur, Cal Poly at Pomona; and Satya N. Mishra, University of South Alabama. Finally, we thank those persons responsible for publishing this book: Alex Kugushev, Jamie Sue Brooks, and Dusty Davidson.
1 CONCEPTS AND EXAMPLES OF RESEARCH
1-1 Concepts 1
1-2 Examples 2
1-3 Concluding Remarks
References 6
2 CLASSIFICATION OF VARIABLES AND THE CHOICE OF ANALYSIS
2-1 Classification of Variables 7
2-2 Overlapping of Classification Schemes 11
2-3 Choice of Analysis 11
References 13
3 BASIC STATISTICS: A REVIEW 14
3-1 Preview 14
3-2 Descriptive Statistics 15
3-3 Random Variables and Distributions 16
3-4 Sampling Distributions of t, χ², and F 19
3-5 Statistical Inference: Estimation 21
3-6 Statistical Inference: Hypothesis Testing 24
3-7 Error Rates, Power, and Sample Size 28
Problems 30
References 33
4 INTRODUCTION TO REGRESSION ANALYSIS 34
4-1 Preview 34
4-2 Association versus Causality 35
4-3 Statistical versus Deterministic Models 37
4-4 Concluding Remarks 38
References 38
5 STRAIGHT-LINE REGRESSION ANALYSIS 39
5-1 Preview 39
5-2 Regression with a Single Independent Variable 39
5-3 Mathematical Properties of a Straight Line 42
5-4 Statistical Assumptions for a Straight-line Model 43
5-5 Determining the Best-fitting Straight Line 47
5-6 Measure of the Quality of the Straight-line Fit and Estimate of σ² 51
5-7 Inferences About the Slope and Intercept 52
5-8 Interpretations of Tests for Slope and Intercept 54
5-9 Inferences About the Regression Line μY|X = β0 + β1X 57
5-10 Prediction of a New Value of Y at X0 59
5-11 Assessing the Appropriateness of the Straight-line Model 60
Problems 60
References 87
6 THE CORRELATION COEFFICIENT AND STRAIGHT-LINE REGRESSION ANALYSIS 88
6-1 Definition of r 88
6-2 r as a Measure of Association 89
6-3 The Bivariate Normal Distribution 90
6-4 r and the Strength of the Straight-line Relationship 93
6-5 What r Does Not Measure 95
6-6 Tests of Hypotheses and Confidence Intervals for the Correlation Coefficient
6-7 Testing for the Equality of Two Correlations 99
Problems 101
References 103
7 THE ANALYSIS-OF-VARIANCE TABLE 104
7-1 Preview 104
7-2 The ANOVA Table for Straight-line Regression 104
Problems 108
8 MULTIPLE REGRESSION ANALYSIS: GENERAL CONSIDERATIONS 111
8-1 Preview 111
8-2 Multiple Regression Models 112
8-3 Graphical Look at the Problem 113
8-4 Assumptions of Multiple Regression 115
8-5 Determining the Best Estimate of the Multiple Regression Equation 118
8-6 The ANOVA Table for Multiple Regression 119
8-7 Numerical Examples 121
Problems 123
References 135
9 TESTING HYPOTHESES IN MULTIPLE REGRESSION 136
9-1 Preview 136
9-2 Test for Significant Overall Regression 137
9-3 Partial F Test 138
9-4 Multiple Partial F Test 143
9-5 Strategies for Using Partial F Tests 145
9-6 Tests Involving the Intercept 150
Problems 151
References 159
10 CORRELATIONS: MULTIPLE, PARTIAL, AND MULTIPLE PARTIAL 160
10-1 Preview 160
10-2 Correlation Matrix 161
10-3 Multiple Correlation Coefficient 162
10-4 Relationship of RY|X1, X2, ..., Xk to the Multivariate Normal Distribution 164
10-5 Partial Correlation Coefficient 165
10-6 Alternative Representation of the Regression Model 172
10-7 Multiple Partial Correlation 172
10-8 Concluding Remarks 174
Problems 174
Reference 185
11 CONFOUNDING AND INTERACTION IN REGRESSION
11-1 Preview 186
11-2 Overview 186
11-3 Interaction in Regression 188
11-4 Confounding in Regression 194
11-5 Summary and Conclusions 199
Problems 199
Reference 211
12 REGRESSION DIAGNOSTICS 212
12-1 Preview 212
12-2 Simple Approaches to Diagnosing Problems in Data 212
12-3 Residual Analysis 216
12-4 Treating Outliers 228
12-5 Collinearity 237
12-6 Scaling Problems 248
12-7 Treating Collinearity and Scaling Problems 248
12-8 Alternate Strategies of Analysis 249
12-9 An Important Caution 252
Problems 253
References 279
13 POLYNOMIAL REGRESSION
13-1 Preview 281
13-2 Polynomial Models 282
13-3 Least-squares Procedure for Fitting a Parabola 282
13-4 ANOVA Table for Second-order Polynomial Regression 284
13-5 Inferences Associated with Second-order Polynomial Regression 284
13-6 Example Requiring a Second-order Model 286
13-7 Fitting and Testing Higher-order Models 290
13-8 Lack-of-fit Tests 290
13-9 Orthogonal Polynomials 292
13-10 Strategies for Choosing a Polynomial Model 301
Problems 302
14 DUMMY VARIABLES IN REGRESSION 317
14-1 Preview 317
14-2 Definitions 317
14-3 Rule for Defining Dummy Variables 318
14-4 Comparing Two Straight-line Regression Equations: An Example 319
14-5 Questions for Comparing Two Straight Lines 320
14-6 Methods of Comparing Two Straight Lines 321
14-7 Method I: Using Separate Regression Fits to Compare Two Straight Lines 322
14-8 Method II: Using a Single Regression Equation to Compare Two Straight Lines 327
14-9 Comparison of Methods I and II 330
14-10 Testing Strategies and Interpretation: Comparing Two Straight Lines 330
14-11 Other Dummy Variable Models 332
14-12 Comparing Four Regression Equations 334
14-13 Comparing Several Regression Equations Involving Two Nominal Variables 336
Problems 338
References 360
15 ANALYSIS OF COVARIANCE AND OTHER METHODS FOR ADJUSTING CONTINUOUS DATA 361
15-1 Preview 361
15-2 Adjustment Problem 361
15-3 Analysis of Covariance 363
15-4 Assumption of Parallelism: A Potential Drawback 365
15-5 Analysis of Covariance: Several Groups and Several Covariates 366
15-6 Comments and Cautions 368
15-7 Summary 371
Problems 371
Reference 385
16 SELECTING THE BEST REGRESSION EQUATION 386
16-1 Preview 386
16-2 Steps in Selecting the Best Regression Equation 387
16-3 Step 1: Specifying the Maximum Model 387
16-4 Step 2: Specifying a Criterion for Selecting a Model 390
16-5 Step 3: Specifying a Strategy for Selecting Variables 392
16-6 Step 4: Conducting the Analysis 401
16-7 Step 5: Evaluating Reliability with Split Samples 401
16-8 Example Analysis of Actual Data 403
16-9 Issues in Selecting the Most Valid Model 409
Problems 409
References 422
17 ONE-WAY ANALYSIS OF VARIANCE 423
17-1 Preview 423
17-2 One-way ANOVA: The Problem, Assumptions, and Data Configuration 426
17-3 Methodology for One-way Fixed-effects ANOVA 429
17-4 Regression Model for Fixed-effects One-way ANOVA 435
17-5 Fixed-effects Model for One-way ANOVA 438
17-6 Random-effects Model for One-way ANOVA 440
17-7 Multiple-comparison Procedures for Fixed-effects One-way ANOVA 443
17-8 Choosing a Multiple-comparison Technique 456
17-9 Orthogonal Contrasts and Partitioning an ANOVA Sum of Squares 457
Problems 463
References 483
18 RANDOMIZED BLOCKS: SPECIAL CASE OF TWO-WAY ANOVA
18-1 Preview 484
18-2 Equivalent Analysis of a Matched Pairs Experiment 488
18-3 Principle of Blocking 491
18-4 Analysis of a Randomized-blocks Experiment 493
18-5 ANOVA Table for a Randomized-blocks Experiment 495
18-6 Regression Models for a Randomized-blocks Experiment 499
18-7 Fixed-effects ANOVA Model for a Randomized-blocks Experiment 502
Problems 503
References 515
19 TWO-WAY ANOVA WITH EQUAL CELL NUMBERS 516
19-1 Preview 516
19-2 Using a Table of Cell Means 518
19-3 General Methodology 522
19-4 F Tests for Two-way ANOVA 527
19-5 Regression Model for Fixed-effects Two-way ANOVA 530
19-6 Interactions in Two-way ANOVA 534
19-7 Random- and Mixed-effects Two-way ANOVA Models 541
Problems 544
References 560
20 TWO-WAY ANOVA WITH UNEQUAL CELL NUMBERS 561
20-1 Preview 561
20-2 Problems with Unequal Cell Numbers: Nonorthogonality 563
20-3 Regression Approach for Unequal Cell Sample Sizes 567
20-4 Higher-way ANOVA 571
Problems 572
References 588
21 ANALYSIS OF REPEATED MEASURES DATA 589
21-1 Preview 589
21-2 Examples 590
21-3 General Approach for Repeated Measures ANOVA 592
21-4 Overview of Selected Repeated Measures Designs and ANOVA-based Analyses 594
21-5 Repeated Measures ANOVA for Unbalanced Data 611
21-6 Other Approaches to Analyzing Repeated Measures Data 612
Appendix 21-A Examples of SAS's GLM and MIXED Procedures 613
Problems 616
References 638
22 THE METHOD OF MAXIMUM LIKELIHOOD 639
22-1 Preview 639
22-2 The Principle of Maximum Likelihood 639
22-3 Statistical Inference via Maximum Likelihood 642
22-4 Summary 652
Problems 653
References 655
23 LOGISTIC REGRESSION ANALYSIS 656
23-1 Preview 656
23-2 The Logistic Model 656
23-3 Estimating the Odds Ratio Using Logistic Regression 658
23-4 A Numerical Example of Logistic Regression 664
23-5 Theoretical Considerations 671
23-6 An Example of Conditional ML Estimation Involving Pair-matched Data with Unmatched Covariates 677
23-7 Summary 681
Problems 682
References 686