An Introduction to Parallel Programming (English Edition)
Author: Peter S. Pacheco (USA), University of San Francisco
Series: 经典原版书库 (Classic Original Editions)
Publication date: 2011-09-22
ISBN: 978-7-111-35828-2
Price: RMB 65.00
Additional Information
Language: English
Pages: 386
Format: 16开
Original title: An Introduction to Parallel Programming
Original publisher: Elsevier
Category: Textbook
Book Description

This book shows, step by step, how to develop efficient parallel programs with MPI, Pthreads, and OpenMP, and teaches readers how to write and debug distributed-memory and shared-memory programs and how to evaluate their performance.

Preface

Parallel hardware has been ubiquitous for some time now. It’s difficult to find a laptop,
desktop, or server that doesn’t use a multicore processor. Beowulf clusters are
nearly as common today as high-powered workstations were during the 1990s, and
cloud computing could make distributed-memory systems as accessible as desktops.
In spite of this, most computer science majors graduate with little or no experience in
parallel programming. Many colleges and universities offer upper-division elective
courses in parallel computing, but since most computer science majors have to take
numerous required courses, many graduate without ever writing a multithreaded or
multiprocess program.
It seems clear that this state of affairs needs to change. Although many programs
can obtain satisfactory performance on a single core, computer scientists should be
made aware of the potentially vast performance improvements that can be obtained
with parallelism, and they should be able to exploit this potential when the need
arises.
An Introduction to Parallel Programming was written to partially address this
problem. It provides an introduction to writing parallel programs using MPI,
Pthreads, and OpenMP—three of the most widely used application programming
interfaces (APIs) for parallel programming. The intended audience is students and
professionals who need to write parallel programs. The prerequisites are minimal:
a college-level course in mathematics and the ability to write serial programs
in C. They are minimal because we believe that students should be able to start
programming parallel systems as early as possible.
At the University of San Francisco, computer science students can fulfill a
requirement for the major by taking the course, on which this text is based, immediately
after taking the “Introduction to Computer Science I” course that most majors
take in the first semester of their freshman year. We’ve been offering this course
in parallel computing for six years now, and it has been our experience that there
really is no reason for students to defer writing parallel programs until their junior
or senior year. To the contrary, the course is popular, and students have found that
using concurrency in other courses is much easier after having taken the Introduction
course.
If second-semester freshmen can learn to write parallel programs by taking a
class, then motivated computing professionals should be able to learn to write parallel
programs through self-study. We hope this book will prove to be a useful resource
for them.
About This Book
As we noted earlier, the main purpose of the book is to teach parallel programming in
MPI, Pthreads, and OpenMP to an audience with a limited background in computer
science and no previous experience with parallelism. We also wanted to make it as
flexible as possible so that readers who have no interest in learning one or two of
the APIs can still read the remaining material with little effort. Thus, the chapters on
the three APIs are largely independent of each other: they can be read in any order,
and one or two of these chapters can be bypassed. This independence has a cost: It
was necessary to repeat some of the material in these chapters. Of course, repeated
material can be simply scanned or skipped.
Readers with no prior experience with parallel computing should read Chapter 1
first. It attempts to provide a relatively nontechnical explanation of why parallel systems
have come to dominate the computer landscape. The chapter also provides a
short introduction to parallel systems and parallel programming.
Chapter 2 provides some technical background in computer hardware and software.
Much of the material on hardware can be scanned before proceeding to the
API chapters. Chapters 3, 4, and 5 are the introductions to programming with MPI,
Pthreads, and OpenMP, respectively.
In Chapter 6 we develop two longer programs: a parallel n-body solver and a
parallel tree search. Both programs are developed using all three APIs. Chapter 7
provides a brief list of pointers to additional information on various aspects of parallel
computing.
We use the C programming language for developing our programs because all
three APIs have C-language interfaces, and, since C is such a small language, it is
a relatively easy language to learn—especially for C++ and Java programmers, since
they are already familiar with C’s control structures.
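
For readers who have never seen any of these APIs, the following sketch suggests the kind of C program the MPI chapter starts from. It is an illustrative example written for this description, not code reproduced from the text, and it assumes an MPI installation that provides the usual mpicc compiler wrapper and mpiexec launcher.

    /* Minimal MPI "hello, world" sketch (illustrative, not from the book).
       Each process reports its rank and the total number of processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char* argv[]) {
        int comm_sz;    /* total number of processes in MPI_COMM_WORLD */
        int my_rank;    /* rank of this process                        */

        MPI_Init(&argc, &argv);                    /* start up MPI           */
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);   /* how many processes?    */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   /* which process is this? */

        printf("Hello from process %d of %d\n", my_rank, comm_sz);

        MPI_Finalize();                            /* shut down MPI          */
        return 0;
    }

Under those assumptions the program would typically be built and run with commands such as mpicc -o mpi_hello mpi_hello.c and mpiexec -n 4 ./mpi_hello, producing one greeting per process, in no particular order.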
Classroom Use
This text grew out of a lower-division undergraduate course at the University of San
Francisco. The course fulfills a requirement for the computer science major, and it
also fulfills a prerequisite for the undergraduate operating systems course. The only
prerequisites for the course are either a grade of “B” or better in a one-semester
introduction to computer science or a “C” or better in a two-semester introduction
to computer science. The course begins with a four-week introduction to C programming.
Since most students have already written Java programs, the bulk of what is
covered is devoted to the use of pointers in C.[1] The remainder of the course provides
introductions to programming in MPI, Pthreads, and OpenMP.
We cover most of the material in Chapters 1, 3, 4, and 5, and parts of the material
in Chapters 2 and 6. The background in Chapter 2 is introduced as the need arises.
For example, before discussing cache coherence issues in OpenMP (Chapter 5), we
cover the material on caches in Chapter 2.
The coursework consists of weekly homework assignments, five programming
assignments, a couple of midterms, and a final exam. The homework usually involves
writing a very short program or making a small modification to an existing program.
Their purpose is to ensure that students stay current with the coursework and to give
them hands-on experience with the ideas introduced in class. It seems likely that the
existence of the assignments has been one of the principal reasons for the course's
success. Most of the exercises in the text are suitable for these brief assignments.
[1] Interestingly, a number of students have said that they found the use of C pointers more difficult than MPI programming.
The programming assignments are larger than the programs written for homework,
but we typically give students a good deal of guidance: We’ll frequently
include pseudocode in the assignment and discuss some of the more difficult aspects
in class. This extra guidance is often crucial: It’s not difficult to give programming
assignments that will take far too long for the students to complete. The results of the
midterms and finals, and the enthusiastic reports of the professor who teaches operating
systems, suggest that the course is actually very successful in teaching students
how to write parallel programs.
For more advanced courses in parallel computing, the text and its online support
materials can serve as a supplement so that much of the information on the syntax
and semantics of the three APIs can be assigned as outside reading. The text can also
be used as a supplement for project-based courses and courses outside of computer
science that make use of parallel computation.
Support Materials
The book’s website is located at http://www.mkp.com/pacheco. It will include
errata and links to sites with related materials. Faculty will be able to download
complete lecture notes, figures from the text, and solutions to the exercises and the
programming assignments. All users will be able to download the longer programs
discussed in the text.
We would greatly appreciate readers letting us know of any errors they find.
Please send an email to peter@usfca.edu if you do find a mistake.
Acknowledgments
In the course of working on this book, I’ve received considerable help from many
individuals. Among them I’d like to thank the reviewers who read and commented on
the initial proposal: Fikret Ercal (Missouri University of Science and Technology),
Dan Harvey (Southern Oregon University), Joel Hollingsworth (Elon University),
Jens Mache (Lewis and Clark College), Don McLaughlin (West Virginia University),
Manish Parashar (Rutgers University), Charlie Peck (Earlham College),
Stephen C. Renk (North Central College), Rolfe Josef Sassenfeld (The University
of Texas at El Paso), Joseph Sloan (Wofford College), Michela Taufer (University
of Delaware), Pearl Wang (George Mason University), Bob Weems (University of
Texas at Arlington), and Cheng-Zhong Xu (Wayne State University).
I’m also deeply grateful to the following individuals for their reviews of various
chapters of the book: Duncan Buell (University of South Carolina), Matthias
Gobbert (University of Maryland, Baltimore County), Krishna Kavi (University of
North Texas), Hong Lin (University of Houston–Downtown), Kathy Liszka (University
of Akron), Leigh Little (The State University of New York), Xinlian Liu (Hood
College), Henry Tufo (University of Colorado at Boulder), Andrew Sloss (Consultant
Engineer, ARM), and Gengbin Zheng (University of Illinois). Their comments
and suggestions have made the book immeasurably better. Of course, I’m solely
responsible for remaining errors and omissions.
Kathy Liszka is also preparing slides that can be used by faculty who adopt
the text, and a former student, Jinyoung Choi, is working on preparing a solutions
manual. Thanks to both of them.
The staff of Morgan Kaufmann has been very helpful throughout this project. I’m
especially grateful to the developmental editor, Nate McFadden. He gave me much
valuable advice, and he did a terrific job arranging for the reviews. He’s also been
tremendously patient with all the problems I’ve encountered over the past few years.
Thanks also to Marilyn Rash and Megan Guiney, who have been very prompt and
efficient during the production process.
My colleagues in the computer science and mathematics departments at USF have
been extremely helpful during my work on the book. I’d like to single out Professor
Gregory Benson for particular thanks: his understanding of parallel computing—
especially Pthreads and semaphores—has been an invaluable resource for me. I’m
also very grateful to our system administrators, Alexey Fedosov and Colin Bean.
They’ve patiently and efficiently dealt with all of the “emergencies” that cropped up
while I was working on programs for the book.
I would never have been able to finish this book without the encouragement and
moral support of my friends Holly Cohn, John Dean, and Robert Miller. They helped
me through some very difficult times, and I’ll be eternally grateful to them.
My biggest debt is to my students. They showed me what was too easy, and what
was far too difficult. In short, they taught me how to teach parallel computing. My
deepest thanks to all of them.
About the Author
Peter Pacheco received a PhD in mathematics from Florida State University. After
completing graduate school, he became one of the first professors in UCLA’s “Program
in Computing,” which teaches basic computer science to students at the College
of Letters and Sciences there. Since leaving UCLA, he has been on the faculty of
the University of San Francisco. At USF Peter has served as chair of the computer
science department and is currently chair of the mathematics department.
His research is in parallel scientific computing. He has worked on the development
of parallel software for circuit simulation, speech recognition, and the simulation
of large networks of biologically accurate neurons. Peter has been teaching
parallel computing at both the undergraduate and graduate levels for nearly twenty
years. He is the author of Parallel Programming with MPI, published by Morgan
Kaufmann Publishers.

Shelving Category

Computer Science and Applications

Back Cover Copy

There is no doubt that, with the widespread adoption of multicore processors and cloud computing systems, parallel computing is no longer an obscure niche of the computing world. Parallelism has become the key to using resources effectively, and Peter Pacheco's new textbook is a great help to beginners learning the art and practice of parallel computing.
                Duncan Buell, Department of Computer Science and Engineering, University of South Carolina
This book covers two increasingly important areas: shared-memory programming with Pthreads and OpenMP, and distributed-memory programming with MPI. More importantly, by pointing out potential performance pitfalls, it emphasizes the importance of good programming practice. It introduces these topics in the context of several disciplines, including computer science, physics, and mathematics, and its chapters include programming exercises of varying difficulty. For students and professionals who want to learn parallel programming skills and broaden their knowledge, it is an ideal reference.
                Leigh Little, Department of Computer Science, The College at Brockport, State University of New York
This is a carefully written, comprehensive introduction to parallel computing. Students as well as practitioners in related fields will benefit greatly from the up-to-date information it provides. With an accessible writing style and a variety of interesting examples, the author keeps the book engaging. In the fast-changing, ever-evolving field of parallel computing, the book offers a thorough yet approachable treatment of every aspect of parallel software and hardware.
                Kathy J. Liszka, Department of Computer Science, University of Akron
Parallel computing is the future! This book introduces a complex subject through practical and instructive examples.
                Andrew N. Sloss, FBCS, Consultant Engineer, ARM, author of ARM System Developer's Guide
Parallel programming is no longer a subject reserved for specialists. To fully exploit the computing power of clusters and multicore processors, learning both distributed-memory and shared-memory parallel programming techniques is indispensable. This book shows, step by step, how to develop efficient parallel programs with MPI, Pthreads, and OpenMP, and teaches readers how to write and debug distributed-memory and shared-memory programs and how to evaluate their performance.

Features
• Takes a tutorial approach, starting with short programming examples and working step by step toward more challenging programs.
• Focuses on the design, debugging, and performance evaluation of distributed-memory and shared-memory programs.
• Emphasizes hands-on development of parallel programs using the MPI, Pthreads, and OpenMP programming models.

About the Author

Peter S. Pacheco (USA), University of San Francisco: Peter Pacheco received a PhD in mathematics from Florida State University. He has served as chair of the computer science department at the University of San Francisco and is currently chair of its mathematics department. For nearly twenty years he has taught parallel computing at both the undergraduate and graduate levels.

Table of Contents

CHAPTER 1 Why Parallel Computing?  1
1.1 Why We Need Ever-Increasing Performance  2
1.2 Why We're Building Parallel Systems  3
1.3 Why We Need to Write Parallel Programs  3
1.4 How Do We Write Parallel Programs?  6
1.5 What We'll Be Doing  8
1.6 Concurrent, Parallel, Distributed  9
1.7 The Rest of the Book  10
1.8 A Word of Warning  10
1.9 Typographical Conventions  11
1.10 Summary  12
1.11 Exercises  12
CHAPTER 2 Parallel Hardware and Parallel Software  15
2.1 Some Background  15
2.1.1 The von Neumann architecture  15
2.1.2 Processes, multitasking, and threads  17
2.2 Modifications to the von Neumann Model  18
2.2.1 The basics of caching  19
2.2.2 Cache mappings  20
2.2.3 Caches and programs: an example  22
2.2.4 Virtual memory  23
2.2.5 Instruction-level parallelism  25
2.2.6 Hardware multithreading  28
2.3 Parallel Hardware  29
2.3.1 SIMD systems  29
2.3.2 MIMD systems  32
2.3.3 Interconnection networks  35
2.3.4 Cache coherence  43
2.3.5 Shared-memory versus distributed-memory  46
2.4 Parallel Software  47
2.4.1 Caveats  47
2.4.2 Coordinating the processes/threads  48
2.4.3 Shared-memory  49
2.4.4 Distributed-memory  53
2.4.5 Programming hybrid systems  56
2.5 Input and Output  56
2.6 Performance  58
2.6.1 Speedup and efficiency  58
2.6.2 Amdahl's law  61
2.6.3 Scalability  62
2.6.4 Taking timings  63
2.7 Parallel Program Design  65
2.7.1 An example  66
2.8 Writing and Running Parallel Programs  70
2.9 Assumptions  70
2.10 Summary  71
2.10.1 Serial systems  71
2.10.2 Parallel hardware  73
2.10.3 Parallel software  74
2.10.4 Input and output  75
2.10.5 Performance  75
2.10.6 Parallel program design  76
2.10.7 Assumptions  76
2.11 Exercises  77
CHAPTER 3 Distributed-Memory Programming with MPI  83
3.1 Getting Started  84
3.1.1 Compilation and execution  84
3.1.2 MPI programs  86
3.1.3 MPI_Init and MPI_Finalize  86
3.1.4 Communicators, MPI_Comm_size and MPI_Comm_rank  87
3.1.5 SPMD programs  88
3.1.6 Communication  88
3.1.7 MPI_Send  88
3.1.8 MPI_Recv  90
3.1.9 Message matching  91
3.1.10 The status_p argument  92
3.1.11 Semantics of MPI_Send and MPI_Recv  93
3.1.12 Some potential pitfalls  94
3.2 The Trapezoidal Rule in MPI  94
3.2.1 The trapezoidal rule  94
3.2.2 Parallelizing the trapezoidal rule  96
3.3 Dealing with I/O  97
3.3.1 Output  97
3.3.2 Input  100
3.4 Collective Communication  101
3.4.1 Tree-structured communication  102
3.4.2 MPI_Reduce  103
3.4.3 Collective vs. point-to-point communications  105
3.4.4 MPI_Allreduce  106
3.4.5 Broadcast  106
3.4.6 Data distributions  109
3.4.7 Scatter  110
3.4.8 Gather  112
3.4.9 Allgather  113
3.5 MPI Derived Datatypes  116
3.6 Performance Evaluation of MPI Programs  119
3.6.1 Taking timings  119
3.6.2 Results  122
3.6.3 Speedup and efficiency  125
3.6.4 Scalability  126
3.7 A Parallel Sorting Algorithm  127
3.7.1 Some simple serial sorting algorithms  127
3.7.2 Parallel odd-even transposition sort  129
3.7.3 Safety in MPI programs  132
3.7.4 Final details of parallel odd-even sort  134
3.8 Summary  136
3.9 Exercises  140
3.10 Programming Assignments  147
CHAPTER 4 Shared-Memory Programming with Pthreads  151
4.1 Processes, Threads, and Pthreads  151
4.2 Hello, World  153
4.2.1 Execution  153
4.2.2 Preliminaries  155
4.2.3 Starting the threads  156
4.2.4 Running the threads  157
4.2.5 Stopping the threads  158
4.2.6 Error checking  158
4.2.7 Other approaches to thread startup  159
4.3 Matrix-Vector Multiplication  159
4.4 Critical Sections  162
4.5 Busy-Waiting  165
4.6 Mutexes  168
4.7 Producer-Consumer Synchronization and Semaphores  171
4.8 Barriers and Condition Variables  176
4.8.1 Busy-waiting and a mutex  177
4.8.2 Semaphores  177
4.8.3 Condition variables  179
4.8.4 Pthreads barriers  181
4.9 Read-Write Locks  181
4.9.1 Linked list functions  181
4.9.2 A multi-threaded linked list  183
4.9.3 Pthreads read-write locks  187
4.9.4 Performance of the various implementations  188
4.9.5 Implementing read-write locks  190
4.10 Caches, Cache Coherence, and False Sharing  190
4.11 Thread-Safety  195
4.11.1 Incorrect programs can produce correct output  198
4.12 Summary  198
4.13 Exercises  200
4.14 Programming Assignments  206
CHAPTER 5 Shared-Memory Programming with OpenMP  209
5.1 Getting Started  210
5.1.1 Compiling and running OpenMP programs  211
5.1.2 The program  212
5.1.3 Error checking  215
5.2 The Trapezoidal Rule  216
5.2.1 A first OpenMP version  216
5.3 Scope of Variables  220
5.4 The Reduction Clause  221
5.5 The parallel for Directive  224
5.5.1 Caveats  225
5.5.2 Data dependences  227
5.5.3 Finding loop-carried dependences  228
5.5.4 Estimating π  229
5.5.5 More on scope  231
5.6 More About Loops in OpenMP: Sorting  232
5.6.1 Bubble sort  232
5.6.2 Odd-even transposition sort  233
5.7 Scheduling Loops  236
5.7.1 The schedule clause  237
5.7.3 The dynamic and guided schedule types  239
5.7.4 The runtime schedule type  239
5.7.5 Which schedule?  241
5.8 Producers and Consumers  241
5.8.1 Queues  241
5.8.2 Message-passing  242
5.8.3 Sending messages  243
5.8.4 Receiving messages  243
5.8.5 Termination detection  244
5.8.6 Startup  244
5.8.7 The atomic directive  245
5.8.8 Critical sections and locks  246
5.8.9 Using locks in the message-passing program  248
5.8.10 critical directives, atomic directives, or locks  249
5.8.11 Some caveats  249
5.9 Caches, Cache Coherence, and False Sharing  251
5.10 Thread-Safety  256
5.10.1 Incorrect programs can produce correct output  258
5.11 Summary  259
5.12 Exercises  263
5.13 Programming Assignments  267
CHAPTER 6 Parallel Program Development  271
6.1 Two n-Body Solvers  271
6.1.1 The problem  271
6.1.2 Two serial programs  273
6.1.3 Parallelizing the n-body solvers  277
6.1.4 A word about I/O  280
6.1.5 Parallelizing the basic solver using OpenMP  281
6.1.6 Parallelizing the reduced solver using OpenMP  284
6.1.7 Evaluating the OpenMP codes  288
6.1.8 Parallelizing the solvers using pthreads  289
6.1.9 Parallelizing the basic solver using MPI  290
6.1.10 Parallelizing the reduced solver using MPI  292
6.1.11 Performance of the MPI solvers  297
6.2 Tree Search  299
6.2.1 Recursive depth-first search  302
6.2.2 Nonrecursive depth-first search  303
6.2.3 Data structures for the serial implementations  305
6.2.6 A static parallelization of tree search using pthreads  309
6.2.7 A dynamic parallelization of tree search using pthreads  310
6.2.8 Evaluating the pthreads tree-search programs  315
6.2.9 Parallelizing the tree-search programs using OpenMP  316
6.2.10 Performance of the OpenMP implementations  318
6.2.11 Implementation of tree search using MPI and static partitioning  319
6.2.12 Implementation of tree search using MPI and dynamic partitioning  327
6.3 A Word of Caution  335
6.4 Which API?  335
6.5 Summary  336
6.5.1 Pthreads and OpenMP  337
6.5.2 MPI  338
6.6 Exercises  341
6.7 Programming Assignments  350
CHAPTER 7 Where to Go from Here  353
References  357
Index  361
