Computer Organization and Design: The Hardware/Software Interface (English Edition, 4th Edition, ARM Edition)
Authors: David A. Patterson; John L. Hennessy
Series: Classic Original Editions
Publication date: 2010-04-19
ISBN: 978-7-111-30288-9
Price: CNY 95.00
Supplementary teaching resource downloads
Additional information
Language: English
Pages: 735
Format: 16mo
Original title: Computer Organization and Design: The Hardware/Software Interface, ARM Edition (4th Edition)
Original publisher: Elsevier
Category: Textbook
Includes CD:
Out of print:
Book Description

The best-known undergraduate textbook on computer organization, this edition is oriented toward embedded computing systems and based on the ARM platform. As in previous editions, a MIPS processor is also used to present the fundamentals of hardware technologies, pipelining, memory hierarchies, and I/O. The book also includes an introduction to the x86 architecture.

About the Authors

David A. Patterson is a professor of computer science at the University of California, Berkeley, a member of the National Academy of Engineering, and a Fellow of both the IEEE and the ACM. He received the IEEE James H. Mulligan, Jr. Education Medal for his successful and innovative teaching. His contributions to RISC technology earned him the 1995 IEEE Technical Achievement Award, and his work on RAID earned him the 1999 IEEE Reynold B. Johnson Information Storage Award. In 2000 he shared the John von Neumann Medal with John L. Hennessy.
John L. Hennessy is the president of Stanford University, a Fellow of the IEEE and ACM, and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. Professor Hennessy received the 2001 Eckert-Mauchly Award for his outstanding contributions to RISC technology; he was also the 2001 recipient of the Seymour Cray Computer Engineering Award, and he shared the 2000 John von Neumann Medal with David A. Patterson.
"What is special about this edition is that it uses ARM, in place of the MIPS core used in earlier editions, to teach the fundamental principles of computer design, adding another dimension to the book. As the mainstream processor of the embedded arena, ARM is of great importance to embedded computing. This book fills a gap in the current curriculum by teaching the fundamentals of computer organization specifically to students of embedded systems. As in previous editions, the book centers on the computer hardware/software interface and skillfully connects it to the basics of embedded system design."

Key Features of This Book
Uses ARMv6 (ARM11 family) as the primary architecture to present the fundamentals of instruction sets and computer arithmetic.
Covers the revolutionary shift from sequential to parallel computing, with a new chapter on parallelism and sections in every chapter highlighting parallel hardware and software topics.
Includes a new appendix, written by NVIDIA's chief scientist and chief architect, on the emergence and importance of the modern GPU: the first in-depth description of this highly parallel, multithreaded, multicore processor optimized for visual computing.
Describes the Roofline model, a unique method for measuring multicore performance, and uses it to benchmark and analyze the AMD Opteron X4, Intel Xeon 5000, Sun UltraSPARC T2, and IBM Cell.
Includes new content on Flash memory and virtual machines.
Provides more than 200 pages of stimulating new exercises.
Uses the AMD Opteron X4 and Intel Nehalem as running examples throughout the book.
Updates all processor performance examples with SPEC CPU2006 benchmarks.
This best-selling computer organization text has been thoroughly updated to address the revolutionary change taking place in computer architecture: the shift from uniprocessors to multicore microprocessors. The ARM edition was published to highlight the importance of embedded systems to the computing industry throughout Asia, and it uses an ARM processor to discuss the instruction set and arithmetic of a real computer, because ARM is the most popular instruction set architecture for embedded devices, with about 4 billion sold worldwide each year. As in previous editions, a MIPS processor is used to present the fundamentals of hardware technologies, pipelining, memory hierarchies, and I/O. The book also includes an introduction to the x86 architecture.

图书前言

The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.
Albert Einstein, What I Believe, 1930
About This Book
We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shaping computing. We also feel that readers in every specialty of computing need to appreciate the organizational paradigms that determine the capabilities, performance, and, ultimately, the success of computer systems.
Modern computer technology requires professionals of every computing specialty to understand both hardware and software. The interaction between hardware and software at a variety of levels also offers a framework for understanding the fundamentals of computing. Whether your primary interest is hardware or software, computer science or electrical engineering, the central ideas in computer organization and design are the same. Thus, our emphasis in this book is to show the relationship between hardware and software and to focus on the concepts that are the basis for current computers.
The recent switch from uniprocessor to multicore microprocessors confirmed the soundness of this perspective, given since the first edition. While programmers could once ignore that advice and rely on computer architects, compiler writers, and silicon engineers to make their programs run faster without change, that era is now over. For programs to run faster, they must become parallel. While the goal of many researchers is to make it possible for programmers to be unaware of the underlying parallel nature of the hardware they are programming, it will take many years to realize this vision. Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers.
The audience for this book includes those with little experience in assembly language or logic design who need to understand basic computer organization as well as readers with backgrounds in assembly language and/or logic design who want to learn how to design a computer or understand how a system works and why it performs as it does.
About the Other Book
Some readers may be familiar with Computer Architecture: A Quantitative Approach, popularly known as Hennessy and Patterson. (This book in turn is often called Patterson and Hennessy.) Our motivation in writing the earlier book was to describe the principles of computer architecture using solid engineering fundamentals and quantitative cost/performance tradeoffs. We used an approach that combined examples and measurements, based on commercial systems, to create realistic design experiences. Our goal was to demonstrate that computer architecture could be learned using quantitative methodologies instead of a descriptive approach. It was intended for the serious computing professional who wanted a detailed understanding of computers.
A majority of the readers for this book do not plan to become computer architects. The performance and energy efficiency of future software systems will be dramatically affected, however, by how well software designers understand the basic hardware techniques at work in a system. Thus, compiler writers, operating system designers, database programmers, and most other software engineers need a firm grounding in the principles presented in this book. Similarly, hardware designers must understand clearly the effects of their work on software applications.
Thus, we knew that this book had to be much more than a subset of the material in Computer Architecture, and the material was extensively revised to match the different audience. We were so happy with the result that the subsequent editions of Computer Architecture were revised to remove most of the introductory material; hence, there is much less overlap today than with the first editions of both books.
About the ARM Edition
Our goal in designing the ARM edition of Computer Organization and Design was to highlight the importance of embedded systems to the computing industry throughout Asia. We decided to feature the ARM architecture, since ARM is the most popular instruction set architecture for embedded devices, with almost 4 billion devices sold each year. Specifically, we use the ARM core for exploring the instruction set and arithmetic operations of a real computer. As with previous editions, a MIPS processor is used to present the fundamentals of hardware technologies, pipelining, memory hierarchies, and I/O.
Changes for the Fourth Edition
We had five major goals for the fourth edition of Computer Organization and Design: given the multicore revolution in microprocessors, highlight parallel hardware and software topics throughout the book; streamline the existing material to make room for topics on parallelism; enhance pedagogy in general; update the technical content to reflect changes in the industry since the publication of the third edition in 2004; and restore the usefulness of exercises in this Internet age.

Chapter or appendix, with sections:
1. Computer Abstractions and Technology: 1.1 to 1.9; 1.10 (History)
2. Instructions: Language of the Computer: 2.1 to 2.14; 2.15 (Compilers & Java); 2.16 to 2.19; 2.20 (History)
3. Arithmetic for Computers: 3.1 to 3.9; 3.10 (History)
B-1, B-2, B-3. ARM References: B1.1 to B1.5; B2.1 to B2.3; B3.1 to B3.8
C. The Basics of Logic Design: C.1 to C.13
4. The Processor: 4.1 (Overview); 4.2 (Logic Conventions); 4.3 to 4.4 (Simple Implementation); 4.5 (Pipelining Overview); 4.6 (Pipelined Datapath); 4.7 to 4.9 (Hazards, Exceptions); 4.10 to 4.11 (Parallel, Real Stuff); 4.12 (Verilog Pipeline Control); 4.13 to 4.14 (Fallacies); 4.15 (History)
D. Mapping Control to Hardware: D.1 to D.6
5. Large and Fast: Exploiting Memory Hierarchy: 5.1 to 5.8; 5.9 (Verilog Cache Controller); 5.10 to 5.12; 5.13 (History)
6. Storage and Other I/O Topics: 6.1 to 6.10; 6.11 (Networks); 6.12 to 6.13; 6.14 (History)
7. Multicores, Multiprocessors, and Clusters: 7.1 to 7.13; 7.14 (History)
A. Graphics Processor Units: A.1 to A.12
Each section is rated separately for software-focused and hardware-focused readers as one of: Read carefully; Review or read; Read if have time; Read for culture; Reference.
Before discussing the goals in detail, let’s look at the table above. It shows the hardware and software paths through the material. Chapters 1, 4, 5, and 7 are found on both paths, no matter what the experience or the focus. Chapter 1 is a new introduction that includes a discussion on the importance of power and how it motivates the switch from single core to multicore microprocessors. It also includes performance and benchmarking material that was a separate chapter in the third edition. Chapter 2 is likely to be review material for the hardware-oriented, but it is essential reading for the software-oriented, especially for those readers interested in learning more about compilers and object-oriented programming languages. It includes material from Chapter 3 in the third edition, making it possible to cover the complete ARM architecture in a single chapter, minus the floating-point instructions. Chapter 3 is for readers interested in constructing a datapath or in learning more about floating-point arithmetic. It uses ARM instructions for the examples. Some will skip Chapter 3, either because they don’t need it or because it is a review. Chapter 4 combines two chapters from the third edition to explain pipelined processors. Sections 4.1, 4.5, and 4.10 give overviews for those with a software focus. Those with a hardware focus, however, will find that this chapter presents core material; they may also, depending on their background, want to read Appendix C on logic design first. Chapter 6 on storage is critical to readers with a software focus, and should be read by others if time permits. The last chapter on multicores, multiprocessors, and clusters is mostly new content and should be read by everyone.
The first goal was to make parallelism a first-class citizen in this edition, whereas it was a separate chapter on the CD in the last edition. The most obvious example is Chapter 7. In particular, this chapter introduces the Roofline performance model, and shows its value by evaluating four recent multicore architectures on two kernels. This model could prove to be as insightful for multicore microprocessors as the 3Cs model is for caches. Given the importance of parallelism, it wasn’t wise to wait until the last chapter to talk about it, so there is a section on parallelism in each of the preceding six chapters:
■ Chapter 1: Parallelism and Power. It shows how power limits have forced the industry to switch to parallelism, and why parallelism helps.
■ Chapter 2: Parallelism and Instructions: Synchronization. This section discusses locks for shared variables, specifically the ARM instruction SWP.
■ Chapter 3: Parallelism and Computer Arithmetic: Floating-Point Associativity. This section discusses the challenges of numerical precision and floating-point calculations.
■ Chapter 4: Parallelism and Advanced Instruction-Level Parallelism. It covers advanced ILP—superscalar, speculation, VLIW, loop-unrolling, and OOO— as well as the relationship between pipeline depth and power consumption.
■ Chapter 5: Parallelism and Memory Hierarchies: Cache Coherence. It introduces coherency, consistency, and snooping cache protocols.
■ Chapter 6: Parallelism and I/O: Redundant Arrays of Inexpensive Disks. It describes RAID as a parallel I/O system as well as a highly available I/O system.
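The floating-point associativity issue flagged for Chapter 3 above is easy to see directly. This small Python sketch (an illustration, not from the book) shows that regrouping a sum, as a parallel reduction would, changes the rounded result:

```python
# Floating-point addition is not associative: the grouping of operands
# changes how intermediate results round, so a parallel reduction can
# produce a (slightly) different answer than a sequential left-to-right sum.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # sequential left-to-right grouping
right = a + (b + c)  # alternative grouping, as a parallel reduction might use

print(left == right)  # prints False: the two groupings round differently
```

The difference is tiny here, but across millions of additions on many cores it is why parallel numeric codes can give run-to-run variations in their results.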
Chapter 7 concludes with reasons for optimism why this foray into parallelism should be more successful than those of the past.
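The Roofline model introduced in Chapter 7 can be sketched in a few lines: attainable throughput is the minimum of peak compute performance and peak memory bandwidth times arithmetic intensity (flops per byte). The peak numbers below are hypothetical placeholders, not measurements from the book:

```python
def roofline(peak_gflops, peak_gb_per_s, arithmetic_intensity):
    """Attainable GFLOP/s = min(peak compute, bandwidth * flops-per-byte)."""
    return min(peak_gflops, peak_gb_per_s * arithmetic_intensity)

# Hypothetical machine: 74 GFLOP/s peak compute, 21 GB/s memory bandwidth.
low = roofline(74.0, 21.0, 0.5)   # low intensity: bandwidth-bound at 10.5 GFLOP/s
high = roofline(74.0, 21.0, 8.0)  # high intensity: compute-bound at 74.0 GFLOP/s
print(low, high)
```

Plotting this bound against arithmetic intensity gives the characteristic slanted "roof" (bandwidth-limited region) meeting a flat ceiling (compute-limited region), which is how Chapter 7 compares kernels across the four multicores.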
I am particularly excited about the addition of an appendix on Graphics Processing Units written by NVIDIA’s chief scientist, David Kirk, and chief architect, John Nickolls. Appendix A is the first in-depth description of GPUs, which is a new and interesting thrust in computer architecture. The appendix builds upon the parallel themes of this edition to present a style of computing that allows the programmer to think MIMD yet the hardware tries to execute in SIMD-style whenever possible. As GPUs are both inexpensive and widely available—they are even found in many laptops—and their programming environments are freely available, they provide a parallel hardware platform that many could experiment with.
The second goal was to streamline the book to make room for new material in parallelism. The first step was simply going through all the paragraphs accumulated over three editions with a fine-toothed comb to see if they were still necessary. The coarse-grained changes were the merging of chapters and dropping of topics. Mark Hill suggested dropping the multicycle processor implementation and instead adding a multicycle cache controller to the memory hierarchy chapter. This allowed the processor to be presented in a single chapter instead of two, enhancing the processor material by omission. The performance material from a separate chapter in the third edition is now blended into the first chapter.
The third goal was to improve the pedagogy of the book. Chapter 1 is now meatier, including performance, integrated circuits, and power, and it sets the stage for the rest of the book. Chapters 2 and 3 were originally written in an evolutionary style, starting with a “single celled” architecture and ending up with the full MIPS architecture by the end of Chapter 3. This leisurely style is not a good match to the modern reader. This edition merges all of the instruction set material for the integer instructions into Chapter 2—making Chapter 3 optional for many readers—and each section now stands on its own. The reader no longer needs to read all of the preceding sections. Hence, Chapter 2 is now even better as a reference than it was in prior editions. Chapter 4 works better since the processor is now a single chapter, as the multicycle implementation is a distraction today. Chapter 5 has a new section on building cache controllers, along with a new CD section containing the Verilog code for that cache.
The accompanying CD-ROM introduced in the third edition allowed us to reduce the cost of the book by saving pages as well as to go into greater depth on topics that were of interest to some but not all readers. Alas, in our enthusiasm to save pages, readers sometimes found themselves going back and forth between the CD and book more often than they liked. This should not be the case in this edition. Each chapter now has the Historical Perspectives section on the CD and four chapters also have one advanced material section on the CD. Additionally, all exercises are in the printed book, so flipping between book and CD should be rare in this edition.
For those of you who wonder why we include a CD-ROM with the book, the answer is simple: the CD contains content that we feel should be easily and immediately accessible to readers no matter where they are. The CD contains all of the Appendixes as well as the advanced content and reference material.
This is a fast-moving field, and as is always the case for our new editions, an important goal is to update the technical content. The AMD Opteron X4 model 2356 (code named “Barcelona”) serves as a running example throughout the book, and is found in Chapters 1, 4, 5, and 7. Chapters 1 and 6 add results from the new power benchmark from SPEC. Chapters 2 and 3 illustrate instruction set architecture with the latest version of the ARM architecture. Chapter 5 adds a new section on Virtual Machines, which are resurging in importance. Chapter 5 has detailed cache performance measurements on the Opteron X4 multicore and a few details on its rival, the Intel Nehalem, which will not be announced until after this edition is published. Chapter 6 describes Flash Memory for the first time as well as a remarkably compact server from Sun, which crams 8 cores, 16 DIMMs, and 8 disks into a single 1U box. It also includes the recent results on long-term disk failures. Chapter 7 covers a wealth of topics regarding parallelism—including multithreading, SIMD, vector, GPUs, performance models, benchmarks, multiprocessor networks—and describes three multicores plus the Opteron X4: Intel Xeon model e5345 (Clovertown), IBM Cell model QS20, and the Sun Microsystems T2 model 5120 (Niagara 2).
The final goal was to try to make the exercises useful to instructors in this Internet age, for homework assignments have long been an important way to learn material. Alas, answers are posted today almost as soon as the book appears. We have a two-part approach. First, expert contributors have worked to develop entirely new exercises for each chapter in the book. Second, most exercises have a qualitative description supported by a table that provides several alternative quantitative parameters needed to answer this question. The sheer number plus flexibility in terms of how the instructor can choose to assign variations of exercises will make it hard for students to find the matching solutions online. Instructors will also be able to change these quantitative parameters as they wish, again frustrating those students who have come to rely on the Internet to provide solutions for a static and unchanging set of exercises. We feel this approach is a valuable new addition to the book—please let us know how well it works for you, either as a student or instructor!
We have preserved useful book elements from prior editions. To make the book work better as a reference, we still place definitions of new terms in the margins at their first occurrence. The book element called “Understanding Program Performance” sections helps readers understand the performance of their programs and how to improve it, just as the “Hardware/Software Interface” book element helped readers understand the tradeoffs at this interface. “The Big Picture” section remains so that the reader sees the forest despite all the trees. “Check Yourself” sections help readers to confirm their comprehension of the material on the first time through, with answers provided at the end of each chapter.

Instructor Support
We have created and collected a great deal of material to help instructors teach courses using the book including solutions to exercises, chapter quizzes, figures from the book, lecture notes, lecture slides, and other materials. The publisher will provide access to this material to instructors who adopt the book for their courses.
Concluding Remarks
If you read the following acknowledgments section, you will see that we went to great lengths to correct mistakes. Since a book goes through many printings, we have the opportunity to make even more corrections. If you uncover any remaining, resilient bugs, please contact the publisher by electronic mail at cod4bugs@mkp.com or by low-tech mail using the address found on the copyright page. Please note in the subject line if you found the error in the Asian Edition.

This fourth edition marks a break in the long-standing collaboration between Hennessy and Patterson, which started in 1989. The demands of running one of the world’s great universities meant that President Hennessy could no longer make the substantial commitment to create a new edition. The remaining author felt like a juggler who had always performed with a partner who suddenly is thrust on the stage as a solo act. Hence, the people in the acknowledgments and Berkeley colleagues played an even larger role in shaping the contents of this book. Nevertheless, this time around there is only one author to blame for the new material in what you are about to read.
Acknowledgments for the Fourth Edition
I’m very grateful to Andrew Sloss, Dominic Symes, and Chris Wright for their invaluable suggestions on what I should include about ARM in Chapters 2 and 3. The ARM appendixes are developed from the material in their book, ARM System Developer’s Guide: Designing and Optimizing System Software. They also took the time to read the drafts of Chapters 2 and 3 for technical accuracy. Of course, any mistakes that remain are entirely my own.
I’d like to thank David Kirk, John Nickolls, and their colleagues at NVIDIA (Michael Garland, John Montrym, Doug Voorhies, Lars Nyland, Erik Lindholm, Paulius Micikevicius, Massimiliano Fatica, Stuart Oberman, and Vasily Volkov) for writing the first in-depth appendix on GPUs.
I am also very grateful for the contributions of the many experts who developed the new exercises for this new edition. Writing good exercises is not an easy task, and each contributor worked long and hard to develop problems that are both challenging and engaging:

Chapter 1: Javier Bruguera (Universidade de Santiago de Compostela)
Chapter 2: John Oliver (Cal Poly, San Luis Obispo), with contributions from Nicole Kaiyan (University of Adelaide) and Milos Prvulovic (Georgia Tech). Ranjani Parthasarathi (Anna University) converted the MIPS-based exercises to their ARM equivalents for the ARM edition.
Chapter 3: Matthew Farrens (University of California, Davis). Ranjani Parthasarathi (Anna University) converted the original MIPS-based exercises to their ARM equivalents for the ARM edition.
Chapter 4: Milos Prvulovic (Georgia Tech)
Chapter 5: Jichuan Chang, Jacob Leverich, Kevin Lim, and Partha Ranganathan (all from Hewlett-Packard), with contributions from Nicole Kaiyan (University of Adelaide)
Chapter 6: Perry Alexander (The University of Kansas)
Chapter 7: David Kaeli (Northeastern University)

Peter Ashenden took on the Herculean task of editing and evaluating all of the new exercises. Moreover, he also took on the substantial burden of developing the companion CD.
I relied on my Silicon Valley colleagues for much of the technical material that this book relies upon:

AMD—for the details and measurements of the Opteron X4 (Barcelona): William Brantley, Vasileios Liaskovitis, Chuck Moore, and Brian Waldecker.
Intel—for the prereleased information on the Intel Nehalem: Faye Briggs.
Micron—for background on Flash Memory in Chapter 6: Dean Klein.
Sun Microsystems—for the measurements of the instruction mixes for the SPEC2006 benchmarks in Chapter 2 and details and measurements of the Sun Server x4150 in Chapter 6: Yan Fisher, John Fowler, Darryl Gove, Paul Joyce, Shenik Mehta, Pierre Reynes, Dimitry Stuve, Durgam Vahia, and David Weaver.
U.C. Berkeley—Krste Asanovic (who supplied the idea for software concurrency versus hardware parallelism in Chapter 7), James Demmel and Velvel Kahan (who commented on parallelism and floating-point calculations), Zhangxi Tan (who designed the cache controller and wrote the Verilog for it in Chapter 5), Sam Williams (who supplied the Roofline model and the multicore measurements in Chapter 7), and the rest of my colleagues in the Par Lab who gave extensive suggestions and feedback on parallelism topics found throughout the book.
I am grateful to the many instructors who answered the publisher’s surveys, reviewed our proposals, and attended focus groups to analyze and respond to our plans for this edition. They include the following individuals: Focus Group: Mark Hill (University of Wisconsin, Madison), E.J. Kim (Texas A&M University), Jihong Kim (Seoul National University), Lu Peng (Louisiana State University), Dean Tullsen (UC San Diego), Ken Vollmar (Missouri State University), David Wood (University of Wisconsin, Madison), Ki Hwan Yum (University of Texas, San Antonio); Surveys and Reviews: Mahmoud Abou-Nasr (Wayne State University), Perry Alexander (The University of Kansas), Hakan Aydin (George Mason University), Hussein Badr (State University of New York at Stony Brook), Mac Baker (Virginia Military Institute), Ron Barnes (George Mason University), Douglas Blough (Georgia Institute of Technology), Kevin Bolding (Seattle Pacific University), Miodrag Bolic (University of Ottawa), John Bonomo (Westminster College), Jeff Braun (Montana Tech), Tom Briggs (Shippensburg University), Scott Burgess (Humboldt State University), Fazli Can (Bilkent University), Warren R. Carithers (Rochester Institute of Technology), Bruce Carlton (Mesa Community College), Nicholas Carter (University of Illinois at Urbana-Champaign), Anthony Cocchi (The City University of New York), Don Cooley (Utah State University), Robert D. Cupper (Allegheny College), Edward W. Davis (North Carolina State University), Nathaniel J. Davis (Air Force Institute of Technology), Molisa Derk (Oklahoma City University), Derek Eager (University of Saskatchewan), Ernest Ferguson (Northwest Missouri State University), Rhonda Kay Gaede (The University of Alabama), Etienne M. Gagnon (UQAM), Costa Gerousis (Christopher Newport University), Paul Gillard (Memorial University of Newfoundland), Michael Goldweber (Xavier University), Georgia Grant (College of San Mateo), Merrill Hall (The Master’s College), Tyson Hall (Southern Adventist University), Ed Harcourt (Lawrence University), Justin E. Harlow (University of South Florida), Paul F. Hemler (Hampden-Sydney College), Martin Herbordt (Boston University), Steve J. Hodges (Cabrillo College), Kenneth Hopkinson (Cornell University), Dalton Hunkins (St. Bonaventure University), Baback Izadi (State University of New York—New Paltz), Reza Jafari, Robert W. Johnson (Colorado Technical University), Bharat Joshi (University of North Carolina, Charlotte), Nagarajan Kandasamy (Drexel University), Rajiv Kapadia, Ryan Kastner (University of California, Santa Barbara), Jim Kirk (Union University), Geoffrey S. Knauth (Lycoming College), Manish M. Kochhal (Wayne State), Suzan Koknar-Tezel (Saint Joseph’s University), Angkul Kongmunvattana (Columbus State University), April Kontostathis (Ursinus College), Christos Kozyrakis (Stanford University), Danny Krizanc (Wesleyan University), Ashok Kumar,
S. Kumar (The University of Texas), Robert N. Lea (University of Houston), Baoxin Li (Arizona State University), Li Liao (University of Delaware), Gary Livingston (University of Massachusetts), Michael Lyle, Douglas W. Lynn (Oregon Institute of Technology), Yashwant K. Malaiya (Colorado State University), Bill Mark (University of Texas at Austin), Ananda Mondal (Claflin University), Alvin Moser (Seattle University), Walid Najjar (University of California, Riverside), Danial J. Neebel (Loras College), John Nestor (Lafayette College), Joe Oldham (Centre College), Timour Paltashev, James Parkerson (University of Arkansas), Shaunak Pawagi (SUNY at Stony Brook), Steve Pearce, Ted Pedersen (University of Minnesota), Gregory D. Peterson (The University of Tennessee), Dejan Raskovic (University of Alaska, Fairbanks), Brad Richards (University of Puget Sound), Roman Rozanov, Louis Rubinfield (Villanova University), Md Abdus Salam (Southern University), Augustine Samba (Kent State University), Robert Schaefer (Daniel Webster College), Carolyn J. C. Schauble (Colorado State University), Keith Schubert (CSU San Bernardino), William L. Schultz, Kelly Shaw (University of Richmond), Shahram Shirani (McMaster University), Scott Sigman (Drury University), Bruce Smith, David Smith, Jeff W. Smith (University of Georgia, Athens), Philip Snyder (Johns Hopkins University), Alex Sprintson (Texas A&M), Timothy D. Stanley (Brigham Young University), Dean Stevens (Morningside College), Nozar Tabrizi (Kettering University), Yuval Tamir (UCLA), Alexander Taubin (Boston University), Will Thacker (Winthrop University), Mithuna Thottethodi (Purdue University), Manghui Tu (Southern Utah University), Rama Viswanathan (Beloit College), Guoping Wang (Indiana-Purdue University), Patricia Wenner (Bucknell University), Kent Wilken (University of California, Davis), David Wolfe (Gustavus Adolphus College), David Wood (University of Wisconsin, Madison), Mohamed Zahran (City College of New York), Gerald D. Zarnett (Ryerson University), Nian Zhang (South Dakota School of Mines & Technology), Jiling Zhong (Troy University), Huiyang Zhou (The University of Central Florida), Weiyu Zhu (Illinois Wesleyan University).
I would especially like to thank the Berkeley people who gave key feedback for Chapter 7 and Appendix A, which were the most challenging pieces to write for this edition: Krste Asanovic, Christopher Batten, Rastislav Bodik, Bryan Catanzaro, Jike Chong, Kaushik Datta, Greg Giebling, Anik Jain, Jae Lee, Vasily Volkov, and Samuel Williams.
A special thanks also goes to Mark Smotherman for making multiple passes to find technical and writing glitches that significantly improved the quality of this edition. He played an even more important role this time given that this edition was done as a solo act.
We wish to thank the extended Morgan Kaufmann family for agreeing to publish this book again under the able leadership of Denise Penrose. Nathaniel McFadden was the developmental editor for this edition and worked with me weekly on the contents of the book. Kimberlee Honjo coordinated the surveying of users and their responses.

Dawnmarie Simpson managed the book production process for the original Fourth Edition; Parveen Singh managed the production for the ARM edition. We also thank the many vendors who contributed to this volume, especially Alan Rose of Multiscience Press and diacriTech, our compositor for the original Fourth Edition. Ritesh Misri and Veena Kaul of Thomson Digital provided all the production services — copyedit to final pages — for the ARM edition.
The contributions of the nearly 200 people we mentioned here have helped make this fourth edition what I hope will be our best book yet. Enjoy!
David A. Patterson

Shelving Guide

Computers \ Hardware

Back Cover Copy

"What is special about this edition is that it uses ARM, in place of the MIPS core used in earlier editions, to teach the fundamental principles of computer design, adding another dimension to the book. As the mainstream processor of the embedded arena, ARM is of great importance to embedded computing. This book fills a gap in the current curriculum by teaching the fundamentals of computer organization specifically to students of embedded systems. As in previous editions, the book centers on the computer hardware/software interface and skillfully connects it to the basics of embedded system design."
—Ranjani Parthasarathi
Anna University, Chennai, India
This best-selling computer organization text has been thoroughly updated to address the revolutionary change taking place in computer architecture: the shift from uniprocessors to multicore microprocessors. The ARM edition was published to highlight the importance of embedded systems to the computing industry throughout Asia, and it uses an ARM processor to discuss the instruction set and arithmetic of a real computer, because ARM is the most popular instruction set architecture for embedded devices, with about 4 billion sold worldwide each year. As in previous editions, a MIPS processor is also used to present the fundamentals of hardware technologies, pipelining, memory hierarchies, and I/O. The book also includes an introduction to the x86 architecture.
Features:
Uses ARMv6 (ARM11 family) as the primary architecture to present the fundamentals of instruction sets and computer arithmetic.
Covers the revolutionary shift from sequential to parallel computing, with a new chapter on parallelism and sections in every chapter highlighting parallel hardware and software topics.
Includes a new appendix, written by NVIDIA's chief scientist and chief architect, on the emergence and importance of the modern GPU: the first in-depth description of this highly parallel, multithreaded, multicore processor optimized for visual computing.
Describes the Roofline model, a unique method for measuring multicore performance, and uses it to benchmark and analyze the AMD Opteron X4, Intel Xeon 5000, Sun UltraSPARC T2, and IBM Cell.
Includes new content on Flash memory and virtual machines.
Provides more than 200 pages of stimulating new exercises.
Uses the AMD Opteron X4 and Intel Nehalem as running examples throughout the book.
Updates all processor performance examples with SPEC CPU2006 benchmarks.

Table of Contents

Preface xv
CHAPTERS
Computer Abstractions and Technology 2
1.1 Introduction 3
1.2 Below Your Program 10
1.3 Under the Covers 13
1.4 Performance 26
1.5 The Power Wall 39
1.6 The Sea Change: The Switch from Uniprocessors to Multiprocessors 41
1.7 Real Stuff: Manufacturing and Benchmarking the AMD Opteron X4 44
1.8 Fallacies and Pitfalls 51
1.9 Concluding Remarks 54
1.10 Historical Perspective and Further Reading 55
1.11 Exercises 56
Instructions: Language of the Computer 74
2.1 Introduction 76
2.2 Operations of the Computer Hardware 77
2.3 Operands of the Computer Hardware 80
2.4 Signed and Unsigned Numbers 86
2.5 Representing Instructions in the Computer 93
2.6 Logical Operations 100
2.7 Instructions for Making Decisions 104
2.8 Supporting Procedures in Computer Hardware 113
2.9 Communicating with People 122
2.10 ARM Addressing for 32-Bit Immediates and More Complex Addressing Modes 127
2.11 Parallelism and Instructions: Synchronization 133
2.12 Translating and Starting a Program 135
2.13 A C Sort Example to Put It All Together 143
This icon identifies material on the CD
2.14 Arrays versus Pointers 152
2.15 Advanced Material: Compiling C and Interpreting Java 156
2.16 Real Stuff: MIPS Instructions 156
2.17 Real Stuff: x86 Instructions 161
2.18 Fallacies and Pitfalls 170
2.19 Concluding Remarks 171
2.20 Historical Perspective and Further Reading 174
2.21 Exercises 174
Arithmetic for Computers 214
3.1 Introduction 216
3.2 Addition and Subtraction 216
3.3 Multiplication 220
3.4 Division 226
3.5 Floating Point 232
3.6 Parallelism and Computer Arithmetic: Associativity 258
3.7 Real Stuff: Floating Point in the x86 259
3.8 Fallacies and Pitfalls 262
3.9 Concluding Remarks 265
3.10 Historical Perspective and Further Reading 268
3.11 Exercises 269
The Processor 284
4.1 Introduction 286
4.2 Logic Design Conventions 289
4.3 Building a Datapath 293
4.4 A Simple Implementation Scheme 302
4.5 An Overview of Pipelining 316
4.6 Pipelined Datapath and Control 330
4.7 Data Hazards: Forwarding versus Stalling 349
4.8 Control Hazards 361
4.9 Exceptions 370
4.10 Parallelism and Advanced Instruction-Level Parallelism 377
4.11 Real Stuff: the AMD Opteron X4 (Barcelona) Pipeline 390
4.12 Advanced Topic: an Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 392
4.13 Fallacies and Pitfalls 393
4.14 Concluding Remarks 394
4.15 Historical Perspective and Further Reading 395
4.16 Exercises 395
5 Large and Fast: Exploiting Memory Hierarchy 436
5.1 Introduction 438
5.2 The Basics of Caches 443
5.3 Measuring and Improving Cache Performance 461
5.4 Virtual Memory 478
5.5 A Common Framework for Memory Hierarchies 504
5.6 Virtual Machines 511
5.7 Using a Finite-State Machine to Control a Simple Cache 515
5.8 Parallelism and Memory Hierarchies: Cache Coherence 520
5.9 Advanced Material: Implementing Cache Controllers 524
5.10 Real Stuff: the AMD Opteron X4 (Barcelona) and Intel Nehalem Memory Hierarchies 525
5.11 Fallacies and Pitfalls 529
5.12 Concluding Remarks 533
5.13 Historical Perspective and Further Reading 534
5.14 Exercises 534
6 Storage and Other I/O Topics 554
6.1 Introduction 556
6.2 Dependability, Reliability, and Availability 559
6.3 Disk Storage 561
6.4 Flash Storage 566
6.5 Connecting Processors, Memory, and I/O Devices 568
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System 572
6.7 I/O Performance Measures: Examples from Disk and File Systems 582
6.8 Designing an I/O System 584
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks 585
6.10 Real Stuff: Sun Fire x4150 Server 592
6.11 Advanced Topics: Networks 598
6.12 Fallacies and Pitfalls 599
6.13 Concluding Remarks 603
6.14 Historical Perspective and Further Reading 604
6.15 Exercises 605
7 Multicores, Multiprocessors, and Clusters 616
7.1 Introduction 618
7.2 The Difficulty of Creating Parallel Processing Programs 620
7.3 Shared Memory Multiprocessors 624
7.4 Clusters and Other Message-Passing Multiprocessors 627
7.5 Hardware Multithreading 631
7.6 SISD, MIMD, SIMD, SPMD, and Vector 634
7.7 Introduction to Graphics Processing Units 640
7.8 Introduction to Multiprocessor Network Topologies 646
7.9 Multiprocessor Benchmarks 650
7.10 Roofline: A Simple Performance Model 653
7.11 Real Stuff: Benchmarking Four Multicores Using the Roofline Model 661
7.12 Fallacies and Pitfalls 670
7.13 Concluding Remarks 672
7.14 Historical Perspective and Further Reading 674
7.15 Exercises 674
Index I-1
CD-ROM CONTENT
A Graphics and Computing GPUs A-2
A.1 Introduction A-3
A.2 GPU System Architectures A-7
A.3 Scalable Parallelism – Programming GPUs A-12
A.4 Multithreaded Multiprocessor Architecture A-25
A.5 Parallel Memory System A-36
A.6 Floating Point Arithmetic A-41
A.7 Real Stuff: The NVIDIA GeForce 8800 A-46
A.8 Real Stuff: Mapping Applications to GPUs A-55
A.9 Fallacies and Pitfalls A-72
A.10 Concluding Remarks A-76
A.11 Historical Perspective and Further Reading A-77
B1 ARM and Thumb Assembler Instructions B1-2
B1.1 Using This Appendix B1-3
B1.2 Syntax B1-4
B1.3 Alphabetical List of ARM and Thumb Instructions B1-8
B1.4 ARM Assembler Quick Reference B1-49
B1.5 GNU Assembler Quick Reference B1-60
B2 ARM and Thumb Instruction Encodings B2-2
B2.1 ARM Instruction Set Encodings B2-3
B2.2 Thumb Instruction Set Encodings B2-9
B2.3 Program Status Registers B2-11

B3 Instruction Cycle Timings B3-2
B3.1 Using the Instruction Set Cycle Timing Tables B3-3
B3.2 ARM7TDMI Instruction Cycle Timings B3-5
B3.3 ARM9TDMI Instruction Cycle Timings B3-6
B3.4 StrongARM1 Instruction Cycle Timings B3-8
B3.5 ARM9E Instruction Cycle Timings B3-9
B3.6 ARM10E Instruction Cycle Timings B3-11
B3.7 Intel XScale Instruction Cycle Timings B3-12
B3.8 ARM11 Cycle Timings B3-14
C The Basics of Logic Design C-2
C.1 Introduction C-3
C.2 Gates, Truth Tables, and Logic Equations C-4
C.3 Combinational Logic C-9
C.4 Using a Hardware Description Language C-20
C.5 Constructing a Basic Arithmetic Logic Unit C-26
C.6 Faster Addition: Carry Lookahead C-38
C.7 Clocks C-48
C.8 Memory Elements: Flip-Flops, Latches, and Registers C-50
C.9 Memory Elements: SRAMs and DRAMs C-58
C.10 Finite-State Machines C-67
C.11 Timing Methodologies C-72
C.12 Field Programmable Devices C-78
C.13 Concluding Remarks C-79
C.14 Exercises C-80
D Mapping Control to Hardware D-2
D.1 Introduction D-3
D.2 Implementing Combinational Control Units D-4
D.3 Implementing Finite-State Machine Control D-8
D.4 Implementing the Next-State Function with a Sequencer D-22
D.5 Translating a Microprogram to Hardware D-28
D.6 Concluding Remarks D-32
D.7 Exercises D-33
ADVANCED CONTENT
Section 2.15 Compiling C and Interpreting Java
Section 4.12 An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
Section 5.9 Implementing Cache Controllers
Section 6.11 Networks
HISTORICAL PERSPECTIVES & FURTHER READING
Chapter 1 Computer Abstractions and Technology: Section 1.10
Chapter 2 Instructions: Language of the Computer: Section 2.20
Chapter 3 Arithmetic for Computers: Section 3.10
Chapter 4 The Processor: Section 4.15
Chapter 5 Large and Fast: Exploiting Memory Hierarchy: Section 5.13
Chapter 6 Storage and Other I/O Topics: Section 6.14
Chapter 7 Multicores, Multiprocessors, and Clusters: Section 7.14
Appendix A Graphics and Computing GPUs: Section A.11
TUTORIALS
VHDL
Verilog

SOFTWARE
Xilinx FPGA Design, Simulation and Synthesis Software
QEMU http://www.nongnu.org/qemu/about.html
Glossary G-1
Index I-1
Further Reading FR-1
