搜索引擎:信息检索实践(英文版)
作者 : (美)W. Bruce Croft Donald Metzler Trevor Strohman著
丛书名 : 经典原版书库
出版日期 : 2009-09-27
ISBN : 978-7-111-28247-1
定价 : 45.00元
教辅资源下载
扩展信息
语种 : 英文
页数 : 534
开本 : 32
原书名 : Search Engines Information Retrieval in Practice
原出版社: Pearson Education Asia
属性分类: 教材
包含CD :
绝版 :
图书简介

本书介绍了信息检索(IR)中的关键问题,以及这些问题如何影响搜索引擎的设计与实现,并且用数学模型强化了重要的概念。对于网络搜素引擎这一重要的话题,主要涵盖了在网络上使用的搜索技术。

图书前言

This book provides an overview of the important issues in information retrieval, and how those issues affect the design and implementation of search engines. Not every topic is covered at the same level of detail. We focus instead on what we consider to be the most important alternatives to implementing search engine components and the information retrieval models underlying them. Web search engines are obviously a major topic, and we base our coverage primarily on the technology we all use on the Web,l but search engines are also used in many other applications. That is the reason for the strong emphasis on the information retrieval theories and concepts that underlie all search engines.
  The target audience for the book is primarily undergraduates in computer science or computer engineering, but graduate students should also find this useful. We also consider the book to be suitable for most students in information science programs. Finally, practicing search engineers should benefit from the book, whatever their background. There is mathematics in the book, but nothing too esoteric. There are also code and programming exercises in the book, but nothing beyond the capabilities of someone who has taken some basic computer science and programming classes.

上架指导

计算机科学及应用

封底文字

本书介绍了信息检索(IR)中的关键问题,以及这些问题如何影响搜索引擎的设计与实现,并且用数学模型强化了重要的概念。对于网络搜素引擎这一重要的话题,主要涵盖了在网络上使用的搜索技术。
本书适用于计算机科学或计算机工程专业的本科生、研究生,对于专业人士而言,本书也不失为一本理想的入门教材。
本书网站(http://www.search-engines-book.com/)给出了PDF和PPT两种形式的幻灯片、勘误等。

作者简介

(美)W. Bruce Croft Donald Metzler Trevor Strohman著:W. Bruce Croft 马萨诸塞大学Amherst分校计算机科学特聘教授、ACM会士。他创建了智能信息检索研究中心,发表了200余篇论文,获奖无数,包括2003年由ACM SIGIR颁发的Gerard Salton奖。 Donald Metzler 马萨诸塞大学Amherst分校博士,是位于加州Santa Clara的雅虎搜索与计算广告组的研究科学家。 Trevor Strohman 马萨诸塞大学Amherst分校博士,是Google搜索质量部门的软件工程师。他开发了Galago搜索引擎,也是Indri搜索引擎的主要开发者。

图书目录

1 Search Engines and Information Retrieval
1.1 What Is Information Retrieval
1.2 The Big Issues
1.3 Search Engines
1.4 Search Engineers

2 Architecture of a Search Engine
2.1 What Is an Architecture
2.2 Basic Building Blocks
2.3 Breaking It Down
2.3.1 Text Acquisition
2.3.2 Text Transformation
2.3.3 Index Creation
2.3.4 User Interaction
2.3.5 Ranking
2.3.6 Evaluation
2.4 How Does It Really Work

3 Crawls and Feeds
3.1 Deciding What to Search
3.2 Crawling the Web
3.2.1 Retrieving Web Pages
3.2.2 The Web Crawler
3.2.3 Freshness
3.2.4 Focused Crawling
3.2.5 Deep Web
3.2.6 Sitemaps
3.2.7 Distributed Crawling
3.3 Crawling Documents and Email
3.4 Document Feeds
3.5 The Conversion Problem
3.5.1 Character Encodings
3.6 Storing the Documents
3.6,1 Using a Database System
3.6.2 Random Access
3.6.3 Compression and Large Files
3.6.4 Update
3.6.5 BigTable
3.7 Detecting Duplicates
3.8 Removing Noise

4 Processing Text
4.1 From Words to Terms
4.2 Text Statistics
4.2.1 Vocabulary Growth
4.2.2 Estimating Collection and Result Set Sizes
4.3 Document Parsing
4.3.1 Overview
4.3.2 Tokenizing
4.3.3 Stopping
4.3.4 Stemming
4.3.5 Phrases and N-grams
4.4 Document Structure and Markup
4.5 Link Analysis
4.5.1 Anchor Text
4.5.2 PageRank
4.5.3 Link Quality
4.6 Information Extraction
4.6.1 Hidden Markov Models for Extraction
4.7 Internationalization

5 Ranking with Indexes
5.1 Overview
5.2 Abstract Model of Ranking
5.3 Inverted Indexes
5.3.1 Documents
5.3.2 Counts
5.3.3 Positions
5.3A Fields and Extents
5.3.5 Scores
5.3.6 Ordering
5.4 Compression
5.4.1 Entropy and Ambiguity
5.4.2 Delta Encoding
5.4.3 Bit-Aligned Codes
5.4.4 Byte-Aligned Codes
5.4.5 Compression in Practice
5.4.6 Looking Ahead
5.4.7 Skipping and Skip Pointers
5.5 Auxiliary Structures
5.6 Index Construction
5.6.1 Simple Construction
5.6.2 Merging
5.6.3 Parallelism and Distribution
5.6.4 Update
5.7 Query Processing
5.7.1 Document-at-a-time Evaluation
5.7.2 Term-at-a-time Evaluation
5.7.3 Optimization Techniques
5.7.4 Structured Queries
5.7.5 Distributed Evaluation
5.7.6 Caching

6 Queries and Interfaces
6.1 Information Needs and Queries
6.2 Query Transformation and Refinement
6.2.1 Stopping and Stemming Revisited
6.2.2 Spell Checking and Suggestions
6.2.3 Query Expansion
6.2.4 Relevance Feedback
6.2.5 Context and Personalization
6.3 Showing the Results
6.3.1 Result Pages and Snippets
6.3.2 Advertising and Search
6.3.3 Clustering the Results
6.4 Cross-Language Search

7 Retrieval Models
7.1 Overview of Retrieval Models
7.1.1 Boolean Retrieval
7.1.2 The Vector Space Model
7.2 Probabilistic Models
7.2.1 Information Retrieval as Classification
7.2.2 The BM25 Ranking Algorithm
7.3 Ranking Based on Language Models
7.3.1 Query Likelihood Ranking
7.3.2 Relevance Models and Pseudo-Relevance Feedback
7.4 Complex Queries and Combining Evidence
7.4.1 The Inference Network Model
7.4.2 The Galago Query Language
7.5 Web Search
7.6 Machine Learning and Information Retrieval
7.6.1 Learning to Rank
7.6.2 Topic Models and Vocabulary Mismatch
7.7 Application-Based Models

8 Evaluating Search Engines
8.1 Why Evaluate
8.2 The Evaluation Corpus
8.3 Logging
8.4 Effectiveness Metrics
8.4.1 Recall and Precision
8.4.2 Averaging and Interpolation
8.4.3 Focusing on the Top Documents
8.4.4 Using Preferences
……
9 Classification and Clustering
10 Social Search
11 Beyond Bag of Words
Reverences
Index

教学资源推荐
作者: (美)Gary B. Shelly , Thomas J. Cashman,Harry J. Rosenblatt 著 史晟辉 王艳清 李芳 耿志强 等译
作者: David E.Culler, Jaswinder Pal Singh, Anoop Gupta
作者: (美) Hector Garcia-Molina (斯坦福大学) Jeffrey D. Ullman (斯坦福大学) Jennifer Widom(斯坦福大学)著
作者: 赵淑芬 主编  康宇光 副主编
参考读物推荐
作者: 林奕辰
作者: 软考新大纲研究组编著
作者: (美)Joseph J.Bambara Paul R.Allen等