Jeff Dean

Jeff Dean is Google's software architecture genius and an author of MapReduce, Google's framework for large-scale parallel programming.

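At Google, Jeff Dean, the company's top programmer, invented an advanced method that lets a single programmer complete in minutes a project that used to take a team several months. He also invented a remarkable computer language that lets programmers run extremely complex computations on tens of thousands of machines at once, in the shortest possible time.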

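Jeff Dean joined Google in 1999 and is currently a member of Google's Systems Infrastructure Group. At Google he has mainly been responsible for developing the web crawling, indexing, and query serving systems as well as the advertising system. He has made numerous improvements to search quality and implemented several parts of Google's distributed computing infrastructure.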

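Before joining Google, he worked at DEC/Compaq's Western Research Lab, where his research covered software profiling tools, microprocessor architecture, and information retrieval. He received his Ph.D. from the University of Washington in 1996, working with Craig Chambers on compiler optimization techniques for object-oriented languages. Before completing his degree, he also worked for the World Health Organization's Global Programme on AIDS.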

Personal statement

Jeffrey Dean

Google Fellow

I joined Google in mid-1999, and I'm currently a Google Fellow in the Systems Infrastructure Group. My areas of interest include large-scale distributed systems, performance monitoring, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and development of new products that organize existing information in new and interesting ways. While at Google, I've worked on the following projects:

The design and implementation of the initial version of Google's advertising serving system.

The design and implementation of five generations of our crawling, indexing, and query serving systems, covering two and three orders of magnitude growth in number of documents searched, number of queries handled per second, and frequency of updates to the system. I recently gave a talk at WSDM'09 about some of the issues involved in building large-scale retrieval systems (slides).

The initial development of Google's AdSense for Content product (involving both the production serving system design and implementation as well as work on developing and improving the quality of ad selection based on the contents of pages).

The development of Protocol Buffers, a way of encoding structured data in an efficient yet extensible format, and a compiler that generates convenient wrappers for manipulating the objects in a variety of languages. Protocol Buffers are used extensively at Google for almost all RPC protocols, and for storing structured information in a variety of persistent storage systems. A version of the protocol buffer implementation has been open-sourced and is available at http://code.google.com/p/protobuf/.

Some of the initial production serving system work for the Google News product, working with Krishna Bharat to move the prototype system he put together into a deployed system.

Some aspects of our search ranking algorithms, notably improved handling for dealing with off-page signals such as anchortext.

The design and implementation of the first generation of our automated job scheduling system for managing a cluster of machines.

The design and implementation of prototyping infrastructure for rapid development and experimentation with new ranking algorithms.

The design and implementation of MapReduce, a system for simplifying the development of large-scale data processing applications. A paper about MapReduce appeared in OSDI'04.

The design and implementation of BigTable, a large-scale semi-structured storage system used underneath a number of Google products. A paper about BigTable appeared in OSDI'06.

Some of the production system design for Google Translate, our statistical machine translation system. In particular, I designed and implemented a system for distributed high-speed access to very large language models (too large to fit in memory on a single machine).

Some internal tools to make it easy to rapidly search our internal source code repository. Many of the ideas from this internal tool were incorporated into our Google Code Search product, including the ability to use regular expressions for searching large corpora of source code.
I enjoy developing software with great colleagues, and I've been fortunate to have worked with many wonderful and talented people on all of my work here at Google. To help ensure that Google continues to hire people with excellent technical skills, I've also been fairly involved in our engineering hiring process.
I received a Ph.D. in Computer Science from the University of Washington, working with Craig Chambers on whole-program optimization techniques for object-oriented languages in 1996. I received a B.S., summa cum laude from the University of Minnesota in Computer Science & Economics in 1990. From 1996 to 1999, I worked for Digital Equipment Corporation's Western Research Lab in Palo Alto, where I worked on low-overhead profiling tools, design of profiling hardware for out-of-order microprocessors, and web-based information retrieval. From 1990 to 1991, I worked for the World Health Organization's Global Programme on AIDS, developing software to do statistical modelling, forecasting, and analysis of the HIV pandemic.

In 2009, I was elected to the National Academy of Engineering.

I've lived in lots of places in my life: Honolulu, HI; Manila, the Philippines; Boston, MA; West Nile District, Uganda; Boston (again); Little Rock, AR; Hawaii (again); Minneapolis, MN; Mogadishu, Somalia; Atlanta, GA; Minneapolis (again); Geneva, Switzerland; Seattle, WA; and (currently) Palo Alto, CA. I'm hard-pressed to pick a favorite, though: each place has its plusses and minuses.
One of my life goals is to play soccer and basketball on every continent. So far, I've done so in North America, South America, Europe, Asia, and Africa. I'm worried that Antarctica might be tough, though.

Search Engine Hall of Fame: Jeff Dean

I have been studying Nutch lately, so I have started following some of the notable figures in the search engine world as role models.

The first article in the September 2008 issue of Programmer (《程序员》) magazine profiles this remarkable engineer: Jeff Dean, Google's software architect.

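We all use search engines such as Google in our work and daily lives, and a huge amount of computation runs behind that exceptionally simple page and its search button. Every Google user benefits from this software architecture genius's contributions to search.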

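He took part in designing Google's crawling, indexing, query serving, and advertising systems, and he also designed distributed infrastructure such as MapReduce and BigTable. On the Google platform Jeff Dean has given full play to his talent for software, creating one astonishing system after another. We should of course remember Larry Page and Sergey Brin, but a company this successful needs many more talented people to contribute their value.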

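Google is like an aircraft carrier that moves as fast as a Bugatti, and Jeff Dean and his brilliant colleagues can still design a better engine, more capacity, and faster reactors for it.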

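Getting a deep understanding of the core mechanisms of MapReduce and BigTable is exciting for anyone who works on search engine technology. Open-source Java search engines are still chasing Google's footsteps, but the open-source world also has many gifted architects. We are lucky to be able to work in this field, and we owe thanks to these people who are as outstanding as Jeff.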

Seen in profile, Jeff Dean is rather handsome; with a little styling he would hold his own against Brad Pitt, haha.

Jeff Dean

Today I am writing about a person: the man on the stage. I suspect few people know of him or know much about him. Can you name him?

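Jeff Dean. Few people can name him, because he is not in show business. He is better looking than most "celebrities" and more accomplished too, yet his time on camera is inversely proportional to that. So people do not know him, and so I am introducing him on my blog. He works in software, but he is not Gates. He is not the richest man in the world, so people still do not know him. Clearly, entertainment and wealth hold most people's attention these days. Even if we do not recognize him, we certainly recognize the logo in the background of this picture: Google, the search engine we can hardly live without. Google's simple interface gives us a great many excellent services. As you can imagine, what happens in the back end is far from as simple as our click on a button: every click triggers a huge amount of computation. That is what Jeff Dean works on. He is Google's software architecture genius, and he has contributed to most of the services we use, so we ought to thank him. We pay too much attention to the front end and too little to the back end, too much to what is flashy and too little to what is plain. That is not right. We have enough to eat and do not go hungry, and for that we should thank Mao Zedong and Yuan Longping; revering them would not be excessive.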

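I follow this man for reasons of personal sentiment. I have an attachment to software that I cannot let go of, but circumstances never allowed me to work in it, so from time to time I sneak a look at the people and events of the software world, a kind of daydreaming about software to console myself. The core direction for the software industry, or the Internet industry, in the coming years is parallel computing and distributed storage. He is someone who caught exactly this trend, with both the talent and the stage for it. But his is a stage that few people watch and many people use. There is one more reason I like him. Look at his build: he is a fit, sunny sort of man, far better looking than Gates, and nothing like the balding, pot-bellied programmer we tend to imagine. He also likes playing basketball. Haha, now you know why I like him: we share the same hobbies, software and basketball. In software, he is the star on the court and I am a die-hard fan. In basketball, I do not know how deeply he understands the game or how good his skills are, but his dream is to play basketball on every continent on Earth. So my dream is to play basketball on the same court as him.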

Google uses 1,000 servers to process search results

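At the WSDM 2009 conference, Google described its progress over the past decade. Users are happy with how quickly Google's search engine responds today, but how many servers actually handle a Google search? Google Fellow Jeff Dean gave the answer: 1,000.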

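According to him, the servers at the top of the serving stack that handle queries and fetch results alone number over a thousand. Going from a query to returned results takes less than 200 milliseconds.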

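Jeff said, "Their performance is satisfying; results come back in under 200 milliseconds." This means that over a thousand machines serve index pages entirely from memory, so search results appear almost instantly.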

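He said that in Google's years of rapid growth, from 1999 to 2009, query latency fell from 1,000 milliseconds to 200 milliseconds. The number of machines handling a query is now on the order of 1,000, but Google dropped the idea of raising that to the order of 10,000. At that scale, any change to a web page could be reflected in the search engine within minutes.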

Google reveals its data center infrastructure

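2009-06-29: Google has always revealed very little about how its data centers work, but on May 28, Google Fellow Jeff Dean lifted the veil slightly on the company's infrastructure before an audience at the Google I/O conference.
On the one hand, Google uses ordinary servers, processors, hard drives, floppy drives, and so on. On the other hand, Dean seems to regard 1,800 servers as quite ordinary and hardly worth mentioning. The software Google runs, which can respond in under half a second to a search request involving 700 to 1,000 servers, is another matter entirely.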

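Google has never disclosed how many servers it actually has, but Dean put the number at no fewer than several hundred thousand. He said each rack holds about 40 servers. According to one estimate, Google currently has 36 data centers worldwide; at 150 racks per data center, Google would have more than 200,000 servers, and the real figure is much larger and growing every day.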
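The estimate above is simple multiplication, and a short sketch makes the arithmetic explicit. It uses only the three figures quoted in the paragraph; the function and constant names are illustrative, and the result is the article's rough lower bound, not an official count.

```python
# Back-of-the-envelope server estimate from the figures quoted in the text.
DATA_CENTERS = 36             # data centers worldwide, per "one estimate"
RACKS_PER_DATA_CENTER = 150   # racks per data center assumed by that estimate
SERVERS_PER_RACK = 40         # servers per rack, as stated by Dean


def estimate_servers(data_centers, racks_per_dc, servers_per_rack):
    """Multiply sites by racks by servers to get a rough fleet size."""
    return data_centers * racks_per_dc * servers_per_rack


total = estimate_servers(DATA_CENTERS, RACKS_PER_DATA_CENTER, SERVERS_PER_RACK)
print(total)  # 216000
```

The product, 216,000, is consistent with the "more than 200,000" figure in the text.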
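Whatever the true number, Google's achievement is remarkable, partly because it overturned the computer industry's conventional practice. While other very large data centers, such as those of the New York Stock Exchange or the airlines' shared reservation systems, rely on many mainstream servers and software systems, Google's data centers are built largely on its own technology.
Some companies that make and sell servers may disagree, but Google clearly believes its technological destiny is best kept in its own hands. Marissa Mayer, Google's vice president of search products and user experience, said in a speech on May 29 that co-founder Larry Page encourages employees to keep a healthy disrespect for "the impossible." In other words, do not be too quick to believe that anything is impossible.
To operate at such scale, Google has to treat every machine as expendable. Server makers like to advertise the quality of their machines and their ability to withstand failures and crashes, but Google would still rather put its money into redundant software systems.
Dean said: "Our view is that it is better to have twice as much hardware that is less reliable than half as much that is more reliable. You have to provide reliability at the software level. If you are running 10,000 machines, something is going to fail every day."
Dean said the fragility of hardware shows most clearly when a new cluster comes online. In a cluster's first year there are typically 1,000 individual machine failures and thousands of hard drive failures; one power distribution problem that takes 500 to 1,000 machines offline for about 6 hours; 20 rack failures, each taking 40 to 80 machines offline; 5 cases of racks becoming unstable, each losing half of its network packets in transit; and at least one occasion on which the whole cluster is brought back online, affecting 5% of the machines at any given time over two days. There is also about a 50% chance that the cluster will overheat, which can bring down almost all of its servers within 5 minutes and takes 1 to 2 days to recover from.
Although Google builds its servers from ordinary hardware, it does not use conventional packaging, and it asks Intel to supply custom motherboards. Dean said that Google currently wraps each 40-server rack in a casing of its own design rather than one supplied by a server vendor.
Dean said Google uses several server configurations, some with many hard drives and some with fewer. There is also variation on a larger scale. He said: "There are some differences between our data centers, but not within a data center."
As for the servers themselves, Google prefers multicore chips. Many software companies used to chasing raw processor speed find it hard to adapt to multicore chips, but Google has no such problem. Its technology has long had to adapt to an architecture spanning tens of thousands of computers, so it already lives in the world of parallel computing.
Dean said: "We really like multicore machines. To us, a multicore machine looks like a lot of small machines with very good interconnects, and that is relatively easy for us to use."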

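Although Google demands fast responses from search and its other services, its parallelism produces fast results even when individual instructions execute relatively slowly. That is encouraging for designers of multicore processors and multithreaded architectures. Dean said: "Single-thread performance doesn't matter to us; we have lots of parallelizable problems."
So how does Google deal with the problems of ordinary hardware? With software.
Dean described the three core elements of Google's software: GFS (the Google File System), BigTable, and the MapReduce algorithm. Although Google sponsors many open-source projects that help its work, these three remain proprietary.
Dean said GFS, the lowest-level of the three, runs on almost all machines and is responsible for storing data. Some GFS instances are file systems "many petabytes in size." More than 200 clusters currently run GFS, many of them consisting of thousands of machines.
GFS stores each chunk of data (typically 64 MB) on at least three machines called chunkservers; if a chunkserver fails, the master server takes care of replicating the data to a new location. Dean said: "At least at the storage level, machine failures are handled entirely by the GFS system."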
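The replication scheme just described can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not GFS's actual implementation or API: the `ToyMaster` class, its methods, the least-loaded placement rule, and the server and file names are all hypothetical. Only the 64 MB chunk size, the three replicas, and re-replication by the master on failure come from the text.

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, the typical chunk size quoted above
REPLICATION = 3                 # each chunk lives on at least three chunkservers


class ToyMaster:
    """Hypothetical stand-in for the master: tracks which servers hold each chunk."""

    def __init__(self, servers):
        self.live = set(servers)
        self.locations = {}      # chunk id -> set of chunkserver names

    def _least_loaded(self, exclude):
        # Assumed placement rule: the live server with the fewest chunks
        # that does not already hold this chunk.
        load = {s: 0 for s in self.live}
        for holders in self.locations.values():
            for s in holders:
                if s in load:
                    load[s] += 1
        candidates = sorted((n, s) for s, n in load.items() if s not in exclude)
        if not candidates:
            raise RuntimeError("not enough live chunkservers")
        return candidates[0][1]

    def store(self, file_name, size_bytes):
        """Split a file into chunks and place each chunk on REPLICATION servers."""
        n_chunks = max(1, math.ceil(size_bytes / CHUNK_SIZE))
        ids = []
        for i in range(n_chunks):
            cid = f"{file_name}#{i}"
            self.locations[cid] = set()
            for _ in range(REPLICATION):
                self.locations[cid].add(self._least_loaded(self.locations[cid]))
            ids.append(cid)
        return ids

    def fail(self, server):
        """Mark a chunkserver dead and re-replicate every chunk it held."""
        self.live.discard(server)
        for holders in self.locations.values():
            if server in holders:
                holders.discard(server)
                holders.add(self._least_loaded(holders))


master = ToyMaster(["cs1", "cs2", "cs3", "cs4"])
chunks = master.store("crawl.log", 200 * 1024 * 1024)  # 200 MB -> 4 chunks
master.fail("cs1")                                      # simulate a machine failure
for cid in chunks:
    print(cid, sorted(master.locations[cid]))
```

After `cs1` fails, every chunk again has three replicas, all on surviving servers, which is the storage-level recovery Dean describes.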
A glimpse of Google's custom-built 40-server rack. Infrastructure guru Jeff Dean showed this photo at the Google I/O conference.

References

http://research.google.com/people/jeff/index.html
http://lfhawk.blog.hexun.com/22947221_d.html
http://banditjava.javaeye.com/blog/245441
http://hi.baidu.com/injava/blog/item/6da24caf6fb560f2faed5000.html
