Web-based Information Architectures
http://net.pku.edu.cn/wbia
Peng Bo (彭波), pbnet.pku.edu.cn
School of Information Science and Technology, Peking University
9/22/2008

Quiz
What happens when you press Enter in each of the following scenarios?
After typing http://www.pku.edu.cn/ into the browser's address bar
After typing "北京大学" (Peking University) into the Google search box

Outline of this lecture (Introduction)
Three "stories"
Main course content
Course organization and arrangements

Story 1: The origin of WBIA

Let's investigate
Problem: What is WBIA?
Approaches:
Decomposition: What is the Web? What is information? What is information architecture?
Elimination: What WBIA is not
Etymology: Where the name WBIA came from

Searching for "Web"
The World Wide Web (commonly shortened to the Web) is a system of interlinked hypertext documents accessed via the Internet.

Information
Information is a quality of a message from a sender to one or more receivers. But overall, information is the result of processing, manipulating and organizing data in a way that adds to the knowledge of the person receiving it.

Information Architecture
At its most basic, information architecture is the construction of a structure or the organization of information.

What WBIA is not
Web Information Architecture: how to build large, complex Web sites and organize their information effectively.
Network Architecture: a design blueprint for a complete computer communication network, serving as the framework and technical foundation for designing, building, and managing communication networks. Examples include OSI and TCP/IP.
Semantic Web: "The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

Origin of WBIA
In 2003, Prof. Li Xiaoming launched a graduate course named WBIA (Web-Based Information Architecture).

Origin of WBIA
In fall 2002, Prof. Li met Brewster Kahle, who had founded the "Internet Archive" in 1996.
Library: "A centrally maintained collection of information organised to answer the information needs of a specific population."
"The Internet Archive is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public."
"Universal access to human knowledge"

Origin of WBIA
In January 2003, Prof. Li met Prof. Jaime G. Carbonell of CMU in India, where they took part in discussions on the China-US Million Book Project.
The goal of The Million Book Project is to digitize a million books by 2005.
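The task will be accomplished by scanning the books and indexing their full text with OCR technology. The undertaking will create a free-to-read, searchable digital library the approximate size of the combined libraries at Carnegie Mellon University.

The China-US Million Book Project has since grown into a China-US-India project.
Raj Reddy, 1994 Turing Award winner (artificial intelligence)
Goal: put one million books online so that all of humanity can share them conveniently.
Throughout history, humanity has published roughly 100 million book titles, scattered across libraries and private collections (figure provided by Raj Reddy):
Published in the People's Republic of China: 2.5 million titles
Published in the Republic of China (1911-1949): 150,000 titles
Ancient Chinese books (before 1911): more than 100,000 titles
(The three figures above were provided by Prof. Wang Yiming.)

Digitization
Internet Archive is quietly digitizing around 1,000 public domain titles every day.

WHY?
Why do people care so much about libraries?
Why are people investing so heavily in the Web?

Origin of WBIA
Prof. Carbonell was teaching a course called Web-based Information Architectures:
Web-Based Information Management entails the design, creation, instrumentation and usage of web sites and related indexing and searching software. The course focuses on key technological underpinnings, primarily the hands-on creation of a search engine. Subsequently, the course addresses related issues in web-based information architectures, including: automated text categorization, information extraction from web pages, and a glimpse into larger-scale text and data mining methods.

Origin of WBIA
In 2003, Prof. Li launched a graduate course named WBIA.
What is WBIA?
Web-based: the Web is the object of study, with the focus on the text information it contains.
Information architecture: effective information access is the central research question. With Web search at the center, the course explores the research and technical problems of Web information processing, retrieval, and mining.

Story 2: The Web

The birth of the Web
1980: Tim Berners-Lee leads the Enquire project (short for "Enquire Within Upon Everything"), built on hypertext.
November 1990: the first Web server, nxoc01.cern.ch, goes into operation, and Tim Berners-Lee views the earliest Web pages in "WorldWideWeb", the graphical Web browser he wrote himself.
1991: CERN (European Particle Physics Laboratory) officially releases the Web technology standards.
Today, the technical standards related to the Web are managed and maintained by the well-known W3C (World Wide Web Consortium).

The first Web Server

Technologies underlying the Web
Hypertext (HTML) links pieces of information to one another.
Uniform resource identifiers (URI) locate information precisely anywhere in the world.
A new application-layer protocol (HTTP) enables distributed information sharing.
All three features concern the distribution, retrieval, and use of information. Tim Berners-Lee said: "The Web is an abstract (imaginary) space of information." In other words, as an application architecture on the Internet, the Web's primary task is to provide people with information and information services.

Web growth
Number of Web sites: from 130 to 600,000 between 1993 and 1996.
Netcraft: "In the August 2008 survey we received responses from 176,748,506 sites."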
(135,166,473 sites one year before)

Browser wars
1993: Marc Andreessen writes Mosaic.
"The great thing about the Internet - the thing that catalyzed it in the first place and renews it every day - is that there are so many people able to use it, able to do a million different things. It's an open platform that anybody can develop and create applications for. A lot of people are able to apply their energy, and see it bear fruit."

Browser wars
1994: Marc Andreessen releases Netscape, which becomes the de facto standard of the time.
1995: Microsoft turns fully toward the Internet and releases Internet Explorer 1.0, followed by 2.0 three months later.
1997: IE 4.0 is released, introducing DHTML.
1998: Netscape releases its source code.
2004: Mozilla.org builds and releases Firefox on the Netscape source code. With more new features and better security than IE, it starts a new round of the browser wars.
Why did the Web browser become the focus of the fight?

Dot-com bubble
The technology-heavy NASDAQ Composite index peaked in March 2000, reflecting the high point of the dot-com bubble.
Background: free publishing and instant worldwide information; direct Web-based commerce
Internet-based companies founded between 1997 and 2001:
Rapidly rising stock prices led to stock speculation and reckless venture investment
Old business models were overturned; market share was pursued above all else
Example: kozmo.com, promised free one-hour delivery of anything fr
