[Decentralization] -- society needs progress

HTTP is obsolete: an introduction to IPFS

Author: Eureka

IPFS is not talked about much in the community. There is an article on InfoQ, http://www.infoq.com/cn/articles/ipfs/, titled "IPFS: a distributed web protocol to replace HTTP", that introduces it. Its content is largely drawn from the article "HTTP is obsolete. It’s time for the distributed, permanent web". Because that article covers the core ideas of IPFS, I am reposting it here:

Translated version

Neocities and IPFS

Earlier this year the Internet Archive put out a call for a distributed Web: a new Web that would be faster, more secure, more reliable, and more permanent. We were thrilled, and we are setting out right away on the journey toward the future of the Web. Through a collaboration with Protocol Labs, Neocities has become the first major site to use IPFS in production. All Neocities sites can now be viewed and archived from any IPFS node in the world; even if Neocities shuts down or takes a site offline, that site remains accessible. The more IPFS nodes host Neocities sites, the more reliable (and redundant) those sites become, and the less they depend on us as a central point. So what is IPFS? Here is what its README says:

IPFS (the InterPlanetary File System) is a peer-to-peer distributed file system that connects computing devices with the same system of files. In some ways the idea resembles the original vision of the Web, but in practice IPFS is more like a single BitTorrent swarm exchanging Git objects. IPFS has what it takes to become a new subsystem of the internet; built right, it could complement or even replace HTTP. That already sounds unbelievable, but it can do even more.
IPFS is still in the alpha stage of development and has not yet replaced existing site storage. As with any complex new technology, there is a lot to improve. But IPFS is not vaporware: it is running right now. You can set it up on your own computer and help serve Neocities sites to visitors; by joining in, you may well extend the life of a site. Even though IPFS is still immature, I want to make a bold claim: IPFS will replace HTTP (together with a few other components). Replacing HTTP sounds absurd, but we should recognize that HTTP is already worn out, and continuing to use it for the long term, or forever, would be the truly absurd thing. We should apply today's computer science to the problems of distribution and design a better protocol for the Web.

Part 1: The flaws of HTTP

HTTP (the Hypertext Transfer Protocol) unified the whole world around a single global information protocol and standardized how information is distributed and presented. HTTP dropped the cost of publishing to almost nothing, upending the control that economic, political, and cultural institutions held over the spread of information (music, ideas, video, news, games, and everything else). By making access to information more equal and publishing easier, HTTP has made our culture far more vibrant; it is hard to imagine what life would be like without it.
HTTP stands among the greatest inventions in history, and I will always love it. But because of its shortcomings in distribution and persistence, it has broken down in front of us again and again, and it is a poor permanent vehicle for the sum of human knowledge. The way HTTP distributes content is fundamentally flawed, and performance tuning, CA-issued SSL certificates, and other patches have not fixed it. HTTP/2 relieves some of the pain, but such a conservative upgrade cannot solve the underlying problem; it only underlines how old HTTP has become. We need a new foundational protocol to replace HTTP and deliver a better Web, and I very much hope IPFS becomes that replacement.

HTTP is brittle

![The fragile HTTP](/project/ipfs/img/first-web-server.jpg)

The picture above shows Tim Berners-Lee's NeXT computer at CERN, the world's first HTTP web server. A conspicuous note is taped to the case: "This machine is a server, do not power it down!" It could not be powered down because several other web servers were linking to it and depended on it to keep running. Once this machine went down or stopped responding, the links broke, the chain between sites was severed, and the content became unreachable. That is HTTP's biggest problem: it erodes. Tim's NeXT machine now sits in a museum, one of the first web servers to be retired.

The image below is a familiar scene on the Web:

![404](/project/ipfs/img/permanentweb404.png)

Even people who have never read the HTTP spec know what a 404 means. It is the HTTP error code indicating that the page is no longer on the server at that location. Seeing a 404 actually means you are somewhat lucky, because sometimes the server will not even show you that much. Worse, unless the Internet Archive made a backup, the page you wanted may be gone forever. The older a page is, the more likely it is to answer with a 404: a cold digital tombstone burying whatever knowledge, beauty, or foolishness once lived there.

![Mosh to Yanni](/project/ipfs/img/moshtoyanni.jpg)

Of the sites built after the 1990s, my favorite is Mosh to Yanni, though today it can only serve as a cautionary example of how inadequate HTTP is at keeping links between sites alive. The static content on the Mosh to Yanni home page still loads and the browser still renders it fine, but every off-site link and every piece of dynamically served content is gone. The reason is simple: centrally managed web servers inevitably shut down. The domain changes hands, or the machine crashes without a backup. Dropping central management and having every site owner run their own HTTP server would not help; if anything it would make things worse. There are countless examples like Mosh to Yanni, and a great deal of useful information has simply evaporated. Even when the lost content is absurd nonsense or outdated opinion, it is still part of our history, and it should not be thrown away so easily.

HTTP encourages hypercentralization

To avoid this constant erosion of data, people came to depend on larger, better-managed centralized servers, using plenty of redundant backups to gain reliability. In the short term this works well, but over the long run it breeds a whole new set of problems. Since the 1990s, in the spirit of John Perry Barlow's A Declaration of the Independence of Cyberspace, the online nation has kept flourishing, using information to influence and move the world. At the same time, however, governments, corporations, and other organizations have begun to exploit HTTP's weaknesses, spying on and monitoring people's online lives and blocking their access to information that threatens those organizations.

![Centralized, decentralized, and distributed networks](/project/ipfs/img/centralized-decentralized-distributed.jpg)

The Web people originally wanted to build was decentralized, but the Web we use today is increasingly concentrated in a small number of server centers. The consequence is that an agency like the NSA only needs to intercept the traffic of those centers to obtain most users' data. For a government, censoring content at the border routers is enough to cut off access to the central servers. The risk of communications being knocked out by DDoS attacks also rises sharply. A distributed Web, by contrast, reduces the interference of such authorities, restores people's freedom online, and lowers the risk posed by single points of failure.

HTTP is inefficient

As of this writing, the Gangnam Style video has been viewed more than 2,344,327,696 times. Assume the video weighs in at 117 MB; the video file alone has then generated about 274.3 PB of network traffic. If each gigabyte costs 1 cent (covering bandwidth and server storage), the total comes to about $2,742,860. In practice the cost is probably higher, since bandwidth rates start at $0.12 per GB and run to around $0.20 for users in Asia. For a company the size of Google this may not be a big deal, but for small and mid-sized companies it is an astronomical figure. A good part of my work at Neocities goes into fighting expensive bandwidth, using technical means to cut the cost of running our infrastructure. Although HTTP lowered the cost of publishing, it still takes a lot of money to run, and the bill keeps growing. Without economies of scale, pushing information out from centralized data centers is very expensive. If we could turn the personal computers on an ISP's network into CDN content providers in place of the data center, a hugely popular video like Gangnam Style could spread entirely inside the ISP's network without ever touching the internet backbone, cutting costs dramatically. IPFS can do exactly this, as we will see later.
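To make that arithmetic easy to check, here is a back-of-envelope calculation in Python; the view count, the 117 MB file size, and the $0.01 per GB rate are the article's stated assumptions, not measured figures.

```python
# Back-of-envelope cost of serving Gangnam Style over HTTP,
# using the figures assumed in the article above.
views = 2_344_327_696      # view count quoted in the article
size_mb = 117              # assumed size of the video file, in MB
cost_per_gb = 0.01         # assumed $0.01 per GB for bandwidth and servers

total_gb = views * size_mb / 1_000
total_pb = total_gb / 1_000_000

print(f"traffic: {total_pb:.1f} PB")                 # ~274.3 PB
print(f"cost:    ${total_gb * cost_per_gb:,.0f}")    # ~$2,742,863
```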

HTTP creates overdependence on the internet backbone

Because Web content is hypercentralized, the operation of data centers depends heavily on the internet backbone. Besides being subject to government censorship and blocking, there are plenty of reliability problems. Even with redundant backups, a damaged backbone or a scrambled routing table can still have serious consequences. I had a taste of this myself a few months ago, when a car crashed into a fiber uplink we use and all of Neocities slowed down at once. I have heard similar stories, such as hunters shooting through the fiber serving the eastern Oregon data centers, forcing the repair engineers to head out on tracked snow vehicles and cross-country skis. And just as I write this, a sophisticated network attack has taken place in the San Francisco Bay Area. My point is that the internet backbone is not entirely reliable: it is easy to attack, and once an important line goes down, a huge number of services are affected.

Part 2: How IPFS solves these problems

We have talked above about HTTP's problems (hypercentralization); now let's look at how IPFS solves them. IPFS fundamentally changes the way we look things up, and that is its most important feature. With HTTP we look up locations; with IPFS we look up content.

An example: a server hosts the file https://neocities.org/img/neocitieslogo.svg. Following HTTP, the browser first finds the server's location (its IP address) and then asks the server for the file by its path. In that model, the file's location is up to the server's operator, and users can only hope that the file has not been moved and that the server has not been shut down.

IPFS does not care where the central server is, and it does not consider the file's name or path, only what the content is. If I put neocitieslogo.svg onto an IPFS node, it gets a new name, QmXGTaGWTT1uUtfSb2sBAvArMEVLK4rQEcQg5bv7wwdzwU, a cryptographic hash computed from the file's contents. The hash reflects those contents directly: change even a single bit and the hash becomes completely different.

When IPFS is asked for a file hash, it uses a distributed hash table (DHT) to find the nodes holding the file, retrieves the data, and verifies it against the hash. Early distributed hash tables suffered from Sybil attacks, but newer designs address them, and I believe the problem is solvable.

IPFS is general-purpose infrastructure with essentially no storage limits. Large files are split into smaller blocks so they can be downloaded from many servers at once. The IPFS network is a loosely federated, finely grained, distributed network, which makes it a good fit for a content delivery network (CDN). This design shares all kinds of data well: images, video streams, distributed databases, entire operating systems, blockchains, backups of 8-inch floppy disks, and, most importantly for us, static websites.

IPFS files can also be wrapped in special IPFS directory objects, which give them human-readable filenames (transparently mapped to IPFS hashes) and serve a directory index on access, just as HTTP does. Building a site for IPFS works the same way as before, and adding it to an IPFS node takes a single command: ipfs add -r yoursitedirectory. Links between pages no longer need to be maintained by hand; IPFS's own lookup takes care of them.
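To make content addressing concrete, here is a minimal Python sketch of the idea: name data by a hash of its bytes, so that flipping a single bit yields a completely different name and any fetched copy can be verified against the name. It uses plain SHA-256 hex digests purely for illustration; real IPFS names are multihash-encoded CIDs produced by ipfs add, and large files are hashed block by block, so these values will not match actual IPFS hashes.

```python
import hashlib

def content_name(data: bytes) -> str:
    """Illustrative content address: a SHA-256 digest of the bytes themselves."""
    return hashlib.sha256(data).hexdigest()

logo = b"<svg>a stand-in for neocitieslogo.svg</svg>"
print(content_name(logo))

# Flip a single bit: the name changes completely, so a block fetched
# from any untrusted peer can always be verified against its name.
tampered = bytes([logo[0] ^ 0x01]) + logo[1:]
print(content_name(tampered))
assert content_name(logo) != content_name(tampered)
```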

Federating data with IPFS

IPFS does not require every node to store all content; each node's owner freely chooses which data to keep. It works like bookmarks: besides backing up your own site, you volunteer to serve other content you care about, except that these bookmarks never end up going stale. Copying, storing, and helping to serve a website from another IPFS node is easy. It takes just one command plus the site's hash, for example: ipfs pin add -r QmcKi2ae3uGb1kBg1yBpsuwoVqfmcByNdMiZ2pukxyLWD8. IPFS takes care of the rest.
If IPFS becomes widespread and the number of nodes reaches a certain scale, then even if each node stores only a little, the accumulated space, bandwidth, and reliability will far exceed what HTTP can provide. The distributed Web would then become the fastest, most reliable, and largest data store on Earth; human knowledge would never again be lost, and the Library of Alexandria would never burn down.

IPNS

An IPFS hash can only represent immutable data, because once the data changes the hash changes too. In a sense that is a good design for persistence, but we also need a way to mark the hash of the latest version of a site. That mechanism is called IPNS. Whereas an IPFS hash is derived from a file's contents, IPNS uses a private key to sign a reference to an IPFS hash, and that reference is addressed by a public key hash. If you have used Bitcoin this pattern will be familiar: a Bitcoin address is also a public key hash. On the Neocities IPFS node I signed the image of Penelope (our site mascot), which can be loaded with the node's IPNS public key hash: QmTodvhq9CUS9hH8rirt4YmihxJKZ5tYez8PtDmpWrVMKP. IPNS is not finished yet, so do not be discouraged if that link does not work. What the public key hash points to can change, while the public key hash itself never does. With IPNS in place, the problem of updating sites is solved.
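Here is a minimal Python sketch of the IPNS pattern described above, under simplifying assumptions: the mutable name is a hash of a public key, and the record it resolves to (the latest content hash) is signed with the matching private key. Real IPNS records, key formats, and signature schemes differ; the sketch assumes the third-party cryptography package is installed, and the content hashes in it are hypothetical placeholders.

```python
# pip install cryptography   (third-party package, assumed available)
import hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The site owner's keypair. The mutable name is a hash of the public key,
# so it stays the same even as the site's content hash keeps changing.
private_key = Ed25519PrivateKey.generate()
public_raw = private_key.public_key().public_bytes(
    serialization.Encoding.Raw, serialization.PublicFormat.Raw)
name = hashlib.sha256(public_raw).hexdigest()   # stand-in for an IPNS name

def publish(content_hash: str) -> dict:
    """Sign a pointer to the latest content hash (a toy 'IPNS record')."""
    return {"value": content_hash,
            "sig": private_key.sign(content_hash.encode())}

def resolve(record: dict) -> str:
    """Verify the record against the public key before trusting it."""
    private_key.public_key().verify(record["sig"], record["value"].encode())
    return record["value"]

record = publish("QmOldSiteVersionHash")   # hypothetical content hashes
record = publish("QmNewSiteVersionHash")   # the name stays fixed, the target moves
print(name, "->", resolve(record))
```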

Human-readable mutable addressing

Because IPFS/IPNS hashes are long, hard-to-remember strings, IPFS also works with the existing Domain Name System (DNS), so IPFS/IPNS content can be reached through readable links. You do this by creating a TXT record on your nameserver and inserting the site's hash into it (if you have a terminal handy, try: dig TXT ipfs.git.sexy). You can see the effect by visiting http://ipfs.io/ipns/ipfs.git.sexy/. Going forward, IPFS also plans to support Namecoin. In theory Namecoin completes the decentralization of the distributed Web, removing the need for any central authority anywhere in the chain. With Namecoin support, IPFS would need no ICANN and no central servers, would be free of political interference, and would not require certificate authorities. That sounds unbelievable, yet it is technology that can be built today.

The IPFS-HTTP gateway: a bridge between the old Web and the new

The IPFS implementation ships with an HTTP gateway, so existing browsers can already access IPFS; I used it in the examples above. There is no need to wait; you can start using IPFS today for storing, distributing, and serving websites.
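Because the gateway speaks ordinary HTTP, any HTTP client can fetch IPFS content through it. Here is a minimal sketch using Python's standard library, assuming the public ipfs.io gateway is reachable and the logo hash quoted earlier is still available from some node:

```python
import urllib.request

# Fetch a content-addressed file through a public IPFS-HTTP gateway.
# The gateway host and the hash are taken from the examples in the article.
cid = "QmXGTaGWTT1uUtfSb2sBAvArMEVLK4rQEcQg5bv7wwdzwU"
url = f"https://ipfs.io/ipfs/{cid}"

with urllib.request.urlopen(url, timeout=30) as resp:
    data = resp.read()

print(f"fetched {len(data)} bytes from {url}")
```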

How we're using IPFS now

The current IPFS implementation is experimental. Neocities will publish a new IPFS hash whenever a site is updated; the new hash points to the latest version of the site and can be accessed directly through the IPFS-HTTP gateway. Because the IPFS hash changes with every update, we automatically get a historical archive of older versions of each site, and that will be made available as well.

How we'll use IPNS in the future

If the project keeps going well over the long term, we will back all of our sites with IPFS and issue an IPNS hash for each site, so that users can publish content on their own without having to go through us. If we achieve that vision, users will still be able to update their sites even if Neocities no longer exists. Their dependence on a central server will be gone for good, and any plan to gather the world's websites under one roof will be permanently broken. That sounds wonderful. It is wonderful! But it is too early for such talk, and there is a great deal of practical work to do before IPFS can replace HTTP. The most urgent task right now is not to dream about the future but to get down to work and take up the Internet Archive's challenge: distribute the Web.


English original

HTTP is obsolete. It’s time for the distributed, permanent web

Early this year, the Internet Archive put out a call for a distributed web. We heard them loud and clear.

Today I’m making an announcement that begins our long journey to the future of the web. A web that is faster, more secure, more robust, and more permanent.

Neocities has collaborated with Protocol Labs to become the first major site to implement IPFS in production. Starting today, all Neocities web sites are available for viewing, archiving, and hosting by any IPFS node in the world. When another IPFS node chooses to host a site from Neocities, that version of the site will continue to be available, even if Neocities shuts down or stops hosting it. The more IPFS nodes seed Neocities sites, the more available (and redundant) Neocities sites become. And the less centrally dependent the sites are on us to continue existing.

What is IPFS? From their README:

IPFS is a distributed file system that seeks to connect all computing devices with the same system of files. In some ways, this is similar to the original aims of the Web, but IPFS is actually more similar to a single bittorrent swarm exchanging git objects.

IPFS could become a new major subsystem of the internet. If built right, it could complement or replace HTTP. It could complement or replace even more. It sounds crazy. It is crazy.

IPFS is still in the alpha stages of development, so we’re calling this an experiment for now. It hasn’t replaced our existing site storage (yet). Like with any complex new technology, there’s a lot of improvements to make. But IPFS isn’t vaporware, it works right now. You can try it out on your own computer, and already can use it to help us serve and persist Neocities sites.

The message I want to send couldn’t possibly be more audacious: I strongly believe IPFS is the replacement to HTTP (and many other things), and now’s the time to start trying it out. Replacing HTTP sounds crazy. It is crazy! But HTTP is broken, and the craziest thing we could possibly do is continue to use it forever. We need to apply state-of-the-art computer science to the distribution problem, and design a better protocol for the web.

Part 1: What's wrong with HTTP?

The Hypertext Transfer Protocol (HTTP) has unified the entire world into a single global information protocol, standardizing how we distribute and present information to each other.

It is inconceivable for me to even think about what life would be like without it. HTTP dropped the cost of publishing content to almost nothing, an innovation that took a sledgehammer to the top-down economic, political, and cultural control over distribution of information (music, ideas, video, news, games, everything). As a result of liquifying information and making the publication of it more egalitarian and accessible, HTTP has made almost everything about our culture better.

I love HTTP, and I always will. It truly stands among the greatest and most important inventions of all time.

But while HTTP has achieved many things, its usefulness as a foundation for the distribution and persistence of the sum of human knowledge isn’t just showing some cracks, it’s crumbling to pieces right in front of us. The way HTTP distributes content is fundamentally flawed, and no amount of performance tuneups or forcing broken CA SSL or whatever are going to fix that. HTTP/2 is a welcome improvement, but it’s a conservative update to a technology that’s beginning to show its age. To have a better future for the web, we need more than a spiced up version of HTTP, we need a new foundation. And per the governance model of cyberspace, that means we need a new protocol. IPFS, I’m strongly hoping, becomes that new protocol.

HTTP is brittle

This is a picture of the first HTTP web server in the world. It was Tim Berners-Lee’s NeXT computer at CERN.

Pasted on the machine is an ominous sticker: “This machine is a server, do not power it down!!”.

The reason it couldn’t be powered down is that web sites on other servers were starting to link to it. Once they linked to it, they then depended on that machine continuing to exist. If the machine was powered down, the links stopped working. If the machine failed or was no longer accessible at the same location, a far worse thing happened: the chain between sites becomes permanently broken, and the ability to access that content is lost forever. That sticker perfectly highlights the biggest problem with HTTP: it erodes.

Tim’s NeXT cube is now a museum piece. The first of millions of future dead web servers.

You’ve seen the result:

Even if you’ve never read the HTTP spec, you probably know what 404 means. It’s the error code used by HTTP to indicate that the site is no longer on the server at that location. Usually you’re not even that lucky. More often, there isn’t even a server there anymore to tell you that the content you’re looking for is gone, and it has no way to help you find it. And unless the Internet Archive backed it up, you’ll never find it again. It becomes lost, forever.

The older a web page is, the more likely it is you’ll see 404 pages. They’re the cold-hearted digital tombstones of a dying web, betraying nothing about what knowledge, beauty, or irreverent stupidity may have once resided there.

One of my favorite sites from the 90s web was Mosh to Yanni, and viewing the site today gives a very strong example of how inadequate HTTP is for maintaining links between sites. All the static content stored with the site still loads, and my modern browser still renders the page (HTML, unlike HTTP, has excellent lasting power). But any links offsite or to dynamically served content are dead. For every weird example like this, there are countless examples of incredibly useful content that have also long since vanished. Whether eroding content is questionable crap or timelessly useful, it’s still our history, and we’re losing it fast.

The reason this happens is simple: centrally managed web servers inevitably shut down. The domain changes ownership, or the company that ran it goes out of business. Or the computer crashes, without having a backup to restore the content with. Having everyone run their own personal HTTP server doesn’t solve this. If anything, it probably makes it worse.


HTTP encourages hypercentralization

The result of this erosion of data has been further dependence on larger, more organized centralized services. Their short-term availability tends to be (mostly) good due to redundant backups. But this still doesn’t address long-term availability, and creates a whole new set of problems.

We’ve come a long way since John Perry Barlow’s A Declaration of the Independence of Cyberspace. As our electronic country becomes more influential and facilitates the world with more information, governments and corporations alike have started to pry into HTTP’s flaws, using them to spy on us, monetize us, and block our access to any content that represents a threat to them, legitimate or otherwise.

The web we were intended to have was decentralized, but the web we have today is very quickly becoming centralized, as billions of users become dependent on a small handful of services.

Regardless of whether you think this is a legitimate tradeoff, this was not how HTTP was intended to be used. Organizations like the NSA (and our future robot overlords) now only have to intercept our communications at a few sources to spy on us. It makes it easy for governments to censor content at their borders by blocking the ability for sites to access these highly centralized resources. It also puts our communications at risk of being interrupted by DDoS attacks.

Distributing the web would make it less malleable by a small handful of powerful organizations, and that improves both our freedom and our independence. It also reduces the risk of the “one giant shutdown” that takes a massive amount of data with it.

HTTP is inefficient

As of this writing, Gangnam Style now has over 2,344,327,696 views. Go ahead, watch it again. I’ll wait for you.

Let’s make some assumptions. The video clocks in at 117 Megabytes. That means (at most) 274,286,340,432 Megabytes, or 274.3 Petabytes of data for the video file alone has been sent since this was published. If we assume a total expense of 1 cent per gigabyte (this would include bandwidth and all of the server costs), $2,742,860 has been spent on distributing this one file so far.

That’s not too bad… if you’re Google. But if you’re a smaller site, the cost to serve this much data would be astronomical, especially when bandwidth rates for small players start around $0.12 per gigabyte and go as high as $0.20 in Asia. I’ve spent the better part of my work at Neocities battling expensive bandwidth to ensure we can keep running our infrastructure at low cost.

HTTP lowered the price of publishing, but it still costs money, and these costs can really add up. Distributing this much data from central datacenters is potentially very expensive if not done at economies of scale.

What if, instead of always serving this content from datacenters, we could turn every computer on an ISP’s network into a streaming CDN? With a video as popular as Gangnam Style, it could even be completely downloaded from within an ISP’s network, not requiring numerous hops over the internet backbone. This is one of the many things IPFS is capable of improving (we’ll discuss this in a bit).

HTTP creates overdependence on the Internet backbone

When content is hypercentralized, it makes us highly dependent on the internet backbones to the datacenters functioning. Aside from making it easy for governments to block and censor content, there are also reliability problems. Even with redundancies, major backbones sometimes get damaged, or routing tables go haywire, and the consequences can be drastic. I got a weird taste of that a few months ago, when Neocities slowed down after a car crashed into a fiber uplink we use in Canada (no suspects yet, but a few promising leads). I've also heard stories where hunters have shot at the fiber cables connecting the eastern Oregon datacenters (the enormous ones that store a lot of data), requiring engineers to show up on snowmobiles with cross country skis to repair the fiber lines. Since I wrote this post, details have emerged on a sophisticated attack on fiber lines happening in the Bay Area. The point is, the internet backbone isn't perfect, it's easy to attack it, and it's easy for service to get affected by a few important fiber lines getting cut.

Part 2: How IPFS solves these problems

We’ve discussed HTTP’s problems (and the problems of hypercentralization). Now let’s talk about IPFS, and how it can help improve the web.

IPFS fundamentally changes the way we look for things, and this is its key feature. With HTTP, you search for locations. With IPFS, you search for content.

Let me show you an example. This is a file on a server I run: https://neocities.org/img/neocitieslogo.svg. Your browser first finds the location (IP address) of the server, then asks my server for the file using the path name. With that design, only the owner (me) can determine that this is the file you’re looking for, and you are forced to trust that I don’t change it on you by moving the file, or shutting the server down.

Instead of looking for a centrally-controlled location and asking it what it thinks /img/neocitieslogo.svg is, what if we instead asked a distributed network of millions of computers not for the name of a file, but for the content that is supposed to be in the file?

This is precisely what IPFS does.

When neocitieslogo.svg is added to my IPFS node, it gets a new name: QmXGTaGWTT1uUtfSb2sBAvArMEVLK4rQEcQg5bv7wwdzwU. That name is actually a cryptographic hash, which has been computed from the contents of that file. That hash is guaranteed by cryptography to always only represent the contents of that file. If I change that file by even one bit, the hash will become something completely different.

When I ask the IPFS distributed network for that hash, it efficiently (20 hops for a network of 10,000,000) finds the nodes that have the data using a Distributed Hash Table, retrieves it, and verifies using the hash that it’s the correct data. Early DHT designs had issues with Sybil attacks, but we have new ways to address them, and I’m very confident this is a solvable problem (unlike the problems with HTTP, which are just going to be broken forever).
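The "20 hops for a network of 10,000,000" figure is what you would expect from the logarithmic routing that Kademlia-style DHTs aim for, where each hop roughly halves the remaining distance in the keyspace. A quick Python estimate of that scaling follows; the base-2 logarithm is an idealization, and real lookups add constant factors for redundancy and churn.

```python
import math

def dht_hops(nodes: int) -> float:
    """Idealized Kademlia-style lookup cost: roughly log2(n) hops."""
    return math.log2(nodes)

for n in (10_000, 10_000_000, 1_000_000_000):
    print(f"{n:>13,} nodes -> ~{dht_hops(n):.0f} hops")
# 10,000,000 nodes works out to ~23 hops, the same order of
# magnitude as the figure quoted above.
```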

IPFS is general purpose, and has little in the way of storage limitations. It can serve files that are large or small. It automatically breaks up larger files into smaller chunks, allowing IPFS nodes to download (or stream) files from not just one server like with HTTP, but hundreds of them simultaneously. The IPFS network becomes a finely-grained, trustless, distributed, easily federated Content Delivery Network (CDN). This is useful for pretty much everything involving data: images, video streaming, distributed databases, entire operating systems, blockchains, backups of 8 inch floppy disks, and most important for us, static web sites.
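A rough Python sketch of the chunking idea: split a large file into fixed-size blocks, address each block by a hash of its bytes, and address the whole file by a hash over the ordered block addresses, so blocks can be fetched from different peers in parallel and verified independently. The 256 KiB block size and the flat SHA-256 hashing here are illustrative assumptions; the real IPFS chunker and its Merkle DAG format differ.

```python
import hashlib

CHUNK_SIZE = 256 * 1024   # illustrative block size, not the actual IPFS chunker

def chunk_hashes(data: bytes):
    """Hash each fixed-size block so peers can serve and verify them independently."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]

def root_hash(block_hashes):
    """Address the whole file by a hash over its ordered block addresses."""
    return hashlib.sha256("".join(block_hashes).encode()).hexdigest()

video = b"\x00" * (117 * 1024 * 1024)   # stand-in for a 117 MB video file
blocks = chunk_hashes(video)
print(len(blocks), "blocks, root:", root_hash(blocks)[:16], "...")
```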

IPFS files can also be special IPFS directory objects, which allow you to use human readable filenames (which transparently link to other IPFS hashes). You can load the directory’s index.html by default, the same way a standard HTTP server does. Using directory objects, IPFS allows you to make static web sites exactly the same way you make them today. It’s a single command to add your web site to an IPFS node: ipfs add -r yoursitedirectory. After that, it’s available from any IPFS node without requiring you to link to any hashes in the HTML (example, and example with index.html renamed).

Federating data with IPFS

IPFS doesn’t require every node to store all of the content that has ever been published to IPFS. Instead, you choose what data you want to help persist. Think of it like bookmarks, except instead of bookmarking a link to a site that will eventually fail, you back up the entire site for yourself, and volunteer to help to serve the content to others that want to look at it.

If a lot of nodes host a little bit, these little bits quickly add up to more space, bandwidth and availability than any centralized HTTP service could ever possibly provide. The distributed web will quickly become the fastest, most available, and largest store of data on the planet earth. And nobody will have the power to “burn books” by turning it all off. This Library of Alexandria is never going to burn down.

Copying, storing and helping serve web sites from other IPFS nodes is easy. It just takes a single command and the hash of the site: ipfs pin add -r QmcKi2ae3uGb1kBg1yBpsuwoVqfmcByNdMiZ2pukxyLWD8. IPFS takes care of the rest.

IPNS

IPFS hashes represent immutable data, which means they cannot be changed without the hash being different. This is a good thing because it encourages data persistence, but we still need a way to find the latest IPFS hash representing your site. IPFS accomplishes this using a special feature called IPNS.

IPNS allows you to use a private key to sign a reference to the IPFS hash representing the latest version of your site using a public key hash (pubkeyhash for short). If you’ve used Bitcoin before, you’re familiar with this - a Bitcoin address is also a pubkeyhash. With our Neocities IPFS node, I signed the image of Penelope (our site mascot) and you can load it using our IPNS pubkeyhash for that node: QmTodvhq9CUS9hH8rirt4YmihxJKZ5tYez8PtDmpWrVMKP.

IPNS isn’t done yet, so if that link doesn’t work, don’t fret. Just know that I will be able to change what that pubkeyhash points to, but the pubkeyhash will always remain the same. When it’s done, it will solve the site updating problem.

Now we just need to make the location of these sites human-readable, and we’ve got all the pieces we need.

Human-readable mutable addressing

IPFS/IPNS hashes are big, ugly strings that aren’t easy to memorize. So IPFS allows you to use the existing Domain Name System (DNS) to provide human-readable links to IPFS/IPNS content. It does this by allowing you to insert the hash into a TXT record on your nameserver (if you have a command line handy, run this: dig TXT ipfs.git.sexy). You can see this in action by visiting http://ipfs.io/ipns/ipfs.git.sexy/.
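Here is a small sketch of that DNS step, shelling out to dig from Python just as the article suggests doing by hand. It assumes dig is installed and the TXT record still exists; the exact record format has changed over time (later IPFS versions look for a dnslink= value), so treat the parsing as illustrative.

```python
import subprocess

# Ask DNS for the TXT record that maps a human-readable name to IPFS content,
# mirroring the article's `dig TXT ipfs.git.sexy` example.
result = subprocess.run(
    ["dig", "+short", "TXT", "ipfs.git.sexy"],
    capture_output=True, text=True, check=True)

for line in result.stdout.splitlines():
    record = line.strip().strip('"')
    print("TXT record:", record)
    # A gateway URL can then be built from the record's content path,
    # e.g. https://ipfs.io/ipfs/<hash> for a dnslink=/ipfs/<hash> style value.
```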

Going forward, IPFS has plans to also support Namecoin, which could theoretically be used to create a completely decentralized, distributed web that has no requirements for a central authority in the entire chain. No ICANN, no central servers, no politics, no expensive certificate “authorities”, and no choke points. It sounds crazy. It is crazy. And yet, it’s completely possible with today’s technology!

IPFS HTTP gateway: The bridge between the old web and the new

The IPFS implementation ships with an HTTP gateway I’ve been using to show examples, allowing current web browsers to access IPFS until the browsers implement IPFS directly (too early? I don’t care). With the IPFS HTTP gateway (and a little nginx shoe polish), we don’t have to wait. We can soon start switching over to IPFS for storing, distributing, and serving web sites.

How we’re using IPFS now

Our initial implementation of IPFS is experimental and modest, for now. Neocities will be publishing an IPFS hash once per day when sites are updated, accessible from every site profile. This hash will point to the latest version of the site, and be accessible via our IPFS HTTP gateway. Because the IPFS hash changes for each update, this also enables us to provide an archive history for all the sites, something we automagically just get from the way that IPFS works anyways.

How we’ll use IPNS in the future

Long-term, if things go well, we want to use IPFS for storing all of our sites, and issue IPNS keys for each site. This would enable users to publish content to their site independently of us. If we do it right, even if Neocities doesn’t exist anymore, our users can still update their sites. We effectively take our user’s central dependence on our servers and smash it to pieces, permanently ruining our plans for centralized world domination forever. It sounds awesome. It is awesome!

It’s still early, and there’s much work to do before IPFS can replace HTTP without needing to describe the idea as crazy. But there’s no time like the present to plan for the future. It’s time for us to get to work. Accept the Internet Archive’s challenge: distribute the web.

