
IPFS: A Content-Addressed, Versioned, Peer-to-Peer File System (1)

2023-05-10 14:56:27

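This article is a translation based on "IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3)". The translation mainly follows the IPFS white paper, with adjustments based on the translator's own understanding.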

Author: Juan Benet (juan@benet.ai)

Abstract

The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git repository. In other words, IPFS provides a high-throughput content-addressed block storage model, with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web. IPFS combines a distributed hashtable, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.

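The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that aims to connect all computing devices to the same system of files. In some respects IPFS resembles the Web, but it can also be viewed as a single BitTorrent swarm exchanging objects within one Git repository. Put another way, IPFS offers a high-throughput, content-addressed block storage model with content-addressed hyperlinks. These form a generalized Merkle directed acyclic graph (Merkle DAG), a data structure on which versioned file systems, blockchains, and even a Permanent Web can be built. IPFS combines a distributed hash table, an incentivized block exchange, and a self-certifying namespace. It has no single point of failure, and its nodes do not need to trust one another.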
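As a rough illustration of the "content-addressed block storage" idea in the abstract, the following is a minimal sketch, not IPFS's actual implementation. The `BlockStore` class, its `put`/`get` methods, and the use of a hex-encoded SHA-256 digest as the address are all assumptions made for this example; the real system's hash format and interfaces may differ.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed block store (hypothetical, for illustration)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself (assumed: SHA-256, hex-encoded).
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # A reader can re-hash the bytes to check them, so the store need not be trusted.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its address")
        return data


store = BlockStore()
addr_a = store.put(b"hello ipfs")
addr_b = store.put(b"hello ipfs")  # identical bytes map to the same address
```

Because the address depends only on the bytes, storing the same content twice yields one entry, and any retrieved block can be verified against the address that was requested.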

1 Introduction

There have been many attempts at constructing a global distributed file system. Some systems have seen significant success, and others failed completely. Among the academic attempts, AFS [6] has succeeded widely and is still in use today. Others [7, ?] have not attained the same success. Outside of academia, the most successful systems have been peer-to-peer file-sharing applications primarily geared toward large media (audio and video). Most notably, Napster, KaZaA, and BitTorrent [2] deployed large file distribution systems supporting over 100 million simultaneous users. Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily [16]. These applications saw greater numbers of users and files distributed than their academic file system counterparts. However, the applications were not designed as infrastructure to be built upon. While there have been successful repurposings[^1], no general file-system has emerged that offers global, low-latency, and decentralized distribution.

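Many attempts have been made to build a global distributed file system. Some achieved significant success, while others failed outright. Among the academic efforts, AFS [6] succeeded widely and remains in use today; others [7, ?] did not fare as well. Outside academia, the most successful systems have been peer-to-peer file-sharing applications aimed mainly at large media (audio and video). Most notably, Napster, KaZaA, and BitTorrent [2] deployed large file distribution systems that supported more than 100 million simultaneous users. Even today, BitTorrent sustains a deployment in which tens of millions of nodes churn every day [16]. These applications reached more users and distributed more files than their academic counterparts. They were not, however, designed as infrastructure to build on. Although some have been repurposed successfully, no general-purpose file system has emerged that offers global, low-latency, decentralized distribution.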

Perhaps this is because a “good enough” system for most use cases already exists: HTTP. By far, HTTP is the most successful “distributed system of files” ever deployed. Coupled with the browser, HTTP has had enormous technical and social impact. It has become the de facto way to transmit files across the internet. Yet, it fails to take advantage of dozens of brilliant file distribution techniques invented in the last fifteen years. From one perspective, evolving Web infrastructure is near-impossible, given the number of backwards compatibility constraints and the number of strong parties invested in the current model. But from another perspective, new protocols have emerged and gained wide use since the emergence of HTTP. What is lacking is upgrading design: enhancing the current HTTP web, and introducing new functionality without degrading user experience.

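Perhaps the reason is that a system “good enough” for most use cases already exists: HTTP. HTTP is by far the most successful “distributed system of files” ever deployed. Combined with the browser, it has had enormous technical and social impact and has become the de facto standard for transferring files across the internet. Yet it makes no use of the dozens of advanced file distribution techniques invented over the past fifteen years. From one perspective, evolving the Web's infrastructure is nearly impossible, given the number of backward-compatibility constraints and the number of powerful parties invested in the current model. From another perspective, new protocols have appeared and become widely used since HTTP emerged. What is missing is upgrade design: enhancing the current HTTP web and introducing new functionality without degrading the user experience.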

Industry has gotten away with using HTTP this long because moving small files around is relatively cheap, even for small organizations with lots of traffic. But we are entering a new era of data distribution with new challenges: (a) hosting and distributing petabyte datasets, (b) computing on large data across organizations, (c) high-volume high-definition on-demand or real-time media streams, (d) versioning and linking of massive datasets, (e) preventing accidental disappearance of important files, and more. Many of these can be boiled down to “lots of data, accessible everywhere.” Pressed by critical features and bandwidth concerns, we have already given up HTTP for different data distribution protocols. The next step is making them part of the Web itself.

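Industry has relied on HTTP for so long because moving small files is relatively cheap, even for small organizations with heavy traffic. But we are entering a new era of data distribution that brings new challenges: (a) hosting and distributing petabyte-scale datasets, (b) computing on large data across organizations, (c) high-volume, high-definition on-demand or real-time media streams, (d) versioning and linking massive datasets, (e) preventing the accidental loss of important files, and more. Many of these challenges reduce to “lots of data, accessible everywhere.” Driven by critical features and bandwidth concerns, we have already abandoned HTTP in favor of other data distribution protocols. The next step is to make those protocols part of the Web itself.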

Orthogonal to efficient data distribution, version control systems have managed to develop important data collaboration workflows. Git, the distributed source code version control system, developed many useful ways to model and implement distributed data operations. The Git toolchain offers versatile versioning functionality that large file distribution systems severely lack. New solutions inspired by Git are emerging, such as Camlistore [?], a personal file storage system, and Dat [?], a data collaboration toolchain and dataset package manager. Git has already influenced distributed filesystem design [9], as its content-addressed Merkle DAG data model enables powerful file distribution strategies. What remains to be explored is how this data structure can influence the design of high-throughput oriented file systems, and how it might upgrade the Web itself.

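Independently of efficient data distribution, version control systems have developed important workflows for collaborating on data. Git, the distributed source code version control system, devised many useful ways to model and implement distributed data operations. The Git toolchain provides versatile versioning functionality that large file distribution systems sorely lack. New solutions inspired by Git are emerging, such as Camlistore [?], a personal file storage system, and Dat [?], a data collaboration toolchain and dataset package manager. Git has already influenced distributed file system design [9], because its content-addressed Merkle DAG data model enables powerful file distribution strategies. What remains to be explored is how this data structure can shape the design of high-throughput file systems, and how it might upgrade the Web itself.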

This paper introduces IPFS, a novel peer-to-peer version-controlled filesystem seeking to reconcile these issues. IPFS synthesizes learnings from many past successful systems. Careful interface-focused integration yields a system greater than the sum of its parts. The central IPFS principle is modeling all data as part of the same Merkle DAG.

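This paper introduces IPFS, a novel peer-to-peer, version-controlled file system that seeks to reconcile these issues. IPFS synthesizes lessons from many successful past systems. Careful, interface-focused integration yields a system greater than the sum of its parts. The central principle of IPFS is to model all data as part of the same Merkle DAG.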
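To make the Merkle DAG principle more concrete, here is a small sketch of how files and a directory might be modeled as hash-linked nodes, loosely in the spirit of Git's object model. The `make_node` helper, the JSON serialization, and the SHA-256 hex addresses are assumptions chosen for this example and are not taken from the paper.

```python
import hashlib
import json


def make_node(data: str, links: dict) -> tuple:
    """Return (address, serialized_bytes) for a node with named links to child addresses."""
    # Sorted keys give a canonical form, so equal nodes always hash to the same address.
    serialized = json.dumps({"data": data, "links": links}, sort_keys=True).encode()
    return hashlib.sha256(serialized).hexdigest(), serialized


# Two leaf nodes (file contents) and a directory node that links to them by hash.
readme_v1, _ = make_node("first draft", {})
notes, _ = make_node("meeting notes", {})
root_v1, _ = make_node("", {"README": readme_v1, "notes": notes})

# Editing one file produces a new leaf and a new root; the untouched leaf is reused.
readme_v2, _ = make_node("second draft", {})
root_v2, _ = make_node("", {"README": readme_v2, "notes": notes})
```

Since a parent's address covers the addresses of its children, the two roots differ even though they share the unchanged `notes` node. Each root therefore names one complete version of the tree, which is the property that versioning can be built on.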

