
Editor's Note: This article comes from ChainNews (ID: chainnewscom). Author: Li Shuomiao Frank, Vice President of ParallelVC Investment. Published with authorization.
The development of the Internet has gone through two stages: Web1.0 and Web2.0. Websites in the Web1.0 stage were mostly static. There was no interaction between users and the information on the network: users could only read what was displayed, and obtaining information was neither very effective nor very efficient.
As network speeds improved and bandwidth increased, people gradually began to interact with the Internet. In 2003, Dale Dougherty, vice president of O'Reilly Media, proposed the concept of Web2.0.
Web2.0 is called the readable and writable web. In the early days of Web2.0, unlike in Web1.0, every Internet user could create content and upload it to the Internet instead of only obtaining information from it, which greatly enriched the information available online. With the further development of technologies such as AI and big data in recent years, human-computer interaction has reached a new stage. Online behavior data that users generate, such as browsing records, clicks, and searches, is captured and recorded. By combining a user's real-time data with past information, the back end can build a more accurate user profile and recommend products or information based on it. This improves the merchant's purchase conversion rate and, at the same time, lets users find the products they may want to buy more quickly, so the user experience improves.
However, while the centralization of information is convenient, it also has a major disadvantage: all of a user's data is collected and used by the platform without the user being aware of it, and even the ownership of the data is unclear.
In the early days of Web1.0 and Web2.0, the volume of user data was small and it covered few dimensions, so users' personal data could not generate much value. However, as people have used the Internet more and more in recent years, the value of personal information online has become significant. In the past two years, there have been cases in various countries of Internet companies violating personal data privacy and of user data being stolen. In the future, with the development of the Artificial Intelligence of Things (AIoT) and 5G networks, personal network data will cover more dimensions and be more valuable, making data security and data privacy even more important.
Web3.0 has emerged to solve the problems that Web2.0 currently faces. Blockchain networks meet the needs of Web3.0's underlying technology because they are trustless, tamper-resistant, and able to confirm ownership. At the same time, because of the change in network architecture, data is no longer just a number but a commodity with value, which is slowly transforming our existing data network into a value network.
Web3.0 needs decentralized storage
As mentioned above, building a decentralized network that ensures data security and privacy requires that data be controlled only by its owner (the party that generated it), which includes accessing the data and authorizing others to use it. Unfortunately, the storage solutions on today's Internet are all centralized, and the main storage providers are central institutions such as Amazon, Alibaba Cloud, and Google Cloud. At the same time, the personal data that users generate online is now held by various platforms and apps. Even if ownership of personal data is returned to users in the future, with centralized storage users cannot be sure that their data will not be retained by the app platform, or even used or modified by the storage provider.
In addition, 5G and AIoT cutting-edge technologies are developing rapidly. The dimension of personal data in the future network will further increase, and the value of data will also increase rapidly. Only the use of decentralized storage can further ensure data security and privacy.
If there is no decentralized storage technology as the underlying technical support of the decentralized network, even if decentralized information transmission and decentralized computing are realized, the decentralization of data cannot be truly guaranteed. Therefore, decentralized storage will be an essential technical component in the future Web3.0 ecology.
Current status of traditional cloud storage
At present, the traditional cloud storage field can be divided into three forms: public cloud, private cloud and hybrid cloud. Public cloud is currently the most common form of cloud service. Public cloud is owned and managed by technology providers and serves multiple clients. At the same time, public cloud can be divided into three technical forms: SaaS, PaaS and IaaS.
SaaS stands for "Software as a Service". This type of service provides applications to users as a service. The main customers are individual and home users and small and medium-sized enterprises. The main service providers in China are Baidu Netdisk and Tencent Weiyun; abroad, they are mainly Dropbox, iCloud, and so on.
PaaS stands for "Platform as a Service". This type of service provides a development platform to users as a service, and the main customers are small and medium-sized enterprises and individual developers. The leading PaaS platform in China is Baidu Cloud, while abroad the main ones are Google App Engine and Red Hat's OpenShift.
IaaS stands for "Infrastructure as a Service". It provides cloud computing resources such as virtual machines and cloud storage components to users as a service over the network. Leading companies in China include Alibaba Cloud and Huawei Cloud, while abroad such services are mainly provided by Amazon EC2.
Unlike public clouds, private clouds and hybrid clouds mostly serve large enterprise users. A private cloud is used and maintained by an enterprise or organization, and users have more control over personalization, while a hybrid cloud is a solution that mixes and matches public clouds and private clouds to achieve relatively high cost performance.
First, consider how individual users in China use cloud storage. According to Aurora's MAU share data for 2019, the top four products are Baidu Netdisk, Tencent Weiyun, Caiyun, and Tianyi Cloud Disk.
Among them, Baidu Netdisk's share of active users reached 82.9%. Looking further at the interest TGI of Baidu Netdisk users, it can be concluded that the data users mainly store is content related to movies, animation, and sports.
At the enterprise level, the data shows that 39% of enterprises currently use cloud storage services (such as AWS, Azure, and Google Cloud), and forecasts suggest that by 2022 this figure will reach about 60%. Among the three major cloud storage companies in the United States, Microsoft Azure has a market share of 44%, AWS about 32%, and Google Cloud only 19%. In the Spiceworks 2019 Public Cloud Report, Azure was also rated the most competitive cloud service provider. According to the same survey, 79% of business owners mainly use Windows in their business, and Microsoft's Azure cloud storage service offers a better product experience when paired with Windows.
The current cloud storage service architecture is relatively mature, and users can choose public cloud, private cloud, or hybrid cloud services according to their needs. Within public cloud services, users can choose different storage tiers based on how frequently the data is accessed, keeping costs as low as possible without affecting usage.
In contrast to cloud storage, nearly 81% of enterprises still use a traditional storage matrix, that is, hardware storage. Dell EMC currently has the highest market share, at 42%. According to the survey, 80% of enterprises choose a traditional storage matrix because of its higher reliability, and they also believe that traditional hardware storage offers better privacy and security than cloud storage.
At the technical level, traditional cloud storage has gone through three stages of iteration. The first stage was a storage architecture based on NAS (Network Attached Storage) and SAN (Storage Area Network). This architecture formed a preliminary cloud storage solution, but it was difficult to call on different storage clusters flexibly when the servers were geographically dispersed, so capacity and performance were hard to scale.
The second stage began with the emergence of the EMC VPLEX architecture. Building on the original storage architecture, VPLEX added virtual storage and heterogeneous storage, removed the geographical restrictions on storage hardware, and solved the problem of cross-cluster operation. By integrating intelligent distributed caches, it enabled access and data sharing across hosts, clusters, and data centers, which greatly improved the capacity and scalability of cloud storage. The services it provided were similar to those offered by IaaS today.
The third stage is the current stage of cloud storage. The current underlying architecture of cloud storage is a cloud-like structure. Multiple storage devices are connected to each other, and hardware devices are intelligently switched according to different needs. Great improvements have been made in terms of multi-copy consistency, disaster recovery, and elastic expansion. The current cloud storage SaaS and PaaS platforms can provide users with easy-to-use storage services.
The Development of Decentralized Storage
In current cloud storage infrastructure, distributed architecture has become standard because of its high performance, multi-copy consistency, and support for tiered storage. Decentralized storage can in fact be classified as a type of distributed storage.
Compared with other distributed storage solutions, the data center in the decentralized storage architecture will not be completely owned by the cloud provider, but will be jointly provided by multiple nodes in the network, and the data will be stored in an encrypted manner. In this way, the privacy and security issues of data are better solved, and even the data center or storage hardware provider cannot obtain the stored data.
In decentralized storage today, the most representative projects are IPFS and Filecoin. IPFS stands for "InterPlanetary File System", and its Chinese name translates roughly as "Interstellar File System". IPFS is an underlying network transmission protocol, comparable to HTTP (Hypertext Transfer Protocol) on the current Internet. HTTP is a relatively simple request-response protocol for interaction between users and servers.
IPFS serves a similar function to HTTP but adds the architectural features of a p2p network. According to its proponents, IPFS is more efficient than HTTP: with HTTP, a file is fetched from a single server, whereas IPFS downloads pieces from multiple peers in parallel, which is said to save more than 50% of bandwidth costs. In addition, because servers on today's Internet are centralized, information on the network can be completely controlled and is difficult to preserve. With a decentralized protocol such as IPFS, as long as any user in the network holds a piece of information, the entire network can obtain it.
In the decentralized storage ecosystem, IPFS mainly plays the role of the underlying technical protocol, and specific business-side solutions are implemented by layer 2 projects. At present, Filecoin, the incentive layer built on the IPFS protocol by the IPFS team, is the most likely to launch first. The project has opened a test network and will open its main network this year at the earliest.
In addition to IPFS, many other projects are trying to implement decentralized storage, but they have made no substantial progress so far, so they are not covered in detail here.
Understand Filecoin
Filecoin is a decentralized storage project based on the IPFS protocol. Through the protocol, it establishes a bridge between users and storage service providers to provide users with decentralized storage services.
Participants in the Filecoin network fall into three groups: storage miners, retrieval miners, and users. Storage miners provide storage space for the decentralized storage network and pledge a portion of tokens as collateral to deter misbehavior. Storage miners must prove to the network, within a given period, that they are storing the user's data. If they cannot, a certain percentage of the pledged tokens is deducted as a penalty.
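The pledge-and-penalty rule described above can be illustrated with a minimal Python sketch. The class `StorageMiner`, the method `settle_proof_window`, and the 10% penalty rate are hypothetical choices made for illustration; the article only says that "a certain percentage" of the pledged tokens is deducted, and the actual protocol rules may differ.

```python
from dataclasses import dataclass

# Hypothetical penalty rate; the article only says "a certain percentage".
PENALTY_RATE = 0.10


@dataclass
class StorageMiner:
    """Toy model of a storage miner with pledged collateral (illustrative only)."""
    name: str
    collateral: float  # tokens pledged to the network

    def settle_proof_window(self, proof_ok: bool) -> float:
        """Deduct a share of the collateral if the storage proof was missed.

        Returns the number of tokens deducted in this window.
        """
        if proof_ok:
            return 0.0
        penalty = self.collateral * PENALTY_RATE
        self.collateral -= penalty
        return penalty


miner = StorageMiner(name="miner-a", collateral=1000.0)
miner.settle_proof_window(proof_ok=True)             # proof submitted, nothing deducted
slashed = miner.settle_proof_window(proof_ok=False)  # proof missed, collateral reduced
print(slashed, miner.collateral)
```

Because the penalty is a share of the remaining collateral, repeated failures in this toy model shrink the pledge geometrically rather than by a fixed amount.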
Retrieval miners are mainly responsible for providing data retrieval services to users. When a user initiates a request, a retrieval miner finds the corresponding data in the network and sends it to the user. The network places no restriction on which role a miner takes: a miner can be a storage miner and a retrieval miner at the same time, or take only one of the two roles.
Users also have considerable flexibility when using Filecoin. For one thing, users can use the Filecoin network as a network disk. A user only needs to specify the data to be stored and a bid price, and the system matches user bids against miners' asking prices. When the prices agree, the system sends the user's data to the storage miner.
The storage miner places the data into sectors (storage units) according to its size, then seals and slices the sectors, and the transaction information is stored in a hash table in the network. For the duration of the order, the network keeps challenging the miner to check whether it still holds the file. This completes the storage process.
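The matching step described above can be sketched as follows. This is only an illustration under stated assumptions: the `Bid` and `Ask` types, the `match_order` function, and the rule "pick the cheapest ask that does not exceed the bid and has enough free space" are hypothetical and are not taken from the Filecoin specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ask:
    miner: str
    price: float      # price the miner asks per storage order
    free_space: int   # space the miner still offers, in bytes


@dataclass
class Bid:
    user: str
    price: float      # highest price the user is willing to pay
    size: int         # size of the data to store, in bytes


def match_order(bid: Bid, asks: List[Ask]) -> Optional[Ask]:
    """Return the cheapest ask that fits the bid, or None if nothing matches."""
    candidates = [a for a in asks
                  if a.price <= bid.price and a.free_space >= bid.size]
    return min(candidates, key=lambda a: a.price, default=None)


asks = [Ask("miner-a", 5.0, 10_000),
        Ask("miner-b", 3.0, 500),
        Ask("miner-c", 4.0, 8_000)]
deal = match_order(Bid("alice", 4.5, 2_000), asks)
print(deal.miner if deal else "no match")
```

In this example `miner-b` is cheapest but lacks space and `miner-a` asks more than the bid, so the order goes to `miner-c`.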
In terms of data privacy, when creating a storage order, users can choose whether the data they store is for their own viewing only or is disclosed to the entire network. If it is disclosed to the whole network, anyone in the network can query the data through retrieval miners. Users can also store their data on several different nodes.
In terms of consensus algorithms, Filecoin uses three different consensus algorithms: Expected Consensus, Proof-of-Replication, and Proof-of-Spacetime.
Expected Consensus is the consensus algorithm that specifies Filecoin's block generation rules; it is a probabilistic Byzantine fault-tolerant algorithm. Under Expected Consensus, a leader is elected to produce each new block. As in BTC mining, the probability of being elected is proportional to a miner's share of the network's total power, but in the Filecoin network that power is the effective storage space a miner provides rather than computing power. Before each block is generated, every miner derives a new ticket from the ticket of the previous block through a sequential VRF and VDF computation, and then checks whether it is qualified to produce a block by comparing the new ticket with the ratio of its effective power to the total network power. If the new ticket is smaller than that ratio, the miner is elected to produce the block.
Unlike in the BTC network, Filecoin's election method can result in no one producing a block, or in several leaders producing blocks at the same height. When no one produces a block, that height appears in the network as an empty block. Because several miners may produce blocks at the same time, the Filecoin team designed the tipset, which is a collection of blocks. As in the BTC network, network delay means that two tipsets may exist at the same time, but in the end the network selects a single tipset based on weight and continues the chain from it. The weight of each block is calculated as follows:
Weight = ParentWeight + ECV + ECPrM * ratio
Currently, ECV is set to 10, ECPrM is set to 100, and Ratio is the storage share of the current node (the node's storage capacity divided by the storage capacity of all nodes). In other words, under the current algorithm, the higher a node's Ratio, the higher the Weight. When the weights are equal, the tipset with the smaller ticket value is selected. This design also brings Filecoin's underlying architecture closer to a DAG structure.
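The election rule and the weight formula described above can be sketched in a few lines of Python. This is a simplified illustration, not the protocol implementation: the function names are hypothetical, and `new_ticket` uses a plain SHA-256 hash as a stand-in for the VRF and VDF step, which in the real network also proves that the ticket was derived honestly.

```python
import hashlib

ECV = 10      # constant quoted in the article
ECPRM = 100   # constant quoted in the article


def new_ticket(prev_ticket: bytes, miner_id: str) -> float:
    """Stand-in for the VRF/VDF step: hash the previous ticket into [0, 1).

    SHA-256 is used here only to obtain a deterministic pseudo-random number.
    """
    digest = hashlib.sha256(prev_ticket + miner_id.encode()).digest()
    return int.from_bytes(digest, "big") / 2 ** 256


def is_elected(ticket: float, miner_power: float, total_power: float) -> bool:
    """A miner may produce a block when its ticket is below its share of power."""
    return ticket < miner_power / total_power


def weight(parent_weight: float, ratio: float) -> float:
    """Weight = ParentWeight + ECV + ECPrM * ratio."""
    return parent_weight + ECV + ECPRM * ratio


def pick_tipset(candidates):
    """Choose the heaviest candidate; break ties with the smaller ticket.

    Each candidate is a (weight, ticket, label) tuple.
    """
    return min(candidates, key=lambda c: (-c[0], c[1]))


power = {"miner-a": 60, "miner-b": 30, "miner-c": 10}
total = sum(power.values())
leaders = [m for m, p in power.items()
           if is_elected(new_ticket(b"prev", m), p, total)]
print(leaders)  # may be empty or contain several miners

print(weight(parent_weight=1000, ratio=0.25))
print(pick_tipset([(1035.0, 0.42, "A"), (1035.0, 0.17, "B"), (1020.0, 0.05, "C")]))
```

Because each miner is tested independently against its own ticket, a round can end with no leader or with several leaders, which is the situation the tipset is designed to handle.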
The Proof-of-Replication consensus algorithm is used to prove that miners have actually done the work required of them. In the Filecoin network, the main job of miners is to store user data. Miners need to seal the user's data in a sector. During sealing, the system encodes the data using a DRG (Depth Robust Graph) and produces the result known as the replica value. Only storage space that corresponds to a generated replica value is recognized as valid storage space.
Proof-of-Spacetime is responsible for the final verification step. Whenever a new block is generated, the network randomly challenges miners. A challenged miner must provide the replica value for verification, and verification is done through a zero-knowledge proof. If the result is correct, the miner has stored the sealed data properly and its share of effective storage remains unchanged. If a replica value cannot be verified, the storage space corresponding to it is removed from the miner's valid storage space.
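The bookkeeping side of this challenge process can be sketched as follows. This is a deliberately simplified illustration: the real protocol verifies a zero-knowledge proof, whereas this sketch substitutes a plain SHA-256 comparison, and the names `commitment` and `run_challenge` are hypothetical.

```python
import hashlib
from typing import Dict


def commitment(replica: bytes) -> str:
    """Hash standing in for the replica value registered when a sector is sealed."""
    return hashlib.sha256(replica).hexdigest()


def run_challenge(registered: Dict[str, str],
                  sector_size: Dict[str, int],
                  responses: Dict[str, bytes]) -> int:
    """Return the effective storage (in bytes) that survives one challenge round.

    registered   sector id -> commitment recorded at sealing time
    sector_size  sector id -> size of the sector in bytes
    responses    sector id -> replica bytes the miner presents now
    """
    valid = 0
    for sector_id, expected in registered.items():
        presented = responses.get(sector_id)
        if presented is not None and commitment(presented) == expected:
            valid += sector_size[sector_id]
        # Otherwise the sector no longer counts towards valid storage.
    return valid


registered = {"s1": commitment(b"replica-1"), "s2": commitment(b"replica-2")}
sizes = {"s1": 32, "s2": 32}
remaining = run_challenge(registered, sizes,
                          {"s1": b"replica-1", "s2": b"corrupted"})
print(remaining)
```

In the example, sector `s2` fails verification, so only the 32 bytes of `s1` still count as effective storage.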
What are the issues worthy of attention in Filecoin?
At present, there is still a big gap between Filecoin's technical solutions and traditional mature cloud storage solutions. The following will briefly analyze the current problems of Filecoin.
Information security issues
Compared with traditional centralized storage, the biggest advantage of decentralized storage is that it can better protect data privacy and security. However, Filecoin's current design raises doubts about the security and privacy of data.
Consider the user's storage process. After an order is successfully matched, the user sends the original data directly to the miner. At this point the data has been neither encrypted nor split, so the miner has direct access to the complete user data, and neither the network nor the user can detect this. Under this design, the privacy and security of users' personal data in Filecoin are at risk and are even weaker than in traditional cloud storage solutions. Moreover, because this loophole allows data to leak at the very start of the storage process, later measures to protect privacy, such as encryption and slicing, become meaningless.
In addition, because all of a user's data is stored on a single miner's hard disk, if that miner gives up mining for any reason, the data the user stored there is completely lost from the network, and no mechanism has been set up to let users retrieve their data before the miner exits. Such unpredictable events also greatly reduce the security and reliability of data stored in the Filecoin network.
Furthermore, according to information given in the project's previous AMA, Filecoin is only a protocol that coordinates the needs of storage providers and users, and it cannot require storage providers to perform specified operations. Although this reduces the risk of centralization and of a platform manipulating user data, it also lowers the cost of misbehavior for storage providers. Even if a user asks a storage provider to delete or destroy the stored data, the provider can deceive the user by privately keeping a copy, or can simply refuse to carry out the request. The equipment and service providers in traditional cloud storage are all large companies, so when a problem occurs the responsible party can be identified immediately and the problem resolved. Miners in decentralized networks such as Filecoin, by contrast, are scattered and anonymous, which greatly hinders follow-up once a problem has occurred and makes overall security weaker than that of centralized storage.
User Experience Issues
Judging from the overall description in the white paper, the user experience of Filecoin is likely to be poor. First of all, the Filecoin project provides users with no technical services beyond the simplest storage function, such as disaster recovery. Users can only guard against storage units going offline or being damaged, which would make data inaccessible or even lost, by storing their files on several different nodes. For users who do not understand the technology or disaster recovery, this may lead to permanent loss of data.
In addition, because of the information security issues mentioned above, if users want to further ensure the security of their own data, they need to encrypt files before storing them. In this way, the operation requirements for the user are relatively high, which makes the user experience even worse.
Incomplete storage technology solutions
At present, Filecoin's technical solution is incomplete, which is an important factor discouraging users from using its storage network. The disaster recovery problem mentioned above is one example. Under Filecoin's current design, each piece of data a user stores is held independently by one miner, and other miners do not actively store backups of the file. This places greater demands on disaster recovery than traditional cloud storage does. In traditional cloud storage, failures stem mainly from the equipment itself, whereas Filecoin must guard not only against equipment failure but also against storage miners ceasing to provide service for all kinds of subjective reasons. Far more factors can lead to such situations than to purely technical failures, and the solutions will be much more complex.
At present, Filecoin addresses the disaster recovery problem by letting users actively back up their data with multiple miners. However, in the early days of the network, because the miners themselves are unstable, the loss of several backups, or even of all backups, cannot be ruled out. If that happens, it will be a severe blow both to users and to the operation of the network.
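The risk of losing every backup can be estimated with a small calculation. The sketch below assumes that each miner holding a copy fails independently with the same probability during the storage period; this independence assumption and the example figures are illustrative and do not come from the article or from Filecoin data.

```python
def loss_probability(p_miner_fails: float, copies: int) -> float:
    """Probability that every copy is lost, assuming independent miner failures."""
    if not 0.0 <= p_miner_fails < 1.0:
        raise ValueError("p_miner_fails must be in [0, 1)")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_miner_fails ** copies


def copies_needed(p_miner_fails: float, target: float) -> int:
    """Smallest number of copies whose loss probability is at or below target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    copies = 1
    while loss_probability(p_miner_fails, copies) > target:
        copies += 1
    return copies


# Hypothetical figures: a 20% chance that any one miner drops the data.
print(loss_probability(0.2, 3))
print(copies_needed(0.2, 0.001))
```

If miner failures are correlated, for example when many miners leave at once after a drop in rewards, the real risk is higher than this estimate, which is why early-network instability matters so much.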
At the same time, because of pure active backup, the redundancy problem of the storage network cannot be solved. In traditional storage, because storage is centralized, the system can analyze, process and optimize all data in the background to remove duplicate data and improve storage network efficiency. Although the non-public data stored by users cannot be processed in this way due to the nature of decentralization, how to optimize the data storage structure in combination with the characteristics of the IPFS network for publicly available data stored by users is a direction that the team needs to study carefully.
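One way the characteristics of IPFS could help with redundancy for public data is content addressing, in which data is identified by a hash of its content. The sketch below is a toy illustration under simplifying assumptions: the `ContentStore` class is hypothetical and hashes whole files with SHA-256, whereas IPFS itself splits files into chunks and uses its own content identifiers.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under the hash of its content and return that hash."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # uploading the same bytes again is a no-op
        return key

    def get(self, key: str) -> bytes:
        """Return the data stored under the given content hash."""
        return self._blobs[key]

    def stored_bytes(self) -> int:
        """Total number of bytes physically held by the store."""
        return sum(len(blob) for blob in self._blobs.values())


store = ContentStore()
key_alice = store.put(b"public movie file")   # uploaded by one user
key_bob = store.put(b"public movie file")     # same content from another user
store.put(b"a different file")
print(key_alice == key_bob, store.stored_bytes())
```

Two uploads of the same public file map to the same key, so the store holds it once. Encrypted private data would not deduplicate this way, since identical files encrypted with different keys produce different bytes.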
The three problems described above restrict Filecoin's future development at different levels. The analysis of the traditional storage market at the beginning of this article shows that the current storage market is basically divided into two categories. The first mainly serves individuals and small businesses, mostly in the form of SaaS platforms. After years of development, the user experience and product diversity of SaaS platforms have become relatively mature. Cloud disk applications such as Baidu Netdisk and Google Drive each have their own features and advantages in file transfer, file search, and collaborative work, and Filecoin has little advantage when competing with them. If there are loopholes in data security, Filecoin will not survive this competition.
Even if the data security problem is solved, it is hard to be optimistic about Filecoin's development from the perspective of real storage demand. The storage data on Chinese network disk users given at the beginning of this article shows that the top three TGI categories for current cloud disk users are movies, sports, and animation. For this kind of video and image data, dissemination matters more than privacy. Compared with decentralized storage, traditional network disks whose content users can share or search have the advantage here.
The data that really needs decentralized storage can be roughly divided into two categories. The first is data such as personal photos and private files, for which the demand for decentralized storage is greater, but the volume of such data in the existing market is relatively small. The second is the personal data that personal IoT devices will generate in the future. However, leading IoT device companies such as Xiaomi now regard user data as an important asset from which they can generate huge value, and there is no specific regulation pushing them to give it up. Under these circumstances, the leading companies will not return users' data to them. If leading companies are unwilling to return personal data, decentralized storage cannot be applied, and even if individual users are willing to use it, doing so is meaningless because the fundamental purpose has not been achieved.
The second category of users is large and medium-sized enterprises. They mainly use cloud storage to store company-related data, so they have high requirements for the reliability, security, and privacy of storage solutions. At present, traditional solutions come in four forms: public cloud, private cloud, hybrid cloud, and the traditional storage matrix. The combination of private cloud and traditional storage hardware addresses corporate customers' concerns about data privacy in the public cloud and also meets their needs for reliability and security. Filecoin currently lags behind traditional cloud storage solutions in all three respects, so it cannot compete with them. As for users of the traditional storage matrix, the benefits that decentralized storage brings are not ones they are looking for, so it is difficult to convert them. Decentralized storage can begin to develop ToB business only once data reliability, security, and privacy can be guaranteed and the price is lower than that of current traditional cloud storage solutions.
Apart from technical factors, there are many uncertainties in the Filecoin project.
The first is the team factor. The two mainnet launch promises in 2018 have not been fulfilled. The mining has changed from CPU mining to GPU mining, and at the same time, the mining rules have been changed during the testnet stage. Although the subsequent changes are to prevent miners from maliciously spamming data, such frequent changes reduce the trust of users and network participants.
In addition, so far, the mechanism and distribution rules of Filecoin mining rewards have not been determined, which further increases the uncertainty of the project.
Finally, the mining logic of Filecoin differs from that of PoW mining. The project requires miners to participate over the long term and not withdraw midway, because data may be lost forever if a miner quits. However, the details of Filecoin mining are complicated and the returns are hard to estimate, which adds further uncontrollable factors. Whether enough miners will join and stay is a question worth watching. How much impact miners leaving midway will have on the network can only be judged after the mainnet launches.
In terms of price, because of the technical and product problems described above, Filecoin needs to be priced very attractively compared with traditional cloud storage in order to lower users' costs.
Sorting out mainstream cloud storage SaaS applications
The analysis above shows that, at the current stage, Filecoin's main competitors will be the SaaS applications in the traditional cloud storage system. The pricing plans of the mainstream toB and toC SaaS services on the market are therefore summarized below for readers' reference:
Dropbox
The personal version provides 2 GB of storage space for free. The Plus version costs 78 yuan per month and provides 2 TB of storage space along with services such as text search and offline storage on mobile phones. The Professional version costs 130 yuan per month and provides 3 TB of storage space and, on top of the Plus features, services such as AutoOCR, collaborative file editing, and file locking.
In the toB version, the Plus plan is 81 yuan per month, providing 5 TB of storage space and 2 GB of file transfer capacity, with a minimum of 3 users. The Professional plan is 130 yuan per person per month, providing unlimited storage space and 300 GB of file transfer capacity, also with a minimum of 3 users.
Box
The personal version provides 10 GB of storage space for free, with the size of a single uploaded file limited to 250 MB. The paid plans are $5/month, $15/month, $25/month, and $35/month. The $5 plan provides 100 GB of storage space, with single uploaded files limited to 2 GB; the $15 plan provides unlimited storage space, with single uploaded files limited to 5 GB. Box can also be connected to apps so that app data is stored directly in Box. The $25 plan allows 3 apps to be connected, while the $35 plan allows unlimited apps.
Baidu Netdisk
The payment plans for Baidu's personal network disk are shown in the picture above. Super membership costs 18 yuan per month and ordinary membership costs 8 yuan per month. Non-members have 15 GB of storage space before completing tasks and receive 2 TB of storage space after completing a series of sharing tasks.
The picture above shows the introduction and price of Baidu Enterprise Network Disk.
Compared with the three centralized storage providers listed above, Filecoin will still find it difficult to compete in enterprise-level services. In personal services, Filecoin can compete with them only if it offers more flexible payment plans and lower prices. Filecoin and other distributed storage projects also need to do their best to add supplementary features in order to become more competitive.
Prospects for the development of decentralized storage