TrustlessLabs

**Reading, Indexing to Analysis: A Brief Overview of the Web3 Data Indexing Track**

This article explores the development history of blockchain data accessibility and compares three data service protocols, The Graph, Chainbase, and Space and Time, in terms of architecture and their application of AI technology. It argues that blockchain data services are evolving toward greater intelligence and security, and will continue to play an important role as industry infrastructure.


1. **Introduction**

From the first wave of dApps in 2017, such as Etheroll, ETHLend, and CryptoKitties, to today's flourishing financial, gaming, and social dApps built on different blockchains, have we ever stopped to consider where these applications obtain the data they rely on during interactions?

In 2024, the focus is on AI and Web3. In the world of artificial intelligence, data acts as the lifeblood for its growth and evolution. Just as plants rely on sunlight and moisture to thrive, AI systems depend on vast amounts of data to continuously "learn" and "think." Without data, no matter how sophisticated the AI algorithms are, they remain castles in the air, unable to exert their intended intelligence and effectiveness.

This article analyzes the evolution of blockchain data indexing from the perspective of blockchain data accessibility, comparing the established data indexing protocol The Graph with emerging blockchain data service protocols Chainbase and Space and Time. It particularly explores the similarities and differences in the data services and product architecture of these two new protocols that incorporate AI technology.

2. **The Complexity and Simplicity of Data Indexing: From Blockchain Nodes to Full-Chain Databases**

2.1 **Data Sources: Blockchain Nodes**

From the moment we start to understand "what blockchain is," we often encounter the phrase: blockchain is a decentralized ledger. Blockchain nodes are the foundation of the entire blockchain network, responsible for recording, storing, and disseminating all transaction data on the chain. Each node has a complete copy of the blockchain data, ensuring the maintenance of the network's decentralized characteristics. However, for the average user, building and maintaining a blockchain node is not an easy task. It not only requires specialized technical skills but also comes with high hardware and bandwidth costs. At the same time, the querying capabilities of ordinary nodes are limited and cannot retrieve data in the formats required by developers. Therefore, although theoretically everyone can run their own node, in practice, users usually prefer to rely on third-party services.

To address this issue, RPC (Remote Procedure Call) node providers have emerged. These providers are responsible for the costs and management of nodes and provide data through RPC endpoints, allowing users to easily access blockchain data without having to build their own nodes. Public RPC endpoints are free but come with rate limits that may negatively impact the user experience of dApps. Private RPC endpoints offer better performance by reducing congestion, but even simple data retrieval can require a significant amount of back-and-forth communication. This makes them resource-intensive and inefficient for complex data queries. Additionally, private RPC endpoints are often difficult to scale and lack compatibility across different networks. However, the standardized API interfaces provided by node providers lower the barrier for users to access on-chain data, laying the groundwork for subsequent data parsing and applications.
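To make the "chatty" nature of raw RPC access concrete, here is a minimal Python sketch. `eth_getBlockByNumber` is a standard Ethereum JSON-RPC method; the helper names are illustrative. Fetching even a modest block range means one round trip per block:

```python
import json

def rpc_payload(method: str, params: list, req_id: int) -> dict:
    """Build a standard JSON-RPC 2.0 request body for an Ethereum node."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}

def requests_for_block_range(start: int, end: int) -> list[dict]:
    """One eth_getBlockByNumber call per block: the chatty pattern that
    makes raw RPC inefficient for bulk data retrieval."""
    reqs = []
    for i, n in enumerate(range(start, end + 1)):
        # `True` asks the node to include full transaction objects
        reqs.append(rpc_payload("eth_getBlockByNumber", [hex(n), True], i))
    return reqs

batch = requests_for_block_range(19_000_000, 19_000_099)
print(len(batch))            # 100 requests just to read 100 blocks
print(json.dumps(batch[0]))
```

Retrieving receipts, logs, or balances for the same range multiplies the request count further, which is exactly the overhead that indexing services amortize.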

2.2 **Data Parsing: From Raw Data to Usable Data**

The data obtained from blockchain nodes is raw data that is serialized and encoded for on-chain storage rather than for human consumption. While this encoding preserves the integrity and security of the blockchain, its complexity also increases the difficulty of data parsing. For average users or developers, directly processing this raw data requires a large amount of technical knowledge and computing resources.

The process of data parsing is particularly important in this context. By parsing complex raw data and converting it into a more understandable and operable format, users can more intuitively understand and utilize this data. The success of data parsing directly determines the efficiency and effectiveness of blockchain data applications and is a key step in the entire data indexing process.
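As a concrete illustration of what parsing involves, the sketch below decodes a raw ERC-20 `Transfer` event log into readable fields. The topic hash is the real keccak-256 hash of `Transfer(address,address,uint256)`; the sample log and helper name are made up for the example:

```python
# keccak256("Transfer(address,address,uint256)") -- the standard ERC-20 topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log: dict) -> dict:
    """Parse an ABI-encoded Transfer(address,address,uint256) event log."""
    assert log["topics"][0] == TRANSFER_TOPIC, "not a Transfer event"
    return {
        # indexed addresses are left-padded to 32 bytes inside the topics
        "from": "0x" + log["topics"][1][-40:],
        "to":   "0x" + log["topics"][2][-40:],
        # the unindexed value sits in `data` as a hex-encoded uint256
        "value": int(log["data"], 16),
    }

raw_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "aa" * 20,   # padded sender address (sample)
        "0x" + "00" * 12 + "bb" * 20,   # padded recipient address (sample)
    ],
    "data": "0x" + hex(1_000_000)[2:].rjust(64, "0"),  # value = 1,000,000
}
print(decode_transfer(raw_log))
```

Every event type needs its own decoding rule like this, which is why parsing frameworks matter once an application touches more than a handful of contracts.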

2.3 **The Evolution of Data Indexers**

As the volume of blockchain data increases, the demand for data indexers is also growing. Indexers play a crucial role in organizing on-chain data and sending it to databases for easy querying. An indexer works by ingesting blockchain data and making it available through a structured query interface (typically a GraphQL API). By providing a unified interface for querying data, indexers allow developers to quickly and accurately retrieve the information they need using standardized query languages, significantly simplifying the process.
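For instance, a developer querying an indexer might send a GraphQL request like the sketch below. The `transfers` entity and its fields are hypothetical, not taken from any specific deployed subgraph:

```python
import json

# A hypothetical subgraph exposing `transfers` entities; the entity and
# field names are illustrative placeholders.
query = """
{
  transfers(first: 5, orderBy: timestamp, orderDirection: desc) {
    from
    to
    value
    timestamp
  }
}
"""

# GraphQL endpoints accept a JSON body carrying the query string,
# which would be POSTed to the indexer's HTTP endpoint.
payload = json.dumps({"query": query})
print(payload)
```

One declarative request replaces the many per-block RPC calls that raw node access would require for the same answer.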

Different types of indexers optimize data retrieval in various ways:

- **Full Node Indexers**: These indexers run full blockchain nodes and extract data directly from them, ensuring data completeness and accuracy, but requiring significant storage and processing power.
- **Lightweight Indexers**: These indexers rely on full nodes to fetch specific data as needed, thus reducing storage requirements but potentially increasing query times.
- **Dedicated Indexers**: These indexers are specialized for certain types of data or specific blockchains, optimizing retrieval for specific use cases, such as NFT data or DeFi transactions.
- **Aggregate Indexers**: These indexers extract data from multiple blockchains and sources, including off-chain information, providing a unified query interface, which is particularly useful for multi-chain dApps.

Currently, an Ethereum archive node run with the Geth client occupies about 13.5 TB of storage, while the archival demand under the Erigon client is about 3 TB. As the blockchain continues to grow, the storage required by archive nodes will also increase. In the face of such vast data volumes, mainstream indexing protocols not only support multi-chain indexing but also customize data parsing frameworks to meet the data needs of different applications. For example, The Graph's "Subgraph" framework is a typical case.

The emergence of indexers has greatly improved the efficiency of data indexing and querying. Compared to traditional RPC endpoints, indexers can efficiently index large amounts of data and support high-speed queries. These indexers allow users to perform complex queries, easily filter data, and analyze it after extraction. Furthermore, some indexers also support aggregating data sources from multiple blockchains, avoiding the need to deploy multiple APIs in multi-chain dApps. By running distributed across multiple nodes, indexers not only provide enhanced security and performance but also reduce the risks of interruptions and downtime that may arise from centralized RPC providers.

Compared with raw RPC access, indexers use pre-defined query languages, allowing users to obtain the required information directly without dealing with the underlying complex data. This mechanism significantly improves the efficiency and reliability of data retrieval and represents an important innovation in blockchain data access.

2.4 **Full-Chain Database: Aligning with Stream-First**

Using index nodes to query data typically means that the API becomes the sole portal for digesting on-chain data. However, when a project enters a scaling phase, it often requires more flexible data sources, which standardized APIs cannot provide. As application demands become more complex, primary data indexers and their standardized indexing formats gradually struggle to meet increasingly diverse query needs, such as searches, cross-chain access, or off-chain data mapping.

In modern data pipeline architectures, a "stream-first" approach has become a solution to overcome the limitations of traditional batch processing, enabling real-time data ingestion, processing, and analysis. This paradigm shift enables organizations to respond immediately to incoming data, deriving insights and making decisions almost instantaneously. Similarly, the development of blockchain data service providers is also moving towards building blockchain data streams, with traditional indexer service providers gradually launching products that obtain real-time blockchain data through data streams, such as The Graph's Substreams, Goldsky's Mirror, and real-time data lakes generated based on blockchains like Chainbase and SubSquid.
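The difference from batch processing can be sketched with a toy Python pipeline that updates an aggregate as each block "arrives", instead of re-scanning the whole history per query. All names and data here are illustrative:

```python
from typing import Iterator

def block_stream(blocks: list[dict]) -> Iterator[dict]:
    """Stand-in for a real-time feed: yields blocks one by one as they arrive."""
    for block in blocks:
        yield block

def running_transfer_volume(stream: Iterator[dict]) -> Iterator[int]:
    """Stream-first processing: maintain an aggregate incrementally per
    block rather than recomputing it over all history in a batch job."""
    total = 0
    for block in stream:
        total += sum(tx["value"] for tx in block["txs"])
        yield total

blocks = [
    {"number": 1, "txs": [{"value": 10}, {"value": 5}]},
    {"number": 2, "txs": [{"value": 7}]},
]
print(list(running_transfer_volume(block_stream(blocks))))  # [15, 22]
```

A consumer of such a stream sees a fresh result the moment a new block lands, which is the latency advantage the stream-first products above are built around.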

These services aim to address the need for real-time parsing of blockchain transactions and provide more comprehensive querying capabilities. Just as the "stream-first" architecture revolutionizes data processing and consumption in traditional data pipelines by reducing latency and enhancing responsiveness, these blockchain data stream service providers also hope to support the development of more applications and assist on-chain data analysis through more advanced and mature data sources.

By redefining the challenges of on-chain data from the perspective of modern data pipelines, we can view the management, storage, and provision of on-chain data in a whole new light. When we start to regard subgraphs and Ethereum ETL as data streams in data pipelines rather than final outputs, we can envision a possible world that can tailor high-performance data sets for any business use case.

3. **AI + Database? An In-depth Comparison of The Graph, Chainbase, and Space and Time**

3.1 **The Graph**

The Graph network enables multi-chain data indexing and querying services through a decentralized network of nodes, facilitating developers to easily index blockchain data and build decentralized applications. Its main product models are the data query execution market and the data indexing cache market, both of which essentially serve the product query needs of users. The data query execution market specifically refers to consumers paying index nodes that provide the desired data, while the data indexing cache market is where index nodes allocate resources based on historical indexing popularity of subgraphs, query fees collected, and demand from on-chain curators for subgraph outputs.

Subgraphs are the fundamental data structure in The Graph network. They define how to extract and transform blockchain data into a queryable format (e.g., a GraphQL schema). Anyone can create subgraphs, and multiple applications can reuse them, enhancing data reusability and efficiency.

The Graph network consists of four key roles: indexers, curators, delegators, and developers, who collectively provide data support for web3 applications. Here are their respective responsibilities:

· Indexer: Indexers are node operators in The Graph network who participate by staking GRT (The Graph's native token) and provide indexing and query processing services.
· Delegator: Delegators stake GRT tokens to support the operation of index nodes, earning a portion of the rewards through the nodes they delegate to.
· Curator: Curators signal which subgraphs should be indexed by the network, helping ensure that valuable subgraphs are prioritized for processing.
· Developer: Unlike the other three supply-side roles, developers are demand-side users and the main consumers of The Graph. They create and submit subgraphs to the network to have their data needs fulfilled.

Currently, The Graph has shifted to a fully decentralized subgraph hosting service, with circulating economic incentives among the different participants keeping the system running smoothly:

· Indexer Rewards: Indexers earn income through consumer query fees and part of the GRT token block rewards.
· Delegator Rewards: Delegators receive a portion of the rewards through the index nodes they support.
· Curator Rewards: If curators signal valuable subgraphs, they can earn a portion of the query fees.

In fact, The Graph's products are also rapidly evolving in the AI wave. As one of the core development teams of The Graph ecosystem, Semiotic Labs has been dedicated to optimizing index pricing and user query experiences using AI technology. Currently, the tools developed by Semiotic Labs, including AutoAgora, Allocation Optimizer, and AgentC, have enhanced the ecosystem's performance in various aspects.

· AutoAgora introduces a dynamic pricing mechanism that adjusts prices in real-time based on query volume and resource usage, optimizing pricing strategies to ensure the competitiveness and revenue maximization of indexers.
· Allocation Optimizer addresses the complex issue of resource allocation for subgraphs, helping indexers achieve optimal resource configuration to enhance revenue and performance.
· AgentC is an experimental tool that allows users to access The Graph's blockchain data through natural language, thereby enhancing user experience.

The application of these tools has further enhanced the intelligence and user-friendliness of The Graph through AI assistance.
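As a rough intuition for the demand-responsive pricing that a tool like AutoAgora automates, consider the toy feedback rule below. This is not AutoAgora's actual cost model, only a conceptual sketch of adjusting a per-query price against serving capacity:

```python
def adjust_price(price: float, demand: int, capacity: int,
                 step: float = 0.05) -> float:
    """Toy feedback rule: raise the per-query price when demand exceeds
    serving capacity, lower it when capacity sits idle."""
    if demand > capacity:
        return price * (1 + step)
    if demand < capacity:
        return max(price * (1 - step), 0.0)
    return price

price = 1.0
for demand in [120, 150, 80]:       # queries per interval, capacity = 100
    price = adjust_price(price, demand, capacity=100)
print(round(price, 4))              # 1.0474
```

A production system would learn the indexer's actual marginal costs and query mix rather than apply a fixed step, but the direction of adjustment is the same.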

3.2 **Chainbase**

Chainbase is an all-chain data network that integrates all blockchain data into a single platform, making it easier for developers to build and maintain applications. Its unique features include:

· Real-time Data Lake: Chainbase provides a real-time data lake specifically for blockchain data streams, allowing data to be accessed instantly as it is generated.
· Dual-Chain Architecture: Chainbase is built on Eigenlayer AVS, creating an execution layer that forms a parallel dual-chain architecture with CometBFT's consensus algorithm. This design enhances the programmability and composability of cross-chain data, supporting high throughput, low latency, and finality, while improving network security through a dual-staking model.
· Innovative Data Format Standard: Chainbase introduces a new data format standard called "manuscripts," optimizing the structuring and utilization of data in the cryptocurrency industry.
· Crypto World Model: Leveraging its vast blockchain data resources, Chainbase applies AI modeling to build models that can effectively understand, predict, and interact with blockchain transactions. The basic version, Theia, has been launched for public use.

These features make Chainbase stand out in blockchain indexing protocols, particularly focusing on the accessibility of real-time data, innovative data formats, and the creation of smarter models through the combination of on-chain and off-chain data to enhance insights.

Chainbase's AI model, Theia, is a key highlight that differentiates it from other data service protocols. Theia is based on the DORA model developed by NVIDIA and combines on-chain and off-chain data with temporal and spatial activity to learn and analyze cryptocurrency patterns. Through causal reasoning, it digs into the potential value and regularities of on-chain data, providing users with more intelligent data services.

AI-empowered data services make Chainbase not just a blockchain data service platform but a more competitive intelligent data service provider. With robust data resources and proactive AI analysis, Chainbase can provide broader data insights and optimize users' data processing workflows.

3.3 **Space and Time**

Space and Time (SxT) aims to create a verifiable computing layer that extends zero-knowledge proofs on decentralized data warehouses, providing trustworthy data processing for smart contracts, large language models, and enterprises. Currently, Space and Time has secured $20 million in its latest series A funding round, led by Framework Ventures, Lightspeed Faction, Arrington Capital, and Hivemind Capital.

In the field of data indexing and validation, Space and Time introduces a novel technological path: Proof of SQL. This is an innovative zero-knowledge proof (ZKP) technology developed by Space and Time that ensures SQL queries executed on the decentralized data warehouse are tamper-proof and verifiable. When a query runs, Proof of SQL generates a cryptographic proof attesting to the integrity and accuracy of the query results. The proof is attached to the results, allowing any verifier (such as a smart contract) to independently confirm that the data was not altered during processing.

Traditional blockchain networks typically rely on consensus mechanisms to validate the authenticity of data, whereas Proof of SQL enables a more efficient form of validation. In Space and Time's system, one node is responsible for data acquisition, while other nodes verify the authenticity of that data using zk proofs. This replaces the redundant indexing of the same data by many nodes under a consensus mechanism, improving the overall performance of the system. As the technology matures, it offers a stepping stone for traditional industries that prioritize data reliability to build products on blockchain data.
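The attach-a-proof-and-verify pattern can be illustrated with a simple hash commitment in Python. Note this is only a conceptual stand-in: the real Proof of SQL uses zero-knowledge proofs that additionally prove the query was executed correctly, which a plain hash cannot do:

```python
import hashlib
import json

def commit(rows: list[dict]) -> str:
    """Deterministic commitment to a query result set (a toy stand-in for
    the cryptographic proof Proof of SQL would attach)."""
    canonical = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Prover side: execute the query and attach a commitment to the results.
rows = [{"addr": "0xabc", "balance": 42}, {"addr": "0xdef", "balance": 7}]
proof = commit(rows)

# Verifier side: recompute the commitment over the received rows; any
# tampering in transit changes the digest and the check fails.
received = list(rows)
assert commit(received) == proof

tampered = [{"addr": "0xabc", "balance": 9999}] + rows[1:]
print(commit(tampered) == proof)  # False -- the alteration is detected
```

This hash check only detects alteration of the delivered rows; a ZK proof goes further by convincing the verifier, without re-running the query, that the rows are the correct output of the stated SQL over committed data.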

At the same time, SxT has been closely collaborating with Microsoft's AI Joint Innovation Lab to accelerate the development of generative AI tools, making it easier for users to access blockchain data through natural language processing. Currently, in the Space and Time Studio, users can experience entering natural language queries, which the AI automatically converts into SQL and executes on behalf of the user to present the final results needed by the user.

3.4 **Comparison of Differences**

4. **Conclusion and Outlook**

In summary, blockchain data indexing technology has gradually matured: from raw node data, through data parsing and dedicated indexers, to AI-powered full-chain data services. The continuous evolution of these technologies has not only enhanced the efficiency and accuracy of data access but also given users an unprecedented intelligent experience.

Looking ahead, as AI technology and new technologies such as zero-knowledge proofs continue to develop, blockchain data services will become further intelligent and secure. We have reason to believe that blockchain data services will continue to play an important role as infrastructure in the future, providing strong support for industry progress and innovation.
