Trustless and Decentralized Machine Learning with Zero-Knowledge Proofs and Blockchains
Artificial intelligence (AI) has advanced rapidly in performance and complexity over the last few years. As a result, consumers of AI models increasingly rely on external cloud providers to develop and operate these models, a concept known as 'Machine Learning as a Service' (MLaaS), which raises data protection and security concerns. Recent breakthroughs in zero-knowledge proofs (ZKPs), together with the rapid evolution of blockchain technology, hold great promise for solving these problems and revolutionizing the current use of MLaaS systems.
This blog examines the applications of ZKPs and blockchain technology within the MLaaS framework and predicts future developments in this field. First, the current challenges of MLaaS providers are analyzed. Next, existing applications of ZKPs and blockchains in the MLaaS framework are outlined, followed by their current limitations, from which potential future technological developments are derived.
Background: What Are Machine Learning and Zero-Knowledge Proofs?
AI encompasses algorithms and models that enable computers to perform tasks that would otherwise require human intelligence. Machine learning (ML), a subfield of AI, refers to the ability of computers to learn from experience (data) and make predictions based on it. A fundamental concept of this blog is the ML development process, which is visualized below. It comprises the steps of problem definition, data collection, data pre-processing, model selection, training, evaluation, deployment and inference.
A zero-knowledge (ZK) protocol is a cryptographic protocol that allows one party (the prover) to convince another party (the verifier) that a statement is true without revealing any information beyond the fact that the statement is true. The resulting proof is called a zero-knowledge proof (ZKP).
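To make the definition concrete, the following is a minimal sketch of the Schnorr identification protocol, a classic interactive (honest-verifier) zero-knowledge proof. The prover convinces the verifier that it knows the secret exponent x behind a public value y = G^x mod P, without revealing x. The tiny group parameters are assumptions chosen for readability and provide no security; this is a textbook illustration, not the proof system used for ML workloads.

```python
import secrets

# Toy parameters (assumption, illustration only): P = 2*Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q. Far too small to be secure.
P, Q, G = 2039, 1019, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1      # prover's secret witness, 1 <= x < Q
    y = pow(G, x, P)                      # public statement: "I know x with G^x = y"
    return x, y

def commit():
    r = secrets.randbelow(Q)              # fresh random nonce, kept private
    return r, pow(G, r, P)                # commitment t = G^r is sent to the verifier

def respond(x, r, c):
    return (r + c * x) % Q                # r masks x, so s alone reveals nothing about x

def verify(y, t, c, s):
    return pow(G, s, P) == (t * pow(y, c, P)) % P   # accept iff G^s == t * y^c

def simulate(y):
    """Produce an accepting transcript WITHOUT knowing x (pick c and s first)."""
    c = secrets.randbelow(Q)
    s = secrets.randbelow(Q)
    t = (pow(G, s, P) * pow(y, (Q - c) % Q, P)) % P  # y^(Q-c) = y^(-c) since y has order Q
    return t, c, s

# One honest run: commitment -> random challenge -> response -> check.
x, y = keygen()
r, t = commit()
c = secrets.randbelow(Q)                  # verifier's random challenge
s = respond(x, r, c)
print(verify(y, t, c, s))                 # True
```

The `simulate` function shows why the verifier learns nothing: accepting transcripts can be generated without the secret, so a transcript carries no information the verifier could not have produced itself. It is only convincing in the real protocol because there the challenge is chosen after the commitment.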
What challenges does machine learning currently face?
One of the key challenges facing the ML framework is centralization. A centralized information system depends on a central point of contact and is controlled by a single entity, from which, as shown below, numerous challenges arise.
The central point of contact poses security risks: it is an attractive target for attackers, because a server-side attack on this hub can lead to data breaches or system failures. Moreover, the limited resources of a single central entity impair the system's horizontal scalability.
Central control can result in an abuse of power, such as violations of data privacy or of data and computational integrity. It can also cause a lack of transparency, which undermines consumers' trust in the correct execution of operations and further increases the risk of abuse of power.
What are the current applications of ZKPs and blockchain technology in the ML process?
Current applications of ZKPs and blockchain technology make the ML process trustless and decentralized in three areas: inference, computation, and storage.
Trustless inference
Implementing ZKPs in the inference step ensures the computational integrity of the centralized model provider without removing its centralized decision-making ability. The provider still controls the model, but consumers can verify each result instead of having to trust it, which turns inference into a trustless process.
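For such proofs to be meaningful, the provider must be bound to one specific model. One common design is to publish a commitment to the model weights once, so that every inference proof can state "this output was computed with the committed model". The sketch below shows only that binding step, using a salted SHA-256 hash commitment; the float32 serialization and all function names are hypothetical, and the proof system itself is out of scope. In a real system the weights are not opened: the ZKP shows consistency with the commitment inside the proof, and practical systems often use hash functions that are cheaper to prove than SHA-256.

```python
import hashlib
import secrets
import struct

def serialize_weights(weights):
    # Hypothetical canonical encoding: little-endian float32 values in a fixed order.
    return struct.pack(f"<{len(weights)}f", *weights)

def commit_to_model(weights):
    """Return (digest, salt). Publishing the digest binds the provider to these weights."""
    salt = secrets.token_bytes(32)            # random salt keeps the digest from leaking the weights
    digest = hashlib.sha256(salt + serialize_weights(weights)).hexdigest()
    return digest, salt

def check_opening(digest, salt, weights):
    """Check that revealed weights match the published commitment (demonstrates binding)."""
    return hashlib.sha256(salt + serialize_weights(weights)).hexdigest() == digest

# The provider publishes `digest` once; each inference proof would then claim
# "output = model(input) for the model committed to in `digest`".
weights = [0.5, -1.25, 3.0]
digest, salt = commit_to_model(weights)
print(check_opening(digest, salt, weights))            # True
print(check_opening(digest, salt, [0.5, -1.25, 2.0]))  # False: a swapped model is detected
```

Because `digest` is published before any requests are served, quietly swapping in a different or cheaper model later would require finding a SHA-256 collision.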
Decentralized computation
Decentralized computation transfers centrally executed computation to a distributed network, which eliminates the central point of contact and thereby overcomes the resulting single point of attack and limited horizontal scalability. This forms a marketplace of computing providers and consumers. The computational integrity of resource providers is ensured by game-theoretical mechanisms such as redundant computation, provider reputation and financial incentives.
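These game-theoretical mechanisms can be sketched as a settlement rule for one task: the same task is sent to several providers, the majority result is accepted, agreeing providers split the payment and gain reputation, and deviating providers lose part of their stake. This is a minimal illustration; the `Provider` fields, the even reward split, the 50% slashing fraction and the reputation increments are hypothetical parameters, not those of any particular network.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    stake: float              # collateral locked by the provider (hypothetical units)
    reputation: float = 1.0   # hypothetical starting reputation

def settle_task(results, providers, reward, slash_fraction=0.5):
    """Accept the majority result of a redundantly executed task.

    results: provider name -> reported output (e.g. a hash of the result).
    Agreeing providers split `reward` and gain reputation; deviating
    providers lose `slash_fraction` of their stake and some reputation.
    All numeric parameters are hypothetical.
    """
    accepted, votes = Counter(results.values()).most_common(1)[0]
    if votes <= len(results) // 2:
        return None                           # no strict majority: re-run the task
    majority = [name for name, out in results.items() if out == accepted]
    for name, out in results.items():
        p = providers[name]
        if out == accepted:
            p.stake += reward / len(majority)     # financial incentive for agreeing
            p.reputation += 0.1
        else:
            p.stake -= p.stake * slash_fraction   # penalty (slashing) for deviating
            p.reputation = max(0.0, p.reputation - 0.5)
    return accepted

providers = {n: Provider(n, stake=100.0) for n in ("a", "b", "c")}
print(settle_task({"a": "0xab", "b": "0xab", "c": "0xff"}, providers, reward=30.0))  # 0xab
print(providers["c"])  # Provider(name='c', stake=50.0, reputation=0.5)
```

Comparing reported outputs by exact equality is what ties this kind of scheme to deterministic computations.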
Decentralized storage
A decentralized storage network allows data to be stored redundantly across various independent storage providers and could serve to store the datasets aggregated during data collection.
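A minimal sketch of redundant, content-addressed storage, the idea used by networks such as IPFS in a much more elaborate form: a dataset is split into chunks, each chunk is addressed by the SHA-256 hash of its content, and each chunk is placed on several providers. In-memory dictionaries stand in for storage providers, and the chunk size, replica count and round-robin placement are hypothetical simplifications. Because the address is derived from the content, a client can detect a corrupted copy and fall back to another replica.

```python
import hashlib

def split(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]

def store(data, nodes, replicas=2, chunk_size=4):
    """Place each chunk on `replicas` nodes; return the manifest of content IDs.
    Chunk size, replica count and placement are hypothetical choices."""
    manifest = []
    for i, piece in enumerate(split(data, chunk_size)):
        cid = hashlib.sha256(piece).hexdigest()        # address derived from the content
        for j in range(replicas):
            nodes[(i + j) % len(nodes)][cid] = piece   # simple round-robin placement
        manifest.append(cid)
    return manifest

def retrieve(manifest, nodes):
    """Fetch each chunk from any node whose copy matches its content ID."""
    pieces = []
    for cid in manifest:
        for node in nodes:
            piece = node.get(cid)
            if piece is not None and hashlib.sha256(piece).hexdigest() == cid:
                pieces.append(piece)                   # integrity verified against the ID
                break
        else:
            raise KeyError(f"no intact copy of chunk {cid[:8]}")
    return b"".join(pieces)

nodes = [{} for _ in range(3)]                  # three independent storage providers
manifest = store(b"training-data-v1", nodes)
nodes[0].clear()                                # one provider goes offline
nodes[1][manifest[1]] = b"XXXX"                 # another serves a corrupted chunk
print(retrieve(manifest, nodes))                # b'training-data-v1'
```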
What are the limitations of the current applications?
The current applications are limited by a lack of privacy, multiple challenges in the verification process within decentralized computation networks, and high proof generation costs.
Lack of privacy in server-side proof generation
Since the prover inherently knows all hidden inputs of a proof, only client-side proof generation preserves the privacy of the client's data. In server-side proof generation, by contrast, the client must transmit its data to the model provider, which compromises the client's privacy.
Limitations of decentralized computation
Redundant computation inherently demands more computing power, raising consumer costs significantly. Moreover, the verification process only lowers, rather than eliminates, the risk of integrity breaches and is inherently limited to deterministic computations, since results from different providers can only be compared if they are reproducible. Finally, the data associated with the computation must be public so that computing providers can access it.
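Two of these limitations can be made concrete in a few lines. The first part assumes a hypothetical threat model: each replica is drawn independently from a pool in which a fraction f of providers is malicious, and all malicious providers collude on the same wrong result, so a wrong result is accepted whenever malicious replicas form a majority. Adding replicas multiplies the compute cost but only shrinks this probability. The second part shows why exact-match comparison needs deterministic computation: floating-point addition is not associative.

```python
import math

def undetected_probability(malicious_fraction, replicas):
    """P(a wrong result wins the majority vote) under a hypothetical threat model:
    each replica is malicious independently with probability `malicious_fraction`,
    and all malicious replicas collude on the same wrong output."""
    f, k = malicious_fraction, replicas
    return sum(math.comb(k, i) * f**i * (1 - f)**(k - i)
               for i in range(k // 2 + 1, k + 1))

for k in (1, 3, 5):
    p = undetected_probability(0.1, k)
    print(f"{k} replicas: {k}x compute, wrong result accepted with p = {p:.4f}")

# Exact-match comparison also requires deterministic results. Two honest providers
# that sum in a different order (threads, GPUs, libraries) can report different values.
provider_a = (0.1 + 0.2) + 0.3
provider_b = 0.1 + (0.2 + 0.3)
print(provider_a, provider_b, provider_a == provider_b)  # 0.6000000000000001 0.6 False
```

The probability never reaches zero for any finite number of replicas, and comparing with a tolerance instead of exact equality would reopen room for small manipulations.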
High costs of generating ZKPs
Research suggests that generating a ZKP for an ML operation requires about a thousand times the computational power of the original ML operation. Since the absolute cost of proof generation depends on the complexity of the ML model, complex models require server-side proof generation, while client-side proof generation is limited to simpler models. Additionally, ZKP implementations are currently limited in practice to the inference phase.
What are the expected future technological developments?
Future technological developments are centered on solving existing limitations and further decentralizing the steps of the ML process.
Potential combination of ZKPs with Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) could resolve the lack of privacy in server-side proof generation. FHE is a form of encryption that allows computation on encrypted data, so the server never sees the client's plaintext. Since FHE alone does not allow the client to verify that the server performed the computation correctly, combining it with ZKPs is a highly interesting field of research. However, concerns about technical compatibility and massive computing costs make such a combination unlikely in the near future.
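Full FHE is too involved for a short listing, so the sketch below swaps in a simpler relative: textbook Paillier encryption, which is only additively homomorphic. It supports adding ciphertexts and multiplying them by plaintext constants, not arbitrary computation. That is still enough to show the core idea: a server computes a linear model score on a client's encrypted features without ever seeing them. The tiny primes, the weights and the features are illustrative assumptions and provide no security.

```python
import math
import secrets

# Toy textbook Paillier key (assumption: tiny primes, NOT secure; illustration only).
PRIME_P, PRIME_Q = 293, 433
N = PRIME_P * PRIME_Q
N_SQ = N * N
LAM = math.lcm(PRIME_P - 1, PRIME_Q - 1)   # Carmichael function of N
MU = pow(LAM, -1, N)                       # valid because the generator is g = N + 1

def encrypt(m):
    """Enc(m) = (N + 1)^m * r^N mod N^2 with fresh random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, m, N_SQ) * pow(r, N, N_SQ) % N_SQ

def decrypt(c):
    return (pow(c, LAM, N_SQ) - 1) // N * MU % N

def add(c1, c2):
    return c1 * c2 % N_SQ                  # Enc(m1) * Enc(m2) decrypts to m1 + m2

def scale(c, k):
    return pow(c, k, N_SQ)                 # Enc(m)^k decrypts to k * m

def encrypted_linear_score(weights, enc_features):
    """Server side: compute Enc(w . x) without ever seeing x."""
    acc = encrypt(0)
    for w, c in zip(weights, enc_features):
        acc = add(acc, scale(c, w))
    return acc

features = [3, 1, 4]                               # client's private input (hypothetical)
enc_features = [encrypt(v) for v in features]      # only ciphertexts leave the client
enc_score = encrypted_linear_score([2, 5, 1], enc_features)
print(decrypt(enc_score))                          # 15 (= 2*3 + 5*1 + 1*4)
```

Note that the client has no way to check that the server actually used the promised weights; this is the verifiability gap that motivates combining homomorphic encryption with ZKPs.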
Cost reduction in the creation of ZKPs
The overarching goal is to reduce the cost of proof generation and thereby broaden the potential applications of ZKPs. Estimates put the rate of cost reduction at three to four percent per day. Such rapid progress could lift the practical limitation of ZKPs to the inference phase, making the training of ML models a potential future area of application for ZKPs. However, the high computational burden of training could call its value proposition into question.
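The two figures in this blog, a proving overhead of about 1000x and a cost reduction of three to four percent per day, compound. Assuming, purely as an extrapolation, that the daily rate r stays constant, the cost after d days is C0 * (1 - r)^d, so reducing the cost by a factor F takes d = ln(F) / -ln(1 - r) days. A minimal worked calculation:

```python
import math

def days_to_reduce(factor, daily_reduction):
    """Days d such that (1 - daily_reduction)**d == 1 / factor.
    Assumes a constant daily rate, which is an extrapolation, not a forecast."""
    return math.log(factor) / -math.log(1 - daily_reduction)

OVERHEAD = 1000          # ~1000x proving overhead cited in this blog
for rate in (0.03, 0.04):
    days = days_to_reduce(OVERHEAD, rate)
    print(f"{rate:.0%}/day: overhead falls from {OVERHEAD}x to ~1x after ~{days:.0f} days")
```

With these inputs the sketch prints roughly 227 days for 3% and 169 days for 4%. Real cost curves rarely stay exponential for long, so such numbers only indicate how quickly a sustained rate of that size would compound.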
Verification of computational integrity through ZKPs in decentralized computing networks
Verifying computational integrity with ZKPs in decentralized computing networks would overcome today's limitation to deterministic computations and enable (almost) unequivocal verification. A reduction in the cost of ZKPs could therefore lead to their integration into the verification process of decentralized computing networks.
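The "(almost)" reflects that efficient verification is often probabilistic: a wrong result is accepted only with a negligible probability, and a single provider's result can be checked without a second provider recomputing it. As an analogy only, the sketch below uses Freivalds' algorithm, a classic randomized check that a claimed matrix product C equals A·B (the core operation of a neural network layer) using far fewer operations than recomputing it. It is not a zero-knowledge proof: the verifier sees all matrices. Integer matrices are assumed, loosely mirroring the quantized arithmetic that ZK circuits typically work with.

```python
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def freivalds_check(A, B, C, rounds=20):
    """Check A @ B == C without computing A @ B.

    Each round costs three matrix-vector products (O(n^2)) instead of one
    matrix-matrix product (O(n^3)). A wrong C passes a single round with
    probability at most 1/2, so it survives all rounds with probability <= 2**-rounds.
    """
    n = len(C[0])
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]   # random 0/1 test vector
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False                               # definitely wrong
    return True                                        # correct with high probability

A = [[1, 2], [3, 4]]            # e.g. quantized layer inputs (hypothetical)
B = [[5, 6], [7, 8]]            # e.g. quantized weights (hypothetical)
claimed = [[19, 22], [43, 50]]
print(freivalds_check(A, B, claimed))                 # True
print(freivalds_check(A, B, [[19, 22], [43, 51]]))    # False, except with probability 2**-20
```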
Decentralization of data collection
A decentralized data marketplace would enable providers to contribute data on their own initiative. Verifying the quality of that data remains an unresolved challenge, as irrelevant, erroneous, or malicious data could be included. Three combinable solutions are a reputation system based on data quality feedback, attested sensors that use ZKPs to prove data authenticity and processing, and rewards for data providers whose contributions provably improve model performance, with the improvement proven using ZKPs.
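Two of the three mechanisms reduce to simple bookkeeping once quality feedback and performance gains are available. The sketch assumes, hypothetically, quality ratings in [0, 1] aggregated into a reputation score by an exponential moving average, a minimum reputation required to be paid, and rewards proportional to each provider's gain in validation accuracy. In the proposed design that gain would be backed by a ZKP; here it is simply taken as given. Attested sensors are not modeled.

```python
from dataclasses import dataclass

@dataclass
class DataProvider:
    name: str
    reputation: float = 0.5            # hypothetical neutral starting score in [0, 1]

def record_feedback(provider, quality, weight=0.2):
    """Blend a new quality rating in [0, 1] into the reputation (exponential moving average)."""
    provider.reputation = (1 - weight) * provider.reputation + weight * quality
    return provider.reputation

def allocate_rewards(providers, improvements, budget, min_reputation=0.3):
    """Split `budget` in proportion to each provider's accuracy gain.

    `improvements` maps provider name -> gain in validation accuracy; in the
    proposed design each gain would be proven with a ZKP, here it is assumed.
    Providers below the (hypothetical) reputation threshold receive nothing.
    """
    eligible = {name: gain for name, gain in improvements.items()
                if gain > 0 and providers[name].reputation >= min_reputation}
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {name: budget * gain / total for name, gain in eligible.items()}

providers = {n: DataProvider(n) for n in ("alice", "bob", "mallory")}
for _ in range(3):
    record_feedback(providers["mallory"], quality=0.0)   # repeatedly flagged as low quality
rewards = allocate_rewards(providers, {"alice": 0.02, "bob": 0.01, "mallory": 0.05}, budget=300.0)
print(rewards)   # mallory is excluded; alice and bob split the budget 2:1
```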
Decentralized governance of an ML model
Decentralized Autonomous Organizations (DAOs) could challenge the centralized governance of an ML model. Decentralized governance could foster transparency and reduce risks such as fraud or poor decision-making.
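DAO rules are normally enforced by smart contracts on a blockchain. As a hypothetical illustration of one such rule, the Python sketch below implements token-weighted voting with a quorum on a model-governance proposal, such as deploying a retrained model. The one-token-one-vote weighting, the 40% quorum and all names are assumptions; real DAOs use many variants, for example vote delegation or time locks.

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    description: str
    votes_for: float = 0.0
    votes_against: float = 0.0
    voters: set = field(default_factory=set)

def cast_vote(proposal, voter, balances, support):
    """Token-weighted vote: each governance token held counts as one vote (assumption)."""
    if voter in proposal.voters:
        raise ValueError(f"{voter} has already voted")
    weight = balances.get(voter, 0.0)
    if support:
        proposal.votes_for += weight
    else:
        proposal.votes_against += weight
    proposal.voters.add(voter)

def passes(proposal, balances, quorum=0.4):
    """Pass if turnout reaches the (hypothetical) quorum and more tokens voted for than against."""
    turnout = (proposal.votes_for + proposal.votes_against) / sum(balances.values())
    return turnout >= quorum and proposal.votes_for > proposal.votes_against

balances = {"alice": 50.0, "bob": 30.0, "carol": 20.0}   # hypothetical token holders
proposal = Proposal("Deploy retrained model version 2")  # hypothetical proposal
cast_vote(proposal, "alice", balances, support=True)
cast_vote(proposal, "carol", balances, support=False)
print(passes(proposal, balances))   # True: 70% turnout, 50 for vs. 20 against
```

Because every vote and the tally rule are public on-chain, any stakeholder can recompute the outcome, which is where the transparency benefit comes from.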
Summary of key points
The following table provides a summary of the key points discussed in this blog.
Conclusion
Blockchains sacrifice efficiency for uncompromising security, which hinders the on-chain execution of complex computations such as ML operations. ZKPs allow blockchain-based applications to use the results of computationally intensive calculations performed off-chain in a trustless manner, and thereby hold the potential to significantly expand the spectrum of blockchain applications.
Thank you to Eric Rode for conducting research and contributing to the creation of this blog post.