Using Blockchain Technology to Ensure Data Integrity: Applying Hyperledger Fabric to Biomanufacturing

Digitalization of manufacturing operations is a major challenge that many industries face. With the advent of smart equipment, automation of individual unit operations and complete processes, and digitalization of batch documentation, more data are generated now than ever before. That information must remain manageable, and its integrity must be ensured. The challenge for biomanufacturers will be to ensure that their entire, large data output is attributable, legible, contemporaneous, original, and accurate (ALCOA), as defined by the US Food and Drug Administration (FDA) and other regulatory agencies.

Parallel to the digitalization efforts of the biopharmaceutical industry, a new technology for storing data in a decentralized database format was designed: blockchain. The concept is best known from cryptocurrencies such as Bitcoin, but it offers many more possibilities. Using specific frameworks such as Hyperledger (1), which provide tools for developing blockchain-based applications, industrial companies can decentralize their data storage.

Here we elaborate on the challenges of digital manufacturing and big data and on the requirements for data integrity. We posit that these challenges can be met by using blockchain technology as a decentralized database in biomanufacturing to ensure “data integrity by design.”

Background
The main driver for the pharmaceutical industry to ensure data integrity is the ALCOA principle based on good documentation practice (GDP) related to the full range of data records — both paper and electronic. All data-integrity initiatives are based on ALCOA requirements, and each drug manufacturer is responsible for conforming to these principles. All data records must be complete and immutable, and their origin needs to be clear. Note that electronic systems and their records in the biopharmaceutical industry need to apply these principles according to 21 CFR Part 11.

When a company undergoes a merger or acquisition, as is common in the biopharmaceutical industry, the resulting organization often lacks a coherent information technology (IT) strategy covering the entire company. Instead, site-specific IT environments develop with manifold machines, devices, infrastructure, systems, and applications. Hence, many different ways of storing electronic data have been introduced: file-based storage in local or central repositories using several different formats, database solutions on local clients, and central or middleware solutions that forward data between systems. Backup and restore functionality must be established for every one of those options. That becomes a significant challenge for IT managers who need to ensure full data integrity within such complex systems. And business managers need extensive system knowledge to review raw data within an original system.

Blockchain Basics
A seminal white paper introducing the Bitcoin concept was published in 2008 under the pseudonym of Satoshi Nakamoto (2). Bitcoin was designed to “allow online payments to be sent directly from one party to another without going through a financial institution.” Through this technology, it became possible to ensure the security and reliability of data and transactions using the knowledge and resources of all participants.

By contrast with classic client–server networks, blockchain networks are based on a peer-to-peer approach in which each participant is both a client and a server providing specific services. Members of the network are called nodes, and those can be single computer systems or parts of complex systems. In addition to submitting data to a network, they validate data and transactions. Transactions contain information about an exchange of data among multiple nodes and can contain almost all conceivable variants of electronic records. After a certain number of transactions have been collected and validated, they are combined into an immutable “block,” which is added to the “chain.”
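The chaining idea can be made concrete with a minimal Python sketch. It is a conceptual illustration, not Hyperledger or Bitcoin code, and it assumes SHA-256 hashes over a JSON serialization chosen here for simplicity. Each block stores the hash of its predecessor, so any change to an earlier block breaks the link to every later one. The names `Block`, `build_chain`, and `BLOCK_SIZE` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

BLOCK_SIZE = 3  # hypothetical: number of validated transactions per block


@dataclass
class Block:
    index: int
    prev_hash: str
    transactions: list = field(default_factory=list)

    def digest(self) -> str:
        # Deterministic serialization so that every node computes the same hash.
        payload = json.dumps(
            {"index": self.index, "prev": self.prev_hash, "txs": self.transactions},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(transactions: list) -> list:
    """Group validated transactions into blocks; link each block to its predecessor."""
    chain = []
    prev_hash = "0" * 64  # placeholder "previous hash" for the first (genesis) block
    for start in range(0, len(transactions), BLOCK_SIZE):
        block = Block(len(chain), prev_hash, transactions[start:start + BLOCK_SIZE])
        chain.append(block)
        prev_hash = block.digest()
    return chain


txs = [{"sensor": "UV280", "value": v} for v in (0.12, 0.15, 0.31, 0.44, 0.40)]
chain = build_chain(txs)
print(len(chain))                             # 2 (three + two transactions)
print(chain[1].prev_hash == chain[0].digest())  # True
```

Editing any transaction in `chain[0]` changes its digest, so `chain[1].prev_hash` no longer matches; that mismatch is what makes committed blocks effectively immutable.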

The term blockchain, however, is not clearly defined. A recently founded committee of the International Organization for Standardization (ISO) with the goal of harmonizing technologies within the blockchain field describes it as

a shared, immutable ledger that can record transactions across different industries, thus enhancing transparency and reducing transaction costs. It is a digital platform that records and verifies transactions in a transparent and secure way, removing the need for middlemen and increasing trust through its highly transparent nature. (3)

Ledger is a general term for a record of transactions; a distributed ledger is one that is stored and synchronized across many participants. Distributed ledger technology (DLT) therefore often serves as a synonym for blockchain, but DLT is the broader concept. It can enhance a blockchain with dedicated business functionality in the form of smart contracts, which perform operations such as applying business rules for data verification. A DLT also replicates data to all nodes, so every node holds the same validated, immutable information. In public blockchains such as Bitcoin, validation mostly (but not always) relies on a consensus mechanism called proof of work, which requires large amounts of computing resources to solve a cryptographic problem through trial and error. As soon as one node solves that problem, all the other nodes verify its solution, and if the result is positive, the validated data are added to the blockchain and replicated automatically to all participants.
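The asymmetry behind proof of work (expensive to find, cheap to verify) can be shown with a toy Python loop. This sketch assumes a simplified target of a fixed number of leading zero hex digits in a SHA-256 hash; Bitcoin's real difficulty target works differently in detail. `DIFFICULTY`, `mine`, and `verify` are hypothetical names.

```python
import hashlib

DIFFICULTY = 4  # hypothetical: required number of leading zero hex digits


def block_hash(data: str, nonce: int) -> str:
    return hashlib.sha256(f"{data}|{nonce}".encode()).hexdigest()


def mine(data: str, difficulty: int = DIFFICULTY) -> int:
    """Trial and error: try nonces one by one until the hash meets the target."""
    nonce = 0
    while not block_hash(data, nonce).startswith("0" * difficulty):
        nonce += 1
    return nonce


def verify(data: str, nonce: int, difficulty: int = DIFFICULTY) -> bool:
    """Any other node can check a proposed solution with a single hash."""
    return block_hash(data, nonce).startswith("0" * difficulty)


data = "block with pending transactions"
nonce = mine(data)
print(nonce, verify(data, nonce))
```

With four hex digits, a miner needs on average about 16^4 = 65,536 attempts, whereas verification always costs one hash. Permissioned networks such as Fabric avoid this cost entirely.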

The combination of all of those functionalities has empowered the members of the Bitcoin network to establish an electronic currency system with no central authority. At the time of writing, its market capitalization of about US$126 billion compared with the roughly $100 billion market capitalization of Goldman Sachs at the end of 2019 (4). Beyond this well-known use in public electronic currency systems, DLTs also can be used to implement enterprise applications restricted to private participants.

One framework that can be used to implement private blockchain solutions is Hyperledger. Launched in 2016 by the Linux Foundation, the Hyperledger project is supported now by IBM, Intel, SAP, and a number of other companies including leaders in finance, banking, the internet of things (IoT), supply chain, manufacturing, and technology (5). Compared with other similar solutions, the Hyperledger project focuses on business processes and requirements. Under its umbrella are several subprojects including Fabric, which by design provides a modular approach to encapsulate data for predefined groups (channels). Additionally, smart contracts can be implemented in a range of languages to apply required business functionalities.

Implementation of Hyperledger Fabric
Below we consider a heterogeneous biomanufacturing network with a large amount of equipment and systems, using as an example a chromatography column that regularly produces data. The two main use cases are

  • writing data nearly continuously
  • reading data on demand.

Figure 1: Components of a Hyperledger Fabric network

The ALCOA principle defines the major requirements for writing data. In addition, a chosen solution needs to be robust enough that even hundreds of transactions per minute will not harm the system, and it must be able to scale with increasing demands in the future. Blockchain began as public, open, and accessible to everyone, but access to business blockchain solutions must be tightly restricted, from general network access down to specific roles and permissions.

A Fabric network consists of several components that ensure segregation of duties and responsibilities, as illustrated in Figure 1. All components can be installed easily because they are provided as preconfigured “containers” (6). Containers are similar to virtual machines but more “lightweight” because they share the host’s operating system kernel instead of running a full operating system of their own.

Network Components: The major components that make up the Fabric network are channels, peers, and an ordering service.

Channels allow only their members to communicate and exchange data with each other. This component is the first piece of the puzzle for ensuring clear separation of data and access. In the column example, a channel called “Production” could be used. Each channel has its own data structure, called the ledger of the channel, which stores all transactions including the corresponding data. The ledger contains the data in blockchain form as well as metadata about them. The members of the channel (such as the column itself) are defined as peers, each of which stores a complete copy of that channel’s ledger. As a result, the channel data are distributed as true copies to every peer of the channel.
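The data-separation idea of channels can be modeled in a few lines of Python. This is a conceptual sketch under simplifying assumptions, not Fabric's implementation: every member peer keeps its own full copy of the channel ledger, and a peer outside the channel receives nothing. The `Peer` and `Channel` classes and all peer names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    ledgers: dict = field(default_factory=dict)  # channel name -> local ledger copy


@dataclass
class Channel:
    name: str
    members: list

    def commit(self, record: dict) -> None:
        # Each member peer appends the same record to its own copy of the ledger.
        for peer in self.members:
            peer.ledgers.setdefault(self.name, []).append(dict(record))


column = Peer("column-01")
skid = Peer("packing-skid")
lims = Peer("qc-lims")  # hypothetical peer that is not a member of "Production"

production = Channel("Production", [column, skid])
production.commit({"step": "packing", "bed_height_cm": 20.0})

print(column.ledgers["Production"] == skid.ledgers["Production"])  # True
print("Production" in lims.ledgers)                                # False
```

Copying the record per peer (`dict(record)`) mirrors the point that each peer holds an independent true copy rather than a shared reference.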

Peers can have specific permissions to access the channel. For example, a peer may be allowed only to read data, or only to write them. Peers also can host smart contracts (called chaincodes) that implement business functionality for data access. Finally, the ordering service orchestrates the network by ordering transactions and distributing them between peers and channels. Besides these major components, several configurations define the detailed membership and competencies of each component. Membership can be further divided into organizations and consortia, which determine permissions within the network.
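A read-only versus read-write distinction can be sketched as a simple permission lookup. In a real Fabric network, such rules are expressed through channel policies and membership configuration; the `Permission` flags, the `ACL` table, and the `check` function below are hypothetical stand-ins that illustrate the idea only.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    WRITE = auto()


# Hypothetical permission table for the "Production" channel.
ACL = {
    "column-01": Permission.READ | Permission.WRITE,  # equipment that produces data
    "qa-dashboard": Permission.READ,                   # read-only consumer
}


def check(peer: str, needed: Permission) -> bool:
    """Return True only if the peer holds every permission the operation needs."""
    granted = ACL.get(peer, Permission(0))  # unknown peers get no permissions
    return needed in granted


print(check("qa-dashboard", Permission.READ))   # True
print(check("qa-dashboard", Permission.WRITE))  # False
print(check("unknown-peer", Permission.READ))   # False
```

Defaulting unknown peers to an empty permission set reflects the "deny unless granted" stance that a permissioned business network requires.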

Distributed Validation: Networks based on such a framework require no time- and cost-intensive proof of work. Instead, they use an endorsement policy as part of their consensus mechanism, which allows them to process a large number of transactions with the same reliability. This complex process runs in the background and requires no manual action by participants (users).

When an application requests to add or change data, that request is sent to the endorsing peers of the network. Each of them executes the request independently, without yet updating its ledger, and returns a signed result (an endorsement) to the application, which then submits the endorsed transaction to the ordering service. An endorsement policy defines how many (and which) endorsing peers must endorse a transaction before it can be confirmed. Afterward, the ordering service broadcasts the transaction to all available peers of the channel, which validate it again against their local copy of the ledger before committing it.
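An endorsement policy can be pictured as a rule over the collected endorsements. Fabric has its own policy syntax; the Python below models only one simple, hypothetical rule, "at least *k* distinct organizations endorsed the same result", and the `Endorsement` fields and organization names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endorsement:
    org: str          # organization that owns the endorsing peer
    result_hash: str  # digest of the simulated transaction result


def satisfies_policy(endorsements: list, required_orgs: int) -> bool:
    """Hypothetical 'k of n organizations' rule: enough distinct organizations
    must have endorsed one and the same result."""
    orgs_per_result = {}
    for e in endorsements:
        orgs_per_result.setdefault(e.result_hash, set()).add(e.org)
    return any(len(orgs) >= required_orgs for orgs in orgs_per_result.values())


ends = [
    Endorsement("Manufacturing", "a1f3"),
    Endorsement("QA", "a1f3"),
    Endorsement("QA", "a1f3"),  # a second QA peer does not add a new organization
]
print(satisfies_policy(ends, 2))  # True
print(satisfies_policy(ends, 3))  # False
```

Counting distinct organizations rather than peers prevents a single organization from satisfying a multi-party policy on its own.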

In the chromatography column example, the column’s control unit creates data during the packing process that are validated by the other connected equipment of the channel “Production.” When a positive result is obtained, the transaction is appended to the end of the blockchain. Finally, the requesting application is informed about the validation result. The entire validation process is complex and secure but fast: in performance measurements reported by IBM researchers, a Fabric network processed more than 3,500 transactions per second in certain deployment configurations (7).

No application using a Fabric network can access the blockchain data directly, either to read or to write. Every such access must go through a smart contract (chaincode). In the column-packing example, the smart contract could ensure that the input parameters, such as bed height and plate number, are stored together with the height equivalent to a theoretical plate (HETP) results. Chaincodes can be implemented in common programming languages such as Go (8), JavaScript running on Node.js (9), and Java (10). Implementation is straightforward because the framework already provides significant functionality. After chaincodes have been implemented, they need to be installed and instantiated on appropriate peers (either all of them or only specific ones) to allow access to the data.
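The business rule described for the packing example can be sketched as a chaincode-style function in Python. It uses the standard relation HETP = L/N (bed height divided by plate number) and refuses records that lack their inputs. A plain dictionary stands in for the ledger state; in actual Go chaincode the write would go through the shim's `PutState` call. The function name, key format, and field names are hypothetical.

```python
import json

REQUIRED = ("column_id", "bed_height_cm", "plate_number")


def record_packing_test(state: dict, key: str, params: dict) -> dict:
    """Hypothetical chaincode-style function: reject incomplete input, derive
    HETP = L / N, and store the result together with its inputs."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing input parameters: {missing}")
    if params["plate_number"] <= 0:
        raise ValueError("plate_number must be positive")
    if key in state:
        raise ValueError(f"record {key} already exists")  # no silent overwrite
    hetp_cm = params["bed_height_cm"] / params["plate_number"]
    record = {**params, "hetp_cm": round(hetp_cm, 5)}
    state[key] = json.dumps(record, sort_keys=True)  # stand-in for a ledger write
    return record


ledger_state = {}
rec = record_packing_test(
    ledger_state,
    "PACK-0001",
    {"column_id": "col-01", "bed_height_cm": 20.0, "plate_number": 2000},
)
print(rec["hetp_cm"])  # 0.01
```

Storing the inputs next to the derived HETP value lets a reviewer recompute the result later, which supports the "original" and "accurate" parts of ALCOA.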

Figure 2: Error for invalid response

Because chaincodes control blockchain data, they also could be used to manipulate peers and the data in a blockchain. However, all peers of the network validate each transaction. If one peer tries to manipulate data in the blockchain, its response will differ from those of the other peers, and the transaction will be stopped with a corresponding error (Figure 2). The endorsement policy defines how many endorsing peers must return the same reply for a transaction to be accepted as valid. With enough honest peers, a manipulating peer can be identified and excluded from further transactions.
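The comparison of peer responses can be sketched as follows. This is a simplified illustration, not how Fabric implements the check internally: it hashes each peer's response, treats the most common digest as the reference, flags peers that deviate from it, and accepts the transaction only if enough identical responses exist. The response format and peer names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(payload: str) -> str:
    return hashlib.sha256(payload.encode()).hexdigest()


def evaluate(responses: dict, required: int):
    """Return (accepted, deviating_peers).

    responses: peer name -> response payload (hypothetical format)
    required:  number of identical responses the policy demands
    """
    counts = Counter(digest(p) for p in responses.values())
    reference, votes = counts.most_common(1)[0]
    deviating = sorted(peer for peer, p in responses.items() if digest(p) != reference)
    return votes >= required, deviating


responses = {
    "peer-a": '{"hetp_cm": 0.01}',
    "peer-b": '{"hetp_cm": 0.01}',
    "peer-c": '{"hetp_cm": 0.02}',  # manipulated response
}
print(evaluate(responses, 2))  # (True, ['peer-c'])
print(evaluate(responses, 3))  # (False, ['peer-c'])
```

The second call shows why the policy threshold matters: if every peer's agreement is required, a single deviating peer blocks the transaction instead of being outvoted.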

The same result would come from an attempt to manipulate the blockchain itself, which is stored as a file locally on each peer. As soon as one peer’s blockchain differs from those of the others, the results of its next transaction validation will differ and be recognized as invalid. Because a Fabric network provides data integrity by design, a single peer cannot manipulate the shared data unnoticed, and peers that attempt it can be identified and excluded.
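Tampering with a local copy can be illustrated with a self-contained Python sketch, assuming SHA-256 links over JSON-serialized blocks (a simplification of Fabric's block format). `verify_chain` recomputes every link, and `tip` yields the digest of the newest block that peers can compare. All names are hypothetical.

```python
import hashlib
import json


def block_digest(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def append(chain: list, data: dict) -> None:
    prev = block_digest(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "data": data})


def verify_chain(chain: list) -> bool:
    """Recompute every link; an edited block breaks the link to its successor."""
    return all(chain[i]["prev"] == block_digest(chain[i - 1]) for i in range(1, len(chain)))


def tip(chain: list) -> str:
    """Digest of the newest block, which peers can compare cheaply."""
    return block_digest(chain[-1])


honest = []
for hetp in (0.010, 0.011, 0.010):
    append(honest, {"hetp_cm": hetp})

tampered = json.loads(json.dumps(honest))  # independent copy held by another peer
tampered[0]["data"]["hetp_cm"] = 0.008     # direct edit of the local file
print(verify_chain(honest), verify_chain(tampered))  # True False

last_edit = json.loads(json.dumps(honest))
last_edit[-1]["data"]["hetp_cm"] = 0.009   # edit the newest block instead
print(verify_chain(last_edit), tip(last_edit) == tip(honest))  # True False
```

The second case shows why comparison with the other peers matters: an edit to the newest block leaves no broken link locally, but its digest no longer matches the copies held elsewhere.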

An Enabling Technology
Digitalization of manufacturing processes is creating more data that need to be stored reliably in an integrated way. With its enterprise focus, a Hyperledger Fabric network could enable applications to use blockchain technology in the biopharmaceutical industry. With a network set up as a private blockchain solution, and using the options of channels, differentiation of peers, and specific permissions, it becomes possible to restrict access tightly. For access to such a network, technologies such as Node.js, Java, and representational state transfer (REST) web services are available.

Most smart equipment and applications should be able to access a Hyperledger Fabric network right out of the box. If not, middleware could translate between them. Data integrity according to the ALCOA principle also can be ensured:

  • Data are attributed by corresponding transactions where all partners are identified by their digital certificates.
  • Legible data are ensured by applications that read them through chaincode, and no proprietary formats are used: the ledger is stored in openly documented data structures that freely available tools can decode. New data are appended, so existing data remain unchanged.
  • Contemporaneous writing can be achieved because all equipment, applications, and other participants can join the network and add to the data directly.
  • Original data are kept and replicated to all nodes of the network. If one node leaves the network or gets replaced, or if a new node joins, that does not affect data on the other nodes.
  • Accuracy is ensured because data cannot be changed after they have been committed to the blockchain. Changes can be made, but they will be appended as a new version of the information to the blockchain.
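The accuracy and attribution points can be illustrated with a small append-only store in Python. This is a conceptual sketch, not Fabric's storage layer: updates add a new version with its author, and earlier versions stay readable. Fabric chaincode can retrieve a key's earlier versions through its history API (`GetHistoryForKey` in the Go shim), but the `AppendOnlyStore` class and the user names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyStore:
    """Hypothetical append-only store: updates add versions, nothing is overwritten."""
    _log: list = field(default_factory=list)  # (key, version, entry) tuples

    def put(self, key: str, value: dict, author: str) -> int:
        version = sum(1 for k, _, _ in self._log if k == key) + 1
        self._log.append((key, version, {"value": dict(value), "author": author}))
        return version

    def current(self, key: str) -> dict:
        """Latest value for a key (raises IndexError if the key was never written)."""
        entries = [e for k, _, e in self._log if k == key]
        return entries[-1]["value"]

    def history(self, key: str) -> list:
        """All versions as (version, author, value), oldest first."""
        return [(v, e["author"], e["value"]) for k, v, e in self._log if k == key]


store = AppendOnlyStore()
store.put("PACK-0001", {"hetp_cm": 0.012}, author="operator-17")
store.put("PACK-0001", {"hetp_cm": 0.010}, author="qa-reviewer-02")  # correction
print(store.current("PACK-0001"))       # {'hetp_cm': 0.01}
print(len(store.history("PACK-0001")))  # 2
```

A correction therefore never erases the original entry; both versions remain, each attributed to the user who wrote it.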

Beyond those functional requirements, companies usually require established solutions with long-term support to ensure future functionality. Because the Hyperledger project was founded only in 2016, doubts could arise about long-term support. However, with several leading industry partners supporting the project, it should remain relevant and continue to be enhanced, as its quarterly releases so far suggest.

Blockchain technology in general and Hyperledger Fabric in particular can help manage the future of “Pharma 4.0.” Data will continue to be generated, and standardizing their management is already a challenge. Blockchain applications might not be appropriate for all purposes (e.g., business-intelligence applications that build data cubes may not benefit), but all distributed systems and applications that write data and access them on demand could benefit. The foundations have been laid for managing the digital transformation of Industry 4.0.

References
1 Hyperledger. The Linux Foundation: San Francisco, CA, 2020; http://hyperledger.org.

2 Nakamoto S. Bitcoin: A Peer-to-Peer Electronic Cash System. White paper 31 October 2008; https://bitcoin.org/bitcoin.pdf.

3 Nadin C. Blockchain Technology Set to Grow Further with International Standards in Pipeline. ISO News 24 May 2017; https://www.iso.org/news/Ref2188.htm.

4 CoinLore 2020; https://www.coinlore.com.

5 About Hyperledger. The Linux Foundation: San Francisco, CA, 2020; https://www.hyperledger.org/about.

6 Blockchain Network. The Linux Foundation: San Francisco, CA, 2019; https://hyperledger-fabric.readthedocs.io/en/release-1.4/network/network.html.

7 Androulaki E, et al. Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains. arXiv 30 Jan 2018; https://arxiv.org/pdf/1801.10228v1.pdf.

8 “Go” Open-Source Programming Language. Google: Mountain View, CA, 2020; https://golang.org.

9 Node.js. OpenJS Foundation: San Francisco, CA, 2020; https://nodejs.org.

10 Java Download. Oracle: Redwood City, CA, 2020; https://www.java.com.

Prof. Dr.-Ing. habil. Joachim Warschat is institute director at the Fraunhofer Institute for Industrial Engineering, Technology, and Innovation Management, and he teaches technology and innovation management at FernUniversität in Hagen, Germany. Corresponding author René Bergemann has worked in the pharmaceutical industry since 2015 and has been responsible for implementation of several information technology projects (bergemannrene01@gmail.com).