Data Interoperability: A Case Study in Complex Systems Engineering
Marlene R. Williamson
The US Department of Defense (DOD) has a long history of dealing with data interoperability challenges. Many solutions have been successfully implemented, but all have exhibited limitations when the scope of their application has been expanded. The introduction of the internet and the concept of Netcentric Warfare have magnified both the need for and the challenges of data interoperability. It is becoming increasingly clear that the fundamental problem is one of complexity, and this understanding has influenced recent Joint military guidance and directives. This paper discusses the problem domain and solutions as a case study in complex systems engineering (CSE). The case study provides preliminary insights involving scale and emergence in CSE.
DOD Data Interoperability is an ideal case study for complex systems engineering (CSE). There is a long history of limited successes and ultimate failures, so the challenges are well recognized. Further, new concepts for networked operations present additional challenges, and it is recognized that new kinds of solutions are needed. Experience has shown that achieving data interoperability across a large diverse enterprise is intractable. In addition to technical challenges, there are organizational and cultural issues. It is becoming increasingly clear that complexity is the true problem. Recognizing this, the Joint DOD community has begun to approach data interoperability as a problem in CSE. This paper reports on some preliminary issues and insights.
CSE is about achieving a process, not an end state. System engineering implies some level of deliberate control or standardization, while complexity implies a level of unpredictable evolutionary development. Instituting a process that allows for both standardization and variability is a key CSE challenge. A related challenge is balancing competition and collaboration. Competition is valuable because it drives innovation, but collaboration is essential to provide coherence. Taken to extremes, competition can lead to counterproductive battles between magic bullet solutions, while collaboration can involve cumbersome processes, which are also counterproductive.
CSE is a human process with concomitant complexities and limitations. As the size and diversity of a stakeholder group grows, fewer system details can be standardized. Matching the degree of standardization to the scale of the stakeholder group is another key CSE challenge. Effective communication is a critical element, and useful system documentation is a key enabler.
1.1 What is Data Interoperability?
Data Interoperability is the ability to correctly interpret data that crosses system or organizational boundaries [Renner undated]. Specifically, data interoperability means that one system (and its users) can understand and interpret the data that comes from another system (and its users). This understanding includes both the syntax (format) and the semantics (meaning) of the data. Data interoperability does not include process interoperability, which ensures that the other system has the information that is needed, nor does it include communications interoperability, which ensures that there is a data transfer mechanism between the systems.
Human communication is a key element of data interoperability. System builders and their users must share a common vocabulary. Typically, shared understanding is defined implicitly by stakeholders in a community of common interest. Disconnects among these implicit understandings can inhibit data interoperability within and especially between communities of interest.
1.2 Why is Data Interoperability Complex?
Four characteristics contribute to the complexity of DOD data interoperability. First is the large number of systems. The number of potential pairwise interfaces increases as the square of the number of systems. An enterprise with many thousands of systems has potentially many millions of interfaces. The DOD, for example, has over a thousand logistics systems alone.
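The quadratic growth can be made concrete with a short sketch. It assumes the worst case, in which every pair of systems shares one interface; the text does not claim that this happens in practice, and the function name and system counts below are illustrative only.

```python
def pairwise_interfaces(n_systems: int) -> int:
    """Count distinct pairs of systems: n * (n - 1) / 2."""
    return n_systems * (n_systems - 1) // 2


# Illustrative system counts, not DOD figures.
for n in (10, 1_000, 5_000):
    print(n, pairwise_interfaces(n))
# 10 45
# 1000 499500
# 5000 12497500
```

Under this assumption, a thousand systems already imply roughly half a million potential interfaces, and five thousand imply more than twelve million.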
Second, changing operational needs continually require new and modified systems with new and modified interfaces. Changes occur rapidly during war, when new tactics and procedures evolve quickly to meet changing combat conditions and enemy tactics. Changing requirements can be imposed even when a system is still in development. For example, the end of the cold war, with the concomitant emphasis on tactical missions, led to modifications to many DOD military systems and their interfaces.
Third, interfaces must accommodate asynchronous implementation and deployment of systems. Interfacing systems may be developed in parallel by different contractors under different program timelines, and multiple versions of each system may be in the field simultaneously. For example, an entire fleet of surveillance aircraft cannot be grounded for retrofitting, so upgrading is an asynchronous multi-year process. Since new upgrades are being introduced continuously, each aircraft system may be unique.
Finally, diverse communities have diverse information content with domain-specific vocabularies. For example, the DOD uses health, training, housing, transportation, procurement, and accounting information, in addition to weapon systems, support and logistics information. The deployment of hundreds of thousands of warfighters may require coordination and interoperability among all these communities. However, even within a single community, vocabularies vary in subtle ways. For example, does the aircraft weight include the on-board mission equipment, the fuel, and the cargo weights? Does the cargo weight include shipping boxes or pallets? Yet weight is a simple concept relative to abstractions, such as mission.
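The weight example can be sketched in code to show how two values can agree in syntax yet differ in semantics. This is a minimal illustration, not a DOD data model: the class, the component names, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightField:
    """A weight value plus an explicit statement of what it covers."""
    value_kg: float
    includes: frozenset  # hypothetical component names, e.g. {"airframe", "fuel"}


def same_meaning(a: WeightField, b: WeightField) -> bool:
    """Two weights are comparable only if they cover the same components."""
    return a.includes == b.includes


# Hypothetical values from two systems describing the same aircraft.
planner = WeightField(
    60_000.0, frozenset({"airframe", "mission_equipment", "fuel", "cargo"})
)
maintenance = WeightField(35_000.0, frozenset({"airframe", "mission_equipment"}))

print(same_meaning(planner, maintenance))               # False
print(sorted(planner.includes - maintenance.includes))  # ['cargo', 'fuel']
```

Both fields are syntactically identical (a number in kilograms), so a receiving system that sees only the number cannot detect the mismatch. Making the definition explicit is one way to expose it.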
2 Past Solutions
Many approaches to data interoperability have been successful in small to medium communities of stakeholders but have become intractable as the number and diversity of stakeholders increases. One approach is to do nothing, assuming that interoperability will happen because it is in everyone's best interest. However, this approach is rarely if ever successful. Doing nothing has resulted in stand-alone systems with undefined or poorly defined interfaces (or no interfaces) to critical information.
A more successful approach for a few systems is to define and control the interfaces between them. Each pair of systems agrees to comply with a detailed documented interface, with changes effected via an Interface Control Board that includes the relevant stakeholders. Negotiating interface changes between systems with different contractors, users, funding sources, and development timelines is challenging but has sometimes been successful. In practice, several issues arise. The documentation may include only the data syntax, not the semantics, because the stakeholders come from a small community where the semantics are implicitly understood. This makes it difficult for those outside the community to reuse the interfaces. For systems with many complicated interfaces and frequent changes, the documentation is costly to maintain and may become out of date. Achieving backward compatibility with numerous versions of fielded systems is also challenging. Worse, the number of interfacing pairs of systems increases as the square of the number of systems, so this approach seems to be most successful for a hub-and-spoke system-of-systems configuration, for example for interfaces between an aircraft and its support equipment.
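The advantage of the hub-and-spoke configuration can be illustrated by listing the interface agreements each arrangement requires. The sketch below is an assumption-laden illustration: the support equipment names are hypothetical, and each agreement is modeled simply as a pair of system names.

```python
from itertools import combinations


def pairwise_agreements(systems):
    """Every pair of systems negotiates its own interface document."""
    return list(combinations(systems, 2))


def hub_and_spoke_agreements(hub, spokes):
    """Each spoke negotiates only with the hub."""
    return [(hub, spoke) for spoke in spokes]


# Hypothetical support equipment for a single aircraft.
support = ["test_set", "loader", "fuel_cart", "ground_power"]
all_systems = ["aircraft"] + support

print(len(pairwise_agreements(all_systems)))               # 10
print(len(hub_and_spoke_agreements("aircraft", support)))  # 4
```

With the aircraft as the hub, the number of agreements grows linearly with the number of support systems rather than quadratically, which is consistent with the observation that pairwise interface control works best in this configuration.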
Two approaches have been successful for interfaces between larger numbers of systems: standardized message sets and standardized databases. Message sets and databases encounter similar problems as the number of systems increases. These include expanding documentation of highly complicated standards, which makes the standards expensive to implement, backward compatibility issues, and an unresponsive change process. As a result, implementations become fragmented and incomplete.
For example, each of the many DOD message standards may have several versions, which are not compatible with each other, and each version may have a hundred or more defined messages, each with a hundred or more defined fields. The documentation for a message standard may fill a bookshelf. Due to cost and schedule constraints, an entire standard is rarely if ever implemented in a single system. Moreover, interpretations of the standard may differ. As a result, using a message standard does not guarantee interoperability.
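A small sketch shows why partial implementation alone can undermine interoperability. It assumes a hypothetical standard of 100 messages and two systems that each implement a different subset; the message identifiers and subset boundaries are invented for illustration, and the sketch ignores version incompatibilities and differing interpretations, which would shrink the overlap further.

```python
def usable_messages(sender_impl: set, receiver_impl: set) -> set:
    """Messages both systems implement, and so can actually exchange."""
    return sender_impl & receiver_impl


# Hypothetical standard with 100 defined messages, MSG-001 to MSG-100.
standard = {f"MSG-{i:03d}" for i in range(1, 101)}

# Each system implements only the part of the standard it could afford.
system_a = {m for m in standard if int(m[4:]) <= 40}  # MSG-001 to MSG-040
system_b = {m for m in standard if int(m[4:]) >= 31}  # MSG-031 to MSG-100

shared = usable_messages(system_a, system_b)
print(len(system_a), len(system_b), len(shared))  # 40 70 10
```

Both systems use the standard, yet under these assumptions they can exchange only a tenth of its messages.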
Databases also become unwieldy. In an attempt to insulate systems from frequent changes in database content and structure, systems-of-systems may develop modified versions of the database, limiting interoperability outside the system-of-systems.
A final approach has been to standardize vocabulary and data models. These approaches have been effective for some well-defined communities. However, they have had the same limitations for diverse communities as the other standards. Migrating legacy systems to new data model standards strains limited resources. Cost and schedule limitations have also led implementers of new systems to ignore complicated standardized vocabularies and data models.
Although past solutions have merit, all have failed to provide interoperability for large diverse communities. Solutions involving coordinated standards have led to workarounds and subsequent incompatibilities. Nevertheless, uncoordinated interface control has also failed.
In the absence of deliberate engineering, data interoperability does not emerge because the three major groups of stakeholders (those who fund, use and build the systems) have other pressing priorities. Funders are primarily interested in providing new capabilities; users want to fix today's deficiencies quickly; and system builders are focused on meeting severe cost and schedule constraints.
At the same time, standardization is fundamentally limited. Large amounts of diverse data cannot be standardized over large diverse groups of stakeholders because it is too expensive to implement and too slow to change. Complexity limits the ability to manage large standards.
3 New Guidance and Directives
Recognizing these problems, the DOD has issued new guidance and directives to foster both coordination and evolution. There are two thrusts. One involves establishing Communities of Interest (COIs), defined as groups of stakeholders who use a common vocabulary to exchange information. The other involves requirements for disciplined data asset engineering, where data assets are broadly defined to include output files, databases, documents or web pages as well as services that may be provided to access the data from an application.
3.1 Communities of Interest (COI)
COIs have been established by the DOD to develop shared vocabularies, define shared information spaces and align responsibilities for information owners and data producers. Successful COIs have been triads including users, funders, and builders. COIs are envisioned to encompass both centralized coordination via organizational COIs and decentralized problem solving via ad hoc COIs. Organizational COIs provide a structure for more stable stakeholder communities, while ad hoc COIs provide a mechanism to coordinate among communities with immediate data interoperability needs. One valuable enabler has been leveraging established stakeholder groups and standards (based on existing messages, databases and data models).
It is sometimes envisioned that the organizational COIs should provide oversight of the ad hoc COIs, but that view implies that the DOD domain can be cleanly deconstructed. It is not yet clear how to bring these two kinds of COIs together under a single process. It is impossible to divide the enterprise cleanly into an organization of non-duplicative COIs, but it is equally impossible for interoperability to emerge from uncoordinated problem-solving COIs.
This is at the heart of the CSE challenge. Engineering presumes a deliberative process to ensure an end result, in this case data interoperability. In the face of increasing complexity, however, structured engineering fails. Standardization can only go so far; it is critical to allow for decentralized variations. Several competing data models have been proposed as ultimate solutions, but the past failure of global solutions has made the community wary of magic bullets.
3.2 Data Asset Engineering
Data asset engineering is part of good system engineering practice, and much of the new DOD guidance and directives have simply underscored the requirements for this practice. The DOD key tenets are that data should be visible, understandable, accessible and trusted. This means that users and applications can discover the existence of data assets (e.g. via metadata catalogs [DOD CIO, 2005]) and that they can interpret the data, both structurally and semantically. In addition, it means that data assets are available to users and applications except where limited by policy, regulation or security. There is a new emphasis on accommodating unanticipated future users and on coordinating with COIs to identify recommended data standards and interoperability test opportunities.
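The tenets of visibility, understandability and accessibility can be illustrated with a toy metadata catalog. This is a minimal sketch under stated assumptions: the entry fields, function names and sample assets are hypothetical and do not reproduce the DoD Discovery Metadata Specification or any actual registry, and the trust tenet is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DataAssetEntry:
    """Hypothetical catalog record for one data asset."""
    name: str
    description: str   # semantic documentation (understandable)
    data_format: str   # structural documentation (understandable)
    keywords: set = field(default_factory=set)  # supports discovery (visible)
    restricted: bool = False  # limited by policy, regulation or security


def discover(catalog, keyword):
    """Visible: any user can learn that matching assets exist."""
    return [e.name for e in catalog if keyword in e.keywords]


def accessible(catalog, keyword, user_is_authorized):
    """Accessible: matching assets the user may actually retrieve."""
    return [
        e.name
        for e in catalog
        if keyword in e.keywords and (user_is_authorized or not e.restricted)
    ]


# Hypothetical sample assets.
catalog = [
    DataAssetEntry("cargo_manifest", "Cargo weight including pallets, in kg",
                   "CSV", {"cargo", "logistics"}),
    DataAssetEntry("fuel_status", "On-board fuel weight, in kg",
                   "XML", {"fuel", "logistics"}, restricted=True),
]

print(discover(catalog, "logistics"))           # ['cargo_manifest', 'fuel_status']
print(accessible(catalog, "logistics", False))  # ['cargo_manifest']
```

Separating discovery from access reflects the idea that an unanticipated user can find out that an asset exists even when policy or security prevents retrieving it.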
However, there are many challenges. These requirements strain resources and push the limits of current technology, although technologies are continuing to advance in the commercial sector (e.g. to support the semantic web). Many existing systems are poorly documented, and the funding for improved documentation would have to come at the expense of improved system capabilities. For new systems, it is difficult to anticipate the required resources because best practices have not been institutionalized.
In addition to technical challenges and limited resources, there are organizational and cultural issues. These include evolving lines of authority for data interoperability and for program management and funding.
Some progress has been made. Metadata registries have been established, although improvements are needed in both the quality of the documentation and the capabilities of the registries. Developers of new systems are beginning to collaborate via established COIs to identify and evolve applicable standards. There has also been some effort to develop small standards (e.g. Cursor on Target [Byrne, 2004], which has been useful across a diverse stakeholder community).
4 CSE Preliminary Insights
The DOD experiences in data interoperability engineering can be generalized to other applications of CSE. A key insight is that standardization should be applied at the appropriate scale. Specifically, the degree of standardization should relate to the size and diversity of the stakeholder group. Traditionally, data interoperability standards have been inclusive and flexible. As a result, they have become unmanageable as the size and diversity of the stakeholder group has grown. It may be that the reverse should occur: standards should be limited to what is common to the stakeholder group, and simplicity should be favored over flexibility.
Another insight is that CSE needs to combine a structured engineering process and an unstructured evolutionary process in a manner that promotes the emergence of a desired characteristic. Ad hoc COIs have been effective in solving focused data interoperability problems and organizational COIs have been effective in defining community standards. It is not yet clear how to combine these two processes, but the author believes communication (via well-constructed metadata catalogues) to be a key enabler.
A final insight involves the dual role of competition and collaboration. The CSE process should promote both multiple competing solutions and collaborations to establish standardization. Data interoperability is a goal that will never be fully achieved, but it is unclear what level of diversity is healthy and what level of non-interoperability is acceptable.
Providing data interoperability is an ongoing DOD challenge, and many recognize that the central issue is complexity. As the community struggles with this challenge, it should continue to reflect on how its experiences generalize and inform CSE.
One unsolved challenge is how to measure progress, or lack of progress, toward interoperability. While it may be clear that improvements are needed to meet growing needs, it is unclear how to determine whether data interoperability is in fact improving. Although various metrics have been proposed, both at the system and at the enterprise level, there is no consensus. Until this challenge is met, it will be difficult to understand what solutions work.
 Byrne, R. Cursor on Target Improves Efficiency, The Edge, Vol 8, No. 2, The MITRE Corp., Fall, 2004
 Renner, S. A History of DoD Data Management, undated, USAF briefing
 DoD Directive 8100.1, Global Information Grid (GIG) Overarching Policy, September 19, 2002
 Deputy Secretary of Defense, Management Initiative Decision No. 912 (MID-912), Joint
 Department of Defense Chief Information Officer, Net-Centric Data Strategy, 09 May 2003
 CJCSINST 6212.01C, Interoperability And Supportability of Information Technology and National Security Systems, 20 November 2003
 Department of Defense Chief Information Officer, DoD Discovery Metadata Specification (DDMS) Version 1.2, 03 January 2005
 Deputy Secretary of Defense Memorandum (OSD 03246-04), Information Technology Portfolio Management, 22 March 2004
 Chairman of the Joint Chiefs of Staff Memorandum CM-2040-04, Assignment of Warfighting Mission Area (WMA) Responsibilities to Support Global Information Grid
 The Joint Staff, Joint Requirements Oversight Council Memorandum (JROCM 199-04), Data Strategy Implementation for Warfighter Domain Systems, 29 October 2004
 USD AT&L Defense Acquisition Guidebook, Version 1.0, Section 7-Acquiring Information Technology and National Security Systems, 17 November 2004
 DoD Directive 8320.2, Data Sharing in a Net-Centric Department of Defense, 02 December 2004