Trends in Dataspace Technology

Eclipse Dataspace Components (EDC) and Dataspace Protocol

As data platforms continue to proliferate across organizations, there is an increasing need for secure interoperability between data silos. 'Dataspaces' facilitated by intermediary software connectors have emerged as one model to control data sharing across vendor and organizational boundaries.

One open-source offering in this ecosystem is Eclipse Dataspace Components (EDC) - an extensible toolkit for building connectors to data environments based on standardized protocols.

EDC's Origins and Philosophy

Initially named Eclipse Dataspace Connector, EDC was launched in 2021 under the Eclipse Foundation to advance open ecosystems for the decentralized exchange of data. It has since been renamed Eclipse Dataspace Components to better reflect goals that reach beyond delivering a single pre-packaged connector [*1].

While influenced by work at the International Dataspaces Association (IDSA) and Fraunhofer Institutes, EDC operates as an independent project tailored for integration and extension. As such, its codebase and approach differ extensively from other offerings labelled as 'IDS connectors' [*2].

As a community-driven initiative under a non-profit foundation, transparency and openness are core priorities for EDC. All key artefacts - architecture documents, public APIs, roadmaps - are open for review and contribution from adopters. This developer-centric ethos aims to drive productive collaboration around decentralized data ecosystems.

Current Capabilities and Roadmap

As is common with open-source projects focused on novel use cases, EDC is currently pre-1.0 and has not yet achieved full production stability. Its APIs and storage formats may still change as the platform and the underlying specifications evolve.

The 0.3.1 series delivers core connectivity capabilities for multi-party data interactions based on the Dataspace Protocol. However, advanced functionality - such as semantics-based automation, analytics integration, and robust policy engines - remains an area for exploration.

With increasing community involvement, the path toward a stable 1.0 milestone is coming into focus for EDC. Hardening infrastructure components, completing protocol coverage, improving developer ergonomics, and optimizing performance appear to be the key themes for readiness. Support for emerging IDS framework extensions also promises to bolster enterprise appeal.

Dataspace Protocol Overview

Underpinning the latest EDC releases is support for the Dataspace Protocol [*3] - a standard defining the essential interactions between software components in data ecosystem deployments.

The current version of the protocol centers on three pillars required for any data-sharing system:

  • Catalogue [*4] - Providers advertise available data assets by registering them into ecosystem catalogues. Consumers can then query catalogues using DCAT (Data Catalogue Vocabulary) to find assets matching their needs [*5]; a sketch of such a catalogue entry follows this list.
  • Policy & Contracts - Consumers request access by specifying the unique ID of the desired data asset. The provider then evaluates the request against predefined access policies to determine whether to grant or deny access. If granted, the provider generates a machine-readable Contract Agreement documenting standardized usage terms and permissions in ODRL (Open Digital Rights Language) [*6]. Since contract processing takes time, it runs asynchronously: once the access contract is created, the provider sends the Contract Agreement back to the consumer as a callback confirming approval. [*7]
  • Transfer Process - Once contracts allow access, consumers can request actual data transfers from providers. The protocol standardizes interactions around transfer initialization, monitoring, and completion.
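
To make the first two pillars more concrete, the fragment below sketches what a single catalogue entry with an attached usage policy might look like, using the DCAT and ODRL vocabularies the protocol builds on. This is a simplified, hypothetical illustration - the asset ID, title, and purpose constraint are invented, and the exact structure varies between protocol versions:

    {
      "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "odrl": "http://www.w3.org/ns/odrl/2/",
        "dct": "http://purl.org/dc/terms/"
      },
      "@type": "dcat:Dataset",
      "@id": "asset-weather-hourly",
      "dct:title": "Hourly weather readings",
      "odrl:hasPolicy": {
        "@type": "odrl:Offer",
        "odrl:permission": {
          "odrl:action": "odrl:use",
          "odrl:constraint": {
            "odrl:leftOperand": "odrl:purpose",
            "odrl:operator": "odrl:eq",
            "odrl:rightOperand": "research"
          }
        }
      }
    }

A consumer that discovers this entry would cite the asset ID in its contract request; the provider evaluates the request against the attached odrl:Offer and, if satisfied, returns a Contract Agreement recording the agreed terms.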

Understanding Dataspace Protocol via EDC Example Code

EDC offers sample code and tutorials for implementing connectors that interact via the Dataspace Protocol. [*8]

This step-by-step example demonstrates a basic connector data transfer flow [*9]; a curl sketch of two of the steps follows the list:

  1. Initialize provider and consumer connectors by registering data endpoints.
  2. Provider publishes data assets and their access policies.
  3. Consumer queries asset catalogue and discovers available data.
  4. Consumer and provider agree on a contract permitting access.
  5. Within contract limits, consumer requests and receives data.
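
The following is a rough curl sketch of steps 3 and 4, issued against the consumer connector's Management API. Host names, ports, paths, and field names are placeholders that have shifted across EDC releases (for example, providerUrl has been renamed counterPartyAddress in newer versions), so the samples repository should be treated as the authoritative reference:

    # Step 3: ask the consumer connector to fetch the provider's catalogue
    curl -X POST http://consumer:9193/management/v2/catalog/request \
      -H 'Content-Type: application/json' \
      -d '{
            "@context": { "edc": "https://w3id.org/edc/v0.0.1/ns/" },
            "providerUrl": "http://provider:8282/protocol",
            "protocol": "dataspace-protocol-http"
          }'

    # Step 4: initiate a contract negotiation for a chosen asset; the request
    # body (kept in a separate file here) echoes the offer found in the
    # catalogue response
    curl -X POST http://consumer:9193/management/v2/contractnegotiations \
      -H 'Content-Type: application/json' \
      -d @negotiation-request.json

The negotiation call returns an ID that can be polled until the negotiation reaches a finalized state; step 5 then starts a transfer process through the same API in an analogous way.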

This example closely mirrors the protocol's abstract workflows. Studying the code and execution provides valuable practical insights.

It is important to note that while the sample uses curl commands to issue HTTP requests, those requests target EDC's Connector Management API [*10], which is used for orchestration. Under the hood, EDC components exchange protocol messages to implement these interactions.
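
For instance, the single Management API call that requests a catalogue causes the consumer connector to send a protocol-level message to the provider's DSP endpoint. The sketch below is illustrative - hosts and ports are placeholders, and the authoritative message vocabulary is defined in the protocol specification [*3]:

    # What the user invokes (orchestration, consumer side):
    POST http://consumer:9193/management/v2/catalog/request

    # What the connectors then exchange (Dataspace Protocol, machine to machine):
    POST http://provider:8282/protocol/catalog/request
    {
      "@context": { "dspace": "https://w3id.org/dspace/v0.8/" },
      "@type": "dspace:CatalogRequestMessage"
    }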

Examining Communication Flows and Formats

As the protocol is designed for exchange over the internet, REST APIs are its standard interfacing mechanism. Requests and responses carry payloads formatted in JSON-LD [*11] - a JSON variant that supports linked data models.

For example, in the catalogue lookup flow, consumer connectors might retrieve JSON-LD-formatted data describing available assets. JSON-LD aligns with underlying information models like DCAT and ODRL, powering semantics and automation. [*12]
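
As a small, generic illustration of the linked-data aspect (not an exact protocol message): the @context block of a JSON-LD document maps short property names to globally unique IRIs, so the same payload can be read either as plain JSON or as RDF statements.

    {
      "@context": {
        "title": "http://purl.org/dc/terms/title"
      },
      "title": "Hourly weather readings"
    }

After JSON-LD expansion, "title" resolves to the full Dublin Core IRI, so any tool that understands vocabularies like DCAT or Dublin Core can interpret the field without EDC-specific parsing. This is the mechanism behind the semantics and automation potential noted above.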

However, EDC does not yet exploit these advanced capabilities; at present, JSON-LD adds mostly redundant formalism. As functionality matures, the richness of the protocol's artefact formats should enable smarter ecosystem behaviors.

A New Paradigm

Initiatives like EDC and compatible protocols aim to transform rigid data silos into vibrant, trustworthy sharing ecosystems. By separating key concerns - like security, governance, and discovery - into composable building blocks, they make the path toward interoperability look promising.

As with any new paradigm, hard questions around commercialization, liability, and sustainability remain for decentralized data networks. However, we can draw optimism from the active engagement of technology leaders in this space. The NTT Group has been an early and eager contributor to projects like EDC. Along with allies across industry and academia, NTT DATA helps drive open, transparent dialogue between potential data ecosystem participants to find answers to these questions.

Masatake Iwasaki
Software Engineer at NTT DATA, Open Source Advocate, Committer & PMC member of Apache Hadoop and Apache Bigtop, ASF member.

Masaru Dobashi
He leads a project to develop Dataspaces and their underlying technologies, and contributes to building academic test-beds and communities for Dataspaces. Senior Specialist (General Manager class) & Executive IT Specialist (IT Platform Tech.) at NTT DATA Group; Visiting Researcher at the University of Tokyo.