Cloud sharing sounds easy until the people sharing it are not one company, not one data center, not one legal jurisdiction, and not even one scientific discipline.
Inside a single enterprise, “AI platform” usually means a controlled environment: one cloud vendor, one identity system, one billing model, one preferred deployment stack, and one procurement department quietly pretending this is all strategic. In scientific research, the picture is messier. A climate group may have data in one national infrastructure, compute in another, collaborators across several countries, and privacy restrictions that prevent raw data from moving at all. A bioimaging team may want to publish a model, let others inspect its lineage, deploy it on external infrastructure, and still retain enough metadata for the next researcher to reproduce the result rather than merely admire the abstract.
That is the problem behind AI4EOSC, a federated open-source AI/ML platform designed for the European Open Science Cloud ecosystem.1 The paper is not simply introducing another MLOps dashboard, although it contains many dashboard-shaped things. Its more interesting claim is architectural: if AI assets must be findable, accessible, interoperable, reusable, reproducible, deployable, and auditable across institutions, then governance cannot remain an afterthought. It has to be embedded into the platform layer that creates, builds, tracks, publishes, deploys, and monitors those assets.
This is the part many readers will underestimate. The easy reading is: “scientists need a better MLOps platform.” The sharper reading is: “manual FAIR compliance breaks when the AI lifecycle is fragmented.” Metadata written after the fact is not governance. A model card detached from training history is not provenance. A Docker image with no reliable lineage is just a portable mystery box. Charming, but still a mystery box.
AI4EOSC tries to solve that by making the platform itself responsible for the chain of custody.
The real mechanism is not cloud federation; it is lifecycle control
The paper begins from two sources of fragmentation.
The first is lifecycle fragmentation. Researchers often stitch together development environments, experiment trackers, model repositories, compute infrastructure, deployment endpoints, and monitoring tools as separate components. Each tool may work locally. The combined workflow does not necessarily produce an asset that another scientist can understand, rerun, deploy, or audit.
The second is open-science fragmentation. FAIR principles are already central to scientific data practice, but AI/ML assets are more dynamic than ordinary datasets. A model evolves through code commits, training runs, hyperparameter choices, container builds, metadata updates, deployment changes, inference logs, and retraining. A repository entry alone cannot capture that motion. It is like trying to describe a supply chain by photographing the final box.
AI4EOSC’s key mechanism is therefore integration. The platform controls enough of the lifecycle to turn governance from a form-filling exercise into a default operating condition.
The mechanism looks like this:
Fragmented AI workflow
→ metadata and provenance must be reconstructed manually
→ FAIR compliance becomes uneven
→ reuse and trust depend on heroic documentation
Integrated AI platform
→ development, CI/CD, metadata, containers, deployment, and tracking are connected
→ provenance is generated during the workflow
→ FAIR compliance becomes part of the infrastructure
→ reuse and trust become easier to inspect
This is why the paper repeatedly insists that FAIR-by-design requires end-to-end integration. The argument is not philosophical. It is operational. Metadata can only be standardized reliably if the platform sees how assets are registered and published. Provenance only becomes meaningful if the platform sees training runs, build steps, model artifacts, and deployment updates. Interoperability only becomes useful if models can be packaged, described, and moved across infrastructure without locking everything into one vendor’s private garden.
A normal enterprise MLOps stack may help deploy models efficiently. AI4EOSC is aiming at something slightly different: a scientific AI operating system for communities that need openness, provenance, federation, and reuse.
Yes, that sounds less glamorous than “AI agent swarm.” It is also much closer to what serious institutions actually need.
AI4EOSC is four systems tied together by a governance logic
The architecture is organized around four major systems: the Development Platform, AI/ML as a Service, LLM Services, and Platform Orchestration. Listed flatly, this sounds like a product brochure. Mechanistically, each piece solves one part of the governance chain.
| Platform component | What it does | Why it matters operationally |
|---|---|---|
| AI4EOSC Development Platform | Provides development environments, AI modules, tools, federated learning servers, CI/CD, MLflow tracking, and drift monitoring | Keeps model creation, training, quality checks, and monitoring inside a controlled lifecycle |
| AI/ML as a Service | Deploys AI modules through serverless inference based on OSCAR, with standardized APIs and pipeline composition | Turns published models into usable services without forcing every researcher to become an infrastructure engineer |
| LLM Services | Aggregates provider-hosted LLMs via vLLM and LiteLLM under a single endpoint, with OpenAI-compatible access and documentation support | Adds federated LLM access without requiring every community to manage its own isolated interface |
| Platform Orchestration | Uses INDIGO-PaaS orchestration, Infrastructure Manager, federation registry, and TOSCA templates to deploy platform services across providers | Makes the federation reproducible enough for external communities to adopt, rather than admire from a safe distance |
The Development Platform is the center of gravity. AI modules are represented as Git repositories, with templates to help researchers package models according to platform expectations. The dashboard exposes module metadata, development environments, batch execution, deployments, secrets, and resource reporting. Codespaces provide ready-to-use environments such as VS Code or JupyterLab, with standard deep learning frameworks and authenticated endpoints.
The workload management system is built on Nomad and Consul rather than assuming a single Kubernetes cluster. That choice is not just taste. The paper’s target setting is geographically distributed e-infrastructure, with resources from multiple clouds and countries. The current federation includes providers in Spain, Slovakia, Portugal, Turkey, and Poland. The goal is one control plane over heterogeneous infrastructure, not another beautifully configured cluster that becomes useless the moment collaboration crosses an institutional boundary.
This matters because most AI governance failures are not caused by one missing checklist item. They happen between systems. The model was trained over here, logged over there, containerized somewhere else, deployed under a different endpoint, and later inspected by someone who can no longer reconstruct what happened. The cracks are where the accountability leaks out.
AI4EOSC’s answer is to reduce the cracks.
FAIR-by-design means the boring parts become mandatory
The strongest part of the paper is not that AI4EOSC has development environments or model serving. Those are necessary, but not special. The stronger contribution is that FAIR compliance is wired into routine platform operations.
The metadata system requires each registered AI module or tool to include a valid metadata file following a JSON Schema. Metadata includes user-defined fields such as title, summary, description, DOI, external links, and tags, while also incorporating automatically filled fields such as license and modification dates from GitHub. The Platform API can serialize this metadata in JSON-LD and RDF Turtle, and transform it into application profiles such as MLDCAT-AP.
In plain business language: the platform is not asking users to remember interoperability after the model is finished. It is shaping the object from the beginning so other systems can understand it later.
The CI/CD pipeline then makes this enforceable. AI4EOSC uses Jenkins pipelines with both user-defined jobs and platform-enforced jobs. The platform-level pipeline validates metadata, runs quality checks such as style testing, unit testing and security scanning, builds Docker images, publishes them to registries, integrates with Zenodo releases, notifies platform services of updates, and triggers provenance updates.
This is not glamorous work. It is also where most institutional AI initiatives quietly fail. Everyone wants model governance. Fewer people want to decide who validates metadata, who rebuilds containers, who records which tests were executed, and who updates the lineage graph when a model changes. AI4EOSC’s answer is: the platform should do as much of that as possible.
The provenance system is especially important. It collects metadata from the Platform API, MLflow, and the CI/CD pipeline, stores heterogeneous JSON fragments in PostgreSQL, then uses RDF Mapping Language rules through CARML to transform them into a semantic provenance graph. The graph is serialized as JSON-LD and aligned with W3C PROV-O. In the paper’s example, a code commit triggers a Jenkins build, which validates metadata, builds a Docker image, and notifies the provenance system; a training run logs metrics to MLflow; the resulting relationships become part of a queryable graph.
That design reveals a mature trade-off. The system does not instrument every pipeline component internally to capture ultra-fine-grained lineage. Instead, it collects metadata non-intrusively from existing platform components. The cost is less detail than a tightly instrumented controlled pipeline. The benefit is deployability in heterogeneous real-world environments where components are independently operated and researchers do not want to rewrite everything just to satisfy a provenance priesthood.
The priesthood, as usual, will survive.
Federation is useful only when trust travels with the workload
Federation is one of those words that can make any architecture sound more serious. But the paper’s federation story has a concrete operational meaning: AI4EOSC tries to make workloads, metadata, identities, secrets, and deployments work across distributed infrastructures.
The Workload Management System manages user workloads and system-level jobs across providers. Users can run persistent environments for development or transient batch jobs for training. Sidecar tasks handle platform functions such as connecting to external storage, synchronizing datasets from repositories at runtime, and triggering notifications. System-level components such as reverse proxies, the Platform API, and the dashboard are also managed as workloads.
The platform also integrates external resources rather than pretending all useful assets live inside its own catalog. AI modules are containerized for deployment in external clouds. The dashboard can deploy through the Infrastructure Manager and connect to the EOSC EU Node. AI4EOSC has a connector to the BioImage Model Zoo, mapping its standardized metadata so external modules can appear inside the AI4EOSC dashboard and be deployed through the platform. Data can be synchronized from repositories such as Zenodo, Data Europa, Dryad, and SeaNoe through DataHugger extensions and WMS sidecars. Storage can connect to RCLONE-compatible providers such as Nextcloud, S3, OwnCloud, Dropbox, and Google Drive.
The business lesson is not “use exactly this stack.” The lesson is that federation without trust infrastructure is just distributed inconvenience.
AI4EOSC puts identity, authorization, secret management, tenant isolation, traceability, and privacy into the same story. Authentication uses Keycloak and OpenID Connect, with external federated identity providers. Authorization uses role-based access control derived from internal groups and trusted identity entitlements. Secrets are managed through Vault and injected through the workload system rather than exposed as plaintext. Tenant isolation operates both at the organizational level through Nomad namespaces and at the workload network level.
For privacy, the platform follows a data minimization principle: uploaded data is stored in the user’s personal space, cached data is removed when deployments terminate, and user-facing actions route through the Platform API to maintain audit logs. Federated learning is treated as useful but not magically safe. The paper explicitly notes remaining risks such as poisoning and inference attacks, and AI4EOSC adds token-based client authentication, client weight divergence monitoring, server-side differential privacy, and metric privacy.
That is the right tone. Federated learning does not abolish privacy risk. It changes where the risk lives. Anyone saying otherwise is either selling something or has not met lawyers.
The evidence supports feasibility, not universal superiority
The performance section should be read carefully. It is not a grand benchmark proving that AI4EOSC is better than every industrial MLOps platform. It is a set of feasibility and overhead tests designed to answer a narrower question: does the added federation and governance layer impose unacceptable operational cost?
The answer, in the tested settings, is no.
| Evidence in the paper | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Feature comparison against Kubeflow, Polyaxon, Seldon, SageMaker, OpenML, Hugging Face, DEEP, and AI4EOSC | Comparison with prior tools | AI4EOSC targets a different combination: scientific focus, openness, FAIR-by-design, W3C PROV, federation, EOSC integration | It is not an independent market benchmark or total-cost comparison |
| WMS scalability test over job search and retrieval | Main engineering evidence | Search time grows sub-linearly with job count; retrieval stays in tens of milliseconds; federation adds a small fixed cost | It does not prove performance under every workload, network condition, or provider governance model |
| Training efficiency test | Overhead validation | Training a computer vision model for 25 epochs took 112.4 ± 1.7 seconds on a VM, 112.6 ± 1.2 with Docker, and 113.5 ± 1.2 in AI4EOSC — less than 1% overhead | It does not prove all training workloads will have similar overhead |
| Federated learning efficiency test | Overhead validation | Four FL configurations ranged from 222.61 ± 14.16 to 233.60 ± 14.61 seconds over five rounds; AI4EOSC introduced no large computational penalty in this setup | It does not prove privacy, robustness, or model quality across all federated learning scenarios |
| Cold-start measurement | Implementation detail for serving | In a 3-node OSCAR cluster, first synchronous invocation averaged 1521 ms and warm synchronous invocation averaged 992 ms | It is application- and image-dependent and does not settle production SLOs for commercial traffic |
| Precipitation nowcasting workflow | End-to-end demonstration | Shows how development, FL, deployment, metadata validation, containers, and provenance can operate as one workflow | It is one scientific use case, not broad evidence of productivity improvement |
| iMagine, KMD4EOSC, and AI4Life adoption | External validation through community deployment | Shows the stack can be adapted across scientific domains and infrastructures | It does not yet quantify long-term ROI, researcher time saved, or maintenance burden |
This distinction matters. The paper’s evidence is credible for what it tries to show: the platform can run across heterogeneous infrastructure, add governance functions, support federated workflows, and avoid obvious performance penalties in selected tests.
It does not show that every organization should adopt AI4EOSC, that federated scientific platforms are cheap to maintain, or that FAIR-by-design automatically produces better science. The paper’s future work section is honest about this. It notes open problems around provenance for multi-model inference pipelines and federated learning scenarios, machine-actionability across repositories and data spaces, operational overhead as the federation expands, and the need for systematic user studies to measure productivity and reproducibility outcomes.
That last point is important. “Researchers feel less burdened” is plausible. “Researchers measurably save time and reproduce more work” still needs stronger empirical evidence.
The precipitation case shows why the platform matters
The precipitation nowcasting use case is useful because it turns the architecture into a workflow.
The task is radar-based prediction of Vertically Integrated Liquid over a five-minute horizon, using radar images from the Czech-Slovak border region. The dataset covers four months of observations from April to July 2016, with images captured every five minutes and stored as HDF5 files. To simulate a federated setting, the images are split into four quadrants, each treated as a separate client with a distinct data distribution.
A researcher uses the dashboard to launch a TensorFlow Codespace with an NVIDIA Tesla T4 GPU, secure repository access, and MLflow tracking. After identifying a convolutional neural network architecture, the team launches a Flower federated learning server and four client instances, one per zone. Token-based authentication controls participation. The federated process runs for ten rounds of ten epochs each. Then a personalized federated learning approach, adapFL, fine-tunes the global model locally for each zone.
The result: adapFL improves both MSE and MAE over individual training and vanilla federated learning across all four zones. The paper does not provide this as a universal federated learning theorem. It uses the case to demonstrate that distributed data, authenticated FL, experiment tracking, deployment, CI/CD, metadata validation, Docker publishing, and provenance updates can occur inside one lifecycle.
That is the useful part.
In a fragmented workflow, the scientific result and the operational trace often diverge. The model may improve, but the evidence chain becomes hard to inspect. In the AI4EOSC workflow, the system automatically builds and publishes images, validates metadata, deploys endpoints for prototyping and serverless inference, and updates provenance graphs. The researcher is not expected to manually narrate the entire lifecycle afterward like a medieval monk copying deployment logs by candlelight.
This does not make the model correct. It makes the model’s creation more inspectable.
For scientific AI, that difference matters.
What businesses should actually learn from AI4EOSC
The direct domain of the paper is open science. But the business relevance is wider, especially for organizations that operate across subsidiaries, regulated entities, joint ventures, hospitals, labs, public agencies, or multi-cloud environments.
The wrong business lesson is: “AI4EOSC proves we should build our own open-science cloud.” Most companies should not. They have enough trouble naming SharePoint folders.
The better lesson is: governance must be attached to lifecycle control.
A useful enterprise translation looks like this:
| AI4EOSC principle | Business equivalent | Practical implication |
|---|---|---|
| FAIR-by-design metadata | Standardized model, dataset, and workflow metadata | Require metadata validation before models enter internal catalogs or production registries |
| CI/CD-generated provenance | Automated lineage from code, data, training, testing, build, and deployment | Stop relying on manual “model governance documents” that are updated three weeks late and read by nobody |
| Containerized modules and standard APIs | Portable model packaging and serving contracts | Reduce dependence on one platform team or one cloud vendor |
| Federated learning with authenticated clients | Cross-entity training without centralizing sensitive data | Useful for healthcare, finance, manufacturing consortia, and public-sector collaborations where data cannot freely move |
| Secret management and RBAC | Controlled operational access | Treat model workflows as governed systems, not experimental notebooks with production consequences |
| Drift monitoring | Production feedback loops | Track whether deployed models still behave acceptably after the world changes, as it rudely tends to do |
| External catalog and storage connectors | Interoperability with existing assets | Avoid forcing every unit to migrate before collaboration can begin |
For regulated companies, the main ROI pathway is not faster model training. It is lower coordination cost, lower audit friction, and higher confidence that deployed AI assets can be inspected and reused. This is boring ROI, which is another way of saying it might actually survive procurement.
A pharmaceutical company collaborating with hospitals, a bank coordinating models across country units, an insurance group sharing risk models across subsidiaries, or an industrial firm working with suppliers on predictive maintenance all face versions of the same problem: useful data and expertise are distributed, but governance obligations are centralized. AI4EOSC shows one way to structure the platform so distribution does not erase traceability.
Cognaptus would phrase the business inference like this: once AI moves beyond isolated demos, “platform” stops meaning compute access and starts meaning controlled institutional memory.
Who trained what? On which data? With which code? Under which environment? With which metadata? Which tests ran? Which container was published? Which endpoint served it? Who accessed it? Did the model drift? Can another team reuse it?
If the platform cannot answer these questions automatically, the organization does not have AI governance. It has archaeology.
The paper’s limits are practical, not cosmetic
The paper is strongest as an architecture and feasibility study. Its limitations are therefore mostly about operational maturity and generalization.
First, the provenance model is stronger for single-model workflows than for multi-model inference pipelines and federated learning scenarios. This is not a minor edge case. Modern AI systems increasingly chain models, tools, retrieval systems, preprocessing steps, and postprocessing logic. A provenance graph that captures one model well still has work to do when the “model” becomes a pipeline of interacting services.
Second, semantic interoperability is not the same as machine-actionability. AI4EOSC can serialize metadata through MLDCAT-AP-compatible formats and produce RDF/JSON-LD provenance, but the paper acknowledges that deeper integration with emerging standards and external catalogs is still needed. Findable and accessible are not the final boss. Autonomously usable by external systems is harder.
Third, federation creates governance overhead. The platform uses a unified control plane, automation, self-healing, queueing, and provider orchestration to reduce this overhead, but geographically distributed infrastructure remains messy. Providers fail. Networks vary. Policies diverge. Capacity is uneven. A platform can discipline this complexity; it cannot repeal it by writing “federated” in the architecture diagram.
Fourth, the performance tests are controlled and selected. They show low overhead in the tested scenarios: sub-linear job search scaling, less than 1% training overhead, comparable federated learning times, and acceptable warm inference latency for a reference implementation. They do not replace broader stress testing under diverse commercial workloads, long-running multi-tenant operations, or strict service-level agreements.
Finally, the paper has not yet presented systematic user studies measuring researcher productivity or reproducibility outcomes. The community deployments in iMagine, KMD4EOSC, and AI4Life are meaningful adoption signals, but adoption is not the same as quantified impact.
These boundaries do not weaken the paper. They keep it from becoming magical platform literature, a genre already overpopulated.
The strategic point: AI sharing needs institutional memory
AI4EOSC is valuable because it reframes “sharing” as more than publishing a model or exposing an endpoint.
To share AI responsibly across institutions, you need at least four things to travel with the asset: metadata, execution environment, provenance, and access control. Remove metadata, and nobody knows what the asset is. Remove the execution environment, and nobody can run it. Remove provenance, and nobody can trust how it came to exist. Remove access control, and nobody serious will connect sensitive data to it.
That is the mechanism the paper demonstrates. Federation is not the hero by itself. The hero is governed federation: a system where models can move, services can deploy, data can remain local when needed, and the evidence chain does not evaporate.
For business leaders, the uncomfortable implication is that AI maturity is less about buying the most impressive model and more about building institutional plumbing around it. The fashionable layer gets the demo. The boring layer decides whether the demo becomes an asset.
AI4EOSC is not a universal enterprise blueprint. It is a scientific platform, built for the EOSC context, with open questions around machine-actionability, multi-model provenance, federation overhead, and measured productivity impact. But its central design lesson travels well: if governance is added after the lifecycle, it will always be partial. If governance is built into the lifecycle, reuse becomes less heroic.
Cloud without borders is not achieved by ignoring borders. It is achieved by carrying enough structure across them that collaboration remains trustworthy.
That, inconveniently, is what real AI infrastructure looks like.
Cognaptus: Automate the Present, Incubate the Future.
-
Ignacio Heredia et al., “AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research,” arXiv:2512.16455v3, 2026, https://arxiv.org/abs/2512.16455. ↩︎