Cloud Research Environment (CRE)
PhenoMeNal provides a Virtual Research Environment (VRE), known as the “PhenoMeNal Cloud Research Environment” (CRE), for interoperable and scalable metabolomics analysis. End users, such as researchers and research teams, educators, SMEs, and any other type of user, will be able to create, on demand and through a simple user interface, an environment of tools, services, and data supporting their research needs. The hardware setup and software deployment required to operate these facilities are handled by the PhenoMeNal CRE and are transparent to users, who can therefore focus on the analysis rather than the technicalities (see Figure).
Figure: Responsibilities when carrying out contemporary metabolomics data analysis.
(Left:) Today’s situation: Scientists are responsible for everything, including the computer hardware, installing all necessary software, and carrying out the actual analysis. All execution is limited by the resources in the single computer.
(Right:) The PhenoMeNal approach: Software tools are available as containers without the need for installations, with data in agreed-upon interoperable file formats. The CRE can be started on single computers or on cloud resources, and the scientists benefit from only needing to deal with the analysis as the technical implementations are handled by the CRE.
The PhenoMeNal CRE portal provides a site that enables users to interact with the components of PhenoMeNal to deploy their own VRE (see figure below).
Figure: Main components of PhenoMeNal
Compute Infrastructure: creation of Virtual Machine Images (VMIs) for deploying infrastructure on cloud providers, the essential building block on top of which tools run.
Containers: containers for all the required tools, together with documentation that allows external tool makers to containerize their own tools independently. Containerization is a requirement for PhenoMeNal to be able to deploy a piece of software on top of the infrastructure.
Data: data prepackaged in formats that are usable within PhenoMeNal. The packaging is done by software that is itself containerised, which allows users to prepare their own data for use within PhenoMeNal.
Cloud Research Environment Portal – Usability Testing Methodology and Design Guidelines
The PhenoMeNal CRE Portal* is the entry point (Gateway) to all the tools and workflows on offer. In the long term it will enable users to deploy their own PhenoMeNal Virtual Research Environments on private and public cloud providers, and it will provide documentation for deployments on local hardware. The main user platforms for the deployed VRE infrastructure will be a customised build of the Galaxy workflow and the Jupyter/IPython notebook systems, ensuring these environments will scale to meet the computational needs. The first version is available at: http://portal.phenomenal-h2020.eu/home and has been designed based on user experience (UX) evaluation using UX design principles. Note that this is just the first round of UX evaluation for the initial prototype. We will continue to employ UX evaluation throughout the project as it evolves.
*We have chosen to externally brand the portal as “PhenoMeNal Cloud Research Environment Portal” because, during our UX testing, users did not fully understand, nor trust, the term “Virtual” in VRE.
Galaxy is a workflow environment that allows researchers to chain common bioinformatics tools together to create pipelines or workflows. It uses the original code and binaries of bioinformatics tools (developed elsewhere) and provides tool wrappers for them, so that Galaxy’s user interface and API can interact with those tools. In contrast to the classical installation of Galaxy, where most tools would be executed serially on the same machine where Galaxy is running, PhenoMeNal enables scalable analysis on multiple compute nodes using microservices by connecting Galaxy to Kubernetes.
Figure: The flow implemented for deploying the Galaxy runtime into a Kubernetes (k8s) container orchestration (CO) system.
Initially, (1) the user requests Galaxy (through its UI or API) to run a job with certain data. (2) Definitions added to our Galaxy instance allow the implemented k8s Runner for Galaxy to map the tool required by the job to a container. The k8s Runner passes all this information to the master node of the CO in the form of a k8s Job API object, using the pykube Python library to communicate. (3) The master node allocates the k8s Job to a node according to the availability of resources. (4) The node, using the Job definition, requests the required container image from the PhenoMeNal docker registry if it is not already available locally. (5) With the container image obtained, the node runs the k8s Job, while the k8s Runner for Galaxy repeatedly queries the master about the status of the job. (6) Once requests to the k8s master’s REST API endpoint show that the job is done, the runner signals Galaxy, which collects the results through the shared filesystem and exposes them to the user.
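Steps (2) and (5) of this flow can be illustrated with a minimal Python sketch. It is not the actual k8s Runner for Galaxy: the tool-to-image table, the image name, the function names, the persistent volume claim name and the mount path are all hypothetical, and the client call that would submit the manifest and fetch its status (pykube in PhenoMeNal) is replaced by an injected callable so that the sketch stays self-contained. Only the manifest layout (a `batch/v1` Job) and the `succeeded`/`failed`/`active` status counters follow the standard Kubernetes Job API.

```python
import time
from typing import Callable, Dict, List

# Hypothetical tool-to-container mapping; in PhenoMeNal these definitions
# are added to the Galaxy instance rather than hard-coded like this.
TOOL_TO_IMAGE: Dict[str, str] = {
    "example_tool": "registry.example.org/phenomenal/example-tool:latest",
}


def build_job_manifest(job_name: str, tool_id: str, command: List[str],
                       shared_claim: str = "galaxy-shared") -> dict:
    """Step (2): map a Galaxy tool to a container and describe it as a k8s Job."""
    image = TOOL_TO_IMAGE[tool_id]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": job_name,
                        "image": image,
                        "command": command,
                        # Assumed mount point of the filesystem shared with Galaxy.
                        "volumeMounts": [{"name": "shared", "mountPath": "/data"}],
                    }],
                    "volumes": [{
                        "name": "shared",
                        "persistentVolumeClaim": {"claimName": shared_claim},
                    }],
                },
            },
        },
    }


def job_state(status: dict) -> str:
    """Interpret the status counters of a k8s Job object."""
    if (status.get("succeeded") or 0) > 0:
        return "done"
    if (status.get("failed") or 0) > 0:
        return "failed"
    return "running"


def wait_for_job(fetch_status: Callable[[], dict],
                 poll_seconds: float = 5.0, max_polls: int = 100) -> str:
    """Step (5): poll the master until the job leaves the running state."""
    for _ in range(max_polls):
        state = job_state(fetch_status())
        if state != "running":
            return state
        time.sleep(poll_seconds)
    return "timeout"
```

In a real deployment, `fetch_status` would query the k8s master for the Job object and return its `status` field; here any callable returning a dictionary will do, which also makes the polling logic easy to test in isolation.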
Jupyter is a system to combine text (including e.g. mathematical equations) and code in an easy-to-read document that renders in a web browser. Within PhenoMeNal, we use Jupyter as one of the ways of consuming the microservices developed within the consortium. When launching the VRE, users can open Jupyter and then either invoke services directly in an interactive fashion, or schedule long-running jobs using a workflow system of their own.
Continuous Integration System
PhenoMeNal hosts a Jenkins continuous integration system that serves as an integration point where source code is collected, tools are built, containers are assembled, tests can be run to ensure correctness and interoperability, and where results can be pushed to public or private registries.
Public Galaxy instance
The PhenoMeNal Public Galaxy VRE runs on top of a Kubernetes cluster. The pre-provisioned PhenoMeNal Galaxy docker image is able to run inside a Kubernetes Replication Controller/Pod and communicates with the master nodes through the Kubernetes service account to submit jobs to the cluster. This docker image contains all the tools that have been dockerized, “galaxified” and tested with sample datasets to check that they work adequately (testing is currently manual; in the future it will be done via automatic integration tests in the PhenoMeNal continuous integration system). Within this public instance, we provide shared workflows and data sets that any user can try.
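As an illustration of what communicating through the service account involves, the sketch below assembles the connection settings a process inside a pod would typically use. It relies on standard Kubernetes conventions (the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables, and the token, namespace and CA certificate files mounted under `/var/run/secrets/kubernetes.io/serviceaccount`). The function name and the shape of the returned dictionary are assumptions made for this example and are not taken from the PhenoMeNal Galaxy image, which delegates this work to its Kubernetes client library.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

# Standard location where Kubernetes mounts service account credentials in a pod.
SERVICE_ACCOUNT_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def in_cluster_config(env: Mapping[str, str] = os.environ,
                      sa_dir: Path = SERVICE_ACCOUNT_DIR) -> Dict[str, object]:
    """Collect the API server address and credentials available inside a pod."""
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    token = (sa_dir / "token").read_text().strip()
    namespace = (sa_dir / "namespace").read_text().strip()
    return {
        "server": f"https://{host}:{port}",
        "namespace": namespace,
        # CA bundle used to verify the API server's TLS certificate.
        "ca_file": str(sa_dir / "ca.crt"),
        # The service account token is sent as a bearer token on each request.
        "headers": {"Authorization": f"Bearer {token}"},
    }
```

Passing the environment and the credentials directory as parameters keeps the function usable outside a cluster, for instance with a temporary directory during testing.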