To establish a comprehensive and standardised e-infrastructure for processing, analysing and mining extremely large medical metabolic phenotype datasets. Our specific objectives are:
- to use existing open-source community standards to integrate tools, resources and methods for the management, dissemination and computational analysis of very large datasets of human metabolic phenotyping and genomic data into a secure and sustainable e-infrastructure
- to operate and consolidate the PhenoMeNal e-infrastructure based on existing internal and external HPC (High-performance computing), cloud, and grid resources, including the EGI and the EGI Federated Cloud, and to extend it to world-wide computational infrastructures
- to improve and scale up the tools used within the infrastructure to cope with very large datasets
- to establish technology for a watertight audit trail for the processing of human metabolic phenotyping data, from raw data acquisition all the way to the generation of high-level biomedical insights (such as a medical diagnosis)
- to establish privacy-protection methods that allow working with highly sensitive molecular phenotype data
- to foster the worldwide adoption of PhenoMeNal through a wide range of outreach, dissemination, networking and training activities
- to develop a model to ensure sustainability of the PhenoMeNal network
The three-year project consists of nine work packages designed to meet our project objectives, including those related to Networking, Service and Joint Research Activities:
Work Package 1: Management
WP1 will provide the management infrastructure for PhenoMeNal. It will be coordinated by EMBL-EBI, building on EMBL-EBI's experience in the management of large consortia. It will make use of state-of-the-art electronic communication methods, which will be integrated in the PhenoMeNal Virtual Research Environment.
Lead Participant: EMBL-EBI
- Annual PhenoMeNal Consortium meetings
- Annual stakeholder meetings
- Staff exchanges between the PhenoMeNal partners
Work Package 2: Sustainability of PhenoMeNal
WP2 will develop and implement plans to sustain the PhenoMeNal e-infrastructure beyond its initial three-year funding period. Sustainability measures include building strong relationships with all stakeholders, with industry and with all parties developing components of the PhenoMeNal grid and virtual research environment.
Lead Participant: UL
- Mapping of e-infrastructures, users and investments to support policy development in the field of metabolomics, biomarkers and biobanks.
- Establishing and maintaining relations with publishers to support data deposition services.
- Coordination with related national and international activities, and support for the development of global and sustainable approaches in the field.
- Promotion of long-term sustainability, including the involvement of funders.
- Preparation of a business plan beyond the end of the project.
Work Package 3: Dissemination and Outreach
WP3 will establish close links between the PhenoMeNal consortium and the wider metabolomics, genomics and biomedical communities in order to establish a user base for the services offered. We will raise community awareness of the services provided by the PhenoMeNal consortium, from data and information mining to processing and analysis, and of the use of omics technologies in a biomedical context, close to the clinic. Through the existing framework of the Metabolomics Society, we will ensure broad community input into the services developed by the PhenoMeNal consortium, including its processing and data mining services for metabolomics. We will provide online training on the PhenoMeNal grid and on the tools and data formats used in it.
Lead Participant: UoB
- Establish extra-consortium links with other e-infrastructures, standards communities, publishers and funders
- Raise awareness of the standards, services and tools provided by the PhenoMeNal grid
- Provide online training for users interested in the PhenoMeNal grid
- Arrange PhenoMeNal stakeholder meetings
Work Package 4: Interfacing with Biomedical and European Infrastructures
WP4 aims to maximise communication with European infrastructures that have an interest in biomedical data generation and analysis (such as EGI, Elixir, BBMRI, IMI eTRIKS, INSTRUCT and any other relevant infrastructures resulting from present and future calls) and with those supported by NIH. Through this work package, the PhenoMeNal project will actively participate in the concertation activities, consultations and other meetings and events related to e-infrastructure. The objective is to optimise synergies between projects by providing input to, and receiving feedback from, other stakeholders.
Lead Participant: CIRMMP
- Boost the offering of services by the PhenoMeNal e-infrastructure to the current large-scale EU biomedical infrastructures, both physical and electronic, and their users.
- Align PhenoMeNal activities to the requirements of such infrastructures.
- Optimise synergies between projects by providing input and receiving feedback from working groups addressing activities of common interest.
- Establish communication channels and participate in meetings with other European biomedical and e-infrastructures, and with those worldwide.
Work Package 5: Operations and Maintenance of PhenoMeNal GRID/Cloud
WP5 will provide the foundation on which data and analysis services can be used together on compute resources. It comprises the hardware (compute and storage) as well as the middleware for federating queries and resources between sites, enabling the functions of the Virtual Research Community (VRC) portal developed in WP6. Of particular importance is the documentation and packaging of infrastructure resources and configurations, which allows easy setup on partner systems and so enables a federated system.
Lead Participant: UU
- Establishment of the PhenoMeNal e-infrastructure
- Operations and maintenance of the PhenoMeNal VRC portal
- Maintenance and provisioning of the PhenoMeNal services in the PhenoMeNal e-infrastructure
Work Package 6: PhenoMeNal Virtual Research Community Gateway
WP6 will create the PhenoMeNal Virtual Research Community (VRC), which will be the central access point for all users. This is the online portal enabling user interactions, virtual machine distribution, access to compute resources and information exchange. Users can search for existing public datasets and perform online analysis. The VRC will interact with the PhenoMeNal middleware developed in WP5 to federate on-demand job requests between provisioned data, compute and analysis resources. Expert help and training will be available through the VRC.
Lead Participant: EMBL-EBI
- Establishment of the PhenoMeNal Virtual Research Community (VRC) portal
JOINT RESEARCH ACTIVITIES
Work Package 7: Privacy and Ethics
WP7 will ensure that all data collected and held within the project complies with all local laws, regulations and ethical requirements. All personal information will be processed in accordance with accepted Data Protection Principles outlined above. Responsibility for data will rest with the host institution/data provider.
Lead Participant: ICL
- Develop appropriate policies, procedures and management accountability and structures to provide a robust governance framework for information management.
- Raise awareness of information governance within the consortium and assure ongoing compliance.
- Provide a forum for information exchange on best practice in clinical data sharing and disclosure.
- Ensure that the ethical standards and guidelines of Horizon 2020 are rigorously applied, regardless of the country in which the research is carried out.
Work Package 8: Data Provenance, Compliance and Integrity
Building a federated infrastructure, by its nature, requires the stakeholders to agree on reporting and communication standards so that messages and information can be reliably brokered and exchanged between systems. In addition, the central demand for data evaluation, audit and review requires that the provenance, traceability and integrity of data be assured. WP8 will therefore develop a set of modular format specifications to achieve maximum efficiency during data exchange, with each module focusing on a specific component of data management, while retaining compatibility with existing community standards and addressing distinct and complementary usage scenarios. The data safety and patient de-identification schemes developed in WP7 will be taken into account.
Lead Participant: UOXF
- Define metadata and data exchange standards, along with technical and user documentation.
- Implement and maintain PhenoMeNal reference implementations.
Work Package 9: Tools, Workflows, Audit and Data Management
WP9 aims to develop and maintain the primary scientific and technological tools and their corresponding interfaces. We will support the data standards defined by WP8 and facilitate the interoperability of tools, both those developed within this consortium and those developed externally by the community. We will establish distributed tools for phenomics, metabolomics and bioinformatics processing pipelines and workflows, including longitudinal primary research data management (continuous availability to avoid data lock-in) and data audit mechanisms, as well as quality assurance schemes. This work package will thus produce several tailored virtual machine images (VMIs), which will be the basis for the service activities in WP5.
Lead Participant: IPB
- Specify and integrate software pipelines and tools utilised in the PhenoMeNal e-Infrastructure into VMIs, adhering to data standards developed in WP8 and supporting the interoperability and federation middleware developed in WP5. Most tools are already available (see table 1.1), and we will develop new applications to complete ‘missing links’ in pipelines. Although two explicit VMI releases are listed as deliverables below, we will use public repositories and continuous integration to provide development snapshots of the infrastructure VMIs at all times.
- Develop methods to scale up software pipelines for high-throughput analysis, supporting distributed execution on, for example, local clusters, private clouds, federated clouds or GRIDs.
- Add quality control and quality assurance to pipelines to ensure high-quality, reliable data, and keep an audit trail of intermediate steps and results.
- Develop methods to present and summarise the results of the pipelines in biomedical and disease contexts.
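One common way to keep a tamper-evident audit trail of intermediate pipeline steps is to chain the records together by hash, so that altering an earlier record invalidates everything after it. The following is a minimal sketch of that general technique only, not a description of the PhenoMeNal implementation: the function names, record fields, step names and parameters are all hypothetical, and it assumes step parameters can be serialised as JSON.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record (assumption)


def _digest(body):
    # Hash a canonical JSON form so the same record always yields the same digest.
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def record_step(trail, step, params, output_bytes):
    """Append one pipeline step to the trail, linked to the previous record."""
    entry = {
        "step": step,                  # hypothetical step name
        "params": params,              # hypothetical tool parameters
        "output_sha256": hashlib.sha256(output_bytes).hexdigest(),
        "prev_hash": trail[-1]["entry_hash"] if trail else GENESIS,
    }
    entry["entry_hash"] = _digest(entry)
    trail.append(entry)
    return entry


def verify_trail(trail):
    """Return True if every record is intact and linked to its predecessor."""
    prev_hash = GENESIS
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


if __name__ == "__main__":
    trail = []
    record_step(trail, "peak_picking", {"snr_threshold": 3}, b"intermediate peak table")
    record_step(trail, "normalisation", {"method": "pqn"}, b"normalised intensities")
    print(verify_trail(trail))
```

A chain like this detects modification or reordering of recorded steps, but it does not by itself detect removal of the most recent records; a real deployment would also need to anchor the latest hash somewhere trusted.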