Virtual Earthquake and seismology Research Community e-science environment in Europe


Reporting Period 1 (1 October 2011 - 31 March 2012)

M6

Deliverables

Publishable Summary
WP2 (Pilot applications and use cases)
Following a survey, nine applications and use cases have been analyzed. Two pilot applications and use cases, to be enabled by the VERCE platform, have been selected for the first stage of the project on the basis of their scientific impact and their relevance to the seismology community. The first is a data-intensive statistical analysis application based on the innovative ambient seismic noise correlation method; it is pivotal for exploring and analyzing large data volumes (hundreds of TB) to detect seismic sources and monitor transient changes in Earth properties. This use case was designed and organized as macro-modules covering parallel data management and data staging across distributed storage resources, as well as data processing and data analysis steps shared by many related use cases. The second is a data-intensive modeling application based on HPC seismic waveform simulation and inversion methods; it is pivotal for high-resolution 3D seismic imaging through exploration of the data and model spaces. This use case was also designed as macro-modules, including orchestrated data flows and workflows across community data archives and Grid and HPC e-Infrastructures with different identification, authentication and data management policies, and a number of existing HPC simulation codes to be provided as services. It is seen as a step toward a subsequent use case based on 3D adjoint waveform inversion.
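The core idea behind ambient noise correlation can be illustrated with a minimal sketch: cross-correlate noise records from two stations over many time windows and stack the results; the stacked correlation peaks at the lag corresponding to the propagation time between the stations. This is a simplified, hypothetical example on a synthetic signal (a single noise wavefield delayed by 7 samples), not the VERCE implementation; it omits standard preprocessing such as temporal normalization and spectral whitening, and the function names are illustrative only. A positive lag here means that station B records the wavefield after station A.

```python
import random


def cross_correlate(a, b, max_lag):
    """Return {lag: sum_t a[t] * b[t + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(a)):
            u = t + lag
            if 0 <= u < len(b):
                total += a[t] * b[u]
        result[lag] = total
    return result


def stacked_correlation(a, b, window, max_lag):
    """Correlate consecutive windows of two records and sum (stack) the results."""
    stack = {lag: 0.0 for lag in range(-max_lag, max_lag + 1)}
    for start in range(0, len(a) - window + 1, window):
        cc = cross_correlate(a[start:start + window], b[start:start + window], max_lag)
        for lag, value in cc.items():
            stack[lag] += value
    return stack


def travel_time_lag(stack):
    """Lag (in samples) at which the stacked correlation peaks."""
    return max(stack, key=stack.get)


# Synthetic example (hypothetical data): a random noise wavefield recorded at
# station A reaches station B 7 samples later, with local noise added at B.
random.seed(0)
noise_field = [random.gauss(0.0, 1.0) for _ in range(2000)]
delay = 7
station_a = noise_field
station_b = [0.0] * delay + noise_field[:-delay]
station_b = [x + 0.1 * random.gauss(0.0, 1.0) for x in station_b]

stack = stacked_correlation(station_a, station_b, window=200, max_lag=20)
print(travel_time_lag(stack))  # expected: 7, the imposed delay
```

Stacking over many windows is what makes the method data-intensive: the signal at the true lag adds up coherently while random contributions at other lags tend to average out, so longer records (and more station pairs) give more robust results.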

Deliverable: D-NA2.1.pdf
WP3 (Training and user documentation)
A first documentation and training work plan has been drawn up. It has two components: an internal one, aimed at building a shared ontology and a community of practice across the consortium, and an external one, aimed at the seismology community at large. The external component is built in synergy with the ITN QUEST and ESFRI EPOS projects and is coordinated with other EU e-Infrastructure projects, e.g. PRACE, EGI and EUDAT. The plan includes training workshops, as well as news, documents, videos and webinars collected from VERCE and related EU projects, which will be made available through a dedicated section of the VERCE website. A helpdesk has been set up for training questions and requests. A first internal workshop will take place at the University of Liverpool in the first week of September 2012 and will focus on the pilot applications and the workflow and dataflow engines.

Deliverable: D-NA3.1.pdf
WP4 (Dissemination and public outreach)
A dissemination and public outreach work plan has been defined. The identified target audiences are: VERCE partners; the seismology and solid Earth research community; the IT community, reached through the European e-Infrastructures and related projects; industry actors, e.g. in hydrocarbon and resource exploration geophysics, containment of underground waste and carbon sequestration; and the general public and national agencies. The communication tools already available at the VERCE partners have been reviewed through a survey. The well-established outreach channels of ORFEUS and EMSC, the two seismological NPOs, and of EPOS will be used. Their links to other international organizations (e.g. IRIS, EarthScope and UNAVCO in the US; JAMSTEC and NIED in Japan) will make it possible to reach a broad international audience. A dedicated dissemination section has been set up on the project website (http://www.verce.eu/). It will include newsletters, leaflets, posters, links to social networks and press releases. A first project ID card and a poster have been created to present the project. The project has been presented at a number of international conferences (e.g. EGU, AGU) and European meetings (e.g. the EGI User Forum and DG-INFSO coordination meetings), and a list of forthcoming presentation opportunities is continuously updated. A first set of European projects has been identified and contacted. In the solid Earth community, these include the ESFRI project EPOS, the ERC projects WHISPER and WAVETOMO, the ITN project QUEST, and the related European infrastructure projects NERA, SHARE and REAKT. Multidisciplinary e-Infrastructure projects in which VERCE partners are involved include EUDAT and ENVRI. To reach the IT community, VERCE will use the channels provided by EGI and PRACE.

Deliverable: D-NA4.1.pdf
WP5 (Management and operation of the research platform)
The VERCE research platform will be operated on top of a set of distributed public and private data and computing resources provided by European and national Grid and HPC e-Infrastructures and by a number of VERCE partners. The resources and infrastructures to be integrated into the initial test bed, on which the VERCE platform will be deployed, have been identified and described in terms of access policy, identification, authentication, data and computing hardware, system management, and middleware and service components. The initial strategy for test bed integration is to use software components already available and adopted by the existing e-Infrastructures, such as PRACE and EGI, by the ADMIRE platform and by other resource providers. One major software provider is EMI, which brings together ARC, dCache, gLite and UNICORE. The initial resources of the VERCE test bed use different authentication mechanisms, which must be federated by an additional layer providing single sign-on. An initial, pragmatic strategy relies on X.509 certificates and on alternative services accepted by PRACE and EGI, such as gateways, that allow users to authenticate using different schemes. A first step toward Federated Identity Management (FIM) will extend the LDAP database used for user authentication and authorization with Shibboleth components, and use SAML for services. For EGI, a VOMS service will be provided in the coming months, as soon as the VERCE VO has been registered with EGI. The first tools selected for deployment cover data management, job management and seismological software, together with software components from the ADMIRE stack provided by UEDIN. The software repository will be based on Apache Subversion (SVN), already integrated with the VERCE Redmine collaborative platform. Finally, a monitoring strategy and metrics have been defined, based on Nagios (EGI), Inca (PRACE) and Ganglia (private resources).
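As an illustration of how a Nagios-based monitoring strategy is typically plugged into, the sketch below classifies a storage metric against warning and critical thresholds and returns a status line plus the standard Nagios plugin exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The metric, the thresholds and the function name `check_free_space` are hypothetical assumptions, not part of the VERCE monitoring plan.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_free_space(free_fraction, warn=0.20, crit=0.10):
    """Classify the free-space fraction of a storage element (hypothetical probe).

    Returns (exit_code, status_line) following the usual Nagios plugin style.
    The warn/crit thresholds are illustrative assumptions.
    """
    if free_fraction is None or not 0.0 <= free_fraction <= 1.0:
        code = UNKNOWN
    elif free_fraction < crit:
        code = CRITICAL
    elif free_fraction < warn:
        code = WARNING
    else:
        code = OK
    if code == UNKNOWN:
        message = "free space unavailable"
    else:
        message = f"{free_fraction:.0%} free"
    return code, f"STORAGE {LABELS[code]} - {message}"


print(check_free_space(0.5))  # (0, 'STORAGE OK - 50% free')
```

A deployed probe would print the status line and terminate with the exit code (e.g. via `sys.exit`), which is what Nagios uses to raise alerts; the check is kept as a pure function here so that it can be tested in isolation.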

Deliverable: D-SA1.1.pdf
WP6 (Integration and evaluation of the platform services)
A Plan-Do-Check-Act (PDCA) cycle has been selected to manage the platform release process. Each cycle is expected to last one year; two cycles overlap so that a platform release can be made every six months, mitigating the risk of missed releases. A release schedule and recommended work practices have been defined and documented. A "Software Component Release Request Form" has been prepared to improve communication with the other work packages when a release request is submitted. Generic and component-specific tests have also been defined and planned, to ensure that accepted software components meet the required quality and are properly integrated into the VERCE platform, together with key performance indicators to assess the platform's quality of service. A portfolio of software components has been selected from a list of potentially useful components studied by the architecture team and from the list of components currently available on the test bed. Priority was given to common components already available on the test bed, to support immediate access as well as development and integration work. The selected components are: secure user access (X.509 certificate-based authentication, GSISSH, VOMS, MyProxy, Shibboleth, LDAP); resource management (VERCE gateway and resource database, LDAP, SAML, VOMS); job submission and management (gLite WMS/CREAM, GRAM, UNICORE); data management (GridFTP, gLite LFC, OGSA-DAI, iRODS, dCache, ArcLink); database access and metadata services (OGSA-DAI, GRelC, AMGA); and seismic software (ObsPy, rdseed, SAC).
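The arithmetic behind "two overlapping one-year cycles give a release every six months" can be checked with a small sketch. It assumes, as a hypothetical simplification, that consecutive cycles start six months apart and that each cycle ends with a release; months are counted from an arbitrary month 0 rather than from actual project dates.

```python
CYCLE_MONTHS = 12   # each PDCA cycle lasts about one year (from the plan)
STAGGER_MONTHS = 6  # assumed offset between the starts of consecutive cycles


def plan_cycles(n_cycles, first_start=0):
    """Return (start_month, release_month) for each overlapping PDCA cycle."""
    return [
        (first_start + i * STAGGER_MONTHS,
         first_start + i * STAGGER_MONTHS + CYCLE_MONTHS)
        for i in range(n_cycles)
    ]


def active_cycles(cycles, month):
    """Cycles that are running (in any PDCA phase) during the given month."""
    return [c for c in cycles if c[0] <= month < c[1]]


cycles = plan_cycles(4)
releases = [end for _, end in cycles]
print(cycles)    # [(0, 12), (6, 18), (12, 24), (18, 30)]
print(releases)  # [12, 18, 24, 30]
print([len(active_cycles(cycles, m)) for m in (3, 9, 15)])  # [1, 2, 2]
```

After the start-up phase, two cycles are always running in parallel, so releases fall six months apart; if one cycle slips, the other still delivers a release within the following six months.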

Deliverable: D-SA2.1.pdf
WP7 (Scientific gateway, user interfaces and knowledge and method sharing)
The scientific gateway is meant to be a community-specific web portal, enabling the use of a well-defined and targeted set of tools for data-intensive analysis and modeling applications, data collections and services. It will provide access to the underlying HPC, Grid and data-intensive resources available within the VERCE consortium. The existing seismology portal, developed in other projects through the joint efforts of current VERCE participants, was reviewed to assess the benefits of adopting similar technologies and interaction paradigms for the VERCE gateway. The new functionalities and services described for this project have been reviewed for implementation. The initial core components of the gateway were selected and described; they include the user interface, the job management service, and registries of data resources, workflows and processing elements. Developing the gateway will require frequent interaction with seismology stakeholders, in order to refine the components needed to implement each use case. The gateway must also be compatible with existing e-Infrastructure initiatives, e.g. PRACE, EGI and EPOS. Based on experience gained working with the seismology community, the strategy for interfacing users with the back-end infrastructures that provide data and services identifies two classes of potential users: those looking for services to connect to, and those who need graphical tools to interact with. The right balance between these two access paradigms will be driven by the analysis of the scientific use cases, through an iterative process involving several work packages.

Deliverable: D-SA3.1.pdf
WP8 (Harnessing data-intensive applications)
The first two pilot applications and scientific use cases selected have been analyzed. Both originate from VERCE participants who are among the world leaders in research and development in these fields. For the data-intensive analysis application, the current implementation is well advanced, with well-documented open-source libraries of processing and analysis routines. Several issues have been identified, predominantly related to scalable and transparent data management across distributed storage, e.g.: continuously updated massive data sets (stored as large binary objects) composed of small application-level objects; efficient fine-grained access and dynamically adjustable chunk sizes; a concurrency-oriented, version-based interface supporting asynchronous, highly parallel data workflows; high throughput under heavy access concurrency; an efficient data staging strategy; and the instantiation of perfectly parallel, I/O-intensive processing stages and CPU-intensive correlation stages on, respectively, well-balanced data-crunching architectures and hybrid GPGPU computing architectures. For the HPC data-intensive modeling application, the analysis reviewed a number of existing HPC wave simulation codes against criteria such as development level, modularity, documentation and user community, and selected four codes to be enabled on the HPC infrastructures and provided as services within the use case. Several issues have been identified, predominantly related to orchestrating the data workflow across different e-Infrastructures and private resources with heterogeneous access and data management policies. This analysis has been translated into requirements for the VERCE platform.
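The "perfectly parallel" processing stages and "dynamically adjustable chunk sizes" mentioned above can be sketched as follows: a continuous trace is split into independent chunks whose size is derived from the number of workers, and each chunk is processed concurrently. The sizing heuristic, the per-chunk step (mean removal) and all function names are hypothetical illustrations, not VERCE components; threads are used only to keep the example self-contained, whereas CPU-bound work in CPython would normally use processes or distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_chunk_size(n_samples, workers, chunks_per_worker=4):
    """Hypothetical heuristic: aim for a few chunks per worker (ceiling division)."""
    if n_samples <= 0 or workers <= 0 or chunks_per_worker <= 0:
        raise ValueError("arguments must be positive")
    return -(-n_samples // (workers * chunks_per_worker))


def split(samples, chunk_size):
    """Split a trace into consecutive chunks; the last chunk may be shorter."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def demean(chunk):
    """Independent per-chunk step: remove the mean of the chunk."""
    mean = sum(chunk) / len(chunk)
    return [x - mean for x in chunk]


def process_trace(samples, workers=4):
    """Process independent chunks concurrently; results keep the original order."""
    size = choose_chunk_size(len(samples), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(demean, split(samples, size)))


print(process_trace(list(range(10)), workers=2))
# [[-0.5, 0.5], [-0.5, 0.5], [-0.5, 0.5], [-0.5, 0.5], [-0.5, 0.5]]
```

Because no chunk depends on another, the chunk size can be tuned to the available workers and storage layout without changing the processing code, which is the property that makes such stages easy to distribute.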

Deliverable: D-JRA1.1.pdf
WP9 (VERCE architecture and tools for data analysis and data modelling applications)
The most significant components in the initial iteration of the VERCE architecture have been identified as follows. The Gateway is based on technology developed during the ADMIRE project; it is the hub of the VERCE platform, delegating the enactment of user workflows to available distributed resources. Currently, only workflows written in the Dispel workflow language are accepted for deployment and execution on OGSA-DAI services. Dispel has been chosen for now for at least three reasons: (i) it is dataflow-based, which suits multi-site enactment; (ii) it has functions to describe work patterns; and (iii) it is designed for human communication and avoids detail that would inhibit automated mapping and optimisation. Grid integration, especially for data movement, is very important for VERCE: greater support for GridFTP, for reliable file movement, has been integrated into OGSA-DAI in accordance with the specifications of the Globus project. ObsPy integration matters because, to promote uptake of any platform, it is important to make it as simple as possible for researchers to keep using the languages, tools and libraries they already use fluently. To this end, a generic mechanism for embedding arbitrary Python scripts (specifically ObsPy scripts) into OGSA-DAI activities has been prototyped. A cross-correlation test case was assembled to test the prototype platform and identify early requirements regarding data handling, process execution and distribution. It was run on EDIM1, the Edinburgh Data-Intensive Machine, a data-brick compute cluster operated by the University of Edinburgh. The Seismic Data eXplorer, developed at the University of Liverpool (initially as part of the RapidSeis project), is a seismic waveform analysis tool that has been further refined within VERCE.
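The idea of wrapping existing Python code as units of a dataflow can be sketched in plain Python: each per-item function becomes a streaming processing element, and elements are connected output-to-input so that data items flow through one at a time. This is a conceptual sketch only; it is neither Dispel nor the OGSA-DAI API, and the names `python_activity`, `pipeline`, `demean` and `peak_amplitude` are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def python_activity(func: Callable) -> Callable[[Iterator], Iterator]:
    """Wrap an arbitrary per-item Python function as a streaming processing element."""
    def element(stream: Iterator) -> Iterator:
        for item in stream:
            yield func(item)
    return element


def pipeline(source: Iterable, *elements: Callable[[Iterator], Iterator]) -> Iterator:
    """Connect elements output-to-input; items flow through lazily, one at a time."""
    stream = iter(source)
    for element in elements:
        stream = element(stream)
    return stream


def demean(trace):
    """Example user code: remove the mean of a trace."""
    mean = sum(trace) / len(trace)
    return [x - mean for x in trace]


def peak_amplitude(trace):
    """Example user code: maximum absolute amplitude of a trace."""
    return max(abs(x) for x in trace)


traces = [[1.0, 2.0, 3.0], [4.0, 4.0], [0.0, -6.0, 0.0]]
flow = pipeline(traces, python_activity(demean), python_activity(peak_amplitude))
print(list(flow))  # [1.0, 0.0, 4.0]
```

The user functions contain no workflow logic at all, which mirrors the goal stated above: researchers keep writing ordinary Python, while the surrounding dataflow layer decides how items are streamed and, in a real system, where each element runs.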

Deliverable: D-JRA2.1.pdf