Setting up THREDDS Data Server for JASMIN Group Workspaces

Large projects on JASMIN are given access to an allocation of shared disk known as a Group Workspace (GWS). Many projects want to share their data with other communities, either publicly or with a restricted group. Currently, a GWS Manager can provide access in two ways: by putting some content under a ‘public/’ directory at the top level of the GWS [1], or by granting file permissions to other JASMIN login accounts [2]. A separate service allows the content of a group workspace to be catalogued when the GWS Manager adds a metadata file, providing a limited discovery service (GWS etiquette [3]).

Because these publishing processes are spread across separate services, there is a need to integrate all of these capabilities in a single process and framework. For this reason, this work proposes the THREDDS Data Server (TDS) as a proof of concept for that purpose. TDS is a web server that provides metadata and data access services for scientific datasets, using a variety of remote data access protocols [4]. TDS is implemented in 100% Java and is packaged as a web application archive (WAR) file, which makes it easy to install into the open-source Apache Tomcat application server. TDS combines catalog services with integrated data-serving capabilities, including OPeNDAP, HTTP file serving, and the OGC Web Coverage Service (WCS). With TDS, a dataset can be published through the catalog, OPeNDAP and HTTP services without duplicating any of its information. Although the two sets of functionality can be used separately, integrating catalog services with data serving reduces the burden of server configuration and management.
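To illustrate how one catalog entry can be served through several protocols without duplicating information, the sketch below parses a small THREDDS catalog in which a single dataset refers to a compound service. The host name `tds.example.org`, the dataset path `example-gws/data.nc` and the helper `access_urls` are hypothetical. The service base paths `/thredds/dodsC/` and `/thredds/fileServer/` follow the usual TDS defaults, but a particular deployment may configure them differently. Each access URL is simply the server address, the service base and the dataset's `urlPath` joined together.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS catalogs (InvCatalog 1.0)
NS = {"thredds": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog: one dataset, exposed through a compound service
CATALOG_XML = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example GWS catalog" version="1.0.1">
  <service name="all" serviceType="Compound" base="">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/"/>
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/"/>
  </service>
  <dataset name="Example file" ID="example-gws/data.nc" urlPath="example-gws/data.nc">
    <serviceName>all</serviceName>
  </dataset>
</catalog>"""


def access_urls(catalog_xml: str, server: str) -> dict:
    """Return {serviceType: URL} for the single dataset in the catalog (hypothetical helper)."""
    root = ET.fromstring(catalog_xml)
    # Map each top-level service name to its nested services (or to itself if not compound)
    services = {}
    for svc in root.findall("thredds:service", NS):
        nested = svc.findall("thredds:service", NS)
        services[svc.get("name")] = nested or [svc]
    dataset = root.find("thredds:dataset", NS)
    url_path = dataset.get("urlPath")
    service_name = dataset.find("thredds:serviceName", NS).text.strip()
    # Same urlPath, different base: one catalog entry, several access protocols
    return {
        s.get("serviceType"): server + s.get("base") + url_path
        for s in services[service_name]
    }


if __name__ == "__main__":
    for service_type, url in access_urls(CATALOG_XML, "https://tds.example.org").items():
        print(service_type, url)
```

In this sketch, offering another protocol to every dataset that uses the `all` service would mean adding one nested `<service>` element rather than editing each dataset entry.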

In terms of implementation, this work deploys a cluster of Tomcat server instances as the backend, fronted by a load-balancing system based on the Apache HTTP Server (httpd). Load balancers are generally used to distribute client traffic across servers, which gives finer control over how resources are used. A single server handling all incoming requests may not have the capacity to cope with high traffic volumes, so the processing load must be distributed across the cluster. Server load balancing spreads service requests across a group of real servers and makes those servers appear to clients as a single large server.
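The idea of making several servers look like one can be sketched with plain round-robin scheduling, where each new request goes to the next backend in turn. This is a simplified illustration, not httpd's own implementation: Apache httpd's `mod_proxy_balancer` supports several scheduling methods (for example `lbmethod=byrequests`). The `RoundRobinBalancer` class and the backend names below are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in turn so that clients only see one logical server."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the backend list forever, in order
        self._backends = cycle(backends)

    def pick(self) -> str:
        # Plain round-robin ignores the request itself and returns the next backend
        return next(self._backends)


if __name__ == "__main__":
    # Hypothetical Tomcat instances running TDS behind the balancer
    balancer = RoundRobinBalancer(["tomcat-1:8080", "tomcat-2:8080", "tomcat-3:8080"])
    served = Counter(balancer.pick() for _ in range(9))
    print(served)  # each backend handles 3 of the 9 requests
```

With nine requests and three backends, each Tomcat instance receives three requests, while clients still contact a single address: the balancer's.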

In this work, httpd acts as an application-level load balancer, which means that it uses an HTTP request parameter to decide which backend server should handle each request. Apache httpd has been deployed because of its reliability, security, flexibility, ease of use, cost and availability for multiple