TERRA-REF data can be accessed through many different interfaces: Globus, Clowder, BETYdb, CyVerse, and CoGe. Raw data is transferred to the primary compute pipeline using Globus Online. Data is ingested into Clowder to support exploratory analysis. The Clowder extractor system is used to transform the data and create derived data products, which are either available via Clowder or published to specialized services, such as BETYdb.
We have developed tutorials to provide users with both 'quick start' vignettes and more detailed introductions to TERRA-REF datasets. Tutorials for accessing trait data, sensor data, and genomics data are organized by directory ("traits", "sensors", and "genomics").
The tutorials assume familiarity with, or willingness to learn, Python and/or R. They provide the greatest flexibility and access to available data.
BETYdb is used to manage and distribute agricultural and ecological data. It contains phenotype and agronomic data including plot locations and other geolocations of interest (e.g. fields, rows, plants).
BETYdb contains the derived trait data with plot locations and other information associated with agronomic experimental design.
Using SQL and PostGIS with Docker (Advanced Users)
The fastest and most comprehensive way to access the database is through SQL and other database interfaces (such as the R package dplyr interface described below, or GIS programs). You can run an instance of the database using Docker, as described below.
This section describes how to access the TERRA-REF trait database locally. It requires that the Docker software be installed on your computer.
This is the easiest way to get the entire database, including metadata. The steps below assume you are familiar with Postgres and/or the R dbplyr library documentation. See the TERRA-REF Tutorials (terraref.org/tutorials) and the BETYdb Data Access guide for additional examples.
#git clone https://github.com/terraref/data-paper
docker-compose up -d postgres
docker-compose run --rm bety initialize
docker-compose run --rm bety sync
psql -h localhost -p 5433 -d bety -U bety -W
library(dplyr)  # provides src_postgres()
bety_src <- src_postgres(dbname = "bety",
                         password = "bety",
                         host = "localhost",
                         user = "bety",
                         port = 5433)
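For readers working in Python rather than R, the same connection settings can be written as a single connection URI. The following is a minimal sketch, not part of the TERRA-REF tooling: the helper name `bety_uri` is hypothetical, and the defaults assume the local Docker database uses the credentials and port shown above.

```python
from urllib.parse import quote


def bety_uri(user="bety", password="bety", host="localhost",
             port=5433, dbname="bety"):
    """Build a PostgreSQL connection URI for the local BETYdb instance.

    The defaults mirror the settings used in the R example above and are
    assumptions about a default local Docker setup.
    """
    # Percent-encode the user name and password so that characters such
    # as '@' or ':' cannot be mistaken for URI delimiters.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql://{user_enc}:{password_enc}@{host}:{port}/{dbname}"


if __name__ == "__main__":
    print(bety_uri())
```

A URI of this form is understood by libpq-based clients; for example, if the psycopg2 package is installed, it can be passed to `psycopg2.connect()`.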
Interested researchers can access BETYdb directly from GIS software such as ESRI ArcMap and QGIS.
In some cases direct access can simplify the use of spatial data in BETYdb. See the Appendix Accessing BETYdb with GIS Software for more information.
Clowder: Sensor Data and Metadata Browser
Clowder is an active data repository designed to enable collaboration around a set of shared datasets. TERRA-REF uses Clowder to organize, annotate, and process data generated by phenotyping platforms. Data files are available via the Clowder web interface or API.
Clowder is used to organize, annotate, and process raw data generated by the field scanner and other phenotyping platforms. It also stores information about sensors. Learn more about the Clowder software at https://clowderframework.org
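As an illustration of API access, the sketch below builds the URL that would list the files in one dataset. It makes several assumptions that should be checked against the Clowder API documentation for the instance in use: that the route is `/api/datasets/<id>/files`, and that an API key can be supplied as a `key` query parameter. The function name, base URL, dataset ID, and key are all hypothetical placeholders.

```python
from urllib.parse import urlencode


def dataset_files_url(base_url, dataset_id, api_key=None):
    """Return the URL assumed to list the files of a Clowder dataset.

    The '/api/datasets/<id>/files' route and the 'key' query parameter
    are assumptions; verify them for your Clowder instance.
    """
    # Drop any trailing slash so the path is joined with exactly one '/'.
    url = f"{base_url.rstrip('/')}/api/datasets/{dataset_id}/files"
    if api_key:
        # urlencode escapes characters that are not safe in a query string.
        url += "?" + urlencode({"key": api_key})
    return url


if __name__ == "__main__":
    # Placeholder host and dataset ID, not real TERRA-REF values.
    print(dataset_files_url("https://clowder.example.org", "DATASET_ID"))
```

The function only constructs the URL and makes no network request; the result could then be fetched with any HTTP client.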
Data organization in Clowder
Data is organized into spaces, collections, and datasets.
Spaces contain collections and datasets. TERRA-REF uses one space for each of the phenotyping platforms.
Collections consist of one or more datasets. TERRA-REF collections are organized by acquisition date and sensor. Users can also create their own collections.
Datasets consist of one or more files with associated metadata collected by one sensor at one time point. Users can annotate, download, and use these sensor datasets.
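The hierarchy described above can be sketched as a small set of Python data classes. This is only a conceptual model for thinking about how the pieces nest; it is not Clowder's actual data model or API, and every class, field, and example name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Files plus metadata collected by one sensor at one time point."""
    name: str
    sensor: str
    files: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """One or more datasets, e.g. grouped by acquisition date and sensor."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Space:
    """A top-level container, e.g. one per phenotyping platform."""
    name: str
    collections: List[Collection] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)  # not in a collection

    def all_files(self) -> List[str]:
        """Return every file name reachable from this space."""
        nested = [f for c in self.collections for d in c.datasets for f in d.files]
        direct = [f for d in self.datasets for f in d.files]
        return nested + direct


if __name__ == "__main__":
    # Illustrative names only; these are not real TERRA-REF identifiers.
    scan = Dataset("scan-001", "example-sensor", files=["left.bin", "right.bin"])
    day = Collection("example-sensor - 2017-06-01", datasets=[scan])
    platform = Space("Example Platform", collections=[day])
    print(platform.all_files())
```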
CyVerse is a National Science Foundation funded cyberinfrastructure that aims to democratize access to supercomputing capabilities.
TERRA-REF genomics data is accessible on the CyVerse Data Store and Discovery Environment. Accessing data through the CyVerse Discovery Environment requires signing up for a free CyVerse account. The Discovery Environment gives users access to software and computing resources, so this method has the advantage that TERRA-REF data can be utilized directly without the need to copy the data elsewhere.