
How to Access Data

Overview

TERRA-REF data is available through four different approaches: Globus Connect, Clowder, BETYdb, and CoGe. Raw data is transferred to the primary compute pipeline using Globus Online. Data is ingested into Clowder to support exploratory analysis. The Clowder extractor system is used to transform the data and create derived data products, which are either available via Clowder or published to specialized services, such as BETYdb.

For more information, see the .

Clowder

Clowder is the primary system used to organize, annotate, and process raw data generated by the phenotyping platforms as well as information about sensors.

Use Clowder to explore the raw TERRA-REF data, perform exploratory analysis, and develop custom extractors.

For more information, see .

Globus Connect

Raw data is transferred to the primary TERRA-REF compute pipeline on the (ROGER) system using Globus Online. Data is available for Globus transfer via the . Direct access to ROGER is restricted.

Use Globus Online when you want to transfer data from the TERRA-REF system for local analysis.

For more information, see .

BETYdb

BETYdb contains the derived trait data with plot locations and other information associated with agronomic experimental design.

Use BETYdb to access derived trait data.

CoGe

CoGe contains genomic information and sequence data.

Other Data

Field protocols
Calibration protocols

Using Clowder (Sensor and Genomics data)

About Clowder

Clowder is an active data repository designed to enable collaboration around a set of shared datasets. TERRA-REF uses Clowder to organize, annotate, and process data generated by phenotyping platforms. Data files are available via the Clowder web interface or API.

See the Clowder documentation for more information about the software and its applications.

Requesting Access

To create an account, sign up at the TERRA-REF Clowder site and wait for your account to be approved. Once access is granted, you can explore collections and datasets.

Data organization

Data is organized into spaces, collections, and datasets.

Spaces contain collections and datasets. TERRA-REF uses one space for each of the phenotyping platforms.
Collections consist of one or more datasets. TERRA-REF collections are organized by acquisition date and sensor. Users can also create their own collections.
Datasets consist of one or more files with associated metadata collected by one sensor at one time point. Users can annotate, download, and use these sensor datasets.
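The hierarchy described above can be sketched as a small data model. This is only an illustration of the nesting, not Clowder's actual schema: the class names, fields, and sample values are hypothetical, and the sketch simplifies by placing datasets only inside collections, although a space can also hold datasets directly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Files plus metadata from one sensor at one time point."""
    name: str
    sensor: str
    files: List[str] = field(default_factory=list)


@dataclass
class Collection:
    """One or more datasets, e.g. grouped by acquisition date and sensor."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Space:
    """Top-level container; TERRA-REF uses one per phenotyping platform."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def count_files(space: Space) -> int:
    """Count every file reachable from a space through its collections."""
    return sum(len(d.files) for c in space.collections for d in c.datasets)


# Hypothetical names, for illustration only.
example_space = Space(
    name="example platform",
    collections=[
        Collection(
            name="example sensor, example date",
            datasets=[
                Dataset("dataset 1", "example sensor", ["a.bin", "a.json"]),
                Dataset("dataset 2", "example sensor", ["b.bin"]),
            ],
        )
    ],
)
total_files = count_files(example_space)
```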

Searching the database

Clowder allows users to search metadata and filter datasets and files with particular attributes. Simply enter your search terms in the search box.

Analyzing data in Clowder

Clowder includes support for launching integrated analysis environments from your browser, including RStudio and Jupyter Notebooks.

After selecting a dataset, go to "Analysis Environment Instances", open the "Launch new instance with dataset" drop-down, select the desired tool, then click the "Launch" button. Select the "Environment manager" link to view the list of active instances. Find your instance and select the title link. This will display the tool with the selected dataset mounted. If you have a running instance, you can also "Upload dataset to existing instance".

Clowder Extractors

Through its extractor architecture, Clowder supports automated computational workflows. For more information about developing Clowder extractors, see the Extractor Development documentation.

Using Globus (Sensor and Genomics data)

About Globus Connect

The Globus Connect service provides high-performance, secure file transfer and synchronization between endpoints. It also allows you to securely share your data with other Globus users.

Installing Globus

To access data via Globus, you must first have a Globus account and endpoint.

Sign up for Globus at globus.org
Download and install Globus Connect Personal or Server.

Requesting Access

To request access to the Terraref endpoint, send your Globus ID (or university email) to David LeBauer (dlebauer@illinois.edu) with 'TERRAREF Globus Access Request' in the subject line. You will be notified once you have been granted access.

Accessing Data via Globus

To transfer data to your computer or server:

Log into Globus https://www.globus.org
Add an endpoint for the destination (e.g. your local computer) https://www.globus.org/app/endpoints/create-gcp
Go to the 'transfer files' page: https://www.globus.org/app/transfer
Select source
- Endpoint: Terraref
- Path: Navigate to the subdirectory that you want.
- Select (click) a folder
- Select (highlight) the files that you want to download to the destination
- Select the destination endpoint that you set up above for your local computer or server
- Select the destination folder (e.g. /~/Downloads/)
Click 'go'
Files will be transferred to your computer

Using BETYdb (trait data, experimental metadata)

About BETYdb

BETYdb is used to manage and distribute agricultural and ecological data. It contains phenotype and agronomic data including plot locations and other geolocations of interest (e.g. fields, rows, plants).

Requesting access

To request access to BETYdb, register on the BETYdb web site. You will be notified once you have been granted access.

Data organization

The primary BETYdb Data Access Guide largely applies here, with the following TERRA-REF usages:

Genotypes are stored in the cultivars table
Plots are stored in the sites table. Plots are nested hierarchically based on geolocation.

Most tables in BETYdb have search boxes. We describe below how to use the Advanced Search box to query data from these tables and download the results as a CSV file.

The Advanced Search box is the easiest way to download summary datasets designed to have enough information (location, time, species, citations) to be useful for a wide range of use cases.

(For more information about querying data from specific tables, see the BETYdb Data Access Guide.)

On the Welcome page of BETYdb there is a search option for trait and yield data (Figure 1). This tool allows users to search the entire collection of trait and yield data for specific sites, citations, species, and traits.

The results page provides a map interface and the option to download a file containing search results. The downloaded file is in CSV format. This file provides meta-data and provenance information, including the SQL query used to extract the data, the date and time the query was made, the citation source of each result row, and a citation for BETYdb itself.
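Because the downloaded file mixes provenance information with data rows, it can help to separate the two before analysis. The sketch below assumes that metadata lines begin with "#" and that the first remaining line is the column header; both are assumptions about the file layout, and the sample text and column names are invented for illustration. Check a real download before relying on it.

```python
import csv
import io


def split_metadata_and_rows(text, comment_prefix="#"):
    """Separate comment-style metadata lines from CSV data rows.

    Assumes no quoted field spans several lines.
    """
    metadata, data_lines = [], []
    for line in text.splitlines():
        if line.startswith(comment_prefix):
            metadata.append(line[len(comment_prefix):].strip())
        elif line.strip():
            data_lines.append(line)
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return metadata, rows


# Hypothetical file content, for illustration only.
sample_text = """# example metadata line: SQL query used
# example metadata line: date and time of the query
sitename,trait,mean
Plot A,NDVI,0.71
Plot B,NDVI,0.64
"""

metadata, rows = split_metadata_and_rows(sample_text)
for row in rows:
    print(row["sitename"], row["trait"], float(row["mean"]))
```

Values come back as strings, so numeric columns need an explicit conversion, as in the loop above.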

Instructions

Using the search box to search trait and yield data is simple: type the site (city or site name), species (scientific or common name), cultivar, citation (author and/or year), or trait (variable name or description) into the search box, and the results will show the contents of BETYdb that match the search. The number of records per page can be changed to suit the viewer's preference, and the search results can be downloaded in Excel-compatible CSV format.

The search map may be used in conjunction with search terms to restrict search results to a particular geographical area—or even a specific site—by clicking on a map. Clicking on a particular site will restrict results to that site. Clicking in the vicinity of a group of sites but not on a particular site will restrict the search to the region around the point clicked. Alternatively, if a search using search terms is done without clicking on the map, all sites associated with the returned results are highlighted on the map. Then, to zero in on results for a particular geographic area, click on or near highlighted locations on the map.

Accessing BETYdb via ArcMap and other GIS software

Interested researchers can access BETYdb directly from GIS software such as ESRI ArcMap and QGIS. In some cases direct access can simplify the use of spatial data in BETYdb, but this convenience must be weighed against a more complex setup, limits of GIS software compatibility, and the additional complexity of extracting data from a PostGIS SQL database.

Overview

Accessing the production BETYdb used by the TERRA-REF program requires creating a secure shell (SSH) tunnel to a remote server. After creating the tunnel, the database is accessed as if it were available on the local machine. A step-by-step process is given below.

Configuration used for these instructions

ArcMap 10.3 or later (Requires Windows operating system)
Instructions for using QGIS and other GIS software are provided below
PuTTY: an SSH client for Windows, which can be downloaded from the PuTTY website

Setup

Request Access

Request access to the BETYdb server by following the link. This will take you to the NCSA identity service. If you do not have an NCSA account, you will be asked to create one. This account and password will be used to log in to the database server. Access will generally be granted within 24 hours.

Confirm Access

Use PuTTY or your preferred SSH client with your NCSA account. Open a terminal and log in to bety6.ncsa.illinois.edu using ssh from the command line:

ssh <login>@bety6.ncsa.illinois.edu

After confirming access to bety6, log out by typing exit.

Create SSH Tunnel to BETYdb

Note: if you have a PostgreSQL server running on your desktop computer (using the default port 5432), you will need to stop it first.

The following command will create an SSH tunnel from your computer to the BETYdb server:

ssh -L 5432:localhost:5432 <login>@bety6.ncsa.illinois.edu

The above will bind the local port 5432 (first parameter) to port 5432 (second parameter), the default Postgres listening port, on the remote server. All traffic bound for port 5432 on your local machine will be automatically forwarded to the remote server. As a result, programs such as ArcGIS running on your computer will connect to the remote BETYdb as if it were on your computer.
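Since the tunnel binds local port 5432, it can be useful to check whether something is already listening there before opening it. The following Python sketch does that and assembles the same ssh command as above. The helper names local_port_in_use and tunnel_command are hypothetical and not part of any TERRA-REF tooling; only the server name and port numbers come from this guide.

```python
import socket


def local_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


def tunnel_command(login: str, local_port: int = 5432, remote_port: int = 5432,
                   server: str = "bety6.ncsa.illinois.edu") -> list:
    """Build the argument list for the ssh port-forwarding command."""
    forward = f"{local_port}:localhost:{remote_port}"
    return ["ssh", "-L", forward, f"{login}@{server}"]


# Run this check before opening the tunnel; afterwards the port is
# expected to be busy because the tunnel itself is listening on it.
if local_port_in_use(5432):
    print("Port 5432 is busy: stop the local PostgreSQL server first.")
else:
    print(" ".join(tunnel_command("<login>")))
```

The first number in the -L argument is the local port and the second is the port on the remote server, so passing a different local_port would leave the remote side unchanged; clients would then have to connect to that local port instead of 5432.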

Note: you will need to create the SSH connection with the tunnel every time you wish to access BETYdb from your local machine.

To keep the tunnel open in the background without starting a remote shell, use:

ssh -Nf -L 5432:localhost:5432 <login>@bety6.ncsa.illinois.edu

Note for PuTTY users: you can configure PuTTY to remember these settings. In the navigation tree on the left-hand side, click Connection > SSH > Tunnels. Enter '5432' under Source port and 'localhost:5432' in the Destination field, then click Add. Then click Session and save this configuration for future use.

The next section of the guide will discuss accessing BETYdb using ArcMap, querying plots, and joining these to the traits and experiments tables. The instructions for setting up an SSH tunnel will also work with psql, pgAdmin3, QGIS, and other clients. Instructions for connecting via QGIS and ArcGIS Pro are provided below.

Using ArcMap

Add BETYdb Layer or Table to ArcMap

BETYdb is configured with PostGIS geometry support. This allows ArcGIS Desktop clients to access geometry layers stored within BETYdb.

Warning: ArcGIS releases prior to 10.3 required you to place the PostgreSQL libpq files in the ArcGIS client's bin directory. This is no longer required for the ArcGIS Desktop clients, but some ESRI tools may still require the library to be installed.

Click on the ArcCatalog icon (on right edge of ArcMap window) to open the ArcCatalog Tree
In the tree, click on 'Database Connections' and then "Add Database Connection". A Database Connection dialog window will open.

Within the dialog box:

Database Platform: PostgreSQL
Instance: localhost
Authentication Type: Database authentication
User name: viewer
Password: DelchevskoOro 
Database: select bety (if everything else is correct)

Click OK
The connection will be saved as "Connection to localhost.sde". Right-click it and rename it to "TERRA REF BETYdb trait database" to allow easy reuse.
Click on the Add Layer icon (black cross over yellow diamond) to open the Add Data dialog window.
Under 'Look in' on the second line choose 'Database Connections'.
Select the "TERRA REF BETYdb trait database" connection that you created above
Select the bety.public.sites table and click 'Add'.
- This 'sites' table is the only table in the database with a geospatial 'geometry' data type.
- Any of the other tables can also be added, as described below.
The New Query Layer dialog will be displayed asking for the Unique Identifier Field for the layer. For the bety.public.sites table, the unique identifier is the "sitename" field.
Click Finish.

Warning: ArcMap does not support the big integer format used by BETYdb as primary keys and those fields will not be visible or available for selection. In most cases you should be able to use other fields as unique identifiers.

Modifying the Query Layer

BETYdb contains one geometry table, bety.public.sites, which holds the boundaries for each plot. Plot boundaries can change each season, and even within a season different plot definitions may be used (e.g. to subset plots or exclude boundary rows). As a result, plots overlap significantly, which can cause confusion when they are displayed together. In general, you will want to use the query layer to limit plots to a single season and a single definition.

Right click the bety.public.sites layer and choose properties.
Choose the Definition Query tab
Add the line sitename LIKE 'MAC Field Scanner Season 1%' or sitename LIKE 'MAC Field Scanner Season 2%' to limit the layer to Season 1 or Season 2 respectively.
Click 'OK'

For more advanced selection of sites by experiment or season, you can join the experiments and experiments_sites tables. This is beyond the scope of the present tutorial.
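To see what a LIKE pattern in the Definition Query selects, the sketch below runs the same kind of expression against an in-memory SQLite table. The site names are hypothetical and SQLite stands in for the PostGIS database, so this is only an illustration: among other differences, SQLite's LIKE ignores ASCII case by default while PostgreSQL's LIKE is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (sitename TEXT)")
# Hypothetical site names; real names in BETYdb may be formatted differently.
conn.executemany(
    "INSERT INTO sites VALUES (?)",
    [
        ("MAC Field Scanner Season 1 Range 1 Pass 1",),
        ("MAC Field Scanner Season 2 Range 1 Pass 1",),
        ("MAC Field Scanner Season 10 Range 1 Pass 1",),
    ],
)


def matching_sites(pattern):
    """Return the site names matched by a LIKE pattern, sorted by name."""
    cursor = conn.execute(
        "SELECT sitename FROM sites WHERE sitename LIKE ? ORDER BY sitename",
        (pattern,),
    )
    return [row[0] for row in cursor]


# '%' matches any run of characters, so 'Season 1%' also matches 'Season 10'.
season_1_loose = matching_sites("MAC Field Scanner Season 1%")
# Adding a space before '%' keeps only names where '1' is the whole number.
season_1_strict = matching_sites("MAC Field Scanner Season 1 %")

print(season_1_loose)
print(season_1_strict)
```

If site names follow a pattern like the hypothetical ones here, with a space after the season number, the stricter form avoids picking up seasons 10 and later.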

Joining Additional BETYdb Tables

Additional tables can be added and joined to the sites table. Tables can be added just like any other layer. In this case, we'll add bety.public.traits_and_yields_view and join it to the bety.public.sites layer.

To create a join with other tables, start by adding the desired table.
Follow instructions above to add the bety.public.traits_and_yields_view
On this table the unique identifier is a group of columns, so select sitename, cultivar, scientificname, trait, date, entity, and method as the unique identifiers.
Right click on the bety.public.sites layer.
Under 'Joins and Relates' select 'Join'.
Choose sitename (from bety.public.sites) in part 1
Choose bety.public.traits_and_yields_view in part 2
Choose sitename in part 3
Click OK

Creating a Thematic View

The final section describes how to create a thematic view of the bety.public.sites layer based on the mean attribute where the trait is NDVI from the bety.public.traits_and_yields_view. Remove any previous joins from bety.public.sites (right click bety.public.sites --> joins and relates --> remove join) prior to performing this procedure because we will be selecting the NDVI data by creating a query layer from bety.public.traits_and_yields_view prior to the join.

Right click the bety.public.traits_and_yields_view table and select properties
Click on the Definition Query tab
Add the line "trait = 'NDVI'" to the Definition Query box
Click OK
Follow the steps defined in Joining Additional BETYdb Tables
Right click on the bety.public.sites layer and choose properties
Choose the Symbology tab
Under the Show section, choose Quantities --> Graduated Colors
Under the Fields Value selection choose mean
Click OK
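The filter-then-join logic behind these steps can be sketched in SQL. The example uses an in-memory SQLite database with hypothetical rows and only the columns needed here (sitename, trait, mean); it illustrates the idea and is not the query that ArcMap issues.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE sites (sitename TEXT PRIMARY KEY);
    CREATE TABLE traits_and_yields_view (sitename TEXT, trait TEXT, mean REAL);
    -- Hypothetical rows, for illustration only.
    INSERT INTO sites VALUES ('Plot A'), ('Plot B');
    INSERT INTO traits_and_yields_view VALUES
        ('Plot A', 'NDVI', 0.71),
        ('Plot A', 'canopy_height', 120.0),
        ('Plot B', 'NDVI', 0.64);
    """
)

# Step 1 (Definition Query): keep only rows where trait = 'NDVI'.
# Step 2 (Join): attach the remaining rows to sites by sitename.
ndvi_by_site = db.execute(
    """
    SELECT s.sitename, t.mean
    FROM sites AS s
    JOIN (
        SELECT sitename, mean
        FROM traits_and_yields_view
        WHERE trait = 'NDVI'
    ) AS t ON t.sitename = s.sitename
    ORDER BY s.sitename
    """
).fetchall()

for sitename, mean in ndvi_by_site:
    print(sitename, mean)
```

Filtering before joining means the mean column that reaches the symbology step holds values for one trait only, rather than a mix of NDVI and other measurements.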

Connecting to Other GIS Software

The connection instructions below assume that an SSH tunnel exists.

ArcGIS Pro

This assumes you have followed the instructions for ArcMap to create a database connection file.

Open ArcCatalog
- Under database connections, you will find the connection made above, called 'TERRA REF BETYdb.sde'
- right click this and select 'properties'
- copy the file path (it should look like C:\Users\<USER NAME>\AppData\Roaming\ESRI\Desktop10.4\ArcCatalog\TERRA REF BETYdb.sde)
Open ArcGIS Pro
- Under the Insert tab, select connections --> 'add database'
- paste the path to 'TERRA REF BETYdb.sde' in the directory navigation bar
- select 'TERRA REF BETYdb.sde'

QGIS

Open QGIS
In left 'browser panel', right-click the PostGIS icon
select 'New Connection'
Enter connection properties
- Name: TERRA REF BETYdb trait database
- Service: blank
- Host: localhost
- Port: 5432
- Database: bety
- SSL mode: disable
- Username: viewer
- Password: DelchevskoOro
- Options: select 'Also list tables with no geometry'
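For clients that accept a PostgreSQL connection URI, such as psql, the same settings can be written as a single string. The helper below is a hypothetical convenience, not part of BETYdb; it uses the values listed in this guide and assumes the SSH tunnel is already open.

```python
from urllib.parse import quote


def postgres_uri(user, password, database, host="localhost", port=5432,
                 sslmode="disable"):
    """Build a libpq-style connection URI, percent-encoding special characters."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )


# Values taken from the connection properties listed above.
bety_uri = postgres_uri("viewer", "DelchevskoOro", "bety")
print(bety_uri)
```

The resulting string can be passed to psql as its only argument. The host is localhost because the tunnel forwards local port 5432 to the remote server.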

How to export plots from PostGIS as a Shapefile

This does not require GIS software other than the PostGIS traits database. While connecting directly to the database within GIS software is handy, it is also straightforward to export Shapefiles.

After you have connected via ssh to the PostGIS server, the pgsql2shp command-line utility is available and can be used to dump out all of the plot and site definitions (names and geometries) thus:

pgsql2shp -f terra_plots.shp -h localhost -u bety -P bety bety \
         "SELECT sitename, geometry FROM sites"

Using CoGe (Genomics)

CoGe contains genomic data.

About CoGe

CoGe is a platform for performing Comparative Genomics research. It provides an open-ended network of interconnected tools to manage, analyze, and visualize next-gen data.

Requesting Access

Coming soon

Using CyVerse (Genomics)

About CyVerse

CyVerse is a National Science Foundation funded cyberinfrastructure that aims to democratize access to supercomputing capabilities.

Accessing Data via CyVerse

TERRA-REF genomics data is accessible on the CyVerse Data Store and Discovery Environment. Accessing data through the CyVerse Discovery Environment requires signing up for a free CyVerse account. The Discovery Environment gives users access to software and computing resources, so this method has the advantage that TERRA-REF data can be utilized directly without the need to copy the data elsewhere. During the TERRA-REF , users will need to request access to the TERRA-REF CyVerse Community Data folder through the TERRA-REF . The TERRA-REF Community Data folder can be found at /iplant/home/shared/terraref.

Using Analysis Workbench (all data)

About the Analysis Workbench

The Analysis Workbench allows you to launch private Jupyter Notebook and RStudio instances to explore and analyze TERRA-REF data products.

Requesting Access

To create an account, sign up at the TERRA-REF Analysis Workbench site and wait for your account to be approved. Once access is granted, you can launch analysis environments.

Scratch Space

Each user has a "home" directory mounted into the analysis tools under /home/userid. This is read-write scratch space.

Data Access

Data access is provided via a read-only NFS mount to the TERRA-REF dataset on ROGER. The data is mounted to each container under /data/terraref and linked to the analysis environment working directory. For example, in Jupyter this is /home/jovyan/work/data.
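From inside an analysis environment, a first step is often simply to see what is available under the mount point. The sketch below lists the subdirectories of a given root. The function name is hypothetical and nothing is assumed about the directory layout under /data/terraref; the demonstration runs against a temporary directory so that it works anywhere.

```python
import tempfile
from pathlib import Path


def list_subdirectories(root):
    """Return the sorted names of the directories directly under root.

    Returns an empty list if root does not exist or is not a directory.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(p.name for p in root_path.iterdir() if p.is_dir())


# Demonstration on a throwaway directory with made-up folder names.
with tempfile.TemporaryDirectory() as demo_root:
    (Path(demo_root) / "sensor_a").mkdir()
    (Path(demo_root) / "sensor_b").mkdir()
    (Path(demo_root) / "notes.txt").write_text("not a directory")
    demo_listing = list_subdirectories(demo_root)

print(demo_listing)
```

In the workbench, calling list_subdirectories("/data/terraref") would show the top-level folders of the read-only mount. Because the mount is read-only, any output should be written under the home directory instead.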

