Interested researchers can access BETYdb directly from GIS software such as ESRI ArcMap and QGIS. In some cases direct access can simplify working with the spatial data in BETYdb, but this convenience must be weighed against a more complex setup, the limits of GIS software compatibility, and the additional complexity of extracting data from a PostGIS SQL database.
Accessing the production BETYdb used by the TERRA REF program requires creating a secure shell (SSH) tunnel to a remote server. After the tunnel is created, the database can be accessed as if it were running on the local machine. A step-by-step process is given below.
ArcMap 10.3 or later (requires the Windows operating system)
Instructions for using QGIS and other GIS software are provided below.
PuTTY: an SSH client for Windows, available from the PuTTY website
Request access to the BETYdb server by following the link. This will take you to the NCSA identity service. If you do not have an NCSA account, you will be asked to create one. This account and password will be used to log in to the database server. Access will generally be granted within 24 hours.
Use PuTTY or your preferred SSH client with your NCSA account. First open a terminal, then log in to bety6.ncsa.illinois.edu using ssh from the command line:
After confirming access to bety6, log out by typing exit.
The following command will create an SSH tunnel from your computer to the BETYdb server:
Note: if you have a PostgreSQL server running on your desktop computer (using the default port 5432), you will need to stop it first.
The above will bind the local port 5432 (first parameter) to port 5432 (second parameter), the default Postgres listening port, on the remote server. All traffic bound for port 5432 on your local machine will be automatically forwarded to the remote server. As a result, programs such as ArcGIS running on your computer will connect to the remote BETYdb as if it were on your computer.
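To make the two port parameters concrete, here is a minimal Python sketch that assembles the OpenSSH command for this tunnel and can start it in the background. It assumes the OpenSSH ssh client is installed and on your PATH; your_ncsa_username is a placeholder. The -N flag (forward ports without running a remote command) and the ServerAliveInterval keep-alive option are standard OpenSSH options, but using them here is our assumption rather than part of the original instructions.

```python
import shlex
import subprocess
from typing import List


def build_tunnel_command(
    username: str,
    host: str = "bety6.ncsa.illinois.edu",
    local_port: int = 5432,
    remote_port: int = 5432,
    keep_alive_seconds: int = 0,
) -> List[str]:
    """Return an ssh argument list that forwards local_port to remote_port on the server.

    "-L local:localhost:remote" listens on local_port on this machine and forwards
    each connection to remote_port on the server's own loopback interface.
    """
    cmd = ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}"]
    if keep_alive_seconds > 0:
        # Ask the client to probe the server after this many idle seconds.
        cmd += ["-o", f"ServerAliveInterval={keep_alive_seconds}"]
    cmd.append(f"{username}@{host}")
    return cmd


def open_tunnel(cmd: List[str]) -> subprocess.Popen:
    """Start ssh in the background; call .terminate() on the result to close the tunnel."""
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    # Placeholder account name; replace with your NCSA username.
    print(shlex.join(build_tunnel_command("your_ncsa_username", keep_alive_seconds=60)))
```

Running the printed command in a terminal is equivalent; keep that terminal open for as long as you need the tunnel.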
Note: you will need to open the SSH connection with the tunnel every time you wish to access BETYdb from your local machine.
To keep the tunnel open, use
Note for PuTTY users: you can configure PuTTY to remember these settings. In the navigation tree on the left-hand side, click Connection > SSH > Tunnels. Enter '5432' under Source port and 'localhost:5432' in the Destination field, then click Add. Then click Session and save this configuration for future use.
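Before opening the tunnel it can help to check whether something (for example a local PostgreSQL server) is already listening on port 5432, and afterwards to confirm that the forwarded port accepts connections. A minimal standard-library sketch; note that a successful TCP connection only shows that some process is listening on the port, not that it is the BETYdb tunnel.

```python
import socket


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: nothing usable is listening there.
        return False


if __name__ == "__main__":
    if port_is_listening(5432):
        print("Port 5432 is in use: either the tunnel is up or a local PostgreSQL server is running.")
    else:
        print("Nothing is listening on port 5432 yet.")
```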
The next section of this guide discusses accessing BETYdb with ArcMap, querying plots, and joining them to the traits and experiments tables. The SSH tunnel instructions also work for psql, pgAdmin3, QGIS, and other clients. Instructions for connecting via QGIS and ArcGIS Pro are provided below.
BETYdb is configured with PostGIS geometry support. This allows ArcGIS Desktop clients to access geometry layers stored within BETYdb.
Warning: ArcGIS releases prior to 10.3 required you to place the PostgreSQL libpq files in the ArcGIS client's bin directory. This is no longer required for the ArcGIS Desktop clients, but some ESRI tools may still require the library to be installed.
Click the ArcCatalog icon (on the right edge of the ArcMap window) to open the ArcCatalog tree
In the tree, click 'Database Connections' and then 'Add Database Connection'. A Database Connection dialog window will open.
Within the dialog box:
Click OK
The connection will be saved as "Connection to localhost.sde"; right-click it and rename it to "TERRA REF BETYdb trait database" for easy reuse.
Click the Add Data icon (black cross over a yellow diamond) to open the Add Data dialog window.
Under 'Look in' on the second line choose 'Database Connections'.
Select the "TERRA REF BETYdb trait database" connection that you created above
Select the bety.public.sites table and click 'Add'.
This 'sites' table is the only table in the database with a geospatial 'geometry' data type.
Any of the other tables can also be added, as described below.
The New Query Layer dialog will be displayed asking for the Unique Identifier Field for the layer. For the bety.public.sites table, the unique identifier is the "sitename" field.
Click Finish.
Warning: ArcMap does not support the big integer (bigint) type that BETYdb uses for primary keys, so those fields will not be visible or available for selection. In most cases you should be able to use other fields as unique identifiers.
BETYdb contains one geometry table, bety.public.sites, which holds the boundaries of each plot. Because plot boundaries can change from season to season, and different plot definitions may be used even within a season (e.g. to subset plots or exclude boundary rows), the plots overlap substantially, which can be confusing when displayed. In general, you will want to use a definition query to limit the layer to a single season and a single plot definition.
Right click the bety.public.sites layer and choose properties.
Choose the Definition Query tab
Add the line sitename LIKE 'MAC Field Scanner Season 1%' or sitename LIKE 'MAC Field Scanner Season 2%' to limit the layer to Season 1 or Season 2, respectively.
Click 'OK'
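The definition query is an ordinary SQL LIKE filter, in which % matches any run of characters. The sketch below runs the same filter with Python's built-in sqlite3 against a few hypothetical site names (not real BETYdb rows) to show what the pattern keeps. Note that 'Season 1%' would also match a hypothetical 'Season 10'; if such seasons exist, a pattern such as 'MAC Field Scanner Season 1 %' may be needed, which assumes a space follows the season number in sitename (verify this against the real names). SQLite's LIKE is case-insensitive for ASCII letters while PostgreSQL's is case-sensitive, so the example sticks to exact case.

```python
import sqlite3
from typing import List

# Hypothetical site names for illustration only; real BETYdb sitenames may differ.
SAMPLE_SITES = [
    "MAC Field Scanner Season 1 Range 1 Column 1",
    "MAC Field Scanner Season 2 Range 1 Column 1",
    "MAC Field Scanner Season 10 Range 1 Column 1",
]


def matching_sites(pattern: str) -> List[str]:
    """Return the sample sitenames that satisfy: sitename LIKE pattern."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sites (sitename TEXT)")
        con.executemany("INSERT INTO sites VALUES (?)", [(s,) for s in SAMPLE_SITES])
        rows = con.execute(
            "SELECT sitename FROM sites WHERE sitename LIKE ? ORDER BY sitename", (pattern,)
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        con.close()


if __name__ == "__main__":
    print(matching_sites("MAC Field Scanner Season 1%"))   # also picks up Season 10
    print(matching_sites("MAC Field Scanner Season 1 %"))  # Season 1 only
```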
For more advanced selection of sites by experiment or season, you can join the experiments and experiments_sites tables. This is beyond the scope of the present tutorial.
Additional tables can be added and joined to the sites table. Tables can be added just like any other layer. In this case, we'll add bety.public.traits_and_yields_view and join it to the bety.public.sites layer.
To create a join with other tables, start by adding the desired table.
Follow the instructions above to add bety.public.traits_and_yields_view.
For this view, the unique identifier is a combination of columns, so select sitename, cultivar, scientificname, trait, date, entity, and method as the unique identifier fields.
Right click on the bety.public.sites layer.
Under 'Joins and Relates' select 'Join'.
Choose sitename (from bety.public.sites) in part 1
Choose bety.public.traits_and_yields_view in part 2
Choose sitename in part 3
Click OK
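Conceptually, this join matches rows of the two tables on sitename. The self-contained sketch below reproduces it with Python's built-in sqlite3 and small mock tables; only the table and column names (sitename, trait, mean) come from this guide, and the rows are hypothetical. It shows that one plot usually matches several trait records. ArcMap's Join is generally described as attaching only the first matching record in such a one-to-many case, so restricting the view before joining controls which values reach the map.

```python
import sqlite3
from typing import List, Optional, Tuple


def join_sites_to_traits(trait: Optional[str] = None) -> List[Tuple]:
    """Return (sitename, trait, mean) rows from a sites-to-traits join on sitename.

    Uses in-memory mock tables named after bety.public.sites and
    bety.public.traits_and_yields_view; the rows are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.executescript("""
            CREATE TABLE sites (sitename TEXT);
            CREATE TABLE traits_and_yields_view (sitename TEXT, trait TEXT, mean REAL);
            INSERT INTO sites VALUES ('plot A'), ('plot B');
            INSERT INTO traits_and_yields_view VALUES
                ('plot A', 'NDVI', 0.61),
                ('plot A', 'canopy_height', 112.0),
                ('plot B', 'NDVI', 0.48);
        """)
        sql = (
            "SELECT s.sitename, t.trait, t.mean "
            "FROM sites AS s "
            "LEFT JOIN traits_and_yields_view AS t ON t.sitename = s.sitename"
        )
        params: tuple = ()
        if trait is not None:
            # Filter inside the join condition so sites lacking the trait still appear (with NULLs).
            sql += " AND t.trait = ?"
            params = (trait,)
        sql += " ORDER BY s.sitename, t.trait"
        return con.execute(sql, params).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in join_sites_to_traits():
        print(row)
```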
This final section describes how to create a thematic map of the bety.public.sites layer, symbolized by the mean value of the NDVI trait from bety.public.traits_and_yields_view. Because the NDVI records are selected with a definition query on bety.public.traits_and_yields_view before the join is made, first remove any previous joins from bety.public.sites (right-click bety.public.sites --> Joins and Relates --> Remove Join).
Right-click the bety.public.traits_and_yields_view table and select Properties
Click on the Definition Query tab
Add the line "trait = 'NDVI'" to the Definition Query box
Click OK
Follow the steps in Joining Additional BETYdb Tables to join the filtered bety.public.traits_and_yields_view to bety.public.sites
Right-click the bety.public.sites layer and choose Properties
Choose the Symbology tab
Under the Show section, choose Quantities --> Graduated Colors
Under Fields, set Value to mean
Click OK
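Graduated Colors assigns each plot a color class according to its value in the chosen field. ArcMap offers several classification methods, such as Natural Breaks (Jenks) and Equal Interval. Purely to illustrate the idea, the sketch below implements the simpler equal-interval method; it is not ArcMap's code, and the class count and sample NDVI values are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def equal_interval_breaks(values: Sequence[float], n_classes: int = 5) -> List[float]:
    """Upper bound of each of n_classes equal-width classes spanning min(values)..max(values)."""
    if n_classes < 1 or not values:
        raise ValueError("need at least one value and one class")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * (i + 1) for i in range(n_classes)]


def classify(value: float, breaks: Sequence[float]) -> int:
    """0-based class index; a value equal to a break falls in the lower class."""
    # Cap at the last class in case floating-point rounding leaves max(values) above the top break.
    return min(bisect_left(breaks, value), len(breaks) - 1)


if __name__ == "__main__":
    # Hypothetical per-plot NDVI means, for illustration only.
    ndvi_means = [0.31, 0.45, 0.52, 0.58, 0.74]
    breaks = equal_interval_breaks(ndvi_means, n_classes=3)
    for v in ndvi_means:
        print(v, "-> class", classify(v, breaks))
```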
The connection instructions below assume that an SSH tunnel is already open.
This assumes you have followed the ArcMap instructions above to create a database connection file.
Open ArcCatalog
Under Database Connections, find the connection made above (called 'TERRA REF BETYdb.sde' in these steps; substitute the name you gave it)
Right-click it and select 'Properties'
Copy the file path (it should look like C:\Users\<USER NAME>\AppData\Roaming\ESRI\Desktop10.4\ArcCatalog\TERRA REF BETYdb.sde, where the Desktop10.4 folder matches your ArcGIS Desktop version)
Open ArcGIS Pro
Under the Insert tab, select connections --> 'add database'
paste the path to 'TERRA REF BETYdb.sde' in the directory navigation bar
select 'TERRA REF BETYdb.sde'
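If you need the path again later, a small standard-library sketch can list the connection files that ArcCatalog has saved. The folder layout it searches (%APPDATA%\ESRI\Desktop<version>\ArcCatalog) is taken from the example path above; other ArcGIS installations may store connection files elsewhere.

```python
import os
from pathlib import Path
from typing import List, Optional

# ArcCatalog connection files, per the example path: %APPDATA%\ESRI\Desktop<version>\ArcCatalog
SDE_PATTERN = "Desktop*/ArcCatalog/*.sde"


def find_sde_files(appdata: Optional[str] = None) -> List[Path]:
    """Return the .sde files found under <appdata>/ESRI, sorted by path.

    Defaults to the APPDATA environment variable (set on Windows); returns an
    empty list if no base directory is available or nothing matches.
    """
    base = appdata if appdata is not None else os.environ.get("APPDATA")
    if not base:
        return []
    return sorted((Path(base) / "ESRI").glob(SDE_PATTERN))


if __name__ == "__main__":
    for path in find_sde_files():
        print(path)  # paste one of these paths into ArcGIS Pro
```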
Open QGIS
In the Browser panel on the left, right-click the PostGIS icon
Select 'New Connection'
Enter the connection properties:
Name: TERRA REF BETYdb trait database
Service: blank
Host: localhost
Port: 5432
Database: bety
SSL mode: disable
Username: viewer
Password: DelchevskoOro
Options: select 'Also list tables with no geometry'
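Outside QGIS, the same properties can be used by any libpq-based client. A minimal sketch, assuming the tunnel is open: libpq_dsn turns the properties above into a libpq keyword/value connection string (the format psql and psycopg2 accept), and first_sitenames is a hypothetical helper that runs a small query. It needs the third-party psycopg2 package, which is imported only when the function is called.

```python
from typing import List

# Connection properties from the QGIS settings above.
BETY_PARAMS = {
    "host": "localhost",
    "port": 5432,
    "dbname": "bety",
    "user": "viewer",
    "password": "DelchevskoOro",
    "sslmode": "disable",
}


def libpq_dsn(params: dict) -> str:
    """Build a libpq keyword/value connection string, quoting values where required."""
    parts = []
    for key, value in params.items():
        text = str(value)
        if text == "" or any(ch in text for ch in " '\\"):
            # libpq rule: wrap in single quotes and backslash-escape quotes and backslashes.
            text = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
        parts.append(f"{key}={text}")
    return " ".join(parts)


def first_sitenames(limit: int = 5) -> List[str]:
    """Query a few sitenames through the tunnel (requires the third-party psycopg2 package)."""
    import psycopg2  # assumed installed, e.g. via `pip install psycopg2-binary`

    conn = psycopg2.connect(libpq_dsn(BETY_PARAMS))
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT sitename FROM sites ORDER BY sitename LIMIT %s", (limit,))
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```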
This approach requires no GIS software beyond PostGIS itself. While connecting directly to the database from GIS software is handy, it is also straightforward to export Shapefiles.
After you have connected via ssh to the PostGIS server, the pgsql2shp command-line utility is available and can be used to dump all of the plot and site definitions (names and geometries) as follows:
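As a hedged illustration, the helper below assembles one possible pgsql2shp invocation, to be run on the server (for example by pasting the printed command). It uses pgsql2shp's -f (output file), -h, -p, and -u options and its form that accepts a SQL query in place of a table name. The output name, the viewer user, and the geometry column name are assumptions to check against the server and the sites table. The optional sitename_like argument applies the same kind of season filter used in ArcMap.

```python
import shlex
from typing import List, Optional


def pgsql2shp_command(
    output: str = "bety_sites",
    database: str = "bety",
    user: str = "viewer",
    host: str = "localhost",
    port: int = 5432,
    sitename_like: Optional[str] = None,
) -> List[str]:
    """Return a pgsql2shp argument list exporting sitename and geometry from sites (assumed columns)."""
    query = "SELECT sitename, geometry FROM sites"
    if sitename_like is not None:
        # Double embedded single quotes so the pattern stays a valid SQL string literal.
        escaped = sitename_like.replace("'", "''")
        query += f" WHERE sitename LIKE '{escaped}'"
    return ["pgsql2shp", "-f", output, "-h", host, "-p", str(port), "-u", user, database, query]


if __name__ == "__main__":
    print(shlex.join(pgsql2shp_command(sitename_like="MAC Field Scanner Season 1%")))
```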
Clowder is an active data repository designed to enable collaboration around a set of shared datasets. TERRA-REF uses Clowder to organize, annotate, and process data generated by phenotyping platforms. Data files are available via the Clowder web interface or API.
See the Clowder documentation for more information about the software and its applications.
To create an account, sign up at the TERRA-REF Clowder site and wait for your account to be approved. Once access is granted, you can explore collections and datasets.
Data is organized into spaces, collections, and datasets.
Spaces contain collections and datasets. TERRA-REF uses one space for each of the phenotyping platforms.
Collections consist of one or more datasets. TERRA-REF collections are organized by acquisition date and sensor. Users can also create their own collections.
Datasets consist of one or more files with associated metadata collected by one sensor at one time point. Users can annotate, download, and use these sensor datasets.
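Following this hierarchy, a dataset's files can also be reached through the Clowder REST API. A minimal standard-library sketch: the base URL, the datasets/{id}/files and files/{id}/blob routes, and passing an API key as a key query parameter reflect our understanding of Clowder's API and should be confirmed in the Clowder API documentation. Dataset and file IDs are placeholders you would copy from the web interface.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL of the TERRA-REF Clowder instance; check it before use.
CLOWDER_BASE = "https://terraref.ncsa.illinois.edu/clowder"


def clowder_url(path: str, key: Optional[str] = None, base: str = CLOWDER_BASE) -> str:
    """Join an API path onto the base URL, adding ?key=... when an API key is given."""
    url = base.rstrip("/") + "/api/" + path.lstrip("/")
    if key:
        url += "?" + urlencode({"key": key})
    return url


def dataset_files_url(dataset_id: str, key: Optional[str] = None, base: str = CLOWDER_BASE) -> str:
    """URL listing the files in one dataset (assumed route)."""
    return clowder_url(f"datasets/{quote(dataset_id, safe='')}/files", key, base)


def file_download_url(file_id: str, key: Optional[str] = None, base: str = CLOWDER_BASE) -> str:
    """URL for downloading one file's contents (assumed route)."""
    return clowder_url(f"files/{quote(file_id, safe='')}/blob", key, base)


def list_dataset_files(dataset_id: str, key: Optional[str] = None) -> Any:
    """Fetch and decode the JSON file listing for a dataset (network access required)."""
    with urlopen(dataset_files_url(dataset_id, key), timeout=30) as resp:
        return json.load(resp)
```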
Clowder allows users to search metadata and filter datasets and files with particular attributes. Simply enter your search terms in the search box.
Clowder includes support for launching integrated analysis environments from your browser, including RStudio and Jupyter Notebooks.
After selecting a dataset, under "Analysis Environment Instances", open the "Launch new instance with dataset" drop-down, select the desired tool, and click the "Launch" button. Select the "Environment manager" link to view the list of active instances. Find your instance and select its title link. This will display the tool with the selected dataset mounted. If you already have a running instance, you can also use "Upload dataset to existing instance".
Through its extractor architecture, Clowder supports automated computational workflows. For more information about developing Clowder extractors, see the Extractor Development documentation.
The Analysis Workbench allows you to launch private Jupyter Notebook and RStudio instances to explore and analyze TERRA-REF data products.
To create an account, sign up at the TERRA-REF Analysis Workbench site and wait for your account to be approved. Once access is granted, you can launch analysis environments.
Each user has a "home" directory mounted into the analysis tools under /home/userid. This is read-write scratch space.
Data access is provided via a read-only NFS mount to the TERRA-REF dataset on ROGER. The data is mounted to each container under /data/terraref and linked to the analysis environment working directory. For example, in Jupyter this is /home/jovyan/work/data.
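Code that should run unchanged in both Jupyter and RStudio environments can look for the data in the locations named above. A minimal standard-library sketch; the order of the candidate paths is our choice, and the write check simply confirms that results belong in your home directory rather than on the read-only mount.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Paths named in this guide: the Jupyter working-directory link, then the container mount.
DATA_CANDIDATES = ("/home/jovyan/work/data", "/data/terraref")


def find_terraref_data(candidates: Iterable[str] = DATA_CANDIDATES) -> Optional[Path]:
    """Return the first candidate that exists as a directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


def is_writable(path: str) -> bool:
    """True if the current user may write to path (expected False for the read-only mount)."""
    return os.access(path, os.W_OK)


if __name__ == "__main__":
    data_dir = find_terraref_data()
    print("TERRA-REF data:", data_dir)
    if data_dir is not None and not is_writable(str(data_dir)):
        print("Read-only as expected; write results under your home directory instead.")
```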
TERRA-REF data is available through four different approaches: Globus Connect, Clowder, BETYdb, and CoGe. Raw data is transferred to the primary compute pipeline using Globus Online. Data is ingested into Clowder to support exploratory analysis. The Clowder extractor system is used to transform the data and create derived data products, which are either available via Clowder or published to specialized services, such as BETYdb.
For more information, see the Architecture Documentation.
Clowder is the primary system used to organize, annotate, and process raw data generated by the phenotyping platforms as well as information about sensors.
Use Clowder to explore the raw TERRA-REF data, perform exploratory analysis, and develop custom extractors.
For more information, see Using Clowder.
Raw data is transferred to the primary TERRA-REF compute pipeline on the Resource Open Geospatial Education and Research (ROGER) system using Globus Online. Data is available for Globus transfer via the Terraref endpoint. Direct access to ROGER is restricted.
Use Globus Online when you want to transfer data from the TERRA-REF system for local analysis.
For more information, see Using Globus.
BETYdb contains the derived trait data with plot locations and other information associated with agronomic experimental design.
Use BETYdb to access derived trait data.
For more information, see Using BETYdb.
CoGe contains genomic information and sequence data.
For more information, see Using CoGe.
Field protocols
Calibration protocols
Field scanner operational log https://github.com/terraref/computing-pipeline/issues/128
CoGe contains genomic data.
CoGe is a platform for performing Comparative Genomics research. It provides an open-ended network of interconnected tools to manage, analyze, and visualize next-gen data.
Coming soon
The Globus Connect service provides high-performance, secure file transfer and synchronization between endpoints. It also allows you to securely share your data with other Globus users.
To access data via Globus, you must first have a Globus account and endpoint.
Sign up for Globus at globus.org
To request access to the Terraref endpoint, send your Globus ID (or university email) to David LeBauer (dlebauer@illinois.edu) with 'TERRAREF Globus Access Request' in the subject line. You will be notified once you have been granted access.
To transfer data to your computer or server:
Log into Globus https://www.globus.org
Add an endpoint for the destination (e.g. your local computer) https://www.globus.org/app/endpoints/create-gcp
Go to the 'transfer files' page: https://www.globus.org/app/transfer
Select source
Endpoint: Terraref
Path: Navigate to the subdirectory that you want.
Select (click) a folder
Select (highlight) the files that you want to download
Select the endpoint that you set up above for your local computer or server as the destination
Select the destination folder (e.g. /~/Downloads/)
Click 'go'
Files will be transferred to your computer
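For repeated downloads, the same transfer can be scripted with the Globus CLI, which is installed separately (pip install globus-cli) and authenticated with globus login. This sketch only assembles a globus transfer command; the endpoint IDs and paths are placeholders, since the Terraref endpoint ID is not listed in this guide.

```python
import shlex
from typing import List, Optional


def globus_transfer_command(
    src_endpoint: str,
    src_path: str,
    dst_endpoint: str,
    dst_path: str,
    recursive: bool = True,
    label: Optional[str] = None,
) -> List[str]:
    """Return a `globus transfer` argument list using ENDPOINT_ID:PATH for source and destination."""
    cmd = ["globus", "transfer", f"{src_endpoint}:{src_path}", f"{dst_endpoint}:{dst_path}"]
    if recursive:
        cmd.append("--recursive")  # transfer a folder's contents recursively
    if label:
        cmd += ["--label", label]
    return cmd


if __name__ == "__main__":
    # Placeholders: look up the real endpoint IDs in the Globus web app.
    print(shlex.join(globus_transfer_command(
        "TERRAREF_ENDPOINT_ID", "/path/to/subdirectory/",
        "MY_ENDPOINT_ID", "/~/Downloads/", label="TERRA-REF download")))
```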
Globus Getting Started
CyVerse is a National Science Foundation-funded cyberinfrastructure that aims to democratize access to supercomputing capabilities.
TERRA-REF genomics data is accessible on the CyVerse Data Store and Discovery Environment. Accessing data through the CyVerse Discovery Environment requires signing up for a free CyVerse account. The Discovery Environment gives users access to software and computing resources, so this method has the advantage that TERRA-REF data can be utilized directly without the need to copy the data elsewhere. During the TERRA-REF beta release period, users will need to request access to the TERRA-REF CyVerse Community Data folder through the TERRA-REF beta user application. The TERRA-REF Community Data folder can be found at /iplant/home/shared/terraref.
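CyVerse also supports command-line access to the Data Store through the iRODS iCommands, which are installed separately and configured with iinit. A minimal sketch that builds ils (list) and iget -r (recursive download) commands for paths inside the TERRA-REF Community Data folder, refusing paths that resolve outside it; subfolder names are placeholders.

```python
import posixpath
from typing import List

TERRAREF_ROOT = "/iplant/home/shared/terraref"


def is_within_terraref(path: str) -> bool:
    """True if path (after normalization) is the community folder or below it."""
    path = posixpath.normpath(path)
    return path == TERRAREF_ROOT or path.startswith(TERRAREF_ROOT + "/")


def terraref_path(*parts: str) -> str:
    """Join parts onto the community folder and reject results that escape it."""
    path = posixpath.normpath(posixpath.join(TERRAREF_ROOT, *parts))
    if not is_within_terraref(path):
        raise ValueError(f"{path} is outside {TERRAREF_ROOT}")
    return path


def ils_command(*parts: str) -> List[str]:
    """List a collection in the community folder."""
    return ["ils", terraref_path(*parts)]


def iget_command(local_dir: str, *parts: str) -> List[str]:
    """Download a collection recursively (-r) into local_dir."""
    return ["iget", "-r", terraref_path(*parts), local_dir]
```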
BETYdb is used to manage and distribute agricultural and ecological data. It contains phenotype and agronomic data including plot locations and other geolocations of interest (e.g. fields, rows, plants).
To request access to BETYdb, register on the BETYdb web site. You will be notified once you have been granted access.
The primary BETYdb Data Access Guide is largely relevant here, noting the following usages:
Genotypes are stored in the cultivars table
Plots are stored in the sites table. Plots are nested hierarchically based on geolocation.
Most tables in BETYdb have search boxes. We describe below how to use the Advanced Search box to query data from these tables and download the results as a CSV file.
The Advanced Search box is the easiest way to download summary datasets designed to have enough information (location, time, species, citations) to be useful for a wide range of use cases.
(For more information about querying data from specific tables, see the BETYdb Data Access Guide.)
On the Welcome page of BETYdb there is a search option for trait and yield data (Figure 1). This tool allows users to search the entire collection of trait and yield data for specific sites, citations, species, and traits.
The results page provides a map interface and the option to download a file containing search results. The downloaded file is in CSV format. This file provides meta-data and provenance information, including the SQL query used to extract the data, the date and time the query was made, the citation source of each result row, and a citation for BETYdb itself.
Using the search box to search trait and yield data is very simple: Type the site (city or site name), species (scientific or common name), cultivar, citation (author and/or year), or trait (variable name or description) into the search box and the results will show contents of BETYdb that match the search. The number of records per page can be changed to accord with the viewer's preference and the search results can be downloaded in the Excel-compatible CSV format.
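Once downloaded, the CSV can be processed with any tool. A minimal standard-library sketch that groups trait values by plot; it assumes the file has sitename, trait, and mean columns (names used in traits_and_yields_view above; check them against your download) and that any provenance lines are prefixed with '#', which is an assumption about the file layout to verify before relying on it.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_bety_csv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a downloaded search-results CSV into one dict per row.

    Assumption: provenance/metadata lines start with '#'; they and blank lines are
    skipped. This simple line filter also assumes no quoted field spans lines.
    """
    data = (line for line in lines if line.strip() and not line.lstrip().startswith("#"))
    return list(csv.DictReader(data))


def values_by_site(rows: Iterable[Dict[str, str]], trait: str) -> Dict[str, List[float]]:
    """Group the numeric 'mean' column by 'sitename' for one trait."""
    out: Dict[str, List[float]] = {}
    for row in rows:
        if row.get("trait") == trait and row.get("mean"):
            out.setdefault(row["sitename"], []).append(float(row["mean"]))
    return out


if __name__ == "__main__":
    # Hypothetical file contents; real downloads have more columns.
    sample = io.StringIO(
        "# provenance line (hypothetical)\n"
        "sitename,trait,mean\n"
        "plot A,NDVI,0.61\n"
        "plot A,NDVI,0.65\n"
        "plot B,canopy_height,112\n"
    )
    print(values_by_site(read_bety_csv(sample), "NDVI"))
```

For a real download, open the file with newline="" and pass the file object to read_bety_csv.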
The search map may be used in conjunction with search terms to restrict search results to a particular geographical area—or even a specific site—by clicking on a map. Clicking on a particular site will restrict results to that site. Clicking in the vicinity of a group of sites but not on a particular site will restrict the search to the region around the point clicked. Alternatively, if a search using search terms is done without clicking on the map, all sites associated with the returned results are highlighted on the map. Then, to zero in on results for a particular geographic area, click on or near highlighted locations on the map.