Accession - plant materials collected from a particular area.
Active reflectance - measurement of light originating from a sensor that reflects off of an object and back to the sensor
Algorithm - a process or set of rules to be followed in calculations or other problem-solving operations
Alignment, sequence - a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences
API (application programming interface) - a set of routine definitions, protocols, and tools for building software and applications.
BAM (Binary Alignment/Map) format - binary format for storing sequence data.
BED (Browser Extensible Data) format - format consisting of one line per feature, each containing 3-12 columns of data, plus optional track definition lines.
BETYdb (Biofuel Ecophysiological Traits and Yields database) - a web-based database of plant trait and yield data that supports research, forecasting, and decision making associated with the development and production of cellulosic biofuel crops
BRDF (Bidirectional Reflectance Distribution Function) - a function of four real variables that defines how light is reflected at an opaque surface.
Breeding Management System (BMS) - an information management system developed by the Integrated Breeding Platform to help breeders manage the breeding process, from program planning to decision-making.
Brown Dog - a research project to develop a method for easily accessing stored historic research data in order to maintain the long-term viability of large bodies of scientific research.
BWA - a software package for mapping low-divergent sequences against a large reference genome.
Clowder - a scalable data repository for sharing, organizing and analyzing data
Collections - one or more datasets.
Cultivar - plants selected for desirable characteristics that can be maintained by propagation.
Data product level - relative amount that data products are processed. Level 0 products are raw data at full instrument resolution. At higher levels, the data are converted into more useful parameters and formats.
Data standards - the rules by which data are described and recorded.
Datasets - one or more files with associated metadata collected by one sensor at one time point.
Downwelling spectral irradiance - The component of radiation directed toward the earth's surface per unit frequency or wavelength
Exposure - the amount of light per unit area reaching an electronic image sensor
FASTQ format - a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
FASTX-toolkit - a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
Gantry - a rail-bound crane system that transports a measurement platform (like the Scanalyzer) over a field
GAPIT (Genome Association and Prediction Integrated Tool) - an R package that performs genome-wide association studies (GWAS) and genomic prediction (or selection).
GATK (Genome Analysis Toolkit) - a software package for analysis of high-throughput sequencing data
GBrowse - a combination of database and interactive web pages for manipulating and displaying annotations on genomes.
Generic Model Organism Database (GMOD) - a collection of open source software tools for managing, visualizing, storing, and disseminating genetic and genomic data.
Genome annotation - the process of attaching biological information to sequences.
Genomic coordinates - The beginning and ending positions of an annotation along a sequence
Genotype calling - inferring the genotype carried by an individual at each site
GeoDjango - geographic Web framework for building GIS Web applications
Germplasm - the sum total of genetic resources of an organism.
GFF (General Feature Format) - format consisting of one line per feature, each containing 9 columns of data, plus optional track definition lines
GIS (geographic information system) - a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data.
Globus - a connected set of data transfer and sharing services for research data management.
Hierarchical Data Format (HDF) - a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data.
Hyperspectral data - information from across the electromagnetic spectrum.
IGV (Integrative Genomics Viewer) - a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.
Integrated Breeding Platform (IBP) - a platform providing an integrated, high-performing breeding informatics and management system
JBrowse - an embeddable genome browser
JSON (JavaScript Object Notation) - an open-standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.
Jupyter Notebook - a web application for creating and sharing documents that contain live code, equations, visualizations and explanatory text.
LemnaTec - supplier of software and automated research platforms for plant phenotyping.
Metadata - data that provides information about other data
MLMM (multi-locus mixed-model) - analysis for genome-wide association studies (GWAS) that uses a forward and backward stepwise approach to select markers as fixed effect covariates in the model.
NetCDF - a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
OpenAlea - a distributed collaborative effort to develop Python libraries and tools that address the needs of current and future works in Plant Architecture modeling.
OpenCV (Open Source Computer Vision Library) - an open source computer vision and machine learning software library.
PAR (Photosynthetically Active Radiation) - the amount of light available for photosynthesis, which is light in the 400 to 700 nanometer wavelength range.
Phenotype - the set of observable characteristics of an individual resulting from the interaction of its genotype with the environment.
Phytozome - a project that facilitates comparative genomic studies amongst green plants.
PlantCV - an image processing package specific to plants that is built upon open-source software
PostGIS - an open source software program that adds support for geographic objects to the PostgreSQL object-relational database.
Python - a programming language
QA (quality assurance) - a planned system of review procedures conducted outside the actual data compilation.
QC (quality control) - a system of checks to assess and maintain the quality of the data.
Quality scores - measure of the probability that a nucleotide base is correctly identified from DNA sequencing
R/qtl - an extensible, interactive environment for mapping quantitative trait loci (QTL) in experimental crosses.
Raw data - unprocessed data collected from an experiment
Reads - sequence of nucleotides of a segment of DNA
Reference data - data that defines the set of permissible values to be used by other data fields.
RESTful API - an application program interface (API) that uses HTTP requests to get, put, post, and delete data.
ROGER - a cluster housed at NCSA that has 13.3 TB of system memory available for computation
RStudio - a set of integrated tools for use with R, a software environment for statistical computing and graphics.
SAMtools - a suite of programs for working with data in SAM (Sequence Alignment/Map), a generic format for storing large nucleotide sequence alignments.
Scanalyzer - instrumentation created by LemnaTec consisting of a robotic sensor arm that carries multiple overhead cameras and sensors
Sequencing - the process of determining the precise order of nucleotides within a DNA molecule.
SNP (single nucleotide polymorphism) - a variation in a single nucleotide that occurs at a specific position in the genome
Spaces - contain collections and datasets. TERRA-REF uses one space for each of the phenotyping platforms.
Spectral exposure - the radiant energy received by a surface, per unit area, per unit frequency or wavelength
Spectral flux - the radiant energy emitted, reflected, transmitted or received, per unit time, per unit frequency
Spectral response function (SRF) - the quantum efficiency of a sensor at specific wavelengths over the range of a spectral band
SQL (Structured Query Language) - a special-purpose programming language designed for managing data held in a relational database management system
SRA (Sequence Read Archive) - a bioinformatics database that provides a public repository for DNA sequencing data
Standards committee - TERRA project representatives and external advisors who work to create clear definitions of data formats, semantics, and interfaces, file formats, and representations of space, time, and genetic identity based on existing standards, commonly used file formats, and user needs to make it easier to analyze and exchange data and results.
Swagger - a set of rules for a format describing REST API. The format can be used to share documentation among product managers, testers and developers, but can also be used by various tools to automate API-related processes.
TASSEL-GBS - software for investigating the relationship between phenotypes and genotypes
TERRA (Transportation Energy Resources from Renewable Agriculture) - an ARPA-E-funded program that facilitates the improvement of advanced biofuel crops by developing and integrating cutting-edge remote sensing platforms, complex data analytics tools, and high-throughput plant breeding technologies.
TERRA-REF (Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform) - a research project focused on developing an integrated phenotyping system for energy sorghum that leverages genetics and breeding, automation, remote plant sensing, genomics, and computational analytics.
THREDDS: Geospatial Data server - a web server that provides metadata and data access for scientific datasets, using a variety of remote data access protocols
Trait - the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs
Variants - a nucleotide difference in a genotype compared to a reference genotype
VCF (Variant Call Format) - a text file format (most often stored compressed). It contains meta-information lines, a header line, and then data lines each containing information about a position in the genome.
VCFtools - a program package designed for working with VCF files
White reference, reflectance of - light reflecting off of a white reference object that is used for the calibration of hyperspectral images
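To make the "Quality scores" and "FASTQ format" entries above concrete, the following minimal sketch decodes a FASTQ quality string into Phred scores and error probabilities. It assumes the common Phred+33 ASCII encoding; the function names and the example quality string are illustrative only and do not come from any TERRA-REF pipeline.

```python
def decode_fastq_quality(quality_string, offset=33):
    """Turn a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding: score = ASCII code of the character - 33.
    """
    return [ord(ch) - offset for ch in quality_string]


def phred_to_error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical quality string for a five-base read.
scores = decode_fastq_quality("II5+!")
error_probs = [phred_to_error_probability(q) for q in scores]

for q, p in zip(scores, error_probs):
    print(f"Q={q:2d}  P(error)={p:.4f}")
```

A score of 20 therefore corresponds to a 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.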
The connection settings used for QGIS and ArcMap below should also work with other software that connects to databases.
BETYdb is configured with PostGIS geometry support. This allows ArcGIS Desktop clients to access geometry layers stored within BETYdb.
Click on the ArcCatalog icon (on the right edge of the ArcMap window) to open the ArcCatalog Tree.
In the tree, click on 'Database Connections' and then 'Add Database Connection'. A Database Connection dialog window will open.
Within the dialog box:
Click OK
The connection will be saved as "Connection to localhost.sde". Right-click it and rename it to "TERRA REF BETYdb trait database" to allow easy reuse.
Click on the Add Layer icon (black cross over yellow diamond) to open the Add Data dialog window.
Under 'Look in' on the second line choose 'Database Connections'.
Select the "TERRA REF BETYdb trait database" connection that you created above.
Select the bety.public.sites table and click 'Add'.
This 'sites' table is the only table in the database with a geospatial 'geometry' data type.
Any of the other tables can also be added, as described below.
The New Query Layer dialog will be displayed asking for the Unique Identifier Field for the layer. For the bety.public.sites table, the unique identifier is the "sitename" field.
Click Finish.
Warning: ArcMap does not support the big integer format that BETYdb uses for primary keys, so those fields will not be visible or available for selection. In most cases you should be able to use other fields as unique identifiers.
BETYdb contains one geometry table, bety.public.sites, which holds the boundaries for each plot. Plot boundaries can change each season, and even within a season different plot definitions may be used (e.g. to subset plots or exclude boundary rows), so there is significant overlap that can cause confusion when displayed. In general, you will want to use the query layer to limit plots to a single season and a single definition.
Right click the bety.public.sites layer and choose properties.
Choose the Definition Query tab
Add the line sitename LIKE 'MAC Field Scanner Season 1%' or sitename LIKE 'MAC Field Scanner Season 2%' to limit the layer to Season 1 or Season 2, respectively.
Click 'OK'
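The effect of the definition query above can be tried outside ArcMap. The sketch below uses an in-memory SQLite table as a stand-in for bety.public.sites; the site names are made up for illustration and are not taken from BETYdb. It also shows a general property of SQL LIKE patterns: because % matches any sequence of characters, 'MAC Field Scanner Season 1%' would also match a name beginning 'MAC Field Scanner Season 10', should such a season exist.

```python
import sqlite3

# In-memory stand-in for the sites table (hypothetical site names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (sitename TEXT)")
conn.executemany(
    "INSERT INTO sites VALUES (?)",
    [
        ("MAC Field Scanner Season 1 Range 10 Column 3",),
        ("MAC Field Scanner Season 2 Range 10 Column 3",),
        ("MAC Field Scanner Season 10 Range 1 Column 1",),
    ],
)


def select_sites(pattern):
    """Return the site names matching a LIKE pattern, sorted by name."""
    cursor = conn.execute(
        "SELECT sitename FROM sites WHERE sitename LIKE ? ORDER BY sitename",
        (pattern,),
    )
    return [row[0] for row in cursor]


# Pattern as written in the tutorial: also picks up "Season 10".
season_1_loose = select_sites("MAC Field Scanner Season 1%")
# A trailing space before the wildcard restricts the match to Season 1.
season_1_strict = select_sites("MAC Field Scanner Season 1 %")

print(season_1_loose)
print(season_1_strict)
```

Note that SQLite's LIKE is case-insensitive for ASCII text by default, whereas PostgreSQL's is case-sensitive, so results can differ if the capitalization in the pattern does not match the stored names.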
For more advanced selection of sites by experiment or season, you can join the experiments and experiments_sites tables. This is beyond the scope of the present tutorial.
Additional tables can be added and joined to the sites table. Tables can be added just like any other layer. In this case, we'll add bety.public.traits_and_yields_view and join it to the bety.public.sites layer.
To create a join with other tables, start by adding the desired table.
Follow the instructions above to add the bety.public.traits_and_yields_view table.
On this table the unique identifier is a group of columns, so select sitename, cultivar, scientificname, trait, date, entity, and method as the unique identifiers.
Right click on the bety.public.sites layer.
Under 'Joins and Relates' select 'Join'.
Choose sitename (from bety.public.sites) in part 1
Choose bety.public.traits_and_yields_view in part 2
Choose sitename in part 3
Click OK
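Conceptually, the join configured in these steps matches rows on sitename. The following sketch expresses that idea as a SQL left outer join, run against an in-memory SQLite database so it is self-contained. The table contents, the reduced column set, and the placeholder geometry strings are assumptions for illustration; the real BETYdb tables have many more columns, and ArcMap's own handling of multiple matching rows may differ from plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for bety.public.sites and
    -- bety.public.traits_and_yields_view (hypothetical rows).
    CREATE TABLE sites (sitename TEXT, geometry TEXT);
    CREATE TABLE traits_and_yields_view (sitename TEXT, trait TEXT, mean REAL);

    INSERT INTO sites VALUES ('plot A', 'placeholder'), ('plot B', 'placeholder');
    INSERT INTO traits_and_yields_view VALUES
        ('plot A', 'NDVI', 0.61),
        ('plot A', 'canopy_height', 112.0);
    """
)

# Keep every site; attach trait rows where the sitename matches.
rows = conn.execute(
    """
    SELECT s.sitename, t.trait, t.mean
    FROM sites AS s
    LEFT JOIN traits_and_yields_view AS t ON t.sitename = s.sitename
    ORDER BY s.sitename, t.trait
    """
).fetchall()

for row in rows:
    print(row)
```

A site with several trait records appears once per record, and a site with no records still appears with empty trait fields. This is one reason the thematic-map procedure later filters the view to a single trait before joining.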
The final section describes how to create a thematic view of the bety.public.sites layer based on the mean attribute where the trait is NDVI from the bety.public.traits_and_yields_view. Remove any previous joins from bety.public.sites (right click bety.public.sites --> joins and relates --> remove join) prior to performing this procedure because we will be selecting the NDVI data by creating a query layer from bety.public.traits_and_yields_view prior to the join.
Right click the bety.public.traits_and_yields_view table and select properties
Click on the Definition Query tab
Add the line "trait = 'NDVI'" to the Definition Query box
Click OK
Follow the steps defined in Joining Additional BETYdb Tables
Right click on the bety.public.sites layer and choose properties
Choose the Symbology tab
Under the Show section, choose Quantities --> Graduated Colors
Under the Fields Value selection choose mean
Click OK
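The procedure above amounts to three operations: keep only the NDVI records, attach each mean to its site, and bin the means into color classes. The sketch below illustrates this with plain Python and equal-interval class breaks. Equal intervals are used here only because they are simple to show; this is not necessarily the classification method ArcMap applies by default, and the records are invented for illustration.

```python
def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


def classify(value, breaks):
    """Index of the first bin whose upper bound is >= value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    # Guard against floating-point rounding at the top edge.
    return len(breaks) - 1


# Hypothetical rows shaped like traits_and_yields_view records.
records = [
    {"sitename": "plot A", "trait": "NDVI", "mean": 0.2},
    {"sitename": "plot B", "trait": "NDVI", "mean": 0.5},
    {"sitename": "plot C", "trait": "canopy_height", "mean": 120.0},
    {"sitename": "plot D", "trait": "NDVI", "mean": 0.8},
]

# Equivalent of the definition query trait = 'NDVI'.
ndvi = {r["sitename"]: r["mean"] for r in records if r["trait"] == "NDVI"}

# Equivalent of Quantities --> Graduated Colors on the mean field.
breaks = equal_interval_breaks(list(ndvi.values()), 3)
classes = {site: classify(value, breaks) for site, value in ndvi.items()}

print(classes)
```

Each class index would correspond to one color in the graduated ramp, with plots lacking NDVI records left unclassified.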
The connection instructions below assume that an SSH tunnel exists.
This assumes you have followed the instructions for ArcMap to create a database connection file.
Open ArcCatalog
Under database connections, you will find the connection made above, called 'TERRA REF BETYdb.sde'
Right click this and select 'properties'
Copy the file path (it should look like C:\Users\<USER NAME>\AppData\Roaming\ESRI\Desktop10.4\ArcCatalog\TERRA REF BETYdb.sde)
Open ArcGIS Pro
Under the Insert tab, select connections --> 'add database'
Paste the path to 'TERRA REF BETYdb.sde' in the directory navigation bar
Select 'TERRA REF BETYdb.sde'
Open QGIS
In left 'browser panel', right-click the PostGIS icon
select 'New Connection'
Enter connection properties
Name: TERRA REF BETYdb trait database
Service: blank
Host: localhost
Port: 5432
Database: bety
SSL mode: disable
Username: bety
Password: bety
Options: select 'Also list tables with no geometry'
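The same connection properties can be reused by any PostgreSQL client. As a sketch, the snippet below assembles them into a libpq-style keyword/value connection string using only the standard library. The helper name build_dsn is hypothetical; the parameter values are the ones listed above and assume the SSH tunnel is forwarding port 5432 on localhost.

```python
def build_dsn(params):
    """Join connection parameters into a libpq keyword/value string."""
    return " ".join(f"{key}={value}" for key, value in params.items())


# Values copied from the QGIS connection properties above.
connection = {
    "host": "localhost",
    "port": 5432,
    "dbname": "bety",
    "user": "bety",
    "password": "bety",
    "sslmode": "disable",
}

dsn = build_dsn(connection)
print(dsn)

# If the psycopg2 package is installed (an assumption, it is not part of
# the standard library), the string could be passed to it directly:
#   import psycopg2
#   conn = psycopg2.connect(dsn)
```

Keeping the parameters in one dictionary makes it easy to point the same script at a different host or port if the tunnel is set up differently.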
While connecting directly to the database within GIS software is handy, it is also straightforward to export Shapefiles. This does not require GIS software other than the PostGIS traits database.
After you have connected via ssh to the PostGIS server, the pgsql2shp utility is available and can be used to dump out all of the plot and site definitions (names and geometries) thus: