Glossary

Accession - plant materials collected from a particular area.

Active reflectance - measurement of light originating from a sensor that reflects off of an object and back to the sensor

Algorithm - a process or set of rules to be followed in calculations or other problem-solving operations

Alignment, sequence - a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences

API (application programming interface) - a set of routine definitions, protocols, and tools for building software and applications.

BAM (Binary Alignment/Map) format - the compressed binary counterpart of the SAM format, used for storing sequence alignment data.

BED (Browser Extensible Data) format - format consisting of one line per feature, each containing 3-12 columns of data, plus optional track definition lines.

BETYdb (Biofuel Ecophysiological Traits and Yields database) - a web-based database of plant trait and yield data that supports research, forecasting, and decision making associated with the development and production of cellulosic biofuel crops

BRDF (Bidirectional Reflectance Distribution Function) - a function of four real variables that defines how light is reflected at an opaque surface.

Breeding Management System (BMS) - an information management system developed by the Integrated Breeding Platform to help breeders manage the breeding process, from program planning to decision-making.

Brown Dog - a research project to develop a method for easily accessing stored historic research data, in order to maintain the long-term viability of large bodies of scientific research.

BWA - a software package for mapping low-divergent sequences against a large reference genome.

Clowder - a scalable data repository for sharing, organizing and analyzing data

Collections - one or more datasets.

Cultivar - plants selected for desirable characteristics that can be maintained by propagation.

Data product level - the degree to which data products have been processed. Level 0 products are raw data at full instrument resolution. At higher levels, the data are converted into more useful parameters and formats.

Data standards - the rules by which data are described and recorded.

Datasets - one or more files with associated metadata collected by one sensor at one time point.

Downwelling spectral irradiance - the component of radiation directed toward the earth's surface per unit frequency or wavelength

Exposure - the amount of light per unit area reaching an electronic image sensor

FASTQ format - a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
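
As a rough illustration of the format, a FASTQ record conventionally spans four lines: an identifier line starting with `@`, the sequence, a separator line starting with `+`, and a quality string with one character per base. The sketch below parses a single made-up record; the function name `parse_fastq_record` and the read contents are hypothetical and chosen only for this example.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (identifier, sequence, quality)."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@"):
        raise ValueError("FASTQ header line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("FASTQ separator line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    # Drop the leading '@' so only the read identifier remains.
    return header[1:], sequence, quality


# Made-up record for illustration only.
record = ["@read_1\n", "GATTACA\n", "+\n", "IIIIHHG\n"]
name, seq, qual = parse_fastq_record(record)
print(name, seq, qual)
```

Real FASTQ files are usually large and often compressed, so an established parser is generally a better choice than hand-written code like this.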

FASTX-toolkit - a collection of command line tools for preprocessing short-read FASTA/FASTQ files.

Gantry - a rail-bound crane system that transports a measurement platform (like the Scanalyzer) over a field

GAPIT (Genome Association and Prediction Integrated Tool) - an R package that performs genome-wide association studies (GWAS) and genomic prediction (or selection).

GATK (Genome Analysis Toolkit) - a software package for analysis of high-throughput sequencing data

GBrowse - a combination of database and interactive web pages for manipulating and displaying annotations on genomes.

Generic Model Organism Database (GMOD) - a collection of open source software tools for managing, visualizing, storing, and disseminating genetic and genomic data.

Genome annotation - the process of attaching biological information to sequences.

Genomic coordinates - the beginning and ending positions of an annotation along a sequence

Genotype calling - inferring the genotype carried by an individual at each site

GeoDjango - geographic Web framework for building GIS Web applications

Germplasm - the sum total of genetic resources of an organism.

GFF (General Feature Format) - format consisting of one line per feature, each containing 9 columns of data, plus optional track definition lines
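
Genomic coordinates are counted differently in the two feature formats defined in this glossary: BED uses 0-based starts with an exclusive end, while GFF uses 1-based starts with an inclusive end. A minimal sketch of converting between the two conventions follows; the function names and the example feature are hypothetical.

```python
def bed_to_gff_interval(bed_start, bed_end):
    """Convert a 0-based, half-open BED interval to a 1-based, inclusive one."""
    return bed_start + 1, bed_end


def gff_to_bed_interval(gff_start, gff_end):
    """Convert a 1-based, inclusive GFF interval to a 0-based, half-open one."""
    return gff_start - 1, gff_end


# Made-up BED line with the three required columns plus a name column.
bed_line = "chr1\t99\t200\tfeature_1"
chrom, start, end, name = bed_line.split("\t")
gff_start, gff_end = bed_to_gff_interval(int(start), int(end))
print(chrom, gff_start, gff_end)
```

Both representations describe the same 101 bases: `200 - 99` in BED terms and `200 - 100 + 1` in GFF terms.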

GIS (geographic information system) - a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data.

Globus - a connected set of data transfer and sharing services for research data management.

Hierarchical Data Format (HDF) - a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data.

Hyperspectral data - information collected in many narrow, contiguous bands from across the electromagnetic spectrum.

IGV (Integrative Genomics Viewer) - a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.

Integrated Breeding Platform (IBP) - a platform providing an integrated, high-performing breeding informatics and management system

JBrowse - an embeddable genome browser

JSON (JavaScript Object Notation) - an open-standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.
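
To make the attribute–value idea concrete, the sketch below serializes a small record to JSON text and reads it back using Python's standard `json` module. The field names and values are invented for illustration and do not reflect any particular TERRA-REF schema.

```python
import json

# Hypothetical measurement record; keys and values are made up.
record = {
    "site": "field_A",
    "trait": "canopy_height",
    "value": 1.52,
    "units": "m",
}

# Serialize to human-readable text, then parse the text back into a dict.
text = json.dumps(record, sort_keys=True, indent=2)
restored = json.loads(text)

print(text)
print(restored == record)
```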

Jupyter Notebook - a web application for creating and sharing documents that contain live code, equations, visualizations and explanatory text.

LemnaTec - supplier of software and automated research platforms for plant phenotyping.

Metadata - data that provides information about other data

MLMM (multi-locus mixed-model) - analysis for genome-wide association studies (GWAS) that uses a forward and backward stepwise approach to select markers as fixed effect covariates in the model.

NetCDF - a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

OpenAlea - a distributed collaborative effort to develop Python libraries and tools that address the needs of current and future work in plant architecture modeling.

OpenCV (Open Source Computer Vision Library) - an open source computer vision and machine learning software library.

PAR (Photosynthetically Active Radiation) - the amount of light available for photosynthesis, which is light in the 400 to 700 nanometer wavelength range.

Phenotype - the set of observable characteristics of an individual resulting from the interaction of its genotype with the environment.

Phytozome - a project that facilitates comparative genomic studies amongst green plants.

PlantCV - an image processing package specific to plants that is built upon open-source software

PostGIS - an open source software program that adds support for geographic objects to the PostgreSQL object-relational database.

Python - a programming language

QA (quality assurance) - a planned system of review procedures conducted outside the actual data compilation.

QC (quality control) - a system of checks to assess and maintain the quality of the data.

Quality scores - measure of the probability that a nucleotide base is correctly identified from DNA sequencing
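
Quality scores are commonly reported on the Phred scale, where a score Q corresponds to an error probability P = 10^(-Q/10), and are often stored as ASCII characters offset by 33 (the "Phred+33" encoding). The sketch below assumes that encoding; the function names and the example quality string are hypothetical.

```python
def decode_phred33(quality_string):
    """Convert a Phred+33 encoded quality string to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def phred_to_error_probability(q):
    """Return the probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Made-up quality string: 'I' encodes 40, '5' encodes 20, '+' encodes 10.
scores = decode_phred33("I5+")
for q in scores:
    print(q, phred_to_error_probability(q))
```

Under this scale, a score of 20 means roughly a 1 in 100 chance that the base is wrong, and a score of 40 roughly 1 in 10,000.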

R/qtl - an extensible, interactive environment for mapping quantitative trait loci (QTL) in experimental crosses.

Raw data - unprocessed data collected from an experiment

Reads - sequences of nucleotides, each determined from a segment of DNA

Reference data - data that defines the set of permissible values to be used by other data fields.

RESTful API - an application programming interface (API) that uses HTTP requests to get, put, post, and delete data.

ROGER - a cluster housed at NCSA that has 13.3 TB of system memory available for computation

RStudio - a set of integrated tools for use with R, a software environment for statistical computing and graphics.

SAMtools - a suite of programs for working with data in SAM (Sequence Alignment/Map) format, a generic format for storing large nucleotide sequence alignments.

Scanalyzer - instrumentation created by LemnaTec consisting of a robotic sensor arm with multiple overhead cameras and sensors

Sequencing - the process of determining the precise order of nucleotides within a DNA molecule.

SNP (single nucleotide polymorphism) - a variation in a single nucleotide that occurs at a specific position in the genome

Spaces - contain collections and datasets. TERRA-REF uses one space for each of the phenotyping platforms.

Spectral exposure - the radiant energy received by a surface, per unit area, per unit frequency or wavelength

Spectral flux - the radiant energy emitted, reflected, transmitted or received, per unit time, per unit frequency

Spectral response function (SRF) - the quantum efficiency of a sensor at specific wavelengths over the range of a spectral band

SQL (Structured Query Language) - a special-purpose programming language designed for managing data held in a relational database management system
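
As a small illustration of SQL in use, the sketch below creates an in-memory SQLite database with Python's standard `sqlite3` module, inserts a few rows, and runs an aggregate query. The table, column names, and values are hypothetical and are not taken from BETYdb or any other database named in this glossary.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traits (cultivar TEXT, trait TEXT, value REAL)")
conn.executemany(
    "INSERT INTO traits VALUES (?, ?, ?)",
    [
        ("cultivar_A", "height_cm", 150.0),
        ("cultivar_A", "height_cm", 170.0),
        ("cultivar_B", "height_cm", 120.0),
    ],
)

# Mean trait value per cultivar, in a stable order.
rows = conn.execute(
    "SELECT cultivar, AVG(value) FROM traits GROUP BY cultivar ORDER BY cultivar"
).fetchall()
conn.close()

print(rows)
```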

SRA (Sequence Read Archive) - a bioinformatics database that provides a public repository for DNA sequencing data

Standards committee - TERRA project representatives and external advisors who work to create clear definitions of data and file formats, semantics, interfaces, and representations of space, time, and genetic identity. The definitions are based on existing standards, commonly used file formats, and user needs, and are meant to make it easier to analyze and exchange data and results.

Swagger - a specification for a format that describes REST APIs. The format can be used to share documentation among product managers, testers, and developers, and can also be used by various tools to automate API-related processes.

TASSEL-GBS - the genotyping-by-sequencing (GBS) analysis pipeline of TASSEL, software for investigating the relationship between phenotypes and genotypes

TERRA (Transportation Energy Resources from Renewable Agriculture) - a program funded by ARPA-E that facilitates the improvement of advanced biofuel crops by developing and integrating cutting-edge remote sensing platforms, complex data analytics tools, and high-throughput plant breeding technologies.

TERRA-REF (Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform) - a research project focused on developing an integrated phenotyping system for energy sorghum that leverages genetics and breeding, automation, remote plant sensing, genomics, and computational analytics.

THREDDS Data Server - a web server that provides metadata and data access for scientific datasets, including geospatial data, using a variety of remote data access protocols

Trait - the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs

Variants - nucleotide differences in a genotype compared to a reference genotype

VCF (Variant Call Format) - a text file format (often stored compressed). It contains meta-information lines, a header line, and then data lines, each containing information about a position in the genome.
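
To show what a data line looks like in practice, the sketch below splits one tab-separated line into the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name `parse_vcf_data_line` and the example line are hypothetical, and per-sample genotype columns are ignored for brevity.

```python
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_data_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position
    record["ALT"] = record["ALT"].split(",")  # one or more alternate alleles
    info = {}
    if record["INFO"] != ".":                 # "." marks a missing value
        for entry in record["INFO"].split(";"):
            key, sep, value = entry.partition("=")
            # Entries without '=' are flags; store them as True.
            info[key] = value if sep else True
    record["INFO"] = info
    return record


# Made-up data line for illustration only.
line = "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20;AF=0.5;DB"
variant = parse_vcf_data_line(line)
print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"])
```

For real analyses, a dedicated tool such as VCFtools is usually preferable to a hand-written parser.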

VCFtools - a program package designed for working with VCF files

White reference, reflectance of - light reflecting off of a white reference object that is used for the calibration of hyperspectral images
