1 of 56

Primary version

Introduction

About this book

This book describes the TERRA-REF data collection, computing, and analysis pipelines. The following links provide quick access to:

About TERRA-REF

The ARPA-E-funded Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) program aims to transform plant breeding by using remote sensing to quantify plant traits such as plant architecture, carbon uptake, tissue chemistry, water use, and other features to predict the yield potential and stress resistance of 300+ diverse Sorghum lines.

The data storage and computing system provides researchers with access to the reference phenotyping data and analytics resources using a high performance computing environment. The reference phenotyping data includes direct measurements and sensor observations, derived plant phenotypes, and genetic and genomic data.

Our objectives are to ensure that the software and data in the reference data and computing pipeline are interoperable, reusable, extensible, and understandable. Providing clear definitions of common formats will make it easier to analyze and exchange data and results.

Users

This documentation is intended to enable a wide variety of user-groups to use and contribute to the datasets and tools contained in the TERRA-REF platform, including:

Contact Info and More

Email:
- David LeBauer (Data / Computing Lead): dlebauer@arizona.edu
- Nadia Shakoor (Project Director): nshakoor@danforthcenter.org
- Todd Mockler (Principal Investigator): tmockler@danforthcenter.org

Scientific Objectives

Combining advanced sensing with novel analytical approaches to accelerate breeding

Conducting high-throughput phenotyping to connect discoveries to field performance

Breeding is currently limited by the speed at which phenotypes can be measured, and the information that can be extracted from these measurements. Current instruments used to quantify plant traits do not scale to the thousands or tens of thousands of individual plants that need to be evaluated in a breeding program. The TERRA-REF field scanner system scans over 1 acre of plants, collecting thousands of daily measurements throughout the growing season that are used to determine plant phenotypes and inform breeding decisions.

The field level phenotypic data combined with the genomic data is helping us to identify the differences between each line and the reference genome sequence for sorghum. We are using bioinformatics and quantitative genetics to characterize the observed genetic variation and identify genomic regions controlling biomass, plant architecture, and photosynthetic traits.

Using large-scale genome sequencing to drive phenotype-genotype associations and gene discovery

There is enormous potential for sorghum crop improvement. There are 50,000 sorghum accessions in the U.S. germplasm collection and most are unused and unstudied. TERRA-REF is analyzing a sorghum bioenergy association panel (BAP) that includes diverse sweet and biomass lines from all five sorghum races. The BAP captures geographic, racial, and genomic diversity.

TERRA-REF has already sequenced 384 of the lines with an average sequence coverage of 20x per line. Genome-wide association studies (GWAS) are now underway.

Providing reference quality data as a community resource

TERRA-REF is developing a data storage and computing system that provides researchers with access to all of the ‘raw’ data and derived plant phenotypes (traits). Data from sensors at a variety of locations across the US will be transferred to one location.

The reference data will facilitate data sharing and re-use of data by providing metadata, provenance for derived data sets, and standardized data processing workflows. It will include geospatial infrastructure for efficiently querying and transforming key datasets and tools that enable researchers to access, archive, use, and contribute data products. The technical documentation for this data pipeline is detailed in this book.

Experimental Design

Overview

Phenotyping

The TERRA-REF project is collecting phenotype data from 852 sorghum genotypes grown in three locations:

Maricopa Agricultural Center (MAC), Maricopa, Arizona
Kansas State University (KSU), Ashland, KS
Donald Danforth Plant Science Center, Missouri

Genotyping

Whole genome resequencing is being carried out on ~400 sorghum accessions to understand the landscape of genetic variation in the selected germplasm and enable high-resolution mapping of bioenergy traits with genome wide association studies (GWAS). Additionally, ~200 sorghum recombinant inbred lines (RILs) will be characterized with ~400,000 genetic markers using genotyping-by-sequencing (Morris et al., 2013) for trait dissection in the RIL population and testcross hybrids of the RIL population.

The Maricopa Agricultural Center (MAC)

The Maricopa field site is located at the the University of Arizona Maricopa Agricultural Center and USDA Arid Land Research Station in Maricopa, Arizona.

Overview of Experiments During the TERRA REF project

Season

Crop

Experiments

Populations(1)

Planting Date

Harvest

Sorghum

Density

BAP, RIL

2016-04-20

2016-07-16

Sorghum

Uniformity Trials(2)

Stay Green RILs F10

2016-07-27

2016-12-02

Durum Wheat

Diversity Panel

2016-12-15

2017-04-05

Sorghum

Late Season Drought

2017-04-13

2017-09-21

Durum Wheat

Diversity Panel

2017-11-20

2018-04-05

Sorghum

BAP

2018-04-20

2018-08-02

Sorghum

Hybrid Uniformity Blocks

Stay Green RILs, Mutants, F2 families

2018-08-23

2018-11-01

Durum Wheat

Uniformity Trials

Diversity Panel

2019-01-01

2019-03-31

9 S

Sorghum

GRASSL x RIO RILs

2019-05-01

2019-07-28

9 N(3)

Sorghum

SAP

2019-04-29

2019-09-05

(1) IL = Recombinant Inbred Lines, BAP = Bioenergy Association Panel, SAP = Sorghum Association Panel (2). Uniformity Trial = same lines planted in strips across field (3) In season 9 a second field ‘North’ was added, and separate trials were conducted

Season 1 sorghum (April - July 2016)

Three hundred thirty one lines were planted in Season 1.

Planting maps

Under the scanner system

Planting Design

Under scanner system

Experiment

Reps

Treatments

Experimental design

BAP

30 lines (12 PS, 12 sweet, 6 grain)

RCB with sorghum types nested in groups

Night illumination

5 illumination levels x 2 PS lines (with check line separating illumination levels)

RCB

Row #

6 adjacent plot scenarios: 3 lines (forage, sweet, PS) x 2 sides (east or west)

RCB but not balanced with all treatments in all reps

Biomass

5 sampling times x 3 lines (forage, sweet, PS)

RCB with sampling time as a repeated measure

Density

3 densities (5, 15, 30 cm) x 3 lines (forage, sweet, PS)

RCB

RILs

130 RILs plus 10 repeats of a single line/rep

Incomplete Block (row-column alpha lattice design)

Uniformity

2 lines (forage, PS)

None - Same line planted in single range

Controlled Environment Phenotyping

Donald Danforth Plant Science Center, Missouri

The Automated controlled-environment phenotyping at the Donald Danforth Plant Science Center Bellwether Foundation Phenotyping Facility

The Bellwether Foundation Phenotyping Facility is a climate controlled 70 m2 growth house with a conveyor belt system for moving plants to and from fluorescence, color, and near infrared imaging cabinets. This automated, high-throughput platform allows repeated non-destructive time-series image capture and multi-parametric analysis of 1,140 plants in a single experiment. You can read more about the Danforth Plant Sciences Center Bellwether Foundation Phenotyping Facility on the DDPSC website.

The Scanalyzer 3D platform at the Bellwether Foundation Phenotyping Facility at the Donald Danforth Plant Science Center consists of multiple digital imaging chambers connected to the Conviron growth house by a conveyor belt system, resulting in a continuous imaging loop. Plants are imaged from the top and/or multiple sides, followed by digital construction of images for analysis.

RGB imaging allows visualization and quantification of plant color and structural morphology, such as leaf area, stem diameter and plant height.
NIR imaging enables visualization of water distribution in plants in the near infrared spectrum of 900–1700 nm.
Fluorescent imaging uses red light excitation to visualize chlorophyll fluorescence between 680 – 900 nm. The system is equipped with a dark adaptation tunnel preceding the fluorescent imaging chamber, allowing the analysis of photosystem II efficiency.

The LemnaTec software suite is used to program and control the Scanalyzer platform, analyze the digital images and mine resulting data. Data and images are saved and stored on a secure server for further review or reanalysis.

Experiments LT1A (TM015) and LT1B (TM016)

Duration: 10 days on LemnaTec platform

Experimental Design:

3 replicates of 190 BAP lines were grown in a randomized complete block design
Watering regimes = 30% FC and 100% FC
Drought conditions were imposed 10 days after planting
Plants were imaged daily for 10 days (11-20 DAP) and sampled at 20 days after planting
Experiment was repeated twice to phenotype the full BAP (Reps 1A and 1B)

Genomics

Whole-genome resequencing

Experimental Design:

384 BAP samples were sequenced to an average depth of ~25x.
Shotgun sequencing (127-bp paired-end) was done using an Illumina X10 instrument at the HudsonAlpha Institute for Biotechnology.
Data were aligned against the BTx623 reference genotype

Genotyping-by-sequencing

Experimental Design:

Data

Overview

This user manual is divided into the following sections:

What data is available?

Raw output from sensors deployed on the Lemnatec field scanner
- Additional data from greenhouse systems, UAVs, and tractors have not been released, but can be accessed through our beta user program
Manually-collected fieldbooks and associated protocols
Derived data, including phenomics data, from computational approaches
Genomic pipeline data

How to Access Data

Overview

TERRA-REF data can be accessed through many different interfaces: Globus, Clowder, BETYdb, CyVerse, and CoGe. Raw data is transfered to the primary compute pipeline using Globus Online. Data is ingested into Clowder to support exploratory analysis. The Clowder extractor system is used to transform the data and create derived data products, which are either available via Clowder or published to specialized services, such as BETYdb.

Tutorials (Recommended!)

We have developed tutorials to provide users with both 'quick start' vignettes and more detailed introductions to TERRA REF datasets. Tutorials for accessing trait data, sensor data, and genomics data are organized by directory ("traits", "sensors", and "genomics").

The tutorials assume familiarity with or willingness to learn Python and / or R, and provide the greatest flexibility and access to available data.

Globus: Browse and Transfer Files

Raw data is transferred to the primary TERRA-REF file system at the National Center for Computing Applications at the University of Illinois.

Use Globus Online when you want to transfer data from the TERRA-REF system for local analysis.

Transferring data using Globus Connect:

To access data via Globus, you must first have a Globus account and endpoint.

Select source
- Endpoint: #Terraref
- Path: Navigate to the subdirectory that you want.
- Select (click) a folder
- Select (highlight) files that you want to download at destination
- Select the endpoint that you set up above of your local computer or server
- Select the destination folder (e.g. /~/Downloads/)
Click 'go'
Files will be transfered to your computer

Requesting Access to unpublished data in TERRA-REF BETYdb:

To request access to unpublished data, send your Globus id to David LeBauer (dlebauer@email.arizona.edu) with 'TERRAREF Globus Access Request' in the subject.

fill out the terraref.org/beta user form
email dlebauer@email.arizona.edu with your globusid to request access.

BETYdb: Trait Data and Agronomic Metadata

BETYdb contains the derived trait data with plot locations and other information associated with agronomic experimental design.

Accessing data in R

Requesting Access to unpublished data in TERRA-REF BETYdb:

email dlebauer@email.arizona.edu for your account to be approved.

Using SQL and PostGIS with Docker (Advanced Users)

The fastest and most comprehensive way to access the database using SQL and other database interfaces (such as the R package dplyr interface described below, or GIS programs described in . You can run an instance of the database using docker, as described below

This is how you can access the TERRA REF trait database. It requires that you install the Docker software on your computer.

psql

R

GIS software

Clowder: Sensor Data and Metadata Browser

Data organization in Clowder

Data is organized into spaces, collections, and datasets, collections.

Spaces contain collections and datasets. TERRA-REF uses one space for each of the phenotyping platforms.
Collections consist of one or more datasets. TERRA-REF collections are organized by acquisition date and sensor. Users can also create their own collections.
Datasets consist of one or more files with associated metadata collected by one sensor at one time point. Users can annotate, download, and use these sensor datasets.

Requesting Access to unpublished data in Clowder:

email dlebauer@email.arizona.edu for your account to be approved.

CyVerse: Genomics Data

TERRA-REF genomics data is accessible on the CyVerse Data Store and Discovery Environment. Accessing data through the CyVerse Discovery Environment requires signing up for a free CyVerse account. The Discovery Environment gives users access to software and computing resources, so this method has the advantage that TERRA-REF data can be utilized directly without the need to copy the data elsewhere.

You can also find these in the CyVerse discovery environment in the TERRA-REF Community Data folder: /iplant/home/shared/terraref.

CoGe: Genomics Data

Data Products

The following table lists available TERRA-REF data products. The table will be updated as new datasets are released. Links are provided to pages with detailed information about each data product including sensor descriptions, algorithm (extractor) information, protocols, and data access instructions.

Data product

Description

3D point cloud data (LAS) of the field constructed from the Fraunhofer 3D scanner output (PLY).

Fluorescence intensity imaging is collected using the PSII LemnaTec camera. Raw camera output is converted to (netCDF/GeoTIFF)

Hyperspectral imaging data from the SWIR and VNIR Headwall Inspector sensors are converted to netCDF output using the hyperspectral extractor.

Infrared heat imaging data is collected using FLIR sensor. Raw output is converted to GeoTIFF using the FLIR extractor.

Multispectral data is collected using the PRI and NDVI Skye sensors. Raw output is converted to timeseries data using the multispectral extractor.

Stereo imaging data is collected using the Prosilica cameras. Full-color images are reconstructed in GeoTIFF format using the de-mosaic extractor. A full-field mosaic is generated using the full-field mosaic extractor.

Spectral reflectance data

Spectral reflectance is measured using a Crop Circle active crop canopy sensor

Environment conditions are collected through the CO2 sensor and Thies Clima. Raw output is converted to netCFG using the environmental-logger extractor.

postGIS/netCDF

Phenotype data is derived from sensor output using the PlantCV extractor and imported into BETYdb.

FASTQ and VCF files available via Globus

UAV and Phenotractor

Plot level data available in BETYdb

Environmental conditions

Environment conditions data is collected using the Vaisala CO2, Thies Clima weather sensors as well as lightning, irrigation, and weather data collected at the Maricopa site.

Data formats follow the Climate and Forecast (CF) conventions for variable names and units. Environmental data are stored in the Geostreams database.

Data sources

WeatherStation coordinates are 33.074457 N, 111.975163 W
EnvironmentLogger is on top of the gantry system and is moveable.
Irrigation is managed at the field level. There are four regions that can be irrigated at different rates.

Data access

Level 1 data

Level 1 meteorological data is aggregated to from 1 Hz raw data to 5 minute averages or sums.

netCDF: 5s (12 per minute) observations

On Globus or Workbench you can find these data provided in both hourly and daily files. These files contain data at the original temporal resolution of 1/s. In addition, they contain the high resolution spectral radiometer data.

sites/ua-mac/Level_1/envlog_netcdf

hourly files: YYYY-MM-DD_HH-MM-SS_environmentallogger.nc
daily files: envlog_netcdf_L1_ua-mac_YYYY-MM-DD.nc

Geostreams: 5 minute observations

Data can be accessed using the geostreams API or the PEcAn meteorological workflow. These are illustrated in the sensor data tutorials.

Here is the json representation of a single five-minute observation:

Data can be accessed using the geostreams API or the PEcAn meteorological workflow.

These are illustrated in the sensor data tutorials.

Here is the json representation of a single five-minute observation from Geostreams:

[
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:06:24-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:10:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.6207870370370374,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":0.07488770951583902,
         "relative_humidity":26.18560185185185,
         "air_temperature":300.17606481481516,
         "eastward_wind":1.571286062845733,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },

Variable names and units

CF standard-name

units

bety

isimip

cruncep

narr

ameriflux

air_temperature

airT

tasAdjust

tair

air

TA (C)

air_temperature_max

tasmaxAdjust

tmax

air_temperature_min

tasminAdjust

tmin

air_pressure

air_pressure

PRESS (KPa)

mole_fraction_of_carbon_dioxide_in_air

mol/mol

CO2

moisture_content_of_soil_layer

kg m-2

soil_temperature

soilT

TS1 (NOT DONE)

relative_humidity

rhurs

rhum

specific_humidity

specific_humidity

qair

shum

CALC(RH)

water_vapor_saturation_deficit

VPD

VPD (NOT DONE)

surface_downwelling_longwave_flux_in_air

W m-2

same

rldsAdjust

lwdown

dlwrf

Rgl

surface_downwelling_shortwave_flux_in_air

W m-2

solar_radiation

rsdsAdjust

swdown

dswrf

surface_downwelling_photosynthetic_photon_flux_in_air

mol m-2 s-1

PAR

PAR (NOT DONE)

precipitation_flux

kg m-2 s-1

cccc

prAdjust

rain

acpc

PREC (mm/s)

degrees

wind_direction

wind_speed

m/s

Wspd

eastward_wind

m/s

eastward_wind

CALC(WS+WD)

northward_wind

m/s

northward_wind

CALC(WS+WD)

Raw Data

Data is available via Globus or Workbench:

/ua-mac/raw_data/co2sensor
/ua-mac/raw_data/EnvironmentLogger
/ua-mac/raw_data/irrigation
/ua-mac/raw_data/lightning
/ua-mac/raw_data/weather

Sensor information:

Computational pipeline

Environmental Logger

Description: EnvironmentalLogger raw files are converted to netCDF.

Known Issues

Known issue: the irrigation data stream does not currently handle variable irrigation rates within the field. Specifically, we have not yet accounted for the Summer 2017 drought experiments. See terraref/reference-data#196 for more information.

When the full field is irrigated (as is typical), the irrigated area is 5466.1 m2 (=215.2 m x 25.4 m)

In 2017:

Full field irrigated area from the start of the season to August 1 (103 dap) is 5466.1 m2 (=215.2 m x 25.4 m).
Well-watered treatment zones from August 1 to 15 (103 to 116 dap): 2513.5 m2 (=215.2 m x 11.68 m) in total, combined areas of non-contiguous blocks
Well-watered treatment zones from August 15 - 30 (116 to 131 dap): 3169.9 m2 (=215.2 m x 14.73 m), again in total as the combined areas of non-contiguous blocks

Github Issues

Phenotype Data

Under Construction

https://terraref.github.io/tutorials/accessing-trait-data-in-r.html

Genomics data

You can access genomics data in one of the following locations:

The data is structured on both the TERRA-REF strorage (accessible via Globus and Workbench) and CyVerse Data Store infrastructures as follows:

Whole-genome resequencing

Raw data

Derived data

Data derived from analysis of the raw resequencing data at the Danforth Center (version1) are available as gzipped, genotyped variant call format (gVCF) files and the final combined hapmap file.

Genotyping-by-sequencing (GBS)

Raw data

Derived data

Combined genotype calls are available in VCF format.

KSU Genomics Pipeline / GBS/RIL analysis

Raw Data

genomics/raw_data/ril/gbs
- H5JYFBCXY_1_fastq.txt
- H5JYFBCXY_2_fastq.txt
- Key_ril_terra

Derived Data

genomics/derived_data/ril/gbs/kansas_state/version1/imp_TERRA_RIL_SNP.vcf

Fluorescence intensity imaging

Summary

Fluorescence intensity data is collected using the PSII camera.

Each measurement produces 102 bin files. The first (index 0) is an image taken right before the LED are switched on (dark reference). Frame 1 to 100 are the 100 images taken, with the LEDs on. In binary file 102 (index 101) is a list with the timestamps of each frame of the 100 frames.

Right now the LED on timespan is 1s thus the first 50 frames are taken with LEDs on the latter 50 frames with LED off.

Raw data access

Fluorescence intensity data is available via Clowder and Globus:

Globus paths:
- /raw_data/ps2top
- /Level_1/ps2_png/

Geospatial information

Several different sensors include geospatial information in the dataset metadata describing the location of the sensor at the time of capture.

Coordinate reference systems The Scanalyzer system itself does not have a reliable GPS unit on the sensor box. There are 3 different coordinate systems that occur in the data:

Most common is EPSG:4326 (WGS84) USDA coordinates
Tractor planting & sensor data is in UTM Zone 12
Sensor position information is captured relative to the southeast corner of the Scanalyzer system in meters

EPSG:4326 coordinates for the four corners of the Scanalyzer system (bound by the rails above) are as follows:

NW: 33° 04.592' N, -111° 58.505' W
NE: 33° 04.591' N, -111° 58.487' W
SW: 33° 04.474' N, -111° 58.505' W
SE: 33° 04.470' N, -111° 58.485' W

In the trait database, this site is named the "MAC Field Scanner Field" and its bounding polygon is "POLYGON ((-111.9747967 33.0764953 358.682, -111.9747966 33.0745228 358.675, -111.9750963 33.074485715 358.62, -111.9750964 33.0764584 358.638, -111.9747967 33.0764953 358.682))"

Scanalyzer coordinates Finally, the Scanalyzer coordinate system is right-handed - the origin is in the SE corner, X increases going from south to north, and Y increases from east to the west.

In offset meter measurements from the southeast corner of the Scanalyzer system, the extent of possible motion for the sensor box is defined as:

NW: (207.3, 22.135, 5.5)
SE: (3.8, 0, 0)

Scanalyzer -> EPSG:4326 1. Calculate the UTM position of known SE corner point 2. Calculate the UTM position of the target point, using SE point as reference 3. Get EPSG:4326 position based on UTM

MAC coordinates Tractor planting data and tractor sensor data will use UTM Zone 12.

Scanalyzer -> MAC Given a Scanalyzer(x,y), the MAC(x,y) in UTM zone 12 is calculated using the linear transformation formula:

ay = 3659974.971; by = 1.0002; cy = 0.0078;
ax = 409012.2032; bx = 0.009; cx = - 0.9986;
Mx = ax + bx * Gx + cx * Gy
My = ay + by * Gx + cy * Gy

Assume Gx = -Gx', where Gx' is the Scanalyzer X coordinate.

MAC -> Scanalyzer

Gx = ( (My/cy - ay/cy) - (Mx/cx - ax/cx) ) / (by/cy - bx/cx)
Gy = ( (My/by - ay/by) - (Mx/bx - ax/bx) ) / (cy/by - cx/bx)

MAC -> EPSG:4326 USDA We do a linear shifting to convert MAC coordinates in to EPSG:4326 USDA

Latitude: Uy = My - 0.000015258894
Longitude: Ux = Mx + 0.000020308287

Sensors with geospatial metadata

stereoTop
flirIr
co2
cropCircle
PRI
scanner3dTop
NDVI
PS2
SWIR
VNIR

Available data All listed sensors

"gantry_system_variable_metadata": {
      "time": "08/17/2016 11:23:14",
      "position x [m]": "207.013",
      "position y [m]": "3.003",
      "position z [m]": "0.68",
      "speed x [m/s]": "0",
      "speed y [m/s]": "0.33",
      "speed z [m/s]": "0",
      "camera box light 1 is on": "True",
      "camera box light 2 is on": "True",
      "camera box light 3 is on": "True",
      "camera box light 4 is on": "True",
      "y end pos [m]": "22.135",
      "y set velocity [m/s]": "0.33",
      "y set acceleration [m/s^2]": "0.1",
      "y set decceleration [m/s^2]": "0.1"
    },

stereoTop

"sensor_fixed_metadata": {
      "cameras alignment": "cameras optical axis parallel to XAxis, perpendicular to ground",
      "optics focus setting (both)": "2.5m",
      "optics apperture setting (both)": "6.7",
      "location in gantry system": "camera box, facing ground",
      "location in camera box x [m]": "0.877",
      "location in camera box y [m]": "2.276",
      "location in camera box z [m]": "0.578",
      "field of view at 2m in X- Y- direction [m]": "[1.857 1.246]",
      "bounding Box [m]": "[1.857     1.246]",
    },

cropCircle

"sensor_fixed_metadata": {
      "location in gantry system": "camera box, facing ground",
      "location in camera box x [m]": "0.480",
      "location in camera box y [m]": "1.920",
      "location in camera box z [m]": "0.6",
    },

co2Sensor

"sensor_fixed_metadata": {
      "location in gantry system": "camera box, facing ground",
      "location in camera box x [m]": "0.35",
      "location in camera box y [m]": "2.62",
      "location in camera box z [m]": "0.7",
    },

flirIrCamera

"sensor_fixed_metadata": {
      "location in gantry system": "camera box, facing ground",
      "location in camera box x [m]": "0.877",
      "location in camera box y [m]": "1.361",
      "location in camera box z [m]": "0.520",
      "field of view x [m]": "1.496",
      "field of view y [m]": "1.105",
    },

ndviSensor

"sensor_fixed_metadata": {
      "location in gantry system": "top of gantry, facing up, camera box, facing ground",
      "location in camera box x [m]": "0.33",
      "location in camera box y [m]": "2.50",
    },

priSensor

"sensor_fixed_metadata": {
      "location in gantry system": "top of gantry, facing up, camera box, facing ground",
      "location in camera box x [m]": "0.400",
      "location in camera box y [m]": "2.470",
    },

SWIR

"sensor_fixed_metadata": {
      "location in gantry system": "camera box, facing ground",
      "location in camera box x [m]": "0.877",
      "location in camera box y [m]": "2.325",
      "location in camera box z [m]": "0.635",
      "field of view y [m]": "0.75",
      "optics focal length [mm]": "25",
      "optics focus apperture": "2.0",
    },

field scanner plots

There are 864 (54*16) plots in total and the plot layout is described in the plot plan table.

dimension

value

# rows

# rows / plot

# plots (2 rows ea)

864

# ranges

# columns

row width (m)

0.762

plot length (m)

row length (m)

3.5

alley length (m)

0.5

The boundary of each plot changes slightly each planting season. The scanalyzer coordinates of each plot are transformed into the (EPSG:4326) USDA coordinates using the equations above.

Hyperspectral imaging data

Summary

Hyperspectral imaging data is collected using the Headwall VNIR and SWIR sensors. In the Nov 2017 Beta Release only VNIR data is provided because we do not have the measurements of downwelling spectral radiation required by the pipeline.

Please see the README hyperspectral pipeline README for more information about how the data are generated and known issues.

Hyperspectral Algorithm and pipeline

See Hyperspectral extractor

Data Access

Raw Data

Raw data is available in the filesystem, accessible via Globus in the following directories:

VNIR: /sites/ua-mac/raw_data/VNIR
SWIR: /sites/ua-mac/raw_data/SWIR

These files are uncalibrated; see the hyperspectral pipeline repository for information on how these can be processed.

Level 1 data access

Hyperspectral data is available via Clowder, Globus #Terraref endpoint, the TERRA REF Workbench, and our THREDDS server:

Clowder:
- VNIR Hyperspectral NetCDFs
- SWIR Collection: Level 1 data not available
Globus and Workbench:
- VNIR: /sites/ua-mac/Level_1/vnir_netcdf
- SWIR: Level 1 data not available
Sensor information:
- Headwall SWIR
- Headwall VNIR

For details about using this data via Clowder or Globus, please see Data Access section.

Level 2 data access

Level 2 data are spectral indices computed at the same resolution as Level 1. These can be found in the same Level 1 directories as their parents, but the files are appended *_ind.nc.

To get a list of hyperspectral indices currently generated https://terraref.ncsa.illinois.edu/bety/api/v1/variables?type=Reflectance%20Index

traits::

Level 3 data access

Hyperspectral Indices

The following indices are computed and provided as both Level 2 data at full spatial resolution and as Level 3 (plot level) means.

Citations can be found Morris, Geoffrey P., Davina H. Rhodes, Zachary Brenton, Punna Ramu, Vinayan Madhumal Thayil, Santosh Deshpande, C. Thomas Hash et al. "Dissecting genome-wide association signals for loss-of-function phenotypes in sorghum flavonoid pigmentation traits." G3: Genes, Genomes, Genetics 3, no. 11 (2013): 2085-2094.

Index

Label

Formula

Citation

DWSI1

Disease Water Stress Index 1

R800 / R1660

Apan, Held, Phinn and Markley (2003)

ND900_680

Normalized Difference 900/680

(R900 - R680)/(R900 + R680)

Rouse et al. (1973)

SR900_680

Simple ratio 900/680

R900/R680

Rouse et al. (1973)

DWSI2

Disease Water Stress Index 2

R1660 / R550

Apan, Held, Phinn and Markley (2003)

TCARI

Transformed Chlorophyll Absorption Ratio

3 ((R700 - R670) - 0.2 (R700 - R550) * (R700/R670))

Haboudane et al. (2002)

DWSI3

Disease Water Stress Index 3

R1660 / R680

Apan, Held, Phinn and Markley (2003)

DWSI4

Disease Water Stress Index 4

R550 / R680

Apan, Held, Phinn and Markley (2003)

DWSI5

Disease Water Stress Index 5

(R800 + R550) / (R1660 + R680)

Apan, Held, Phinn and Markley (2003)

SR700_670

Simple Ratio 700/670

R700/R670 Part of TCARI index

RDVI

Renormalized Difference Vegetation Index

(R800 - R670)/(R800 + R670)^0.5

Rougean and Breon (1995)

PRI531

Normalized Difference 531/570 Photochemical Reflectance Index 531/570

(R531 - R570)/(R531 + R570)

Gamon et al. (1992)

EVI

Enhanced Vegetation Index

2.5 (R800 - R680) / (R800 + 6.0f R680 - 7.5f * R450 + 1.0f)

Huete et al. (1997)

ARVI

Atmospherically Resistant Vegetation Index

(R800 - (2.0f R680 - R450)) / (R800 + (2.0f R680 - R450))

Kaufman and Tanr√© (1996)

REIP1

Red-Edge Inflection Point 1

700 + 40 * {[(R670 + R780)/2 - R700] /(R740 - R700)}

Guyot and Baret, 1988

TVI

Triangular Vegetation Index

0.5 (120 (R750 - R550) - 200 * (R670 - R550))

Haboudaneet al. (2004)

GEMI

Global Environmental Monitoring Index

((2 (pow(R800) - pow(R680)) + 1.5 800 + 0.5 680) / (800 + 680 + 0.5) (1.0 - 0.25 (2.0f (pow(800) - pow(680)) + 1.5 800 + 0.5 680) / (800 + 680 + 0.5))) - ((680 - 0.125) / (1.0 - 680))

Pinty and Verstraete (1992)

GARI

Green Atmospherically Resistant Index

(R800 - (R550 - 1.7 (R450 - R680))) / (R800 + (R550 - 1.7 (R450 - R680)))

Gitelson et al. (1996)

DVI

Difference Vegetation Index

R800 - R680

Tucker et al. (1979)

GDVI

Green Difference Vegetation Index

R800 - R550

Sripada et al. (2006)

GNDVI

Green Normalized Difference Vegetation Index

(R800 - R550) / (R800 + R550)

Gitelson and Merzlyak (1998)

GRVI

Green Ratio Vegetation Index

R800 / R550

Sripada et al. (2006)

SR750_710

Simple Ratio 750/710 Zarco-Tejada & Miller 2001

R750/R710

Zarco-Tejada et al. (2001)

MSR705_445

Modified simple ratio 705/445

(R750 - R445)/(R705 - R445)

Sims and Gamon (2002)

Water index

R900 - R970

Penuelas. et al. (1993)

Chl index

Chlorophyll index

R750/R550

Gitelson and Merzlyak (1994)

NDVI705

Normalized Difference 750/705 Chl NDI

(R750 - R705)/(R750 + R705)

Gitelson and Merzlyak (1994)

ChlDela

Chlorophyll content

(R540 - R590)/(R540 + R590)

Delaieux et al. (2014)

FRI2

Fluorescence ratio indices 2

R740/R800

Dobrowski et al. (2005)

NDVI1

Normalized Difference Vegetation Index1

(R800 - R670)/( R800 + R670)

Rouse et al. (1973)

FRI1

Fluorescence ratio index1

R690/R600

Dobrowski et al. (2005)

OSAVI

Optimized Soil Adjusted Vegetation Index

(1 + 0.16) * (R800 - R670)/(R800 + R670 + 0.16)

Rondeaux et al. (1996)

NDRE

Normalized Difference 790/720 Normalized difference red edge index

(R790 - R720)/(R790 + R720)

Barnes et al. (2000)

Car1Black

Carotenoid index from Blackburn 1998

R800/R470

Blackburn (1998)

SIPI

Structure intensive pigment index

(R800 - R450)/(R800 + R650)

Penuelas. et al. (1995)

AntGitelson

Anthocyanin (Gitelson)

(1/R550 - 1/R700) * R780

Gitelson et al.(2003,2006)

Car2Black

Carotenoid index 2 from Blackburn 1998

(R800 - R470)/(R800 + R470)

Blackburn (1998)

PRI586

Photochemical reflectance index from Panigada et al 2014

(R531 - R586)/(R531 + R586)

Panigada et al. (2014)

AntGamon

Anthocyanin from Gamon and Surfus 1999

R650/R550

Gamon and Surfus (1999)

CarChap

Carotenoid index (Chappelle)

R760/R500

Chappelle et al. (1992)

PRI512

Photochemical reflectance index from Hernandez-Clemente et al 2011

(R531- R512)/(R531 + R512)

Hern√°ndez-Clemente et al. (2011)

TCARI_OSAVI

Transformed Chlorophyll Absorption in Reflectance Index/Optimized Soil-Adjusted Vegetation Index: TCARI/OSAVI

TCARI/OSAVI

Haboudane et al. (2002)

IPVI

Infrared Percentage Vegetation Index

R800 / (R800 + R680)

Crippen et al. (1990)

NLI

Non-Linear Index

(pow(R800, 2) - R680) / (pow(R800, 2) + R680)

Goel and Qin (1994)

MNLI

Modified Non-Linear Index

((pow(R800, 2) - R680) * 1.5f) / (pow(R800, 2) + R680 + 0.5f)

Yang et al. (2008)

SAVI

Soil Adjusted Vegetation Index

(1.5f * (R800 - R680)) / (R800 + R680 + 0.5f)

Huete et al. (1988)

TDVI

Transformed Difference Vegetation Index

sqrt(0.5f + ((R800 - R680) / (R800 + R680)))

Bannari et al. (2002)

VARI

Visible Atmospherically Resistant Index

(R550 - R680) / (R550 + R680 - R450)

Gitelson et al. (2002)

RENDVI

Red Edge Normalized Difference Vegetation Index

(R750 - R705) / (R750 + R705)

Gitelson and Merzlyak (1994)

mRESR

Modified Red Edge Simple Ratio Index

(R750 - R445) / (R750 + R445)

Sims and Gamon (2002)

mRENDVI

Modified Red Edge Normalized Difference Vegetation Index

(R750 - R705) / (R750 + R705 - 2.0f * R445)

Sims and Gamon (2002)

VOG1

Vogelmann Red Edge Index 1

R740 / R720

Vogelmann et al. (1993)

VOG2

Vogelmann Red Edge Index 2

(R734 - R747) / (R715 + R726)

Vogelmann et al. (1993)

VOG3

Vogelmann Red Edge Index 3

(R734 - R747) / (R715 + R720)

Vogelmann et al. (1993)

MCARI

Modified Chlorophyll Absorption Reflectance Index

((R700 - R670) - 0.2f (R700 - R550)) (R700 / R670)

Daughtry et al. (2000)

MCARI1

Modified Chlorophyll Absorption Reflectance Index Improved 1

1.2f (2.5f (R790 - R670) - 1.3f * (R790 - R550))

Haboudane et al. (2004)

MCARI2

Modified Chlorophyll Absorption Reflectance Index Improved 2

(1.5f (2.5f (R800 - R670) - 1.3f (R800 - R550))) / sqrt(pow(2.0f R800 + 1.0f, 2) - 6.0f R800 - 5.0f sqrt(R670) - 0.5f)

Haboudane et al. (2004)

MTVI

Modified Triangular Vegetation Index

1.2f (1.2f (R800 - R550) - 2.5f * (R670 - R550))

Haboudane et al. (2004)

MTVI2

Modified Triangular Vegetation Index Improved

1.5f (1.2f (R800 - R550) - 2.5f (R670 - R550)) / sqrt(pow(2.0f R800 + 1.0f ,2) - (6.0f R800 - 5.0f sqrt(R670)) - 0.5f)

Haboudane et al. (2004)

GMI1

Gitelson and Merzlak Index 1

R750 / R550

Gitelson and Merzlak (1997)

GMI2

Gitelson and Merzlak Index 2

R750 / R700

Gitelson and Merzlak (1997)

Lic1

Lichtenthaler Index 1

(R790 - R680) / (R790 + R680)

Lichtenthaler et al. 1996

Lic2

Lichtenthaler Index 2

R440 / R690

Lichtenthaler et al. 1996

Lic3

Lichtenthaler Index 3

R440 / R740

Lichtenthaler et al. 1996

NDNI

Normalized Difference Nitrogen Index

(log(1.0f / R1510) - log(1.0f / R1680)) / (log(1.0f / R1510) + log(1.0f / R1680))

Fourty et al. (1996)

MSR

Modified Simple Ratio

((R800 / R680) - 1.0f) / (sqrt(R800 / R680) + 1)

Chen et al. (1996)

LAI

Leaf Area Index

3.618f ((2.5.0f (R800 - R680)) / (R800 + 6.0f R680 - 7.5.0f R450 + 1.0f)) - 0.118f

Boegh et al. (2002)

NRI1510

Nitrogen Related Index NRI1510

(R1510 - R660) / (R1510 + R660)

Herrmann et al. (2009)

NRI850

Nitrogen Related Index NRI850

(R850 - R660) / (R850 + R660)

Behrens et al. (2006)

NDLI

Normalized Difference Lignin Index

(log(1.0f / R1754) - log(1.0f / R1680)) / (log(1.0f / R1754) + log(1.0f / R1680))

Melillo et al. (1982)

CAI

Cellulose Absorption Index

(0.5 * (R2000 - R2200)) / R2100

Daughtry et al. (2001)

PSRI

Plant Senescence Reflectance Index

(R680 - R500) / R750

Merzlyak et al. (1999)

CRI1

Carotenoid Reflectance Index 1

1.0f / R510 - 1.0f / R550

Gitelson et al. (2002)

CRI2

Carotenoid Reflectance Index 2

1.0f / R510 - 1.0f / R700

Gitelson et al. (2002)

ARI1

Anthocyanin Reflectance Index 1

1.0f / R550 - 1.0f / R700

Gitelson et al. (2001)

ARI2

Anthocyanin Reflectance Index 2

R800 * ((1.0f / R550) - (1.0f / R700))

Gitelson et al. (2001)

SRPI

Simple Ration Pigment Index

R430 / R680

Penuelas et al. (1995)

NPQI

Normalized Phaeophytinization Index

(R415 - R435) / (R415 + R435)

Barnes et al. (1992)

NPCI

Normalized Pigment Chlorophyll Index

(R680 - R430) / (R680 + R430)

Penuelas et al. (1994)

WBI

Water Band Index

R900 / R970

Penuelas et al. (1995)

NDWI

Normalized Difference Water Index

(R857 - R1241) / ( R700 + R1241)

Gao et al. (1995)

MSI

Moisture Stress Index

R819 / R1599

Hunt and Rock (1989)

NDII

Normalized Difference Infrared Index

(R857 - R1241) / ( R700 + R1241)

Hardisky et al. (1983)

NMDI

Normalized Multiband Drought Index

(R819 - R1649) / (R819 + R1649)

Wang and Qu (2007)

Healthy Index

((R534 - R698) / (R534 + R698)) - (R704 / 2.0f)

Mahlein et al. (2013)

CLSI

Cercospora Leaf Spot Index

((R698 - R570) / (R698 + R570)) - R734

Mahlein et al. (2013)

SBRI

Sugar Beet Rust Index

((R570 - R513) / (R570 + R513)) + (R704 / 2.0f)

Mahlein et al. (2013)

PMI

Powdery Mildew Index

((R520 - R584) / (R520 + R584)) + R724

Mahlein et al. (2013)

Crt1

Carter Index 1

R695 / R420

Carter (1994)

Crt2

Carter Index 2

R695 / R760

Carter (1996)

BIG2

Blue/Green Index

R450 / R550

Zarco-Tejada et al. (2005)

LSI

Leaf Structure Index

R1110 / R810

Maruthi Sridhar et al. (2007)

BRI

Browning Reflectance Index

((1.0f / R550) - (1.0f / R700)) / R800

Chivkunova et al. (2001)

Greenness Index

R554 / R677

Infrared heat imaging data

Summary

Infrared heat imaging data is collected collected using the FLIR SC615 thermal sensor. These data are provided as geotiff image raster files as well as plot level means.

Algorithms are in the flir2tif directory of the Multispectral extractor repository; see the readme for details.
Sensor information: FLIR Thermal Camera collection

Level 1 Data Access

Filesystem (Globus and Workbench)

ua-mac/Level_1/ir_geotiff

Clowder

To be created https://github.com/terraref/computing-pipeline/issues/391

Level 3 Data

Plot level summaries are named 'surface_temperature' in the trait database. In the future this name will be used for the Level 1 data as well. This name from the Climate Forecast (CF) conventions, and is used instead of 'canopy_temperature' for two reasons: First, because we do not (currently) filter soil in this pipeline. Second, because the CF definition of surface_temperature distinguishes the surface from the medium: "The surface temperature is the temperature at the interface, not the bulk temperature of the medium above or below." http://cfconventions.org/Data/cf-standard-names/48/build/cf-standard-name-table.html

Raw data access

Thermal imaging data is available via Clowder and Globus:

/ua-mac/raw_data/flirIrCamera

For details about using this data via Clowder or Globus, please see Data Access section.

Known Issues

Data are unavailable for Season 4 (summer 2017 sorghum) and season 5 (winter 2017-2018 wheat).
- Work to recover these data is ongoing; see terraref/reference-data#190
- Problem description terraref/reference-data#182

Meteorological data

Meteorological data will use Climate Forecasting 'standard names' and 'canonical units' conventions. CF is widely used in climate, meteorology, and earth sciences.

Here are some examples (note that we can change from canonical units to match the appropriate scale, e.g. "C" instead of "K"; time can use any base time and time step (e.g. hours since 2015-01-01 00:00:00 UTC, etc. But the time zone has to be UTC, where 12:00:00 is approx (+/- 15 min). solar noon at Greenwich.

CF standard-name

units

time

days since 1700-01-01 00:00:00 UTC

air_temperature

air_pressure

mole_fraction_of_carbon_dioxide_in_air

mol/mol

moisture_content_of_soil_layer

kg m-2

soil_temperature

relative_humidity

specific_humidity

water_vapor_saturation_deficit

surface_downwelling_longwave_flux_in_air

W m-2

surface_downwelling_shortwave_flux_in_air

W m-2

surface_downwelling_photosynthetic_photon_flux_in_air

mol m-2 s-1

precipitation_flux

kg m-2 s-1

irrigation_flux

kg m-2 s-1

irrigation_transport

kg s-1

wind_speed

m/s

eastward_wind

m/s

northward_wind

m/s

standard_name is CF-convention standard names (except irrigation)
units can be converted by udunits, so these can vary (e.g. the time denominator may change with time frequency of inputs)

Running The Pipeline

Before the Running

The pipepline is developed in Python, so a Python Interpreter is a must. Other than the basic Python standard librarys, the following third-party libraries are required:

netCDF4 for Python
numpy

Other than official CPython interpreter, Pypy is also welcomed, but please make sure that these third-party modules are correctly installed for the target interpreter. The pipeline can only works in Python 2.X versions (2.7 recommended) since numpy does not support Python 3.X versions.

Cloning from the Git:

git clone https://github.com/terraref/computing-pipeline.git
cd  computing-pipeline/scripts/environmental_logger
git checkout master

The extractor for this pipeline is developed and maintained by Max in branch "EnvironmentalLogger-extractor" under the same repository.

Get the Environmental Logger Pipeline to Work

To trigger the pipeline, use the following command:

python ${environmental_logger_source_path}/environmental_logger_json2netcdf.py ${input_JSON_file} ${output_netCDF_file}

Where:

${environmental_logger_source_path} is where the three environmental_logger files are located
${input_JSON_file} is where the input JSON files are located
${output_netCDF_file} is where the users want the pipeline to export the product (netCDF file)

Please note that the parameter for the output file can be a path to either a directory or a file, and it is not necessarily to be existed. If the output is a path to a folder, the final product will be in this folder as a netCDF file that has the same name as the imported JSON file but with a different filename extension (.nc for standard netCDF file); if this path does not exist, environmental_logger pipeline will automatically make one.

Calculation

The calculation in the Environmental Logger is mainly finished by the module environmental_logger_calculation.py under the support of numpy.

Point Cloud Data

Summary

3D point cloud data is collected using the Fraunhofer 3D laser scanner. Custom software installed at Maricopa converts .png output to the .ply point clouds. The .ply point clouds are converted to georeferenced .las files using the 3D point cloud extractor

Level 1 data products are provided in both .las and .ply formats. Raw sensor output (PLY) is converted to LAS format using the ply2las extractor; ply2las extractor code is available on GitHub.

For each scan, there are two .ply files representing two lasers, one on the left and the other on the right. These are combined in the .las files.

Sensor information

Fraunhofer 3D scanner collection

For details about using this data via Clowder or Globus, please see Data Access section.

Data access

Data is available via Clowder, Globus, and Workbench.

Clowder: Laser Scanner 3D LAS
Globus or Workbench File System:
- LAS /raw_data/laser3D_las
- PLY /raw_data/laser3D

Known issues

The position of the lasers is affected by temperature. We added a correction for temperature to adjust for this effect. See terraref/reference-data#161

Controlled Environment phenotype data

LemnaTec Scanalyzer 3D platform at the Donald Danforth Plant Science Center

Phenotype data is derived from images generated by the indoor LemnaTec Scanalyzer 3D platform at the Donald Danforth Plant Science Center using PlantCV. PlantCV is an image analysis package for plant phenotyping. PlantCV is composed of modular functions in order to be applicable to a variety of plant types and imaging systems. PlantCV contains base functions that are required to examine images from an excitation imaging fluorometer (PSII), visible spectrum camera (VIS), and near-infrared camera (NIR). PlantCV is a fully open source project: https://github.com/danforthcenter/plantcv. For more information, see:

Project website: http://plantcv.danforthcenter.org

Full documentation: http://plantcv.readthedocs.io/en/latest

Publications:

To learn more about PlantCV, you can find examples in the terraref/tutorials repository, which is accessible on GitHub and in the TERRA REF workbench under tutorials/plantcv

an ipython notebook demonstration of PlantCV plantcv/plantcv_jupyter_demo.ipynb.

For the TERRA-REF project, a PlantCV Clowder extractor was developed to analyze data from the Bellwether Foundation Phenotyping Facility at the Donald Danforth Plant Science Center. Resulting phenotype data is stored in BETYdb.

Computational pipeline

PlantCV extractor

Description: Processes VIS/NIR images captured at several angles to generate trait metadata. The trait metadata is associated with the source images in Clowder, and uploaded to the configured BETYdb instance.
Output CSV: /sites/danforth/Level_1/<experiment name>

Input

Evaluation is triggered whenever a file is added to a dataset
Following images must be found
- 2x NIR side-view = NIR_SV_0, NIR_SV_90
- 1x NIR top-view = NIR_TV
- 2x VIS side-view = VIS_SV_0, VIS_SV_90
- 1x VIS top-view = VIS_TV
Per-image metadata in Clowder is required for BETYdb submission; this is how barcode/genotype/treatment/timestamp are determined.

Output

Each image will have new metadata appended in Clowder including measures like height, area, perimeter, and longest_axis
Average traits for the dataset (3 VIS or 3 NIR images) are inserted into a CSV file and added to the Clowder dataset
If configured, the CSV will also be sent to BETYdb

Data access

Level 1

BETYdb: https://terraref.ncsa.illinois.edu/bety

For details about accessing BETYdb, please see Data Access section and a tutorial on accessing phenotypes from the trait database on the TERRA REF Workbench in traits/04-danforth-indoor-phenotyping-facility.Rmd.

Raw Data

Clowder: Bellwether Phenotyping Facility Space
Globus and Workbench:
- /sites/danforth/raw_data/<experiment name>

Data Use Policy

Public Domain Data

Data will be released with a CC0 license, meaning that they are in the public domain. The CC0 license allows wide use of these data and while it does not legally bind users to acknowledge the source data users are expected cite our data and research in publications, presentations, and other products.

Data are in the public domain to enable broad and unrestricted re-use. However, any derived pubications should cite the dataset:

Citing TERRA REF data:

Software and Algorithms

Unpublished Data

We plan to make data from the Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) project available for use with attribution.

Timing of Future Data Releases

Planned future releases include Sorghum Season 9, data from experiments at Kansas State University, and data from the Danforth Indoor Phenotyping Facility.

Additional seasons can be requested as needed. We can provide the raw data and software required to process it. We can also collaborate with you to process the data, but this will typically require new funding sources.

Contacts

Todd Mockler, Project/Genomics Lead (email: tmockler@danforthcenter.org)
David LeBauer, Computing Pipeline Lead (email: dlebauer@arizona.edu)
Nadia Shakoor, Project Director (email: nshakoor@danforthcenter.org)

Manuscripts and Authorship Guidelines

Summary

The willingness of many scientists to cooperate and collaborate is what makes TERRA REF possible. Because the platform encompasses a diverse group of people and relies on many data contributors to create datasets for analysis, writing scientific papers can be more challenging than with more traditional projects. We have attempted to lay out ground rules to establish a fair process for establishing authorship, and to be inclusive while not diluting the value of authorship on a manuscript. Please engage with the TERRA REF manuscript writing process knowing you are helping to forge a new model of doing collaborative scientific research.

Copyright, Attribution, and Conditions of Use:

We are making data available early for users under the condition that manuscripts led within the team not be scooped. In these cases, people who wish to use the data for publication prior to official open release date of November 2018 should coordinate co-authorship with the person responsible for collecting the data.

Overview of the TERRA REF authorship process:

Inclusive but not gratuitous

Our primary goals in the TERRA REF authorship process are to consistently, accurately and transparently attribute the contribution of each author on the paper, to encourage participation in manuscripts by interested scientists, and to ensure that each author has made sufficient contribution to the paper to warrant authorship.

Steps:

Read these authorship policies and guidelines.
Proposed ideas are reviewed by the authorship committee primarily to facilitate appropriate collaborations, identify potential duplication of effort, and to support the scientists who generate data while allowing the broader research community access to data as quickly and openly as possible. The authorship committee may suggest altering or combining analyses and papers to resolve issues of overlap.
Circulate your draft analysis and manuscript to solicit Opt-In authorship.
For global analyses, the lead author should circulate the manuscript to the Network by submitting a email to the TERRA REF team.
For analyses of more limited scope, the lead author should circulate the manuscript to network collaborators who have indicated interest at the abstract stage, those who have contributed data, and any others who the lead author deems appropriate.
In both cases, the subject line of the email should include the phrase "OPT-IN PAPER"; This email should also include a deadline by which time co-authors should respond.
The right point to share your working draft and solicit co-authors is different for each manuscript, but in general:
1. sharing early drafts or figures allows for more effective co-author contribution. While ideally this would mean circulating the manuscript at a very early stage for opt-in to the entire network, it is acceptable and even typical to share early drafts or figures among a smaller group of core authors.
2. circulating essentially complete manuscripts does not allow the opportunity for meaningful contribution from co-authors, and is discouraged.
Potential co-authors should signal their intention to opt-in by responding by email to the lead author before the stated deadline.
Potential co-authors should inform the lead author of any additional candidates for co-authorship who should be considered.
Lead authors should is responsible for making sure that any who have made contributions warranting co-authorship have actively opted in or out (authors should not be excluded due to a missed email or a misunderstanding of the scope of the manuscript and their contributions). The goal is to ensure that the author list is inclusive and consistent.
Lead authors should keep an email list of co-authors and communicate regularly about progress including sharing drafts of analyses, figures, and text as often as is productive and practical.
Lead authors should circulate complete drafts among co-authors and consider comments and changes. Given the wide variety of ideas and suggestions provided on each TERRA REF paper, co-authors should recognize the final decisions belong to the lead author.
Final manuscripts should be reviewed and approved by each co-author before submission.
All authors and co-authors should fill out their contribution in the authorship rubric and attach it as supplementary material to any TERRA REF manuscript. Lead authors are responsible for ensuring consistency in credit given for contributions, and may alter co-author's entries in the table to do so.
Note that the last author position may be appropriate to assign in some cases. For example, this would be appropriate for advisors of lead authors who are graduate students or postdocs and for papers that two people worked very closely to produce.
The lead author should carefully review the authorship contribution table to ensure that all authors have contributed at a level that warrants authorship and that contributions are consistently attributed among authors. The lead author should also ensure that all contributions that warrant co-authorship.
- Has each author made contributions in at least two areas in the authorship rubric?
- Did each author provide thoughtful, detailed feedback on the manuscript?
- Have all qualified contributors actively opted in or out of co-authorship?

Authors are encouraged to contact the TERRA REF PI (Mockler) or authorship committee (Jeff White, Geoff Morris, Todd Mockler, David LeBauer, Wasit Wulamu, Nadia Shakoor) about any confusion or conflicts.

Co-authorship

Authorship must be earned through a substantial contribution. Traditionally, project initiation and framing, data analysis and interpretation, software or algorithm development, and manuscript preparation are all authorship-worthy contributions, and remain so for TERRA REF manuscripts. However, TERRA REF collaborators have also agreed that collaborators who lead a site from which data are being used in a paper can also opt-in as co-authors, under the following conditions: (1) the collaborators' site has contributed data being used in the paper's analysis; and (2) that this collaborator makes additional contributions to the particular manuscript, including data analysis, writing, or editing. For co-authorship on opt-out papers, each individual must be able to check at least two boxes in the rubric in addition to contribution to the writing process. These guidelines apply equally to manuscripts led by graduate students.

Co-author Expectations

Each author is expected to meet all of the following conditions:

Substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data, and
Drafting the article or revising it critically for important intellectual content, and
Final approval of the version to be published, and
Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Types of Author Contributions

Author Order

By default, we will follow the conventions of the scientific community that is the target audience of the journal in which the article is published. This should typically follow:

First is lead author
Last is the supervisor of the lead author.
if > 1 lead or senior authors these will be listed first and last, respectively, and identified in the author contributions section of the acknowledgements.
All other contributors are listed alphabetically.

Publications committee

Members: David LeBauer, Todd Mockler, Geoff Morris, Duke Pauli, Nadia Shakoor, Wasit Wulamu

The publications committee ensures communication across projects to avoid overlap of manuscripts, works to provide guidance on procedures and authorship guidelines, and serves as the body of last resort for resolution of authorship disputes within the Network.

Acknowledgments

Please use the following text in the acknowledgments of TERRA REF manuscripts:

Keywords

Please use "TERRA REF"; as one of your keywords on submitted manuscripts, so that TERRA REF work is easily indexed and searchable.

Protocols

The following protocols have been contributed by TERRA-REF team members:

Field Scanner - Coming 2017
Genomics - Coming 2017

Field Scanner

The Maricopa field site is located at the the University of Arizona Maricopa Agricultural Center and USDA Arid Land Research Station in Maricopa, Arizona. At this site, we have deployed the following phenotyping platforms.

Twelve sensors are attached to the gantry system. Detailed information for each sensor including name, variable measured, and field of view are described in the table below, with links to more detailed specifications.

Sensor Calibration

This section describes sensor calibration processes and how to access additional information about specific calibration protocols, calibration targets, and associated reference data.

LemnaTec Field Scanalyzer

Calibration protocols

Calibration targets

The following calibration targets are available:

Aluminum 3D test object

Sensor Calibration

Environmental sensor calibration

The environmental sensor has been calibrated by LemnaTec. The output of the spectrometer is raw counts, users will need to use the calibration files to convert to units of µW m-2 s-1, taking into account the bandwidth of the chip (0.4nm) if converting to µmol m-2 s-1.

Hyperspectral calibration

Sources:

For the SWIR and VNIR sensors, factory calibration is repeated each year using the calibration lamp provided by Headwall. To convert the hyperspectral exposure image to reflectance requires the wavelength-dependent, factory calibrated reflectance of the spectralon at all VNIR and SWIR wavelengths and a good image of a spectralon panel from each camera. This includes periodic measurements of a white spectralon reflectance panel run with 20ms exposure to match panel calibration.

Dark reference measurement:

VNIR
- Dark measurement for VNIR camera is taken at exposure times 20, 25, 30, 35, 40, 45, 50, 55ms.
- Data is in the same hypercube format with 180-200 lines, 955 bands, and 1600 pixel samples.
- Measurement was done using Headwall software, so there is no LemnaTec json file.
- The name of the folder is the exposure time. "current setting exposure" is showing the exposure time in ms.
- Custom workflow to process the calibration files.
SWIR;
- Dark counts handled internally, so no calibration files are necessary.

White reference measurement:

VNIR
- White measurement for VNIR camera is taken at exposure times 20, 25, 30, 25, 40, 25, 50, 55 ms.
- The name of the folder is the exposure time. Data are 1600 sample, 955 bands and 268-298 lines. White reference is located in the lines between 60 to 100 and in the samples between 600 to 1000.
- The white reference scans was done at around 1pm ( one hour after solar noon). I don’t see the saturation with 20ms and 25ms exposure time.
- For the calibration, this needs to be subtracted from the dark current in the same sample, band and exposure time.
- In the following file, I stored an extra file named "CorrectedWhite_raw". This file includes only a single white pixel( one line, one sample) in 955 bands for each exposure time. Data is stored in the similar format but it doesnot include any extra files like frameIndex, image, header ,..
  https:\/\/drive.google.com\/file\/d\/0ByXIACImwxA7dVNHa3pTYkFjdWc\/view?usp=sharing
  Let me know if you have issue with opening the files.

Stereo 3D height scanner

LemnaTec applied calibration matrix to the 3D scanners.

UAV calibration

There are calibrated reference panels and blackbody images taken with UAV sensors before and\/or after the each flight mission.
There are also 4 white,grey and black panels laid on the ground during the flight. Knowing the proprieties of these targets would helps us radiometrically correct the UAV images.
What are the reflectance properties of calibrated reference panels for multispectral camera?
What are the thermal properties of reference target for thermal camera?
What are the reflectance properties of the reference panels laid on the ground during the flight?
Is there any other ground truth data collected during the flight for aerial data processing, such as surface reflectance, temperature and other environmental data? These type of data would be helpful for further atmospheric correction.
There are two sets of reference reflectance panels: one that PDS uses, it is small, PDS will need to provide the specs; the second set consists of 4 8m x 8m canvas tarps, nominally 4%, 8%, 48% and 64% reflectance across vnir bands.
We have data from an ASD spectrometer on many but not all flight days that can be used to give the most accurate actual reflectances for each. Kelly Thorp can provide the numbers. The tarps are old and the dark targets are more reflective than nominal and light targets darker than nominal.
The thermal target is a passive black body, I dont know the surface emissivity, it is around 0.97. There are thermistors in the back of the metal plate to provide physical temperature of the body. The black body is stored in a wood box, insulated, to dampen thermal variations. Id guess it is accurate to 2C.
There is a met station on farm for air temperature, humidity, wind speed, wind direction, solar radiation. we have a sun photometer that can be used for atmospheric water vapor content but currently dont deploy it routinely.

Halogen spectrum

Spectral response data

Relative spectral response data is available for the following sensors:

NDVI
PRI
PAR

Calibration data

Hyperspectral Data

The TERRA hyperspectral data pipeline processes imagery from hyperspectral camera, and ancillary metadata. The pipeline converts the "raw" ENVI-format imagery into netCDF4/HDF5 format with (currently) lossless compression that reduces their size by ~20%. The pipeline also adds suitable ancillary metadata to make the netCDF image files truly self-describing. At the end of the pipeline, the files are typically [ready for xxx]/[uploaded to yyy]/[zzz].

Installation

Software dependencies

Pipeline source code

Once the pre-requisite libraries above have been installed, the pipeline itself may be installed by checking-out the TERRAREF computing-pipeline repository. The relevant scripts for hyperspectral imagery are:

Setup

The pipeline works with input from any location (directories, files, or stdin). Supply the raw image filename(s) (e.g., meat_raw), and the pipeline derives the ancillary filename(s) from this (e.g., meat_raw.hdr, meat_metadata.json). When specifying a directory without a specifice filename, the pipeline processes all files with the suffix "_raw".

shmkdir ~/terrarefcd ~/terrarefgit clone git@github.com:terraref/computing-pipeline.gitgit clone git@github.com:terraref/documentation.git

Run the Hyperspectral Pipeline

shterraref.sh -i ${DATA}/terraref/foo_raw -O ${DATA}/terrarefterraref.sh -I /projects/arpae/terraref/raw_data/lemnatec_field -O /projects/arpae/terraref/outputs/lemnatec_field

Controlled Environment Protocols

Abstract

Automated VIS and NIR imaging in a controlled growth environment

Materials

ProMix BRK20 + 14-14-14 Osmocote pots
pre-filled by Hummert Sorghum seed

Equipment

Conviron Growth House
LemnaTec moving field conveyor belt system
Scanalyzer 3D platform

Procedures

Planting

Plant directly into phenotyping pots

Chamber Conditions

Pre-growth (11 days) and Phenotying (11 days)

14 hour photoperiod
32oC day/22oC night temperature
60% relative humidity
700 umol/m2/s light

Watering Conditions

Prior to phenotyping, plants watered daily
The first night after loading, plants watered 1× by treatment group to 100% field capacity (fc)
Days 2 – 12, plants watered 2× daily by treatment group (100% or 30% FC) to target weight

Automation

Left shift lane rotation within each GH, during overnight watering jobs
VIS (TV and 2 x SV), NIR (TV and 2 x SV) imaging daily

Recipes

Field capacity = 200% GWC (200 g water/100 g soil), based upon extensive GWC testing done by Skyler Mitchell
Target weight (fc) = [(water weight at % fc) + [(average weight of carrier/saucer) + (dry soil weight) + (pot weight)]
Water weight at 100% fc = dry soil weight * (%GWC/100)
Water weight at 30% fc = water weight at 100% fc * 0.30

References

Manual Field Data Protocols

Data were collected manually using standard field methods. These measurements are used to calibrate and validate phenotypes derived from sensor-collected data.

Abstract

Materials

barcode scanning protractor
barcode scanning ruler
digital caliper
drying oven
forage chopper
hand shears
infrared thermometer
juice extractor
leaf punch
meter stick
paper bags
scale
spray paint

Equipment

Procedures

References

Photosynthesis / leaf chemistry from hyperspectral data references:

Appendix

Archived Documentation

Hyperspectral imaging data

Summary

Please see the README hyperspectral pipeline README for more information about how the data are generated and known issues.

Hyperspectral Algorithm and pipeline

See Hyperspectral extractor

Data Access

Raw Data

Raw data is available in the filesystem, accessible via Globus in the following directories:

VNIR: /sites/ua-mac/raw_data/VNIR
SWIR: /sites/ua-mac/raw_data/SWIR

These files are uncalibrated; see the hyperspectral pipeline repository for information on how these can be processed.

Level 1 data access

Hyperspectral data is available via Clowder, Globus #Terraref endpoint, the TERRA REF Workbench, and our THREDDS server:

Clowder:
- VNIR Hyperspectral NetCDFs
- SWIR Collection: Level 1 data not available
Globus and Workbench:
- VNIR: /sites/ua-mac/Level_1/vnir_netcdf
- SWIR: Level 1 data not available
Sensor information:
- Headwall SWIR
- Headwall VNIR

For details about using this data via Clowder or Globus, please see Data Access section.

Level 2 data access

Level 2 data are spectral indices computed at the same resolution as Level 1. These can be found in the same Level 1 directories as their parents, but the files are appended *_ind.nc.

To get a list of hyperspectral indices currently generated https://terraref.ncsa.illinois.edu/bety/api/v1/variables?type=Reflectance%20Index

traits::

Level 3 data access

Hyperspectral Indices

The following indices are computed and provided as both Level 2 data at full spatial resolution and as Level 3 (plot level) means.

Index

Label

Formula

Citation

DWSI1

Disease Water Stress Index 1

R800 / R1660

Apan, Held, Phinn and Markley (2003)

ND900_680

Normalized Difference 900/680

(R900 - R680)/(R900 + R680)

Rouse et al. (1973)

SR900_680

Simple ratio 900/680

R900/R680

Rouse et al. (1973)

DWSI2

Disease Water Stress Index 2

R1660 / R550

Apan, Held, Phinn and Markley (2003)

TCARI

Transformed Chlorophyll Absorption Ratio

3 ((R700 - R670) - 0.2 (R700 - R550) * (R700/R670))

Haboudane et al. (2002)

DWSI3

Disease Water Stress Index 3

R1660 / R680

Apan, Held, Phinn and Markley (2003)

DWSI4

Disease Water Stress Index 4

R550 / R680

Apan, Held, Phinn and Markley (2003)

DWSI5

Disease Water Stress Index 5

(R800 + R550) / (R1660 + R680)

Apan, Held, Phinn and Markley (2003)

SR700_670

Simple Ratio 700/670

R700/R670 Part of TCARI index

RDVI

Renormalized Difference Vegetation Index

(R800 - R670)/(R800 + R670)^0.5

Rougean and Breon (1995)

PRI531

Normalized Difference 531/570 Photochemical Reflectance Index 531/570

(R531 - R570)/(R531 + R570)

Gamon et al. (1992)

EVI

Enhanced Vegetation Index

2.5 (R800 - R680) / (R800 + 6.0f R680 - 7.5f * R450 + 1.0f)

Huete et al. (1997)

ARVI

Atmospherically Resistant Vegetation Index

(R800 - (2.0f R680 - R450)) / (R800 + (2.0f R680 - R450))

Kaufman and Tanr√© (1996)

REIP1

Red-Edge Inflection Point 1

700 + 40 * {[(R670 + R780)/2 - R700] /(R740 - R700)}

Guyot and Baret, 1988

TVI

Triangular Vegetation Index

0.5 (120 (R750 - R550) - 200 * (R670 - R550))

Haboudaneet al. (2004)

GEMI

Global Environmental Monitoring Index

((2 (pow(R800) - pow(R680)) + 1.5 800 + 0.5 680) / (800 + 680 + 0.5) (1.0 - 0.25 (2.0f (pow(800) - pow(680)) + 1.5 800 + 0.5 680) / (800 + 680 + 0.5))) - ((680 - 0.125) / (1.0 - 680))

Pinty and Verstraete (1992)

GARI

Green Atmospherically Resistant Index

(R800 - (R550 - 1.7 (R450 - R680))) / (R800 + (R550 - 1.7 (R450 - R680)))

Gitelson et al. (1996)

DVI

Difference Vegetation Index

R800 - R680

Tucker et al. (1979)

GDVI

Green Difference Vegetation Index

R800 - R550

Sripada et al. (2006)

GNDVI

Green Normalized Difference Vegetation Index

(R800 - R550) / (R800 + R550)

Gitelson and Merzlyak (1998)

GRVI

Green Ratio Vegetation Index

R800 / R550

Sripada et al. (2006)

SR750_710

Simple Ratio 750/710 Zarco-Tejada & Miller 2001

R750/R710

Zarco-Tejada et al. (2001)

MSR705_445

Modified simple ratio 705/445

(R750 - R445)/(R705 - R445)

Sims and Gamon (2002)

Water index

R900 - R970

Penuelas. et al. (1993)

Chl index

Chlorophyll index

R750/R550

Gitelson and Merzlyak (1994)

NDVI705

Normalized Difference 750/705 Chl NDI

(R750 - R705)/(R750 + R705)

Gitelson and Merzlyak (1994)

ChlDela

Chlorophyll content

(R540 - R590)/(R540 + R590)

Delaieux et al. (2014)

FRI2

Fluorescence ratio indices 2

R740/R800

Dobrowski et al. (2005)

NDVI1

Normalized Difference Vegetation Index1

(R800 - R670)/( R800 + R670)

Rouse et al. (1973)

FRI1

Fluorescence ratio index1

R690/R600

Dobrowski et al. (2005)

OSAVI

Optimized Soil Adjusted Vegetation Index

(1 + 0.16) * (R800 - R670)/(R800 + R670 + 0.16)

Rondeaux et al. (1996)

NDRE

Normalized Difference 790/720 Normalized difference red edge index

(R790 - R720)/(R790 + R720)

Barnes et al. (2000)

Car1Black

Carotenoid index from Blackburn 1998

R800/R470

Blackburn (1998)

SIPI

Structure intensive pigment index

(R800 - R450)/(R800 + R650)

Penuelas. et al. (1995)

AntGitelson

Anthocyanin (Gitelson)

(1/R550 - 1/R700) * R780

Gitelson et al.(2003,2006)

Car2Black

Carotenoid index 2 from Blackburn 1998

(R800 - R470)/(R800 + R470)

Blackburn (1998)

PRI586

Photochemical reflectance index from Panigada et al 2014

(R531 - R586)/(R531 + R586)

Panigada et al. (2014)

AntGamon

Anthocyanin from Gamon and Surfus 1999

R650/R550

Gamon and Surfus (1999)

CarChap

Carotenoid index (Chappelle)

R760/R500

Chappelle et al. (1992)

PRI512

Photochemical reflectance index from Hernandez-Clemente et al 2011

(R531- R512)/(R531 + R512)

Hern√°ndez-Clemente et al. (2011)

TCARI_OSAVI

Transformed Chlorophyll Absorption in Reflectance Index/Optimized Soil-Adjusted Vegetation Index: TCARI/OSAVI

TCARI/OSAVI

Haboudane et al. (2002)

IPVI

Infrared Percentage Vegetation Index

R800 / (R800 + R680)

Crippen et al. (1990)

NLI

Non-Linear Index

(pow(R800, 2) - R680) / (pow(R800, 2) + R680)

Goel and Qin (1994)

MNLI

Modified Non-Linear Index

((pow(R800, 2) - R680) * 1.5f) / (pow(R800, 2) + R680 + 0.5f)

Yang et al. (2008)

SAVI

Soil Adjusted Vegetation Index

(1.5f * (R800 - R680)) / (R800 + R680 + 0.5f)

Huete et al. (1988)

TDVI

Transformed Difference Vegetation Index

sqrt(0.5f + ((R800 - R680) / (R800 + R680)))

Bannari et al. (2002)

VARI

Visible Atmospherically Resistant Index

(R550 - R680) / (R550 + R680 - R450)

Gitelson et al. (2002)

RENDVI

Red Edge Normalized Difference Vegetation Index

(R750 - R705) / (R750 + R705)

Gitelson and Merzlyak (1994)

mRESR

Modified Red Edge Simple Ratio Index

(R750 - R445) / (R750 + R445)

Sims and Gamon (2002)

mRENDVI

Modified Red Edge Normalized Difference Vegetation Index

(R750 - R705) / (R750 + R705 - 2.0f * R445)

Sims and Gamon (2002)

VOG1

Vogelmann Red Edge Index 1

R740 / R720

Vogelmann et al. (1993)

VOG2

Vogelmann Red Edge Index 2

(R734 - R747) / (R715 + R726)

Vogelmann et al. (1993)

VOG3

Vogelmann Red Edge Index 3

(R734 - R747) / (R715 + R720)

Vogelmann et al. (1993)

MCARI

Modified Chlorophyll Absorption Reflectance Index

((R700 - R670) - 0.2f (R700 - R550)) (R700 / R670)

Daughtry et al. (2000)

MCARI1

Modified Chlorophyll Absorption Reflectance Index Improved 1

1.2f (2.5f (R790 - R670) - 1.3f * (R790 - R550))

Haboudane et al. (2004)

MCARI2

Modified Chlorophyll Absorption Reflectance Index Improved 2

(1.5f (2.5f (R800 - R670) - 1.3f (R800 - R550))) / sqrt(pow(2.0f R800 + 1.0f, 2) - 6.0f R800 - 5.0f sqrt(R670) - 0.5f)

Haboudane et al. (2004)

MTVI

Modified Triangular Vegetation Index

1.2f (1.2f (R800 - R550) - 2.5f * (R670 - R550))

Haboudane et al. (2004)

MTVI2

Modified Triangular Vegetation Index Improved

1.5f (1.2f (R800 - R550) - 2.5f (R670 - R550)) / sqrt(pow(2.0f R800 + 1.0f ,2) - (6.0f R800 - 5.0f sqrt(R670)) - 0.5f)

Haboudane et al. (2004)

GMI1

Gitelson and Merzlak Index 1

R750 / R550

Gitelson and Merzlak (1997)

GMI2

Gitelson and Merzlak Index 2

R750 / R700

Gitelson and Merzlak (1997)

Lic1

Lichtenthaler Index 1

(R790 - R680) / (R790 + R680)

Lichtenthaler et al. 1996

Lic2

Lichtenthaler Index 2

R440 / R690

Lichtenthaler et al. 1996

Lic3

Lichtenthaler Index 3

R440 / R740

Lichtenthaler et al. 1996

NDNI

Normalized Difference Nitrogen Index

(log(1.0f / R1510) - log(1.0f / R1680)) / (log(1.0f / R1510) + log(1.0f / R1680))

Fourty et al. (1996)

MSR

Modified Simple Ratio

((R800 / R680) - 1.0f) / (sqrt(R800 / R680) + 1)

Chen et al. (1996)

LAI

Leaf Area Index

3.618f ((2.5.0f (R800 - R680)) / (R800 + 6.0f R680 - 7.5.0f R450 + 1.0f)) - 0.118f

Boegh et al. (2002)

NRI1510

Nitrogen Related Index NRI1510

(R1510 - R660) / (R1510 + R660)

Herrmann et al. (2009)

NRI850

Nitrogen Related Index NRI850

(R850 - R660) / (R850 + R660)

Behrens et al. (2006)

NDLI

Normalized Difference Lignin Index

(log(1.0f / R1754) - log(1.0f / R1680)) / (log(1.0f / R1754) + log(1.0f / R1680))

Melillo et al. (1982)

CAI

Cellulose Absorption Index

(0.5 * (R2000 - R2200)) / R2100

Daughtry et al. (2001)

PSRI

Plant Senescence Reflectance Index

(R680 - R500) / R750

Merzlyak et al. (1999)

CRI1

Carotenoid Reflectance Index 1

1.0f / R510 - 1.0f / R550

Gitelson et al. (2002)

CRI2

Carotenoid Reflectance Index 2

1.0f / R510 - 1.0f / R700

Gitelson et al. (2002)

ARI1

Anthocyanin Reflectance Index 1

1.0f / R550 - 1.0f / R700

Gitelson et al. (2001)

ARI2

Anthocyanin Reflectance Index 2

R800 * ((1.0f / R550) - (1.0f / R700))

Gitelson et al. (2001)

SRPI

Simple Ration Pigment Index

R430 / R680

Penuelas et al. (1995)

NPQI

Normalized Phaeophytinization Index

(R415 - R435) / (R415 + R435)

Barnes et al. (1992)

NPCI

Normalized Pigment Chlorophyll Index

(R680 - R430) / (R680 + R430)

Penuelas et al. (1994)

WBI

Water Band Index

R900 / R970

Penuelas et al. (1995)

NDWI

Normalized Difference Water Index

(R857 - R1241) / ( R700 + R1241)

Gao et al. (1995)

MSI

Moisture Stress Index

R819 / R1599

Hunt and Rock (1989)

NDII

Normalized Difference Infrared Index

(R857 - R1241) / ( R700 + R1241)

Hardisky et al. (1983)

NMDI

Normalized Multiband Drought Index

(R819 - R1649) / (R819 + R1649)

Wang and Qu (2007)

Healthy Index

((R534 - R698) / (R534 + R698)) - (R704 / 2.0f)

Mahlein et al. (2013)

CLSI

Cercospora Leaf Spot Index

((R698 - R570) / (R698 + R570)) - R734

Mahlein et al. (2013)

SBRI

Sugar Beet Rust Index

((R570 - R513) / (R570 + R513)) + (R704 / 2.0f)

Mahlein et al. (2013)

PMI

Powdery Mildew Index

((R520 - R584) / (R520 + R584)) + R724

Mahlein et al. (2013)

Crt1

Carter Index 1

R695 / R420

Carter (1994)

Crt2

Carter Index 2

R695 / R760

Carter (1996)

BIG2

Blue/Green Index

R450 / R550

Zarco-Tejada et al. (2005)

LSI

Leaf Structure Index

R1110 / R810

Maruthi Sridhar et al. (2007)

BRI

Browning Reflectance Index

((1.0f / R550) - (1.0f / R700)) / R800

Chivkunova et al. (2001)

Greenness Index

R554 / R677

Existing Data Standards

This page summarizes existing standards, conventions, controlled vocabularies, and ontologies used for the representation of crop physiological traits, agronomic metadata, sensor output, genomics, and other inforamtion related to the TERRA-REF project.

Metadata standards

International Consortium for Agricultural Systems Applications (ICASA)

The ICASA Version 2.0 data standard defines an abstract model and data dictionary for the representation of agricultural field expirements. ICASA is explicitly designed to support implementations in a variety of formats, including plain text, spreadsheets or structured formats. It is important to note that ICASA is both the data dictionary and a format used to describe experiments.

The Agricultural Model Intercomparison Project (AgMIP) project has developed a JSON-based format for use with the AgMIP Crop Experiment (ACE) database and API.

Currently, the ICASA data dictionary is represented as a Google Spreadsheet and is not suitable for linked-data applications. The next step is to render ICASA in RDF for the TERRA-REF project. This will allow TERRA-REF to produce data that leverages the ICASA vocabulary as well as other external or custom vocabularies in a single metadata format.

The ICASA data dictionary is also being mapped to various ontologies as part of the Agronomy Ontology project. With this, it may be possible in the future to represent ICASA concepts using formal ontologies or to create mappings/crosswalks between them.

See also:

White et al (2013). Integrated Description of Agricultural Field Experiments and Production: The ICASA Version 2.0 Data Standards. Computers and Electronics in Agriculture.
AgMIP JSON Data Objects format description
ICASA Master Variable List

Minimum Information About a Plant Phenotyping Experiment (MIAPPE)

MIAPPE was developed by members of the European Phenotyping Network (EPPN) and the EU-funded transPLANT project. It is intended to define a list of attributes necessary to fully describe a phenotyping experiment.

The MIAPPE standard is available from the transPlant standards portal and is compatible with the ISA-Tools suite framework. The transPLANT standards portal also provides example configuration for the ISA toolset.

Section

Recommended ontologies

General metadata

Ongtology for Biomedical Investigations (OBI), Crop Research Ontology (CRO)

Timing and location

OBI, Gazetteer (GAZ)

Biosource

UNIPROT taxonomy, NCBI taxonomy

Environment, treatments

XEO Environment Ontology, Ontology of Environmental Features (ENVO), CRO

Experimental design

OBI, CRO, Statistics Ontology (STATO)

Observed values

Trait Ontology (TO), Plant Ontology (PO), Crop Ontology (CO), Phenotypic Quality Ontology (PATO), XEO/XEML

MIAPPE is currently the only standard listed in biosharing.org for the phenotyping domain. While several databases claim to support MIAPPE, the standard is still nascent.

MIAPPE is based on the ISA framework, building on earlier “minimum information” standards, such as MIAME (Minimum Information about a Microarray Experiment). If the MIAPPE standard is determined to be useful for TERRA-REF, it would be worth reviewing the MIAME steandard and related formats such as MAGE-TAG, MINiML, and SOFT accepted by the Gene Expression Omnibus (GEO). GEO is a long-standing repository for genetic research data and might serve as another model for TERRA-REF.

It is worth noting that linked-data methods are supported but optional when depositing data to GEO. The MAGE-TAB format, similar to the MIAPPE ISA Tab format, does support sources for controlled vocabulary terms or ontologies.

See also:

Minimum Information about a Plant Phenotyping Experiment

Dublin Core Application Profiles

While some communities define explicit metadata schema (e.g., Ecological Metadata Language), another approach is the use of "application profiles." An application profile is declaration of metadata terms adopted by a community or an organization along with the source of the terms. Application profiles are composed of terms drawn from multiple vocubularies or ontologies to define a "schema" or "profile" for metadata. For example, the Dryad metadata profile draws on the Dublin Core, Darwin Core, and Dryad-specific elements.

See also:

DCMI Guidelines for Dublin Core Application Profiles.
Example Dryad Metadata Profile
DCMI Singapore Framework

Trait Dictionary Format (Crop Ontology)

The Crop Ontology curation tool supports import and export of trait information in a trait dictionary format.

See also:

The Crop Ontology Improving the Quality of 18 Crop Trait Dictionaries

Vocabularies and Ontologies

This section reviews related controlled vocabularies, data dictionaries, and ontologies.

Biofuel Ecophysiological Traits and Yields Database (BETYdb)

While BETYdb is not a controlled vocabulary itself, the relational schema models a variety of concepts including managements, sites, treatments, traites, and yields.

The BETYdb “variables” table defines variables used to represent traits in the BETYdb relational model. There has been some effort to standardize variable names by adopting Climate Forecasting (CF) convention standard names where variables overlap. A variable is represented as a name, description, units, as well as min/max values.

For example:

"variable": {
    "created_at": "2016-03-07T11:23:58-06:00",
    "description": "",
    "id": 604,
    "label": "",
    "max": "1000",
    "min": "0",
    "name": "NDVI",
    "notes": "",
    "standard_name": "normalized_difference_vegetation_index",
    "standard_units": "ratio",
    "type": "",
    "units": "ratio",
    "updated_at": "2016-03-07T11:26:07-06:00"
}

See also:

DCMI Metadata terms

Controlled vocabulary for the representation of bibliographic information. See also:

DCMI Terms

Climate and Forecast Standard Name Table

Standard variable names and naming convention for use with NetCDF. The Climate and Forecast metadata conventions are intended to promote sharing of NetCDF files. The CF conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.

Basic conventions include lower-case letters, numbers, underscores, and US spelling.

Information is encoded in the variable name itself. The basic format is (optional components in []):

[surface] [component] standard_name [at surface] [in medium] [due to process] [assuming condition]

For example:

Standard names have optional canonical units, AMIP and GRIB (GRidded Binary) codes.

The CF standard names have been converted to RDF by several communities, including the Marine Metadata Interoperability (MMI) project.

Dimensions: time, lat, lon, other specify time first (unlimited) lat, lon or x, y extent to field boundaries.

See also:

CF Conventions
CF Conventions FAQ mentions RDF conversions.

ICASA master variable list

Vocabulary and naming conventions for agricultural modeling variables, used by AgMIP. The ICASA master variable list is included, at least in part, in the AgrO ontology. The NARDN-HD Core Harmonized Crop Experiment Data is also taken from the ICASA vocabulary.

ICASA variables have a number of fields, including name, description, type, min and max values.

See also:

ICASA Master Variable List
White et al (2013). Integrated Description of Agricultural Field Experiments and Production: The ICASA Version 2.0 Data Standards. Computers and Electronics in Agriculture.

NARDN-HD Core Harmonized Crop Experiment Data

A subset of the ICASA data dictionary representing set of core variables that are commonly collected in field crop experiments. These will be used to harmonize data from USDA experiments as part of a National Agricultural Research Data Network.

CSDMS Standard Names

Variable naming rules and patterns for any domain developed as part of the CSDMS project as an alternative to CF. CSDMS standard names is considered to have a more flexible community approval mechanism than CF. CSDMS names include object, quantity/attribute parts.

CSDMS names have been converted to RDF as part of the Earth Cube Geosemantic Server project.

See also:

CSMDS Standard Names

International Plant Names Index (IPNI)

http://www.ipni.org/

IPNI is a database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. It's goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names.

NCBI Taxonomy

http://www.ncbi.nlm.nih.gov/taxonomy

A curated classification and nomenclature for all of the organisms in the public sequence databases that represents about 10% of the described species of life on the planet. Taxonomy recommended by MIAPPE.

Ontologies

Agronomy Ontology (AGRO)

The Agronomy Ontology “describes agronomic practices, agronomic techniques, and agronomic variables used in agronomic experiments.” It is intended as a complementary ontology to the Crop Ontology (CO). Variables are selected out of the International Consortium for Agricultural Systems Applications (ICASA) vocabulary and a mapping between AgrO and ICASA is in progress. AgrO is intended to work with the existing ontologies including ENVO, UO, PATO, IAO, and CHEBI. It will be part of an Agronomy Management System and fieldbook modeled on the CGIAR Breeding Management System to capture agronomic data.

See also:

OBO Foundry. Agonomy Ontology
FAO. Crop Ontology: harmonizing semantics for phenotyping and agronomy data
RDA. Interest Group on Agricultural Data (IGAD)

Crop Ontology (CO)

The Crop Ontology (CO) contains "Validated concepts along with their inter-relationships on anatomy, structure and phenotype of crops, on trait measurement and methods as well as on Germplasm with the multi-crop passport terms." The ontology is actively used by the CGIAR community and a central part of the Breeding Management System. MIAPPE recommends the CO (along with TO, PO, PATO, XEML) for observed variables.

Shrestha et al (2012) describe a method for representing trait data via the CO.

See also:

Crop Ontology
Shrestha et al (2012). Bridging the phenotypic and genetic data useful for integrated breeding through a data annotation using the Crop Ontology developed by the crop communities of practice. Front Physiol. 2012 Aug 25;3:326.

Crop Research Ontology (CRO)

Describes experimental design, environmental conditions and methods associated with the crop study/experiment/trial and their evaluation. CRO is part of the Crop Ontology platform, originally developed for the International Crop Information System (ICIS). CRO is recommended in the MIAPPE standard for general metadata, environment, treatments, and experimental design fields.

See also:

Extensible Observation Ontology (OBOE)

Cited in Kattge et al (2011) as an example of an ontology used in ecology and environmental sciences to represent measurements and observation. However, the CRO may be better suited for TERRA-REF.

See also:

Kattge, J.(2011). A generic structure for plant trait databases

Gene Ontology (GO)

Defines concepts/classes used to describe gene function, and relationships between these concepts. GO is a widely-adopted ontology in genetics research, supported by databases such as GEO. This ontology is cited in Krajewski et al (2015) and might be relevant for the TERRA genomics pipeline.

See also:

Gene Ontology
Krajewski et al (2015). Towards recommendations for metadata and data handling in plant phenotyping. Journal of Experimental Botany, 66(18), 5417–5427.

Information Artifact Ontology (IAO)

Information entities, originally driven by work by OBI (e.g., abstract, author, citation, document etc). IAO covers similar territory to the Dublin Core vocabulary.

Ontology for Biomedical Investigations (OBI)

Integrated ontology for the description of biological and clinical investigations. This includes a set of 'universal' terms, that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain. Recommended by MIAPPE for general metadata, timing and location, and experimental design.

See also:

Minimum Information about a Plant Phenotyping Experiment

Phenotype and Attribute Ontology (PATO)

Phenotypic qualities (properties).

Recommended in MAIPPE for use in the observed values field.

See also:

Minimum Information about a Plant Phenotyping Experiment

Plant Environment Ontology (EO)

Part of the Plant Ontology (PO), standardized controlled vocabularies to describe various types of treatments given to an individual plant / a population or a cultured tissue and/or cell type sample to evaluate the response on its exposure.

Plant Ontology (PO)

Describes plant anatomy and morphology and stages of development for all plants intended to create a framework for meaningful cross-species queries across gene expression and phenotype data sets from plant genomics and genetics experiment. Recommended by MIAPPE for observed values fields. Along with EO, GO, and TO make up the Gramene database. Links plant anatomy, morphology and growth and development to plant genomics data.

See also:

Minimum Information about a Plant Phenotyping Experiment

Plant Trait Ontology (TO)

Along with EO, GO, and PO, make up the Gramene database to link plant anatomy, morphology and growth and development to plant genomics data. Recommended by MIAPPE for observed values fields.

Example trait entry:

[Term]
id: TO:0000019
name: seedling height
def: "Average height measurements of 10 seedlings, in centimeters from the base of the shoot to the tip of the tallest leaf blade." [IRRI:SES]
synonym: "SH" RELATED []
is_a: TO:0000207 ! plant height

See also:

Minimum Information about a Plant Phenotyping Experiment

Statistics Ontology (STATO)

General purpose statistics ontology coveraging processes such as statistical tests, their conditions of application, and information needed or resulting from statistical methods, such as probability distributions, variables, spread and variation metrics. Recommended by MIAPPE for experimental design.

See also:

Minimum Information about a Plant Phenotyping Experiment

Units of Measurement Ontology (UO)

Metric units for PATO. This OBO ontology defines a set of prefixes (giga, hecto, kilo, etc) and units (area/square meter, volume/liter, rate/count per second, temperature/degree Fahrenheit). The two top-level classes are prefixes and units.

UO is mentioned in relation to the Agronomy Ontology (AGRO), but PATO is also recommended by MIAPPE for observed values fields

While there are general standard units, it seems unlikely that these would ever be gathered in a single place. It seems more useful to define a high-level ontology to represent a "unit" and allow domains and communities to publish their own authoritative lists.

XEML Environment Ontology (XEO)

Created to help plant scientists in documenting and sharing metadata describing the abiotic environment.

DDI-RDF Discovery Vocabulary

Data Catalog Vocabulary (DCAT)

The Data Catalog Vocabulary is an RDF vocabulary intended to facilitate interoperability between data catalogs published on the Web. DCAT defines a set of classes including Dataset, Catalog, CatalogRecord, and Distribution.

Data Cite Ontology

The DataCite Ontology

Data Cube Vocabulary

The Data Cube Vocabulary is an RDF-based model for publishing multi-dimentional datasets, based in part on the SDMX guidelines. DataCube defines a set of classes including DataSet, Observation, and MeasureProperty that may be relevant to the TERRA project.

Statistical Data and Metadata Exchange (SDMX)

SDMX is an international initiative for the standarization of the exchange of statistical data and metadata among international organizations. Sponsors of the initiative include Eurostat, European Central Bank, the OECD, World Bank and the UN Statistical Division. They have defined a framework and an exchange format, SDMX-ML, for data exchange. Community members have also developed RDF encodings of the SDMX guidelines that are heavily referenced in the Data Cube vocabulary examples.

Standard formats, ontologies, and controlled vocabularies are typically used in the context of specific software systems.

Agricultural Model Inter-Comparison and Improvement Project (AgMIP) Crop Experiment (ACE) Database

AgMIP "seeks to improve the capability of ecophysiological and economic models to describe the potential impacts of climate change on agricultural systems. AgMIP protocols emphasize the use of multiple models; consequently, data harmonization is essential. This interoperability was achieved by establishing a data exchange mechanism with variables defined in accordance with international standards; implementing a flexibly structured data schema to store experimental data; and designing a method to fill gaps in model-required input data."

The data exchange format is based on a JSON rendering of the ICASA Master Variable List. Data are transfer into and out of the AgMIP Crop Experiment (ACE) and AgMIP Crop Model (ACMO) databases via REST apis using these JSON objects.

See also

AgMIP Crop Expirement Database
Porter et al (2014). Harmonization and translation of crop modeling data to ensure interoperability. Environmental Modelling and Software. 62:495-508.
AgMIP Data Products presentation
AgMIP on Github
AgMIP Crop Experiment Database data variables
AgMIP API
AgMIP using ICASA standards

Biofuel Ecophysiological Traits and Yields Database (BETYdb)

BETYdb is used to store TERRA meta-data, provenance, and traits information.

BETYdb traits are available as web-page, csv, json, xml. This can be extended to allow spatial, temporal, and taxonomic / genomic queries. Trait vectors can be queries and rendered in several output formats. For example:

Here are some examples from betydb.org.

A separate instance of BETYdb is maintained for use by TERRA Ref at terraref.ncsa.illinois.edu.org/bety. The scope of the TERRA Ref database is limited to high througput phenotyping data and metadata produced and used by the TERRA program. Users can set up their own instances of BETYdb and import any public data in the distributed BETYdb network.

See also: BETYdb documentation

BETYdb Data Access includes accessing data with web interface, API, and R traits package
BETYdb constraints, see section "uniqueness constraints"
BETYdb Data Entry

Gramene

Gramene is a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species

Integrated Breeding Platform/Breeding Management System

System for managing the breeding process including lists of germplasms, defining crosses, managing nurseries, trials, as well as ontologies and statistical analysis.

See also:

BMS Site

TERRA Ref has an instance of BMS hosted by CyVerse (requires login).

International Crop Information System

ICIS is "a database system that provides integrated management of global information on crop improvement and management both for individual crops and for farming systems." ICIS is developed by Consultative Group for International Agricultural Research (CGIAR).

See also

Fox and Skovmand (1996). "The International Crop Information System (ICIS) - connects genebank to breeder to farmer’s field." Plant adaptation and crop improvement, CAB International.

MODAPS NASA MODIS Satellite data

The MODAPS NASA MODIS Satellite data encompasses a library of functions that provides programmatic data access and processing services to MODIS Level 1 and Atmosphere data products. These routines enable both SOAP and REST based web service calls against the data archives maintained by MODAPS. These routines mirror existing LAADS Web services.

See also:

NDISC Modis Data Summaries

Phenomics Ontology Driven Database (PODD)

http://www.plantphenomics.org.au/projects/podd/ Online repository for storage and retrieval of raw and analyzed data from Australian Plant Phenomics Facility (APPF) phenotyping platforms. PODD is based on Fedora Commons repository software with data and metadata modeled using OWL/RDFS.

See also:

PODD Project Site

Plant Breeders API

Specifies a standard interface for plant phenotype/genotype databases to serve data for use in crop breeding applications. This is the API used by FieldBook, which allows users to turn spreadsheets into databases. Examples indicate that the responses will include values linked to the Crop Ontology, for example:

https://github.com/plantbreeding/API/blob/master/Specification/Traits/ListAllTraits.md

However, in general the BRAPI returned JSON data without linking context (i.e., not JSON-LD), so it is in essence it’s own data structure.

Other notes:

The Breeding Management System (BMS) group has implemented a few features to make it compatible with Field Book in its current state without the use of API.
BMS and the Genomic & Open-source Breeding Informatics Initiative (GOBII) are both pushing for the API and plan on implementing it when it's complete.
Read news about the BMS Breeding Management System Standalone Server and genomes2fields migrating to BMS

See also

Plant Breeding API

Plant Genomics and Phenomics Research Data Repository (PGP)

German repository for plant research data including image collections from plant phenotyping and microscopy, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry as well as software and documents.

See also:

Arend et al (2016). PGP repository: a plant phenomics and genomics data publication infrastructure. Database.
PGP Repository

USDA Plants

“The PLANTS Database provides standardized information about the vascular plants, mosses, liverworts, hornworts, and lichens of the U.S. and its territories. It includes names, plant symbols, checklists, distributional data, species abstracts, characteristics, images, crop information, automated tools, onward Web links, and references.”

See also

USDA Plants Website

USDA Quick Stats

Web based application supports querying the agricultural census and survey statistics. Also available via API.

See also

USDA Quick Stats Website

transPLANT

Infrastructure to support computational analysis of genomic data from crop and model plants. This includes the large-scale analysis of genotype-phenotype associations, a common set of reference plant genomic data, archiving genomic variation, and a search engine integrating reference bioinformatics databases and physical genetic materials. See also

transPlant Website

Sensor Data

Meteorological data

Proposed format for meteorological variables exported from Lemnatec platform

Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP) data formats

One implementation of CF for ecosystem model driver (met, soil) and output (mass, energy dynamics)
- Standardized Met driver data
- Terrestrial Ecosystem Model output

Date-Time:

YYYY-MM-DD hh:mm:ssZ: based on ISO 8601 . Optional offset for local time; precision determined by data (e.g. could be YYYY-MM-DD and decimals specified by a period.

Submitting data to BETYdb

Submitting Data to BETYdb

BETYdb is a database used to centralize data from research done in all TERRA projects. (It is also the name of the Web interface to that database.) Uploading data to BETYdb will allow everyone on the team access to research done on the TERRA project.

Preliminary steps

Before submitting data to BETYdb, you must first have an account.

Go to the BETYdb homepage.
Click the "Register for BETYdb" button to create an account. If you plan to submit data, be sure to request "Creator" page access level when filling out the sign-up form.
Understand how the database is organized and what search options are avaible. Do this by exploring the data using the Data tab (see next section).

Exploring the data

The Data tab contains a menu for searching the database for different types of data. The Data tab is also the pathway to pages allowing you to add new data of your own. But if you have a sizable amount of trait or yield data you wish to submit, you will likely want to use the Bulk Upload wizard (see below).

As an example, try clicking the Data tab and selecting Citations, the first menu item. A page with a list of citations that have already been uploaded into the system appears.

Citations are listed by the first author's last name. For example a journal article written by Andrew Davis and Kerri Shaw would have the name "Davis" in the author slot.

Use the search box located in the top right corner of the page to search for citations by author, year, title, journal, volume, page, URL, or DOI. Note that the search string must exactly match a substring of the value of one of these items (though the matching is case-insensitive).

Each of the other collections listed in the Data menu may be searched similarly. For example, on the Cultivars page you can search cultivars in the system by searching for them by any of several facets pertaining to cultivars, including the name, ecotype, associated species, even the notes. Keep in mind that when switching to a new Data menu item (such as Cultivars), the resulting page will initially show all items of the type selected that are currently on file. (More precisely, since results are paginated, it will show the first twenty-five of those results.)

Preparing for bulk upload of data

The Bulk Upload wizard expects data in CSV format, with one row for each set of associated data items. ("Associated data items" usually means a set of measurements made on the same entity at the same time.) Each trait or yield data item must be associated with a citation, site, species, and treatment and may be associated with a specific cultivar of the associated species. Before you can upload data from a data file, this associated citation, site, species, cultivar, and treatment information must already be in place.

Moreover, if you are uploading trait data, your CSV data file must have one or more trait variable columns (and optionally, one or more covariate variable columns), and the names of these columns must match the names of existing variables. (See the discussion of variables below.)

Details on adding associated data

There is no bulk upload process for adding citations, site, species, cultivars, treatment, and variables to the database. They must be added one at a time using Web forms. Since most often a set of dozens or hundreds of traits is associated with a single citation, site, or species (etcetera), usually this is not an undue burden.

Details on checking that items of each particular type exist (and adding them if they don't) follow:

Citations: To check that the needed citations exist, go to the citations listing by clicking Citations in the Data menu. Search for your citation(s) to determine if all citations associated with your data already exist. If they don't, then create new citations as needed. Be sure to fill in all the required data; author, year, and title are required; if at all possible, include the journal name, volume, page numbers, and DOI. (You must include the DOI if that is what your data files uses to identify citations.)

Sites: Go to the Data tab and click on Sites to verify that all sites in your data file are listed on the Sites page. If any of your sites are not already in the system, you will need to add them to the database. To do this, first search the citations list for the associated citation, select it (by clicking the checkmark in the row where it is listed) and then click the New Site button. A new site must have a name, but if possible, supply other information—the city, state, and country where the site is located, the latitude, longitude, and altitude of the site, and possibly climate and soil data.

It is possible that sites referenced by your data are already in the database but that they aren't yet associated with the citation associated with that data. To see the set of sites associated with a given citation, find the citation in the citations list and select it by clicking the checkmark in its row. This will take you to the Listing Sites page; all of the sites associated with the selected citation (if any) will be listed at the top. To associate another site with the selected citation, enter its name in the search box, find the row containing it, and click the "link" action in that row.

Treatments: The treatment specified for each of your data items must not only match the name of an existing treatment, it must also be associated with the citation for the data item. To see the list of treatments associated with a particular citation, select the citation as in the instructions for Sites. Then click the Treatments link on the Listing Sites page. The top section of this page lists all treatments associated with the selected citation.

Currently, there is no way to associate an arbitrary treatment with a citation via the Web interface. You will either have to make a new treatment with the desired name (after the desired citation has been selected), or you will have to (or have an administrator) modify the database directly.

Species: To check that the needed species entries exist, go to the the species listing by clicking Species in the Data menu. Search for each of the species required by your data. The species entry in the CSV file must match the scientific name (Latin name) of the species listed in the database. If necessary, add any species in your data that has not yet been added to the database. When adding a species, scientificname is the only required field, but the genus and species fields should be filled out as well.

Cultivars: If your data lists cultivars, you should check that these are in the database as well. Cultivar names are not necessarily unique, but they are unique within a given species. To check whether a cultivar matching the name and species listed in your CSV file has been added to the database, go to the cultivar listing by clicking Cultivars in the Data menu. Searching either by species name or cultivar name should quickly determine if the needed cultivar exists. If it needs to be added, click the New Cultivar button. Fill in the species search box with enough of the species name to narrow down the result list to a workable size, and then select the correct species from the result list immediately below the search box. Then type the name of the cultivar you wish to add in the Name field. The Ecotype and Notes sections are optional.

Variables: If you are submitting trait data, verify that the variables associated with each trait and each covariate match the names of variables in the system (for example, canopy_height, hull_area, or solidity). To do this, go to the Data tab and click on Variables. If any of your variables are not already in the system, you will need to add them.

For a variable to be recognized as a trait variable or covariate, it is not enough for it simply to be in the variables table; it must also be in the trait_covariate_associations table. To check which variables will be recogized as trait variables or covariates, click on the Bulk Upload tab. Then click the link View List of Recognized Traits. This will bring up a table that lists all names of variables recognized as traits and the names of all variables recognized as required or optional covariates for each trait. If you need to add to this table and do not have direct access to the underlying database to which you are submitting data, you will need to e-mail the site adminstrator to request additions. (See the "Contact Us" section in the footer of the BETYdb homepage.)

The Bulk Upload Wizard

Once you have entered all the necessary data to prepare for a bulk data upload, you can then begin the bulk upload process.

There are some key rules for bulk uploading:

Templates To help you get started, some data file templates are available. There are four different templates to choose from.
- yields_template_by_citation_author_year_title.csv
  Use this template if you are uploading yields and you wish to specify the citations by author, year, and title.
- yields_template_by_citation_doi.csv
  Use this template if you are uploading yields and you wish to specify the citations by DOI.
- traits_template_by_citation_author_year_title.csv
  Use this template if you are uploading traits and you wish to specify the citations by author, year, and title.
- traits_template_by_citation_doi.csv
  Use this template if you are uploading traits and you wish to specify the citations by DOI.
These "templates" consist of a single line of text showing a typical header row for a CSV file. In the traits templates, the headings of the form "[trait variable 1]" or "[covariate 1]" must be replaced with actual variable names corresponding to a trait variable or covariate, respectively.
These templates show all possible columns that may be included. In most cases, fewer columns will be needed and the unneeded column headings should be removed. The only programmatically required headings are "yield" (for uploads of yield data), or, for uploads of trait data, the name of at least one recognized trait variable. All other data required for an upload—the citation, site, species, treatment, access level, and date—may be specified interactively, provided that they have a uniform value for all of the trait or yield data in the file being uploaded. (Specification of a cultivar is not required, but it too may be specified interactively if it has a uniform value for all of the data in the file.)
Matching It is important that text values and trait or covariate column names in the data file match records in the database. This includes variable names, site names, species and cultivar names, etc. Note, however, that matching is somewhat lax: the matching is done case-insensitively, and extraneous spaces in values in the data file are ignored.
Some special cases of note: In the case of citation_title, the supplied value need only match an initial substring of the title specified in the database as long as the combination of author, year, and the initial portion of the title uniquely identifies a citation stored in the database. (The value for citation_title may even be empty if the author and year together uniquely identify a citation!) And in the case of species names, the letter 'x' may be used to match the times symbol '×' used in names of hybrid species.
Column order The order of columns in the data file is immaterial; in making the template files, an arbitrary order was chosen. But because the data in the data file is displayed for review during the bulk upload process, it may be that some orderings are easier to work with than others.
Quotation rules Since commas are used to delineate columns in CSV files, any data value containing a comma must be surrounded by double quotes. (Single quotes are interpreted as part of the value!) If the value itself contains a double-quote, this double-quote must be doubled ("") in addition to surrounding the value with double quotes.
Character encoding Non-ASCII characters must use UTF-8 encoding.
Blank lines There can be no blank lines in the file, either between data rows or at the end of the file.

Troubleshooting data files

Immediately after uploading a data file (or after specifying the citation if this is done interactively), the Bulk Upload Wizard tries to validate the uploaded file and displays the results of this validation.

The types of errors one may encounter at this stage fall into roughly three categories:

Parsing errors
These are errors at the stage of parsing the CSV file, before the header or data values are even checked. An error at this stage returns one to the file-upload page.
Header errors
These are errors caused by having an incongruous set of headings in the header row. Here are some examples:
1. There is citation_author column heading without a corresponding citation_year and citation_title heading. It is an error to use one of these headings without the other two.
2. There is both a citation_doi heading and a citation_author, citation_year, or citation_title heading. If citation_doi is used, none of the other citation-related headings is allowed.
3. There is an SE heading without an n heading or vice versa.
4. There is neither a yield heading nor a heading corresponding to a recognized trait variable.
5. There is both a yield heading and a heading corresponding to a recognized trait variable. A data file can be used to insert data into the traits table or the yields table but not both at once.
6. There is a cultivar heading but no species heading.
If any of these errors occur, validation of data values will not proceed.
There may be other errors associated with the header row that aren't treated as errors as such. For example, if you intend to supply two trait variables per row but misspell one of them, the data in the column headed by the misspelled variable name will simply be ignored. That column will be grayed-out, but the file may still be used to insert data corresponding to the "good" variable (provided there are no other errors). In other words, if you ignore the "ignored column" warning and the gray highlighting, you may end up uploading only a portion of the data you intended to upload.
Value errors
If there are no file-parsing errors or header errors, the Bulk Upload wizard will proceed to validate data values. Valid values will be highlighted in green. Ignored columns will be highlighted in gray. (This will warn you, for example, if you have misspelled the name of a trait variable.) Other colors signify various sorts of errors. A summary of errors is shown at the top of the page with links to rows in which the various errors occur.
1. Matching value errors
  Each row of the CSV file must be associated with a unique citation, site, species, and treatment and may be associated with a unique cultivar. These associations may either be specified in the CSV file or, if a particular association is constant for all rows of the file, it may be specified interactively. If they are specified in the file, problems that may arise include:
  - The combination of values for citation_author, citation_year, and citation_title do not uniquely identify a citation in the database. (This may be because there are no matches or too many (i.e., more than one) matches. (There should never be multiple database rows having the same combination of author, year, and title, but this is not currently enforced.)
  - The value for citation_doi does not uniquely match a citation in the database. (Again, citation DOIs should be unique, but the database schema doesn't enforce this.)
  - The value for site does not uniquely match the sitename of a site in the database. (site.sitename should be unique, but this again is not enforced.)
  - The site specified in a given row is not consistent with the citation specified in that row. (If you visit the "Show" page for the site, you should see the citation listed at the top of the page right under Viewing Site.)
  - The value for species does not match the value of scientificname for a unique row of the species table. (species.scientificname should be unique, but the database scheme doesn't currently enforce this.)
  - The value for treatment does not match the value of the name of any treatment row in the database.
  - The value for treatment in a particular row matches one or more treatments in the database, but none are associated with the citation specified by that row.
  - The value for treatment in a particular row matches more that one treatment in the database that is associated with the citation specified by that row. (This error is rare. Names of treatments associated with a particular citation should be unique, but this is not yet enforced.)
  - The value for cultivar specified in a particular row is not consistent with the species specified in that row.
2. Other value errors, not having to do with associated attributes of the data, are as follows:
  - A value for a trait is out of range. An obvious example would be giving a negative number as the value for annual yield. If a variable value is flagged as being out of range, double check the data. If you determine that the value is indeed correct, you should request to have the range in the database adjusted for that variable.
  - A value for the measurement date is not in the correct format or is out of range.
  - A value for the access level is not 1, 2, 3, or 4.
  - A value of the wrong type is given. Examples would be giving a text value for yield or a floating point number for n.

After successful validation

Global options and values

If there are no errors in the data file, the bulk upload will proceed to a page allowing you to choose rounding options for your data values. You may choose to keep 1, 2, 3, or 4 significant digits, 3 being the default. If your data includes a standard error (SE) column, you may separately specify the amount of rounding for the standard error. Here the default is 2 significant digits.

If you did not specify all associated-data values and or did not specify an access level in the data file itself, this page will also allow you to specify a uniform global value for any association not specified in the file; and it will allow you to specify a uniform access level if your data file did not have an access_level column.

Verification page

Once you have specified global options and values, you will be taken to a verification page that will summarize the global options you have selected and the associations you specified for your data. The latter will be presented in more detail than any specification in your data file or on the Upload Options and Global Values page. For example, when summarizing the sites associated with your data, not only are the site names listed, but the city, state, country, latitude, longitude, soil type, and soil notes are also displayed. This will help ensure that the citations, sites, species, etc. that you specified are really the ones that you intended.

Once you have verified the data, clicking the Insert Data button will complete the upload. The insertions are done in an SQL transaction: if any insertion fails, the entire transaction is rolled back.