Data Standards

Overview

TERRA’s data standards facilitate the exchange of genomic and phenomic data across teams and external researchers. Applying common standards makes it easier to exchange analytical methods and data across domains and to leverage existing tools.

When practical, existing conventions and standards have been used to create data standards. Spatial data follow Federal Geographic Data Committee (FGDC) and Open Geospatial Consortium (OGC) data and metadata standards. The Climate and Forecast (CF) variable naming convention was adopted for meteorological and biophysical data. Data formats and variable naming conventions were adapted from NEON and NASA.

Feedback from data creators and users was used to define data formats, semantics, interfaces, and representations of space, time, and genetic identity, based on existing standards, commonly used file formats, and user needs.

We anticipate that standards and data formats will evolve over time as we clarify use cases, develop new sensors and analytical pipelines, and build tools for data format conversion, feature extraction, and provenance tracking. Each year we will reconvene to assess our standards based on user needs. The Standards Committee will weigh the upfront cost of adoption against the long-term value of the data products, algorithms, and tools that will be developed as part of the TERRA program. The specifications for these data products will be developed iteratively over the course of the project in coordination with TERRA-funded projects. The focus will be on taking advantage of existing tools based on these standards and on developing data translation interfaces where necessary.

Existing Data Standards

This page summarizes existing standards, conventions, controlled vocabularies, and ontologies used for the representation of crop physiological traits, agronomic metadata, sensor output, genomics, and other information related to the TERRA-REF project.

Metadata standards

International Consortium for Agricultural Systems Applications (ICASA)

The ICASA Version 2.0 data standard defines an abstract model and data dictionary for the representation of agricultural field experiments. ICASA is explicitly designed to support implementations in a variety of formats, including plain text, spreadsheets, and structured formats. It is important to note that ICASA is both a data dictionary and a format used to describe experiments.

The Agricultural Model Intercomparison and Improvement Project (AgMIP) has developed a JSON-based format for use with the AgMIP Crop Experiment (ACE) database and API.

Currently, the ICASA data dictionary is represented as a Google Spreadsheet and is not suitable for linked-data applications. The next step is to render ICASA in RDF for the TERRA-REF project. This will allow TERRA-REF to produce data that leverages the ICASA vocabulary as well as other external or custom vocabularies in a single metadata format.
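As a rough illustration of what rendering the dictionary in RDF could involve, the Python sketch below converts rows from a CSV export of a spreadsheet-style data dictionary into Turtle. The column names (`code`, `description`, `unit`), the `https://example.org/icasa#` namespace, and the `icasa:unit` property are hypothetical placeholders, not the actual ICASA columns or an agreed TERRA-REF vocabulary; a production conversion would more likely use a dedicated RDF library.

```python
import csv
import io

# Hypothetical namespace for ICASA terms; no IRI scheme has been fixed here.
ICASA_NS = "https://example.org/icasa#"

PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    f"@prefix icasa: <{ICASA_NS}> .\n"
)


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and newlines for a Turtle "..." literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def row_to_turtle(row: dict) -> str:
    """Render one dictionary row as Turtle.

    Assumes hypothetical columns 'code', 'description', 'unit', and that
    'code' is a simple identifier usable as a Turtle local name.
    """
    return (
        f"icasa:{row['code']} a rdf:Property ;\n"
        f'    rdfs:label "{escape_literal(row["code"])}" ;\n'
        f'    rdfs:comment "{escape_literal(row["description"])}" ;\n'
        f'    icasa:unit "{escape_literal(row["unit"])}" .\n'
    )


def dictionary_to_turtle(csv_text: str) -> str:
    """Convert a CSV export of the data dictionary into one Turtle document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(row) for row in rows)


if __name__ == "__main__":
    sample = "code,description,unit\nexample_var,An illustrative variable,kg/ha\n"
    print(dictionary_to_turtle(sample))
```

Once terms have IRIs like these, metadata records can mix ICASA terms with terms from other vocabularies in a single RDF graph.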

The ICASA data dictionary is also being mapped to various ontologies as part of the Agronomy Ontology project. With this, it may be possible in the future to represent ICASA concepts using formal ontologies or to create mappings/crosswalks between them.

See also:

White et al (2013). Integrated Description of Agricultural Field Experiments and Production: The ICASA Version 2.0 Data Standards. Computers and Electronics in Agriculture.
AgMIP JSON Data Objects format description
ICASA Master Variable List

Minimum Information About a Plant Phenotyping Experiment (MIAPPE)

MIAPPE was developed by members of the European Plant Phenotyping Network (EPPN) and the EU-funded transPLANT project. It is intended to define a list of attributes necessary to fully describe a phenotyping experiment.

The MIAPPE standard is available from the transPLANT standards portal and is compatible with the ISA-Tools framework. The transPLANT standards portal also provides an example configuration for the ISA toolset.

MIAPPE recommends the following ontologies for each section of an experiment description:

| Section | Recommended ontologies |
| --- | --- |
| General metadata | Ontology for Biomedical Investigations (OBI), Crop Research Ontology (CRO) |
| Timing and location | OBI, Gazetteer (GAZ) |
| Biosource | UNIPROT taxonomy, NCBI taxonomy |
| Environment, treatments | XEML Environment Ontology (XEO), Ontology of Environmental Features (ENVO), CRO |
| Experimental design | OBI, CRO, Statistics Ontology (STATO) |
| Observed values | Trait Ontology (TO), Plant Ontology (PO), Crop Ontology (CO), Phenotypic Quality Ontology (PATO), XEO/XEML |

MIAPPE is currently the only standard listed in biosharing.org for the phenotyping domain. While several databases claim to support MIAPPE, the standard is still nascent.

MIAPPE is based on the ISA framework and builds on earlier “minimum information” standards such as MIAME (Minimum Information About a Microarray Experiment). If the MIAPPE standard is determined to be useful for TERRA-REF, it would be worth reviewing the MIAME standard and related formats such as MAGE-TAB, MINiML, and SOFT, which are accepted by the Gene Expression Omnibus (GEO). GEO is a long-standing repository for genetic research data and might serve as another model for TERRA-REF.

It is worth noting that linked-data methods are supported but optional when depositing data to GEO. The MAGE-TAB format, like the ISA-Tab format used by MIAPPE, supports references to the sources of controlled vocabulary terms or ontologies.

See also:

Minimum Information about a Plant Phenotyping Experiment

Dublin Core Application Profiles

While some communities define explicit metadata schemas (e.g., Ecological Metadata Language), another approach is the use of "application profiles." An application profile is a declaration of the metadata terms adopted by a community or an organization, along with the source of each term. Application profiles are composed of terms drawn from multiple vocabularies or ontologies to define a "schema" or "profile" for metadata. For example, the Dryad metadata profile draws on Dublin Core, Darwin Core, and Dryad-specific elements.
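To make the idea concrete, the Python sketch below models an application profile as a mapping from each allowed term to the vocabulary it is drawn from, and checks a metadata record against it. The Dublin Core and Darwin Core namespace URIs are the published ones; the `local:` namespace, the `local:plotID` term, and the selection of required terms are hypothetical and do not reproduce the Dryad profile.

```python
# Hypothetical application profile: which terms are allowed, where each comes from,
# and whether it is required. The dcterms and dwc namespaces are the published ones;
# the "local" namespace stands in for community-specific terms.
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "local": "https://example.org/profile/terms/",
}

PROFILE = {
    "dcterms:title": {"required": True},
    "dcterms:creator": {"required": True},
    "dcterms:spatial": {"required": False},
    "dwc:scientificName": {"required": False},
    "local:plotID": {"required": False},
}


def expand(term: str) -> str:
    """Expand a prefixed term such as 'dwc:scientificName' to its full IRI."""
    prefix, local_name = term.split(":", 1)
    return NAMESPACES[prefix] + local_name


def validate(record: dict) -> list:
    """List problems with a metadata record relative to PROFILE."""
    problems = [f"term not in profile: {t}" for t in record if t not in PROFILE]
    problems += [
        f"missing required term: {t}"
        for t, rules in PROFILE.items()
        if rules["required"] and t not in record
    ]
    return problems


if __name__ == "__main__":
    print(validate({"dcterms:title": "Plot heights", "dwc:scientificName": "Sorghum bicolor"}))
```

The profile itself defines no new semantics; it only records which existing terms a community has agreed to use.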

See also:

DCMI Guidelines for Dublin Core Application Profiles.
Example Dryad Metadata Profile
DCMI Singapore Framework

Trait Dictionary Format (Crop Ontology)

The Crop Ontology curation tool supports import and export of trait information in a trait dictionary format.

See also:

The Crop Ontology Improving the Quality of 18 Crop Trait Dictionaries

Vocabularies and Ontologies

This section reviews related controlled vocabularies, data dictionaries, and ontologies.

Biofuel Ecophysiological Traits and Yields Database (BETYdb)

While BETYdb is not a controlled vocabulary itself, its relational schema models a variety of concepts, including managements, sites, treatments, traits, and yields.

The BETYdb “variables” table defines the variables used to represent traits in the BETYdb relational model. There has been some effort to standardize variable names by adopting Climate and Forecast (CF) convention standard names where variables overlap. A variable is represented by a name, description, units, and min/max values.

For example:

{
    "variable": {
        "created_at": "2016-03-07T11:23:58-06:00",
        "description": "",
        "id": 604,
        "label": "",
        "max": "1000",
        "min": "0",
        "name": "NDVI",
        "notes": "",
        "standard_name": "normalized_difference_vegetation_index",
        "standard_units": "ratio",
        "type": "",
        "units": "ratio",
        "updated_at": "2016-03-07T11:26:07-06:00"
    }
}
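The min/max fields lend themselves to a simple range check when trait values are ingested. The Python sketch below parses an abbreviated copy of a variable record like the one above (wrapped in an outer object so it is valid JSON) and flags out-of-range values. Note that `min` and `max` arrive as strings and must be converted; the helpers `load_variable` and `in_range`, and the choice to treat an empty bound as unbounded, are assumptions of this sketch, not part of BETYdb.

```python
import json

# Abbreviated copy of the BETYdb "variable" record, wrapped in braces so it parses.
RECORD = """
{"variable": {"id": 604, "name": "NDVI",
              "standard_name": "normalized_difference_vegetation_index",
              "units": "ratio", "min": "0", "max": "1000"}}
"""


def load_variable(text: str) -> dict:
    """Return the inner 'variable' object from a JSON response body."""
    return json.loads(text)["variable"]


def in_range(variable: dict, value: float) -> bool:
    """Return True if value lies within the variable's [min, max] bounds.

    Bounds are serialized as strings, so convert before comparing; in this
    sketch a missing or empty bound is treated as unbounded.
    """
    low = variable.get("min")
    high = variable.get("max")
    if low not in (None, "") and value < float(low):
        return False
    if high not in (None, "") and value > float(high):
        return False
    return True


if __name__ == "__main__":
    ndvi = load_variable(RECORD)
    print(ndvi["standard_name"], in_range(ndvi, 0.72), in_range(ndvi, -0.1))
```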

See also:

DCMI Metadata terms

Controlled vocabulary for the representation of bibliographic information. See also:

DCMI Terms

Climate and Forecast Standard Name Table

Standard variable names and naming convention for use with NetCDF. The Climate and Forecast metadata conventions are intended to promote sharing of NetCDF files. The CF conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.

Basic conventions include lower-case letters, numbers, underscores, and US spelling.

Information is encoded in the variable name itself. The basic format is (optional components in []):

[surface] [component] standard_name [at surface] [in medium] [due to process] [assuming condition]
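The naming pattern can be sketched in Python as a helper that assembles a name from its optional components and checks the basic character rules (lower-case letters, digits, underscores). This is an illustration only: `build_cf_name` and its parameter names are hypothetical, and a lexically valid result is not necessarily a name defined in the CF standard name table, which remains the authority.

```python
import re
from typing import Optional

# Basic CF lexical rule: lower-case letters, digits, and underscores only.
CF_NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def build_cf_name(standard_name: str,
                  surface: Optional[str] = None,
                  component: Optional[str] = None,
                  at_surface: Optional[str] = None,
                  medium: Optional[str] = None,
                  process: Optional[str] = None,
                  condition: Optional[str] = None) -> str:
    """Assemble a name following the pattern
    [surface] [component] standard_name [at surface] [in medium]
    [due to process] [assuming condition].
    """
    parts = [surface, component, standard_name]
    if at_surface:
        parts.append(f"at_{at_surface}")
    if medium:
        parts.append(f"in_{medium}")
    if process:
        parts.append(f"due_to_{process}")
    if condition:
        parts.append(f"assuming_{condition}")
    name = "_".join(p for p in parts if p)
    if not CF_NAME_RE.match(name):
        raise ValueError(f"not a lexically valid CF name: {name!r}")
    return name


if __name__ == "__main__":
    print(build_cf_name("shortwave_flux", surface="surface",
                        component="downwelling", medium="air"))
```

The example call produces `surface_downwelling_shortwave_flux_in_air`, which does appear in the CF standard name table; other combinations should be checked against the table before use.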

Each standard name entry also lists canonical units and, where available, AMIP and GRIB (GRidded Binary) codes.

The CF standard names have been converted to RDF by several communities, including the Marine Metadata Interoperability (MMI) project.

Dimensions: time, lat, lon, and others. Specify time first (as the unlimited dimension), followed by lat and lon (or x and y), with an extent matching the field boundaries.

See also:

CF Conventions
CF Conventions FAQ mentions RDF conversions.

ICASA master variable list

Vocabulary and naming conventions for agricultural modeling variables, used by AgMIP. The ICASA master variable list is included, at least in part, in the AgrO ontology. The NARDN-HD Core Harmonized Crop Experiment Data is also taken from the ICASA vocabulary.

ICASA variables have a number of fields, including name, description, type, min and max values.

See also:

ICASA Master Variable List
White et al (2013). Integrated Description of Agricultural Field Experiments and Production: The ICASA Version 2.0 Data Standards. Computers and Electronics in Agriculture.

NARDN-HD Core Harmonized Crop Experiment Data

A subset of the ICASA data dictionary representing a set of core variables that are commonly collected in field crop experiments. These variables will be used to harmonize data from USDA experiments as part of a National Agricultural Research Data Network.

CSDMS Standard Names

Variable naming rules and patterns for any domain, developed as part of the CSDMS project as an alternative to CF. CSDMS standard names are considered to have a more flexible community approval mechanism than CF. CSDMS names consist of an object part and a quantity (attribute) part.
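In CSDMS names, the object part and the quantity part are separated by a double underscore, which is what distinguishes them structurally from CF names. The Python sketch below splits a name into those two parts; `split_csdms_name` is a hypothetical helper, and the example name is illustrative and should be checked against the CSDMS list before use.

```python
from typing import NamedTuple


class CsdmsName(NamedTuple):
    object_part: str
    quantity_part: str


def split_csdms_name(name: str) -> CsdmsName:
    """Split a CSDMS standard name on the double underscore that separates
    the object part from the quantity part."""
    obj, sep, quantity = name.partition("__")
    if not sep or not obj or not quantity:
        raise ValueError(f"expected '<object>__<quantity>', got {name!r}")
    return CsdmsName(obj, quantity)


if __name__ == "__main__":
    print(split_csdms_name("land_surface__elevation"))
```

A CF-style name such as `air_temperature` has no separator and is rejected, since CF encodes the object and quantity in a single underscore-joined string.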

CSDMS names have been converted to RDF as part of the Earth Cube Geosemantic Server project.

See also:

CSDMS Standard Names

International Plant Names Index (IPNI)

http://www.ipni.org/

IPNI is a database of the names and associated basic bibliographical details of seed plants, ferns, and lycophytes. Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names.

NCBI Taxonomy

http://www.ncbi.nlm.nih.gov/taxonomy

A curated classification and nomenclature for all of the organisms in the public sequence databases, which represent about 10% of the described species of life on the planet. NCBI Taxonomy is recommended by MIAPPE.

Ontologies

Agronomy Ontology (AGRO)

The Agronomy Ontology “describes agronomic practices, agronomic techniques, and agronomic variables used in agronomic experiments.” It is intended as a complementary ontology to the Crop Ontology (CO). Variables are selected out of the International Consortium for Agricultural Systems Applications (ICASA) vocabulary and a mapping between AgrO and ICASA is in progress. AgrO is intended to work with the existing ontologies including ENVO, UO, PATO, IAO, and CHEBI. It will be part of an Agronomy Management System and fieldbook modeled on the CGIAR Breeding Management System to capture agronomic data.

See also:

OBO Foundry. Agronomy Ontology
FAO. Crop Ontology: harmonizing semantics for phenotyping and agronomy data
RDA. Interest Group on Agricultural Data (IGAD)

Crop Ontology (CO)

The Crop Ontology (CO) contains "Validated concepts along with their inter-relationships on anatomy, structure and phenotype of crops, on trait measurement and methods as well as on Germplasm with the multi-crop passport terms." The ontology is actively used by the CGIAR community and a central part of the Breeding Management System. MIAPPE recommends the CO (along with TO, PO, PATO, XEML) for observed variables.

Shrestha et al (2012) describe a method for representing trait data via the CO.

See also:

Crop Ontology
Shrestha et al (2012). Bridging the phenotypic and genetic data useful for integrated breeding through a data annotation using the Crop Ontology developed by the crop communities of practice. Front Physiol. 2012 Aug 25;3:326.

Crop Research Ontology (CRO)

Describes experimental design, environmental conditions and methods associated with the crop study/experiment/trial and their evaluation. CRO is part of the Crop Ontology platform, originally developed for the International Crop Information System (ICIS). CRO is recommended in the MIAPPE standard for general metadata, environment, treatments, and experimental design fields.

See also:

Extensible Observation Ontology (OBOE)

Cited in Kattge et al (2011) as an example of an ontology used in ecology and environmental sciences to represent measurements and observations. However, the CRO may be better suited for TERRA-REF.

See also:

Kattge, J.(2011). A generic structure for plant trait databases

Gene Ontology (GO)

Defines concepts/classes used to describe gene function, and relationships between these concepts. GO is a widely-adopted ontology in genetics research, supported by databases such as GEO. This ontology is cited in Krajewski et al (2015) and might be relevant for the TERRA genomics pipeline.

See also:

Gene Ontology
Krajewski et al (2015). Towards recommendations for metadata and data handling in plant phenotyping. Journal of Experimental Botany, 66(18), 5417–5427.

Information Artifact Ontology (IAO)

Information entities, originally driven by work by OBI (e.g., abstract, author, citation, document etc). IAO covers similar territory to the Dublin Core vocabulary.

Ontology for Biomedical Investigations (OBI)

Integrated ontology for the description of biological and clinical investigations. This includes a set of 'universal' terms, that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain. Recommended by MIAPPE for general metadata, timing and location, and experimental design.

See also:

Minimum Information about a Plant Phenotyping Experiment

Phenotype and Attribute Ontology (PATO)

Phenotypic qualities (properties).

Recommended in MIAPPE for use in the observed values field.

See also:

Minimum Information about a Plant Phenotyping Experiment

Plant Environment Ontology (EO)

Part of the Plant Ontology (PO), standardized controlled vocabularies to describe various types of treatments given to an individual plant / a population or a cultured tissue and/or cell type sample to evaluate the response on its exposure.

Plant Ontology (PO)

Describes plant anatomy and morphology and stages of development for all plants, intended to create a framework for meaningful cross-species queries across gene expression and phenotype data sets from plant genomics and genetics experiments. Recommended by MIAPPE for observed values fields. Along with EO, GO, and TO, PO makes up the Gramene database, linking plant anatomy, morphology, and growth and development to plant genomics data.

See also:

Minimum Information about a Plant Phenotyping Experiment

Plant Trait Ontology (TO)

Along with EO, GO, and PO, TO makes up the Gramene database, linking plant anatomy, morphology, and growth and development to plant genomics data. Recommended by MIAPPE for observed values fields.

Example trait entry:

[Term]
id: TO:0000019
name: seedling height
def: "Average height measurements of 10 seedlings, in centimeters from the base of the shoot to the tip of the tallest leaf blade." [IRRI:SES]
synonym: "SH" RELATED []
is_a: TO:0000207 ! plant height
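Entries like this use the OBO flat-file format: a `[Term]` header followed by `tag: value` lines, where text after ` ! ` on an `is_a` line is a human-readable comment. The Python sketch below parses a single stanza into a dictionary of tag values; `parse_obo_stanza` is a hypothetical helper that ignores qualifiers, escapes, and multi-stanza files, so a full OBO library would be needed for real use.

```python
# The example trait entry above, as a string.
STANZA = """[Term]
id: TO:0000019
name: seedling height
def: "Average height measurements of 10 seedlings, in centimeters from the base of the shoot to the tip of the tallest leaf blade." [IRRI:SES]
synonym: "SH" RELATED []
is_a: TO:0000207 ! plant height
"""


def parse_obo_stanza(text: str) -> dict:
    """Parse one OBO [Term] stanza into {tag: [values]}.

    Simplified: trailing ' ! ' comments are dropped, and values are kept as
    raw strings (def text and dbxrefs are not split apart). Tags may repeat,
    so each maps to a list.
    """
    tags = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        value = value.split(" ! ", 1)[0].strip()
        tags.setdefault(tag, []).append(value)
    return tags


if __name__ == "__main__":
    term = parse_obo_stanza(STANZA)
    print(term["id"], term["name"], term["is_a"])
```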

See also:

Minimum Information about a Plant Phenotyping Experiment

Statistics Ontology (STATO)

General-purpose statistics ontology covering processes such as statistical tests, their conditions of application, and information needed for or resulting from statistical methods, such as probability distributions, variables, and spread and variation metrics. Recommended by MIAPPE for experimental design.

See also:

Minimum Information about a Plant Phenotyping Experiment

Units of Measurement Ontology (UO)

Metric units for PATO. This OBO ontology defines a set of prefixes (giga, hecto, kilo, etc) and units (area/square meter, volume/liter, rate/count per second, temperature/degree Fahrenheit). The two top-level classes are prefixes and units.

UO is mentioned in relation to the Agronomy Ontology (AGRO), but PATO is also recommended by MIAPPE for observed values fields.

While there are general standard units, it seems unlikely that these would ever be gathered in a single place. It seems more useful to define a high-level ontology to represent a "unit" and allow domains and communities to publish their own authoritative lists.

XEML Environment Ontology (XEO)

Created to help plant scientists in documenting and sharing metadata describing the abiotic environment.

DDI-RDF Discovery Vocabulary

Data Catalog Vocabulary (DCAT)

The Data Catalog Vocabulary is an RDF vocabulary intended to facilitate interoperability between data catalogs published on the Web. DCAT defines a set of classes including Dataset, Catalog, CatalogRecord, and Distribution.
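As an illustration, the Python sketch below builds a JSON-LD description of a single dataset with one distribution using DCAT and Dublin Core terms. The namespace IRIs are the published ones; the helper `dcat_dataset`, the title, and the download URL are placeholders, not real TERRA-REF resources. Following DCAT 2 examples, the media type is given as an IANA media type IRI rather than a plain string.

```python
import json

IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"


def dcat_dataset(title: str, download_url: str, media_type: str) -> dict:
    """Build a minimal DCAT Dataset with a single Distribution as a JSON-LD dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                # DCAT 2 examples use an IANA media type IRI here.
                "dcat:mediaType": {"@id": IANA_MEDIA_TYPES + media_type},
            }
        ],
    }


if __name__ == "__main__":
    doc = dcat_dataset(
        "Example canopy height dataset",               # placeholder title
        "https://example.org/data/canopy_height.csv",  # placeholder URL
        "text/csv",
    )
    print(json.dumps(doc, indent=2))
```

Keeping the Dataset separate from its Distributions lets one catalog entry point to several file formats of the same data.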

Data Cite Ontology

The DataCite Ontology

Data Cube Vocabulary

The Data Cube Vocabulary is an RDF-based model for publishing multi-dimensional datasets, based in part on the SDMX guidelines. Data Cube defines a set of classes, including DataSet, Observation, and MeasureProperty, that may be relevant to the TERRA project.

Statistical Data and Metadata Exchange (SDMX)

SDMX is an international initiative for the standardization of the exchange of statistical data and metadata among international organizations. Sponsors of the initiative include Eurostat, the European Central Bank, the OECD, the World Bank, and the UN Statistical Division. They have defined a framework and an exchange format, SDMX-ML, for data exchange. Community members have also developed RDF encodings of the SDMX guidelines that are heavily referenced in the Data Cube vocabulary examples.

Related Software, Services, and Databases

Standard formats, ontologies, and controlled vocabularies are typically used in the context of specific software systems.

Agricultural Model Inter-Comparison and Improvement Project (AgMIP) Crop Experiment (ACE) Database

AgMIP "seeks to improve the capability of ecophysiological and economic models to describe the potential impacts of climate change on agricultural systems. AgMIP protocols emphasize the use of multiple models; consequently, data harmonization is essential. This interoperability was achieved by establishing a data exchange mechanism with variables defined in accordance with international standards; implementing a flexibly structured data schema to store experimental data; and designing a method to fill gaps in model-required input data."

The data exchange format is based on a JSON rendering of the ICASA Master Variable List. Data are transferred into and out of the AgMIP Crop Experiment (ACE) and AgMIP Crop Model (ACMO) databases via REST APIs using these JSON objects.

See also

AgMIP Crop Experiment Database
Porter et al (2014). Harmonization and translation of crop modeling data to ensure interoperability. Environmental Modelling and Software. 62:495-508.
AgMIP Data Products presentation
AgMIP on Github
AgMIP Crop Experiment Database data variables
AgMIP API
AgMIP using ICASA standards

Biofuel Ecophysiological Traits and Yields Database (BETYdb)

BETYdb is used to store TERRA meta-data, provenance, and traits information.

BETYdb traits are available as web pages, CSV, JSON, and XML. This can be extended to allow spatial, temporal, and taxonomic/genomic queries. Trait vectors can be queried and rendered in several output formats.

Here are some examples from betydb.org.

A separate instance of BETYdb is maintained for use by TERRA-REF at terraref.ncsa.illinois.edu/bety. The scope of the TERRA-REF database is limited to high-throughput phenotyping data and metadata produced and used by the TERRA program. Users can set up their own instances of BETYdb and import any public data in the distributed BETYdb network.

See also: BETYdb documentation

BETYdb Data Access includes accessing data with web interface, API, and R traits package
BETYdb constraints, see section "uniqueness constraints"
BETYdb Data Entry

Gramene

Gramene is a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species.

Integrated Breeding Platform/Breeding Management System

System for managing the breeding process, including germplasm lists, crosses, nurseries, and trials, as well as ontologies and statistical analysis.

See also:

BMS Site

TERRA Ref has an instance of BMS hosted by CyVerse (requires login).

International Crop Information System

ICIS is "a database system that provides integrated management of global information on crop improvement and management both for individual crops and for farming systems." ICIS is developed by the Consultative Group on International Agricultural Research (CGIAR).

See also

Fox and Skovmand (1996). "The International Crop Information System (ICIS) - connects genebank to breeder to farmer’s field." Plant adaptation and crop improvement, CAB International.

MODAPS NASA MODIS Satellite data

The MODAPS NASA MODIS Satellite data encompasses a library of functions that provides programmatic data access and processing services to MODIS Level 1 and Atmosphere data products. These routines enable both SOAP and REST based web service calls against the data archives maintained by MODAPS. These routines mirror existing LAADS Web services.

See also:

NSIDC MODIS Data Summaries

Phenomics Ontology Driven Database (PODD)

http://www.plantphenomics.org.au/projects/podd/

An online repository for storage and retrieval of raw and analyzed data from Australian Plant Phenomics Facility (APPF) phenotyping platforms. PODD is based on Fedora Commons repository software, with data and metadata modeled using OWL/RDFS.

See also:

PODD Project Site

Plant Breeders API

Specifies a standard interface for plant phenotype/genotype databases to serve data for use in crop breeding applications. This is the API used by FieldBook, which allows users to turn spreadsheets into databases. Examples indicate that the responses will include values linked to the Crop Ontology, for example:

https://github.com/plantbreeding/API/blob/master/Specification/Traits/ListAllTraits.md

However, in general BrAPI returns JSON data without a linked-data context (i.e., not JSON-LD), so it is in essence its own data structure.
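To illustrate the difference, the Python sketch below takes a plain JSON trait record and attaches an `@context` that maps its keys to IRIs, turning it into JSON-LD without changing the payload. The record's field names, the placeholder identifier, and the context mappings (using RDFS label and comment) are hypothetical and are not taken from the BrAPI specification or the Crop Ontology.

```python
import copy
import json

# Hypothetical JSON-LD context: maps plain JSON keys to IRIs. These mappings are
# illustrative and are not defined by the BrAPI specification.
CONTEXT = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "traitDbId": "@id",
    "traitName": "rdfs:label",
    "description": "rdfs:comment",
}


def to_json_ld(record: dict, context: dict = CONTEXT) -> dict:
    """Return a copy of a plain JSON record with a JSON-LD @context attached.

    The payload is unchanged; only the context gives its keys global meaning.
    """
    linked = copy.deepcopy(record)
    linked["@context"] = context
    return linked


if __name__ == "__main__":
    plain = {
        "traitDbId": "https://example.org/traits/1",  # placeholder identifier
        "traitName": "plant height",
        "description": "Height of the plant canopy",
    }
    print(json.dumps(to_json_ld(plain), indent=2))
```

Because the context is additive, a JSON-LD-aware client can interpret the keys while an ordinary JSON client can ignore `@context`.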

Other notes:

The Breeding Management System (BMS) group has implemented a few features to make it compatible with Field Book in its current state without the use of API.
BMS and the Genomic & Open-source Breeding Informatics Initiative (GOBII) are both pushing for the API and plan on implementing it when it's complete.
Read news about the BMS Breeding Management System Standalone Server and genomes2fields migrating to BMS

See also

Plant Breeding API

Plant Genomics and Phenomics Research Data Repository (PGP)

German repository for plant research data including image collections from plant phenotyping and microscopy, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry as well as software and documents.

See also:

Arend et al (2016). PGP repository: a plant phenomics and genomics data publication infrastructure. Database.
PGP Repository

USDA Plants

“The PLANTS Database provides standardized information about the vascular plants, mosses, liverworts, hornworts, and lichens of the U.S. and its territories. It includes names, plant symbols, checklists, distributional data, species abstracts, characteristics, images, crop information, automated tools, onward Web links, and references.”

See also

USDA Plants Website

USDA Quick Stats

A web-based application that supports querying agricultural census and survey statistics. Also available via an API.

See also

USDA Quick Stats Website

transPLANT

Infrastructure to support computational analysis of genomic data from crop and model plants. This includes large-scale analysis of genotype-phenotype associations, a common set of reference plant genomic data, archiving of genomic variation, and a search engine integrating reference bioinformatics databases and physical genetic materials.

See also:

transPlant Website

Sensor Data

Meteorological data

Proposed format for meteorological variables exported from the LemnaTec platform

Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP) data formats

One implementation of CF for ecosystem model driver (met, soil) and output (mass, energy dynamics)
- Standardized Met driver data
- Terrestrial Ecosystem Model output

Date-Time:

YYYY-MM-DD hh:mm:ssZ, based on ISO 8601. An offset may optionally be given for local time (e.g., -06:00 in place of Z). Precision is determined by the data: a value could be just YYYY-MM-DD, and fractional seconds are written after a period.
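A minimal Python sketch of reading timestamps in this form follows. It accepts a trailing `Z` or a numeric offset, optional fractional seconds, and date-only values; timestamps with no zone designator are rejected. The function name `parse_timestamp` is hypothetical, and the `%z` handling of `Z` and `-06:00` requires Python 3.7 or later.

```python
from datetime import datetime

# Accepted layouts, most to least precise. %z accepts "Z" or an offset such
# as "-06:00" (Python 3.7+).
_DATETIME_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f%z",  # fractional seconds after a period
    "%Y-%m-%d %H:%M:%S%z",
)


def parse_timestamp(text: str):
    """Parse 'YYYY-MM-DD hh:mm:ss[.fff](Z|+hh:mm|-hh:mm)' or a bare 'YYYY-MM-DD'.

    Returns a timezone-aware datetime, or a date for date-only values.
    Raises ValueError for anything else, including times without a zone.
    """
    for fmt in _DATETIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    # Lower-precision values may carry only the date.
    return datetime.strptime(text, "%Y-%m-%d").date()


if __name__ == "__main__":
    for value in ("2016-03-07 11:23:58Z", "2016-03-07 11:23:58-06:00", "2016-03-07"):
        print(value, "->", repr(parse_timestamp(value)))
```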

Agronomic and Phenotype Data Standards

Current Practice

In the TERRA-REF v0 release, agronomic and phenotype data are stored and exchanged using the BETYdb API. Agronomic data are stored in the sites, managements, and treatments tables. Phenotype data are stored in the traits, variables, and methods tables. Data are ingested and accessed via the BETYdb API formats.

Standardization Efforts

In cooperation with participants from AgMIP, the Crop Ontology, and Agronomy Ontology groups, the TERRA-REF team is pursuing the development of a format to facilitate the exchange of data across systems based on the ICASA Vocabulary and AgMIP JSON Data Objects. An initial draft of this format is available for comment on Github.

In addition, we plan to enable the TERRA-REF databases to import and export data via the Plant Breeding API (BRAPI).

Genomic Data Standards

Overview

Genomic data have reached a high level of standardization in the scientific community. Today, high-impact journals typically ask authors to deposit their genomic data in public databases before publication.

Below are the most widely accepted formats that are relevant to the data and analyses generated in TERRA-REF.

Raw reads + quality scores

Raw reads + quality scores are stored in FASTQ format. FASTQ files can be processed for quality control (QC) with the FASTX-Toolkit.
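The Python sketch below shows what a FASTQ record holds: a header line starting with `@`, the bases, a `+` separator line, and one quality character per base. It assumes unwrapped four-line records and the common Phred+33 encoding (quality = character code minus 33); `read_fastq` is a hypothetical helper meant only to illustrate the format, not to replace FASTX-Toolkit or other QC tools.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream.

    Quality characters are decoded as ord(char) - offset; offset=33 matches
    the common Phred+33 (Sanger / Illumina 1.8+) encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


if __name__ == "__main__":
    for record in read_fastq(io.StringIO("@read1\nACGT\n+\nII#5\n")):
        print(record.name, record.sequence, record.qualities)
```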

Reference genome assembly

Reference genome assembly (for alignment of reads or BLAST) is in FASTA format. FASTA files generally need indexing and formatting that can be done by aligners, BLAST, or other applications that provide built-in commands for this purpose.
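For orientation, a FASTA file is a series of records, each a `>` header line followed by one or more (often wrapped) sequence lines. The Python sketch below reads a small FASTA file into memory keyed by the first word of each header; `read_fasta` is a hypothetical helper, and real reference genomes are usually accessed through an index (for example, one built with `samtools faidx`) rather than loaded whole.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect sequences from FASTA text, keyed by the first word of each '>' header.

    Wrapped sequence lines are concatenated per record.
    """
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


if __name__ == "__main__":
    print(read_fasta([">chr1 example\n", "ACGT\n", "AC\n", ">chr2\n", "GG\n"]))
```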

Sequence alignment

Sequence alignments are stored in BAM format. In addition to the nucleotide sequence, BAM contains fields that describe mapping and read quality. BAM files are binary but can be visualized with IGV. If needed, BAM can be converted to SAM (a text format) with SAMtools.

BAM is the preferred format for the NCBI Sequence Read Archive (SRA).
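Because BAM is a compressed binary encoding of SAM, the text form is the easiest place to see the fields (a BAM file can be converted with, for example, `samtools view -h input.bam > output.sam`, where `-h` keeps the header). The Python sketch below parses one SAM alignment line into several of its 11 mandatory tab-separated fields and decodes two FLAG bits defined in the SAM specification (0x4 unmapped, 0x10 reverse strand). `parse_sam_line` is a hypothetical helper; for BAM itself, use SAMtools or a library such as pysam.

```python
from typing import NamedTuple


class SamAlignment(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    seq: str

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamAlignment:
    """Parse one tab-separated SAM alignment line (not an '@' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines have at least 11 mandatory fields")
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    seq = fields[9]  # fields 7-9 (RNEXT, PNEXT, TLEN) and 11 (QUAL) are skipped here
    return SamAlignment(qname, int(flag), rname, int(pos), int(mapq), cigar, seq)


if __name__ == "__main__":
    aln = parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII")
    print(aln.rname, aln.pos, aln.cigar, "reverse" if aln.is_reverse else "forward")
```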

SNP and genotype variants

SNP and genotype variants are stored in VCF format. VCF records each variant's position, alleles, and call quality, along with per-sample genotypes and genotype-level quality fields. VCF files are typically manipulated with VCFtools.

VCF is also the format required by dbSNP, the largest public repository of SNPs.
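The Python sketch below parses one tab-separated VCF data line into its fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), splits the INFO key=value pairs (flags without a value become `True`), and extracts each sample's GT genotype. `parse_vcf_line` is a hypothetical, simplified helper: it ignores the header's INFO/FORMAT type declarations, which VCFtools and similar tools rely on.

```python
from typing import Dict, List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                # 1-based position of the first REF base
    ref: str
    alts: List[str]
    qual: Optional[float]   # None when the QUAL column is "."
    info: Dict[str, object]
    genotypes: List[str]    # GT value for each sample, e.g. "0/1"


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one VCF data line (not a '#' header line)."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info_text = cols[:8]
    info: Dict[str, object] = {}
    if info_text != ".":
        for entry in info_text.split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True  # flags carry no value
    genotypes: List[str] = []
    if len(cols) > 9:
        fmt_keys = cols[8].split(":")
        gt_index = fmt_keys.index("GT") if "GT" in fmt_keys else None
        for sample in cols[9:]:
            values = sample.split(":")
            if gt_index is not None and gt_index < len(values):
                genotypes.append(values[gt_index])
            else:
                genotypes.append(".")
    return VcfRecord(chrom, int(pos), ref, alt.split(","),
                     None if qual == "." else float(qual), info, genotypes)


if __name__ == "__main__":
    print(parse_vcf_line("chr1\t100\trs1\tA\tG,T\t50\tPASS\tDP=14;DB\tGT:DP\t0/1:7\t1/1:7"))
```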

Genomic coordinates

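Genomic coordinates are given in BED format, which records the chromosome and the start and end positions of a feature in the genome. BED coordinates are zero-based and half-open (the end position is exclusive), so a single nucleotide at one-based position p is written with start = p - 1 and end = p. BED files can be manipulated with bedtools.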
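Because BED intervals are zero-based and half-open while VCF positions are one-based, converting between the two is a common source of off-by-one errors. The Python sketch below converts a VCF variant position and REF allele to a BED interval and computes an interval's length; the function names are hypothetical, and bedtools remains the practical tool for interval operations.

```python
def vcf_pos_to_bed(chrom: str, pos: int, ref: str) -> str:
    """Convert a 1-based VCF position and REF allele to a BED line.

    BED intervals are 0-based and half-open, so REF bases starting at
    1-based position `pos` span [pos - 1, pos - 1 + len(ref)).
    """
    if pos < 1:
        raise ValueError("VCF positions are 1-based")
    start = pos - 1
    end = start + len(ref)
    return f"{chrom}\t{start}\t{end}"


def bed_length(line: str) -> int:
    """Number of bases covered by a BED interval (end - start)."""
    _chrom, start, end = line.split("\t")[:3]
    return int(end) - int(start)


if __name__ == "__main__":
    snp = vcf_pos_to_bed("chr1", 100, "A")
    print(repr(snp), bed_length(snp))
```

For a single-base SNP at VCF position 100, the BED interval is 99 to 100, covering exactly one base.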

Sensor Data Standards

Current Practice

In the TERRA-REF release, sensor metadata is generally stored and exchanged using formats defined by LemnaTec. Sensor metadata is stored in metadata.json files for each dataset. This information is ingested into Clowder and is available in each dataset's "Metadata" tab and via the metadata.jsonld API endpoint.

Manufacturer information about devices and sensors is available via Clowder in the Devices and Sensors Information collection. This collection includes datasets representing each sensor or calibration target, containing specifications/datasheets, calibration certificates, and associated reference data.

Fixed metadata

Authoritative fixed sensor metadata is available for each of the sensor datasets. This has been extended to include factory calibrated spectral response and relative spectral response information. For more information, please see the sensor-metadata repository on Github.

Runtime metadata

Runtime metadata for each sensor run is stored in the metadata.json files in each sensor output directory.

Reference data

Additional reference data is available for some sensors:

Factory calibration data for the LabSphere and SphereOptics calibration targets
Relative spectral response (RSR) information for sensors
Calibration data for the environmental logger
Dark/white reference data for the SWIR and VNIR sensors

Standardization Efforts

The TERRA-REF team is currently investigating available standards for the representation of sensor information. Preliminary work has been done using OGC SensorML vocabularies in a custom JSON-LD context. For more information, please see the sensor-metadata repository on Github.

Data Standards Committee

The Standards Committee is responsible for defining and advising the development of data products and access protocols for the ARPA-E TERRA program. The committee consists of twelve core participants: one representative from each of the six funded projects and six independent experts. The committee will meet virtually each month and in person each year to discuss, develop, and revise data products, interfaces, and computing infrastructure.

Roles and responsibilities

TERRA Project Standards Committee representatives are expected to represent the interests of their TERRA team, their research community, and the institutions for which they work. External participants were chosen to represent specific areas of expertise and will provide feedback and guidance to help make the TERRA platform interoperable with existing and emerging sensing, informatics, and computing platforms.

Specific duties

Participate in monthly to quarterly teleconferences with the committee.
Provide expert advice.
Provide feedback from other interested parties.
Participate in, or send delegate to, annual two-day workshops.

Annual Meetings

If we can efficiently agree on and adopt conventions, we will have more flexibility to use these workshops to train researchers, remove obstacles, and identify opportunities. This will be an opportunity for researchers to work with developers at NCSA and from the broader TERRA informatics and computing teams to identify what works, prioritize features, and move forward on research questions that require advanced computing.

Project Timeline

August 2015: Establish committee, form a data plan
January 2016: v0 file standards
January 2017: v1 file standards, sample data sets
January 2018: mock data cube generator, standardized data products, simulated data
January 2019: standardized data products, simulated data

Data Standards Participants

TERRA Project Representatives (6)
ARPA-E Program Representatives (2)
Board of External Advisors (6)

(numbers in parentheses are targets, for which we have funding)

People

Coordinators

| Name | Institution | Email |
| --- | --- | --- |
| David Lee | ARPA-E | david.lee2_at_hq.doe.gov |
| David LeBauer | UIUC / NCSA | dlebauer_at_illinois.edu |

TERRA Project Representatives

| Name | Institution | Email |
| --- | --- | --- |
| Paul Bartlett | Near Earth Autonomy | paul_at_nearearthautonomy.com |
| Jeff White | USDA ALARC | Jeffrey.White_at_ars.usda.gov |
| Melba Crawford | Purdue | melbac_at_purdue.edu |
| Mike Gore | Cornell | mag87_at_cornell.edu |
| Matt Colgan | Blue River | matt.c_at_bluerivert.com |
| Christer Janssen | Pacific Northwest National Laboratory | georg.jansson_at_pnnl.gov |
| Barnabas Poczos | Carnegie Mellon | bapoczos_at_cs.cmu.edu |
| Alex Thomasson | Texas A&M University | thomasson_at_tamu.edu |

External Advisors

| Name | Institution | Email |
| --- | --- | --- |
| Cheryl Porter | ICASA / AgMIP / USDA | |
| Shawn Serbin | Brookhaven National Lab | sserbin_at_bnl.gov |
| Shelly Petroy | NEON | spetroy_at_neoninc.org |
| Christine Laney | NEON | claney_at_neoninc.org |
| Carolyn J. Lawrence-Dill | Iowa State | triffid_at_iastate.edu |
| Eric Lyons | University of Arizona / iPlant | ericlyons_at_email.arizona.edu |
