TERRA-REF Documentation
WebsiteGitHubTutorials
revisions
revisions
  • Introduction
  • Data Sources
  • Software
  • Scientific Objectives and Experimental Design
    • Protocols
      • Controlled Environment Protocols
      • Manual Field Data Protocols
      • Phenotractor Protocols
      • Sensor Calibration
      • Template Protocol
      • UAV Protocols
    • Experimental Design
      • Experimental Design Danforth
        • Sorghum Lines Danforth
      • Experimental Design Genomics
        • Sorghum Lines Genomics Year 1
        • Sorghum Lines Genomics Year 1 (continued)
        • Sorghum Lines Genomics Year 2
      • Experimental Design MAC
  • User Manual
    • What Data is Available
    • Data Products
      • Environmental conditions
      • Fluorescence intensity imaging
      • Genomics data
      • Geospatial information
      • Hyperspectral imaging data
      • Infrared heat imaging data
      • Multispectral imaging data
      • Meteorological data
      • Phenotype data
      • Point Cloud Data
    • How to Access Data
      • Using Clowder (Sensor and Genoomics data)
      • Using Globus (Sensor and Genomics data)
      • Using BETYdb (trait data, experimental metadata)
        • Accessing BETYdb via ArcMap and other GIS software
      • Using CoGe (Genomics)
      • Using CyVerse (Genomics)
      • Using Analysis Workbench (all data)
    • Data Use Policy
    • Manuscripts and Authorship Guidelines
    • Release / reprocessing schedule
  • Technical Documentation
    • Data Standards
      • Existing Data Standards
      • Agronomic and Phenotype Data Standards
      • Genomic Data Standards
      • Sensor Data Standards
      • Data Standards Committee
    • Directory Structure
    • Data Storage
    • Data Transfer
    • Data Processing Pipeline
      • Geospatial Time Series Structure
    • Data Backup
    • Data Collection
    • Data Product Creation
      • Genomic Data
      • Hyperspectral Data
    • Quality Assurance and Quality Control
    • Systems Configuration
  • Developer Manual
    • Submitting data to Clowder
    • Submitting data to BETYdb
    • Submitting Data to CoGe
    • Developing Clowder Extractors
  • Tutorials
  • Appendix
    • Code of Conduct
    • Collaboration Tools
    • Glossary
Powered by GitBook
On this page
  • Overview
  • Raw reads + quality scores
  • Reference genome assembly
  • Sequence alignment
  • SNP and genotype variants
  • Genomic coordinates
  • See Also
Export as PDF
  1. Technical Documentation
  2. Data Standards

Genomic Data Standards

PreviousAgronomic and Phenotype Data StandardsNextSensor Data Standards

Last updated 7 years ago

Overview

Genomic data have reached a high level of standardization in the scientific community. Today, all high-impact journals typically ask the author to deposit their genomic data in either or both of these databases before publication.

Below are the most widely accepted formats that are relevant to the data and analyses generated in TERRA-REF.

Raw reads + quality scores

Raw reads + quality scores are stored in . FASTQ files can be manipulated for QC with

Reference genome assembly

Reference genome assembly (for alignment of reads or BLAST) is in . FASTA files generally need indexing and formatting that can be done by aligners, BLAST, or other applications that provide built-in commands for this purpose.

Sequence alignment

Sequence alignments are in BAM format – in addition to the nucleotide sequence, the BAM format contains fields to describe mapping and read quality. BAM files are binary files but can be visualized with . If needed, BAM can be converted in SAM (text file) with

BAM is the preferred format for sra database (sequence read archive).

SNP and genotype variants

SNP and genotype variants are in . VCF contains all information about read mapping and SNP and genotype calling quality. VCF files are typically manipulated with

VCF format is also the format required by dbSNP, the largest public repository all SNPs.

Genomic coordinates

See Also

Genomic coordinates are given in a BED format – gives the start and end positions of a feature in the genome (for single nucleotides, start = end). can be edited with .

FASTQ format
FASTX-Toolkit
FASTA format
IGV
SAMtools
VCF format
vcftools
BED files
bedtools
Genomics Data Pipeline
Genomics Data Products