TERRA-REF Documentation

Genomic Data Standards


Last updated 5 years ago

Overview

Genomic data formats are highly standardized in the scientific community. Most high-impact journals now ask authors to deposit their genomic data in a public database, such as the Sequence Read Archive (SRA) or dbSNP, before publication.

Below are the most widely accepted formats that are relevant to the data and analyses generated in TERRA-REF.

Raw reads + quality scores

Raw reads + quality scores are stored in FASTQ format. FASTQ files can be manipulated for QC with FASTX-Toolkit.
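
To make the layout of a FASTQ record concrete, here is a minimal Python sketch that reads four-line records and decodes the quality string. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences. The function name `read_fastq` and the example read are hypothetical, and this is an illustration rather than a replacement for a QC tool such as FASTX-Toolkit.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from a 4-line-per-record FASTQ stream.

    Assumption: qualities are Phred+33 encoded and sequences are not line-wrapped.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ for: " + header)
        # Phred+33: quality score = ASCII code of the character minus 33
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


# Hypothetical one-read example
example = "@read1\nACGT\n+\nII5!\n"
records = list(read_fastq(StringIO(example)))
for read_id, sequence, qualities in records:
    print(read_id, sequence, qualities)
```

Each record carries one quality score per base, which is why the sketch rejects records whose sequence and quality strings differ in length.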

Reference genome assembly

Reference genome assembly (for alignment of reads or BLAST) is in FASTA format. FASTA files generally need indexing and formatting, which can be done by aligners, BLAST, or other applications that provide built-in commands for this purpose.
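
As a rough illustration of the FASTA layout, the following Python sketch loads a small FASTA stream into a dictionary keyed by sequence name. The function name `read_fasta` and the example sequences are hypothetical. It does not produce the index files that aligners or BLAST build, and it keeps whole sequences in memory, so it only suits small inputs.

```python
from io import StringIO


def read_fasta(handle):
    """Return {name: sequence} from a FASTA stream.

    The name is the first whitespace-delimited token after '>'.
    Wrapped sequence lines are joined into one string.
    """
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


# Hypothetical two-sequence example with a wrapped first sequence
example = ">chr1 demo assembly\nACGT\nAC\n>chr2\nGG\n"
genome = read_fasta(StringIO(example))
for name, sequence in genome.items():
    print(name, len(sequence))
```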

Sequence alignment

Sequence alignments are in BAM format. In addition to the nucleotide sequence, the BAM format contains fields that describe mapping and read quality. BAM files are binary but can be visualized with IGV. If needed, BAM can be converted to SAM (a text format) with SAMtools.

BAM is the preferred format for the Sequence Read Archive (SRA).
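
Once a BAM file has been converted to SAM text, each alignment is a tab-separated line whose first eleven columns are mandatory. The Python sketch below splits one such line into named fields and reads two FLAG bits (0x4 for an unmapped read, 0x10 for a reverse-strand alignment). The function name `parse_sam_line` and the example line are hypothetical, header lines (starting with `@`) and optional tags are ignored, and real analyses would normally go through SAMtools.

```python
# The eleven mandatory SAM columns, in order
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]
INTEGER_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line) into a dict."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INTEGER_FIELDS else value
    # Decode two commonly used FLAG bits
    record["is_unmapped"] = bool(record["FLAG"] & 0x4)
    record["is_reverse"] = bool(record["FLAG"] & 0x10)
    return record


# Hypothetical alignment: read1 mapped to the reverse strand of chr1
example_line = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
alignment = parse_sam_line(example_line)
print(alignment["RNAME"], alignment["POS"], alignment["MAPQ"], alignment["is_reverse"])
```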

SNP and genotype variants

SNP and genotype variants are in VCF format. VCF contains all information about read mapping and SNP and genotype calling quality. VCF files are typically manipulated with vcftools.

VCF is also the format required by dbSNP, the largest public repository of SNPs.
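
To show how variant and genotype information sits in a VCF data line, here is a minimal Python sketch that splits the eight fixed columns, turns the INFO column into a dictionary, and pairs the FORMAT keys with each sample column. The function name `parse_vcf_line` and the example variant are hypothetical. The sketch skips the `##` metadata and `#CHROM` header lines, and vcftools remains the practical choice for real files.

```python
def parse_vcf_line(line):
    """Parse one VCF data line (not a '#' header line) into a dict."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = columns[:8]

    # INFO is a ';'-separated list of key=value pairs or bare flags
    info_fields = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, separator, value = entry.partition("=")
        info_fields[key] = value if separator else True

    record = {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": variant_id,
        "REF": ref,
        "ALT": alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        "FILTER": filter_status,
        "INFO": info_fields,
        "samples": [],
    }
    # Column 9 (FORMAT) names the ':'-separated values in each sample column
    if len(columns) > 9:
        format_keys = columns[8].split(":")
        for sample in columns[9:]:
            record["samples"].append(dict(zip(format_keys, sample.split(":"))))
    return record


# Hypothetical heterozygous A>G variant in one sample
example_line = "chr1\t101\t.\tA\tG\t50\tPASS\tDP=10;DB\tGT:DP\t0/1:10"
variant = parse_vcf_line(example_line)
print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"], variant["samples"])
```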

Genomic coordinates

Genomic coordinates are given in BED format, which records the start and end positions of a feature in the genome. BED coordinates are 0-based and half-open, so for a single nucleotide end = start + 1. BED files can be edited with bedtools.

See Also

FASTQ format
FASTX-Toolkit
FASTA format
IGV
SAMtools
VCF format
vcftools
BED files
bedtools
Genomics Data Pipeline
Genomics Data Products