TERRA-REF Documentation

Data Transfer


Last updated 7 years ago


Maricopa Agricultural Center, Arizona

Environmental Sensors

Transferring images

Data is sent to the gantry-cache server, located in the telecom room of the main UA-MAC building, via FTP over a private 10GbE interface. The path of each transferred file is logged to /var/log/xferlog. A Docker container running on the gantry-cache server reads this log file, tracks the last line it has read, and rescans the file regularly for new lines. File paths are scraped from the log and bundled into groups of 500, which are transferred via the Globus Python API to the Spectrum Scale file system that backs the ROGER cluster at NCSA. The log file is rolled daily and compressed to keep its size in check. Only whitelisted sensor directories on the gantry-cache are monitored, which prevents accidental or junk data from being ingested into the Clowder pipeline.
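The log-scraping and batching steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual monitor: it assumes the common wu-ftpd style xferlog layout (file path in the ninth whitespace-separated field), the whitelist entries are hypothetical, and the Globus submission itself is left out.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

BATCH_SIZE = 500  # group size stated in the text

# Hypothetical whitelist; the real sensor directories are not listed here.
WHITELIST = ("/gantry_data/sensorA/", "/gantry_data/sensorB/")


def read_new_lines(log_path: str, lines_seen: int) -> Tuple[List[str], int]:
    """Return lines appended after the first `lines_seen` lines, plus the new count."""
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        all_lines = handle.read().splitlines()
    if len(all_lines) < lines_seen:
        # Heuristic: the log shrank, so assume it was rolled and start over.
        lines_seen = 0
    return all_lines[lines_seen:], len(all_lines)


def extract_path(line: str) -> Optional[str]:
    """Pull the file path out of one xferlog line (assumed wu-ftpd layout)."""
    fields = line.split()
    # Fields 0-4 are the timestamp, then transfer time, host, size, path.
    return fields[8] if len(fields) >= 9 else None


def is_whitelisted(path: str) -> bool:
    """True if the path lies under one of the monitored sensor directories."""
    return path.startswith(WHITELIST)


def batches(paths: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` paths."""
    iterator = iter(paths)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

A caller would persist the returned line count between scans, filter the extracted paths with `is_whitelisted`, and hand each batch to whatever transfer client is in use.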

A Docker container in the terra-clowder VM, which runs in ROGER's OpenStack environment, is notified of incoming transfers and watches for their completion. Once a transfer completes, the same files are queued for ingestion into Clowder.
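One way to picture the watch-then-queue behaviour is a simple polling loop. The sketch below is an assumption about the general shape only: `get_status` is a hypothetical stand-in for whatever status lookup the real container performs, and the "SUCCEEDED" string is borrowed from the Globus Transfer task states.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List


def watch_transfers(
    pending: Dict[str, List[str]],
    get_status: Callable[[str], str],
    ingest_queue: Deque[str],
    poll_seconds: float = 0.0,
    max_polls: int = 10,
) -> None:
    """Poll pending transfers and queue their files once each one succeeds.

    `pending` maps a transfer ID to the file paths it carries; entries are
    removed as they complete. `get_status` is a hypothetical lookup.
    """
    for _ in range(max_polls):
        if not pending:
            break
        for task_id in list(pending):
            if get_status(task_id) == "SUCCEEDED":
                # Transfer finished: move its files to the ingestion queue.
                ingest_queue.extend(pending.pop(task_id))
        if pending:
            time.sleep(poll_seconds)
```

Transfers that never reach the success state simply remain in `pending` when the loop ends, so a caller can retry or report them.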

Once the files have been successfully received by the ROGER Globus endpoint, the Docker container running on the gantry-cache server removes them from that server. A cleanup script walks the gantry-cache daily, looking for files older than two days that have not been transferred, and queues any that it finds.
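The daily cleanup pass could look something like the sketch below. It assumes that "older than two days" is judged by file modification time and that any file still present on the cache has not yet been transferred (since transferred files are removed); both are assumptions, and the function name is illustrative.

```python
import os
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def find_stale_files(
    root: str, max_age_days: float = 2.0, now: Optional[float] = None
) -> List[str]:
    """Return files under `root` whose modification time is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(full_path) < cutoff:
                    stale.append(full_path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return sorted(stale)
```

The returned paths would then be queued for transfer in the same way as paths scraped from the log.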

Automated controlled-environment phenotyping, Missouri

Transferring images

Processes at Danforth monitor the database repository where images captured from the Scanalyzer are stored. After initial processing, files are transferred to NCSA servers for additional metadata extraction, indexing and storage.

At the start of the transfer process, metadata collected and derived during Danforth's initial processing will be pushed.

The current "beta" Python script can be viewed on GitHub. During transfer tests of data from Danforth's sorghum pilot experiment, 2,725 snapshots containing 10 images each were uploaded in 775 minutes (about 3.5 snapshots/minute).
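The quoted rate follows directly from the test figures. The short calculation below uses only the numbers given above; the per-image rate and the duration in hours are derived values, not figures reported by the test.

```python
snapshots = 2725          # snapshots uploaded in the test
images_per_snapshot = 10  # images in each snapshot
minutes = 775             # total upload time

snapshots_per_minute = snapshots / minutes
images_total = snapshots * images_per_snapshot
images_per_minute = images_total / minutes
hours = minutes / 60

print(f"{snapshots_per_minute:.1f} snapshots/minute")  # 3.5
print(f"{images_total} images in {hours:.1f} hours")   # 27250 images in 12.9 hours
print(f"{images_per_minute:.1f} images/minute")        # 35.2
```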

Transfer volumes

The Danforth Center transfers approximately X GB of data to NCSA per week.

Kansas State University

HudsonAlpha - Genomics

Log of files transferred from Arizona to NCSA on GitHub