
Data Backup


Raw data

A script uses the Spectrum Scale policy engine to find all files that were modified the previous day and passes that list to a job in the batch system. The job bundles the files into a .tar file, then uses pigz to compress it in parallel across 18 threads. Because each run is submitted as a dated batch job, backups do not block one another: if the batch system is busy, a delayed job still archives the files for its own date. The resulting .tgz files are sent to NCSA Nearline using Globus and then purged from the file system.
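For concreteness, here is a minimal sketch of the tar/pigz/Globus flow, written in Python around the command-line tools the job invokes. The file-list and archive paths and the Globus endpoint IDs are placeholders, and the Spectrum Scale policy step that produces the file list is assumed to have already run; see the production script for the real details.

```python
#!/usr/bin/env python3
"""Minimal sketch of the nightly raw-data backup job.

All paths and endpoint IDs below are hypothetical placeholders,
not the production values.
"""
import os
import subprocess
from datetime import date, timedelta

backup_date = date.today() - timedelta(days=1)            # files modified the day prior
file_list = f"/gpfs/backup/modified-{backup_date}.list"   # written by the policy engine (assumed path)
archive = f"/gpfs/backup/raw-{backup_date}.tgz"           # assumed path
LOCAL_ENDPOINT = "LOCAL-ENDPOINT-UUID"                    # placeholder Globus endpoints
NEARLINE_ENDPOINT = "NEARLINE-ENDPOINT-UUID"

# Bundle the day's files into a tar stream and compress it in parallel
# with pigz across 18 threads, as described above.
with open(archive, "wb") as out:
    tar = subprocess.Popen(
        ["tar", "--create", "--file", "-", "--files-from", file_list],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["pigz", "-p", "18"], stdin=tar.stdout, stdout=out, check=True)
    tar.stdout.close()
    if tar.wait() != 0:
        raise RuntimeError("tar failed")

# Submit the transfer to NCSA Nearline, wait for the Globus task to
# finish, then purge the local copy from the file system.
result = subprocess.run(
    ["globus", "transfer",
     f"{LOCAL_ENDPOINT}:{archive}",
     f"{NEARLINE_ENDPOINT}:/backups/raw-{backup_date}.tgz",
     "--jmespath", "task_id", "--format", "unix"],
    capture_output=True, text=True, check=True,
)
subprocess.run(["globus", "task", "wait", result.stdout.strip()], check=True)
os.remove(archive)
```

Submitting each run as a dated job is what keeps a busy batch system from corrupting the schedule: a delayed job still archives the files for the date it was given rather than whatever day it happens to run.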

BETYdb

The BETYdb backup script runs every night at 23:59.

The script creates a daily backup every day of the month. On Sundays it also creates a weekly backup, on the last day of the month a monthly backup, and on the last day of the year a yearly backup. Existing backups are overwritten: for example, on the 1st of each month the script creates a backup called bety-d-1 containing that day's backup, replacing the copy from the previous month. See the script for the rest of the file names; a sketch of the rotation follows.
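The rotation is easiest to see as code. Below is a minimal sketch of which slots a given night's backup would overwrite, assuming the scheme above; only the bety-d-&lt;day&gt; name appears in the text, so the weekly (bety-w-*), monthly (bety-m-*), and yearly (bety-y-*) names are assumed analogues.

```python
from datetime import date, timedelta

def backup_names(day):
    """Return the rotation slot names a backup made on `day` overwrites.

    Only bety-d-<day-of-month> is confirmed by the docs; the other
    name patterns are assumed analogues.
    """
    names = [f"bety-d-{day.day}"]                  # daily slot, reused every month
    if day.weekday() == 6:                         # Sunday -> weekly backup
        names.append(f"bety-w-{day.isocalendar()[1]}")
    if (day + timedelta(days=1)).day == 1:         # last day of the month
        names.append(f"bety-m-{day.month}")
    if (day.month, day.day) == (12, 31):           # last day of the year
        names.append(f"bety-y-{day.year}")
    return names

print(backup_names(date(2020, 3, 1)))    # ['bety-d-1', 'bety-w-9'] -- a Sunday
print(backup_names(date(2020, 12, 31)))  # ['bety-d-31', 'bety-m-12', 'bety-y-2020']
```

Because slot names repeat (bety-d-1 is reused every month), the rotation keeps at most one month of dailies, one year of monthlies, and so on, trading history depth for bounded storage.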

These backups are copied with CrashPlan to a central location and should allow recovery in case of a catastrophic failure.

See Also

  • View the script

  • Description of Blue Waters' nearline storage system: https://bluewaters.ncsa.illinois.edu/data

  • GitHub issues: https://github.com/terraref/computing-pipeline/issues/87 and https://github.com/terraref/computing-pipeline/issues/384