Preserve

Overview

Researchers should ensure that all research data, regardless of format, are stored securely and backed up or copied regularly. During and after a research project, it is the responsibility of the principal investigator (PI), through a sound Data Management Plan, to specify which data need to be archived and preserved and where they should be preserved.

Sustainable Data Formats

Overview

The file format in which data are stored and archived is a primary factor in whether the data can be used in the future. As the custodian of the primary data, the researcher should adopt an orderly system of data organization and communicate it to all members of the research group and, where applicable, to the appropriate administrative personnel.
Following standards for file formats and file naming helps ensure that data can be uniquely identified and remain accessible for future use.

When selecting tools for storing your data and preparing it for archiving, pay special attention to the output formats of your data. Data stored in a proprietary or obsolete format may be unusable to other researchers.


Accessible Formats

Formats more likely to be accessible in the future are:

  • Non-proprietary
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Preferred Formats (general)

  • PDF/A – not Microsoft Word
  • ASCII – not Microsoft Excel
  • MPEG-4 – not QuickTime
  • TIFF or JPEG 2000 – not GIF or JPG
  • XML or RDF – not RDBMS
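
The general preferences above can be turned into a quick inventory check for a project folder. The sketch below assumes a hypothetical function `flag_files` and a partial extension mapping derived from the list; it flags candidates for review rather than rejecting files, since the detailed tables below also accept some of these formats (for example, JPEG files originally created as JPEG). It inspects only file extensions, so it cannot confirm that a .pdf is PDF/A or that a .tif is uncompressed.

```python
from pathlib import Path

# Hypothetical, partial mapping built from the general preferences above:
# file extension -> format suggested for long-term preservation.
SUGGESTED_ALTERNATIVES = {
    ".doc": "PDF/A",
    ".docx": "PDF/A",
    ".xls": "ASCII text (e.g. CSV)",
    ".xlsx": "ASCII text (e.g. CSV)",
    ".mov": "MPEG-4",  # QuickTime movie
    ".gif": "TIFF or JPEG 2000",
    ".jpg": "TIFF or JPEG 2000",
    ".jpeg": "TIFF or JPEG 2000",
}


def flag_files(root: str) -> list[tuple[str, str]]:
    """List (relative path, suggested format) for files worth reviewing.

    Only the extension is checked, so this cannot tell, for example,
    whether a .pdf is really PDF/A or a .tif is uncompressed.
    """
    base = Path(root)
    flagged = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        suggestion = SUGGESTED_ALTERNATIVES.get(path.suffix.lower())
        if suggestion is not None:
            flagged.append((path.relative_to(base).as_posix(), suggestion))
    return flagged
```

For example, `for name, alt in flag_files("my_project"): print(name, "->", alt)` lists each flagged file with a suggested alternative.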

Preferred Formats (detailed)

Below are tables that list different data types and the preferred formats for long-term preservation of each. Other acceptable formats are also listed, but these are less likely to ensure the long-term preservation of the data. Not all of these formats are accepted in the Mines institutional repository.


Digital Audio Data

Preferred and Other Acceptable Formats
  • Free Lossless Audio Codec (FLAC) (.flac)

  • Waveform Audio Format (WAV) (.wav)

  • MPEG-1 Audio Layer 3 (.mp3) - spoken word audio only

  • MPEG-1 Audio Layer 3 (.mp3)

  • Audio Interchange File Format (AIFF) (.aif) 

Digital Image Data

Preferred Formats
  • TIFF version 6 uncompressed (.tif)

Viewer: OMERO supports conversion, viewing, and metadata for biological microscope slides and other TIFF files.

Other Acceptable Formats
  • JPEG (.jpeg, .jpg), only if originally created in this format

  • TIFF (other versions) (.tif, .tiff)

  • JPEG 2000 (.jp2, .jpm)

  • Adobe Portable Document Format (PDF/A, PDF) (.pdf)

  • Photoshop files (.psd)

  • Standard applicable RAW image (.raw)

Digital Video Data

Preferred and Other Acceptable Formats
  • MPEG-4 High Profile (.mp4)

  • Motion JPEG 2000 (.mj2)

  • JPEG 2000 (.jp2, .jpm)

Chemistry Data:

      (spectroscopy data; plots with contours, peak position and intensity)

Preferred Formats

Convert NMR, IR, Raman, UV, and mass spectrometry files to JCAMP (JCAMP-DX) format to make them easier to share.



JCAMP file viewers: JSpecView, ChemDoodle
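
As an illustration of what a JCAMP file contains, the sketch below writes a minimal JCAMP-DX 4.24 text block from paired x and y values using the uncompressed (XY..XY) data form. The function name `to_jcamp_dx` and its defaults (an infrared absorbance spectrum, placeholder ORIGIN and OWNER values) are assumptions. Instrument software usually exports the compressed ##XYDATA form, and any file produced this way should be checked in a viewer such as JSpecView before it is shared.

```python
def to_jcamp_dx(title, xs, ys, data_type="INFRARED SPECTRUM",
                xunits="1/CM", yunits="ABSORBANCE",
                origin="UNKNOWN", owner="UNKNOWN"):
    """Return a minimal JCAMP-DX 4.24 text block with uncompressed (XY..XY) data.

    Defaults are placeholders for an IR absorbance spectrum; set them to
    match the real measurement before sharing the file.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##ORIGIN={origin}",
        f"##OWNER={owner}",
        f"##XUNITS={xunits}",
        f"##YUNITS={yunits}",
        "##XFACTOR=1",  # values are written unscaled
        "##YFACTOR=1",
        f"##FIRSTX={xs[0]}",
        f"##LASTX={xs[-1]}",
        f"##NPOINTS={len(xs)}",
        f"##FIRSTY={ys[0]}",
        "##XYPOINTS=(XY..XY)",
    ]
    # One "x, y" pair per line.
    lines += [f"{x}, {y}" for x, y in zip(xs, ys)]
    lines.append("##END=")
    return "\n".join(lines) + "\n"
```

The result can be saved with, for example, `open("sample_a.jdx", "w", encoding="ascii").write(...)`, keeping the file to plain ASCII text.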


Geospatial Data:

      (vector and raster)

Preferred and Other Acceptable Formats
  • ESRI Shapefile (.shp, .shx, .dbf; optional: .prj, .sbx, .sbn)

  • geo-referenced TIFF (.tif, .tfw)

  • CAD data (.dwg)

  • tabular GIS attribute data

  • Keyhole Mark-up Language (KML) (.kml)

  • ESRI Geodatabase format (.mdb)

  • MapInfo Interchange Format (.mif) for vector data

  • Adobe Illustrator (.ai)

  • CAD data (.dxf, .svg)

  • Binary formats of GIS and CAD packages
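
Because a shapefile is split across several files, a common preservation problem is archiving an incomplete set. A minimal sketch, assuming a single folder of files and a hypothetical helper `check_shapefiles`, reports which of the mandatory components named above (.shp, .shx, .dbf) are missing for each shapefile; the optional parts are not checked.

```python
from pathlib import Path

REQUIRED_PARTS = (".shp", ".shx", ".dbf")  # mandatory components listed above


def check_shapefiles(folder: str) -> dict[str, list[str]]:
    """Map each shapefile name to the mandatory components missing from `folder`.

    Optional parts (.prj, .sbx, .sbn) are not required and are ignored here.
    """
    parts: dict[str, set[str]] = {}
    for path in Path(folder).iterdir():
        if path.is_file():
            parts.setdefault(path.stem, set()).add(path.suffix.lower())
    report = {}
    for name, suffixes in sorted(parts.items()):
        # Only names with a .shp or .shx look like shapefiles; a lone .dbf
        # may simply be a dBase table.
        if suffixes & {".shp", ".shx"}:
            report[name] = [ext for ext in REQUIRED_PARTS if ext not in suffixes]
    return report
```

An empty list for a name means the set is complete; any listed extension should be located before the data are deposited.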

     

Qualitative Data:

      (textual)

Preferred and Other Acceptable Formats
  • eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xsd)

  • Rich Text Format (.rtf)

  • plain text data, UTF-8 (Unicode) (.txt)

  • plain text data, ASCII (.txt)

  • Hypertext Mark-up Language (HTML) (.html)

  • widely-used proprietary formats, e.g. MS Word (.doc/.docx)

  • LaTeX (.tex)
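
Since both UTF-8 and ASCII plain text appear above, it can help to check which one a text file actually uses before depositing it. The sketch below uses hypothetical helpers: `describe_encoding` reports whether raw bytes decode as ASCII (a subset of UTF-8), as UTF-8, or as neither, and `convert_to_utf8` re-encodes text whose original encoding is already known (cp1252, an older Windows encoding, is only an example). Bytes that fail both checks are in some other encoding that has to be identified separately.

```python
def describe_encoding(data: bytes) -> str:
    """Classify raw text bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"  # pure ASCII is also valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode text from a known source encoding (e.g. 'cp1252') as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```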

Quantitative Data:

      (tabular data with extensive metadata)
In this case, the table contains the data matrix plus metadata such as variable labels, code labels, and defined missing values.

Preferred and Other Acceptable Formats
  • SPSS portable format (.por)

  • delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information

  • structured text or mark-up file containing metadata information, e.g. DDI XML file

  • delimited text of a given character set, using as delimiters only characters that do not occur in the data (.txt)

  • MS Access (.mdb/.accdb)

  • proprietary formats of statistical packages (e.g. SPSS .sav or Stata .dta)
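
The "delimited text and command ('setup') file" option above pairs a plain data file with a script that restores the metadata. A minimal sketch, assuming hypothetical names (`delimited_with_spss_setup`, the `variables` dictionary keys, `data.csv`) and numeric variables only, produces a header-less CSV plus SPSS command syntax carrying variable labels, value labels, and missing-value codes. Stata or SAS setup files would follow the same idea with their own commands, and the generated syntax should be tested in SPSS before archiving.

```python
import csv
import io


def delimited_with_spss_setup(rows, variables, data_file="data.csv"):
    """Return (data_text, setup_text): header-less CSV plus SPSS command syntax.

    `variables` is a list of dicts with the hypothetical keys "name", "label",
    and optionally "value_labels" (code -> label) and "missing" (codes).
    This sketch treats every variable as numeric.
    """
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)

    def quoted(text):
        # SPSS string literal: single quotes, embedded quotes doubled.
        return "'" + str(text).replace("'", "''") + "'"

    names = " ".join(v["name"] for v in variables)
    setup = [f'DATA LIST FILE={quoted(data_file)} FREE (",") / {names}.']
    labels = " /".join(f"{v['name']} {quoted(v['label'])}" for v in variables)
    setup.append(f"VARIABLE LABELS {labels}.")
    for v in variables:
        if v.get("value_labels"):
            pairs = " ".join(f"{code} {quoted(label)}"
                             for code, label in v["value_labels"].items())
            setup.append(f"VALUE LABELS {v['name']} {pairs}.")
        if v.get("missing"):
            codes = ", ".join(str(code) for code in v["missing"])
            setup.append(f"MISSING VALUES {v['name']} ({codes}).")
    setup.append("EXECUTE.")
    return buffer.getvalue(), "\n".join(setup) + "\n"
```

Keeping the data file free of labels and the labels in a readable text script means both parts stay usable even without the statistical package.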

Quantitative Data:

      (tabular data with minimal metadata)
In this case, the table contains the data matrix, possibly with column headings or variable names, but probably no other metadata or labeling.

Preferred and Other Acceptable Formats
  • comma-separated values (CSV) file (.csv)

  • tab-delimited file (.tab) including delimited text of a given character set with SQL data definition statements where appropriate

  • eXtensible Mark-up Language (.xml) according to an appropriate Document Type Definition (DTD) or schema (.xsd)

  • Rich Text Format (.rtf)

  • Plain text data, ASCII (.txt)

  • delimited text of a given character set, using as delimiters only characters that do not occur in the data (.txt)

  • MS Word (.doc/.docx)

  • MS Access (.mdb/.accdb)

  • MS Excel (.xls/.xlsx)

  • OpenDocument Spreadsheet (.ods)

  • dBase (.dbf)

Scripts and Computer Code

Preferred Formats
Contact Research Support Services for the latest recommendations.

Documentation

Preferred and Other Acceptable Formats
  • Open Document Text (.odt)

  • Rich Text Format (.rtf)

  • HTML (.htm, .html)

  • PDF/A or PDF (.pdf)

  • plain text (.txt)

  • widely-used proprietary formats, e.g. MS Word (.doc/.docx) or MS Excel (.xls/.xlsx)

  • eXtensible Mark-up Language (.xml) according to an appropriate Document Type Definition (DTD) or schema (.xsd)


Above tables adapted from:

Physical Samples

Overview

Under the NSF research results dissemination and sharing guidelines, the definition of research data includes samples and physical collections. The researcher is responsible for management of physical samples during the research project. Mines is required to archive physical samples if they are needed to verify or reproduce research results or to extend the research in new directions.


Physical Storage

Some departments have physical storage space for samples used during student research; check with your department. If no departmental storage is available, arrangements may be possible with the Mines Geology Museum. The researcher needs to consult the Museum Director about storage after the project ends and about any related budget issues. Ideally, this consultation takes place before the data management plan for the proposal is written. The project budget must include the total cost of archival storage, including any specific equipment and facilities needed for the proposed storage period. If sample storage requirements exceed the present capabilities of existing facilities, the plan must also describe how cost accounting and budgeting procedures will be developed and implemented.


Documentation for Physical Collections and Samples

Create a document to keep on file with the proposal that includes the following:

  • A general description of sample type(s), such as polished sections of metallurgical alloys; microscope slides with mounted sections of tissue; concrete samples; ampoules of liquid; etc.
  • A general description of sample size
  • An estimate of the number of samples that will be generated during the study. An order-of-magnitude estimate (tens, hundreds, or thousands of samples) is sufficient
  • A description of any special storage conditions, such as temperature control, vacuum, or isolation
  • A description of any special security or access issues for the samples
  • The time period for archival storage, which must be specified. In some cases, only an estimate may be possible, but the conditions for extending the archival period should be clearly described
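
The documentation items above map naturally onto a structured record, which makes it easier to spot gaps before the proposal is submitted. The sketch below is a hypothetical Python data class (`SampleCollectionRecord` and its field names are assumptions, not a Mines template); it reports the sample estimate as an order of magnitude, as the list allows, and lists any text fields left blank.

```python
from dataclasses import dataclass


@dataclass
class SampleCollectionRecord:
    """Hypothetical record covering the documentation items listed above."""
    sample_types: str          # e.g. "polished sections of metallurgical alloys"
    sample_size: str           # general description of size
    estimated_count: int       # rough count; only the order of magnitude matters
    storage_conditions: str    # e.g. "temperature control", or "none"
    security_or_access: str    # special security or access issues, or "none"
    retention_years: float     # specified (or estimated) archival period
    extension_conditions: str  # conditions for extending the archival period

    def __post_init__(self):
        if self.estimated_count < 0 or self.retention_years <= 0:
            raise ValueError("count must be >= 0 and retention period > 0")

    def count_magnitude(self) -> str:
        """Describe the estimate as tens, hundreds, thousands, and so on."""
        if self.estimated_count < 10:
            return "fewer than ten"
        labels = ["tens", "hundreds", "thousands",
                  "tens of thousands", "hundreds of thousands"]
        index = len(str(self.estimated_count)) - 2  # 2 digits -> "tens"
        return labels[index] if index < len(labels) else "a million or more"

    def missing_fields(self) -> list[str]:
        """Names of text fields that were left blank."""
        return [name for name, value in vars(self).items()
                if isinstance(value, str) and not value.strip()]
```
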

Sponsor Requirements for Preservation

Overview

Many federal agencies and other funders expect researchers to share their data after a research project ends, and some journals and societies now require data archiving. Depending on the type of research, various subject domain and funding agency requirements exist for how soon data are expected to be made available and for how long.


Mines Policy

Mines recommends that research data be archived for a minimum of three years after final project close-out, with original data retained wherever possible. Researchers should also review funder and sponsor requirements.


Federal Agency Expectations

Most funding agency data sharing policies ask that data from projects be shared in a timely manner, recognizing that what counts as “timely” will vary from project to project. Many funding agencies allow embargo periods for political, commercial, or patent reasons, as long as they are explained in the Data Management Plan. When retention periods are specified, it is important to understand when the clock starts. OMB Circular A-110 states that the retention period is three years from the date the final financial report is submitted, and NIH uses the same language. The NSF General Grant Conditions, however, state that records must be retained for three years after submission of all required reports. Researchers should check the retention requirements of each sponsor they work with. Here are some examples of funding agency data availability and retention periods:

  • NSF Engineering Directorate: Accessible for a minimum of three years after the end of the project or public release, whichever comes first. Release “at the earliest reasonable time.”
  • NSF Earth Sciences Division: Made openly available no later than two years after data collection
  • NSF Ocean Sciences Division: Made openly available no later than two years after data collection
  • NOAA: Available no later than two years after data collection
  • NIH: Available no later than the acceptance for publication of main findings from final data set
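
Because the retention clock starts at different events for different sponsors, the end date depends on which event applies. A minimal sketch, assuming the three-year periods described above and hypothetical event names, computes the earliest date records could be discarded; the sponsor's actual award terms take precedence.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Add whole years; a Feb 29 start moves to Feb 28 in a non-leap year."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def retention_end(events: dict[str, date], trigger: str, years: int = 3) -> date:
    """Earliest date records may be discarded, counted from the trigger event.

    `events` maps hypothetical event names (e.g. "final_financial_report")
    to the dates those events occurred.
    """
    return add_years(events[trigger], years)
```

For example, with a hypothetical final financial report submitted on 15 March 2024 and the last required report on 30 June 2024, counting from the financial report gives 15 March 2027 and counting from all required reports gives 30 June 2027. When several sponsors are involved, keeping records until the latest computed date satisfies all of them.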

Dos and Don'ts

Overview

This page provides some quick dos and don’ts to make preserving your data easier.


Planning

  • Create a sound data management plan that addresses both funder requirements and the expectations of your field regarding data collection, management, and sharing
  • Estimate the amount of data required for your project as early as possible
  • Include costs for data storage (including storage of back-up copies) in proposal budgets
  • Notify Research Support Services and CCIT of upcoming storage requirements so they can help with planning and avoid delays
  • If access to data needs to be restricted, be sure a clear rationale is provided to funders and describe any limitations or permissions that may exist for access
  • Address how the data will be accessed after the project ends
  • Keep in mind that a bit more “up-front” planning may actually mean less work in the long-term
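
Estimating storage early, including backup copies, is simple arithmetic. A rough sketch, assuming a steady rate of new files, decimal units, and a hypothetical function name:

```python
def estimate_storage_gb(files_per_month: float, avg_file_mb: float,
                        months: int, copies: int = 3) -> float:
    """Rough storage need in GB, counting every stored copy of the data.

    copies=3 mirrors the master, local, and external copies recommended
    under End of Project; decimal units are used (1 GB = 1000 MB).
    """
    primary_mb = files_per_month * avg_file_mb * months
    return primary_mb * copies / 1000
```

For example, 200 files a month averaging 50 MB over a 24-month project is 240 GB of primary data, or 720 GB once three copies are counted; allowing extra headroom for growth is prudent.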

Project Work

  • Enact a robust backup plan to ensure data are not lost and that they are shared with other project researchers securely and safely
  • Establish a data organization structure and file-naming convention, and use it consistently
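
A file-naming convention is easiest to use consistently when names can be generated and checked automatically. The convention below (date, project, short description, two-digit version, lowercase extension, e.g. `20240501_core-samples_xrd-scan-site-3_v02.csv`) is a hypothetical example rather than a Mines standard, and the helper names are assumptions:

```python
import re
from datetime import date

# Hypothetical convention: YYYYMMDD_project_description_vNN.ext
NAME_PATTERN = re.compile(
    r"(?P<date>\d{8})_(?P<project>[a-z0-9-]+)_(?P<desc>[a-z0-9-]+)"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)"
)


def slug(text: str) -> str:
    """Lowercase text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_name(day: date, project: str, desc: str, version: int, ext: str) -> str:
    """Build a file name that follows the convention above."""
    return f"{day:%Y%m%d}_{slug(project)}_{slug(desc)}_v{version:02d}.{ext.lower().lstrip('.')}"


def follows_convention(name: str) -> bool:
    """True if the name matches the pattern and its date is a real date."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return False
    stamp = match["date"]
    try:
        date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
    except ValueError:
        return False
    return True
```

Leading with an ISO-style date makes names sort chronologically, and the version suffix avoids ambiguous names such as "final" or "new".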

End of Project

  • Based on the data management plan, determine which data should be preserved, and work with Research Support Services to deposit them in the institutional repository or another appropriate repository
  • Have three copies of your data—the original master file, a local backup (e.g., on an external hard drive) and an external backup (e.g., on a managed networked drive or on a web-based storage service).
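
Having three copies helps only if they still match. A common way to check this, not described elsewhere in this guide, is to compare cryptographic checksums (fixity checking). The sketch below, with hypothetical function names and folder layout, computes a SHA-256 digest of the same file in each location and reports whether all copies are present and identical:

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(relative_name: str, locations: list[str]) -> dict[str, Optional[str]]:
    """Checksum of the file in each location, or None if that copy is missing."""
    result = {}
    for location in locations:
        path = Path(location) / relative_name
        result[location] = sha256_of(path) if path.is_file() else None
    return result


def copies_agree(checksums: dict[str, Optional[str]]) -> bool:
    """True only if every copy exists and all checksums are identical."""
    values = list(checksums.values())
    return None not in values and len(set(values)) == 1
```

Recording the checksums alongside the data also lets a repository or a later user confirm that the deposited files are unchanged.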

Sharing Data

  • Ensure data do not become inaccessible if someone leaves the project
  • Determine whether any data need to be restricted, and work with Research Support Services to enable gatekeeping