Metadata Creation


Overview

As the custodian of the primary data, the researcher should ensure that project data are properly documented in order to facilitate current use and enable future discovery and sharing. Document your data and your data organization protocol as early as you can, even before data collection begins; doing so will make data documentation easier and reduce the likelihood that you will forget aspects of your data later in the research project.


The following is a list of elements and aspects of your research project and data that should be documented, regardless of discipline. At a minimum, this information should be stored in a readme.txt file or the equivalent, together with the data. The Mines Research Support Services group uses this documentation to create the required metadata for the Mines institutional repository.


Elements marked with an * are required by the Mines institutional repository.


See the Deposit with Mines page to understand the submittal process.


General Information

  • *Title: name of the dataset or research project that produced it
  • *Creator: names and addresses of the organizations or people who created the data
  • Identifier: number used to identify the data, even if it is just an internal project reference number
  • *Researcher identifier: a unique and persistent digital identifier that distinguishes you from every other researcher or author; requires registration with ResearcherID or ORCID
  • *Abstract: a concise description or summary of the dataset
  • *Subject: keywords or phrases describing the subject or content of the data; these are additional search terms that are not listed in the abstract
  • *Funders: names of the organizations or agencies that funded the research
  • *Award: the grant number(s) if the data was generated from work on a grant
  • *Rights: any known intellectual property rights held for the data (e.g., copyright)
  • Publication citations: any citations that describe or use the data
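
As a rough illustration, the General Information elements above could be captured in a small script that reports which required (*) elements are still missing and renders the rest as readme.txt text. This is only a sketch: the function names, the "Element: value" layout, and the example values are assumptions made for illustration, not a format prescribed by the Mines institutional repository.

```python
# Required General Information elements (those marked with * in the list above).
# Note: Award only applies when the data came from grant-funded work.
REQUIRED = ["Title", "Creator", "Researcher identifier", "Abstract",
            "Subject", "Funders", "Award", "Rights"]
OPTIONAL = ["Identifier", "Publication citations"]


def missing_required(record):
    """Return the required element names that are absent or blank."""
    return [name for name in REQUIRED if not str(record.get(name, "")).strip()]


def render_readme(record):
    """Format the record as 'Element: value' lines, required elements first."""
    lines = ["General Information", ""]
    for name in REQUIRED + OPTIONAL:
        if name in record:
            lines.append(f"{name}: {record[name]}")
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only.
example = {
    "Title": "Example dataset title",
    "Creator": "Example Researcher, Example Department",
    "Abstract": "One-paragraph summary of the dataset.",
}

if __name__ == "__main__":
    print(render_readme(example))
    print("Still missing:", ", ".join(missing_required(example)))
```

Running the check before deposit gives a quick list of the elements that still need to be written down.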

Data Characteristics

  • *Access information: if you deposited the data in a repository external to Mines, describe where and how the data can be accessed by other researchers
  • *Access restrictions: if there are restrictions on making the data openly accessible, indicate the nature of each restriction and how long it needs to be in place
  • *Language: language(s) of the intellectual content
  • *Dates: key dates (and times) associated with the data, including: funding period; project start and end date; release date; time period covered by the data (coverage); and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule, date of last update
  • *Date of publication: the date the data was made available, created or compiled as an entity for use by others
  • *Location: spatial coverage of the data or sampling site information 
  • Methodology: how the data was generated, including equipment or software used, experimental protocol, and other details one might include in a lab notebook
  • *Data processing: during your research, record information on how the data has been altered or processed
  • Sources: citations to material for data derived from other sources, including details of where the source data is held and how it was accessed
  • Unit of analysis: the major entity that is being analyzed in the study
  • *Type: the dominant kinds of data; choose from Collection, Event, Image, Moving Image, Physical Object, Software, Sound, Text
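
Two of the elements above lend themselves to a simple automated check: Type must come from a fixed list, and date ranges such as the project start and end date should be in order. The sketch below assumes dates are written as YYYY-MM-DD, which is a convention chosen here for illustration and not one stated in this guide; the function names are likewise hypothetical.

```python
from datetime import date

# Type values exactly as listed in the Type element above.
ALLOWED_TYPES = {"Collection", "Event", "Image", "Moving Image",
                 "Physical Object", "Software", "Sound", "Text"}


def check_type(value):
    """Return True if value is one of the listed Type terms."""
    return value.strip() in ALLOWED_TYPES


def check_date_range(start, end):
    """Parse two YYYY-MM-DD strings; return True if start is not after end.

    Raises ValueError if either string is not a valid YYYY-MM-DD date.
    """
    return date.fromisoformat(start) <= date.fromisoformat(end)


if __name__ == "__main__":
    print(check_type("Moving Image"))
    print(check_date_range("2020-01-01", "2020-12-31"))
```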

File Characteristics

  • Count: total number of files
  • *Size: how much space the dataset requires on a computer server
  • *File names: list of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
  • *File formats: format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data
  • *File structure: organization of the data file(s) and the layout of the variables, when applicable
  • *Variable list: list of variables in the data files, when applicable
  • *Code lists: explanation of codes or abbreviations used in either filenames or the variables in the data files (e.g. '999 indicates a missing value in the data')
  • *Versions: date/time stamp for each file, with a separate ID for each version
  • Checksums: values computed from each file's contents, used to test whether the files have changed over time
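
Several of the File Characteristics above (count, size, file names, formats, date/time stamps, and checksums) can be gathered automatically. The following is a minimal sketch that assumes all data files live under a single directory and uses SHA-256 as the checksum; this guide does not name a checksum algorithm, so that choice, along with the function and field names, is an assumption.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(data_dir):
    """Return one dict per file under data_dir, sorted by path."""
    root = Path(data_dir)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "name": path.relative_to(root).as_posix(),
            "format": path.suffix.lstrip(".").upper(),  # file extension only
            "size_bytes": stat.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
            "sha256": sha256_of(path),
        })
    return rows


def summarize(rows):
    """Return the file count and total size for an inventory."""
    return {"count": len(rows),
            "total_size_bytes": sum(row["size_bytes"] for row in rows)}


if __name__ == "__main__":
    rows = inventory(".")
    for row in rows:
        print(row["name"], row["size_bytes"], row["modified_utc"], row["sha256"])
    print(summarize(rows))
```

Re-running the inventory later and comparing the checksum column against the stored values shows whether any file has changed. Note that the format column is derived from the file extension alone, so software requirements still need to be documented by hand.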