Share and Discover
After a research project ends, it may be necessary to make the data, or at minimum information about those data, available to other researchers per the Data Management Plan. A good way to disseminate research data to colleagues and more widely is to deposit it in a repository or archive. Controlling access to these data can then be accomplished in a variety of ways, for example using passwords, encryption, or a permissions authorization system.
Local Mines Institutional Repository
- Mines supports an institutional repository
- Suitable for end-of-project deposit in order to meet Federal requirements to make data publicly accessible
- Various levels of access are available (e.g. unrestricted or permission-based)
- Multiple types of files can be deposited
- End users can download files
- Supports small to medium-sized datasets
Researchers may also deposit with external repositories, such as subject- and discipline-based repositories as well as national and international data centers. Some journals and societies may also offer repository services for data used in articles. Each repository has its own requirements in terms of domain, data re-use and access, file formats, and metadata. Researchers also need to ensure that the chosen repository meets the guidelines set forth by their funder.
Use re3data.org to find a repository in your field.
Depending on the research discipline, data can often be deposited in one or more data centers (or repositories) that will provide access to the data. These repositories may have specific requirements regarding subject/research domain, data re-use and access, file format and data structure, and metadata.
Institutional Repository Information
Mines supports an institutional repository that can be used for end-of-project deposit in order to meet Federal public access requirements. Your data will be preserved according to the digital preservation standards of the repository host, Colorado State University (CSU). The Arthur Lakes Library group works closely with CSU to understand its requirements.
The institutional repository has file size limitations; therefore researchers must contact the Arthur Lakes Library to ensure sufficient storage availability. For large datasets, other arrangements with ITS may be required to support local storage and access. Alternatively, the researcher may opt for off-campus archival storage.
There are many different external repositories in which to deposit data. In many cases, repositories and data centers will have their own policies and procedures regarding transfer, access permissions, data formats, metadata creation, retention periods, and costs. If you are going to use a repository or data center, check its policies before including it in a Data Management Plan. Researchers who deposit data externally still need to create metadata that can be added to the Mines institutional repository in order to facilitate discovery and re-use.
- Cambridge Structural Database – small molecule crystal structures
- ChemSpider – free-to-access collection of chemical structures and their associated information
- eCrystals – x-ray crystallographic data
- PubChem – NCBI’s repository of bioactivity/bioassay data and information for “small” molecules (i.e. not macromolecular). Both text-based and structure-based search tools are provided
- Cooperative Association for Internet Data Analysis (CAIDA) – Archive of data for scientific analysis of network functions
Environmental and Geosciences
- Marine Geoscience Data System (MGDS) – A data portal, hosted at the Lamont-Doherty Earth Observatory (Columbia University), for a number of NSF-supported marine research programs
- National Climatic Data Center (NCDC) – Meteorology and paleoclimatology
- National Oceanographic Data Center (NODC) – World-wide marine environmental and ecosystem data
- GEON – Portal for datasets and visualization tools
- National Snow and Ice Data Center (NSIDC) – Cryospheric datasets from ground field research and satellites
GIS and Geography
- Geodata.gov – One-stop for federal, state and local geographic data
- GeoCommons.com – GIS file repository and finding tool
- Federal Geographic Data Committee – Provides access to the National Spatial Data Infrastructure (NSDI) Clearing House Network and the geodata.gov portal
- National Geographic Data Center – Archive of datasets
Life and Biological Sciences
- Dryad – An international repository of data underlying peer-reviewed articles in the basic and applied biosciences. It has been developed by the National Evolutionary Synthesis Center and the University of North Carolina Metadata Research Center
- National Biological Information Infrastructure – This portal links to a wide variety of data sources, such as the Fisheries and Aquatic Resources Data Access Wizard and the Biogeographic Information and Observation System (BIOS). The portal also provides a full list of all data sets
- Protein DataBank – Experimentally determined structures for macromolecules (protein and nucleic acids). The site includes search and visualization tools
- UniProt – Free protein sequences
- HEP Data – high-energy physics reaction database of numerical HEP scattering cross sections
- NIST Physical Standards Laboratory – physical reference data and property tables
- National Nuclear Data Center – includes nuclear structure, reaction and decay databases
- ICPSR (Inter-university Consortium for Political and Social Research) – A non-profit, membership-based data archive located at the University of Michigan. The UO is a member of ICPSR, which allows students, staff, and faculty to access ICPSR data files and documentation for research.
- Dataverse Network – A collection of social science research data contained in virtual data archives called “dataverses”. It is maintained by the IQSS (Institute for Quantitative Social Sciences at Harvard). You can create your own “dataverse” and upload your data, subject to certain terms.
Directories of repositories
- re3data (“REgistry of REsearch REpositories”) – List of repositories
- DataBib – List of repositories
- DataCite List of Repositories – Compiled by the British Library, BioMed Central, and the UK’s Digital Curation Centre.
- Distributed Data Curation Center: Other Data Repositories – Managed by Purdue University Libraries, the Distributed Data Curation Center lists more than 50 open data repositories from a range of science disciplines.
- Gene Expression Omnibus – The Gene Expression Omnibus (GEO) is an open data repository which provides access to microarray, next-generation sequencing, and other forms of functional genomic data submitted by the scientific community.
- Global Change Master Directory – The Global Change Master Directory, maintained by the Earth Sciences Directorate at the National Aeronautics and Space Administration (NASA), provides access to more than 25,000 earth and environmental science data sets relevant to global change and Earth science research.
- MIT Data Management and Publishing: Sharing Your Data – The MIT Libraries’ subject guide on data management and publishing includes a list of open data repositories spanning the disciplines of astronomy, atmospheric science, biology, chemistry, earth science, oceanography and space science.
- Oceanographic Data Repositories – Funded by the National Science Foundation, the Biological and Chemical Oceanography Data Management Office (BCO-DMO) provides access to several oceanographic data repositories created by the US Joint Global Ocean Flux Study and US Global Ocean Ecosystem Dynamic programs.
- Open Access Directory: Data Repositories – Launched in 2008 and hosted by the Graduate School of Library and Information Science at Simmons College, the Open Access Directory is a wiki that lists links to over 50 open data repositories in the disciplines of archaeology, biology, chemistry, environmental sciences, geology, geosciences and geospatial data, marine sciences, medicine and physics, as well as multidisciplinary open data repositories.
- Public Data Sets on Amazon Web Services – Amazon Web Services provides a centralized place to download public domain and non-proprietary astronomy, biology, chemistry and climatology data sets.
All data (even your own) used in publications, presentations, posters, etc. should be cited to:
- provide access and proper credit
- allow for verification of results
- encourage reuse of the data
That is, data citation is just as important as citing the literature consulted. It helps maintain the chain of the scholarly record.
To be most effective, a data citation should include at least the following elements. The utility of these elements will depend on the research discipline, source data center/repository, and data format.
- Responsible party (e.g., study PI, sample collector, government agency)
- Name of table, map, or dataset with any applicable unique IDs
- Date published (the date the data were created or posted online)
- Name of data center, repository, and/or publication
- Type of data file
- Analysis software, if required
- Date accessed
- URL, DOI, or other persistent link
- American Geophysical Union (AGU): Policy on Referencing Data in and Archiving Data for Publications
- Federation of Earth Science Information Partners (ESIP) Interagency Data Stewardship/Citations
- Citing and linking to the Gene Expression Omnibus (NCBI) database
- Dryad Data Citation Guidelines
- Social science data at MIT citation guide
- NOAA Paleoclimatology Data Citations
- ICPSR recommended citation procedures
- Socioeconomic Data and Applications Center (SEDAC) guidelines for preparing citations
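To show how the citation elements listed above fit together, here is a minimal Python sketch. The `DataCitation` class, its field names, and the output layout are hypothetical and chosen only for illustration; they do not reproduce any of the styles covered by the guidelines above, so follow the style required by your publisher or repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the data citation elements (illustration only)."""
    responsible_party: str               # e.g. study PI, sample collector, agency
    title: str                           # name of table, map, or dataset
    date_published: str                  # date the data were created or posted
    repository: str                      # data center, repository, or publication
    persistent_link: str                 # URL, DOI link, or other persistent link
    file_type: Optional[str] = None      # type of data file
    software: Optional[str] = None       # analysis software, if required
    date_accessed: Optional[str] = None  # date the data were retrieved

    def format(self) -> str:
        """Join the available elements into one citation string."""
        title = self.title
        if self.file_type:
            title = f"{title} [{self.file_type}]"
        parts = [
            f"{self.responsible_party} ({self.date_published}).",
            f"{title}.",
            f"{self.repository}.",
        ]
        if self.software:
            parts.append(f"Requires {self.software}.")
        parts.append(self.persistent_link)
        if self.date_accessed:
            parts.append(f"(accessed {self.date_accessed})")
        return " ".join(parts)


# Example values taken from the ICPSR dataset cited elsewhere in this guide.
citation = DataCitation(
    responsible_party="Duncan, Otis D., and Howard Schuman",
    title="Detroit Area Study, 1971: Social Problems and Social Change in Detroit",
    date_published="1997",
    repository="Inter-university Consortium for Political and Social Research",
    persistent_link="https://doi.org/10.3886/ICPSR07325",
    file_type="Computer file",
)
print(citation.format())
```

Optional elements are simply skipped when they are not supplied, which reflects the point that the utility of each element depends on the discipline, repository, and data format.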
Citation Format Creation
If a DOI exists for your data, use the CrossRef Data Citation Formatter to generate a citation in a chosen language and style (e.g. APA or Geological Society of America). Try it by going to the link and entering this DOI: 10.3886/ICPSR07325.
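For scripted workflows, a formatted citation can also be requested programmatically. The sketch below does not use the web formatter itself. Instead it relies on DOI content negotiation through doi.org, and assumes that the registration agency for the DOI supports the `text/x-bibliography` media type and the requested citation style name. The function names are hypothetical.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa",
                           locale: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a DOI.

    Assumption: the DOI's registration agency honors the
    text/x-bibliography media type with 'style' and 'locale' parameters.
    """
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return urllib.request.Request(f"https://doi.org/{doi}",
                                  headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa", locale: str = "en-US",
                   timeout: float = 10.0) -> str:
    """Send the request and return the formatted citation text (needs network)."""
    request = build_citation_request(doi, style, locale)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_citation("10.3886/ICPSR07325")` requires network access, and the returned text depends on the metadata registered for that DOI. Treat the result as a starting point and check it against the style guide you are required to follow.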
Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/
CBS News. 2009. CBS News Poll: Energy USCBS2009-02A Version 2 [MRDF]. New York: CBS News [producer]. Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut [distributor].
National Center for Health Statistics. National Ambulatory Medical Survey, 1994. Public-use data file and documentation. ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/. 1996.
Duncan, Otis D., and Howard Schuman. Detroit Area Study, 1971: Social Problems and Social Change in Detroit [Computer file]. ICPSR07325-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 1997. doi:10.3886/ICPSR07325
Access / Restrictions
Retrieval and access procedures for restricted data depend on the data archive and on the individuals designated as responsible for providing access. Retrieval of and access to both restricted and unrestricted research data should align with funder/sponsor requirements and follow the Data Management Plan. Access can be controlled in a variety of ways, such as password protection, encryption, or a permission authorization system involving one or more Gatekeepers. Note that the researcher’s ideal choice of permission restrictions may conflict with the rules of a given repository. If data are restricted, one or more individuals responsible for authorization must be designated as Gatekeepers, whose role is to control access. Gatekeepers can be the Research Support Services group, the PI, an Office of Research Administration employee, a data center’s archivist, or whoever is so designated in the Data Management Plan. If a permission authorization system is used, the requirements and guidelines for evaluating requests and providing access need to be specified.
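The Gatekeeper arrangement described above can be sketched as a simple permission check. This is an illustrative model only, not the access system of the Mines repository or of any other repository. The `Dataset` class and the `grant_access` and `can_access` functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dataset:
    """Hypothetical record of a deposited dataset and its access settings."""
    name: str
    restricted: bool = False
    gatekeepers: Set[str] = field(default_factory=set)     # who may authorize access
    approved_users: Set[str] = field(default_factory=set)  # who has been authorized


def grant_access(dataset: Dataset, gatekeeper: str, user: str) -> None:
    """Record an approval; only a designated Gatekeeper may authorize a user."""
    if gatekeeper not in dataset.gatekeepers:
        raise PermissionError(f"{gatekeeper} is not a Gatekeeper for {dataset.name}")
    dataset.approved_users.add(user)


def can_access(dataset: Dataset, user: str) -> bool:
    """Unrestricted data are open to all; restricted data require approval."""
    if not dataset.restricted:
        return True
    return user in dataset.gatekeepers or user in dataset.approved_users


# Example: the PI acts as Gatekeeper for a restricted dataset.
survey = Dataset(name="field-survey", restricted=True, gatekeepers={"pi"})
grant_access(survey, gatekeeper="pi", user="collaborator")
print(can_access(survey, "collaborator"), can_access(survey, "visitor"))
```

A real system would also need to record the criteria used to evaluate each request, as the Data Management Plan should specify.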
Tools for End Users
This page recommends tools, or sites that will help you find tools, for manipulating, visualizing, and interacting with data, metadata, web technologies, etc. Be sure to check out the ITS software list.
Recommended Sites for Searching for Tools
- DataOne Software Tools Catalog – tools for math, science, computing, metadata, and much more
- Global Change Master Directory (GCMD) – tools for data analysis and visualization, data management and handling, and hazards management
MATLAB (Campus Computer Lab)