Data Deposit

Data deposit is the end-of-project transfer of a curated data package into a trustworthy repository to ensure discovery, citation, appropriate access controls, and long-term preservation; it is distinct from active, in-project storage. This page explains what to deposit, how to prepare and document your files, how to choose a repository, and how to meet policy, licensing, and ethical requirements, including community-governed data considerations.

What and When to Deposit

Definition and Scope

Deposit is the end-of-project transfer of a curated, final dataset and its documentation to a trustworthy repository for discovery, citation, appropriate access controls, and long-term preservation; it is distinct from active, in-project storage and from informal sharing. Refer back to your Data Planning page for lifecycle definitions and for earlier decisions about what will be created, retained, and shared.

What Belongs

Deposit the version of record for the data supporting published findings, the essential derivatives needed for comprehension or reuse, the analysis code or stable links to a tagged release, and complete documentation (README plus codebook/data dictionary).

Timing

Align deposit with funder and institutional expectations, which commonly require data to be available upon publication or within a defined post‑project window; where justified, use a time‑limited embargo with a documented rationale and lift date.

Selection and organization: 

Identify the authoritative file set; remove redundant and temporary files; impose a clear directory structure and naming convention (drawn from your DMP); freeze tool-dependent outputs to stable exports where feasible (e.g., convert .docx files to .rtf); and document any nonstandard dependencies in the README.
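Parts of this tidy-up pass can be automated. The sketch below flags leftover temporary files and names that break a convention; the temp-file suffixes and the `YYYY-MM-DD_description.ext` naming pattern are illustrative assumptions, so substitute the rules from your own DMP.

```python
import re
from pathlib import Path

# Illustrative suffixes and naming rule; replace with the conventions in your DMP.
TEMP_SUFFIXES = (".tmp", ".bak", "~", ".swp")
NAME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}_[a-z0-9-]+\.[a-z0-9]+$")  # e.g. 2024-05-01_survey-raw.csv

def audit_package(root: str) -> dict[str, list[str]]:
    """Return files that look temporary and files whose names break the convention."""
    report = {"temporary": [], "bad_name": []}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.name.endswith(TEMP_SUFFIXES):
            report["temporary"].append(str(path))
        elif not NAME_RE.match(path.name):
            report["bad_name"].append(str(path))
    return report
```

The output is a report for human review, not an automatic delete: a flagged file may still be the authoritative copy under a legacy name.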

Preservation-friendly formats:

Prefer open, widely supported formats (e.g., CSV/TXT for tabular data, TIFF/PNG for images, WAV/FLAC for audio, PDF/A for documents); if a proprietary format is necessary for fidelity, provide an open counterpart and note any feature loss.

Documentation:

Supply a README (project overview, file inventory, methods, processing steps, software and versions, known limitations) and a codebook/data dictionary (variables, units, allowable values, encodings, missing value conventions, derived variables).
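A data dictionary is easier to complete when its skeleton is generated from the data itself. As a hedged sketch (the column layout of the output codebook is an assumption, not a required format), this drafts one row per variable from a tabular CSV, leaving description, units, and missing-value fields to be filled in by hand:

```python
import csv

def draft_codebook(data_csv: str, out_csv: str) -> None:
    """Draft a skeleton data dictionary: one row per variable, with an
    example value, and blank fields to be completed manually."""
    with open(data_csv, newline="") as f:
        reader = csv.DictReader(f)
        first_row = next(reader, {})          # sample values from the first record
        fields = reader.fieldnames or []
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["variable", "example_value", "description", "units", "missing_code"])
        for name in fields:
            writer.writerow([name, first_row.get(name, ""), "", "", ""])
```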

Integrity checks:

Validate completeness and consistency; ensure files open/render on independent systems; where supported, compute and retain checksums for large or critical files.
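Checksums can be produced with standard tooling; a minimal sketch using SHA-256 (the manifest filename is an arbitrary choice) that writes a `sha256sum`-compatible listing for every file in the package:

```python
import hashlib
from pathlib import Path

def write_manifest(root: str, manifest: str = "manifest-sha256.txt") -> int:
    """Write a `<sha256>  <relative path>` line per file; return the file count.
    For very large files, hash in chunks instead of read_bytes()."""
    root_path = Path(root)
    lines = []
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(root_path)}")
    (root_path / manifest).write_text("\n".join(lines) + "\n")
    return len(lines)
```

Retain the manifest alongside the deposit so that anyone, including the repository's curators, can verify the files later with `sha256sum -c`.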

Mini readiness checklist:

  1. Final file set organized
  2. Preservation-friendly formats in place or justified
  3. README and codebook complete
  4. Sensitive data reviewed/de-identified or restrictions planned
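The mechanical parts of this checklist can be scripted; a sketch, assuming hypothetical documentation filenames (`README.md`, `codebook.csv`) that you should adapt to your package:

```python
from pathlib import Path

REQUIRED_DOCS = ("README.md", "codebook.csv")  # hypothetical names; match your package

def deposit_ready(root: str) -> list[str]:
    """Return a list of readiness problems; empty means no automated issues found.
    Sensitive-data review and format justification still need a human."""
    problems = []
    root_path = Path(root)
    for doc in REQUIRED_DOCS:
        if not (root_path / doc).is_file():
            problems.append(f"missing {doc}")
    has_data = any(p.is_file() and p.name not in REQUIRED_DOCS
                   for p in root_path.rglob("*"))
    if not has_data:
        problems.append("no data files found")
    return problems
```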

Required metadata

Provide study-level metadata (title, creators with affiliations, abstract, keywords, methods, temporal/geographic coverage, funding sources, related outputs) and, where applicable, use discipline-specific schemas (e.g., DDI for social sciences or QuDEx for qualitative studies).

Persistent identifiers

Mint a DOI for the dataset; include ORCID iDs for contributors and ROR IDs for institutions; record grant identifiers; and link to software/code releases with their own PIDs as appropriate.
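Identifier typos are a common curation holdup, and basic format checks catch most of them before submission. A sketch (the DOI pattern is a simplified assumption; the sample iD in the test is ORCID's public example): DOIs start with a `10.` prefix, and ORCID iDs carry an ISO 7064 11-2 check digit that can be verified locally.

```python
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")        # simplified; real DOIs are looser
ORCID_RE = re.compile(r"^(\d{4}-){3}\d{3}[\dX]$")

def orcid_valid(orcid: str) -> bool:
    """Check ORCID iD format plus its ISO 7064 11-2 check digit."""
    if not ORCID_RE.match(orcid):
        return False
    base = orcid.replace("-", "")
    total = 0
    for ch in base[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    return base[-1] == ("X" if check == 10 else str(check))
```

These checks confirm only that an identifier is well-formed, not that it resolves to the right person or output, so still click through each one in the draft record.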

Citation

Include a standard dataset citation (creator, year, title, version, repository, DOI) in the repository record and README. Ask co-authors to cite the dataset in manuscripts, CVs, and reports, and link related publications and grants to the dataset record to strengthen provenance and impact tracking.
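Assembling the citation string from its parts keeps the README and the repository record consistent. A sketch following the element order above (all values in the example are hypothetical placeholders):

```python
def dataset_citation(creators, year, title, version, repository, doi):
    """Render a dataset citation as:
    Creator(s) (Year). Title (Version N) [Data set]. Repository. https://doi.org/DOI"""
    names = "; ".join(creators)
    return (f"{names} ({year}). {title} (Version {version}) "
            f"[Data set]. {repository}. https://doi.org/{doi}")
```

Express the DOI as a resolvable `https://doi.org/...` URL rather than a bare string, since that is the form readers and indexers can follow directly.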

Repository choice:

Prioritize disciplinary repositories for domain fit; use institutional repositories for alignment and curation; consider generalist options when disciplinary homes are unavailable. Evaluate curation level, preservation policy, access controls, sustainability/certification, and potential deposit fees.

Access model, license, embargo:

Choose an access level (open, restricted, or metadata‑only/closed) consistent with consent, legal/contractual limits, and policy mandates. Select a Creative Commons license that best suits your needs and, where necessary, set a justified embargo with a lift date.

Policy anchor:

Align repository and access decisions with funder and institutional expectations, noting disciplinary preferences where they exist, and ensure your choices are consistent with the commitments made in your DMP.

Indigenous Data Sovereignty:

For community‑governed data, implement OCAP/CARE‑aligned governance, culturally appropriate consent, and community‑defined access conditions; consider restricted access, community custody, or repatriation rather than open deposit, and document governance agreements in the metadata.

Submission workflow

  1. Create the dataset record
  2. Complete required metadata
  3. Upload files and documentation
  4. Set license and access/embargo
  5. Respond to any curator feedback
  6. Finalize for DOI minting and release

Post-deposit

Use repository versioning for corrections and new releases while maintaining provenance across versions. Update links to publications and software, and ensure contact and stewardship information remains current to support sustained access requests.