
Research Data Management

Open vs. Proprietary

File Format Selection

  • Open file formats are publicly documented and can be used and implemented by anyone without restrictions. They support long-term access, transparency, and interoperability across different software. Examples include CSV (.csv), TXT (.txt), and XML (.xml).

  • Proprietary file formats are controlled by a company or organization and may require specific software to open or edit (e.g., .xlsx, .sav, .docx). These formats can pose challenges for data sharing and preservation due to limited accessibility or software dependency.

  • Converting proprietary formats to open formats can improve data longevity and accessibility. For example, save .xlsx spreadsheets as .csv files or export .docx documents as .txt or .pdf.

Common Data Formats

| Data Type | Original Data Format | Preservation-Friendly Formats (Open Standard, Uncompressed) |
| --- | --- | --- |
| Text | Hand-written, docx, wpd, odt, rtf, txt, html, xml, pdf | xml, PDF/A, txt |
| Tabular | csv, tsv, pipe-delimited, xls(x), ods, dif, xps | csv |
| Tabular (Extensive) | sav (SPSS), sas7bdat or xpt (SAS), dta (Stata) | csv, txt with setup file or associated script (r or m) |
| Database | db, dbf, sql, sqlite, db3, xml | xml, sqlite |
| Visual | static: pdf, jpeg, tiff, png, gif, bmp; moving: mpeg, mov, avi, mxf | static: PDF/A, tiff, JPEG2000; moving: MPEG-4 |
| Audio | wav(e), mp3, mp2, aiff, wma, aac, dct, flac, ogg | wave, aiff |
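
As a rough illustration of the "Tabular" row, the sketch below rewrites a tab- or pipe-delimited text file as comma-separated CSV using only Python's standard library. The function name `delimited_to_csv` and the file names in the usage note are hypothetical, and the sketch assumes the source file is UTF-8 encoded; formats such as .xlsx or .sav need additional software and are not covered here.

```python
import csv
from pathlib import Path


def delimited_to_csv(src: Path, dest: Path, delimiter: str = "\t") -> int:
    """Rewrite a delimited text file (tab- or pipe-separated) as CSV.

    Returns the number of rows written, header row included.
    Assumes the source file is UTF-8 encoded.
    """
    rows_written = 0
    # newline="" lets the csv module manage line endings itself.
    with src.open(newline="", encoding="utf-8") as infile, \
            dest.open("w", newline="", encoding="utf-8") as outfile:
        reader = csv.reader(infile, delimiter=delimiter)
        writer = csv.writer(outfile)  # default dialect: comma-separated
        for row in reader:
            # Values that contain commas are quoted automatically.
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

For a pipe-delimited file, a call might look like `delimited_to_csv(Path("survey.txt"), Path("survey.csv"), delimiter="|")`. Keeping the original file alongside the converted copy makes it possible to check that nothing was lost in conversion.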


File Naming

File Naming Best Practices

Naming your files consistently is a low-hanging fruit of research data management (RDM) that will help you throughout your research projects. Certain file naming choices are essential to accessing and sharing files across different computing environments.

You should follow these practices as you implement a file naming convention for your project:

  • Prefix your files with the date created using a YYYY-MM-DD format
  • Avoid special characters like &, %, $, #, @, and *. Stick to letters, numbers, hyphens, and underscores.
  • Do not rely on capitalization alone to tell files apart, since some file systems treat Data.csv and data.csv as the same file. Camel case (e.g., fileName.xml) is fine when you use it consistently.
  • Never use spaces in filenames – many systems and software will not recognize them or will give errors unless such filenames are treated specially. Use an underscore _ instead of a space.
  • Use short file names, both for your own sake and for the sake of systems that may fail when given a file name of 50 characters or more.

It’s the difference between VS_IMG%Archive2&3 Jan 2018.tiff and 2018-01-04_VS-Archive2-3.tiff: the second will be far easier to understand later on. Another hint: don’t try to fit all the metadata about your files into the file name, because the name becomes too long and unwieldy.
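
The practices above can be turned into a quick automated check. The following is a minimal sketch, not an official tool: the function name `check_filename` is hypothetical, the 50-character limit is an assumed reading of "short", and the date check only looks at the shape of the prefix (it does not confirm that the date is a real calendar date).

```python
import re

MAX_LENGTH = 50  # assumed limit; the guidance above only says "short"


def check_filename(name: str) -> list[str]:
    """Return a list of naming problems found in a file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    # Anything other than letters, digits, dot, underscore, hyphen, or space
    # counts as a special character here (spaces are reported separately).
    if re.search(r"[^A-Za-z0-9._\- ]", name):
        problems.append("contains special characters")
    # Checks only the YYYY-MM-DD_ shape, not whether the date is valid.
    if not re.match(r"^\d{4}-\d{2}-\d{2}_", name):
        problems.append("missing YYYY-MM-DD_ date prefix")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


print(check_filename("VS_IMG%Archive2&3 Jan 2018.tiff"))
print(check_filename("2018-01-04_VS-Archive2-3.tiff"))
```

The first call reports the spaces, the special characters, and the missing date prefix; the second returns an empty list.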

Other practices to keep in mind:

  • Use 001, 002, 003 instead of 1, 2, 3 to help sort and search through the data more effectively.
  • Choose file names that are recognizable to humans and that make sense within the project environment by including information such as:

    • Name of creator (say, in a collaboratively built project)

    • Date of creation

    • Version number (avoid terms like "final" or "latest," since file versions are usually not final)

    • Descriptive term for object referenced by the file (a text title, a specimen name, a geographical location, a scientific instrument type)
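
To show how these elements might fit together, here is a small sketch that assembles a file name from a creation date, creator, descriptive term, sequence number, and version. The function name `make_filename`, the order of the elements, and the "v" version prefix are illustrative assumptions rather than a required convention. The last lines show why zero-padded numbers sort better than unpadded ones.

```python
from datetime import date


def make_filename(created: date, creator: str, description: str,
                  sequence: int, version: int, extension: str) -> str:
    """Assemble a file name from the elements listed above.

    The element order and the "v" version prefix are illustrative choices.
    """
    return (
        f"{created.isoformat()}_{creator}_{description}"
        f"_{sequence:03d}_v{version:02d}.{extension}"
    )


example = make_filename(date(2018, 1, 4), "VS", "Archive2", 3, 1, "tiff")
print(example)  # 2018-01-04_VS_Archive2_003_v01.tiff

# Plain text sorting puts "10" before "2"; zero-padding restores numeric order.
unpadded = sorted(["scan_1.tiff", "scan_10.tiff", "scan_2.tiff"])
padded = sorted(["scan_001.tiff", "scan_010.tiff", "scan_002.tiff"])
print(unpadded)  # ['scan_1.tiff', 'scan_10.tiff', 'scan_2.tiff']
print(padded)    # ['scan_001.tiff', 'scan_002.tiff', 'scan_010.tiff']
```

Whatever order you choose for the elements, the important part is to apply it the same way to every file in the project.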

Documentation

Data Documentation

Once you have set up your file organization, storage, and file naming conventions, it’s a good idea to think about how you will document your project. Our rule of thumb is: If it happened during your project, chances are you need to document it!

Documentation is really useful when, let’s say, you go back to some data from six months ago and have forgotten what it is, why you named a variable a certain way, or when it was collected. Keeping documentation such as README files (which give a high-level overview of the files in a given project and how they can be used) and codebooks (files that document variables and their meaning) answers those questions.

We recommend that you document your work with the following files at least:

  • A README file for every folder that describes the files in that folder and explains the naming convention you used
  • A codebook that defines the specific details of your data. Variables, column headers for spreadsheets, participant aliases, and qualitative tags are some examples of facets of a dataset that should be described in a codebook.
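
One way to get a codebook started is to generate a skeleton from the column headers of a data file and then fill in the descriptions by hand. The sketch below assumes the data is a UTF-8 CSV file with a header row; the function name `draft_codebook` and the codebook columns (`variable`, `description`, `type`, `units`, `allowed_values`) are hypothetical choices, not a standard.

```python
import csv
from pathlib import Path

# Assumed codebook columns; adjust to whatever your project needs.
CODEBOOK_FIELDS = ["variable", "description", "type", "units", "allowed_values"]


def draft_codebook(data_csv: Path, codebook_csv: Path) -> list[str]:
    """Create a codebook skeleton with one row per column header in data_csv.

    Only the variable names are filled in; the remaining cells are left
    blank for the researcher to complete by hand.
    """
    with data_csv.open(newline="", encoding="utf-8") as infile:
        # Read just the header row; an empty file yields no headers.
        headers = next(csv.reader(infile), [])
    with codebook_csv.open("w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=CODEBOOK_FIELDS)
        writer.writeheader()
        for name in headers:
            # Missing keys are written as empty strings by DictWriter.
            writer.writerow({"variable": name})
    return headers
```

A call such as `draft_codebook(Path("survey.csv"), Path("survey_codebook.csv"))` produces a file you can open in any spreadsheet or text editor. The skeleton only saves typing; the meaning of each variable still has to come from you.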

Storage and Backup

Have a Plan

  • Have a plan for what you will do with the data you generate.
  • Follow the 3-2-1 backup rule.
  • Have a data management plan and follow it. 
  • Make plans for continuity for when students graduate or leave UNLV. 

What is the 3-2-1 backup rule? 

  • Three Copies (3): This includes the original data and two additional backups.
  • Two Different Media (2): The backups should be stored on different types of storage, such as hard drives or cloud storage.
  • One Offsite (1): At least one backup copy should be stored in a separate location from your primary data, like a cloud storage service or a different physical location.
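
Making copies is only half of a backup; it also helps to confirm that each copy matches the original. The sketch below copies a file into a list of backup folders and compares SHA-256 checksums. The function names `sha256_of` and `back_up` are hypothetical, and the script cannot know whether the folders you pass in really sit on different media or offsite, so that part of the rule remains your responsibility.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original: Path, backup_dirs: list[Path]) -> dict[Path, bool]:
    """Copy original into each backup directory and verify each copy.

    Returns a mapping from each copy's path to True if its checksum
    matches the original, False otherwise.
    """
    expected = sha256_of(original)
    results = {}
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it.
        copy_path = Path(shutil.copy2(original, directory))
        results[copy_path] = sha256_of(copy_path) == expected
    return results
```

Passing two backup folders gives you the three copies the rule asks for: the original plus two verified backups. Re-running the checksum comparison from time to time can reveal copies that have silently gone bad.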


For Server & Server Options, see Research IT's Research Technology page and UNLV IT's File Storage page.

© University of Nevada Las Vegas