Add a data dictionary to a dataset

Pairing a dataset with its dictionary makes it easier to interpret. This tutorial describes how a module from the ready4use R package can help you to pair a dataset and its dictionary.

This below section renders a vignette article from the ready4use library. You can use the following links to:

Note: This vignette is illustrated with fake data. The dataset explored in this example should not be used to inform decision-making.

ready4use includes a number of tools for labeling health economic model data and forms part of the ready4 framework.

Create a dataset-dictionary pair

A data dictionary contains useful metadata about a dataset. To illustrate this point, we can ingest examples of a toy (fake) dataset and its data-dictionary.

objects_ls <- Ready4useRepos(dv_nm_1L_chr = "fakes",
                    dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/HJXYKQ",
                    dv_server_1L_chr = "dataverse.harvard.edu") %>%
  ingest(metadata_1L_lgl = F)

Importantly (and a requirement for subsequent steps), the data dictionary we ingest is a ready4use_dictionary object.

class(objects_ls$eq5d_ds_dict)
#> [1] "ready4use_dictionary" "ready4_dictionary"    "tbl_df"               "tbl"                  "data.frame"

We can now pair the data dictionary with its dataset in a new object X, a Ready4useDyad.

X <- Ready4useDyad(ds_tb = objects_ls$eq5d_ds_tb,
                   dictionary_r3 = objects_ls$eq5d_ds_dict)

Inspect data

We can inspect X by printing selected information about it to console using the exhibit method. If we only wish to see the first or last few records, we can pass “head” or “tail” to the display_1L_chr argument.

 exhibit(X,
         display_1L_chr = "head",
         scroll_box_args_ls = list(width = "100%"))
Dataset
uid Timepoint data_collection_dtm d_age Gender d_sex_birth_s d_sexual_ori_s d_relation_s d_ATSI CALD Region d_studying_working eq5dq_MO eq5dq_SC eq5dq_UA eq5dq_PD eq5dq_AD K10_int Psych_well_int
1 BL 2019-10-22 14 Male Male Heterosexual In a relationship No No Metro Not studying or working 1 1 1 1 2 11 87
2 BL 2019-10-17 19 Female Female Heterosexual In a relationship Yes Yes Regional Studying only 1 2 1 1 1 14 65
2 FUP 2020-02-14 19 Female Female Heterosexual In a relationship Yes Yes Regional Studying only 3 1 1 1 1 10 71
3 BL 2020-02-15 21 Female Female Other Not in a relationship NA NA Metro Studying only 1 1 3 1 1 13 74
3 FUP 2020-06-14 21 Female Female Other Not in a relationship NA NA Metro Studying only 1 1 2 1 1 10 64
4 BL 2019-12-14 12 Female Female Heterosexual In a relationship Yes Yes Metro Not studying or working 1 1 1 3 1 18 40

The dataset may be more meaningful if variables are labelled using the descriptive information from the data dictionary. This can be accomplished using the renew method.

X <- renew(X,
           type_1L_chr = "label")
exhibit(X,
        display_1L_chr = "head",
         scroll_box_args_ls = list(width = "100%"))
Dataset
Unique identifier Data collection round Date of data collection Age Gender (grouped) Sex at birth Sexual orientation Relationship status Aboriginal or Torres Strait Islander Culturally And Linguistically Diverse Region of residence (metropolitan or regional) Education and employment status EQ5D - Mobility domain score EQ5D - Self-Care domain score EQ5D - Usual Activities domain score EQ5D - Pain / Discomfort domain score EQ5D - Anxiety / Depression domain score Kessler Psychological Distress - 10 Item Total Score Overall Wellbeing Measure (Winefield et al. 2012)
1 BL 2019-10-22 14 Male Male Heterosexual In a relationship No No Metro Not studying or working 1 1 1 1 2 11 87
2 BL 2019-10-17 19 Female Female Heterosexual In a relationship Yes Yes Regional Studying only 1 2 1 1 1 14 65
2 FUP 2020-02-14 19 Female Female Heterosexual In a relationship Yes Yes Regional Studying only 3 1 1 1 1 10 71
3 BL 2020-02-15 21 Female Female Other Not in a relationship NA NA Metro Studying only 1 1 3 1 1 13 74
3 FUP 2020-06-14 21 Female Female Other Not in a relationship NA NA Metro Studying only 1 1 2 1 1 10 64
4 BL 2019-12-14 12 Female Female Heterosexual In a relationship Yes Yes Metro Not studying or working 1 1 1 3 1 18 40

To remove dataset labels, use the renew method with “unlabel” passed to the type_1L_chr argument.

X <- renew(X,
           type_1L_chr = "unlabel")

By default, the exhibit method will print the dataset part of the Ready4useDyad instance. To inspect the data dictionary, pass “dict” to the type_1L_chr argument.

exhibit(X,
        display_1L_chr = "head",
        type_1L_chr = "dict",
        scroll_box_args_ls = list(width = "100%"))
Data Dictionary
Variable Category Description Class
CALD demographic Culturally And Linguistically Diverse factor
d_age demographic age integer
d_ATSI demographic Aboriginal or Torres Strait Islander character
d_relation_s demographic relationship status character
d_sex_birth_s demographic sex at birth character
d_sexual_ori_s demographic sexual orientation factor
Last modified February 2, 2024: more tidy-ups of model export (11ef896)