The role of a template variable is
to avoid the duplication of work that can happen when recoding two
variables with the same specification but that represent different
information. For example, imagine a database with two columns called
PL
and SL
that encodes the primary and
secondary language that an individual speaks. Both columns are
categorical with the same set of categories, English,
French, Mandarin,
Hindi and N/A. The last category
represents a “not applicable” value, for example if an individual does
not speak a second language. An example dataset is shown below,
example_dataset <- read.csv(system.file("example-dataset.csv", package="recodeflow"), fileEncoding = "UTF-8-BOM")
DT::datatable(example_dataset)
If we want to use these language variables in a study, we should probably assign a unique number to each category, effectively converting both variables to a numeric categorical variable. We want to recode the above example dataset to the one below,
example_dataset <- read.csv(system.file("recoded-dataset.csv", package="recodeflow"), fileEncoding = "UTF-8-BOM")
DT::datatable(example_dataset)
The recoding will create two new categorical variables,
primary_lang
and secondary_lang
from the
PL
and SL
respectively with the recoding rules
shown below,
recoding_rules <- data.frame(
language = c("English", "French", "Mandarin", "Hindi", "N/A"),
recoded_value = c(1,2,3,4,"NA(a)")
)
DT::datatable(recoding_rules)
The variable details sheet which encodes the recoding rules above is shown below,
variable_details <- read.csv(system.file("no-template-variable-variable-details.csv", package="recodeflow"), fileEncoding = "UTF-8-BOM")
DT::datatable(variable_details)
Both variables have the same specification (the same categories) but represent different concepts (different variable names and labels). With a few variables and categories, it’s easy enough to repeat the rules in the respective sheet (variable_details). But imagine this situation scaled to 8 variables and 132 categories - a real use case that prompted this feature! With so many variables and categories, it would be tiring to create the specifications as well as maintain them. A template variable can save you the headache!
The templateVariable
column allows you to specify a
“virtual variable”, a variable that does not exist in the database, but
whose specification can be used by other variables in the database. For
example, we can rewrite the above variables details sheet using a
template variable called language
.
variable_details <- read.csv(system.file("template-variable-variable-details.csv", package="recodeflow"), fileEncoding = "UTF-8-BOM")
DT::datatable(variable_details)
In the above sheet we have defined a new template variable called
language
(rows 1 to 6) which provides the reusable
specifications for other languages in the database. The template
variable can now be reused in other concrete variables in the database
like the primary_language
and
secondary_language
variables as defined in rows 7 and 8. To
use a template variable, simply set the templateVariable
column to the name of the template variable.
As we can see, by creating a template variable we can reduce the number of rows in our variable details sheet but more importantly reduce the duplication of rows, which in turn increases maintainability of our sheet.
The templateVariable
column can have one of the
following values:
When creating a template variable,
variable
column should be the name of the template
variabletemplateVariable
column should be YesvariableStart
and variableStartLabel
column should be N/A
. Variables that extend the template
variable will set them.When extending a template variable,
typeEnd
, typeStart
,
databaseStart
columns should be equal to value set in the
template variable. The numValidCat
, catLabel
,
catLabeLong
, units
, recStart
, and
catStartLabel
should be set to N/A. These will come from
the template variable.recStart
column can be set to a function, otherwise it should be
N/A
.