Derived variables

Introduction

recodeflow supports the use of derived variables. A derived variable can be calculated with any custom function, as long as the calculation can be done on a per-row basis. Functions that require operations across rows or on the full data set are not supported.
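To make the per-row requirement concrete, here is a minimal sketch in base R. The function names ratio_fun and zscore_fun and the sample values are hypothetical and are not part of recodeflow. The first function only needs the values from a single row. The second needs the mean and standard deviation of the whole column, so it is the kind of function that is not supported.

```r
# Hypothetical example functions; not part of recodeflow.

# Per-row: the result for a row depends only on that row's values.
ratio_fun <- function(a, b) {
  as.numeric(a) / as.numeric(b)
}

# Across rows: the result for a row depends on every row
# (column mean and standard deviation), so it cannot be
# calculated one row at a time.
zscore_fun <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

a <- c(10, 20, 30)
b <- c(2, 4, 5)

# Calling the per-row function on one row gives the same answer as
# calling it on the full columns and picking that row.
row2_alone <- ratio_fun(a[2], b[2])
row2_from_all <- ratio_fun(a, b)[2]
identical(row2_alone, row2_from_all)

# The across-row function does not have this property:
# a single value has no spread, so the result is NA.
zscore_fun(a[2])
```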

The two most common uses for derived variables are:

  • Variables derived from two or more variables,
  • Variables that are derived using math equations (e.g., BMI is calculated by dividing weight by the square of height).
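The BMI case covers both uses at once: it combines two variables with a math equation. The sketch below is an assumption for illustration only. The function name bmi_sketch, its argument names, and the units (kilograms and metres) are hypothetical and do not come from recodeflow.

```r
# Hypothetical sketch; the function name, argument names, and units
# (kilograms and metres) are assumptions for illustration.
bmi_sketch <- function(weight_kg, height_m) {
  # Weight divided by the square of height, calculated row by row.
  as.numeric(weight_kg) / as.numeric(height_m)^2
}

# A single row's values:
bmi_sketch(70, 1.75)

# Several rows at once; each result uses only its own row.
bmi_sketch(c(70, 55, NA), c(1.75, 1.60, 1.80))
```

A missing weight or height gives a missing BMI for that row only; the other rows are unaffected.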

To create and use a derived variable, you need to complete three steps:

  1. Create and load a customized function.
  2. Add the derived variable to the variable_details and variables worksheets.
  3. Recode the derived variable with rec_with_table.

Example of a derived function

We’ll walk through an example of creating a derived variable with our example data.

Our customized derived function multiplies the blood concentration of cholesterol (chol) by the blood concentration of bilirubin (bili).

1. Create and load a customized function for your derived variables.

Create the custom function: Here is the customized function for our derived variable (chol*bili):

#' example_der_fun calculates chol*bili
#' @param chol the row value for chol
#' @param bili the row value for bili
#' @export
example_der_fun <- function(chol, bili){
  # as.numeric() coerces the inputs in case categorical numeric variables are used.
  # Warning: if either chol or bili is NA, the function returns NA.
  example_der <- as.numeric(chol)*as.numeric(bili)
  
  return(example_der)
}
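Before adding the function to the worksheets, it can help to call it directly with a few values. The sketch below repeats the function body so it runs on its own. The input values are illustrative and are not taken from the example data.

```r
# Repeated from above so this block runs on its own.
example_der_fun <- function(chol, bili) {
  as.numeric(chol) * as.numeric(bili)
}

# Illustrative values, not taken from the example data.
example_der_fun(261, 14.5)       # 261 * 14.5
example_der_fun(NA, 14.5)        # a missing input gives a missing result
example_der_fun("261", "14.5")   # character input is coerced to numeric

# Caveat: as.numeric() on a factor returns the internal level codes,
# not the labels, so convert factors to character first.
f <- factor(c("200", "100"))
as.numeric(f)                    # level codes: 2 1
as.numeric(as.character(f))      # labels as numbers: 200 100
```

If your categorical numeric variables are stored as factors, the last two lines show why the coercion inside a derived function may need as.numeric(as.character(x)) instead of as.numeric(x).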

Note: You must use roxygen2 documentation for custom functions; otherwise, the function cannot be attached to a package. See roxygen2 for how to format and document your function.

Load the custom function into your R environment. Load the customized function in one of the following ways:

  • enter your functions into the console and run the code,
  • attach the functions to your own package using the build and install tool, then load your package using library("name of package"), or
  • use the rec_with_table parameter to pass the path to your function R script.
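If the function lives in a script file, one way to run it in your session (equivalent to entering it in the console) is base R's source(). This is a minimal sketch: the script is written to a temporary file so the example is self-contained, and the path "R/example_der_fun.R" mentioned in the comment is hypothetical.

```r
# Write the function to a temporary script so this sketch is self-contained.
# In practice the script would be a file you maintain, such as the
# hypothetical "R/example_der_fun.R".
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "example_der_fun <- function(chol, bili) {",
  "  as.numeric(chol) * as.numeric(bili)",
  "}"
), script_path)

# source() evaluates the script and defines the function in the session.
source(script_path)

# Confirm the function is available before recoding.
exists("example_der_fun", mode = "function")
```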

If you don’t load the customized function, you cannot create the derived variable.

2. Add the derived variable to the variable_details and variables worksheets.

Add the derived variable to the variables worksheet. You’ll use the same nomenclature as any other variable. See the article variables_sheet for nomenclature rules.

Add the derived variable to the variable_details worksheet. See the article variable_details for nomenclature rules.
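As a rough illustration, a variable_details row for the derived variable might look like the data frame below. Treat every detail as an assumption: the column names, the "DerivedVar::" notation for listing the underlying variables, and the "Func::" notation for naming the custom function are recalled conventions that this article does not confirm, and a real worksheet has more columns than shown here. Check the variable_details article before copying any of it.

```r
# Assumed layout: the column names and the "DerivedVar::" / "Func::"
# notation are not confirmed by this article. Verify them against the
# variable_details article before use.
example_der_row <- data.frame(
  variable      = "example_der",               # name of the derived variable
  typeEnd       = "cont",                      # assumed: continuous result
  databaseStart = "tester1",                   # database used in this example
  variableStart = "DerivedVar::[chol, bili]",  # assumed: underlying variables
  recEnd        = "Func::example_der_fun",     # assumed: custom function name
  stringsAsFactors = FALSE
)

example_der_row
```

The underlying variables (chol and bili) still need their own rows in both worksheets, because they are recoded together with the derived variable.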

3. Recode the derived variable

Use the function rec_with_table to recode your derived variable.

  1. Load recodeflow
#Load the package
library(recodeflow)
  2. Recode the underlying variables (chol and bili) and the derived variable (example_der).
derived1 <- rec_with_table(data = tester1,
                          variables = c("chol", "bili","example_der"),
                          variable_details = variable_details,
                          log = TRUE)
## Using the passed data variable name as database_name
## NOTE for bili: This is sample survival pbc data
## The variable bili was recoded into bili for the database tester1 the following recodes were made:
##   value_to   From rows_recoded
## 1     copy [0,28]          209
## 2     <NA>   else            0
## NOTE for chol: This is sample survival pbc data
## NOTE for chol: This is sample survival pbc data
## The variable chol was recoded into chol for the database tester1 the following recodes were made:
##   value_to        From rows_recoded
## 1     copy [120, 1775]          186
## 2    Na::a          NA            0
## 3     <NA>        else           23