--- title: "Contributing new tests" output: rmarkdown::html_vignette: default vignette: > %\VignetteIndexEntry{Contributing new tests} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This package contains a set of functions for to assist in the running and reporting of tests that have been published on the SCTO platform. The tests themselves are located in the [validation_tests](www.github.com/SwissClinicalTrialOrganisation/validation_tests) repository. The results of the tests are then published as issues in the [pkg_validation](www.github.com/SwissClinicalTrialOrganisation/pkg_validation) repository. Tests are written using functions from the [testthat](https://testthat.r-lib.org/) package, and can be downloaded, run, and reported using the functions in this package. ## `test_skeleton` helps build the structure The `test_skeleton` function can be used to create the relevant folder structure for testing a new package, or adding a file for testing additional functions. In the code below, substitute `pkg` with the name of the package to be tested and add the names of the function(s) you want to test in the `funs` argument. ```{r, eval = FALSE} test_skeleton("pkg", funs = c("fun", "fun2", "etc")) ``` This will create a set of files in your working directory: ``` -- pkg +- info.txt +- setup-pkg.R +- test-fun.R +- test-fun2.R +- test-etc.R ``` - `info.txt` file will contain the name of the package and a freetext description of what is tested, - `setup-pkg.R` is for any necessary setup code (e.g. installation of the package), - for each function in the `funs` argument, a `test-function.R` file is created, which will contain the actual testing code. In the event that the package already has tests, the `test_skeleton` function will not overwrite the existing files, only adding any necessary `test-function.R` files. Add the relevant tests to the `test-function.R` files and check that they work as expected (run `devtools::load_all()` followed by `test("package")`). ## Submitting tests to the platform Once you have added the necessary tests (see the next section), add the files to the [validation_tests repository](https://github.com/SwissClinicalTrialOrganisation/validation_tests). The easiest way is to navigate to the tests folder, click on "Add file" and then "Upload files". Now you can simply drag the `pkg` folder into the browser window and commit the change. This will have forked the repository to your own GitHub account, and you can now create a pull request to the original repository to incorporate your code. It is also possible to fork the repository, clone it to your computer and make the commit there, but this is not strictly necessary. At this stage, the four-eye principle will be applied to the pull request to check the adequacy and quality of the tests and code. If the reviewer agrees with your tests, they will be merged into the package. If they note any issues, which you will see as comments in the GitHub pull request, you will need to address them before the tests can be merged. Once merged, the tests can be run via the validation package `validation::test("packagename")` and documented in the repository at [https://github.com/SwissClinicalTrialOrganisation/pkg_validation](https://github.com/SwissClinicalTrialOrganisation/pkg_validation). ## Writing tests Testing is performed via the [testthat](https://testthat.r-lib.org/) framework. 
All tests for a given function should be placed in a dedicated `test-function.R` file. Each test consists of one or more expectations and a descriptive name. E.g.

```{r, eval = FALSE}
test_that("some meaningful message about the tests", {
  expect_equal(1 + 1, 2)
  expect_true(is.numeric(1))
  expect_false(is.character(1))
})
```

Where multiple tests are to be made on what could be a single object, it is often useful to create the object outside of the `test_that` function. This is particularly useful when different descriptive texts should be shown for the tests (e.g. perhaps the coefficients and standard errors from a model):

```{r, eval = FALSE}
obj <- some_function(params)
test_that("test 1 on obj", {
  expect_equal(obj$value_to_test, expected_value)
})
test_that("test 2 on obj", {
  expect_equal(obj$another_value_to_test, expected_value)
})
```

If objects are only useful to the test, they can be created within the `test_that` function.

```{r, eval = FALSE}
test_that("tests on obj", {
  obj <- some_function(params)
  expect_equal(obj$value_to_test, expected_value)
  expect_equal(obj$another_value_to_test, expected_value)
})
```

Making the description of the test meaningful is important, as it will help the user diagnose where the problem is.

`testthat` supports a large number of expectations, which are documented in the [testthat documentation](https://testthat.r-lib.org/reference/index.html). We demonstrate a few examples below.

### Compare computation to a reference value

To test the computation of the function, the following code must be added to the testing file, for as many test cases as considered appropriate:

```{r, eval = FALSE}
test_that("function f works", {
  expect_equal(f(x), y)
})
```

Where f is the function to be tested, x are the input parameters for the function and y is the expected returned value. Note that it may be necessary (or desirable) to set a tolerance for floating point comparisons. This can be done with the `tolerance` argument.

### Testing for errors, warnings and other messages

To test whether, under certain conditions, the function returns an error, a warning or a message, the following corresponding code can be adapted, for as many test cases as considered appropriate:

```{r, eval = FALSE}
test_that("function f returns an error", {
  expect_error(f(x))
})
test_that("function f returns a warning", {
  expect_warning(f(x))
})
test_that("function f returns a message", {
  expect_message(f(x))
})
```

Where f is the function to be tested, x are the arguments that define the conditions. Use the `regexp` argument to check for a particular error, warning or message.

```{r, eval = FALSE}
test_that("function f returns an error", {
  expect_error(f(x), regexp = "some error message")
})
```

In contrast, to test whether the function runs without returning an error, a warning or a message, the following corresponding code can be adapted, for as many test cases as considered appropriate:

```{r, eval = FALSE}
test_that("function f runs without error", {
  expect_no_error(f(x))
})
test_that("function f runs without a warning", {
  expect_no_warning(f(x))
})
test_that("function f runs without a message", {
  expect_no_message(f(x))
})
```

### Testing booleans

To test whether, under certain conditions, the function returns TRUE or FALSE, adapt the following code as appropriate:

```{r, eval = FALSE}
test_that("function f returns TRUE", {
  expect_true(f(x))
})
test_that("function f returns FALSE", {
  expect_false(f(x))
})
```

Where f is the function to be tested, x are the arguments that define the conditions.
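As a concrete illustration of this pattern, the sketch below uses base R's `is.numeric` in place of a function from a package under validation:

```{r, eval = FALSE}
test_that("is.numeric returns TRUE for numeric input", {
  expect_true(is.numeric(1.5))
})
test_that("is.numeric returns FALSE for character input", {
  expect_false(is.numeric("a"))
})
```
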
### Testing for NULL

To test whether, under certain conditions, the function returns NULL, the following code can be adapted, for as many test cases as considered appropriate:

```{r, eval = FALSE}
test_that("function f returns NULL", {
  expect_null(f(x))
})
```

### Testing the type of object returned (base R)

To test whether the function returns an object of a certain type, the following code can be adapted, for as many test cases as considered appropriate:

```{r, eval = FALSE}
test_that("function f returns object of type XXX", {
  expect_type(f(x), type)
})
```

Where f is the function to be tested, x are the arguments that define the conditions and type is any of the following: “integer”, “double”, “character”, “logical”, “list”.

### Testing the class (s3) of an object

To test whether the function returns an S3 object of a particular class, the following code can be adapted, for as many cases as considered appropriate:

```{r, eval = FALSE}
test_that("function f returns object of class s3", {
  expect_s3_class(f(x), class)
})
```

Where f is the function to be tested, x are the arguments that define the conditions and class is, among others, any of the following: “data.frame”, “factor”, “Date”, “POSIXct”, etc.

### Running tests under certain conditions

On occasion, it may be desirable to restrict the tests to specific package versions. This can be done by using the `skip_if` functions in `testthat`. For example, the pivot functions were introduced to tidyr in version 1.0.0. If we have tests on those functions, we can restrict them to versions 1.0.0 and above with the following code:

```
skip_if(packageVersion("tidyr") < "1.0.0")
```

This line can be placed at the top of the test file, before any tests are run. The equivalent can be done for versions below a certain version, which might be useful for deprecated functions.

It might be suitable to stop tests from being run if the internet is not available:

```
skip_if_offline()
```

Or, since `skip_on_os` skips the tests on the named operating system, if the tests cannot be run on a particular operating system:

```
skip_on_os("mac")
```

### Using data in tests

It is often necessary to use data in tests. R includes a number of built-in datasets that can be used for this purpose (e.g. iris, mtcars). The [medicaldata](https://doi.org/10.32614/CRAN.package.medicaldata) R package also contains a number of datasets, primarily from published sources. The SCTO validation platform also has a [location for additional datasets](https://github.com/SwissClinicalTrialOrganisation/validation_tests/tree/main/datasets), which may likewise be used.

In order to access datasets within tests, the `get_test_data` function can be used in the `setup-pkg.R` file. Suppose we want to use the `mtcars.csv` dataset from the repository. We might use the following setup-stats.R file:

```
if(!require(stats)) install.packages("stats")
library(stats)
library(testthat)

get_test_data("mtcars.csv")
mtcars <- read.csv("mtcars.csv")

withr::defer({
  # most of the time, we would want to detach packages, in this case we don't
  # detach(package:stats)
  }, teardown_env())
```

We do not need to worry about the location the file is saved to as the `test` function, via `testthat`, sets the working directory to the package-tests directory.

Where a new dataset might be useful when testing multiple packages, it can be added to the datasets folder in the validation_tests repository via a pull request.

## Worked example

Theory is all well and good, but it's always useful to see how it works in practice.
Assume that we want to check that the `lm` function from the `stats` package works as expected. We can write a test file that checks that the function returns the expected coefficients and standard errors. The following assumes that we have cloned the validation repository to our computer and we are within that project.

To begin with, we construct the testing files using the `test_skeleton` function. We only want to test the `lm` function, so we only pass that to the second argument:

```{r, eval = FALSE}
test_skeleton("stats", "lm")
```

This will have created a `stats` folder within `inst/tests`. Within that folder, there will be files called `info.txt`, `setup-stats.R`, and `test-lm.R`.

### `test-lm.R`

First we will write the actual tests that we want to run. Tests are entered into the `test-lm.R` file. Opening that file, we see that there are just a few comments at the top of the file, with some reminders.

```
# Write relevant tests for the function in here
# Consider the type of function:
# - is it deterministic or statistic?
# - is it worth checking for errors/warnings under particular conditions?
```

We decide that we will use the mtcars dataset as our basis for testing `lm`, so we load the dataset. We want to test both the linear effect of the number of cylinders (`cyl`) on the miles per gallon and the effect when `cyl` is treated as a factor.

```
# Write relevant tests for the function in here
# Consider the type of function:
# - is it deterministic or statistic?
# - is it worth checking for errors/warnings under particular conditions?

data(mtcars)
mtcars$cyl_f <- factor(mtcars$cyl)
```

Note that if there are multiple functions being tested (each in their own test-function.R file) that require the same data, we can load and prepare the data in the `setup-stats.R` file.

```{r, echo=FALSE, eval=FALSE}
data(mtcars)
mtcars$cyl_f <- factor(mtcars$cyl)
summary(lm(mpg ~ cyl, data = mtcars))
summary(lm(mpg ~ cyl_f, data = mtcars))
tapply(mtcars$mpg, mtcars$cyl, mean)
```

We can also define the models that we want to test:

```
cmod <- lm(mpg ~ cyl, data = mtcars)
fmod <- lm(mpg ~ cyl_f, data = mtcars)
```

We do not include the model definitions within a `test_that` call because we will use the same models in multiple tests. Again, if we needed to use those models for testing multiple functions, we could define them in the setup file.

#### Testing coefficients

Suppose that the coefficient for `mpg ~ cyl` is known (-2.88 for the linear effect). We can write a test that checks that expectation:

```
test_that("lm returns the expected coefficients", {
  expect_equal(coef(cmod)[2], -2.88)
})
```

Due to floating point precision, this is probably insufficient: R will not return exactly -2.88. We can use the `tolerance` argument to check that the coefficient is within a certain range (we could also round the coefficient). We also need to tell `expect_equal` to ignore the names attribute of the vector, otherwise it compares the whole object, attributes and all:

```
test_that("lm returns the expected coefficients", {
  expect_equal(coef(cmod)[2], -2.88, tolerance = 0.01, check.attributes = FALSE)
})
```

We can do the same for the coefficients from the model with `cyl_f`.
This time, we can derive the values from the `tapply` function as, in this case, the intercept is the mean of the reference group and the other coefficients are differences from that mean:

```
test_that("lm returns the expected coefficients", {
  means <- tapply(mtcars$mpg, mtcars$cyl, mean)
  coefs <- coef(fmod)
  expect_equal(coefs[1], means[1], check.attributes = FALSE)
  expect_equal(coefs[2], means[2] - means[1], check.attributes = FALSE)
  expect_equal(coefs[3], means[3] - means[1], check.attributes = FALSE)
})
```

We have now performed 4 tests (the expectations) in two `test_that` calls. We can also combine them together:

```
test_that("lm returns the expected coefficients", {
  expect_equal(coef(cmod)[2], -2.88, tolerance = 0.01, check.attributes = FALSE)
  means <- tapply(mtcars$mpg, mtcars$cyl, mean)
  coefs <- coef(fmod)
  expect_equal(coefs[1], means[1], check.attributes = FALSE)
  expect_equal(coefs[2], means[2] - means[1], check.attributes = FALSE)
  expect_equal(coefs[3], means[3] - means[1], check.attributes = FALSE)
})
```

Whether to put them in one or two calls is up to the author. Distributing them across more calls helps identify which tests fail, but it also makes the file longer.

##### A note on selecting tolerances

The tolerance is a tricky thing to select. It is a balance between being too strict and too lenient. If the tolerance is too strict, the test will fail even when the function is working as expected. If the tolerance is too lenient, the test will pass even when the function is not working as expected.

Consider the example above. We compared -2.88 with the coefficient, which R reports to (at least) 5 decimal places. In this case, it does not make sense to use a tolerance of less than 0.01 because we only specified the reference coefficient to two decimal places (even though we could have obtained far greater precision had we worked for it).

Generally speaking, values that are easy to calculate should probably have a lower tolerance. Values that are very dependent on specifics of the implementation (e.g. the maximisation algorithm, etc.) should probably have a higher tolerance. This is especially the case when using external software as a reference (e.g. Stata uses different default settings from `lme4`, causing differences in SEs). Simulation results may also require a more lenient tolerance.

#### Testing the standard errors against Stata

Suppose that we have used Stata as the reference software for the standard errors. We include the commands used in the reference software as comments in the script, along with the output and the version of the reference software.

```
# write.csv(mtcars, "mtcars.csv", row.names = FALSE)
# reference software: Stata 17.0 (revision 2024-02-13)
# import delimited "mtcars.csv"
# regress mpg cyl
# [output truncated for brevity]
# ------------------------------------------------------------------------------
#          mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
# -------------+----------------------------------------------------------------
#          cyl |   -2.87579   .3224089    -8.92   0.000    -3.534237   -2.217343
#        _cons |   37.88458   2.073844    18.27   0.000     33.64922    42.11993
# ------------------------------------------------------------------------------
# [output truncated for brevity]
# regress mpg i.cyl
# [output truncated for brevity]
```

We can then use the SE values from Stata in the tests, specifying a suitable tolerance (it's pretty simple to calculate, so we can be quite stringent):

```
test_that("Standard errors from lm are correct", {
  expect_equal(summary(cmod)$coefficients[2, 2], 0.322408, tolerance = 0.00001)
  expect_equal(summary(fmod)$coefficients[2, 2], 1.558348, tolerance = 0.00001)
  expect_equal(summary(fmod)$coefficients[3, 2], 1.298623, tolerance = 0.0001)
})
```

#### The completed test file

The test file including the tests above is then:

```
# Write relevant tests for the function in here
# Consider the type of function:
# - is it deterministic or statistic?
# - is it worth checking for errors/warnings under particular conditions?

local_edition(3)

data(mtcars)
mtcars$cyl_f <- factor(mtcars$cyl)

cmod <- lm(mpg ~ cyl, data = mtcars)
fmod <- lm(mpg ~ cyl_f, data = mtcars)

test_that("lm returns the expected coefficients", {
  expect_equal(coef(cmod)[2], -2.88, tolerance = 0.01, check.attributes = FALSE)
  means <- tapply(mtcars$mpg, mtcars$cyl, mean)
  coefs <- coef(fmod)
  expect_equal(coefs[1], means[1], check.attributes = FALSE)
  expect_equal(coefs[2], means[2] - means[1], check.attributes = FALSE)
  expect_equal(coefs[3], means[3] - means[1], check.attributes = FALSE)
})

# write.csv(mtcars, "mtcars.csv", row.names = FALSE)
# reference software: Stata 17.0 (revision 2024-02-13)
# import delimited "mtcars.csv"
# regress mpg cyl
#
#       Source |       SS           df       MS      Number of obs   =        32
# -------------+----------------------------------   F(1, 30)        =     79.56
#        Model |  817.712952         1  817.712952   Prob > F        =    0.0000
#     Residual |  308.334235        30  10.2778078   R-squared       =    0.7262
# -------------+----------------------------------   Adj R-squared   =    0.7171
#        Total |  1126.04719        31  36.3241028   Root MSE        =    3.2059
#
# ------------------------------------------------------------------------------
#          mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
# -------------+----------------------------------------------------------------
#          cyl |   -2.87579   .3224089    -8.92   0.000    -3.534237   -2.217343
#        _cons |   37.88458   2.073844    18.27   0.000     33.64922    42.11993
# ------------------------------------------------------------------------------
# regress mpg i.cyl
#
#       Source |       SS           df       MS      Number of obs   =        32
# -------------+----------------------------------   F(2, 29)        =     39.70
#        Model |   824.78459         2  412.392295   Prob > F        =    0.0000
#     Residual |  301.262597        29  10.3883654   R-squared       =    0.7325
# -------------+----------------------------------   Adj R-squared   =    0.7140
#        Total |  1126.04719        31  36.3241028   Root MSE        =    3.2231
#
# ------------------------------------------------------------------------------
#          mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
# -------------+----------------------------------------------------------------
#          cyl |
#           6  |  -6.920779   1.558348    -4.44   0.000    -10.10796   -3.733599
#           8  |  -11.56364   1.298623    -8.90   0.000    -14.21962   -8.907653
#              |
#        _cons |   26.66364   .9718008    27.44   0.000     24.67608    28.65119
# ------------------------------------------------------------------------------

test_that("Standard errors from lm are correct", {
  expect_equal(summary(cmod)$coefficients[2, 2], 0.322408, tolerance = 0.0001)
  expect_equal(summary(fmod)$coefficients[2, 2], 1.558348, tolerance = 0.0001)
  expect_equal(summary(fmod)$coefficients[3, 2], 1.298623, tolerance = 0.0001)
})
```

For `lm`, other things that might be tested include the R-squared, the F-statistic, and the p-values. Generally speaking, we might also want to test that the model is of the appropriate class (also `lm` in this case), that the model has the expected number of coefficients, or that the function issues warnings and/or errors at appropriate times.

### `info.txt`

It is easiest to write the `info.txt` file once all tests have been written. It provides a prose listing of what has been tested and serves as a quick overview of the tests. By default, the file contains a single line:

```
Tests for package stats
```

Extra details on the tests that we have performed should be added. In this case, we might modify it to:

```
Tests for package stats
- coefficients and SEs from a univariate model with continuous and factor predictors. SEs were checked against Stata.
```

Where tests are specific to a particular version of the package, as might be the case for newly added or deprecated functions, this should also be noted.

### `setup-stats.R`

This file should contain the code necessary to load the package and any other packages that are required for the tests. In this case, we need the `stats` package and the `testthat` package. The `testthat` package is always needed. In general, installing and loading the package being tested is also necessary; as `stats` is a standard package that ships with R, this is not strictly necessary here. We also try to leave the environment as we found it, so we detach the packages via the `withr::defer` function. Again, in this case, we don't want to detach it, as it is a standard package.

```
if(!require(stats)) install.packages("stats")
library(stats)
library(testthat)

withr::defer({
  # most of the time, we would want to detach packages, in this case we don't
  # detach(package:stats)
  }, teardown_env())
```

### Testing that the tests work

Assuming the tests are in a folder called `stats`, which is within our current working directory, we can run the tests with:

```r
validation::test("stats", download = FALSE)
```

We specify `download = FALSE` because `validation::test` will download the files from GitHub and run those tests by default; `download = FALSE` tells it to use the local copy of the files instead.

The output returns various information on our tests and system. The first part comes from testthat itself while it runs the tests; the remainder (after "Copy and paste the following...") provides summary information that should be copied into a GitHub issue:

```
✔ | F W S  OK | Context
✔ |         7 | lm

══ Results ══════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 7 ]

## Copy and paste the following output into the indicated sections of a new issue

ISSUE NAME: [Package test]: stats version 4.3.1

### Name of the package you have validated:
stats

### What version of the package have you validated?
4.3.1 ### When was this package tested? 2024-03-18 ### What was tested? Tests for package stats - coefficients and SEs from a unvariate model with continuous and factor predictors. SEs were checked against Stata. ### Test results PASS ### Test output: |file |context |test | nb| passed|skipped |error | warning| |:---------|:-------|:------------------------------------|--:|------:|:-------|:-----|-------:| |test-lm.R |lm |lm returns the expected coefficients | 8| 4|FALSE |FALSE | 4| |test-lm.R |lm |Standard errors from lm are correct | 3| 3|FALSE |FALSE | 0| ### SessionInfo: R version 4.3.1 (2023-06-16 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045) Matrix products: default locale: [1] LC_COLLATE=German_Switzerland.utf8 LC_CTYPE=German_Switzerland.utf8 [3] LC_MONETARY=German_Switzerland.utf8 LC_NUMERIC=C [5] LC_TIME=German_Switzerland.utf8 time zone: Europe/Zurich tzcode source: internal attached base packages: [1] graphics grDevices utils datasets methods base other attached packages: [1] gh_1.4.0 validation_0.1.0 testthat_3.2.0 loaded via a namespace (and not attached): [1] xfun_0.40 httr2_0.2.3 htmlwidgets_1.6.2 devtools_2.4.5 [5] remotes_2.4.2.1 processx_3.8.2 callr_3.7.3 vctrs_0.6.5 [9] tools_4.3.1 ps_1.7.5 generics_0.1.3 curl_5.1.0 [13] tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3 desc_1.4.2 [17] lifecycle_1.0.4 compiler_4.3.1 stringr_1.5.1 brio_1.1.3 [21] httpuv_1.6.12 htmltools_0.5.6.1 usethis_2.2.2 yaml_2.3.8 [25] pkgdown_2.0.7 tidyr_1.3.0 later_1.3.1 pillar_1.9.0 [29] crayon_1.5.2 urlchecker_1.0.1 ellipsis_0.3.2 cranlogs_2.1.1 [33] rsconnect_1.1.1 cachem_1.0.8 sessioninfo_1.2.2 mime_0.12 [37] tidyselect_1.2.0 digest_0.6.33 stringi_1.8.3 dplyr_1.1.4 [41] purrr_1.0.2 rprojroot_2.0.3 fastmap_1.1.1 cli_3.6.2 [45] magrittr_2.0.3 pkgbuild_1.4.2 utf8_1.2.4 withr_3.0.0 [49] waldo_0.5.1 prettyunits_1.2.0 promises_1.2.1 rappdirs_0.3.3 [53] roxygen2_7.3.0 rmarkdown_2.25 httr_1.4.7 gitcreds_0.1.2 [57] stats_4.3.1 memoise_2.0.1 shiny_1.8.0 evaluate_0.22 [61] knitr_1.45 miniUI_0.1.1.1 profvis_0.3.8 rlang_1.1.3 [65] Rcpp_1.0.11 xtable_1.8-4 glue_1.7.0 xml2_1.3.5 [69] pkgload_1.3.3 rstudioapi_0.15.0 jsonlite_1.8.7 R6_2.5.1 [73] fs_1.6.3 ### Where is the test code located for these tests? please enter manually ### Where the test code is located in a git repository, add the git commit SHA please enter manually, if relevant ``` ## Hints for working with GitHub RStudio has a built-in git interface, which is a good way to manage your git repositories if you use RStudio. The [Happy Git and GitHub for the useR](https://happygitwithr.com/) book is a comprehensive guide to working with git and GitHub. Of particular use are chapters 9 to 12 on connecting your computer with GitHub. The GitHub desktop app is a good way to manage your git repositories if you are not comfortable with the command line. This is also an easy way to connect your computer with your GitHub account. There are many other GUIs for working with git repositories. See [here for a listing of some of them](https://git-scm.com/downloads/guis).
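For those who prefer to stay within R, the fork-and-pull-request workflow described under "Submitting tests to the platform" can also be driven with the `usethis` package. The sketch below assumes `usethis` is installed; the branch name is only an example:

```
# fork the validation_tests repository to your GitHub account and clone it locally
usethis::create_from_github(
  "SwissClinicalTrialOrganisation/validation_tests",
  fork = TRUE
)

# create a branch for the new tests, add and commit your pkg folder
# (e.g. via the RStudio git pane), then push and open the pull request
usethis::pr_init(branch = "add-pkg-tests")
usethis::pr_push()
```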