This package contains a set of functions to assist in running and reporting tests that have been published on the SCTO platform. The tests themselves are located in the validation_tests repository, and the results of the tests are published as issues in the pkg_validation repository.
Tests are written using functions from the testthat package, and can be downloaded, run, and reported using the functions in this package.
test_skeleton helps build the structure
The test_skeleton function can be used to create the relevant folder structure for testing a new package, or for adding a file for testing additional functions. In the code below, substitute pkg with the name of the package to be tested and add the names of the function(s) you want to test in the funs argument.
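# a sketch of the call; substitute the package and function names as described above
test_skeleton("pkg", funs = c("fun", "fun2", "etc"))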
This will create a set of files in your working directory:
-- pkg
+- info.txt
+- setup-pkg.R
+- test-fun.R
+- test-fun2.R
+- test-etc.R
The info.txt file will contain the name of the package and a freetext description of what is tested. setup-pkg.R is for any necessary setup code (e.g. installation of the package). For each function passed to the funs argument, a test-function.R file is created, which will contain the actual testing code. In the event that the package already has tests, the test_skeleton function will not overwrite the existing files; it only adds any new test-function.R files that are needed.
Add the relevant tests to the test-function.R files and check that they work as expected (run devtools::load_all() followed by test("package")).
Once you have added the necessary tests (see the next section), add
the files to the validation_tests
repository. The easiest way is to navigate to the tests folder,
click on “Add file” and then “Upload files”. Now you can simply drag the
pkg
folder into the browser window and commit the change.
This will have forked the repository to your own GitHub account, and you
can now create a pull request to the original repository to incorporate
your code. It is also possible to fork the repository, clone it to your
computer and make the commit there, but this is not strictly
necessary.
At this stage, the four-eye principle will be applied to the pull request to check the adequacy and quality of the tests and code. If the reviewer agrees with your tests, they will be merged into the package. If they note any issues, which you will see as comments in the GitHub pull request, you will need to address them before the tests can be merged.
Once merged, the tests can be run via the validation package (validation::test("packagename")) and documented in the pkg_validation repository at https://github.com/SwissClinicalTrialOrganisation/pkg_validation.
Testing is performed via the testthat framework. All tests for
a given function should be placed in a dedicated
test-function.R
file.
Each test consists of one or more expectations and a descriptive name, e.g.:
test_that("some meaningful message about the tests", {
expect_equal(1 + 1, 2)
expect_true(is.numeric(1))
expect_false(is.character(1))
})
Where multiple tests are to be run on a single object, it is often useful to create the object outside of the test_that call. This is particularly useful when different descriptive texts should be shown for the tests (e.g. perhaps the coefficients and standard errors from a model):
obj <- some_function(params)
test_that("test 1 on obj", {
  expect_equal(obj$value_to_test, expected_value)
})
test_that("test 2 on obj", {
  expect_equal(obj$another_value_to_test, expected_value)
})
If objects are only useful to the test, they can be created within
the test_that
function.
test_that("tests on obj", {
obj <- some_function(params)
expect_equal(obj$value_to_test, expected_value)
expect_equal(obj$another_value_to_test, expected_value)
})
Making the description of the test meaningful is important, as it will help the user diagnose where the problem is.
testthat
supports a large number of expectations, which
are documented in the testthat
documentation. We demonstrate a few examples below.
To test the computation of the function, the following code must be added to the testing file, for as many test cases as considered appropriate:
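test_that("f returns the expected value", {
  expect_equal(f(x), y)
})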
Where f is the function to be tested, x are the input parameters for the function and y is the expected returned value.
Note that it may be necessary or desirable to set a tolerance for floating point comparisons. This can be done with the tolerance argument.
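For example (the tolerance value here is arbitrary):
expect_equal(f(x), y, tolerance = 1e-6)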
To test whether, under certain conditions, the function returns an error, a warning or a message, the following corresponding code can be adapted, for as many test cases as considered appropriate:
test_that("function f returns an error", {
expect_error(f(x))
})
test_that("function f returns a warning", {
expect_warning(f(x))
})
test_that("function f returns a message", {
expect_message(f(x))
})
Where f is the function to be tested and x are the arguments that define the conditions. Use the regexp argument to check for a particular error, warning or message.
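For example, to check for a particular (hypothetical) error message:
test_that("function f returns an informative error", {
  expect_error(f(x), regexp = "must be numeric")
})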
In contrast, to test whether the function runs without returning an error, a warning or a message, the following corresponding code can be adapted, for as many test cases as considered appropriate:
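# expect_no_error() and friends are available from testthat 3.1.5
test_that("function f runs without an error", {
  expect_no_error(f(x))
})
test_that("function f runs without a warning", {
  expect_no_warning(f(x))
})
test_that("function f runs without a message", {
  expect_no_message(f(x))
})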
To test whether, under certain conditions, the function returns TRUE or FALSE, adapt the following code as appropriate:
test_that("function f returns TRUE", {
expect_true(f(x))
})
test_that("function f returns FALSE", {
expect_false(f(x))
})
Where f is the function to be tested, x are the arguments that define the conditions.
To test whether, under certain conditions, the function returns NULL, the following code can be adapted, for as many test cases as considered appropriate:
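test_that("function f returns NULL", {
  expect_null(f(x))
})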
To test whether the function returns an object of a certain type, the following code can be adapted, for as many test cases as considered appropriate:
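test_that("function f returns an object of the expected type", {
  expect_type(f(x), type)
})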
Where f is the function to be tested, x are the arguments that define the conditions and type is any of the following: “integer”, “character”, “logical”, “double”. Note that “factor” is a class rather than a base type; use the S3 class expectation below for factors.
To test whether the function returns an object of a given S3 class, the following code can be adapted, for as many cases as considered appropriate:
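test_that("function f returns an object of the expected class", {
  expect_s3_class(f(x), class)
})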
Where f is the function to be tested, x are the arguments that define the conditions and class is, for example, “data.frame”, “factor”, “Date” or “POSIXct”.
On occasion, it may be desirable to restrict the tests to specific package versions. This can be done using the skip_if functions in testthat.
For example, the pivot functions were introduced to tidyr in version 1.0.0. If we have tests on those functions, we can restrict them to versions 1.0.0 and above with the following code:
skip_if(packageVersion("tidyr") < "1.0.0")
This line can be placed at the top of the test file, before any tests are run. The equivalent can be done to restrict tests to versions below a given version, which might be useful for deprecated functions.
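For example (the package name and version here are hypothetical), to restrict tests to versions below 2.0.0:
skip_if(packageVersion("tidyr") >= "2.0.0")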
It might be suitable to stop tests from being run if the internet is not available:
skip_if_offline()
Or if a package can only be run on a specific operating system:
skip_on_os("mac")
It is often necessary to use data in tests. R includes a number of built-in datasets that can be used for this purpose (e.g. iris, mtcars).
The medicaldata R package also contains a number of datasets, primarily from published sources.
The SCTO validation platform also has a location for additional datasets, which may also be used.
In order to access datasets within tests, the
get_test_data
function can be used in the
setup-pkg.R
file. Suppose we want to use the
mtcars.csv
dataset from the repository. We might use the
following setup-stats.R file:
if (!require(stats)) install.packages("stats")
library(stats)
library(testthat)
get_test_data("mtcars.csv")
mtcars <- read.csv("mtcars.csv")
withr::defer({
  # most of the time, we would want to detach packages; in this case we don't
  # detach(package:stats)
}, teardown_env())
We do not need to worry about the location the file is saved to as
the test
function, via testthat
, sets the
working directory to the package-tests directory.
Where a new dataset might be useful when testing multiple packages, it can be added to the datasets folder in the validation_tests repository via a pull request.
Theory is all well and good, but it is always useful to see how it works in practice.
Assume that we want to check that the lm
function from
the stats
package works as expected. We can write a test
file that checks that the function returns the expected coefficients and
standard errors.
The following assumes that we have cloned the validation repository to our computer and we are within that project.
To begin with, we construct the testing files using the test_skeleton function. We only want to test the lm function, so we only pass that to the funs argument:
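test_skeleton("stats", funs = "lm")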
This will have created a stats folder within inst/tests. Within that folder, there will be files called info.txt, setup-stats.R, and test-lm.R.
test-lm.R
First we will write the actual tests that we want to run. Tests are
entered into the test-lm.R
file. Opening that file, we see
that there are just a few comments at the top of the file, with some
reminders.
# Write relevant tests for the function in here
# Consider the type of function:
# - is it deterministic or stochastic?
# - is it worth checking for errors/warnings under particular conditions?
We decide that we will use the mtcars dataset as our basis for testing lm, so we load that dataset. We want to test both the linear effect of the number of cylinders (cyl) on the miles per gallon, as well as the effect when cyl is treated as a factor.
# Write relevant tests for the function in here
# Consider the type of function:
# - is it deterministic or stochastic?
# - is it worth checking for errors/warnings under particular conditions?
data(mtcars)
mtcars$cyl_f <- factor(mtcars$cyl)
Note that if there are multiple functions being tested (each in their
own test-function.R file) that require the same data, we can load and
prepare the data in the setup-stats.R
file.
We can also define the models that we want to test:
cmod <- lm(mpg ~ cyl, data = mtcars)
fmod <- lm(mpg ~ cyl_f, data = mtcars)
We do not include the model definitions within a test_that call because we will use the same models in multiple tests. Again, if we needed to use those models for testing multiple functions, we could define them in the setup file.
Suppose that the coefficient for mpg ~ cyl is known to be -2.88 (the linear effect). We can write a test that checks that expectation:
test_that("lm returns the expected coefficients", {
expect_equal(coef(cmod)[2], -2.88)
})
Due to floating point precision, this is probably insufficient: R will not return exactly -2.88. We can use the tolerance argument to check that the coefficient is within a certain range (we could also round the coefficient). We also need to tell expect_equal to ignore the names attribute of the vector, otherwise it compares the whole object, attributes and all (in testthat's third edition, the preferred spelling is ignore_attr = TRUE; passing check.attributes = FALSE still works but triggers the deprecation warnings visible in the sample output further below):
test_that("lm returns the expected coefficients", {
expect_equal(coef(cmod)[2], -2.88, tolerance = 0.01, check.attributes = FALSE)
})
We can do the same for the coefficients from the model with cyl_f. This time, we can derive the expected values from the tapply function as, in this case, the coefficients are simple combinations of the group means:
test_that("lm returns the expected coefficients", {
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
coefs <- coef(fmod)
expect_equal(coefs[1], means[1], check.attributes = FALSE)
expect_equal(coefs[2], means[2] - means[1], check.attributes = FALSE)
expect_equal(coefs[3], means[3] - means[1], check.attributes = FALSE)
})
We have now performed 4 tests (the expectations) in two test_that calls. We could also combine them into a single call:
test_that("lm returns the expected coefficients", {
expect_equal(coef(cmod)[2], -2.88, tolerance = 0.01, check.attributes = FALSE)
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
coefs <- coef(fmod)
expect_equal(coefs[1], means[1], check.attributes = FALSE)
expect_equal(coefs[2], means[2] - means[1], check.attributes = FALSE)
expect_equal(coefs[3], means[3] - means[1], check.attributes = FALSE)
})
Whether to put them in one or two calls is up to the author. Distributing them across more calls helps identify which tests fail, but it also makes the file longer.
The tolerance is a tricky thing to select. It is a balance between being too strict and too lenient. If the tolerance is too strict, then the test will fail when the function is working as expected. If the tolerance is too lenient, then the test will pass when the function is not working as expected.
Consider the example above. We compared -2.88 with the coefficient which R reports to (at least) 5 decimal places. In this case, it does not make sense to use a tolerance of less than 0.01 because we only know the coefficient to two decimal places (even though we would have access to a far greater precision had we worked for it).
Generally speaking, values that are easy to calculate should probably have a lower tolerance. Values that are very dependent on specifics of the implementation (e.g. the maximisation algorithm) should probably have a higher tolerance. This is especially the case when using external software as a reference (e.g. Stata uses different default settings to lme4, causing differences in SEs). Simulation results may also require a more lenient tolerance.
Suppose that we have used Stata as a reference software for the standard errors. We include the commands used in the reference software in comments in the script, including the output and the information of the version of the reference software.
# write.csv(mtcars, "mtcars.csv", row.names = FALSE)
# reference software: Stata 17.0 (revision 2024-02-13)
# import delimited "mtcars.csv"
# regress mpg cyl
# [output truncated for brevity]
# ------------------------------------------------------------------------------
# mpg | Coefficient Std. err. t P>|t| [95% conf. interval]
# -------------+----------------------------------------------------------------
# cyl | -2.87579 .3224089 -8.92 0.000 -3.534237 -2.217343
# _cons | 37.88458 2.073844 18.27 0.000 33.64922 42.11993
# ------------------------------------------------------------------------------
# [output truncated for brevity]
# regress mpg i.cyl
# [output truncated for brevity]
We can then use the SE values from Stata in the tests, specifying a suitable tolerance (the SEs are pretty simple to calculate, so we can be quite stringent):
test_that("Standard errors from LM are correct", {
expect_equal(summary(cmod)$coefficients[2, 2], 0.322408,
tolerance = 0.00001)
expect_equal(summary(fmod)$coefficients[2, 2], 1.558348,
tolerance = 0.00001)
expect_equal(summary(fmod)$coefficients[3, 2], 1.298623,
tolerance = 0.0001)
})
The test file including the tests above (plus a local_edition(3) call to fix the testthat edition in use) is then:
# Write relevant tests for the function in here
# Consider the type of function:
# - is it deterministic or stochastic?
# - is it worth checking for errors/warnings under particular conditions?
local_edition(3)
data(mtcars)
mtcars$cyl_f <- factor(mtcars$cyl)
cmod <- lm(mpg ~ cyl, data = mtcars)
fmod <- lm(mpg ~ cyl_f, data = mtcars)
test_that("lm returns the expected coefficients", {
  expect_equal(coef(cmod)[2], -2.88, tolerance = 0.01, check.attributes = FALSE)
  means <- tapply(mtcars$mpg, mtcars$cyl, mean)
  coefs <- coef(fmod)
  expect_equal(coefs[1], means[1], check.attributes = FALSE)
  expect_equal(coefs[2], means[2] - means[1], check.attributes = FALSE)
  expect_equal(coefs[3], means[3] - means[1], check.attributes = FALSE)
})
# write.csv(mtcars, "mtcars.csv", row.names = FALSE)
# reference software: Stata 17.0 (revision 2024-02-13)
# import delimited "mtcars.csv"
# regress mpg cyl
#
# Source | SS df MS Number of obs = 32
# -------------+---------------------------------- F(1, 30) = 79.56
# Model | 817.712952 1 817.712952 Prob > F = 0.0000
# Residual | 308.334235 30 10.2778078 R-squared = 0.7262
# -------------+---------------------------------- Adj R-squared = 0.7171
# Total | 1126.04719 31 36.3241028 Root MSE = 3.2059
#
# ------------------------------------------------------------------------------
# mpg | Coefficient Std. err. t P>|t| [95% conf. interval]
# -------------+----------------------------------------------------------------
# cyl | -2.87579 .3224089 -8.92 0.000 -3.534237 -2.217343
# _cons | 37.88458 2.073844 18.27 0.000 33.64922 42.11993
# ------------------------------------------------------------------------------
# regress mpg i.cyl
#
# Source | SS df MS Number of obs = 32
# -------------+---------------------------------- F(2, 29) = 39.70
# Model | 824.78459 2 412.392295 Prob > F = 0.0000
# Residual | 301.262597 29 10.3883654 R-squared = 0.7325
# -------------+---------------------------------- Adj R-squared = 0.7140
# Total | 1126.04719 31 36.3241028 Root MSE = 3.2231
#
# ------------------------------------------------------------------------------
# mpg | Coefficient Std. err. t P>|t| [95% conf. interval]
# -------------+----------------------------------------------------------------
# cyl |
# 6 | -6.920779 1.558348 -4.44 0.000 -10.10796 -3.733599
# 8 | -11.56364 1.298623 -8.90 0.000 -14.21962 -8.907653
# |
# _cons | 26.66364 .9718008 27.44 0.000 24.67608 28.65119
# ------------------------------------------------------------------------------
test_that("Standard errors from lm are correct", {
expect_equal(summary(cmod)$coefficients[2, 2], 0.322408,
tolerance = 0.0001)
expect_equal(summary(fmod)$coefficients[2, 2], 1.558348,
tolerance = 0.0001)
expect_equal(summary(fmod)$coefficients[3, 2], 1.298623,
tolerance = 0.0001)
})
For lm, other things that might be tested include the R-squared, the F-statistic, and the p-values. Generally speaking, we might also want to test that the model is of the appropriate class (lm in this case), that the model has the expected number of coefficients, or that the function issues warnings and/or errors at appropriate times.
info.txt
It is easiest to write the info.txt file once all tests have been written. It provides a listing of what has been tested in prose form and serves as a quick overview of the tests.
By default, the file contains a single line:
Tests for package stats
Extra details on the tests that we have performed should be added. In this case, we might modify it to:
Tests for package stats
- coefficients and SEs from a univariate model with continuous and factor predictors. SEs were checked against Stata.
Where tests are for/from a specific version of the package, as might be the case for newly added or deprecated functions, this should also be noted.
setup-stats.R
This file should contain the code necessary to load the package and any other packages that are required for the tests. In this case, we need the stats package and the testthat package. The testthat package is always needed, and in general, installing and loading the package being tested is necessary; as stats is a standard R package, installation is not necessary in this case.
We also try to leave the environment as we found it, so we would normally detach the packages via the withr::defer function. Again, in this case, we don't want to detach stats, as it is a standard package.
if (!require(stats)) install.packages("stats")
library(stats)
library(testthat)
withr::defer({
  # most of the time, we would want to detach packages; in this case we don't
  # detach(package:stats)
}, teardown_env())
Assuming the tests are in a folder called stats, which is within our current working directory, we can run the tests with:
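validation::test("stats", download = FALSE)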
We specify download = FALSE because validation::test downloads the files from GitHub and runs those tests by default; download = FALSE tells it to use the local copy of the files instead.
The output should return various information on our tests and system. The first part comes from testthat itself while it runs the tests; the remainder (after “Copy and paste the following…”) provides summary information that should be copied into a GitHub issue:
✔ | F W S OK | Context
✔ | 7 | lm
══ Results ══════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 7 ]
## Copy and paste the following output into the indicated sections of a new issue
ISSUE NAME:
[Package test]: stats version 4.3.1
### Name of the package you have validated:
stats
### What version of the package have you validated?
4.3.1
### When was this package tested?
2024-03-18
### What was tested?
Tests for package stats
- coefficients and SEs from a univariate model with continuous and factor predictors. SEs were checked against Stata.
### Test results
PASS
### Test output:
|file |context |test | nb| passed|skipped |error | warning|
|:---------|:-------|:------------------------------------|--:|------:|:-------|:-----|-------:|
|test-lm.R |lm |lm returns the expected coefficients | 8| 4|FALSE |FALSE | 4|
|test-lm.R |lm |Standard errors from lm are correct | 3| 3|FALSE |FALSE | 0|
### SessionInfo:
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=German_Switzerland.utf8 LC_CTYPE=German_Switzerland.utf8
[3] LC_MONETARY=German_Switzerland.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.utf8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] graphics grDevices utils datasets methods base
other attached packages:
[1] gh_1.4.0 validation_0.1.0 testthat_3.2.0
loaded via a namespace (and not attached):
[1] xfun_0.40 httr2_0.2.3 htmlwidgets_1.6.2 devtools_2.4.5
[5] remotes_2.4.2.1 processx_3.8.2 callr_3.7.3 vctrs_0.6.5
[9] tools_4.3.1 ps_1.7.5 generics_0.1.3 curl_5.1.0
[13] tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3 desc_1.4.2
[17] lifecycle_1.0.4 compiler_4.3.1 stringr_1.5.1 brio_1.1.3
[21] httpuv_1.6.12 htmltools_0.5.6.1 usethis_2.2.2 yaml_2.3.8
[25] pkgdown_2.0.7 tidyr_1.3.0 later_1.3.1 pillar_1.9.0
[29] crayon_1.5.2 urlchecker_1.0.1 ellipsis_0.3.2 cranlogs_2.1.1
[33] rsconnect_1.1.1 cachem_1.0.8 sessioninfo_1.2.2 mime_0.12
[37] tidyselect_1.2.0 digest_0.6.33 stringi_1.8.3 dplyr_1.1.4
[41] purrr_1.0.2 rprojroot_2.0.3 fastmap_1.1.1 cli_3.6.2
[45] magrittr_2.0.3 pkgbuild_1.4.2 utf8_1.2.4 withr_3.0.0
[49] waldo_0.5.1 prettyunits_1.2.0 promises_1.2.1 rappdirs_0.3.3
[53] roxygen2_7.3.0 rmarkdown_2.25 httr_1.4.7 gitcreds_0.1.2
[57] stats_4.3.1 memoise_2.0.1 shiny_1.8.0 evaluate_0.22
[61] knitr_1.45 miniUI_0.1.1.1 profvis_0.3.8 rlang_1.1.3
[65] Rcpp_1.0.11 xtable_1.8-4 glue_1.7.0 xml2_1.3.5
[69] pkgload_1.3.3 rstudioapi_0.15.0 jsonlite_1.8.7 R6_2.5.1
[73] fs_1.6.3
### Where is the test code located for these tests?
please enter manually
### Where the test code is located in a git repository, add the git commit SHA
please enter manually, if relevant
RStudio has a built-in git interface, which is a good way to manage your git repositories if you use RStudio.
The Happy Git and GitHub for the useR book is a comprehensive guide to working with git and GitHub. Of particular use are chapters 9 to 12 on connecting your computer with GitHub.
The GitHub desktop app is a good way to manage your git repositories if you are not comfortable with the command line. This is also an easy way to connect your computer with your GitHub account. There are many other GUIs for working with git repositories. See here for a listing of some of them.