-
Notifications
You must be signed in to change notification settings - Fork 7
Asymptotic complexity testing
R package developers currently use ad-hoc tests of asymptotic computational complexity via empirical timings of functions and visual diagnostic plots. However, there is no framework in R for systematically testing the empirical computational complexity of functions. This is a problem because such a testing framework could be essential for identifying big speed gains in R code. Examples include the quadratic time algorithms that were used prior to R-3.6.0 for gregexpr/substring, https://github.com/tdhock/namedCapture-article#6-mar-2019
This is not an isolated issue. Many R packages could benefit from an asymptotic testing framework. For example rhdf5 https://github.com/grimbough/rhdf5/issues/31 has quadratic time access to some kinds of data, which should be linear time.
There are many R packages for unit tests, and there is Rperform for performance testing.
Create a new package, testComplexity, which will provide a suite of functions that developers can use to systematically and reliably test empirical asymptotic complexity of their functions.
- functions for quantifying empirical time complexity of R expressions/functions. This should be relatively straightforward using existing packages, e.g. microbenchmark. For example a function asymptoticTimings(fun(N)) should return a data.frame DF with numeric columns N (number of data) and T (time in seconds). Optional parameters will be range of N values, and we will work to ensure that defaults enable accurate/informative data collection in a minimal amount of time.
- functions for determining asymptotic complexity classes (linear, log-linear, quadratic, etc) based on noisy empirical data. For example for the data.frame DF described above we will implement a function asymptoticComplexityClass(DF$N, DF$T) which will return a character string e.g. “linear” if T = O(N), “quadratic” if T = O(N^2), etc.
- testing functions, using syntax similar to testthat package. For example expect_linear_time(fun(N)) should fail with an error if fun(N) is not linear in N, i.e. O(N).
We also propose to create a suite of test cases of functions/algorithms in R with known asymptotic complexity, in order to test the testComplexity package functions. Finally we propose to use testComplexity on a wide range of R packages in order to identify potential improvements in performance.
This project will provide a new package that will be useful for testing and improving the speed of various R packages.
Students, please contact mentors below after completing at least one of the tests below.
- EVALUATING MENTOR: Neeraj Dhanraj Bokde [email protected] is the co-author of GuessCompx - A package for empirically estimating an algorithm's complexity.
- Toby Hocking [email protected] is the author of several R packages and has written several research articles that empirically characterize the asymptotic complexity of algorithms.
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: for several rpois simulated data set sizes N, use microbenchmark to compute computation times of PeakSegDP::cDPA (quadratic) versus PeakSegOptimal::PeakSegPDPA (log-linear), and plot them on a log-log scale as a function of N.
- Medium: write a preliminary function for classifying log-linear versus quadratic timings.
- Hard: put that function in an R package with documentation and tests.
Students, please post a link to your test results here.
- Ahmed Karaman, GitHub profile, Test Results.
- Anirban Chetia, Test Results, Github Profile.
- Meet Bhatnagar, Test Results, GitHub Profile