Releases: JackEdTaylor/LexOPS

v0.4.0

16 Jan 13:58

Important: This update includes a major change which may alter the reproducibility of some old pipelines - especially if split_by() was used on columns of type double. Take care to use versions prior to this release when re-running old code with LexOPS.

Update to split_by():

Simplified numeric splits in split_by(). This includes removing the use of the cut() method, and using the same method for double and integer types. The old method may have produced some unexpected behaviour when splitting by columns stored as double if the levels overlapped. See issue #6 for more details.

Another change with this new method is that, while splits can still be specified out of order (e.g., 4:5 ~ 1:3), the specified order is now preserved, whereas before an attempt was made to sort them. This means that A1 will now be 4:5, and A2 will be 1:3, whereas previous versions would have forced A1 to be the lower level of 1:3, and A2 to be the higher level of 4:5.
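The effect of preserving the specified order can be sketched in base R. This is only an illustration, not the package's implementation: `label_by_ranges()` is a hypothetical helper, and it assumes each level is an inclusive lower-to-upper range.

```r
# Hypothetical helper (not part of LexOPS): label each value with the
# first range that contains it, keeping the ranges in the order given.
label_by_ranges <- function(x, ranges, prefix = "A") {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(ranges)) {
    bounds <- range(ranges[[i]])  # inclusive lower and upper bound
    in_range <- !is.na(x) & x >= bounds[1] & x <= bounds[2]
    out[in_range & is.na(out)] <- paste0(prefix, i)
  }
  out
}

x <- c(1, 2.5, 3, 4, 4.7, 5, 6)

# Levels given out of order: the first level (4 to 5) becomes A1 and the
# second level (1 to 3) becomes A2. Values in neither range stay NA.
labels <- label_by_ranges(x, list(c(4, 5), c(1, 3)))
print(data.frame(x = x, level = labels))
```

Under the older, sorting behaviour described above, the same specification would instead have labelled the 1 to 3 range as A1.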

Other Major updates:

  • Related to the change above, numeric splits can no longer be overlapping at all (e.g., 1:2 ~ 2:3 used to be acceptable, but will now produce an error, as it is unclear to which group a value of 2 would belong).
  • Added equal_size argument to split_random(). Setting equal_size=TRUE will ensure that the split has equally (or as close to equal as possible) sized groups. This option will typically enable more candidate matches. This option was added in response to issue #4.
  • The generate() function checks that the id_col uniquely identifies items, and gives an error if this is not the case. This avoids duplicate IDs causing incorrect matching. Addresses issue #5.
  • run_shiny() now checks for the stringdist package and will generate code to install it if it is missing.
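To illustrate what an equal-sized random split means, here is a minimal base R sketch. `random_split()` is a hypothetical helper written for this example. It is not the code used by split_random(), whose internals are not described in these notes.

```r
# Hypothetical helper: randomly assign n items to k groups.
random_split <- function(n, k, equal_size = FALSE) {
  if (equal_size) {
    # Recycle the labels 1..k to length n, then shuffle them, so group
    # sizes differ by at most one item.
    sample(rep_len(seq_len(k), n))
  } else {
    # Each item gets an independent random label, so group sizes can
    # differ by chance.
    sample(seq_len(k), n, replace = TRUE)
  }
}

set.seed(1)
groups <- random_split(11, 3, equal_size = TRUE)
print(table(groups))  # group sizes differ by at most 1
```

Balanced groups avoid the case where one small group limits how many matched sets can be formed, which is one plausible reason this option tends to allow more candidate matches.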

Minor Updates:

  • Updated to base R pipe, |>, in examples.
  • Unnecessary dependencies (vwr, plyr) have been removed from the shiny app.
  • All S3 methods now exported (previously only print.LexOPS_pipeline was exported).

Updates to Tests:

  • Removed deprecated testthat argument.
  • Now tests the equal_size argument of split_random().
  • Now tests that duplicates in id_col give an error.
  • Ensured that variables that undergo scale() in tests are stored explicitly as numeric vectors. This addressed a deprecation warning from dplyr::filter() about 1-column matrices that was produced from one test.
  • Removed overlapping levels from all tests.
  • Added tests for split_by() errors.
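The kind of uniqueness check that the duplicate-ID test exercises can be sketched in base R as follows. `check_unique_ids()` and the example data are hypothetical. The actual check inside generate() may be written differently and may use a different error message.

```r
# Hypothetical helper: stop with an error if the ID column contains
# duplicated values.
check_unique_ids <- function(df, id_col) {
  ids <- df[[id_col]]
  dup <- unique(ids[duplicated(ids)])
  if (length(dup) > 0) {
    stop(
      "`", id_col, "` does not uniquely identify items. Duplicated: ",
      paste(dup, collapse = ", ")
    )
  }
  invisible(TRUE)
}

# Illustrative data in which "cat" appears twice
stim <- data.frame(string = c("cat", "dog", "cat"), Length = c(3, 3, 3))

# Capture the error message rather than halting the script
result <- tryCatch(
  check_unique_ids(stim, "string"),
  error = function(e) conditionMessage(e)
)
print(result)
```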

v0.3.1

28 Jul 13:07

Major Updates:

  • Removed vwr package dependency in Shiny app as package has been removed from CRAN, replacing code with stringdist functions. This may fix instances of the Shiny app failing to load.
  • Can now pass arguments via ... to control_for_map().

Minor Updates:

  • S3 object printing will print "?" if no splits defined.
  • Changed control_for_map() documentation to use stringdist instead of vwr.
  • More informative errors if no splits or controls are defined.
  • New tests for errors.
  • New tests for split_random().
  • New tests for passing arguments to control_for_map() via ....
  • New tests for effects of function calls order in pipeline.
  • Updated fontawesome icons in Shiny app.

v0.3.0

09 Jun 14:15

Major updates:

  • Added a new S3 class with a generic print function - printing objects output by split_by(), control_for(), and variants of these functions will now print a summary of the pipeline rather than the original dataframe.
  • Improved data compression for LexOPS::lexops (now uses xz).
  • Fixed a rare bug in which a failed iteration returned the wrong object from the function, which could sometimes cause an error.

Minor updates:

  • Added test for Euclidean weights from control_for_euc().
  • Default plots are now neater (avoid colour recycling and use clearer theme).

v0.2.7

28 Apr 17:29

Major updates:

  • Fixed the calculation of weighted Euclidean distance. Previous versions would have calculated the distance of weighted items from unweighted targets, rather than comparing the items and target in the same space. This has been fixed, and will change the results of any code using weights in euc_dists() or control_for_euc().
  • Implemented weighting standardisation in calculating Euclidean distances, so that c(1, 3), c(10, 30), and c(100.33, 300.99) are all equivalent to c(0.5, 1.5). This means that similar tolerances can be used when weighting schemes change. The standardisation is weights/mean(weights) so that sum(weights)==length(weights). Standardisation can be disabled by setting standardise_weights=FALSE.
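The standardisation arithmetic can be checked directly in base R. In the sketch below, `standardise_weights()` and `weighted_euc()` are hypothetical helpers. The distance function scales each dimension's difference by its weight, which is one common way to weight Euclidean distance and is an assumption here, not a copy of euc_dists().

```r
# Standardisation as described above: weights / mean(weights)
standardise_weights <- function(w) w / mean(w)

print(standardise_weights(c(1, 3)))
print(standardise_weights(c(10, 30)))
print(standardise_weights(c(100.33, 300.99)))

# Hypothetical weighted distance of each row of `items` from `target`.
# The weight is applied to the item-target difference, which is the same
# as weighting both the item and the target.
weighted_euc <- function(items, target, weights, standardise = TRUE) {
  if (standardise) weights <- standardise_weights(weights)
  diffs <- sweep(items, 2, target)          # subtract target from each row
  scaled <- sweep(diffs, 2, weights, `*`)   # weight each dimension
  sqrt(rowSums(scaled^2))
}

items <- rbind(c(1, 2), c(3, 5))
target <- c(0, 0)

d_small <- weighted_euc(items, target, c(1, 3))
d_large <- weighted_euc(items, target, c(10, 30))
print(all.equal(d_small, d_large))
```

Because both weighting schemes standardise to the same values, the distances agree, so a tolerance chosen under one scheme carries over to the other.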

Minor updates:

  • Updated the Euclidean distance vignette to reflect the changes made to weighting.
  • Added tests that weights and weight standardisation for Euclidean distance work as described.

v0.2.6

23 Apr 08:53

Minor updates:

  • Updated the hex sticker.
  • Updated the documentation to mention support for no id_col variable in set_options().
  • Simplified the readme.

v0.2.5

27 Jan 23:29

Major update:

  • Fixed a bug with the concatenation of calls to control_for_map() where some calls would be overwritten/ignored.

Minor update:

  • If id_col (defined in set_options()) is missing, the output will use row numbers to uniquely identify items.

v0.2.4

14 Dec 15:06

Minor updates:

  • Printing progress now happens every 5% rather than 10%, and uses carriage return
  • Printing progress in the Shiny app now matches console output exactly
  • The readme now uses a more accessible minimal example
  • Fixed typos
  • Renamed some files (but not functions)
  • Solved more tidyselect ambiguities
  • Updated citation in citEntry()

v0.2.3

27 Oct 17:19

Major updates:

  • Massive improvements in the speed of generating matches, especially when using many controls. This comes from vectorising a key part of the generate() function. As a result, if you set the seed with the set.seed() function, your pipeline may now produce a different result. If you set the seed with the seed argument of generate(), the result for a given seed should stay the same.

Minor updates:

  • tidyselect ambiguity in generate() solved by using dplyr::all_of().
  • Shiny app will give more informative error message when reviewing filters but no filters are used.
  • Added citation for the LexOPS paper with citEntry().
  • Updated documentation for generate() to make sure the "inclusive" option for match_null is mentioned.

v0.2.2

19 Jul 12:37

Mostly minor update:

  • Fix bug where non-overlapping but non-linearly ordered levels in split_by() (e.g. 1:2 ~ 5:6 ~ 3:4) would be incorrectly detected as overlapping and give an error
  • Properly export the walrus operator (rlang::`:=`) for quasiquotation in functions
  • Simplify and correct typos in documentation
  • Restyle the links in the info tab of the shiny app
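An overlap check that does not depend on the order in which levels are written can be sketched in base R by sorting the ranges by lower bound before comparing neighbours. `ranges_overlap()` is a hypothetical helper, not the package's code. It treats a shared boundary (e.g. 1:2 ~ 2:3) as an overlap, matching the behaviour described in the v0.4.0 notes, not that of earlier versions.

```r
# Hypothetical helper: TRUE if any two inclusive ranges overlap,
# regardless of the order in which they are supplied.
ranges_overlap <- function(ranges) {
  # One row per range: column 1 = lower bound, column 2 = upper bound
  bounds <- t(vapply(ranges, function(r) as.numeric(range(r)), numeric(2)))
  bounds <- bounds[order(bounds[, 1]), , drop = FALSE]
  n <- nrow(bounds)
  if (n < 2) return(FALSE)
  # Once sorted, any overlap shows up between neighbouring ranges:
  # a range starts at or before the previous range ends.
  any(bounds[-1, 1] <= bounds[-n, 2])
}

print(ranges_overlap(list(c(1, 2), c(5, 6), c(3, 4))))  # out of order, no overlap
print(ranges_overlap(list(c(1, 2), c(2, 3))))            # shared boundary at 2
```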

v0.2.1

12 May 11:48

The largest changes for this update are:

  • Manual cond_col assignment is now done in the set_options() function rather than in individual functions.
  • If overlapping splits are used for a double variable an error is thrown, rather than reading the splits incorrectly.
  • plot_sample() now includes the generated stimuli in the underlying distribution. With a small N this makes little difference, but with a large N it ensures that the plot reflects how representative the generated stimuli actually are.
  • generate() now allows the user to silence console output with the silent argument
  • A new vignette on participant selection has been added (rendered version available here: https://jackedtaylor.github.io/LexOPSdocs/vignettes/participant-selection.html)

Other changes are:

  • Fixed typos
  • Prettified documentation
  • Added tests of reproducibility using random seeds for generate()
  • Added test of control_for_euc()
  • Tests are now run with console output from generate() suppressed