Releases: JackEdTaylor/LexOPS
v0.4.0
Important: This update includes a major change that may affect the reproducibility of some older pipelines, especially if `split_by()` was used on columns of type double. Use a version prior to this release when re-running old code with LexOPS.
Update to `split_by()`:
Simplified numeric splits in `split_by()`. This includes removing the use of the `cut()` method, and using the same method for double and integer types. The old method may have produced some unexpected behaviour when splitting by columns stored as double if the levels overlapped. See issue #6 for more details.
Another change with this new method is that, while splits can still be specified out of order (e.g., `4:5 ~ 1:3`), the specified order is now preserved, whereas previously an attempt was made to sort them. This means that A1 will now be `4:5` and A2 will be `1:3`, whereas previous versions would have forced A1 to be the lower level (`1:3`) and A2 to be the higher level (`4:5`).
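The order-preserving behaviour can be illustrated with a small base R sketch. The function `assign_levels()` is hypothetical and is not how LexOPS implements `split_by()`; it assumes each level is an inclusive range, written as `c(lower, upper)` to stand in for a level such as `4:5`, and names levels by their position in the list.

```r
# Hypothetical sketch (not LexOPS's code): assign values to split levels in
# the order the ranges were given. Each level is an inclusive c(lower, upper).
assign_levels <- function(x, levels) {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(levels)) {
    rng <- levels[[i]]
    in_range <- !is.na(x) & x >= rng[1] & x <= rng[2]
    # Names follow the specified order: the first range given is A1, and so on
    out[in_range] <- paste0("A", i)
  }
  out
}

# Ranges given out of order, mirroring `4:5 ~ 1:3`
lv <- list(c(4, 5), c(1, 3))
assign_levels(c(1.5, 4.2, 3.7, 5), lv)
# "A2" "A1" NA "A1"
```

Here `3.7` falls between the two ranges, so it is left as `NA` rather than being assigned to either level.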
Other Major updates:
- Related to the change above, numeric splits can no longer overlap at all (e.g., `1:2 ~ 2:3` used to be acceptable, but will now produce an error, as it is unclear to which group a value of `2` would belong).
- Added an `equal_size` argument to `split_random()`. Setting `equal_size=TRUE` will ensure that the split has equally (or as close to equally as possible) sized groups. This option will typically enable more candidate matches. This option was added in response to issue #4.
- The `generate()` function checks that the `id_col` uniquely identifies items, and gives an error if this is not the case. This avoids duplicate IDs causing incorrect matching. Addresses issue #5.
- `run_shiny()` now checks for the `stringdist` package and will generate code to install it if missing.
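One way to get equal (or near-equal) group sizes is to recycle the level labels across all items and then shuffle them. The sketch below is a hypothetical illustration of that idea, not the code behind the `equal_size` argument of `split_random()`; `random_split_equal()` is an invented name.

```r
# Hypothetical sketch of equal-sized random assignment (not LexOPS's code):
# repeat the level labels until every item has one, then shuffle them.
random_split_equal <- function(n, levels) {
  labels <- rep(levels, length.out = n)
  sample(labels)
}

set.seed(42)
groups <- random_split_equal(10, c("A1", "A2", "A3"))
table(groups)
# A1: 4, A2: 3, A3: 3 (which items land in which group is random)
```

By contrast, drawing each item's level independently (e.g. `sample(levels, n, replace = TRUE)`) can produce groups of quite different sizes.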
Minor Updates:
- Examples now use the base R pipe, `|>`.
- Unnecessary dependencies (`vwr`, `plyr`) have been removed from the Shiny app.
- All S3 methods are now exported (previously only `print.LexOPS_pipeline` was exported).
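For readers new to the native pipe (available from R 4.1.0), `x |> f()` is evaluated as `f(x)`, with the left-hand side passed as the first argument. A minimal base R check, using made-up values:

```r
x <- c(1, 4, 9)

# The piped and nested forms are equivalent
piped <- x |> sqrt() |> sum()
nested <- sum(sqrt(x))
identical(piped, nested)  # TRUE
```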
Updates to Tests:
- Removed deprecated `testthat` argument.
- Now tests the `equal_size` argument of `split_random()`.
- Now tests that duplicates in `id_col` give an error.
- Ensured that variables that undergo `scale()` in tests are stored explicitly as numeric vectors. This addressed a deprecation warning from `dplyr::filter()` about 1-column matrices that was produced by one test.
- Removed overlapping levels from all tests.
- Added tests for `split_by()` errors.
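The `scale()` fix reflects the fact that `scale()` returns a one-column matrix even when given a plain vector. A minimal base R illustration, with made-up values, of the matrix result and of converting it back to a numeric vector:

```r
x <- c(2, 4, 6, 8)

z_matrix <- scale(x)              # a 4 x 1 matrix with "scaled:center"/"scaled:scale" attributes
z_vector <- as.numeric(scale(x))  # as.numeric() drops the dimensions and attributes

is.matrix(z_matrix)  # TRUE
is.matrix(z_vector)  # FALSE

# Storing the plain vector keeps a data frame column one-dimensional
df <- data.frame(x = x)
df$z <- as.numeric(scale(df$x))
```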
v0.3.1
Major Updates:
- Removed the `vwr` package dependency in the Shiny app, as the package has been removed from CRAN, replacing its code with `stringdist` functions. This may fix instances of the Shiny app failing to load.
- Can now pass arguments via `...` to `control_for_map()`.
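Passing arguments via `...` means extra named arguments are forwarded to the mapped function. The sketch below shows the general R pattern with a hypothetical function, `edit_distance_to()`, that forwards `...` to base R's `utils::adist()` (a Levenshtein distance); it is only an illustration and is not part of LexOPS or `stringdist`.

```r
# Hypothetical mapping function: any extra arguments supplied via `...`
# are passed through unchanged to utils::adist().
edit_distance_to <- function(words, target, ...) {
  as.vector(utils::adist(words, target, ...))
}

edit_distance_to(c("cat", "Cat", "cart"), "cat")
# 0 1 1
edit_distance_to(c("cat", "Cat", "cart"), "cat", ignore.case = TRUE)
# 0 0 1
```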
Minor Updates:
- S3 object printing will print "?" if no splits are defined.
- Changed `control_for_map()` documentation to use `stringdist` instead of `vwr`.
- More informative errors if no splits or controls are defined.
- New tests for errors.
- New tests for `split_random()`.
- New tests for passing arguments to `control_for_map()` via `...`.
- New tests for the effects of function call order in pipelines.
- Updated fontawesome icons in the Shiny app.
v0.3.0
Major updates:
- Added a new S3 class with a generic print function: printing objects output by `split_by()`, `control_for()`, and variants of these functions will now print a summary of the pipeline rather than the original dataframe.
- Improved data compression for `LexOPS::lexops` (now uses `xz`).
- Fixed a rare bug where a failed iteration could cause an error because the wrong object was returned from the function.
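The S3 pattern behind this change can be sketched in base R. The class name `demo_pipeline`, the constructor, and its fields are hypothetical stand-ins rather than LexOPS's internals; the point is that the object still carries its data, but `print()` dispatches to a method that shows a summary instead. Printing `?` when no splits are defined mirrors the behaviour described for v0.3.1.

```r
# Hypothetical constructor: wrap the data and pipeline settings in a classed list
new_pipeline <- function(df, splits = character(0)) {
  structure(list(data = df, splits = splits), class = "demo_pipeline")
}

# print() dispatches here for objects of class "demo_pipeline"
print.demo_pipeline <- function(x, ...) {
  cat("Pipeline over", nrow(x$data), "items\n")
  cat("Splits:", if (length(x$splits)) paste(x$splits, collapse = ", ") else "?", "\n")
  invisible(x)
}

p <- new_pipeline(data.frame(string = c("cat", "dog")), splits = "Length")
p
# Pipeline over 2 items
# Splits: Length
```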
Minor updates:
- Added a test for Euclidean weights from `control_for_euc()`.
- Default plots are now neater (they avoid colour recycling and use a clearer theme).
v0.2.7
Major updates:
- Fixed the calculation of weighted Euclidean distance. Previous versions would have calculated the distance of weighted items from unweighted targets, rather than comparing the items and target in the same space. This has been fixed, and will change the results of any code using weights in `euc_dists()` or `control_for_euc()`.
- Implemented weighting standardisation in calculating Euclidean distances, so that `c(1, 3)`, `c(10, 30)`, and `c(100.33, 300.99)` are all equivalent to `c(0.5, 1.5)`. This means that similar tolerances can be used when weighting schemes change. The standardisation is `weights/mean(weights)`, so that `sum(weights)==length(weights)`. Standardisation can be disabled by setting `standardise_weights=FALSE`.
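The two changes can be sketched together in base R. This assumes that weighting means multiplying each variable, for both the items and the target, by its weight before taking the ordinary Euclidean distance; `weighted_euc()` and `standardise_weights()` are hypothetical helpers, and LexOPS's `euc_dists()` may differ in detail.

```r
# Standardise so that the weights have a mean of 1 (and sum to their length)
standardise_weights <- function(w) w / mean(w)

weighted_euc <- function(items, target, weights, standardise = TRUE) {
  if (standardise) weights <- standardise_weights(weights)
  # Weight the item columns AND the target, so both live in the same space
  # (weighting only the items was the pre-v0.2.7 problem described above)
  w_items <- sweep(items, 2, weights, `*`)
  w_target <- target * weights
  sqrt(rowSums(sweep(w_items, 2, w_target, `-`)^2))
}

items <- rbind(c(1, 2), c(3, 0), c(2, 2))  # made-up item values, one row per item
target <- c(2, 1)

d1 <- weighted_euc(items, target, c(1, 3))
d2 <- weighted_euc(items, target, c(100.33, 300.99))
all.equal(d1, d2)  # TRUE: both weightings standardise to c(0.5, 1.5)
```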
Minor updates:
- Updated the Euclidean distance vignette to reflect the changes made to weighting.
- Added tests that weights and weight standardisation for Euclidean distance work as described.
v0.2.6
Minor updates:
- Updated the hex sticker.
- Updated the documentation to mention support for omitting the `id_col` variable in `set_options()`.
- Simplified the readme.
v0.2.5
Major update:
- Fixed a bug with the concatenation of calls to `control_for_map()` where some calls would be overwritten/ignored.
Minor update:
- If `id_col` (defined in `set_options()`) is missing, the output will use row numbers to uniquely identify items.
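A hypothetical helper combining this row-number fallback with the duplicate-ID check added in v0.4.0 might look like the following. `get_item_ids()` is illustrative only, and assumes a missing `id_col` is represented as `NA`.

```r
# Hypothetical helper (not LexOPS's code): use row numbers when no ID column
# is given; otherwise return the IDs, refusing any duplicates.
get_item_ids <- function(df, id_col = NA) {
  if (is.na(id_col)) {
    return(seq_len(nrow(df)))
  }
  ids <- df[[id_col]]
  if (anyDuplicated(ids) > 0) {
    stop("`", id_col, "` does not uniquely identify items", call. = FALSE)
  }
  ids
}

words <- data.frame(string = c("cat", "dog", "cat"), len = c(3, 3, 3))
get_item_ids(words)                 # 1 2 3 (row numbers)
try(get_item_ids(words, "string"))  # error: "cat" appears twice
```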
v0.2.4
Minor updates:
- Printing progress now happens every 5% rather than every 10%, and uses a carriage return
- Printing progress in the Shiny app now matches console output exactly
- The readme now uses a more accessible minimal example
- Fixed typos
- Renamed some files (but not functions)
- Solved more tidyselect ambiguities
- Updated citation in `citEntry()`
v0.2.3
Major updates:
- Massive improvements in the speed of generating matches, especially when using many controls. This comes from vectorising a key part of the `generate()` function. If you set the seed with the `set.seed()` function, your pipeline may now produce a different result because of this. If you set the seed with the `seed` argument of `generate()`, the result for a given seed should stay the same.
Minor updates:
- tidyselect ambiguity in `generate()` solved by using `dplyr::all_of()`.
- The Shiny app will give a more informative error message when reviewing filters if no filters are used.
- Added a citation for the LexOPS paper with `citEntry()`.
- Updated documentation for `generate()` to make sure the `"inclusive"` option for `match_null` is mentioned.
v0.2.2
Mostly minor update:
- Fix bug where non-overlapping but non-linearly ordered levels in `split_by()` (e.g. `1:2 ~ 5:6 ~ 3:4`) would be incorrectly detected as overlapping and give an error
- Properly export the walrus operator (`` rlang::`:=` ``) for quasiquotation in functions
- Simplify and correct typos in documentation
- Restyle the links in the info tab of the Shiny app
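An overlap check that does not depend on the order in which levels are written can sort the ranges by their lower bound first. The sketch below is hypothetical (not LexOPS's code), represents each level as `c(lower, upper)`, and treats shared endpoints as overlapping, in line with the stricter v0.4.0 rule.

```r
# Hypothetical check: sort ranges by lower bound, then compare each range
# with the one immediately before it.
ranges_overlap <- function(levels) {
  if (length(levels) < 2) return(FALSE)
  lower <- vapply(levels, function(r) r[1], numeric(1))
  upper <- vapply(levels, function(r) r[2], numeric(1))
  ord <- order(lower)
  lower <- lower[ord]
  upper <- upper[ord]
  # A range overlaps its predecessor if it starts at or before the
  # predecessor's upper bound (shared endpoints count as overlapping)
  any(lower[-1] <= upper[-length(upper)])
}

ranges_overlap(list(c(1, 2), c(5, 6), c(3, 4)))  # FALSE: 1:2 ~ 5:6 ~ 3:4
ranges_overlap(list(c(1, 2), c(2, 3)))           # TRUE: 1:2 ~ 2:3 share 2
```

After sorting, comparing neighbours is enough: if any two ranges overlap, the range that comes directly after the earlier one must also start inside it.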
v0.2.1
The two largest changes for this update are:
- Manual `cond_col` assignment is now done in the `set_options()` function rather than in individual functions.
- If overlapping splits are used for a double variable, an error is thrown, rather than the splits being read incorrectly.
- `plot_sample()` now includes the generated stimuli in the underlying distribution. At small Ns this makes little difference, but at large Ns it ensures that the plot reflects how representative the sample actually is.
- `generate()` now allows the user to silence console output with the `silent` argument.
- A new vignette on participant selection has been added (rendered version available here: https://jackedtaylor.github.io/LexOPSdocs/vignettes/participant-selection.html)
Other changes are:
- Fixed typos
- Prettified documentation
- Added tests of reproducibility using random seeds for `generate()`
- Added test of `control_for_euc()`
- Tests are now run with console output from `generate()` suppressed