Skip to content

Commit

Permalink
Ensure numeric matching respectes precisions as described in our docu…
Browse files Browse the repository at this point in the history
…mentation (#166)

#163

### Description of changes:

As was reported in issue 163, ruler today ignores precision and causes false matches for numbers or rules with high precision numbers. This change moves away from using `double` for doing arithmetic adjustments within `ComparableNumber`. 

Along the way the API to generate comparable numbers is changed from using `Strings` instead of `Double`. This allows for more accurate rule matching for numbers with 6+ digits without compromising on performance. 

A bunch of our tests needed to be changed / fixed as a result of this change. These have been fixed. We're added additional test cases to help catch precision issues in future.
  • Loading branch information
baldawar authored Jul 8, 2024
1 parent ccafd48 commit 23e75a2
Show file tree
Hide file tree
Showing 18 changed files with 492 additions and 200 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -919,7 +919,7 @@ Events are processed at over 220K/second except for:
- wildcard matches, which are processed at over 170K/second.
- anything-but matches, which are processed at over 150K/second.
- numeric matches, which are processed at over 120K/second.
- complex array matches, which are processed at over 2.5K/second.
- complex array matches, which are processed at over 35K/second.
### Suggestions for better performance
Expand Down
2 changes: 1 addition & 1 deletion pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
<groupId>software.amazon.event.ruler</groupId>
<artifactId>event-ruler</artifactId>
<name>Event Ruler</name>
<version>1.7.3</version>
<version>1.7.4</version>
<description>Event Ruler is a Java library that allows matching Rules to Events. An event is a list of fields,
which may be given as name/value pairs or as a JSON object. A rule associates event field names with lists of
possible values. There are two reasons to use Ruler: 1/ It's fast; the time it takes to match Events doesn't
Expand Down
4 changes: 1 addition & 3 deletions src/main/software/amazon/event/ruler/ByteMachine.java
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
package software.amazon.event.ruler;

import com.fasterxml.jackson.core.io.doubleparser.JavaDoubleParser;
import software.amazon.event.ruler.input.InputByte;
import software.amazon.event.ruler.input.InputCharacter;
import software.amazon.event.ruler.input.InputCharacterType;
Expand Down Expand Up @@ -114,8 +113,7 @@ Set<NameStateWithPattern> transitionOn(final String valString) {
// patterns starting/ending with a double quotation, where as numeric patterns never do.
if (hasNumeric.get() > 0) {
try {
final double numerically = JavaDoubleParser.parseDouble(valString);
doTransitionOn(ComparableNumber.generate(numerically), transitionTo, TransitionValueType.NUMERIC);
doTransitionOn(ComparableNumber.generate(valString), transitionTo, TransitionValueType.NUMERIC);
return transitionTo;
} catch (Exception e) {
// no-op, couldn't treat this as a sensible number
Expand Down
99 changes: 65 additions & 34 deletions src/main/software/amazon/event/ruler/ComparableNumber.java
Original file line number Diff line number Diff line change
@@ -1,55 +1,86 @@
package software.amazon.event.ruler;

import com.fasterxml.jackson.core.io.doubleparser.JavaBigDecimalParser;

import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
* Represents a number, turned into a comparable string
* Numbers are allowed in the range -50**9 .. +50**9, inclusive
* Comparisons are precise to 6 digits to the right of the decimal point
* They are all treated as floating point
* They are turned into strings by:
* 1. Add 10**9 (so no negatives), then multiply by 10**6 to remove the decimal point
* 2. Format to a 14 char string left padded with 0 because hex string converted from 5e9*1e6=10e15 has 14 characters.
* Note: We use Hex because of a) it can save 3 bytes memory per number than decimal b) it aligned IP address radix.
* If needed, we can consider to use 32 or 64 radix description to save more memory,e.g. the string length will be 10
* for 32 radix, and 9 for 64 radix.
*
* Note:
* The number is parsed to be java double to support number with decimal fraction, the max range supported is from
* -5e9 to 5e9 with precision of 6 digits to the right of decimal point.
* There is well known issue that double number will lose the precision while calculation among double and other data
* types, the higher the number, the lower the accuracy that can be maintained.
* For example: 0.30d - 0.10d = 0.19999999999999998 instead of 0.2d, and if extend to 1e10, the test result shows only
* 5 digits of precision from right of decimal point can be guaranteed with existing implementation.
* The current max number 5e9 is selected with a balance between keeping the committed 6 digits of precision from right
* of decimal point and the memory cost (each number is parsed into a 14 characters HEX string).
*
* CAVEAT:
* When there is need to further enlarging the max number, PLEASE BE VERY CAREFUL TO RESERVE THE NUMBER PRECISION AND
* TAKEN THE MEMORY COST INTO CONSIDERATION, BigDecimal shall be used to ensure the precision of double calculation ...
* Represents a number as a comparable string.
* <br/>
* Numbers are allowed in the range -5,000,000,000 to +5,000,000,000 (inclusive).
* Comparisons are precise to 15 decimal places, with six to the right of the decimal.
* Numbers are treated as floating-point values.
* <br>
* Numbers are converted to strings by:
* 1. Multiplying by 1,000,000 to remove the decimal point and then adding 5,000,000,000 (to remove negatives), then
* 2. Formatting to a 14-character hexadecimal string left-padded with zeros, because the hexadecimal string
* converted from 5,000,000,000 * 1,000,000 = 5,000,000,000,000,000 has 14 characters.
* <br/>
* Hexadecimal representation is used because:
* 1. It saves 3 bytes of memory per number compared to decimal representation.
* 2. It is lexicographically comparable, which is useful for maintaining sorted order of numbers.
* 2. It aligns with the radix used for IP addresses.
* If needed, a radix of 32 or 64 can be used to save more memory (e.g., the string length will be 10 for radix 32,
* and 9 for radix 64).
* <br/>
* The number is parsed as a Java {@code BigDecimal} to support decimal fractions. We're avoiding double as
* there is a well-known issue that double numbers can lose precision when performing calculations involving
* other data types. The higher the number, the lower the accuracy that can be maintained. For example,
* {@code 0.30d - 0.10d = 0.19999999999999998} instead of {@code 0.2d}. If extended to {@code 1e10}, the test
* results show that only 5 decimal places of precision can be guaranteed when using doubles.
* <br/>
* CAVEAT:
* The current range of +/- 5,000,000,000 is selected as a balance between maintaining the committed 6
* decimal places of precision and memory cost (each number is parsed into a 14-character hexadecimal string).
* When trying to increase the maximum number, PLEASE BE VERY CAREFUL TO PRESERVE THE NUMBER PRECISION AND
* CONSIDER THE MEMORY COST.
* <br/>
* Also, while {@code BigDecimal} can ensure the precision of double
* calculations, it has been shown to be 2-4 times slower for basic mathematical and comparison operations, so.
* we turn to long integer arithmetic.
*/
class ComparableNumber {
private static final double TEN_E_SIX = 1E6;
static final int MAX_LENGTH_IN_BYTES = 16;
private static final String HEXES = new String(Constants.HEX_DIGITS, StandardCharsets.US_ASCII);
public static final int NIBBLE_SIZE = 4;
static final int MAX_DECIMAL_PRECISON = 6;

// 1111 0000
private static final int UPPER_NIBBLE_MASK = 0xF0;
public static final BigDecimal TEN_E_SIX = new BigDecimal("1E6");
public static final long FIV_BILL_TEN_E_SIX = new BigDecimal(Constants.FIVE_BILLION).multiply(TEN_E_SIX).longValueExact();

// 0000 1111
private static final int LOWER_NIBBLE_MASK = 0x0F;
private static final String HEXES = new String(Constants.HEX_DIGITS, StandardCharsets.US_ASCII);
public static final int NIBBLE_SIZE = 4;
private static final int UPPER_NIBBLE_MASK = 0xF0; // 1111 0000
private static final int LOWER_NIBBLE_MASK = 0x0F; // 0000 1111

private ComparableNumber() {
}

static String generate(final double f) {
if (f < -Constants.FIVE_BILLION || f > Constants.FIVE_BILLION) {
/**
* Generates a hexadecimal string representation of a given decimal string value,
* with a maximum precision of 6 decimal places and a range between -5,000,000,000
* and 5,000,000,000 (inclusive).
*
* @param str the decimal string value to be converted
* @return the hexadecimal string representation of the input value
* @throws IllegalArgumentException if the input value has more than 6 decimal places
* or is outside the allowed range
*/
static String generate(final String str) {
final BigDecimal number = JavaBigDecimalParser.parseBigDecimal(str).stripTrailingZeros();
if (number.scale() > MAX_DECIMAL_PRECISON) {
throw new IllegalArgumentException("Only values upto 6 decimals are supported");
}

final long shiftedBySixDecimals = number.multiply(TEN_E_SIX).longValueExact();

// faster than doing bigDecimal comparisons
if (shiftedBySixDecimals < -FIV_BILL_TEN_E_SIX || shiftedBySixDecimals > FIV_BILL_TEN_E_SIX) {
throw new IllegalArgumentException("Value must be between " + -Constants.FIVE_BILLION +
" and " + Constants.FIVE_BILLION + ", inclusive");
}
return toHexStringSkippingFirstByte((long) (TEN_E_SIX * (Constants.FIVE_BILLION + f)));

return toHexStringSkippingFirstByte(shiftedBySixDecimals + FIV_BILL_TEN_E_SIX);
}

/**
Expand Down
20 changes: 10 additions & 10 deletions src/main/software/amazon/event/ruler/JsonRuleCompiler.java
Original file line number Diff line number Diff line change
Expand Up @@ -318,7 +318,7 @@ private static void writeRules(final List<Map<String, List<Patterns>>> rules,
* so make that condition survivable.
*/
try {
values.add(Patterns.numericEquals(parser.getDoubleValue()));
values.add(Patterns.numericEquals(parser.getText()));
} catch (Exception e) {
// no-op
}
Expand Down Expand Up @@ -579,7 +579,7 @@ private static Patterns processAnythingButListMatchExpression(JsonParser parser)
break;
case VALUE_NUMBER_FLOAT:
case VALUE_NUMBER_INT:
values.add(ComparableNumber.generate(parser.getDoubleValue()));
values.add(ComparableNumber.generate(parser.getText()));
hasNumber = true;
break;
default:
Expand Down Expand Up @@ -654,7 +654,7 @@ private static Patterns processAnythingButMatchExpression(JsonParser parser,
break;
case VALUE_NUMBER_FLOAT:
case VALUE_NUMBER_INT:
values.add(ComparableNumber.generate(parser.getDoubleValue()));
values.add(ComparableNumber.generate(parser.getText()));
hasNumber = true;
break;
default:
Expand Down Expand Up @@ -728,7 +728,7 @@ private static Patterns processNumericMatchExpression(final JsonParser parser) t
if (!token.isNumeric()) {
barf(parser, "Value of equals must be numeric");
}
final double val = parser.getDoubleValue();
final String val = parser.getText();
if (parser.nextToken() != JsonToken.END_ARRAY) {
tooManyElements(parser);
}
Expand All @@ -737,7 +737,7 @@ private static Patterns processNumericMatchExpression(final JsonParser parser) t
if (!token.isNumeric()) {
barf(parser, "Value of >= must be numeric");
}
final double val = parser.getDoubleValue();
final String val = parser.getText();
token = parser.nextToken();
if (token == JsonToken.END_ARRAY) {
return Range.greaterThanOrEqualTo(val);
Expand All @@ -748,7 +748,7 @@ private static Patterns processNumericMatchExpression(final JsonParser parser) t
if (!token.isNumeric()) {
barf(parser, "Value of > must be numeric");
}
final double val = parser.getDoubleValue();
final String val = parser.getText();
token = parser.nextToken();
if (token == JsonToken.END_ARRAY) {
return Range.greaterThan(val);
Expand All @@ -759,7 +759,7 @@ private static Patterns processNumericMatchExpression(final JsonParser parser) t
if (!token.isNumeric()) {
barf(parser, "Value of <= must be numeric");
}
final double top = parser.getDoubleValue();
final String top = parser.getText();
if (parser.nextToken() != JsonToken.END_ARRAY) {
tooManyElements(parser);
}
Expand All @@ -769,7 +769,7 @@ private static Patterns processNumericMatchExpression(final JsonParser parser) t
if (!token.isNumeric()) {
barf(parser, "Value of < must be numeric");
}
final double top = parser.getDoubleValue();
final String top = parser.getText();
if (parser.nextToken() != JsonToken.END_ARRAY) {
tooManyElements(parser);
}
Expand All @@ -787,7 +787,7 @@ private static Patterns processNumericMatchExpression(final JsonParser parser) t

private static Patterns completeNumericRange(final JsonParser parser,
final JsonToken token,
final double bottom,
final String bottom,
final boolean openBottom) throws IOException {
if (token != JsonToken.VALUE_STRING) {
barf(parser, "Bad value in numeric range: " + parser.getText());
Expand All @@ -803,7 +803,7 @@ private static Patterns completeNumericRange(final JsonParser parser,
if (!parser.nextToken().isNumeric()) {
barf(parser, "Value of " + operator + " must be numeric");
}
final double top = parser.getDoubleValue();
final String top = parser.getText();
if (parser.nextToken() != JsonToken.END_ARRAY) {
barf(parser, "Too many terms in numeric range expression");
}
Expand Down
21 changes: 20 additions & 1 deletion src/main/software/amazon/event/ruler/Patterns.java
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,13 @@ public static AnythingBut anythingButMatch(final String anythingBut) {
return new AnythingBut(Collections.singleton(anythingBut), false);
}

/** use anythingButMatch(String) instead */
@Deprecated
public static AnythingBut anythingButMatch(final double anythingBut) {
return anythingButNumberMatch(Double.toString(anythingBut));
}

public static AnythingBut anythingButNumberMatch(final String anythingBut) {
return new AnythingBut(Collections.singleton(ComparableNumber.generate(anythingBut)), true);
}

Expand All @@ -78,9 +84,16 @@ public static AnythingButValuesSet anythingButIgnoreCaseMatch(final Set<String>
return new AnythingButValuesSet(MatchType.ANYTHING_BUT_IGNORE_CASE, anythingButs);
}

/** Use anythingButNumbersMatch(Set<String>) that accepts String */
@Deprecated
public static AnythingBut anythingButNumberMatch(final Set<Double> anythingButs) {
final Set<String> anyButsAsStrings = anythingButs.stream().map(d -> Double.toString(d)).collect(Collectors.toSet());
return anythingButNumbersMatch(anyButsAsStrings);
}

public static AnythingBut anythingButNumbersMatch(final Set<String> anythingButs) {
Set<String> normalizedNumbers = new HashSet<>(anythingButs.size());
for (Double d : anythingButs) {
for (String d : anythingButs) {
normalizedNumbers.add(ComparableNumber.generate(d));
}
return new AnythingBut(normalizedNumbers, true);
Expand Down Expand Up @@ -112,7 +125,13 @@ public static AnythingButValuesSet anythingButWildcard(final Set<String> anythin
return new AnythingButValuesSet(MatchType.ANYTHING_BUT_WILDCARD, anythingButs);
}

/** Use the other numericEquals(String) instead */
@Deprecated
public static ValuePatterns numericEquals(final double val) {
return numericEquals(Double.toString(val));
}

public static ValuePatterns numericEquals(final String val) {
return new ValuePatterns(MatchType.NUMERIC_EQ, ComparableNumber.generate(val));
}

Expand Down
Loading

0 comments on commit 23e75a2

Please sign in to comment.