diff --git a/SEP/README.md b/SEP/README.md
index 0e0a392..69ae6ea 100644
--- a/SEP/README.md
+++ b/SEP/README.md
@@ -14,4 +14,5 @@ This area contains the proposals (SEPs).
 * [SEP-0007](SEP-0007/sep-0007.md): Variable Substitution
 * [SEP-0008](SEP-0008/sep-0008.md): SHA-3
 * [SEP-0009](SEP-0009/sep-0009.md): SPARQL CDTs: extensions for composite datatypes (lists and maps)
+* [SEP-0010](SEP-0010/sep-0010.md): Alignment of SPARQL Built-in Functions with ISO SQL Standard Functions
diff --git a/SEP/SEP-0010/sep-0010.md b/SEP/SEP-0010/sep-0010.md
new file mode 100644
index 0000000..75ab6da
--- /dev/null
+++ b/SEP/SEP-0010/sep-0010.md
@@ -0,0 +1,132 @@
+## Alignment of SPARQL Built-in Functions with ISO SQL Standard Functions
+
+## Short name
+SPARQL-SQL-FUNCTIONS
+
+## SEP number
+SEP-10
+
+## Authors
+Dominik Tomaszuk (University of Bialystok)
+
+## Abstract
+SPARQL 1.1 defines a limited set of built-in functions for string manipulation, numeric operations, date/time handling, and conditional logic. However, many commonly used functions standardized in ISO/IEC 9075:2023 (SQL:2023) are not currently available in SPARQL. This SEP proposes extending SPARQL with additional non-aggregate functions from the SQL standard to improve interoperability, completeness, and usability. Functions such as `TRIM`, `LPAD`, `RPAD`, `MOD`, `POWER`, `SQRT`, `EXP`, `LOG`, `DATE_ADD`, `TIMESTAMPDIFF`, `CASE`, `NULLIF`, `GREATEST`, and `LEAST` are widely used in database query processing but lack equivalents in SPARQL. By introducing these functions, SPARQL can align better with existing standards, reduce the learning curve for developers, and provide richer query expressivity for RDF data.
+
+## Motivation
+SPARQL 1.1 (2013) provides only a minimal set of built-in functions compared to SQL.
+Key limitations include:
+- Missing string manipulation functions (`TRIM`, `LPAD`, `RPAD`, `POSITION`).
+- Missing numeric/math functions (`MOD`, `POWER`, `SQRT`, `EXP`, `LOG`, `SIN`, `COS`, `TAN`).
+- Limited date/time support (no `DATE_ADD`, `TIMESTAMPDIFF`, or `INTERVAL` arithmetic).
+- Missing conditional/logical functions (`CASE`, `NULLIF`).
+- No generalized comparative functions (`GREATEST`, `LEAST`).
+
+These gaps limit SPARQL’s usability in data integration and analytics scenarios where users expect similar functionality to SQL. They also complicate interoperability in hybrid systems where RDF data is queried alongside relational databases.
+
+Scope: This change affects the **SPARQL functions and operators specification**, not the core query language semantics.
+
+## Rationale and Alternatives
+Rationale:
+- **Interoperability**: SQL (ISO/IEC 9075:2023) is the most widely deployed query language. Aligning SPARQL functions with SQL reduces friction in adopting SPARQL.
+- **Developer familiarity**: Many practitioners know SQL but not SPARQL. Familiar function names and semantics ease adoption.
+- **Expressivity**: The missing functions require complex workarounds or external processing in current SPARQL.
+
+Alternatives considered:
+1. Keep SPARQL minimal and rely on external application logic.
+2. Define SPARQL-only extensions with new function names.
+3. Adopt ISO SQL function names directly to ensure compatibility.
+
+This SEP recommends option (3) for consistency with established standards.
+
+## Evidence of consensus
+- Multiple research works and developer reports highlight frustration with missing SPARQL functions.
+- W3C Community Group discussions on SPARQL 1.2 already acknowledge gaps in function support.
+- SQL alignment (ISO/IEC 9075:2023) has been proposed informally in workshops and mailing lists.
+
+## Specification
+The following new functions are proposed to be added to SPARQL:
+
+### String functions
+- `TRIM(string)`, `LTRIM(string)`, `RTRIM(string)`
+- `LPAD(string, length, padchar)`
+- `RPAD(string, length, padchar)`
+- `POSITION(substring IN string)`
+
+### Numeric functions
+- `MOD(numeric, numeric)`
+- `POWER(x, y)`
+- `SQRT(x)`
+- `EXP(x)`
+- `LN(x)`, `LOG10(x)`
+- `SIN(x)`, `COS(x)`, `TAN(x)`
+
+### Date/Time functions
+- `DATE_ADD(date, interval)`
+- `TIMESTAMPDIFF(unit, t1, t2)`
+- Support for `INTERVAL` literals (e.g., `INTERVAL '7' DAY`)
+
+### Conditional and logical functions
+- `CASE WHEN ... THEN ... ELSE ... END`
+- `NULLIF(x, y)`
+
+### Comparative functions
+- `GREATEST(x1, x2, …)`
+- `LEAST(x1, x2, …)`
+
+Each function should follow ISO/IEC 9075:2023 semantics, adapted for RDF datatypes (notably `xsd:dateTime`, `xsd:decimal`, etc.).
+
+## Backwards Compatibility
+- No impact on existing queries: all proposed functions are new additions.
+- Existing SPARQL functions (`STRLEN`, `UCASE`, `LCASE`, etc.) remain valid.
+- Overlaps (e.g., `CONCAT`) follow existing SPARQL semantics aligned with SQL.
+
+## Tests and Implementations
+- Test cases must cover typical inputs, edge cases (e.g., empty strings, NaN, null-equivalent values), and datatype conversions.
+- Prototype implementations could be built on top of Apache Jena ARQ and RDF4J.
+- Alignment tests should compare outputs against equivalent SQL queries on relational backends.
+
+---
+
+## Appendix A: Function Mapping between SQL and SPARQL 1.1
+
+| SQL Function | SPARQL 1.1 Equivalent |
+|-----------------|------------------------|
+| LENGTH | STRLEN |
+| TRIM | |
+| LTRIM | |
+| RTRIM | |
+| LPAD | |
+| RPAD | |
+| POSITION | |
+| UPPER | UCASE |
+| LOWER | LCASE |
+| SUBSTRING | SUBSTR |
+| CONCAT | CONCAT |
+| REPLACE | REPLACE |
+| REGEXP_MATCHES | REGEX |
+| ABS | ABS |
+| MOD | |
+| CEIL / CEILING | CEIL |
+| FLOOR | FLOOR |
+| ROUND | ROUND |
+| EXP | |
+| LN | |
+| LOG10 | |
+| POWER | |
+| SQRT | |
+| SIN | |
+| COS | |
+| TAN | |
+| CURRENT_TIMESTAMP | NOW |
+| EXTRACT | YEAR, MONTH, DAY, HOURS, MINUTES, SECONDS |
+| INTERVAL | |
+| DATE_ADD | |
+| TIMESTAMPDIFF | |
+| CASE | |
+| COALESCE | COALESCE |
+| NULLIF | |
+| GREATEST | |
+| LEAST | |
+| CAST | STR(), xsd:type(...) |
+| CURRENT_USER | |
+
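
For the alignment tests described under "Tests and Implementations", a small reference model of a few proposed functions may help when comparing engine output. The following is a minimal Python sketch, not part of the patch and not taken from ISO/IEC 9075:2023. The helper names `lpad`, `rpad`, `nullif`, `greatest`, and `least` are hypothetical, `None` stands in for SQL NULL (an unbound value in SPARQL), truncation of over-long input follows PostgreSQL's documented `lpad`/`rpad` behavior, and NULL propagation in `greatest`/`least` is an assumption the SEP would need to confirm.

```python
from typing import Any, Optional


def _fill(pad: str, count: int) -> str:
    """Repeat pad cyclically until exactly count characters are produced."""
    if count <= 0 or pad == "":
        return ""
    return (pad * (count // len(pad) + 1))[:count]


def lpad(s: str, length: int, pad: str) -> str:
    """Left-pad s to length characters.

    Assumption: input longer than length is truncated on the right,
    and an empty pad string leaves s unchanged.
    """
    if length <= 0:
        return ""
    if len(s) >= length:
        return s[:length]
    return _fill(pad, length - len(s)) + s


def rpad(s: str, length: int, pad: str) -> str:
    """Right-pad s to length characters (same assumptions as lpad)."""
    if length <= 0:
        return ""
    if len(s) >= length:
        return s[:length]
    return s + _fill(pad, length - len(s))


def nullif(x: Any, y: Any) -> Optional[Any]:
    """Return None (NULL / unbound) when x equals y, otherwise x."""
    return None if x == y else x


def greatest(*args: Any) -> Optional[Any]:
    """Largest argument; assumption: any None argument makes the result None."""
    if not args:
        raise ValueError("greatest() requires at least one argument")
    if any(a is None for a in args):
        return None
    return max(args)


def least(*args: Any) -> Optional[Any]:
    """Smallest argument; assumption: any None argument makes the result None."""
    if not args:
        raise ValueError("least() requires at least one argument")
    if any(a is None for a in args):
        return None
    return min(args)
```

Under these assumptions, `lpad("42", 5, "0")` gives `"00042"` and `greatest(1, None)` gives `None`. An engine that ignores NULL arguments, as PostgreSQL does for `GREATEST` and `LEAST`, would need the NULL rule in the last two helpers changed before its output could be compared.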