Closed
47 changes: 18 additions & 29 deletions DEVELOPMENT.md
@@ -10,14 +10,13 @@ Following `justfile` commands are helpful for development:

- `just develop`: compiles everything and installs the latest compiled
state of `sqlquerypp` into the current python virtual environment
which is located at `.venv/` at the repository root. Please note that
which is located in `.venv/` at the repository root. Please note that
you might need to activate it manually using `source .venv/bin/activate`.

- `just lint` checks whether all coding conventions (as defined in
`pyproject.toml` and `rustfmt.toml`) are fulfilled.

- `just format` autoformats code according to coding conventions
as much as possible.
- `just format` autoformats as much code as possible according to coding conventions.

- `just test` runs all lints and tests.

@@ -32,51 +31,41 @@ This package is mainly separated into two components:
in `sqlquerypp.compiler.Compiler` and its subclasses.

- Rust API: `src/`

- `lib.rs` is the main entrypoint to look at. It constructs a module with
the full-qualified name `sqlquerypp.sqlquerypp`. It is internal to the
- `lib.rs` is the main entrypoint to look at. It constructs a module with
the fully qualified name `sqlquerypp.sqlquerypp`. It is internal to the
Python API and exposes internally used, fast SQL preprocessor
implementations. Its python interface declaration is located in
implementations. Its Python interface declaration is located in
`python/sqlquerypp/sqlquerypp.pyi`.

- `error.rs`, `lex.rs`, `scanner.rs` and `types.rs` should be quite self-
explaining.

- The code within `parser/` is responsible for parsing nodes (i.e.
representations of `sqlquerypp` directives) and generating codes
for them.

- `ParserState` is a state automaton based parser implementation
which does the "magic" transforming `sqlquerypp` code strings
- `error.rs`, `lex.rs`, `scanner.rs` and `types.rs` should be self-explanatory.
- The code within `parser/` is responsible for parsing nodes (i.e.
representations of `sqlquerypp` directives) and generating codes
for them.
- `ParserState` is a state automaton based parser implementation
that handles the "magic" of transforming `sqlquerypp` code strings
into internal data structures (in terms of compiler construction,
called "nodes" in abstract syntax tree, although `sqlquerypp`
does not provide a correct, academic-style AST-oriented implementation).

- For example, while parsing `combined_result` instructions are
- For example, while parsing `combined_result`, instructions are
reflected as `CombinedResultNode` instances
(`src/parser/nodes/combined_result.rs`). These node objects
are obviously very low-level and stateful (many public and
optional fields).

- When generating code, it's most recommended to use
- When generating code, it is recommended to use
`CompleteCombinedResultNode` objects. This strategy
applies to all nodes `sqlquerypp` supports. See also:
- `ParserState::finalize()`
- `FinalParserState`

- `codegen/` provides common structs, traits and functions for
- `ParserState::finalize()`
- `FinalParserState`
- `codegen/` provides common structs, traits and functions for
generating valid SQL statements from a `FinalParserState`.
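
The split between the low-level, stateful parser nodes and their "complete" counterparts described above can be illustrated roughly as follows. This is only a sketch, written in Python rather than Rust for brevity. The class names `PartialNode` and `CompleteNode` and the fields `outer_query`, `variable` and `inner_query` are hypothetical; the actual fields of `CombinedResultNode` are not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartialNode:
    """Hypothetical stand-in for a node that is filled in step by step while parsing."""

    outer_query: Optional[str] = None
    variable: Optional[str] = None
    inner_query: Optional[str] = None

    def finalize(self) -> "CompleteNode":
        # Refuse to hand an incomplete node to code generation.
        missing = [name for name, value in vars(self).items() if value is None]
        if missing:
            raise ValueError(f"incomplete node, missing: {', '.join(missing)}")
        return CompleteNode(self.outer_query, self.variable, self.inner_query)


@dataclass(frozen=True)
class CompleteNode:
    """Hypothetical immutable node without optional fields, safe for code generation."""

    outer_query: str
    variable: str
    inner_query: str


node = PartialNode(outer_query="SELECT id FROM entity_a WHERE criteria = 1337")
node.variable = "$id"
node.inner_query = "SELECT * FROM entity_b WHERE entity_a_id = $id"
complete = node.finalize()
print(complete.variable)
```

The point of the pattern is that code generation only ever sees objects whose fields are guaranteed to be present, so it needs no checks for missing values.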

## Manual release workflow

- `source .venv/bin/activate`

- `maturin build --release`

- if successful, returns output like "Built wheel for CPython 3.13 to 'PATH'"
- if successful, returns output like "Built wheel for CPython 3.13 to 'PATH'"

- `maturin upload <PATH>` (use 'PATH' from last command)

- **NOTE**: This requires token-based authentication. As this is just a
- **NOTE**: This requires token-based authentication. As this is just a
quick-and-dirty solution which should not be necessary for long, I
won't document this further.
24 changes: 12 additions & 12 deletions README.md
@@ -8,15 +8,15 @@ for both maintainability and high performance.

Currently, only MySQL 8.4 syntax is supported.

## Why preprocessing SQL queries?
## Why preprocess SQL queries?

SQL (Structed Query Language) follows a declarative paradigm, i.e. a query
explains "what should be done" not "how should it be done". This stands in
contrast to imperative programming, which expresses the "how should a
certain task be fulfilled" aspect.

Database systems' internals are responsible for maintaining this aspect.
But, however, for certain and large data structures, writing down "naive"
However, for certain and large data structures, writing down "naive"
queries sometimes result in poor performance.

## Supported performance optimizations
@@ -25,21 +25,21 @@ queries sometimes result in poor performance.

Consider the following original query:

```
```sql
SELECT entity_b.*
FROM entity_b
INNER JOIN entity_a
ON entity_a.id = entity_b.entity_a_id
AND entity_a.criteria = 1337;
```

This is a very simplified example, but if you assume `entity_b` contains very
many items, even correct index conditions may exhaust any DBMS' join buffer.
This is a very simplified example, but if you assume `entity_b` contains
a multitude of items, even correct index conditions may exhaust any DBMS' join buffer.

An alternative approach might be doing a loop at application side (Python
pseudocode), if network overhead is acceptable:
If network overhead is acceptable, a fitting alternative approach could
be a loop on the application side (Python pseudocode):

```
```python
all_matches_in_entity_b = []
for entity_a_id in [rec.id
for rec in mysql_query("SELECT id FROM entity_a "
@@ -49,15 +49,15 @@ pseudocode), if network overhead is acceptable:
all_matches_in_entity_b += inner_result
```

The following statement, being no valid SQL, translates to a MySQL
The following statement, which is invalid SQL, translates to a MySQL
native construct of `Recursive Common Table Expression` and `UNION`
fragments when being compiled by `sqlquerypp`. This allows for maximal
fragments when compiled by `sqlquerypp`. This allows for maximal
query performance, because the inner query with reduced complexity
is still taken into account. At the same time, it grants minimal I/O
overhead as only one query is executed on the database:

```
```text
combined_result (SELECT id FROM entity_a WHERE criteria = 1337) AS $id {
SELECT * FROM entity_b WHERE entity_a_id = $id;
}
```
```
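
The application-side loop that the README describes only as pseudocode can be sketched in runnable form as follows. For self-containment it uses an in-memory SQLite database instead of MySQL, and the sample rows are made up for illustration. It only shows that the loop and the single `INNER JOIN` query return the same rows. It does not reproduce the SQL that `sqlquerypp` generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity_a (id INTEGER PRIMARY KEY, criteria INTEGER);
    CREATE TABLE entity_b (id INTEGER PRIMARY KEY, entity_a_id INTEGER);
    INSERT INTO entity_a (id, criteria) VALUES (1, 1337), (2, 42), (3, 1337);
    INSERT INTO entity_b (id, entity_a_id) VALUES (10, 1), (11, 2), (12, 3), (13, 3);
    """
)

# Variant 1: a single query with a join, as in the README's original query.
joined = conn.execute(
    "SELECT entity_b.* FROM entity_b "
    "INNER JOIN entity_a ON entity_a.id = entity_b.entity_a_id "
    "AND entity_a.criteria = 1337 "
    "ORDER BY entity_b.id"
).fetchall()

# Variant 2: loop on the application side, one inner query per outer row.
outer_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM entity_a WHERE criteria = 1337")
]
all_matches_in_entity_b = []
for entity_a_id in outer_ids:
    inner_result = conn.execute(
        "SELECT * FROM entity_b WHERE entity_a_id = ?", (entity_a_id,)
    ).fetchall()
    all_matches_in_entity_b += inner_result

print(joined == sorted(all_matches_in_entity_b))
```

The loop issues one query per matching `entity_a` row, which is the network overhead the README mentions.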
8 changes: 4 additions & 4 deletions flake.lock

Some generated files are not rendered by default.

5 changes: 3 additions & 2 deletions flake.nix
@@ -1,14 +1,14 @@
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
flake-utils.url = "github:numtide/flake-utils";
rust-overlay = {
url = "github:oxalica/rust-overlay";
inputs.nixpkgs.follows = "nixpkgs";
};
};

outputs = { self, nixpkgs, rust-overlay, flake-utils }:
outputs = { self, nixpkgs, rust-overlay, flake-utils}:
flake-utils.lib.eachDefaultSystem (system:
let
overlays = [ (import rust-overlay) ];
@@ -30,6 +30,7 @@
]))
just
rust
mado
];
};
}
1 change: 1 addition & 0 deletions justfile
@@ -14,6 +14,7 @@ lint:
mypy --check
cargo fmt --check
cargo clippy --all-targets --all-features -- --deny warnings
mado check
Contributor

Does 'mado' have an entrypoint like 'mado format' which we could add to the 'just format' command?

Contributor Author

Not that I could find. If that is important, we'd need to rely on another tool, or develop one ourselves that reformats based on the rules we define in `mado.toml`.

Contributor

No, I think it's fine this way. There are other lints as well (e.g. mypy) which do not have an automatic fix. You can just resolve this suggestion.


format:
ruff format
45 changes: 45 additions & 0 deletions mado.toml
@@ -0,0 +1,45 @@
[lint]
rules = [
"MD001",
"MD002",
"MD003",
"MD004",
"MD005",
"MD006",
# Ist etwas instabil bei nested lists (English: "Is somewhat unstable with nested lists")
Contributor

Our public GitHub repositories are usually written entirely in English. Could you translate this one?

# "MD007",
"MD009",
"MD010",
"MD012",
"MD013",
"MD014",
"MD018",
"MD019",
"MD020",
"MD021",
"MD022",
"MD023",
"MD024",
"MD025",
"MD026",
"MD027",
"MD028",
"MD029",
"MD030",
"MD031",
"MD032",
"MD033",
"MD034",
"MD035",
"MD036",
"MD037",
"MD038",
"MD039",
"MD040",
"MD041",
"MD046",
"MD047",
]

[lint.md026]
punctuation = ".,;:!"