-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial implementation of helper library for Rust UDFs #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general tips on how the code structure could be improved:
The scylla_bindgen
function is very complex, has lots of variables and operates directly on AST which is a bit hard to parse. It definitely needs to be broken down into smaller components. It's not really clear at the first glance what is the general idea behind the transformation it performs.
In order to improve readability and maintainability, I suggest introducing an intermediate representation - an abstract model that would describe the user-supplied function in such a way that generating code later is easy.
For example (I'm making this up, might not compile):
struct UserDefinedFunctionDesc {
arguments: Vec<ArgumentDesc>,
return_type: ReturnTypeDesc,
original_function_name: syn::Ident,
original_body: syn::Block,
}
struct ArgumentDesc {
name: syn::Ident,
typ: syn::Type,
}
struct ReturnType {
typ: syn::Type,
}
First, you could build the model:
// Purposefully avoiding error handling here for simplicity
impl UserDefinedFunctionDesc {
fn new(attrs: TokenStream, input: TokenStream) -> Self {
let func = parse::<ItemFn>(item).unwrap();
let arguments = func.sig.inputs.iter().map(ArgumentDesc::new).collect();
let return_type = ReturnType::new(func.sig.output);
// ... more stuff ...
Self {
arguments,
return_type,
original_function_name,
original_body,
}
}
}
...and then synthesize the code based on it:
impl UserDefinedFunctionDesc {
fn generate(&self) -> TokenStream {
let exported_function = self.generate_exported_function();
let inner_function = self.generate_inner_function();
quote! {
#exported_function
#inner_function
}
}
fn generate_exported_function(&self) -> syn::ItemFn { /* ... */ }
fn generate_inner_function(&self) -> syn::ItemFn { /* ... */ }
}
impl ArgumentDesc {
fn exported_arg_type(&self) -> syn::Argument { /* ... */ }
fn inner_arg_type(&self) -> syn::Argument { /* ... */ }
}
// etc. etc.
I addressed most of the comments (though not all yet) and added a few more features:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks much better now, thanks. It's easier to see what is going on.
Apart from the comments, there are some other considerations:
- All public items should be documented: traits (
FromCQL
,AsCQL
,ToColumnType
,WasmConvertible
, ...), types (SizePointer
), maybe more.
If you want to hide some items from the users but you still need to make them public for the purpose of the procedural macros, then you can use the#[doc(hidden)]
attribute. You can also add docstrings to those items that will state that those types are not a part of the public API and may change in minor releases. - Some basic CI would be nice: format check, compilation, clippy check and tests. You can use scylla-rust-driver's CI definition as a base (you should slim it down however, e.g. no need to setup a scylla instance, at least not yet).
- I wrote it already in one of the comments, but I'll repeat here: please add tests!
- Dependency on the local
scylla-cql
crate should be dropped, after all changes are upstreamed there.
bindgen-macros/src/udf.rs
Outdated
static mut DECLARED_ABI: std::sync::atomic::AtomicBool = | ||
std::sync::atomic::AtomicBool::new(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this approach really work? Imagine that you are having multiple dependency crates that define some UDFs. Most likely, separate crates will be compiled by separate processes, so they will have separate instances of this flag and each will emit its own _scylla_abi
. I would expect that you will get into trouble if you try to link everything together.
Maybe we could have a separate (declarative?) macro that defines the _scylla_abi
symbol? You would only have to define it in the root crate.
Alternatively, maybe there are some linker attributes that can be used to make it possible to define the symbol multiple times (it would be deduplicated)? There is a risk that such a solution could be non-portable, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: Not up to date, see answer below
I tried a different approach in the rebase - the _scylla_abi
static is now exported directly in the crate, and not in the macro.
I exported it with #[no_mangle]
and #[used]
attributes, but in theory it may not be enough to make it end up in the final wasm binary. This would be achieved using the #[used(linker)]
unstable feature, but if I'm not mistaken, #[used]
guarantees that it reaches the linking stage, and we can make sure it doesn't get stripped during linking by referencing it in a public function.
We could reference it in the UDF, but that would make it (slightly) slower. We could also create a new function that does reference it - that makes the binary (slightly) bigger - I went with that for now, because for a very simple function that only performs some calculations the cost of accessing the global seems proportionally higher than the bigger executable, which will always be quite sizeable because it contains the implementation of malloc
.
Note that, for now, the function is actually not needed - for non-ELF targets the #[used]
attribute does all that we want (in fact, even #[no_mangle]
is enough), but in theory it's not guaranteed.
I'm also not against adding a reference to the original generated function implementation, if that would be good enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that neither #[used]
nor pub
have any useful effect in terms of exporting the symbol in the final binary. At least in the current Rust version, we have to tag an item with #[no_mangle]
or #[export_name = ...]
and it will be exported regardless of whether it was placed in the root crate or whether it was even pub
.
However, this behavior is not guaranteed to continue, but I found no immediate plans of changing it.
Because of that, I suggest that we depend on it for now. It has 2 main benefits: we don't have to make implementation hacks to make it work, and we don't have to require users to be extra careful in terms of placement of the items exported to Scylla and their visibility setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in the current Rust version, we have to tag a item with
#[no_mangle]
or#[export_name = ...]
and it will be exported regardless of whether it was placed in the root crate or whether it was evenpub
. However, this behavior is not guaranteed to continue, but I found no immediate plans of changing it.
Could you point me to the relevant documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior here isn't well documented. We can see that the items survive until the linking phase here, but the fact that they are also present in the final binary is not guaranteed in general.
This behavior is actually causing some issues because there is no way to prevent exporting an item tagged with #[no_mangle]
in the final library. The discussion however, is more about how to make it possible to hide such items, and not about changing this default behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about it recently on the meeting and it looks like we should prioritize ease of use. I'm OK with the current approach, especially if it doesn't seem that it will be changed soon.
bindgen-macros/src/udt.rs
Outdated
// We have to make the struct public, otherwise we can't use "<$struct_name as WasmConvertible>::WasmType" | ||
// in the exported function signature even if WasmType is public. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a bit surprising for the user to find out that their struct is made public, even though they didn't mark it that way.
Is the error message clear in case the struct is not pub
, i.e. tells that the problem is caused by the struct having wrong visibility and explains why it is needed? If so then I think it's better to inform the user and let them make the fix themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the rebase, the exported UDF function doesn't even have to be pub
so we also don't have to concern ourselves with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that there is still a similar issue in the export_newtype
macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just forgot to remove the comment there.
There are still some comments I haven't addressed in the last rebase, but at least all the comments from the previous review are done now. |
76d89d9
to
39b96ec
Compare
The last rebase fixed a few comments and again changed the structure of the project - this time mostly due to a lot of renames, but also the The things that weren't fixed in the last rebase are: CI and tests, so I think we can start reviewing this part already.
I'm not actually exposing anything other than the macros and some cql re-exports from The other thing that I have to add is a README, but it can also wait until after the code is fixed, and the ability to rename this crate (will be practically the same as what we've added to |
154351b
to
b142867
Compare
Ok. It looks like those items are not exported because their modules are not exported, so it's fine.
No need, the docstrings in the generated documentation should be inherited from |
737cd83
to
0b93f96
Compare
Looks like I'll have to split your attention again @piodul, as we decided we'll try to finish this before UDF permissions after all. |
One additional consideration before publishing the crate is whether we want to change the repo name. |
The crates that you are developing are supposed to help with developing UDFs, so I think they should have I think that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wasm_conversion.rs
file is empty? I see that in one of the force-pushes you moved its contents out to another file, but perhaps forgot to delete the old file?
82d01af
to
87be2a3
Compare
Correct, it should be deleted in the rebase |
README.md
Outdated
| BLOB | Vec\<u8\> | | ||
| BOOLEAN | bool | | ||
| COUNTER | scylla_udf::Counter | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: some rows in that table are mis-formatted, please fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
I can't get the crate to compile correctly:
I think you missed some places when renaming the crates. Besides, the directory names for the crates are still I recommend rebasing the PR on top of main/master, so that the CI has a chance to run. |
975c18c
to
1df7487
Compare
All the cases of scylla-bindgen should be now correctly replaced with scylla-udf. I also made some fixes for rustfmt, clippy and rustdoc, so that CI passes |
I think it looks OK. I have some final requests before we can merge:
|
5693722
to
007c39c
Compare
I tested examples with using Scylla's The tests uncovered that one example can't be used with scylla - the combine.rs example took both a counter and a cql duration as arguments, but they cannot be columns of the same table (all non-counter columns in a counter table have to be parts of the partition key, and duration cannot be in a partition key). The topn.rs has also been updated, because the current implementation produced a bit unexpected results - it did return the n longest strings without repetitions, but different strings with the same length were also counted as repetitions. Also, the previous implementation of the row function didn't handle nulls, so if a null row was encountered in the UDA execution, the UDA returned null as well. I also noticed that the |
@piodul is it ok if we merge this before transferring the repo to Scylla? |
@wmitros please squash all the commits (apart from |
Ok, everything's squashed (I merged |
No description provided.