Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rustify DOMXPath::quote #13545

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

divinity76
Copy link
Contributor

@divinity76 divinity76 commented Feb 27, 2024

I am fairly certain DOMXPath::quote is bug-free due to the extensive testing done by @nielsdos , quote:

I've let it fuzz to search for crashes for a while now. It tried about 23 million cases and all seems good.
If anyone is interested, here's the fuzz harness: https://gist.github.com/nielsdos/44981a753808129b0222f3ef28429c69

That is impressive, and made me think, what if we take it 1 step further and re-write it in Rust, with it's memory safety properties?

For those not in the know, Rust is a memory-safe high-performance language.

The Linux Kernel and the Windows kernel and Mozilla Firefox has adopted Rust for it's performance and safety, and the Chromium browser developers is looking into doing the same for Chromium,

Maybe PHP can adopt it too? 😄

@TimWolla
Copy link
Member

Maybe PHP can adopt it too? 😄

Adding a new build-time dependency most certainly requires an RFC, because it will exclude platforms where a rust compiler is not available or not officially supported.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a big fan of Rust, and have used it in my own projects.
Adopting this in PHP may be beneficial, but requires the right abstractions to be put in place, and it's not a call I can make on my own. Tim is right that the introduction of any other language into PHP needs an RFC.

ext/dom/xpath_rust.h Outdated Show resolved Hide resolved
};

// Update length
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should at least be able to get rid of this unsafety here.
I suggest returning a ffi-compatible struct that contains both the string as raw parts, and the length.

}

// Convert Vec<u8> to *mut c_char
let c_str = CString::new(result).expect("CString::new failed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this would leak memory, as you're not freeing the (persistently-allocated) memory of the string at the call site. Using RETURN_STRINGL also makes an extra allocation + copy which is unfortunate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick question, would it be better to handle this more "smoothly" ? ie using match pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this can actually fail because the input may contain \0 bytes. So we must not panic by using expect. And this should probably use a Zend API to avoid the reallocation too, will be much cleaner in the end with the right access and abstractions to Zend APIs.


#[no_mangle]
pub extern "C" fn domxpath_quote_literal(input: *const c_char, len: *mut usize) -> *mut c_char {
let slice = unsafe { std::slice::from_raw_parts(input as *const u8, *len as usize) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should preferably contain a "safety requirement" comment, i.e. that it is expected that the caller does not hold any mutable references to the input. I know it's a bit trivial in this case, but when using unsafe I'd rather see it documented, and the caller must know about it too to avoid soundness issues.

let double_quote_absent = !slice.contains(&b'"');

let result = if single_quote_absent {
let mut res = Vec::with_capacity(slice.len() + 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without additional infrastructure this bypasses PHP's memory manager and thus PHP's memory limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I wonder how difficult it is to give rust access to php's memory manager 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way I see is to implement GlobalAlloc to use emalloc/efree.

I can submit a separate PR for this, and possibly an RFC, if anyone's interested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using GlobalAlloc makes sense, request-based allocations are the most common case anyway.
If you go for an RFC then we'd need more than just the allocation interface, but we also need proper access to the Zend data structures. So this is more work than we might expect at first, but my personal opinion is that it's a good idea to allow memory safe languages inside PHP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danog please do. I won't have time/energy to contribute for some time.

@@ -0,0 +1 @@
extern char* domxpath_quote_literal(const char *const input, uintptr_t *const len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of uintptr_t looks odd for a size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's adopted from rust's usize, quote

Primitive Type usize: The pointer-sized unsigned integer type.

php-src's uintptr_t should be equivalent to rust's usize

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

php-src's uintptr_t should be equivalent to rust's usize

Nope. size_t is the type to use for sizes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, apparently Rust's usize maps to uintptr_t indeed, but this isn't really accurate from a C perspective 🤔 I expect both types to be the same on modern platforms, but it bothers me.

Copy link
Contributor Author

@divinity76 divinity76 Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh there actually is a core::ffi::c_size_t in Rust-nightly, but it's not released/stable yet, quoting https://doc.rust-lang.org/core/ffi/type.c_size_t.html :

🔬This is a nightly-only experimental API. (c_size_t #88345)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume we'd depend on libc at some point anyway, so libc::size_t can probably be used.

Comment on lines +446 to +447
const char *ouput = domxpath_quote_literal(input, &input_len);
RETURN_STRINGL(ouput, input_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: ouput. This construct will also require a full allocation + copy into a new zend_string().

@divinity76
Copy link
Contributor Author

RFC

I know :) but I had some questions, like

  • how difficult is it to add .rs / rust files to the build system? buildconf/configure/make all that stuff
  • how difficult is it to pass strings to/from rust/c?

and this is a cool, functional proof of concept :) (at least it works locally on my machine. The CI's need rustc)

@nielsdos
Copy link
Member

how difficult is it to add .rs / rust files to the build system? buildconf/configure/make all that stuff

For Makefiles, it's just a matter of invoking rustc. For buildconf and configure I'm not too sure, better ask the build system gurus :-)

how difficult is it to pass strings to/from rust/c?

Ideally, we'd be able to use zend_string both inside Rust and C.
Do you know about this? https://github.com/davidcole1340/ext-php-rs I haven't used it personally but it might be interesting to look at. They expose the Zend types.

@divinity76
Copy link
Contributor Author

Ideally, we'd be able to use zend_string both inside Rust and C.

Yeah that would be ideal

Do you know about this? https://github.com/davidcole1340/ext-php-rs I haven't used it personally but it might be interesting to look at. They expose the Zend types.

Nope! Neat! Wonder if i could get those guys' input here, ill try contacting them

@Crell
Copy link
Contributor

Crell commented Feb 28, 2024

Adding a new build-time dependency most certainly requires an RFC, because it will exclude platforms where a rust compiler is not available or not officially supported.

Are there Rust-unsupported platforms that PHP for-reals supports? (Not just claimed to 20 years ago and forgot to check.)

This would definitely require an RFC as it's a big policy change, though I'd be supportive, depending on the specifics.

@TimWolla
Copy link
Member

Are there Rust-unsupported platforms that PHP for-reals supports? (Not just claimed to 20 years ago and forgot to check.)

As per https://doc.rust-lang.org/nightly/rustc/platform-support.html:

OpenBSD is in Tier 3 support for Rust, which means:

Tier 3 targets are those which the Rust codebase has support for, but which the Rust project does not build or test automatically, so they may or may not work.

Solaris / illumos is in Tier 2 without Host Tools which means:

Automated tests are not always run so it's not guaranteed to produce a working build, but tier 2 targets often work to quite a good degree and patches are always welcome!

I'm fairly confident that PHP works fine on both and I expect both of them to be in production use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants