Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.0] Consider abandoning the polyfill #92

Closed
rtheunissen opened this issue Jul 19, 2017 · 75 comments
Closed

[2.0] Consider abandoning the polyfill #92

rtheunissen opened this issue Jul 19, 2017 · 75 comments
Assignees
Milestone

Comments

@rtheunissen
Copy link
Member

There are a few reasons why I believe we should consider abandoning the polyfill:

  • It's a performance risk. Someone might not realise that they don't have the extension enabled or share code with someone that doesn't have the extension installed, which results in a very inefficient implementation of the structures.
  • It's blocking some changes in the extension, like operator overloading on sets (for diff, xor, union and complement) and the ability to allow == to behave like it does for arrays.
  • There isn't a good reason for anyone to use the polyfill on purpose. If they don't have access to their environment to install the extension, they shouldn't be using the library at all.
  • It's extra work to maintain the polyfill, extension and tests in separate repositories. If we drop the polyfill we can move the tests into the extension repository and tag them together.
@rtheunissen rtheunissen self-assigned this Jul 19, 2017
@rtheunissen rtheunissen added this to the 2.0.0 milestone Jul 19, 2017
@rtheunissen
Copy link
Member Author

One thing to note is that the polyfill also acts as an autocomplete source. If we decide to abandon the polyfill in 2.0, we would need to keep the stubs anyway. Something we could do is maintain a composer package that contains the tests and the stubs, and requires the extension as ext-ds, and have the extension repository require that package. That way we're still separating the extension from the tests and the stubs, and we're enforcing that the extension is installed if a project has the composer package as a dependency.

@kelunik
Copy link

kelunik commented Jul 20, 2017

Without a polyfill we (amphp) will never use ds. Things must just work for all basic things.

@rtheunissen
Copy link
Member Author

rtheunissen commented Jul 20, 2017

@kelunik but you're taking two steps back by using the polyfill at all. Anyone without knowledge of the extension using amphp will suffer serious performance and memory issues.

Edit: noticed that php-ds is not actually a dependent of amphp?

@kelunik
Copy link

kelunik commented Jul 20, 2017

Right, we currently don't use it, but if we would, we'd want it to be runnable by default without any additional extensions. It's currently not worth for the few cases we have.

@stepzerosolutions
Copy link

With first installation attempt fail, I've installed the extension without any issue on a new php build. I'm new to php-ds and still doing testing, I don't see any reason not to abandoned it if nobody is using it.

@rtheunissen
Copy link
Member Author

@TheCelavi @jkuchar @dantudor @krakjoe @nikic in light of this issue and #8 (Pure PHP implementation), I would like to ask if anyone has any strong opinions here.

In my opinion:

Reasons to abandon:

  • No chance that the library decreases performance (either extension or not at all).
  • No need to maintain the second implementation - can focus on the extension.
  • There isn't a practical reason why anyone would be using the polyfill, eg. if you're in an environment where you don't have access to install the extension, you shouldn't be using the library at all.

Reason to maintain:

  • Improves adoption, can play around without installed the extension.
  • Allows code to run in an environment where the extension isn't installed.
  • Doubles as autocomplete stubs.
  • Most of it is already implemented.
  • We'll have to maintain it for 1.0 anyway.

@jkuchar
Copy link

jkuchar commented Jul 23, 2017 via email

@rtheunissen
Copy link
Member Author

rtheunissen commented Jul 25, 2017

@jkuchar I think what dawned on me is that you would be better off not upgrading PHP or not using the extension than if you were using the polyfill.

I do not want to mess with the compilation process on Windows

You shouldn't need to at all, there are .dll's in the releases on Github and on PECL here.

@jkuchar
Copy link

jkuchar commented Jul 31, 2017 via email

@rtheunissen
Copy link
Member Author

rtheunissen commented Aug 3, 2017

I didn't know it made it into PHP 7.1 default bundle

Not that I'm aware of, but it's compiled and available on PECL.

The mission of polyfill - to overcome the fear of getting dependent on new extension whose development can be abandoned - is completed.

The mission of the polyfill is:

  • Prevent code from not working at all in environments where the extension couldn't be installed or hasn't been installed yet
  • Provide IDE autocomplete by having the method signatures in your /vendor
  • Provide an easy way for people to "play with" the API without the need to install the extension.

I'm starting to feel like maybe failing hard when the extension is not available is a good thing. Because you'd be well worse off then than if you were using arrays.

I'm not convinced either way yet, so this is still just a consideration. 🤔

@NoodlesNZ
Copy link
Contributor

I agree with abandoning the polyfill. If the performance is not there then it's not worth having (it defeats the purpose of using ds). Maintaining the polyfill takes effort away from the extension.

It shouldn't be that hard to maintain stubs for autocomplete.

@rtheunissen
Copy link
Member Author

That's what I'm leaning towards at the moment. I fail to see what value it has going forward. Either use the extension or don't use the library at all.

@nikic
Copy link
Contributor

nikic commented Aug 3, 2017

How bad is the performance of the polyfill?

@rtheunissen
Copy link
Member Author

It's not horrendous. The only operation that makes me nervous is put and get on Map being O(n).

Haven't done any benchmarks.

@NoodlesNZ
Copy link
Contributor

The polyfill test suite runs on my mac:

Time: 24.97 seconds, Memory: 20.00MB

OK (2790 tests, 7059 assertions)

extension:

Time: 1.14 seconds, Memory: 12.00MB

OK (2790 tests, 7068 assertions)

@nikic
Copy link
Contributor

nikic commented Aug 3, 2017

The only operation that makes me nervous is put and get on Map being O(n).

OMG. That's really bad.

If that's the state of the polyfill implementation, I would definitely abandon it. On the other hand, with that issue resolved performance might be acceptable.

@rtheunissen
Copy link
Member Author

I don't know if it's a solvable problem, unless we build a hashtable using an array, but the memory usage would be huge if we do that. Strong types on the keys and objects as keys makes it difficult to use an array as the internal store because we can't index.

On top of that we still have to maintain insertion order.

@nikic
Copy link
Contributor

nikic commented Aug 3, 2017

@rtheunissen The current definition of the Hashable interface is what makes this hard. If Hashable::hash() would be required to uniquely identify an object (of the given class) it should be easy to implement something relatively efficient.

@rtheunissen
Copy link
Member Author

@nikic that would make it easy yes, but that's not realistic here. We can't use hash for equality also, it's only to improve distribution of the keys.

@nikic
Copy link
Contributor

nikic commented Aug 3, 2017

@rtheunissen Given how people would implement this function in userland, I don't think it's unrealistic. As we don't expose hashing primitives (for strings in particular), the most obvious ways of implementing hash() are also going to be collision-free.

@rtheunissen
Copy link
Member Author

For example, a database entity's primary key? I can see how the common implementation would double as an identity function. But I don't like the idea of changing the definition simply for the sake of accomodating the polyfill.

There's also the case of using an array as a key.. where the hash is now the length of the array, and get/put is O(k) because we have to compare the elements of the array.

Intuitively I can't get my head around how we can do this efficiently in userland without a proper native array type that we can use to build a hashtable.

@nikic
Copy link
Contributor

nikic commented Aug 3, 2017

I wouldn't worry about performance of array keys, that's a very unusual use-case.

You can't do this really efficiently in userland, but implementing a userland HT should also not be much less memory efficient than what you have now (due to the general object overhead, adding two more fields on top of Pair is not going to double the memory usage).

But then again, implement anything here is going to take time and I can see how that time might be better spent on something else...

@rtheunissen
Copy link
Member Author

Yeah I just don't see the point of going out of our way here.

@kelunik
Copy link

kelunik commented Aug 3, 2017

Either use the extension or don't use the library at all.

That only works if the user of the library is also controlling the system where the app is to be deployed.

@rtheunissen
Copy link
Member Author

@kelunik which is why I'm saying if you don't control the system, don't use the library. In saying that, if it's a dependency of another dependency, you're in trouble.

But how is that any different to extensions like mongo or pthreads?

My concern is that if you don't have control of the system now, you probably won't ever have, and you're then stuck with slow code under the hood. Could argue that it's then insignificant because you won't be processing a lot if you're not controlling your system anyway.

@kelunik
Copy link

kelunik commented Aug 3, 2017

My concern is that if you don't have control of the system now, you probably won't ever have, and you're then stuck with slow code under the hood. Could argue that it's then insignificant because you won't be processing a lot if you're not controlling your system anyway.

Right, on those systems I / we don't care about performance, but pure portability and runability.

But how is that any different to extensions like mongo or pthreads?

We abstract those where possible, e.g. using multiple processes instead of threads where they're not available in https://github.com/amphp/parallel. If you require mongo to run, you can also require the extension, because the user has to be in control then anyway to run mongo. But that's nothing desirable for generic data structures.

It definitely depends on the use case, but for some cases it's just easier using arrays or Spl then, even if they're worse implementation wise.

@enumag
Copy link

enumag commented Sep 18, 2018

I think this right here is the primary goal. Once a library like this lands in core (or is used widely enough anyway), making updates to the extension will get harder. Updating a composer package that builds on it is a lot easier. The goal then becomes to make sure that the low level structures are solid and cover enough.

In that case I would recommend making 2.0 of this extension with the low-level structures and build a composer package on top of them as you said. Then have the community use it like that and focus on further improving and stabilizing the low-level structures before making a RFC to include them in PHP core.

I agree that moving to core soon is not necessarily a good idea, but it would be nice to prepare DS 2.0 with that in mind as eventual goal. Having to focus on and stabilize just a few low level structures should make this goal easier.

Yes, high level structures would lose some performance that way but in my opinion it's reasonable in this case. High level structures can be implemented in C later on as well.

@rtheunissen
Copy link
Member Author

I do not think it is in PHP's best interests nor this project's to move to core soon

I'm not too phased about this to be honest. If it stabilizes well and adoption is good, we can consider a proposal but at this point the benefit isn't obvious to be.

@wrossmann
Copy link

Devil's Advocate here. You folks seem quite preoccupied with the performance of the polyfill as compared to the extension, and there's no doubt that the extension is superior in virtually every respect.

However, one thing that doesn't seem to have been considered within the context of this discussion is the performance of the equivalent data structures that would be written by bozos like me who aren't as intimately familiar with the minutiae of PHP internals or even the underlying concepts themselves.

Given enough rope to hang myself I might just base everything on a linked list and call it a day.

Taken on its own as a userland implementation of the data structures it is some very fine code that I would recommend people use.

Within this context I think that the polyfill should be considered a gateway to the extension, which is a point that I've seen raised several times already. I would suggest addressing the performance concerns via documentation and perhaps an informative message in a Composer post-install-cmd.

@designcise
Copy link

I'm not sure what the current state of development is regarding the entire discussion in here, but I landed here wanting to use the data structures in my microframework. My decision to use it is heavily based on the polyfill, because I would want the microframework to work for people who can't/don't want to install the extension. For people who want the performance gain, they would go the extra mile. If you're looking for greater adaptation, you have got to build for the bigger audience, and the bigger audience of PHP doesn't care for extensions all too much.

@rtheunissen
Copy link
Member Author

We are actively planning ext-ds 2.0 on the 2.0 branch, and at this stage I do not foresee a polyfill implementation that mirrors the extension. I believe that this extension will only be successful if it is required, otherwise 3/5 projects will never actually install it and never truly depend on it or get any value from it. If you want to use them in your microframework, you should consider a hard dependency on the extension, or internal implementations of the interfaces.

What we could also do.. is provide a standard userland implementation of the interfaces, but have the classes not aliased. Eg. Ds\Compat\HashMap implements Ds\Map. It wouldn't be a "polyfill" technically but you could then use the interfaces in your microframework and allow the user to install the extension if they want.

I really, really want to avoid this though.

@designcise
Copy link

Totally understand. In that case, you should really stick to your priorities. Interfaces with a basic implementation sounds like a nice idea too, it might work to add a fallback when the extension doesn't exist.

When do you plan to release version 2? Do you have a date yet? And do you have anything new planned? Would love to hear about it.

@rtheunissen
Copy link
Member Author

When do you plan to release version 2? Do you have a date yet? And do you have anything new planned? Would love to hear about it.

Hopefully by the end of the year. If I could work on this full-time, maybe 3 months from now. I'm rewriting everything so don't expect much backward-compatibility. ^^

FWIW check out https://github.com/php-ds/ext-ds/blob/2.0/DRAFT.php

@kelunik
Copy link

kelunik commented Mar 13, 2019

@rtheunissen Does it make sense to have a separate extension name, so both versions could be installed at the same time?

@rtheunissen
Copy link
Member Author

rtheunissen commented Mar 13, 2019

@kelunik I am open to that, but I haven't considered it. I think just a new major version is okay if we can somehow preserve the documentation from v1.

@kelunik
Copy link

kelunik commented Mar 13, 2019

@rtheunissen Allowing both versions being installed at the same time would allow for a smoother transition, because you can migrate things partially instead of all or nothing.

@rtheunissen
Copy link
Member Author

Wouldn't that just cause more confusion? The namespace would have to change as well. I think either using one or the other is the best way forward. Following the migration guide should be easy enough to not require a temporary mixed state.

@kelunik
Copy link

kelunik commented Mar 13, 2019

Depends on how much changes, if you have libraries and own code depending on ext-ds, doing everything at once might not be that easy. But the namespace would have to change as well, yes.

@rtheunissen
Copy link
Member Author

Supporting two versions of the same library at the same time is madness. 😂

@glensc
Copy link
Contributor

glensc commented Mar 21, 2019

The existence of polyfill was the reason I even considered adopting \Ds in software I maintain. I can not control the environment my software is installed, and I do not want to limit installations (of my software) because there's a problem compiling the extension.

I'm ok if there's worse performance using the polyfill, if users (of my software) have the performance as an issue, they can install the extension. and I'm perfectly fine with that.

Thank you for maintaining both versions for now!

Perhaps the end goal should be to get support classes to php-core, like ext-sodium, until then maintain the code as external source.

(sorry I did not read the full thread here if this POV was already covered).

@hopeseekr
Copy link

hopeseekr commented Mar 30, 2019

@rtheunissen

One thing to note is that the polyfill also acts as an autocomplete source. If we decide to abandon the polyfill in 2.0, we would need to keep the stubs anyway. Something we could do is maintain a composer package that contains the tests and the stubs, and requires the extension as ext-ds, and have the extension repository require that package. That way we're still separating the extension from the tests and the stubs, and we're enforcing that the extension is installed if a project has the composer package as a dependency.

Look at my brand new PR at php-ds/polyfill#51: It's a PhpStorm stub. You could make that ENTIRE source code of record for the entire php-ds/php-ds composer package, completely abandon the php-ds/polyfill and STILL have 100% code completion + documentation.

In fact, once the PhpStorm stub is officially accepted by PhpStorm (Open a PR over at https://github.com/JetBrains/phpstorm-stubs), then that's EXACTLY what I recommend doing, professionally, if performance really is terrible and a design problem.

Because there are over 250,000 installs reported by packagist, I'd recommend deprecating php-ds/php-ds and scaring everyone over to either a new php-ds/php-ds-stub that contains just my PR, or a php-ds/php-ds-compat with clear and visible warnings on why you should never use it, etc. With benchmarks, please.

There is literally zero warning about performance, memory, etc. on the polyfill README. I had no idea it was "that bad" until I stumbled upon this thread. I also highly second everything @yybalam said in his php-ds/polyfill apology.

@ghost
Copy link

ghost commented Apr 19, 2019

I haven't seen this mentioned yet, but perhaps in PHP 7.4 it would become a possibility to use the FFI to implement this extension in PHP instead?

I realize the first version of the FFI does not have the performance of an actual extension, but IIRC there are plans to improve this (i.e. JIT in PHP 8).

@jkuchar
Copy link

jkuchar commented May 1, 2019

FFI will be slower then native extension, but still with the same big-O complexity (just with added constant penalty for each function call, which is more then acceptable).

@glensc
Copy link
Contributor

glensc commented May 1, 2019

link for lazy ones about ffi that got merged: https://wiki.php.net/rfc/ffi

@rtheunissen
Copy link
Member Author

@tominventisbe @jkuchar for a project like this (data structures being a special case), performance and low-level functionality is worth more than the benefit of using the FFI / implementing it in PHP. I don't know that for sure, but I would probably vote to keep the C implementation as the primary, and potentially implement it with FFI as well as an alternative option.

@designcise
Copy link

if you want wider adaptation then FFI looks like a really good option. It may be slow when it's introduced, but surely it will improve in newer versions

@morrisonlevi
Copy link

morrisonlevi commented May 6, 2019

I don't understand how to use FFI to "polyfill" this. If you have the extension, then use it. If you don't have it, how are you going to call into it via FFI? Am I missing something?

@rtheunissen
Copy link
Member Author

rtheunissen commented May 6, 2019

if you want wider adaptation then FFI looks like a really good option. It may be slow when it's introduced, but surely it will improve in newer versions

I disagree with this. No one else is calling for polyfill implementations of other extensions. If a library/extension provides enough value for enough people, it will be adopted by the community. Good examples are PDO, xdebug, gd/imagick, etc. We would be spending a lot of time implementing and supporting an inferior version of something simply to avoid the need to install and enable an extension, which in my opinion indicates that the problem to solve is part of PHP's dependency management infrastructure, rather than extensions themselves.

If a polyfill must exist, it won't concern itself with FFI.

@rtheunissen
Copy link
Member Author

I don't understand how to use FFI to "polyfill" this. If you have the extension, then use it. If you don't have it, how are you going to call into it via FFI? Am I missing something?

I read it as a rewrite/alternative version of the main implementation, but instead of an extension it's written in PHP using native C bits and pieces? Anyone is welcome to attempt that, but it will not be an effort within the php-ds repo, at least from my side.

@glensc
Copy link
Contributor

glensc commented May 6, 2019

FFI is just a bridge to create bindings for shared libraries (libfoo.so) symbols, as there is no underlying library, FFI offers nothing here. for example imagick ext could be implemented as FFI, as there is exists libMagickCore.so and libMagickWand.so

@ghost
Copy link

ghost commented May 7, 2019

Yes, unfortunately, as I see it (though I may be missing something): if you also want to keep the extension around, you'd still be stuck maintaining two code bases as the FFI allows you to write extensions in PHP by binding to e.g. C libraries from PHP, as already mentioned by @glensc.

Since it's mostly real PHP code and some PHP wrapping/mapping the C functions, it's not native (e.g. C) code like a native extension would be, so you'd still need to maintain both.

You could use the FFI to load in a new php-ds C library instead (which would then also be used by the extension), but this then poses the problem of getting that new C library on your (shared) hosting. This could be done in some cases via static libraries - I believe there are some examples of this, such as wkhtmltopdf - but it's definitely not ideal.

I believe the most ideal course of action would be to "simply" (try and get) this extension merged in PHP itself as core extension. I think there is room for a good (better) set of container and data structures in PHP, though I'm not a core developer, who may think differently.

There may also be requirements, such as the extension having a certain amount of popularity if it is useful enough, such as mentioned by @rtheunissen. I must admit I'm afraid that this is not one of those types of extensions that is going to pull that amount of weight as it's not like databases where you may need to communicate with something a customer demands, and thus pretty much need the extension without question and get it installed.

This library is something that is usually seen as optional as there are some alternatives in pure PHP on Packagist (though arguably not as good and extensive, or we wouldn't be having this discussion 😄), but there is little hard material to convince people to have such an extension installed on their servers - if that is at all possible and they are not on shared hosting - unless they require the performance, which is also a minority, I believe. If they don't require the performance, they are probably going to look for something not requiring any installation, especially since a custom extension also poses an additional security vector.

@hopeseekr
Copy link

If the polyfill is abandoned, so too goes the adoption. It's literally the only reason I'll risk distributing code w/ php-ds.

@rtheunissen
Copy link
Member Author

rtheunissen commented May 15, 2019

If the polyfill is abandoned, so too goes the adoption. It's literally the only reason I'll risk distributing code w/ php-ds.

This makes me sad, and I'm not sure if I agree, but I am open to the idea of a polyfill as long as it is strongly recommended that projects install the extension if it is at all possible to do so.

I don't like the idea of potentially slowing down what could be most projects if the aversion to install an extension is shared by a majority.

I'm just curious... if you are writing something that you would like to distribute (library, etc), and you'd like to use php-ds, why not require php-ds as a dependency? If a user can't install it, they also can't use your library. The problem to solve here is why they can't install it (or don't want to), and I worry that maintaining a polyfill encourages that aversion, and discourages the innovation we need around extension dependencies.

Classic pragmatism vs. idealism situation here. Do we bite the bullet and maintain the polyfill for the sake of practicality, or do we insist that others should get with the times or be left behind?

Looking at 2.0, the API will almost definitely be trimmed way down, which would make the polyfill a lot easier to manage. If I had to vote on this today, I think I would vote to keep it because there is no clear consensus to abandon it. But if I ever hear of someone not installing the extension by choice... 👮‍♂️

So until further notice, the decision here is to keep it. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests