
XGBoost 0.90 Roadmap #4389

Closed
18 tasks done
hcho3 opened this issue Apr 21, 2019 · 56 comments

@hcho3
Collaborator

hcho3 commented Apr 21, 2019

This thread is to keep track of all the good things that will be included in the 0.90 release. It will be updated as the planned release date approaches (May 1, 2019, or as soon as Spark 2.4.3 is out).

@CodingCat
Member

as we are going to have breaking changes like #4349 and #4377

shall we bump version to 0.9?

@hcho3
Collaborator Author

hcho3 commented Apr 22, 2019

@CodingCat Sure, we can bump to 0.90, if the breaking change is significant. Can you do me a favor and write one-paragraph description of why #4349 was needed?

@CodingCat
Member

sure,

@alexvorobiev

* Spark 2.3 is reaching its end-of-life in a few months

Is there an official statement on that? They released 2.2.3 in January and 2.3.3 in February. Our vendor (MapR) still ships 2.3.1.

@CodingCat
Member

@alexvorobiev #4350, you can check with @srowen from Databricks

@srowen
Contributor

srowen commented Apr 22, 2019

This is not a question for Databricks but for the Spark project. The default policy is maintenance releases for branches for 18 months: https://spark.apache.org/versioning-policy.html That would put 2.3.x at EOL around July, so I wouldn't expect more 2.3.x releases after that from the OSS project.

@alexvorobiev

@srowen Thanks!

@hcho3 hcho3 changed the title XGBoost 0.83 Roadmap XGBoost 0.90 Roadmap Apr 22, 2019
@hcho3
Collaborator Author

hcho3 commented Apr 24, 2019

@srowen @CodingCat @alexvorobiev Let's also discuss the possibility of supporting Scala 2.12 / 2.13. Right now, XGBoost4J is compiled for Scala 2.11:

<scala.version>2.11.12</scala.version>
<scala.binary.version>2.11</scala.binary.version>

A user reported that XGBoost4J JARs compiled for Scala 2.11 are not binary compatible with Scala 2.12.

@hcho3 hcho3 pinned this issue Apr 24, 2019
@srowen
Contributor

srowen commented Apr 24, 2019

Yeah, 2.11 / 2.12 are still binary-incompatible, and Spark has two distributions. Both are supported in 2.4.x though 2.12 is the default from here on in 2.4.x. 3.0 will drop Scala 2.11 support.

It may just be a matter of compiling two versions rather than much or any code change. If you run into any funny errors in 2.12 let me know because I stared at lots of these issues when updating Spark.

2.13 is still not GA, and I think it will be a smaller change from 2.12->2.13 than 2.11->2.12 (the big difference here is a totally different representation of lambdas).


@CodingCat
Member

CodingCat commented Apr 25, 2019

the only issue is that we need to introduce a breaking change to the artifact name of xgboost in Maven: xgboost4j-spark => xgboost4j-spark_2.11 / xgboost4j-spark_2.12, like Spark does (https://mvnrepository.com/artifact/org.apache.spark/spark-core). We also need to double-check whether we have any transitive dependency on 2.11 (I think not).

Hi @srowen, though 2.12 is the default from here on in 2.4.x, I checked the branch-2.4 pom.xml; if you don't specify the scala-2.12 profile, you still get a 2.11 build, no?
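
For illustration, the Spark-style suffixing would look something like this on the consumer side (a hypothetical sketch of the coordinates, not the actual build change; `ml.dmlc` is XGBoost's Maven group ID):

```xml
<!-- Hypothetical dependency on a Scala-suffixed artifact, following
     Spark's spark-core_2.11 / spark-core_2.12 naming convention. -->
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark_2.12</artifactId>
  <version>0.90</version>
</dependency>
```

On the producer side, the suffix could be driven by the existing `scala.binary.version` property, e.g. `<artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>`, so a single pom can build both variants.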

@srowen
Contributor

srowen commented Apr 25, 2019

You could choose to only support 2.12 in 0.9x, and then you don't have to suffix the artifact name. If you support both, yeah, you'd really want to change the artifact name unfortunately and have _2.11 and _2.12 versions.

Yes the default Spark 2.4.x build will be for 2.11; -Pscala-2.12 gets the 2.12 build.

@CodingCat
Member

thanks, I'd stay conservative and hold off on supporting 2.12, at least for the coming version

as far as I know, most Spark users are still on 2.11, since they tend to follow previous versions of Spark

I may not have the bandwidth to go through every test I have for introducing 2.12 support

I would choose to support 2.12 + 2.11, or 2.12 only, in the 1.0 release...

@CodingCat
Member

@hcho3 FYI, I just removed the dense matrix support from the roadmap given the limited bandwidth

@trivialfis
Member

@hcho3 Could you take a look at dmlc/dmlc-core#514 when time allows? It might be worth merging before the next release hits.

@hcho3
Collaborator Author

hcho3 commented Apr 26, 2019

@trivialfis Will look at it

@hcho3
Collaborator Author

hcho3 commented Apr 28, 2019

@CodingCat I think we should push back the release date, as Spark 2.4.1 and 2.4.2 have issues. What do you think?

@srowen Do you know when Spark 2.4.3 would be out?

@CodingCat
Member

I think it’s fine to have some slight delay

@hcho3
Collaborator Author

hcho3 commented Apr 28, 2019

Okay, let’s wait until Spark 2.4.3 is out

@tovbinm
Contributor

tovbinm commented Apr 29, 2019

Would there be the last 0.83 release for Spark 2.3.x?

@hcho3
Collaborator Author

hcho3 commented Apr 29, 2019

@CodingCat What if we make two parallel releases, 0.83 and 0.90, where 0.83 includes all commits just before #4377? The 0.83 version would be released only as JVM packages, and the Python and R packages would get 0.90. It won't be any more work for me, since I have to write a release note for 0.90 anyway.

One issue, though, is the user experience with missing value handling. Maybe forcing everyone to use Spark 2.4.x will prevent them from messing up missing value handling (the issue that motivated #4349)
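
As a rough illustration of the missing value pitfall (my sketch of the general failure mode, not the exact #4349 code path): Spark can hand XGBoost either a dense or a sparse vector for the same row, and sparse encodings drop zeros, so whether 0.0 is "present" depends on the representation.

```python
import math

# Hypothetical sketch: a sparse vector keeps only explicit
# (index, value) pairs, silently dropping the zeros.
dense_row = [0.0, 1.0, 0.0, 2.0]
sparse_row = {i: v for i, v in enumerate(dense_row) if v != 0.0}
print(sparse_row)  # {1: 1.0, 3: 2.0}

# If absent entries are then treated as the "missing" value (say NaN),
# the booster sees NaN where the dense encoding would have shown 0.0:
reconstructed = [sparse_row.get(i, float("nan")) for i in range(4)]
assert math.isnan(reconstructed[0]) and reconstructed[1] == 1.0
```

The same training data can therefore produce different model inputs depending on which vector type Spark happened to emit, which is why pinning down the missing value semantics mattered.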

@CodingCat
Member

CodingCat commented Apr 29, 2019

@hcho3 I am a bit concerned about the inconsistency across versions in the availability of packages.

I can imagine questions like: hey, I found 0.83 in Maven and upgraded our Spark package, but why can't I use 0.83 in a notebook when exploring my new model setup on a small amount of data with the Python package?

I would suggest we either do a full maintenance release on the 0.8x branch or nothing

@hcho3
Collaborator Author

hcho3 commented Apr 29, 2019

@CodingCat Got it. We'll do consistent releases for all packages. What's your take on 0.83 release then? Should we do it?

@hcho3
Collaborator Author

hcho3 commented Apr 29, 2019

@CodingCat Actually, this will create work for other maintainers, we'll need to ask them first

@CodingCat
Member

CodingCat commented Apr 29, 2019

short answer from a personal view is yes in theory, but it might be more than cutting right before a commit (as you said, it will create work for others as well). I am also kind of hesitant to do this because of the limited resources in the community...

here is my 2 cents about how we should think about maintenance release like 0.8x

  1. the reason to have a maintenance release is to bring in critical bug fixes, like 2d875ec and 995698b

  2. on the other side, to keep the community sustainable rather than burning out all the committers, we should drop support for previous versions periodically

  3. the innovations and improvements should be brought to the users through a feature release (jump from 0.8 to 0.9)

if we decide to go with 0.83, we need to collect opinions from @RAMitchell and @trivialfis as well, and use their judgment to see whether there are important bug fixes (mostly about correctness) that they have noticed

and then create a 0.83 branch based on 0.82 and cherry-pick commits... a lot of work, actually

@RAMitchell
Member

If I understand correctly, 0.9 will not support older versions of Spark; hence the proposal to support a 0.83 version as well as 0.9, to continue supporting older Spark versions while including bug fixes?

Generally I am against anything that uses developer time. Aren't we busy enough already? I do see some value in having a stable version, however.

@hcho3
Collaborator Author

hcho3 commented May 1, 2019

@tovbinm You can build XGBoost with commit 711397d to use Spark 2.3.x.

@tovbinm
Contributor

tovbinm commented May 1, 2019

Great. So why not make a public release from that commit?

@hcho3
Collaborator Author

hcho3 commented May 1, 2019

As @CodingCat said, maintenance releases are not simply a matter of cutting before a commit. Also, making a public release is an implicit promise of support. I do not think the maintainers are up for supporting two new releases at this point in time.

I'll defer to @CodingCat as to whether we should make a release from 711397d

@hlbkin

hlbkin commented May 1, 2019

External memory with GPU predictor: would this mean the code no longer crashes with what(): std::bad_alloc: out of memory? (i.e., temporarily swap into RAM?)

Related issue, I guess: #4184. That was mainly about temporary bursts of memory; the fitting process itself never requires that much memory.

@hcho3
Collaborator Author

hcho3 commented May 1, 2019

@hlbkin You'll need to explicitly enable external memory, according to https://xgboost.readthedocs.io/en/latest/tutorials/external_memory.html
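
For reference, the opt-in described in that tutorial is a cache suffix on the data path rather than a flag. A minimal sketch (the file name and cache prefix are placeholders, and the DMatrix call is commented out since it needs an on-disk libsvm file):

```python
# Sketch of the external-memory URI convention: appending "#<prefix>"
# to a libsvm path tells XGBoost to stream the data from disk, writing
# cache pages under that prefix.
train_path = "train.libsvm"    # placeholder libsvm file on disk
cache_prefix = "dtrain.cache"  # placeholder cache-page prefix
external_memory_uri = f"{train_path}#{cache_prefix}"
print(external_memory_uri)  # train.libsvm#dtrain.cache

# import xgboost as xgb
# dtrain = xgb.DMatrix(external_memory_uri)  # streams batches from disk
```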

@CAM-Gerlach

I assume it's not possible to switch otherwise without a major version bump (i.e. 1.0), but when you do, could you consider supporting conformant PEP 440 version numbers (i.e. x.y.z), and preferably semantic versioning? The standard interpretation of 0.90 (rather than 0.9.0) is that it is the 90th minor release of the major version 0.x (i.e. pre-stable-release) series, and is no more significant than 0.83. Furthermore, this restricts you to a maximum of 9 point releases per minor version, and creates difficulties for some tools (and people) to interpret. Thanks!
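
To make the parsing point concrete, here is a small sketch of the numeric per-segment comparison that PEP 440 applies to release segments:

```python
# Each dot-separated segment is compared as an integer, so "0.90" is
# read as the 90th minor release of the 0.x series, not as "0.9.0".
def release_tuple(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

print(release_tuple("0.90"))   # (0, 90)
print(release_tuple("0.9.0"))  # (0, 9, 0)

assert release_tuple("0.90") > release_tuple("0.83")
# The two spellings are not equivalent under numeric comparison:
assert release_tuple("0.90") > release_tuple("0.9.0")
```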

@CodingCat
Member

+1

@hcho3
Collaborator Author

hcho3 commented May 3, 2019

@CAM-Gerlach We'll consider it when we release 1.0. On the other hand, we don't want to rush to 1.0. We want 1.0 to be a milestone of some sort, in terms of features, stability, and performance.

@CAM-Gerlach

Thanks for the explanation, @hcho3 .

You probably want to make sure you set the python_requires argument to '>=3.5' in setup() to ensure users with Python 2 don't get upgraded to an incompatible version accidentally.
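
A minimal sketch of what that could look like (a hypothetical excerpt, not the actual setup.py; the setup() call is commented out so the fragment stands alone):

```python
# from setuptools import setup  # a real setup.py would import and call this

# python_requires is metadata that pip checks before installing, so
# users on Python 2 resolve to an older compatible release instead of
# getting a broken install.
setup_kwargs = dict(
    name="xgboost",
    version="0.90",
    python_requires=">=3.5",
)
print(setup_kwargs["python_requires"])  # >=3.5

# setup(**setup_kwargs)
```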

@hlbkin

hlbkin commented May 4, 2019

@hcho3 External memory is not available with GPU algorithms

@hcho3
Collaborator Author

hcho3 commented May 4, 2019

@hlbkin You are right. External memory will be available only for GPU predictor, not training.

@rongou @sriramch Am I correct that GPU training isn't available with external memory?

@sriramch
Contributor

sriramch commented May 6, 2019

@hcho3 Yes, you are correct. We are working on it. The changes are here if you are interested. I'll have to sync this change with master and write some tests.

@hcho3
Collaborator Author

hcho3 commented May 6, 2019

@sriramch Awesome! Should we aim to include external memory training in the 0.90 release, or should we come back to it after 0.90?

@CodingCat
Member

just my two cents: let's hold off on packing many new features into 0.x in a rush, and consider what should go into 1.0 as a milestone version

@hcho3
Collaborator Author

hcho3 commented May 6, 2019

@CodingCat I agree. FYI, I deleted distributed customized objective from 0.90 roadmap, since there was substantial disagreement in #4280. We'll consider it again after 0.90.

@sriramch Let's consider external memory training after 0.90 release. Thanks a lot for your hard work.

@RAMitchell
Member

This might be a good time to release the CUDA 9.0 binaries instead of 8.0. I think 9.0 will now be sufficiently supported by users' driver versions. Additionally, the 9.0 binaries will not need to be JIT compiled for the newer Volta architectures.

@CodingCat
Member

@hcho3 are we ready to go?

@hcho3
Collaborator Author

hcho3 commented May 10, 2019

Almost. I think #4438 should be merged.

@hcho3
Collaborator Author

hcho3 commented May 10, 2019

All good now. I will go ahead and start working on the next release. ETA: May 16, 2019

@hcho3
Collaborator Author

hcho3 commented May 11, 2019

@RAMitchell Should we use CUDA 9.0 or 9.2 for wheel releases?

@RAMitchell
Member

Let's use 9.2, as that is already set up on CI. The danger is that we require NVIDIA drivers that are too new. For reference, here is the table showing the correspondence between CUDA version and drivers: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver

As far as I know, this should not impact CPU algorithms in any way. If users begin to report issues, then we can address this in the future with better error messages around driver compatibility.

@hcho3
Collaborator Author

hcho3 commented May 11, 2019

Hmm, in that case I can try downgrading one of the CI workers to CUDA 9.0. Since we are using Docker containers extensively, it should not be too difficult.

@hcho3
Collaborator Author

hcho3 commented May 14, 2019

I'm going to prepare 0.90 release now. My goal is to have the release note complete by end of this week.

@hcho3
Collaborator Author

hcho3 commented May 20, 2019

Closed by #4475

@hcho3 hcho3 closed this as completed May 20, 2019
@hcho3 hcho3 unpinned this issue May 22, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Aug 18, 2019