[Queries] Regarding usage of LLVM built with Pretrained Models and Development Mode #350
-
Hi, I have successfully built a toolchain using the model inlining-Oz-v1.1 released [here](https://github.com/google/ml-compiler-opt/releases). However, I have some queries regarding its usage while building an application in release mode, as well as some questions pertaining to development mode.

Release Mode

Development Mode
-
It depends upon if you're doing (Thin)LTO. If you're not using (Thin)LTO, then you should be fine omitting it from the linker. If you are using some form of LTO, then you need to pass it to the linker too so that it will use the policy for inlining there.
No. The build options for LLVM should not matter.
There shouldn't be anything major. You can use (Thin)LTO to build the application. You just need to make sure to pass the flag to the linker too so that it will use the correct inlining policy. The policy might also change in effectiveness when going to LTO, depending upon the corpus that it was trained on.
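As an illustration, here is a minimal sketch assuming a CMake-based application build, lld, and a release-mode clang with the model compiled in; `-enable-ml-inliner=release` is the upstream LLVM option, while the paths and the remaining flags are placeholders to adapt to your setup:

```python
# Sketch: enable the ML inlining policy for both the compile step and the
# ThinLTO link step of a hypothetical CMake project. Paths are placeholders.
import subprocess

ml_inliner = "-mllvm -enable-ml-inliner=release"
compile_flags = f"-Oz -flto=thin {ml_inliner}"
# With (Thin)LTO, inlining runs again in the linker-driven backend, so forward
# the same option through lld as well.
link_flags = "-flto=thin -fuse-ld=lld -Wl,-mllvm,-enable-ml-inliner=release"

subprocess.run(
    [
        "cmake", "-G", "Ninja", "-S", ".", "-B", "build",
        "-DCMAKE_C_COMPILER=/path/to/mlgo-clang/bin/clang",      # placeholder
        "-DCMAKE_CXX_COMPILER=/path/to/mlgo-clang/bin/clang++",  # placeholder
        f"-DCMAKE_C_FLAGS={compile_flags}",
        f"-DCMAKE_CXX_FLAGS={compile_flags}",
        f"-DCMAKE_EXE_LINKER_FLAGS={link_flags}",
    ],
    check=True,
)
```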
No. You should be able to use pretty much whatever build options you like for LLVM.
Ideally it should be representative of how you build your application in production. If you don't use (Thin)LTO there, then training on a (Thin)LTO corpus does not make sense. If you do, then training on a non-(Thin)LTO corpus does not make a lot of sense.
It's not an LLVM flag. It would be flags/different scripts within this repository that drive the training pipeline. The demos are currently written to use PPO. There is no end-to-end script that uses ES for training currently, although getting one written isn't too big of a deal now that most of the ES stuff is upstreamed.
-
I assume your build performs some kind of LTO. There's no hard and fast rule; I'd experiment with and without enabling it in the backend optimization. FWIW, the model we have here was trained assuming no LTO step.
(IIUC this is about building e.g.
No restrictions, but "mileage may vary": for example, if you trained on a corpus of post-thinLTO IR modules, you'll get best results when applying that model to similar modules. One reason for this is that features get quantized (bucketized), and if the distribution of feature values is too far off, benefits would degrade.
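To illustrate the bucketization point with made-up numbers: the "vocab" stores quantile boundaries computed from the training corpus, and feature values seen later are mapped into those buckets, so a corpus whose feature distribution sits far outside the training one mostly saturates the edge buckets and loses signal:

```python
# Toy illustration of feature bucketization; the boundaries are invented, the
# real ones come from the quantiles of the training corpus ("vocab").
import numpy as np

train_boundaries = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # quantiles from the training corpus
new_feature_values = np.array([0.5, 3.0, 40.0, 120.0])    # values from a very different corpus
print(np.digitize(new_feature_values, train_boundaries))  # [0 2 5 5]: large values all saturate
```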
You don't need LLVM built in any different way, btw, to collect a corpus. The functionality for corpus collection is in any build of clang. The main thing is to use the same compiler version (i.e. from the same llvm repo githash) when collecting the corpus as when later compiling it, just to avoid things like IR breaking changes. So you could use the
You can collect the IR corpus either from the pre-thinlink compilation or from the post-thinlink one. We never do anything with full LTO, only ThinLTO, so we never added that support for full LTO. To answer your question, it's less about how you build the application and more about which IR you want to train on. If your scenario involves ThinLTO, I'd recommend starting by training on the post-link IR first - i.e. have the normal inliner in the frontend, and ML in the backend. Then it gets tricky and you need to experiment:

- you could stop there (i.e. if you get reasonable savings, just use ML in the post-thinlink);
- or try the model in both front and back;
- or build a second corpus from the frontend IR and continue training there;
- or (probably best) collect the 2 corpora first, do quantization on them, then train on one and then finetune on the other.

We did the "train mostly on the back, finetune in front" approach without quantization for Chrome on Android 32 bit (@Northbadge did that and he can correct me if I misremember), and only "back" for 64 bit, for example (that bit is fresher in memory, @alekh @tvmarino's work).
Not yet, but I have a kludge that demonstrates using ES in my fork: https://github.com/mtrofin/ml-compiler-opt/tree/es. Focus on "kludge". @boomanaiden154 has, I think, a plan to bring ES into the fold cleanly.
-
Oh, just saw @boomanaiden154 also replied. Sorry for some duplicate info!
-
Thank you for the detailed response, @mtrofin and @boomanaiden154.
I was going through the demo and noticed LTO being disabled in development mode, hence the question. I am building clang without any LTO as well; however, I wanted to clarify whether this was a necessity or whether clang can be built with any options.
I missed the point that corpus collection is only supported for ThinLTO at the moment and not full LTO.
Thanks for sharing this. I'll definitely try it out. Just to be clear, does this follow the same instructions as mentioned in the demo, and will it use the ES strategy to train the model? I will keep this issue open for some time while I work on this project in case I have any further queries or comments. Thank you once again!
-
...and no-lto (i.e. just frontend - like, IIUC, your scenario)
In broad strokes, yes, i.e. if you treat the training script as a black box, then everything else should be the same; but I'd recommend checking (like debugging or
-
To clarify, is there currently support for corpus collection for both ThinLTO and no LTO? Apologies for asking this again, but I interpreted your response as "corpus collection is only supported for ThinLTO and no LTO". As you mentioned, I am building the application with no LTO, and after I run extract_ir, my corpus description contains no modules. So I wanted to know if I am messing up somewhere, or if corpus extraction for an application built with no LTO is not supported. If that is the case, could you provide some suggestions on how to enable corpus extraction for an application built with no LTO?
-
Yup, see here: https://github.com/llvm/llvm-project/blob/main/llvm/utils/mlgo-utils/mlgo/corpus/extract_ir.py#L12
There are some more nuances with local thinlto, if you chase the
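For what it's worth, a sketch of a non-ThinLTO extraction run, assuming the application's objects were compiled with embedded bitcode (e.g. `-Xclang -fembed-bitcode=all`), which is what provides the `.llvmbc`/`.llvmcmd` sections the script harvests; objects built without those sections are a likely cause of an empty corpus description. Paths are placeholders, and the flags should be checked against the docstring linked above:

```python
# Sketch: run extract_ir.py over a compile_commands.json from a build whose
# objects embed bitcode. You may need PYTHONPATH to include
# llvm/utils/mlgo-utils so the mlgo package resolves.
import subprocess

subprocess.run(
    [
        "python3",
        "llvm/utils/mlgo-utils/mlgo/corpus/extract_ir.py",
        "--input=/path/to/app/build/compile_commands.json",    # placeholder
        "--input_type=json",
        "--llvm_objcopy_path=/path/to/llvm/bin/llvm-objcopy",  # placeholder
        "--output_dir=/path/to/corpus",                        # placeholder
    ],
    check=True,
)
```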
-
Thanks for your response! I successfully extracted IR, generated a corpus, and trained a warmstart model. Currently, the training of the RL model is still in progress. I want to get a rough idea of the training time because the data I’m using is fairly small—only 88 modules, as mentioned in the info after trace collection.
It took about 45 minutes to train the warmstart model, and it has been more than 8 hours since the RL model training began. Is there any rough estimate of how long the training might take for the above number of modules on a 32-core machine with 64 GB of RAM? I am using the default set of parameters for the model, as mentioned in the gin file. Additionally, since I am still a novice in model engineering, any advice on what values to set, or how to decide the values for the parameters mentioned in the above gin file for the small training dataset, would be appreciated. TIA
-
If you look at the tensorboard progression of the reward, especially since you are (IIUC) processing the entire corpus at each pass, that (tensorboard) should give you an indication (e.g. if it's not making much progress in improving the reward anymore, it probably learned enough). You could also try the current saved model (it's under the output directory - make sure you don't pick the one called
IIRC we did a hyperparameter sweep using xmanager. The infra should be easily adaptable to that - and we did, internally, but haven't yet pushed upstream. But all that says is "trial and error", really.
-
You want to look at (this is mentioned in passing in the inlining demo, if you search for "tensorboard")
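If poking at the logs programmatically is easier than the UI, here is a small sketch using tensorboard's event reader, with the scalar tag discovered at runtime rather than assumed (the log directory is a placeholder):

```python
# Sketch: list the scalar tags the trainer logged and dump one of them,
# e.g. the reward curve, without starting the tensorboard UI.
from tensorboard.backend.event_processing import event_accumulator

ea = event_accumulator.EventAccumulator("/path/to/training/output_dir")  # placeholder
ea.Reload()
print(ea.Tags()["scalars"])    # discover the available scalar tags
tag = ea.Tags()["scalars"][0]  # pick the reward-related tag from the printed list
for event in ea.Scalars(tag):
    print(event.step, event.value)
```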
-
Hi, which of the pre-trained models from the release page were used for the code size and performance numbers mentioned in the spreadsheets below?
Are there any plans to release more pre-trained models for other architectures, including Arm and RISC-V? Have there been any attempts to train a model on Zephyr for RISC-V, either by someone from the MLGO community or by some other individual contributor the owners might be aware of?
Can the demo be extended to build Fuchsia and train a model for it on RISC-V as well? Currently, it only supports x64, and the same instructions can be used for ARM64 with minimal changes. However, for RISC-V, it seems that more modifications are needed. Given that code size is crucial for embedded systems, where RISC-V is predominantly used, being able to train a model on Fuchsia for RISC-V would be a significant advantage. Please let me know if you need anything from my side for this. Additionally, I can create a separate issue to gather more input from the community if needed.
-
... which may actually be all that can be squeezed - for small projects that are already hyper-optimized for size, there's only so much headroom left.
Try combining the corpora instead - i.e. from N small corpora (which you already extracted), consolidate them all into one. Then do quantization ("vocab"), then training, on that combined one.

Another possibility is to use the ComPile database of IR modules. @boomanaiden154, does that come with a way to get a corpus.json? There are more nuances to discuss here - in fact, it'd be interesting to explore methodology here: as a hypothesis, if we collected the vocab for each small corpus and measured the (Euclidean) distance between them, then how would a model trained on the whole ComPile compare to a model trained on the combined corpus, or on the combined corpus plus those elements of ComPile with features within the radius of the corpus... etc. (lots of hand waviness here on my end).

Anyway, I'd suggest starting with combining the corpora from your project though :)
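A sketch of that consolidation step, assuming each extracted corpus directory contains a corpus_description.json with a "modules" list of relative paths next to the .bc/.cmd files (check your extract_ir output for the exact file name and schema before relying on this):

```python
# Sketch: merge N extracted corpora into one directory so vocab generation and
# training see a single, larger corpus. Schema assumptions are noted above.
import json
import shutil
from pathlib import Path

def combine_corpora(corpus_dirs, combined_dir):
    combined = Path(combined_dir)
    combined.mkdir(parents=True, exist_ok=True)
    all_modules = []
    base_description = None
    for i, corpus_dir in enumerate(map(Path, corpus_dirs)):
        description = json.loads((corpus_dir / "corpus_description.json").read_text())
        if base_description is None:
            # Other keys (e.g. corpus-wide settings) are taken from the first
            # corpus; make sure the corpora were built compatibly.
            base_description = description
        prefix = f"corpus{i}"
        for module in description["modules"]:
            # Copy every file belonging to the module (.bc, .cmd, ...).
            for src in corpus_dir.glob(f"{module}.*"):
                dst = combined / prefix / src.relative_to(corpus_dir)
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
            all_modules.append(f"{prefix}/{module}")
    base_description["modules"] = all_modules
    (combined / "corpus_description.json").write_text(
        json.dumps(base_description, indent=2))

# Example: combine_corpora(["corpus_a", "corpus_b"], "combined_corpus")
```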
-
Thanks @mtrofin and @boomanaiden154 for answering all my queries promptly. I have no further questions and will close this thread.