
New 'FlexC' dataset and CbcPruneExtractor #7

Open
wants to merge 3 commits into main

Conversation

Bastacyclop
Contributor

Hi all,

This PR:

  • adds a new dataset that comes from a CGRA mapping tool called 'FlexC'; I'll provide a link to an arXiv paper soon so that you can learn more about the data's origin. This dataset contains e-nodes of cost 10000, which actually means infinity in this context.
  • customizes 'plot.py' to display per-(dataset, extractor) statistics. I could revert these changes, keep the original 'plot.py' and rename mine to 'plot_per_dataset_extractor.py', or do something else if you have other ideas.
  • adds a new CbcPruneExtractor extractor, which:
    1. is based on the LpExtractor from egg 0.9.0, meaning that it does not use the fix from "LpExtractor creates an infeasible problem for CBC" (egg#207 (comment)), which was counterproductive in our own experiments. From what I have seen, CbcPruneExtractor also seems to use a simpler cycle-pruning strategy than the current CbcExtractor.
    2. prunes all e-nodes whose cost is above a certain threshold (here, 1000 was picked; it can be changed), significantly reducing the ILP problem size for a dataset like 'FlexC', which speeds up solving and therefore lets the solver find better solutions (see the sketch after this list).
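To make the pruning step concrete, here is a minimal sketch of the idea in Rust; the names (Node, PRUNE_THRESHOLD, keep_for_ilp) are illustrative and not the actual code in this PR:

```rust
/// Threshold picked in this PR: e-nodes with cost above it never enter the ILP model.
const PRUNE_THRESHOLD: f64 = 1000.0;

/// Illustrative e-node record with only the field relevant to pruning.
struct Node {
    cost: f64,
}

/// Keep only the e-nodes cheap enough to take part in the ILP encoding.
/// Pruned nodes get no variables and no constraints, which is what shrinks
/// the problem handed to CBC (e.g. the FlexC "infinity" nodes are dropped here).
fn keep_for_ilp(nodes: Vec<Node>) -> Vec<Node> {
    nodes
        .into_iter()
        .filter(|n| n.cost <= PRUNE_THRESHOLD)
        .collect()
}
```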

This is the output of make FEATURES=ilp-cbc,ilp-cbc-prune:

Loaded 900 jsons.
---- output/babble -- bottom-up results:
dag         mean: 202.1272
micros      mean: 2821.5780
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 91.00, 1218.50, 2385.00, 3782.50, 14182.00
---- output/babble -- greedy-dag results:
dag         mean: 202.1272
micros      mean: 26294.7457
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 613.00, 10598.00, 23472.00, 34872.00, 143166.00
---- output/babble -- ilp-cbc results:
dag         mean: 327.8092
micros      mean: 376925.7630
dag    quantiles:    38.00, 215.00, 305.00, 413.50, 931.00
micros quantiles: 27411.00, 129946.00, 185129.00, 467031.00, 6323663.00
---- output/babble -- ilp-cbc-prune results:
dag         mean: 210.6705
micros      mean: 5861529.7746
dag    quantiles:    38.00, 125.00, 204.00, 280.50, 524.00
micros quantiles: 33529.00, 306099.00, 1213987.00, 5428585.00, 30791418.00
---- output/egg -- bottom-up results:
dag         mean: 3.2857
micros      mean: 6676.3929
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 12.00, 26.25, 54.00, 507.75, 146008.00
---- output/egg -- greedy-dag results:
dag         mean: 3.2500
micros      mean: 89650.7500
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 38.00, 67.25, 187.50, 2679.50, 2128443.00
---- output/egg -- ilp-cbc results:
dag         mean: 3.3929
micros      mean: 2918235.7143
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 5876.00, 7348.00, 14357.50, 63831.75, 31206857.00
---- output/egg -- ilp-cbc-prune results:
dag         mean: 3.2857
micros      mean: 1301775.2857
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 4790.00, 10245.25, 18175.00, 58661.00, 29950483.00
---- output/flexc -- bottom-up results:
dag         mean: 85.0000
micros      mean: 63664.1429
dag    quantiles:    35.00, 37.00, 91.50, 137.00, 137.00
micros quantiles: 27256.00, 28729.00, 59139.00, 99794.25, 109424.00
---- output/flexc -- greedy-dag results:
dag         mean: 85.0000
micros      mean: 470624.5000
dag    quantiles:    35.00, 37.00, 91.50, 137.00, 137.00
micros quantiles: 227782.00, 237727.25, 488242.00, 609998.00, 1015563.00
---- output/flexc -- ilp-cbc results:
dag         mean: 870.2143
micros      mean: 5953963.8571
dag    quantiles:    88.00, 1000.00, 1000.00, 1000.00, 1000.00
micros quantiles: 3089985.00, 3172634.50, 4381422.00, 5427063.75, 26593684.00
---- output/flexc -- ilp-cbc-prune results:
dag         mean: 84.3571
micros      mean: 304448.7857
dag    quantiles:    35.00, 36.00, 91.50, 136.00, 136.00
micros quantiles: 142994.00, 188060.00, 301022.00, 390848.25, 662473.00
---- output/tensat -- bottom-up results:
dag         mean: 5.9862
micros      mean: 797192.6000
dag    quantiles:    0.82, 0.94, 4.41, 8.35, 18.81
micros quantiles: 848.00, 22166.75, 189955.50, 1953350.75, 2280621.00
---- output/tensat -- greedy-dag results:
dag         mean: 5.4986
micros      mean: 50049918.5000
dag    quantiles:    0.82, 0.93, 4.41, 7.74, 16.37
micros quantiles: 31720.00, 695890.50, 5754178.50, 108900651.00, 201932405.00
---- output/tensat -- ilp-cbc results:
dag         mean: 7.0305
micros      mean: 14237603.0000
dag    quantiles:    1.33, 3.68, 4.86, 10.59, 15.57
micros quantiles: 21442.00, 293923.25, 17017974.00, 24817515.00, 32409827.00
---- output/tensat -- ilp-cbc-prune results:
dag         mean: 6.0963
micros      mean: 20061043.9000
dag    quantiles:    0.83, 1.06, 4.41, 8.35, 18.81
micros quantiles: 37260.00, 179445.75, 22344417.00, 34961842.75, 50468856.00

I assume that the difference between ilp-cbc and ilp-cbc-prune on datasets other than 'FlexC' is due to the difference between the LpExtractor from egg 0.9.0 and the CbcExtractor from this repo, since no e-nodes should be pruned there. ilp-cbc-prune seems to take more time and find better costs than ilp-cbc on the 'babble' and 'tensat' datasets, and the opposite on the 'egg' dataset.

On the 'FlexC' dataset, ilp-cbc-prune is clearly better than ilp-cbc in both time and cost, thanks to the ILP model pruning. A nice observation for us is that bottom-up is actually doing really well, finding almost the same costs in much less time.

I know that the PR needs some cleaning up before being mergeable, so please let me know what you would like me to change, and I hope people will find these additions useful :)

@Bastacyclop
Contributor Author

arXiv paper from which the dataset originates: https://arxiv.org/abs/2309.09112

@mwillsey
Member

This looks great!! I'm a little worried about the threshold in the extractor, though; it will make it hard to benchmark across different datasets. Is there a way to autotune it somehow? Perhaps start at some percentile of all node costs, and then move it up if extraction fails.

@Bastacyclop
Contributor Author

Autotuning the 'infinity value' would be interesting to explore for datasets where the 'infinity value' is unknown; maybe it could be the 99th percentile of the entire dataset rather than of an individual e-graph. Another way to go would be to ask datasets/e-graphs to encode their 'infinity value', and if there is none, to either avoid using the pruning extractor, use a dummy threshold above all costs, or use a percentile guess.

Now if we take the same pruning strategy and forget about trying to find an 'infinity value', but simply use the threshold as a heuristic, I could imagine an extractor that, e.g., tries to ILP-solve using nodes with cost <= the 50th percentile, then <= the 75th percentile, <= the 87.5th percentile, etc., as a kind of dichotomy that assumes expensive nodes are less likely to be useful. Maybe that is worth experimenting with. That strategy could also be compatible with an explicitly encoded 'infinity value': the heuristic threshold should not move up past the 'infinity value'; instead, extraction should fail early.
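
A rough sketch of that escalating-percentile idea (everything here is hypothetical; try_ilp_extract stands in for whatever the real ILP extraction call would be):

```rust
/// Hypothetical outcome of one ILP extraction attempt.
enum Extraction {
    Feasible(f64), // dag cost of the solution found
    Infeasible,
}

/// Hypothetical stand-in: run the ILP extractor restricted to e-nodes with cost <= threshold.
fn try_ilp_extract(_costs: &[f64], _threshold: f64) -> Extraction {
    unimplemented!("stand-in for the real ILP extraction call")
}

/// Try increasingly permissive thresholds (50th, 75th, 87.5th, ... percentile of
/// node costs), halving the excluded tail each round, until extraction succeeds.
/// An explicitly encoded 'infinity value', if any, caps how far the threshold moves.
fn extract_with_escalating_threshold(mut costs: Vec<f64>, infinity: Option<f64>) -> Extraction {
    if costs.is_empty() {
        return Extraction::Infeasible;
    }
    costs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mut fraction = 0.5; // start at the median cost
    loop {
        // Index of the most expensive node still allowed at this fraction.
        let idx = ((costs.len() as f64 * fraction).ceil() as usize).clamp(1, costs.len()) - 1;
        let threshold = costs[idx];

        // Never move the threshold up past the 'infinity value': fail early instead.
        if infinity.map_or(false, |inf| threshold >= inf) {
            return Extraction::Infeasible;
        }

        match try_ilp_extract(&costs, threshold) {
            found @ Extraction::Feasible(_) => return found,
            // Every node was already allowed and it still failed: give up.
            Extraction::Infeasible if idx + 1 == costs.len() => return Extraction::Infeasible,
            // Otherwise halve the excluded tail: 50% -> 75% -> 87.5% -> ...
            Extraction::Infeasible => fraction += (1.0 - fraction) / 2.0,
        }
    }
}
```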

@mwillsey
Member

mwillsey commented Nov 2, 2023

I would love to merge the dataset first if possible, especially since it seems the new extractor is maybe hard to generalize. Also, consider running the improved CBC extractor on this dataset to see how it compares.

@Bastacyclop
Contributor Author

Sounds good: #18

@TrevorHansen
Contributor

TrevorHansen commented Jan 1, 2024

Autotuning the 'infinity value' would be interesting to explore for datasets where the 'infinity value' is unknown,...

This is a great idea. I've had a go at implementing something similar in #16

The optimal extractor runs an approximate extractor (say bottom-up) to get an upper bound on the dag cost, then removes from the candidate extraction any node that has a cost greater than that upper bound.

In the data/flexc dataset, the largest dag cost that the bottom-up extractor finds is 137, so all of the "infinity" nodes with a cost of 1000 are removed.
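
Roughly, the pruning rule in that approach is the following (an illustrative sketch with made-up names, not the actual code in #16):

```rust
/// Illustrative: prune using an upper bound obtained from a cheap approximate
/// extractor (e.g. bottom-up). Assuming non-negative node costs, any e-node whose
/// own cost already exceeds the dag cost of a known-feasible solution can never
/// appear in a better one, so it is dropped before the optimal (ILP) extractor runs.
fn prune_with_upper_bound(node_costs: Vec<f64>, approx_dag_cost: f64) -> Vec<f64> {
    node_costs
        .into_iter()
        .filter(|&c| c <= approx_dag_cost)
        .collect()
}
```

Unlike a fixed threshold, this bound adapts to each e-graph, so no dataset-specific 'infinity value' is needed.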

I suspect there's lots further we can go with the idea.

@mwillsey
Member

Where do we sit with this PR? The dataset was merged in a different PR, and there are a lot of other ILP PRs flying around. Is this still looking like a distinct enough extractor to merge?

@Bastacyclop
Contributor Author

As far as I understand, @TrevorHansen's #16 will supersede the CbcPruneExtractor of this PR. I can run a quick benchmark to confirm this once #16 is merged, then close this PR.

I'd also like to see better benchmark statistics/plotting in main, which would also supersede my changes to plot.py. I think that being able to see per-(dataset, extractor) quantiles is very valuable for analysing results.

@mwillsey mentioned this pull request Jan 12, 2024