Implementation of the Caprara, Fischetti, and Toth algorithm for the Set Covering problem.
The project uses the C++11 standard. All the algorithmic components live in header files (in the `src` folder) to simplify integration into other projects.
Caprara, A., Fischetti, M., & Toth, P. (1999). A Heuristic Method for the Set Covering Problem. Operations Research, 47(5), 730–743. doi:10.1287/opre.47.5.730
Configure the project and build it:

```shell
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
```

The binary will be located in `build/accft`. You can run it with:

```shell
./build/accft -i instances/rail/rail507 -p RAIL
```

See `./build/accft --help` for the list of available command line arguments and their meaning.
To produce a debug build with tests enabled:

```shell
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DUNIT_TESTS=ON
cmake --build build -j
```

Run the tests:

```shell
ctest --test-dir build -j
```
Use the following commands to generate code coverage statistics; note that you need to have `lcov` installed.

```shell
mkdir -p coverage
lcov -c -d build -o coverage/tests_cov.info
lcov -e coverage/tests_cov.info cft/src/*/ -o coverage/tests_cov.info
```

Finally, to generate an HTML report that can be viewed from your browser:

```shell
genhtml coverage/tests_cov.info --legend --output-directory=./coverage
```

You can check out the coverage statistics by opening `coverage/index.html`.
Here we briefly describe the key components of the algorithm. For a more in-depth and formal description, refer to the original paper or look directly at the code.
The CFT algorithm has a hierarchical structure, with each layer applying different column selection techniques to simplify the problem for the next step. Here's a breakdown of the main components, starting from the outermost layer:
- `cft::run`: This function contains the full algorithm. It implements what the original paper calls "Refinement", which fixes part of the best solution found so far and then invokes the 3-phase procedure on the resulting subinstance.
- `cft::ThreePhase`: As the name suggests, this function involves three steps:
  - A subgradient step that finds near-optimal multipliers to guide the search and also selects a small set of high-quality columns (the core instance) to intensify and speed up the search.
  - A heuristic phase that performs further subgradient iterations to explore diversified multipliers, invoking the greedy procedure on each of them to find good-quality solutions.
  - A column fixing step that selects and fixes some columns that have a good probability of being part of high-quality solutions, incrementally reducing the problem size.
- `cft::Greedy`: A fast greedy algorithm run during the heuristic phase of the 3-phase. It starts from a set of Lagrangian multipliers and iteratively selects columns based on their quality. At the end, it checks for and removes redundant columns from the obtained solution (if any).
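The greedy idea described above can be sketched as a standalone routine. This is a simplified illustration, not the project's `cft::Greedy`: it scores columns by cost per newly covered row and omits the Lagrangian multipliers and the final redundancy-removal pass.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Simplified greedy set cover. Columns are lists of row indexes; at each
// step we pick the column minimizing cost / (number of newly covered rows).
std::vector<size_t> greedy_cover(std::vector<std::vector<size_t>> const& cols,
                                 std::vector<double> const&              costs,
                                 size_t                                  num_rows) {
    std::vector<bool>   covered(num_rows, false);
    size_t              left = num_rows;
    std::vector<size_t> sol;
    while (left > 0) {
        size_t best       = cols.size();
        double best_score = std::numeric_limits<double>::max();
        for (size_t j = 0; j < cols.size(); ++j) {
            size_t new_rows = 0;
            for (size_t i : cols[j])
                new_rows += covered[i] ? 0 : 1;
            if (new_rows == 0)
                continue;  // Column adds nothing, skip it
            double score = costs[j] / static_cast<double>(new_rows);
            if (score < best_score) {
                best_score = score;
                best       = j;
            }
        }
        assert(best != cols.size() && "instance must be coverable");
        sol.push_back(best);
        for (size_t i : cols[best])
            if (!covered[i]) {
                covered[i] = true;
                --left;
            }
    }
    return sol;
}
```

The real algorithm works on reduced costs derived from the multipliers instead of raw costs, which is what lets the multipliers steer the selection.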
The project comes in the form of an executable that reads an instance file and runs the full CFT algorithm. If you want to integrate the CFT into your own project, here we describe some of its main components that can be useful on their own.
To start, a well-formed instance is essential to run the algorithm correctly. The `Instance` struct is defined in `Instance.hpp`:

```cpp
struct Instance {
    SparseBinMat<ridx_t>             cols;
    std::vector<std::vector<cidx_t>> rows;
    std::vector<real_t>              costs;
};
```
- `cols`: contains a sparse "column-major" view of the constraint matrix, where the sparse columns are placed one after the other in a single `std::vector`, and a second vector contains the begin and end indexes of each column within it.
- `rows`: the same could be done for storing rows; however, since the "row-major" view is used less frequently and since we assume that the number of columns is much larger than the number of rows, a vector of vectors of indexes is used instead, as it is simpler to manage and does not add noticeable overhead.
- `costs`: a vector containing the cost of each column.
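The "single vector plus offsets" layout described for `cols` can be sketched like this (a simplified stand-in; the real `SparseBinMat` definition lives in the project sources):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sketch of a CSC-like sparse binary matrix: all column entries live
// in one contiguous vector, and `begs` stores where each column starts.
// `begs` keeps one extra sentinel entry, so column j spans [begs[j], begs[j+1]).
struct SparseColumns {
    std::vector<int>    idxs;          // row indexes of all columns, concatenated
    std::vector<size_t> begs = {0};    // column start offsets (+ final sentinel)

    void push_back(std::vector<int> const& col) {
        idxs.insert(idxs.end(), col.begin(), col.end());
        begs.push_back(idxs.size());
    }
    size_t size() const { return begs.size() - 1; }
};
```

Compared with a vector of vectors, this layout keeps all entries contiguous in memory, which helps cache locality when scanning many columns.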
Here's an example of how to build a dummy instance:

```cpp
auto inst = cft::Instance();

// Insert the columns one at a time
inst.cols.push_back({0, 1, 2, 3});
inst.cols.push_back({1, 3, 0, 4});
inst.cols.push_back({4, 1, 2});

// Define column costs (note: every column must have a cost)
inst.costs = {1.0, 2.0, 3.0};

// Rows can be automatically filled from columns using this helper function
cft::fill_rows_from_cols(inst.cols, 5, inst.rows);

// Check that the defined instance is well formed (costly operation)
CFT_IF_DEBUG(cft::col_and_rows_check(inst.cols, inst.rows));
```
To run the full CFT algorithm you first have to declare an `Environment`, possibly changing the default parameter values; look at `cft.hpp` for more details on the available parameters. Then you can call `cft::run`, which, on completion, will return the best solution found.

```cpp
auto env = cft::Environment();
env.time_limit = 10.0;  // Time limit in seconds
env.verbose    = 2;     // Log verbosity level (from 0 to 5)
env.timer.restart();    // NOTE: the timer is also used to test the time limit
auto sol = cft::run(env, inst);
fmt::print("CFT solution cost: {}\n", sol.cost);
```
If instead you are not interested in the outermost column fixing (the "Refinement" step), you can call the 3-phase directly. Note that it is provided as a function object:

```cpp
auto three_phase = cft::ThreePhase();
auto result      = three_phase(env, inst);
fmt::print("3-phase solution cost: {}, LB: {}\n", result.sol.cost, result.nofix_lb);
```
Finally, if you are only interested in a quick solution and do not care too much about its quality, you can skip most of the "math-based" techniques used to guide the search and directly run the greedy step like this:

```cpp
auto num_rows      = cft::rsize(inst.rows);
auto lagr_mult     = std::vector<cft::real_t>(num_rows, 0.0);
auto reduced_costs = inst.costs;  // Zero multipliers imply reduced costs = costs
auto greedy        = cft::Greedy();
auto sol           = cft::Solution();
sol.cost = greedy(inst, lagr_mult, reduced_costs, sol.idxs);
fmt::print("Greedy solution cost: {}\n", sol.cost);
```
In this case, the multipliers can be viewed as a set of weights that measure row importance, so they can be manipulated to guide the greedy (which is exactly what is done within the 3-phase).
Benchmarks have been run on a ThinkPad P16v with an Intel i7-13700H processor and 32 GB of RAM on an Arch Linux system, compiled with g++ 13.2.1. We ran the tests using the script located in `benchmarks/run_all.sh` with the project compiled in `Release` mode. We tested each instance with 10 different random seeds. The runs were not subject to any time limit, since we used the termination criterion described in the original paper.
| Instance | #Rows | #Cols | Best Sol | Avg Sol | Avg Time(s) |
|---|---|---|---|---|---|
| rail516 | 516 | 47311 | 182.00 | 182.00 | 2.10 |
| rail582 | 582 | 55515 | 211.00 | 211.00 | 1.02 |
| rail507 | 507 | 63009 | 174.00 | 175.00 | 3.77 |
| rail2586 | 2586 | 920683 | 950.00 | 950.80 | 57.14 |
| rail4872 | 4872 | 968672 | 1533.00 | 1534.60 | 160.09 |
| rail2536 | 2536 | 1081841 | 692.00 | 694.00 | 85.69 |
| rail4284 | 4284 | 1092610 | 1066.00 | 1067.50 | 91.58 |
Note: for rail2586, better results can be obtained by emphasizing multiplier quality. The results for the other datasets can be found in the `benchmarks` directory.
The project also provides Python bindings with a simplified interface for quick usage. While you can also access much of the C++ interface directly, the primary purpose is to provide a point-and-shoot interface for the algorithm. You can find more details in the separate README.
Regarding the source code, here you can find the general set of rules that we try to enforce. Since this is a project we work on in our free time, we haven't been afraid to experiment with simple rules and conventions, adjusting them along the way when we felt something wasn't working for us.
We decided to stick with the C++11 standard for a couple of reasons. First, it keeps the code accessible to more people. Second, it helped us avoid the temptation of using fancy meta-programming features and abstractions introduced in recent standards, which can easily shift the focus from the algorithm itself to how the algorithm is implemented.
The only external dependency we have is `fmtlib`, a printing/formatting library that is so much superior to `iostream` that it's a pain when we can't use it (luckily it made it into the standard). If you build the project with CMake, it should be downloaded and configured automatically. Note, however, that you'll need an internet connection for that to work.
C++ is a hefty language. One of its main perks is its ability to define abstractions general enough to serve a wide variety of use cases (the STL is a clear example of that). However, a large part of the features that this language provides are redundant or unnecessary and simply add to the overall complexity, which often makes code very hard to understand.
For the algorithm implementation, we chose to limit ourselves to a small set of language features, with the freedom to ignore these restrictions when doing so can really improve both the coding process and the understanding of the code. We place these helper tools in the `utils` folder to keep them separated from the algorithm implementation. This approach creates two coding contexts:

- *user-code*: pretty much C code extended with a fairly small set of C++ features.
- *library-code*: provides a carefully chosen set of abstractions.
Along with the classic entities proper of the C programming language, we define two other types of "objects":

- *C-like structs* (structs in the following): simple aggregates of other objects (even C++ objects). There are not many rules about these objects; they simply aim to be the "sum of their types".
- *Function objects* (improperly called functors in the following): functions with an associated state, implemented as structs that define the `operator()` member function.
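To illustrate the two categories under the rules above (the names here are invented for the example, not taken from the codebase):

```cpp
#include <cassert>
#include <vector>

// C-like struct: a plain aggregate with public data only and no member
// functions. Keeping its state valid is the caller's responsibility.
struct RowCoverage {
    std::vector<int> times_covered;
    int              uncovered_count = 0;
};

// Function object ("functor"): a function with state, exposed only through
// operator(). Its members are private and it is always in a valid state.
struct MovingAverage {
    double operator()(double sample) {
        sum += sample;
        ++count;
        return sum / count;
    }

private:
    double sum   = 0.0;
    int    count = 0;
};
```

Both are default-constructible and hold no raw resources, matching the rules listed below.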
Other rules:

- We limit implicit casts as much as reasonably possible.
- No direct storage of resources that require special handling in constructors/destructors (e.g., pointers to allocated memory or file handles).
- No storage of references or pointers within either structs or functors. They can be stored by other abstractions defined in library code.
- No `const` member variables.
- No explicit memory management in general; everything is wrapped within a proper abstraction.
- No (user-defined) copy/move constructors/assignment operators, no destructors.
- All objects must be default-constructible.
- Structs have only public member variables.
- The validity of a struct's state is the user's responsibility.
- Structs have no constructors or other user-defined member functions.
- Functors can have non-default constructors.
- Functors must always be in a valid state.
- Functors cannot have public member variables; they can only be used as a function abstraction.
To accommodate potential changes to the basic numeric types, we have introduced custom type-aliases and user-defined literals to express the corresponding numeric constants.
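Such a user-defined literal might look like this (an illustrative sketch; the actual literal names and aliases are defined in the project headers):

```cpp
#include <cassert>

using real_t = float;  // the alias could be swapped for double elsewhere

// User-defined literal: writing 0.5_r yields a real_t regardless of which
// floating-point type real_t is aliased to, so numeric constants need no
// edits when the underlying type changes. (_r is a made-up suffix here.)
constexpr real_t operator"" _r(long double value) {
    return static_cast<real_t>(value);
}

constexpr real_t half = 0.5_r;
```

Without the literal, constants like `0.5f` would silently encode the concrete type and need updating whenever the alias changes.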
Not much to say here: library code tries to adhere to the user-code rules, but has the freedom to break some of them for the sake of avoiding bugs or making an abstraction more intuitive and easier to use. Factory methods are often used as a hand-made CTAD. Templates are used, but we avoid meta-programming when possible.
Looking around you might notice that most function parameters are annotated with tags. Although intuitive, here is their meaning:

- `in`: Input parameter, read-only.
- `out`: Output parameter; the previous state is ignored and replaced with the new state.
- `inout`: Input/output parameter; both previous and new states are relevant.
- `cache`: Cache object used to avoid costly operations (usually memory allocations). Input and output states are ignored.
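One lightweight way to get such annotations in C++11 is to define them as empty macros that exist purely as documentation. This is only a sketch of the idea; the project's actual tag mechanism may differ:

```cpp
#include <cassert>
#include <vector>

// Documentation-only parameter tags: they expand to nothing and exist
// purely to state each parameter's role at a glance. (Names invented
// for this example.)
#define IN_PARAM
#define OUT_PARAM
#define INOUT_PARAM

void scale(IN_PARAM double factor, INOUT_PARAM std::vector<double>& values) {
    for (double& v : values)
        v *= factor;
}
```

The tags carry no semantics for the compiler, so they cost nothing at runtime while making call-site contracts explicit.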
This project offers flexibility in choosing the numeric types used for column indexes, row indexes, and real values. We provide type-aliases (`cidx_t`, `ridx_t`, `real_t`) defined in `src/core/cft.hpp` that you can customize by defining the corresponding macros (`CFT_CIDX_TYPE`, `CFT_RIDX_TYPE`, `CFT_REAL_TYPE`). This allows you to easily switch between native integer/floating-point types depending on your needs.
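The macro-override mechanism can be sketched like this (simplified, and with illustrative defaults; see `src/core/cft.hpp` for the real definitions):

```cpp
#include <cassert>
#include <cstdint>

// Fall back to a default unless the user overrides the macro, e.g. by
// compiling with -DCFT_CIDX_TYPE=std::uint64_t. The defaults below are
// placeholders for this sketch, not necessarily the project's own.
#ifndef CFT_CIDX_TYPE
#define CFT_CIDX_TYPE std::uint32_t
#endif
#ifndef CFT_REAL_TYPE
#define CFT_REAL_TYPE double
#endif

using cidx_t = CFT_CIDX_TYPE;
using real_t = CFT_REAL_TYPE;
```

Because the aliases are resolved at compile time, switching types requires no source changes, only a different compiler invocation.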
Beyond native types, we also support defining custom types through a simple interface. You can find an example implementation in `test/custom_types_unittests.cpp`.
For instance, you can use this interface to:
- Enforce Checks: Implement custom checks for every mathematical operation performed.
- Control Type Conversions: Restrict implicit casts, e.g., to avoid mixing row and column variables.
- Introduce Specialized Types: Utilize alternative number representations like fixed-point types for specific use cases.
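As a sketch of the "control type conversions" use case, a thin wrapper can make row and column indexes non-interchangeable at compile time (illustrative only; the interface actually required by the library is shown in `test/custom_types_unittests.cpp`):

```cpp
#include <cassert>
#include <cstdint>

// Strong index wrapper: RowIdx and ColIdx carry the same integer but are
// distinct types, so passing one where the other is expected fails to
// compile instead of silently mixing rows and columns.
template <typename Tag>
struct StrongIdx {
    std::uint32_t value;
    bool operator==(StrongIdx other) const { return value == other.value; }
};

struct RowTag {};
struct ColTag {};
using RowIdx = StrongIdx<RowTag>;
using ColIdx = StrongIdx<ColTag>;

// Toy function that only accepts row indexes.
std::uint32_t row_to_offset(RowIdx r) { return r.value * 2; }
```

Calling `row_to_offset(ColIdx{3})` would be a compile-time error, which is exactly the kind of mistake this pattern is meant to catch.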
This approach tries to strike a good balance between ease of use for common numeric types and the ability to tailor the library to your specific needs.