
Commit

Updated README
thegoldgoat committed Aug 13, 2024
1 parent 1eb573f commit 38394d8
Showing 6 changed files with 59 additions and 11 deletions.
64 changes: 56 additions & 8 deletions README.md
@@ -1,23 +1,71 @@
# CICERO: A Domain-Specific Architecture for Efficient Regular Expression Matching


Code regarding Parravicini et al. 2021 paper can be found [here](https://github.com/necst/cicero/tree/parravicini_et_al).

Cicero is a domain-specific architecture for exact regular expression (RE) matching on FPGAs.
A key feature of Cicero is that, like some software libraries, notably [RE2](https://github.com/google/re2), it does not suffer from the backtracking problem.
This means that when it processes an RE that carries some kind of non-determinism (e.g., `a?a`), it does not take a guess and then backtrack, but explores all the different options in a single pass of the input string.

If you are interested in the topic, take a look at [Russ Cox's article](https://swtch.com/~rsc/regexp/regexp1.html).
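The single-pass behaviour described above can be sketched with a tiny breadth-first matcher in the style of Thompson's NFA simulation. This is an illustrative sketch only: the instruction names (`SPLIT`, `CHAR`, `MATCH`) are invented here and are not Cicero's actual ISA.

```python
# Toy breadth-first matcher: all alternatives of the non-deterministic
# RE "a?a" are tracked simultaneously, so the input is scanned once,
# with no backtracking.

# Program for "a?a":
# 0: SPLIT 1, 2   (either take the optional 'a' at pc 1, or skip to pc 2)
# 1: CHAR 'a'
# 2: CHAR 'a'
# 3: MATCH
PROG = [("SPLIT", 1, 2), ("CHAR", "a"), ("CHAR", "a"), ("MATCH",)]

def add_thread(threads, pc):
    """Follow SPLIT eagerly so the thread set only holds CHAR/MATCH pcs."""
    if pc in threads:
        return
    op = PROG[pc]
    if op[0] == "SPLIT":
        add_thread(threads, op[1])
        add_thread(threads, op[2])
    else:
        threads.add(pc)

def matches(text):
    """Full-match semantics: MATCH must be reachable after the last char."""
    current = set()
    add_thread(current, 0)
    for ch in text:
        nxt = set()
        for pc in current:
            op = PROG[pc]
            if op[0] == "CHAR" and op[1] == ch:
                add_thread(nxt, pc + 1)
        current = nxt
    return any(PROG[pc][0] == "MATCH" for pc in current)

print(matches("a"))   # True: 'a?' skipped, second 'a' consumed
print(matches("aa"))  # True: both 'a's consumed
print(matches("b"))   # False
```

Note how both branches of the `SPLIT` live in the same thread set, which is exactly why no guess ever needs to be undone.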

## System View

![cicero-mlir-system](./figures/cicero-mlir-system.png)

From a system perspective, Cicero features two components:

1. **A compiler**: compiles REs into a domain-specific ISA binary
2. **An architecture on FPGA**: receives a compiled RE and an input string, and outputs whether the input is matched by the RE or not.

## Compiler Overview

The compiler's code can be found [here](https://github.com/necst/cicero_compiler_cpp). The compiler is implemented using MLIR and ANTLR4. The compilation pipeline can be described as follows:

1. Parse textual RE into ANTLR4 AST
2. Generate representation of Regex using the proposed `regex` MLIR dialect
3. (optional) Optimization pass on `regex` dialect
4. Lowering conversion of `regex` dialect to proposed `cicero` dialect
5. (optional) Optimization pass on `cicero` dialect
6. Generate Cicero ISA binary code
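As a rough illustration of the stages above, here is a toy end-to-end pipeline for the RE `ab?`. The AST node names and output mnemonics are invented for this sketch; the real compiler's MLIR `regex`/`cicero` dialects and the actual Cicero ISA differ.

```python
# Miniature stand-in for the compilation pipeline: parse -> regex-level
# IR -> lowering to Cicero-style ops -> emitted listing.

def parse(regex):
    """Stages 1-2: build a tiny AST (stand-in for the ANTLR4 AST)."""
    ast, i = [], 0
    while i < len(regex):
        if i + 1 < len(regex) and regex[i + 1] == "?":
            ast.append(("optional", regex[i])); i += 2
        else:
            ast.append(("char", regex[i])); i += 1
    return ast

def lower(ast):
    """Stage 4: lower each regex-level node to Cicero-style operations."""
    ops = []
    for kind, ch in ast:
        if kind == "optional":
            ops.append(("split", 2))   # jump over the next op, or fall through
        ops.append(("match_char", ch))
    ops.append(("accept",))
    return ops

def emit(ops):
    """Stage 6: render the program as a human-readable listing."""
    return [f"{i}: {' '.join(map(str, op))}" for i, op in enumerate(ops)]

print("\n".join(emit(lower(parse("ab?")))))
# 0: match_char a
# 1: split 2
# 2: match_char b
# 3: accept
```

The optional optimization passes (stages 3 and 5) would rewrite the AST or the op list in place, e.g. deleting unreachable ops, without changing the overall flow.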

## Architecture Overview

![cicero_engine_multi_char](./figures/cicero_multi_new.png)
![cicero_multi_new_interconnection 1](./figures/cicero_engine_multi_char.png)
![cicero-engine](./figures/cicero-engine.png)

The Cicero architecture features a sliding window of input characters. Each character in the window is addressed by a `CC_ID_BITS`-bit pointer, so the window contains `2^CC_ID_BITS` characters.
The Cicero architecture is composed of multiple *engines*, which can be combined in ring or torus topologies. Execution threads are distributed among the engines by a load-balancing infrastructure. However, during our studies we found that a configuration with a single engine is more efficient.
Each engine packs as many FIFOs and Cicero cores as there are characters in the input window.
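The window addressing can be modelled in a few lines. The value `CC_ID_BITS = 4` and the helper names below are illustrative only, not taken from the SystemVerilog sources.

```python
# Minimal model of the sliding character window: with CC_ID_BITS-bit
# pointers the window holds 2**CC_ID_BITS characters, and absolute
# input positions wrap around modulo the window size.

CC_ID_BITS = 4
WINDOW_SIZE = 2 ** CC_ID_BITS   # 16 characters addressable in the window

def window_slot(char_index):
    """Map an absolute input position to its slot in the circular window."""
    return char_index & (WINDOW_SIZE - 1)   # same as char_index % WINDOW_SIZE

# One FIFO of pending threads (and one core) per window slot:
fifos = [[] for _ in range(WINDOW_SIZE)]

print(WINDOW_SIZE)       # 16
print(window_slot(20))   # 4: position 20 wraps to slot 20 - 16
```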

## Code Overview

- `bitstream`: pre-compiled bitstreams for Ultra96 v2 board, and their static metrics (board usage percentages and total on-chip power)
- `cicero_compiler`: older compiler implementation
- `cicero_compiler_cpp`: new compiler implementation, using MLIR
- `hdl_src`: System Verilog implementation of the architecture
- `proj`: Vivado project files for the architecture development
- `scripts`: Various helper scripts for development, verification and benchmarking

## Development

See [development.md](./development.md)

## Acknowledgment

This work was financially supported by ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by the European Union – NextGenerationEU.
The authors are grateful for the CGO 2025 anonymous reviewers' feedback and the AMD University Program support, and to [Valentina Sona](https://github.com/ValentinaSona) for working on the original Cicero architecture simulator.

## Paper Citation

If you find this repository useful, please use the following citations:

```
@article{somaini2025cicero,
  title = {Combining MLIR Dialects with Domain-Specific Architecture for Efficient Regular Expression Matching},
  author = {Andrea Somaini and Filippo Carloni and Giovanni Agosta and Marco D. Santambrogio and Davide Conficconi},
  year = {2025},
  month = {mar}
}
```

```
@article{parravicini2021cicero,
  ...
}
```
6 changes: 3 additions & 3 deletions development.md
@@ -64,20 +64,20 @@ Where:
For example:

```bash
python3 measure.py /path/to/bitstream.bit protomata.input protomata.regex results.csv /path/to/cicero_compiler_cpp/ 100 100
```

To check the correctness of the results, use `scripts/measurements/benchmark/check_results.py`, by specifying the `.csv` file and the file with the inputs that was used, for example:

```bash
python3 scripts/measurements/benchmark/check_results.py results.csv brill.input
```

## Benchmark on board

For benchmarking, we essentially execute `scripts/measurements/benchmark/measure.py` over all the desired bitstreams/regexes/inputs/compilers.

A wrapper script is provided, `scripts/measurements/benchmark/test_top.py`. First, update the constants in `scripts/measurements/benchmark/bench_config.py` to match the desired configurations you want to benchmark, and then execute `test_top.py`. A `.csv` file will be created alongside each `.bit` file for each compiler/benchmark.

You can then use `scripts/measurements/benchmark/aggregate.py` to aggregate all the results of the benchmarks. For example, if you want the aggregated output in `output.csv` and all the `.csv` from previous step are in the `measurements` folder, run:

Binary file added figures/cicero-engine.png
Binary file added figures/cicero-mlir-system.png
Binary file removed figures/cicero_engine_multi_char.png
Binary file removed figures/cicero_multi_new.png
