
Commit

Updated README
thegoldgoat committed Aug 13, 2024
1 parent 1eb573f commit 38394d8
Showing 6 changed files with 59 additions and 11 deletions.
64 changes: 56 additions & 8 deletions README.md
@@ -1,23 +1,71 @@
# CICERO: A Domain-Specific Architecture for Efficient Regular Expression Matching


Code regarding Parravicini et al. 2021 paper can be found [here](https://github.com/necst/cicero/tree/parravicini_et_al).

Cicero is a domain-specific architecture for exact regular expression (RE) matching on FPGAs.
A key feature of Cicero is that, like some software libraries, notably [RE2](https://github.com/google/re2), it does not suffer from the backtracking problem.
This means that when it processes an RE that carries some kind of non-determinism (e.g., `a?a`), it does not take a guess and then backtrack, but explores all the different options in a single pass of the input string.

If you are interested in the topic, take a look at [Russ Cox's article](https://swtch.com/~rsc/regexp/regexp1.html).
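The single-pass behaviour described above can be sketched with a tiny breadth-first matcher in the style of Thompson's NFA simulation. This is an illustrative sketch only: the instruction names (`SPLIT`, `CHAR`, `MATCH`) are invented here and are not Cicero's actual ISA.

```python
# Toy breadth-first matcher: all alternatives of the non-deterministic
# RE "a?a" are tracked simultaneously, so the input is scanned once,
# with no backtracking.

# Program for "a?a":
# 0: SPLIT 1, 2   (either take the optional 'a' at pc 1, or skip to pc 2)
# 1: CHAR 'a'
# 2: CHAR 'a'
# 3: MATCH
PROG = [("SPLIT", 1, 2), ("CHAR", "a"), ("CHAR", "a"), ("MATCH",)]

def add_thread(threads, pc):
    """Follow SPLIT eagerly so the thread set only holds CHAR/MATCH pcs."""
    if pc in threads:
        return
    op = PROG[pc]
    if op[0] == "SPLIT":
        add_thread(threads, op[1])
        add_thread(threads, op[2])
    else:
        threads.add(pc)

def matches(text):
    """Full-match semantics: MATCH must be reachable after the last char."""
    current = set()
    add_thread(current, 0)
    for ch in text:
        nxt = set()
        for pc in current:
            op = PROG[pc]
            if op[0] == "CHAR" and op[1] == ch:
                add_thread(nxt, pc + 1)
        current = nxt
    return any(PROG[pc][0] == "MATCH" for pc in current)

print(matches("a"))   # True: 'a?' skipped, second 'a' consumed
print(matches("aa"))  # True: both 'a's consumed
print(matches("b"))   # False
```

Note how both branches of the `SPLIT` live in the same thread set, which is exactly why no guess ever needs to be undone.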

## System View

![cicero-mlir-system](./figures/cicero-mlir-system.png)

From a system perspective, Cicero features two components:

1. **A compiler**: compiles REs into a domain-specific ISA binary
2. **An architecture on FPGA**: receives a compiled RE and an input string, and outputs whether the input is matched by the RE or not.

## Compiler Overview

The compiler's code can be found [here](https://github.com/necst/cicero_compiler_cpp). The compiler is implemented using MLIR and ANTLR4. The compilation pipeline can be described as follows:

1. Parse textual RE into ANTLR4 AST
2. Generate representation of Regex using the proposed `regex` MLIR dialect
3. (optional) Optimization pass on `regex` dialect
4. Lowering conversion of `regex` dialect to proposed `cicero` dialect
5. (optional) Optimization pass on `cicero` dialect
6. Generate Cicero ISA binary code
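As a rough illustration of the stages above, here is a toy end-to-end pipeline for the RE `ab?`. The AST node names and output mnemonics are invented for this sketch; the real compiler's MLIR `regex`/`cicero` dialects and the actual Cicero ISA differ.

```python
# Miniature stand-in for the compilation pipeline: parse -> regex-level
# IR -> lowering to Cicero-style ops -> emitted listing.

def parse(regex):
    """Stages 1-2: build a tiny AST (stand-in for the ANTLR4 AST)."""
    ast, i = [], 0
    while i < len(regex):
        if i + 1 < len(regex) and regex[i + 1] == "?":
            ast.append(("optional", regex[i])); i += 2
        else:
            ast.append(("char", regex[i])); i += 1
    return ast

def lower(ast):
    """Stage 4: lower each regex-level node to Cicero-style operations."""
    ops = []
    for kind, ch in ast:
        if kind == "optional":
            ops.append(("split", 2))   # jump over the next op, or fall through
        ops.append(("match_char", ch))
    ops.append(("accept",))
    return ops

def emit(ops):
    """Stage 6: render the program as a human-readable listing."""
    return [f"{i}: {' '.join(map(str, op))}" for i, op in enumerate(ops)]

print("\n".join(emit(lower(parse("ab?")))))
# 0: match_char a
# 1: split 2
# 2: match_char b
# 3: accept
```

The optional optimization passes (stages 3 and 5) would rewrite the AST or the op list in place, e.g. deleting unreachable ops, without changing the overall flow.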

## Architecture Overview

![cicero_engine_multi_char](./figures/cicero_multi_new.png)
![cicero_multi_new_interconnection 1](./figures/cicero_engine_multi_char.png)
![cicero-engine](./figures/cicero-engine.png)

The Cicero architecture features a sliding window of input characters. Each character in the window is addressed by a `CC_ID_BITS`-bit pointer, so the window contains `2^CC_ID_BITS` characters.
The Cicero architecture is composed of multiple *engines*, which can be combined in ring or torus topologies. Execution threads are distributed among the engines by a load-balancing infrastructure. However, during our studies we found that a configuration with a single engine is more efficient.
Each engine packs as many FIFOs and Cicero cores as there are characters in the input window.
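The window addressing can be modelled in a few lines. The value `CC_ID_BITS = 4` and the helper names below are illustrative only, not taken from the SystemVerilog sources.

```python
# Minimal model of the sliding character window: with CC_ID_BITS-bit
# pointers the window holds 2**CC_ID_BITS characters, and absolute
# input positions wrap around modulo the window size.

CC_ID_BITS = 4
WINDOW_SIZE = 2 ** CC_ID_BITS   # 16 characters addressable in the window

def window_slot(char_index):
    """Map an absolute input position to its slot in the circular window."""
    return char_index & (WINDOW_SIZE - 1)   # same as char_index % WINDOW_SIZE

# One FIFO of pending threads (and one core) per window slot:
fifos = [[] for _ in range(WINDOW_SIZE)]

print(WINDOW_SIZE)       # 16
print(window_slot(20))   # 4: position 20 wraps to slot 20 - 16
```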

## Code Overview

- `bitstream`: pre-compiled bitstreams for Ultra96 v2 board, and their static metrics (board usage percentages and total on-chip power)
- `cicero_compiler`: older compiler implementation
- `cicero_compiler_cpp`: new compiler implementation, using MLIR
- `hdl_src`: System Verilog implementation of the architecture
- `proj`: Vivado project files for the architecture development
- `scripts`: Various helper scripts for development, verification and benchmarking

## Development

See [development.md](./development.md)

## Acknowledgment

This work was financially supported by ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by the European Union – NextGenerationEU.
The authors are grateful for the CGO 2025 anonymous reviewers' feedback and the AMD University Program support, and to [Valentina Sona](https://github.com/ValentinaSona) for working on the original Cicero architecture simulator.

## Paper Citation

If you find this repository useful, please use the following citations:

```
@article{somaini2025cicero,
  title = {Combining MLIR Dialects with Domain-Specific Architecture for Efficient Regular Expression Matching},
  author = {Andrea Somaini and Filippo Carloni and Giovanni Agosta and Marco D. Santambrogio and Davide Conficconi},
  year = {2025},
  month = {mar}
}
```

```
@article{parravicini2021cicero,
  ...
}
```
6 changes: 3 additions & 3 deletions development.md
@@ -64,20 +64,20 @@ Where:
For example:

```bash
python3 measure.py /path/to/bitstream.bit protomata.input protomata.regex results.csv /path/to/cicero_compiler_cpp/ 100 100
```

To check the correctness of the results, use `scripts/measurements/benchmark/check_results.py`, by specifying the `.csv` file and the file with the inputs that was used, for example:

```bash
python3 scripts/measurements/benchmark/check_results.py results.csv brill.input
```

## Benchmark on board

For benchmarking, we essentially execute `scripts/measurements/benchmark/measure.py` over all the desired bitstreams/regexes/inputs/compilers.

A wrapper script is provided, `scripts/measurements/benchmark/test_top.py`. First, update the constants in `scripts/measurements/benchmark/bench_config.py` to match the desired configurations you want to benchmark, and then execute `test_top.py`. A `.csv` file will be created alongside each `.bit` file for each compiler/benchmark.

You can then use `scripts/measurements/benchmark/aggregate.py` to aggregate all the results of the benchmarks. For example, if you want the aggregated output in `output.csv` and all the `.csv` from previous step are in the `measurements` folder, run:

Binary file added figures/cicero-engine.png
Binary file added figures/cicero-mlir-system.png
Binary file removed figures/cicero_engine_multi_char.png
Binary file removed figures/cicero_multi_new.png
