Final Project

Can you extend a relational database system to support storing and querying over vectors?

Steps

Fork this project
Understand the source code
Write your improvements
Run experiments
Write a report (see the Report section below)
Push to GitLab and open a merge request

How to Run

Start the server
Load the provided SIFT1M dataset
Stop the server to flush all the changes
Restart the server
Run the provided SIFT benchmark
Check the benchmark result

Note: Your improvement will be evaluated with the provided properties (1M items, each with a 128-dimensional vector embedding), on the same machine, and under the same time limit (30 minutes for Loadtestbed + Benchmark, excluding Recall Calculation).

Note

The new workload includes inserts, so be sure to reload the testbed each time you run a benchmark.

Hints

We will only use HeuristicQueryPlanner for our vector search operations.
Our naive implementation sorts all the vectors and returns the top-k closest records to the client.
You can easily beat our performance by implementing any indexing algorithm for vector search. Note that you still have to consider correctness, because we will measure recall.
Make sure TablePlanner calls your index, if you choose to implement one.
You can look into org.vanilladb.core.sql.distfn.EuclideanFn to implement SIMD. Note: Our benchmark will only use EuclideanFn. You may choose not to implement SIMD for CosineFn.
Make sure you run JDK 17 with the jdk.incubator.vector module enabled (for example, with the --add-modules jdk.incubator.vector option) to use SIMD in Java. The default JDK 17 in VS Code does not include this module.
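
As one illustration of the indexing hint above, the sketch below shows an IVF-style (inverted file) index in plain Java. Vectors are grouped under their nearest centroid, and a query scans only the `nprobe` closest groups while a bounded max-heap keeps the best k candidates instead of sorting everything. The class `IvfSketch`, its methods, and the fixed centroids are hypothetical and not part of VanillaDB. A real implementation would probably train the centroids (for example with k-means), keep the lists in database pages rather than in memory, and be called from TablePlanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical in-memory sketch for illustration only; not a VanillaDB class.
class IvfSketch {
    // A scanned record id paired with its squared distance to the query.
    private static final class Candidate {
        final float dist;
        final int id;
        Candidate(float dist, int id) { this.dist = dist; this.id = id; }
    }

    private final float[][] centroids;                            // assumed to be trained elsewhere
    private final List<List<Integer>> lists = new ArrayList<>();  // record ids per centroid
    private final List<float[]> vectors = new ArrayList<>();      // record id -> vector

    IvfSketch(float[][] centroids) {
        this.centroids = centroids;
        for (int i = 0; i < centroids.length; i++) lists.add(new ArrayList<>());
    }

    // Squared Euclidean distance; omitting the square root keeps the ranking unchanged.
    static float sqDist(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Stores the vector under its nearest centroid and returns the new record id.
    int insert(float[] v) {
        int id = vectors.size();
        vectors.add(v);
        lists.get(nearestCentroids(v, 1)[0]).add(id);
        return id;
    }

    // Indices of the n centroids closest to q, nearest first.
    private int[] nearestCentroids(float[] q, int n) {
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (x, y) ->
                Float.compare(sqDist(q, centroids[x]), sqDist(q, centroids[y])));
        int[] out = new int[Math.min(n, order.length)];
        for (int i = 0; i < out.length; i++) out[i] = order[i];
        return out;
    }

    // Approximate top-k: scans only the nprobe nearest lists, returns ids nearest first.
    List<Integer> search(float[] q, int k, int nprobe) {
        LinkedList<Integer> result = new LinkedList<>();
        if (k <= 0) return result;
        // Max-heap on distance: the root is the worst candidate currently kept.
        PriorityQueue<Candidate> heap =
                new PriorityQueue<>((x, y) -> Float.compare(y.dist, x.dist));
        for (int c : nearestCentroids(q, nprobe)) {
            for (int id : lists.get(c)) {
                float d = sqDist(q, vectors.get(id));
                if (heap.size() < k) {
                    heap.add(new Candidate(d, id));
                } else if (d < heap.peek().dist) {
                    heap.poll();
                    heap.add(new Candidate(d, id));
                }
            }
        }
        // The heap yields the worst candidate first, so prepend to get nearest first.
        while (!heap.isEmpty()) result.addFirst(heap.poll().id);
        return result;
    }
}
```

Scanning fewer lists is faster but can miss true neighbors that sit near a cluster boundary, which is why recall has to be measured alongside throughput when tuning a parameter like `nprobe`.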

Experiments

Based on the workload we provide, show the following:

Throughput
Recall

Show the comparison between the performance of the unmodified source code and the performance of your modification.

You can then think about the parameter settings that really show your improvements.
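
When checking your own runs, it can help to compute recall yourself. The sketch below uses a common definition: the fraction of the true top-k record ids that also appear in the returned ids, averaged over queries. The class `RecallSketch` and its method names are hypothetical, and the provided benchmark's Recall Calculation may differ in detail, so treat this only as a sanity check.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the provided benchmark.
class RecallSketch {
    // Fraction of the true top-k ids that also appear among the returned ids.
    static double recall(List<Integer> groundTruth, List<Integer> returned) {
        if (groundTruth.isEmpty()) return 1.0;
        Set<Integer> truth = new HashSet<>(groundTruth);
        int hits = 0;
        // De-duplicate the returned ids so a repeated id is counted only once.
        for (Integer id : new HashSet<>(returned)) {
            if (truth.contains(id)) hits++;
        }
        return (double) hits / truth.size();
    }

    // Mean recall over a batch of queries; both lists must be in the same query order.
    static double meanRecall(List<List<Integer>> truths, List<List<Integer>> results) {
        if (truths.size() != results.size()) {
            throw new IllegalArgumentException("query counts differ");
        }
        if (truths.isEmpty()) return 1.0;
        double sum = 0.0;
        for (int i = 0; i < truths.size(); i++) {
            sum += recall(truths.get(i), results.get(i));
        }
        return sum / truths.size();
    }
}
```

Plotting this value against throughput for each parameter setting gives a direct view of the speed and accuracy trade-off of your modification.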

Report

Briefly explain what you do
- How you implement your indexes
- How you implement SIMD
- Other improvements you made to speed up the search
Experiments
- Your experiment environment (a list of your hardware components, your operating system)
  - e.g. Intel Core i5-3470 CPU @ 3.2GHz, 16 GB RAM, 128 GB SSD, CentOS 7
- Based on the workload we provide:
  - Show your improvement using graphs
- Your benchmark parameters
- Analysis on the results of the experiments

Note: There is no strict limit on the length of your report. Generally, a 2-3 page report with some figures and tables is fine. Remember to include all your group members' student IDs.

Submission

The procedure is as follows:

Fork the final project
Clone the repository you forked
Finish your work and write the report
Commit your work and push it to GitLab.
- Name your report [Team Number]_final_project_report.pdf
  - e.g. team1_final_project_report.pdf
Open a merge request to the original repository.
- Source branch: Your working branch
- Target branch: The branch with your team number (e.g. team-1)
- Title: Team-X Submission (e.g. Team-1 Submission)

Note: Only one submission for each team.

No Plagiarism

If we find you copying someone's code, you will get 0 points for this assignment.

No Keeping All Data in Memory

The database is not allowed to keep all of your data in memory.

No Modifying Our Benchmark

Modifying our benchmark is not allowed.

Deadline

Submit your work before 2024/06/16 (Sun) 23:59:59.
