gss

Grep for Small Strings.

Background

The grep command on Unix-based operating systems implements the Boyer-Moore string searching algorithm for its pattern matching. The problem with Boyer-Moore is that it takes longer to run as the pattern gets shorter, because a shorter pattern allows smaller skips through the text. This can significantly slow down anyone matching small strings in large bodies of text, for example those found in DNA sequencing.

I am implementing the Knuth-Morris-Pratt (KMP) and Rabin-Karp pattern matching algorithms. The end goal is a faster runtime than Unix grep when searching for small strings (e.g. those found in DNA sequencing, such as ACGT).
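For readers unfamiliar with KMP, here is a minimal sketch of the textbook algorithm. It is not the code in this repository: the names `build_failure_table` and `kmp_search` are hypothetical, and this form keeps a table as long as the pattern, so its memory profile may differ from the gss implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fail[i] is the length of the longest proper prefix
// of pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<std::size_t> build_failure_table(const std::string& pattern) {
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;  // length of the currently matched prefix
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Hypothetical search routine: returns the start index of every
// (possibly overlapping) occurrence of pattern in text.
std::vector<std::size_t> kmp_search(const std::string& text,
                                    const std::string& pattern) {
    std::vector<std::size_t> matches;
    if (pattern.empty() || pattern.size() > text.size()) return matches;

    const std::vector<std::size_t> fail = build_failure_table(pattern);
    std::size_t k = 0;  // number of pattern characters matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        // On a mismatch, fall back in the pattern instead of rewinding the text.
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == pattern.size()) {
            matches.push_back(i + 1 - k);
            k = fail[k - 1];  // keep going to find overlapping matches
        }
    }
    return matches;
}
```

Calling `kmp_search("ACGTACGT", "ACGT")` returns the start positions 0 and 4. The text index never moves backwards, which is why the running time does not degrade for short patterns.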

Why implement two algorithms?

The problem with KMP is that it has a space complexity of O(len text), which means that a very long portion of text could cause gss to segfault if it implemented only KMP. For that reason the Rabin-Karp algorithm has also been implemented, because it has a space complexity of O(1). The idea is that KMP will be run on every line of reasonable length, and when an exceptionally long line is encountered, Rabin-Karp will be run instead to prevent a segfault.

Realistically, there shouldn't be lines long enough to crash the KMP algorithm, but in the unlikely event that there are, Rabin-Karp has been implemented as a failsafe.
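To illustrate why Rabin-Karp suits the fallback role, here is a minimal rolling-hash sketch that counts matches using only a handful of integers of extra storage. It is an illustration rather than the code in this repository: the name `rabin_karp_count` and the choices of base (256) and modulus (1000000007) are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical routine: counts (possibly overlapping) occurrences of
// pattern in text using a rolling hash and constant extra space.
std::size_t rabin_karp_count(const std::string& text,
                             const std::string& pattern) {
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0 || m > n) return 0;

    const std::uint64_t base = 256;           // assumed: one digit per byte
    const std::uint64_t mod = 1000000007ULL;  // assumed: a large prime

    // high = base^(m-1) mod mod, the weight of the window's leading byte.
    std::uint64_t high = 1;
    for (std::size_t i = 1; i < m; ++i) high = high * base % mod;

    // Hash the pattern and the first window of the text.
    std::uint64_t pattern_hash = 0;
    std::uint64_t window_hash = 0;
    for (std::size_t i = 0; i < m; ++i) {
        pattern_hash = (pattern_hash * base +
                        static_cast<unsigned char>(pattern[i])) % mod;
        window_hash = (window_hash * base +
                       static_cast<unsigned char>(text[i])) % mod;
    }

    std::size_t count = 0;
    for (std::size_t i = 0;; ++i) {
        // Equal hashes can be a collision, so confirm with a direct compare.
        if (pattern_hash == window_hash && text.compare(i, m, pattern) == 0) {
            ++count;
        }
        if (i + m >= n) break;
        // Slide the window: drop text[i], then append text[i + m].
        const std::uint64_t out = static_cast<unsigned char>(text[i]);
        window_hash = (window_hash + mod - out * high % mod) % mod;
        window_hash = (window_hash * base +
                       static_cast<unsigned char>(text[i + m])) % mod;
    }
    return count;
}
```

The sketch returns a count rather than a list of positions so that its extra memory stays constant however long the line is; collecting every match position would grow with the number of matches.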

Benchmarks

Benchmarks will be posted here once a command line interface has been written.

maxgodfrey2004/gss
