Skip to content
/ strparam Public

parameterized pattern matching (faster alternative to golang regexp)

License

Notifications You must be signed in to change notification settings

gebv/strparam

Repository files navigation

strparam

CI Status Go Report Card codecov

40 times faster аlternative to regex for string matching by pattern and extract params. This is solution as a middle point between simple strings and regular expressions.

Features

Introduction

For example. Need to parse the following pattern foo=(..), baz=(..), golang. Instead of .. can be any value. With regexp, the solution would look something like this.

in := "foo=(bar), baz=(日本語), golang"
re := regexp.MustCompile(`foo=\((.*)\), baz=\((.*)\), golang`)
re.FindAllStringSubmatch(str, -1)
// [[foo=(bar), baz=(日本語), golang bar 日本語]]

On the playground

Or even like this.

in := "foo=(bar), baz=(日本語), golang"
re := regexp.MustCompile(`\(([^)]+)\)`)
rex.FindAllStringSubmatch(str, -1)
// [[(bar) bar] [(日本語) 日本語]]

On the playground

But regular expressions is slow on golang.

Follow the benchmarks for naive solution on regexp (see above) and method Loockup for parsed patterns.

BenchmarkParamsViaRegexp1
BenchmarkParamsViaRegexp1-4                	   23230	     56140 ns/op	   19258 B/op	       5 allocs/op
BenchmarkParamsViaRegexp2
BenchmarkParamsViaRegexp2-4                	   52396	     23079 ns/op	   28310 B/op	       8 allocs/op
BenchmarkParamsViaStrparam_NumParams2
BenchmarkParamsViaStrparam_NumParams2-4    	  315464	      3467 ns/op	     295 B/op	       1 allocs/op
BenchmarkParamsViaStrparam_NumParams5
BenchmarkParamsViaStrparam_NumParams5-4    	  193682	      5444 ns/op	     296 B/op	       1 allocs/op
BenchmarkParamsViaStrparam_NumParams20
BenchmarkParamsViaStrparam_NumParams20-4   	   72276	     18467 ns/op	     297 B/op	       1 allocs/op

Faster solution.

in := "foo=(bar), baz=(日本語), golang"
s, _ := Parse("foo=({p1}), baz=({p2}), golang")
found, params := s.Lookup(in)
// true [{Name:p1 Value:bar} {Name:p2 Value:日本語}]

On the playground

Multiple pattern match

Performing multiple pattern match for input string. To use a variety of patterns.

At same level the patterns are sorted (by number of childs and by length constatnt token value) from top to down

Sorting rules:

  • CONST type token has the highest weight
  • longer CONST type token has a higher weight
  • token with more childs has a higher weight

TODO: more details on engine a multiple pattern matching

r := NewStore()
r.Add("foo2{p1}foo2{p2}golang")
r.Add("foo1{p3}foo1{p4}golang")

in := "foo1XXXfoo1YYYgolang"

schema := r.Find(in)
found, params := schema.Lookup(in)

Follow the benchmarks for method Store.Find (without extracting parameters).

BenchmarkStore_Lookup_2_2
BenchmarkStore_Lookup_2_2-4                	  255735	      4071 ns/op	     160 B/op	       2 allocs/op
BenchmarkStore_Lookup_2_102
BenchmarkStore_Lookup_2_102-4              	  108709	     12170 ns/op	     160 B/op	       2 allocs/op

On the playground

Guide

Installation

go get github.com/gebv/strparam

Example

Example for a quick start.

package main

import (
	"fmt"

	"github.com/gebv/strparam"
)

func main() {
	in := "foo=(bar), baz=(日本語), golang"
	s, _ := strparam.Parse("foo=({p1}), baz=({p2}), golang")
	ok, params := s.Lookup(in)
    fmt.Printf("%v %+v", ok, params)
}

On the playground

How does it work?

Pattern is parse into array of

  • tokens with offset information in bytes for constants.
  • tokens with information of parameter (parameter name and other information).

This pattern foo=({p1}), baz=({p2}), golang looks like an array

[
    {Mode:begin}
    {Mode:pattern Len:5 Raw:"foo=("} // constant
    {Mode:parameter Raw:"{p1}"}
    {Mode:pattern Len:8 Raw:"), baz=("}
    {Mode:parameter Raw:"{p2}"}
    {Mode:pattern Len:9 Raw:"), golang"}
    {Mode:end}
]

At the time of parsing the incoming string move around the token array if each token matches. Moving from token to token, we keep the general offset (matching shift). For parameters, look for the next constant (search window) or end of line.

Prefix-tree is used to store the list of patterns.

For example the follow next patterns:

  • foo{p1}bar
  • foo{p1}baz
root
    └── foo
        └── {p1}
	        ├── bar
	        └── baz

As parsing incoming string we are moving to deep in the tree.

TODO

  • multiple patterns, lookup and extract params
  • extend parameters for internal validators, eg {paramName required, len=10}
  • external validators via hooks
  • stream parser
  • sets weight for equal childs (for sorting), eg {paramName1 weight=100}, {paramName2 weight=200} (specific case?)

License

MIT