wildcard search #154

Open
nickmorales opened this issue Dec 13, 2017 · 12 comments
Labels
BrAPI-Core (Related to BrAPI-Core) · New Parameter (Adding a new search/filter parameter) · Question (Outstanding question)

Comments

@nickmorales
Contributor

would be useful for germplasm and studies

@cpommier
Member

cpommier commented Feb 7, 2018

Is this a wildcard on a single specified field or a wildcard on all fields?

@guignonv
Member

guignonv commented Mar 6, 2018

This issue seems related to #199.

@BrapiCoordinatorSelby
Member

What about something like this added to the POST search object:

{
	paramOne: ["StRiNg*", "st*ng"],
	paramTwo: ["_ring", "str*_"],
	paramThree: ["StRiNg1", "string2"],

	searchConfig: {
		matchMethods: [
			{
				searchParameter: "paramOne",
				wildcard: true,
				caseSensitive: false,
				wildcardCharacter: "*"
			},
			{
				searchParameter: "paramTwo",
				wildcard: true,
				caseSensitive: true,
				wildcardCharacter: "_"
			},
			{
				searchParameter: "paramThree",
				wildcard: false,
				caseSensitive: true,
				wildcardCharacter: null
			}
		],
		sortMethods: [
			{
				searchParameter: "paramTwo",
				sortPriority: 1,
				sortAscending: true
			}
		]
	}
}

Adding something like this searchConfig object would allow a lot of flexibility without changing the existing calls too much. Those who do not need this level of flexibility could continue using the existing Search calls with searchConfig : null or not included. I would only add this to calls when needed (germplasm-search, studies-search) to limit the complexity of the server side code.
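
A minimal sketch of how a server might interpret the proposed matchMethods entries when filtering records in memory. The proposal does not say whether a wildcard stands for one character or many; this sketch assumes it matches any run of characters, including none. The helper names `compile_matcher` and `record_matches` are hypothetical, and parameters without a matchMethods entry are assumed to fall back to exact, case-sensitive matching.

```python
import re


def compile_matcher(method):
    """Build a predicate from one matchMethods entry (hypothetical helper)."""
    flags = 0 if method.get("caseSensitive", True) else re.IGNORECASE
    # Only honour wildcardCharacter when the entry switches wildcards on.
    wildcard = method.get("wildcardCharacter") if method.get("wildcard") else None

    def to_regex(term):
        if wildcard:
            # Escape each literal piece, then join the pieces with ".*".
            body = ".*".join(re.escape(part) for part in term.split(wildcard))
        else:
            body = re.escape(term)
        return re.compile(body, flags)

    def matches(value, terms):
        # A value matches when it fully matches at least one search term.
        return any(to_regex(term).fullmatch(value) for term in terms)

    return matches


def record_matches(record, request):
    """True when the record satisfies every search parameter in the request."""
    config = request.get("searchConfig") or {}  # null or absent: no special matching
    methods = {m["searchParameter"]: m for m in config.get("matchMethods", [])}
    for name, terms in request.items():
        if name == "searchConfig":
            continue
        # Assumed fallback: exact, case-sensitive matching.
        method = methods.get(name, {"wildcard": False, "caseSensitive": True})
        if not compile_matcher(method)(record.get(name, ""), terms):
            return False
    return True
```

With the first matchMethods entry above, a record whose paramOne is "string42" would match the term "StRiNg*", because matching for that parameter is case insensitive.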

Another alternative is to keep the existing calls as they are, simple and without any wildcard or match method, and pursue a complex-search structure as described in #193. This will be more complex to implement, but much more flexible.

@GuilhemSempere
Contributor

What about simply supporting regex searches? Are there any cases we wouldn't be able to deal with?

@BrapiCoordinatorSelby
Member

Yes, regex is definitely a valid option, though in most cases I would caution against it for performance reasons. In general, regex in SQL is very slow. So if we did want to support regex, I would still want to add an explicit flag in the request to tell the server that it should use a REGEXP query. That way, if regex was not explicitly requested, the server could use a much faster version of the search query.
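
The explicit-flag idea can be sketched as a query builder that only emits REGEXP when the client asks for it. This is an illustration under assumptions, not part of the spec: the `use_regex` flag, the `germplasm` table, and the column name are hypothetical, and SQLite is used only because it ships with Python. SQLite has no built-in REGEXP implementation, so the sketch registers one.

```python
import re
import sqlite3


def build_name_filter(term, use_regex=False):
    """Return a (where_clause, params) pair for a parameterized query."""
    if use_regex:
        # Slow path: taken only when regex is explicitly requested.
        return "germplasmName REGEXP ?", (term,)
    # Fast path: plain equality, which can use an ordinary index.
    return "germplasmName = ?", (term,)


def search(conn, term, use_regex=False):
    where, params = build_name_filter(term, use_regex)
    sql = f"SELECT germplasmName FROM germplasm WHERE {where} ORDER BY germplasmName"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
# In SQLite, "X REGEXP Y" calls the user-defined function regexp(Y, X).
conn.create_function(
    "regexp", 2, lambda pattern, value: re.search(pattern, value) is not None
)
conn.execute("CREATE TABLE germplasm (germplasmName TEXT)")
conn.executemany(
    "INSERT INTO germplasm VALUES (?)",
    [("pisang raja",), ("pisang mas",), ("cavendish",)],
)

exact_hits = search(conn, "pisang mas")
regex_hits = search(conn, "^pisang", use_regex=True)
```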

@BrapiCoordinatorSelby added the BrAPI-Core, New Parameter, and Question labels and removed the Enhancement label on Nov 18, 2021
@guignonv
Member

guignonv commented Oct 23, 2023

Following this morning's discussion, I propose a slightly different approach. Instead of replacing the simple text search with a regexp search in the specs, how about adding a (GET or POST) parameter called "operator" that defaults to a simple text search? Implementations that support other types of operators would publish the list somewhere (to be determined... serverinfo? the calls call?).
The operators we could define include:

  • "=": full case-sensitive match for text, and equality for numeric values
  • "!=": the negation of "="
  • "<", "<=", ">", ">=": for numeric values
  • "><": when a numeric value is between two others
  • "i=": full case-INsensitive match for text (DEFAULT for text)
  • "!i=": the negation of "i="
  • "contains": when the text contains the exact given text (case insensitive)
  • "has_word": when the text contains at least one word from a list of space-separated words
  • "has_all": when the text contains all the words from a list of space-separated words, in any order
  • "begins": when the text begins with the given input
  • "ends": when the text ends with the given input
  • "regex": when the text matches a given regex
  • "shorter_than" and "longer_than": when the text is shorter or longer than a given length
  • ...with their negative versions.

So, for instance, to find all germplasm whose name begins with "pisang", you could use /v2/germplasm/?germplasmName=pisang&operator=begins or /v2/germplasm/?germplasmName=^pisang.*$&operator=regex
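
One way to picture the "operator" parameter is a dispatch table on the server, sketched here for a subset of the text operators. The table and the `filter_by` helper are hypothetical. The proposal does not state the case sensitivity of "begins" and "ends", so this sketch assumes they are case insensitive like the proposed default, and it treats "regex" as a case-sensitive search.

```python
import re


def _words(text):
    return set(text.casefold().split())


# Each operator takes (stored value, search term) and returns True on a match.
OPERATORS = {
    "=": lambda value, term: value == term,
    "!=": lambda value, term: value != term,
    "i=": lambda value, term: value.casefold() == term.casefold(),
    "contains": lambda value, term: term.casefold() in value.casefold(),
    "has_word": lambda value, term: bool(_words(value) & _words(term)),
    "has_all": lambda value, term: _words(term) <= _words(value),
    "begins": lambda value, term: value.casefold().startswith(term.casefold()),
    "ends": lambda value, term: value.casefold().endswith(term.casefold()),
    "regex": lambda value, term: re.search(term, value) is not None,
}


def filter_by(records, field, term, operator="i="):
    """Keep the records whose field satisfies the requested operator."""
    if operator not in OPERATORS:
        # A server would report this as an unsupported operator.
        raise ValueError(f"unsupported operator: {operator!r}")
    test = OPERATORS[operator]
    return [record for record in records if test(record[field], term)]
```

Under these assumptions, `filter_by(records, "germplasmName", "pisang", "begins")` and `filter_by(records, "germplasmName", "^pisang.*$", "regex")` select the same records whenever the stored names are lowercase.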

@BrapiCoordinatorSelby
Member

I am thinking about something even simpler. I think just a simple wildcard character would solve >80% of the use cases. We are thinking about really flexible and powerful tools to cover many imaginary scenarios, but there are simple, real, problem scenarios right now that aren't getting solved.

/germplasm?wildCardCharacter=*&germplasmName=exampl*
POST /search/germplasm { "wildCardCharacter" : "*", "germplasmNames" : ["exampl*", "*xample", "*xampl*"]}

The default for wildCardCharacter is "" (an empty string), which indicates an exact match.

That should cover "beginsWith", "endsWith", "contains", and "exact" matches. Easy to add to the spec, easy to implement in most systems, and it is somewhat extendable later if we find a real need for more complex text matching.
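
To illustrate why this is easy to implement in most systems, here is a rough sketch that translates a term into a SQL LIKE pattern. The function name `wildcard_to_like` is hypothetical, and the sketch assumes a single-character wildcard and a database that supports `LIKE ... ESCAPE '\'`.

```python
def wildcard_to_like(term, wildcard=""):
    """Return (pattern, is_exact) for a search term.

    An empty wildcard, or a term without the wildcard, means an exact match,
    so the caller can use plain equality instead of LIKE.
    """
    if not wildcard or wildcard not in term:
        return term, True
    pieces = []
    for ch in term:
        if ch == wildcard:
            pieces.append("%")  # the wildcard matches any run of characters
        elif ch in "%_\\":
            pieces.append("\\" + ch)  # keep LIKE metacharacters literal
        else:
            pieces.append(ch)
    return "".join(pieces), False
```

Here "exampl*" becomes "exampl%" (begins with), "*xample" becomes "%xample" (ends with), and "*xampl*" becomes "%xampl%" (contains).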

@guignonv
Member

guignonv commented Oct 24, 2023

Fair enough. :) Question: how to choose between case sensitive and case insensitive match?

And it does solve the problem for text values but NOT when you want to filter attribute (numeric) values that are above, below or between some other value(s)... :-s

[edit]...and I'm quite sure there should be also a filter needed for dates somewhere!

@BrapiCoordinatorSelby
Member

BrapiCoordinatorSelby commented Oct 24, 2023

The general consensus in the hackathon discussion was to keep the specification simple until there was a concrete use case that couldn't be solved with the existing filters. In this case @jframi is dealing with millions of germplasm and needs to search by name. It is not practical to download a larger set and filter client side. I believe it was a similar situation that opened this issue originally.
I also brought up the point that if you need a high level of complexity in your API search/filtering, then perhaps something like a GraphQL API would be a better fit. We are working on BrAPI in GraphQL, which provides that level of filtering out-of-the-box, so we don't need to recreate it in the RESTful API.

To answer your questions more directly: I would leave it up to each server implementation to decide whether matching is case sensitive. If that becomes an issue, we can discuss adding it to the spec.

Regarding numbers and dates, the POST /search/images endpoint has some examples of using simple Max/Min parameters for searching numbers and RangeStart/RangeEnd parameters for searching date ranges.

"imageTimeStampRangeStart": "2018-01-01T14:47:23Z",
"imageTimeStampRangeEnd": "2018-12-12T23:47:23Z",
"imageWidthMax": 1920,
"imageWidthMin": 1280,

If the need arises, we can add these types of search fields for simple number and date filtering.
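
A small sketch of how a server could apply the Max/Min and RangeStart/RangeEnd parameters shown above. It assumes each image record carries `imageTimeStamp` and `imageWidth` fields, that all bounds are inclusive, and that a missing parameter imposes no limit; the helper names are hypothetical.

```python
from datetime import datetime


def parse_timestamp(text):
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def image_matches(image, search):
    """Apply the optional range filters to one image record."""
    taken = parse_timestamp(image["imageTimeStamp"])
    width = image["imageWidth"]
    if "imageTimeStampRangeStart" in search:
        if taken < parse_timestamp(search["imageTimeStampRangeStart"]):
            return False
    if "imageTimeStampRangeEnd" in search:
        if taken > parse_timestamp(search["imageTimeStampRangeEnd"]):
            return False
    if "imageWidthMin" in search and width < search["imageWidthMin"]:
        return False
    if "imageWidthMax" in search and width > search["imageWidthMax"]:
        return False
    return True
```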

@Gabriel-Besombes

General consensus at BrAPI Hackathon 2024 was :

  • Wild card : "*"
  • Escape character : ""

@cpommier
Member

thanks, backslash escape character isn't displayed though

@cpommier
Member

Wild card : "*"
Escape character : "\"
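
A possible reading of this convention, sketched as a term-to-regex translation. The thread fixes only the two characters, so the rest is assumption: "*" matches any run of characters, a backslash makes the next character literal, and a trailing backslash stands for itself. The function name `wildcard_to_regex` is hypothetical.

```python
import re


def wildcard_to_regex(term, wildcard="*", escape="\\"):
    """Compile a search term using "*" as wildcard and backslash as escape."""
    parts = []
    chars = iter(term)
    for ch in chars:
        if ch == escape:
            # Take the next character literally; a trailing escape is literal too.
            parts.append(re.escape(next(chars, escape)))
        elif ch == wildcard:
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)
```

For example, `wildcard_to_regex("exampl*").fullmatch("example")` finds a match, while the term `a\*b` matches only the literal text `a*b`.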


6 participants