Hash key #105
Hi Viet, I will take a closer look later. Please see some preliminary questions.
| /** parameters **/ | ||
| private val tableNameOpt: Option[String] = parameters.get(SqlOptionTableName) | ||
| private val keysPatternOpt: Option[String] = parameters.get(SqlOptionKeysPattern) | ||
| private val keysPrefixPattern = keysPatternOpt.orElse(tableNameOpt).getOrElse("") |
There is a dataKeyPattern() function that basically does the same. Also, I would rather use Option than an empty string.
agree
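A minimal sketch of the `Option` suggestion, for illustration only. The option names `"table"` and `"keys.pattern"` are assumed placeholder values for `SqlOptionTableName` and `SqlOptionKeysPattern`, and the `KeyPatternOptions` object is hypothetical, not part of the PR:

```scala
object KeyPatternOptions {
  // Assumed stand-ins for the SqlOptionTableName / SqlOptionKeysPattern constants.
  val SqlOptionTableName = "table"
  val SqlOptionKeysPattern = "keys.pattern"

  /** Explicit keys pattern if set, else the table name, else None (no empty-string sentinel). */
  def keysPrefixPattern(parameters: Map[String, String]): Option[String] =
    parameters.get(SqlOptionKeysPattern).orElse(parameters.get(SqlOptionTableName))
}
```

Callers then pattern match on `Some`/`None` instead of checking `isEmpty` on a string.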
| def tableKey(keysPrefixPattern: String, redisKey: String): String = { | ||
| if (keysPrefixPattern.isEmpty) { | ||
| redisKey | ||
| } else if (keysPrefixPattern.endsWith(":*")) { |
Can we use some kind of regex to extract the actual key value from the key? In that case we don't need to put any limitations on the key prefix pattern.
Let's support asterisk pattern only for now.
I think even a compiled regex is pretty slow, while substring takes virtually no computation time. How about we create an option for the user to choose? E.g., they can choose a pattern like (*):$tableName:*
Right, let's add the limitation that a pattern must end with an asterisk, but the delimiter could be any char/string.
So the pattern could be "user:*" or "user#" or "user/"
Agree, the check for :* is redundant, because .endsWith("*") is enough
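The outcome of this thread could be sketched roughly as below: a single `endsWith("*")` check and plain prefix stripping, with no regex. This is an illustration, not the code merged in the PR. The `TableKey` object is hypothetical, the `Option` parameter follows the earlier review suggestion, and treating a pattern without a trailing asterisk as "nothing to strip" is an assumption of this sketch:

```scala
object TableKey {
  /**
    * Strips the literal prefix of an asterisk-terminated pattern from a Redis key.
    * The delimiter before the asterisk can be any character or string.
    */
  def tableKey(keysPrefixPattern: Option[String], redisKey: String): String =
    keysPrefixPattern match {
      case Some(pattern) if pattern.endsWith("*") =>
        // Drop the trailing "*" to get the literal prefix, e.g. "person:*" -> "person:".
        val prefix = pattern.dropRight(1)
        // stripPrefix returns the key unchanged if it does not start with the prefix.
        redisKey.stripPrefix(prefix)
      case _ =>
        // No pattern, or a pattern without a trailing asterisk: keep the full key.
        redisKey
    }
}
```

For example, `TableKey.tableKey(Some("person:*"), "person:Peter")` yields `"Peter"`.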
| 2) "person:Peter" | ||
| ``` | ||
|
|
||
| The keys will not be persisted in Redis hashes |
I think we should also update 'Reading Redis Hashes' and 'DataFrame options' sections
| override def decodeRow(key: (String, String), value: Any, schema: => StructType, | ||
| inferSchema: Boolean, requiredColumns: Seq[String]): Row = { | ||
| val values = value match { | ||
| case v: JMap[String, String] => v.asScala.toSeq |
There are compilation warnings related to type erasure. Can we fix them?
I think we can make an explicit cast. That will not make the match safer, but at least the warnings will go away.
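One way the explicit cast could look, as a sketch: match on the erased type `JMap[_, _]` and cast afterwards. The `ErasureCast` object and `toPairs` name are hypothetical, and the fallback branch is an assumption, since the remaining cases of the original match are not shown in this thread. `scala.collection.JavaConverters` is used here because it is available in Scala 2.11 and 2.12; on 2.13 it is deprecated in favor of `scala.jdk.CollectionConverters`.

```scala
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

object ErasureCast {
  def toPairs(value: Any): Seq[(String, String)] = value match {
    // Match on the erased shape only; type arguments cannot be checked at runtime.
    case v: JMap[_, _] =>
      // Unchecked cast: avoids the erasure warning but does not validate the entries.
      v.asInstanceOf[JMap[String, String]].asScala.toSeq
    case other =>
      // Assumed fallback for this sketch.
      throw new IllegalArgumentException(s"Unsupported value: $other")
  }
}
```

As noted in the comment above, a map with non-string entries would still pass the match and fail later, when an entry is actually read as a `String`.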
| def load(pipeline: Pipeline, key: String, requiredColumns: Seq[String]): Unit | ||
|
|
||
| def encodeRow(value: Row): T | ||
| def encodeRow(key: (String, String), value: Row): T |
Could you please add javadoc for these methods? It is not obvious now why key is a tuple.
sure
| } | ||
| val rows = keys.zip(pipelineValues).map { case (key, value) => | ||
| val keyMap = keyName -> tableKey(keysPrefixPattern, key) | ||
| persistence.decodeRow(keyMap, value, filteredSchema(requiredColumns), |
We can optimize it by executing filteredSchema() only once for all keys.
It's already cached in the general case. It's a little tricky with the inferSchema option. However, I think it's still possible to optimize more.
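The hoisting idea can be illustrated with simplified stand-in types. In this sketch a schema is just a list of column names and a row is a list of optional values; `HoistSchema` and `decodeAll` are hypothetical names, and the `inferSchema` case mentioned above is deliberately left out:

```scala
object HoistSchema {
  // Hypothetical stand-in: a schema is modeled as a list of column names.
  type Schema = Seq[String]

  def decodeAll(keys: Seq[String],
                values: Seq[Map[String, String]],
                filteredSchema: () => Schema): Seq[Seq[Option[String]]] = {
    // Evaluate the schema once, before iterating, instead of once per key.
    val schema = filteredSchema()
    keys.zip(values).map { case (_, value) =>
      // Project each value onto the precomputed schema columns.
      schema.map(column => value.get(column))
    }
  }
}
```

With this shape, the schema function runs once regardless of how many keys are decoded.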
@fe2s, thanks for your comments. Please see the changes.
Looks good. Thanks.
@fe2s, these changes will remove the key column from Redis hashes if key.column is specified.