Skip to content

Conversation

@ntviet18
Copy link
Contributor

@ntviet18 ntviet18 commented Oct 28, 2018

@fe2s , these changes will remove key column from Redis hashes if key.column is specified

Copy link
Contributor

@fe2s fe2s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Viet, I will take a closer look later. Please see some preliminary questions.

/** parameters **/
private val tableNameOpt: Option[String] = parameters.get(SqlOptionTableName)
private val keysPatternOpt: Option[String] = parameters.get(SqlOptionKeysPattern)
private val keysPrefixPattern = keysPatternOpt.orElse(tableNameOpt).getOrElse("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a dataKeyPattern() function that basically does the same. Also, I would rather use Option that an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

def tableKey(keysPrefixPattern: String, redisKey: String): String = {
if (keysPrefixPattern.isEmpty) {
redisKey
} else if (keysPrefixPattern.endsWith(":*")) {
Copy link
Contributor

@fe2s fe2s Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use kind of regex to extract an actual key value from the key? In this case we don't need to put any limitations on the key prefix pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's support asterisk pattern only for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even a compiled regex is pretty slow. While substring takes virtually no computation time. How about we create an option for user to choose? e.g. they can choose some pattern like (*):$tableName:*

Copy link
Contributor

@fe2s fe2s Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, let's make a limitation that a pattern should end with an asterisk, but the delimiter could be any char/string.

Copy link
Contributor

@fe2s fe2s Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the pattern could be "user:*" or "user#" or "user/"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, the check for :* is redundant, because .endsWith("*") is enough

2) "person:Peter"
```

The keys will not be persisted in Redis hashes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also update 'Reading Redis Hashes' and 'DataFrame options' sections

override def decodeRow(key: (String, String), value: Any, schema: => StructType,
inferSchema: Boolean, requiredColumns: Seq[String]): Row = {
val values = value match {
case v: JMap[String, String] => v.asScala.toSeq
Copy link
Contributor

@fe2s fe2s Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are compilation warnings related to type erasure. Can we fix them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make an explicit cast. That will not make the match safer, but at least the warnings will go aways

def load(pipeline: Pipeline, key: String, requiredColumns: Seq[String]): Unit

def encodeRow(value: Row): T
def encodeRow(key: (String, String), value: Row): T
Copy link
Contributor

@fe2s fe2s Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add javadoc for these methods. It is now not obvious why key is a tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

}
val rows = keys.zip(pipelineValues).map { case (key, value) =>
val keyMap = keyName -> tableKey(keysPrefixPattern, key)
persistence.decodeRow(keyMap, value, filteredSchema(requiredColumns),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize it executing filteredSchema() only once for all keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already cached in general case. It's little tricky with the inferSchema option. However, I think it's still possible to optimize more.

@ntviet18
Copy link
Contributor Author

@fe2s , thanks for your comments. Please see the changes

@fe2s
Copy link
Contributor

fe2s commented Oct 31, 2018

Looks good. Thanks.

@fe2s fe2s merged commit c348532 into RedisLabs:dataframe Oct 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants