Skip to content

Conversation

@MechCoder
Copy link
Contributor

  1. Remove sorting of indices and assume that the user gives a sorted tuple of indices, values etc
  2. Avoid iterating twice to get the indices and values if the argument provided is a dict.
  3. Add checks such that the length of the indices should be less than the size provided.

@MechCoder
Copy link
Contributor Author

@JoshRosen Were the pylint checks removed?

@MechCoder
Copy link
Contributor Author

I moved the buffer checks upward, so that it is easier to perform the (tuple / dict) conversion and check.

@SparkQA
Copy link

SparkQA commented Aug 1, 2015

Test build #39358 has finished for PR 7854 at commit f173c16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BytesToBytesMapIterator implements Iterator<Location>
    • class StopWordsRemover(override val uid: String)
    • public abstract class UnsafeKeyValueSorter
    • abstract class InternalRow extends GenericSpecializedGetters with Serializable
    • trait GenericSpecializedGetters extends SpecializedGetters
    • class SpecificUnsafeProjection extends $
    • abstract class UnsafeRowJoiner
    • |class SpecificUnsafeRowJoiner extends $
    • case class GetArrayItem(child: Expression, ordinal: Expression)
    • case class GetMapValue(child: Expression, key: Expression)
    • case class SubstringIndex(strExpr: Expression, delimExpr: Expression, countExpr: Expression)
    • class ArrayBasedMapData(val keyArray: ArrayData, val valueArray: ArrayData) extends MapData
    • class GenericArrayData(array: Array[Any]) extends ArrayData with GenericSpecializedGetters
    • abstract class MapData extends Serializable
    • public abstract class KVIterator<K, V>

@MechCoder
Copy link
Contributor Author

@jkbradley should this find a place in 1.5 or can this wait?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

items() has no guarantees on the ordering of the keys... Is it okay that indices may not be sorted after this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should use an OrderedDict here

@feynmanliang
Copy link
Contributor

It looks like the Python constructors enforce sorted indices (consistent with Scala) only with numeric (and not String) args. This should be made consistent, but it's not clear whether we should guarantee ordering at the cost of sorting overhead or relax this constraint at the risk of breaking code elsewhere.

We should clarify and clearly document if there is any guarantee on the ordering of indices, since features like #7794 depend on this ordering.

CC @mengxr @jkbradley

@MechCoder
Copy link
Contributor Author

might not be able to work on this. feel free to cherry pick and complete.

@MechCoder MechCoder closed this Oct 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants