-
-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vinai/load data - Part 1 very basic functionality review #316
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start, but needs a different approach. See comments
enginetest/load_queries.go
Outdated
} | ||
|
||
// Creates a directory of files | ||
func CreateDummyFiles(dir string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a confusing way to set up test data. The test data should be declared alongside the test scenario.
My suggestion is to just check files into source control in a testdata directory, and then issue actual LOAD DATA
commands for the various files in the setup portion of the scripts above.
server/handler.go
Outdated
return err | ||
} | ||
} | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should die loudly here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to do this. Just return an error if it's not local
sql/parse/parse.go
Outdated
return nil, err | ||
} | ||
|
||
ignoreNumVal, ok = ign.(int8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be larger than 255 right?
sql/parse/parse.go
Outdated
} | ||
} | ||
|
||
return plan.NewLoadData(bool(d.Local), d.Infile, unresolvedTable, columnsToStrings(d.Columns), d.Fields, d.Lines, ignoreNumVal), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good first draft, but what you really need to do is create a new Insert node with the LoadData node as the data source. You need the Insert to be the top-level node so that other logic (like type conversions, trigger application, constraint checking) etc. just works. Otherwise you'll end up duplicating the insert logic anywhere, as a bunch of your TODOs mention)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting closer, but still a couple major issues with this approach. See commentsd
sql/numbertype.go
Outdated
@@ -237,9 +237,15 @@ func (t numberTypeImpl) Convert(v interface{}) (interface{}, error) { | |||
} | |||
return uint32(num), nil | |||
case sqltypes.Int32: | |||
// If empty return the nil value. | |||
if v == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this is true for all the integer types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the MySQL behavior:
insert into mytable (i) values ("");
ERROR 1366 (HY000): Incorrect integer value: '' for column 'i' at row 1
sql/plan/load_data.go
Outdated
) | ||
|
||
var ( | ||
fieldsTerminatedByDelim = "\t" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't these settings in the query?
Vinai: There are defaults that mysql uses when nothing is specified.
sql/plan/load_data.go
Outdated
func (l *LoadData) String() string { | ||
pr := sql.NewTreePrinter() | ||
|
||
_ = pr.WriteNode("LOAD DATA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably just do LOAD DATA (filename)
sql/plan/load_data.go
Outdated
type LoadData struct { | ||
Local bool | ||
File string | ||
Destination sql.Node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer want this here
Vinai: I use this to infer some schema to determine the right default values.
sql/plan/load_data.go
Outdated
return []sql.Node{l.Destination} | ||
} | ||
|
||
func (l *LoadData) RowIter(ctx *sql.Context, row sql.Row) (sql.RowIter, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want to scan file lines one at a time as you load rows, not all at once. Otherwise we read the entire file before we begin making progress inserting rows, and we need the entire file in memory (which may be massive). So open the file and scanner in RowIter, then return a sql.RowIter implementation that reads lines off of it one at a time.
sql/plan/load_data.go
Outdated
if l.Lines != nil { | ||
ll := l.Lines | ||
if ll.StartingBy != nil { | ||
linesStartingByDelim = string(ll.StartingBy.Val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong approach. These vary by statement, so make them fields on the struct, not global vars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a little more work, the tests are a bit lacking
enginetest/load_queries.go
Outdated
{ | ||
Name: "Basic load data with enclosed values.", | ||
SetUpScript: []string{ | ||
fmt.Sprintf("create table %s(pk int primary key)", tableNameConst), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really no reason to have this const. Just hardcode it everywhere it's needed. Makes the tests easier to read
}, | ||
}, | ||
{ | ||
Name: "Load data with csv with prefix.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a test of loading data into a table that doesn't exist?
server/handler.go
Outdated
return err | ||
} | ||
} | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to do this
sql/numbertype.go
Outdated
@@ -155,7 +156,7 @@ func (t numberTypeImpl) Compare(a interface{}, b interface{}) (int, error) { | |||
|
|||
// Convert implements Type interface. | |||
func (t numberTypeImpl) Convert(v interface{}) (interface{}, error) { | |||
if v == nil { | |||
if v == nil || (reflect.ValueOf(v).Kind() == reflect.Ptr && reflect.ValueOf(v).IsNil()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, this actually necessary?
sql/plan/load_data.go
Outdated
} | ||
|
||
// updateParsingConsts parses the LoadData object to update the 5 constants that are used for file parsing. | ||
func (l *LoadData) updateParsingConsts() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method name and comment are no longer accurate
sql/plan/load_data.go
Outdated
} | ||
|
||
// parseLines finds the delim that terminates each line and returns the overall line. | ||
func (l LoadData) parseLines(scanner *bufio.Scanner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bad name for this method.
Just make the splitFunc function top level, rename it splitLines
, and call scanner.Split(splitLines)
in RowIter
sql/plan/load_data.go
Outdated
} | ||
|
||
func (l loadDataIter) Close(ctx *sql.Context) error { | ||
if !l.scanner.Scan() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who cares if the scanner is done or not, this method has to close everything up
sql/plan/load_data.go
Outdated
|
||
func addNullsToValues(exprs []sql.Expression, diff int) []sql.Expression { | ||
for i := diff; i > 0; i-- { | ||
exprs = append(exprs, expression.NewLiteral(nil, sql.Null)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, just make the expr array the same size as the dest schema and the remaining entries will already be nil
sql/plan/load_data.go
Outdated
// create the values that are returned as a row iter. | ||
var values [][]sql.Expression | ||
values = append(values, exprs) | ||
newValue := NewValues(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a lot of garbage just to return a sql.Row. Just return the row directly
sql/plan/load_data.go
Outdated
colDiff := len(l.destination.Schema()) - len(exprs) | ||
|
||
// append NULLS for the rest of the fields | ||
exprs = addNullsToValues(exprs, colDiff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a test of this -- all the test data files have the same number of columns as their target schema
…at the problem is. Looks like bytes to string in quotes chars is causinng problems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG, just a couple comments
enginetest/script_queries.go
Outdated
@@ -33,12 +33,16 @@ type ScriptTest struct { | |||
Expected []sql.Row | |||
// For tests that make a single assertion, ExpectedErr can be set for the expected error | |||
ExpectedErr *errors.Kind | |||
// For tests that need to be skipped | |||
Skip bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do this, put tests that need to be skipped into their own blocks and call them from a different method (TestBrokenLoadFile)
server/handler.go
Outdated
return err | ||
} | ||
} | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to do this. Just return an error if it's not local
enginetest/enginetests.go
Outdated
@@ -460,6 +460,22 @@ func TestInsertIntoErrors(t *testing.T, harness Harness) { | |||
} | |||
} | |||
|
|||
func TestLoadData(t *testing.T, harness Harness) { | |||
for _, script := range LoadDataScripts { | |||
if !script.Skip { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment below about the right way to do skipped tests.
Also, whenever you skip a test in go, you need to explicitly call t.skip(), not just fail to run it. Otherwise we have no visibility into how many tests are being skipped like this.
|
||
var LoadDataScripts = []ScriptTest{ | ||
{ | ||
Name: "Basic load data with enclosed values.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need tests for when the file and the target schema have different number of columns
This PR introduces basic functionality for LOAD DATA.