Vinai/load data - Part 1 very basic functionality review #316

VinaiRachakonda · 2021-02-26T01:14:42Z

This PR introduces basic functionality for LOAD DATA.

zachmu

Good start, but needs a different approach. See comments

zachmu · 2021-03-01T20:01:50Z

enginetest/load_queries.go

+}
+
+// Creates a directory of files
+func CreateDummyFiles(dir string) error {


This is a confusing way to set up test data. The test data should be declared alongside the test scenario.

My suggestion is to just check files into source control in a testdata directory, and then issue actual LOAD DATA commands for the various files in the setup portion of the scripts above.

zachmu · 2021-03-01T20:06:14Z

server/handler.go

+				return err
+			}
+		}
+	default:


Should die loudly here

Still need to do this

Still need to do this. Just return an error if it's not local

zachmu · 2021-03-01T20:13:04Z

sql/parse/parse.go

+			return nil, err
+		}
+
+		ignoreNumVal, ok = ign.(int8)


This can be larger than 127 (the int8 maximum), right?
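
A minimal sketch of one way to avoid the narrow assertion, assuming the parser may hand back any signed integer type for the IGNORE count. The helper name `ignoreCount` is hypothetical and is not part of this PR:

```go
package main

import "fmt"

// ignoreCount is a hypothetical helper that widens whatever signed integer
// type the parser produced into an int64, instead of asserting int8.
func ignoreCount(v interface{}) (int64, error) {
	switch n := v.(type) {
	case int8:
		return int64(n), nil
	case int16:
		return int64(n), nil
	case int32:
		return int64(n), nil
	case int64:
		return n, nil
	case int:
		return int64(n), nil
	default:
		// Anything else is reported rather than silently ignored.
		return 0, fmt.Errorf("unsupported IGNORE count type %T", v)
	}
}

func main() {
	n, err := ignoreCount(int64(1000))
	fmt.Println(n, err)
}
```

With this shape, a statement that ignores 1000 lines no longer fails the type assertion.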

zachmu · 2021-03-01T20:21:20Z

sql/parse/parse.go

+		}
+	}
+
+	return plan.NewLoadData(bool(d.Local), d.Infile, unresolvedTable, columnsToStrings(d.Columns), d.Fields, d.Lines, ignoreNumVal), nil


This is a good first draft, but what you really need to do is create a new Insert node with the LoadData node as the data source. The Insert needs to be the top-level node so that other logic (type conversions, trigger application, constraint checking, etc.) just works. Otherwise you'll end up duplicating the insert logic everywhere, as a bunch of your TODOs mention.

zachmu

Getting closer, but still a couple of major issues with this approach. See comments.

zachmu · 2021-03-01T23:36:32Z

sql/numbertype.go

@@ -237,9 +237,15 @@ func (t numberTypeImpl) Convert(v interface{}) (interface{}, error) {
 		}
 		return uint32(num), nil
 	case sqltypes.Int32:
+		// If empty return the nil value.
+		if v == "" {


Presumably this is true for all the integer types?

This isn't the MySQL behavior:

insert into mytable (i) values ("");
ERROR 1366 (HY000): Incorrect integer value: '' for column 'i' at row 1
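
A rough sketch of the stricter behavior, assuming the value arrives as a string. `convertInt32` is a hypothetical stand-in and not the `numberTypeImpl.Convert` method from this PR; it only illustrates rejecting the empty string instead of mapping it to nil:

```go
package main

import (
	"fmt"
	"strconv"
)

// convertInt32 is a hypothetical strict conversion: any string that is not
// a valid 32-bit integer, including the empty string, yields an error.
func convertInt32(s string) (int32, error) {
	n, err := strconv.ParseInt(s, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("incorrect integer value: '%s'", s)
	}
	return int32(n), nil
}

func main() {
	_, err := convertInt32("")
	fmt.Println(err)
}
```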

zachmu · 2021-03-01T23:58:39Z

sql/plan/load_data.go

+)
+
+var (
+	fieldsTerminatedByDelim = "\t"


Aren't these settings in the query?

Vinai: There are defaults that mysql uses when nothing is specified.

zachmu · 2021-03-01T23:59:20Z

sql/plan/load_data.go

+func (l *LoadData) String() string {
+	pr := sql.NewTreePrinter()
+
+	_ = pr.WriteNode("LOAD DATA")


Probably just do LOAD DATA (filename)

zachmu · 2021-03-01T23:59:32Z

sql/plan/load_data.go

+type LoadData struct {
+	Local              bool
+	File               string
+	Destination        sql.Node


No longer want this here

Vinai: I use this to infer some schema to determine the right default values.

zachmu · 2021-03-02T00:01:28Z

sql/plan/load_data.go

+	return []sql.Node{l.Destination}
+}
+
+func (l *LoadData) RowIter(ctx *sql.Context, row sql.Row) (sql.RowIter, error) {


You want to scan file lines one at a time as you load rows, not all at once. Otherwise we read the entire file before we begin making progress inserting rows, and we need the entire file in memory (which may be massive). So open the file and scanner in RowIter, then return a sql.RowIter implementation that reads lines off of it one at a time.

zachmu · 2021-03-02T00:02:17Z

sql/plan/load_data.go

+	if l.Lines != nil {
+		ll := l.Lines
+		if ll.StartingBy != nil {
+			linesStartingByDelim = string(ll.StartingBy.Val)


This is the wrong approach. These vary by statement, so make them fields on the struct, not global vars
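
One possible shape for per-statement settings, as a sketch. The `loadOptions` type and `newLoadOptions` constructor are hypothetical names; the defaults shown (tab-terminated fields, newline-terminated lines, no line prefix) are the MySQL defaults mentioned earlier in the thread:

```go
package main

import "fmt"

// loadOptions is a hypothetical struct carrying the parsing settings for
// one LOAD DATA statement, so two statements never share state.
type loadOptions struct {
	fieldsTerminatedBy string
	linesTerminatedBy  string
	linesStartingBy    string
}

// newLoadOptions starts from the defaults and overrides only the settings
// that the statement actually specified (nil means "not specified").
func newLoadOptions(fieldsTerminatedBy, linesStartingBy *string) loadOptions {
	opts := loadOptions{fieldsTerminatedBy: "\t", linesTerminatedBy: "\n"}
	if fieldsTerminatedBy != nil {
		opts.fieldsTerminatedBy = *fieldsTerminatedBy
	}
	if linesStartingBy != nil {
		opts.linesStartingBy = *linesStartingBy
	}
	return opts
}

func main() {
	comma := ","
	csv := newLoadOptions(&comma, nil)
	plain := newLoadOptions(nil, nil)
	// The second statement still sees the default, unaffected by the first.
	fmt.Printf("%q %q\n", csv.fieldsTerminatedBy, plain.fieldsTerminatedBy)
}
```

With package-level variables, the comma from the first statement would leak into the second; with fields, each statement carries its own values.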

zachmu

Needs a little more work; the tests are a bit lacking.

zachmu · 2021-03-03T01:39:54Z

enginetest/load_queries.go

+	{
+		Name: "Basic load data with enclosed values.",
+		SetUpScript: []string{
+			fmt.Sprintf("create table %s(pk int primary key)", tableNameConst),


Really no reason to have this const. Just hardcode it everywhere it's needed; that makes the tests easier to read.

zachmu · 2021-03-03T01:40:14Z

enginetest/load_queries.go

+		},
+	},
+	{
+		Name: "Load data with csv with prefix.",


What about a test of loading data into a table that doesn't exist?

zachmu · 2021-03-03T01:40:45Z

server/handler.go

+				return err
+			}
+		}
+	default:


Still need to do this

zachmu · 2021-03-03T01:41:03Z

sql/numbertype.go

@@ -155,7 +156,7 @@ func (t numberTypeImpl) Compare(a interface{}, b interface{}) (int, error) {

 // Convert implements Type interface.
 func (t numberTypeImpl) Convert(v interface{}) (interface{}, error) {
-	if v == nil {
+	if v == nil || (reflect.ValueOf(v).Kind() == reflect.Ptr && reflect.ValueOf(v).IsNil()) {


Oof, is this actually necessary?

zachmu · 2021-03-03T01:42:35Z

sql/plan/load_data.go

+}
+
+// updateParsingConsts parses the LoadData object to update the 5 constants that are used for file parsing.
+func (l *LoadData) updateParsingConsts() error {


This method name and comment are no longer accurate

zachmu · 2021-03-03T01:44:41Z

sql/plan/load_data.go

+}
+
+// parseLines finds the delim that terminates each line and returns the overall line.
+func (l LoadData) parseLines(scanner *bufio.Scanner) {


This is a bad name for this method.

Just make the splitFunc function top level, rename it splitLines, and call scanner.Split(splitLines) in RowIter

zachmu · 2021-03-03T01:46:18Z

sql/plan/load_data.go

+}
+
+func (l loadDataIter) Close(ctx *sql.Context) error {
+	if !l.scanner.Scan() {


Who cares if the scanner is done or not, this method has to close everything up

zachmu · 2021-03-03T01:48:30Z

sql/plan/load_data.go

+
+func addNullsToValues(exprs []sql.Expression, diff int) []sql.Expression {
+	for i := diff; i > 0; i-- {
+		exprs = append(exprs, expression.NewLiteral(nil, sql.Null))


Instead of this, just make the expr array the same size as the dest schema and the remaining entries will already be nil
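
As a small illustration of the sizing idea, assuming plain `[]interface{}` as a stand-in for the row or expression slice. `rowFromFields` is a hypothetical helper, not a function from this PR:

```go
package main

import "fmt"

// rowFromFields copies the parsed fields into a slice that is allocated at
// the destination schema's length. Positions beyond the parsed fields keep
// their zero value, which is nil, so no explicit padding loop is needed.
func rowFromFields(fields []string, schemaLen int) []interface{} {
	row := make([]interface{}, schemaLen)
	// Extra fields beyond the schema are dropped in this sketch; a real
	// implementation might prefer to report them.
	for i := 0; i < len(fields) && i < schemaLen; i++ {
		row[i] = fields[i]
	}
	return row
}

func main() {
	row := rowFromFields([]string{"1", "a"}, 4)
	fmt.Println(len(row), row[2] == nil, row[3] == nil)
}
```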

zachmu · 2021-03-03T01:49:58Z

sql/plan/load_data.go

+	// create the values that are returned as a row iter.
+	var values [][]sql.Expression
+	values = append(values, exprs)
+	newValue := NewValues(values)


This creates a lot of garbage just to return a sql.Row. Just return the row directly

zachmu · 2021-03-03T01:50:24Z

sql/plan/load_data.go

+	colDiff := len(l.destination.Schema()) - len(exprs)
+
+	// append NULLS for the rest of the fields
+	exprs = addNullsToValues(exprs, colDiff)


I don't see a test of this -- all the test data files have the same number of columns as their target schema

zachmu

LG, just a couple comments

zachmu · 2021-03-08T20:25:28Z

enginetest/script_queries.go

@@ -33,12 +33,16 @@ type ScriptTest struct {
 	Expected []sql.Row
 	// For tests that make a single assertion, ExpectedErr can be set for the expected error
 	ExpectedErr *errors.Kind
+	// For tests that need to be skipped
+	Skip bool


Don't do this. Put tests that need to be skipped into their own blocks and call them from a different method (TestBrokenLoadFile).

zachmu · 2021-03-08T20:26:09Z

server/handler.go

+				return err
+			}
+		}
+	default:


Still need to do this. Just return an error if it's not local

zachmu · 2021-03-08T20:27:25Z

enginetest/enginetests.go

@@ -460,6 +460,22 @@ func TestInsertIntoErrors(t *testing.T, harness Harness) {
 	}
 }

+func TestLoadData(t *testing.T, harness Harness) {
+	for _, script := range LoadDataScripts {
+		if !script.Skip {


See comment below about the right way to do skipped tests.

Also, whenever you skip a test in Go, you need to explicitly call t.Skip(), not just fail to run it. Otherwise we have no visibility into how many tests are being skipped like this.

zachmu · 2021-03-08T20:31:17Z

enginetest/load_queries.go

+
+var LoadDataScripts = []ScriptTest{
+	{
+		Name: "Basic load data with enclosed values.",


Need tests for when the file and the target schema have different number of columns

VinaiRachakonda added 12 commits February 22, 2021 15:51

changes

2a42870

merge master

75b8474

I get how inserting works now LOL

a261dd2

Add functionality that integrates with temporary file

f9636f1

Allow the reading of a file

669391b

Additional changes

6323c26

basic parsing implementation

4072935

changes

0763d44

Additional changes

8537507

formatting

12ea3f3

Add comments and allow for nulls to be placed with columns

b4e5b3d

update test case

147165f

VinaiRachakonda changed the title from "[WIP] Vinai/load data" to "Vinai/load data - Part 1 very basic functionality review" Mar 1, 2021

zachmu reviewed Mar 1, 2021

VinaiRachakonda added 2 commits March 1, 2021 15:04

updates

bc9c1a2

partial inserts

f3bb9f2

zachmu reviewed Mar 2, 2021

VinaiRachakonda added 8 commits March 1, 2021 16:44

can handle partial case

6d4816a

works with iter now

89e0bac

nits

da428bc

nits

b69e32c

Merge branch 'master' into vinai/load-data

2f6feab

add to load qs to see status

0116884

try with binaries

a8f2b6f

cleanup

4e2145c

zachmu reviewed Mar 3, 2021

VinaiRachakonda added 4 commits March 3, 2021 17:40

pr feedback

02c2e1b

add some escaping behavior and skipped tests. Need to commit

2be9230

Having trouble with escaped experimenting with some changes to see what the problem is. Looks like bytes to string in quotes chars is causing problems

8c82dbe

Add session variables, pray this works on windows

eed8383

VinaiRachakonda added 4 commits March 4, 2021 17:05

Merge branch 'master' into vinai/load-data

e5ae5c8

test fix and formatting

bcbb5ed

add more tests

11ad282

Formatting

d0add13

zachmu approved these changes Mar 8, 2021

last min pr review

2153366

VinaiRachakonda merged commit 320ebfb into master Mar 8, 2021
