Querying Unity Catalog with DuckDB UC extension requires retrying the first SQL statement #8

edwardsdm · 2024-07-10T20:44:57Z

The information below has been share with the Databricks UC engineering team. They suggested the issue is likely with the DuckDB UC extension.

Credential serving is enabled on the Databricks instance the code is connecting to.

Duckdb CLI installed on MacOS - Retry SHOW TABLES query

R - Retry SHOW TABLES query

Python - Retry SHOW TABLES query

Additionally even after a successful SHOW TABLES query no tables are found and the table queried successfully in R cannot be queried using Python.

bschulth · 2024-07-11T01:14:47Z

At least part of the problem appears to be that the extension is trying to convert column types to an internal type:

uc_catalog/src/uc_utils.cpp

Lines 32 to 144 in 71e18a9

    
           if (type_text == "tinyint") { 
        
           	return LogicalType::TINYINT; 
        
           } else if (type_text == "smallint") { 
        
           	return LogicalType::SMALLINT; 
        
           } else if (type_text == "bigint") { 
        
           	return LogicalType::BIGINT; 
        
           } else if (type_text == "int") { 
        
           	return LogicalType::INTEGER; 
        
           } else if (type_text == "long") { 
        
           	return LogicalType::BIGINT; 
        
           } else if (type_text == "string") { 
        
           	return LogicalType::VARCHAR; 
        
           } else if (type_text == "double") { 
        
           	return LogicalType::DOUBLE; 
        
           } else if (type_text == "float") { 
        
           	return LogicalType::FLOAT; 
        
           } else if (type_text == "boolean") { 
        
           	return LogicalType::BOOLEAN; 
        
           } else if (type_text == "timestamp") { 
        
           	return LogicalType::TIMESTAMP; 
        
           } else if (type_text == "binary") { 
        
           	return LogicalType::BLOB; 
        
           } else if (type_text == "date") { 
        
           	return LogicalType::DATE; 
        
           } else if (type_text == "timestamp") { 
        
           	return LogicalType::TIMESTAMP; // TODO: Is this the right timestamp 
        
           } else if (type_text.find("decimal(") == 0) { 
        
           	size_t spec_end = type_text.find(')'); 
        
           	if (spec_end != string::npos) { 
        
           		size_t sep = type_text.find(','); 
        
           		auto prec_str = type_text.substr(8, sep - 8); 
        
           		auto scale_str = type_text.substr(sep + 1, spec_end - sep - 1); 
        
           		uint8_t prec = Cast::Operation<string_t, uint8_t>(prec_str); 
        
           		uint8_t scale = Cast::Operation<string_t, uint8_t>(scale_str); 
        
           		return LogicalType::DECIMAL(prec, scale); 
        
           	} 
        
           } else if (type_text.find("array<") == 0) { 
        
           	size_t type_end = type_text.rfind('>'); // find last, to deal with nested 
        
           	if (type_end != string::npos) { 
        
           		auto child_type_str = type_text.substr(6, type_end - 6); 
        
           		auto child_type = UCUtils::TypeToLogicalType(context, child_type_str); 
        
           		return LogicalType::LIST(child_type); 
        
           	} 
        
           } else if (type_text.find("map<") == 0) { 
        
           	size_t type_end = type_text.rfind('>'); // find last, to deal with nested 
        
           	if (type_end != string::npos) { 
        
           		// TODO: Factor this and struct parsing into an iterator over ',' separated values 
        
           		vector<LogicalType> key_val; 
        
           		size_t cur = 4; 
        
           		auto nested_opens = 0; 
        
           		for (;;) { 
        
           			size_t next_sep = cur; 
        
           			// find the location of the next ',' ignoring nested commas 
        
           			while (type_text[next_sep] != ',' || nested_opens > 0) { 
        
           				if (type_text[next_sep] == '<') { 
        
           					nested_opens++; 
        
           				} else if (type_text[next_sep] == '>') { 
        
           					nested_opens--; 
        
           				} 
        
           				next_sep++; 
        
           				if (next_sep == type_end) { 
        
           					break; 
        
           				} 
        
           			} 
        
           			auto child_str = type_text.substr(cur, next_sep - cur); 
        
           			auto child_type = UCUtils::TypeToLogicalType(context, child_str); 
        
           			key_val.push_back(child_type); 
        
           			if (next_sep == type_end) { 
        
           				break; 
        
           			} 
        
           			cur = next_sep + 1; 
        
           		} 
        
           		if (key_val.size() != 2) { 
        
           			throw NotImplementedException("Invalid map specification with %i types", key_val.size()); 
        
           		} 
        
           		return LogicalType::MAP(key_val[0], key_val[1]); 
        
           	} 
        
           } else if (type_text.find("struct<") == 0) { 
        
           	size_t type_end = type_text.rfind('>'); // find last, to deal with nested 
        
           	if (type_end != string::npos) { 
        
           		child_list_t<LogicalType> children; 
        
           		size_t cur = 7; 
        
           		auto nested_opens = 0; 
        
           		for (;;) { 
        
           			size_t next_sep = cur; 
        
           			// find the location of the next ',' ignoring nested commas 
        
           			while (type_text[next_sep] != ',' || nested_opens > 0) { 
        
           				if (type_text[next_sep] == '<') { 
        
           					nested_opens++; 
        
           				} else if (type_text[next_sep] == '>') { 
        
           					nested_opens--; 
        
           				} 
        
           				next_sep++; 
        
           				if (next_sep == type_end) { 
        
           					break; 
        
           				} 
        
           			} 
        
           			auto child_str = type_text.substr(cur, next_sep - cur); 
        
           			size_t type_sep = child_str.find(':'); 
        
           			if (type_sep == string::npos) { 
        
           				throw NotImplementedException("Invalid struct child type specifier: %s", child_str); 
        
           			} 
        
           			auto child_name = child_str.substr(0, type_sep); 
        
           			auto child_type = UCUtils::TypeToLogicalType(context, child_str.substr(type_sep + 1, string::npos)); 
        
           			children.push_back({child_name, child_type}); 
        
           			if (next_sep == type_end) { 
        
           				break; 
        
           			} 
        
           			cur = next_sep + 1; 
        
           		} 
        
           		return LogicalType::STRUCT(children); 
        
           	} 
        
           }

But it does not have a mapping for "varchar(xxxx)", so it throws an exception:

uc_catalog/src/uc_utils.cpp

Line 146 in 71e18a9

    
           throw NotImplementedException("Tried to fallback to unknown type for '%s'", type_text);

Scanning our (databricks) system.information_schema.columns for unique full_data_types

select distinct data_type, full_data_type from system.information_schema.columns where data_type not in ('DECIMAL', 'STRUCT', 'ARRAY', 'MAP') order by data_type, full_data_type

It looks like you might also need to map:

varchar(xxxx)
char(xxx)

Maybe also the 'unsized' versions for extra diligence?

char
varchar

samansmink · 2024-07-31T08:31:58Z

Hi @edwardsdm, thanks for reporting this. We will take a look at this at some point. Currently we have prioritized the development of the Delta extension over the uc_catalog extension so I can not give a timeline yet!

riot-lbrandi · 2024-10-17T02:34:23Z

Hey team, this issue is also happening to me and I could reproduce exactly what was described here. I can read the metadata for all tables normally after the 3rd attempt, WHICH IS SUPER COOL. But I can't query anything due to permission errors on S3, in my org we need to query that through UC and not using the S3 path directly (that I believe is what is happening under the hood with the Delta extension).

bschulth · 2024-11-19T21:56:07Z

@samansmink

Adding these 4 lines seems generally to fix the issues in our environment:

	} else if (type_text.find("varchar") == 0) {
	    return LogicalType::VARCHAR;
	} else if (type_text.find("char") == 0) {
	    return LogicalType::VARCHAR;

before:
- https://github.com/duckdb/uc_catalog/blob/main/src/uc_utils.cpp#L58

I built the plugin locally with these changes and tested it in our (@edwardsdm and me) environment.

edwardsdm changed the title ~~Querying Unity Catalog with R & Python DuckDB packages requires retrying the first SQL statement~~ Querying Unity Catalog with DuckDB UC extension requires retrying the first SQL statement Jul 10, 2024

gdubya mentioned this issue Sep 26, 2024

Azure support? #7

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Querying Unity Catalog with DuckDB UC extension requires retrying the first SQL statement #8

Querying Unity Catalog with DuckDB UC extension requires retrying the first SQL statement #8

edwardsdm commented Jul 10, 2024 •

edited

Loading

bschulth commented Jul 11, 2024 •

edited

Loading

samansmink commented Jul 31, 2024

riot-lbrandi commented Oct 17, 2024

bschulth commented Nov 19, 2024 •

edited

Loading

Querying Unity Catalog with DuckDB UC extension requires retrying the first SQL statement #8

Querying Unity Catalog with DuckDB UC extension requires retrying the first SQL statement #8

Comments

edwardsdm commented Jul 10, 2024 • edited Loading

Duckdb CLI installed on MacOS - Retry SHOW TABLES query

R - Retry SHOW TABLES query

Python - Retry SHOW TABLES query

bschulth commented Jul 11, 2024 • edited Loading

samansmink commented Jul 31, 2024

riot-lbrandi commented Oct 17, 2024

bschulth commented Nov 19, 2024 • edited Loading

edwardsdm commented Jul 10, 2024 •

edited

Loading

bschulth commented Jul 11, 2024 •

edited

Loading

bschulth commented Nov 19, 2024 •

edited

Loading