Conversation
apavlo
left a comment
There was a problem hiding this comment.
Thanks for doing this. Changes are requested.
One potential problem with this PR is that we are switching catalog objects to be called 'entries', but then we are using object identifiers (oid_t) to reference them.
| // using catalog object to retrieve meta-data | ||
| auto table_object = catalog::Catalog::GetInstance()->GetTableObject( | ||
| db_name, schema_name, table_name, txn); | ||
| auto table_object = catalog::Catalog::GetInstance()->GetTableObject(txn, |
There was a problem hiding this comment.
We should probably rename this as GetTableEntry too.
There was a problem hiding this comment.
I thought about this and decided to leave this as is for the following reason. A "Table" is a CatalogEntry inside the TableCatalog, and we are getting a "Table" outside of it, which makes sense. Naming this GetTableObject instead of GetTable makes this clear that we are getting a bunch of information we have on the table (the table "object" in a system), and not the contents of the table itself. (This would make even more sense if we have a glossary somewhere explaining this and use it as a naming convention.) "GetTableEntry" has the same confusing double meaning, and "GetTableCatalogEntry" is not ambiguous but long and doesn't make much sense without looking at the return type name. So the naming here is fine, but the typename needs to renamed because the type "TableObject" wouldn't make sense on its own.
Let me know what you think.
There was a problem hiding this comment.
We should vote on this. I think Get*CatalogEntry would be best.
There was a problem hiding this comment.
I vote Get*CatalogEntry. I think the accuracy of function name is more important than length. And I think this name is not so long.
There was a problem hiding this comment.
I'd go with apavlo, ksaito7 and vote for ... Entry.
There was a problem hiding this comment.
Fine, I'll rename them to GetxxxCatalogEntry
| Catalog::GetInstance()->CreateTable( | ||
| catalog_database_name, catalog_schema_name, catalog_table_name, | ||
| std::unique_ptr<catalog::Schema>(catalog_table_schema), txn, true); | ||
| Catalog::GetInstance()->CreateTable(txn, |
There was a problem hiding this comment.
Unless this function also creates the DataTable object (which it shouldn't), maybe we should rename this to CreateTableEntry.
There was a problem hiding this comment.
Yes. Sorry I missed this. I will rename this to TableObject or TableEntry depending on what you think makes sense for the above comment.
There was a problem hiding this comment.
Ironically, I think it does create the DataTable object (Not that it should)
| storage::Database *pg_catalog = nullptr, | ||
| type::AbstractPool *pool = nullptr, | ||
| concurrency::TransactionContext *txn = nullptr); | ||
| static DatabaseCatalog *GetInstance(concurrency::TransactionContext *txn = nullptr, |
There was a problem hiding this comment.
Can we remove the default values?
There was a problem hiding this comment.
I tried but CLion's refactor feature really doesn't like our code and couldn't handle it. We can add another issue so somebody who has time to go through these by hand can do so in the future. (I will add this under #1398 later)
| // Insert peloton database into pg_database | ||
| DatabaseCatalog::GetInstance()->InsertDatabase( | ||
| CATALOG_DATABASE_OID, CATALOG_DATABASE_NAME, pool_.get(), txn); | ||
| DatabaseCatalog::GetInstance(nullptr, nullptr, nullptr)->InsertDatabase(txn, |
There was a problem hiding this comment.
Why pass a null txn pointer to DatabaseCatalog::GetInstance() when you actually have the txn pointer?
There was a problem hiding this comment.
Oops, that CLion refactor thing I was talking about.
|
|
||
| auto database_object = | ||
| DatabaseCatalog::GetInstance()->GetDatabaseObject(database_name, txn); | ||
| DatabaseCatalog::GetInstance(nullptr, |
There was a problem hiding this comment.
Same here. You actually have the txn pointer.
| const type::TypeId return_type, oid_t prolang, const std::string &func_src, | ||
| std::shared_ptr<peloton::codegen::CodeContext> code_context, | ||
| concurrency::TransactionContext *txn) { | ||
| void Catalog::AddPlpgsqlFunction(concurrency::TransactionContext *txn, |
There was a problem hiding this comment.
I don't think that this should be called AddPlpgsqlFunction. You are passing in the prolang argument, so it should just be called AddFunction, right? Furthermore, we are are referring to UDFs as procedures (i.e., pg_proc table), so it really should be called AddProcedure.
There was a problem hiding this comment.
I had no idea what this is supposed to do. Will change.
| // add "internal" language | ||
| if (!LanguageCatalog::GetInstance().InsertLanguage("internal", pool_.get(), | ||
| txn)) { | ||
| if (!LanguageCatalog::GetInstance().InsertLanguage(txn, |
There was a problem hiding this comment.
This is not your problem, but we should not be initializing the language table in this function.
There was a problem hiding this comment.
Can you make an issue for this so that we don't forget it?
There was a problem hiding this comment.
First, let me say that this looks good. The changes in formatting help readability significantly.
Have added comments, but the things I feel could use some additional attention (mostly pre-existing):
-
Exceptions vs. PELOTON_ASSERT. Haven't analyzed the code, but it looks to me as if there are quite a few locations where they should be asserts. We are trying to enforce an internal requirement, it isn't a recoverable run-time error.
-
LOG_DEBUG. Unnecessary LOG_DEBUG where it should probably be LOG_TRACE. We should have a discussion about debug logging sometime, because the noise level, when one turns on tracing, makes it almost useless. To improve that situation, I think LOG_DEBUG should be used sparingly.
-
Use of the Proc abbreviation in function / class names. I think this should be more explicitly Procedure. While the it is mostly clear that it is Procedure and not Process, we should just be explicit.
| concurrency::TransactionContext *txn, | ||
| expression::AbstractExpression *predicate, | ||
| std::vector<oid_t> column_offsets) { | ||
| if (txn == nullptr) throw CatalogException("Scan table requires transaction"); |
There was a problem hiding this comment.
CatalogException vs. PELOTON_ASSERT.
Should this be an assert? Is it ever legitimate to call this function without a transaction? If not, it should be an ASSERT.
| std::vector<type::Value> scan_values, | ||
| std::vector<oid_t> update_columns, | ||
| std::vector<type::Value> update_values) { | ||
| if (txn == nullptr) throw CatalogException("Scan table requires transaction"); |
There was a problem hiding this comment.
ditto for exception vs. PELOTON_ASSERT comment above.
| concurrency::TransactionContext *txn) { | ||
| ResultType Catalog::CreateSchema(concurrency::TransactionContext *txn, | ||
| const std::string &database_name, | ||
| const std::string &schema_name) { | ||
| if (txn == nullptr) |
There was a problem hiding this comment.
Exception vs. PELOTON_ASSERT
| index_name, | ||
| {column_id}, | ||
| true, | ||
| IndexType::BWTREE); | ||
| LOG_DEBUG("Added a UNIQUE index on %s in %s.", col_name.c_str(), |
| // Check if UDF already exists | ||
| auto proc_catalog_obj = | ||
| ProcCatalog::GetInstance().GetProcByName(name, argument_types, txn); | ||
| ProcCatalog::GetInstance().GetProcByName(txn, name, argument_types); |
There was a problem hiding this comment.
I think it would be clearer and more consistent to not use Proc. So rename to Procedure
ProcedureCatalog
GetProcedureByName
InsertProcedure
etc.
The local variables IMO can stay as is, the class names and class methods though, should change.
| concurrency::TransactionContext *txn) { | ||
| void SystemCatalogs::Bootstrap(concurrency::TransactionContext *txn, | ||
| const std::string &database_name) { | ||
| LOG_DEBUG("Bootstrapping database: %s", database_name.c_str()); |
| pg_trigger_ = new TriggerCatalog(txn, database_name); | ||
| } | ||
|
|
||
| // if (!pg_proc) { |
There was a problem hiding this comment.
If this is dead code, remove?
|
|
||
| // Maximum column name size for catalog schemas | ||
| static const size_t max_name_size = 64; | ||
| static const size_t max_name_size_ = 64; |
There was a problem hiding this comment.
Since we are going to change it ... lets replace with a more descriptive name. e.g. max_column_name_size_
max_name_size is very generic, could be any name.
|
|
||
| enum ColumnId { | ||
| OID = 0, | ||
| LANNAME = 1, |
There was a problem hiding this comment.
Poor abbreviation. Should be LANG or LANGUAGE
| // using catalog object to retrieve meta-data | ||
| auto table_object = catalog::Catalog::GetInstance()->GetTableObject( | ||
| db_name, schema_name, table_name, txn); | ||
| auto table_object = catalog::Catalog::GetInstance()->GetTableObject(txn, |
There was a problem hiding this comment.
I'd go with apavlo, ksaito7 and vote for ... Entry.
pervazea
left a comment
There was a problem hiding this comment.
As per conversation with Tian Yu, naming changes done, exception/assert and logging added to follow on issue.
* Catalog code cleanup * Rename "XXXObject" to "CatalogEntry" * Rename AddPlpgsqlFunction
Addresses issue #1398.
This PR fixes 1, 2 and renames CatalogObjects to Entries. Other minor code style fixes are included as well.