-
Notifications
You must be signed in to change notification settings - Fork 623
[15721] Index Suggestion #1347
base: master
Are you sure you want to change the base?
[15721] Index Suggestion #1347
Conversation
1. Add test file in brain for what-if API. 2. Implement a basic test to insert some tuples and hypothetical indexes and get the cost. (Not working)
catalog cache eviction is not done properly.
# Conflicts: # src/brain/what_if_index.cpp
warning: the specified comparator type does not provide a const call operator [-Wuser-defined-warnings]
DEFUALT_SCHEMA_NAME can't be found error. Fix this when merging with master.
DEFUALT_SCHEMA_NAME can't be found error. Fix this when merging with master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the code quality is good! The documentation is very good. I left some comments to fix. Besides the comments, there're also two things:
- I didn't check all the files, but it looks like you didn't use forward-declaration to reduce the dependency. You should check where you're only using pointers in the .h file and forward-declare the classes and move the includes to the .cpp file as much as possible.
- Some tests on Jenkins are failing. PLease fix them as well.
} | ||
} | ||
|
||
Workload *w; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this w used?
// Column is a table column name. | ||
// 2. GROUP BY (if present) | ||
// 3. ORDER BY (if present) | ||
// 4. all updated columns for UPDATE query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only get the columns in the where clause of the UPDATE query, not all updated columns for UPDATE query, right? The code looks correct, but the comment looks wrong?
auto result = std::hash<std::string>()(key.second->GetInfo()); | ||
for (auto index : indexes) { | ||
// TODO[Siva]: Use IndexObjectHasher to hash this | ||
result ^= std::hash<std::string>()(index->ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to fix this now.
// The mapping from the object to the shared pointer | ||
std::unordered_map<HypotheticalIndexObject, | ||
std::shared_ptr<HypotheticalIndexObject>, | ||
IndexObjectHasher> map_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we decide to do it this way instead of directly storing a set of HypotheticalIndexObject
s? Is it for efficiency consideration?
} | ||
|
||
PELOTON_ASSERT(index->column_oids.size() > 0); | ||
auto response = request.send().wait(client.getWaitScope()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check the response and through some warning if it does not succeed?
// Get index objects | ||
bool InsertIndexObject(std::shared_ptr<IndexCatalogObject> index_object); | ||
bool EvictIndexObject(oid_t index_oid); | ||
bool EvictIndexObject(const std::string &index_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again. These should be protected and only used by what-if API through friend class.
@@ -79,6 +67,9 @@ class TableCatalogObject { | |||
inline oid_t GetDatabaseOid() { return database_oid; } | |||
inline uint32_t GetVersionId() { return version_id; } | |||
|
|||
// NOTE: should be only used by What-if API. | |||
void SetValidIndexObjects(bool is_valid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected
.
// try get from cache | ||
auto pg_table = Catalog::GetInstance() | ||
->GetSystemCatalogs(database_oid) | ||
->GetTableCatalog(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the correct way to do this. We should not directly access the catalog instance. We should get the database catalog object from txn->catalog_cache_
, then get table objects, and then index objects. Everything should go through the local catalog cache of the transaction but no the global catalog instance.
|
||
// TODO: Avoid using this function. | ||
// Copied from SQL testing util. | ||
// Execute a SQL query end-to-end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a hack.... Is there a way to fix this?
for (size_t idx = 0; idx < key_attr_list.size(); ++idx) { | ||
// If index cannot further reduce scan range, break | ||
if (idx == op->key_column_id_list.size() || | ||
key_attr_list[idx] != op->key_column_id_list[idx]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenboy It looks like this thing requires the key_column_id_list
has exactly the same order as key_attr_list
? But in fact, an index with col(a, b, c) can also help a query with predicates(b=2 AND a=1 AND c=3), right?
This PR implements a search based algorithm for suggesting the best set of indexes for a given workload based on the AutoAdmin paper from Microsoft.
The main workflow and classes are given below:
The helper classes and methods for our algorithm are defined in brain/index_selection_util[.h/.c]. This includes the
IndexSelectionKnobs
(tunable knobs of the algorithm),HypotheticalIndexObject
(a hypothetical index class),IndexConfiguration
(the set of hypothetical indexes). Multiple helper methods and overloaded operators forIndexConfiguration
are defined hereIndexSelection
is the main wrapper around our tool which takes in a workload (set of queries) and the tunable parameters of the algorithm. It returns the best index configuration through the main external APIGetBestIndexes
The methods for running the search based algorithm are present in brain/index_selection[.h/.c]. The three modules which are used for each search iteration are:
GenerateCandidateIndexes
generates the admissible indexes (indexes that benefit at least one query in the workload) from the provided queries and prune the useless onesEnumerate
that gets the top k indexes for the workload which would reduce the cost of executing them through a combination of exhaustive search (ExhaustiveEnumeration
) and greedy search (GreedySearch
)GenerateMultiColumnIndexes
generates multi-column indexes from the single column indexes by doing a cross product and adds it into the result which will be used for the next iteration.The
WhatIfIndex
in brain/what_if_index[.h/.c] returns best physical plan tree and the cost associated with a hypothetical index configuration. This is possible with the set of appropriate changes made to optimizer/optimizer[.h/.c]The
IndexSelectionContext
in brain/index_selection_context[.h/.c] memoizes the cost of the query for a given configuration to reduce the number of calls sent to WhatIfAPIIntegrating this module to work with the self-driving infrastructure of Peloton (Brain) is a work in progress