[15721] Index Suggestion #1347

sivaprasadsudhir · 2018-05-05T05:05:19Z

This PR implements a search-based algorithm for suggesting the best set of indexes for a given workload, based on the AutoAdmin paper from Microsoft.

The main workflow and classes are given below:

- The helper classes and methods for our algorithm are defined in brain/index_selection_util[.h/.cpp]. These include IndexSelectionKnobs (the tunable knobs of the algorithm), HypotheticalIndexObject (a hypothetical index), and IndexConfiguration (a set of hypothetical indexes). Several helper methods and overloaded operators for IndexConfiguration are also defined here.
- IndexSelection is the main wrapper around our tool. It takes a workload (a set of queries) and the tunable parameters of the algorithm, and returns the best index configuration through its main external API, GetBestIndexes.
- The methods for running the search-based algorithm are in brain/index_selection[.h/.cpp]. Each search iteration uses three modules:
  - GenerateCandidateIndexes generates the admissible indexes (indexes that benefit at least one query in the workload) from the provided queries and prunes the useless ones.
  - Enumerate gets the top k indexes that most reduce the cost of executing the workload, through a combination of exhaustive search (ExhaustiveEnumeration) and greedy search (GreedySearch).
  - GenerateMultiColumnIndexes generates multi-column indexes from the single-column indexes by taking a cross product and adds them to the result, which is used in the next iteration.
- WhatIfIndex in brain/what_if_index[.h/.cpp] returns the best physical plan tree and its cost for a hypothetical index configuration. This is made possible by changes to optimizer/optimizer[.h/.cpp].
- IndexSelectionContext in brain/index_selection_context[.h/.cpp] memoizes the cost of a query under a given configuration to reduce the number of calls to the what-if API.
- Integrating this module with Peloton's self-driving infrastructure (Brain) is a work in progress.
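
For readers unfamiliar with the enumeration step, here is a minimal sketch of a greedy search over candidate indexes with a memoized cost lookup. It is not the code in this PR: `Index`, `Config`, `CostFn`, `MemoizedCost`, and this `GreedySearch` signature are hypothetical, and the cost function is only a stand-in for the what-if optimizer call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins: an index is identified by a string, and a
// configuration is a set of indexes.
using Index = std::string;
using Config = std::set<Index>;
using CostFn = std::function<double(const Config &)>;

// Memoizes the workload cost per configuration so the (expensive) what-if
// call runs at most once for each distinct configuration.
class MemoizedCost {
 public:
  explicit MemoizedCost(CostFn what_if) : what_if_(std::move(what_if)) {}

  double operator()(const Config &config) {
    auto it = cache_.find(config);
    if (it != cache_.end()) return it->second;
    ++what_if_calls_;
    double cost = what_if_(config);
    cache_.emplace(config, cost);
    return cost;
  }

  std::size_t what_if_calls() const { return what_if_calls_; }

 private:
  CostFn what_if_;
  std::map<Config, double> cache_;
  std::size_t what_if_calls_ = 0;
};

// Greedy search: starting from `seed`, repeatedly add the candidate that
// lowers the workload cost the most, until `k` indexes are chosen or no
// remaining candidate improves the cost.
Config GreedySearch(const Config &seed, const Config &candidates,
                    std::size_t k, MemoizedCost &cost) {
  assert(seed.size() <= k);
  Config best = seed;
  double best_cost = cost(best);
  while (best.size() < k) {
    const Index *winner = nullptr;
    double winner_cost = best_cost;
    for (const Index &candidate : candidates) {
      if (best.count(candidate) != 0) continue;
      Config trial = best;
      trial.insert(candidate);
      double trial_cost = cost(trial);
      if (trial_cost < winner_cost) {
        winner_cost = trial_cost;
        winner = &candidate;
      }
    }
    if (winner == nullptr) break;  // no remaining candidate helps
    best.insert(*winner);
    best_cost = winner_cost;
  }
  return best;
}
```

In this sketch the seed would come from an exhaustive search over small configurations, and the memoized cost object plays the role that IndexSelectionContext plays in the description above.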

linmagit

Overall the code quality is good, and the documentation is very good. I left some comments to fix. Besides those, there are two more things:

- I didn't check all the files, but it looks like you didn't use forward declarations to reduce dependencies. Check where you only use pointers in the .h files, forward-declare those classes, and move the includes to the .cpp files as much as possible.
- Some tests on Jenkins are failing. Please fix them as well.
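
As an illustration of the forward-declaration suggestion, the following single-file sketch marks which parts would go in a header and which in the source file. The names `Selector`, `Workload`, `selector.h`, and `workload.h` are hypothetical and not taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// ---- Would live in a header, e.g. a hypothetical selector.h ----
// A forward declaration is enough for pointers and references, so the
// header does not have to include the full definition of Workload.
class Workload;

class Selector {
 public:
  explicit Selector(const Workload *workload);
  std::size_t QueryCount() const;

 private:
  const Workload *workload_;  // pointer member: incomplete type is fine
};

// ---- Would live in the matching .cpp file ----
// Stand-in for `#include "workload.h"`: the full definition is needed
// only where members of Workload are actually used.
class Workload {
 public:
  void AddQuery(std::string query) { queries_.push_back(std::move(query)); }
  std::size_t Size() const { return queries_.size(); }

 private:
  std::vector<std::string> queries_;
};

Selector::Selector(const Workload *workload) : workload_(workload) {
  assert(workload_ != nullptr);
}

std::size_t Selector::QueryCount() const { return workload_->Size(); }
```

With this split, files that include only the header are not recompiled when the definition of `Workload` changes.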

linmagit · 2018-05-11T21:25:53Z

src/include/brain/index_selection.h

+    }
+  }
+
+  Workload *w;


Where is this w used?

linmagit · 2018-05-13T05:48:27Z

src/brain/index_selection.cpp

+  //    Column is a table column name.
+  // 2. GROUP BY (if present)
+  // 3. ORDER BY (if present)
+  // 4. all updated columns for UPDATE query.


I think we should only get the columns in the WHERE clause of an UPDATE query, not all of its updated columns, right? The code looks correct, but the comment looks wrong.

linmagit · 2018-05-13T05:55:46Z

src/include/brain/index_selection_context.h

+    auto result = std::hash<std::string>()(key.second->GetInfo());
+    for (auto index : indexes) {
+      // TODO[Siva]: Use IndexObjectHasher to hash this
+      result ^= std::hash<std::string>()(index->ToString());


You probably want to fix this now.
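
One possible shape for such a fix, sketched under assumptions: hash each index from its fields instead of its string form, and mix the per-index hashes with a combine step rather than a plain XOR (with plain XOR, two equal hashes cancel out). `IndexKey`, `IndexKeyHasher`, `HashCombine`, and `HashConfiguration` are hypothetical names, not the PR's IndexObjectHasher; the combine formula is the widely used Boost-style one.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical index identity: a table plus an ordered list of columns.
struct IndexKey {
  uint32_t table_oid;
  std::vector<uint32_t> column_oids;

  bool operator<(const IndexKey &other) const {
    return std::tie(table_oid, column_oids) <
           std::tie(other.table_oid, other.column_oids);
  }
};

// Boost-style combine: mixes `value` into `seed` so that the order of the
// combined values influences the result.
inline void HashCombine(std::size_t &seed, std::size_t value) {
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes an index from its fields rather than from a string rendering.
struct IndexKeyHasher {
  std::size_t operator()(const IndexKey &key) const {
    std::size_t seed = std::hash<uint32_t>()(key.table_oid);
    for (uint32_t column : key.column_oids) {
      HashCombine(seed, std::hash<uint32_t>()(column));
    }
    return seed;
  }
};

// Hashes a configuration. std::set iterates in sorted order, so the result
// does not depend on the order in which indexes were inserted.
std::size_t HashConfiguration(const std::set<IndexKey> &config) {
  std::size_t seed = 0;
  IndexKeyHasher hasher;
  for (const IndexKey &key : config) {
    HashCombine(seed, hasher(key));
  }
  return seed;
}
```

Iterating a sorted container keeps the configuration hash deterministic even though the combine step itself is order-sensitive.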

linmagit · 2018-05-13T18:13:23Z

src/include/brain/index_selection_util.h

+  // The mapping from the object to the shared pointer
+  std::unordered_map<HypotheticalIndexObject,
+                     std::shared_ptr<HypotheticalIndexObject>,
+                     IndexObjectHasher> map_;


Why did we decide to do it this way instead of directly storing a set of HypotheticalIndexObjects? Is it for efficiency?
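
For context, a map from a value to a shared pointer is a common way to build an interning pool: each distinct object is stored once and every caller gets the same pointer. Whether that is the motivation here is not stated in this thread. The sketch below only illustrates the pattern, and `PooledIndex`, `PooledIndexHasher`, and `IndexPool` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical index object, identified by a column description.
struct PooledIndex {
  std::string columns;

  bool operator==(const PooledIndex &other) const {
    return columns == other.columns;
  }
};

struct PooledIndexHasher {
  std::size_t operator()(const PooledIndex &index) const {
    return std::hash<std::string>()(index.columns);
  }
};

// Interning pool: returns one canonical shared_ptr per distinct index.
class IndexPool {
 public:
  std::shared_ptr<PooledIndex> GetOrCreate(const PooledIndex &index) {
    auto it = map_.find(index);
    if (it != map_.end()) return it->second;
    auto pointer = std::make_shared<PooledIndex>(index);
    map_.emplace(index, pointer);
    return pointer;
  }

  std::size_t Size() const { return map_.size(); }

 private:
  std::unordered_map<PooledIndex, std::shared_ptr<PooledIndex>,
                     PooledIndexHasher>
      map_;
};
```

With such a pool, configurations can hold pointers and test equality by pointer comparison, at the cost of storing each object twice (once as the key and once behind the pointer).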

linmagit · 2018-05-18T19:09:14Z

src/brain/index_selection_job.cpp

+  }
+
+  PELOTON_ASSERT(index->column_oids.size() > 0);
+  auto response = request.send().wait(client.getWaitScope());


Can you check the response and throw a warning if it does not succeed?

linmagit · 2018-05-18T19:59:14Z

src/include/catalog/table_catalog.h

+  // Get index objects
+  bool InsertIndexObject(std::shared_ptr<IndexCatalogObject> index_object);
+  bool EvictIndexObject(oid_t index_oid);
+  bool EvictIndexObject(const std::string &index_name);


Again, these should be protected and used only by the what-if API through a friend class.

linmagit · 2018-05-18T19:59:26Z

src/include/catalog/table_catalog.h

@@ -79,6 +67,9 @@ class TableCatalogObject {
  inline oid_t GetDatabaseOid() { return database_oid; }
  inline uint32_t GetVersionId() { return version_id; }

+  // NOTE: should be only used by What-if API.
+  void SetValidIndexObjects(bool is_valid);


protected.
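
A minimal sketch of the protected-plus-friend arrangement requested in these two comments, assuming simplified stand-in classes. `TableCache` and `WhatIfApi` are hypothetical and are not Peloton's TableCatalogObject or WhatIfIndex.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

class WhatIfApi;  // hypothetical what-if entry point

// Stand-in for a cached table object. The mutators are protected, so
// ordinary callers can only read; the friend class may insert and evict.
class TableCache {
 public:
  std::size_t IndexCount() const { return indexes_.size(); }

 protected:
  bool InsertIndex(unsigned oid, std::string name) {
    return indexes_.emplace(oid, std::move(name)).second;
  }
  bool EvictIndex(unsigned oid) { return indexes_.erase(oid) > 0; }

 private:
  friend class WhatIfApi;
  std::unordered_map<unsigned, std::string> indexes_;
};

class WhatIfApi {
 public:
  // Temporarily installs a hypothetical index, observes the cache, and
  // removes the index again so the cache is left unchanged.
  static std::size_t CountWithHypotheticalIndex(TableCache &table,
                                                unsigned oid) {
    bool inserted = table.InsertIndex(oid, "hypothetical");
    std::size_t count = table.IndexCount();
    if (inserted) table.EvictIndex(oid);
    return count;
  }
};
```

Any other code that calls `InsertIndex` or `EvictIndex` directly fails to compile, which is the guarantee the comments ask for.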

linmagit · 2018-05-18T20:06:48Z

src/catalog/index_catalog.cpp

+  // try get from cache
+  auto pg_table = Catalog::GetInstance()
+                      ->GetSystemCatalogs(database_oid)
+                      ->GetTableCatalog();


This is not the correct way to do this. We should not directly access the catalog instance. We should get the database catalog object from txn->catalog_cache_, then get the table objects, and then the index objects. Everything should go through the transaction's local catalog cache, not the global catalog instance.

linmagit · 2018-05-18T20:09:47Z

src/include/network/peloton_rpc_handler_task.h

+
+  // TODO: Avoid using this function.
+  // Copied from SQL testing util.
+  // Execute a SQL query end-to-end


Yeah this is a hack.... Is there a way to fix this?

linmagit · 2018-05-18T20:27:29Z

src/optimizer/cost_calculator.cpp

+  for (size_t idx = 0; idx < key_attr_list.size(); ++idx) {
+    // If index cannot further reduce scan range, break
+    if (idx == op->key_column_id_list.size() ||
+        key_attr_list[idx] != op->key_column_id_list[idx]) {


@chenboy It looks like this requires key_column_id_list to have exactly the same order as key_attr_list? But in fact, an index on col(a, b, c) can also help a query with predicates (b=2 AND a=1 AND c=3), right?
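
To make the point concrete, here is a small sketch of an order-independent check: it counts how many leading index key columns have an equality predicate, regardless of the order in which the predicates appear in the query. `MatchedPrefixLength` and its parameters are hypothetical, it assumes all predicates are equality predicates, and it is not the code in cost_calculator.cpp.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns the number of leading index key columns covered by an equality
// predicate. Predicate order is ignored; index key order still matters,
// because an index can only narrow the scan on a prefix of its key.
std::size_t MatchedPrefixLength(const std::vector<int> &index_key_columns,
                                const std::vector<int> &predicate_columns) {
  std::unordered_set<int> predicated(predicate_columns.begin(),
                                     predicate_columns.end());
  std::size_t matched = 0;
  for (int column : index_key_columns) {
    if (predicated.count(column) == 0) break;  // prefix ends here
    ++matched;
  }
  return matched;
}
```

With columns a, b, c numbered 0, 1, 2, an index on (a, b, c) and predicates on b, a, c in that order match all three key columns, while predicates on b and c alone match none because the leading column a is missing.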

pbollimp and others added 30 commits May 5, 2018 16:43

added the files for cost evaluation

d18033d

llvm for mac

5fdadea

Basic classes

ec6c94b

added the configuration enumeration files

492b95f

Add Whatif API

8410136

Add optimizer cost query func skeleton

96eadf4

Complete what if API implementation. Testing pending.

9087931

1. Add test file in brain for what-if API. 2. Implement a basic test to insert some tuples and hypothetical indexes and get the cost. (Not working)

Ignore query planning

0908588

Analyze tables was missing. Fixed it

5e2cbff

fix the query

fcfe058

add comments, fix some code style

04e49f8

Fix whatif API test

d62462b

run formatter

2e19c1c

Add index selection module skeleton

ac653aa

skeleton for admissible column parsing

4d44009

adding cost model classes

371fd38

cleanup and reorganize the code

c23cc36

Intermediate changes. Query parser not complete.

4d694ec

Intermediate changes. Query parser not complete.

a51fe84

removed cost model class

d043128

Add IndexObject Pool

32f9040

Memoization support completed

324e430

Complete query parser

5978d32

Complete query parser

a24ded7

multi column index, wip

11bc159

Add tests for admissible indexes

e0cac79

Fix what if index and admissive indexes test

83c1b44

added outline for naive enumeration method

1e5925c

Fix get admissible indexes test

4b463dc

Fix get admissible indexes test

96a41b1

vkonagar and others added 29 commits May 12, 2018 13:23

Fix the AnalyzeStats crash

51f5a1a

Fix: Index Selection returns empty set because the catalog cache eviction is not done properly.

5c322c1

Fix a bug during where clause parsing to make it work with TPCC

3ef9128

Fix the compilation error

146100d

Address some of the code review comments

d250fbe

Fix create/drop index -- running TPCC

3230ec3

Fix analyze stats crash. Fix query history logging for PREPARED statements

dc424ea

Change knobs

43b742b

More misc

c422a63

addressing commits

27a0df0

Restructure code

a06189a

Reformat code

332543f

small correction to make it compile in debug mode

9d0a005

remove the unnecessary commented parts of test and code

11d2f3e

Restructure code, fix nits

59ee8d3

remove #define

6817300

Merge remote-tracking branch 'origin/auto_index' into auto_index

3546f6a

Restructure code

e2e4578

Run formatter

4f48831

fix errors for compilation in debug mode

4dc06ac

Merge remote-tracking branch 'origin/auto_index' into auto_index

65d5a06

# Conflicts: # src/brain/what_if_index.cpp

fix query logger test

480ae4d

trying to pass the compilation on travis

81420e7

change debug logging to trace level logging

28483e5

Fix warning in IndexConfigComparator

e1bd8ba

warning: the specified comparator type does not provide a const call operator [-Wuser-defined-warnings]

trace-->debug

f8e6eda

Hack to make travis pass the build.

597e798

DEFAULT_SCHEMA_NAME can't be found error. Fix this when merging with master.

Hack to make travis pass the build.

b99312a

DEFAULT_SCHEMA_NAME can't be found error. Fix this when merging with master.

remove multiple of unnecessary debug statements

50db015

linmagit suggested changes May 18, 2018

sivaprasadsudhir commented May 5, 2018 • edited by pbollimp
