[WIP DNR]fix(connector): Return fresh table metadata when listing tables after collection rename in Mongodb#27383
Conversation
Reviewer's GuideMongoDB connector now returns only live collections when listing tables by reconciling actual Mongo collections with _schema metadata, cleaning up stale metadata (and cache) on the fly, and adds in‑memory MongoDB integration tests to cover table renames and stale metadata scenarios. Sequence diagram for MongoDB table listing and stale metadata cleanupsequenceDiagram
actor User
participant PrestoEngine
participant MongoConnector
participant MongoSession
participant MongoDB
User->>PrestoEngine: SHOW TABLES IN mongodb.schema
PrestoEngine->>MongoConnector: listTables(schema)
MongoConnector->>MongoSession: getAllTables(schema)
MongoSession->>MongoDB: listCollectionNames(schema)
MongoDB-->>MongoSession: collectionNames
MongoSession->>MongoSession: filter out schemaCollection and SYSTEM_TABLES
MongoSession->>MongoSession: actualCollections = filtered names
MongoSession->>MongoSession: metadataTableNames = getTableMetadataNames(schema)
MongoSession->>MongoSession: staleMetadataTableNames = metadataTableNames - actualCollections
loop for each staleTableName
MongoSession->>MongoSession: schemaTableName = new SchemaTableName(schema, staleTableName)
MongoSession->>MongoSession: deleteTableMetadata(schemaTableName)
MongoSession->>MongoSession: tableCache.invalidate(schemaTableName)
end
MongoSession-->>MongoConnector: actualCollections
MongoConnector-->>PrestoEngine: actualCollections
PrestoEngine-->>User: table list with only live collections
Entity relationship diagram for MongoDB collections and _schema metadataerDiagram
SCHEMA ||--o{ COLLECTION : has
SCHEMA ||--o{ SCHEMA_METADATA : has
SCHEMA {
string name
}
COLLECTION {
string name
}
SCHEMA_METADATA {
string table_name
string other_metadata
}
COLLECTION ||--o| SCHEMA_METADATA : is_backed_by
%% Stale metadata is when SCHEMA_METADATA.table_name has no matching COLLECTION.name
Class diagram for updated MongoSession metadata handlingclassDiagram
class MongoSession {
- MongoClient client
- LoadingCache tableCache
- String schemaCollection
- Set SYSTEM_TABLES
+ List getAllSchemas()
+ Set getAllTables(String schema)
+ MongoTable getTable(SchemaTableName tableName)
+ Document getTableMetadata(SchemaTableName schemaTableName)
- Set getTableMetadataNames(String schema)
- void deleteTableMetadata(SchemaTableName schemaTableName)
}
class MongoClient {
+ MongoDatabase getDatabase(String schema)
}
class MongoDatabase {
+ MongoIterable listCollectionNames()
}
class LoadingCache {
+ void invalidate(SchemaTableName schemaTableName)
}
class SchemaTableName {
+ SchemaTableName(String schemaName, String tableName)
+ String getSchemaName()
+ String getTableName()
}
class MongoTable
class Document
class MongoIterable
MongoSession --> MongoClient : uses
MongoSession --> LoadingCache : uses
MongoSession --> SchemaTableName : constructs
MongoSession --> MongoTable : returns
MongoSession --> Document : returns
MongoClient --> MongoDatabase : returns
MongoDatabase --> MongoIterable : returns
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
a1f5a34 to
cf44648
Compare
|
@steveburnett Can you help me to fix the release note? |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- Changing
getAllTablesto return onlyactualCollectionsinstead of the union withgetTableMetadataNamesmay drop support for tables that exist only in metadata (e.g., pre-created schema entries); consider whether this behavior change is intentional and, if so, document or guard against cases where metadata-only tables are still needed. - Having
getAllTablesperform side effects like deleting metadata and invalidatingtableCachecan be surprising for a read-style method; consider moving the cleanup into a dedicated maintenance path or clearly documenting thatgetAllTablescan mutate metadata and cache state.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Changing `getAllTables` to return only `actualCollections` instead of the union with `getTableMetadataNames` may drop support for tables that exist only in metadata (e.g., pre-created schema entries); consider whether this behavior change is intentional and, if so, document or guard against cases where metadata-only tables are still needed.
- Having `getAllTables` perform side effects like deleting metadata and invalidating `tableCache` can be surprising for a read-style method; consider moving the cleanup into a dedicated maintenance path or clearly documenting that `getAllTables` can mutate metadata and cache state.
## Individual Comments
### Comment 1
<location path="presto-mongodb/src/main/java/com/facebook/presto/mongodb/MongoSession.java" line_range="152-163" />
<code_context>
+ Set<String> metadataTableNames = getTableMetadataNames(schema);
+
+ // Find stale metadata entries (metadata exists but collection doesn't) and remove them
+ metadataTableNames.stream()
+ .filter(name -> !actualCollections.contains(name))
+ .forEach(staleTableName -> {
+ SchemaTableName schemaTableName = new SchemaTableName(schema, staleTableName);
+ log.debug("Removing stale metadata for table: %s", schemaTableName);
+ deleteTableMetadata(schemaTableName);
+ tableCache.invalidate(schemaTableName);
+ });
+ return actualCollections;
</code_context>
<issue_to_address>
**suggestion:** Avoid side effects inside the stream pipeline when deleting metadata and invalidating cache
Using `forEach` in the stream to perform deletions and cache invalidations obscures the side effects and may become problematic if this is ever converted to a parallel stream or extended. Iterating over the filtered stale names with a standard `for`/enhanced `for` loop would make the control flow and side effects clearer and safer.
```suggestion
Set<String> metadataTableNames = getTableMetadataNames(schema);
// Find stale metadata entries (metadata exists but collection doesn't) and remove them
for (String staleTableName : metadataTableNames) {
if (!actualCollections.contains(staleTableName)) {
SchemaTableName schemaTableName = new SchemaTableName(schema, staleTableName);
log.debug("Removing stale metadata for table: %s", schemaTableName);
deleteTableMetadata(schemaTableName);
tableCache.invalidate(schemaTableName);
}
}
return actualCollections;
```
</issue_to_address>
### Comment 2
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="64-66" />
<code_context>
+ private MongoSession session;
+ private TypeManager typeManager;
+
@Test
public void testBuildQuery()
{
</code_context>
<issue_to_address>
**issue (testing):** The new test does not exercise the stale metadata cleanup path described in the bug/implementation
In the current test, you rename the collection and also update the `_schema` document from `table_a` to `table_b`, so there is never any stale metadata (`metadataTableNames == actualCollections`). As a result, `deleteTableMetadata` and `tableCache.invalidate` are not exercised, and the test would still pass even if the stale-metadata cleanup were removed.
To align with the real bug scenario, you could instead:
1. Create collection `table_a` and its `_schema` metadata.
2. Optionally call `session.getAllTables(schemaName)` once to warm any caches.
3. Rename `table_a` to `table_b` in Mongo **without** updating `_schema`, leaving stale metadata for `table_a`.
4. Call `session.getAllTables(schemaName)` again and assert:
* The result includes `table_b` but not `table_a`.
* `_schema` no longer contains a document for `table_a`.
This ensures the test will fail if the stale-metadata cleanup logic regresses.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-mongodb/src/main/java/com/facebook/presto/mongodb/MongoSession.java
Show resolved
Hide resolved
| @Test | ||
| public void testBuildQuery() | ||
| { |
There was a problem hiding this comment.
issue (testing): The new test does not exercise the stale metadata cleanup path described in the bug/implementation
In the current test, you rename the collection and also update the _schema document from table_a to table_b, so there is never any stale metadata (metadataTableNames == actualCollections). As a result, deleteTableMetadata and tableCache.invalidate are not exercised, and the test would still pass even if the stale-metadata cleanup were removed.
To align with the real bug scenario, you could instead:
- Create collection
table_aand its_schemametadata. - Optionally call
session.getAllTables(schemaName)once to warm any caches. - Rename
table_atotable_bin Mongo without updating_schema, leaving stale metadata fortable_a. - Call
session.getAllTables(schemaName)again and assert:- The result includes
table_bbut nottable_a. _schemano longer contains a document fortable_a.
- The result includes
This ensures the test will fail if the stale-metadata cleanup logic regresses.
There was a problem hiding this comment.
Sure! I'm generating a new review now.
There was a problem hiding this comment.
Hey @nishithakbhaskaran, I've posted a new review for you!
66b12f5 to
3c29b01
Compare
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- Changing getAllTables to return only live MongoDB collections means any tables that exist only in metadata (e.g., created ahead of collection creation or representing special tables) will no longer be visible; double-check this behavior is acceptable for all MongoDB connector use cases and not relying on metadata-only tables.
- getTableMetadata was made public solely for test usage; consider reducing its visibility (e.g., package-private) and annotating with @VisibleForTesting or exposing test hooks another way to avoid expanding the public surface of MongoSession.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Changing getAllTables to return only live MongoDB collections means any tables that exist only in metadata (e.g., created ahead of collection creation or representing special tables) will no longer be visible; double-check this behavior is acceptable for all MongoDB connector use cases and not relying on metadata-only tables.
- getTableMetadata was made public solely for test usage; consider reducing its visibility (e.g., package-private) and annotating with @VisibleForTesting or exposing test hooks another way to avoid expanding the public surface of MongoSession.
## Individual Comments
### Comment 1
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="67-69" />
<code_context>
+ private MongoSession session;
+ private TypeManager typeManager;
+
@Test
public void testBuildQuery()
{
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for collections that exist without any `_schema` metadata
The new tests cover renamed collections and stale metadata, but they don’t cover the case where a MongoDB collection exists with no `_schema` entry (e.g., created directly in MongoDB). Please add a test that:
1. Creates a collection (e.g., `no_metadata_table`) without inserting `_schema` metadata.
2. Calls `session.getAllTables(schemaName)`.
3. Asserts that `no_metadata_table` is included in the result.
This will lock in the expectation that table listing is independent of `_schema` metadata.
Suggested implementation:
```java
@Test
public void testGetAllTablesIncludesCollectionsWithoutSchemaMetadata()
{
String schema = "test_schema";
String collectionName = "no_metadata_table";
// Create a collection directly in MongoDB without inserting any _schema metadata
MongoDatabase database = client.getDatabase(schema);
MongoCollection<Document> collection = database.getCollection(collectionName);
collection.insertOne(new Document("col", 1));
// Call session.getAllTables and verify the collection is listed
List<SchemaTableName> tables = session.getAllTables(schema);
assertTrue(tables.contains(new SchemaTableName(schema, collectionName)),
"Collections without _schema metadata should still be listed by getAllTables");
}
@Test
public void testBuildQuery()
```
To make this compile and integrate with the existing test class, you will also need to:
1. Add the following imports at the top of `TestMongoSession.java` (adjust if some are already present):
- `import com.facebook.presto.spi.connector.SchemaTableName;`
- `import com.mongodb.client.MongoCollection;`
- `import com.mongodb.client.MongoDatabase;`
- `import org.bson.Document;`
- `import java.util.List;`
- `import static org.testng.Assert.assertTrue;`
2. Ensure that `client` and `session` are initialized in the existing `@BeforeMethod` / setup method (this should already be in place for the other tests that use `session.getAllTables`).
3. If the rest of the test class uses a constant or helper for the schema name instead of the string `"test_schema"`, update the new test to reuse that same constant/helper for consistency.
</issue_to_address>
### Comment 2
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="255-213" />
<code_context>
+ public void testGetAllTablesWithStaleMetadataAfterDirectRename()
</code_context>
<issue_to_address>
**suggestion (testing):** Consider extending this test to assert behavior of `getTable` after stale metadata cleanup
Since this test already validates `getAllTables` after a direct rename leaves `_schema` stale, you can further improve coverage by asserting `session.getTable` behavior once `getAllTables` has run: verify `session.getTable(..., "table_b")` returns a non-null `MongoTable`, and that `session.getTable(..., "table_a")` throws `TableNotFoundException`. This would confirm the metadata access path is also consistent after stale entries are purged.
Suggested implementation:
```java
// Create _schema metadata for table_a
MongoCollection<Document> schemaCollection = database.getCollection("_schema");
// Trigger a metadata scan to ensure stale entries are cleaned up
session.getAllTables();
// After stale metadata cleanup, getTable should expose only the renamed table
SchemaTableName renamedTable = new SchemaTableName(schemaName, "table_b");
SchemaTableName oldTable = new SchemaTableName(schemaName, "table_a");
assertNotNull(
session.getTable(renamedTable),
"Renamed table_b should be accessible after stale metadata cleanup");
assertThrows(
TableNotFoundException.class,
() -> session.getTable(oldTable),
"Original table_a metadata should be purged and not accessible");
```
To make this compile and align with the existing codebase, check and adjust the following:
1. **Session usage**
- Ensure there is a `MongoSession session` instance available in this test class (most likely as a field initialized in a setup method).
- If `getAllTables` has a different signature (e.g., requires a `ConnectorSession` or returns tables for a specific schema), replace `session.getAllTables()` with the correct call used elsewhere in this test class.
2. **getTable signature**
- If `getTable` takes a `SchemaTableName` only (common in Presto), the snippet is correct.
- If it requires a `ConnectorSession` or other parameters, update:
- `session.getTable(renamedTable)`
- `session.getTable(oldTable)`
to match the existing usage pattern in this file.
3. **Imports**
Make sure the following imports exist at the top of `TestMongoSession.java` (add them if they are missing):
- `import com.facebook.presto.spi.SchemaTableName;`
- `import com.facebook.presto.spi.TableNotFoundException;`
- A static import for `assertThrows` if not already present. In Presto tests this is often:
- `import static io.airlift.testing.Assertions.assertThrows;`
Also ensure `assertNotNull` is available (typically `import static org.testng.Assert.assertNotNull;`).
4. **Duplicate `getAllTables` calls**
- If this test already calls `session.getAllTables()` later to assert stale metadata cleanup, you can:
- Remove the new `session.getAllTables();` line above and instead place the `getTable` assertions immediately after the existing `getAllTables` call.
- The important part is that the `getTable` assertions run *after* the stale metadata cleanup is triggered.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| public void testBuildQuery() | ||
| { |
There was a problem hiding this comment.
suggestion (testing): Consider adding a test for collections that exist without any _schema metadata
The new tests cover renamed collections and stale metadata, but they don’t cover the case where a MongoDB collection exists with no _schema entry (e.g., created directly in MongoDB). Please add a test that:
- Creates a collection (e.g.,
no_metadata_table) without inserting_schemametadata. - Calls
session.getAllTables(schemaName). - Asserts that
no_metadata_tableis included in the result.
This will lock in the expectation that table listing is independent of _schema metadata.
Suggested implementation:
@Test
public void testGetAllTablesIncludesCollectionsWithoutSchemaMetadata()
{
String schema = "test_schema";
String collectionName = "no_metadata_table";
// Create a collection directly in MongoDB without inserting any _schema metadata
MongoDatabase database = client.getDatabase(schema);
MongoCollection<Document> collection = database.getCollection(collectionName);
collection.insertOne(new Document("col", 1));
// Call session.getAllTables and verify the collection is listed
List<SchemaTableName> tables = session.getAllTables(schema);
assertTrue(tables.contains(new SchemaTableName(schema, collectionName)),
"Collections without _schema metadata should still be listed by getAllTables");
}
@Test
public void testBuildQuery()To make this compile and integrate with the existing test class, you will also need to:
-
Add the following imports at the top of
TestMongoSession.java(adjust if some are already present):import com.facebook.presto.spi.connector.SchemaTableName;import com.mongodb.client.MongoCollection;import com.mongodb.client.MongoDatabase;import org.bson.Document;import java.util.List;import static org.testng.Assert.assertTrue;
-
Ensure that
clientandsessionare initialized in the existing@BeforeMethod/ setup method (this should already be in place for the other tests that usesession.getAllTables). -
If the rest of the test class uses a constant or helper for the schema name instead of the string
"test_schema", update the new test to reuse that same constant/helper for consistency.
| Document oldMetadata = schemaCollection.find(new Document("table", "table_a")).first(); | ||
| assertEquals(oldMetadata, null, "Metadata for table_a should not exist"); | ||
|
|
||
| database.drop(); |
There was a problem hiding this comment.
suggestion (testing): Consider extending this test to assert behavior of getTable after stale metadata cleanup
Since this test already validates getAllTables after a direct rename leaves _schema stale, you can further improve coverage by asserting session.getTable behavior once getAllTables has run: verify session.getTable(..., "table_b") returns a non-null MongoTable, and that session.getTable(..., "table_a") throws TableNotFoundException. This would confirm the metadata access path is also consistent after stale entries are purged.
Suggested implementation:
// Create _schema metadata for table_a
MongoCollection<Document> schemaCollection = database.getCollection("_schema");
// Trigger a metadata scan to ensure stale entries are cleaned up
session.getAllTables();
// After stale metadata cleanup, getTable should expose only the renamed table
SchemaTableName renamedTable = new SchemaTableName(schemaName, "table_b");
SchemaTableName oldTable = new SchemaTableName(schemaName, "table_a");
assertNotNull(
session.getTable(renamedTable),
"Renamed table_b should be accessible after stale metadata cleanup");
assertThrows(
TableNotFoundException.class,
() -> session.getTable(oldTable),
"Original table_a metadata should be purged and not accessible");To make this compile and align with the existing codebase, check and adjust the following:
-
Session usage
- Ensure there is a
MongoSession sessioninstance available in this test class (most likely as a field initialized in a setup method). - If
getAllTableshas a different signature (e.g., requires aConnectorSessionor returns tables for a specific schema), replacesession.getAllTables()with the correct call used elsewhere in this test class.
- Ensure there is a
-
getTable signature
- If
getTabletakes aSchemaTableNameonly (common in Presto), the snippet is correct. - If it requires a
ConnectorSessionor other parameters, update:session.getTable(renamedTable)session.getTable(oldTable)
to match the existing usage pattern in this file.
- If
-
Imports
Make sure the following imports exist at the top ofTestMongoSession.java(add them if they are missing):import com.facebook.presto.spi.SchemaTableName;import com.facebook.presto.spi.TableNotFoundException;- A static import for
assertThrowsif not already present. In Presto tests this is often:import static io.airlift.testing.Assertions.assertThrows;
Also ensureassertNotNullis available (typicallyimport static org.testng.Assert.assertNotNull;).
-
Duplicate
getAllTablescalls- If this test already calls
session.getAllTables()later to assert stale metadata cleanup, you can:- Remove the new
session.getAllTables();line above and instead place thegetTableassertions immediately after the existinggetAllTablescall.
- Remove the new
- The important part is that the
getTableassertions run after the stale metadata cleanup is triggered.
- If this test already calls
There was a problem hiding this comment.
Sure! I'm generating a new review now.
There was a problem hiding this comment.
Hey @nishithakbhaskaran, I've posted a new review for you!
This PR fixes stale MongoDB table metadata was being returned when listing tables after a table rename, causing SELECT queries to fail on renamed tables
0e92abf to
728e2e3
Compare
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- Changing getTableMetadata from private to public widens the API surface; if this is only needed for tests or internal wiring, consider keeping it package-private or adding a dedicated helper rather than making it public.
- In the new tests, you already import assertNull, so prefer assertNull(...) over assertEquals(..., null) for stale/absent metadata checks to make intent clearer and assertions more idiomatic.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Changing getTableMetadata from private to public widens the API surface; if this is only needed for tests or internal wiring, consider keeping it package-private or adding a dedicated helper rather than making it public.
- In the new tests, you already import assertNull, so prefer assertNull(...) over assertEquals(..., null) for stale/absent metadata checks to make intent clearer and assertions more idiomatic.
## Individual Comments
### Comment 1
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="70-72" />
<code_context>
+ private MongoSession session;
+ private TypeManager typeManager;
+
@Test
public void testBuildQuery()
{
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for schema with only stale metadata and no live collections
The current test mixes live (`active_table`) and stale (`stale_table`) metadata. Please also add a test where the schema has only stale metadata entries and no collections (e.g., all collections dropped out of band), asserting that:
- `getAllTables(schemaName)` returns an empty set, and
- all `_schema` documents for that schema are removed.
This will validate the cleanup logic when every metadata entry is stale.
Suggested implementation:
```java
public class TestMongoSession
{
private static final MongoColumnHandle COL2 = new MongoColumnHandle("col2", createUnboundedVarcharType(), false);
private static final MongoColumnHandle COL3 = new MongoColumnHandle("col3", VARBINARY, false);
private MongoServer server;
private MongoClient client;
private MongoSession session;
private TypeManager typeManager;
@Test
public void testSchemaWithOnlyStaleMetadata()
{
String schemaName = "stale_only_schema";
MongoDatabase database = client.getDatabase(schemaName);
MongoCollection<Document> schemaCollection = database.getCollection("_schema");
// Insert stale metadata for a table that does not have a backing collection
Document staleTable = new Document("schemaName", schemaName)
.append("tableName", "stale_table")
.append("columns", ImmutableList.of(
new Document("name", "col1")
.append("type", "varchar")
.append("ordinalPosition", 0)
.append("hidden", false)));
schemaCollection.insertOne(staleTable);
// Trigger table listing which should clean up stale metadata
assertTrue(session.getAllTables(schemaName).isEmpty());
// Verify that all _schema documents for this schema were removed
assertEquals(0L, schemaCollection.countDocuments(new Document("schemaName", schemaName)));
}
@Test
public void testBuildQuery()
```
To compile successfully and align with the rest of the test class, you may need to ensure:
1. The following imports are present (or equivalent ones, depending on what is already imported):
- `import com.google.common.collect.ImmutableList;`
- `import com.mongodb.client.MongoCollection;`
- `import com.mongodb.client.MongoDatabase;`
- `import org.bson.Document;`
- `import static org.testng.Assert.assertEquals;`
- `import static org.testng.Assert.assertTrue;`
2. The `_schema` collection name and the document field names (`"schemaName"`, `"tableName"`, `"columns"`, etc.) should match the ones used in the existing tests and in `MongoSession`. If the existing tests use different field names or a different schema representation, adjust the inserted `Document` accordingly.
3. If `session.getAllTables(schemaName)` returns a `List` or another collection type rather than a `Set`, the `.isEmpty()` assertion will still work; if it returns something else (e.g., `ConnectorTableMetadata` list), adapt the assertion to the existing tests’ pattern (for example, checking `size()` instead of `isEmpty()`).
</issue_to_address>
### Comment 2
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="225-226" />
<code_context>
+ String schemaName = "test_stale_metadata";
+ MongoDatabase database = client.getDatabase(schemaName);
+
+ // Create a collection with data
+ MongoCollection<Document> collection = database.getCollection("active_table");
+ collection.insertOne(new Document("field1", "value1"));
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for collections that have no `_schema` metadata at all
Current tests cover collections with matching metadata, stale metadata, and renamed collections with stale metadata. Please also add a case where a collection exists in MongoDB but has no `_schema` document (e.g., `collection_only_table`) and assert that `getAllTables` still returns it. This will confirm that relying on `actualCollections` alone doesn’t break discovery of new collections without metadata.
Suggested implementation:
```java
public void testGetAllTablesWithStaleMetadata()
{
String schemaName = "test_stale_metadata";
MongoDatabase database = client.getDatabase(schemaName);
// Create a collection with data and corresponding _schema metadata (active table)
MongoCollection<Document> collection = database.getCollection("active_table");
collection.insertOne(new Document("field1", "value1"));
// Create a collection that has no _schema metadata at all (new collection-only table)
MongoCollection<Document> collectionOnlyTable = database.getCollection("collection_only_table");
collectionOnlyTable.insertOne(new Document("field1", "value1"));
```
To fully implement the suggestion, you should also:
1. Update the assertions in `testGetAllTablesWithStaleMetadata` (where it currently asserts on the result of `getAllTables`) to:
- Expect that `"active_table"` is present.
- Expect that `"collection_only_table"` is also present, even though it has no `_schema` document.
2. If the test currently checks the exact set of tables, add `"collection_only_table"` to that expected set; if it uses `assertTrue`-style membership checks, add an additional assertion for `"collection_only_table"`.
3. Ensure the test setup still includes any stale `_schema` metadata for other collections as originally intended, so this new case is additive rather than replacing the stale metadata scenario.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| public void testBuildQuery() | ||
| { |
There was a problem hiding this comment.
suggestion (testing): Add coverage for schema with only stale metadata and no live collections
The current test mixes live (active_table) and stale (stale_table) metadata. Please also add a test where the schema has only stale metadata entries and no collections (e.g., all collections dropped out of band), asserting that:
getAllTables(schemaName)returns an empty set, and- all
_schemadocuments for that schema are removed.
This will validate the cleanup logic when every metadata entry is stale.
Suggested implementation:
public class TestMongoSession
{
private static final MongoColumnHandle COL2 = new MongoColumnHandle("col2", createUnboundedVarcharType(), false);
private static final MongoColumnHandle COL3 = new MongoColumnHandle("col3", VARBINARY, false);
private MongoServer server;
private MongoClient client;
private MongoSession session;
private TypeManager typeManager;
@Test
public void testSchemaWithOnlyStaleMetadata()
{
String schemaName = "stale_only_schema";
MongoDatabase database = client.getDatabase(schemaName);
MongoCollection<Document> schemaCollection = database.getCollection("_schema");
// Insert stale metadata for a table that does not have a backing collection
Document staleTable = new Document("schemaName", schemaName)
.append("tableName", "stale_table")
.append("columns", ImmutableList.of(
new Document("name", "col1")
.append("type", "varchar")
.append("ordinalPosition", 0)
.append("hidden", false)));
schemaCollection.insertOne(staleTable);
// Trigger table listing which should clean up stale metadata
assertTrue(session.getAllTables(schemaName).isEmpty());
// Verify that all _schema documents for this schema were removed
assertEquals(0L, schemaCollection.countDocuments(new Document("schemaName", schemaName)));
}
@Test
public void testBuildQuery()To compile successfully and align with the rest of the test class, you may need to ensure:
- The following imports are present (or equivalent ones, depending on what is already imported):
import com.google.common.collect.ImmutableList;import com.mongodb.client.MongoCollection;import com.mongodb.client.MongoDatabase;import org.bson.Document;import static org.testng.Assert.assertEquals;import static org.testng.Assert.assertTrue;
-
The
_schemacollection name and the document field names ("schemaName","tableName","columns", etc.) should match the ones used in the existing tests and inMongoSession. If the existing tests use different field names or a different schema representation, adjust the insertedDocumentaccordingly. -
If
session.getAllTables(schemaName)returns aListor another collection type rather than aSet, the.isEmpty()assertion will still work; if it returns something else (e.g.,ConnectorTableMetadatalist), adapt the assertion to the existing tests’ pattern (for example, checkingsize()instead ofisEmpty()).
| // Create a collection with data | ||
| MongoCollection<Document> collection = database.getCollection("active_table"); |
There was a problem hiding this comment.
suggestion (testing): Consider adding a test for collections that have no _schema metadata at all
Current tests cover collections with matching metadata, stale metadata, and renamed collections with stale metadata. Please also add a case where a collection exists in MongoDB but has no _schema document (e.g., collection_only_table) and assert that getAllTables still returns it. This will confirm that relying on actualCollections alone doesn’t break discovery of new collections without metadata.
Suggested implementation:
public void testGetAllTablesWithStaleMetadata()
{
String schemaName = "test_stale_metadata";
MongoDatabase database = client.getDatabase(schemaName);
// Create a collection with data and corresponding _schema metadata (active table)
MongoCollection<Document> collection = database.getCollection("active_table");
collection.insertOne(new Document("field1", "value1"));
// Create a collection that has no _schema metadata at all (new collection-only table)
MongoCollection<Document> collectionOnlyTable = database.getCollection("collection_only_table");
collectionOnlyTable.insertOne(new Document("field1", "value1"));To fully implement the suggestion, you should also:
- Update the assertions in
testGetAllTablesWithStaleMetadata(where it currently asserts on the result ofgetAllTables) to:- Expect that
"active_table"is present. - Expect that
"collection_only_table"is also present, even though it has no_schemadocument.
- Expect that
- If the test currently checks the exact set of tables, add
"collection_only_table"to that expected set; if it usesassertTrue-style membership checks, add an additional assertion for"collection_only_table". - Ensure the test setup still includes any stale
_schemametadata for other collections as originally intended, so this new case is additive rather than replacing the stale metadata scenario.
Happy to! First, looking at the text, all entries must begin with a keyword in the Order of Changes in the Release Notes Guidelines. Would this rephrasing correctly describe your work? Reading the test failure, it appears that the test is not stopping after the closing @yhwang , would you have time to look at this? This isn't the first time I've seen this recently, I encountered this on a PR of mine last week. |
|
(The failing CI check for the PR title should pass when you're ready for this PR to be review and you remove |
|
Created a PR on the other repo to address the issue you found. Wait for the PR, and we probably need to update the dependency version in this repo to use the new code later. |
Description
This PR fixes a critical issue where stale MongoDB table metadata was being returned when listing tables after a table rename, causing SELECT queries to fail on renamed tables.
Motivation and Context
This PR fixes issue #27091
Impact
Test Plan
Rename the table test1234 from MongoDB Backend
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Summary by Sourcery
Clean up stale MongoDB table metadata when listing tables and add coverage for table renames.
Bug Fixes:
Build:
Tests:
Summary by Sourcery
Ensure MongoDB connector only exposes actual collections and cleans up stale metadata when listing tables.
Bug Fixes:
Tests: