
[WIP DNR] fix(connector): Return fresh table metadata when listing tables after collection rename in Mongodb #27383

Open
nishithakbhaskaran wants to merge 2 commits into prestodb:master from nishithakbhaskaran:mongo-issue-fix

Conversation

Contributor

@nishithakbhaskaran nishithakbhaskaran commented Mar 20, 2026

Description

This PR fixes a critical issue where stale MongoDB table metadata was being returned when listing tables after a table rename, causing SELECT queries to fail on renamed tables.

Motivation and Context

This PR fixes issue #27091

Impact

Test Plan

presto> show tables in mongodb.TM_LAKEHOUSE_ENGINE_DB_1;
       Table       
-------------------
 test1234          
 test_t1_testtable 
(2 rows)

Rename the collection test1234 to test12345 directly in the MongoDB backend:

presto> show tables in mongodb.TM_LAKEHOUSE_ENGINE_DB_1 ;
       Table       
-------------------
 test12345         
 test_t1_testtable 
(2 rows)


presto> select * from mongodb.TM_LAKEHOUSE_ENGINE_DB_1.test1234;

Query 20260319_125107_00006_d5g7m, FAILED, 0 nodes
Splits: 0 total, 0 done (0.00%)
[Latency: client-side: 0:01, server-side: 0:01] [0 rows, 0B] [0 rows/s, 0B/s]

Query 20260319_125107_00006_d5g7m failed: Table mongodb.TM_LAKEHOUSE_ENGINE_DB_1.test1234 does not exist

presto> select * from mongodb.TM_LAKEHOUSE_ENGINE_DB_1.test12345;
 name 
------
 abc  
(1 row)

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Return fresh table metadata when listing tables after rename

Summary by Sourcery

Clean up stale MongoDB table metadata when listing tables and add coverage for table renames.

Bug Fixes:

  • Ensure getAllTables returns only existing MongoDB collections by removing stale metadata entries and invalidating related cache entries.

Build:

  • Include the MongoDB plugin module in the default test configuration plugin bundles.

Tests:

  • Add an integration-style test using an in-memory MongoDB server to verify table metadata remains consistent after renaming a collection.

Summary by Sourcery

Ensure MongoDB connector only exposes actual collections and cleans up stale metadata when listing tables.

Bug Fixes:

  • Return only existing MongoDB collections from getAllTables and remove stale table metadata entries while invalidating their cache.

Tests:

  • Add in-memory MongoDB–backed tests to verify getAllTables correctly reflects renamed collections and removes stale metadata, including cache invalidation behavior.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Mar 20, 2026
Contributor

sourcery-ai bot commented Mar 20, 2026

Reviewer's Guide

The MongoDB connector now returns only live collections when listing tables. It reconciles the actual MongoDB collections with the _schema metadata and removes stale metadata (and the corresponding cache entries) on the fly. The PR also adds in-memory MongoDB integration tests that cover table renames and stale-metadata scenarios.

Sequence diagram for MongoDB table listing and stale metadata cleanup

sequenceDiagram
    actor User
    participant PrestoEngine
    participant MongoConnector
    participant MongoSession
    participant MongoDB

    User->>PrestoEngine: SHOW TABLES IN mongodb.schema
    PrestoEngine->>MongoConnector: listTables(schema)
    MongoConnector->>MongoSession: getAllTables(schema)

    MongoSession->>MongoDB: listCollectionNames(schema)
    MongoDB-->>MongoSession: collectionNames
    MongoSession->>MongoSession: filter out schemaCollection and SYSTEM_TABLES
    MongoSession->>MongoSession: actualCollections = filtered names

    MongoSession->>MongoSession: metadataTableNames = getTableMetadataNames(schema)
    MongoSession->>MongoSession: staleMetadataTableNames = metadataTableNames - actualCollections

    loop for each staleTableName
        MongoSession->>MongoSession: schemaTableName = new SchemaTableName(schema, staleTableName)
        MongoSession->>MongoSession: deleteTableMetadata(schemaTableName)
        MongoSession->>MongoSession: tableCache.invalidate(schemaTableName)
    end

    MongoSession-->>MongoConnector: actualCollections
    MongoConnector-->>PrestoEngine: actualCollections
    PrestoEngine-->>User: table list with only live collections
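
The flow in the sequence diagram can be sketched in plain Java. This is a minimal, self-contained illustration and not the connector's code: `StaleMetadataSketch` and its in-memory `metadata` and `cache` maps are hypothetical stand-ins for the `_schema` collection and `tableCache`. Only the set arithmetic mirrors the diagram.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for MongoSession; all names here are illustrative.
final class StaleMetadataSketch
{
    // table name -> metadata document (simplified to a String)
    final Map<String, String> metadata = new HashMap<>();
    // table name -> cached table handle (simplified to a String)
    final Map<String, String> cache = new HashMap<>();

    // Returns only live collections and purges metadata and cache entries
    // whose backing collection no longer exists.
    Set<String> getAllTables(Set<String> liveCollections)
    {
        // staleMetadataTableNames = metadataTableNames - actualCollections
        Set<String> stale = new HashSet<>(metadata.keySet());
        stale.removeAll(liveCollections);

        for (String name : stale) {
            metadata.remove(name); // plays the role of deleteTableMetadata
            cache.remove(name);    // plays the role of tableCache.invalidate
        }
        return new HashSet<>(liveCollections);
    }
}
```

With metadata recorded for `table_a` and a live collection set containing only `table_b` (a rename done directly in MongoDB), the call returns just `table_b` and leaves no `table_a` entry in either map.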

Entity relationship diagram for MongoDB collections and _schema metadata

erDiagram
    SCHEMA ||--o{ COLLECTION : has
    SCHEMA ||--o{ SCHEMA_METADATA : has

    SCHEMA {
        string name
    }

    COLLECTION {
        string name
    }

    SCHEMA_METADATA {
        string table_name
        string other_metadata
    }

    COLLECTION ||--o| SCHEMA_METADATA : is_backed_by

    %% Stale metadata is when SCHEMA_METADATA.table_name has no matching COLLECTION.name

Class diagram for updated MongoSession metadata handling

classDiagram
    class MongoSession {
        - MongoClient client
        - LoadingCache tableCache
        - String schemaCollection
        - Set SYSTEM_TABLES
        + List getAllSchemas()
        + Set getAllTables(String schema)
        + MongoTable getTable(SchemaTableName tableName)
        + Document getTableMetadata(SchemaTableName schemaTableName)
        - Set getTableMetadataNames(String schema)
        - void deleteTableMetadata(SchemaTableName schemaTableName)
    }

    class MongoClient {
        + MongoDatabase getDatabase(String schema)
    }

    class MongoDatabase {
        + MongoIterable listCollectionNames()
    }

    class LoadingCache {
        + void invalidate(SchemaTableName schemaTableName)
    }

    class SchemaTableName {
        + SchemaTableName(String schemaName, String tableName)
        + String getSchemaName()
        + String getTableName()
    }

    class MongoTable
    class Document
    class MongoIterable

    MongoSession --> MongoClient : uses
    MongoSession --> LoadingCache : uses
    MongoSession --> SchemaTableName : constructs
    MongoSession --> MongoTable : returns
    MongoSession --> Document : returns
    MongoClient --> MongoDatabase : returns
    MongoDatabase --> MongoIterable : returns

File-Level Changes

Change: Ensure getAllTables returns only existing MongoDB collections and cleans up stale metadata and cache entries.
Details:
  • Compute the set of actual Mongo collections for a schema, excluding internal and system tables
  • Read table names from the _schema metadata collection and detect entries with no backing collection
  • Delete stale metadata documents from _schema and invalidate corresponding table cache entries
  • Return only the set of actual Mongo collections instead of the union of collections and metadata entries
Files:
  • presto-mongodb/src/main/java/com/facebook/presto/mongodb/MongoSession.java

Change: Expose table metadata lookup for reuse and add integration-style tests using in-memory MongoDB to validate behavior around renames and stale metadata.
Details:
  • Relax visibility of getTableMetadata to public to allow use in new tests or helpers
  • Introduce an in-memory MongoDB server setup/teardown for TestMongoSession
  • Add tests covering getAllTables after renaming a collection with matching metadata updates
  • Add tests verifying stale _schema metadata is cleaned up and not returned by getAllTables, including direct backend renames that leave stale metadata
Files:
  • presto-mongodb/src/main/java/com/facebook/presto/mongodb/MongoSession.java
  • presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java

Possibly linked issues

  • #MongoDB connector shows both old and new table names after collection rename: PR changes getAllTables to return only live MongoDB collections and delete stale metadata, solving the rename listing bug


@nishithakbhaskaran nishithakbhaskaran changed the title from "[WIP DNR] Mongo select stale metadata fix" to "fix: Mongo select stale metadata fix" Mar 25, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "fix: Mongo select stale metadata fix" to "fix: Mongo select stale metadata" Mar 25, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "fix: Mongo select stale metadata" to "fix(mongodb): Select query stale metadata" Mar 25, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "fix(mongodb): Select query stale metadata" to "fix(mongodb): Return fresh table metadata when listing tables after rename" Mar 25, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "fix(mongodb): Return fresh table metadata when listing tables after rename" to "fix(connector): Return fresh table metadata when listing tables after rename in Mongodb" Mar 25, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "fix(connector): Return fresh table metadata when listing tables after rename in Mongodb" to "fix(connector): Select query return fresh table metadata when listing tables after collection rename in Mongodb" Mar 25, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "fix(connector): Select query return fresh table metadata when listing tables after collection rename in Mongodb" to "fix(connector): Return fresh table metadata when listing tables after collection rename in Mongodb" Mar 25, 2026
@nishithakbhaskaran
Contributor Author

@steveburnett Can you help me to fix the release note?

@nishithakbhaskaran nishithakbhaskaran marked this pull request as ready for review March 26, 2026 04:19
@nishithakbhaskaran nishithakbhaskaran requested a review from a team as a code owner March 26, 2026 04:19
@prestodb-ci prestodb-ci requested review from a team, faizdani-ibm and shuangli999 and removed request for a team March 26, 2026 04:19
@nishithakbhaskaran nishithakbhaskaran requested review from agrawalreetika and sumi-mathew and removed request for faizdani-ibm and shuangli999 March 26, 2026 04:20
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • Changing getAllTables to return only actualCollections instead of the union with getTableMetadataNames may drop support for tables that exist only in metadata (e.g., pre-created schema entries); consider whether this behavior change is intentional and, if so, document or guard against cases where metadata-only tables are still needed.
  • Having getAllTables perform side effects like deleting metadata and invalidating tableCache can be surprising for a read-style method; consider moving the cleanup into a dedicated maintenance path or clearly documenting that getAllTables can mutate metadata and cache state.
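
One way to address the second point is to split the work into a side-effect-free query and an explicit cleanup step. The sketch below is an assumption about how that separation could look, not code from this PR; `MetadataMaintenance`, `findStaleTables`, and `purge` are hypothetical names, and the metadata store is modeled as a plain map.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of the MongoDB connector.
final class MetadataMaintenance
{
    private MetadataMaintenance() {}

    // Pure function: reports stale names without touching any state.
    static Set<String> findStaleTables(Set<String> metadataNames, Set<String> liveCollections)
    {
        Set<String> stale = new TreeSet<>(metadataNames);
        stale.removeAll(liveCollections);
        return Collections.unmodifiableSet(stale);
    }

    // Explicit mutation: callers opt in to the cleanup and learn how many
    // entries were actually removed.
    static int purge(Map<String, ?> metadataStore, Set<String> staleNames)
    {
        int removed = 0;
        for (String name : staleNames) {
            if (metadataStore.remove(name) != null) {
                removed++;
            }
        }
        return removed;
    }
}
```

With this shape, a listing call could use `findStaleTables` only to filter its result, while `purge` runs from a maintenance path that is clearly documented as mutating metadata.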
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="presto-mongodb/src/main/java/com/facebook/presto/mongodb/MongoSession.java" line_range="152-163" />
<code_context>
+        Set<String> metadataTableNames = getTableMetadataNames(schema);
+
+        // Find stale metadata entries (metadata exists but collection doesn't) and remove them
+        metadataTableNames.stream()
+                .filter(name -> !actualCollections.contains(name))
+                .forEach(staleTableName -> {
+                    SchemaTableName schemaTableName = new SchemaTableName(schema, staleTableName);
+                    log.debug("Removing stale metadata for table: %s", schemaTableName);
+                    deleteTableMetadata(schemaTableName);
+                    tableCache.invalidate(schemaTableName);
+                });
+        return actualCollections;
</code_context>
<issue_to_address>
**suggestion:** Avoid side effects inside the stream pipeline when deleting metadata and invalidating cache

Using `forEach` in the stream to perform deletions and cache invalidations obscures the side effects and may become problematic if this is ever converted to a parallel stream or extended. Iterating over the filtered stale names with a standard `for`/enhanced `for` loop would make the control flow and side effects clearer and safer.

```suggestion
        Set<String> metadataTableNames = getTableMetadataNames(schema);

        // Find stale metadata entries (metadata exists but collection doesn't) and remove them
        for (String staleTableName : metadataTableNames) {
            if (!actualCollections.contains(staleTableName)) {
                SchemaTableName schemaTableName = new SchemaTableName(schema, staleTableName);
                log.debug("Removing stale metadata for table: %s", schemaTableName);
                deleteTableMetadata(schemaTableName);
                tableCache.invalidate(schemaTableName);
            }
        }
        return actualCollections;
```
</issue_to_address>

### Comment 2
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="64-66" />
<code_context>
+    private MongoSession session;
+    private TypeManager typeManager;
+
     @Test
     public void testBuildQuery()
     {
</code_context>
<issue_to_address>
**issue (testing):** The new test does not exercise the stale metadata cleanup path described in the bug/implementation

In the current test, you rename the collection and also update the `_schema` document from `table_a` to `table_b`, so there is never any stale metadata (`metadataTableNames == actualCollections`). As a result, `deleteTableMetadata` and `tableCache.invalidate` are not exercised, and the test would still pass even if the stale-metadata cleanup were removed.

To align with the real bug scenario, you could instead:
1. Create collection `table_a` and its `_schema` metadata.
2. Optionally call `session.getAllTables(schemaName)` once to warm any caches.
3. Rename `table_a` to `table_b` in Mongo **without** updating `_schema`, leaving stale metadata for `table_a`.
4. Call `session.getAllTables(schemaName)` again and assert:
   * The result includes `table_b` but not `table_a`.
   * `_schema` no longer contains a document for `table_a`.

This ensures the test will fail if the stale-metadata cleanup logic regresses.
</issue_to_address>


Comment on lines 64 to 66
@Test
public void testBuildQuery()
{

Contributor Author

@sourcery-ai review

Contributor

Sure! I'm generating a new review now.

Contributor

Hey @nishithakbhaskaran, I've posted a new review for you!

Contributor Author

@sourcery-ai dismiss

@nishithakbhaskaran nishithakbhaskaran force-pushed the mongo-issue-fix branch 3 times, most recently from 66b12f5 to 3c29b01 Compare March 26, 2026 12:24
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • Changing getAllTables to return only live MongoDB collections means any tables that exist only in metadata (e.g., created ahead of collection creation or representing special tables) will no longer be visible; double-check that this behavior is acceptable for all MongoDB connector use cases and that none of them relies on metadata-only tables.
  • getTableMetadata was made public solely for test usage; consider reducing its visibility (e.g., package-private) and annotating with @VisibleForTesting or exposing test hooks another way to avoid expanding the public surface of MongoSession.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="67-69" />
<code_context>
+    private MongoSession session;
+    private TypeManager typeManager;
+
     @Test
     public void testBuildQuery()
     {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for collections that exist without any `_schema` metadata

The new tests cover renamed collections and stale metadata, but they don’t cover the case where a MongoDB collection exists with no `_schema` entry (e.g., created directly in MongoDB). Please add a test that:

1. Creates a collection (e.g., `no_metadata_table`) without inserting `_schema` metadata.
2. Calls `session.getAllTables(schemaName)`.
3. Asserts that `no_metadata_table` is included in the result.

This will lock in the expectation that table listing is independent of `_schema` metadata.

Suggested implementation:

```java
    @Test
    public void testGetAllTablesIncludesCollectionsWithoutSchemaMetadata()
    {
        String schema = "test_schema";
        String collectionName = "no_metadata_table";

        // Create a collection directly in MongoDB without inserting any _schema metadata
        MongoDatabase database = client.getDatabase(schema);
        MongoCollection<Document> collection = database.getCollection(collectionName);
        collection.insertOne(new Document("col", 1));

        // getAllTables returns the collection names for the schema, so check the name directly
        Set<String> tables = session.getAllTables(schema);

        assertTrue(tables.contains(collectionName),
                "Collections without _schema metadata should still be listed by getAllTables");
    }

    @Test
    public void testBuildQuery()

```

To make this compile and integrate with the existing test class, you will also need to:

1. Add the following imports at the top of `TestMongoSession.java` (adjust if some are already present):
   - `import com.mongodb.client.MongoCollection;`
   - `import com.mongodb.client.MongoDatabase;`
   - `import org.bson.Document;`
   - `import java.util.Set;`
   - `import static org.testng.Assert.assertTrue;`

2. Ensure that `client` and `session` are initialized in the existing `@BeforeMethod` / setup method (this should already be in place for the other tests that use `session.getAllTables`).

3. If the rest of the test class uses a constant or helper for the schema name instead of the string `"test_schema"`, update the new test to reuse that same constant/helper for consistency.
</issue_to_address>

### Comment 2
<location path="presto-mongodb/src/test/java/com/facebook/presto/mongodb/TestMongoSession.java" line_range="255-213" />
<code_context>
+    public void testGetAllTablesWithStaleMetadataAfterDirectRename()
</code_context>
<issue_to_address>
**suggestion (testing):** Consider extending this test to assert behavior of `getTable` after stale metadata cleanup

Since this test already validates `getAllTables` after a direct rename leaves `_schema` stale, you can further improve coverage by asserting `session.getTable` behavior once `getAllTables` has run: verify `session.getTable(..., "table_b")` returns a non-null `MongoTable`, and that `session.getTable(..., "table_a")` throws `TableNotFoundException`. This would confirm the metadata access path is also consistent after stale entries are purged.

Suggested implementation:

```java
        // Create _schema metadata for table_a
        MongoCollection<Document> schemaCollection = database.getCollection("_schema");

        // Trigger a table listing so that stale entries are cleaned up
        session.getAllTables(schemaName);

        // After stale metadata cleanup, getTable should expose only the renamed table
        SchemaTableName renamedTable = new SchemaTableName(schemaName, "table_b");
        SchemaTableName oldTable = new SchemaTableName(schemaName, "table_a");

        assertNotNull(
                session.getTable(renamedTable),
                "Renamed table_b should be accessible after stale metadata cleanup");

        // Original table_a metadata should be purged and no longer accessible
        assertThrows(
                TableNotFoundException.class,
                () -> session.getTable(oldTable));

```

To make this compile and align with the existing codebase, check and adjust the following:

1. **Session usage**
   - Ensure there is a `MongoSession session` instance available in this test class (most likely as a field initialized in a setup method).
   - `getAllTables` is called here with the schema name, as in the other snippets in this review. If its signature differs (e.g., it requires a `ConnectorSession`), replace `session.getAllTables(schemaName)` with the call used elsewhere in this test class.

2. **getTable signature**
   - If `getTable` takes a `SchemaTableName` only (common in Presto), the snippet is correct.
   - If it requires a `ConnectorSession` or other parameters, update:
     - `session.getTable(renamedTable)`
     - `session.getTable(oldTable)`
     to match the existing usage pattern in this file.

3. **Imports**
   Make sure the following imports exist at the top of `TestMongoSession.java` (add them if they are missing):
   - `import com.facebook.presto.spi.SchemaTableName;`
   - `import com.facebook.presto.spi.TableNotFoundException;`
   - A static import for `assertThrows` if not already present. The snippet uses the two-argument TestNG form, so this would be `import static org.testng.Assert.assertThrows;`, assuming the project's TestNG version provides it.
   Also ensure `assertNotNull` is available (typically `import static org.testng.Assert.assertNotNull;`).

4. **Duplicate `getAllTables` calls**
   - If this test already calls `session.getAllTables(schemaName)` later to assert stale metadata cleanup, you can:
     - Remove the new `session.getAllTables(schemaName);` line above and instead place the `getTable` assertions immediately after the existing `getAllTables` call.
   - The important part is that the `getTable` assertions run *after* the stale metadata cleanup is triggered.
</issue_to_address>
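
To illustrate why the `getTable` assertions in Comment 2 matter, the sketch below models a cache sitting in front of a metadata store. It is a simplified assumption and not `MongoSession`: `CachedCatalogSketch` and its fields are hypothetical, table handles are plain strings, and `NoSuchElementException` stands in for `TableNotFoundException`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical model of a cached metadata lookup; names are illustrative.
final class CachedCatalogSketch
{
    final Map<String, String> metadata = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();

    // Loads through the cache. A cached entry is served even if the
    // underlying metadata has since been deleted.
    String getTable(String name)
    {
        String cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        String loaded = metadata.get(name);
        if (loaded == null) {
            throw new NoSuchElementException("Table does not exist: " + name);
        }
        cache.put(name, loaded);
        return loaded;
    }

    // Drops metadata for tables that are no longer live. The flag controls
    // whether the matching cache entries are dropped as well.
    void cleanup(Set<String> liveCollections, boolean invalidateCache)
    {
        metadata.keySet().retainAll(liveCollections);
        if (invalidateCache) {
            cache.keySet().retainAll(liveCollections);
        }
    }
}
```

In this model, a lookup of `table_a` that was cached before the rename keeps succeeding if `cleanup` skips the cache, and only starts failing once the cache entry is invalidated too. That is the behavior the suggested `getTable` assertions would pin down.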


Comment on lines 67 to 69
@Test
public void testBuildQuery()
{

Contributor Author

@sourcery-ai dismiss

Document oldMetadata = schemaCollection.find(new Document("table", "table_a")).first();
assertEquals(oldMetadata, null, "Metadata for table_a should not exist");

database.drop();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider extending this test to assert behavior of getTable after stale metadata cleanup

Since this test already validates getAllTables after a direct rename leaves _schema stale, you can further improve coverage by asserting session.getTable behavior once getAllTables has run: verify session.getTable(..., "table_b") returns a non-null MongoTable, and that session.getTable(..., "table_a") throws TableNotFoundException. This would confirm the metadata access path is also consistent after stale entries are purged.

Suggested implementation:

        //  Create _schema metadata for table_a
        MongoCollection<Document> schemaCollection = database.getCollection("_schema");

        // Trigger a metadata scan to ensure stale entries are cleaned up
        session.getAllTables();

        // After stale metadata cleanup, getTable should expose only the renamed table
        SchemaTableName renamedTable = new SchemaTableName(schemaName, "table_b");
        SchemaTableName oldTable = new SchemaTableName(schemaName, "table_a");

        assertNotNull(
                session.getTable(renamedTable),
                "Renamed table_b should be accessible after stale metadata cleanup");

        assertThrows(
                TableNotFoundException.class,
                () -> session.getTable(oldTable),
                "Original table_a metadata should be purged and not accessible");

To make this compile and align with the existing codebase, check and adjust the following:

  1. Session usage

    • Ensure there is a MongoSession session instance available in this test class (most likely as a field initialized in a setup method).
    • If getAllTables has a different signature (e.g., requires a ConnectorSession or returns tables for a specific schema), replace session.getAllTables() with the correct call used elsewhere in this test class.
  2. getTable signature

    • If getTable takes a SchemaTableName only (common in Presto), the snippet is correct.
    • If it requires a ConnectorSession or other parameters, update:
      • session.getTable(renamedTable)
      • session.getTable(oldTable)
        to match the existing usage pattern in this file.
  3. Imports
    Make sure the following imports exist at the top of TestMongoSession.java (add them if they are missing):

    • import com.facebook.presto.spi.SchemaTableName;
    • import com.facebook.presto.spi.TableNotFoundException;
    • A static import for assertThrows if not already present. In Presto tests this is often:
      • import static io.airlift.testing.Assertions.assertThrows;
        Also ensure assertNotNull is available (typically import static org.testng.Assert.assertNotNull;).
  4. Duplicate getAllTables calls

    • If this test already calls session.getAllTables() later to assert stale metadata cleanup, you can:
      • Remove the new session.getAllTables(); line above and instead place the getTable assertions immediately after the existing getAllTables call.
    • The important part is that the getTable assertions run after the stale metadata cleanup is triggered.
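
The behavior this suggestion wants to pin down can be sketched without a MongoDB server. The class below is a hypothetical in-memory stand-in, not the real MongoSession: `StaleMetadataSketch`, `renameOutOfBand`, and `hasMetadata` are illustrative names, and `NoSuchElementException` stands in for `TableNotFoundException`. It assumes that listing tables purges `_schema` entries with no backing collection, and that `getTable` generates metadata for a live collection that has none.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for one MongoDB database and its _schema
// collection. It is NOT the real MongoSession; names and behavior are assumed.
final class StaleMetadataSketch
{
    private final Set<String> collections = new HashSet<>();        // live collections
    private final Map<String, String> schemaDocs = new HashMap<>(); // table name -> metadata

    // Creates a collection together with its _schema entry
    void createTable(String name)
    {
        collections.add(name);
        schemaDocs.put(name, "columns-of-" + name);
    }

    // Simulates a rename done directly in MongoDB: _schema is left untouched
    void renameOutOfBand(String from, String to)
    {
        if (collections.remove(from)) {
            collections.add(to);
        }
    }

    // Lists live collections and purges _schema entries with no backing collection
    Set<String> getAllTables()
    {
        schemaDocs.keySet().retainAll(collections);
        return new TreeSet<>(collections);
    }

    // Fails for unknown tables; generates metadata for live tables that lack it
    String getTable(String name)
    {
        if (!collections.contains(name)) {
            throw new NoSuchElementException("Table does not exist: " + name);
        }
        return schemaDocs.computeIfAbsent(name, n -> "generated-columns-of-" + n);
    }

    boolean hasMetadata(String name)
    {
        return schemaDocs.containsKey(name);
    }

    public static void main(String[] args)
    {
        StaleMetadataSketch db = new StaleMetadataSketch();
        db.createTable("table_a");
        db.renameOutOfBand("table_a", "table_b");
        System.out.println("stale entry before listing: " + db.hasMetadata("table_a"));
        System.out.println("tables: " + db.getAllTables());
        System.out.println("stale entry after listing: " + db.hasMetadata("table_a"));
        System.out.println("table_b metadata: " + db.getTable("table_b"));
    }
}
```

In the real test the same sequence would run against `session` and the `_schema` collection; the sketch only shows the ordering the assertions depend on, with listing first and lookups afterwards.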

Contributor Author

@sourcery-ai review

Contributor

Sure! I'm generating a new review now.

Contributor

Hey @nishithakbhaskaran, I've posted a new review for you!

Contributor Author

@sourcery-ai dismiss

This PR fixes stale MongoDB table metadata being returned when listing tables after a table rename,
which caused SELECT queries to fail on renamed tables.
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • Changing getTableMetadata from private to public widens the API surface; if this is only needed for tests or internal wiring, consider keeping it package-private or adding a dedicated helper rather than making it public.
  • In the new tests, you already import assertNull, so prefer assertNull(...) over assertEquals(..., null) for stale/absent metadata checks to make intent clearer and assertions more idiomatic.


Comment on lines 70 to 72
@Test
public void testBuildQuery()
{
Contributor

suggestion (testing): Add coverage for schema with only stale metadata and no live collections

The current test mixes live (active_table) and stale (stale_table) metadata. Please also add a test where the schema has only stale metadata entries and no collections (e.g., all collections dropped out of band), asserting that:

  • getAllTables(schemaName) returns an empty set, and
  • all _schema documents for that schema are removed.
    This will validate the cleanup logic when every metadata entry is stale.

Suggested implementation:

public class TestMongoSession
{
    private static final MongoColumnHandle COL2 = new MongoColumnHandle("col2", createUnboundedVarcharType(), false);
    private static final MongoColumnHandle COL3 = new MongoColumnHandle("col3", VARBINARY, false);

    private MongoServer server;
    private MongoClient client;
    private MongoSession session;
    private TypeManager typeManager;

    @Test
    public void testSchemaWithOnlyStaleMetadata()
    {
        String schemaName = "stale_only_schema";

        MongoDatabase database = client.getDatabase(schemaName);
        MongoCollection<Document> schemaCollection = database.getCollection("_schema");

        // Insert stale metadata for a table that does not have a backing collection
        Document staleTable = new Document("schemaName", schemaName)
                .append("tableName", "stale_table")
                .append("columns", ImmutableList.of(
                        new Document("name", "col1")
                                .append("type", "varchar")
                                .append("ordinalPosition", 0)
                                .append("hidden", false)));

        schemaCollection.insertOne(staleTable);

        // Trigger table listing which should clean up stale metadata
        assertTrue(session.getAllTables(schemaName).isEmpty());

        // Verify that all _schema documents for this schema were removed
        assertEquals(0L, schemaCollection.countDocuments(new Document("schemaName", schemaName)));
    }

    @Test
    public void testBuildQuery()

To compile successfully and align with the rest of the test class, you may need to ensure:

  1. The following imports are present (or equivalent ones, depending on what is already imported):

    • import com.google.common.collect.ImmutableList;
    • import com.mongodb.client.MongoCollection;
    • import com.mongodb.client.MongoDatabase;
    • import org.bson.Document;
    • import static org.testng.Assert.assertEquals;
    • import static org.testng.Assert.assertTrue;
  2. The _schema collection name and the document field names ("schemaName", "tableName", "columns", etc.) should match the ones used in the existing tests and in MongoSession. If the existing tests use different field names or a different schema representation, adjust the inserted Document accordingly.

  3. If session.getAllTables(schemaName) returns a List or another collection type rather than a Set, the .isEmpty() assertion will still work; if it returns something else (e.g., ConnectorTableMetadata list), adapt the assertion to the existing tests’ pattern (for example, checking size() instead of isEmpty()).
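
The all-stale case in this comment reduces to a set difference: every table name that has a `_schema` entry but no backing collection is a candidate for purging. A minimal sketch, assuming the cleanup works that way (`StaleOnlySchemaSketch` and `staleTables` are hypothetical names, not part of MongoSession):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the real cleanup logic in MongoSession may differ.
final class StaleOnlySchemaSketch
{
    private StaleOnlySchemaSketch() {}

    // Names that have a _schema entry but no backing collection
    static Set<String> staleTables(Set<String> metadataTables, Set<String> actualCollections)
    {
        Set<String> stale = new TreeSet<>(metadataTables);
        stale.removeAll(actualCollections);
        return stale;
    }

    public static void main(String[] args)
    {
        Set<String> metadata = new TreeSet<>(Arrays.asList("stale_table", "dropped_table"));
        Set<String> noCollections = Collections.emptySet();
        // With no live collections, every metadata entry is reported as stale
        System.out.println("to purge: " + staleTables(metadata, noCollections));
    }
}
```

When the set of live collections is empty, the difference equals the full metadata set, which is the boundary condition the proposed test would exercise.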

Contributor Author

@sourcery-ai dismiss

Comment on lines +225 to +226
// Create a collection with data
MongoCollection<Document> collection = database.getCollection("active_table");
Contributor

suggestion (testing): Consider adding a test for collections that have no _schema metadata at all

Current tests cover collections with matching metadata, stale metadata, and renamed collections with stale metadata. Please also add a case where a collection exists in MongoDB but has no _schema document (e.g., collection_only_table) and assert that getAllTables still returns it. This will confirm that relying on actualCollections alone doesn’t break discovery of new collections without metadata.

Suggested implementation:

    public void testGetAllTablesWithStaleMetadata()
    {
        String schemaName = "test_stale_metadata";
        MongoDatabase database = client.getDatabase(schemaName);

        // Create a collection with data and corresponding _schema metadata (active table)
        MongoCollection<Document> collection = database.getCollection("active_table");
        collection.insertOne(new Document("field1", "value1"));

        // Create a collection that has no _schema metadata at all (new collection-only table)
        MongoCollection<Document> collectionOnlyTable = database.getCollection("collection_only_table");
        collectionOnlyTable.insertOne(new Document("field1", "value1"));

To fully implement the suggestion, you should also:

  1. Update the assertions in testGetAllTablesWithStaleMetadata (where it currently asserts on the result of getAllTables) to:
    • Expect that "active_table" is present.
    • Expect that "collection_only_table" is also present, even though it has no _schema document.
  2. If the test currently checks the exact set of tables, add "collection_only_table" to that expected set; if it uses assertTrue-style membership checks, add an additional assertion for "collection_only_table".
  3. Ensure the test setup still includes any stale _schema metadata for other collections as originally intended, so this new case is additive rather than replacing the stale metadata scenario.
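
The discovery property this comment asks about can be illustrated with plain collections. The sketch below assumes the table list is derived from the live collection listing alone, with the internal `_schema` collection filtered out; `CollectionOnlyTableSketch`, `listTables`, and `withoutMetadata` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not the real MongoSession implementation.
final class CollectionOnlyTableSketch
{
    private static final String SCHEMA_COLLECTION = "_schema"; // assumed internal name

    private CollectionOnlyTableSketch() {}

    // Tables come from the live collection listing only; metadata is not consulted
    static Set<String> listTables(List<String> collectionNames)
    {
        Set<String> tables = new TreeSet<>(collectionNames);
        tables.remove(SCHEMA_COLLECTION);
        return tables;
    }

    // Listed tables that do not have a _schema document yet
    static Set<String> withoutMetadata(Set<String> tables, Set<String> metadataTables)
    {
        Set<String> missing = new TreeSet<>(tables);
        missing.removeAll(metadataTables);
        return missing;
    }

    public static void main(String[] args)
    {
        List<String> collections = Arrays.asList("active_table", "collection_only_table", "_schema");
        Set<String> metadata = new TreeSet<>(Arrays.asList("active_table"));
        Set<String> tables = listTables(collections);
        System.out.println("listed: " + tables);
        System.out.println("no metadata yet: " + withoutMetadata(tables, metadata));
    }
}
```

Because the listing never looks at metadata, a collection with no `_schema` document is still returned; the second helper only reports which listed tables would need metadata generated on first access.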

Contributor Author

@sourcery-ai dismiss

@nishithakbhaskaran changed the title from "fix(connector): Return fresh table metadata when listing tables after collection rename in Mongodb" to "[WIP DNR]fix(connector): Return fresh table metadata when listing tables after collection rename in Mongodb" on Mar 26, 2026
@steveburnett
Contributor

@steveburnett Can you help me to fix the release note?

Happy to!

First, looking at the text, all entries must begin with a keyword in the Order of Changes in the Release Notes Guidelines. Would this rephrasing correctly describe your work?

== RELEASE NOTES ==

General Changes
* Fix listing tables after rename to return fresh table metadata.

Reading the test failure, it appears that the test is not stopping after the closing ``` .

2026-03-26T13:56:02.201Z	ERROR	main	com.facebook.presto.release.tasks.GenerateReleaseNotesTask	Bad release notes for PR #0: expect section header, found [Summary by Sourcery]

@yhwang, would you have time to look at this? This isn't the first time I've seen this recently; I encountered it on a PR of mine last week.
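
The failure mode described here, parsing that runs past the closing fence into text such as "Summary by Sourcery", can be illustrated with a small extractor. This is a hypothetical sketch and not the code of GenerateReleaseNotesTask: the class and method names are invented, and it assumes the release notes sit between the `== RELEASE NOTES ==` header and a closing line of three backticks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical extractor; the real release-notes tooling may work differently.
final class ReleaseNotesBlockSketch
{
    private static final String HEADER = "== RELEASE NOTES ==";
    private static final String FENCE = "\u0060\u0060\u0060"; // three backticks

    private ReleaseNotesBlockSketch() {}

    // Returns the non-blank lines between the header and the closing fence
    static List<String> extract(List<String> descriptionLines)
    {
        List<String> notes = new ArrayList<>();
        boolean inBlock = false;
        for (String line : descriptionLines) {
            String trimmed = line.trim();
            if (!inBlock) {
                inBlock = trimmed.equals(HEADER);
                continue;
            }
            if (trimmed.equals(FENCE)) {
                break; // stop at the closing fence so trailing text is never parsed
            }
            if (!trimmed.isEmpty()) {
                notes.add(trimmed);
            }
        }
        return notes;
    }

    public static void main(String[] args)
    {
        List<String> description = Arrays.asList(
                FENCE,
                HEADER,
                "",
                "General Changes",
                "* Fix listing tables after rename to return fresh table metadata.",
                FENCE,
                "",
                "Summary by Sourcery");
        System.out.println(extract(description));
    }
}
```

Stopping at the fence means anything a bot appends after the release-notes block is ignored rather than being checked against the expected section headers.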

@steveburnett
Contributor

(The failing CI check for the PR title should pass once this PR is ready for review and you remove [WIP DNR] from the title.)

@yhwang
Member

yhwang commented Mar 26, 2026

Hi @steveburnett

Created a PR on the other repo to address the issue you found. Wait for the PR, and we probably need to update the dependency version in this repo to use the new code later.

@steveburnett
Contributor

Hi @steveburnett

Created a PR on the other repo to address the issue you found. Wait for the PR, and we probably need to update the dependency version in this repo to use the new code later.

Thank you @yhwang!


Labels

from:IBM PR from IBM


4 participants