Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Postgres so that similaritySearchVectorWithScore actually connects via TLS/SSL when specified #1677

Merged
merged 5 commits into from
Feb 10, 2024

Conversation

dkindlund
Copy link
Contributor

Hey @HenryHengZJ, as a follow up to this comment:
#1107 (comment)

The core bug currently is:

  • If you configure a Postgres DB to ONLY accept remote TLS/SSL connections
  • You specify similar additionalConfig settings (see below)
  • Then the node will successfully upsert documents into Postgres via TLS/SSL
  • BUT, when you attempt to use the node's similaritySearchVectorWithScore function for comparisons, those connections will fail because the node attempts to connect to the Postgres DB in the clear (not using TLS/SSL) -- even though the settings were specified in additionalConfig

Here's a PR that covers the bugfix for getting the Postgres vectorstore node to actually talk to the DB via TLS/SSL all the time. Previously, if you specified the additionalConfig of:

{  
  "ssl": true,  
  "extra": {  
    "ssl": {  
      "rejectUnauthorized": false,  
      "ca": "<your_self_signed_cert_name>"  
    }  
  }  
}  

While those settings were getting correctly used by the postgresConnectionOptions for record upserts, NONE of the TLS/SSL settings were actually getting propagated down to the poolOptions parameter in the similaritySearchVectorWithScore function.

This PR fixes that mismatch. I've been testing this for a couple of days now and it seems to work for me. Let me know if you have any additional questions.

…archVectorWithScore function actually connects to Postgres via TLS/SSL when specified in the additionalConfig.
@@ -253,7 +253,8 @@ const similaritySearchVectorWithScore = async (
port: postgresConnectionOptions.port,
user: postgresConnectionOptions.username,
password: postgresConnectionOptions.password,
database: postgresConnectionOptions.database
database: postgresConnectionOptions.database,
ssl: postgresConnectionOptions.extra?.ssl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the ... additionalConfig? Instead of just ssl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryHengZJ , so additionalConfiguration isn't accessible within the similaritySearchVectorWithScore function, because it was never passed nodeData.

I understand that you're trying to make poolOptions appear similar in function to postgresConnectionOptions, but it is hard to recreate that because the function doesn't have all the inputs it needs to access additionalConfiguration.

I'm open to an alternative solution here, but could you give me a clearer idea of what you're proposing instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative would be to structure the poolOptions declaration like this, instead:

const poolOptions = {  
    ...postgresConnectionOptions,  
    host: postgresConnectionOptions.host,  
    port: postgresConnectionOptions.port,  
    user: postgresConnectionOptions.username,  
    password: postgresConnectionOptions.password,  
    database: postgresConnectionOptions.database,  
    ssl: postgresConnectionOptions.extra?.ssl || postgresConnectionOptions.ssl,  
}  

^ @HenryHengZJ , does this work for you and satisfy your original intent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm im not sure if this works, but can we just get rid of the poolOptions and just do const pool = new Pool(postgresConnectionOptions) ? because postgresConnectionOptions already contains all the configs, so we don't have to redeclare the options again with poolOptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryHengZJ , sure, but there's one catch with that approach: pg Pool expects all the ssl settings to be nested within the ssl key and not under extra.ssl. Here's some example code that illustrates how to do this:

let postgresConnectionOptions = {  
    host: 'localhost',  
    port: 5432,  
    user: 'youruser',  
    password: 'yourpassword',  
    database: 'yourdatabase',  
    ssl: true,  
    extra: {  
        ssl: {  
            rejectUnauthorized: false,  
            ca: "<your_self_signed_cert_name>"  
        }  
    }  
}  
  
if (postgresConnectionOptions.extra?.ssl) {  
    postgresConnectionOptions.ssl = {  
        ...postgresConnectionOptions.ssl,  
        ...postgresConnectionOptions.extra.ssl  
    };  
    delete postgresConnectionOptions.extra;  
}  
  
const pool = new Pool(postgresConnectionOptions);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryHengZJ , that works. To be crystal clear, would the new logic expect additionalConfig to look like:

Keeping the extra field (where we still have to deal with cleaning this up before passing it through to pg):

{  
  "ssl": true,  
  "extra": {  
    "ssl": {  
      "rejectUnauthorized": false,  
      "ca": "<your_self_signed_cert_name>"  
    }  
  }  
}  

OR would it look like this, in not keeping the extra field (and we pass it straight through to pg):

{  
  "ssl": true  
} 

{  
  "ssl": {  
    "rejectUnauthorized": false,  
    "ca": "<your_self_signed_cert_name>"  
  }  
}   

What are you thinking here, @HenryHengZJ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be up to user responsibility to add the extra field in the additionalConfig when needed. The SSL thing is tricky, some works with extra, some doesnt, some with boolean, some with cert. So the best way is to let user specify on their own and we just parsed the JSON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryHengZJ , okay, sounds good. Shall I proceed with the refactor on this PR, then?

All of these changes will be Postgres-specific and not apply to any other vector stores, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryHengZJ , okay, I'm testing my fix and will update the PR once all tests have passed.

Keep in mind, this change will break any existing Flowise deployments that previously had ssl.extra.ssl settings defined in their additionalConfig. The workaround is to collapse their settings from ssl.extra.ssl to just ssl, accordingly.

You may want to put that guidance in the upcoming release notes, once this PR has been merged. FYI.

… consistency in how every function connects with Postgres in the same manner.
…Pool expects it to be 'user'. Need to populate both to avoid error messages.
…atflows that used TLS/SSL. This will help users remind them to upgrade their corresponding nodes.
@dkindlund
Copy link
Contributor Author

Okay, @HenryHengZJ , the fixes appear to pass all my local tests -- both upserts and vector search is working correctly with TLS/SSL where I now have the following additionalConfig specified:

{  
  "ssl": {  
    "rejectUnauthorized": false,  
    "ca": "<your_self_signed_cert_name>"  
  }  
}   

I did bump the version of this node, because it will help remind users that they need to upgrade their node -- especially if they had TLS/SSL enabled.

Again, I'd recommend adding something like the following to the upcoming release nodes that will contain this PR:

Postgres Vector Store Changes (v3.0):

  • Removed the SSL Connection toggle.
  • Instead, if you want TLS/SSL enabled, you will need to explicitly add a directive like { ssl: true } to your Additional Configuration parameter, instead.
  • Previously, vector search via TLS/SSL was broken. Now, this node supports upserts and vector search via TLS/SSL to Postgres where the DB has self-signed certificates.

Let me know if you have any other comments or questions about these fixes.

@HenryHengZJ
Copy link
Contributor

yep we definitely need to document it down to the docs page. thanks @dkindlund !

@HenryHengZJ HenryHengZJ merged commit 55dad31 into FlowiseAI:main Feb 10, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants