feat(source-hubspot): Migrate deals stream to Low Code #59127

tolik0 · 2025-04-28T17:41:11Z

What

Migrate deals stream to low code.

Resolves: https://github.com/airbytehq/airbyte-internal-issues/issues/12485

How

Review guide

User Impact

Can this PR be safely reverted and rolled back?

YES 💚
NO ❌

vercel · 2025-04-28T17:41:16Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
airbyte-docs	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 2, 2025 4:17am

tolik0 · 2025-04-30T14:46:22Z

/format-fix

Format-fix job started... Check job output.

✅ Changes applied successfully. (d6a5dd5)

aldogonzalez8 · 2025-04-30T15:45:37Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/source.py

@@ -78,6 +78,7 @@
 scopes = {
    "email_subscriptions": {"content"},
    "marketing_emails": {"content"},
+    "deals": {"contacts", "crm.objects.deals.read"},


Now that the Deals class is 🪓, I think you can remove it from the import section.

tolik0 · 2025-04-30T16:10:17Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/components.py

+    parameters: Mapping[str, Any] = {}
+
+    access_token = config["credentials"]["access_token"]
+    authenticator = BearerAuthenticator(


I forgot to add SelectiveAuthenticator here; I will update the PR.
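Some context on why SelectiveAuthenticator is needed: reading config["credentials"]["access_token"] unconditionally only covers one credentials type. Below is a minimal plain-Python sketch of choosing an authenticator from a value in the config. The credentials_title values, the header shapes, and every function name are assumptions for illustration, not the CDK's SelectiveAuthenticator interface.

```python
from typing import Any, Callable, Dict, Mapping, Sequence

Headers = Dict[str, str]


def private_app_headers(credentials: Mapping[str, Any]) -> Headers:
    # Assumed: private-app credentials carry a static access token.
    return {"Authorization": f"Bearer {credentials['access_token']}"}


def oauth_headers(credentials: Mapping[str, Any]) -> Headers:
    # Assumed: OAuth credentials carry a refresh token instead of an access token.
    # A real OAuth authenticator would exchange it for a short-lived access token;
    # that network call is deliberately left out of this sketch.
    return {"Authorization": "Bearer <access token refreshed via OAuth>"}


# Hypothetical selection table keyed by the value found at the selection path.
AUTHENTICATORS: Dict[str, Callable[[Mapping[str, Any]], Headers]] = {
    "Private App Credentials": private_app_headers,
    "OAuth Credentials": oauth_headers,
}


def select_auth_headers(
    config: Mapping[str, Any],
    selection_path: Sequence[str] = ("credentials", "credentials_title"),
) -> Headers:
    """Follow selection_path through the config and delegate to the matching authenticator."""
    node: Any = config
    for key in selection_path:
        node = node[key]
    factory = AUTHENTICATORS.get(node)
    if factory is None:
        raise ValueError(f"Unsupported credentials type: {node!r}")
    return factory(config["credentials"])
```

The point of the dispatch is that the component never assumes which credential keys exist; it reads a discriminator first and only then touches type-specific fields.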

brianjlai

This is really, really good work so far! I'm super hyped to see how such a complex stream can make use of so many low-code concepts we already have, like selective streams, grouping, etc. We have so many specialized features in the CDK, and it's cool to see that we did end up needing them to migrate Hubspot.

airbyte-integrations/connectors/source-hubspot/source_hubspot/streams.py

brianjlai · 2025-04-30T22:00:45Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/manifest.yaml

+                type: DpathExtractor
+                field_path: []
+        request_body_json:
+          limit: 10


This should be 100, right? In the existing CRMSearchStream._process_search we hardcode this to 100.
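For reference, a sketch of a search request body with the page size pinned to 100, matching the value hardcoded in CRMSearchStream._process_search. The payload keys (filterGroups, sorts, properties, limit, after) follow HubSpot's public CRM search API as I understand it; the helper name and the example cursor field are assumptions, not taken from this PR's manifest.

```python
from typing import Any, Dict, List, Optional

# Page size mirrored from the hardcoded value in CRMSearchStream._process_search.
SEARCH_PAGE_SIZE = 100


def build_search_body(
    cursor_field: str,
    start_ms: int,
    properties: List[str],
    after: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a CRM search payload that filters and sorts on the cursor field."""
    body: Dict[str, Any] = {
        "filterGroups": [
            {"filters": [{"propertyName": cursor_field, "operator": "GTE", "value": str(start_ms)}]}
        ],
        "sorts": [{"propertyName": cursor_field, "direction": "ASCENDING"}],
        "properties": properties,
        "limit": SEARCH_PAGE_SIZE,
    }
    if after is not None:
        # Paging cursor returned by the previous search response.
        body["after"] = after
    return body
```

Keeping the page size in one named constant makes a leftover test value like 10 easier to spot in review.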

brianjlai · 2025-04-30T22:06:14Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/manifest.yaml

+        pagination_strategy:
+          type: CustomPaginationStrategy
+          class_name: source_hubspot.components.HubspotCRMSearchPaginationStrategy
+          page_size: 10


Similar question, why is page_size 10?

Forgot to change it back after testing.
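The excerpt doesn't show how HubspotCRMSearchPaginationStrategy works internally, so this is only a generic sketch of after-cursor pagination with a fixed page size. It assumes search responses carry the next cursor at paging.next.after, and fetch_page is a hypothetical callable that performs one request.

```python
from typing import Any, Callable, Dict, Iterator, Mapping, Optional

PAGE_SIZE = 100


def next_after_token(response_body: Mapping[str, Any]) -> Optional[str]:
    """Return the cursor for the next page, or None when no further page is reported."""
    paging = response_body.get("paging") or {}
    return (paging.get("next") or {}).get("after")


def read_all(
    fetch_page: Callable[[Optional[str], int], Mapping[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Request pages of PAGE_SIZE records until the response stops returning a cursor."""
    after: Optional[str] = None
    while True:
        body = fetch_page(after, PAGE_SIZE)
        yield from body.get("results", [])
        after = next_after_token(body)
        if after is None:
            break
```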

brianjlai · 2025-04-30T22:18:22Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/components.py

@@ -67,3 +100,183 @@ def migrate(self, stream_state: Mapping[str, Any]) -> Mapping[str, Any]:

    def should_migrate(self, stream_state: Mapping[str, Any]) -> bool:
        return stream_state.get(self.cursor_field) == ""
+
+
+class HubspotAssociationsTransformation(RecordTransformation):


nit: Can you add a comment about why we need the custom component? Just mention that we flatten the associations stored in the record and why DpathFlattenFields isn't enough.

Also, can you rename this to HubspotFlattenAssociationsTransformation? It reads better from the manifest and makes the purpose of the transformation clear.
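To illustrate what such a comment would describe, here is a minimal sketch of flattening associations. It assumes each record carries an associations object shaped like {"contacts": {"results": [{"id": ..., "type": ...}]}}. The function name and record shape are assumptions, not the PR's transformation. The step a plain key-lifting flatten presumably can't express is projecting each list of association objects down to its IDs.

```python
from typing import Any, Dict, List, MutableMapping


def flatten_associations(record: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
    """Replace the nested 'associations' object with one top-level list of IDs per associated type."""
    associations: Dict[str, Any] = record.pop("associations", None) or {}
    for name, association in associations.items():
        # Keep only the associated object IDs; the association "type" is dropped.
        ids: List[str] = [row["id"] for row in association.get("results", [])]
        record[name] = ids
    return record
```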

brianjlai · 2025-04-30T22:25:10Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/components.py

+            yield from records_by_pk.values()
+
+
+def build_associations_retriever(


Instead of writing a method build_associations_retriever, would it be possible to instead have this all defined as an associations_retriever in the manifest.yaml? And then in the above HubspotAssociationsExtractor, we also allow for it to take in another field: associations_retriever: SimpleRetriever.

I'm not strictly opposed to how you have it, but it might be nice to lean more on defining things in manifest when we can instead of this custom flow to invoke the constructors in our custom code. wdyt?

I'm also not a fan of low-code written in Python. However, how would we safely inject the IDs retrieved by the extractor into the request body?

From extract_records(), can we take the identifiers and insert them into _slice under extra_fields (or wherever), and then use them from the HttpRequester:

request_options_provider=InterpolatedRequestOptionsProvider(
    request_body_json={
        "inputs": "{{ [{'id': id} for id in stream_partition.extra_fields['identifiers']] }}",
    },
    config=config,
    parameters=parameters,
)

Something like that? I think it would be nice to avoid instantiating a new retriever/requester every time we call extract_records(), which could be frequent depending on the number of pages we read.
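A plain-Python sketch of the extra_fields idea from the comment above, assuming the identifiers come from the records on the current page and that the associations request body takes the {"inputs": [{"id": ...}]} shape proposed there. Partition is a hypothetical stand-in for a stream slice, not the CDK's slice type, and both helper names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Mapping


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for a stream slice that carries extra, non-partition fields."""

    partition: Mapping[str, Any]
    extra_fields: Mapping[str, Any] = field(default_factory=dict)


def partition_with_identifiers(
    parent: Partition, records: Iterable[Mapping[str, Any]], primary_key: str = "id"
) -> Partition:
    """Copy the slice and attach the primary keys of the current page under extra_fields."""
    ids = [record[primary_key] for record in records]
    return Partition(
        partition=parent.partition,
        extra_fields={**parent.extra_fields, "identifiers": ids},
    )


def associations_request_body(partition: Partition) -> Dict[str, List[Dict[str, Any]]]:
    """Build the request body from the identifiers stored on the slice."""
    return {"inputs": [{"id": id_} for id_ in partition.extra_fields["identifiers"]]}
```

Because the IDs travel on the slice, one requester can be built once and reused for every page. Only the slice changes between calls.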

brianjlai · 2025-04-30T22:25:54Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/components.py

+            slices = assoc_retriever.stream_slices()
+
+            for _slice in slices:
+                logger.info(f"Reading {_slice} associations of {self.entity_primary_key}")


I know this was previously in our Hubspot Python code, but we probably don't need this clogging up our logs.

Changed to debug log
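Regarding the switch to a debug log: a small sketch of the call with %-style arguments. The message is then only formatted when the record is actually emitted, whereas the original f-string is built even when DEBUG is disabled. The logger and function names are hypothetical; only the message text comes from the diff above.

```python
import logging

logger = logging.getLogger("source_hubspot.sketch")  # hypothetical logger name


def log_association_slice(_slice: object, entity_primary_key: str) -> None:
    # %-style arguments defer string formatting until a handler emits the record.
    logger.debug("Reading %s associations of %s", _slice, entity_primary_key)
```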

tolik0 · 2025-05-01T15:02:21Z

/format-fix

Format-fix job started... Check job output.

✅ Changes applied successfully. (43d594b)

brianjlai

I noticed that we might not be generating an accurate catalog for our various entity-related Hubspot streams. We probably need to solve this before we move forward.

Basically, for streams like deals, companies, contacts, etc., we don't rely on a static schema but on a dynamic schema built from the customer-specific properties of their unique implementation. A schema contains a map of properties fields like hs_is_in_first_deal_stage and its flattened counterpart properties_hs_is_in_first_deal_stage. To make things a little more complicated:

- Some of the schema fields are static.
- We dynamically get the properties fields from the properties endpoint.
- We remap the Hubspot types back to types the Airbyte protocol understands.

I think this originally went unnoticed because our deals stream already had quite a few of these extra properties key/values, but after inspecting them, the schemas from the low-code migration here and from latest master don't match up.

So I don't think we can use the InlineSchemaLoader like we have been for our non-entity streams. I'm gonna try spiking out whether the DynamicSchemaLoader has all the features we need to generate these schemas.
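A rough sketch of the schema assembly described above, under stated assumptions. The property list is taken to be the results from the properties endpoint, each item with a name and a type. The HubSpot-to-JSON-Schema type table is illustrative, not copied from the connector. The function merges the static fields, nests each property under properties, and adds the flattened properties_<name> copy. build_schema and HUBSPOT_TO_JSON_SCHEMA are hypothetical names, not DynamicSchemaLoader configuration.

```python
from typing import Any, Dict, Iterable, Mapping

# Illustrative mapping from HubSpot property types to nullable JSON Schema types.
HUBSPOT_TO_JSON_SCHEMA: Dict[str, Dict[str, Any]] = {
    "bool": {"type": ["null", "boolean"]},
    "number": {"type": ["null", "number"]},
    "date": {"type": ["null", "string"], "format": "date"},
    "datetime": {"type": ["null", "string"], "format": "date-time"},
    "string": {"type": ["null", "string"]},
    "enumeration": {"type": ["null", "string"]},
    "phone_number": {"type": ["null", "string"]},
}
FALLBACK: Dict[str, Any] = {"type": ["null", "string"]}


def build_schema(
    static_fields: Mapping[str, Any], properties: Iterable[Mapping[str, Any]]
) -> Dict[str, Any]:
    """Combine static fields with nested and flattened copies of each dynamic property."""
    nested: Dict[str, Any] = {}
    flattened: Dict[str, Any] = {}
    for prop in properties:
        field_schema = dict(HUBSPOT_TO_JSON_SCHEMA.get(prop["type"], FALLBACK))
        nested[prop["name"]] = field_schema
        flattened[f"properties_{prop['name']}"] = dict(field_schema)
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": ["null", "object"],
        "additionalProperties": True,
        "properties": {
            **static_fields,
            "properties": {"type": ["null", "object"], "properties": nested},
            **flattened,
        },
    }
```

Whatever loader ends up being used, it needs the same three ingredients: the static part, the fetched property list, and a type remapping step.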

brianjlai · 2025-05-01T21:33:37Z

airbyte-integrations/connectors/source-hubspot/source_hubspot/components.py

+            yield from records_by_pk.values()
+
+
+def build_associations_retriever(


From extract_records(), can we take the identifiers and insert them into _slice under extra_fields (or wherever), and then use them from the HttpRequester:

request_options_provider=InterpolatedRequestOptionsProvider(
    request_body_json={
        "inputs": "{{ [{'id': id} for id in stream_partition.extra_fields['identifiers']] }}",
    },
    config=config,
    parameters=parameters,
)

Something like that? I think it would be nice to avoid instantiating a new retriever/requester every time we call extract_records(), which could be frequent depending on the number of pages we read.

Migrate deals stream without associations in incremental part

80ed797

octavia-squidington-iii added the connectors/source/hubspot label Apr 28, 2025

vercel bot deployed to Preview April 28, 2025 17:46

Add associations transformation

c5fee08

tolik0 self-assigned this Apr 29, 2025

vercel bot deployed to Preview April 29, 2025 16:36

tolik0 added 2 commits April 30, 2025 16:49

Merge remote-tracking branch 'origin' into tolik0/source-hubspot/migr…

f9e3b63

…ate-deals-stream-to-low-code

Add flattening associations transformation

c746194

chore: auto-fix lint and format issues

d6a5dd5

vercel bot deployed to Preview April 30, 2025 14:59

Delete python implementation

e447fc2

tolik0 marked this pull request as ready for review April 30, 2025 15:26

tolik0 requested a review from a team as a code owner April 30, 2025 15:26

tolik0 requested review from darynaishchenko and brianjlai April 30, 2025 15:26

vercel bot deployed to Preview April 30, 2025 15:33

aldogonzalez8 reviewed Apr 30, 2025

Update changelog

3e729d1

vercel bot deployed to Preview April 30, 2025 15:59

tolik0 commented Apr 30, 2025

brianjlai reviewed Apr 30, 2025

Fix issues from review

be1deee

chore: auto-fix lint and format issues

43d594b

vercel bot deployed to Preview May 1, 2025 15:16

brianjlai reviewed May 1, 2025

add dynamic deals_schema_loader to support dynamic properties in deal…

2a428c5

…s low-code stream

vercel bot deployed to Preview May 2, 2025 04:17

brianjlai mentioned this pull request May 2, 2025

feat(source-hubspot): Migrate deals_archived, forms, form_submissions, owners, owners_archived to low-code #58105

Open

2 tasks
