Contributor

@Apmats Apmats commented Oct 8, 2025

Closes https://github.com/elastic/search-team/issues/11093

This PR adds a new data source (connector) for GitLab that mainly fetches project-management entities.
We fetch and generate docs for:

Projects - Repository metadata (name, description, visibility, stars, forks, etc.)
Issues - Project issues with full metadata (assignees, labels, comments, status)
Epics - Group-level epics for tracking larger initiatives (Premium/Ultimate tier)
Merge Requests - Code review requests with discussions, approvals, and reviewers
Releases - Version releases with changelogs, milestones, and asset links
Files - README files (.md, .rst, .txt) from project repositories

Currently this is yielding docs like this:

Epic:
--------------------------------------------------------------------------------
{
  "_id": "epic_testapmats_1",
  "_timestamp": "2025-10-08T13:01:19Z",
  "type": "Epic",
  "group_id": "testapmats",
  "group_path": "testapmats",
  "iid": 1,
  "title": "Nulla tempore voluptatibus error.",
  "description": "### Voluptate\nQui quaerat praesentium. Voluptates temporibus quae. Libero aliquid quod. Nihil rerum earum. Inventore et illum.\n`Quod.`",
  "state": "OPEN",
  "created_at": "2025-10-08T05:25:54Z",
  "updated_at": "2025-10-08T13:01:19Z",
  "closed_at": null,
  "web_url": "https://gitlab.com/groups/testapmats/-/epics/1",
  "author": "apmats",
  "author_name": "Apostolos Matsagkas",
  "assignees": [
    "apmats"
  ],
  "labels": [],
  "children_count": 1,
  "children": [
    {
      "id": "gid://gitlab/WorkItem/174153667",
      "iid": 10,
      "title": "Id tempore minus molestiae."
    }
  ],
  "linked_items_count": 1,
  "linked_items": [
    {
      "link_id": "gid://gitlab/WorkItems::RelatedWorkItemLink/7148349",
      "link_type": "relates_to",
      "link_created_at": "2025-10-08T13:00:55Z",
      "link_updated_at": "2025-10-08T13:00:55Z",
      "work_item_id": "gid://gitlab/WorkItem/174154216",
      "work_item_iid": 28,
      "work_item_title": "Ut quae nesciunt facere."
    }
  ],
  "notes": [
    {
      "id": "gid://gitlab/DiscussionNote/2805807667",
      "body": "### Voluptatem\nQuae sint porro. Error id ut. Molestiae nisi iusto. Mollitia iste saepe. Dolor delectus soluta.\n* Voluptatem. \n* Unde. \n* Deserunt. \n* Sunt. \n* Vel.\n\n *By Administrator on 2021-02-15T15:56:33*",
      "created_at": "2021-02-15T15:56:33Z",
      "updated_at": "2021-02-15T15:56:33Z",
      "system": false,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas"
    },
    {
      "id": "gid://gitlab/Note/2805807678",
      "body": "Hey",
      "created_at": "2025-10-06T08:19:32Z",
      "updated_at": "2025-10-06T08:19:32Z",
      "system": false,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas"
    },
    {
      "id": "gid://gitlab/Note/2805807679",
      "body": "assigned to @apmats",
      "created_at": "2025-10-07T22:35:13Z",
      "updated_at": "2025-10-07T22:35:13Z",
      "system": true,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas"
    }
  ]
}


File:
--------------------------------------------------------------------------------
{
  "_id": "file_74928810_README.md",
  "_timestamp": "2025-10-08T05:25:55Z",
  "type": "File",
  "project_id": 74928810,
  "project_path": "testapmats/connectortestproj",
  "file_path": "README.md",
  "file_name": "README.md",
  "extension": ".md",
  "web_url": "https://gitlab.com/testapmats/connectortestproj/-/blob/master/README.md"
}


Issue:
--------------------------------------------------------------------------------
{
  "_id": "issue_74928810_30",
  "_timestamp": "2025-10-08T05:25:55Z",
  "type": "Issue",
  "project_id": 74928810,
  "project_path": "testapmats/connectortestproj",
  "iid": 30,
  "title": "Nulla tempore voluptatibus error.",
  "description": "### Voluptate\nQui quaerat praesentium. Voluptates temporibus quae. Libero aliquid quod. Nihil rerum earum. Inventore et illum.\n`Quod.`",
  "state": "CLOSED",
  "created_at": "2021-02-15T15:56:04Z",
  "updated_at": "2025-10-08T05:25:55Z",
  "closed_at": "2025-10-08T05:25:55Z",
  "web_url": "https://gitlab.com/testapmats/connectortestproj/-/issues/30",
  "author": "apmats",
  "author_name": "Apostolos Matsagkas",
  "assignees": [
    "apmats"
  ],
  "labels": [
    "et-facere",
    "nisi-et",
    "suscipit-consectetur"
  ],
  "children_count": 0,
  "children": [],
  "linked_items_count": 0,
  "linked_items": [],
  "notes": [
    {
      "id": "gid://gitlab/LabelNote/0dad54c6400422b1e72e91ef6890651b1e2c74af",
      "body": "added ~42915054 ~42915032 ~42915100 labels",
      "created_at": "2021-02-15T15:56:04Z",
      "updated_at": "2021-02-15T15:56:04Z",
      "system": true,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas"
    },
    {
      "id": "gid://gitlab/DiscussionNote/2788636988",
      "body": "### Voluptatem\nQuae sint porro. Error id ut. Molestiae nisi iusto. Mollitia iste saepe. Dolor delectus soluta.\n* Voluptatem. \n* Unde. \n* Deserunt. \n* Sunt. \n* Vel.\n\n *By Administrator on 2021-02-15T15:56:33*",
      "created_at": "2021-02-15T15:56:33Z",
      "updated_at": "2021-02-15T15:56:33Z",
      "system": false,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas"
    },
    {
      "id": "gid://gitlab/Note/2800180491",
      "body": "Hey",
      "created_at": "2025-10-06T08:19:32Z",
      "updated_at": "2025-10-06T08:19:32Z",
      "system": false,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas"
    }
  ]
}


Merge Request:
--------------------------------------------------------------------------------
{
  "_id": "mr_74928810_30",
  "_timestamp": "2025-10-06T13:53:21Z",
  "type": "Merge Request",
  "project_id": 74928810,
  "project_path": "testapmats/connectortestproj",
  "iid": 30,
  "title": "Id blanditiis consequatur ut.",
  "description": "##### Voluptatum\nPorro et quo. Laborum molestias ducimus. Labore dolorum adipisci. Quisquam est quis. Sint accusamus maxime.\n* Veritatis. \n* Eos. \n* Adipisci. \n* Quibusdam. \n* Sint. \n* Consequuntur. \n* Hic. \n* Voluptate. \n* Velit.",
  "state": "opened",
  "created_at": "2021-02-15T15:55:38Z",
  "updated_at": "2025-10-06T13:53:21Z",
  "merged_at": null,
  "closed_at": null,
  "web_url": "https://gitlab.com/testapmats/connectortestproj/-/merge_requests/30",
  "source_branch": "laudantium-unde-et-iste-et",
  "target_branch": "master",
  "author": "apmats",
  "author_name": "Apostolos Matsagkas",
  "assignees": [],
  "reviewers": [],
  "approved_by": [],
  "merged_by": null,
  "labels": [
    "asperiores-ex",
    "quidem-labore",
    "sed-consequuntur"
  ],
  "notes": [
    {
      "id": "gid://gitlab/Note/2788661393",
      "body": "###### Quos\nVero ipsam consequatur. Eum provident consequatur. Saepe aut mollitia. Sit possimus nihil. Aspernatur sed id.\n0. Aspernatur. \n1. Perspiciatis. \n2. Dolorem. \n3. Ut. \n4. Impedit. \n5. Esse. \n6. Ratione. \n7. Et. \n8. Repellendus. \n9. Laudantium.\n\n *By Administrator on 2021-02-15T15:55:59*",
      "created_at": "2021-02-15T15:55:59Z",
      "updated_at": "2021-02-15T15:55:59Z",
      "system": false,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas"
    },
    {
      "id": "gid://gitlab/DiffNote/2801394791",
      "body": "Test test test it's me!",
      "created_at": "2025-10-06T13:53:21Z",
      "updated_at": "2025-10-06T13:53:21Z",
      "system": false,
      "author": "apmats",
      "author_name": "Apostolos Matsagkas",
      "position": "new_line=2 old_line=None new_path='README.md' old_path='README.md' position_type='text'"
    }
  ]
}


Project:
--------------------------------------------------------------------------------
{
  "_id": "project_74928810",
  "_timestamp": "2025-10-08T05:25:55Z",
  "type": "Project",
  "id": 74928810,
  "name": "Connectortestproj",
  "path": "connectortestproj",
  "full_path": "testapmats/connectortestproj",
  "description": "",
  "visibility": "private",
  "star_count": 0,
  "forks_count": 0,
  "created_at": "2025-09-30T13:38:51Z",
  "last_activity_at": "2025-10-08T05:25:55Z",
  "archived": false,
  "default_branch": "master",
  "web_url": "https://gitlab.com/testapmats/connectortestproj"
}


Release:
--------------------------------------------------------------------------------
{
  "_id": "release_74928810_test-release-tag",
  "_timestamp": "2025-10-07T09:12:14Z",
  "type": "Release",
  "project_id": 74928810,
  "project_path": "testapmats/connectortestproj",
  "tag_name": "test-release-tag",
  "name": "Testrelease1",
  "description": "Test release notes",
  "created_at": "2025-10-07T09:12:14Z",
  "released_at": "2025-10-07T09:12:14Z",
  "author": "apmats",
  "author_name": "Apostolos Matsagkas",
  "milestones": [],
  "asset_count": 4,
  "commit_sha": "27329d3afac51fbf2762428e12f2635d1137c549",
  "commit_title": "Update README.md"
}

Still remaining

  • Kibana work (done but waiting for OK on this piece of work to set up a PR)
  • Test account to be captured in connectors-infra (subscription required to have access to premium features like Epics)

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • For bugfixes: backport safely to all minor branches still receiving patch releases
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

Changes Requiring Extra Attention

  • Security-related changes (encryption, TLS, SSRF, etc)
  • New external service dependencies added.

Related Pull Requests

Release Note

@@ -0,0 +1,193 @@
#!/usr/bin/env python3
"""Quick test script for GitLab connector with real credentials."""
Contributor Author

This script is to be deleted. I'm only including it so that anyone doing an early review can check how the output looks, whether pagination works, and so on.

@Apmats Apmats force-pushed the am/gitlab-connector branch 2 times, most recently from f71c74a to 62b5f30 Compare October 8, 2025 18:22
aiogoogle==5.3.0
uvloop==0.20.0; sys_platform != 'win32'
fastjsonschema==2.16.2
pydantic==2.10.6
Contributor Author

@Apmats Apmats Oct 9, 2025

I'm using pydantic to parse API responses, and I think this dependency is worth picking up. It makes the code much more readable, and if the API changes we get a descriptive error at the point where we parse the response, rather than somewhere further down the line when we try to access an unstructured dictionary.
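
A minimal sketch of the parse-at-the-boundary idea, assuming pydantic v2 (the version pinned above). The `IssueNode` model, its fields, and `parse_issue` are hypothetical names for illustration, not the connector's actual code:

```python
from pydantic import BaseModel, ValidationError


class IssueNode(BaseModel):
    """Hypothetical shape of one issue in an API response."""

    iid: int
    title: str
    state: str
    labels: list[str] = []


def parse_issue(payload: dict) -> IssueNode:
    # Validation runs here, where the response enters our code.
    return IssueNode.model_validate(payload)


issue = parse_issue({"iid": 30, "title": "Fix sync", "state": "CLOSED"})
print(issue.iid, issue.labels)

try:
    # "title" is missing, as if the API had dropped or renamed the field.
    parse_issue({"iid": 30, "state": "OPEN"})
except ValidationError as exc:
    for err in exc.errors():
        # Each entry names the offending field and the kind of failure.
        print(err["loc"], err["type"])
```

With a plain dictionary the second payload would be accepted silently and would only fail later, at whichever line first reads `payload["title"]`.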

@Apmats Apmats force-pushed the am/gitlab-connector branch 10 times, most recently from de63dbd to 0805ea2 Compare October 13, 2025 13:02
- Implements GitLab connector using GraphQL API and Work Items API
- Fetches projects, issues, merge requests, epics, and releases
- Includes pagination support and remote validation
- Adds functional tests and unit tests with >92% coverage
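
The pagination mentioned in this commit message is not spelled out in the thread. As a rough sketch only: GitLab's GraphQL API exposes cursor-based connections with `pageInfo { hasNextPage endCursor }`, and a loop over them could look like the following. `fake_fetch_page`, `FAKE_PAGES`, and `paginate` are hypothetical stand-ins, not functions from this PR:

```python
import asyncio
from typing import AsyncIterator, Optional

# Fake two-page result set keyed by cursor, standing in for real GraphQL responses.
FAKE_PAGES = {
    None: {
        "nodes": [{"iid": 1}, {"iid": 2}],
        "pageInfo": {"hasNextPage": True, "endCursor": "c1"},
    },
    "c1": {
        "nodes": [{"iid": 3}],
        "pageInfo": {"hasNextPage": False, "endCursor": None},
    },
}


async def fake_fetch_page(cursor: Optional[str]) -> dict:
    """Hypothetical stand-in for one GraphQL request using an `after` cursor."""
    return FAKE_PAGES[cursor]


async def paginate(fetch_page) -> AsyncIterator[dict]:
    """Yield nodes page by page until the server reports no further page."""
    cursor = None
    while True:
        page = await fetch_page(cursor)
        for node in page["nodes"]:
            yield node
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            break
        cursor = info["endCursor"]


async def collect() -> list:
    return [node["iid"] async for node in paginate(fake_fetch_page)]


print(asyncio.run(collect()))  # prints [1, 2, 3]
```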
@Apmats Apmats force-pushed the am/gitlab-connector branch from 4112983 to 5962dd0 Compare October 13, 2025 13:33
@Apmats Apmats marked this pull request as ready for review October 13, 2025 14:00
@Apmats Apmats requested a review from a team as a code owner October 13, 2025 14:00
@Apmats Apmats changed the title from "Gitlab connector - initial draft" to "Gitlab data source connector" Oct 13, 2025
@Apmats Apmats force-pushed the am/gitlab-connector branch from 76152f5 to 322b99e Compare October 13, 2025 16:52
@Apmats Apmats force-pushed the am/gitlab-connector branch from 263b24a to 724f8b8 Compare October 14, 2025 11:13
Contributor Author

Apmats commented Oct 24, 2025

> Looks great! I left some comments/questions to better understand the system.
> The PR is huuuuge, so reviewing takes time. It might be worth doing a sync review together, or maybe even a demo of how you set it up, run a sync, and what gets synced, so that it's easier to grasp?

@artem-shelkovnikov you're right, this got huge because of the queries, the tests, and my decision to add what are effectively structured containers for the data coming in from GraphQL and going out of the data source/connector module. I don't think the core logic is too big, but reviewing it requires the reviewer to understand the GitLab data model.

I tested this by creating a large project (GitLab conveniently provides templates that you can use to build test projects).
During development I built a test script that basically runs get_docs, so I can verify that I'm getting what I should be getting through the API.
Once development was done, I also smoke tested end to end by setting up the connector in Kibana and observing the indexed docs.

I think what you're suggesting makes the most sense - we can do a review session synchronously. I'll finish up with the last changes post review and some cleanup and follow up with you.

@Apmats Apmats force-pushed the am/gitlab-connector branch from cb53716 to cdedb5a Compare October 24, 2025 09:51
@Apmats Apmats force-pushed the am/gitlab-connector branch from c2a703f to 361dfdd Compare October 24, 2025 11:28
@artem-shelkovnikov
Member

Also, just to verify: in the PR description the File object doesn't have content. Is the content of the README extracted somewhere?

@Apmats Apmats force-pushed the am/gitlab-connector branch from c24c401 to 80b00fd Compare October 27, 2025 08:51
Contributor Author

Apmats commented Oct 27, 2025

> Also, just to verify: in the PR description the File object doesn't have content. Is the content of the README extracted somewhere?

@artem-shelkovnikov yeah, that's on me. What I posted is what I got by printing the docs while running get_docs. In our connectors framework, file contents like this README are fetched and extracted in a separate, lazy download operation: get_docs yields tuples of (document, download_func), and the framework calls the download function later to fetch the actual content.
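
A minimal sketch of that (document, download_func) pattern, with a toy consumer loop standing in for the framework. The `download_readme` helper, the `body` field name, the document IDs, and `run_sync` are all assumptions for illustration; the real framework's download signature is not shown in this thread:

```python
import asyncio
from functools import partial


async def download_readme(doc_id: str) -> dict:
    """Hypothetical download step; a real one would call the GitLab API."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"_id": doc_id, "body": "# README contents"}


async def get_docs():
    # Documents without file content carry no download function.
    yield {"_id": "project_1", "type": "Project"}, None
    # File documents are yielded as metadata only; content is deferred.
    file_doc = {"_id": "file_1_README.md", "type": "File"}
    yield file_doc, partial(download_readme, file_doc["_id"])


async def run_sync() -> list:
    """Toy consumer: calls the download function only when one is provided."""
    indexed = []
    async for doc, download in get_docs():
        if download is not None:
            doc = {**doc, **(await download())}
        indexed.append(doc)
    return indexed


for doc in asyncio.run(run_sync()):
    print(doc)
```

Printing documents straight from `get_docs`, as in the PR description, shows only the metadata, which is why the File example there has no content field.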

@Apmats Apmats force-pushed the am/gitlab-connector branch from daaf344 to 1cfe01d Compare October 31, 2025 15:31
@Apmats Apmats force-pushed the am/gitlab-connector branch from f4bed7a to 93ba23d Compare November 4, 2025 15:06
@Apmats Apmats force-pushed the am/gitlab-connector branch from 2ec1ef2 to 3a066de Compare November 4, 2025 17:22
- Change connectors.source to connectors_sdk.source in client, datasource, and tests
- Simplify validation_utils.py module docstring to 3 concise lines
@Apmats Apmats force-pushed the am/gitlab-connector branch from cd6deb5 to 06c9888 Compare November 5, 2025 08:51
@Apmats Apmats force-pushed the am/gitlab-connector branch from 09629df to 6c65e09 Compare November 5, 2025 09:19