Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new RFC for monorepo #31

Merged
merged 3 commits into from Apr 19, 2021
Merged

Add new RFC for monorepo #31

merged 3 commits into from Apr 19, 2021

Conversation

dorianj
Copy link
Contributor

@dorianj dorianj commented Mar 30, 2021

This is a continuation / down-scoping of #30. From that discussion, it seems that there is clear support for moving to a mono repo. We'd like to figure out the details of that change and get that rolling and defer the other things that were bundled in the prior RFC.

Signed-off-by: Dorian Johnson <[email protected]>
@dorianj dorianj requested a review from a team as a code owner March 30, 2021 22:53
@Golodhros
Copy link
Member

Thanks a lot!
This looks like the perfect step forward!

- amundsencommon
- amudnsendatabuilder

This RFC proposes merging all of these repositories into a single repo.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to add amundsengremlin to the list as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention for amundsengremlin to get merged into one of the proxies, or is the long-term intention to leave it as a separate package? I'm not super familiar with the code split there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a point that could have more attention in this rfc. We currently have two approaches for proxy development:

  1. 'old' proxies like neo4j, elasticsearch or atlas are kept directly in amundsenmetadata and amundsensearch respectively
  2. 'new' proxies like rds or gremlin are separate git repositories.

Maybe we should unify the approach ? afaik the idea with having proxies in different repositories made contributing easier - you could have a team that would be maintainers only for proxy repo, not core amundsen repos. @feng-tao would be more knowledgeable here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make more extensive use of codeowners files for each package, so we keep the ownership clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed some language to consider the incubating repos. Open to feedback!

My thought is the separate packages should only be used for incubating, unsupported packages. Once it it officially supported by the project, it should require CR and RFCs from approved maintainers. Otherwise, we will have officially-supported parts of the project that aren't necessarily governed by our normal processes (RFC and CR). If the outside contributor is still making meaningful contributions after the package is landed, we should consider suggesting them as a maintainer (in which case @Golodhros's suggestion of stronger codeowners would make sense -- it might make sense to have that new maintainer start as specialized to the specific area they were previously working on, and if they want to contribute outside of that, we add that on later via codeowner additions)

Copy link
Member

@feng-tao feng-tao Apr 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently neptune and RDS backend is introduced as part of RFC process. Though I think it is easier for maintain with different repos. But I wouldn't call them incubating repos. This is same as Atlas which the Lyft engineering team built the whole integration with neo4j while atlas is maintained by @bolkedebruin ING team. Both are officially supported backend in Amundsen for long time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense.

Though I think it is easier for maintain with different package

Can you elaborate on what is made easier here? I'd like to capture the benefits of those packages without the overhead of actually having separate repos, if that makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC here is to propose mono repo with different packages. For RDS, gremlin based, it seemed to be easier to maintain them separately given the people who developed those are not part of the core committer team if that makes sense. It will help to iterate those features faster without asking existing committer to signoff.

But once the repos/features are mature, we should bring them into same repo each of which to be a new package.


This RFC proposes merging all of these repositories into a single repo.

There will be no change to the built artifacts. Users who use PyPi packages or Docker images will have no change to their deployments. Users with forks or submodule based customization will have to make minor changes to their CI/CD pipelines to pull from the correct directories, but the required changes will be minor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smooth change management, nice !

However, with current structure, when someone is using the submodule approach can freely choose versions (even refs) for each product separately (I can use frontend in different version than metadata and search). This change somehow limits that capability. It's worth to explicitly mention it in drawbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good point, added

- amundsencommon
- amudnsendatabuilder

This RFC proposes merging all of these repositories into a single repo.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a point that could have more attention in this rfc. We currently have two approaches for proxy development:

  1. 'old' proxies like neo4j, elasticsearch or atlas are kept directly in amundsenmetadata and amundsensearch respectively
  2. 'new' proxies like rds or gremlin are separate git repositories.

Maybe we should unify the approach ? afaik the idea with having proxies in different repositories made contributing easier - you could have a team that would be maintainers only for proxy repo, not core amundsen repos. @feng-tao would be more knowledgeable here


The submodule architecture creates the following issues and is making it hard for companies to adopt Amundsen, and to customize it accordingly. It also causes substantial developer toil in the form of having to create multiple small PRs to merge one

1. Dependency Management: Python packages are pretty much the same for each proxy, which makes it hard to sync across all the above repositories. As a result most of the time, each proxy is running its own version of the dependency.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach (or used alonside changes from this RFC to strengthen the dependency management even more) would be introducing the pip constraints files https://pip.pypa.io/en/stable/user_guide/#constraints-files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are new to me! That's super cool, added.


and many more... This change will put all the packages in just one place, making practically to keep dependencies similar accross verisons, simply because it will be possible to open a single PR that updates multiple packages. Practically speaking this will improve the status quo slightly.

1. Development Efforts: Having multiple repositories makes it really hard to implement a feature. Implementation and testing require efforts to synchronize and then code reviews, and finally, all the PRs across multiple repositories need to land in master at a certain time or at the same time. This change results in faster development, one PR to fix a bug or a feature, no dependency hell as we are facing today, hence attract more contributors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question - do we have any hard data (like results of questionnaires) to back the validity of those issues ? or is it more the result of interacting with users and observation ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't -- we did the call for feedback in December or January from community members but only got a handful of detailed responses. This is from our interactions with many community members (both longer-time installs) as well as people new to the project. Both groups are impacted, but in different ways.

@dorianj dorianj added Status: Final Comment Period (FCP) On final comment period (seven days) and removed Status: Draft labels Apr 10, 2021
@feng-tao
Copy link
Member

as long as different package / artifacts are maintained within single monorepo, I am +1 for the change. But it would be good for @allisonsuarez , @dkunitsk , @jinhyukchang @youngyjd backend ICs from Lyft to take a look and see anything related to existing integrations.

@dorianj dorianj merged commit c1c7001 into amundsen-io:master Apr 19, 2021
@dorianj dorianj deleted the dj-monorepo branch April 19, 2021 21:26
allisonsuarez pushed a commit that referenced this pull request May 5, 2021
* Add new RFC for monorepo

Signed-off-by: Dorian Johnson <[email protected]>

* monorepo: add constraints files, acknowledge a drawback

Signed-off-by: Dorian Johnson <[email protected]>

* monorepo: speak to incubating repos

Signed-off-by: Dorian Johnson <[email protected]>
Signed-off-by: Allison Suarez Miranda <[email protected]>
@Golodhros Golodhros added Status: Landed The proposed changes are shipped in an actual release and removed Status: Final Comment Period (FCP) On final comment period (seven days) labels May 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Landed The proposed changes are shipped in an actual release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants