Proposal: consolidating issue tracking from 50+ repos to top-level repo #676

Closed
raulk opened this issue Jul 12, 2019 · 23 comments

@raulk
Member

raulk commented Jul 12, 2019

Context and problem statement

  1. The libp2p core team is looking to improve our workflows and tooling for work structuring, planning and project management. 🙌

  2. The result will be increased transparency, accountability, clarity, diligence, towards (and within) the libp2p community and ecosystem. 🔎

  3. We're evaluating Zenhub. It's an overlay on top of Github, is OSS-friendly, and allows everyone to view the pipelines/workspaces publicly.

  4. Whichever tool we use, it needs to be snappy, otherwise we’ll all get frustrated quickly, adherence will plummet, and we’ll abandon it. 🏃💨

  5. Unfortunately, go-libp2p issue tracking is scattered across 50+ inner repos, which makes tooling extremely slow, and in some cases outright impracticable if they impose hard limits (like GitHub Projects – max. 25).

  6. In general, this creates friction and makes the project hard to approach and navigate. 🌊

    • To file an issue, you need to know in which repo to file it first. This harms UX.
    • In some cases, people file issues in go-libp2p, in other cases in the right repo. This creates ambiguity. Now an issue can be in two different places.
    • Searching is difficult; people frequently have to resort to Google (including me).
  7. Eventually, we'd also like to track work for cross-implementation epics (e.g. multistream 2.0, NAT hole punching, etc.) in a single workspace easily. This will enable the core team to drive alignment across Go, JS, Rust, Python, jvm, cpp, etc. implementations more efficaciously. 🚢

What we're going to do

Consolidation: We're planning to run an experiment to consolidate all issue tracking for go-libp2p-* inner repos under the top-level project (go-libp2p).

Labelling: We'll adopt a well-organised and clear labelling taxonomy to categorise issues. Inspiration: Kubernetes, Rust.

  • We'll label issues by area (dht, pubsub, core, relay, etc.), kind (bug, feature, etc.), priority, etc.
  • Eventually we aim to normalise these labels across implementations (go, js, rust, py, cpp, jvm, etc.), to drive further alignment.
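As a rough illustration of how such a taxonomy becomes filterable by tooling, here is a minimal sketch. It assumes labels are written as `dimension/value` (for example `area/dht`, `kind/bug`), a convention borrowed from Kubernetes; the exact label names and the `classify` / `filter_by_area` helpers are hypothetical and not part of this proposal.

```python
from collections import defaultdict

# Assumed label dimensions; the "dimension/value" scheme is an illustration only.
DIMENSIONS = {"area", "kind", "priority"}


def classify(labels):
    """Group 'dimension/value' labels by dimension; anything else lands in 'other'."""
    grouped = defaultdict(list)
    for label in labels:
        dimension, sep, value = label.partition("/")
        if sep and value and dimension in DIMENSIONS:
            grouped[dimension].append(value)
        else:
            grouped["other"].append(label)
    return dict(grouped)


def filter_by_area(issues, area):
    """Return the issues (dicts with a 'labels' list) tagged 'area/<area>'."""
    return [i for i in issues if area in classify(i["labels"]).get("area", [])]
```

With something like this, a single consolidated tracker can still be sliced per component, which is roughly what a per-repo tracker gives us today.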

Migration: The “transfer issue” feature of GitHub has now graduated from beta, so migrating issues from inner repos to their top-level counterparts should be straightforward. Alas, it's not available via the API for a batch migration, so we'll have to do this manually ⛏

Sunsetting the child issue trackers: We can then (a) disable the issue tracker in inner repos, or (b) keep it open with a pinned issue serving as a NOTICE forwarding to the appropriate top-level repo. I prefer (b) because it provides better navigation and less surprise. If we find users keep disregarding the NOTICE and opening unwanted issues, we can automate the transfer to the appropriate top-level repo via a bot.

Email notifications: The only concrete "regression" that has been pointed out to me is that some people only subscribe to GitHub notifications for specific repos.

  • Now they'll have to subscribe to the unified go-libp2p notification stream, which will create additional inbox traffic for them.
  • This gives people more opportunities to contribute (as they'll be exposed to issues across the board), but OTOH it may also deter other people from subscribing.
  • A solution is to use Octobox to tame their notifications and filter by label.
  • On the flip side, this might actually solve a problem. What if you want to subscribe to all go-libp2p-* notifications? You'd have to bounce around 50+ repos hitting "watch". With this new approach, you subscribe to a single repo.
  • Honestly, I believe we stand to gain much more in terms of the benefits outlined at the top, than what we may lose. IMO, the impact of this regression is low.

Closing issues via keywords cross-repository: supported. See https://help.github.com/en/articles/closing-issues-using-keywords#closing-an-issue-in-a-different-repository.
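For reference, the cross-repository form is `KEYWORD OWNER/REPO#ISSUE-NUMBER`, e.g. `Fixes libp2p/go-libp2p#676` in a PR opened against an inner repo. The sketch below only shows what such references look like by extracting them from a PR description; the regular expression and the `cross_repo_closes` helper are my own illustration, not GitHub's actual parser, and may not cover every form GitHub accepts.

```python
import re

# Closing keywords documented by GitHub for PR descriptions and commit messages.
KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Matches "<keyword> owner/repo#123" (case-insensitive). Illustrative only.
CLOSING_REF = re.compile(
    r"\b(?:%s)\s+([\w.-]+/[\w.-]+)#(\d+)" % "|".join(KEYWORDS),
    re.IGNORECASE,
)


def cross_repo_closes(pr_body):
    """Return (repo, issue_number) pairs for cross-repo closing references."""
    return [(repo, int(number)) for repo, number in CLOSING_REF.findall(pr_body)]
```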


By a show of emoji, please signal what you think about revamping our issue management in the manner outlined here.

If you’re opposed on any level, please refer to arguments rooted in evidence, facts and projections, and offer an alternative solution. Mere preference signalling is counterproductive here. Solving the issue/task overview and unification stands in the critical path of bringing more order, structure and clarity to libp2p project/product management (both technical and non-technical).

@marten-seemann
Contributor

Before commenting on the issue of consolidating the issues, which I will call the issue-mono-repo for this discussion, I'd like to ask the question that is underpinning this whole discussion.

Why do we keep code in separate repositories?

We do that because we believe that each of those repositories is a separate entity of code, that, while it can be used in conjunction with other libp2p packages, can also be used on its own (or with a small subset of other packages).

I'm aware that the question of separate repos vs. mono-repos is a discussion that is probably almost as old as software engineering itself, and has been discussed a bunch of times inside of Protocol Labs. People tend to have very strong opinions in either direction. The main reason for a mono-repo seems to be that it helps developer (and user) usability to have all the code in one place. The main argument for separate repos is that it makes sense to split software into small, independent sub-parts, that can be used, tested and improved upon independently of the rest of the code base. I can see the arguments in both directions, and for the time being, I'm feeling agnostic towards this question. I'm happy to go with whatever we decide is best for libp2p.

Setting aside this controversial discussion, I believe that issues should live where the code lives:

  • If we believe that our code is so tightly coupled that we're served best by keeping all the code in a single repository, this is where the issues belong. A mono-repo naturally is an issue-mono-repo.
  • If on the other hand we believe that keeping our code modular is the right way to go, issues belong with their respective module. It would be extremely counter-intuitive for a user who's just using one or two of the libp2p-modules to go to go-libp2p and report the issue there.

The counter-argument to this of course is that there are users who're using libp2p as a whole, and that we can't expect them to dig through our code base to find the correct repository to open an issue in. I have a lot of sympathy for this argument, but don't think this requires us to consolidate all issues under go-libp2p. Instead, we need to create and communicate a workflow that encourages users to report the issues.
One way that comes to mind is telling users that it's ok to report issues at go-libp2p if they don't know the root cause of the problem. We can then use Github's new "transfer issue" feature to transfer this issue to the correct place (note that this is no more overhead than correctly categorizing and tagging the issue, which would be necessary in the issue-mono-repo proposal).

Thoughts on the Notification Problem

Aside from these conceptual considerations, subscribing and unsubscribing to individual repositories is one of the main features for me. For the most part, I'm currently only interested in the notifications of go-libp2p-quic-transport and go-libp2p-tls, and am loosely following what's going on in some other repositories. I unsubscribe from notifications from other repos as soon as they create too much noise in my inbox and distract me from focusing on my work.
Creating an issue-mono-repo would leave me with no choice but to unsubscribe from that repo in order to escape the flood of notifications that are irrelevant for my work. This would also cut me off from notifications from the two repositories that I am actually maintaining, and where I'm trying to respond to issues, review PRs etc. in a timely manner.

@fabioberger

Another small drawback is that you won't be able to use Github's handy fixes: keyword in PR descriptions and have it automatically close the issue on merge.

@raulk
Member Author

raulk commented Jul 12, 2019

@fabioberger actually, GitHub does support cross-repository keyword triggers: https://help.github.com/en/articles/closing-issues-using-keywords#closing-an-issue-in-a-different-repository. I should’ve noted it in the body! Will edit.

@yusefnapora
Contributor

I like @marten-seemann's suggestion to direct new users to file issues to go-libp2p and then move them to the "leaf repos". But I don't think it addresses @raulk's motivation for the proposal, which (as I read it) is about getting a "bird's eye" or project-level view of issues across repos / modules. Since the tools available seem to fall over with a large number of repos, getting the bird's eye view seems to require consolidation...

I agree that having separate code repos with an issue-mono-repo seems like a bit of an awkward hybrid setup. Are we viewing this proposal as a kind of "trial balloon" for consolidation, to see if a mono-repo might work for code as well? I'm kind of into the mono-repo idea personally, but I definitely see the other view and see why it's contentious.

@raulk
Member Author

raulk commented Jul 12, 2019

🚨 I specifically wanted to stay away from the mono-repo debate. 🚨

This discussion is about task/issue/bug/feature management, and facilitating workflows as a team. Code architecture will remain as-is within the scope of this debate. We tend to get very philosophical when it comes to modularity. And with good reason: modularity and composability are key principles of libp2p that will stay intact. However, this issue is about practicality of management, prioritisation, and workflows.

@marten-seemann, just a short overarching remark though. I believe you’re mixing modularity/composability/pluggability and component independence/“standalone-ness”.

Modularity and composability can be realised with a monorepo — have a look at Apache Camel as an example. It contains 200+ modules, all of which are released at the same time (they are not independent), and the user only depends on core + the modules they pick. Same with “baptised” Linux kernel modules. Modularity and composability are about APIs, not about physical code layout.

Conversely, none of the go-libp2p-* components are truly independent/standalone. They cannot be used outside of libp2p. They are designed to be plugged into libp2p via composition. Even the higher level protocols, e.g. Kad DHT, pubsub, etc. depend on all of the libp2p machinery. Despite hard-depending only on core abstractions, those abstractions have to be fulfilled at runtime by the rest of the libp2p stack.

Once again, this is not the place for the mono-repo discussion, but I did want to pull apart some strands that tend to get interwoven, at times tangling discussions that touch on architectural topics, even if tangentially.

@marten-seemann
Contributor

@yusefnapora Yeah, that's right, I don't know a lot about third-party Github tools around, so I can't comment on the features and shortcomings of any particular tools out there. I've never used Waffle, and in fact, I'm quite happy that it's gone now, because I found the auto-assignment of issues and PRs quite distracting, since it caused a bunch of email notifications that I couldn't unsubscribe from.

I realize that this probably doesn't apply to all people working on libp2p, but I for my part am happy with the Github tooling as it is, and am not planning to adopt any new tools in my workflow. As I'm unable to suggest any alternative tools for people who're unhappy with what Github is providing, I'm probably not the right one to say this, but to me it feels a bit weird to change the way we're organizing our issues in response to the shortcomings of one particular third-party tool.

@raulk I fully agree, and my intention is not to restart the mono-repo discussion here. What I tried to bring across in my previous post is that code organization and issue organization are not orthogonal problems, and to me it makes little sense to have a multi-repo for code and a mono-repo for issues (or vice versa).

@raulk
Member Author

raulk commented Jul 12, 2019

We need tooling that works for our community, ecosystem, stakeholders, engineers and users, and that allows us to:

  • align the work of all-around contributors, scoped contributors (like you @marten-seemann), sporadic contributors, etc.
  • produce continuous reporting.
  • group tasks in epics, assigning those to people.
  • track cross-implementation goals.
  • track progress and aging.
  • much more.

These elements are just as much part of making the libp2p project successful as is the code itself.

We are pampered by GitHub automatically creating an issue tracker for each repo. But for the better part of history, large projects were not managed by co-locating code with ticket/issue management (Linux, Chromium, Firefox, etc).

I believe this Github default is biasing people, wrongly making us believe that code needs to be colocated with the management tools. It is not the case.

I’m open to other options, but they need to facilitate technical and project management. Keeping things as they are does not. Just look at the number of times we’ve tried to triage the massive backlog and failed in the attempt.

@yusefnapora
Contributor

Just wanted to go on record that I'm for the proposal, btw. I'd rather have one project-level view that's possible to filter than many separate views that are (practically) impossible to aggregate. As long as it's not JIRA, I can definitely live with it 😄

@raulk
Member Author

raulk commented Jul 12, 2019

Lol. The outlook of adopting JIRA is a great forcing function here.

@marten-seemann: I do think you bring a legitimate use case for any new management workflow: email notifications for scoped contributors. I’m thinking we can trivially put together a bot that pings specific users when issues are labelled with their labels of interest.
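To make the idea concrete, here is a minimal sketch of the core of such a bot: given the labels on an issue, work out who to ping. Everything here is hypothetical (the subscription table, the handles, the `area/...` label names); a real bot would receive label events from a GitHub webhook and post the comment via the API, which is left out.

```python
# Hypothetical subscription table: label -> GitHub handles that want a ping.
SUBSCRIPTIONS = {
    "area/quic": ["quic-maintainer"],
    "area/tls": ["quic-maintainer"],
    "area/dht": ["dht-maintainer"],
}


def mentions_for(labels, author=None, subscriptions=SUBSCRIPTIONS):
    """Return the sorted, de-duplicated handles to ping, excluding the author."""
    handles = set()
    for label in labels:
        handles.update(subscriptions.get(label, []))
    handles.discard(author)
    return sorted(handles)


def ping_comment(labels, author=None):
    """Build the comment body the bot would post, or None if nobody subscribed."""
    handles = mentions_for(labels, author)
    if not handles:
        return None
    return "cc " + " ".join("@" + handle for handle in handles)
```

Scoped contributors could then unwatch the repo and rely on mentions only, since GitHub notifies you when you're @-mentioned even on repos you don't watch.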

@lanzafame
Contributor

lanzafame commented Jul 13, 2019

So I have read your initial post @raulk, and to me (correct me if I miss the mark) there are two separate problems that the consolidation of issues into one repo is attempting to solve:

  1. Users reporting issues don't know where they need to create the issue because of all the repos. (user)
  2. Current tooling doesn't handle consolidating the hundreds of repos we have. (maintainer)

Both of these could be solved by consolidating issues into a single repo but I personally think this calls for two separate tools. There are other tools out there other than Zenhub that meet the requirements of the maintainer, i.e. https://github.com/marketplace/azure-boards, and I am sure there are more. As such, I think a decent tool analysis should be done before we undertake a change in how our issue repos are structured.

And there are other solutions to the single entry point problem for users reporting issues, like the many bug reporting/support tools that are out there.

My gut instinct on this entire issue is that GitHub gives us a hammer, aka a repo, and makes us try to do mental gymnastics to see everything as a nail. I agree that what you outline is a problem, two problems to be specific, and I believe we need to do some more research to determine whether there are no other options available to us before taking the mono-issue-repo route.

EDIT: there is actually a third group, which is open source contributors who are very used to the GitHub model of open source (aka the colocation of issues and code). Changing that is a breaking change for them and their expectations. I don't mind doing this but it is something that we should keep in mind.

@raulk
Member Author

raulk commented Jul 14, 2019

@lanzafame

Both of these could be solved by consolidating issues into a single repo but I personally think this calls for two separate tools.

Whichever tool we use, it needs to be a seamless, snappy, lightweight, opt-in overlay on top of GitHub. GitHub is our source of truth, and the team shuns duplicating work across tools, clunky integrations, or questionable UIs. Zenhub does a pretty good job here.

There are other tools out there other than Zenhub that meet the requirements of the maintainer, i.e. github.com/marketplace/azure-boards

  1. Who is the maintainer in this context? Most maintainers at PL are favourable to this approach; users are ACK'ing in this issue too. So far only @marten-seemann has pointed out a regression that affects his workflow, which is not a blocker and totally reconcilable in various ways.

  2. Aside from Zenhub, I've analysed these options: JIRA, Trello, Clubhouse, Asana. They don't work for various reasons which aren't relevant now.

i.e. github.com/marketplace/azure-boards

How does this compare to Zenhub, especially in terms of repo limitations/speed?

@lanzafame
Contributor

Whichever tool we use, it needs to be a seamless, snappy, lightweight, opt-in overlay on top of GitHub. GitHub is our source of truth, and the team shuns duplicating work across tools, clunky integrations, or questionable UIs. Zenhub does a pretty good job here.

I am not disagreeing with any of these, except that for Zenhub to remain 'snappy' it requires that we consolidate all repos' issues into a single repo, which to me suggests that it is not fit for purpose.

Most maintainers at PL are favourable to this approach; users are ACK'ing in this issue too.

My comment was neither for nor against; my point was that by splitting the problem we may be able to find tooling that supports both use cases without the upheaval to the project's issue trackers.

How does this compare to Zenhub, especially in terms of repo limitations/speed?

I can't test the speed due to lack of privileges, but it allows 100 repos to be connected to the one project board. My suggesting it wasn't meant as a surefire solution, though; it was just the result of a quick Google search for a tool that supports many repos.

My main point is that I think a proper requirements and tool analysis should be done before making such drastic changes to so many repos. If there is nothing that meets the needs of the different stakeholders, then I am all for this.

@marten-seemann
Contributor

@raulk My main point was not the issue with my workflow. Even if the notification issue was resolved, I'd still be opposed to the mono-issue-repo proposal. By avoiding the discussion about keeping our code in a mono-repo, in my opinion, we end up with the worst of both worlds in terms of architecture: we still have the dependency graph / code testing issues of the multi-repo approach, while at the same time giving up on the modularity that multi-repos are supposed to provide us.

@raulk
Member Author

raulk commented Jul 15, 2019

Thanks for your input @marten-seemann @lanzafame. This discussion can go on indefinitely and we risk falling into analysis paralysis. So in the interest of making progress, I'll make my final remarks and move on.

  • The libp2p team had discussed consolidating issue tracking onto the top-level repo in the past. Basically: (1) it's evident the current methodology/system is ineffective; (2) we have attempted to organise things several times in the past, and failed; (3) there is new pressure to bring structure/order to this community; (4) tooling is an integral part of this; (5) while we will not buy into the whims of any particular tool, truth is that this maze is unmanageable without tooling, to begin with; (6) if we're so convinced we need per-repo issue tracking, and none of the PM tools satisfies us, theoretically we can build our own PM product; however, we don't have time nor interest, so at one point we have to give in to a trade-off; (7) some of us think there's no trade-off to give into anyway, because our current setup is flawed by principle on various angles, and we should experiment with something different.
  • We tried Waffle in the past. It did not have repo limitations, but ground to a halt as soon as we added a bunch of them. It was totally unusable, with 15 sec load times between issues. Lesson: the vendor not capping repo count doesn't mean it can handle any scale (alluding to the Azure Boards proposal).
  • Speed is one requirement, not the sole one. Blindly choosing a tool "because it supports many repos" is the wrong heuristic.
  • We have evaluated the tools I listed earlier (I'm not sure where the assumption that we haven't comes from). However, let's not turn this issue into a comprehensive product evaluation report. For now, it suffices to say that Zenhub checks most boxes in terms of speed, workflows, reporting, UX, etc. and has no dealbreakers (all others do).
  • Fun fact: I just tried Azure Boards and after signing in with my GitHub account, it wanted me to sign in with a Microsoft account too. raulk runs away 🏃💨
  • @marten-seemann: I believe my previous comments address your concerns in terms of: (1) why do we assume issues need to be colocated with the code for the non-independent component? (2) why do we assume that's the best approach for all users, or are we blindly buying into the GitHub default? (3) the term modularity in this discussion.

@lanzafame
Contributor

lanzafame commented Jul 15, 2019

I'm not sure where the assumption that we haven't comes from

Fairly simple, it wasn't communicated that you had...

So in the interest of making progress, I'll make my final remarks and move on.

@raulk Not sure why this is a proposal, it should just be an announcement as you have already made a decision. 👍

EDIT: I should mention that I don't really care if you have made the decision already, if you think it is best for everyone involved in the project, then go for it but don't make out as if there was any chance of swaying the decision, it just leads to frustration.

@raulk
Member Author

raulk commented Jul 15, 2019

(self-quote) So in the interest of making progress, I'll make my final remarks and move on.

@lanzafame My bad; that came out a bit abrupt. What I intended to say is that I heard your arguments and I believe I addressed them on various fronts, up to the point that we'd just be going around in circles unproductively, and the majority in the libp2p team at PL and on this issue is favourable to revamping our workflows and tooling. So I'd much rather focus on making forward progress at this point, because this is time-critical for the libp2p team.

Since your concern was basically "have we evaluated other tools?" and the answer was yes, I'll just make my notes public:

  • Clubhouse: separate tool; does not offer a public view; not an overlay on GitHub.
  • Trello: separate tool; not an overlay on GitHub; completely different app; comments and state transitions won't sync bidirectionally. Requires manual work.
  • GitHub projects: nice for tracking epics as projects, but there's no birds' eye view, and only supports max 25 repos per project.
  • JIRA: vetoed by several, including me.
  • Asana: no public view; not developer oriented; too heavy weight.
  • ZenHub: overlay on GitHub (GH is source of truth); label management; Chrome extension; Kanban or other workflows; useful reporting; used by lots of big names; free for OSS; multiple workspaces.
  • Azure Boards: wasn't on my radar but it requires a Microsoft login.
  • There are probably tens of other smaller tools out there, but they are riskier choices and we need something that's mature, evolved and refined.

@lanzafame
Contributor

lanzafame commented Jul 15, 2019

ZenHub: overlay on GitHub (GH is source of truth); label management; Chrome extension; Kanban or other workflows; useful reporting; used by lots of big names; free for OSS; multiple workspaces.

Great, why does it require a mono-issue-repo?

Since your concern was basically "have we evaluated other tools?"

No! My concern is the creation of a mono-issue-repo. If the tooling is forcing that choice, then let's look at other tooling, hence the "have we evaluated other tools?".

@raulk
Member Author

raulk commented Jul 16, 2019

It’s all stated in the description of this issue and follow-up comments.

@ghost

ghost commented Jul 17, 2019

Continuing #676 (comment) 👍

  • Waffle -- went out of business

On the main topic:

I'm a +1 on moving forward with *something* even if we don't have 100% agreement on it. The points made in this discussion seem reasonable to me, but ultimately we're going to have to pick an approach and some people are going to be unhappy or need to modify their workflows. It'll be for the greater good, though.

@Warchant

Warchant commented Jul 17, 2019

Consider also https://zube.io - it has multirepo projects

@raulk
Member Author

raulk commented Jul 17, 2019

Consider also zube.io - it has multirepo projects

Most of the tools we evaluated support multirepo projects (including ZenHub). They just grind to a halt when adding all of our repos, or they have maximum caps.

@BigLep
Contributor

BigLep commented Jan 10, 2023

@p-shahi and @marten-seemann: can this be closed now, in light of finishing the monorepo work per #1556?

@p-shahi
Member

p-shahi commented Jan 10, 2023

Sounds good, thank you for spotting this stale issue.

@p-shahi p-shahi closed this as completed Jan 10, 2023

8 participants