Looking for maintainers #16

Open
Kludex opened this issue Aug 15, 2022 · 14 comments

Comments

@Kludex
Owner

Kludex commented Aug 15, 2022

I don't have time to maintain this project anymore. If you are reading this and want to help, the comment I made earlier applies to you as well: #15 (comment)

@iudeen

iudeen commented Aug 16, 2022

I took a look into it.

I have a few suggestions (more like questions):

  1. Why do we need to clone every repo? We already get almost all the information from the repository search.
  2. Is this project meant to be a single place where a user can find any project that uses FastAPI? Or can we add some logic, say requiring a minimum of 3 stars?
  3. Can we use a local SQLite DB to check for duplicates efficiently instead of keeping a bunch of text files?
  4. Can we add more info to the table, like the number of stars and forks?

Also, a very extravagant wish: can we aim for an interactive GitHub Pages UI with search and other cool stuff?

@Kludex
Owner Author

Kludex commented Aug 16, 2022

  1. To be more precise: we don't need to clone the git history, though. Can you fetch the information about the packages used without cloning?

  2. We could add a custom logic.

  3. Yes.

  4. Yes.

And yes. The UI would be cool.
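
Point 3 above (a local SQLite DB for duplicate checks instead of text files) could be sketched roughly as follows. This is only an illustration, not the project's code: the table name `repos`, its columns, and the helper names `open_index` and `add_repo` are assumptions made up for the example. The `stars` and `forks` columns echo point 4.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index DB. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS repos (
            full_name TEXT PRIMARY KEY,  -- e.g. "owner/name"; the key rejects duplicates
            stars INTEGER NOT NULL DEFAULT 0,
            forks INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def add_repo(conn, full_name, stars=0, forks=0):
    """Insert a repo; return True if it was new, False if already indexed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO repos (full_name, stars, forks) VALUES (?, ?, ?)",
        (full_name, stars, forks),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when the insert was ignored.
    return cur.rowcount == 1
```

Letting the primary key do the duplicate check means no separate lookup is needed before each insert.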

@iudeen

iudeen commented Aug 17, 2022

Here is the flow I have come up with
[image: flow diagram]

A few concerns:

  1. Do we need to show dependencies? If we skip this, we can avoid cloning :)
  2. Is deta.sh a good place to run this? Or do you recommend an alternative?

I would also plan an API to query the data from our DBs. Later, we can create a UI on top of it!

@Kludex
Owner Author

Kludex commented Aug 17, 2022

I believe that what brings value to this project is knowing which dependencies they use.

We can run this on GitHub if we create a static page, and use SQLite. Maybe something like https://gohugo.io/? I'm not too familiar with it...

@iudeen

iudeen commented Aug 17, 2022

Okay, thinking about the value it adds, that does make sense.

I think we should complete the core before we start thinking about UI.

If we are going all local, then SQLite is better 😄

Also, what do you think of having two lists, curated (not sure about the naming) and general? The curated one would search READMEs and the general one would search code.

@Kludex
Owner Author

Kludex commented Aug 17, 2022

The number of GitHub queries we can make is quite limited considering the number of results. You'll soon notice that it's important to shrink the numbers.

In a single job, we are not going to be able to query all projects. We query part of them and store where we stopped, and the next run uses that stored information to continue from the following page.

The query can probably be refined to something like "from fastapi import FastAPI".
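
The "query part of them, store, continue next time" idea could be sketched like this. It is a minimal illustration under stated assumptions: the function name `next_pages`, the `next_page` state key, and the per-run `budget` are all hypothetical, and the state dict would need to be persisted between jobs (for example in the SQLite DB discussed earlier).

```python
def next_pages(state, budget, total_pages):
    """Pick the result pages to fetch in this run and return the updated state.

    state: dict persisted between runs; "next_page" is a hypothetical key.
    budget: how many pages this job may fetch before hitting the rate limit.
    total_pages: number of result pages the query currently has.
    """
    start = state.get("next_page", 1)
    if start > total_pages:
        start = 1  # everything was covered, so start a fresh pass
    end = min(start + budget - 1, total_pages)
    pages = list(range(start, end + 1))
    return pages, {"next_page": end + 1}
```

Each job then fetches only the returned pages and saves the new state, so consecutive jobs walk through the whole result set without repeating work.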

@iudeen

iudeen commented Aug 17, 2022

Yes, adding a filter stars>1 reduced results from some 300K to 273. 😅

I'll try to get into coding today on this and see what limitations we hit.
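
For reference, a star filter like the one mentioned above could be expressed against GitHub's REST repository search roughly as follows. This is a sketch, not the query actually used in the thread: the keyword, the language qualifier, and the helper name `build_search_url` are assumptions. Note that the repository search endpoint matches repository metadata rather than file contents, and it accepts at most 100 results per page.

```python
from urllib.parse import urlencode

# GitHub REST API endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(keyword="fastapi", min_stars=1, language="Python",
                     page=1, per_page=100):
    """Build a repository search URL with a stars filter (hypothetical helper)."""
    # Qualifiers are space-separated inside the single "q" parameter.
    q = f"{keyword} stars:>{min_stars} language:{language}"
    params = {"q": q, "page": page, "per_page": per_page}
    return f"{SEARCH_URL}?{urlencode(params)}"
```

The function only builds the URL, so it can be checked without spending any of the limited API quota.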

@Kludex
Owner Author

Kludex commented Aug 17, 2022

Ah... If that's the case... Do as many queries as you want hahaha

@handreassa

Hi @Kludex, do you still have plans to add maintainers to this repo? I can help with that.

@Kludex
Owner Author

Kludex commented Feb 28, 2023

I do. The idea is the one in #15 (comment). If anyone is willing to follow that, I can give rights here. 🤷

@handreassa

I can follow that @Kludex

@vladfedoriuk
Collaborator

vladfedoriuk commented Jul 25, 2023

Hi @Kludex
Having looked at some of the discussions and PRs, I would like to contribute as well 🙂

Let me share how I can imagine a solution to the problem and I would appreciate it if you could give me some feedback regarding my suggestions and whether I got the idea of the project right 😄

First, as discussed above, it would be great to implement it as a web app with an SQLite DB (for simplicity and cost efficiency). FastAPI with SQLAlchemy could be used. Alternatively, Streamlit could be a good idea because it would make creating visualizations easier, though it limits the general UI capabilities. With FastAPI, HTMX can be used on the presentation layer. When it comes to visualizations, packages like Plotly, Bokeh, or ipyvizzu can render graphs to HTML, so they can be embedded in the templates.

The question is - what kind of visualization do you envision on the page?

Things get trickier when it comes to actually building the index of projects. Finding the dependencies a Python project uses is not straightforward: they can be declared in a requirements file, pyproject.toml, setup.cfg, and so on (https://discuss.python.org/t/list-dependencies-of-a-package/12341). Tools exist to parse various dependency formats (https://github.com/nexB/dparse2/tree/main), but they are quite limited and do not support most of the formats. Do you have any ideas on how dependencies could be discovered efficiently? Alternatively, we could use a tool like https://pypi.org/project/third-party-imports/ to find the third-party imports. That is less efficient but might work well.
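
To make the problem concrete, here is a minimal sketch that covers only one of the formats mentioned, a requirements-style file. It is an illustration, not a proposal for the final parser: the helper name `requirement_names` is hypothetical, and pyproject.toml, setup.cfg, and the other formats are deliberately left out.

```python
import re

# A distribution name at the start of a requirement line.
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")
# A bare URL or VCS reference, which carries no simple package name.
_URL = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def requirement_names(text):
    """Extract normalized package names from requirements-style text."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or _URL.match(line):
            continue  # skip blanks, pip options (-r, -e, ...), and bare URLs
        match = _NAME.match(line)
        if match:
            # Normalize as PEP 503 does: runs of -, _, . become "-", lowercase.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names
```

Version specifiers and extras such as `[asyncio]` are dropped, since only the package name matters for the index.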

GitHub Actions could be used to run the periodic scraping of the repositories, as suggested earlier.
I believe GitHub Search should really be enough. Take a look at the following query: https://github.com/search?q=stars%3A%3E3+path%3A*.py+fastapi+language%3APython&type=&ref=advsearch&l=Python. Does https://sourcegraph.com/search allow showing the dependencies of the projects? Do you think GitHub Search would end up throttling the app? Which search engine would you prefer?

Please, let me know what you think and I would be happy to create a PR 🙂

@Kludex
Owner Author

Kludex commented Jul 26, 2023

> The question is - what kind of visualization do you envision on the page?

Simple table with filtering on packages used.

> Do you have any ideas on how dependencies could be discovered efficiently?

No. In the first implementation, I just queried everything that contained "import fastapi" or "from fastapi import".
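
Once a file's source is available, the same "import fastapi" / "from fastapi import" check can be done on the syntax tree instead of by text match, which also yields the other top-level modules a file imports. This is a sketch using the standard `ast` module; the helper names are hypothetical, and it is not the original implementation described above.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" counts as top-level module "a".
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 excludes relative imports such as "from . import x".
            modules.add(node.module.split(".")[0])
    return modules


def uses_fastapi(source):
    """True if the source imports fastapi in either form."""
    return "fastapi" in imported_modules(source)
```

Unlike a plain text search, this ignores matches inside strings and comments. `ast.parse` raises `SyntaxError` on files it cannot parse, so a real scraper would need to handle that case.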

> Does sourcegraph.com/search allow showing the dependencies of the projects?

I don't know.

> Do you think GitHub Search would end up throttling the app?

The rate limit is low, but I guess not...

> Which search engine would you prefer?

If we can avoid GitHub, it would be great, since I don't want to use my personal token for it.

@Pal-Sandeep

Hey @Kludex, I want to contribute to this project.
Please add me to the project.

5 participants