Ingest rst does not follow included references for extract #107
Comments
Also need code to recursively walk the docs pages and extract sub-pages too. Need HTML splitter code to split on h2 headings.
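For the h2 split, something along these lines might be a starting point. This is only a rough sketch using the standard library's html.parser. The names H2Splitter and split_on_h2 are made up for illustration and are not part of the ask-astro code base. It also keeps text from script/style tags, which a real splitter would probably want to drop.

```python
from html.parser import HTMLParser


class H2Splitter(HTMLParser):
    """Collect page text into sections, starting a new section at each <h2>."""

    def __init__(self):
        super().__init__()
        # Each section is [heading, list_of_text_fragments]; the first
        # section holds any text that appears before the first <h2>.
        self.sections = [["", []]]
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.sections.append(["", []])
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_h2:
            # Heading text may arrive in pieces when it contains inline tags.
            self.sections[-1][0] = (self.sections[-1][0] + " " + text).strip()
        else:
            self.sections[-1][1].append(text)


def split_on_h2(html):
    """Return a list of (heading, body_text) tuples, one per <h2> section."""
    parser = H2Splitter()
    parser.feed(html)
    parser.close()
    return [
        (heading, " ".join(parts))
        for heading, parts in parser.sections
        if heading or parts
    ]
```

Each (heading, body_text) pair could then become one document chunk, with the heading kept as metadata so answers can point at the right section.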
@sunank200 @mpgreg AFAIK we generate the HTML from the rst docs. Since we are already ingesting the HTML docs, why do we need rst too? Or am I missing something here?
Yes, this issue was meant to be closed if/when we change to html ingest.
cc: @sunank200 @phanikumv
Closing as discussed with Pankaj and Ankit in the sprint planning call.
extract_github_rst() does not follow includes or references to other rst docs. This means that much of the Airflow docs content is either not ingested at all or cannot be linked back to the correct page.
https://github.com/astronomer/ask-astro/blob/c45487c7f12a9424dbe885580c687e35e30b7de4/airflow/dags/ingestion/ask-astro-load-github.py#L46C10-L46C10
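To illustrate what is missing, following includes would need something like the sketch below. This is not the code in extract_github_rst(). The function expand_includes and the read callback are hypothetical, and it only handles plain `.. include::` directives with relative paths. Directive options, absolute paths, toctree entries and :doc: references are all ignored here.

```python
import posixpath
import re

# Matches a line such as ".. include:: ../shared/header.rst"
INCLUDE_RE = re.compile(r"^[ \t]*\.\. include::[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def expand_includes(path, read, _stack=()):
    """Return the text of `path` with each include directive replaced inline.

    `read` is a caller-supplied function mapping a repo path to file text
    (an assumption for this sketch; it could wrap a GitHub API call).
    """
    if path in _stack:
        return ""  # the file includes itself, directly or indirectly
    base_dir = posixpath.dirname(path)

    def replace(match):
        # Include paths are resolved relative to the including file.
        target = posixpath.normpath(posixpath.join(base_dir, match.group(1)))
        return expand_includes(target, read, _stack + (path,))

    return INCLUDE_RE.sub(replace, read(path))
```

Even with this, cross-page references would still not resolve to the published URL, which is part of why HTML ingest looks simpler.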
Need to ingest from a scrape of the Airflow docs HTML pages instead.
https://airflow.apache.org/docs/
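A scrape could start from a small crawler like the one below. Again this is a sketch under assumptions, not a proposed implementation. crawl_docs, LinkCollector, fetch_html and the max_pages cap are illustrative names. It uses an explicit stack rather than real recursion, treats every link under the base URL as an HTML page, and has no error handling or rate limiting.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch_html(url):
    """Default fetcher: download a page and decode it as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_docs(base_url, fetch=fetch_html, max_pages=500):
    """Walk pages under base_url and return {url: html}."""
    pages = {}
    queue = [base_url]
    while queue and len(pages) < max_pages:
        url = queue.pop()
        if url in pages:
            continue
        pages[url] = fetch(url)
        collector = LinkCollector()
        collector.feed(pages[url])
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" anchors.
            target, _ = urldefrag(urljoin(url, href))
            if target.startswith(base_url) and target not in pages:
                queue.append(target)
    return pages
```

Passing fetch as a parameter keeps the walk testable without network access, and the returned HTML can be fed straight into whatever splitter is used downstream.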