Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate json artifacts #43

Merged
merged 20 commits into from
Jun 14, 2018
Merged

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Jun 8, 2018

Refs to readthedocs/readthedocs.org#3393

Still WIP, I'm testing this locally to see if it works p:

@stsewd
Copy link
Member Author

stsewd commented Jun 8, 2018

This is from the original json builder

{
  "body": "<div class=\"section\" id=\"installation\">\n<h1>Installation<a class=\"headerlink\" href=\"#installation\" title=\"Permalink to this headline\">\\u00b6</a></h1>\n<p>Installing Kong is pretty simple. Here is a step by step plan on how to do it.</p>\n<div class=\"admonition note\">\n<p class=\"first admonition-title\">Note</p>\n<p class=\"last\">Kong is available on Pypi as <code class=\"docutils literal notranslate\"><span class=\"pre\">django-kong</span></code>, but trunk is probably your\nbest best for the most up to date features.</p>\n</div>\n<p>First, obtain <a class=\"reference external\" href=\"http://www.python.org/\">Python</a> and <a class=\"reference external\" href=\"http://pypi.python.org/pypi/virtualenv\">virtualenv</a> if you do not already have them. Using a\nvirtual environment will make the installation easier, and will help to avoid\nclutter in your system-wide libraries. You will also need <a class=\"reference external\" href=\"http://git-scm.com/\">Git</a> in order to\nclone the repository.</p>\n<p>Once you have these, create a virtual environment somewhere on your disk, then\nactivate it:</p>\n<div class=\"highlight-default notranslate\"><div class=\"highlight\"><pre><span></span><span class=\"n\">virtualenv</span> <span class=\"n\">kong</span>\n<span class=\"n\">cd</span> <span class=\"n\">kong</span>\n<span class=\"n\">source</span> <span class=\"nb\">bin</span><span class=\"o\">/</span><span class=\"n\">activate</span>\n</pre></div>\n</div>\n<p>Kong ships with an example project that should get you up and running quickly. To actually get kong running, do the following:</p>\n<div class=\"highlight-default notranslate\"><div class=\"highlight\"><pre><span></span><span class=\"n\">git</span> <span class=\"n\">clone</span> <span class=\"n\">http</span><span class=\"p\">:</span><span class=\"o\">//</span><span class=\"n\">github</span><span class=\"o\">.</span><span class=\"n\">com</span><span class=\"o\">/</span><span class=\"n\">ericholscher</span><span class=\"o\">/</span><span class=\"n\">django</span><span class=\"o\">-</span><span class=\"n\">kong</span><span class=\"o\">.</span><span class=\"n\">git</span>\n<span class=\"n\">cd</span> <span class=\"n\">django</span><span class=\"o\">-</span><span class=\"n\">kong</span>\n<span class=\"n\">pip</span> <span class=\"n\">install</span> <span class=\"o\">-</span><span class=\"n\">r</span> <span class=\"n\">requirements</span><span class=\"o\">.</span><span class=\"n\">txt</span>\n<span class=\"n\">pip</span> <span class=\"n\">install</span> <span class=\"o\">.</span> <span class=\"c1\">#Install Kong</span>\n<span class=\"n\">cd</span> <span class=\"n\">example_project</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">syncdb</span> <span class=\"o\">--</span><span class=\"n\">noinput</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">loaddata</span> <span class=\"n\">test_data</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">runserver</span>\n</pre></div>\n</div>\n<p>This will give you a locally running instance with a couple of example sites\nand an example test.</p>\n<p>Now that you have your tests in your database, you need to check that your\ntests run. You can run tests like:</p>\n<div class=\"highlight-default notranslate\"><div class=\"highlight\"><pre><span></span><span class=\"c1\">#Check all sites</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">check_sites</span>\n<span class=\"c1\">#Only run the front page test</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">check_sites</span> <span class=\"o\">-</span><span class=\"n\">t</span> <span class=\"n\">front</span><span class=\"o\">-</span><span class=\"n\">page</span>\n<span class=\"c1\">#Only check sites of type Mine</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">check_sites</span> <span class=\"o\">-</span><span class=\"n\">T</span> <span class=\"n\">mine</span>\n</pre></div>\n</div>\n<p>The first command is the default way of running kong, and will run the tests for all of your sites.</p>\n<p>The second two different ways will run either a specific test, or a type of test. Both of these can run tests across multiple sites.</p>\n</div>\n",
  "alabaster_version": "0.7.10",
  "display_toc": false,
  "title": "Installation",
  "sourcename": "install.rst.txt",
  "customsidebar": null,
  "metatags": "",
  "current_page_name": "install",
  "next": {
    "link": "../management_commands/",
    "title": "Management Commands"
  },
  "rellinks": [
    [
      "genindex",
      "General Index",
      "I",
      "index"
    ],
    [
      "management_commands",
      "Management Commands",
      "N",
      "next"
    ],
    [
      "index",
      "Welcome to Django Kong\\u2019s documentation!",
      "P",
      "previous"
    ]
  ],
  "meta": {},
  "parents": [],
  "sidebars": [
    "localtoc.html",
    "relations.html",
    "sourcelink.html",
    "searchbox.html"
  ],
  "toc": "<ul>\n<li><a class=\"reference internal\" href=\"#\">Installation</a></li>\n</ul>\n",
  "prev": {
    "link": "../",
    "title": "Welcome to Django Kong\\u2019s documentation!"
  },
  "page_source_suffix": ".rst"
}

And this is generated with this current branch

{
    "body": "<div class=\"section\" id=\"installation\">\n<h1>Installation<a class=\"headerlink\" href=\"#installation\" title=\"Permalink to this headline\">\u00b6</a></h1>\n<p>Installing Kong is pretty simple. Here is a step by step plan on how to do it.</p>\n<div class=\"admonition note\">\n<p class=\"first admonition-title\">Note</p>\n<p class=\"last\">Kong is available on Pypi as <code class=\"docutils literal notranslate\"><span class=\"pre\">django-kong</span></code>, but trunk is probably your\nbest best for the most up to date features.</p>\n</div>\n<p>First, obtain <a class=\"reference external\" href=\"http://www.python.org/\">Python</a> and <a class=\"reference external\" href=\"http://pypi.python.org/pypi/virtualenv\">virtualenv</a> if you do not already have them. Using a\nvirtual environment will make the installation easier, and will help to avoid\nclutter in your system-wide libraries. You will also need <a class=\"reference external\" href=\"http://git-scm.com/\">Git</a> in order to\nclone the repository.</p>\n<p>Once you have these, create a virtual environment somewhere on your disk, then\nactivate it:</p>\n<div class=\"highlight-default notranslate\"><div class=\"highlight\"><pre><span></span><span class=\"n\">virtualenv</span> <span class=\"n\">kong</span>\n<span class=\"n\">cd</span> <span class=\"n\">kong</span>\n<span class=\"n\">source</span> <span class=\"nb\">bin</span><span class=\"o\">/</span><span class=\"n\">activate</span>\n</pre></div>\n</div>\n<p>Kong ships with an example project that should get you up and running quickly. To actually get kong running, do the following:</p>\n<div class=\"highlight-default notranslate\"><div class=\"highlight\"><pre><span></span><span class=\"n\">git</span> <span class=\"n\">clone</span> <span class=\"n\">http</span><span class=\"p\">:</span><span class=\"o\">//</span><span class=\"n\">github</span><span class=\"o\">.</span><span class=\"n\">com</span><span class=\"o\">/</span><span class=\"n\">ericholscher</span><span class=\"o\">/</span><span class=\"n\">django</span><span class=\"o\">-</span><span class=\"n\">kong</span><span class=\"o\">.</span><span class=\"n\">git</span>\n<span class=\"n\">cd</span> <span class=\"n\">django</span><span class=\"o\">-</span><span class=\"n\">kong</span>\n<span class=\"n\">pip</span> <span class=\"n\">install</span> <span class=\"o\">-</span><span class=\"n\">r</span> <span class=\"n\">requirements</span><span class=\"o\">.</span><span class=\"n\">txt</span>\n<span class=\"n\">pip</span> <span class=\"n\">install</span> <span class=\"o\">.</span> <span class=\"c1\">#Install Kong</span>\n<span class=\"n\">cd</span> <span class=\"n\">example_project</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">syncdb</span> <span class=\"o\">--</span><span class=\"n\">noinput</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">loaddata</span> <span class=\"n\">test_data</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">runserver</span>\n</pre></div>\n</div>\n<p>This will give you a locally running instance with a couple of example sites\nand an example test.</p>\n<p>Now that you have your tests in your database, you need to check that your\ntests run. You can run tests like:</p>\n<div class=\"highlight-default notranslate\"><div class=\"highlight\"><pre><span></span><span class=\"c1\">#Check all sites</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">check_sites</span>\n<span class=\"c1\">#Only run the front page test</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">check_sites</span> <span class=\"o\">-</span><span class=\"n\">t</span> <span class=\"n\">front</span><span class=\"o\">-</span><span class=\"n\">page</span>\n<span class=\"c1\">#Only check sites of type Mine</span>\n<span class=\"o\">./</span><span class=\"n\">manage</span><span class=\"o\">.</span><span class=\"n\">py</span> <span class=\"n\">check_sites</span> <span class=\"o\">-</span><span class=\"n\">T</span> <span class=\"n\">mine</span>\n</pre></div>\n</div>\n<p>The first command is the default way of running kong, and will run the tests for all of your sites.</p>\n<p>The second two different ways will run either a specific test, or a type of test. Both of these can run tests across multiple sites.</p>\n</div>\n", 
    "alabaster_version": "0.7.10", 
    "display_toc": false, 
    "title": "Installation", 
    "sourcename": "install.rst.txt", 
    "customsidebar": null, 
    "current_page_name": "install", 
    "next": {
        "link": "management_commands.html", 
        "title": "Management Commands"
    }, 
    "sidebars": [
        "localtoc.html", 
        "relations.html", 
        "sourcelink.html", 
        "searchbox.html"
    ], 
    "meta": {}, 
    "parents": [], 
    "toc": "<ul>\n<li><a class=\"reference internal\" href=\"#\">Installation</a></li>\n</ul>\n", 
    "prev": {
        "link": "index.html", 
        "title": "Welcome to Django Kong\u2019s documentation!"
    }, 
    "page_source_suffix": ".rst", 
    "metatags": ""
}

Note that the result is from different pages and different versions, in the morning I will try to test with the same page. I updated this, both outputs are from the same page/version.

@stsewd
Copy link
Member Author

stsewd commented Jun 8, 2018

I think I need to make this work with a lower sphinx version 😞

build_json = os.path.abspath(
os.path.join(app.outdir, '..', 'json')
)
outjson = os.path.join(build_json, pagename + '.json')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extension for the previous build is .fjson, maybe I should change that

def generate_json_artifacts(app, pagename, templatename, context, doctree):
try:
# We need to get the output directory where the docs are built
# _build/json
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I review a little more the code, this is copied to the artifacts folder in a later step, so we need to copy those there, or maybe do this in the move_files step, I go for the later

'toc',
'prev',
'page_source_suffix',
]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into the code https://github.com/rtfd/readthedocs.org/blob/04ce7bb7a65a5d9ce2ad5a7b9a34185ef45c98ce/readthedocs/search/parse_json.py#L92-L124, we don't need all these keys. We only need:

  • current_page_name
  • body
  • title
  • toc

Should I remove the rest? Is it used in other parts of the code that I'm missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the only ones that we use currently. If there is other useful data, we should keep it, but I believe that's the primary data that we care about.

@stsewd
Copy link
Member Author

stsewd commented Jun 10, 2018

This PR is ready, we only need to check for #43 (comment) and #44

if key in context
}
json_file.write(json.dumps(to_context, indent=4))
log.info('{page} processed.'.format(page=outjson))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to change/remove this logging?

json_file.write(json.dumps(to_context, indent=4))
log.info('{page} processed.'.format(page=outjson))
except Exception:
log.exception(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make the build to fail (we need to throw the exception), it should fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep it from failing for now, and just log it.

@@ -9,7 +9,12 @@ class LanguageIntegrationTests(unittest.TestCase):

def _run_test(self, test_dir, test_file, test_string, builder='html'):
with build_output(test_dir, test_file, builder) as data:
self.assertIn(test_string, data)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replate this to test more easily the output :)

@stsewd stsewd requested a review from ericholscher June 10, 2018 20:48
from sphinx.util.console import bold

from .embed import EmbedDirective
from .mixins import BuilderMixin

try:
# Avaliable from Sphinx 1.6
from sphinx.util.logging import getLogger
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx 1.4 has the logging module, but not getLogger

@stsewd
Copy link
Member Author

stsewd commented Jun 11, 2018

I having doubts about the location of the json files, I need to override the move method from the htmlbuilder in rtd. It feels like the output directory should be something like _build/html/_json

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like quite a simple change (which I thought it might be!) -- I imagine this will break in some fashion in production around unicode or something, but I'd like to get it deployed and tested before turning off the current JSON builds. Perhaps we should include a feature flag for this, and have this PR respect a setting that enables it?

# _build/html/_json.
build_json = os.path.abspath(
os.path.join(app.outdir, '_json')
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should output this to _build/json as we have normally? Both would work, but it might make the downstream changes less. For some operational reasons, we might want to keep the JSON & HTML seperated on the web machines, and keeping them out of the same path might make things easier.

json_file.write(json.dumps(to_context, indent=4))
log.info('{page} processed.'.format(page=outjson))
except Exception:
log.exception(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep it from failing for now, and just log it.

}
json_file.write(json.dumps(to_context, indent=4))
log.info('{page} processed.'.format(page=outjson))
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use a couple different try/except's here that are more scoped. I see a couple places where things could fail:

  • JSON dumping
  • Filesystem issues

Should probably catch them each specifically if possible, though having a wider try/except might be useful to stop builds from failing, also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

"""
try:
if not app.config.rtd_generate_json_artifacts:
return
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a way of using this configuration in the setup step, so I'm checking it here

html_theme = 'alabaster'
html_static_path = ['_static']
htmlhelp_basename = 'pyexampledoc'
rtd_generate_json_artifacts = True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value should be in the conf.py template of the rtd repo :)

@stsewd
Copy link
Member Author

stsewd commented Jun 12, 2018

This is done, I need to change things in the rtd side

if not os.path.exists(outdir):
os.makedirs(outdir)
with open(outjson, 'w+') as json_file:
to_context = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the field should be blank rather than non existence if the key is missing.
It should be something like

to_context = { key: context.get(key, '') for key in KEYS}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, so that the JSON always has the same keys.

for key in KEYS
if key in context
}
json_file.write(json.dumps(to_context, indent=4))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about using json.dump instead of dumps?
json.dump(to_context, json_file, indent=4)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's shorter

if key in context
}
json_file.write(json.dumps(to_context, indent=4))
except TypeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its capturing a big set of scope. you can add it to the json.dump scope only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the try...except should enclose just the with open(... code instead of all the function?

log.exception(
'Fail to encode JSON for page {page}'.format(page=outjson)
)
except IOError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this

@ericholscher
Copy link
Member

This looks good. I'd like to get it out in a deploy today, so going to merge it in.

@ericholscher ericholscher merged commit e2cf38a into readthedocs:master Jun 14, 2018
@stsewd stsewd deleted the add-json-process branch June 14, 2018 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants