[ARXIVCE-2542] use new preflight and zzrm in tex2pdf, change internal wiring #65

norbusan · 2024-09-11T14:02:24Z

Lots of changes:

replace the preflight (v1) completely with the new one
replace the old zzrm with the new zzrm
flesh out the logic on how to deal with zzrm and auto_detect
change semantics of what files are compiled (closer to the content of zzrm)
fixes to all unittests
make pytest easier to run against an already running docker container
...

Use the options pytest --no-docker-setup --docker-port 6301 to disable docker setup and connect to an already running container at port 6301.
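
For readers unfamiliar with how such flags are wired up, here is a minimal sketch of a conftest.py that registers the two options through pytest's standard pytest_addoption hook. This is an assumption about the general shape, not the code from this PR: the helper docker_settings and the default port value are hypothetical placeholders.

```python
# conftest.py (sketch; not the actual tex2pdf test configuration)


def pytest_addoption(parser):
    """Register the two command line options mentioned above."""
    parser.addoption(
        "--no-docker-setup",
        action="store_true",
        default=False,
        help="do not start a container; connect to an already running one",
    )
    parser.addoption(
        "--docker-port",
        type=int,
        default=6301,  # placeholder default, the real default is not shown in this PR
        help="port of the tex2pdf container to connect to",
    )


def docker_settings(config):
    """Hypothetical helper: return (start_container, port) from the pytest config."""
    start_container = not config.getoption("--no-docker-setup")
    port = config.getoption("--docker-port")
    return start_container, port
```

A session-scoped fixture could call docker_settings(request.config) and skip the container startup when start_container is False.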

dginev · 2024-09-25T14:26:08Z

preflight_parser/preflight_parser/__init__.py

@@ -1033,7 +1089,7 @@ def compute_toplevel_files(roots: dict[str, ParsedTeXFile], nodes: dict[str, Par
            filename=n.filename,
            process=MainProcessSpec(compiler=CompilerSpec(engine=engine, output=output, lang=lang, postp=postprocess)),
        )
-        compiler: str | None = tl.process.compiler.compiler_string
+        compiler: str | None = None if tl.process.compiler is None else tl.process.compiler.compiler_string


None if X is None else Y is quite the Python idiom.

To my Perl eyes X and Y reads easier (if that's the point?):

tl.process.compiler and tl.process.compiler.compiler_string

Not sure which is more readable for me ... the "if X is something falsy, return X, otherwise return Y" pattern is common in Perl, but in Python I always stumble over "What was it again ... didn't 'and' return True/False ..."

BTW, I think it was a failure to design Python like this - because it is a Perl dialect ;-)
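
To make the comparison in this thread concrete, here is a small sketch of both spellings. The two dataclasses are simplified, hypothetical stand-ins for the preflight models (the real classes have more fields); only the attribute names compiler and compiler_string are taken from the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompilerSpec:
    # Stand-in for the real model; only the one attribute used here.
    compiler_string: str


@dataclass
class MainProcessSpec:
    compiler: Optional[CompilerSpec] = None


def compiler_explicit(process: MainProcessSpec) -> Optional[str]:
    # Spelling used in the PR: only None is treated as "missing".
    return None if process.compiler is None else process.compiler.compiler_string


def compiler_short_circuit(process: MainProcessSpec) -> Optional[str]:
    # Perl-like spelling: `and` returns its left operand if that is falsy,
    # otherwise its right operand. It does not return True/False.
    return process.compiler and process.compiler.compiler_string


# The two spellings only differ when the left operand is falsy but not None:
zero_case = 0 and "fallback"   # evaluates to 0, not None and not False
empty_case = "" and "fallback"  # evaluates to ""
```

For objects like these dataclass instances, which are always truthy, both functions give the same result, so the choice is mostly about readability and how the type checker narrows the Optional.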

dginev · 2024-09-25T14:35:13Z

tex2pdf_service/tests/test_docker.py

    meta = None
-    url = service + f"/?timeout={tex2pdf_timeout}"
+    url = service + f"/?timeout={tex2pdf_timeout}{api_args}"


Isn't it generally easier+safer to check once here if api_args has content then add the & separator, compared to starting api_args with & in each caller of this function?

Not sure. I am fine with both

For reference, there's a module: urllib.parse.urlencode

Not sure that is needed here, since we control what makes up the api call, so we know that we don't have spaces to quote etc. IMHO that would only obfuscate things.

Concerning the args, do you think that is better:

if not api_args:  # check for empty string or empty list
    final_api_args = ""
else:
    if isinstance(api_args, str):
        if api_args.startswith('&'):
            final_api_args = api_args
        else:
            final_api_args = '&' + api_args
    else:
        final_api_args = "&".join(["", *api_args])

I am not sure ... it is not for public consumption but testing.

Well, maybe go with a dict as argument and do

params_dict = { "timeout": tex2pdf_timeout }
params_dict.update(api_args)
params = urllib.parse.urlencode(params_dict)
url = f"{service}/?{params}"

Will do that.

Done that now.
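
For reference, a self-contained sketch of the dict-based approach agreed on in this thread. The function name build_url and its signature are hypothetical and need not match the reworked test helper; the only library call used is urllib.parse.urlencode.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_url(service: str, timeout: int, api_args: Optional[Dict[str, str]] = None) -> str:
    """Build the request URL from a base service URL and query parameters."""
    params = {"timeout": timeout}
    # Extra arguments are merged in as a dict, so callers never have to
    # think about leading "&" separators.
    params.update(api_args or {})
    # urlencode joins the pairs with "&" and quotes values where needed.
    return f"{service}/?{urlencode(params)}"


# Example call with made-up values:
url = build_url("http://localhost:6301", 100, {"auto_detect": "true"})
```

Quoting is a side benefit rather than the main point here: the separator handling lives in one place instead of in every caller.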

dginev · 2024-09-25T14:50:36Z

preflight_parser/tests/fixture/multi_tex_3/newmain.tex

+\include{section1}
+\include{section2}
+
+\end{document}


These are nice.
I also want to suggest a slightly more painful preamble/postamble test:

preamble.tex

\def\hello{world}
\documentclass{book}

postamble.tex

%\printbibliography
\end{document}

main.tex

\def\main{macros}
\input{preamble}
\begin{document}
some content
\input{postamble}

Very nice example, added.
And fixed a bug in the toplevel file detection when main.tex alone is not definitive on tex vs latex.
Added a unittest using this example.

dginev · 2024-09-25T14:58:23Z

tex2pdf_service/tex2pdf/converter_driver.py

+
+            pdf_file = runs.get("pdf_file", pdf_file)
+            made_pdf_file = os.path.join(self.in_dir, pdf_file)
+            # I'm not liking this part very much


curious why?

That was a comment from Tai who originally wrote the code ;-)

dginev · 2024-09-25T15:09:43Z

zerozeroreadme/zerozeroreadme/__init__.py

@@ -38,6 +46,13 @@ def yaml_repr_ordered_dict(dumper: RoundTripRepresenter, data: OrderedDict) -> M
    return dumper.represent_mapping("tag:yaml.org,2002:map", dict(data))


+def strip_to_basename(path_list: list[str], extent: None | str = None) -> list[str]:


Looking at the os.path.* methods convention, it seems a single path is the common argument expectation - rather than a path_list. Maybe downgrade to a single path argument to avoid surprises? Usage could then switch from

assembly.append(strip_to_basename([fn], ".pdf")[0])

to

assembly.append(strip_to_basename(fn, ".pdf"))

Hmm, indeed. tex2pdf.doc_converter has the very same definition and tex2pdf uses it with lists.

Maybe better to import it from zzrm to tex2pdf.doc_converter? Or use separate function names (add _list to the doc_converter version).

@dginev WDYT?

The list variant is just not really needed... This function is barely needed as it is :)

On a list one could:

[strip_to_basename(path) for path in paths]

Ok, I have changed zerozeroreadme's strip_to_basename to take only one path.
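
A sketch of what a single-path variant could look like. The body is an assumption: the PR only shows the old signature, so the behaviour here (take the basename and, if extent is given, swap the extension for it) is a guess based on the usage strip_to_basename(fn, ".pdf") above, not the actual zerozeroreadme code.

```python
import os
from typing import List, Optional


def strip_to_basename(path: str, extent: Optional[str] = None) -> str:
    """Return the basename of path, optionally replacing its extension.

    Assumed behaviour, not taken from the PR.
    """
    base = os.path.basename(path)
    if extent is not None:
        stem, _old_ext = os.path.splitext(base)
        base = stem + extent
    return base


# Single path, as in the suggested call site:
single = strip_to_basename("src/main.tex", ".pdf")

# Callers that have a list can use a comprehension instead of a list variant:
paths: List[str] = ["src/main.tex", "appendix/extra.tex"]
many = [strip_to_basename(p, ".pdf") for p in paths]
```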

dginev

✅

norbusan added 30 commits September 5, 2024 00:38

WIP use new zzrm and new preflight in tex2pdf

c6587ef

preflight: allow MainProcessSpec.compiler to be None

a0ffc72

Fix python warning

b764809

Add zerozeroreadme to tex2pdf deps, update poetry.lock

a2f23cd

tex2pdf: switch to zerozeroreadme

3ae117d

pydocstyle: be more relaxed what to accept

3b2e48b

more work

bba6471

Add auto_detect parameter to convert API endpoint and pass it to Conv…

44bf285

…ersionDriver

preflight att tex_compiler property

b5dbf79

Use current development branch in pyproject.toml

6a08a81

preflight: introduce to_json method, use it

b9847cc

tex-inspection: drop unused code and tests

cac84ba

tex_inspection: ruffify code, add to pre-commit check list

02831e0

tex2pdf: use preflight/zzrm

7e5b7eb

Work on unittests

2e31a55

zzrm: ensure assembly files have .pdf extension for .tex sources

5036e47

tex2pdf: bug fixes, processing logic

59c04a2

tex2pdf: fix unittests

8f6da14

tex2pdf pytest: allow running against pre-started docker container

c263a6e

add tex2pdf/bin/__init__.py for pytest imports

bd9e714

Update poetry.lock files according to current state of branch

79216d0

tex2pdf: proper exception handling

43bff99

preflight-parser: switch to pydantic1

ec3cf2b

zerozeroreadme: switch to pydantic1

59e071b

tex2pdf: switch to pydantic1

9b6ae61

tex2pdf: fix docker test when alternative port is used

e946a14

all: switch back to move the current development branch in pyproject.…

86885c3

…toml

Update some versions and poetry.lock

cda2ca1

More pyproject version alignments.

eeb2327

Sync poetry.lock changes

b117ee8

norbusan added 2 commits September 13, 2024 21:25

adjust .gitignore

86f194d

tex2pdf: Move preflight before ZZRM complete check

e5c5c03

norbusan marked this pull request as ready for review September 18, 2024 14:22

norbusan changed the title ~~[WIP] [ARXIVCE-2542] use new preflight and zzrm in tex2pdf, change internal wiring~~ [ARXIVCE-2542] use new preflight and zzrm in tex2pdf, change internal wiring Sep 18, 2024

Add test for preflight run

60c51a1

norbusan requested a review from ntai-arxiv September 18, 2024 15:05

dginev reviewed Sep 19, 2024

zerozeroreadme/zerozeroreadme/__init__.py (outdated, resolved)

norbusan added 3 commits September 20, 2024 08:15

Fix missing compiler component in process

e7fbd50

preflight: correctly use documentclass/bye for toplevel file detection

097b4cc

add unit-test for documentclass/bye tests

f2fc82c

jonathanhyoung previously approved these changes Sep 25, 2024

dginev reviewed Sep 25, 2024

norbusan added 2 commits September 26, 2024 13:08

Fix documentclass and begin{document} in separate file

06ebbd6

Don't force \bye being followed by anything, add unittest for it

c5b5266

norbusan dismissed jonathanhyoung’s stale review via c5b5266 September 26, 2024 04:20

jonathanhyoung previously approved these changes Sep 26, 2024

Simplify strip_to_basename function

3ba5eed

norbusan dismissed jonathanhyoung’s stale review via 3ba5eed September 26, 2024 14:15

rework submit_tarball, use urllib.parse

8b34e85

dginev approved these changes Sep 26, 2024

norbusan merged commit 3409262 into master Sep 26, 2024
2 checks passed

norbusan deleted the ARXIVCE-2542-use-new-preflight branch September 26, 2024 19:13
