Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Nov 28, 2023

What changes were proposed in this pull request?

1.After pr #44012, the output format of some 'ipynb' tables displayed in HTML format has been disrupted. The pr aims to fix table format error in ipynb docs.

  • Before:
    image

  • After:
    image

2.Fix some minor errors.

Why are the changes needed?

Fix bug.

Does this PR introduce any user-facing change?

Yes, only for docs.

How was this patch tested?

Manually test.
Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

"outputs": [],
"source": [
"!$HOME/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:$SPARK_VERSION"
"!$HOME/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.13:$SPARK_VERSION"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I made some changes because currently our scala version is 2.13, 2.12 is no longer supported.

@panbingkun
Copy link
Contributor Author

cc @itholic @HyukjinKwon

"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's present the output results in text format instead of text/html format to avoid formatting errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this example should show the output nicely as spark.sql.repl.eagerEval.enabled is enabled. Wonder if we can fix the docs instead.

Copy link
Contributor Author

@panbingkun panbingkun Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Is the following presentation style appropriate for this special case?

  • Dark theme:
    image

  • Light theme:
    image

"source": [
"df.toPandas()"
"from tabulate import tabulate\n",
"print(tabulate(df.toPandas(), headers = 'keys', tablefmt = 'psql'))"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the output format looks fine but the whole point of using spark.sql.repl.eagerEval.enabled is to show a pretty table format without applying any other operations in the notebook.

Can you maybe just manually fix the output text/html to be compatible with both the sphinx dark theme and jupyter notebook?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Dark theme:
    image

  • Light theme:
    image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I understand what you mean.
For this document, I have modified the style to maintain using spark.sql.repl.eagerEval.enabled purpose to show a pretty table format without applying any other operations in the notebook.

Copy link
Contributor Author

@panbingkun panbingkun Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For python/docs/source/getting_started/quickstart_df.ipynb, are we going to do something similar?
Because this example does not use spark.sql.repl.eagerEval.enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table format is also correct in jupyter notebook
image

@itholic
Copy link
Contributor

itholic commented Nov 29, 2023

Nice fix! +1 for #44049 (comment), otherwise it looks good to me.

"outputs": [
{
"data": {
"text/html": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually can you also manually fix the HTML here instead of using print(psdf)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's possible, it would really be great :-).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll give it a try.

" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<table border=\"1\" class=\"dataframe\" style=\"table-layout: auto;margin-right: auto;margin-left: 0;\">\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example result in this place is incorrect, we need to correct it
https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_ps.html#Grouping

Before:
image

After:
image

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for fixing this @panbingkun !!!

@HyukjinKwon
Copy link
Member

Merged to master.

asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
### What changes were proposed in this pull request?

1.After pr apache#44012, the output format of some 'ipynb' tables displayed in HTML format has been disrupted. The pr aims to fix table format error in ipynb docs.

- Before:
   <img width="792" alt="image" src="https://github.com/apache/spark/assets/15246973/2095a2ac-f0b5-44bd-a3c2-ce742d041243">

- After:
   <img width="739" alt="image" src="https://github.com/apache/spark/assets/15246973/ec0be72d-4dc0-44f4-ab75-d9668e32fc51">

2.Fix some minor errors.
### Why are the changes needed?
Fix bug.

### Does this PR introduce _any_ user-facing change?
Yes, only for docs.

### How was this patch tested?
Manually test.
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#44049 from panbingkun/SPARK-46135.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants