10 changes: 5 additions & 5 deletions .github/workflows/maven_test.yml
@@ -190,18 +190,18 @@ jobs:
export ENABLE_KINESIS_TESTS=0
# Replace with the real module name, for example, connector#kafka-0-10 -> connector/kafka-0-10
export TEST_MODULES=`echo "$MODULES_TO_TEST" | sed -e "s%#%/%g"`
-./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} clean install
+./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} clean install
if [[ "$INCLUDED_TAGS" != "" ]]; then
-./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} -Dtest.include.tags="$INCLUDED_TAGS" test -fae
+./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} -Dtest.include.tags="$INCLUDED_TAGS" test -fae
elif [[ "$MODULES_TO_TEST" == "connect" ]]; then
./build/mvn $MAVEN_CLI_OPTS -Dtest.exclude.tags="$EXCLUDED_TAGS" -Djava.version=${JAVA_VERSION/-ea} -pl connector/connect/client/jvm,connector/connect/common,connector/connect/server test -fae
elif [[ "$EXCLUDED_TAGS" != "" ]]; then
-./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} -Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae
+./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} -Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae
elif [[ "$MODULES_TO_TEST" == *"sql#hive-thriftserver"* ]]; then
# To avoid a compilation loop, for the `sql/hive-thriftserver` module, run `clean install` instead
-./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} clean install -fae
+./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} clean install -fae
else
-./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Pspark-ganglia-lgpl -Phadoop-cloud -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} test -fae
+./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Pspark-ganglia-lgpl -Phadoop-cloud -Pjvm-profiler -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} test -fae
fi
- name: Clean up local Maven repository
run: |
2 changes: 1 addition & 1 deletion connector/profiler/README.md
@@ -23,7 +23,7 @@ Code profiling is currently only supported for
To get maximum profiling information set the following jvm options for the executor :
**Contributor Author:**

I'm not sure whether we need to rename connector/profiler/README.md to jvm-profiler-integration.md, move it under docs/ as docs/jvm-profiler-integration.md, and link it from docs/building-spark.md. Do we need to do this? @dongjoon-hyun
```
--XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
+spark.executor.extraJavaOptions=-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
```

For more information on async_profiler see the [Async Profiler Manual](https://krzysztofslusarski.github.io/2022/12/12/async-manual.html)
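
For instance, these options would typically be attached at submit time along the following lines (a minimal sketch; the master, class, and application jar are placeholders):

```
# Sketch: enable the profiler-friendly JVM flags on executors.
# --master, --class, and the application jar below are placeholders.
./bin/spark-submit \
  --master yarn \
  --class org.example.MyApp \
  --conf "spark.executor.extraJavaOptions=-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer" \
  my-app.jar
```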
6 changes: 5 additions & 1 deletion connector/profiler/pom.xml
@@ -31,6 +31,9 @@
</properties>
<packaging>jar</packaging>
<name>Spark Profiler</name>
+<description>
+Enables code profiling of executors based on the async profiler.
+</description>
<url>https://spark.apache.org/</url>

<dependencies>
@@ -44,7 +47,8 @@
<dependency>
<groupId>me.bechberger</groupId>
<artifactId>ap-loader-all</artifactId>
-<version>3.0-9</version>
+<version>${ap-loader.version}</version>
+<scope>provided</scope>
**Member:**

cc @parthchandra, too

**Contributor:**

It's great to include this feature in the Spark release. I feel, though, that if we are making it available in the release, the ap-loader dependency should be included as well. (Currently, if we build with the jvm-profiler profile, the dependency is included in the build.) Surprisingly, the jar file is not very large (I have version 2.9.7 and it is only 830K). Either way, including this in the release is already a big step, so I'm fine with asking users to download the jar.

</dependency>
</dependencies>
</project>
2 changes: 1 addition & 1 deletion dev/create-release/release-build.sh
@@ -201,7 +201,7 @@ SCALA_2_12_PROFILES="-Pscala-2.12"
HIVE_PROFILES="-Phive -Phive-thriftserver"
# Profiles for publishing snapshots and release to Maven Central
# We use Apache Hive 2.3 for publishing
-PUBLISH_PROFILES="$BASE_PROFILES $HIVE_PROFILES -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud"
+PUBLISH_PROFILES="$BASE_PROFILES $HIVE_PROFILES -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud -Pjvm-profiler"
# Profiles for building binary releases
BASE_RELEASE_PROFILES="$BASE_PROFILES -Psparkr"

2 changes: 1 addition & 1 deletion dev/test-dependencies.sh
@@ -31,7 +31,7 @@ export LC_ALL=C
# NOTE: These should match those in the release publishing script, and be kept in sync with
# dev/create-release/release-build.sh
HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pkubernetes -Pyarn -Phive \
-  -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud"
+  -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud -Pjvm-profiler"
**Member (@dongjoon-hyun, May 7, 2024):**

Like the Kafka module, we should not include this here.

Do you know how we skip the Kafka module's dependency here?

**Member:**

$ grep kafka dev/deps/spark-deps-hadoop-3-hive-2.3 | wc -l
0

**Contributor Author:**

Okay, I will remove it.

But I have a question: if an end user wants to use spark-profiler to analyze the executor, do they download me.bechberger:ap-loader-all:xxx from Maven Central and use it together with the spark-profiler module? If that's how it's meant to work, I'm fine with it. Maybe we need to add a detailed guide in some document?
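
For instance, a user could fetch the jar with the standard maven-dependency-plugin `get` goal (a sketch; the version shown is the one pinned in this PR, and the local-repository path is the Maven default):

```
# Fetch ap-loader-all from Maven Central into the local repository.
./build/mvn dependency:get -Dartifact=me.bechberger:ap-loader-all:3.0-9

# The downloaded jar can then be shipped to executors, e.g. via
#   --jars ~/.m2/repository/me/bechberger/ap-loader-all/3.0-9/ap-loader-all-3.0-9.jar
```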

**Contributor Author:**

> Like the Kafka module, we should not include this here.
>
> Do you know how we skip the Kafka module's dependency here?

Let me investigate it.

**Contributor Author:**

Yeah, let me add a detailed guide in some document.

**Contributor Author:**

> Like the Kafka module, we should not include this here.
>
> Do you know how we skip the Kafka module's dependency here?

I've looked into it: it is done by overriding the Maven dependency scope. Let me try.
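
Mirroring the Kafka check above, a quick way to confirm that the `provided` scope keeps ap-loader out of the pinned manifest (a hypothetical check, assuming the same manifest file):

```
# With ap-loader marked <scope>provided</scope>, expect 0 here as well.
grep ap-loader dev/deps/spark-deps-hadoop-3-hive-2.3 | wc -l
```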

MVN="build/mvn"
HADOOP_HIVE_PROFILES=(
hadoop-3-hive-2.3
7 changes: 7 additions & 0 deletions docs/building-spark.md
@@ -117,6 +117,13 @@ where `spark-streaming_{{site.SCALA_BINARY_VERSION}}` is the `artifactId` as def

./build/mvn -Pconnect -DskipTests clean package

+## Building with JVM Profiler support
+
+./build/mvn -Pjvm-profiler -DskipTests clean package
+
+**Note:** The `jvm-profiler` profile builds the assembly without the `ap-loader` dependency;
+you can download it manually from Maven Central and use it together with `spark-profiler_{{site.SCALA_BINARY_VERSION}}`.

**Contributor Author:**

The actual effect is like this: *(screenshot omitted)*

## Continuous Compilation

We use the scala-maven-plugin which supports incremental and continuous compilation. E.g.
3 changes: 3 additions & 0 deletions pom.xml
@@ -297,6 +297,9 @@
<mima.version>1.1.3</mima.version>
<tomcat.annotations.api.version>6.0.53</tomcat.annotations.api.version>

+<!-- Version used in Profiler -->
+<ap-loader.version>3.0-9</ap-loader.version>

<CodeCacheSize>128m</CodeCacheSize>
<!-- Needed for consistent times -->
<maven.build.timestamp.format>yyyy-MM-dd HH:mm:ss z</maven.build.timestamp.format>
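
A side effect of introducing the property: the ap-loader version can now be overridden on the command line (standard Maven property behavior; the value shown simply matches the default above):

```
# Override or pin the ap-loader version at build time via the new property.
./build/mvn -Pjvm-profiler -Dap-loader.version=3.0-9 -DskipTests clean package
```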