[CONNECTOR] Add script to start hdfs fuse #1480
Conversation
Force-pushed from 2bb7f0e to b4071eb
docker/start_hdfs_fuse.sh
Outdated
RUN echo \"user_allow_other\" >> /etc/fuse.conf | ||
|
||
WORKDIR /opt/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs | ||
RUN sed -i -e '18aexport CLASSPATH=\\\${HADOOP_HOME}/etc/hadoop:\`find \\\${HADOOP_HOME}/share/hadoop/ | awk '\"'\"'{path=path\":\"\\\$0}END{print path}'\"'\"'\`' \\ |
This part is fairly complex; could you explain it?
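For reference, one possible unpacking of that sed line (this reading is an assumption, since the surrounding generator script is not shown in full): it appends, after line 18 of the target file, an export that builds the CLASSPATH from hadoop's conf directory plus every path find prints under ${HADOOP_HOME}/share/hadoop/. With the escaping resolved, the injected line amounts to roughly:

export CLASSPATH=${HADOOP_HOME}/etc/hadoop:`find ${HADOOP_HOME}/share/hadoop/ | awk '{path=path":"$0}END{print path}'`
# i.e. the conf dir followed by a ":"-joined list of everything under share/hadoop/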
docker/start_hdfs_fuse.sh
Outdated
&& ./b2 --without-python \\ | ||
&& ./b2 --without-python install | ||
|
||
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64 |
Installing the JDK on Ubuntu should already expose JAVA_HOME; do we still need to set it again here?
@chaohengstudent Also, to build the Hadoop native library we can actually reuse the build script Hadoop already maintains (https://github.com/apache/hadoop/blob/trunk/start-build-env.sh). That would greatly simplify our own script; it would only need to do a few things:
What do you think?
Do you mean building the native library on the user's local machine through that script?
The official Hadoop script also builds the native library in a container: its container pre-installs all the required packages, then mounts the hadoop source from the host into the container and builds it there, so the resulting native library is kept on the host. We can reuse that script, which makes obtaining the native library much easier and spares us most of the maintenance effort (the Hadoop community takes care of it). Our script does, however, have to make sure that git, the hadoop source code, the maven repo, and so on exist on the host, and once the native library is built, it should package those artifacts into our own docker image.
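A minimal sketch of that flow (the clone location and mvn flags below are illustrative assumptions, not taken from this PR):

# ensure the hadoop source exists on the host, then enter Hadoop's own build container
HADOOP_SRC_PATH=${HADOOP_SRC_PATH:-"$HOME/hadoop"}
[ -d "$HADOOP_SRC_PATH" ] || git clone https://github.com/apache/hadoop.git "$HADOOP_SRC_PATH"
cd "$HADOOP_SRC_PATH"
./start-build-env.sh   # bind-mounts the source tree and the host's ~/.m2 into the container
# inside the container, something like:
#   mvn clean package -Pnative -DskipTests -pl hadoop-hdfs-project/hadoop-hdfs-native-client -am
# the built native library stays on the host because the source tree is a bind mount,
# and our script can then COPY those artifacts into our own docker image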
# Conflicts:
#	.gitignore
function generateFuseDfsWrapper() {
  # write a small wrapper that sets up the environment fuse_dfs needs, then runs it
  cat > "$FUSE_DFS_WRAPPER_SH" << 'EOF'
#!/usr/bin/env bash

export FUSEDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs"
export LIBHDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib"
export PATH=$FUSEDFS_PATH:$PATH
export LD_LIBRARY_PATH=$LIBHDFS_PATH:$JAVA_HOME/lib/server
# collect every jar under hadoop-tools into the CLASSPATH
while IFS= read -r -d '' file
do
  export CLASSPATH=$CLASSPATH:$file
done < <(find "$HADOOP_HOME/hadoop-tools" -name "*.jar" -print0)

fuse_dfs "$@"
EOF
}
fuse_dfs_wrapper.sh may not work out of box.
To use it, look at all the paths in fuse_dfs_wrapper.sh and either correct them
or set them in your environment before running.
This is how I've modified it for now.
What error did you observe?
Paths when building version 3.3.4:
- LIBHDFS_PATH
#origin
export LIBHDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/usr/local/lib"
#edit
export LIBHDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib"
- LD_LIBRARY_PATH
#origin
export LD_LIBRARY_PATH=$LIBHDFS_PATH:$JAVA_HOME/jre/lib/$OS_ARCH/server
#edit (JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64)
export LD_LIBRARY_PATH=$LIBHDFS_PATH:$JAVA_HOME/lib/server
- CLASSPATH
Originally the wrapper added the jars under:
1. $HADOOP_HOME/hadoop-client (in this version the path is $HADOOP_HOME/hadoop-client-modules)
2. $HADOOP_HOME/hadoop-hdfs-project
In practice, the jars under $HADOOP_HOME/hadoop-tools need to be added.
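For reference, a hypothetical invocation of the regenerated wrapper (the NameNode URI and mount point are illustrative, not from this PR):

mkdir -p /mnt/hdfs
fuse_dfs_wrapper.sh dfs://namenode:8020 /mnt/hdfs   # mounts HDFS at /mnt/hdfs via fuse_dfs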
These fixes are getting rather extensive and are probably better contributed back to the Hadoop community.
Could you try opening this issue on Hadoop Jira and providing a patch?
Once the community has merged the fix, this PR can be changed to build against the trunk branch.
cloneSrcIfNeed
cd $HADOOP_SRC_PATH
git checkout rel/release-${VERSION}
replaceLine 17 USER=\$\(whoami\) start-build-env.sh
This is meant to keep the USER variable in ./start-build-env.sh from being affected by the USER set in docker_build_common.sh, which would make the build fail.
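replaceLine itself is not shown in this excerpt; a hypothetical definition consistent with the call replaceLine 17 USER=\$\(whoami\) start-build-env.sh would be:

# hypothetical helper, inferred from the call site: replaces one whole line in a file
function replaceLine() {
  local lineNumber=$1 newText=$2 file=$3
  sed -i "${lineNumber}s/.*/${newText}/" "$file"
}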
@chaohengstudent Please add the link to the Hadoop Jira issue to the description and explain the current situation.
related to #1438
How to start it
Entering the container to work with HDFS
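A hypothetical sequence matching the script added in this PR (the container name and mount path are assumptions):

./docker/start_hdfs_fuse.sh       # build the image and start the container
docker exec -it hdfs-fuse bash    # "hdfs-fuse" is an assumed container name
ls /mnt/hdfs                      # HDFS should be visible here once fuse_dfs has mounted it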
03/12 update
hdfs jira: https://issues.apache.org/jira/browse/HDFS-16930
Current status:
During testing, the script needs the jar files above.
Still working out how these dependencies should be used.