The Hadoop API package supports multiple data layers, such as the blobstore (erasure-coded) layer, so that CubeFS can serve data lake scenarios. Build it with:
mvn package -Dmaven.test.skip=true
Note: it is recommended to skip the tests when packaging.
When using the SDK, you need two JAR files, one shared library (.so), and one configuration file.
Dependent packages
- cubefs-hadoop-x.x.x.jar (built by mvn package)
- libcfs.so (built from cubefs/libsdk)
- jna-x.x.x.jar (available from the Maven repository; the minimum supported version is 4.0)
Configuration
Modify the configuration file (core-site.xml or hdfs-site.xml) and add the following items to it.
<property>
    <name>fs.cfs.impl</name>
    <value>io.cubefs.CubefsFileSystem</value>
</property>
<property>
    <name>cfs.master.address</name>
    <value>your.master.address[ip:port,ip:port,ip:port]</value>
</property>
<property>
    <name>cfs.log.dir</name>
    <value>your.log.dir[/tmp/cfs-access-log]</value>
</property>
<property>
    <name>cfs.log.level</name>
    <value>INFO</value>
</property>
<property>
    <name>cfs.access.key</name>
    <value>your.access.key</value>
</property>
<property>
    <name>cfs.secret.key</name>
    <value>your.secret.key</value>
</property>
<property>
    <name>cfs.min.buffersize</name>
    <value>8388608</value>
</property>
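If it is more convenient to configure the client in code instead of in core-site.xml, the same keys can be set programmatically on a Hadoop Configuration object. Below is a minimal sketch; every value (master addresses, keys, paths) is a placeholder to replace with your own:

```java
import org.apache.hadoop.conf.Configuration;

public class CfsConfExample {
    public static Configuration cfsConf() {
        Configuration conf = new Configuration();
        // Same keys as in core-site.xml; all values below are placeholders.
        conf.set("fs.cfs.impl", "io.cubefs.CubefsFileSystem");
        conf.set("cfs.master.address", "192.168.0.1:17010,192.168.0.2:17010,192.168.0.3:17010");
        conf.set("cfs.log.dir", "/tmp/cfs-access-log");
        conf.set("cfs.log.level", "INFO");
        conf.set("cfs.access.key", "your.access.key");
        conf.set("cfs.secret.key", "your.secret.key");
        conf.set("cfs.min.buffersize", "8388608"); // 8 MB
        return conf;
    }
}
```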
HDFS Shell
- Put cubefs-hadoop-x.x.x.jar and jna-x.x.x.jar into $HADOOP_HOME/share/hadoop/common/lib
- Put libcfs.so into $HADOOP_HOME/lib/native
- Modify the configuration file $HADOOP_HOME/etc/hadoop/core-site.xml as described above
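After these steps, the deployment can be sanity-checked through the standard Hadoop FileSystem API. A minimal sketch, assuming a CubeFS volume named mydata (a placeholder) and that the modified core-site.xml is on the classpath:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CfsListExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the Hadoop classpath.
        Configuration conf = new Configuration();
        // "mydata" is a placeholder volume name.
        FileSystem fs = FileSystem.get(URI.create("cfs://mydata/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}
```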
Spark
Same as HDFS Shell on CubeFS:
- Put the three dependent packages into $SPARK_HOME/jars
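Once the packages are in $SPARK_HOME/jars, Spark jobs can address CubeFS paths with the cfs:// scheme like any other Hadoop-compatible filesystem. A minimal sketch; the volume name mydata and the input path are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class CfsSparkExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cfs-spark-read")
                .getOrCreate();
        // "mydata" is a placeholder volume name.
        Dataset<String> lines = spark.read().textFile("cfs://mydata/input");
        System.out.println("line count: " + lines.count());
        spark.stop();
    }
}
```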
Presto/Trino
Same as HDFS Shell on CubeFS:
- Put the three dependent packages into $PRESTO_HOME/plugin/hive-hadoop2
- Put cubefs-hadoop-x.x.x.jar and jna-x.x.x.jar into $TRINO_HOME/plugin/iceberg for Iceberg tables
- Link libcfs.so (ln -s $PRESTO_HOME/plugin/hive-hadoop2/libcfs.so /usr/lib; sudo ldconfig)
Flink
- Put the three dependent packages into $FLINK_HOME/lib
- Link libcfs.so (ln -s $FLINK_HOME/lib/libcfs.so /usr/lib; sudo ldconfig)
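With the packages in $FLINK_HOME/lib, Flink can resolve cfs:// paths through its Hadoop filesystem fallback, which reads the scheme mapping from the Hadoop configuration. A minimal sketch using the older (now deprecated) readTextFile API; the volume mydata and the path are placeholders:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CfsFlinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // "mydata" is a placeholder volume; the cfs:// scheme is resolved once
        // the jars and libcfs.so are on Flink's library path.
        env.readTextFile("cfs://mydata/input")
           .print();
        env.execute("cfs-flink-read");
    }
}
```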