Merged
@@ -25,6 +25,7 @@
import org.apache.hudi.cli.TableHeader;
import org.apache.hudi.common.fs.FSUtils;
import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.log.HoodieLogFormat;
import org.apache.hudi.common.table.log.HoodieLogFormat.Reader;
import org.apache.hudi.common.table.log.block.HoodieAvroDataBlock;
@@ -63,7 +64,7 @@ public String showArchivedCommits(
throws IOException {
System.out.println("===============> Showing only " + limit + " archived commits <===============");
String basePath = HoodieCLI.getTableMetaClient().getBasePath();
-Path archivePath = new Path(basePath + "/.hoodie/.commits_.archive*");
+Path archivePath = new Path(HoodieCLI.getTableMetaClient().getArchivePath() + "/.commits_.archive*");
Contributor:
I'm assuming HoodieCLI.getTableMetaClient().getArchivePath() returns the path with .hoodie ?

Contributor:
public String getArchivePath() {
  String archiveFolder = tableConfig.getArchivelogFolder();
  if (archiveFolder.equals(HoodieTableConfig.DEFAULT_ARCHIVELOG_FOLDER)) {
    return getMetaPath();
  } else {
    return getMetaPath() + "/" + archiveFolder;
  }
}
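For intuition, the branching in the quoted method can be sketched with plain Java strings. This is a stand-alone illustration, not Hudi code: `ArchivePathDemo` and the empty-string default are assumptions of the sketch.

```java
public class ArchivePathDemo {
    // Assumed stand-in for HoodieTableConfig.DEFAULT_ARCHIVELOG_FOLDER,
    // taken here to be the empty string as the discussion implies.
    static final String DEFAULT_ARCHIVELOG_FOLDER = "";

    // Mirrors the quoted getArchivePath(): the default folder archives directly
    // under the meta path, while a custom folder becomes a subdirectory of it.
    static String getArchivePath(String metaPath, String archiveFolder) {
        if (archiveFolder.equals(DEFAULT_ARCHIVELOG_FOLDER)) {
            return metaPath;
        }
        return metaPath + "/" + archiveFolder;
    }

    public static void main(String[] args) {
        System.out.println(getArchivePath("/table/.hoodie", ""));         // /table/.hoodie
        System.out.println(getArchivePath("/table/.hoodie", "archived")); // /table/.hoodie/archived
    }
}
```

This is exactly why the two reviewers below disagree: the resolved path differs depending on whether the table was created with the default folder or an explicit one.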

Contributor (Author):

> I'm assuming HoodieCLI.getTableMetaClient().getArchivePath() returns the path with .hoodie ?

Yes, it reads from the table metadata and includes .hoodie.

if (folder != null && !folder.isEmpty()) {
archivePath = new Path(basePath + "/.hoodie/" + folder);
}
@@ -138,9 +139,11 @@ public String showCommits(
throws IOException {

System.out.println("===============> Showing only " + limit + " archived commits <===============");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*");
FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + "/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
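For readers unfamiliar with globStatus, the matching it performs on the archive path can be approximated with java.nio. `GlobDemo` is a hypothetical stand-alone sketch, not Hudi code, and uses a local temp directory in place of HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class GlobDemo {
    // List entries in dir whose names match the ".commits_.archive*" pattern,
    // analogous to FileSystem.globStatus on the resolved archive path.
    static List<String> listArchiveFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(dir, ".commits_.archive*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        names.sort(String::compareTo);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hoodie-demo");
        Files.createFile(dir.resolve(".commits_.archive.1"));
        Files.createFile(dir.resolve(".commits_.archive.2"));
        Files.createFile(dir.resolve("hoodie.properties"));
        System.out.println(listArchiveFiles(dir)); // [.commits_.archive.1, .commits_.archive.2]
    }
}
```

Only the rolled archive files match; other metadata files such as hoodie.properties are ignored.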
Contributor:
This can affect how all the old tables read their archives.

Contributor (@n3nash, Feb 5, 2020):

archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*") is equivalent to new Path(basePath + "/.hoodie/.commits_.archive*"); metaClient.getArchivePath() should return basePath + "/.hoodie" for all old tables.
@hmatu what concerns do you have?

Contributor:
@n3nash, they are different:
one: /table/.hoodie/archived/.commits_.archive*
another: /table/.hoodie/.commits_.archive*

Contributor:
The right way is to add archiveFolderPattern to the show archived commits command, like show archived commit stats does.

Contributor (Author, @hddong, Feb 6, 2020):

@hmatu @n3nash It returns /table/.hoodie/.commits_.archive* if old tables use the default ('') and returns /table/.hoodie/archived/.commits_.archive* if old tables use the path archived to archive. So it will return the correct path, using the archive path stored in .hoodie.
On the other hand, archiveFolderPattern allows users to provide a full path of the files under the archive folder and read it.

Contributor:

Thanks for your explanation, but I still think the right way is to add archiveFolderPattern to the show archived commits command, like show archived commit stats does.

Contributor:

@hmatu I think this is just a cleanup of the code to remove the hard-coded "archived" and provide it through the archive folder name; this is fine IMO since it does not break any backwards compatibility. @hddong confirm?
What you are referring to is also correct, but that should be another PR, to allow reading a path pattern for show archived commits just like the others have.

Contributor:

@n3nash, this PR aims to "Adjust the read and write path of archive". IMO, it is unnecessary to create a new PR.

Contributor:

@hmatu I agree, I think this PR is just doing some basic cleanup, so we should just rename this PR and merge it, and fix the adjustment of the read/write path in a different one.
@hddong please rename this PR as "archived commits command code cleanup" and we can merge this.

List<Comparable[]> allCommits = new ArrayList<>();
for (FileStatus fs : fsStatuses) {
// read the archived file
@@ -132,6 +132,13 @@ public void testRunHoodieJavaApp(String command, String hiveTableName, String ta
String hdfsPath = "/" + hiveTableName;
String hdfsUrl = HDFS_BASE_URL + hdfsPath;

+// Delete hdfs path if it exists
+try {
+executeCommandStringInDocker(ADHOC_1_CONTAINER, "hdfs dfs -rm -r " + hdfsUrl, true);
+} catch (AssertionError ex) {
+// Path not exists, pass
+}
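The pattern in this hunk (attempt the delete, swallow the failure when the path is absent, so test setup is idempotent) can be sketched without HDFS or Docker. `CleanupDemo` and its `deleteIfExists` helper are assumptions of this sketch, using java.nio in place of `hdfs dfs -rm -r`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanupDemo {
    // Delete a directory tree if it exists; calling it on a missing path
    // is a no-op, which makes repeated test setup safe.
    static void deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order deletes children before their parent directories.
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cleanup-demo");
        Files.createFile(dir.resolve("data.txt"));
        deleteIfExists(dir);
        System.out.println(Files.exists(dir)); // false
        deleteIfExists(dir); // second call is a no-op
    }
}
```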

// Drop Table if it exists
try {
dropHiveTables(hiveTableName, tableType);
@@ -29,8 +29,11 @@ import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.avro.HoodieAvroUtils
import org.apache.hudi.client.{HoodieWriteClient, WriteStatus}
import org.apache.hudi.common.config.TypedProperties
-import org.apache.hudi.common.model.{HoodieRecordPayload, HoodieTableType, WriteOperationType}
-import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient}
+import org.apache.hudi.common.model.HoodieRecordPayload
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.common.model.WriteOperationType
+import org.apache.hudi.common.table.HoodieTableConfig
+import org.apache.hudi.common.table.HoodieTableMetaClient
import org.apache.hudi.common.table.timeline.HoodieActiveTimeline
import org.apache.hudi.common.util.ReflectionUtils
import org.apache.hudi.config.HoodieBootstrapConfig.{BOOTSTRAP_BASE_PATH_PROP, BOOTSTRAP_INDEX_CLASS_PROP, DEFAULT_BOOTSTRAP_INDEX_CLASS}
@@ -105,8 +108,10 @@ private[hudi] object HoodieSparkSqlWriter {
handleSaveModes(mode, basePath, tableConfig, tblName, operation, fs)
// Create the table if not present
if (!tableExists) {
+val archiveLogFolder = parameters.getOrElse(
+HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived")
val tableMetaClient = HoodieTableMetaClient.initTableType(sparkContext.hadoopConfiguration, path.get,
-HoodieTableType.valueOf(tableType), tblName, "archived", parameters(PAYLOAD_CLASS_OPT_KEY),
+HoodieTableType.valueOf(tableType), tblName, archiveLogFolder, parameters(PAYLOAD_CLASS_OPT_KEY),
null.asInstanceOf[String])
tableConfig = tableMetaClient.getTableConfig
}
@@ -244,8 +249,10 @@ private[hudi] object HoodieSparkSqlWriter {
}

if (!tableExists) {
+val archiveLogFolder = parameters.getOrElse(
+HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived")
HoodieTableMetaClient.initTableTypeWithBootstrap(sparkContext.hadoopConfiguration, path,
-HoodieTableType.valueOf(tableType), tableName, "archived", parameters(PAYLOAD_CLASS_OPT_KEY),
+HoodieTableType.valueOf(tableType), tableName, archiveLogFolder, parameters(PAYLOAD_CLASS_OPT_KEY),
null, bootstrapIndexClass, bootstrapBasePath)
}
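The getOrElse fallback introduced in these hunks is the usual parameter-with-default pattern. A Java sketch of the same idea follows; the property key string and `ConfigFallbackDemo` are assumptions for illustration, not necessarily Hudi's actual constants.

```java
import java.util.Map;

public class ConfigFallbackDemo {
    // Assumed key, standing in for HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME.
    static final String ARCHIVELOG_FOLDER_PROP = "hoodie.archivelog.folder";

    // Use the caller-supplied folder when present, else the "archived" default.
    static String resolveArchiveFolder(Map<String, String> parameters) {
        return parameters.getOrDefault(ARCHIVELOG_FOLDER_PROP, "archived");
    }

    public static void main(String[] args) {
        System.out.println(resolveArchiveFolder(Map.of()));                                  // archived
        System.out.println(resolveArchiveFolder(Map.of(ARCHIVELOG_FOLDER_PROP, "history"))); // history
    }
}
```

This is what replaces the hard-coded "archived": tables created with an explicit folder keep it, and everything else falls back to the former behavior.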
