[HOPSWORKS-3125] Feature view (#923)
* [HOPSWORKS-2945] [FeatureView] Implement activity endpoints (#829)

* init

* bug fixes

* Update ActivityResource.java

* Update FeatureViewController.java

* Update FeatureViewController.java

* add logging of activity for FV

* Update feature_store_activity_spec.rb

* Update feature_store_activity_spec.rb

* Update featurestore_helper.rb

* Update featurestore_helper.rb

* Update FeatureViewController.java

* add statistics

* [HOPSWORKS-2947] [ModelFeature] Implement PrepareStatementResource (#824)

* init

* improvements

* small changes to the backend

* temp

* Update FeatureViewController.java

* Update preparedstatements_spec.rb

* standardize

* Update featurestore_helper.rb

* small updates

* small changes

* Update preparedstatements_spec.rb

* addressing feedback

* change endpoint to lower case

* [HOPSWORKS-2943] Add query resource (#834)

* implement batch query endpoint

* handle empty start or end time

* throw feature store exception

* return original query

* check nested join

* fix NPE for join

* add feature group to feature

* return features properly.

* add IT for batch query

* fix batch query test

* set commit time to query

* add timetravel condition

* reuse get_query from TrainingDatasetController

* pass withLabel and isHiveEngine when constructing batch query

* reformat

* test get query

* add test for get query

* remove object variables

* fix indentation

* refactor getJoinsSorted

* fix format

* reformat

* use getSingleByNameVersionAndFeatureStore

* add query filter to FV

* refactor event time filter, add unit test

* fix resourcerequest NPE

* add featureview to trainingdatasetfilter

* add query filter to IT

Co-authored-by: Kenneth Mak <[email protected]>

* [HOPSWORKS-2944] Implement keyword related endpoints (#842)

* needs inode to function

* add inode

* Update FeatureView.java

* Update FeatureViewController.java

* Update ProjectController.java

* changes with inode

* Update FeatureViewController.java

* inode work

* Update FeatureViewController.java

* Update FeatureViewController.java

* Update FeatureViewController.java

* Update FeatureViewController.java

* Update ProjectController.java

* permissions

* small changes

* tests

* Update featureview_keywords_spec.rb

* some feedback addressed

* add featureView xattr

* Update FeatureViewBuilder.java

* Update FeatureViewController.java

* Update FeatureViewController.java

* Update FeatureViewController.java

* save hopsfs connector

* Update ProjectController.java

* Update FeatureViewController.java

* Update FeatureViewController.java

* add features to featureViewDTO

* Update FeatureViewBuilder.java

* Update HopsFSProvenanceController.java

* removing fv dir

* Update FeaturestoreController.java

* path changes for fv

* restructure fv path

* Update HopsFSProvenanceController.java

* Update HopsworksJAXBContext.java

* Update featureview_keywords_spec.rb

* some of the comments addressed

* remove keyword controller duplication

* make createFeatureView use and return FeatureView instead of FeatureViewDTO

* Update FeatureViewBuilder.java

* Update FeaturestoreKeywordResource.java

* change feature view path

* Update featureview_keywords_spec.rb

* Update featureview_keywords_spec.rb

* [HOPSWORKS-2946] Add transformation resource and statistics resource (#856)

* get transformation function

* statistics resource

* fix statistics dto forTransformation

* fix transformation uri

* add IT for transformation

* change statistics folder name

* remove td from feature view statistics

* delete statistics along with feature view

* add IT for feature view statistics

* Revert "add IT for feature view statistics"

This reverts commit db49dd0971fe1cdc7929f8fd0c13a5249c451ff0.

* Revert "delete statistics along with feature view"

This reverts commit 697768d51b186c14e2d451858f0552df1d8ffa77.

* Revert "change statistics folder name"

This reverts commit 922c17f94c14c50a771b7d70f04154712305ab19.

* Revert "fix statistics dto forTransformation"

This reverts commit 3143984906d6cc8eced6fb3349d24c4a0a4df360.

* Revert "statistics resource"

This reverts commit 7361bd8ff06ffff239285bfb8360f25ba14ef68a.

* fix access control

* refactor uri

* [FeatureView] return original query as expansion (#866)

* [HOPSWORKS-2942] Implement TagResource (#864)

* init

* Update TagResource.java

* Update TagResource.java

* bug fixing/standardization

* tests

* generic tags resource

* fix for the abstract tags resource

* some feedback addressed

* Update TagsResource.java

Co-authored-by: Alex Ormenisan <[email protected]>

* [HOPSWORKS-3064] [FeatureView] Allow update of metadata in FeatureView resource (#895)

* init

* Update FeatureViewResource.java

* bug fixing

* Update featureview_spec.rb

* Update featureview_spec.rb

* [HOPSWORKS-2941] Implement Training Dataset resource (#845)

* implement batch query endpoint

* throw feature store exception

* return features properly.

* add IT for batch query

* fix batch query test

* reuse get_query from TrainingDatasetController

* create td

* compute td

* get td

* delete td

* delete td data only

* fix compile

* reformat

* set feature view

* set featurestore and project

* implement statistics resource

* handle hopsfstrainingdataset is null

* return all td

* fix create and get td

* do not return query in dto

* skip feature in dto

* set query in job config

* add td IT

* add td IT

* fix internal td it

* external td IT

* add external td test

* rename create td method

* reformat

* move query related methods to QueryController

* revert unintended changes

* refactor get FV

* fix failed tests

* fix comments

* check name of training dataset in IT

* check name of feature view against td

* keep td folder when deleting data only

* fix failed test

* fix failed test

* remove extra update

* return features after creating fv

* fix td exist error

* create feature view td job

* do not assume a split if splits is empty

* remove extra get annotation

* set default td name = fv name + version

* return batch query as object

* set start time and end time to td

* return feature group in correct type

* throw exception if feature not found

* fix test

* fix feature to featuredto map

* fix query test

* [Append] training dataset resource

remove redundant lines in test

* rebase

* [HOPSWORKS-3063] [FeatureView] Allow update of metadata in TrainingDataset resource (#921)

* init

* Update featurestore_helper.rb

* Update featureview_trainingdataset_spec.rb

* remove label from feature view table (#924)

* [HOPSWORKS-3073] [FeatureView] Remove ResourceRequest from FeatureViewController (#925)

* init

small changes also on other endpoints to standardize

* addressing the feedback

* [HOPSWORKS-2941] [Append] td resource (#930)

* fix dataset

* fix infinite loop when getting all feature groups

* change event time type

* use date as event time in batch query

* add in memory td

* fix prepared statement comment

* refactor commit time

* fix keyword resource

* fix transformation

* remove unused import

* assign storage connector to in-memory td

* fix integration test

Co-authored-by: Ralf <[email protected]>
Co-authored-by: Kenneth Mak <[email protected]>
Co-authored-by: Alex Ormenisan <[email protected]>
4 people committed May 17, 2022
1 parent df10ec8 commit af28c64
Showing 74 changed files with 5,359 additions and 654 deletions.
@@ -4,6 +4,7 @@

package io.hops.hopsworks.featurestore;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.base.Strings;
import io.hops.hopsworks.common.featurestore.FeaturestoreConstants;
@@ -14,6 +15,7 @@
import io.hops.hopsworks.common.hdfs.DistributedFileSystemOps;
import io.hops.hopsworks.common.hdfs.DistributedFsService;
import io.hops.hopsworks.common.hdfs.HdfsUsersController;
import io.hops.hopsworks.common.hdfs.inode.InodeController;
import io.hops.hopsworks.common.hdfs.xattrs.XAttrsController;
import io.hops.hopsworks.common.integrations.EnterpriseStereotype;
import io.hops.hopsworks.exceptions.FeaturestoreException;
@@ -54,15 +56,18 @@ public class KeywordController implements KeywordControllerIface {
private TrainingDatasetController trainingDatasetController;
@EJB
private KeywordsUsedCache keywordsUsedCache;
@EJB
private InodeController inodeController;

private ObjectMapper objectMapper = new ObjectMapper();

public List<String> getAll(Project project, Users user,
Featuregroup featureGroup, TrainingDataset trainingDataset)
public List<String> getAll(Project project, Users user, Featuregroup featureGroup, TrainingDataset trainingDataset,
FeatureView featureView)
throws FeaturestoreException, MetadataException {
DistributedFileSystemOps udfso = dfs.getDfsOps(hdfsUsersController.getHdfsUserName(project, user));
try {
return getAll(featureGroup, trainingDataset, udfso);
String path = getPath(featureGroup, trainingDataset, featureView);
return getAll(path, udfso);
} catch (IOException e) {
throw new FeaturestoreException(RESTCodes.FeaturestoreErrorCode.KEYWORD_ERROR, Level.FINE,
"Error reading keywords", e.getMessage(), e);
@@ -71,46 +76,8 @@ public List<String> getAll(Project project, Users user,
}
}

public List<String> getAll(Featuregroup featureGroup, TrainingDataset trainingDataset, DistributedFileSystemOps udfso)
throws IOException, MetadataException, FeaturestoreException {
if (featureGroup != null) {
return getAll(featureGroup, udfso);
} else {
return getAll(trainingDataset, udfso);
}
}

public List<String> getAll(Project project, Users user, FeatureView featureView)
throws IOException, MetadataException, FeaturestoreException {
DistributedFileSystemOps udfso = dfs.getDfsOps(hdfsUsersController.getHdfsUserName(project, user));
String path = getFeatureViewLocation(featureView);
String keywords = xAttrsController.getXAttr(path, FeaturestoreXAttrsConstants.KEYWORDS, udfso);
if (!Strings.isNullOrEmpty(keywords)) {
return objectMapper.readValue(keywords, List.class);
} else {
return new ArrayList<>();
}
}

private String getFeatureViewLocation(FeatureView featureView) {
// TODO feature view:
return "";
}

private List<String> getAll(Featuregroup featuregroup, DistributedFileSystemOps udfso)
throws IOException, MetadataException {
String path = featuregroupController.getFeatureGroupLocation(featuregroup);
String keywords = xAttrsController.getXAttr(path, FeaturestoreXAttrsConstants.KEYWORDS, udfso);
if (!Strings.isNullOrEmpty(keywords)) {
return objectMapper.readValue(keywords, List.class);
} else {
return new ArrayList<>();
}
}

private List<String> getAll(TrainingDataset trainingDataset, DistributedFileSystemOps udfso)
throws IOException, MetadataException {
String path = trainingDatasetController.getTrainingDatasetInodePath(trainingDataset);
private List<String> getAll(String path, DistributedFileSystemOps udfso)
throws MetadataException, JsonProcessingException {
String keywords = xAttrsController.getXAttr(path, FeaturestoreXAttrsConstants.KEYWORDS, udfso);
if (!Strings.isNullOrEmpty(keywords)) {
return objectMapper.readValue(keywords, List.class);
@@ -120,19 +87,15 @@ private List<String> getAll(TrainingDataset trainingDataset, DistributedFileSyst
}

public List<String> replaceKeywords(Project project, Users user, Featuregroup featureGroup,
TrainingDataset trainingDataset, List<String> keywords)
TrainingDataset trainingDataset, FeatureView featureView, List<String> keywords)
throws FeaturestoreException, MetadataException {
validateKeywords(keywords);
Set<String> currentKeywords;
Set<String> currentKeywords = new HashSet<>(keywords);
DistributedFileSystemOps udfso = dfs.getDfsOps(hdfsUsersController.getHdfsUserName(project, user));
try {
currentKeywords = new HashSet<>(keywords);

if (featureGroup != null) {
addFeatureGroupKeywords(featureGroup, currentKeywords, udfso);
} else {
addTrainingDatasetKeywords(trainingDataset, currentKeywords, udfso);
}
String keywordsStr = objectMapper.writeValueAsString(currentKeywords);
String path = getPath(featureGroup, trainingDataset, featureView);
xAttrsController.addStrXAttr(path, FeaturestoreXAttrsConstants.KEYWORDS, keywordsStr, udfso);
} catch (IOException e) {
throw new FeaturestoreException(RESTCodes.FeaturestoreErrorCode.KEYWORD_ERROR, Level.FINE,
"Error adding keywords", e.getMessage(), e);
@@ -147,19 +110,17 @@ public List<String> replaceKeywords(Project project, Users user, Featuregroup fe
}

public List<String> deleteKeywords(Project project, Users user, Featuregroup featureGroup,
TrainingDataset trainingDataset, List<String> keywords)
TrainingDataset trainingDataset, FeatureView featureView, List<String> keywords)
throws FeaturestoreException, MetadataException {
Set<String> currentKeywords;
DistributedFileSystemOps udfso = dfs.getDfsOps(hdfsUsersController.getHdfsUserName(project, user));
try {
currentKeywords = new HashSet<>(getAll(featureGroup, trainingDataset, udfso));
String path = getPath(featureGroup, trainingDataset, featureView);
currentKeywords = new HashSet<>(getAll(path, udfso));
currentKeywords.removeAll(keywords);

if (featureGroup != null) {
addFeatureGroupKeywords(featureGroup, currentKeywords, udfso);
} else {
addTrainingDatasetKeywords(trainingDataset, currentKeywords, udfso);
}
String keywordsStr = objectMapper.writeValueAsString(currentKeywords);
xAttrsController.addStrXAttr(path, FeaturestoreXAttrsConstants.KEYWORDS, keywordsStr, udfso);
} catch (IOException e) {
throw new FeaturestoreException(RESTCodes.FeaturestoreErrorCode.KEYWORD_ERROR, Level.FINE,
"Error deleting keywords", e.getMessage(), e);
@@ -173,21 +134,6 @@ public List<String> deleteKeywords(Project project, Users user, Featuregroup fea
return new ArrayList<>(currentKeywords);
}

private void addFeatureGroupKeywords(Featuregroup featureGroup, Set<String> keywords,
DistributedFileSystemOps udfso) throws IOException, MetadataException {
String keywordsStr = objectMapper.writeValueAsString(keywords);
String path = featuregroupController.getFeatureGroupLocation(featureGroup);
xAttrsController.addStrXAttr(path, FeaturestoreXAttrsConstants.KEYWORDS, keywordsStr, udfso);
}

private void addTrainingDatasetKeywords(TrainingDataset trainingDataset, Set<String> keywords,
DistributedFileSystemOps udfso)
throws IOException, MetadataException {
String keywordsStr = objectMapper.writeValueAsString(keywords);
String path = trainingDatasetController.getTrainingDatasetInodePath(trainingDataset);
xAttrsController.addStrXAttr(path, FeaturestoreXAttrsConstants.KEYWORDS, keywordsStr, udfso);
}

public List<String> getUsedKeywords() throws FeaturestoreException {
try {
return keywordsUsedCache.getUsedKeywords();
@@ -197,6 +143,26 @@ public List<String> getUsedKeywords() throws FeaturestoreException {
}
}

private String getPath(Featuregroup featureGroup, TrainingDataset trainingDataset, FeatureView featureView)
throws FeaturestoreException {
String path;
if (featureGroup != null) {
path = featuregroupController.getFeatureGroupLocation(featureGroup);
} else if (trainingDataset != null) {
path = trainingDatasetController.getTrainingDatasetInodePath(trainingDataset);
} else if (featureView != null) {
path = getFeatureViewLocation(featureView);
} else {
throw new FeaturestoreException(RESTCodes.FeaturestoreErrorCode.KEYWORD_ERROR, Level.FINE,
"Error fetching keyword path");
}
return path;
}

private String getFeatureViewLocation(FeatureView featureView) throws FeaturestoreException {
return inodeController.getPath(featureView.getInode());
}

private void validateKeywords(List<String> keywords) throws FeaturestoreException {
for (String keyword : keywords) {
if (!FeaturestoreConstants.KEYWORDS_REGEX.matcher(keyword).matches()) {
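For orientation, the hunks above consolidate keyword handling in KeywordController: the separate per-entity helpers for feature groups and training datasets are replaced by a single getPath(featureGroup, trainingDataset, featureView) dispatch plus one getAll(path, udfso) read of the KEYWORDS extended attribute, with the feature-view path resolved through InodeController. The sketch below is a simplified, self-contained illustration of that pattern and is not the actual Hopsworks code: the entity types, field names, and the comma-separated keyword encoding are placeholders (the real controller stores the keywords as a JSON array via Jackson's ObjectMapper and reads it through XAttrsController).

```java
// Minimal sketch of the consolidated keyword flow introduced in this diff.
// All types below are stand-ins; the real logic lives in KeywordController and
// uses XAttrsController, InodeController, and Jackson's ObjectMapper.
import java.util.ArrayList;
import java.util.List;

public class KeywordPathSketch {

  // Placeholder entities: exactly one of them is expected to be non-null per call.
  static class Featuregroup { String location; }
  static class TrainingDataset { String inodePath; }
  static class FeatureView { String inodePath; }

  // Mirrors KeywordController.getPath: resolve the single metadata path to read/write.
  static String getPath(Featuregroup fg, TrainingDataset td, FeatureView fv) {
    if (fg != null) {
      return fg.location;
    } else if (td != null) {
      return td.inodePath;
    } else if (fv != null) {
      return fv.inodePath; // real code: inodeController.getPath(featureView.getInode())
    }
    throw new IllegalArgumentException("No entity supplied; cannot resolve keyword path");
  }

  // Mirrors the unified getAll(path, udfso): read one xattr, deserialize, or return empty.
  static List<String> getAll(String path) {
    String keywords = readKeywordsXAttr(path); // stand-in for xAttrsController.getXAttr(path, KEYWORDS, udfso)
    List<String> result = new ArrayList<>();
    if (keywords == null || keywords.isEmpty()) {
      return result;
    }
    // The real controller calls objectMapper.readValue(keywords, List.class) on a JSON array;
    // a comma-separated encoding keeps this sketch dependency-free.
    for (String k : keywords.split(",")) {
      result.add(k.trim());
    }
    return result;
  }

  // Placeholder for the extended-attribute read.
  static String readKeywordsXAttr(String path) {
    return "spark,featureview";
  }

  public static void main(String[] args) {
    FeatureView fv = new FeatureView();
    fv.inodePath = "/Projects/demo/demo_featurestore.db/fv_demo_1"; // hypothetical path
    System.out.println(getAll(getPath(null, null, fv))); // -> [spark, featureview]
  }
}
```

Routing all three entity types through one path resolver is what lets replaceKeywords and deleteKeywords in the diff drop their per-entity add*Keywords helpers and write the serialized keyword set with a single xAttrsController.addStrXAttr call.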
