Conversation

@khanhtc1202
Member

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

NONE

Collaborator

@pipecd-bot pipecd-bot left a comment

GO_LINTER

Some issues were detected while linting go source files in your changes.

	data []interface{}
}

func (it *Iterator) Next(dst interface{}) error {

dst is unused in Next

@pipecd-bot
Collaborator

COVERAGE

Code coverage for golang is 32.44%. This pull request decreases coverage by -0.08%.

File Function Base Head Diff
pkg/datastore/filedb/filedb.go makeHotStorageDirPath -- 0.00% +0.00%
pkg/datastore/filedb/filter.go filter -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Next -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Cursor -- 0.00% +0.00%
pkg/datastore/filedb/filedb.go FileDB.Find 0.00% 0.00% +0.00%

Comment on lines 80 to 95
parts, err := f.backend.List(ctx, dpath)
if err != nil {
	f.logger.Error("failed to find entities",
		zap.String("kind", kind),
		zap.Error(err),
	)
	return nil, err
}

if objects == nil {
	objects = make(map[string][][]byte, len(parts))
}
for _, obj := range parts {
	id := filepath.Base(obj.Path)

	data, err := f.fetch(ctx, obj.Path)
Member

@nghialv nghialv Mar 2, 2022

Oops. I was thinking that the List function also returns the data of the file objects, so that we could get their contents without any extra requests.
The way it is now, a large number of requests (number_of_shards * (1 + number_of_objects)) will be made in a short time, which I don't think is realistic.

Do we have any better idea for that problem?
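For concreteness, here is a rough sketch of the request pattern under discussion, with minimal stand-in types; the Backend interface and helper names below are illustrative, not the PR's actual code:

package main

import (
	"context"
	"path"
	"path/filepath"
)

// Stand-ins for the filestore interface used in the diff above.
type Object struct{ Path string }

type Backend interface {
	List(ctx context.Context, dirPath string) ([]Object, error)
	GetObject(ctx context.Context, objPath string) ([]byte, error)
}

// findAll illustrates why the request count becomes
// number_of_shards * (1 + number_of_objects): one List call per shard,
// then one GetObject call per listed object.
func findAll(ctx context.Context, b Backend, kind string, shards []string) (map[string][][]byte, error) {
	objects := make(map[string][][]byte)
	for _, shard := range shards {
		parts, err := b.List(ctx, path.Join(kind, shard)) // 1 request per shard
		if err != nil {
			return nil, err
		}
		for _, obj := range parts {
			data, err := b.GetObject(ctx, obj.Path) // 1 request per object
			if err != nil {
				return nil, err
			}
			id := filepath.Base(obj.Path)
			objects[id] = append(objects[id], data)
		}
	}
	return objects, nil
}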

Member Author

@khanhtc1202 khanhtc1202 Mar 2, 2022

I feel you, the (n+1) problem, right. Tbh, I thought the same at first, but when I read the place where we use this filestore List interface (ref: planpreview cleaner), I got the same surprise you have now 😄

I looked into the "list" APIs of the filestores we support (GCS, S3, and MinIO), and it looks like getting an object by key is the only way to fetch its raw data; the list APIs only return attributes and metadata, which we can then use to fetch the content. I will investigate more, but in the worst case, I think we have two things to rely on:

  • the number of objects in hot storage of each kind is expected to be small enough; we may add some kind of middleware that fetches object parts in parallel if necessary (a rough sketch follows below)
  • a cache in the API layer will reduce the number of times we have to list directly.

Wdyt?
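A minimal sketch of the parallel-fetch middleware mentioned in the first point, using golang.org/x/sync/errgroup with a concurrency limit so the burst stays bounded; all names here are illustrative and not part of this PR:

package main

import (
	"context"
	"sync"

	"golang.org/x/sync/errgroup"
)

type fetchFunc func(ctx context.Context, path string) ([]byte, error)

// fetchParallel fetches all object parts concurrently, but never keeps
// more than `limit` requests in flight at once.
func fetchParallel(ctx context.Context, fetch fetchFunc, paths []string, limit int) (map[string][]byte, error) {
	var (
		mu  sync.Mutex
		out = make(map[string][]byte, len(paths))
	)
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(limit)
	for _, p := range paths {
		p := p // capture the loop variable for the goroutine
		g.Go(func() error {
			data, err := fetch(ctx, p)
			if err != nil {
				return err
			}
			mu.Lock()
			out[p] = data
			mu.Unlock()
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return out, nil
}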

Member

Thanks for your explanation.

the number of objects in hot storage of each kind is expected to be small enough; we may add some kind of middleware that fetches object parts in parallel if necessary

I'm afraid this way is not appropriate, since we would make a burst of requests to external services and could hit their rate limits.

a cache in the API layer will reduce the number of times we have to list directly.

Yes, that could be a workable approach. I think it is time to think about that before continuing the implementation.
What cache solution are you thinking about now?

Member Author

@nghialv thank you so much for your comment 🙏 I added logic that checks whether the raw data of an object part is up to date, based on the etag value returned by the List objects request. For now, we only fetch an object part stored under the given path when no version of it is found in filedb.cache. PTAL when you have time 🙌
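A condensed sketch of that flow (cache lookup keyed by etag, falling back to the file store only on a miss); the Cache interface and helper below are simplified stand-ins for the real ones in this PR:

package main

import "context"

// Simplified stand-ins for the types used in this PR.
type Object struct {
	Path string
	Etag string
}

type Cache interface {
	Get(key string) (interface{}, error)
	Put(key string, value interface{}) error
}

type Fetcher func(ctx context.Context, path string) ([]byte, error)

// loadPart returns the cached content when a value for the object's etag
// exists, and otherwise fetches from the file store and caches the result.
func loadPart(ctx context.Context, c Cache, fetch Fetcher, obj Object) ([]byte, error) {
	if cdata, err := c.Get(obj.Etag); err == nil {
		return cdata.([]byte), nil // content for this etag is still current
	}
	data, err := fetch(ctx, obj.Path)
	if err != nil {
		return nil, err
	}
	_ = c.Put(obj.Etag, data) // best effort: a failed Put only costs a refetch later
	return data, nil
}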

Member

Thank you. Let me take a look.

@khanhtc1202
Member Author

/hold

@khanhtc1202
Member Author

/hold cancel

@pipecd-bot
Collaborator

COVERAGE

Code coverage for golang is 32.42%. This pull request decreases coverage by -0.09%.

File Function Base Head Diff
pkg/datastore/filedb/filedb.go makeHotStorageDirPath -- 0.00% +0.00%
pkg/datastore/filedb/filter.go filter -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Next -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Cursor -- 0.00% +0.00%
pkg/datastore/filedb/filedb.go FileDB.Find 0.00% 0.00% +0.00%

Comment on lines 100 to 104
cdata, err := f.cache.Get(obj.Etag)
if err == nil {
	objects[id] = append(objects[id], cdata.([]byte))
	continue
}
Member

How about grouping and storing all models of a kind using HashCache (https://github.com/pipe-cd/pipecd/blob/master/pkg/cache/rediscache/hashcache.go)?
Then we can fetch all of them at once to check, instead of calling the cache every time like this.
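Roughly, the suggestion is the following, shown here against go-redis directly rather than the HashCache wrapper, so the key layout and calls are illustrative:

package main

import (
	"context"

	"github.com/go-redis/redis/v8"
)

// One hash per kind: field = object ID, value = serialized object.
// A single HGETALL then replaces N individual GET calls.
func fetchAllOfKind(ctx context.Context, rdb *redis.Client, kind string) (map[string]string, error) {
	return rdb.HGetAll(ctx, "filedb_kind_"+kind).Result()
}

// HSET updates one field without touching the rest of the hash.
func putObject(ctx context.Context, rdb *redis.Client, kind, id string, data []byte) error {
	return rdb.HSet(ctx, "filedb_kind_"+kind, id, data).Err()
}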

Member

And I think we need to design the key name carefully, instead of using the Etag value directly, to avoid conflicts with other keys in Redis. In other places we prefix the name.

Member Author

@khanhtc1202 khanhtc1202 Mar 3, 2022

I feel you. In that case, it's possible that some objects are updated while all the others are not, meaning we still loop to check which ones are updated and which are not. There is also a minor point to care about: since I store the whole object content as the value for the etag key, if we group everything into a single value and use GetAll, it's a bit dangerous when there are a lot of objects. Of course, the downside of the current implementation is that we may send a bunch of requests to the cache just to check whether each object is updated based on its etag. Wdyt about this trade-off 🤔

Member Author

@khanhtc1202 khanhtc1202 Mar 3, 2022

And I think we need to design the key name carefully, instead of using the Etag value directly, to avoid conflicts with other keys in Redis. In other places we prefix the name.

Nice catch, let me address it 🙆‍♂️ Tbh, I made the key id_shard_etag at first but felt it was overdone; let me add the etag_ prefix to this key as in other places.

Member

it's possible that some objects are updated while all the others are not

Yes, it is. For the outdated ones, we call Cache HSET to update after fetching directly from the file store.

if we group everything into a single value and use GetAll, it's a bit dangerous when there are a lot of objects

I see. By storing with HashCache, I mean the field key is the object ID instead of the eTag; the eTag is included in the value along with the object content. That way the number of entries does not grow when an object is updated.
When keying by eTag, the number of entries in the cache will grow quickly as objects are updated, and we would need a TTL or something similar to deal with that.

Wdyt?

Member Author

@khanhtc1202 khanhtc1202 Mar 3, 2022

How about this:

  • Normal cache (not HashCache)
  • Key: entity_id_shard
  • Value: {etag: etag_value, data: data}

HashCache only helps us reduce the number of cache fetch requests, but the downside is that we need two loops to work out which entries should be updated, and the value under the hash key can become too large since we store the whole object content.
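For concreteness, the proposed shape looks roughly like this; the key format matches the one appearing later in this PR, while the struct and encoding are a sketch:

package main

import (
	"encoding/json"
	"fmt"
)

// cachedObject is the per-entity cache value: the content plus the etag
// it was fetched under, so a lookup can tell whether it is still current.
type cachedObject struct {
	Etag string `json:"etag"`
	Data []byte `json:"data"`
}

// makeObjectKey builds the per-entity key from its id and shard,
// prefixed to avoid colliding with other keys in Redis.
func makeObjectKey(id, shard string) string {
	return fmt.Sprintf("filedb_object_%s_%s", id, shard)
}

// encode/decode for storing the value in a string-based cache.
func encode(o cachedObject) ([]byte, error) { return json.Marshal(o) }

func decode(b []byte) (cachedObject, error) {
	var o cachedObject
	err := json.Unmarshal(b, &o)
	return o, err
}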

Member

What concerns me about that way is that it still requires N requests to our Redis.
But it is definitely simpler, so let's apply it for now and see how it goes. 👍

Member Author

@khanhtc1202 khanhtc1202 Mar 3, 2022

What concerns me about that way is that it still requires N requests to our Redis.

I feel you, then why not both 🤔
I mean we can make our cache a bit more complicated to handle this case. We would have:

a hashcache as you drafted, with:

  • hash key: List_{kind}
  • field key: entityId_shard
  • field value: etagValue

a cache to store object data, with:

  • key: etag_entityId (etag_ is the prefix, not the value of the etag)
  • value: {etag: etagValue, data: data}

And whenever we update the etagValue in the hashcache, we update the cache storing the object content as well. Wdyt 👀

Member Author

Oops, I forgot that we still need a separate cache request to get the content. Please forget the above suggestion 🙏

Member Author

Updated, PTAL when you have time 😉

@pipecd-bot
Collaborator

COVERAGE

Code coverage for golang is 32.42%. This pull request decreases coverage by -0.09%.

File Function Base Head Diff
pkg/datastore/filedb/filedb.go makeHotStorageDirPath -- 0.00% +0.00%
pkg/datastore/filedb/filedb.go makeCacheEtagKey -- 0.00% +0.00%
pkg/datastore/filedb/filter.go filter -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Next -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Cursor -- 0.00% +0.00%
pkg/datastore/filedb/filedb.go FileDB.Find 0.00% 0.00% +0.00%

@pipecd-bot
Collaborator

COVERAGE

Code coverage for golang is 32.39%. This pull request decreases coverage by -0.12%.

File Function Base Head Diff
pkg/datastore/filedb/filedb.go makeHotStorageDirPath -- 0.00% +0.00%
pkg/datastore/filedb/filter.go filter -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Next -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Cursor -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go NewCache -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go objectCache.Get -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go objectCache.Put -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go makeKey -- 0.00% +0.00%
pkg/datastore/filedb/filedb.go FileDB.Find 0.00% 0.00% +0.00%

}

func makeKey(shard datastore.Shard, id string) string {
	return fmt.Sprintf("filedb_object_%s_%s", id, shard)
Member

Member Author

@khanhtc1202 khanhtc1202 Mar 3, 2022

Sure 👍 Addressed by 2b6d569 🙏

@pipecd-bot
Collaborator

COVERAGE

Code coverage for golang is 32.39%. This pull request decreases coverage by -0.12%.

File Function Base Head Diff
pkg/datastore/filedb/filedb.go makeHotStorageDirPath -- 0.00% +0.00%
pkg/datastore/filedb/filter.go filter -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Next -- 0.00% +0.00%
pkg/datastore/filedb/iterator.go Iterator.Cursor -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go NewCache -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go objectCache.Get -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go objectCache.Put -- 0.00% +0.00%
pkg/datastore/filedb/objectcache/cache.go makeObjectKey -- 0.00% +0.00%
pkg/datastore/filedb/filedb.go FileDB.Find 0.00% 0.00% +0.00%

@nghialv
Member

nghialv commented Mar 3, 2022

Here you go.
/lgtm

@pipecd-bot pipecd-bot added the lgtm label Mar 3, 2022
@pipecd-bot pipecd-bot removed the lgtm label Mar 3, 2022
@pipecd-bot
Collaborator

TODO

The following issues will be created once this PR is merged. If you want me to skip creating the issue, you can use the /todo skip command.

Details

1. Implement filterable interface for each collection.

// TODO: Implement filterable interface for each collection.
type filterable interface {
	Match(e interface{}, filters []datastore.ListFilter) (bool, error)
}

This was created by todo plugin since "TODO:" was found in 8a13375 when #3329 was merged. cc: @khanhtc1202.
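As an illustration, an implementation for one collection might look like the following; the ListFilter shape and operator representation here are local stand-ins, since the real datastore types may differ:

package main

import "fmt"

// Local stand-ins for the real datastore types.
type ListFilter struct {
	Field    string
	Operator string
	Value    interface{}
}

type Application struct{ ProjectId string }

type appCollection struct{}

// Match reports whether the entity satisfies every filter.
func (c *appCollection) Match(e interface{}, filters []ListFilter) (bool, error) {
	app, ok := e.(*Application)
	if !ok {
		return false, fmt.Errorf("unexpected entity type: %T", e)
	}
	for _, f := range filters {
		switch f.Field {
		case "ProjectId":
			if f.Operator == "==" && app.ProjectId != f.Value.(string) {
				return false, nil
			}
		default:
			return false, fmt.Errorf("unsupported filter field: %s", f.Field)
		}
	}
	return true, nil
}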

@knanao
Member

knanao commented Mar 4, 2022

Here you go!
/approve

@pipecd-bot
Collaborator

APPROVE

This pull request is APPROVED by knanao.

Approvers can cancel the approval by writing /approve cancel in a comment. Any additional commits will also change this pull request to not-approved.
