Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store shard metadata in S3, add a tailing facility #5

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

bickfordb
Copy link

  • Record shard positions in S3: This writes out a $keyPath.metadata file like {"shards": {"shard id": {"min_sequence_number": "X", "max_sequence_number": "Y"}}}
  • Add NewTailAt() function to tail records starting at a point
  • Factored some things out (e.g. S3 key paths into ArchiveKey)

@@ -45,6 +47,26 @@ func (s *S3Uploader) Upload(fileName, keyName string) (err error) {
return
}

func (s *S3Uploader) UploadBuf(r io.Reader, keyName string) (err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name doesn't seem exactly right... the first argument is a Reader not a buffer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this to UploadData()

@rhettg
Copy link
Contributor

rhettg commented Dec 9, 2015

Who is supposed to call noteSequenceNumber()?

triton/store.go Outdated

type streamMetadata struct {
// shard ID => shardInfo
Shards map[string]*shardInfo `json:"shards"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these actually need to be pointers to shardInfo?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so because they are mutable

@bickfordb
Copy link
Author

Store now calls noteSequenceNumber()

@rhettg
Copy link
Contributor

rhettg commented Dec 10, 2015

Tests are failing

@rhettg
Copy link
Contributor

rhettg commented Dec 10, 2015

I think this is fine. Still seems like two totally independent designs with the Checkpointers and then this new interface, and they should be reconciled in some way.

But it's not really obvious to me right now how to do that.

👍

@rhettg
Copy link
Contributor

rhettg commented Dec 16, 2015

What's the story on this? Ready to go in?

@bickfordb
Copy link
Author

I'm ended up doing my tailing code in the Triton repo and based it off of
this branch. Still finishing that up and I'll update this review with that
code.

On Wed, Dec 16, 2015 at 2:23 PM, Rhett Garber [email protected]
wrote:

What's the story on this? Ready to go in?


Reply to this email directly or view it on GitHub
#5 (comment).

@bickfordb bickfordb changed the title First stab at adding shard metadata Add shard metadata, tailing Dec 18, 2015
@bickfordb bickfordb changed the title Add shard metadata, tailing Store shard metadata in S3, add a tailing facility Dec 18, 2015
@bickfordb
Copy link
Author

Also, I think adding the (time->sequence number) metadata about each archive is different than checkpointing. It's just a way to skip through the stream faster

}
sort.Sort(sort.StringSlice(keys))
for _, key := range keys {
if strings.HasSuffix(key, metadataSuffix) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, but maybe you should just attempt to create archives for every key it finds, and let the DecodeArchiveKey figure out if it's a valid key to use or not.

Seems like it would be safer to allow unrecognizable keys to exist for future backwards compatibility reasons too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable

On Fri, Dec 18, 2015 at 3:05 PM, Rhett Garber [email protected]
wrote:

In triton/archive_repository.go
#5 (comment):

  • keys := []string{}
  • err = ar.s3Service.ListObjectsPages(&s3.ListObjectsInput{
  •   Bucket: aws.String(ar.bucket),
    
  •   Prefix: aws.String(keyPrefix),
    
  • }, func(output *s3.ListObjectsOutput, lastPage bool) (shouldContinue bool) {
  •   for _, object := range output.Contents {
    
  •       keys = append(keys, *object.Key)
    
  •   }
    
  •   return true
    
  • })
  • if err != nil {
  •   return
    
  • }
  • sort.Sort(sort.StringSlice(keys))
  • for _, key := range keys {
  •   if strings.HasSuffix(key, metadataSuffix) {
    

Just a thought, but maybe you should just attempt to create archives for
every key it finds, and let the DecodeArchiveKey figure out if it's a valid
key to use or not.

Seems like it would be safer to allow unrecognizable keys to exist for
future backwards compatibility reasons too?


Reply to this email directly or view it on GitHub
https://github.com/postmates/go-triton/pull/5/files#r48064339.

@rhettg
Copy link
Contributor

rhettg commented Dec 18, 2015

Can you add examples on using Tail to the Readme. That might also help inform whether the interfaces are similar enough to the existing api.

Brandon Bickford added 2 commits January 4, 2016 11:09
* Add ShardRecordReader interface
* Add shardReader that uses channels
* Update tail interface
@bickfordb
Copy link
Author

Updated

@@ -0,0 +1,198 @@
package triton

// Module for a record reader which consumes from a streamf ro a each shard
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

)

// NewTail returns a new tailing stream starting at "at"
func NewTailAt(params *NewTailAtParams) (tail *TailAt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not loving the name here, but don't have an obvious answer.

So the current most common interface is to do 'NewStreamReader()'

Could this be 'NewTailAtReader'?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants