Skip to content

aws.String("xxx") ? #363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mattes opened this issue Aug 27, 2015 · 33 comments
Closed

aws.String("xxx") ? #363

mattes opened this issue Aug 27, 2015 · 33 comments
Labels
guidance Question that needs advice or information.

Comments

@mattes
Copy link

mattes commented Aug 27, 2015

Why is this lib using aws.String("xxx") instead of just strings "xxx"?

@lsegal
Copy link
Contributor

lsegal commented Aug 27, 2015

@mattes aws.String() returns a pointer because nil is a valid input to these parameters. A lengthier discussion can be found in #114 and others.

@jasdel
Copy link
Contributor

jasdel commented Aug 27, 2015

Hi @mattes, The aws.String is a helper so literal strings can be used when setting API field pointer values without needing to define a local variable first, because &"xxx" is not valid syntax. If the value is already defined as a local variable &xxxStr can also be used.

The AWS API operations include many optional fields which should not be included in the marshaled request if not set. In order to represent the ternary state for these fields pointers are used. This is done because it is not possible to distinguish between a never set value from a value set to the type's Zero value. The SDK uses pointers for all primitive and struct type fields. In many cases the type's Zero value of a field has meaning, and is different from unset, and a non zero value. All API fields in Input and Output structs are pointers, because even though a field is required today, it may not be required in the future. A breaking change would be required to allow not setting the once required field.

@mattes
Copy link
Author

mattes commented Aug 27, 2015

Hmm. I think I understand. Wouldn't it be nicer to use interfaces for things like that though? Like:

func foo(stringOrNil interface{}) {
  ...
}

@lsegal
Copy link
Contributor

lsegal commented Aug 27, 2015

@mattes using empty interface types would completely remove all type checking / inference in the language and tooling, and I would imagine this would cause much more confusion in the library and -1's from the community at large where type information is considered extremely useful.

@mattes
Copy link
Author

mattes commented Aug 29, 2015

I've been working with the API today and I understand the reason behind using pointers. Honestly, I am not a fan though. It creates very weird constructs of switching back and forth between pointers while integrating with the rest of the code base. Debugging isn't fun neither. I am not saying it's hard. It's just not very aesthetic. Which I think is definitely an argument. :-)

I still think something like map[string]interface{} would work for inputs. Yes, it would remove compiler type checking, but I think it's an reasonable approach in this situation. Return types could still be structs. Not sure how others think about this. But it seems as if I am not the only one who is a little unhappy with the current pointer approach.

@alexaandru
Copy link
Contributor

+1 for the current pointer approach -1 for anything based on interface{}

@omeid
Copy link

omeid commented Aug 29, 2015

Why not use methods then?

b := s3.NewObject(bucket, key, map[string]string{"key": key});
err := b.SetACL("ObjectCannedACL");
//Check ACL et al.

@jasdel
Copy link
Contributor

jasdel commented Sep 1, 2015

Hi @omeid take a look at #294 and #306. Between the two the cover the topic pretty well.

Since the service APIs are iterating quickly, fields which were previously required may become optional. This recently occurred with EC2 making changing parameter from required to optional. If we had methods for these operations it would be impossible for you not to provide a value for the now optional parameter without making a breaking change to the SDK. Once the SDK is released and ready for production we want to ensure it is stable.

@endophage
Copy link

Given the desirability of more idiomatic go (path of least surprise yada yada yada...), are there cases where the empty string, "", would functionally mean something different to nil?

@endophage
Copy link

Incidentally, I'm also seeing plenty of places a *string has been used where a []byte would be more appropriate (like the body of SQS Messages). Slices can be nil and that would solve your problems in some cases.

@stevvooe
Copy link

stevvooe commented Sep 2, 2015

@jasdel

This is done because it is not possible to distinguish between a never set value from a value set to the type's Zero value. The SDK uses pointers for all primitive and struct type fields. In many cases the type's Zero value of a field has meaning, and is different from unset, and a non zero value. All API fields in Input and Output structs are pointers, because even though a field is required today, it may not be required in the future. A breaking change would be required to allow not setting the once required field.

The use of *string and others is extremely non-idiomatic, making use of this package more challenging when integrating with existing Go code. Many existing Go packages cope with this just fine and I implore you to study them.

Other than the verbosity required, the main issue with this is the unnecessary performance penalty of the pointer redirect and the extra garbage generated.

Effectively, to use your package, a Go developer must depart from the style commonly used throughout the standard library and every other package.

@jasdel
Copy link
Contributor

jasdel commented Sep 2, 2015

Thanks for the feedback. We definitely want to provide the best idiomatic experience within the context of the AWS service APIs. How the SDK exposes its fields is something we really want to make sure is a good experience prior to releasing the SDK.

@jasdel
Copy link
Contributor

jasdel commented Sep 2, 2015

@endophage The sqs.SendMessageInput.MessageBody actually must be a valid unicode strings, any non-unicode binary data should be unicode escaped. If the API used a []byte instead of string it may imply that binary data is valid for the message body, which isn't the case. (#339) With that said, if you're looking to include binary or numbers within a message not escaped sqs.SendMessageInput.MessageAttributes using sqs.MessageAttributeValue is the best option to specify the additional binary or number data.

@jasdel
Copy link
Contributor

jasdel commented Sep 2, 2015

@stevvooe Since many fields within the the API operations are optional a concept is needed to identify if the field was set to a non-zero value, not set at all, or set to the zero value for the type. When using by value there is no way to distinguish the difference between set to zero value, and not set at all. Since zero value for primitive types are valid API operation values, the common omitempty for zero value pattern is not valid. In order to do so, extra information is needed.

To provide that extra information there are four common ways this could be done.

  • Use pointers instead of values. This was the original proposed solution Go applications began to use to provide dynamic not-set marshalling for concrete structure fields. Non-string primitives such as bool and int64 highlight this issue more than string within the context of the SDK since using zero value for omitempty would prevent setting a field to 0 or false. But like you mentioned the usage of pointers will increase number of allocations, even more so for operations with many fields.
  • Use map[string]interface{}. This would remove the need for pointers, but also remove the highly valued compile time type checking. In addition, it would remove any type ahead assistance various IDEs may provide. In addition field names would need to be exposed as consts in order to prevent typo's in field names.
  • Use function params for required fields. e.g. svc.GetObject("bucketName", "keyName") If the service APIs were static this would be a great pattern for the SDK to follow. But since the service API operations are always evolving fields which were once required can become optional. Required fields becoming optional or optionally required is not uncommon. This pattern would require the SDK to make breaking changes to allow users to not to set previously required fields. This is a perfect pattern to be written as a level above the SDK, since users would be able to control when they upgrade, and choose to update their helper functions as they need. While also being able to update the SDK for new features without breaking their existing code.
  • Wrap primitives struct. As an alternative to pointers we looked into using structs to wrap primitives. The struct would embed the needed tri-state information. With this pattern we could remove the need for pointers of primitives, but it would still require a aws.String("xxx") like method to be able to set the value for a field inline. At the same time it would remove the ability to set a field with the variable directly, such as Bucket: &bucketName which is an option today. These wrappers also have the added benefit of removing the possibility of dereferencing a nil pointer for unset response output structs.StringType below is an abbreviated example of this tri-state wrapper.
type StringType struct {
    s string
    set bool
}
func String(s string) StringType { return StringType{s: s, set: true} }

Of these options primitive pointers and wrapping structs are the two which provide the most flexibility and stability for their cost and ease of use. Any feedback on these methods you have are more than welcome. I have experimented a few times (#276, #294) with using the primitive wrappers, but have not pushed it forward because it removes the ability to use &fieldValue and requires a helper func aws.String("xxx"). The helper func is required because initializing the struct by itself such as aws.StringType{S: s, Set: true} would require setting the Set field to be provided also, and would be a big risk for users thinking they set a value when they really didn't.

@stevvooe
Copy link

stevvooe commented Sep 2, 2015

Since many fields within the the API operations are optional a concept is needed to identify if the field was set to a non-zero value, not set at all, or set to the zero value for the type. When using by value there is no way to distinguish the different between set to zero value, and not set at all.

I disagree with this premise. Zero-valued types are commonly used without adding such complexity across the Go community. I'm confused as to why this doesn't work for this project. Please examine this premise and work from there.

Could you give some concrete examples in the API where an empty string, zero-value number or other zero-valued type needs to be distinguished? Looking at Config, as an example, not one field cannot be correctly detected via zero-valued types. A cursory look over other types shows this to be unnecessary, as well.

Take a peak at this quote from Best practices for a new Go developer:

You’ll work with the strengths of the languages more easily if you don’t try to do C-like things (allocation on the heap, pointers to everything) and treat the stack and pass-by-value as cheap operations for most purposes. I’ve seen big performance improvements just by simplifying heap usage and replacing it with stack usage. Go is designed to do the right thing for programs that look very straightforward, so if you feel like you’re doing something clever, it’s potentially working against both you and the runtime.

While this quote isn't perfectly apropos, please examine whether or not approaching the problem this way is truly necessary. Especially when multiple experienced Go developers have suggested this might not be the right approach.

@lsegal
Copy link
Contributor

lsegal commented Sep 2, 2015

@stevvooe

Looking at Config, as an example, not one field cannot be correctly detected via zero-valued types.

Actually, Config was changed specifically because zero-values didn't cut it. See #157 for details. Specifically, a MaxRetries of 0 is not the same as a default max retry setting. There are other settings in there with the same problem. Users ran into real world blockers when Config worked the way you suggest. The SDK had to change because zero-values don't work there.

Could you give some concrete examples in the API where an empty string, zero-value number or other zero-valued type needs to be distinguished?

Any API where freeform text can be set, or pretty much any API operation that takes a number. Numbers are much easier to give concrete examples for. "0" is often a valid input for numeric parameters in AWS APIs. Since you asked, a concrete example would be DynamoDB's AttributeValue struct, where the N field is an integer of any value. Going down your suggested path would mean that our SDK would be unable to determine if a user set the N field in the case of setting it to 0 (and would therefore be unable to serialize the payload).

I would suggest stepping away from playing the "example-counter-example" game, though. The general problem to this argument is that the path forward needs to be guided by the API specification, not specific incidences of APIs. AWS has over 50 active services with probably hundreds (if not more) API calls, and thousands (if not tens of thousands) of parameters. If any of these values can theoretically have an empty string or zero value, this argument immediately falls apart. Furthermore, if any service in the future can theoretically rely on this spec, the argument falls apart too. The next AWS service may rely on using empty strings and 0's to differentiate from unset fields, and the SDK has to be able to support such a service going forward. Even an existing service might decide to begin differentiating these two values, which would be an even more insidious failure point. There are already some of these services that exist today. It's actually not even realistically possible to track all of the incidents where a zero-value is valid input to the service, because it's entirely contextually dependent to the service and may change at any time if the service makes any changes to their API. The SDK has to observe the wire protocol specification, just like you would want your HTTP libraries to correctly implement those specs and not make assumptions.

@endophage
Copy link

@jasdel JSON also uses unicode as an encoding however you'll note that Go's encoding/json, and encoding/xml packages take encoded input in the []byte type and output encoded content in the []byte type.

Using []byte to represent the raw form of an encoding in Go is very standard across the board. As I would guess most people are sending their own encoded content through SQS and other AWS services, you are simply forcing people to write additional, ugly typecasts by not following the convention used in the core language.

@jasdel
Copy link
Contributor

jasdel commented Sep 2, 2015

@stevvooe Check out this post on Go, REST APIs, and Pointers which provides a great explanation why marshalling Go structs with optional fields that can intentionally not be set, because their zero value has meaning needs knowledge if the field was ever set. In addition, google/go-github#19 the issue mentioned by the post, discusses this topic and explores a few other alternatives as well.

Like @lsegal mentioned aws.Config's case is a little special. For this struct pointers are used because a default config is merged with a user's provide config when instantiating a service client object. In order to merge two structs correctly knowledge of set or unset is needed. Especially since for nearly all of the fields the type's zero value has meaning. For example, Config.DisableSSL if it were set to true in the default config it could never be unset when creating a new service client with a config which had the field set to false, because false is bool's zero value, unless knowledge of set or unset was available. Without this knowledge the user would need to manually call svc.Config.DisableSSL = false after instantiating the service client. This is what pointers fields provide for aws.Config.

@jasdel
Copy link
Contributor

jasdel commented Sep 3, 2015

@endophage the comparison between encoding/json and SQS MessageBody are interesting. What is your usecase for the MessageBody? Is it safe to assume it is being used for marshaled JSON/XML data?

Is it the allocation caused by casting a []byte to a string you're looking to avoid?

@endophage
Copy link

The allocation doesn't overly concern me, my data isn't large, but it would be nice to avoid. It's more that the chosen style of the AWS libraries is inconsistent with the core Go libraries.

My use case is certainly JSON contained in the MessageBody. I know I'm not alone in that. I would hazard a guess that a large majority of users of many AWS services (S3 probably being the main exception, though []byte applies even more appropriately there and the SDK appropriately uses an io.ReadCloser), are using some form of encoding that will need to be parsed/"Unmarshalled" when received by a consumer.

@jasdel
Copy link
Contributor

jasdel commented Sep 3, 2015

Thanks for the info @endophage . To keep this thread from becoming too muddled with too many topics what do you think about creating a new issue to track this topic specifically. If you'd like to open up a new issue about this we can further discuss this use case, and pain points using these string fields cause. Also if there are any other services or operations' fields where you are also experiencing this pain points, please include those as well so we have a wider base to launch off from.

@cbroglie
Copy link

cbroglie commented Sep 3, 2015

@jasdel @lsegal It might be a good idea to add a wiki or FAQ page documenting the reasons for the use of pointers. It's clearly a hot issue for newcomers to the library, and specific documentation you can point people to would be easier to consume than having to read through the discussions in different github issues. I think the post about go-github is a great starting point.

@omeid
Copy link

omeid commented Oct 29, 2015

@jasdel Thanks for the through response. I did read through the mentioned issues and most everything discussed here.

I don't think trying to be extra backward-compatible at the cost of a non-idiomatic API is a good trade-off; further more, if APIs changes, then shouldn't the users need update their code anyway? but if that is not the case, with vendoring now being an official experiment and few other solutions for reproducible builds becoming somewhat widespread, let people who don't want to get update to date stay in the dark. ;)

@lsegal
Copy link
Contributor

lsegal commented Oct 29, 2015

if APIs changes, then shouldn't the users need update their code anyway?

Actually, no, users do not need to update their code to work with API changes in the AWS ecosystem, since almost all API updates are themselves backwards compatible. For example, the same code you wrote against the S3 API 4 years ago will likely still work today as is, even though there are plenty more switches and toggles to play with in newer SDK updates. SDKs should reflect backwards compatibility in the wire protocol-- the problem here is that with many of the proposed syntax changes, a lot of otherwise-backwards-compatible changes in the wire protocol would end up being backwards-incompatible in the SDK. In short, if it's backwards compatible in the AWS API, it should be equally backwards compatible in the SDK. This is the expectation I would have as a developer using the AWS ecosystem.

@omeid
Copy link

omeid commented Oct 29, 2015

Well, in that case, the solution would be to use idiomatic API and let the users use a reproducible build solution like Go 1.5 Vendoring Experiment, Godep, or one of the many other.

Backward compatibility is great, but I personally think it shouldn't cost idiomatic and clean API.

@lsegal
Copy link
Contributor

lsegal commented Oct 29, 2015

Well, in that case, the solution would be to use idiomatic API and let the users use a reproducible build solution like Go 1.5 Vendoring Experiment, Godep, or one of the many other.

Vendoring doesn't solve this problem. I want to be able to stay up to date with the Go SDK version, because I want the security patches, bug fixes, performance improvements, and, occasionally, feature updates, but I don't want to have to change my code every time I sync from upstream. This is the reason why backwards compatibility is important, not just to one Go library, but to almost every development platform out there. Compatibility isn't just about making old code work on old projects at a snapshot in time, it's about keeping old code working as projects continue to evolve.

@omeid
Copy link

omeid commented Oct 29, 2015

Well, if you want new features then you're probably already updating your code; as for security, they can always be back-ported. Too much emphasize on back-compatibility ends up with something a la PHP.

@omeid
Copy link

omeid commented Oct 29, 2015

I think anything more than the official Go 1 compatibility promise is too much, specially if it means an arguably ugly and non idomatic API.

@lsegal
Copy link
Contributor

lsegal commented Oct 29, 2015

Well, if you want new features then you're probably already updating your code

"Updating code" is more complex than a simple binary statement. If I have 10,000 lines of code in my existing system that rely on SDK features, and I am "updating my code" to add 20 lines of code that rely on new SDK features, I don't want to be liable for potentially having to upgrade and rewrite 10,020 lines of code. My goal was to simply work on 20 lines of code, not ten thousand. In other words, just because your system is evolving, it doesn't mean you have the developer cycles to constantly evolve all of it in one piece.

Too much emphasize [sic] on back-compatibility ends up with something a la PHP.

PHP is a successful product, I don't see a problem here. That said, if you don't like PHP, you could just as easily replace that statement with:

  • Too much emphasis on back-compatibility ends up with something a la the Linux Kernel.
  • Too much emphasis on back-compatibility ends up with something a la the Java programming language.
  • Too much emphasis on back-compatibility ends up with something a la the C++ programming language.
  • Too much emphasis on back-compatibility ends up with something a la the Go programming language.

All of these products have the same BC emphasis and yet are huge successes, largely in part to their commitment to backwards compatibility, and all of these are great projects that prove compatibility can work well, even if you think PHP did it wrong (the last one being highly relevant).

@lsegal
Copy link
Contributor

lsegal commented Oct 29, 2015

As a sidenote: the [debatable] difference with PHP was that it was not architected to handle change well, with its global namespace and lack of overloading, etc. It seems like the entire goal surrounding this issue in the Go SDK right now is to architect it in such a way that it does handle backwards-compatible changes well.

In other words: sure, if you try to tack on backwards compatibility to a poorly architected system, you get a huge mess. That's why the developers of the Go SDK are trying to architect this right-- and I don't believe any of the proposed changes help to get it there (if anything they make things worse).

@omeid
Copy link

omeid commented Oct 29, 2015

If I have 10,000 lines of code in my existing system that rely on SDK features, and I am "updating my code" to add 20 lines of code that rely on new SDK features, I don't want to be liable for potentially having to upgrade and rewrite 10,020 lines of code.

This example is absurdly unrealistic. Just because some APIs may have breaking change doesn't mean you would have to rewrite your whole application.

PHP is a successful product, I don't see a problem here.

PHP Is successful despite it's inconsistencies and flaws not because of them, PHP is a success because of first-mover advantage, cheap hostings, and now survives because of big players like Facebook, wordpress, et al.

That said, if you don't like PHP, you could just as easily replace that statement with...

While there maybe many other products that suffer from negative consequences of overzealous backward-compatibility, PHP is the most famous one that I know.

Anyway, the matter of fact here is that the API as it stand is non-idiomatic and arguably a hack. I would have preferred if it was designed better and didn't feel like I am writing not-go.

@jasdel
Copy link
Contributor

jasdel commented Oct 30, 2015

Thanks for the feedback. Providing a SDK which is easy to use and understand is very important to us. I'm a bit late to this discussion, but I'd like to understand the issues you're experiencing with the SDK. Since the client APIs are generated from models defining the service API we want the SDK to be friendly and feel handwritten as much as possible. In designing the SDK we considered several options for how we could represent the service APIs in a Go SDK in a easy to use way, while also ensuring the SDK will not introduce breaking changes when service updates occur. The choice to use structure initialization over function parameters was to ensure the SDK is stable as service APIs are updated.

For the SDK's general availability release we would like to provide the initial functionality that is needed for developers to use AWS with Go. This will give the SDK a good base on which other tools and features can be built on top of.

For API calls like S3's PutObject it's straightforward to create helper methods since Bucket, Key, and Body are the only required fields. But, this becomes more complex when we look at API calls which contain optionally required parameters such as EC2's AuthorizeSecurityGroupIngress. In these cases there are multiple ways the API could be called because the parameters are mutually exclusive.

I'm very excited to see where the Go community lands with vendoring. I think it will go a long way to reduce developer's package management headaches. Even now there are several options, but no standard. The experimental 1.5 vendoring will be a good step forward. With golang/go#12302, versioning of packages might be supported in the future.

I've consolidated a few related issues into this one so the discussion can be in a central location.

@jasdel
Copy link
Contributor

jasdel commented Nov 5, 2015

To wrap this issue up I'd like to thank everyone for their feedback, and the discussion in this thread along with the other related threads. The SDK at its core should stay close to the service API definitions. As the service APIs are consistently adding new features and improvements. With the core SDK built we have a place to build on top of. Especially any customizations and stylistic invocations that are helpful.

@jasdel jasdel closed this as completed Nov 5, 2015
skotambkar pushed a commit to skotambkar/aws-sdk-go that referenced this issue May 20, 2021
Services
===
* Synced the V2 SDK with latest AWS service API definitions.
  * Fixes [aws#359](aws/aws-sdk-go-v2#359)

SDK Features
===

SDK Enhancements
===
* `private/protocol`: Add support for TimestampFormat in protocols ([aws#358](aws/aws-sdk-go-v2#358))
  * Adds support for the timestampForamt API model trait to the V2 SDK. The SDK will now generate API client parameters with the correct time format for APIs modeled with custom time stamp formats specified.
  * Fixes [aws#202](aws/aws-sdk-go-v2#202)
  * Fixes [aws#286](aws/aws-sdk-go-v2#286)
* `aws`: Add example for custom HTTP client idle connection options ([aws#350](aws/aws-sdk-go-v2#350))
  * Adds example to the SDK for configuring custom HTTP client idle connection keep alive options.

SDK Bugs
===
* `private/model/api`: Fix API doc being generated with wrong value ([aws#359](aws/aws-sdk-go-v2#359))
  * Fixes the SDK's generated API documentation for structure member being generated with the wrong documentation value when the member was included multiple times in the model doc-2.json file, but under different types.
  * V2 port of to v1 [aws#2748](aws#2748)
* `aws/ec2rolecreds`: Fix security creds path to include trailing slash ([aws#356](aws/aws-sdk-go-v2#356))
  * Fixes the iamSecurityCredsPath var to include a trailing slash preventing redirects when making requests to the EC2 Instance Metadata service.
  * Fixes [aws#351](aws/aws-sdk-go-v2#351)
* `service/dynamodb/expression`: Improved reporting of bad key conditions ([aws#360](aws/aws-sdk-go-v2#360))
  * Improved error reporting when invalid key conditions are constructed using KeyConditionBuilder
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
guidance Question that needs advice or information.
Projects
None yet
Development

No branches or pull requests

8 participants