
Service oriented architecture #17

Open
jfoutz opened this issue Oct 10, 2016 · 4 comments

Comments

jfoutz commented Oct 10, 2016

I believe maintenance difficulty scales exponentially with code base size. A 100,000-line project is more than twice as difficult to maintain as a 50,000-line project. Most projects have natural seams where splitting the application is possible. Examples might be authentication, payment processing, and administration tools.

The services don't have to be equal sized. One 200kloc project and five 10kloc projects are still a lot easier to deal with than one 250kloc monster.

Breaking up a big project into a few smaller services buys some nice properties.

  1. Smaller codebases. Smaller projects are easier to debug and deploy. There are simply fewer resources to manage, so there are fewer things to forget.
  2. Incremental upgrades. When a new version of a platform comes out, you don't have to upgrade everything all at once. You can pick whichever project is best suited for upgrading, perhaps the simplest one, to gain experience with the new features. Or perhaps it's a security update; in that case you can pick the most business-critical project.
  3. Different languages. The only real requirement is preserving the surface API. Vacation Labs has a lot of Ruby experience. It might be worthwhile to build some of the new project in Ruby, simply because development would be faster. When the organization is more comfortable with Haskell in general, those small projects are easy to replace. It also helps in the other direction: if Haskell doesn't work out for whatever reason, Haskell services can be selectively replaced when you have time.
  4. Separable goals for developers. Coordinating a team of 3-5 developers is fairly straightforward: everyone knows what everyone else is working on. 10 or more developers is much tougher. With smaller projects you can pick 3-5 developers to work on one system, and they can be confident they won't get in the way of the other groups and, more importantly, that the other groups won't get in their way.

There are risks involved. It might not be a great fit for your team.

  1. There's a lot more pressure on operations. Deploying and monitoring 5 services is a lot tougher than deploying and monitoring one monolithic application, and there's more logging to deal with. Fortunately, most of that pain can be automated away. Configuration management tools like Puppet, Chef, and Ansible (there are a bunch) can make deployment fairly painless.
  2. Everyone should have some work in every service, implementing features and fixing bugs. It's easy to get stuck being "the authentication guy". With only one person on it, that code will tend to diverge from the company standard. Ideally, when a new feature needs to be implemented in any service, you can pick the best people for the job.
  3. Separate projects are kind of a blunt instrument. Putting every little feature in its own project creates tons of unnecessary complexity. I'd suggest an upper limit of perhaps 5, small enough that everyone can just know what every project does. 20 or even 10 can be overwhelming, because the graph of relationships between projects gets confusing.
@saurabhnanda
Contributor

@jfoutz the biggest downside of SOA vs one monolithic code-base is refactoring. One of the reasons we're considering switching to a statically typed, compiled language is that the compiler helps us quickly identify what needs to change when an underlying API or data structure changes. When you split the project into smaller units, each communicating over an untyped JSON API, this advantage simply goes away.

Unless you use something like Thrift, or Protocol Buffers, or something else to achieve strongly typed communication across distributed components.

sudhirvkumar commented Oct 10, 2016

@saurabhnanda When we use Servant and servant-client, we can re-use the same paths and types. Even though JSON is used to transfer data, we will be able to reliably serialize/deserialize with compile-time guarantees.
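As a language-neutral sketch of that idea (Servant does this at the type level in Haskell; the Booking type and its fields here are invented for illustration), sharing one record definition between the serializing and deserializing sides means a schema change breaks both sides at the same place:

```python
import json
from dataclasses import dataclass, asdict

# One shared type definition: both the server (serialize) and the
# client (deserialize) import this, so a field change breaks both
# sides in one place instead of silently at runtime.
@dataclass
class Booking:
    booking_id: int
    price_cents: int
    date: str  # ISO-8601, kept as a plain string for the sketch

def to_wire(b: Booking) -> str:
    return json.dumps(asdict(b))

def from_wire(raw: str) -> Booking:
    # Missing or unexpected keys raise immediately rather than
    # letting a malformed record propagate further into the system.
    return Booking(**json.loads(raw))

round_tripped = from_wire(to_wire(Booking(1, 9900, "2016-10-10")))
```

Servant goes further: the route paths themselves are part of the shared type, so a client calling a renamed endpoint fails to compile rather than failing in production.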

@saurabhnanda
Contributor

Let me list the risk factors we've come up with internally whenever we've thought about a micro-services/service-oriented architecture:

  • Simple function calls get converted to remote API calls and need to deal with network failures and retries
  • How to easily co-ordinate schema/API changes across distributed components? Especially breaking changes?
  • Development environment. Every dev needs all the distributed components running on their local machine to get anything to work. Otherwise you need to mock out the distributed components.
  • Which component "owns" the DB? If I have a micro-service dealing with authentication and user profiles, and the order details page (different micro-service) needs access to the user's profile, a simple SQL JOIN gets converted to a remote API call. What happens when you want to do a simple report? This can potentially lead to an explosion of JSON API endpoints.
  • More moving parts in production to monitor and support

jfoutz commented Oct 10, 2016

I don't know your codebase, but I have some guesses about functionality. I'm going to work from the Shopify analogy, because that's all I really know. So one approach might be a service for the booking engine and another service for store building. End users just need to know prices and dates. Your customers need to be able to configure those prices and dates. That seems like a natural split. Both services can talk to one big database. Databases change slowly; adding more columns and tables is relatively harmless. Clearly, you can split up functionality in other ways, which will cause more or less engineering pain. But from this specific split:

  • Simple function calls get converted to remote API calls and need to deal with network failures and retries

Yes. There may be a need for the two services to talk to each other, but almost all of the required information comes straight from the database. There will be very few cross-service calls, and those can be carefully typed out. It is important not to crash, and ideally to update a health status, so something like Sensu can restart the downed service or page someone.

  • How to easily co-ordinate schema/API changes across distributed components? Especially breaking changes?

Well, additive changes are pretty easy. Start a new VM with the new service. Point DNS (or the load balancer) to the new service. Wait a day or two to make sure the new release is stable, then shut down the old service.

With breaking changes, there are a few approaches, but I like feature flags. Say you're dropping a table from the database. For each service that uses the database (or that table), add a feature flag: if (newWay) { ... don't use the table ... } else { ... use the old code ... }. Servers can support old clients for a little while. newWay could be populated and cached based on a failed select of the old table, and just retried every 10 minutes. There are more elegant solutions than a simple if, but you get the idea.
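That flag check might look something like this minimal sketch (the flag name, flag store, and lookup functions are all invented):

```python
# Feature flag guarding a breaking schema change: old and new code
# paths coexist until every client has moved over, then the flag
# (and the else branch) gets deleted.
FLAGS = {"profiles-in-user-service": True}  # hypothetical flag store

def load_profile(user_id, db_lookup, service_lookup):
    if FLAGS.get("profiles-in-user-service"):
        return service_lookup(user_id)   # new way: remote API call
    return db_lookup(user_id)            # old way: direct table read
```

Flipping one entry in the flag store switches every call site at once, which is what makes the eventual cleanup (deleting the else branch) safe and mechanical.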

Since a server can support an old version, the upgrade works the same way as the non-breaking change, with one additional step. Once the change is stable, delete the feature flag. Feature flags are technical debt, and they can hang around for a little while, but leaving a dozen of them hanging around makes code very confusing.

  • Development environment. Every dev needs all the distributed components running on their local machine to get something to work. Else you need to mock-out the distributed components.

Yes. This can be frustrating. Each service needs its own port. But with a little bit of policy, this can be not too bad. Ideally, the process for each project would be: git clone project, then stack exec project. If it doesn't work, go to http://localhost:project-port/health and get a nice JSON report of what's wrong. Can't connect to the DB, missing table, missing data, can't connect to the other service, whatever the issue might be. You'll want something like this for monitoring anyway. It creates a little extra incentive for good error messages.
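A sketch of what such a /health handler could compute (the check names and the simulated failure are invented for illustration):

```python
import json

def health_report(checks):
    """Run each named dependency check; collect ok/error per dependency.

    `checks` maps a name ("db", "auth-service", ...) to a zero-argument
    function that raises on failure. Nothing here is allowed to crash
    the service: failures become entries in the report instead.
    """
    report = {}
    for name, check in checks.items():
        try:
            check()
            report[name] = {"ok": True}
        except Exception as err:
            report[name] = {"ok": False, "error": str(err)}
    return report

def _db_check():
    pass  # hypothetical: would open a connection and run SELECT 1

def _auth_check():
    # hypothetical: simulate the auth service being down
    raise ConnectionError("can't connect to auth service")

status = health_report({"db": _db_check, "auth": _auth_check})
body = json.dumps(status)  # what the /health endpoint would serve
```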

  • Which component "owns" the DB? If I have a micro-service dealing with authentication and user profiles, and the order details page (different micro-service) needs access to the user's profile, a simple SQL JOIN gets converted to a remote API call. What happens when you want to do a simple report? This can potentially lead to an explosion of JSON API endpoints.

Probably the DB is its own project. I think, if each service trusts the auth system, each service can just do the join. It's not really necessary to split everything out into its own standalone thing.

It seems to me reporting could be built into the tenant UI, or be yet another service. The upside to being another service is that you can create another Postgres user that only has read access. The downside is, of course, yet another service. From the UI's point of view, it's just another tab and a different endpoint to talk to.
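The read-only Postgres user amounts to a few grants (the role, database, and schema names here are placeholders):

```sql
-- The reporting service connects as this role; it can read every
-- table in the schema but cannot modify anything.
CREATE ROLE reporting LOGIN PASSWORD 'changeme';
GRANT CONNECT ON DATABASE app TO reporting;
GRANT USAGE ON SCHEMA public TO reporting;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting;
-- Tables created later need the grant too:
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO reporting;
```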

  • More moving parts in production to monitor and support

Yes, absolutely, this is a huge downside. But again, at least some of this pain can be reduced through systems automation. Configuration management makes it fairly easy to just roll back in the face of errors.

I guess my point is, code is a liability. The more of it you have, the slower developers can move. When a project is tiny, you can move very quickly, because the whole thing fits in your head. As the project grows, the organization moves slower and slower.

I guess I think it looks something like this:

Maintenance-cost = (language-factor * lines-of-code)^1.01 + (service-count)^1.1

I don't know the actual values of those two magic numbers, 1.01 and 1.1, but the formula gets the idea across. Smaller projects are more nimble at the expense of operational overhead. More services are much more painful than more lines of code. But at a certain point, the first term dominates the service count and should be split into 2 or more terms. I'd argue Haskell's language factor is much smaller than Ruby's. And maybe that's enough. But if you're planning on many more features, a much larger system, more customers, and more developers, you should think about this split. Today might not be the day to do it; this sort of split might be 5 years down the road. It's just much easier to think about upfront. Even if you don't do the split, you can kind of contain different potential services in your source now.
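As a quick illustration of that cost model (the exponents are the same guesses as above, and the scenario numbers are the 250kloc-vs-200kloc-plus-five-10kloc example from earlier in the thread):

```python
def maintenance_cost(language_factor, loc_per_service):
    """Maintenance-cost = sum of (language-factor * loc)^1.01 per
    service, plus (service-count)^1.1 for the operational overhead."""
    code_term = sum((language_factor * loc) ** 1.01 for loc in loc_per_service)
    return code_term + len(loc_per_service) ** 1.1

# Made-up language factor of 1.0 for both scenarios.
monolith = maintenance_cost(1.0, [250_000])
split = maintenance_cost(1.0, [200_000, 10_000, 10_000, 10_000, 10_000, 10_000])
```

Under this model the split system comes out slightly cheaper despite the larger service-count term, because the superlinear code term rewards smaller codebases; with a bigger exponent on lines of code the gap would widen quickly.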
