Code repository setup
It's fairly hard to Google the topics of "how do we set up a repo for our team so we don't wanna kill each other?" or "how do we version and package up our code once we barf enough code into the repo?"
How to set up your code repositories is a contentious topic. I won't presume to know your exact situation but I've come across enough scenarios to give a little insight on what works and what has caused utter chaos.
Admittedly this stuff is hard, and many teams get stuck without ever putting their finger on the problem. Indulge me if you will on these hotly contested topics. For our exercise, let's assume we have two teams: the UI team and the API team. And since I'm a .NET developer, we'll be speaking in those terms.
Let's talk about general repo paradigms: the monolith vs. the discrete repo. Then we'll see how picking one or the other relates to a SOA application.
The monolith contains all of the code for an entire project or enterprise in one single repo. Google does it. Facebook does it.
So this must be the best way right? Every greenfield application starts as a monolith (a tiny one).
Let's list some pros and cons.
Pros:
- You can quickly alter any of the code without switching repos
- You can keep the UI and the API in sync and deploy together
- You can version the entire repo with a build number and deploy the whole shebang together
- Your CI\CD tool only has one repo to check out and build
Cons:
- Your repos get massive, and both teams have to download code they don't need
- The UI team will need to know how to troubleshoot the API and vice-versa
- The UI team can break the build, which affects the API and vice-versa
- The entire repo gets versioned together, thus coupling the UI and API (this is also a pro depending on POV)
- Developers can depend on other assemblies all too easily, causing dependency hell\incest
- Shared code becomes contentious -- the UI may not be ready to accept changes that the API wants now
- Low churn code becomes an anchor on build times
- It tempts developers to skip unit testing since the full service is always available to run against
Monoliths allow for faster code iteration; however, they require much more discipline as a team, plus rules to prevent incestuous dependencies. Most teams don't have a whole lot of time to work these rules out, so stringent dependency rules never get enforced. Monoliths are very natural to create, and it's easy to add code to them as needed.
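To make "rules to prevent incestuous dependencies" a bit more concrete, here is a minimal sketch of the kind of check a team could run in its build. It is not any particular tool: the project names and the ALLOWED_REFERENCES table are made up for illustration, and in a real .NET solution the reference lists would have to be read from the project files.

```python
# Hypothetical rule table: which projects each project is allowed to reference.
ALLOWED_REFERENCES = {
    "UI": {"Shared"},
    "API": {"Shared", "Data"},
    "Data": {"Shared"},
    "Shared": set(),
}


def find_forbidden_references(project_references):
    """Return sorted (project, reference) pairs that break the rule table.

    project_references maps a project name to the list of projects it
    actually references. Unknown projects are treated as allowed nothing.
    """
    violations = []
    for project, references in project_references.items():
        allowed = ALLOWED_REFERENCES.get(project, set())
        for reference in references:
            if reference not in allowed:
                violations.append((project, reference))
    return sorted(violations)


# Example: someone wired the UI straight into the data layer.
actual = {
    "UI": ["Shared", "Data"],
    "API": ["Shared", "Data"],
    "Data": ["Shared"],
    "Shared": [],
}
for project, reference in find_forbidden_references(actual):
    print(f"{project} must not reference {reference}")
```

A check like this only helps if it fails the build. Left as a guideline on a wiki page, it is exactly the kind of rule that never gets enforced.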
The antithesis of a monolith would be smaller repos, each focused on a particular subject domain. For instance, rather than co-locating the UI and API code, two repos would be created to represent them. The data layer could be included in the API repo or even spun off to its own repo. If a repo gets too large, it might end up evolving into a monolith unto itself. Let's look at the pros and cons of this paradigm.
Pros:
- It makes it very difficult to accidentally or even purposely create hard dependencies between the UI and API
- Repos are smaller and likely understood fully by the team using them
- It encourages mocking the opposite service, which is good for unit testing -- i.e. the UI needs to mock the API and vice-versa
- It makes changing either the UI or API to something different easier
- Breaking the UI build doesn't mean you've broken the API build
- Shared code can be placed in its own repo and the UI and API can independently decide when to depend on a new version
Cons:
- Mapping which version of the UI is compatible with which version of the API becomes a job for someone
- The CI\CD pipeline gets a bit more complicated due to the increased number of repos to build and deploy
- Updating multiple repos is a bit slower as the update needs to be packaged in order to be used by another repo
- A discrete repo can still grow into a monolith if it isn't managed properly
Discrete repos require a bit more setup with regard to the CI\CD pipeline, but that is more or less an upfront one-time cost. Keeping track of which version of the UI is compatible with which version of the API becomes a chore, but this can be handled with API endpoint versioning to prevent breaking changes. Shared code is no longer contentious, as the UI and API can each depend on the version of the shared code it needs and not the version the opposite team needs. Completely decoupling the UI from the API creates a need to mock the opposite team's work, which encourages abstractions and unit testing.
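As a rough illustration of how endpoint versioning takes the sting out of the compatibility chore, here is a small sketch. The release numbers and endpoint versions are invented, and the two tables are an assumption about how a team might record this, not a standard format.

```python
# Hypothetical: which endpoint versions each API release still serves.
API_RELEASES = {
    "2.0.0": {"v1", "v2"},
    "3.0.0": {"v2", "v3"},
}

# Hypothetical: which endpoint version each UI release calls.
UI_RELEASES = {
    "1.4.0": "v1",
    "1.5.0": "v2",
}


def is_compatible(ui_version, api_version):
    """A UI release works with an API release that still serves its endpoint version."""
    return UI_RELEASES[ui_version] in API_RELEASES[api_version]


def compatible_api_releases(ui_version):
    """List every known API release the given UI release can talk to."""
    return sorted(v for v in API_RELEASES if is_compatible(ui_version, v))


print(compatible_api_releases("1.4.0"))
print(compatible_api_releases("1.5.0"))
```

The point is that API 3.0.0 keeps serving v2 alongside v3, so UI 1.5.0 keeps working and the UI team can move to v3 on its own schedule.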
SOA as an example
Let's take a look at approaching a service oriented architecture (SOA) application with both strategies. To recap, a SOA application would mean we have independent services that should be both versioned and deployable separately. Our application will have three services and a shared data project.
Let's look at how the monolith paradigm could be applied to our SOA app.
A typical set up would have the code appear like this:
- UI Project
- Service A Project
- Service B Project
- Service C Project
- Shared Data Project
If a developer wanted to edit any of the services or the UI, they could do so quickly since the services are all in the same solution. So far so good.
When code is co-located, there is nothing stopping a developer from leaking code from one service into another. Strict rules and code review will be required to prevent this.
However, it's at deployment time that we have a much bigger problem.
I buy into the idea that we can only version an entire repo and not individual projects within a repo. When using a single repo for all services, we can't easily say Service A is on v1.1 and Services B and C are on v1.0. If we presume Git is in use, we typically tag the code with the version number at release time. We don't tag a project, we tag the entire repo. So the whole repo has to carry a common version. Therefore if Service A is on v1.1, so are B and C, even if we didn't change B and C. If you don't tag your repo with a version, you'll have no way to correlate a compiled DLL with the code it was built from. And tagging a repo with multiple versions as each service changes is not only difficult to manage, but also makes it ambiguous which service's version the shared assemblies belong to.
And then of course if the UI updates but the services do not, we still have to deploy everything despite only changing a simple CSS property on the UI.
Using a monolith, services A, B & C are no longer independently versioned services as SOA services should be. They all depend on a common data project through a project dependency and not a versioned dependency. So if we update the data project and release the code, we'll have to tag the whole repo as v1.2 (or whatever), which means services A, B & C all get a version change despite only the data project changing.
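The effect of one tag per repo can be modeled in a few lines. This is only a toy model of the reasoning above, not how Git or any build tool works internally, and the project names are stand-ins for the layout listed earlier.

```python
# Hypothetical project names matching the monolith layout.
PROJECTS = ["UI", "ServiceA", "ServiceB", "ServiceC", "SharedData"]


def tag_monolith(projects, new_tag):
    """A tag covers the whole repo, so every project reports the same version."""
    return {project: new_tag for project in projects}


def unchanged_but_rereleased(projects, changed):
    """Projects that get a new version and a redeploy without any code change."""
    return sorted(p for p in projects if p not in changed)


# The UI team changes one CSS property and the repo is tagged v1.2.
versions = tag_monolith(PROJECTS, "v1.2")
collateral = unchanged_but_rereleased(PROJECTS, changed={"UI"})

print(versions["ServiceA"])
print(collateral)
```

Everything in the collateral list is now "v1.2" as well, even though nobody touched it.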
As far as team dynamics go, suppose an overseas team is assigned to Service A and an internal team is assigned to both services B & C. Any time a release occurs, all three services need a version bump and all three need to be redeployed. The overseas team would prefer to version and release on its own schedule and not when the internal team makes a decision to deploy newer versions of B & C.
This negates the idea that each service is independent as SOA dictates in my opinion.
Contention on shared code is extremely frustrating. There are no 'seams' like you find in the dependency injection world when it's all in one repo. It's all just one big hulking thing. Teams have to have high levels of coordination, and this is not the strength of most shops, let alone shops with outsourced labor. It creates a culture of us vs them between teams and pretty much guarantees that long running branches will be difficult to merge and that conflicts will be rampant.
For SOA I prefer the discrete repo paradigm. With this strategy the code could be organized as follows:
Repo 1 - Service A Solution
- Depend on a versioned Data DLL which could be different than Service B or C
Repo 2 - Service B Solution
- Depend on a versioned Data DLL which could be different than Service A or C
Repo 3 - Service C Solution
- Depend on a versioned Data DLL which could be different than Service A or B
Repo 4 - Shared Data Solution that can release many different versions
Repo 5 - UI Solution
As you can see, there's a lot of separation. For SOA, each repo can now carry its own version and only bump it when the service itself changes and not when another service changes. Each service can depend on a different data DLL version.
The UI can change a simple CSS style and we don't have to bump any of the service solutions nor redeploy them.
For division of labor between independent teams, contention on the shared data DLL goes away. The only coordination needed is when a team asks for the shared data DLL to be updated. In the monolith situation, anyone can update the shared code, and they will have to fix any breaking changes in all three services despite only being responsible for their single service. Such edits also force all three services to be deployed together, not individually. With a discrete repo, we're truly decoupled from the rest: we can upgrade our shared DLLs on our own schedule and deploy our service independently.
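Here is the same kind of toy model for the discrete layout. The Repo class and its fields are invented for this sketch. In a real .NET setup the pin would be a versioned package reference, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repo:
    version: str
    data_pin: Optional[str] = None  # pinned shared data DLL version, if any


def release(repos, name, new_version):
    """Bump one repo; every other repo keeps its version and its pin."""
    repos[name].version = new_version


def adopt_data(repos, name, data_version):
    """A service team decides, on its own schedule, to move to a newer data DLL."""
    repos[name].data_pin = data_version


repos = {
    "ServiceA": Repo("1.0", data_pin="1.0"),
    "ServiceB": Repo("1.0", data_pin="1.0"),
    "ServiceC": Repo("1.0", data_pin="1.0"),
    "SharedData": Repo("1.0"),
    "UI": Repo("1.0"),
}

# The shared data repo ships 1.1; only Service A picks it up and releases.
release(repos, "SharedData", "1.1")
adopt_data(repos, "ServiceA", "1.1")
release(repos, "ServiceA", "1.1")

for name, repo in repos.items():
    print(name, repo.version, repo.data_pin)
```

Services B and C stay on version 1.0 with data DLL 1.0 and don't need a redeploy, which is the whole point.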
For me all discrete repos are just mini-monoliths. It is important that they not grow over time. In my humble opinion, it's a fallacy to think simply separating code into projects under one repo makes an application SOA compliant. To be SOA compliant we have to have complete separation at the repo level to enable proper dependencies and versioning.
I typically use the following rules to guide myself when breaking up repos that start to grow unwieldy:
- Identify assemblies that have low code churn. Create a new repo (or add it to a shared repo) and package them up and depend on them as binaries in the other repos.
- Identify assemblies that have high contention. Create a new repo and package them up and have a dedicated team control the churn.
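Finding low churn assemblies doesn't have to be guesswork. The sketch below counts how many commits touched each top-level project folder. It assumes one folder per project and assumes you've already collected the changed paths per commit somehow (for example by parsing the output of git log --name-only, which is not shown here). The threshold is arbitrary.

```python
from collections import Counter


def churn_by_project(commits):
    """Count commits per top-level folder.

    commits is an iterable of lists of changed file paths, one list per commit.
    A project is counted once per commit no matter how many files changed.
    """
    churn = Counter()
    for paths in commits:
        projects = {path.split("/", 1)[0] for path in paths}
        churn.update(projects)
    return churn


def low_churn(commits, all_projects, threshold):
    """Projects touched by at most `threshold` commits: candidates to split out."""
    churn = churn_by_project(commits)
    return sorted(p for p in all_projects if churn[p] <= threshold)


# Hypothetical history: three commits, mostly UI work.
history = [
    ["UI/site.css"],
    ["UI/app.js", "ServiceA/Handler.cs"],
    ["UI/index.html", "UI/site.css"],
]
print(churn_by_project(history))
print(low_churn(history, ["UI", "ServiceA", "SharedData"], threshold=1))
```

The same commit counts, read from the other end, are a reasonable first hint at which assemblies are high contention.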
Versioning do's and don'ts
For the love of God, please don't try to version each project in a single repo. I've seen it done and it does not work. One repo = one version. Period. End of story.
Tag your code by version so anyone can easily correlate a compiled DLL with the code. Please make sure the assembly versions are stamped in the DLL meta information.
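One way to keep the tag and the DLL in lockstep is to derive the stamped version from the tag in the build script. This is a sketch under my own assumptions: tags look like v1.2 or v1.2.3, and the CI build number goes in the fourth slot. Your tagging convention may well differ.

```python
import re

# Assumed tag convention: "v" followed by major.minor and an optional patch.
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?$")


def assembly_version_from_tag(tag, build_number):
    """Turn a repo tag such as 'v1.2' into a four-part version string."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a version tag: {tag!r}")
    major, minor, patch = match.groups()
    # A missing patch part is treated as 0.
    return f"{major}.{minor}.{patch or 0}.{build_number}"


print(assembly_version_from_tag("v1.1", 42))
print(assembly_version_from_tag("v1.2.3", 7))
```

Anyone holding the DLL can then read the version from its metadata and check out the matching tag.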
Just like versioning, I'm of the opinion that you should have a maximum of one NuGet package per repo. You can add many DLLs to the package from your project (or third party dependencies), but please don't have a single repo produce more than one NuGet package. If you're doing this, you need to split up your repo or release everything as one package.
Yes, I'm not a big fan of monoliths, but they do have their place and I can't completely discount them. I prefer discrete repos, which are really just focused mini-monoliths. One repo means one version and at most one NuGet package. If you need many versions in one repo, you're doing it wrong in my opinion. If you're trying to implement SOA in a monolith, you can't deploy the services independently. That may be what you want to do, but you need to be aware of it. Discrete repos are friendlier to multiple work teams and take away the contention on shared code. Feel free to disagree and/or show me alternatives to my thinking, but I hope this gives some insight to someone.