Tell me if you’ve heard this one before.
You’re working on an application. Let’s call it “FooApp”. FooApp has a
dependency on an open source library, let’s call it “LibBar”. You find a bug
in LibBar that affects FooApp.
To envisage the best possible version of this scenario, let’s say you actively
like LibBar, both technically and socially. You’ve contributed to it in the
past. But this bug is causing production issues in FooApp today, and
LibBar’s release schedule is quarterly. FooApp is your job; LibBar is (at
best) your hobby. Blocking on the full upstream contribution cycle and waiting
for a release is an absolute non-starter.
What do you do?
There are a few common reactions to this type of scenario, all of which are
bad options.
I will enumerate them specifically here, because I suspect that some of them
may resonate with many readers:
-
Find an alternative to LibBar, and switch to it.
This is a bad idea because a transition to a core infrastructure component
could be extremely expensive.
-
Vendor LibBar into your codebase and fix your vendored version.
This is a bad idea because carrying this one fix now requires you to
maintain all the tooling associated with a monorepo: you have to be
able to start pulling in new versions from LibBar regularly, reconcile your
changes even though you now have a separate version history on your
imported version, and so on.
-
Monkey-patch LibBar to include your fix.
This is a bad idea because you are now extremely tightly coupled to a
specific version of LibBar. By modifying LibBar internally like this,
you’re inherently violating its compatibility contract, in a way which is
going to be extremely difficult to test. You can test this change, of
course, but as LibBar changes, you will need to replicate any relevant
portions of its test suite (which may be its entire test suite) in
FooApp. Lots of potential duplication of effort there.
-
Implement a workaround in your own code, rather than fixing it.
This is a bad idea because you are distorting the responsibility for
correct behavior. LibBar is supposed to do LibBar’s job, and unless you
have a full wrapper for it in your own codebase, other engineers (including
“yourself, personally”) might later forget to go through the alternate,
workaround codepath, and invoke the buggy LibBar behavior again in some new
place.
-
Implement the fix upstream in LibBar anyway, because that’s the Right
Thing To Do, and burn credibility with management while you anxiously wait
for a release with the bug in production.
This is a bad idea because you are betraying your users, by allowing the
buggy behavior to persist, for the workflow convenience of your dependency
providers. Your users are probably giving you money, and trusting you with
their data. This means you have both ethical and economic obligations to
consider their interests.
As much as it’s nice to participate in the open source community and take
on an appropriate level of burden to maintain the commons, this cannot
sustainably be at the explicit expense of the population you serve
directly.
Even if we only care about the open source maintainers here, there’s
still a problem: as you are likely to come under immediate pressure to ship
your changes, you will inevitably relay at least a bit of that stress to
the maintainers. Even if you try to be exceedingly polite, the maintainers
will know that you are coming under fire for not having shipped the fix
yet, and are likely to feel an even greater burden of obligation to ship
your code fast.
Much as it’s good to contribute the fix, it’s not great to put this on the
maintainers.
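To make the coupling in the monkey-patching option concrete, here is a minimal sketch of what such a patch tends to look like. Everything in it is hypothetical: `libbar` is a stand-in module built on the spot, and `join_path`, the version numbers, and the bug itself are invented for illustration. The point is only that an honest patch has to refuse to run against any version it was not written for.

```python
import types

# Hypothetical stand-in for the real dependency, so the sketch runs by itself.
libbar = types.ModuleType("libbar")
libbar.__version__ = "2.3.1"


def _buggy_join_path(base, leaf):
    # The invented bug: the separator is forgotten.
    return base + leaf


libbar.join_path = _buggy_join_path

# The only LibBar version whose internals we actually read and tested against.
PATCHED_FOR = "2.3.1"


def apply_monkey_patch(module):
    """Swap in a fixed join_path, but only on the exact version we inspected."""
    if module.__version__ != PATCHED_FOR:
        raise RuntimeError(
            f"patch written for {PATCHED_FOR}, found {module.__version__}"
        )

    def fixed_join_path(base, leaf):
        return base.rstrip("/") + "/" + leaf.lstrip("/")

    module.join_path = fixed_join_path


apply_monkey_patch(libbar)
print(libbar.join_path("srv/data", "report.csv"))
```

Every LibBar upgrade now trips that guard, and someone has to re-read LibBar’s internals before bumping `PATCHED_FOR`.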
The respective incentive structures of software development — specifically, of
corporate application development and open source infrastructure development —
make the first four options above very common.
On the corporate / application side, these issues are:
-
it’s difficult for corporate developers to get clearance to spend even small amounts of
their work hours on upstream open source projects, but clearance to spend
time on the project they actually work on is implicit. If it takes 3 hours
of wrangling with Legal and 3 hours of implementation work to fix the
issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of
implementation work in FooApp, a FooApp developer will often perceive it as
“easier” to fix the issue downstream.
-
it’s difficult for corporate developers to get clearance from management to
spend even small amounts of money sponsoring upstream reviewers, so even if
they can find the time to contribute the fix, chances are high that it will
remain stuck in review unless they are personally well-integrated members of
the LibBar development team already.
-
even assuming there’s zero pressure whatsoever to avoid open sourcing the
upstream changes, there’s still the fact inherent to any development team
that FooApp’s developers will be more familiar with FooApp’s codebase and
development processes than they are with LibBar’s. It’s just easier to
work there, even if all other things are equal.
-
systems for tracking risk from open source dependencies often lack visibility
into vendoring, particularly if you’re doing a hybrid approach and only
vendoring a few things to address work in progress, rather than a
comprehensive and disciplined approach to a monorepo. If you fully absorb a
vendored dependency and then modify it, Dependabot isn’t going to tell you
that a new version is available any more, because it won’t be present in your
dependency list. Organizationally this is bad, of course, but from the
perspective of an individual developer, this manifests mostly as fewer
annoying emails.
But there are problems on the open source side as well. Those problems are all
derived from one big issue: because we’re often working with relatively small
sums of money, it’s hard for upstream open source developers to consume
either money or patches from application developers. It’s nice to say that you
should contribute money to your dependencies, and you absolutely should, but
the cost-benefit function is discontinuous. Before a project reaches the
fiscal threshold where it can be at least one person’s full-time job to worry
about this stuff, there’s often no-one responsible in the first place.
Developers will therefore gravitate to the issues that are either fun, or
relevant to their own job.
These mutually-reinforcing incentive structures are a big reason that users of
open source infrastructure, even teams who work at corporate users with
zillions of dollars, don’t reliably contribute back.
The Answer We Want
All those options are bad. If we had a good option, what would it look like?
It is both practically necessary and morally required for you to have a
way to temporarily rely on a modified version of an open source dependency,
without permanently diverging.
Below, I will describe a desirable abstract workflow for achieving this goal.
Step 0: Report the Problem
Before you get started with any of these other steps, write up a clear
description of the problem and report it to the project as an issue,
specifically as an issue rather than as a pull request. Describe the
problem before submitting a solution.
You may not be able to wait for a volunteer-run open source project to respond
to your request, but you should at least tell the project what you’re
planning on doing.
If you don’t hear back from them at all, you will have at least made sure to
comprehensively describe your issue and strategy beforehand, which will provide
some clarity and focus to your changes.
If you do hear back from them, in the worst case scenario, you may discover
that a hard fork will be necessary because they don’t consider your issue
valid, but even that information will save you time, if you know it before you
get started. In the best case, you may get a reply from the project telling
you that you’ve misunderstood its functionality and that there is already a
configuration parameter or usage pattern that will resolve your problems with
no new code. But in all cases, you will benefit from early coordination on
what needs fixing before you get to how to fix it.
Step 1: Source Code and CI Setup
Fork the source code for your upstream dependency to a writable location where
it can live at least for the duration of this one bug-fix, and possibly for the
duration of your application’s use of the dependency. After all, you might
want to fix more than one bug in LibBar.
You want to have a place where you can put your edits, that will be version
controlled and code reviewed according to your normal development process.
This probably means you’ll need to have your own main branch that diverges from
your upstream’s main branch.
Remember: you’re going to need to deploy this to your production, so testing
gates that your upstream only applies to final releases of LibBar will need to
be applied to every commit here.
Depending on LibBar’s own development process, this may result in slightly
unusual configurations where, for example, your fixes are written against the
last LibBar release tag, rather than its current main. If the project has a
branch-freshness requirement, you might need two branches, one for your
upstream PR (based on main) and one for your own use (based on the release
branch with your changes).
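As a rough sketch of that two-branch arrangement, the function below only builds the git commands rather than running them, so you can read the plan before touching a repository. The remote name `upstream`, the `internal/` and `pr/` branch prefixes, and the assumption that upstream’s default branch is called `main` are all mine, not anything LibBar dictates.

```python
import shlex


def branch_plan(release_tag, fix_name, upstream_remote="upstream"):
    """Return git commands (as argv lists) for an internal and a PR branch."""
    internal = f"internal/{fix_name}"
    pr = f"pr/{fix_name}"
    return [
        # Make sure the release tags and upstream's main are available locally.
        ["git", "fetch", upstream_remote, "--tags"],
        # Branch for your own deployment: start from the last release tag.
        # The fix itself gets committed here.
        ["git", "switch", "-c", internal, release_tag],
        # Branch for the upstream pull request: start from upstream's main.
        ["git", "switch", "-c", pr, f"{upstream_remote}/main"],
        # Carry the fix commits forward from the internal branch.
        ["git", "cherry-pick", f"{release_tag}..{internal}"],
    ]


for command in branch_plan("v1.4.0", "fix-timeout"):
    print(shlex.join(command))
```

The fix is written on the older base and carried forward to main, which matches the direction that is usually easier to rebase in.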
Ideally for projects with really good CI and a strong “keep main
release-ready at all times” policy, you can deploy straight from a development
branch, but it’s good to take a moment to consider this before you get started.
It’s usually easier to rebase changes from an older HEAD onto a newer one than
it is to go backwards.
Speaking of CI, you will want to have your own CI system. The fact that GitHub
Actions has become a de-facto lingua franca of continuous integration means
that this step may be quite simple, and your forked repo can just run its own
instance.
Optional Bonus Step 1a: Artifact Management
If you have an in-house artifact repository, you should set that up for your
dependency too, and upload your own build artifacts to it. You can often treat
your modified dependency as an extension of your own source tree and install
from a GitHub URL, but if you’ve already gone to the trouble of having an
in-house package repository, you can pretend you’ve taken over maintenance of
the upstream package temporarily (which you kind of have) and leverage those
workflows for caching and build-time savings as you would with any other
internal repo.
Step 2: Do The Fix
Now that you’ve got somewhere to edit LibBar’s code, you will want to actually
fix the bug.
Step 2a: Local Filesystem Setup
Before you have a production version on your own deployed branch, you’ll want
to test locally, which means having both repositories in a single integrated
development environment.
At this point, you will want to have a local filesystem reference to your
LibBar dependency, so that you can make real-time edits, without going through
a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp
branch, and waiting for all of CI to run on both.
This is useful in both directions: as you prepare the FooApp branch that makes
any necessary updates on that end, you’ll want to make sure that FooApp can
exercise the LibBar fix in any integration tests. As you work on the LibBar
fix itself, you’ll also want to be able to use FooApp to exercise the code and
see if you’ve missed anything – and this, you wouldn’t get in CI, since LibBar
can’t depend on FooApp itself.
In short, you want to be able to treat both projects as an integrated
development environment, with support from your usual testing and debugging
tools, just as much as you want your deployment output to be an integrated
artifact.
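One easy mistake in this setup is believing you are exercising your local LibBar checkout when FooApp is still importing the released copy from its environment. For a Python dependency, a small check like the following can confirm where a module would actually be loaded from. This is a sketch under the assumption that the dependency is an ordinary importable package; the helper name `loaded_from_checkout` is hypothetical, and the standard library’s `json` stands in for LibBar so the example runs anywhere.

```python
import importlib.util
from pathlib import Path


def loaded_from_checkout(module_name, checkout_dir):
    """Return True if module_name would be imported from inside checkout_dir."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        # Not installed at all, or a namespace package with no single origin.
        return False
    origin = Path(spec.origin).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    return origin.is_relative_to(Path(checkout_dir).resolve())


# "json" plays the role of LibBar here; its parent directory plays the
# role of your local checkout.
stdlib_dir = Path(importlib.util.find_spec("json").origin).resolve().parent.parent
print(loaded_from_checkout("json", stdlib_dir))
```

Dropping a call like this into a FooApp integration test makes it obvious when the local reference has silently stopped applying.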
Step 2b: Branch Setup for PR
However, for continuous integration to work, you will also need to have a
remote resource reference of some kind from FooApp’s branch to LibBar. You
will need 2 pull requests: the first to land your LibBar changes to your
internal LibBar fork and make sure it’s passing its own tests, and then a
second PR to switch your LibBar dependency from the public repository to your
internal fork.
At this step it is very important to ensure that there is an issue filed on
your own internal backlog to drop your LibBar fork. You do not want to lose
track of this work; it is technical debt that must be addressed.
Until it’s addressed, automated tools like Dependabot will not be able to apply
security updates to LibBar for you; you’re going to need to manually integrate
every upstream change. This type of work is itself very easy to drop or lose
track of, so you might just end up stuck on a vulnerable version.
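One way to keep that tracking issue from being forgotten is to have CI refuse any fork pin that does not name one. The sketch below assumes a Python project whose requirements use PEP 508 direct references (`name @ url`) to point at the fork; the `tracked` mapping, the issue ID format, and the example URL are hypothetical. The parsing is deliberately simple and does not cover every requirement syntax.

```python
import re

# Matches "name @ url", optionally with extras such as "name[extra] @ url".
_DIRECT_REF = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*@\s*(\S+)"
)


def direct_references(requirements):
    """Map each dependency pinned by URL to the URL it is pinned to."""
    found = {}
    for line in requirements:
        line = line.split(" #", 1)[0]  # drop trailing comments
        match = _DIRECT_REF.match(line)
        if match:
            found[match.group(1).lower()] = match.group(2)
    return found


def untracked_forks(requirements, tracked):
    """Return URL-pinned dependencies that have no tracking issue recorded."""
    return sorted(
        name for name in direct_references(requirements) if name not in tracked
    )


requirements = [
    "requests>=2.31",
    "libbar @ git+https://git.example.com/forks/libbar.git@internal/fix-timeout",
]
print(untracked_forks(requirements, tracked={}))
print(untracked_forks(requirements, tracked={"libbar": "FOO-123"}))
```

Failing the build on a non-empty result means the fork cannot land in FooApp without someone writing down the obligation to remove it.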
Step 3: Deploy Internally
Now that you’re confident that the fix will work, and that your
temporarily-internally-maintained version of LibBar isn’t going to break
anything on your site, it’s time to deploy.
Some deployment heritage
should help to provide some evidence that your fix is ready to land in
LibBar, but at the next step, please remember that your production environment
isn’t necessarily emblematic of that of all LibBar users.
Step 4: Propose Externally
You’ve got the fix, you’ve tested the fix, you’ve got the fix in your own
production, you’ve told upstream you want to send them some changes. Now, it’s
time to make the pull request.
You’re likely going to get some feedback on the PR, even if you think it’s
already ready to go; as I said, despite having been proven in your production
environment, you may get feedback about additional concerns from other users
that you’ll need to address before LibBar’s maintainers can land it.
As you process the feedback, make sure that each new iteration of your branch
gets re-deployed to your own production. It would be a huge bummer to go
through all this trouble, and then end up unable to deploy the next publicly
released version of LibBar within FooApp because you forgot to test that your
responses to feedback still worked on your own environment.
Step 4a: Hurry Up And Wait
If you’re lucky, upstream will land your changes to LibBar. But, there’s still
no release version available. Here, you’ll have to stay in a holding pattern
until upstream can finalize the release on their end.
Depending on some particulars, it might make sense at this point to archive
your internal LibBar repository and move your pinned release version to a git
hash of the LibBar version where your fix landed, in their repository.
Before you do this, check in with the LibBar core team and make sure that they
understand that’s what you’re doing and they don’t have any wacky workflows
which may involve rebasing or eliding that commit as part of their release
process.
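If you do move to a pinned hash, it helps to be strict about what counts as a pin. The sketch below builds a PEP 508 direct reference in the `git+<url>@<rev>` form that pip understands, and rejects anything that is not a full commit hash, since a branch name or short hash can move or become ambiguous. The helper name, the repository URL, and the hash are placeholders, and the 40-character check assumes a SHA-1 git repository.

```python
import re

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pin_to_commit(name, repo_url, commit):
    """Build a requirement string pinning name to one exact upstream commit."""
    if not _FULL_SHA.fullmatch(commit):
        raise ValueError(
            "pin to a full 40-character commit hash, not a branch or short hash"
        )
    return f"{name} @ git+{repo_url}@{commit}"


print(
    pin_to_commit(
        "libbar",
        "https://git.example.com/libbar/libbar.git",
        "0123456789abcdef0123456789abcdef01234567",
    )
)
```

A pin like this is only as stable as the commit it names, which is why the check-in with the LibBar core team about rebasing still matters.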
Step 5: Unwind Everything
Finally, you eventually want to stop carrying any patches and move back to an
official released version that integrates your fix.
You want to do this because this is what the upstream will expect when you are
reporting bugs. Part of the benefit of using open source is benefiting from
the collective work to do bug-fixes and such, so you don’t want to be stuck off
on a pinned git hash that the developers do not support for anyone else.
As I said in step 2b, make sure to maintain a tracking task for doing this
work, because leaving this sort of relatively easy-to-clean-up technical debt
lying around is something that can potentially create a lot of aggravation for
no particular benefit. Make sure to put your internal LibBar repository into
an appropriate state at this point as well.
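The tracking task is easier to close if something tells you the moment it becomes closable. As a minimal sketch, the check below compares LibBar’s latest release against the first release known to contain your fix. It assumes plain dotted numeric versions such as `2.4.0`; real version strings with pre-release or local suffixes need a proper parser, for example the `Version` class from the `packaging` library. The function names and version numbers are illustrative.

```python
def parse_version(text):
    """Turn '2.10.0' or 'v2.10.0' into (2, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def can_drop_fork(latest_release, first_fixed_release):
    """True once an official release includes the fix you were carrying."""
    return parse_version(latest_release) >= parse_version(first_fixed_release)


print(can_drop_fork("2.3.9", "2.4.0"))
print(can_drop_fork("2.10.0", "2.4.0"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.10.0" would sort before "2.4.0".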
Up Next
This is part 1 of a 2-part series. In part 2, I will explore in depth how to
execute this workflow specifically for Python packages, using some popular
tools. I’ll discuss my own workflow, standards like PEP 517 and
pyproject.toml, and of course, by the popular demand that I just know will
come, uv.
Acknowledgments
Thank you to my patrons who are supporting my writing on
this blog. If you like what you’ve read here and you’d like to read more of
it, or you’d like to support my various open-source
endeavors, you can support my work as a
sponsor!
