A (shorter) Three Horizons Framework for Government Reform
We keep propping up a system that’s run out of time to adapt
For the longer version of this post, go here.
I got into government reform sixteen years ago, though I didn’t think of it as reform at the time. I thought of it as just trying to make a few specific things work better. Since then I’ve worked at the local, state, and federal levels, on benefit delivery, on national defense, on a handful of things in between. But the government we have — the operating model it runs on, the rules and structures and assumptions that shape how it hires, procures, and delivers — was built for a world that no longer exists, and the distance between that world and this one is growing. We are approaching the kind of moment when that gap stops being a management problem and becomes a true legitimacy crisis. It’s time to start asking whether the theory of change most of us have been operating under — incremental improvements off a pretty poor baseline — was ever going to get us to a government capable of meeting fast-changing needs. It hasn’t yet, and if we don’t do something differently, it won’t.
Kelly Born at the Packard Foundation recently shared with me a framework called the Three Horizons, originally developed by Anthony Hodgson and adapted widely in systems-change work. In it, Horizon 1 is the currently dominant system. It's functional enough to persist but failing in critical ways, especially for people with less power.[1] Horizon 3 is the future system you're working toward, already visible in patches of practice that embody different values and different ways of working, but far from the norm.[2] Horizon 2 is the turbulent middle where change agents work.
But the key insight is that not all Horizon 2 work is the same. Some H2 innovations genuinely create the conditions for the new system to emerge. Call those transforming H2, or H2+. Others, however inadvertently, extend the lifespan of the failing system by relieving the pressure that might otherwise force structural change. Call those sustaining H2, or H2-. Both feel like reform, but they have very different long-term implications.
A lot of what the government reform field has built over the past fifteen years, including a lot of my own work, is probably H2-. Not because the people doing it were wrong, and not because the work didn't matter, but because the structure of the work, no matter how well executed, had a systematic tendency to solve the immediate problem in a way that made the structural problem easier to tolerate. If that sounds like an indictment, the first person I'd point the finger at is myself. And I don't regret any of it. Nor am I saying there's no valuable place for H2- work. There is. I'd just like us to see it for what it has been so we can be more deliberate about what we do next.
There are a few patterns that play out over and over in the category of H2-, the work that sustains the status quo.
The pressure valve
The dynamic of unintentionally propping up the status quo is easiest to see in what you might call vertical interventions. An agency lacks the talent to do important work, so philanthropy pays for some detailees, often through an assignment under the Intergovernmental Personnel Act. An agency can't hire the people it needs under standard civil service rules, so Congress grants a particular team a special hiring authority, as it did for the CHIPS implementation team. The procurement rules will take too long, so an agency gets an Other Transaction Authority exemption. The answer to "the system doesn't work" has been to hack it in the vertical domains that get attention, not to fix it horizontally.
Each of these carve-outs delivers genuine value. Most of them are the right call in the moment. But each one also functions as a pressure valve on a system that, absent the relief, might generate the political and institutional pressure needed to fix the underlying problem. The legislative carve-out that makes Paperwork Reduction Act compliance unnecessary for a particular program means the people and organizations with the most standing to demand PRA reform never do. They just move on to the next challenge. The rescue team that fixes a broken system well enough to make it functional returns it to "good enough," which is exactly the condition in which structural change is hardest to achieve.
The cost of the workaround isn't just lost pressure. Every special authority, every exemption, every vertical fix makes the overall system more complex and more fragmented, and therefore easier for sophisticated actors to navigate than for anyone else: large contractors, well-resourced agencies, organizations with the staff to learn which door to knock on. The organizations least able to navigate that complexity are typically the ones serving the populations with the worst outcomes. Vertical interventions don't just leave the underlying dysfunction in place. They tend to entrench it.[3]
The demonstration that doesn’t scale
A close cousin is the successful pilot: a project showing that SNAP enrollment can be simplified, that agencies can build digital services in-house, or that test-and-learn mechanisms can improve program outcomes. The pilot works. But because the structural conditions for scaling don't exist, it stays a demonstration, or it gets scaled in a form that loses the properties that made it work, because the procurement rules, the civil service rules, or the oversight structures were designed for a different kind of work.
These pockets of better practice are real Horizon 3 evidence. They become H2- when the political system treats them as proof that the problem is solved rather than proof that the problem is structural. Demonstrations that show something is possible but don't change the conditions that make it the exception (especially in the four competencies I've talked about before) can end up providing the illusion of progress.[4] Philanthropy reinforces this: demonstrations are fundable because they produce legible, attributable, near-term outcomes. Horizontal structural reform (changing civil service rules, repealing the PRA, restructuring how digital projects get budgeted) produces outcomes that are diffuse, slow, and hard to attribute. The result is a field well-resourced to demonstrate what's possible and under-resourced to change the conditions that keep the possible from becoming normal.
The co-optation trap
The third pattern may be the most pernicious. The field promotes concepts that carry genuine Horizon 3 values: user-centered design, agile development, evidence-based policy. And then the H1 system absorbs the vocabulary and puts it to work legitimizing the status quo rather than challenging it. "Agile" is now a procurement category that large contractors have learned to perform while delivering waterfall outcomes on agile timelines. "Human-centered design" is a workshop format and a contract deliverable. When evidence production is mandated as a compliance activity, it can crowd out evidence as a genuine decision-making tool. An agency that goes looking for the best possible data because it wants to know something is in a very different relationship with that data than an agency that produces a required report because OMB or Congress needed the box checked. From the outside the two can look almost identical. They produce very different organizational behavior.
The deepest version of co-optation is when the system doesn't just absorb the vocabulary of reform but learns to perform it. An agency can run user research, adopt agile ceremonies, and publish evidence reports while leaving the governance structures, vendor relationships, and incentive systems that produce bad outcomes entirely intact. The performance is often sincere. The people doing it believe in what the words mean. But copying the visible practices of a different way of working, without changing who has power, how decisions get made, and what anyone is accountable for, produces a somewhat nicer version of the same thing rather than a different thing. Reform becomes a ritual that signals values but doesn't deliver on them. The system learns to speak the language, but it doesn't actually change.
A different moment
The H2- work I'm describing has been done in good faith by many people, and I am one of them. Code for America, which I founded and where I spent more than a decade, is in important respects a form of capacity substitution. For much of the past fifteen years, the H2- path was arguably the right call. When there was no political space for structural change, demonstrations were a good way to build the evidence base and develop the field.
I think we are in a different moment now. This moment is defined by disruption. I count three kinds:
Contingent disruption — pandemics, climate events, geopolitical shocks, financial crises — is unpredictable in its specifics but very predictable in its category: large, fast-moving, high-stakes demands that fall disproportionately on government. COVID was not an anomaly. The next version won’t look the same.
The most recent disruption to the federal government was political. Whatever the cost of its methods, DOGE made the brittleness of the current operating model impossible to ignore and created political openings for structural arguments that previously had no traction. The reform field did not create this moment. But it can shape what comes out of it.
AI brings structural disruption: a transformation already underway in the material conditions of work, economy, and administration. AI is not only an exogenous shock that government will have to absorb. It is also moving the bar on what counts as acceptable service in the first place. People are already using AI to understand their medical bills, navigate insurance denials, and draft appeals for benefits they were wrongly denied. Soon they will expect to apply for SNAP or file their taxes by uploading a paystub and answering a few plain-language questions. The forty-page PDF came to feel intolerable. The well-designed web form will start to feel that way too, and faster than the last transition did. And service delivery is only the most visible piece. If a small team with the right tools can map a regulatory regime in a week, the timelines we have now, in which rulemaking takes several years or even multiple presidential terms, become indefensible.
So the gap we have been measuring, between what government delivers and what the public considers a basic level of competence, is widening from both ends at once. The system is straining to clear the old bar at the same moment the bar is rising. In a stable environment, H2- work that buys time for a failing system might be much needed, even if it also represents a missed opportunity for transformation. In an environment where disruptions of all kinds are accelerating, it becomes a compounding liability. Extending the lifespan of a brittle system just means the system eventually fails more spectacularly. More people get hurt. More people look for alternatives to democracy.
Sketching H2+
So what does H2+ look like? There certainly won't be one definition everyone agrees on, but there are a few principles worth putting forward.
Move upstream. The most important marker of H2+ work is that it targets the conditions that produce the problem rather than the symptoms they generate. Getting a specific agency better digital services is H2- if it leaves untouched how digital projects are staffed, funded, and overseen across government. The upstream target is an operating model in which funding mechanisms, team structure, decision-making authority, roadmapping practices, and success metrics are all aligned around outcomes rather than outputs. A fellowship that places talented technologists in one agency and a civil service reform effort that changes how every agency can hire are both valuable. But they are not equivalent bets on the future of government capacity, and a portfolio that’s weighted heavily toward the former at the expense of the latter reflects a preference for the legible over the structural. H2+ philanthropy means being willing to fund work whose outcomes are diffuse, slow, and hard to attribute — because that’s often what structural change looks like from the outside.
Connect legislative and executive branch reform. The people working on congressional modernization and those working on executive branch reform have largely been operating in parallel, each with a partial picture. Legislative modernizers understand how the structure of congressional incentives shapes what agencies are asked to do and how they are overseen; executive branch reformers understand what those asks produce when they land in agencies. Combined, their knowledge is considerably more powerful than either half is alone. California's Outcomes Review experiment, in which legislators are building structured feedback loops between the laws they pass and what those laws actually produce, is an early model of what it looks like when the legislative branch takes seriously its role not just in passing law but in learning whether law works.
Advocate, influence, build power. The reform field has been busy helping agencies navigate broken systems, building capacity to survive dysfunction, and demonstrating that things could work better. That work was and is enormously valuable. But those needs will continue to grow as long as we put off structural reform. The people busy plugging holes in the system are exactly the people who most need to be in the reform conversations, because they know what really needs to happen. It's not about sidelining them. It's about putting the pieces of the puzzle together and building a field that advocates effectively, and advocates for the right things. That means a diverse coalition whose messages can harmonize, including hard-hitting campaigners willing to play the game, not just technocratic analytical types.
Use federalism as a flywheel. States are laboratories of democracy[5], but H2+ work doesn't just wait for state and local governments to experiment and hope the results diffuse upward. It actively designs for the spread of proof points, using early-adopter states to generate evidence and building the connective tissue that carries lessons from those states to others, up to the federal level, and back down again. Progress up and down the federalism stack reinforces itself, if the field is organized to make that happen rather than leaving it to chance.
The window
None of this means rescue work should stop, or that demonstrations are worthless, or that capacity substitution isn't helpful and needed. Some H2- work, done deliberately and named honestly, is best understood as experimentation: we're running it inside the failing system precisely because that's where we'll learn what a new operating model has to do. H2- and H2+ work can both be valuable.
But the field needs a shared frame clear-eyed enough to ask, with each investment: does this move the system toward H3, or does it prolong H1? That question should be driving how resources, talent, and attention get allocated now — not because the prior work was mistaken but because the moment is different and the cost of extending the status quo is too high.
Some things haven’t changed. The community is still full of good, smart people with enormous insight into a very difficult problem. We’ve just run out of time to do it the way we’ve been doing it. Every H2- intervention that returns the system to “good enough” is now a bet that good enough will hold. It’s a bet I no longer think we can afford to make.
The window for H2+ work has not been open like this before. It will not stay open indefinitely.
I’ve disabled comments on this shorter version of the post. If you want to comment, go to the longer version and comment there.
[1] "Horizon" is an imperfect metaphor. H1 is not distant; it's the system you're working inside today. A political theorist might call it a hegemony: dominant less because it functions well than because it shapes what seems possible, and therefore what gets attempted. I'll use the horizon framework's language throughout this post, but that's what I mean by it. Anyone who wants to start talking about Hegemony 1 and Hegemony 3 instead is welcome to. It would also be correct, if a bit of a mouthful.
[2] To state the obvious, there will be many Horizon 3s. They will not all be compatible. But articulating a variety of them would be incredibly valuable right now.
[3] It's tempting to read this list and conclude that the root problem is congressional dysfunction, that agencies hack the system because Congress won't fix it and they have no other choice. Sometimes that's true. But it's true less often than people assume. In many cases the binding constraint isn't statutory at all. Agencies develop unhelpfully narrow interpretations of existing authorities, treat those interpretations as immovable, and then seek explicit legislative permission to do something the law already allows. (While I served on the Defense Innovation Board, whenever Department staff asked for an exception or a new authority, congressional staff would invariably reply, "but you don't use the ones we already gave you!") Blaming Congress can function as its own kind of pressure valve, a way of externalizing the problem so that no one has to do the harder work of changing internal practice, culture, and interpretation.
[4] For politicians and political appointees with loyalties to status quo actors, it is much more appealing to use demonstrations as an excuse to declare victory within the current broken structures than to tackle the underlying structural issues in the face of opposition from supporters who benefit from the status quo, or at least think they do. Even when politicals aren't co-opted by the interests of status quo actors, there's the practical issue of timeframe: politicals are thinking about what they can accomplish, visibly, within very compressed electoral cycles. Their incentives are not aligned with the long, hard slog of systems change that will probably only visibly benefit the next leader (or the leader after that).
[5] If you don't already follow Daniel Stid's Substack, The Art of Association, you should. His writing on pluralism, civic engagement, and state capacity in America, and on the role of philanthropy in it, has shaped my own thinking.


