by Chris Given

Current Events

I’ve been deliberately steering clear of using this space to react to current events. My last few months in government were discombobulating and highly reactive, and I’ve been trying to get enough distance to be able to think again. So although I’m doing plenty of raging against the calamities of the present moment, I’m not in a place to channel that rage into writing anything of use.

However, I have accumulated a small set of “thing bad” IRS topics that are maybe worth putting down on paper, if for no other reason than to get them out of my brain.

Clue: Direct File Edition

Newly minted IRS Commissioner Billy Long made some off-the-cuff remarks at a conference for tax practitioners, and this kicked off, by my count, the fourth news cycle this year about the killing of Direct File. Don’t get me wrong: as someone who’s emotionally invested in Direct File, I find jacking into a feed of social media outrage on a regular basis plenty cathartic, in a probably-not-healthy way. And each time the news breaks, word reaches more newly outraged folks. But for someone with a Google News alert set for Direct File, this is starting to get a little repetitive.

It does point to a likely answer for one outstanding question: who’s going to be the one holding the knife? Both Secretary Bessent and DOGE sought to avoid getting their fingerprints on the decision. It looked like Plan A was for Congress to diffusely take the blame, but language terminating Direct File was stripped from OBBBA for parliamentary reasons. They’re currently trying again via a rider on next year’s IRS budget; it’d be nifty if an amendment to remove that provision got a vote and forced members to go on the record. But absent Congressional action, it looks like “Big Beautiful Billy” (his words) will be the fall guy.

OBBBA also set aside $15 million for a new report requiring the IRS to propose a “public-private partnership” to replace Direct File. I’d wager this presages the triumphant return of Intuit to the Free File game, giving them a chance to recover their past fumbles. I’m also watching for what happens to the $15 million. There’s no conceivable way to spend that amount of money on writing a report in 90 days (let alone a report whose conclusions are already specified by the legislation), so the best possible outcome is that the money just doesn’t get spent. The worst outcome is that the administration awards a sweetheart sole-source procurement and wildly overpays someone for little real work.

As for Direct File itself, you can still use it to file your taxes if you haven’t filed yet or requested an extension. Direct File returns are processed the exact same way as any other return, so it’s perfectly safe to use. It’s not dead yet. (But it will most definitely be dead before next year.)

Everyone Saw This Coming

The IRS lives and dies by the filing season calendar, so the comment from the Commissioner that I was most struck by was not his dismissal of Direct File but rather a revealing statement he made about the IRS’s readiness for next year. While nominally projecting confidence about the state of preparations, he warned that filing season won’t start until mid-February, as compared to the usual January timeframe.

It’s only August, and if the IRS is already expecting to be late, things could get much worse before next year. The IRS doesn’t issue refunds until filing season opens, so taxpayers will be angered by not being able to get their money. Even if the schedule doesn’t slip further, with the IRS’s back against the wall, they could be under immense pressure to start filing season before their systems are truly ready, which could result in errors and further delays for some taxpayers.

All of this is the obvious consequence of cutting 25% of IRS employees right before passing a complex new tax law. Even worse than the topline numbers, the “put them in trauma” strategy drove out many of the IRS’s long-time MVPs, the folks who could be counted on to make it work year after year, come hell or high water. Some really great people are still there, but they’re now spread even thinner. I’m rooting for them, but I don’t envy their task.

The IRS knows how bad things could get. I was surprised when the administration appeared to reverse course on its hiring freeze, and posted job announcements for permanent customer support representatives, the folks who field taxpayer questions via phone. But those announcements were quickly canceled without explanation.

Salting the Earth

I’ve been following the administration’s efforts to extract IRS data for use by ICE. Even if the scheme isn’t illegal outright, it seems ripe for abuse.

Every IRS employee is drilled on the importance of protecting taxpayer data. The number 6103, the section of the Internal Revenue Code governing when and how the IRS can share data, is so well known that it has become a sort of shorthand, e.g. “I think that’s a 6103 issue.” The collective responsibility to respect taxpayer privacy is such a pervasive cultural ethos that it wasn’t remotely surprising that the Littlejohn data breach was the work of a contractor, not an employee. Stating the obvious, these are not values ICE employees share.

The thing I keep returning to is that, as a means of achieving its stated purpose, I don’t think the data sharing arrangement as it’s currently described will be all that effective. Maybe this isn’t understood by the folks in charge, a result of everyone with an ounce of professional ethics or legal self-preservation staying well clear of the project (or, you know, getting fired). But perhaps there’s another purpose that’s going unsaid.

I think the potential “value” of the effort is the deliberate destruction of trust in government, particularly among immigrant communities, the equivalent of salting the earth after conquest so nothing will grow again. Yeah, that will be effective. To recover from this destruction and someday recultivate trust, it won’t be sufficient to say, “Don’t worry, we’re running things now.”

We’ll need to rebuild in such a way that this isn’t possible. We need to protect privacy with more than just norms and good intentions. And, contra a lot of civic tech orthodoxy, that could mean more restrictions around data sharing, not fewer.

My Favorite Judicial Opinion

Judge Carlton Reeves of the Southern District of Mississippi has written some real bangers in his career, but his magnum opus might be his 2018 opinion in Boatner v. Berryhill, an appeal of the Social Security Administration’s denial of disability benefits for a Mississippi man named Carl Boatner. Judge Reeves finds that Social Security wrongly denied Boatner’s claim.

The 26-page decision spends most of its ink reviewing, step by step, why it took the better part of a decade for Boatner to receive the benefits to which he is entitled. This recitation is exhaustive but also breathtaking in its clarity. Headings like “Did Boatner’s ALJ Review the Evidence Properly?” are followed by single-word paragraphs: “No.”

Reeves paints a damning picture of what happens when an administrative process is deprived of the resources to fairly evaluate cases like Boatner’s. He concludes, “Doing justice means finding truth. Finding truth takes empathy, expertise, and time. Without those resources, people who decide disability cases are doomed to do injustice.”

It’s worth reading in full. But the introduction is perfection in its plain acknowledgement of the facts of Boatner’s life and in its determination to not waste a second more of his time.

Carl Boatner spends much of his life waiting. He waits to catch his breath after walking a few dozen steps. He waits for family and friends to assist him in shaving and taking medications. He waits on car rides in rural Mississippi to his many doctor’s appointments. He waits in parking lots while others shop for him, afraid of having a medical emergency in public.

Boatner’s diagnoses include coronary artery disease, two liver diseases, diabetes, obesity, hypertension, spine disorders, major depressive disorder, and anxiety disorder. In 2015, Boatner received a terminal diagnosis of chronic obstructive pulmonary disease, but survived a months-long stay in hospice care.

Even before entering hospice care, doctors described Boatner as “chronically ill” and “disabled.” Between 2011 and 2015, they prescribed him about 17,000 pills. Boatner, now 52, uses oxygen tanks and other devices to help him breathe. He has had a number of strokes and heart attacks, with stents across his heart and liver, and recently had triple-bypass heart surgery. Around 2000, a heart attack ended his two-decade long career as a truck driver. He has not had a steady job since. Boatner last applied for a job in 2015 as a yard hand, but was rejected because the employer thought he “couldn’t hold up.”

Boatner has spent nearly a decade seeking disability payments from the Social Security Administration, filing his last application in 2014. Despite acknowledging the severity of Boatner’s medical conditions and his trips to death’s doorstep, the Administration has denied each of his four applications. These denials have been painful. One caused Boatner to walk out of his house, put a gun to his head, and threaten to kill himself.

Boatner filed this lawsuit to challenge the latest denial. No lengthy judicial opinion need resolve that challenge. Boatner plainly qualifies for disability payments.

But Boatner’s story is worth telling in full. It reveals a disability payment system tasked with managing millions of cases each year, yet stripped of the resources to decide those cases fairly. As Justice Thurgood Marshall once wrote, such an “unnecessary barrie[r]” to people with disabilities “stymie[s] recognition of [their] dignity and individuality,” and therefore requires careful review. Furthermore, the disability payment system aims to be “as protective of people’s dignity as possible,” a purpose “courts must give ‘due regard for.’” The Court must explore why, until today, the disability payment system has left Boatner waiting.

My Favorite Line From a Government Report

Government writing is usually stripped of all personality, so I like it when civil servants let loose in times of stress. It shows that the faceless bureaucracy is just people trying their best.

Here’s how Social Security’s Lancaster, California district office described the disastrous launch of the Supplemental Security Income (SSI) program on January 1, 1974, in an official report to Washington.

By mid-February we tried “Dial a Prayer,” and by mid-March were negotiating for an exorcist.

Quoted in Martha Derthick, Agency Under Stress: The Social Security Administration in American Government (Brookings Institution, 1990), 26.

The Things That Cannot Be Changed

Direct File was an obvious idea. Many people had tried to make it happen. The Direct File team was good, but we weren’t any smarter than those who came before. So what made this time different?

I’ve always found Reinhold Niebuhr’s serenity prayer to be a useful thought technology for life in government.

God, give us grace to accept with serenity the things that cannot be changed, courage to change the things that should be changed, and the wisdom to distinguish the one from the other.

Success as a civil servant requires an ample helping of all three traits, particularly serenity. However, “the things that cannot be changed” is not a static category, and it rewards assiduous attention. The smallest shift can make what was once an impossibility suddenly possible.

Direct File had long been on my dream project list, but when I began working on taxes in January 2021, my strong assumption was that any attempt to make it a reality, like those before, would doubtless fail. However, a little failure from time to time is healthy, and I figured it would at least make for a good story someday.

After years of jogging items up and down on my mental list of the most likely ways Direct File would fail, I didn’t realize how thoroughly I had internalized this assumption of doom until the moment Dixie Warden filed the first Direct File return on February 1, 2024. She had agreed to allow Bridget Roberts, Jen Thomas, and me to observe her progress through the application. Direct File still had some significant known bugs, so my responsibility was to call the whole thing off if Dixie ventured too close to certain tax landmines. As she signed and submitted her return, I realized that I had not let myself believe until that precise moment that we would ever actually be allowed to file someone’s taxes.

Chris Given and Bridget Roberts sit next to each other, scrutinizing a laptop screen. They sit at a round table in a nondescript government office. Chris is a white man with glasses and a beard wearing a dark button-down shirt. He is hunched over to be close to the screen, one hand resting on his forehead, the other gripping the table nervously. Bridget is a white woman with dark hair wearing a dark, patterned blouse. Of the two, she appears decidedly more placid. A large water bottle and an empty Diet Coke bottle sit on a shelf near her. There are at least three laptops cluttering the table, two PCs and one Mac; the Mac is adorned by a circular sticker with an abstract illustration of a flamingo and a fern in front of an ocean.
Photo: Bridget and I watch as Dixie files her taxes, February 1, 2024.

Three years earlier, no one could have anticipated that moment would come. Direct File wasn’t on the agenda. There wasn’t and would never be a grand strategy to mastermind events to bring it about. But events, as is their wont, happened anyway. It’s a stretch to call it “wisdom,” but by setting up at the right place at the right time, we were able to perceive that a window of opportunity had cracked open. These are the three events that made this time different.

1. The pandemic

On March 27, 2020, the Coronavirus Aid, Relief, and Economic Security (CARES) Act was signed into law. One of its provisions was the Recovery Rebate Credit (RRC), more commonly known as stimulus checks, advanced in the form of $1,200 economic impact payments to individuals (plus $500 for each qualifying child dependent). RRC had a subtle but significant difference compared to previous programs like the one in the Economic Stimulus Act of 2008 (ESA). ESA also provided relief payments to individuals, but it crucially required at least $3,000 of income to qualify for the credit. In the CARES Act formulation, RRC could be claimed by anyone with a Social Security number, even if they didn’t have a cent of income.

The IRS launched a massive effort to get checks out to as many people as possible, working with the Social Security Administration and the Department of Veterans Affairs to identify recipients of Social Security benefits, Supplemental Security Income recipients, and disabled veterans to whom checks could automatically be sent, despite the recipient not having filed a tax return. I’m still astounded and endlessly impressed that the IRS and partner agencies were able to pull off this feat with workforces still reeling from the early weeks of the pandemic.

For everyone else, you needed to have filed a recent tax return in order for the IRS to know how to get you your money. But a small group of federal employees at the Department of Treasury masterminded an ingenious policy hack and issued Revenue Procedure 2020-28, which created a new option for anyone who wasn’t required to file a tax return, generally because they made less than $12,200 or $24,400 for a married couple. Instead of needing to assemble the documentation required to file a full return, these taxpayers could instead file a simplified return claiming just $1 of income.[1] The IRS would understand these to be returns for the sole purpose of receiving an economic impact payment and agreed not to challenge their accuracy.

In order to make it possible to file such a return, Intuit, under the auspices of the Free File Alliance, repurposed the existing Free File Fillable Forms product to stand up a non-filer portal (officially, the “Non-Filers: Enter Payment Info Here” tool). Launched on April 10, just two weeks after enactment of the CARES Act, the non-filer portal enabled millions of additional Americans to receive stimulus checks. An Intuit employee would later describe this work as the most rewarding of his career.[2]

The following year, the American Rescue Plan Act built on the CARES Act model in its expansion of the Child Tax Credit (CTC). The refundable portion of the credit had previously required at least $2,500 of income, but now, in addition to the full credit being refundable, any taxpayer could claim the credit even if they had no taxable income. Hundreds of thousands of kids who had previously been excluded from the benefits of CTC would now see their share of the money.

In order to reach the children of non-filers, the IRS and Treasury dusted off their CARES Act playbook, and on June 14, 2021, they reopened the non-filer portal.[3] But after a year, the portal was seen in a new light. For those who have not had the pleasure of using Free File Fillable Forms, it’s a deliberately excruciating experience.[4] The interface imitates paper forms (don’t try it on a phone) and invites the user to manually type numbers into boxes. The non-filer portal used the same building blocks, so the experience is comparable, just with fewer boxes to type in. That might have been fine for something shipped in two weeks, but there had now been ample time to make it even slightly better.

Congress and advocates began to ask reasonable questions. Can we make this work on phones? Can we get this in Spanish? Can we make it not look sketchy as shit? (Okay, that last one might have just been me.) The IRS’s answer, or what was really the Free File Alliance’s answer: no. Congress didn’t love that answer; neither did the advocates. It’s not a great feeling to go through the rigamarole of passing a law, only for its benefits to not reach those who need them the most because the delivery mechanism is hostage to the whims of a third party incentivized to make that experience as bad as possible. Administration officials were startled to be on the receiving end of this ire.

A senior advisor to Treasury and I had anticipated this eventuality a couple months prior, and we had started a conversation with Code for America about the possibility of them creating an alternative to the official non-filer portal. The result, GetCTC.org, began beta testing in August, and provided the White House with a much needed solution to its portal problem. It worked on phones, its plain language was available in English and Spanish, and for bonus points, it looked like a tool you could trust. The White House enthusiastically promoted GetCTC as part of their CTC outreach efforts.

Even as a nonprofit came to their timely aid, however, officials had learned an object lesson about what it meant for the government to not be in control of its own destiny, a lesson that would not soon be forgotten.

2. Intuit leaves Free File

Those alive at the time will always remember where they were when they heard JFK was shot. I will always remember where I was when I heard Intuit was leaving the Free File Alliance.[5] It was an event that so completely upended what was possible, I was vibrating for a week afterwards.

Free File was created in 2002 and successfully forestalled the development of Direct File by more than twenty years. The Bush Administration had established “E-Government” as a core pillar of its President’s Management Agenda and proposed the creation of “an easy, no-cost option for taxpayers to file their tax return online.” At the time, the average cost of electronic filing via a third party was just $12.50 ($22.41 adjusted for inflation), which the administration noted was quite a bit more than the cost of a 34 cent stamp.[6]

The tax preparation industry scrambled for an alternative. Intuit organized its competitors to form the Free File Alliance and offered the IRS a deal: the companies would agree to provide free versions of their products to low-income Americans in exchange for the IRS committing to not offer online filing. Heralded as a “public-private partnership,” the arrangement appealed to the White House, despite some hesitation from the Office of Management and Budget and Treasury.

For its part, the IRS was relieved to not have to undertake a risky, high-profile IT project. The memory of 1996’s Cyberfile disaster was still relatively fresh. In late 2022, I attended the retirement party of a long-time IRS employee and chatted with someone who was closely involved in the early E-Government work. He recalled of Free File, “We were given just nine months to get something out the door.” He gave me a wry smile. “Sounds familiar, no?” He wished me luck; it was clear he thought Direct File would need it. (He wasn’t wrong.)

What happened next has been well documented, including actions the IRS took to limit the scope of the Free File program out of concern for the “future revenues and profits of the publicly traded company participants,” actions that, had Free File taken them on its own behalf, would have constituted illegal collusion and price fixing. Apparently by design, Free File usage dwindled.

In 2005, the IRS terminated its TeleFile program, which let taxpayers file via phone.[7] Paper returns were now the only remaining way to file without going through a third party. The success of electronic filing relied on the goodwill of industry. Efforts to prevent identity theft and refund fraud hinged on the voluntary participation of companies, and the IRS was often negotiating from a position of weakness.

Then came 2019, a tumultuous year for Free File. The year started promisingly enough, with the near realization of lobbyists’ long-held dream of enshrining Free File into law via the Taxpayer First Act. However, public backlash resulted in this provision being stripped, and over the coming months, reporters at ProPublica released a series of articles highlighting deceptive industry tactics. Investigations were launched, and by the end of the year, Free File was on its heels. In an attempt to deescalate, Free File and the IRS agreed in December to remove the language from their Memorandum of Understanding (MOU) that barred the IRS from offering Direct File.

Both parties understood that this was for show. Little had changed, and the IRS’s hands remained effectively tied. When I arrived at the agency in 2021, Free File was described to me as “the third rail of the IRS.” Venture too close to that third rail, as the Volunteer Income Tax Assistance program found out when its Facilitated Self Assistance model became too popular during the pandemic, and you could expect to be put on blast by industry lawyers during tense phone calls.

I saw Free File less as a third rail and more as a Gordian Knot. The way it prevented Direct File was not by explicitly barring it, nor via awkward meetings, but rather by making it too hard to do anything else. Despite Free File’s underutilization, there were about four million taxpayers who used it in filing season 2021. Even after the prohibition was dropped in 2019, the MOU still contained the following clause:

Should the IRS commit funding to offer Services for free to taxpayers the IRS shall notify [Free File] immediately. If the IRS gives such notice during the tax season (between January 1 and April 15 […]) of any year, [Free File] may, by written notice to IRS, terminate this MOU, effective on April 16 […]. If the IRS gives such notice between April 16 […] and October 15 of any year, then [Free File] may, by written notice to IRS other than during tax season, terminate this Agreement, such termination to be effective no fewer than 30 days after the date of [Free File]'s notice of such termination. If IRS gives such notice between October 15 and December 31, [Free File] may by written notice immediately terminate this Agreement at any time on or before December 31.

The notification clause allowed industry to take their ball and go home the moment the IRS committed a single dollar to Direct File, instantly depriving some number of taxpayers of the method they used to file. But since the IRS had been sitting on the sidelines for literal decades, it would require years of work to catch up and create a service that could address the diverse tax situations of all of those taxpayers to the standards of the IRS. In the short-term, some people would need to find a new way to file, and industry would point the finger at the IRS. The IRS had no room to maneuver out of the predicament in which it had been trapped.

Then on July 15, 2021, Intuit announced that it would no longer participate in Free File, following the lead of H&R Block in 2020, leaving only TaxHawk (aka FreeTaxUSA) and a number of smaller companies to man the fort. The eminently predictable result was that a million fewer people used Free File in 2022. How did this enable Direct File?

In short, Intuit gave up its leverage. In the chronology as it played out, there are four events. (1) Intuit announces it’s leaving Free File. (2) A million fewer people use Free File. (3) The IRS announces it will pilot Direct File. (4) 140,803 people use Direct File in a limited pilot.

Now imagine a different chronology. (1) The IRS announces it will pilot Direct File. (2) In response, Intuit announces it’s leaving Free File. (3) A million fewer people use Free File, and only 140,803 people use Direct File. I would argue that in this world, Direct File would have been dead on arrival. The IRS would have gone through a whole lot of trouble for the net result of fewer people filing their taxes for free. It just didn’t work politically, and I saw no way of untangling that knot.

But then Intuit cleaved the knot for us. And so when the Direct File pilot happened, Free File saw a significant increase in usage, the result of more taxpayer interest in options for filing for free (and admittedly, a whole lot of Free File/Direct File brand confusion). Taxpayers had more options, not fewer, and more of them were able to file for free, an unambiguous win.

(Now that Direct File lies comatose, Intuit gets a second bite at the apple, and they and the rest of the tax preparation industry will attempt to restore their leverage over the IRS and state revenue agencies, lobbying legislators and attempting to persuade/bully tax administrators into doing long-term damage to the government’s negotiating position. Something to watch for.)

3. The Inflation Reduction Act

There is a pervasive narrative, supported by many talking points to this effect, that the Inflation Reduction Act (IRA) led to Direct File. And it did, but maybe not in the way you think. The IRA’s impact wasn’t just money, or a report; it completely inverted the political calculus around Direct File.

Certainly the IRA’s $80 billion in IRS funding provided ample elbow room for new initiatives like Direct File, and I never again had to answer the question, “But where will the money come from?” However, over the entirety of my time on the project, the IRS spent less than 0.1% of the funding provided by the IRA on Direct File. As a share of the IRS’s scant enacted budget excluding IRA funds, Direct File would have comprised 0.26%, the equivalent of an ad-free Netflix subscription (roughly $17 a month) for a household earning $80,000. The IRA funds put wind in our sails, but Direct File was never going to break the bank.

The IRA contained a provision asking the IRS to spend $15 million to write a report about Direct File. After delivery of the report, the Secretary of the Treasury directed the IRS to pilot Direct File in filing season 2024, and the rest is history. But as the author of multiple sections of that report (full credit to Amy Paris as the primary author; she would agree), there’s no way our words alone would have been worth $15 million, particularly compared to the $24.6 million price tag of producing a report and shipping the damn thing (and producing a second report as a bonus). There’s a case to be made that the report to Congress was politically useful, but I think this case is overstated and relies heavily on post hoc reasoning. I can only hope lawmakers realize before repeating the experiment that despite Direct File’s success, the likeliest outcome of the provision was a costly but swiftly forgotten report. (Oh. Oh no.)

No, the IRA’s greatest contribution to Direct File: it passed.

During the early discussions of Direct File, the greatest reluctance came from a contingent, particularly in Treasury, who believed it would jeopardize the legislative chances of what was then called the Build Back Better Act, and in particular, its banner rejuvenation of the emaciated IRS budget. The Senate was split 50/50, and Democrats only held a technical majority via the tie-breaking vote of the Vice President. Unable to lose even a single vote, the camp resisting Direct File viewed movement on a “controversial” program as the straw that would break the camel from West Virginia’s back.[8]

Even as Build Back Better’s prospects looked increasingly grim, hope sprang eternal on Pennsylvania Avenue,[9] right up until Senator Manchin hammered what appeared to be the final nail in the act’s coffin on July 14, 2022. But less than two weeks later, well before this new reality could sink in, Senators Manchin and Schumer announced they had struck a secret deal to save the tax and climate portions of the bill, and unveiled the IRA. It swiftly passed and was signed into law on August 16.

Not only was their Direct File hesitance obviated by the bill’s passage; the proponents of additional IRS funding now faced a new challenge. They needed to demonstrate to the American public the value of investing $80 billion in the IRS. For policy wonks, the case was a slam dunk. The IRA would enable the IRS to pursue high-income individuals and multinational corporations, whose complex returns had escaped scrutiny as the IRS was starved of resources. The Congressional Budget Office estimates that each additional dollar invested in tax enforcement yields between $5 and $9 of additional revenue. More funding for the IRS wasn’t spending money; it was raising money by enforcing the laws already on the books.

But $80 billion is a lot of money, and even the wonks understood that the IRS needed to show taxpayers a tangible return for this massive investment. Answering phones and clearing paper backlogs were obvious and important service wins, but these accomplishments wouldn’t be felt by most taxpayers.

The IRS needed something bigger, something exciting, something that showed that the bar had been raised for the kind of services taxpayers could expect from an adequately funded agency.

Consensus was within reach. Direct File was no longer a thing that couldn’t be changed, and it had quite suddenly become a change that voices across government were clamoring for, if only the IRS could find the courage.


  1. IRS systems didn’t support tax returns with $0 of income, thus the hack of telling everyone to say they made a buck. ↩︎

  2. I say without malice that he should consider public service. ↩︎

  3. The delay until June was to avoid taxpayers choosing simplified filing over filing a full return, which in many cases would allow the taxpayer to claim additional benefits. ↩︎

  4. It’s how I file my own taxes, since DC was waiting until 2026 to join Direct File. I guess I’m going to continue doing so, unless I switch to paper as a small protest against Direct File’s termination. ↩︎

  5. Outside of Ted’s Bulletin on 14th Street Northwest. I had just gotten dinner with friends, one of whom (coincidentally a former Intuit employee) had asked that we stop at Ted’s so she could get a malt. The resulting shake turned out to be mostly malt powder, so she had gone back inside to ask for a more correctly proportioned replacement. While we waited, another friend (actually the senior advisor to Treasury) was checking his phone and announced, “Huh, Intuit’s leaving Free File.” For the record, any amount of malt powder is a fine way to ruin perfectly good ice cream. ↩︎

  6. The E-Government team also recognized the importance of authentication, and they proposed the creation of a single sign-on for government. They warned, “Not undertaking a consolidated authentication approach would cost an additional $200 million in development costs, $26 million in acquisition costs and would delay implementation of the E-Government initiatives to 2005 and beyond.” Oh, if only they knew. ↩︎

  7. One of the engineers who built TeleFile also worked on Direct File, playing a key role in developing the State API, among other contributions. ↩︎

  8. At no point has Direct File actually been controversial; polling shows massive popular support among taxpayers across the political spectrum. The only opposition is from lobbyists. ↩︎

  9. Main Treasury and the White House are next-door neighbors on Pennsylvania Avenue. ↩︎

GAO: IRS Should Strengthen Oversight of Its Identity-Proofing Program

James R. McTigue, Jr.:

IRS officials were unable to show us that they had independently documented measurable goals or objectives to manage the outcomes of its identity-proofing program. According to federal internal control standards, officials should define objectives clearly to enable the identification of risks and define risk tolerances. Without independently established measures and goals, IRS cannot determine whether the performance of ID.me’s solutions meets IRS needs.

Without goals or objectives set by IRS, it is also not clear which of the several measures that ID.me provides are the best matches for what IRS needs or what level of performance is appropriate for a given application. For example, ID.me’s true pass rate excludes both users who abandon the process and users identified as highly probable fraudulent. In establishing measurable goals, IRS could determine that information on such users are essential performance measures that need to be established. Furthermore, as IRS continues to expand online services that require identity proofing it will need to consider additional metrics for these services.

At some point I’m going to write a long post about Direct File and identity. Today is not that day. But expect this report to be cited when I do.

GAO reports are pretty technical, and it’s understandable that some of the early takes I’ve seen kind of miss the mark. This is the key point: the IRS abdicated its responsibility to define what good looks like, outsourcing that responsibility to its vendor.

Algorithm Improvement Suggestions

So last week’s post was, not so subtly, a provocation aimed at the “just add AI” approach to government. But hey, now that I’ve already done the work of sussing out what’s going on and writing it out in detail, let’s add AI and see what happens.

I asked ChatGPT to read the post and suggest a few ways the algorithm could be improved.

Thanks for sharing the post. It’s a rich and thoughtful deep dive into a real-world government algorithm with significant care and attention to fairness, complexity, and transparency. Still, even great systems can be improved. Here are several ways the algorithm might be enhanced or revisited, based on what the blog described.

Oh god, the sycophancy.

  1. Reduce Reliance on Heuristics in Proportional Allocation
  • Current approach: The AMA algorithm uses fixed parameters like the INTERPOLATED_DIRECT_REVIEW_PROPORTION_ADJUSTMENT (e.g., 67%) and a MAXIMUM_DIRECT_REVIEW_PROPORTION cap (e.g., 80%).
  • Suggested improvement: Replace static interpolation factors and hardcoded caps with dynamic, data-driven optimization. Techniques like Bayesian updating or reinforcement learning could help the system adapt in real time based on backlog levels, processing rates, and success in meeting deadlines.

Right off the bat, I think this provokes an interesting question. ChatGPT suggests a system that requires less human intervention because it optimizes itself toward a defined set of goals.

In the algorithm as I designed it, there are certainly examples of exactly this. You could imagine a version of the algorithm that preserved the concept of “docket month,” and allowed a human operator to specify how deep on the docket the machine was allowed to look, historically a policy decision that humans at the Board had been making. In this case, I opted to completely replace this process. Given the pace at which humans can orient and act upon new information, a process with a human in the loop would necessarily require a greater margin of error, and it was better for the algorithm to determine the most efficient possible parameter at any given time.

The same could be said for the parameters ChatGPT has called out above. Except that I think those items differ from the docket month example in terms of the pros and cons that must be weighed to change them. There is no right answer to whether the Board should prioritize Direct Reviews over legacy appeals.

These are policy questions in a complex decision space without an optimal answer. As I explained the algorithm to different audiences, one thing that I would always emphasize was that it provided “levers” that decision makers could pull. It wasn’t my place as the algorithm’s designer to decide those questions, which is effectively what I would have done had I obfuscated my opinions inside an optimization function in a black-box algorithm.

  2. Implement Feedback Loops for Goal Reassessment
  • Current approach: The system assumes the 365-day goal for Direct Review decisions remains static, even when it becomes unrealistic.
  • Suggested improvement: Introduce a feedback mechanism that automatically flags when a policy goal (like the 365-day target) is routinely being missed, and prompts a reassessment. This could be both internal (staff dashboards, alerts) and external (public transparency reports).

This is right on, and actually prompted me to go back and add a note about the dashboard we built for this very purpose. (ChatGPT read an early draft; I was paranoid it might find some egregious error past me had made and thus undermine current me’s whole reason for writing.) But I like the suggestion that the data should be public.

  3. Address Race Conditions in FIFO Distribution
  • Current approach: Race conditions can occur when two judges request cases simultaneously and might skip over the true “oldest” FIFO case.
  • Suggested improvement: Introduce record-level locking or atomic fetch-and-assign mechanisms at the database level, even for legacy VACOLS records, possibly via a transactional wrapper layer or a temporary in-memory locking system.

My post flagged this as a theoretical problem, and yeah, it’s totally fixable. Caseflow was already using Redis; it wouldn’t have been hard to implement a locking mechanism. Heck, I added Redis locking to my acronym-lookup Slackbot, which was nowhere near as mission critical. But the cost is adding complexity to the system, adding one more thing that could break. Given the negligible potential impact, I think I can defend letting that one go.
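
For the curious, here is a rough sketch of what such a lock could look like using the redis gem’s SET options. The names are illustrative and this is not the Caseflow code, just an outline of the idea:

require "redis"

def with_case_lock(redis, case_id, ttl_seconds: 30)
  lock_key = "case-distribution-lock:#{case_id}"

  # SET with nx: true only succeeds if no one else holds the lock, so only one
  # request can claim a given case; ex: ensures a crashed process can't hold
  # the lock forever. Returns false if the case is already claimed.
  return false unless redis.set(lock_key, "locked", nx: true, ex: ttl_seconds)

  begin
    yield
    true
  ensure
    redis.del(lock_key)
  end
end

A distribution request could then wrap the fetch-and-assign of each candidate case in with_case_lock and simply skip any case that another judge’s request had already claimed.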

  4. Use Better Predictive Models for Case Inflow and Processing Time
  • Current approach: Direct Review pacesetting proportions and legacy NOD adjustments are based on historical averages (e.g., 40% of NODs becoming Form 9s).
  • Suggested improvement: Use predictive analytics or machine learning to estimate case inflow and judge workload more precisely. For example:
    • Train models on NOD metadata to predict Form 9 likelihood.
    • Use survival analysis to forecast case readiness timelines.
    • Update predictions continuously based on actual outcomes.

This one’s amusing to me. Once I finally got a VA computer (six weeks after I started work at VA), one of the first things I did was use survival analysis to model the appeals process end-to-end. Those models ended up having a ton of uses, both internal (workforce planning) and external (time estimates for Veterans checking their appeal’s status on VA.gov). ChatGPT would be proud. As for whether I should have plugged those models into the algorithm instead of the hardcoded 40% heuristic, nah, not worth the extra complexity.

  5. More Granular Customization of Batch Sizes

The feature it turned out nobody needed? Yeah, not adding that. I guess I did call it a “disadvantage,” though, so that one’s on me.

  6. Improve Fairness Monitoring Across Judges
  7. Add Resilience for Sudden Capacity Shifts

These are nice-sounding words that mean nothing.

  8. Publish Algorithm Documentation and Code
  • Current approach: The algorithm used to be public but was removed from GitHub.
  • Suggested improvement: Restore the public repository, redacting any sensitive implementation details, to meet modern expectations for algorithmic transparency in government decision-making. This helps build trust and allows outside experts to contribute improvements.

I agree completely.

Sittin’ On the Docket At VA

And now for something completely different.

This is a blog post about designing algorithms for government. “Algorithm,” in this context, just means “an automated way of making a decision.” I am going to describe, in great detail, a specific algorithm that I created for the U.S. Department of Veterans Affairs (VA). If you’re a U.S. Veteran who had an active case with the Board of Veterans Appeals (the Board) at any time since November 2018, this algorithm affected your appeal.

This post is going to get wonky. And it’s very long. I’m going to try my best to inverted pyramid this thing. It’s entirely okay to stop reading when you’ve had enough. There isn’t a twist at the end. There isn’t a reward for having finished it. The only reward is more detail. There is no bottom.

I’m sorry.

Why does any of this matter?

I hope to illustrate three things.

First, this is a story about automation. In this story, I automate a manual, bureaucratic process, from start to finish. But no one was fired, and in fact, the people who used to own the process were thrilled to be able to focus on more important work. To quote Paul Tagliamonte, “Let machines do what machines are good at, let humans do what humans are good at.” Everyone on all sides of this automation story was committed to doing right by Veterans. I sought to approach the task with the same level of care as its former stewards had done for many years, and I leveraged insights from the manual steps they had worked out through trial and error.

Automating this process involved more than just translating legal rules into instructions for a machine. The manual process involved bounded but significant discretion to ensure that the appeals continued to move apace. As we’ll see, a statutory change created even more opportunities for discretion. Machines don’t do discretion (or at least, they shouldn’t). So in concretizing the process into an algorithm, that discretion falls to the algorithm’s designer. Automation is thus a kind of rulemaking, but without established methods of public participation like notice-and-comment. And even while the decision space was sufficiently constrained to preclude bias on the basis of protected factors, I still needed to grapple with fundamental questions of fairness and ensure that humans remained in the driver’s seat.

Second, this is a story about complexity. You hear about civil servants studiously attending to their small niche of bureaucratic arcana, and this is a guided tour of one such niche. It’s not my niche; I was just a visitor. If, a couple thousand words into this post, the fleeting thought “maybe this whole process should just be simpler” crosses your mind, I understand. And maybe it should be! Sometimes complexity is a once good idea that’s gone rotten with time. Sometimes it was always a bad idea. But I guarantee there was intention behind the complexity. It’s trying to help someone who was left behind. It’s there because a court ordered that it must be, or because Congress passed a law. And sometimes, it might actually be load-bearing complexity. Take a sledgehammer to it at your own risk.

I think we should strive to “make it simpler” more often. Certainly I feel that Congress and the courts rarely give the challenges of implementing their will due consideration, nor do they always consider how things like administrative burden can adversely affect outcomes. But from the standpoint of most government workers, you have no choice but to make complexity work every single day. This is a story about how to make it work.

Finally, this is a story about transparency. The system that we are going to be looking at was developed in the open. Anyone in the world could check out its source code or read discussions the team had while designing and building it. This system manages every Veteran’s appeal, and instead of asking them to trust us, we’re showing our work. Because the U.S. government built it, it’s in the public domain, and you can find it on GitHub.

Except that’s a dead link. It’s dead because VA decided to take a project that had successfully operated with complete transparency for more than three years and hide it from the public. I don’t know why they decided to do that; I was long gone. I have no reason to believe there were any nefarious motivations, apart from a desire of certain officials to steer the project in a very different direction than that which my team believed in (a direction that, as the subsequent years have made clear, has not worked out).

So I’m going to have to describe the algorithm as it existed when I left VA in May 2019. I can tell from the Board’s public reports that, at a minimum, the parameters of the algorithm are set to different values than when I left (which is great; that’s why I put them there). Maybe the algorithm itself has been completely replaced (that would be cool; I don’t have a monopoly on how to solve the problem, and maybe I got something wrong!).

What’s not cool is that the public has no way of knowing, that Veterans have no way of knowing. For all I know, the algorithm has been replaced by an octopus (literal or figurative).

It is just a tiny component of a tiny system in a tiny part of VA, which is just one part of the U.S. federal government. But all across government, more and more decisions are being automated every day. For all the attention paid to the introduction of automation into decisions of enormous consequence rife with potential for discrimination (rightly so!), there are a host of smaller decisions that nevertheless matter to someone. How we approach the task of automating these decisions matters. The guardrails we put around it matter. Whether anyone even understands how the decision was made matters.

If you make it to the end of this post (and again, it’s really, really okay if you don’t), you will be able to explain every decision this particular algorithm will ever make. Why shouldn’t every government algorithm be knowable in this way? No, you couldn’t possibly invest the time to understand each and every algorithm that affects your life (are you absolutely sure you want to invest it here?), but wouldn’t it be reassuring to know that you theoretically could? (And, satisfied in the knowledge that you could have read this post to the end, wouldn’t it be so much better to go to your local library and pick up a nice Agatha Christie?)

What it do?

When a Veteran appeals a decision made by VA, such as a denial of a claim for disability benefits, they enter a sometimes years-long, labyrinthine process. Eventually, their case will reach the Board, a group of more than 130 Veterans Law Judges and their teams of attorneys, who decide the appeal.

At time of writing, there are more than 200,000 appeals waiting for a decision from the Board.[1] These appeals wait on a “docket,” which is lawyer speak meaning they wait in a line. Per the regulations, “Except as otherwise provided, each appeal will be decided in the order in which it is entered on the docket to which it is assigned.”[2]

When a judge is ready to decide another appeal, they need to retrieve the first case on the docket, the next case in line. The algorithm we’re discussing determines what case to give the judge. It’s called the Automatic Case Distribution algorithm.

That’s it. Really. “What’s the next case in line?” is all we’re doing here.

Of course, as you scroll through this 6,400-word blog post, you can probably guess that it’s going to get a lot more complicated than your average deli counter. The manual process was run by a team of four people (with other responsibilities too), armed with lots and lots of Excel spreadsheets.

Then Congress passed the Veterans Appeals Improvement and Modernization Act of 2017. Now the Board would have not one, but four dockets. Any chance that four humans could keep track of it all went out the window. Now automation wasn’t optional, it was essential. But before tackling all of that complexity, I started by automating just the one docket. So let’s start there too.

Easy mode: a single docket

When an appeal arrives at the Board, it is assigned a sequential docket number that starts with the date.[3] If we just sort the appeals by that number (and thus, implicitly, by the date they arrived), we’ll get a list of appeals in docket order.
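
In Ruby terms, recovering docket order is just a sort (a trivial, illustrative sketch, not the Caseflow code):

# Docket numbers start with the arrival date, so sorting by docket number
# implicitly sorts appeals by the date they arrived at the Board.
docket_order = appeals.sort_by(&:docket_number)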

Judges request appeals in batches, which they’ll divvy up among their team of attorneys, who will review the available evidence and write decisions for the judge to review and sign.

In order to supply a judge with a fresh batch of cases, the team managing the docket would run a report in the legacy Veterans Appeals Control and Locator System (VACOLS) to retrieve a spreadsheet of the oldest cases ready to be decided. Working in docket order, they would move the requested number of cases to the judge in VACOLS and email the judge a list of those cases for tracking. (Before the case files were digitized, the paper case file would also need to be sent to the judge’s office.)

It’s pretty easy to imagine what this process looked like once automated. The judge would log into Caseflow, the system we were building to replace VACOLS. Assuming they had already assigned all of their pending cases to attorneys, they would be presented with a button to request a new batch of appeals. Click that and new cases would appear, ready to be assigned.

One downside of the automated approach relative to the manual process was that I designed it to always assign each judge a set number of cases, three cases for each active attorney on the judge’s team. This parameter, cases per attorney, was configurable by the Board, but not by individual judges. Back when the humans were running things, judges were able to request whatever number of cases they wanted. But in user research with judges, we didn’t hear that anyone really needed to customize the number of cases, a finding that was confirmed after we launched and didn’t get any pushback. Using a fixed number of cases, called the batch size, kept things more predictable.
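
In code, the batch size works out to something like the following sketch (illustrative names; the judge object and the Board-configured multiplier are assumptions, not the exact production implementation):

def batch_size
  # Three cases for each active attorney on the requesting judge's team.
  # The multiplier is configurable by the Board, not by individual judges.
  judge.active_attorney_count * cases_per_attorney
end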

There are two complications we have to care about. First, some cases are prioritized. A case can be prioritized because it is Advanced on the Docket (AOD), a status granted to cases where the Veteran is facing a financial or other hardship, a serious illness, or is 75 years or older. A case can also be prioritized because the Board’s decision was appealed to the Court of Appeals for Veterans Claims (CAVC) and it was remanded back to the board to correct a problem. If the case is prioritized, it must be worked irrespective of docket order.

Second, if a judge had held a hearing with the Veteran, or if they had already issued a decision on the appeal, that appeal is tied to the judge and must be assigned to them, and no one else. If the judge retires, the Veteran would need to be offered another hearing with a different judge before the Board could decide the case.

For these reasons, the humans had a more difficult task than just selecting the first several rows of a spreadsheet. Prioritized cases were front of the line, but they didn’t want any single judge to become a bottleneck after getting assigned too many of the most time-sensitive cases, so the team would try to ensure that each judge got a balanced number of AOD cases, while also ensuring that no AOD case sat too long.

If there were cases that were tied to a particular judge, it wouldn’t make sense to assign that judge a bunch of cases that could have been worked by anybody (gen pop cases). So the team was allowed to look past the strictly oldest cases on the docket to keep things moving. In order to keep this aboveboard (after all, cases are supposed to be assigned in docket order), each week the team would determine a “docket month” by calculating the age of the Nth oldest case, where N was a number that was agreed upon by the Board and stakeholders. Any case that was docketed in that month or before was considered fair game, giving the team the wiggle room they needed to keep things moving smoothly.

The algorithm basically replicates this approach, with some machine-like flair. The concept of docket month, an easy-to-use tool for humans to keep track of which cases they could assign, is an unnecessary abstraction for a computer. I replaced it with a concept called docket margin. Even though judges request cases on their own schedules, the algorithm starts by asking, “What if every single judge requested a new batch of cases at the same time? How many cases would I distribute?” That count is our docket margin, a rough estimate of the concurrent bandwidth of the Board.
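
A hedged sketch of that calculation (illustrative names, not necessarily the production implementation; batch_size_for stands in for however a judge’s batch size is looked up):

def docket_margin
  # The number of cases we would distribute if every judge requested a fresh
  # batch at this exact moment: a rough estimate of the Board's concurrent bandwidth.
  judges.sum { |judge| batch_size_for(judge) }
end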

To determine how many prioritized cases to give to each judge, we count the number of prioritized cases and divide it by the docket margin (capping the result at 100%) to arrive at a priority proportion. Multiplying the batch size by this proportion and always rounding up, we arrive at the priority target, the number of prioritized appeals we want to distribute to the judge.

Here’s some Ruby code. The code is just another way of saying the same thing as the above paragraph, so if you don’t like to read code, you’re not missing anything.

def priority_target
  proportion = [legacy_priority_count.to_f / docket_margin, 1.0].min
  (proportion * batch_size).ceil
end

We can also use the docket margin to derive the docket margin net of priority, which is the docket margin minus the count of prioritized cases. Like the docket month, this range determines how far back we are allowed to look on the docket and still be considered to be pulling from the front of the line. Unlike the docket month, it is calculated on demand and is more precise.

def docket_margin_net_of_priority
  [docket_margin - legacy_priority_count, 0].max
end

Now we have everything we need to distribute cases from our single docket. The algorithm has four steps.

In the first step, we distribute any prioritized appeals that are tied to the judge requesting cases. As no other judge can decide them, we will distribute as many such appeals as we have ready, up to the batch size.

In the second step, we distribute any non-prioritized appeals that are tied to the judge. We will again distribute an unlimited number of such appeals, but only searching within the docket margin net of priority, i.e. the cases that are considered to be at the front of the line.

Note that, in terms of the ordering of the steps, we are considering first whether the appeal is tied to a specific judge before considering whether the appeal is prioritized. This is because at the micro level, it’s more important for any given judge to be working the cases that only they can work. At the macro level, between the Board’s more than 130 judges, there will always be plenty of judges available to work AOD cases quickly. Note that even in extreme circumstances, such as if every appeal was tied to a judge, the algorithm would be self-healing, because the docket margin net of priority shrinks the more prioritized cases are waiting, thus reducing the number of cases distributed in step two.

In the third step, we check the priority target. It’s possible we already hit or even exceeded the target in step one, in which case we skip this step. But if we still need more prioritized cases, we’ll distribute gen pop prioritized cases until we reach the target. In order to ensure that prioritized cases continuously cycle, they are not sorted by docket date, but rather by how long they’ve been waiting for a decision, or in programmer speak, a first-in, first-out (FIFO) queue.

At any point, if we have reached the limit of the batch size, we stop distributing cases. Our work is done.

But assuming we still need more cases, our fourth and final step is to look to non-prioritized, gen pop appeals. We distribute those in docket order, until we’ve reached the batch size and the judge has the cases they need.

Here’s what it looks like in code:

def legacy_distribution
  rem = batch_size

  # Step 1: prioritized appeals tied to the requesting judge
  priority_hearing_appeals =
    docket.distribute_priority_appeals(self,
                                       genpop: "not_genpop",
                                       limit: rem)
  rem -= priority_hearing_appeals.count

  # Step 2: non-prioritized appeals tied to the judge, searching only the front of the line
  nonpriority_hearing_appeals =
    docket.distribute_nonpriority_appeals(self,
                                          genpop: "not_genpop",
                                          range: docket_margin_net_of_priority,
                                          limit: rem)
  rem -= nonpriority_hearing_appeals.count

  # Step 3: gen pop prioritized appeals, if we haven't yet hit the priority target
  if priority_hearing_appeals.count < priority_target
    priority_rem =
      [priority_target - priority_hearing_appeals.count, rem].min

    priority_nonhearing_appeals =
      docket.distribute_priority_appeals(self,
                                         genpop: "only_genpop",
                                         limit: priority_rem)
    rem -= priority_nonhearing_appeals.count
  end

  # Step 4: non-prioritized gen pop appeals, in docket order, until the batch is full
  nonpriority_appeals =
    docket.distribute_nonpriority_appeals(self,
                                          limit: rem)

  [
    *priority_hearing_appeals,
    *nonpriority_hearing_appeals,
    *priority_nonhearing_appeals,
    *nonpriority_appeals
  ]
end

Congratulations, that’s one full government algorithm. It was turned on in production in November 2018, but it would only stay on for three months…

The Appeals Modernization Act

On August 23, 2017, the Veterans Appeals Improvement and Modernization Act of 2017, a.k.a. the Appeals Modernization Act (AMA), was enacted into law. The first significant reform of the VA appeals process in decades, it gave VA 540 days to implement the new law. There’s no possible way this already overstuffed post can accommodate my opinions about AMA and how it was implemented, so let’s just narrowly focus on what this new law meant for the Automatic Case Distribution algorithm.

My team was working to replace VACOLS—a legacy system, built in PowerBuilder, that had been maintained for decades by a single person—with Caseflow. The passage of AMA was a clarifying moment for our team. It was quickly apparent that there was no path for VACOLS to be retrofitted for the new law, so the 540-day clock provided a deadline for Caseflow to be able to stand on its own two legs.

545 days later, VA was ready and the law went into effect. Nothing changed for Caseflow. Every single piece of new functionality had already gone live in production and been used to manage the real appeals of real Veterans via the Board Early Applicability of Appeals Modernization (BEAAM) program,[4] which invited a small group of 35 Veterans to conduct a trial run of AMA. The program not only helped to ensure a smooth rollout of the technology, but it also gave us valuable insights as we prepared updated regulations and procedures for the new law and designed informational material for Veterans and their representatives.

By the way, I think this is the Platonic ideal of government IT launches. You just throw a nice party for the team because everything has already shipped.

A man stands next to a banner reading 'grand opening' holding a pair of oversized scissors. The scissors are approximately 30 inches (75 centimeters) long. The man wears glasses and a leather jacket, and is of east Asian descent. Another man stands in the background, holding one end of the banner. They are in a windowless room with bare white walls, presumably a nondescript government office building.
Photo: And you buy jumbo scissors, February 19, 2019.

Everything had gone live, that is, except for the new version of the Automatic Case Distribution algorithm. The whole reason we implemented the single-docket algorithm described above, despite only needing it for three more months, was so that we could roll out as much as possible ahead of the applicability date of the law. But on AMA day, we had to flip a feature flag and swap in a completely different algorithm.

In order to test the new algorithm ahead of time, I built a discrete-event simulation (DES) to model what the Board would look like under AMA, and I used that to pressure test the algorithm under various conditions. I had done the same for the single-docket version before rolling it out too, although that was easier thanks to decades of historical data. For example, when I said above that docket margin net of priority was “more precise” than docket month, the evidence for that claim before we took the algorithm live was simulation results showing that it never had to look more than 3,000 cases deep on the docket, which was narrower than the range the humans were using at the time. I evaluated various algorithms using four measures: (1) docket efficiency, how deep the algorithm had to look on the docket to find cases; (2) distribution diversity, how balanced prioritized cases were between judges; (3) priority timeliness, how long it took to distribute a new prioritized case; and (4) priority pending, the maximum number of prioritized cases waiting at any given time.
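
For the curious, here’s a toy sketch of the shape of such a simulation. To be clear, this is not the tooling I actually used: the judge behavior, timings, and result fields are invented stand-ins, and the block passed to the method is assumed to run the algorithm under test and report back what it saw, so the four measures above can be tallied.

# A toy discrete-event simulation loop. Everything here is an illustrative stand-in.
JudgeRequest = Struct.new(:day, :judge)

def simulate(judges:, horizon_days:)
  # Each judge asks for their first batch sometime in the first two weeks.
  queue = judges.map { |judge| JudgeRequest.new(rand(0..13), judge) }
  metrics = { docket_depths: [], priority_by_judge: Hash.new(0),
              priority_wait_days: [], max_priority_pending: 0 }

  until queue.empty?
    request = queue.min_by(&:day)
    queue.delete(request)
    next if request.day > horizon_days

    # The block runs the algorithm under test and reports what it saw.
    result = yield(request.judge, request.day)
    metrics[:docket_depths] << result[:deepest_docket_index]              # docket efficiency
    metrics[:priority_by_judge][request.judge] += result[:priority_count] # distribution diversity
    metrics[:priority_wait_days].concat(result[:priority_wait_days])      # priority timeliness
    metrics[:max_priority_pending] =
      [metrics[:max_priority_pending], result[:priority_pending]].max     # priority pending

    # The judge comes back for another batch two to six weeks later.
    queue << JudgeRequest.new(request.day + rand(14..42), request.judge)
  end

  metrics
end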

The challenge was that for AMA, I was modeling a novel and highly complex procedural framework with only limited data as to what might happen (collecting a preliminary evidence base was another goal of the BEAAM program, which featured extensive interviews with Veterans and their representatives to explore how they would approach making choices under AMA). It was especially important to test the algorithm under extreme scenarios, not just the ones VA hoped would play out.

AMA says, “The Board shall maintain at least two separate dockets,”[5] one for appeals that requested a hearing, and one for appeals that didn’t. The Board chose to create not two, but three dockets. As required by the statute, one docket would be for appeals that requested a hearing. A second docket would be for appeals that didn’t request a hearing, but where the Veteran had added new evidence to their case that had not been reviewed by the agency that had originally made the contested decision (i.e. the Veterans Benefits Administration for disability benefit claims). A final docket, the “Direct Review” docket, would offer Veterans the guarantee of a faster decision in exchange for not requesting a hearing or adding new evidence. When a Veteran appealed a decision to the Board, they would have to choose which docket they wanted to use. And as the dockets were maintained separately, the Board could choose how to allocate resources between them, working some dockets faster than others, while still respecting docket order within any given docket.

Veterans who already had an appeal could keep it under the old “legacy” rules. As a result, there would now be four separate dockets for the algorithm to consider: the hearings docket, the new evidence docket, the Direct Review docket, and the legacy docket.

The Board’s policy goals

The Board articulated three policy goals to inform the design of the algorithm. As is often the case, the goals were somewhat vague and partly contradictory. That’s what makes this fun, I guess.

The first goal was not vague. Appeals on the Direct Review docket should be decided within one year, 365 days. I understood at the time that this was not realistic in the medium term,[6] but the Board was unwilling to acknowledge that fact. As of December 2024, appeals on the Direct Review docket take an average of 722 days to complete, down from a peak of 1,049 days in July 2024.[7] Absent any staffing reductions, it is possible that the Board will be able to reach a steady state where it consistently achieves its goal by 2026. From the perspective of algorithm design, I sought to give the Board the best shot at achieving its goal, while also ensuring the Board didn’t shoot itself in the foot if the goal turned out to be unachievable.

The second goal was that the dockets should be balanced “proportionately.” The definition of “proportionately” was left to me to interpret, but any definition was in contradiction with the other two goals. In the end, I excepted the Direct Review docket from any claim of proportionality and rearticulated this goal as “the other dockets should be balanced proportionately.”

The third goal was that the Board would prioritize completing legacy cases. The size of the legacy appeals backlog, then about 400,000 cases, was the primary reason I felt the first goal was not realistic. Under the legacy rules, Veterans have options to keep an appeal going almost indefinitely, so the Board will continue to work legacy appeals for literal decades. However, unless the Board clung too long to the 365-day goal for the Direct Review docket, I expected the Board could reach “functional zero” by 2025 or 2026, where only a small proportion of its resources were going toward the long tail of legacy appeals.

Keeping these goals in mind, let’s take a look at the AMA algorithm. Or, you know, maybe you could take a pleasant walk in the fresh air instead?

Hard mode: four dockets

As before, we start by calculating the docket margin and docket margin net of priority, only now looking at prioritized cases on any of the four dockets.

Setting aside the prioritized cases for a moment, we need to determine what proportion of non-prioritized cases should come from each of the dockets. Each docket has a weight, which is generally equal to the number of non-prioritized cases waiting.

The legacy docket’s weight is adjusted to account for cases that we know about, because the Veteran has filed a Notice of Disagreement, but which have yet to come to the Board and be docketed, because they are waiting on a Form 9. About 40% of Notices of Disagreement end up reaching the Form 9 stage, so we add a discounted count of pre-Form 9 cases to the number of docketed cases to give us the legacy docket’s weight.

class LegacyDocket
  # When counting the total number of appeals on the legacy docket for purposes of docket balancing, we
  # include NOD-stage appeals at a discount reflecting the likelihood that they will advance to a Form 9.
  NOD_ADJUSTMENT = 0.4

  def weight
    count(priority: false) + nod_count * NOD_ADJUSTMENT
  end

  # ...
end

We’ve now fulfilled the Board’s second goal, and calculated a set of proportions using the size of each docket. If that were our only goal, we could stop, but in response to the other two goals, we’ll need to make two adjustments.

First, we need to fix the Direct Review docket such that cases are decided within 365 days. When a case is docketed on the Direct Review docket, Caseflow stamps it with a target decision date, 365 days after the day it was docketed. We record the target decision date for each case to enable the Board to later adjust the timeliness goal (should it prove to be infeasible), while continuing to honor the commitment that was made to Veterans when they chose the Direct Review docket. The goal is adjusted for future cases, but we continue working the cases we have within the time we promised.

From the target decision date, we can derive the distribution due date, the date that we want to start looking for a judge in order to get a decision out the door by the 365 day mark. This was initially 45 days before the target decision date, but we planned to adjust this number as we got real-world data.
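
Concretely, the stamping could look something like the following sketch. The class and field names are hypothetical; only the constants reflect the initial values described above.

require "date"

# A sketch with hypothetical names; only the constants reflect the initial settings.
class DirectReviewCase
  TIME_GOAL_DAYS = 365        # the promise made when the Veteran chooses Direct Review
  DISTRIBUTION_LEAD_DAYS = 45 # start looking for a judge this far ahead of the goal

  attr_reader :target_decision_date, :due_for_distribution_date

  def stamp_target_dates(docketed_on)
    @target_decision_date = docketed_on + TIME_GOAL_DAYS
    @due_for_distribution_date = @target_decision_date - DISTRIBUTION_LEAD_DAYS
  end
end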

We can count the number of cases where distribution has come due and divide by the docket margin net of priority to calculate the approximate proportion of non-prioritized cases that need to go to Direct Reviews in order to achieve the timeliness goal. Initially, no cases are due, and so this proportion would be zero. But the Board didn’t want to wait to start working Direct Review appeals, preferring to start working them early and notch the win of beating its timeliness goal (even if this wasn’t sustainable). As a result, I constructed a curve out for the Direct Review docket.

We start by estimating the number of Direct Reviews that we expect to be requested in a year. If we’re still within the first year of AMA, we extrapolate from the data we have. We can divide this number by the number of non-priority decisions the Board writes in a year to calculate the pacesetting Direct Review proportion, the percentage of non-priority decision capacity that would need to go to Direct Reviews in order to keep pace with what is arriving.

def pacesetting_direct_review_proportion
  return @pacesetting_direct_review_proportion if @pacesetting_direct_review_proportion

  receipts_per_year = dockets[:direct_review].nonpriority_receipts_per_year

  @pacesetting_direct_review_proportion = receipts_per_year.to_f / nonpriority_decisions_per_year
end

Our goal is to curve out to the pacesetting proportion over time. So we calculate the interpolated minimum Direct Review proportion, using the age of the oldest waiting Direct Review appeal relative to the current timeliness goal as an interpolation factor. We apply an adjustment to this proportion, initially 67%, to lower the number of cases that are being worked ahead of schedule.

def interpolated_minimum_direct_review_proportion
  return @interpolated_minimum_direct_review_proportion if @interpolated_minimum_direct_review_proportion

  interpolator = 1 - (dockets[:direct_review].time_until_due_of_oldest_appeal.to_f /
                      dockets[:direct_review].time_until_due_of_new_appeal)

  @interpolated_minimum_direct_review_proportion =
    (pacesetting_direct_review_proportion * interpolator * INTERPOLATED_DIRECT_REVIEW_PROPORTION_ADJUSTMENT)
      .clamp(0, MAXIMUM_DIRECT_REVIEW_PROPORTION)
end

This gives us a nice curve out, as shown below. The jolt upward occurs when we catch up with the distribution due date of the docketed appeals.

A chart showing the simulated percentage of each distribution to a judge that went to the Direct Review over time. Each distribution is plotted as a blue circle. At first, very few cases go to Direct Reviews, but this percentage curves upward until a little less than half of all cases go to Direct Reviews. Then the distributions jolt upward, showing that about 70% of all cases are going to Direct Reviews, with individual distributions ranging between 55% and 80% Direct Reviews.
Figure: A simulation where Direct Reviews take up about 70% of the Board's capacity.

We also apply a limit to the number of cases that can go to the Direct Review docket, initially set at 80%.

direct_review_proportion = (direct_review_due_count.to_f / docket_margin_net_of_priority)
  .clamp(interpolated_minimum_direct_review_proportion, MAXIMUM_DIRECT_REVIEW_PROPORTION)

The second adjustment that we apply to the raw proportions is that we set a floor of 10% of non-priority cases to come from the legacy docket, provided there are at least that many available. This adjustment ensures that the Board continues working the legacy appeals backlog, even as it dwindles to only a handful of cases.
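
A minimal sketch of that floor, assuming hypothetical helper names (docket_margin_net_of_priority is the real concept from earlier; legacy_nonpriority_count and the method name are invented stand-ins):

# The 10% floor comes from the policy described above; the helpers are illustrative.
MINIMUM_LEGACY_PROPORTION = 0.1

def legacy_proportion_with_floor(raw_legacy_proportion)
  # Never reserve more of the batch than the legacy docket can actually supply.
  available = legacy_nonpriority_count.to_f / docket_margin_net_of_priority
  [[raw_legacy_proportion, MINIMUM_LEGACY_PROPORTION].max, available].min
end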

A brief aside

An implicit assumption here is that the Board needed to be willing to admit when it was no longer able to meet the 365-day goal. The 80% limit gave me confidence that at least nothing would break. But if the percentage of Direct Reviews was continuously pegged to 80%, there would be scarce capacity to work anything other than Direct Reviews, and in particular, to work down the legacy appeals backlog.

To this day, VA.gov reads, “If you choose Direct Review, […] the Board’s goal is to send you a decision within 365 days (1 year).” VA continued to make that promise to Veterans even as it was issuing decisions on Direct Reviews that were more than 1,000 days old.

Again, because it’s hidden from the public, I don’t know how the Board has updated the algorithm’s parameters, or even if they’re still using it. But it’s quite apparent from publicly available data that whatever algorithm is in use, the Board has not been able to keep pace with the 365-day goal it continues to claim. Fortunately, it looks like things started to turn around in 2024 as the legacy backlog began to dry up, and it’s possible the Board will be able to meet its stated goal within the next couple years. If its staffing levels aren’t cut.

Hearings make this even harder

Okay, one last complication. Under AMA, cases on the hearings docket get distributed to judges as soon as they are ready, generally after the hearing has occurred, been transcribed, and the evidentiary period (the time after the hearing in which the Veteran can add new evidence to their case) has expired or been waived. The Board’s term for this is “one-touch hearings,” in contrast to legacy hearings, which could take place months or even years before the case was decided. As a result, how many cases from this docket get worked is effectively decided not when cases are distributed, but when we determine how many AMA hearings to hold.

Fortunately, Caseflow is also responsible for scheduling hearings. Every quarter, Caseflow Hearing Schedule asks Caseflow Queue (where the algorithm lives) to tell it the number of AMA hearings it should schedule. Caseflow calculates the docket proportions, as above, and multiplies the hearings docket proportion by the number of non-prioritized decisions it plans to issue that quarter.

def target_number_of_ama_hearings(time_period)
  decisions_in_days = time_period.to_f / 1.year.to_f * nonpriority_decisions_per_year
  (decisions_in_days * docket_proportions[:hearing]).round
end

One complication is that a Veteran could withdraw their hearing request. Due to a legal technicality, however, their appeal would remain on the hearings docket. This means that Caseflow Hearing Schedule needs to check whether the hearing request has been withdrawn. If so, it marks the case as ready for a decision; if not, it schedules a hearing.
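
In pseudocode-ish Ruby, with entirely hypothetical names, the check amounts to:

# Hypothetical names, not the actual Caseflow Hearing Schedule code.
def handle_unscheduled_hearing_docket_case(appeal)
  if appeal.hearing_request_withdrawn?
    appeal.mark_ready_for_distribution # stays on the hearings docket, but no hearing is held
  else
    schedule_hearing(appeal)
  end
end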

Under AMA, cases with hearings are not required to be decided by the same judge who held the hearing, as they were under the legacy rules. However, it remains better for everyone involved if the same judge decides the case when possible, so the algorithm continues to work mostly the same way, a relationship I termed affinity. The only difference between affinity and the old rules is that if a judge retires or takes a leave of absence, Caseflow treats the cases they heard as gen pop, available to be assigned to anyone, instead of requiring another hearing.
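
A sketch of how that affinity check might look at distribution time; these names are illustrative, not the actual Caseflow code:

# Illustrative names only.
def genpop_status(appeal)
  hearing_judge = appeal.most_recent_hearing&.judge
  return "only_genpop" if hearing_judge.nil? || hearing_judge.inactive?

  "not_genpop" # retains affinity for the judge who held the hearing
end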

At long last, let’s automatically distribute some cases

Okay, so now we know the proportion of the Board’s decisions that should be allocated to each docket. We’ve helped Caseflow Hearing Schedule hold the right number of hearings some months ago. Now a judge comes along and asks for a new batch of cases. Let’s help them out, shall we?

As before, we calculate the priority target and docket margin net of priority. The only difference is that we now need to look at the number of prioritized appeals on any of the four dockets. We’ll also calculate how deep we can look on the legacy docket specifically, the legacy docket range, by multiplying the docket margin net of priority by the legacy docket proportion.

def legacy_docket_range
  (docket_margin_net_of_priority * docket_proportions[:legacy]).round
end

We can start distributing appeals by looking at cases that are either tied to the judge (legacy docket) or have affinity for the judge (hearings docket). And of course, prioritized cases go first.

# Distribute priority appeals that are tied to judges (not genpop).
distribute_appeals(:legacy, limit: @rem, priority: true, genpop: "not_genpop")
distribute_appeals(:hearing, limit: @rem, priority: true, genpop: "not_genpop")

# Distribute nonpriority appeals that are tied to judges.
# Legacy docket appeals that are tied to judges are only distributed when they are within the docket range.
distribute_appeals(:legacy, limit: @rem, priority: false, genpop: "not_genpop", range: legacy_docket_range)
distribute_appeals(:hearing, limit: @rem, priority: false, genpop: "not_genpop")

Next, we need to see if we’ve reached the priority target, and if not, distribute more prioritized cases. We’ll need to find prioritized cases irrespective of docket, looking at how long the case has been sitting rather than its docket order to ensure that prioritized cases keep moving. This is a two-step process: first we ask the databases which dockets have the oldest waiting appeals, and then we distribute that many appeals from each. As a result, there is a risk of a race condition where two judges could request cases at approximately the same time and the second judge would get a case that wasn’t strictly the oldest. Any case that is skipped over in this way would likely be picked up by the very next judge, so it wasn’t necessary to implement some form of locking (not trivial, because legacy appeals are stored in VACOLS), but it was important to flag the edge case in our documentation so we weren’t hiding even a trivial deviation from our stated procedures.

# If we haven't yet met the priority target, distribute additional priority appeals.
priority_rem = (priority_target - @appeals.count(&:priority)).clamp(0, @rem)
oldest_priority_appeals_by_docket(priority_rem).each do |docket, n|
  distribute_appeals(docket, limit: n, priority: true)
end

Now we need to distribute the remaining non-prioritized cases. We’ll deduct the non-priority appeals that we’ve already distributed from the proportions, and then it’s time to figure out how many cases we distribute from each docket.

If we were to multiply the docket proportions by the number of cases we have left in the batch size, it’s quite unlikely we would get a whole number result, and it’s rather difficult to distribute a fraction of a case. To avoid this, we’ll use a form of stochastic rounding to ensure we distribute the requested number of cases and don’t leave any docket behind.

In our stochastic rounding method, we multiply the docket proportions by the number of cases remaining. Each docket is allocated the whole-number portion of its share, and the fractional remainders are set aside. Using these remainders as weights, we probabilistically allocate the leftover cases to dockets. While each individual batch can vary, over many distributions the actual proportions will converge toward the target proportions, and even dockets that are wildly outnumbered will still get the right amount of attention.

# Extends a Hash where the values represent parts of a whole.
#
# {
#   a: 0.5,
#   b: 0.25,
#   c: 0.25
# }.extend(ProportionHash)

module ProportionHash
  def stochastic_allocation(num)
    result = transform_values { |proportion| (num * proportion).floor }
    rem = num - result.values.reduce(0, :+)

    return result if rem == 0

    cumulative_probabilities = inject({}) do |hash, (key, proportion)|
      probability = (num * proportion).modulo(1) / rem
      hash[key] = (hash.values.last || 0) + probability
      hash
    end

    rem.times do
      random = rand
      pick = cumulative_probabilities.find { |_, cumprob| cumprob > random }
      key = pick ? pick[0] : cumulative_probabilities.keys.last
      result[key] += 1
    end

    result
  end

  # ...
end

We run this step in a loop, in case any docket runs out of available cases (quite likely with the hearings docket, where most cases have affinity for a judge). At the end of each pass, we zero out the docket proportion of any docket that has been unable to supply us with all of the cases we requested, we normalize the proportions to sum to 100%, and then we re-run the stochastic allocation method. We repeat until we’ve either provided the judge with all the cases they need, or all of our dockets run out of cases.

until @rem == 0 || @remaining_docket_proportions.all_zero?
  @remaining_docket_proportions
    .normalize!
    .stochastic_allocation(@rem)
    .each do |docket, n|
      appeals = distribute_appeals(docket, limit: n, priority: false)
      @remaining_docket_proportions[docket] = 0 if appeals.count < n
    end
end

Monitoring and tweaking

Each time a judge requests a distribution, Caseflow records the state of the world as the algorithm saw it, enabling us to reconstruct why it made the decisions it made and to study the behavior of the algorithm in the real world. All of this data was reported on an easy-to-use dashboard so that Board staff could understand whether the algorithm was still correctly configured to meet their goals and to provide an “early warning” system for when things were clearly no longer sustainable.

I took my best stab at initial values for the different parameters of the algorithm. But these things were parameterized explicitly in anticipation that they would need to change. All told, the Board could control the batch size per attorney, the Direct Review docket time goal, the Direct Review distribution due date, the maximum Direct Review proportion, the interpolated Direct Review proportion adjustment, and the minimum legacy docket proportion. The Board also had the option of overriding the recommended number of AMA hearings to schedule.
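
For illustration, here’s what those levers might look like written out as a single configuration hash. The parameter names mirror the list above; the numeric values are the initial settings mentioned in this post where given, and clearly marked placeholders otherwise. This is a sketch, not the actual Caseflow configuration.

# Illustrative only; values marked as placeholders are not from the post.
BOARD_ALGORITHM_PARAMETERS = {
  batch_size_per_attorney: 3,                             # placeholder; initial value not stated here
  direct_review_docket_time_goal_days: 365,
  direct_review_distribution_lead_days: 45,               # days before the time goal to seek a judge
  maximum_direct_review_proportion: 0.8,
  interpolated_direct_review_proportion_adjustment: 0.67,
  minimum_legacy_docket_proportion: 0.1,
  ama_hearings_target_override: nil                       # optional override of the recommended hearings count
}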

Tweaking these parameters could adapt the algorithm to any situation that I could anticipate, and hopefully many that I couldn’t. It also ensured that the decisions made by the algorithm remained properly within the control of VA leadership and were not co-opted by an enigmatic machine.

Fin

Figuring all that out was maybe 10–20% of my job for four months of my life. All of the code is pretty straightforward and boring. It’s learning the whys behind the existing process, working out what needed to change, finding ways to test it to make sure nothing went horribly wrong, getting feedback and buy-in from stakeholders, documenting and explaining it to different audiences; that’s where the real challenge was. Anyway, that’s my experience of what it looks like to actually automate bureaucracy.

I did say there isn’t a reward for making it to the end. I hope you’re happy.


  1. VA Board of Veterans’ Appeals Annual Report, Fiscal Year 2024 ↩︎

  2. 38 CFR 20.800(b) ↩︎

  3. Unfortunately, the Board used a two-digit year for this date. The Board was created in 1930, so it has its own flavor of Y2K bug scheduled to hit in 2030. Generally speaking, there shouldn’t be new appeals using this docket numbering scheme after 2019, but it’s possible for a closed appeal to be reopened through a finding of clear and unmistakable error, which would result in a new docket number being assigned. The Board still has four years to decide how to fix that one, and I built error messaging to ensure no one forgets. ↩︎

  4. I designed and managed the program, but I’m not responsible for the name. Technically, we used the authority under section 4 of AMA (“Programs to test assumptions relied on in development of comprehensive plan for processing of legacy appeals and supporting new appeals system”), not the provisions of section 2(x)(3) (“Early applicability”), as the name inaccurately suggests. Yes, it still grates on me. ↩︎

  5. 38 USC §7107(a)(1) ↩︎

  6. In the first year of the new law, any cases you decide are definitionally less than 365 days old. In the long term, the legacy appeals backlog has been cleared, so you have more capacity for the AMA dockets, and you can keep pace so long as the Board is adequately staffed. But in the medium term, even my most optimistic scenarios showed the Board losing the battle to keep pace with its goal unless it worked the Direct Review docket to the exclusion of the others. ↩︎

  7. Board of Veterans’ Appeals, “More Board personnel address pending AMA appeals and wait times.” Retrieved June 9, 2025. ↩︎

Direct File is Open Source!

Alex Mark:

The tax code contains some pretty crazy edge cases that required a good amount of effort to build and test. Two of my favorite tax rules are the different ways to calculate whether the taxpayer was 65 or older during the tax year — people over 65 receive a greater standard deduction, but they also phase out of eligibility for EITC if they don’t have qualifying children. Confusingly, these rules use different definitions of what it means to be 65 during the tax year[…]

I wish that these edge cases were rare — but they’re all over the tax code (children who are not children, married couples filing jointly who don’t meet the Joint Return Test, people who can be claimed as a dependent who aren’t, etc.). I feel confident that we represented these cases and gave the taxpayer every possible shot to file an accurate, beneficial return.

When Alex joined USDS in 2023, I knew only one thing about him, which was that he volunteered in the VITA program. I immediately wanted to get him on Direct File. Here he gives a guided tour of some of Direct File’s (and the tax code’s) nooks and crannies, and I insisted he put in a photo of him dumping a bucket of ice water on his head.

The 2025 Direct File Report

Nina Olson and Gabriel Zucker:

…the team continued to evaluate Direct File, as it had all along, preparing a comprehensive year-two report on its work. We filed a Freedom of Information Act request for that evaluation, and this week received Direct File’s 2025 season-end filing report. It tells an inspiring story of a product genuinely transforming tax administration in America, a story made bittersweet only by the fact that such progress has been cut short.

In addition to their summary, the Center for Taxpayer Rights has posted the full report for all to read.

The Who Cares Era

Dan Sinker:

In the Who Cares Era, the most radical thing you can do is care.

In a moment where machines churn out mediocrity, make something yourself. Make it imperfect. Make it rough. Just make it.

At a time where the government’s uncaring boot is pressing down on all of our necks, the best way to fight back is to care. Care loudly. Tell others. Get going.

At last week’s Code for America Summit, Suzanne Chapman and Katie Aloisi spoke about how Direct File was designed to build trust. Katie spent a significant portion of her time getting into the weeds of the tax situation of a fictional multi-generational household.

The audience wasn’t bored. They were riveted. Suzanne and Katie’s care was self-evident and contagious.

Caring is more than saying the right things. Care is shown.

Saying Goodbye

When the Treasury Department ordered the Direct File team to stop work and disband, the toughest part was saying goodbye.

Direct File deserves an Ocean’s Eleven-style montage for assembling the team. The old partner who had to be convinced to come back for one more job (the best, and perpetually most reluctant, user experience lead in the business). The loose cannon I was wary of bringing onto the team due to a checkered past (a brilliant engineer with a crypto startup on his resume). The whiz kid with the precise set of skills needed to pull off the plan (an interaction designer by training, she and I had studied up on tax law together, and she took to it like a fish to water).

We came together, a band of misfits and weirdos, under the banner of a fantastical, improbable mission. It felt like we were unstoppable. And then we were stopped.

And as much as we had accomplished, we knew we were just getting started. So we did our best to document what we had learned. We prepared the project for hibernation, writing notes for a future team that might never come. We said goodbye.

It was August 2022.


It wasn’t Direct File’s first death. By my count it was the second, although there’s an argument for it being the third. Neither count includes all of the doomed attempts to tilt at this particular windmill over the two and a half decades before we took up the lance.

During the initial policy process[1] on Direct File in 2021, I had argued that we would need to make a go/no go decision by June 2022 in order to have the best shot at success in filing season 2024, the final filing season of the Biden administration. And so when the policy process was reconstituted in early 2022, we were told our goal would be to prepare options and recommendations for a June decision. But by May, it was clear that we were going to blow that deadline.

So Merici Vinton, my U.S. Digital Service (USDS) project co-lead, and I hatched a plan. We asked to use our small USDS team to build a “proof of concept.” This prototype[2] of Direct File would illustrate some of the key issues being discussed in the policy process and develop an evidence base to inform future strategy. It would build momentum ahead of a decision, enabling us to hit the ground running if and when that decision came. I said the words “preserving optionality” a lot.

It was also, and everyone knew it, an advocacy tool. As Merici put it, she wanted something so self-evidently good that it would “make it hurt” when decision makers declined to give Direct File the green light. And so a team of three designers and four engineers (if you count me, despite my splitting time between coding, memos, and meetings) built a proof of concept version of Direct File in eight weeks.

Seizing an opportunity to grow the expertise and capacity of the team was an under-appreciated benefit of this work. Even from this early stage, the product was designed alongside users, and these early insights from taxpayers helped shape our intuitions. The proof of concept served as a laboratory to try out ideas and as a springboard for future iterations. We made mistakes, and learned from them (whatever you do, don’t try to express the logic of the Internal Revenue Code as YAML). The team developed a shared vocabulary, critical for collaboration between disciplines, which the reluctant user experience lead termed our taxonomy. And then there were the inside jokes, and the fridge magnets, and the Pictionary games.

At the end of the eight weeks, we not only had a version of a Direct File, but we had a close-knit group of people who knew how to build a Direct File. Because they had done it.


We began to strategize about what to do next with our small but mighty team. But then the administration’s climate and tax bill, thought dead, was suddenly accelerating toward becoming law. The Inflation Reduction Act (IRA), as it was now titled, would give the IRS an additional $80 billion over ten years. What should have been very good news for Direct File unexpectedly spelled its demise.

The decision was made within Treasury, so I can’t speak to the deliberations that occurred. The stated rationale was that with the imminent passage of the IRA, the IRS was ready to take on Direct File without outside help, and that the effort would be best managed as part of the emerging IRA strategic process. Instead of handing off the work in progress, however, the project would inexplicably start from scratch, walking away from the progress that had been made. Treasury confirmed that they understood this decision would likely foreclose delivery of Direct File by 2024, which, writing from the future where we know the results of the 2024 election, meant that Direct File would never have shipped.

It also didn’t help that the IRA required the IRS to produce a report on Direct File. At this point, no one had a plan for that report (apart from MITRE, which assumed a lucrative contract would soon come their way). The optics of one part of government studying the question of Direct File while another part moved full steam ahead weren’t great. The eventual plan—delaying a decision on Direct File until May 2023 but producing another, more sophisticated prototype as part of the report work and thus preserving optionality (my favorite phrase)—was still a few months off. It turns out that tasking an agency to spend $15 million and nine months writing a report probably isn’t a great way to achieve a result, even if it worked in our case.[3]

Starting over from square one, the IRS was not set up for success. As just one example, one of my top anxieties during this time was how Direct File would handle state taxes. I had written a four-page memo outlining the problem and five potential solutions, the whiz-kid designer had storyboarded out an early version of the experience that would eventually ship, and we had assembled a spreadsheet categorizing the filing options available in every state in the nation. Treasury’s July 2022 memo to the IRS, informing the IRS of its interest in pursuing Direct File, boiled all of this down to the single sentence, “It will be critical to determine the extent to which a federal filing tool can offer taxpayers an integrated way to file their state returns.” Helpful.

More than any memo, though, and even more than any prototype, what the IRS was deprived of was people. Because my top anxiety was: who is going to build Direct File? The IRS didn’t have the capacity to build a product like Direct File, which is not to say that the agency was incapable of it, just that it would take time and leadership to recruit and develop a team with the right combination of skills.

There aren’t very many user experience designers who understand tax law and can effectively collaborate with the IRS Office of Chief Counsel. For that matter, it’s also pretty difficult to find tax lawyers who know how to collaborate with designers. But we absolutely could find talented designers and talented lawyers and give them the space to learn from each other.

These problems can be hand-waved away if the plan is to outsource the work (this was the IRS’s preferred approach). Of course, that doesn’t solve the problem; it just makes it someone else’s problem. It doesn’t take a crystal ball to predict the steaming pile of crap taxpayers would have received after shelling out nine figures (cf. FAFSA).

The work of growing a team ready for any challenge takes time and deliberate care. There are no shortcuts. What hurt us the most was not pausing the building of Direct File but abandoning the capacity to build that we were cultivating.


Our team was disbanded. A majority went to the Department of Education and helped launch the ill-fated student loan forgiveness application.[4] The loose-cannon engineer was assigned to a project that made him miserable, but where he would meet a couple of similarly miserable designers with whom he would share the dream of Direct File. Only Merici and I remained, advising Treasury on IRA implementation, hoping we might someday light the beacons and call the dream team back.

Three months later, with interest in Direct File on the rise, Treasury would reverse course and bring our team back to help pull off Direct File from within the IRS. I would be in charge of the team designing and building Direct File, and Merici would serve as deputy to the IRS executive who would run the Direct File program. That’s a story for another time, however. We lit the beacons, and the team returned. And then the team grew.

There were many moments in the course of the following year that I wished we had those three months back. We were comically past the ideal timeline I had laid out back in 2021, and we also now had to write a report to Congress. But regardless of whether this hiatus was death number two or three, it would be the last time Direct File died. Until this year.


I’ve been continuously working in government, one way or another, since 2015. A few weeks ago, I resigned, as did the vast majority of the Direct File team. This time, it was even harder to say goodbye.

I do mourn the fact that Direct File won’t be available to taxpayers next year. But as I told the team as the end closed in, “We took a pipedream and made it a policy choice.” No one can claim with a straight face that Direct File is impossible anymore; bringing it back requires only that our elected leaders make a different choice.

What I mourn the most, though, is the dissolution of the team, the disregard for the vast impact they were poised and eager to deliver. The team itself is what I am proudest of from my time working on Direct File. Their manic dedication to the mission. The care they consistently took to get it right. The trust and love they had for each other.

One of the team’s many sayings was “the next version of Direct File is the best version of Direct File.” But just as Direct File constantly got better, so too did the team. We were growing our capacity to deliver for taxpayers each and every day, and we were helping to build new capacity across the IRS. Now that progress is interrupted, a reflection of progress interrupted across the federal government, of scientific research interrupted, of international aid interrupted, of all manner of interruptions.

Direct File is a policy choice, but the Direct File team was an orchard, just starting to produce a meaningful yield. Now it’s been razed.

We’ve documented what we learned. We’ve prepared the project for hibernation and written notes for a future team that might never come. We’ve open sourced it for the world to see.

We’ve said goodbye.


Except, well, a member of the team reviewed an early draft of this post, and she would like to refute this point, i.e. my title. According to her, the team has never said goodbye, only “ta ta for now.” We remain as delusional as ever. How else do you think Direct File happened?

It’s a peculiar sort of person who lives through the many deaths of Direct File and says, “Sign me up for one more, please.” But that’s the Direct File team for you. I dare you to try and stop them.

Maybe we’ll be able to relight the beacons someday. And maybe this will be the last time Direct File dies.

About 70 people who worked on Direct File pose on the Navy Steps of the Eisenhower Executive Office Building. They are dressed professionally but with an abundance of bright colors, including pinks, purples, and blues. The building's facade is granite, and it is designed in an ornate Baroque Revival style.
Photo: The IRS Direct File team, May 2, 2024.

  1. “Policy process” is a fancy way of saying a bunch of government officials talking about what to do. The process was facilitated via an interagency policy committee (IPC), a working group of people from different parts of government, each empowered to represent their “building’s” perspective. We teased out problems that needed to be solved and other considerations for a decision with the goal of presenting options and recommendations to “principals,” the highest-ranking political appointees. ↩︎

  2. Not to be confused with a separate prototype that was built in 2023 under the aegis of the Inflation Reduction Act-mandated Report to Congress. The later prototype was significantly more robust, and it built out the codebase that would eventually ship to taxpayers. ↩︎

  3. I like to say that the Direct File pilot cost just two reports to Congress, and we threw in a tax filing tool for free. Oddly enough, $15 million being a reasonable price tag for a report is something on which both parties agree. ↩︎

  4. Tilting at windmills is great exercise. ↩︎

Performance.gov: IRS Direct File

Speaking of Merici, I’ve been meaning to reupload this video from a July 2024 event, where she and I spoke about lessons learned from the Direct File pilot. The original was taken down in January 2025 due to the transition.

ESP Launches Future of Tax Filing Fellowship

Economic Security Project (ESP) announces a new fellowship focused on the future of tax filing, civic tech, and benefits equity. Launching in the wake of the Trump administration’s decision to sunset IRS Direct File—one of the most popular civic tech pilots in recent memory—this new initiative brings together technologists, policy experts, and equity-driven strategists to preserve hard-won progress and accelerate new ideas that make government work better for everyone.

In personal news, I’ve joined with our friends at the Economic Security Project to document and build upon the legacy of Direct File.

The Economic Security Project was one of the earliest and most effective advocates for Direct File, and throughout the program’s existence, they were a leader in getting the word out to the taxpayers who needed it the most. I’m grateful for this opportunity to continue collaborating with Gabriel, Jen, and Merici, reflecting on what we learned and what work remains unfinished.

Peeking Behind the Code

Andrew Leahey, for Forbes:

It isn’t every day that the IRS drops something that reads like a blend of legal code, logic theory, and open government idealism. But that is exactly what it did, laying Direct File bare—and releasing it into the public domain—for the world to scrutinize, fork, or admire in stunned silence.

At the heart of this code disclosure is something called the Fact Graph, a dry name for a tool that is quietly revolutionary: a logic engine that parses ambiguous tax scenarios and infers relationships while consistently applying tax rules.

The first line of code for Direct File, written February 8, 2022, was for an experiment to explore the concepts that would become the Fact Graph. Even from that first experiment, the work was motivated by the need to establish trust with taxpayers. Direct File can ask hundreds of questions about your tax situation, but most users will see only a small fraction of that. The Fact Graph provides us with a means of proving that none of the unasked questions would have changed the bottom line of your tax return and that you’re getting every tax benefit to which you’re entitled.

The IRS has been working for years to develop a relatively simpler model of taxes, its “Tax Calculator” effort, which would model the relationships between the lines of the various tax forms. But Direct File and its Fact Graph go further by also tackling the last mile problem of taxes, the often fuzzy relationship between what gets entered on tax forms and the assorted facts about your life that inform what you’re eligible to claim, a relationship that’s variously described via statutes, regulations, court decisions, instructions, publications, and other guidance. This wasn’t trivial, as our team would often be teeing up novel questions for the IRS Office of Chief Counsel to consider as we sought to bring that ambiguity into sharp focus.

There’s ample prior art on knowledge graphs for taxes, but for me personally, the lineage of my thinking traces to a conversation with Ben Klemens at Open Data Day 2016, discussing his Python script for doing his own taxes.

This all matters in an age when public trust in institutions is frayed and when algorithms, or artificial intelligence (AI), increasingly mediates our rights and obligations. If the tax code is going to be enforced by machines, and the preparation calculated and handled by the same, the rules those machines follow should be legible and contestable. The open-sourcing of Direct File hints at a future where regulatory logic and automated decision-making isn’t hidden, but published like case law—reviewable, revisable, and maybe even improved by the very public it serves.

Bingo.

Direct File on GitHub

The IRS has now published the vast majority of Direct File’s code on GitHub as open-source software. As a work of the U.S. government, Direct File is in the public domain. And now everyone can check it out.

Releasing Direct File’s source code demonstrates that the IRS is fulfilling its obligations under the SHARE IT Act (three weeks ahead of schedule!). Now that Direct File has paved the way, I hope that more of the IRS’s code, paid for with taxpayer dollars, will soon be available to all of us.

Open sourcing Direct File has long been planned, and even longer desired. Explaining last May why open source is particularly important for Direct File, the team wrote:

The IRS could take further steps to build public trust and enable independent assessment of its work. The Direct File product team was given the mandate to develop software that ensures every taxpayer receives the full benefit of any tax provisions for which they are eligible. Releasing components of Direct File as open-source software would enable the team to demonstrate this commitment.

Establishing trust with taxpayers was core to our approach for designing and building Direct File. By creating the most accurate option for filing, by making taxes accessible to all, by keeping taxpayer data secure, and now, by publicly sharing Direct File’s code, the Direct File team showed our dedication to earning taxpayers’ trust.

Please note: As of two weeks ago, I no longer work at the IRS. I am writing solely in my personal capacity.

Hello world

I made a thing. Apologies for the walls being pretty bare, I only just moved in.

I have an RSS feed if you’d like to know when I post something. No promises.