Published: March 11, 2024

"Tax preparation software is critical for our society, but if the software is incorrect, people using it are responsible for accuracy-related penalties."
- Ashutosh Trivedi

When you file your taxes, you are responsible for any errors, even those created by the software you trust to compute your tax return. Since over 93 percent of individual tax returns were filed electronically in 2023, many in the United States are vulnerable to bugs in these systems.

This didn't sit right with Associate Professor of Computer Science Ashutosh Trivedi, and his work is getting positive attention from the IRS.

"Tax preparation software is critical for our society, but if the software is incorrect, people using it are responsible for accuracy-related penalties," he said.

This led Trivedi and fellow researchers to seek a way to verify the correctness of tax software. But how?

No 'oracles'

"Tax software has what is commonly called an oracle problem. It means that you can think of the input, but you don't know what the ideal output is for that situation," he said. 

Unlike a simple calculator where you can add two numbers and have a guaranteed output, the U.S. tax system is highly complex and expressed in potentially ambiguous natural language.

There are gaps and loopholes that leave more of the code up for interpretation than you might think.

The United States tax code, notes included, is also over 9,000 pages. As the code changes every year, software can unintentionally miss new requirements. 

The researchers' solution? Legal precedent. In United States law, similar cases should have similar outcomes. This can be replicated in a software engineering approach called "metamorphic testing."

It's the little things

"In metamorphic testing, you present two inputs to the system that differ from each other so that the correct output of the program for these inputs must be in a certain predictable relationship," Trivedi said. 

For instance, one may not know the exact amount of someone's tax return, but one can expect that another individual with the same taxable characteristics, except that their spouse is blind, must receive a higher standard deduction.
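The blind-spouse example can be sketched in a few lines of Python. This is only an illustration, not the researchers' tool: the Persona class, the standard_deduction function standing in for the software under test, and the dollar amounts are all hypothetical placeholders, not real IRS figures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Persona:
    """Hypothetical fictional taxpayer; the fields are illustrative only."""
    filing_status: str
    wages: int
    spouse_blind: bool = False


# Placeholder amounts chosen for illustration, NOT real IRS figures.
BASE_DEDUCTION = {"single": 10_000, "married_joint": 20_000}
BLIND_EXTRA = 1_000


def standard_deduction(p: Persona) -> int:
    """Toy stand-in for the tax software under test."""
    joint = p.filing_status == "married_joint"
    extra = BLIND_EXTRA if (joint and p.spouse_blind) else 0
    return BASE_DEDUCTION[p.filing_status] + extra


def check_blind_spouse_relation(compute, base: Persona) -> bool:
    """Metamorphic relation: same persona, but with a blind spouse,
    must receive a strictly higher standard deduction."""
    follow_up = replace(base, spouse_blind=True)
    return compute(follow_up) > compute(base)


base = Persona(filing_status="married_joint", wages=50_000)
print(check_blind_spouse_relation(standard_deduction, base))
```

Note that the check never needs to know the correct deduction for either persona. It only compares the two outputs, which is how this style of testing sidesteps the oracle problem.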

Due to privacy concerns, there is no accessible dataset of taxpayer answers to forms, so it was necessary for Trivedi and his team to delve deep into the tax documentation and create fictional personas based on edge and corner cases. These variations could then be tested next to each other.

By creating personas with very similar taxable characteristics, you can test whether people who have similar inputs are getting similar results, or if something has gone wrong. 
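A persona sweep of this kind might look like the following sketch. Everything in it is an assumption made for illustration: the flat deduction, the 10 percent rate, the deliberately seeded bug, and the relation being checked (with all else equal, higher wages should never lower the tax owed) are not taken from the researchers' study.

```python
from typing import Callable, List, Tuple

DEDUCTION = 10_000  # placeholder amount, not a real IRS figure


def tax_correct(wages: int) -> int:
    """Toy calculator: 10% of taxable income, clamped at zero."""
    return max(0, wages - DEDUCTION) // 10


def tax_buggy(wages: int) -> int:
    """Same toy calculator with a seeded bug: negative taxable
    income is not clamped to zero, so low earners are overcharged."""
    return abs(wages - DEDUCTION) // 10


def find_violations(
    compute: Callable[[int], int], wage_levels: List[int]
) -> List[Tuple[int, int]]:
    """Compare neighbouring personas that differ only in wages and
    report every pair where the higher earner owes less tax."""
    violations = []
    for lower, higher in zip(wage_levels, wage_levels[1:]):
        if compute(higher) < compute(lower):
            violations.append((lower, higher))
    return violations


# Fictional personas: identical except for wages, in $1,000 steps.
wage_levels = list(range(0, 20_001, 1_000))

print(find_violations(tax_correct, wage_levels))
print(find_violations(tax_buggy, wage_levels))
```

In this toy setup the seeded bug only shows up for personas whose taxable income is at or below zero, which is why sweeping across edge cases matters more than testing a few typical returns.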

They found real bugs in open-source tax software, especially when returns were close to zero dollars or when a taxpayer was disabled. They then created simple, easy-to-follow flow charts explaining where the errors occurred. The same testing approach can be applied to any tax preparation software.

Law into logic

Trivedi sees the power of this approach as going beyond tax software. What if interested parties could translate the law, with all the limitations of natural language, into code on which experiments could be conducted?

"For the first time you have software that, if proven correct, could stand in for the text that the government releases," he said. Trivedi said he believes code would be easier to interrogate for fairness and discrimination than natural language, potentially increasing the impartiality of law. 

The project has intrigued the IRS, which has invited the team behind the research to present at the IRS-TPC Joint Research Conference on Tax Administration in June.