Common Sense AI for CPAs: Testing Tools, Understanding Limits and Applying Judgment
Tax season is here. From one CPA to another, these are the AI tools you should be using.
If you’re reading this right as it’s posted, you’re likely in the thick of tax season—and chances are, you’re already relying on artificial intelligence tools to help you get through it.
Over the past several weeks, I’ve tested a range of tools that you could be using. This includes the most popular tax-specific platform (TaxGPT), an internal AI system used by a national tax preparation service, and general AI models such as ChatGPT and Claude.
They all perform remarkably well, but not equally, and not without caveats. Here is what I learned that you should know.
To properly challenge each tool, I approached the testing in two ways. And if you’ve learned anything from the newsletters at TD Publishing, you know asking the right questions makes all the difference.
Testing Phase 1: Asking AI Basic Questions
To begin, I asked straightforward factual questions. For example, how to determine whether a client can claim both the Earned Income Credit and the Child Tax Credit, which rules apply, and which forms are required.
On basic factual questions, every system performed well—and even went a step further, offering to draft internal memos or client letters explaining the position taken. For that service alone, AI has become indispensable.
And I think we can all agree, editing a draft memo is dramatically more efficient than writing one from scratch.
Testing Phase 2: Asking AI Ambiguous Questions
Next, I tested more complex scenarios involving multiple facts and intentional ambiguity. Instead of asking whether an expense is deductible, I asked whether it could be deductible. Initially, all the systems analyzed the problems correctly and explained why the desired outcome could not be achieved. When I pressed further, meaningful differences emerged.
The free or general-purpose AI systems tended to be more “helpful.”
TaxGPT adhered strictly to the law as written.
Claude and Perplexity hesitated, suggested that slightly different facts might produce a different result, and—when pressed—acknowledged that the desired outcome might be possible, while still recommending consultation with a tax professional.
ChatGPT, by contrast, produced the longest and most conversational responses. It was more confident, more willing to explore gray areas, and—when the facts were nudged just enough—more inclined to lean toward the answer it perceived I wanted.
So, how do these tools compare?
Unlike Claude or Perplexity, ChatGPT did not recommend seeking professional advice and presented itself with considerable authority.
Based on these interactions, ChatGPT was the most pleasant system to work with—but also the most positively biased. Even with relatively straightforward compliance questions, it leaned toward favorable interpretations.
Claude and Perplexity came from the Sgt. Friday school of analysis: “Just the facts.” For anything beyond basic tax questions, that fact-focused restraint may actually be preferable.
And here’s why CPA oversight is essential…
One consistent issue across all the free systems was their willingness to generate facts. When I requested EINs for several companies, including small and private ones, all systems supplied numbers that appeared correct. They also cited relevant IRS regulations and Treasury guidance, but usually only after I specifically asked for citations.
The takeaway is simple: if you’re relying on a free or low-cost system for a borderline issue, ask for the citation—and then read it yourself. The answer is often right, but not always.
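For readers who like to script their workflow, here is a minimal sketch of the "ask for the citation, then read it yourself" habit. It pulls citation-like strings out of an AI answer so you have a checklist to look up by hand. The regular expressions and the function name extract_citations are my own illustrative assumptions; they cover only a few common federal citation styles and do nothing to confirm that a citation exists or says what the AI claims.

```python
import re

# Hypothetical patterns for a few common federal tax citation styles.
# Illustrative assumptions only, not a complete citation grammar.
CITATION_PATTERNS = [
    r"(?:IRC|I\.R\.C\.)\s*§+\s*\d+[A-Z]?(?:\([a-zA-Z0-9]+\))*",
    r"Treas\.\s*Reg\.\s*§+\s*\d+\.\d+[A-Za-z]?(?:-\d+[A-Z]?)?(?:\([a-zA-Z0-9]+\))*",
    r"Rev\.\s*(?:Rul|Proc)\.\s*\d{2,4}-\d+",
]

# One combined pattern so matches come back in order of appearance.
COMBINED = re.compile("|".join(f"(?:{p})" for p in CITATION_PATTERNS))


def extract_citations(answer: str) -> list:
    """Return distinct citation-like strings from an AI answer, in order of appearance."""
    seen = []
    for match in COMBINED.finditer(answer):
        text = match.group(0)
        if text not in seen:  # skip repeats of the same citation
            seen.append(text)
    return seen


if __name__ == "__main__":
    sample = (
        "The expense may be deductible under IRC § 162(a). "
        "See Treas. Reg. § 1.162-5(b) and Rev. Rul. 99-7. "
        "Again, IRC § 162(a) controls."
    )
    citations = extract_citations(sample)
    if not citations:
        print("No citations found: ask the tool for its sources.")
    for c in citations:
        print("Verify by reading the source:", c)
```

An empty list is itself a signal: the answer offered no authority, so ask for one before relying on it.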
Compared with general AI systems, TaxGPT is noticeably more authoritative. It does not bend toward a desired outcome, no matter how the prompt is phrased. Its database is closed, composed entirely of tax law sources and continuously updated. When it cites a regulation, you can be confident both the citation and the interpretation are reliable. That confidence comes at a price: $2,000 annually, with discounts for multi-year contracts.
TaxGPT also offers an add-on service called Agent Andrew. For $30 per return, it reviews a completed tax return and flags potential issues—essentially replicating the role of a traditional tax manager signing off on staff work. For small and mid-sized firms, the time savings alone can justify the cost. And for those inclined to maximize value, the subscription runs for a full year from sign-up—meaning a March 15 enrollment could carry through to part of the following filing season.
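To put rough numbers on "the time savings alone can justify the cost," here is a back-of-the-envelope sketch. The $2,000 subscription and $30 per-return fee come from the figures above. The return count and the hourly rate are hypothetical inputs, and I am assuming the per-return fee is paid on top of the subscription with no multi-year discount applied.

```python
ANNUAL_SUBSCRIPTION = 2_000  # from the article: annual price, before any discount
REVIEW_FEE_PER_RETURN = 30   # from the article: add-on review fee per return


def annual_cost(returns_reviewed: int) -> int:
    """Total yearly spend, assuming the review fee is added to the subscription."""
    return ANNUAL_SUBSCRIPTION + REVIEW_FEE_PER_RETURN * returns_reviewed


def break_even_minutes_per_return(returns_reviewed: int, hourly_rate: float) -> float:
    """Reviewer minutes that must be saved on each return for the tool to pay for itself."""
    if returns_reviewed <= 0 or hourly_rate <= 0:
        raise ValueError("returns_reviewed and hourly_rate must be positive")
    hours = annual_cost(returns_reviewed) / (returns_reviewed * hourly_rate)
    return hours * 60


if __name__ == "__main__":
    # Hypothetical firm: 200 returns reviewed, reviewer time valued at $150/hour.
    returns, rate = 200, 150.0
    print("Annual cost:", annual_cost(returns))
    print("Break-even minutes saved per return:",
          round(break_even_minutes_per_return(returns, rate), 1))
```

With those hypothetical inputs, the total comes to $8,000 a year, so the tool pays for itself if it saves about 16 minutes of reviewer time per return. Plug in your own volume and rates before drawing a conclusion.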
The internal AI systems used by national firms also scored highly for reliability.
Staff have been instructed that output from these systems may be treated as authoritative. While those tools are not available to most CPAs, they provide a useful benchmark for evaluating what’s accessible to the rest of us.
Before we become overly reliant on any AI system, it’s important to remember where these tools excel—and where they struggle.
AI handles black-and-white issues extremely well. Gray areas are another matter. If tax law were limited to fifty shades of gray, compliance would be easy. In reality, it’s a spectrum of thousands of unresolved hues.
As one prominent attorney remarked about AI-generated legal briefs, “AI knows words, but it doesn’t understand law.” The same is true of tax.
According to a recent Tax Practice News article, “Why AI Keeps Getting Tax Preparation Wrong,” AI systems get roughly 50 percent of extremely complex tax issues wrong.
That sounds alarming, but context matters. The operative phrase is extremely complex. If you’re dealing with problems at that level, you’re likely working with a team. Frankly, individuals confronting truly complex tax issues on their own may not do much better than AI’s 50 percent error rate.
So, what’s the bottom line?
AI systems perform exceptionally well on clear, settled issues. They are less reliable in gray areas. Used properly, as a research aid, drafting assistant and efficiency tool, they are invaluable. For unfamiliar territory, especially state and multi-state issues, a tax-specific system like TaxGPT can easily justify its cost. Additionally, the time saved by having AI draft client letters, IRS responses and internal memoranda is hard to overstate.
No CPA should attempt to navigate tax season without AI support—but support is not the same as reliance.
If you haven’t yet, subscribe above to receive next month’s edition of “Common Sense AI For CPAs.” Feel free to reach out to me at jeffrey@tdfactfind.com with any questions, comments or suggestions for future topics of interest.
Meet the Author: Jeffrey Yudkoff, CPA
Disclaimer: All images and videos in this article have been generated with AI [Google Gemini Nano Banana Pro; Adobe Firefly; Kling 2.6 Pro via Artlist] as well as CapCut for captioning.




