ChatGPT aces CPA exam after prior version flunked

The artificial intelligence known as ChatGPT, after failing the CPA exam and similar accounting certification exams, can now take off the dunce cap.

ChatGPT-4 recently passed the exams with an average score of 85.1%, outsmarting its earlier version, ChatGPT-3.5, which flunked with a 53.1% average, according to a recently released study by researchers at four universities.

The newest version of the generative AI, created by Microsoft-backed OpenAI, also nailed the other exams that stumped its precursor, including the tests for the Certified Management Accountant, Certified Internal Auditor and enrolled agent credentials, the researchers found.

“This suggests that computer programmers will likely be able to program apps that will allow computers to perform tasks that are currently performed by humans,” the researchers said. ChatGPT “likely will prove disruptive to the accounting and auditing industries.”

Accountants are not the only professionals who might be nudged out of a job by AI.

Large language models such as ChatGPT also imperil jobs for financial quantitative analysts, blockchain engineers, interpreters, mathematicians, journalists and workers in dozens of other lines of work, according to a study by researchers at the University of Pennsylvania and OpenAI.

Over time, the technology will affect at least 10% of the tasks performed by 80% of workers, and at least half of the tasks done by 19% of workers, the researchers said.

Already, AI has brought big advances to accounting, the accounting researchers said. It has improved audit efficiency, internal and external audit quality, the accuracy of management forecasts, the timeliness of earnings announcements and the precision of earnings forecasts, they said, citing other studies.

Several companies have seized on the new technology. EY said in March that it had created a ChatGPT program to answer questions about payroll. PwC has deployed a platform that uses natural language processing, machine learning and data analytics to improve its legal work, the company said in March. It plans to invest $1 billion over three years in ChatGPT and other AI capabilities.

Self-criticism

The makers of ChatGPT-4 and similar programs, including Google's Bard and Microsoft's Bing Chat, acknowledge their flaws, including occasional factual mistakes and lapses into "hallucination," or complete fabrication.

OpenAI said in March that ChatGPT-4's accuracy in business applications is less than 80%. Google says in a disclaimer that "Bard is experimental, and some of its responses may be inaccurate, so double-check information in Bard's responses."

Indeed, generative AI apps that mimic accountants are far from replacing their human counterparts, researchers in the accounting study predicted.

“I can think of few professions that have disappeared because of a new technology,” David Wood, an accounting professor at Brigham Young University, said in an email response to questions.

“People predicted the end of accountants because of the computer, then because of spreadsheets, ERP systems, blockchain, etc. — it didn’t happen,” he said. “More often, you see professions change and adapt because of technology.”

ChatGPT will alter what accountants do but, in the short term at least, not cause net job loss, he predicted.

AI and other automation will probably take over many of the human tasks focused on compliance and free up accountants to focus on higher-value work such as forecasting and advisory services, Wood said.

“If ChatGPT and other technologies can remove the ‘boring’ stuff, everyone is better off,” Wood said. “The job gets done more efficiently and effectively and the worker can then focus on the more enjoyable parts of their work.”

At the same time, “there will certainly be an uncomfortable disruptive phase as people adjust to what technology can do,” he said. “But I think after the adjustment people will be happy with the change.”

ChatGPT’s success in learning from its CPA exam failure and mastering basic knowledge highlights how it follows a human-like educational path, according to Hamid Vakilzadeh, an accounting professor at the University of Wisconsin-Whitewater.

“There are so many similarities between AI and humans,” he said in an email response to questions. “For example, understanding context is crucial.”

Also, "both humans and AI learn from mistakes," he said. Just as humans benefit from feedback, computer programmers use techniques such as reinforcement learning from human feedback (RLHF) to improve generative AI.

Finally, "access to proper tools and resources such as a calculator is important," and it partly explains ChatGPT's improved exam performance, Vakilzadeh said. "The message for human candidates [taking an accounting credentialing exam] is don't do all the calculations in your head!"