Are AI Companies Working on the Right Things?
I will preface this post by saying I know exactly zero about AI companies and what they are working on. But I wonder if they are working on the right things.
First, a digression. Anyone who is more than a casual user of Microsoft Word understands that there are fundamental bugs in the core of the program that have existed since almost the very first version and have never been fixed in almost 30 years. Two that come immediately to mind are the difficulty of getting images to stay where you put them and the absolutely terrible structured outlining (e.g., section II-B-iv-2-a). The former is so bad you can find a zillion memes about it. The latter is so bad that WordPerfect still survives by focusing on lawyers who write a lot of documents with hierarchical numbering.
Everyone knows these problems exist. Presumably they are fixable with some amount of effort. But they are not fixed. Instead, release after release of Word trumpets new niche functionality without ever addressing the core functionality. I can't remember ever using a feature of Word that was added after 2005, and maybe earlier, and yet adding those new features is what consumes all the development time.
My fear is that AI companies are doing the same thing. New features and capabilities of the major AI models are impressive. But at their core, at least for researching and writing, they still have the critical, fatal flaw of hallucinations. Almost every day we can watch some law firm get reprimanded by a judge for submitting briefs that include fake, made-up, hallucinated cases.
I don't care how capable and human-sounding these AI models are; if they are inserting reputation-destroying hallucinations into a firm's output, or writing in an identifiable AI style, they are worse than useless. And companies that say "Oh, we don't use AI" are fooling themselves, because even the best and brightest kids they are hiring have become habituated to using AI to finish research and writing assignments. A young woman I know who manages case teams for one of the big strategy consultancies (I won't give the name, but think McKinsey, BCG, Bain, etc.) says that a huge part of her job as engagement manager is to stop AI-generated slop with obvious errors and a recognizable AI writing style from getting to the client. Her case team keeps handing her things that at best are obviously AI prose and at worst contain errors. Interestingly, she checks all this stuff not because she was assigned to do it, but because she grew up on the AI/non-AI temporal border and sees the risks. I have a bet online that one of these firms will be caught up in a public scandal and lawsuit in 2026 for turning in AI-generated client presentations while billing that client seven figures a month (imagine the explosion when a CEO finds out they were paying $1 million a month for the output of a few ChatGPT prompts).
The problem is actually bad enough that I briefly considered starting a new firm whose sole job would be to independently review, fact-check, and edit all of a firm's output to help them identify hallucinations and AI tells. You could probably hire 100 of the older generation of Washington Post layoffs right now who have actual reporting, editing, and fact-checking experience (avoid the younger ones who grew up in the journalism-as-advocacy era). Go out and sell your services to law firms and consultants and such. Gotta be a business there. Right now I am too newly retired to pursue it, so I will leave the idea to you guys. You're welcome.
Obviously, nothing about what I describe above sounds like the employment apocalypse everyone is expecting. You are simply not going to see the promised productivity gains until AI puts its house in order, and in my mind that would include transparency about hallucinations -- what are the rates, what has been done to fix them in this version, are the rates going down, etc.
Not one but two topics near and dear to my heart.
AI is incredibly empowering, because it enables people with no skill or experience to produce results that look passable at first glance. In that sense, it really is increasing productivity 100-fold, easily.
It is also a great tool for very experienced hands to get something started quickly - people who know when to reject suggestions or reframe tasks because they've seen the consequences of sloppiness before. I stress "to get something started", because as soon as we have a complex project with lots of interdependencies, the constraints on current AI (the size of the context window, mostly, TBH) render the poor things dysfunctional. They still have utility in making scoped changes to such a project, but at that point the productivity difference between "I am just doing this myself" and "I patiently coax an LLM to the correct solution" becomes minimal.
All of this works well in areas where the correctness and quality of outcomes don't matter. As your acquaintance is well aware, not all areas are like that, and wrangling an expertly bullshat, handwavy solution back to "we can ship this without reputational damage" is quite a bit more costly than just doing it the right way from scratch. Doubtless, we're going to see that lesson play out very publicly over the course of the next decade.
Now, Microsoft, Word, and longstanding bugs. To understand the process that leads to such outcomes, one has to understand how the place is structured internally: you've got the engineers, who are in charge of the codebase and of implementing what's handed down to them, and you've got product and program management, who are in charge of making sure that engineers don't waste time fixing bugs but instead focus on the grand vision and the amazing new features required to get everyone into the brightest of futures. The latter role was originally supposed to support engineering, but it quickly morphed into guiding and constraining engineering activity to align with leadership. Hardly surprising, if you put extroverted, driven theater majors in charge of introverted, detail-oriented engineers, but, well, here we are.

You get such great arguments as "we've shipped it before, and sell massive amounts of product, so clearly it can't be such an issue", "there is no need to fix this bug because a workaround exists", and "if we fixed this now, we'd break all the beloved workflows of our existing users". All of these are delivered with a smile and an unironic line about how incredibly, 100% customer-focused this approach is, and they leave a whole lot of frustrated customers and angry engineers in their wake. To improve morale, engineers are then put on calls with real-life customers so that the two get to talk to each other.