Blown Away
It’s been an amazing month for AI. What happened?
Between vacation, end-of-year projects, the coming holidays, and other hysteria, I haven’t come up with an article this month. So here’s a quick list of things that have amazed me recently.
Are we virtual yet?
I’m far from the first person to find NotebookLM amazing, and I certainly won’t be the last. I did a simple experiment: I pointed it at two of my recent posts, “Think Better” and “Henry Ford Does AI.” Both the summary and suggested questions NotebookLM provided were quite good: They went beyond simply commenting on the two pieces and got into the relationship between the two. But what blew me away was the podcast it generated: an eight-minute discussion between two synthetic people who sounded interested and engaged. (Here’s a description of some of the techniques Google puts to use to make it happen.) Was it 100% correct? No, but honestly, if a human summarized my articles, I’d probably find a few things to complain about.
This being Google, the user interface turned out to be more than a bit clunky once the initial experience was over. When I wanted to go back to the podcast a few days later, I had to play “guess what to click” way too much. (Hint: Would you guess that you need to click on “Notebook Guide”? Why doesn’t the podcast player appear by default?) But that’s really a very minor problem.
Models using computers
Anthropic’s computer use API is now available in beta. Beta is right—there’s clearly a lot going on here that’s dangerous and easily abusable. But it’s also a lot of fun, and it points toward a new direction for AI development.
In essence (and I may have the essence wrong), computer use lets Claude operate a computer on your behalf: browsers, editors, shells, anything that can have a user interface on a screen (and possibly more). Anthropic provides a demo as a Docker container, so you can run it in relative isolation from the rest of your machine. Once the container is running, you can give Claude a problem to solve; it will figure out how to solve that problem, and use the container’s virtual Linux computer to do the work. For example, you could ask it to fill out a spreadsheet with data it collects from websites. Claude will do all the clicking, copying, and pasting for you.
Is this revolutionary? My first reaction was “Big deal, I can upload a file to GPT and use it to browse the web for me.” In principle that’s true, although ChatGPT doesn’t allow web browsing and file uploading in the same conversation. What’s really new? Think about the monstrous prompt you’d need to get GPT to read a spreadsheet, find out what data was missing, look for that data on the web, and generate a new updated spreadsheet. It wouldn’t be simple. With computer use, most of that complexity disappears.
Does it really disappear? We’ll find out as we get further in. We’re still at the stage where hallucinations and misbehavior are cute rather than critical. It’s easy for Claude to be misled into interpreting something on a random website as a prompt. It will be a field day for prompt injection attacks. And I can imagine plenty of improvements. Computer use currently works by taking screenshots and shipping them to Claude, which computes where to click. That seems incredibly awkward, especially given that many applications have accessibility affordances that might make the screenshotting unnecessary.
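To make the screenshot loop concrete, here is a minimal sketch of the cycle described above: capture the screen, hand it to the model, apply whatever action comes back, repeat. Everything in it is hypothetical. `FakeScreen`, `fake_model`, `Action`, and `run_agent` are illustrative stand-ins I made up, not Anthropic's API, and the "model" is just a scripted lookup so the example runs without any network access.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One step the model asks for (hypothetical shape, not Anthropic's schema)."""
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeScreen:
    """Stand-in for the container's virtual display."""

    def __init__(self):
        self.log = []

    def screenshot(self) -> bytes:
        # A real version would capture pixels; this one just encodes
        # how many actions have been applied so far.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        # A real version would move the mouse or send keystrokes.
        self.log.append(action)


def fake_model(task: str, screenshot: bytes) -> Action:
    """Stand-in for the model call: scripted replies keyed by screenshot."""
    script = {
        b"frame-0": Action("click", x=120, y=48),
        b"frame-1": Action("type", text=task),
    }
    return script.get(screenshot, Action("done"))


def run_agent(task, screen, model, max_steps=10):
    """Screenshot -> model -> action, until the model says it is done."""
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        action = model(task, screen.screenshot())
        if action.kind == "done":
            break
        screen.apply(action)
    return screen.log


if __name__ == "__main__":
    for step in run_agent("fill in the spreadsheet", FakeScreen(), fake_model):
        print(step)
```

Even in a toy like this, the awkwardness shows: every step costs a full round trip of image out and coordinates back, and the `max_steps` cap is there because nothing else stops a misled model from clicking indefinitely.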
For now, relax and take a breath. Don’t use computer use for anything serious yet; it’s important to heed Anthropic’s many warnings. But you should play with it and think about what it means. An automated framework for testing web applications, Selenium++? A tool for negotiating with online vendors? We’re much closer to an agent-filled world where we tell a computer what to do and it does it for us.
Could this be the end of CRM?
Somewhat along the same lines: Sam Lessin posted on Twitter (I won’t call it X) about a very clever and useful hack. He exported many years of email, used GPT to extract key parts, and uploaded it to NotebookLM (yes, again), which allows him to ask questions about his conversations over the past decade. Who did I talk to? Why? What are the topics we talked about? That’s all useful information.
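For a rough feel of the "extract key parts" step, here is a small sketch using only Python's standard `email` module. It is not Sam's pipeline (he used GPT for extraction and NotebookLM for the questions); it swaps in a plain keyword match on subject lines as a crude stand-in. The sample messages, addresses, and the helper names `index_contacts` and `who_discussed` are all made up for illustration.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr

# Made-up sample messages; a real export would be read from mbox or .eml files.
RAW_MESSAGES = [
    "From: Alice Example <alice@example.com>\n"
    "Subject: DDoS chapter outline\n\nDraft attached.",
    "From: Bob Example <bob@example.com>\n"
    "Subject: Lunch next week?\n\nTuesday works.",
    "From: Alice Example <alice@example.com>\n"
    "Subject: Re: contract\n\nSigned.",
]


def index_contacts(raw_messages):
    """Map each sender address to the subject lines of their messages."""
    contacts = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        _, addr = parseaddr(msg.get("From", ""))
        contacts[addr.lower()].append(msg.get("Subject", ""))
    return dict(contacts)


def who_discussed(contacts, keyword):
    """Return senders with at least one subject line containing the keyword."""
    keyword = keyword.lower()
    return sorted(
        addr
        for addr, subjects in contacts.items()
        if any(keyword in subject.lower() for subject in subjects)
    )


if __name__ == "__main__":
    contacts = index_contacts(RAW_MESSAGES)
    print(who_discussed(contacts, "ddos"))
```

A keyword match misses anything phrased differently from the query, which is exactly the gap a language model over the same extracted data is meant to close.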
Sam argues that this is the end of structured customer relationship management (CRM) software. I won’t offer an opinion for investors or founders, but his process resonated with me immediately. I’ve worked with many authors and potential authors over the decades, and my email includes conversations with thousands of people. So when I want to ask a question like “I want to understand more about DDoS; who should I talk to?” my first step is to go to Gmail and start searching. Email is my CRM system; I’ve never used a commercial CRM product.
Unfortunately and ironically, Gmail’s ability to search is quite poor. Using it for contact management, though it can be made to work, isn’t pleasant. Can I just ask NotebookLM? Absolutely.
Email-based CRM might even be a good startup idea, though it’s hard to imagine succeeding long-term. There wouldn’t be much of a “moat” to protect a startup against larger companies—like Google itself. I can easily imagine Google building this kind of AI-enabled search directly into Gmail. They already have all the data.
That’s it for this month. That wasn’t so bad—maybe I should do this more often.