Radar Trends to Watch: January 2025
Developments in Security, Programming, AI, and More
Despite its 31 days, December is a short month. It’s hard for announcements and happenings other than office parties to get attention. Fighting this trend, OpenAI made a series of announcements: their “12 Days of OpenAI.” Not to be upstaged, Google responded with a flurry of announcements, including their Gemini 2.0 Flash Thinking model. Models appeared that could use streaming audio and video for both input and output. But perhaps the most important announcement was DeepSeek-V3, a very large mixture-of-experts model (671B parameters) that has performance on a par with the other top models—but cost roughly 1/10th as much to train.
AI
- DeepSeek-V3 is another LLM to watch. Its performance is on a par with Llama 3.1, GPT-4o, and Claude Sonnet. While training was not inexpensive, its cost was estimated to be roughly 10% of that for the bigger models.
- Not to be outdone by Google, OpenAI previewed its next models: o3 and o3-mini. These are both “reasoning models” that have been trained to solve logical problems. They may be released in late January; OpenAI is looking for safety and security researchers for testing.
- Not to be outdone by 12 Days of OpenAI, Google has released a new experimental model that has been trained to solve logical problems: Gemini 2.0 Flash Thinking. Unlike OpenAI’s GPT models that support reasoning, Flash Thinking shows its chain of thought explicitly.
- Jeremy Howard and his team have released ModernBERT, a major upgrade to the BERT model that Google released six years ago. It comes in two sizes: 139M and 395M parameters. It’s ideal for retrieval, classification, entity extraction, and other components of a data pipeline.
- AWS’s Bedrock service has the ability to check the output of other models for hallucinations.
- To make sure they aren’t outdone by 12 Days of OpenAI, Google has announced Android XR, an operating system for extended reality headsets and glasses. Google doesn’t plan to build their own hardware; they’re partnering with Samsung, Qualcomm, and other manufacturers.
- Also not to be outdone by 12 Days of OpenAI, Anthropic has announced Clio, a privacy-preserving approach to finding out how people use their models. That information will be used to improve Anthropic’s understanding of safety issues and to build more helpful models.
- Not to be outdone by 12 Days of OpenAI, Google has announced Gemini 2.0 Flash, a multimodal model that supports streaming for both input and output. The announcement also showcased Astra, an AI agent for smartphones. Neither is generally available yet.
- OpenAI has released canvas, a new feature that combines programming with writing. Changes to the canvas (code or text) immediately become part of the context. Python code is executed in the browser using Pyodide (Wasm), rather than in a container (as with Code Interpreter).
- Stripe has announced an agent toolkit that lets you build payments into agentic workflows. Stripe recommends using the toolkit in test mode until the application has been thoroughly validated.
- Simon Willison shows how to run a GPT-4 class model (Llama 3.3 70B) on a reasonably well-equipped laptop (64GB MacBook Pro M2).
- As part of their 12 Days of OpenAI series, OpenAI finally released their video generation model, Sora. It’s free to ChatGPT Plus subscribers, though limited to 50 five-second video clips per month; a ChatGPT Pro account relaxes many of the limitations.
- Researchers have shown that advanced AI models, including Claude 3 Opus and OpenAI o1, are capable of “scheming”: working against the interests of their users to achieve their goals. Scheming includes subverting oversight mechanisms, intentionally delivering subpar results, and even taking steps to prevent shutdown or replacement. Hello, HAL?
- Roaming RAG is a new technique for retrieval-augmented generation that finds relevant content by searching through headings to navigate documents, as a human might. It requires well-structured documents. A surprisingly simple idea, really.
- Google has announced PaliGemma 2, a new version of its Gemma models that incorporates vision.
- OpenAI’s o1-preview is no more; the preview is now the real thing, OpenAI o1. In addition to advanced reasoning skills, the production release claims to be faster and to deliver more consistent results.
- A group of AI agents in Minecraft behaved surprisingly like humans—even developing jobs and religions. Is this a way to model how human groups collaborate?
- One thing the AI industry needs desperately (aside from more power) is better benchmarks. Current benchmarks are closed, easily gamed (that’s what AI does), and unreproducible, and they may not test anything meaningful. Better Bench is a framework for assessing benchmark quality.
- Palmyra Creative, a new language model from Writer, promises the ability to develop “style” so that all AI-generated output won’t sound boringly the same.
- During training, AI picks up biases from human data. When humans interact with the AI, there’s a feedback loop that amplifies those biases.
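The heading-based navigation described in the Roaming RAG item can be sketched in a few lines. This is a minimal illustration, not the published implementation: it assumes Markdown-style `#` headings, and the helper names (`build_outline`, `table_of_contents`, `open_section`) are hypothetical. In a real system, a language model would read the outline and choose which heading to open; here that choice is made by hand.

```python
import re

def build_outline(markdown_text):
    """Split a Markdown document into sections keyed by their headings."""
    sections = []
    current = None
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading starts a new section.
            current = {
                "level": len(match.group(1)),
                "heading": match.group(2).strip(),
                "body": [],
            }
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    return sections

def table_of_contents(sections):
    """Render only the headings: the compact view a model would see first."""
    return "\n".join("  " * (s["level"] - 1) + s["heading"] for s in sections)

def open_section(sections, heading):
    """Return the text under one heading (the 'roaming' step), or None."""
    for s in sections:
        if s["heading"].lower() == heading.lower():
            return "\n".join(s["body"]).strip()
    return None

# Hypothetical sample document.
doc = """# Guide
Intro text.
## Installation
Run the installer.
## Configuration
Edit the settings file.
"""

sections = build_outline(doc)
print(table_of_contents(sections))
print(open_section(sections, "configuration"))
```

The sketch also shows why the technique depends on well-structured documents: with missing or misleading headings, the outline gives the model nothing useful to navigate. Note that this simple parser would also treat `#` lines inside code blocks as headings.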
Programming
- Unicon may never become one of the top 20 (or top 100) programming languages, but it’s a descendant of Icon, which was always my favorite language for string processing.
- What do CAPTCHAs mean when LLM-equipped bots can successfully complete tasks set for humans?
- egui, together with eframe, is a GUI library and framework for Rust. It’s portable and runs natively (on macOS, Windows, Linux, and Android), on the web (using Wasm), and in many game engines.
- For the archivist in us: The Manx project isn’t about an island in the Irish Sea or about cats. It’s a catalog of manuals for old computers.
- Cerbrec is a graphical Python framework for deep learning. It’s aimed at Python programmers who don’t have sufficient expertise to build applications with PyTorch or other AI libraries.
- GitHub has announced free access to GitHub Copilot for all current and new users. Free access gives you 2,000 code completions and 50 chat messages per month. They’ve also added the ability to use Claude 3.5 Sonnet in addition to GPT-4o.
- Devin, the AI-assisted coding tool that claims to support software development from beginning to end, including design and debugging, has reached general availability.
- JSON5, also known as “JSON for humans,” is a variant of JSON that has been designed for human readability so that it can be written and maintained by hand—for example, in configuration files.
- AWS has announced two significant new services: Aurora DSQL, which is a distributed SQL database, and S3 Tables, which supports data lakehouses through Apache Iceberg.
- Autoflow is an open source tool for creating a knowledge graph. It’s based on TiDB (a distributed SQL database with vector search), LlamaIndex, and DSPy.
Security
- Portspoof is a security tool that causes all 65,535 TCP ports to appear open for valid services. It emulates a valid service on every port. It makes it difficult for an attacker to determine which ports are actually open without probing each port.
- Let’s Encrypt, which issues the certificates that websites (and other applications) use to prove their identities, has announced short-lived certificates that expire after six days. Short-lived certificates increase security by minimizing exposure if a private key is compromised.
- Because of the continued presence of attackers within telecommunications networks, the US FBI and CISA have recommended the use of encrypted communications protocols. (Though they still want backdoors into encryption systems, which would make those systems vulnerable to attack.)
- A new phishing attack uses corrupted Word documents to bypass security checks. While the documents are corrupt, Word is able to recover them.
- LLM Flowbreaking is a new class of attack against language models that prevents guardrails from stopping objectionable output from reaching the user. These attacks take advantage of race conditions in the application’s interaction with users.
- Bootkitty is a UEFI rootkit that targets secure boot on Ubuntu systems. It appears to have been developed by cybersecurity students in Korea, then leaked (possibly accidentally). It hasn’t yet been found in the wild, but when it is, it will be a dangerous threat.
- DEF CON has started a project to improve cybersecurity for water infrastructure in the US. They’re starting with six water companies serving rural communities.
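The reasoning behind short-lived certificates in the Let’s Encrypt item comes down to simple arithmetic. The sketch below assumes the worst case, in which a stolen private key is never revoked and stays usable until the certificate expires. The function name `exposure_window` and the dates are hypothetical; the 90-day figure is the lifetime of Let’s Encrypt’s standard certificates, used here for comparison.

```python
from datetime import datetime, timedelta, timezone

def exposure_window(issued_at, lifetime_days, compromised_at):
    """Time a stolen key stays usable, assuming no revocation:
    from the moment of compromise until the certificate expires."""
    expires_at = issued_at + timedelta(days=lifetime_days)
    return max(expires_at - compromised_at, timedelta(0))

# Hypothetical scenario: the key is stolen one day after issuance.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
stolen = issued + timedelta(days=1)

for lifetime in (90, 6):
    window = exposure_window(issued, lifetime, stolen)
    print(f"{lifetime}-day certificate: exposed for {window.days} days")
```

The tradeoff is operational: a shorter lifetime means more frequent renewal, so it is only practical when renewal is automated.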
Quantum Computing
- Google has built a quantum computing chip in which an error-corrected logical qubit can remain stable for an hour. It operates “below threshold”: the error rate decreases as physical qubits are added for error correction. The chip was built in Google’s new fabrication facility.
Web
- Google is adding “store reviews” to Chrome. Reviews are AI-generated summaries of reports from well-known sources that report scams and other issues.
- Here’s a how-to on building streaming text user interfaces on the web. Streaming text is almost a necessity for building AI-driven chatbots.
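The core pattern behind a streaming text interface is easy to show outside the browser. This is a minimal sketch, not the approach from the linked how-to: `fake_token_stream` is a hypothetical stand-in for a model’s streamed response, and `render_stream` plays the role of the UI, which redraws with the accumulated text each time a chunk arrives.

```python
import time

def fake_token_stream(text, delay=0.0):
    """Hypothetical stand-in for a model API that yields text in small chunks."""
    for word in text.split(" "):
        time.sleep(delay)  # simulate network or generation latency
        yield word + " "

def render_stream(chunks, on_update):
    """Accumulate chunks and notify the UI after each one arrives."""
    shown = ""
    for chunk in chunks:
        shown += chunk
        on_update(shown)  # a web UI would update the DOM here
    return shown.rstrip()

frames = []  # records every intermediate state the user would see
final = render_stream(
    fake_token_stream("Streaming text arrives in pieces"),
    frames.append,
)
print(len(frames), "updates; final text:", final)
```

The user sees the first words immediately instead of waiting for the whole response, which is why streaming matters for chatbots whose replies can take many seconds to generate.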
Biology
- Yes, we can have virtual taste. A research group has developed a lollipop interface so that people can experience taste in virtual worlds.