Internet of Bugs Newsletter
Updates: Devin Disappointment, DeepSeek Detail & Defensive Duplication
Several stories have popped up lately that relate to past videos but don't warrant a dedicated video of their own. There's also some material from my DeepSeek video that I cut from the script (not so much for time as for flow).
First off, a follow-up to my Devin video:
Two different groups (that I’ve seen) have published articles about their experience (and displeasure) with Devin, now that they’ve used it (and paid for it) for a month:
Read for yourself, but so far, few people seem impressed.
To be perfectly honest, I’m surprised by how poorly it seems to be doing, just as I was surprised when I dug into their Upwork Demo video that the code Devin was “debugging” was code it wrote itself. It seemed perfectly reasonable to me that an LLM ought to be able to debug actual code, but so far, I haven’t heard of one that does it very well.
Dive Into DeepSeek:
I love Dr Mike Pound’s videos, and this one was no exception. If you’re interested in what’s under DeepSeek’s hood, I can’t recommend this video highly enough. I ended up cutting a discussion of it from my DeepSeek video, because it just didn’t fit the narrative flow. I’m happy to have a place now to point people to resources. (In the past, I’ve put them in the video descriptions, but it doesn’t look like people really read those all that much.)
Replicating DeepSeek:
Two groups have replicated parts of DeepSeek, and have published their results:
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works
Through RL, the 3B base LM develops self-verification and search abilities all on its own
You can experience the Ahah moment yourself for < $30
Code: github.com/Jiayi-Pan/Tiny…
Here's what we learned 🧵 x.com/i/web/status/1…
— Jiayi Pan (@jiayi_pirate)
5:14 PM • Jan 24, 2025
This gives us (or at least me) a lot of confidence that, even if the cost numbers are greatly downplayed, there are real, large cost and time savings in the way DeepSeek was built.
And if you want to hear more about the GPUs that China has that they’re not supposed to be able to get, see this video from Jack over at Nobody Special Finance:
Replicating OpenAI’s Deep Research:
Slightly off topic, but DeepSeek isn’t the only thing that has been replicated recently. Some folks over at Hugging Face managed to make a working copy of OpenAI’s new, vaunted “Deep Research” in 24 hours:
Replication Red Line, Redux:
And, last but not least, there’s another breathless clickbait article about AIs “escaping” into the wild.
In this case, the researchers specifically told the AI to see if it could get another copy of itself running, and it could, between 50% and 90% of the time.
This seems to panic the people who are in the market for comparing LLMs to SkyNet, but for those of us who have been around a while, a program that copies itself onto other machines is called a “worm,” and it dates back to the Morris worm in 1988.
There are a whole bunch of things I worry about when it comes to AI safety, but “escaping into the Internet like Ultron in Avengers 2” is not in my top 100. It makes headlines, though.