An Appllama Week in AI

Meta makes waves with Llama 2, while Bloomberg pumps itself with Apple LLM non-news.
Llama eating apple

This past week brought significant AI news; in fact, it’s feeling like the rate of change in the generative AI space is notching back up again. I’ll focus on two areas here: Llama 2 and the Apple LLM rumors.

Meta AI Releases Llama 2

Meta’s Llama 2 release is a seminal event, given that (a) Meta’s initial LLaMA 1 release spawned AI March Madness, (b) Llama 2 is substantially more capable than the first LLaMA, and (c) this time Meta is allowing, rather than blocking, commercial use. That is, unless you’re Apple, Snap, TikTok, or one of a handful of others—more on this below.

Last time around, Meta announced LLaMA but made the full details—most importantly, the model weights—available only to a limited set of researchers. Someone leaked those weights within days of the announcement, and that led to an explosion of innovation among independent researcher-hobbyists. This time, I suppose, Meta learned the lesson and preempted any leak by releasing the weights openly from the start.

Another innovation explosion appears to be underway; Llama 2-derived models are already near the top of the HuggingFace Open LLM Leaderboard, and new Llama stories are still on the front page of Hacker News, such as this Llama 2 pure-C implementation that will run on laptops, including Apple M1-series MacBooks.
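If you want to poke at one of those Llama 2 derivatives yourself, here’s a minimal sketch using the Hugging Face transformers library. The specific model ID, prompt, and generation settings are illustrative assumptions, and pulling the official weights requires accepting Meta’s license on the Hugging Face hub first (plus the accelerate package for device_map="auto").

```python
# Minimal sketch: running a Llama 2 chat model via Hugging Face transformers.
# Assumptions: the meta-llama/Llama-2-7b-chat-hf model ID (any Llama 2 derivative
# would work), enough RAM/VRAM for a 7B model, Meta's license accepted on the hub,
# and the `accelerate` package installed so device_map="auto" can place the weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "In one sentence, why are llamas suddenly everywhere in AI?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a short completion; the sampling settings here are arbitrary defaults.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```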

Meta releasing Llama 2 fairly openly is a very good thing for open LLM research and progress, but don’t be fooled into thinking Meta is doing something altruistic here. Meta is looking out for Mark and Meta only—this is a commoditize-your-competition play. But still a very good thing.

Bloomberg’s Mark Gurman “Scoops” Apple GPT aka Ajax

Mark Gurman on Wednesday “broke” the story that Apple is quietly active in the LLM space and working on a GPT-style chatbot. I use scare quotes around “scoop” and “broke” because anyone with any historical Apple knowledge knows that Apple always has a team working secretly on everything that Apple should obviously be working on, obviously. And there’s nothing in the universe more obvious than this one.

John Gruber, in his commentary on the Gurman piece, educated me on the Bloomberg News / Bloomberg Terminal pump-scam, which I hadn’t been aware of:

Apple’s brief 2.7 percent jump and Microsoft’s smaller but still-significant drop, both at 12:04pm, were clearly caused by Gurman’s report. Bloomberg Terminal subscribers get such reports before anyone else. (Bloomberg employees, of course, know such information before it’s published, but I’m sure never do anything untoward with it.) Once you view Bloomberg’s original reporting through this prism — that most of their original reporting is delivered with the goal of moving the stock prices of the companies they’re reporting on, for the purpose of proving the value of a Bloomberg Terminal’s hefty subscription fee to day-trading gamblers — a lot of their seemingly inexplicable stylistic quirks don’t seem so inexplicable any more. They just seem a little gross.

Yeah, that is SlimyWorld all the way. Not surprising, though; the last eight-odd years have lifted the curtain on a lot of formerly hidden beliefs and behaviors. This fits right in, and is brought to us courtesy of the same fabulous crowd.

Anyway, Gurman’s article didn’t contain much actual news: just that the project’s codename might be Ajax and that some people are calling the chat interface “Apple GPT.”

Intersection: Apple LLM Chatter on Hacker News

Big news like Llama 2 gets posted to YCombinator’s Hacker News immediately, and Llama 2 sat near the top of the activity-ranked feed for several days, with 819 comments. Meta released Llama 2 under a license that’s much more commercially friendly than the original LLaMA’s; let’s call it commercial-friendly-unless-you’re-huge, because it contains a clause requiring a separate license from Meta only if the organization using Llama 2 has more than 700 million monthly active users. That constraint affects a very short list of companies, Apple among them.

That led to chatter on Hacker News around Apple’s place in the LLM / GPT space. Here’s a sampling:

(stu2b50) I think more Apple [that this constraint blocks]. It’s not like Google or Microsoft would want to use LLaMA when they have fully capable models themselves. I wouldn’t be surprised if Amazon does as well …Apple is the big laggard in terms of big tech and complex neural network models.

(samwillis) Apple would absolutely not want to use a competitors, or any other, public LLM. They want to own the whole stack, and will want to have their own secret source as part of it. It’s not like they don’t have the capital to invest in training…

(NotAFood) Apple has shown time and time again that they have the human capital and money to tackle massive projects discretely. It’s already fairly well known that Apple’s NLP experts from Siri have been reallocated to some secret project. They are more than capable of training an LLM but given their track record in other segments they probably want to wait for the technology to become more “polished” and give less hallucinated answers. They are likely also want the LLM to work locally (at least partially) on their devices using the Neural Engine which adds further engineering complexity to their project. They could even be timing the LLM’s launch around a hardware release capable of running the model (M3, M4, etc…).

(amelius) Apple only has to slightly open their wallet to become a DL superpower.

(yellow_postit) Apple is a complete laggard in this space due to years of restrictions on research. They are hiring multiple “AI” roles now and they have the capital and focus to “eventually” catch up — but it is very much a catch-up game … That said, they seem to prefer catchup waiting till others explore new tech they swoop in an (claim) to perfect it from a usability pov. I have no reason to suspect they won’t do the same here.

(whimsicalism) I work in the field and they just are not hiring the people they need to be hiring.

My Take on Apple in LLM/GPT Land

  • Of course Apple is working on something LLM.
    • This doesn’t mean the something will ever see the light of day.
    • We won’t hear about it until it’s polished and ready.
  • Despite massive support for, and applications of, machine learning throughout their software+hardware ecosystem, Apple was blindsided by the generative AI explosion and is playing catch-up.
    • The comments around Apple not hiring the people they need to be hiring are probably a fair assessment, a symptom of Apple’s blindsided-catching-up current state.
  • Apple is uniquely positioned to do some really good things in this space, once they get their shit together.
  • Apple will indeed get their LLM shit together to some reasonable degree—in Apple Time.
  • Apple Time moves far slower than AI Time. Apple might get pinched, except where they’re uniquely positioned.