Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference
www.gizmoweek.com - 116 points - 83 comments - 531953 seconds ago
Comments (19)
- temp7000 - 507199 seconds ago
Is it me, or does the article sound like LLM output?
The pattern "It's not mere X — it's Y" occurs like 4 times in the text :v
- mfro - 503329 seconds ago
Strangely, it's super fast on my 16 Plus, but with longer messages it can slow down a LOT, and not because of thermal throttling. I wish I could see some diagnostic data.
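One plausible (unconfirmed) explanation for the longer-message slowdown: during decoding, each new token attends over the entire KV cache, so per-token cost grows with context length. A toy model of that effect, with made-up illustrative constants, not measured iPhone numbers:

```python
def decode_time_s(reply_tokens, prompt_tokens,
                  base_s=0.06, per_ctx_s=2e-5):
    """Toy model: each decoded token costs a fixed base time plus a
    term proportional to the current context length (KV-cache size).
    base_s and per_ctx_s are hypothetical constants for illustration."""
    total = 0.0
    ctx = prompt_tokens
    for _ in range(reply_tokens):
        total += base_s + per_ctx_s * ctx
        ctx += 1  # the cache grows by one entry per generated token
    return total

# Average time per token rises with context, so long chats feel slower
# even without thermal throttling.
short = decode_time_s(reply_tokens=128, prompt_tokens=64)
long_chat = decode_time_s(reply_tokens=1024, prompt_tokens=2048)
print(long_chat / 1024 > short / 128)  # → True
```

Under this model the slowdown is gradual and proportional to conversation length, which would match the behavior described above.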
- codybontecou - 512654 seconds ago
Unfortunately, Apple appears to be blocking the use of these LLMs within apps on the App Store. I've been trying to ship an app that contains local LLMs and have hit a brick wall with guideline 2.5.2.
- karimf - 512616 seconds ago
Related: Gemma 4 on iPhone (254 comments) - https://news.ycombinator.com/item?id=47652561
- conception - 506175 seconds ago
I'm pretty excited about the Edge Gallery iOS app with Gemma 4 on it, but it seems like they hobbled it: no access to intents, and you have to write custom plugins for web search, etc. Does anyone have a favorite way to run these usefully? ChatMCP works pretty well but only supports models via API.
- Chrisszz - 506440 seconds ago
I just installed Google AI Edge Gallery on my iPhone 16 Pro. Results of the first benchmark on GPU (prefill tokens = 256, decode tokens = 256, 3 runs): prefill speed 231 t/s, decode speed 16 t/s, time to first token 1.16 s, first init time 20 s.
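From the figures above you can do a back-of-envelope estimate of end-to-end reply latency (a sketch only; it assumes throughput stays constant over the whole generation, which other comments in this thread suggest it may not):

```python
def reply_latency_s(prompt_tokens, reply_tokens,
                    prefill_tps=231.0, decode_tps=16.0):
    """Estimate seconds until a reply finishes: prefill the prompt at
    prefill_tps, then decode the reply token by token at decode_tps.
    Default speeds are the benchmark numbers quoted above."""
    return prompt_tokens / prefill_tps + reply_tokens / decode_tps

# e.g. a 256-token prompt followed by a 256-token answer
print(round(reply_latency_s(256, 256), 1))  # → 17.1
```

At 16 t/s, decoding dominates: the 256-token prefill takes about a second, while the 256-token answer takes about 16 seconds.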
- DoctorOetker - 504355 seconds ago
Does anyone know of a decent but low-memory or low-parameter-count multilingual model (as many languages as possible) that can faithfully produce a detailed IPA transcription, given a word in a sentence in some language?
I want to test a hypothesis for "uploading" neural-network knowledge to a user's brain via a reaction-speed game.
- usmanshaikh06 - 511320 seconds ago
ESET is blocking this site, saying:
> Threat found. This web page may contain dangerous content that can provide remote access to an infected device, leak sensitive data from the device, or harm the targeted device. Threat: JS/Agent.RDW trojan
- bearjaws - 507110 seconds ago
Would love to see a showdown of performance on iPhone vs. Google's Tensor G5, which in my experience is two full generations behind performance-wise.
- the_inspector - 505986 seconds ago
You are referring to the edge models, right? E2B and E4B, not the bigger ones (26B, 31B)...
- mistic92 - 515533 seconds ago
It runs on Android too, with AICore or even with llama.cpp.
- grimmai143 - 505400 seconds ago
Do you know of a way to run these models on Android? Also, what does the thermal throttling look like?
- pabs3 - 512694 seconds ago
> edge AI deployment
Isn't the "edge" meant to be computing near the user, but not on their devices?
- logicallee - 511688 seconds ago
For those who would like an example of its output: I'm currently working on a small, free (CC0, public-domain) encyclopedia (just a couple of thousand entries) of core concepts in biology and health sciences, physical sciences, and technology. Each entry is written entirely by Gemma 4:e4b (the 10 GB model). I believe this may be slightly larger than the model that runs locally on phones, so this one may be slightly better, but the output is similar. Here is an example entry:
Seems pretty good to me!
- ValleZ - 511187 seconds ago
There are many apps to run local LLMs on both iOS and Android.
- bossyTeacher - 514548 seconds ago
Is the output coherent, though? I have yet to see a local model running on consumer-grade hardware that is actually useful.
- andsoitis - 529533 seconds ago
Is there a comparison of it running on iPhone vs. Android phones?
- camillomiller - 513312 seconds ago
[flagged]
- grimm7000 - 505852 seconds ago
[dead]