Kotlin creator's new language: a formal way to talk to LLMs instead of English
- lifis - 16718 seconds ago
As far as I can tell it's not a new language, but rather an alternative workflow for LLM-based development, along with a tool that implements it.
The idea, IIUC, seems to be that instead of directly telling an LLM agent how to change the code, you keep markdown "spec" files describing what the code does and then the "codespeak" tool runs a diff on the spec files and tells the agent to make those changes; then you check the code and commit both updated specs and code.
It has the advantage that the prompts are all saved along with the source rather than lost, and in a format that lets you also look at the whole current specification.
The limitation seems to be that you can't modify the code yourself if you want the spec to reflect it (and you also can't do LLM-driven changes that refer to the actual code). In general it's also not guaranteed that the spec actually reflects all important things about the program, so the code potentially contains "source" information too (for example, maybe you want the background of a GUI to be white, and it is so only because the LLM happened to choose that, but it's not written in the spec).
The latter can maybe be mitigated by doing multiple generations and checking them all, but that multiplies LLM and verification costs.
Also it seems that the tool severely limits the configurability of the agentic generation process, although that's just a limitation of the specific tool.
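The spec-diff loop described above can be sketched in a few lines (a hypothetical illustration based on my reading, not the actual codespeak implementation; the function name and instruction wording are my own):

```python
import difflib

def spec_diff_prompt(old_spec: str, new_spec: str) -> str:
    """Turn a change between two versions of a spec file into an
    instruction for a coding agent."""
    diff = "\n".join(difflib.unified_diff(
        old_spec.splitlines(), new_spec.splitlines(),
        fromfile="committed", tofile="working", lineterm=""))
    if not diff:
        return ""  # spec unchanged: nothing for the agent to do
    return ("The spec changed as follows; update the code to match, "
            "changing nothing else:\n" + diff)
```

The agent sees only the delta, which is what keeps prompts small and reviewable; checking in both the updated spec and the resulting code is what makes the history reproducible.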
- oofbaroomf - 2817 seconds ago
Ugh, I just wish there was a deterministic and formal way to tell a computer what I want...
- the_duke - 17810 seconds ago
This doesn't make too much sense to me.
* This isn't a language, it's some tooling to map specs to code and re-generate
* Models aren't deterministic - every time you tried to re-apply a spec you'd likely get different output (unless you feed the current code into the re-apply step and let it just recommend changes)
* Models are evolving rapidly - this month's flavour of Codex/Sonnet/etc. would very likely generate different code from last month's
* Text specifications are always under-specified, lossy and tend to gloss over a huge amount of details that the code has to make concrete - this is fine in a small example, but in a larger code base?
* Every non-trivial codebase would be made up of hundreds of specs that interact and influence each other - very hard (and context-heavy) to read all the specs that affect a piece of functionality and keep it coherent
I do think there are opportunities in this space, but what I'd like to see is:
* write text specifications
* model transforms text into a *formal* specification
* then the formal spec is translated into code which can be verified against the spec
Steps 2 and 3 could be merged into one if there were practical/popular languages that also support verification, in the vein of Ada/SPARK.
But you can also get there by generating tests from the formal specification that validate the implementation.
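That last step - validating an implementation against a formal spec via generated tests - could look something like this minimal sketch (all names here are hypothetical; the spec is expressed as pre/postconditions):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Spec:
    precondition: Callable[..., bool]   # which inputs the spec covers
    postcondition: Callable[..., bool]  # takes (*inputs, result)

def verify(impl, spec, cases):
    """Run an implementation against a spec over sample inputs,
    returning every (input, output) pair that violates it."""
    failures = []
    for args in cases:
        if not spec.precondition(*args):
            continue  # input outside the spec's domain
        result = impl(*args)
        if not spec.postcondition(*args, result):
            failures.append((args, result))
    return failures

# Example spec: integer square root.
isqrt_spec = Spec(
    precondition=lambda n: n >= 0,
    postcondition=lambda n, r: r * r <= n < (r + 1) * (r + 1),
)

def generated_isqrt(n):  # stand-in for model-generated code
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

A spec-driven toolchain could then regenerate only the `impl` side and re-run `verify` after every change, which is exactly the loop a verifying language would give you for free.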
- kleiba - 17899 seconds ago
I cannot read light on black. I don't know, maybe it's a condition, or simply part of getting old. But my eyes physically hurt, and when I look up from a light-on-black screen, even after looking at it for only a short moment, my eyes need seconds to adjust again.
I know dark mode is really popular with the youngens but I regularly have to reach for reader mode for dark web pages, or else I simply cannot stand reading the contents.
Unfortunately, this site does not have an obvious way of reading it black-on-white, short of looking at the HTML source (CTRL+U), which - in fact - I sometimes do.
- niam - 5353 seconds ago
The title writer might be doing the project a disservice by using the term "formal" to describe it, given that the project talks a lot about "specs". I mistook it to imply something about formal specification.
My quick understanding is that it isn't really trying to utilize any formal specification, but is instead trying to map more clearly the relationship between, say, an individual human-language requirement you have of your application and the code which implements that requirement.
- le-mark - 16800 seconds ago
This concept assumes a formalized language would somehow make things easier for an LLM. That's making some big assumptions about the neuroanatomy of LLMs. This [1] from the other day suggests surprising things about how LLMs are internally structured; specifically, that encoding and decoding are distinct phases with other stuff in between, suggesting that language, once trained, isn't that important.
- temp123789246 - 4736 seconds ago
One requirement for a programming language to be "good" is that doing this, with sufficient specificity to get all the behavior you want, will be more verbose than the code itself.
- tonipotato - 17975 seconds ago
The problem with formal prompting languages is they assume the bottleneck is ambiguity in the prompt. In my experience building agents, the bottleneck is actually the model's context understanding. The same precise prompt gives wildly different results depending on what else is in the context window. Formalizing the prompt doesn't help if the model builds the wrong internal representation of your codebase. That said, I'm curious to see where this goes.
- seanmcdirmid - 11730 seconds ago
I've done something similar for queries. Comments:
* Yes, this is a language; no, it's not a programming language you are used to, but a restricted/embellished natural language that (might) make things easier to express to an LLM, and it provides a framework for humans who want to write specifications to get the AI to write code.
* Models aren't deterministic, but they are persistent (never gonna give up!). If you generate tests from your specification as well as code, you can use differential testing to get some measure (although not perfect) of correctness. Never delete the code that was generated before, if you change the spec, have your model fix the existing code rather than generate new code.
* Specifications can actually be analyzed by models to determine if they are fully grounded or not. An ungrounded specification is going to not be a good experience, so ask the model if it thinks your specification is grounded.
* Use something like a build system if you have many specs in your code repository and you need to keep them in sync. Spec changes -> update the tests and code (for example).
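The build-system idea in that last point can be sketched as content-hash staleness tracking, the way build tools skip up-to-date targets (an illustrative sketch; the file names and function are my own, not from any existing tool):

```python
import hashlib
import json
import pathlib

def stale_specs(spec_paths, state_file):
    """Return the specs whose content changed since the last run;
    only those need their code and tests regenerated."""
    state = pathlib.Path(state_file)
    old = json.loads(state.read_text()) if state.exists() else {}
    new, stale = {}, []
    for p in spec_paths:
        digest = hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest()
        new[str(p)] = digest
        if old.get(str(p)) != digest:
            stale.append(p)
    state.write_text(json.dumps(new))  # remember hashes for next run
    return stale
```

Hooking this up so that each stale spec triggers "update the tests and code" gives you the incremental spec → artifact pipeline without re-feeding the whole repository to the model every time.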
- sornaensis - 7034 seconds ago
This seems like a step backwards. Programming languages for LLMs need a lot of built-in guarantees and restrictions. Code should be dense. I don't really know what to make of this project. This looks like it would make everything way worse.
I've had good success getting LLMs to write complicated stuff in Haskell, because at the end of the day I am less worried about a few errant LLM lines of code passing both the type checker and the test suite and causing damage.
It is both amazing and, I guess, also not surprising that most vibe coding is focused on Python and JavaScript, where my experience has been that the models need so much oversight and handholding that it makes them a simple liability.
The ideal programming language is one where a program is nothing but a set of concise, extremely precise, yet composable specifications that the _compiler_ turns into efficient machine code. I don't think English is that programming language.
- BrianFHearn - 7740 seconds ago
Interesting project, but I think it's solving the wrong bottleneck. The gap between what I want and what the model produces isn't primarily a language problem - it's a knowledge problem. You can write the most precise spec imaginable, but if the model doesn't have domain-specific knowledge about your product's edge cases, undocumented behaviors, or the tribal knowledge your team has accumulated, the output will be confidently wrong regardless of how formally you specified it.
I've been working on this from the other direction — instead of formalizing how you talk to the model, structure the knowledge the model has access to. When you actually measure what proportion of your domain knowledge frontier models can produce on their own (we call this the "esoteric knowledge ratio"), it's often only 40-55% for well-documented open source projects. For proprietary products it's even lower. No amount of spec formalism fixes that gap — you need to get the missing knowledge into context.
- pshirshov - 12586 seconds ago
From what I was able to understand during the interview there, it's not actually a language; more like an orchestrator plus pinning of individual generated chunks.
The demo I've briefly seen was very very far from being impressive.
Got rejected, perhaps for some excessive scepticism/overly sharp questions.
My scepticism remains - so far it looks like an orchestrator to me and does not add enough formalism to actually call it a language.
I think that the idea of a more formal approach to assisted coding is viable (think: you define data structures and interfaces but don't write function bodies; they are generated, pinned, and covered by tests automatically, and LLMs can even write TLA+/formal proofs), but I'm kinda sceptical about this particular thing. I think it can be made viable, but I have a strong feeling that it won't be hard to reproduce - I was able to bake something similar in a day with Claude.
- ucyo - 6916 seconds ago
Literally the first example on the main page, declared as code.py, would result in an indentation error :)
- wuweiaxin - 7925 seconds ago
The pattern we keep converging on is to treat model calls like a budgeted distributed system, not like a magical API. The expensive failures usually come from retries, fan-out, and verbose context growth rather than from a single bad prompt. Once we started logging token use per task step and putting hard ceilings on planner depth, costs became much more predictable.
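A minimal version of that ledger-plus-ceiling pattern (an illustrative sketch, not any particular framework's API):

```python
class TokenBudget:
    """Per-task token ledger with a hard ceiling, so retries and
    fan-out cannot silently blow past the budget."""

    def __init__(self, ceiling: int):
        self.ceiling = ceiling
        self.spent = 0
        self.log = []  # (step, tokens) pairs for later cost analysis

    def charge(self, step: str, tokens: int) -> None:
        if self.spent + tokens > self.ceiling:
            raise RuntimeError(
                f"budget exceeded at {step!r}: "
                f"{self.spent + tokens} > {self.ceiling}")
        self.spent += tokens
        self.log.append((step, tokens))
```

Wrapping every model call in `charge()` turns a runaway retry loop into a loud, attributable failure instead of a surprise bill.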
- alexc05 - 16594 seconds ago
This is really exciting and dovetails really closely with the project I'm working on.
I'm writing a language spec for an LLM runner that has the ability to chain prompts and hooks into workflows.
https://github.com/AlexChesser/ail
I'm writing the tool as proof of the spec. Still very much a pre-alpha phase, but I do have a working POC in that I can specify a series of prompts in my YAML language and execute the chain of commands in a local agent.
One of the "key steps" that I plan on designing is specifically an invocation interceptor. My underlying theory is that we would take whatever random series of prose that our human minds come up with and pass it through a prompt refinement engine:
> Clean up the following prompt in order to convert the user's intent
> into a structured prompt optimized for working with an LLM.
> Be sure to follow appropriate modern standards based on current
> prompt engineering research. For example, limit the use of persona
> assignment in order to reduce hallucinations.
> If the user is asking for multiple actions, break the prompt
> into appropriate steps (etc...)
That interceptor would then forward the well structured intent-parsed prompt to the LLM. I could really see a step where we say "take the crap I just said and turn it into CodeSpeak"
What a fantastic tool. I'll definitely do a deep dive into this.
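The interceptor step described above could be sketched like this (hypothetical; `call_model` stands in for any LLM client, and the refinement wording is abbreviated):

```python
REFINE_INSTRUCTIONS = (
    "Clean up the following prompt to convert the user's intent into a "
    "structured prompt optimized for an LLM. If the user asks for "
    "multiple actions, break the prompt into numbered steps.\n\n"
)

def intercept(raw_prompt: str, call_model):
    """Refine raw user prose first, then forward only the refined
    prompt to the model doing the real task."""
    refined = call_model(REFINE_INSTRUCTIONS + raw_prompt)
    return call_model(refined)
```

The same two-pass shape works whether the target dialect is "well-structured English" or something like CodeSpeak - only `REFINE_INSTRUCTIONS` changes.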
- pcblues - 6214 seconds ago
A formal way for a senior to tell AI (a clueless junior) to do a senior's job? Once again, who checks and fixes the output code?
Of course an expert would throw it out and design/write it properly so they know it works.
- sutterd - 11000 seconds ago
I am trying a similar spec-driven development idea in a project I am working on. One big difference is that my specifications are not that formalized. They are in plain language and are read directly by the LLM to convert to code. That seems like the kind of thing the LLM is good at. Another feature of this is that it allows me to nudge the implementation a little with text in the spec outside of the formal requirements. I view it two ways: as spec-to-code, but also as a saved prompt. I haven't spent enough time with it to say how successful it is, yet.
- h4ch1 - 18599 seconds ago
You can basically condense this entire "language" into a set of markdown rules and use it as a skill in your planning pipeline.
And whatever codespeak offers is like a weird VCS wrapper around this. I can already version and diff my skills and plans properly, and following that, my LLM-generated features should be scoped properly and worked on in their own branches. This, imo, will just give people a reason to make huge 8k-10k line changes in a single commit.
- hmokiguess - 8689 seconds ago
I'm gonna be honest here: I opened this website excited, thinking this was some sort of new paradigm or programming language, and I ended up extremely confused at what this actually is. I still don't understand.
Is it a code generator tool from specs? Ugh. Why not push for the development of the protocol itself then?
- etothet - 11591 seconds ago
Under "Prerequisites"[0] I see: "Get an Anthropic API key".
I presume this is temporary since the project is still in alpha, but I'm curious why this requires use of an API at all and what's special about it that it can't leverage injecting the prompt into a Claude Code or other LLM coding tool session.
[0]: https://codespeak.dev/blog/greenfield-project-tutorial-20260...
- riantogo - 6511 seconds ago
When we understand that AI allows the spec to be in English (or any natural language), we might stop attempting to build "structured English" for specs.
- paxys - 8765 seconds ago
I read through the thing and don't quite understand what this adds that the dozens of LLM coding wrappers don't already do.
You write a markdown spec.
The script takes it and feeds it to an LLM API.
The API generates code.
Okay? Where is this "next-generation programming language" they talk about?
- roxolotl - 19278 seconds ago
This doesn't seem particularly formal. I still remain unconvinced that this kind of thing is really going to be valuable. Code obviously is as formal as it gets, but as you trend away from that you quickly introduce problems that arise from lack of formality. I could see a world in which we're all just writing tests in the form of something like Gherkin, though.
- b4rtaz__ - 12563 seconds ago
A few days ago I released https://github.com/b4rtaz/incrmd , which is similar to Codespeak. The main difference is that the specification is defined at the *project* level. I'm not sure if having the specification at the *file* level is a good choice, because the file structure does not necessarily align with the class structure, etc.
- uday_singlr - 12317 seconds ago
We tend to obsess over abstractions, frameworks, and standards, which is a good thing. But we already have BDD and TDD, and now, with English as the new high-level programming language, it is easier than ever to build. Focusing on other critical problem spaces like context/memory is more useful at this point. If the whole purpose of this is token compression, I don't see myself using it.
- mft_ - 17427 seconds ago
Conceptually, this seems a good direction.
The other piece that has always struck me as a huge inefficiency with current usage of LLMs is the hoops they have to jump through to make sense of existing file formats - especially making sense of (or writing) complicated semi-proprietary formats like PDF, DOC(X), PPT(X), etc.
Long-term prediction: for text, we'll move away from these formats and towards alternatives that are designed to be optimal for LLMs to interact with. (This could look like variants of markdown or JSON, but could also be Base64 [0] or something we've not even imagined yet.)
- ppqqrr - 14606 seconds ago
i've been doing this for a while: you create an extra file for every code file, sketch the code as you currently understand it (mostly function signatures and comments to fill in details), and ask the LLM to help identify discrepancies. i call it "overcoding".
i guess you can build a cli toolchain for it, but as a technique it’s a bit early to crystallize into a product imo, i fully expect overcoding to be a standard technique in a few years, it’s the only way i’ve been able to keep up with AI-coded files longer than 1500 lines
- xvedejas - 17872 seconds ago
We already have a language for talking to LLMs: Polish
https://www.zmescience.com/science/news-science/polish-effec...
- gritzko - 20460 seconds ago
So is it basically Markdown? The landing page does not articulate, unfortunately, what the key contribution is.
- montjoy - 14206 seconds ago
So, instead of making LLMs smarter, let's make everything abstract again? Because everyone wants to learn another tool? Or is this supposed to be something I tell Claude: "Hey, make some code to make some code!"? I'm struggling to see the benefit of this vs. just telling Claude to save its plan for re-use.
- herrington_d - 13714 seconds ago
Isn't the case study... too contrived and trivial? The largest code change is 800 lines, so it can readily fit in a model's context.
However, there is no case for more complicated, multi-file changes or architecture stuff.
- WillAdams - 14548 seconds ago
This raises a question --- how well do LLMs understand Loglan?
Or Lojban?
- good-idea - 10241 seconds ago
"Shrink your codebase 5-10x"
"[1] When computing LOC, we strip blank lines and break long lines into many"
- giantg2 - 5165 seconds ago
This is basically what I talked about maybe a year ago. Glad to see someone is taking it on.
- leksak - 13709 seconds ago
I think I prefer Tracey: https://github.com/bearcove/tracey
- frizlab - 11562 seconds ago
The next step will be to formalize all the possible instructions to give to a processor and use that language!
- Cpoll - 16831 seconds ago
> The spec is the source of truth
This feels wrong, as the spec doesn't consistently generate the same output.
But upon reflection, "source of truth" already refers to knowledge and intent, not machine code.
- koolala - 10597 seconds ago
Looks like JSON-like YAML. It is still English. I was hoping for something like Lojban.
- semessier - 9888 seconds ago
It's not a new question whether the as-is programming languages are optimal for LLMs: a language for LLM use would have to be strongly typed. But that's about it for obvious requirements.
- ljlolel - 19437 seconds ago
Getting so close to the idea. We will only have Englishscripts and won't need code anymore. No compiling. No vibe coding. No coding. https://jperla.com/blog/claude-electron-not-claudevm
- cesarvarela - 19308 seconds ago
Instead of using tabs, it would be much better to show the comparison side by side.
Also, the examples feel forced: if you use external libraries, you don't have to write your own "Decode RFC 2047".
- amelius - 18158 seconds ago
I want to see an LLM combined with correctness-preserving transforms.
So, for example, if you refactor a program, let the LLM do anything it likes as long as it keeps the logic of the program intact.
- fallkp - 17224 seconds ago
"Coming soon: Turning Code into Specs"
There you have it: Code laundering as a service. I guess we have to avoid Kotlin, too.
- CodeCompost - 14335 seconds ago
Yes, I'm also one of those LLM skeptics, but actually this looks interesting.
- mgax - 9473 seconds ago
Good code is the specification.
- weezing - 8489 seconds ago
I'll stick to Polish
- oytis - 17223 seconds ago
Then of course we are going to ask LLMs to generate specifications in this new language
- yellow_lead - 14884 seconds ago
So, just a markdown file?
- rcvassallo83 - 8658 seconds ago
It's early for April Fools'
- nunobrito - 11322 seconds ago
Exactly as necessary as Kotlin itself.
- haspok - 11170 seconds ago
I would just like to point out the fun fact that instead of the brave new MD-speak, there is still a `codespeak.json` to configure the build system itself...
...which seems to suggest that the authors themselves don't dogfood their own software. Please tell me that Codespeak was written entirely with Codespeak!
Instead of that json, which is so last year, why not use an agent to create an MD file to setup another agent, that will compile another MD file and feed it to the third agent, that... It is turtles, I mean agents, all the way down!
- Brajeshwar - 17214 seconds ago
So, back to a programming language, albeit "simplified."
- lich_king - 19318 seconds ago
We built LLMs so that you can express your ideas in English and no longer need to code.
Also, English is really too verbose and imprecise for coding, so we developed a programming language you can use instead.
Now, this gives me a business idea: are you tired of using CodeSpeak? Just explain your idea to our product in English and we'll generate CodeSpeak for you.
- iLoveOncall - 9140 seconds ago
The tweet I saw a few weeks ago about LLMs enabling the building of stupid ideas that would never have been built otherwise particularly resonates with this one.
- ivanjermakov - 9827 seconds ago
Another great way to shrink your codebase 10x? Rewrite it in APL. If less code means less information, what are we gonna do when the missing information was important?
- jajuuka - 17742 seconds ago
We created programming languages to direct programs. Then we created LLMs to use English to direct programs. Now we've created programming languages to direct LLMs. What is old is new again!
- pjmlp - 19531 seconds ago
I think stuff like Langflow and n8n is more likely to be adopted, alongside some more formal specifications.
- phplovesong - 11809 seconds ago
This is pretty lame. I WANT to write code, something that has a formal definition, and express my ideas in THAT, not some ad-hoc pseudo-English that an LLM then puts the cowboy hat on and turns into whatever the hotness of the week is.
Programming is in the end math: the model is defined and, when done correctly, follows known laws.
- booleandilemma - 12699 seconds ago
Alas, I thought I invented this.
- tamimio - 15924 seconds ago
As someone who hates writing (and thus coding), this might be a good tool, but how is it different from doing the same in Claude? And I only see Python; what about other languages, are they also production grade?
- petetnt - 7971 seconds ago
Buddy invented RobotFramework, great job.
- kittikitti - 17218 seconds ago
The intent of the idea is there, and I agree that there should be more precise syntax instead of colloquial English. However, it's difficult to take CodeSpeak seriously as it looks AI-generated and misses key background knowledge.
I'm hoping for a framework that expands upon Behavior Driven Development (BDD) or a similar project-management concept. Here's a promising example that is ripe for an Agentic AI implementation, https://behave.readthedocs.io/en/stable/philosophy/#the-gher...
- theoriginaldave - 18935 seconds ago
I for one can't wait to be a confident CodeSpeak programmer /sarc
Does this make it a 6th generation language?
- neopointer - 2971 seconds ago
The next step is to use AI to edit the spec... /s