AI at the Helm: Building an Entire Open Source Project With GPT-4

I've mostly been an LLM and GPT skeptic. Every so often I'd bang my head against ChatGPT, and it usually gave me junk. I'd wander off grumbling the things jaded engineers grumble.

Then I paid for OpenAI's GPT-4 upgrade. GPT-4 actually seemed to work, so I decided to see how far I could push it. Could I write an entire open source project with GPT-4? Turns out, I could.

I had GPT-4 build me Twister, a Java library that converts Avro and Protobuf data to and from Java POJOs. Nearly every element--code, docs, commit messages, tests, its README.md, even this blog post--was written by GPT-4.

This post isn't about Twister, though. It's about how I learned to use ChatGPT effectively to write not just tests and documentation, but code.

Key Lessons from Building with GPT-4

Here's what I learned:

Start Small: Don't aim for complex features right away.
Code, Test, Repeat: Follow the same patterns you use to write code.
GPT-4 Forgets: Always re-paste the code you want to update.
Show Changes Only: Helps you get answers faster. GPT-4 can be slow.
Iterate: Don't write GPT-4 off at the first mistake. Keep asking it for corrections.
Great for Grunt Work: GPT-4 shines with tests, docs, and commit messages.
Best for 'Pure' Projects: Less business logic, like Twister, suits GPT-4.
Keep GPT-3.5 Handy: Good for simpler tasks. Saves on GPT-4 quota.

Start Small

Break down tasks. Ask GPT-4 to build simple code changes at first. With Twister, I would have it build support for basic primitives (int, float, string, etc.) first. Once I got that working, I'd ask it to add support for complex types (maps, lists, enums, etc.).
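As an illustration of what a "primitives only" first iteration could look like, here is a minimal sketch. It is not Twister's actual code or API: the class name `PrimitiveOnlyConverter`, the `convert` method, and the use of a plain `Map` as the record are all assumptions made for this example. The point is the shape: handle the simple cases, and fail loudly on everything else until a later prompt adds it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical first iteration, NOT Twister's real API.
// Only primitive-like values are supported; maps, lists, enums, and other
// complex types are deliberately left for a later prompt.
final class PrimitiveOnlyConverter {
    private PrimitiveOnlyConverter() {}

    /** Copies a record's fields, rejecting any value that is not a supported primitive. */
    static Map<String, Object> convert(Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : record.entrySet()) {
            Object value = field.getValue();
            if (isSupportedPrimitive(value)) {
                out.put(field.getKey(), value);
            } else {
                // Null values are also rejected in this first pass.
                String type = (value == null) ? "null" : value.getClass().getName();
                throw new UnsupportedOperationException(
                    "Field '" + field.getKey() + "' has unsupported type: " + type);
            }
        }
        return out;
    }

    private static boolean isSupportedPrimitive(Object value) {
        return value instanceof Integer
            || value instanceof Long
            || value instanceof Float
            || value instanceof Double
            || value instanceof Boolean
            || value instanceof String;
    }
}
```

Once a version like this passes its tests, the follow-up prompt can ask for one complex type at a time, replacing the exception branch piece by piece.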

Code, Test, Repeat

The best pattern I found was:

1. Ask GPT-4 to write code.
2. Ask GPT-4 to write a test for the code it wrote.
3. If the test fails, ask GPT-4 to fix it.

Flipping 1 and 2 (TDD) works, too.

If you spot a bug, ask it to write a test exposing the bug. Then ask it to make that test pass. The template I used for this loop was:

Given this code:
...

And this test:
...

I get this error:
...

Can you fix this?
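If you run this loop often, the template above is easy to assemble in code. The following is a minimal sketch, assuming you already have the code, the test, and the error output as strings. The class name `FixPrompt` and its `build` method are hypothetical helpers for illustration; they are not part of Twister or of any OpenAI tooling.

```java
// Hypothetical helper: fills in the three-part "code / test / error" prompt.
// It only builds the text; sending it to ChatGPT is left to you.
final class FixPrompt {
    private FixPrompt() {}

    /** Builds the prompt from the current code, the test, and the error output. */
    static String build(String code, String test, String error) {
        return String.join("\n",
            "Given this code:",
            code.strip(),
            "",
            "And this test:",
            test.strip(),
            "",
            "I get this error:",
            error.strip(),
            "",
            "Can you fix this?");
    }
}
```

Calling `FixPrompt.build(code, test, error)` returns text you can paste straight into the chat window. Note that `String.strip()` requires Java 11 or later.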

GPT-4 Forgets

GPT-4 has a bad long-term memory. I found it did much better when I kept re-pasting my code into the prompt on every iteration. There is a limit to the input size, so you sometimes have to get creative to fit the relevant snippets in.
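One way to "get creative" is to rank snippets by relevance and keep only what fits. The sketch below is an assumption about how that could be done, not something from the Twister workflow: `SnippetPacker` and `pack` are hypothetical names, and character count is used as a rough stand-in for the model's real token limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps the most relevant snippets that fit a size budget.
// Character count is only a rough proxy for the model's actual token limit.
final class SnippetPacker {
    private SnippetPacker() {}

    /**
     * Walks the snippets in priority order (most relevant first) and keeps
     * each one that still fits within maxChars. Snippets that do not fit
     * are skipped, so a smaller, lower-priority snippet may still be kept.
     */
    static List<String> pack(List<String> snippetsByPriority, int maxChars) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String snippet : snippetsByPriority) {
            if (used + snippet.length() <= maxChars) {
                kept.add(snippet);
                used += snippet.length();
            }
        }
        return kept;
    }
}
```

Put the class you want changed first, then its test, then any helpers, so the most important code survives when the budget is tight.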

Show Changes Only

GPT-4 can be slow. To speed it up, request that it shows you only the code that's changed.

Iterate

Don't dismiss GPT-4 at the first error, even if it says silly things. Ask for corrections.

Great for Grunt Work

GPT-4 excels at mundane tasks. I'm convinced everyone should use it for tests, doc strings, and commit messages.

Best for 'Pure' Projects

Twister is a pretty pure computer science project; there isn't any real "business logic". I think GPT-4 does better at this kind of work.

Keep GPT-3.5 Handy

Use GPT-3.5 for simpler tasks. Saves you on your GPT-4 quota.

Why Not GitHub Copilot?

Copilot employs GPT-3.5. Compared to GPT-4, it is noticeably less effective. Copilot X offers improvements, yet it's tied to Visual Studio and VS Code. However, IntelliJ IDEA is way more pleasant for Java than VS Code. So I just used ChatGPT. In the long run, IDE integration will certainly improve. Yet, for now, GPT-4 offered the best solution for Twister.

Future Work

The Twister library is a small project right now. I want to add:

Avro default support
Avro logical type support
Protobuf WKT support
Avro Record ➡️ Map wrapper
Protobuf Message ➡️ Map wrapper
.proto ➡️ Protobuf Descriptor converter
JDBC row ➡️ Map wrapper

I'll continue having GPT write the code in this library. The experiment continues!

Conclusion

Though this post is about building with GPT-4, I want to reiterate that Twister is a real project that I actually want people to use. It's a pretty cool library. If you're a Java developer dealing with Protobuf or Avro, check it out! Contributions are welcome, too (whether from GPT-4 or humans).

Addendum

Here's the prompt I used to generate this blog post:

The blog post should focus on my experience using GPT-4 to write an entire library.
The post should also talk about GPT-4 tricks I learned while building this project:
Start small (don't ask GPT-4 to write all features in a class at once)
Ask GPT for a basic class, then ask it to write a test for the class. If the tests fail, tell GPT, and have it fix the tests. Then go back to the basic class and ask GPT-4 to add the next feature. Rinse and repeat.
Always re-paste the code you want GPT-4 to update. It has bad long-term memory. I frequently use a 2 part template: "Given this code: ... Can you update it to ..."
Tell it to just show changes, not the complete code. GPT-4 is slow, so telling it to skip unchanged code helps get you answers faster.
Don't be afraid to iterate. Many people get a response from GPT-4, and if it's wrong, they declare that it sucks. Instead, keep asking it to fix things.
GPT-4 is really good for grunt work (tests, docs, commit messages)
GPT-4 is also really good for more "pure" projects like Twister, where it doesn't have to understand a lot of business logic.
Keep a GPT-4 and a GPT-3.5 window open, so you can bounce to the GPT-3.5 window for more simple work. This will save you on your GPT-4 quota (currently 25 prompts per-3h window).
The blog post style should be:
Matter-of-fact.
No sentence should be more than 12 words.
Include links to external sites where appropriate.
Written in first-person.
The post should include a bullet-point list of the tips near the introduction.
The post should be written in markdown.
The post should include a catchy title that will get attention on Hacker news.
The post should include a section for each bullet point in the intro.
The intro should say that the the code, docs, git commit messages, tests, README.md, and even this blog post were all written with GPT-4.

I pasted in a rough outline with some notes before this prompt.

Changelog

Fix redirects on February 11, 2025
Migrate to markupdown (#1) on February 11, 2025