How does GPT-4o mini do with programming an app?

Author: Jenesh Napit

Engineering Manager | Senior Software Engineer

Date Modified: July 19, 2024

Date Published: July 19, 2024

OpenAI just launched its newest model, GPT-4o mini, and I wanted to test it out. In this article you'll find how GPT-4o mini performs when it's asked to program a simple app.

Note: If you prefer to watch the video version of this, check out the YouTube video: Making GPT-4o mini code a Quiz App for me. The link to the final code on GitHub is at the end of this article.

Intro to GPT-4o mini

GPT-4o mini is the newest model released by OpenAI as of 7/18/2024. It sits between GPT-3.5 Turbo and GPT-4 as a cost-efficient small model. Here is how OpenAI describes it in the announcement:

"OpenAI is committed to making intelligence as broadly accessible as possible. Today, we're announcing GPT-4o mini, our most cost-efficient small model. We expect GPT-4o mini will significantly expand the range of applications built with AI by making intelligence much more affordable. GPT-4o mini scores 82% on MMLU and currently outperforms GPT-4 on chat preferences in LMSYS leaderboard. It is priced at 15 cents per million input tokens and 60 cents per million output tokens, an order of magnitude more affordable than previous frontier models and more than 60% cheaper than GPT-3.5 Turbo."
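To put those prices into perspective, here is a small sketch of the arithmetic. The two prices come from the quote above; the function name estimateCostUsd and the token counts in the example are made up for illustration.

```javascript
// Prices quoted in the announcement, in US dollars per million tokens.
const INPUT_PRICE_PER_MILLION = 0.15;
const OUTPUT_PRICE_PER_MILLION = 0.60;

// Hypothetical helper: estimates the cost of one request in US dollars.
function estimateCostUsd(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION;
  return inputCost + outputCost;
}

// Example with made-up token counts: 20,000 input and 10,000 output tokens.
console.log(estimateCostUsd(20000, 10000).toFixed(4)); // prints 0.0090
```

So even a fairly long back and forth like the one in this article would cost well under a cent at these rates, assuming token counts in that range.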

Below is a screenshot of the eval scores given by OpenAI.

GPT-4o mini eval scores

So I wanted to see how well it performs in terms of programming. That is why I came up with the idea to ask it to code a complete working quiz app using HTML, CSS, and JavaScript. So let's get started and see how it did.

Start prompting for Quiz App

The very first thing I did was make sure I was using the new model.

Changing the model

After that, I could start prompting. For my first prompt I actually picked the first suggested prompt shown on the screen.

Design a fun coding game

Chat placeholder prompt

It responded pretty fast and started by asking a few questions, since the initial prompt was pretty general.

Chat Prompt response 1

So I went with JavaScript and then it proceeded to brainstorm the details of the game.

chat response 2

These would be great questions if I were creating a more robust game. But I wanted something quick and simple, so this is what I responded with below.

prompt 2

After a bit more back and forth I told it to give me 10 example questions and start writing the code, and it gave me 3 files: HTML, CSS and JavaScript, which was exactly what I wanted.

chat response 3
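To give a rough idea of what the JavaScript side of an app like this involves, here is a minimal sketch of the question data and the scoring logic. This is not the code the model generated (that is in the GitHub repo); the questions, the field names and the scoreQuiz function are my own illustrative assumptions.

```javascript
// Illustrative question data; these are NOT the questions the model generated.
const questions = [
  {
    question: "Which keyword declares a constant in JavaScript?",
    choices: ["var", "let", "const", "static"],
    answer: "const",
  },
  {
    question: "Which method adds an item to the end of an array?",
    choices: ["push", "pop", "shift", "concat"],
    answer: "push",
  },
];

// Hypothetical helper: counts right and wrong answers.
// userAnswers[i] is the choice the user picked for questions[i].
function scoreQuiz(questions, userAnswers) {
  let correct = 0;
  questions.forEach((q, i) => {
    if (userAnswers[i] === q.answer) correct += 1;
  });
  return { correct, wrong: questions.length - correct };
}

console.log(scoreQuiz(questions, ["const", "pop"])); // { correct: 1, wrong: 1 }
```

Keeping the questions as plain data like this is one common way to do it, since the same array can drive both the rendering in the HTML and the final score.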

Creating the files in VS Code

With the code for the 3 files in hand, I moved over to VS Code and created all 3 files.

3files vscode 1

GitHub link here for final code

Testing out the code

I then started the live server and proceeded to test out the app.

app test 1

It was working, and at the end it did say how many I got right and wrong. But I wanted to see if it could update the styling, so that was my next prompt.

Prompting the model to update the CSS

Here is my prompt to update the styling for the answer choices.

chat response 4

After I made the changes it suggested, this is what the app started to look like.

app test 3

Looks decent and I'm happy so far.

Prompting a new feature, time to completion

So far it was doing great, but now comes the fun part where I ask it to code a new feature in the app. The feature I wanted shows, on the final screen, how long it took the user to finish the quiz.

With the response it gave me I made the updates and this is what that looked like.

app test 4
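A time-to-completion feature like this usually comes down to recording a start time and subtracting it at the end. Here is a minimal sketch of that idea; it is not the model's code, and createQuizTimer and formatDuration are hypothetical names I picked for illustration.

```javascript
// Formats a duration in milliseconds as "Xm Ys".
function formatDuration(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

// Hypothetical timer: remembers when it was created and reports elapsed time.
// The clock is injectable (defaults to Date.now) so it is easy to test.
function createQuizTimer(now = Date.now) {
  const startedAt = now();
  return { elapsedMs: () => now() - startedAt };
}

// Create the timer when the first question is shown...
const timer = createQuizTimer();
// ...and read it when the last question has been answered.
console.log(`You finished in ${formatDuration(timer.elapsedMs())}`);
```

In the real app the formatted string would be written into the results screen instead of the console.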

Prompting a new feature, quiz summary

That one was a simple feature, but I wanted to push it further with a more complex one. So the next feature I asked it to make was a quiz summary: at the end of the quiz, show all the questions the user got wrong along with the right answer for each.

On the first try the code it gave me didn't work, so I decided to test its debugging capabilities.

Debugging the quiz summary bug

I asked it to debug and it gave me several things to look out for.

chat response 5

Luckily the first option seemed correct, so I went ahead and made the changes. It worked, but it didn't show all the questions, so I asked it to debug some more. Second time was the charm and it worked!

app test 6
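For reference, the core of a summary like this can be sketched as a filter over the questions and the recorded answers. Again, this is my own illustrative version and not the model's output; the sample questions and the buildSummary name are assumptions.

```javascript
// Illustrative data; not the questions or answers from the actual app.
const sampleQuestions = [
  { question: "What does CSS stand for?", answer: "Cascading Style Sheets" },
  { question: "Which tag creates a link in HTML?", answer: "<a>" },
  { question: "Which operator checks strict equality?", answer: "===" },
];
const sampleAnswers = ["Cascading Style Sheets", "<link>", "=="];

// Hypothetical helper: returns one entry per question the user got wrong,
// with the user's answer next to the correct one.
function buildSummary(questions, userAnswers) {
  return questions
    .map((q, i) => ({
      question: q.question,
      yourAnswer: userAnswers[i],
      correctAnswer: q.answer,
    }))
    .filter((entry) => entry.yourAnswer !== entry.correctAnswer);
}

for (const entry of buildSummary(sampleQuestions, sampleAnswers)) {
  console.log(`${entry.question} You: ${entry.yourAnswer} | Correct: ${entry.correctAnswer}`);
}
```

A sketch like this only works if every answer is recorded as the user goes through the quiz, so that is one thing worth checking when a summary shows fewer questions than expected.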

Asking mini to update summary styling

The feature worked but the styling was off, so I asked it to update the styling to see if it still remembered all the history from before. It did a good job, and this was the final result.

app test 7

Closing thoughts

As you can see, I set out to create a simple quiz app, and with a few prompts we were able to do so. It wasn't very complex, but it did involve 3 files, debugging its own code, and providing code changes, all of which it handled very well.

I would like to continue with this and see how much further we can push it, but for now I think this is a great model for everyday programming use cases.

Definitely give it a shot, and if you want to see the full video of me doing this live you can check out this YouTube video: Making GPT-4o mini code a Quiz App for me.

Thank you and subscribe to the channel for more videos in the future.

If you're interested in the code, here is the GitHub link for it.
