ChatGPT Builds Chrome Extensions for Me: Here's How

How I Used ChatGPT to Navigate the World of Chrome Extension Development: A Personal Exploration

Mar 10, 2024

Cover image: a computer-generated image of a network and a laptop. Photo by Growtika on Unsplash.

Over the past few months, I built and published a few Chrome extensions, such as JiraGPT, that were coded entirely by ChatGPT. In this post, I'll share how I did it.

Please keep in mind that I'm not claiming it's an error-free process, nor that you can build every type of extension with ChatGPT. Still, I believe many simple to medium-complexity extensions can be built this way, and even complex ones can be, with extra effort on your end.

One more thing to note: even though almost anyone can get ChatGPT to build something like this, having a technical background helps a lot in getting the best results, even if, like me, you're primarily a backend engineer. If you're non-technical, you can still get a good output, but you probably won't know to prompt ChatGPT about things like security or how to get and save an API key.

Why Are Extensions a Great Use Case for ChatGPT Compared to Other Software Types?

Before we dive into my process, it's important to discuss why extensions are such a good fit for ChatGPT. Unlike many other types of software, such as web and mobile apps, a Chrome extension usually consists of just a handful of files, so the whole codebase stays manageable and can fit into the LLM's (ChatGPT's) context window. Keep in mind that this mostly holds if you aren't using a JavaScript framework such as React; with a framework and its build tooling, the number of files quickly becomes too large.
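To make the context-window point concrete, here is a rough JavaScript sketch for estimating whether an extension's files fit. It relies on the commonly cited rule of thumb of about four characters per token for English text, which is only an approximation (code often tokenizes differently), and the 8,000-token window and 2,000-token reply reserve are placeholder numbers, not ChatGPT's actual limits. The function names are hypothetical.

```javascript
// Rough check of whether an extension's source files fit in a model's
// context window. Uses the rule of thumb of about 4 characters per token;
// real tokenizers differ, especially for code, so treat this as an estimate.
// The 8,000-token window and 2,000-token reply reserve are placeholders.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// files: an object mapping file paths to their source text.
function fitsInContext(files, contextTokens = 8000, reservedForReply = 2000) {
  const used = Object.values(files).reduce(
    (sum, source) => sum + estimateTokens(source),
    0
  );
  return { used, fits: used <= contextTokens - reservedForReply };
}

// Example: three small files, roughly the size of a simple extension.
const estimate = fitsInContext({
  "manifest.json": "x".repeat(400),
  "popup.js": "x".repeat(1200),
  "content.js": "x".repeat(800),
});
console.log(estimate); // { used: 600, fits: true }
```

In practice you would read each file's contents (for example with Node's `fs.readFileSync`) into the object; for exact counts, use the tokenizer that matches your model.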

So Why Should You Bother Even If You Don’t Care About Building Chrome Extensions?

Well, if you don’t care about extensions, I assume you at least care about how to use ChatGPT and interact with LLMs in general. If you don’t, you should. XD.

The core of interacting with LLMs such as ChatGPT is a good prompt, so we will discuss some tips that you can apply no matter what you want from the LLM. Prompt engineering is the practice of carefully crafting instructions to get an ML model to do precisely what you want, and mastering it leads to much better results with LLMs. This applies both to ChatGPT and to the GPT APIs, if you want to build apps on top of them.

My Process for Building Chrome Extensions with ChatGPT

1. High-level Understanding of How Chrome Extensions Work

Grasping the basics of how Chrome extensions operate is crucial, even if it initially seems unnecessary. This foundational knowledge helped me understand why ChatGPT could be an excellent tool for extension development. Typically, a Chrome extension is built using a few key files, each serving a specific purpose:

manifest.json: Think of this as the extension's ID card. It tells Chrome important details about the extension, like its name, version, the permissions it needs, and which files it uses.
popup.html (and associated styling files like style.css or popup.css): This is the visual part you interact with when you click the extension icon in your browser. The HTML file is the structure, and the CSS file makes it look good.
popup.js: This JavaScript file brings popup.html to life, handling user interactions and defining what happens when you click buttons or enter information.
content.js: Imagine this as a behind-the-scenes worker that interacts directly with the web pages you visit. It can read or modify the content of a webpage, enabling the extension to add features or information directly on the page.
background.js: This is the extension's thinker. In Manifest V3 it runs as a service worker, separately from the web page you're viewing, and handles the broader logic that doesn't fit into the popup or content scripts, such as listening for browser events or calling external APIs.

Understanding the role of each of these files makes troubleshooting easier and improves your ability to create or modify extensions. It's also very useful to know how to install an extension locally for testing: open chrome://extensions, turn on Developer mode, click Load unpacked, and select the folder that contains manifest.json.
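Since a wrong file path in manifest.json is an easy mistake to make when ChatGPT regenerates or renames files, a quick pre-flight check before clicking Load unpacked can save a confusing debugging session. Below is a minimal JavaScript sketch, not a full manifest validator: it only looks at `action.default_popup`, `background.service_worker`, and the `js`/`css` lists of `content_scripts`, and the function names are hypothetical.

```javascript
// List the files a Manifest V3 manifest.json points at. Only a few common
// keys are covered: the action popup, the background service worker, and
// content script JS/CSS files.
function referencedFiles(manifest) {
  const files = [];
  if (manifest.action && manifest.action.default_popup) {
    files.push(manifest.action.default_popup);
  }
  if (manifest.background && manifest.background.service_worker) {
    files.push(manifest.background.service_worker);
  }
  for (const script of manifest.content_scripts || []) {
    files.push(...(script.js || []), ...(script.css || []));
  }
  return files;
}

// Return the referenced files for which `exists` returns false.
// `exists` is injected so this works with fs.existsSync or a test stub.
function missingFiles(manifest, exists) {
  return referencedFiles(manifest).filter((file) => !exists(file));
}

// Usage with Node, run from the extension's root folder:
// const fs = require("fs");
// const manifest = JSON.parse(fs.readFileSync("manifest.json", "utf8"));
// console.log(missingFiles(manifest, fs.existsSync));
```

Passing the `exists` check in as a parameter keeps the function easy to test; in Node, `fs.existsSync` can be passed directly, as in the commented usage.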

2. Write Clear and Very Specific Instructions

I always treat ChatGPT like a very smart intern who needs everything specified and planned out in order to do the job well. What does this mean? It means giving a clear description of what the extension should do, what the UI should look like, how the user will interact with it, and every feature it should have.

Here is the prompt I used for building the JiraGPT Chrome extension:

You are a software engineer with expertise in building Chrome extensions. You are working on JiraGPT, which is a Chrome extension that does the following:

1. The extension adds an icon to the Jira advanced search form (JQL) when the user visits the advanced search page.

2. When the JiraGPT icon is on (by default), the form can now accept a natural language query instead of JQL, and once the user hits another JiraGPT button, the user's natural language query should be transformed into valid JQL. Obviously, this is done by calling an OpenAI GPT model, and you should provide the code for that as well.

You must provide everything, and make sure to use Manifest V3. Also make sure the user has to add their OpenAI API key in a pop-up before using the extension.

First, as a great software engineer, I want you to make sure you understand the requirements and ask questions if anything is missing before you start developing the extension.
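The heart of what this prompt asks for is a single call to an OpenAI model that turns a natural language question into JQL. As a rough idea of what that code involves, here is a JavaScript sketch that builds a request for OpenAI's Chat Completions endpoint and extracts the answer from `choices[0].message.content`. The model name, system prompt, and function names are assumptions for illustration, not JiraGPT's actual code.

```javascript
// Build the OpenAI Chat Completions request that turns a natural language
// question into JQL, and pull the JQL out of the response.
// The default model and the system prompt are assumptions.
const OPENAI_URL = "https://api.openai.com/v1/chat/completions";

function buildJqlRequest(naturalQuery, apiKey, model = "gpt-3.5-turbo") {
  return {
    url: OPENAI_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        temperature: 0, // favor consistent output over creative output
        messages: [
          {
            role: "system",
            content:
              "Translate the user's request into a single valid Jira JQL query. Reply with the JQL only.",
          },
          { role: "user", content: naturalQuery },
        ],
      }),
    },
  };
}

// Chat Completions returns the generated text in choices[0].message.content.
function extractJql(responseJson) {
  const content = responseJson?.choices?.[0]?.message?.content;
  if (typeof content !== "string" || content.trim() === "") {
    throw new Error("OpenAI response did not contain any JQL");
  }
  // Models sometimes wrap answers in code fences; strip them defensively.
  return content.replace(/^`{3}[a-z]*\s*|\s*`{3}$/gi, "").trim();
}

// Usage, e.g. in background.js once the API key has been loaded:
// const { url, options } = buildJqlRequest("my open bugs", apiKey);
// const jql = extractJql(await (await fetch(url, options)).json());
```

Keeping request building and response parsing as plain functions, separate from `fetch`, makes it easy to test them without spending API credits.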

3. Make Sure ChatGPT Understands the Requirements Before Starting Development

A good software engineer makes sure they understand a task's requirements before starting, and asks questions wherever something is unclear. You should instruct ChatGPT to do the same. You'd be surprised by what it comes back with: its questions will help you understand the product better yourself, make you more specific, and cover areas you hadn't thought of.

As you can see, I already asked ChatGPT to do this at the end of the prompt. Here is its response:

Certainly, understanding the requirements is a critical first step in any software development process. Let me summarize the requirements as I understand them:

Requirements for JiraGPT Chrome Extension
Jira Advanced Search Page: The extension should recognize when the user is on Jira's advanced search page.

JiraGPT Icon: An icon should be added to the Jira advanced search form (presumably next to the JQL input field).

Natural Language Queries:

When the JiraGPT icon is "on" (which it is by default), the form should expect a natural language query instead of JQL.
A separate JiraGPT button should be provided to trigger the transformation of the natural language query into a valid JQL query.
GPT Integration: The transformation from natural language to JQL is to be done by calling an OpenAI GPT model.

API Key Configuration: Before using the extension, the user should be prompted to enter their OpenAI API key.

Manifest V3: The extension should be developed using Chrome's Manifest V3 specifications.

Questions
UI/UX: Are there any specific UI/UX guidelines or mockups for how the icon and button should look or where exactly they should be placed?

Error Handling: How should the extension handle scenarios where the natural language query cannot be accurately transformed into a JQL query?

Settings: Should the extension provide any settings or options for the user to customize its behavior? For example, turning off the "natural language" feature?

Permissions: Are there specific permissions that should be requested from the user? For example, reading the current page, etc.

API Rate Limit: How should the extension handle rate-limiting or other API-related issues?

User Authentication: Should the OpenAI API key be stored securely? If so, do you have any preferences for how this should be done?

Compatibility: Is compatibility with specific versions of Jira or Chrome required?

Offline Mode: Should the extension offer any offline functionality?

Analytics: Do you want to collect any analytics data from the extension?

User Feedback: Should there be a way for users to provide feedback on the extension?

Updates: How should the extension handle updates? Should it auto-update or notify the user of available updates?

Would you like to clarify or add anything to these points?

I then answer those questions, and, like a good engineer, it replies with a refined list of requirements so that we are both aligned.
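One of ChatGPT's questions was how the OpenAI API key should be stored. A common choice in extensions is `chrome.storage.local`, which requires the "storage" permission in manifest.json; Chrome's documentation notes that this storage area is not encrypted, so treat it as convenient rather than secure. The sketch below takes the storage area as a parameter (you would pass `chrome.storage.local`, whose `get`/`set` return promises in Manifest V3) so the logic can be exercised outside Chrome. The key name `openaiApiKey` and the "sk-" prefix check are assumptions.

```javascript
// Save and load the user's OpenAI API key. `storageArea` is expected to
// behave like chrome.storage.local: set(items) and get(key) return promises.
// The storage key name and the "sk-" prefix check are assumptions.
const API_KEY_NAME = "openaiApiKey";

async function saveApiKey(storageArea, apiKey) {
  const trimmed = (apiKey || "").trim();
  if (!trimmed.startsWith("sk-")) {
    throw new Error("That does not look like an OpenAI API key");
  }
  await storageArea.set({ [API_KEY_NAME]: trimmed });
}

// Resolves to the stored key, or null if the user has not saved one yet.
async function loadApiKey(storageArea) {
  const items = await storageArea.get(API_KEY_NAME);
  return items[API_KEY_NAME] ?? null;
}

// Usage in popup.js (Chrome only):
// saveButton.addEventListener("click", () =>
//   saveApiKey(chrome.storage.local, keyInput.value));
// Usage in background.js before calling OpenAI:
// const apiKey = await loadApiKey(chrome.storage.local);
```

Returning `null` when no key exists gives the rest of the extension one simple check for "ask the user to open the pop-up first".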

4. Iterative Development: Feature by Feature

Once the feature list is set, I guide it to start with one feature and not move on to the next until that one works well. This way, everything gets done properly instead of ending up as a big pile of errors. If it tries to rush, I remind it to take it slow, feature by feature. We make sure each part works before tackling the next, much like working in agile iterations.

This means there’s a lot of back and forth. It’s more like I’m helping it stay on track. If something isn’t right or I want a change, we address it then and there. We don’t leave anything hanging. Focusing on one feature at a time helps keep things clear and straightforward.

Here is one of the back-and-forth iterations:

User (ME):
two issues upon testing:
1. the go button looks a bit ugly and too close to the icon
2. you didn't actually mock/return the JQL so when clicking the "Go" button, nothing happens


ChatGPT:
I apologize for the oversight. Let's address these issues:

UI Appearance: To improve the UI, we can add some CSS styling to the "Go" button and add some spacing between the icon and the button.

Mock JQL: I missed adding the mock JQL response in the background script. We'll update it to return a sample JQL query when clicked.

Updated content.js
For the UI improvement, I'm adding some inline styles to the "Go" button.

.....
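The second issue in that exchange, a "Go" button that does nothing because no JQL comes back, is exactly where a mock helps: you can finish and test the UI feature before the OpenAI call exists. Here is a hypothetical sketch of such a mock in a background script's message handler. The message shape `{ type: "CONVERT_TO_JQL", query }`, the response shape, and the sample JQL are illustrative assumptions, not the code ChatGPT produced.

```javascript
// Background-script message handler that returns a fixed, mocked JQL string
// so the content script's "Go" button can be tested before the OpenAI call
// is implemented. Message and response shapes are hypothetical.
const MOCK_JQL = 'project = DEMO AND status = "In Progress" ORDER BY created DESC';

function handleMessage(message) {
  if (!message || message.type !== "CONVERT_TO_JQL") {
    return { ok: false, error: "Unknown message type" };
  }
  if (typeof message.query !== "string" || message.query.trim() === "") {
    return { ok: false, error: "Empty query" };
  }
  // Feature by feature: replace this mock with the real OpenAI call later.
  return { ok: true, jql: MOCK_JQL };
}

// Wiring in background.js (Chrome only):
// chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
//   sendResponse(handleMessage(message));
// });
```

Because the handler's inputs and outputs stay the same when the mock is swapped for the real call, the content script does not need to change when the next feature starts.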

5. Keeping ChatGPT in the Loop: Context Management

Through trial and error, I've learned something important: just like us, ChatGPT can get a bit lost if we go too far without a recap. Imagine bringing a new software engineer onto your team who doesn’t know what’s been built already. They might start from scratch instead of building on what exists. ChatGPT can do the same because it can only keep so much in its "mind" at once. When it loses track of what we've done, it might try to redo things or get a bit mixed up.

This happens because ChatGPT can only work with a limited context window. In a long conversation, older messages can effectively fall out of view, much like juggling, where you have to drop one ball to catch another. When I notice it starting to "hallucinate" or forget our progress, I hit pause and remind it of what we've already done, kind of like giving that new engineer a project update. This helps ChatGPT remember where we are and what's next, so we build on our work instead of going in circles.

Here is an example of the prompt I usually use for that. Note that this is very similar to how autonomous AI agents work: on every iteration, you pass the LLM (agent) the main goal, the list of tasks with each task's status, and the next task to work on. The challenge is to pass only relevant data and stay within the context window limit.

Remember, we are building a Chrome extension that does X and we have a list of features as follows:
1. feature X. status: Done
2. feature Y. status: Done
3. feature Z. status: Not started


Here is our source code so far:

manifest.json:

    {
        "manifest_version": 3,
        "name": "SEO Tags Extractor",
        "version": "1.0",
        "permissions": ["activeTab", "scripting"],
        "action": {
            "default_popup": "popup.html"
        }
    }

popup.html:

    <!DOCTYPE html>
    <html>
    <head>
        <link rel="stylesheet" type="text/css" href="css/popup.css">
    </head>
    <body>
        <h1>SEO Tags</h1>
        <div id="tags"></div>
        <script src="js/popup.js"></script>
    </body>
    </html>

popup.js:

    // Render the tags that content.js sends back from the page.
    chrome.runtime.onMessage.addListener((message) => {
        const tagsDiv = document.getElementById('tags');
        for (const [name, content] of Object.entries(message)) {
            const p = document.createElement('p');
            p.textContent = name + ': ' + content;
            tagsDiv.appendChild(p);
        }
    });

    // chrome.action.onClicked does not fire when the action has a popup,
    // so the popup injects content.js into the active tab itself.
    // activeTab grants access to that tab once the user opens the popup.
    (async () => {
        const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
        await chrome.scripting.executeScript({
            target: { tabId: tab.id },
            files: ['js/content.js']
        });
    })();

content.js:

    // Wrapped in an IIFE so injecting the script again does not redeclare variables.
    (() => {
        const seoTags = {};
        for (const tag of document.getElementsByTagName('meta')) {
            // Open Graph tags use the "property" attribute; description uses "name".
            const key = (tag.getAttribute('property') || tag.getAttribute('name') || '').toLowerCase();
            if (key.startsWith('og:') || key === 'description') {
                seoTags[key] = tag.content;
            }
        }
        chrome.runtime.sendMessage(seoTags);
    })();

Let's start working on feature Z.
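Because this recap always has the same shape (the goal, the features with their statuses, the current source, and the next feature), it can be assembled by a small script instead of retyped each time. Below is a hypothetical JavaScript sketch of that idea; the function names and data layout are illustrative, not a tool used in this post.

```javascript
// Pick the next feature: work strictly in order, so the first feature
// that is not marked "Done" is the one to tackle next.
function nextFeature(features) {
  return features.find((f) => f.status !== "Done") || null;
}

// Assemble a recap prompt: goal, feature list with statuses,
// current source files, and the next feature to work on.
// files: an object mapping file paths to their source text.
function buildRecapPrompt(goal, features, files) {
  const lines = [
    `Remember, we are building a Chrome extension that ${goal} and we have a list of features as follows:`,
  ];
  features.forEach((f, i) => lines.push(`${i + 1}. ${f.name}. status: ${f.status}`));
  lines.push("", "Here is our source code so far:", "");
  for (const [path, source] of Object.entries(files)) {
    lines.push(`${path}:`, "", source.trim(), "");
  }
  const next = nextFeature(features);
  lines.push(next ? `Let's start working on ${next.name}.` : "All features are done. Let's review the code.");
  return lines.join("\n");
}
```

Keeping the feature list as data also makes the feature-by-feature discipline explicit: the script only ever points ChatGPT at the first unfinished feature.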

6. Prompt Engineering is an Iterative Process

As I mentioned in a previous article, there are some basic rules to prompt engineering, but the golden rule is learning as you go. The key is trying things out, seeing what works and what doesn't, and not being afraid to tweak your approach based on the AI's feedback.

So, definitely give the strategies I've shared a shot, but also stay flexible. Pay attention to how the conversation with the AI (ChatGPT or any other LLM) shifts with each change you make to your prompt. This back-and-forth will guide you to better results over time.

Conclusion

Wrapping things up, remember what I said at the start: this process isn't flawless or rock-solid. It wasn't a magic snap of the fingers that got me the solution; it took hours of tweaking and testing to achieve what I initially envisioned. But ultimately, I managed to create a significant Chrome extension in a fraction of the time it would normally take. It's an experience that truly showcases the potential and excitement of blending AI with software development, whether you're an AI enthusiast or a skeptic.

For more insights and stories like this, where we explore the fascinating intersection of AI and software development, make sure to subscribe to our newsletter. Stay ahead of the curve with us as we dive into more adventures that blur the boundaries between technology and creativity.

AI Driven Development
