AI in Google Sheets

Note

These notes are mostly inspired from the Practical AI for (investigative) journalism sessions.

There are several Google Sheets extensions that will let you connect to a Large Language Model (LLM), write requests and store the results of those requests.

However, some of those extensions charge extra fees and the way they work is opaque, meaning they could be reading your requests.

I wrote a very simple Google Sheets script that provides most of the same functionality without sending your data or charging a subscription. Follow the instructions in the link below to install it.

nicucalcea/sheets-llm

nicucalcea/sheets-llm

Note that you will still need an OpenAI API key, which does have a cost. This key is different from ChatGPT Plus. You don’t need the latter to get an API key.


Let’s look at an example.

Make a copy of this Google Sheet. Note that this will also copy the associate Apps Script, aka. the code you need to make the LLM magic happen.

Click on LLM in the top menu, select Settings, paste your OpenAI key and save.

In cell B2, copy the following formula:

=LLM(A2, "Extract the category from the following text. It can be something like crime, environment, education or healthcare. Please respond with just the category.", "gpt-3.5-turbo", 0)

Let’s break it down.

  1. First, we have the LLM formula which tells Sheets that we want to use the Apps Script that connects us to OpenAI.
  2. Then we’re selecting the A2 cell, or the first cell with data we want to analyse.
  3. Next we have the user prompt. This is a set of instructions that help the AI understand our request and respond accordingly.
  4. You can optionally specify the model you want to use. We use gpt-3.5-turbo by default, but you can change it to a more intelligent, but much pricier model.
  5. Finally, you specify the model temperature. Since we want precision, rather than creativity, I’ve set the temperature to 0.

You can then drag the formula down to repeat the categorisation for each row.

Tip

Because Google Sheets tends to recalculate cells at times, it may try to rerun your formula, which means you may incur additional charges. You can select your LLM cells, copy (Ctrl + C) and paste them as values (Ctrl + Shift + V) to overwrite the formulas with static text.

Exercise

Use the LLM formula to detect whether each piece of legislation is about retirement. Make sure the output is either Yes, No or Don't know.

Extracting structruted data

I’ve alluded to this issue above when we had to instruct the LLM to respond with just the category. Without it, the AI would have responded with something like The category of the text is "Schools/High School.", which is more difficult to aggregate.

Because many of the models we use are fine-tuned to be chatbots, so they tend to get a little wordy and emulate a human conversation.

We’ll try to coerce these models into giving us structured data in exactly the format we want.

In the same Google Sheet you made a copy of earlier, go to the food tab.

In the B2 cell, copy the following formula:

=LLM(A2, "Extract the name from this email. Respond with just the name, nothing else.", "gpt-3.5-turbo", 0)

Drag the formula down, and repeat the process for the product, email and email_domain columns.

For the emotion column, use the following formula:

=LLM(A2, "Extract the emotion from this email. Respond with just 'positive' or 'negative', nothing else.", "gpt-3.5-turbo", 0)

Not drag the formula down and take a look at the results. Notice anything odd?

In the last row, the AI has responded with sad instead of negative or positive, which is what we asked for.

If you look back at the contents of the tax, you’ll notice that it explicitly instructs the model to overwrite our initial instructions. This is an example of “prompt injection”.

Data validation

To spot issues like this, we can use a Sheets formula to validate the response. In the emotion_valid column, paste the following formula:

=IF(ISNUMBER(MATCH(F2, {"positive", "negative"}, 0)), "yes", "no")

This formula checks if the response if either positive or negative. If it is, it returns yes, otherwise it returns no.

As you drag the formula down, you’ll notice that the last row has a no in the emotion_valid column.

Let’s do something a little more complex. How do we validate that the email is a valid email address?

Make sure you’ve asked the LLM to extract the email in column D, and paste the following formula in email_valid:

=IF(REGEXMATCH(D2, "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"), "yes", "no")

What we used here is a regular expression, which is a sequence of characters that define a search pattern. In this case, we’re looking for a pattern that matches a typical email address.

Regular expressions are not fun to read or write, but they can be very powerful. You can use AI to generate them for you, but that’s a topic for another day.

Let’s move on to another example. Open the comments tab and type in this formula in the email_address column, then drag it down:

=LLM(A2, "Extract a valid email address from this text. Respond with just the email address, nothing else.")

Depending on which model you use, you’ll notice that the AI has extracted some email addresses, but then refused to do so for some texts. In my case, it has returned some messages like There is no valid email address in the provided text.

Let’s check if the extracted emails are valid. Adapt the regular expressions formula from above in the email_valid column.

But this only checks if the extracted email is in the right format. It doesn’t check if the email actually exists in the original text.

For that, let’s write a new formula in the email_exists column:

=IF(REGEXMATCH(A2, B2), "yes", "no")

This formula checks if the extracted email address is anywhere in the original text. If it is, it returns yes, otherwise it returns no.

Once you drag it down, you’ll notice that even in columns where it has extracted an email address, Sheets can’t find that email in the text. In my case it extracted info@apha.org in row 4, but there is no email address in that bit of text, only a domain name.

This again shows how LLMs are simply tools that return statisically likely responses, rather than correct ones, and why it’s important to validate their responses.

Summarising text

In the articles tab, we have a list of articles in different languages. Let’s say they’re too long, and we don’t speak all the languages. We want to summarise them.

In the summary column, paste the following formula:

=LLM(B2, "Summarise this article in English.")

Drag the formula down and take a look at the results. You’ll notice that all of the summaries are in English, even if the original article was not.

Warning

Despite LLMs being quite good at summarising text, they are still prone to halucinations. Double-check the results or don’t use them for critical stuff.