Creating With Generative Models: A Case Study in Ubi

Figure 1: Jeff Koons, Balloon Dog (Blue), 2021

Introduction

In 2019, Jeff Koons set the record for the most expensive artwork ever sold at auction by a living artist, at $91.1 million. Although he no longer holds it, this was in fact the second time he had set that record. The first was in 2013, when he sold a different artwork for $58.4 million.

Besides being very successful, Jeff Koons is also known for the way he makes his art. He runs a massive studio operation: at one point, he had 100 employees in the "painting department" alone. His studio is large enough to have "layoffs". 1 From the outside, it looks closer to a factory run by Henry Ford than to an art commune.

Although he is a bit of an outlier, this kind of setup is surprisingly common among successful artists today. That doesn't mean the practice is new, either: the use of apprentice labor was common even among the "Old Masters." The manual labor of actually painting or fabricating a piece can be, and often is, outsourced.

Why? Because you don't buy a piece by Jeff Koons because he's the best painter. 2 The role of the artist in these cases is closer to a creative director on a large industrial project than a craftsman. They need to make sure the final product fits their vision. As long as it does, who cares who painted what?

The recent explosion of generative AI has the potential to further change the way we think about the creative process. These models can take the same feedback loop that someone like Jeff Koons uses and automate half of it. Creating becomes less about what you can do yourself, and more about your ability to guide an existing process to create something interesting.

The goal of this installment is to make this idea concrete through an example. It will ignore the technical background of the particular form of generative model we will be using (large language models), as well as philosophical debates about what they can and cannot do. The focus is on how we can effectively use these models to perform a task that requires creativity.

Ubi

Figure 2: The Ubi Box

What is Ubi?

Ubi is Latin for "where." It is also an Illuminati-themed (no, really) board game from the 1980s that combines trivia and geography. Every turn you are asked a cryptic trivia question, such as "Ubi Dostoevsky quill quotations?", and you need to locate the answer (in this case, Moscow) on a grid map of the world. The first player to get enough correct answers in enough categories is declared the winner.

Ubi's questions come in a very particular format. They all:

  1. Lean heavily on alliteration
  2. Lean heavily on rhymes
  3. Begin with "Ubi"

Here are some examples:

Ubi Bonneville barriers broken?
Ubi northernmost Olympic torch scorched?
Ubi Alaska’s Sitka sit?
Ubi Cotton Bowl unroll?
Ubi Mount McKinley’s two sizable summits sited?
Ubi Stone of Scone’s home?
Ubi Manx cats meow?
Ubi Banff Springs spring?
Ubi The Exorcist exorcise?
Ubi highest cliff RFK climbed?
Ubi WWII’s first bomb boom?

The question format is a big part of what makes the game fun. Each question is effectively a riddle that you have to solve in order to figure out what it is even asking. For example, "Ubi Arizona turn turtle?" is asking "Where was the USS Arizona sunk in WWII?"

But there is a problem: Ubi is no longer in print, and many of the questions have fallen out of date. The map itself still shows the Soviet Union and East Germany. The questions are filled with references that probably killed in 1985 ("Ubi Mr. Ed Fed?"), but don't exactly land now.

Our goal will be to create more Ubi questions. Although this is not as lofty a goal as creating sculptures that sell for tens of millions of dollars, this simple task has all the attributes we need. There is no quantitative measure of what makes a good Ubi question. In that regard, it requires some amount of creativity to do successfully.

Making More Ubi Questions

Let's think about how we could go about this process. We could:

  1. Write the questions ourselves. This is time-consuming and can lead to a lack of variety in the questions.
  2. Pay other people to write questions. This is also relatively slow and potentially very expensive.

Large language models give us a third option. Here is all you need to know about large language models to make this work: they take text as input and output more text. Think of them like the autocomplete on your phone, but really, really good. We can use this text completion to perform tasks. Imagine you typed out,

The largest city in Illinois is

The most likely completion of this text is the correct answer (Chicago). If you wanted your language model to answer other questions, you could design a simple pattern:

Question: What is the largest city in Illinois?
Answer: Chicago

Question: <insert question here>
Answer: 

The most likely continuation of this text, based on the defined pattern, is the answer to whatever question we insert. This does not mean that the language model will always output the right answer. There is no guarantee that the model will output the actual most likely completion, or that this behavior will lead to the correct answer every time. In fact, it will often be wrong. The surprising thing about large language models, however, is that they are good enough to be right fairly often.

All we can count on is that if we give the model a pattern, it is going to do its best to follow it. Designing the right text inputs to get a language model to perform different tasks is an entire burgeoning field of study called "prompt engineering."
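To make this concrete, here is a minimal sketch of the question-answering pattern above as code, using the same completions API we will use throughout (the ask helper is our own, hypothetical wrapper):

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

def ask(question):
    # One worked example establishes the pattern; the model continues it.
    prompt = (
        "Question: What is the largest city in Illinois?\n"
        "Answer: Chicago\n"
        "\n"
        f"Question: {question}\n"
        "Answer:"
    )
    output = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=16,
    )
    return output['choices'][0]['text'].strip()

print(ask("What is the largest city in Texas?"))  # likely "Houston"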

To start making our prompt for this task, we first collect some examples of what we want. This took maybe half an hour of going through the existing cards and putting them into a spreadsheet. We will be using text-davinci-003, commonly referred to as GPT-3.5, as our language model.

For our first version of the prompt, let's not get fancy: just list all of the examples and add, "Generate more questions." In Python, this is just,

import os
import openai
import pandas as pd

openai.api_key = os.getenv("OPENAI_API_KEY")

df = pd.read_csv("ubi.csv").dropna()
questions = df['Question'].tolist()
prompt = "\n".join(questions)
prompt += "\nGenerate more questions.\n"

output = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=128,
)
print(output['choices'][0]['text'])

The prompt looks like,

Ubi Aral Sea be?
Ubi a Czech first brandish a Bren gun?
...
Ubi Cretan capital crouch?
Ubi Nebraska’s southeast corner clip?
Generate more questions.

What is the most likely completion of this text? More Ubi questions. Here are the first few it generated:

Ubi highest peak of the Pyrenees perch?
Ubi Rio Grande rise?
Ubi Germany’s Black Forest stand?
Ubi Petra’s ancient tombs tucked?
Ubi first Easter Egg Roll rolled?
Ubi Niagara Falls from?
Ubi first Grand Slam tennis tournament spun?
Ubi Mississippi River mouth meet?
Ubi Dead Sea deep?
Ubi Black Death first strike?
Ubi Inca Empire’s capital clung?

Some of these are good, some of these are bad. The main problem is that they don't have the same consistent alliteration and flow as the originals. Let's fix this by using the best tool in the prompt engineer's toolbox: asking nicely. We can update our prompt to say "Generate more questions. Pay special attention to the alliteration and general flow." In code, this is a one-line change, sketched below.
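A rough sketch, reusing the questions list from the script above:

# Same prompt as before, with the new instruction appended at the end.
prompt = "\n".join(questions)
prompt += ("\nGenerate more questions. "
           "Pay special attention to the alliteration and general flow.\n")

Now, we get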

Ubi Rhine River roll?
Ubi Picasso paint a portrait?
Ubi the Four Horsemen ride?
Ubi Lake Ontario’s eastern edge edge?
Ubi King Arthur’s castle court?
Ubi the Grand Banks yield a bounty?
Ubi the first steam locomotive lurch?
Ubi the Danube Delta docks?
Ubi Suez Canal span?
Ubi the Golden Gate bridge bridge?
Ubi the Andes’ highest peak peak?

Better, but many of these alliterations are a stretch ("Ubi the Golden Gate bridge bridge?" barely qualifies). Another way we can improve performance is by honing our samples. Right now we are using a random collection of samples. Instead, we can go through, pick only the best of the best, and use those for the prompt, as sketched below.
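A minimal sketch, assuming the hand-picked questions are saved to ubi_select.csv (the same file we reuse later) and keeping the same instruction:

# Build the prompt from the hand-curated questions only.
df = pd.read_csv("ubi_select.csv").dropna()
prompt = "\n".join(df['Question'].tolist())
prompt += ("\nGenerate more questions. "
           "Pay special attention to the alliteration and general flow.\n")

Now, we get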

Ubi Niagara plunge its powerful pound?
Ubi Alcatraz’s alumni alight?
Ubi Seine’s source swirl?
Ubi Solzhenitsyn’s soliloquy spoken?
Ubi Sydney Harbour’s hulls held?
Ubi Lapland’s lights lighten?
Ubi Falkland’s future fate?
Ubi Dutch tulips twinkle?
Ubi Colosseum’s colossal crowds clap?
Ubi St. Paul’s steeple stretch?
Ubi Fort Knox gold glimmer?
Ubi Champs-Élysées’ chic shows start?

Looks great. Another fun thing we can do is generate Ubi questions with specific answers, by rearranging the prompt to put the answers first. To get some questions about Boston, we can run:

import os
import openai
import pandas as pd

openai.api_key = os.getenv("OPENAI_API_KEY")

df = pd.read_csv("ubi_select.csv").dropna()
questions = df['Question'].tolist()
answers = df['Answer'].tolist()

qa_pairs = zip(questions, answers)
lines = list(map(lambda qa: f'{qa[1]}: {qa[0]}', qa_pairs))
prompt = "\n".join(lines)
prompt += "\nBoston, MA:"

output = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=64,
)
print(output['choices'][0]['text'])

The prompt looks like,

Vatican City: Ubi Ali Agca point a pistol at the Pope?
Northern Uzbekistan: Ubi Aral Sea be?
Brno, Czechoslovakia: Ubi a Czech first brandish a Bren gun?
Naples, Fla.: Ubi Alligator Alley’s western exit at?
London: Ubi Old Vic sit?
Calais, France: Ubi Florence Chadwick challenge the Channel?
Salt Lake City: Ubi Brigham Young set a city?
Paris: Ubi opera had the Phantom found?
The Atlantic Ocean: Ubi Amazon River deliver?
Paris: Ubi Tour de France final furlong found?
Blenheim Palace, England: Ubi whereabouts of Winnie’s Blenheim birthplace?
Boston, MA:

What's the most likely completion of this text? A question about Boston. The model generates:

Ubi Tea Party’s flotilla float?
Ubi Celtics cinch championships?
Ubi Bay State’s capital bustle?
Ubi Paul Revere's ride spied?

Answering Ubi Questions

Just being able to generate a question is only so useful; we also need answers. Let's make a prompt that forces the model to think "step-by-step" through the answering process. First, we want to translate the question out of the Ubi format into natural language. Then, we want to know the answer. We can create three examples of doing this by hand and use them as the prompt:

Question: Ubi subway titled Tube?
Translation: Where do they call the subway the "Tube?"
Answer: London

Question: Ubi WWII’s first bomb boom?
Translation: Where was the first engagement in WWII?
Answer: Puck, Poland

Question: Ubi Gerry Faust get the oust?
Translation: Where was the college football coach Gerry Faust famously fired from?
Answer: South Bend, Ind.

Now, if we prompt the model with one of its own generated questions,

Question: Ubi D-Day's dawns' deadly drama?

We get,

Translation: Where did the D-Day landings take place?
Answer: Normandy, France

Which is correct.
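For completeness, here is roughly what that call looks like, assuming the three hand-written examples are stored in a string (truncated here for display):

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# The three Question/Translation/Answer examples shown above,
# truncated here for display.
answer_prompt = """
Question: Ubi subway titled Tube?
Translation: Where do they call the subway the "Tube?"
Answer: London
...
"""

# Append the new question; the model completes the Translation and Answer.
prompt = answer_prompt + "Question: Ubi D-Day's dawns' deadly drama?\n"

output = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=64,
)
print(output['choices'][0]['text'])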

However, if we want to generate a lot of new questions, we don't want to have to go through each one and check that it's accurate. Getting an Ubi question "wrong" because the answer on the card is wrong is, as you can imagine, a very frustrating experience.

Let's see if we can ground the model by teaching it to use a search engine. We can use LangChain's Bing Search API wrapper to query the web and return some basic info. For example, if we run:

import os
import openai
from langchain.utilities import BingSearchAPIWrapper

os.environ["BING_SUBSCRIPTION_KEY"] = \
    os.getenv("BING_SUBSCRIPTION_KEY")
os.environ["BING_SEARCH_URL"] = \
    "https://api.bing.microsoft.com/v7.0/search"
openai.api_key = os.getenv("OPENAI_API_KEY")

question = "Where do they call the subway the \"Tube?\""
search = BingSearchAPIWrapper()
print(search.results(question, 5))

We will get an output that looks like,

"[{'snippet': 'The first metro was opened in London and later most of it was soon built underground (under the city), so it was then <b>called</b> THE UNDERGROUND, even to this day. But in general, in the UK we usually <b>call</b> it THE TUBE, because it mostly goes (or went) inside a tunnel, a tube.', 'title': 'Underground / Subway / Metro / Tube - Multimedia-English', 'link': 'https://multimedia-english.com/grammar/underground-subway-metro-tube-59'}, {'snippet': '“Tube” is only used for underground trains in London. The official name is the “Underground”. The first underground railways, the Metropolitan Railway, and the District and Metropolitan Railway, were built to the normal British loading gauge, so the coaches were the normal size for Britain.', 'title': 'Why do British people call an underground train or subway a &#39;tube&#39;?', 'link': 'https://www.quora.com/Why-do-British-people-call-an-underground-train-or-subway-a-tube'}, {'snippet': 'While stations seem to be busier than ever, London Underground trains have been running below our feet for 156 years now. And for most of its continually evolving history the network has been known simply as &quot;the Tube&quot;. It first came about almost 30 years after the first tracks were laid and tunnels dug. But <b>do</b> you know why?', 'title': 'Why the London Underground is commonly called the Tube', 'link': 'https://www.mylondon.news/news/west-london-news/why-london-underground-called-tube-14976587'}, {'snippet': '<b>Subway</b> is the main American term, but I&#39;ve actually heard a handful of people say metro. In New York we usually actually just <b>call</b> it the train. Tube and underground are British as far as I know. I&#39;m not sure about metro; I know it&#39;s used in some other parts of Europe (France, Russia, etc) but I don&#39;t know how common it in England specifically.', 'title': 'Metro, subway, tube or underground? : r/EnglishLearning - reddit', 'link': 'https://www.reddit.com/r/EnglishLearning/comments/e1sfj1/metro_subway_tube_or_underground/'}]"

What we would like to do is find the relevant text from the snippets that answers our question, and return the link as a citation. We can do this manually for the same set of three questions as before:

Question: Where do they call the subway "Tube?"
Web Results: "[{'snippet': 'The first metro was opened in London and later most of it was soon built underground (under the city), so it was then <b>called</b> THE UNDERGROUND, even to this day. But in general, in the UK we usually <b>call</b> it THE TUBE, because it mostly goes (or went) inside a tunnel, a tube.', 'title': 'Underground / Subway / Metro / Tube - Multimedia-English', 'link': 'https://multimedia-english.com/grammar/underground-subway-metro-tube-59'}, {'snippet': '“Tube” is only used for underground trains in London. The official name is the “Underground”. The first underground railways, the Metropolitan Railway, and the District and Metropolitan Railway, were built to the normal British loading gauge, so the coaches were the normal size for Britain.', 'title': 'Why do British people call an underground train or subway a &#39;tube&#39;?', 'link': 'https://www.quora.com/Why-do-British-people-call-an-underground-train-or-subway-a-tube'}, {'snippet': 'While stations seem to be busier than ever, London Underground trains have been running below our feet for 156 years now. And for most of its continually evolving history the network has been known simply as &quot;the Tube&quot;. It first came about almost 30 years after the first tracks were laid and tunnels dug. But <b>do</b> you know why?', 'title': 'Why the London Underground is commonly called the Tube', 'link': 'https://www.mylondon.news/news/west-london-news/why-london-underground-called-tube-14976587'}, {'snippet': '<b>Subway</b> is the main American term, but I&#39;ve actually heard a handful of people say metro. In New York we usually actually just <b>call</b> it the train. Tube and underground are British as far as I know. I&#39;m not sure about metro; I know it&#39;s used in some other parts of Europe (France, Russia, etc) but I don&#39;t know how common it in England specifically.', 'title': 'Metro, subway, tube or underground? : r/EnglishLearning - reddit', 'link': 'https://www.reddit.com/r/EnglishLearning/comments/e1sfj1/metro_subway_tube_or_underground/'}]"
Relevant Snippet: “Tube” is only used for underground trains in London.
Relevant Link: https://www.quora.com/Why-do-British-people-call-an-underground-train-or-subway-a-tube
Answer: London

Question: Where was the first engagement in WWII?
Web Results: "[{'snippet': 'USS Lexington explodes during the Battle of the Coral Sea. A formation of Spitfires shortly before World <b>War II</b>. This is a list of military engagements of World <b>War II</b> encompassing land, naval, and air engagements as well as campaigns, operations, defensive lines and sieges.', 'title': 'List of military engagements of World War II - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/List_of_military_engagements_of_World_War_II'}, {'snippet': 'The attack on the United States gunboat USS Panay on 12 December 1937 by Japanese forces in China (usually referred to as the Panay incident) could be considered as the <b>first</b> hostile American action during World <b>War II</b>.', 'title': 'First American engagement in World War II - Military Wiki', 'link': 'https://military-history.fandom.com/wiki/First_American_engagement_in_World_War_II'}, {'snippet': 'Scholars have identified various events as being the <b>first</b> <b>engagement</b> of neutralUnited Statesin World War IIbefore the attack on Pearl Harbor. They disagree on which events led to formal entry of the United States into the conflict. Contents 1Attacks on Americans 2Attacks by the U.S. military 2.1Germany 2.2Japan 3See also 4References', 'title': 'First engagement of neutral United States in World War II before the ...', 'link': 'https://en.wikipedia.org/wiki/First_American_engagement_in_World_War_II'}, {'snippet': 'With Adolf Hitler leading a German invasion of Poland in 1939, World <b>War II</b> was launched, a deadly global conflict waged across Europe and the Pacific until 1945. Bloody battles raged between the...', 'title': 'World War II Battles: Timeline - HISTORY', 'link': 'https://www.history.com/topics/world-war-ii/world-war-ii-battles-timeline'}]"
Relevant Snippet: With Adolf Hitler leading a German invasion of Poland in 1939, World <b>War II</b> was launched 
Relevant Link: https://www.history.com/topics/world-war-ii/world-war-ii-battles-timeline
Answer: Poland

Question: Where was the college football coach Gerry Faust famously fired from?
Web Results: "[{'snippet': 'In 1986, <b>Faust</b> was hired by the University of Akron after the school <b>fired</b> head <b>coach</b> Jim Dennison. Dennison, who is the Akron career wins leader for <b>football</b>, was forced out by university president, William Muse and athletic director, Dave Adams.', 'title': 'Gerry Faust - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Gerry_Faust'}, {'snippet': '<b>Faust</b>, <b>famously</b> plucked from Cincinnati Moeller High School to <b>coach</b> Notre Dame in the early 1980s, went 43-53-3 from 1986-1994. Like Arth, Owens was a local product, and a high school...', 'title': 'The Akron Zips have fired all their head coaches since 1995. Here&#39;s who ...', 'link': 'https://news.yahoo.com/akron-zips-fired-head-coaches-174645101.html'}, {'snippet': 'CINCINNATI -- In 1960, <b>Gerry</b> <b>Faust</b> pulled a <b>football</b> team out of thin air.. With donated equipment, Archbishop Moeller High School&#39;s first <b>football</b> team -- a reserve squad -- went 4-4. By 1962 ...', 'title': 'From the Vault: Gerry Faust takes Notre Dame job - WCPO', 'link': 'https://www.wcpo.com/news/our-community/from-the-vault/from-the-vault-gerry-faust-puts-moeller-football-on-the-map-leaves-for-notre-dame-after-state-game'}, {'snippet': 'Around 1 p.m. Saturday when a white Moeller transportation van rolled into the private facility along the Little Miami River, <b>Gerry</b> <b>Faust</b> was given a hero&#39;s welcome. He turned 86 on Friday and...', 'title': 'Moeller&#39;s finest honor former football coach Gerry Faust for his birthday', 'link': 'https://www.cincinnati.com/story/sports/high-school/high-school-sports/2021/05/22/moellers-finest-honor-former-football-coach-gerry-faust-his-birthday/5201457001/'}]"
Relevant Snippet: None
Relevant Link: None
Answer: Not listed

It's important to note that for the last question, the web search didn't give us the correct answer. In this case, we would like the model to simply decline to answer.

Now, we can automate this process by using the above as another prompt to the model. By then adding "Question: Where did the D-Day landings take place?" and the web search results, we can have the model answer and cite its sources itself.

import os
import openai
from langchain.utilities import BingSearchAPIWrapper

os.environ["BING_SUBSCRIPTION_KEY"] = \
    os.getenv("BING_SUBSCRIPTION_KEY")
os.environ["BING_SEARCH_URL"] = \
    "https://api.bing.microsoft.com/v7.0/search"

openai.api_key = os.getenv("OPENAI_API_KEY")

# Prompt truncated for display. See above for full prompt
prompt = """
Question: Where do they call the subway "Tube?"
...
"""
question = "Where did the D-Day landings take place?"
search = BingSearchAPIWrapper()
results = str(search.results(question, 5))

prompt += f"Question: {question}\nWeb Results: {results}\n"

output = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=128,
)
print(output['choices'][0]['text'])

This will output something like:

Relevant Snippet: The Normandy landings were the landing operations and associated airborne operations on Tuesday, 6 June 1944 of the Allied invasion of Normandy in Operation Overlord during World War II.
Relevant Link: https://en.wikipedia.org/wiki/Normandy_landings
Answer: Normandy, France

Although this process is still prone to some errors, the errors are a lot easier to catch.

AI-Generated Ubi

Now we can generate Ubi questions at scale by following a human-in-the-loop algorithm:

  1. Generate some questions.
  2. Generate answers for those questions.
  3. Generate citations for those answers.
  4. Verify everything checks out.

We can even connect all of these components into a single script to generate questions on the fly. We can then just keep generating questions over and over and over, and curate the best ones we find.
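As a sketch of what that script could look like (the helper names, prompt constants, and loop structure here are our own; the three prompts are the ones built above, truncated for display):

import os
import openai
from langchain.utilities import BingSearchAPIWrapper

openai.api_key = os.getenv("OPENAI_API_KEY")
search = BingSearchAPIWrapper()

# The three prompts built above, truncated here for display.
GENERATE_PROMPT = "..."  # curated questions + "Generate more questions. ..."
ANSWER_PROMPT = "..."    # Question/Translation/Answer examples
CITE_PROMPT = "..."      # Question/Web Results/Snippet/Link/Answer examples

def complete(prompt, max_tokens=128):
    output = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
    )
    return output['choices'][0]['text']

while True:
    # 1. Generate a batch of candidate questions.
    for question in complete(GENERATE_PROMPT).strip().split("\n"):
        # 2. Translate the question and draft an answer.
        answer = complete(f"{ANSWER_PROMPT}Question: {question}\n").strip()
        translation = answer.split("\n")[0].replace("Translation:", "").strip()
        # 3. Search the web and generate a citation for the answer.
        results = str(search.results(translation, 5))
        citation = complete(
            f"{CITE_PROMPT}Question: {translation}\nWeb Results: {results}\n"
        )
        # 4. Print everything for a human to verify.
        print(question, answer, citation, sep="\n")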

Here are some AI-generated questions:

[Interactive question widget]

Conclusion

This entire process was relatively quick and painless. Nowadays, it is hard to justify the expense of paying someone to do this. The total cost of API usage was well under $20, and the coding took an afternoon. You could sit here all day generating and collecting new questions, continually honing your prompt to get the exact behavior you want. The only limit is your own patience.

In fact, this is what has happened to Jeff Koons' studio. The layoffs were mostly in favor of increasing automation. 1 It is not hard to understand why. This loop is almost exactly what we would get if we replaced the API call with a person: we started with a vague set of criteria and, by seeing the responses to those criteria, slowly adjusted them to align with the output we desired. The value of these generative models is that they give us a medium that is fast, cheap, and reproducible. You don't need to hire a writer; you just need to pay OpenAI a few cents.

Even writing this post was a similar experience. All of the code for the simple JavaScript widget used to display the questions above was mostly written via a collaborative feedback loop with ChatGPT. I would ask for a certain feature, it would generate the code, and then I would describe how I would like the code to be changed. The final version is only slightly edited from the verbatim output.

This loop is going to become more and more pervasive in our lives. From software development, to creative writing, to potentially even things like medicine, we are all going to become more like Jeff Koons.

And hopefully we are going to be better off for it, but who knows.

Footnotes:

2

You buy it because you want to impress your rich friends. Or as an investment. Or for a million other reasons, but you get the point.