September 2024: A meme was going around Bluesky: use the CLIP Interrogator, an applet running on huggingface.co, to reverse-engineer the prompt that would produce a particular picture. Upload a photo, pick a setting (or acquiesce to the default), and generate a prompt. In its own words: “Want to figure out what a good prompt might be to create new images like an existing one? The CLIP Interrogator is here to get you answers!”
The game was fun insofar as the descriptions read and regurgitated the words most likely to generate gender in an image. The punctuation of each blurb was infested with comma splices, and the vocabulary was a combination of proper nouns, nationalities, regular nouns, and some adjectives and adverbs.
What could this model actually mean when describing my picture as:
a close up of a woman with a donut in her hand, office cubicle background, made in 2019, loosely cropped, forehead jewelry, in australia, jordan, wearing a cute top, post - apokalyptic, brawny, wearing a haori, angela white, ebay product, mexican, uncropped
I was so smitten with this nonsensical description. Bubba was not amused upon receiving my giddy yet spacy explanation of what it was. “Did you use the text to generate an image?”
No. Not for several days, as I slowly coped with the idea of needing to pick a platform to use my prompt to generate my image. Most likely there is an option on huggingface.co, but I am too ignorant of its technical specifications to make anything of it.
I chose Craiyon to do my experiment because it didn’t require making an account first. I only went through the first five results in my Google search, to be honest.
I fed it my prompt:
I watched as the processor counted down on the screen of my phone. The Western Addition and then USF and then the Richmond flew by outside the bus. I waited for my AI pictures to develop. Once they came up, an initial choice of nine, they were … hilarious. Enchanting, even. Amusing to consider as an alter ego. Enlightening in that the CLIP Interrogator generated instructions that Craiyon seemed to interpret correctly.
CLIP Interrogator said the woman has a donut in her hand because if the cloud-shaped “D” on the bag in the background were a donut (which it thinks it is, based on the shape), the woman would have to be holding it close to her face for it to be in a close-up. How does a human hold a donut? With her hand. No one really knows what the random countries signify, or the haori, or what an ebay product is, except that one image from Craiyon shows the woman holding up tickets. In my real picture, there are ticket-like ephemera hanging in the background on my cubicle wall. The models that the two products are built on both know the words to interpret ticket-like ephemera.
All the other images of generated women with donuts were too pretty. If this was to be my effigy, it would have to be a little bit more ugly, something uneven, fingers spaced out and melding together, a white T-shirt with an illustration of birds. Her earrings hang low, removed from her lobes. But her hair is so flat and sleek, parted straight down the middle and not one flyaway in sight.