Inspired by a Seth Godin blog post, we put a few LLMs to the test to see which would prevail across key services in our business: marketing campaigns, design, development, copywriting, and custom Slack emojis.
With a seemingly new system popping up every day, we couldn’t help but wonder: Would the same prompt elicit different responses? Would one language model prove superior to the others? Would the team be satisfied with the outcomes or demand a retake? And could this be our chance to improve the reaction options of Slack users everywhere, and use AI to generate a :new-bagel:?
It seemed that there was only one way through: establish a panel of judges, conduct a blind test, and assess multiple criteria related to our work at Manual Labor.
Methodology
LLMs included in this exercise: Claude, ChatGPT, Firefly, Meta AI, and Runway. Note: ChatGPT 4 was used for the Tailwind CSS challenge (prompt #3) and ChatGPT 4o was used for text-to-image generation (prompt #4). For all other tasks, the free version of each LLM was used.
As an agency that connects the dots through marketing, design, and messaging, we wanted our prompts to cover the range of our work. The five challenges covered the following subjects:
- Social media content writing
- Project management
- UI design to front-end code conversion
- Text-to-image generation
- Content writing
Please note: this is a longer read with a lot of information (and LLM outputs!). To help you navigate, we've included quick links above so you can easily jump between sections or focus on the ones that interest you most, whether it's text-to-image generation or project management.
Meet the Judges
Sam's Take on AI:
There's so much discussion around how agencies are using AI. Our recent work with Cognitiv proved to be a great exploration into how AI became a partner. Let it work for you, not in place of you. AI may accelerate execution but the fresh ideas will still come from the creatives.
Todd's Take on AI:
I’m a big believer in AI in the right places, at the right time. Very quickly, AI will significantly help to eradicate diseases. There is so much we can do around “AI for good,” but of course the pendulum swings the other way to war, theft, and taking advantage of people. For our work, it’s a place that can get us to an outcome more quickly based on our big brains and strategy.
Deanna's Take on AI:
To me, AI is a multifaceted field of science where people are doing their best to create tools that help people live and do their work with more ease. I enjoy learning about what is possible and what will soon be possible with AI with feelings of hope, curiosity and excitement.
Judging Criteria
To answer “What was the best output?”, I asked ML’s judges to focus on three criteria:
- Clarity: was the output direct, simple, with no jargon? In the case of evaluating copywriting, did the output sound human?
- Comprehension: did the output fully answer the question, or leave key details out?
- Accuracy: was the output correct, or did the LLM hallucinate?
LLM Challenges
Challenge #1: Social Media Content Writing
Judges (all): Deanna, Todd, Sam
Prompt: Pretend you’re an expert copywriter with a strong understanding of social media best practices. You are working with Manual Labor (ML), a female-and minority owned creative and marketing studio. Suggest three Instagram caption options for ML to use on a post for its 12th Anniversary. Provide captions in a tabular format, limiting each idea to 1 sentence. When copywriting, analyze the following copy for tone of voice, and emulate it in your output. The copy is: “Our ultimate goal is to help companies better define who their brand is (the promise) and how they connect with customers (the experience). And we do that by leveraging messaging, design, and technology to connect the dots between who you are, what you do, and why it matters. Our partnership begins when we establish what success looks like together.”
Output 1:
Output 2:
Output 3:
Judges’ Feedback:
- Todd: Output 3. Starts higher and gets to the customer.
- Sam: Output 3. It feels like a great combination of casual tone and substantive content.
- Deanna: Output 1 options feel most helpful as guidance for social media writing. I think the captions included most comprehensively answers the prompt and takes into account the context of posting for social media and the “why” of the post.
LLM Reveal:
- Output 1: ChatGPT 3.5
- Output 2: Claude
- Output 3: Meta.AI 🏆
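For anyone who wants to run a similar blind test, the tally-then-reveal step can be sketched in a few lines of TypeScript. This is only an illustration, not the tooling we used: the `Vote` type, the `tallyVotes` function, and the `revealKey` lookup are hypothetical names, while the votes and the reveal are the ones reported for Challenge #1 above.

```typescript
// Illustrative sketch only: type, function, and variable names are hypothetical.
interface Vote {
  judge: string;
  output: number; // blind label shown to the judge (1, 2, 3, ...)
}

// Count votes per blind label and return every label tied for the most votes.
function tallyVotes(votes: Vote[]): { counts: Record<number, number>; winners: number[] } {
  const counts: Record<number, number> = {};
  for (const vote of votes) {
    counts[vote.output] = (counts[vote.output] || 0) + 1;
  }
  let best = 0;
  for (const label of Object.keys(counts)) {
    best = Math.max(best, counts[Number(label)] || 0);
  }
  const winners = Object.keys(counts)
    .map(Number)
    .filter((label) => counts[label] === best)
    .sort((a, b) => a - b);
  return { counts, winners };
}

// Votes as reported in the judges' feedback for Challenge #1.
const challengeOneVotes: Vote[] = [
  { judge: "Todd", output: 3 },
  { judge: "Sam", output: 3 },
  { judge: "Deanna", output: 1 },
];

// The key stays hidden from the judges until after they vote.
const revealKey: Record<number, string> = {
  1: "ChatGPT 3.5",
  2: "Claude",
  3: "Meta.AI",
};

const challengeOneResult = tallyVotes(challengeOneVotes);
const challengeOneWinners = challengeOneResult.winners.map((label) => revealKey[label]);
console.log(challengeOneWinners); // [ 'Meta.AI' ]
```

Returning every label tied for first place, rather than a single winner, keeps the sketch honest for split panels.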
Challenge #2: Project Management
Judge: Deanna
Prompt: Pretend you’re the head of operations for a marketing and design agency. You're building a Google Sheets template to monitor the company's marketing activities - including content writing, press outreach, and social posts. What top 5 fields would be included? Limit justification for each field to 30 characters. Provide feedback in a table format.
Output 1:
Output 2:
Judge’s Feedback:
- Deanna: Output 2 answers the prompt best, providing high-level fields appropriate for the role described in the prompt.
LLM Reveal:
- Output 1: ChatGPT 3.5
- Output 2: Claude 🏆
Challenge #3: UI Design to Front-End Code Conversion
Judge: Deanna
Prompt: Convert this design into a react component using tailwind css
Output 1:
To convert this design into a React component using Tailwind CSS, we can create a new React component file and structure the HTML and CSS accordingly. Here's an example:
In this example, we've created a new React component called ConnectSection. The component renders a div with a green background (bg-green-300) and some padding (py-8). Inside this div, we have a container div with a maximum width of 4xl and some horizontal padding (max-w-4xl mx-auto px-4 sm:px-6 lg:px-8).
Inside the container, we have an h2 element for the "Let's Connect!" heading, a p element for the body text, a button element for the "Contact us" button, and a span element for the "Say hello" text.
The Tailwind CSS classes are applied to the respective elements to match the design. For example, text-3xl font-bold is used for the heading, text-lg for the body text, bg-black text-white py-2 px-4 rounded-md for the button, and text-gray-600 ml-4 for the "Say hello" text.
You can then import and use this ConnectSection component in your React application wherever you need it.
Make sure to include the Tailwind CSS styles in your project, either by importing the Tailwind CSS file or by using a build tool like PostCSS to integrate Tailwind CSS with your project.
Output 2:
To convert the design you provided into a React component using Tailwind CSS, I'll create a basic structure for you. This will include the text, a button, and styles using Tailwind CSS classes. Here's a simple example to get you started:
Explanation:
- Container Style: The container (div) uses bg-teal-200 for a teal background, text-center for centering the text, and p-10 for padding.
- Title: The title (h1) is styled with text-4xl for size, font-bold for font weight, and mb-3 for margin-bottom.
- Paragraph: The paragraph (p) uses text-xl for the text size and mb-5 for margin-bottom.
- Button: The button is styled with bg-gray-800 for a dark gray background, text-white for white text, px-5 and py-2 for padding, and rounded for rounded corners. It includes hover effects with hover:bg-gray-700.
- Signature Text: Another paragraph at the bottom is styled italic with italic and text-lg for a larger text size.
Notes:
- Colors and Styling: Adjust the color codes and sizes as per your exact design specifications. Tailwind CSS offers a wide range of classes for customization.
- Responsiveness: You might want to add responsive classes using Tailwind's breakpoints to ensure the component looks good on all devices.
- Interactivity: For real-world usage, the button could include functionality, like opening a contact form or linking to a contact page.
This code snippet should be placed within a React project where Tailwind CSS is properly configured. If you need help setting up Tailwind CSS with your React project or further customization, feel free to ask!
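To make the class list in Output 2's explanation easier to picture, here is a rough sketch of how those Tailwind classes could fit together. It is our own illustration and not ChatGPT's actual output. To keep it dependency-free, it is written as a plain TypeScript function that returns static HTML, so it uses `class` where a React component would use `className`. The `ConnectCopy` type, the `renderConnectSection` and `escapeHtml` functions, and the placeholder body copy are all assumptions; the heading, button, and sign-off text are taken from Output 1's description of the same design.

```typescript
// Illustrative sketch only: names and placeholder copy are assumptions,
// and the markup is static HTML rather than a React component.
interface ConnectCopy {
  title: string;
  body: string;
  buttonLabel: string;
  signature: string;
}

// Escape characters that would otherwise be read as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderConnectSection(copy: ConnectCopy): string {
  return [
    // Container: teal background, centered text, padding.
    '<div class="bg-teal-200 text-center p-10">',
    // Title: large, bold, with margin below.
    '  <h1 class="text-4xl font-bold mb-3">' + escapeHtml(copy.title) + "</h1>",
    // Paragraph: larger body text with margin below.
    '  <p class="text-xl mb-5">' + escapeHtml(copy.body) + "</p>",
    // Button: dark gray, white text, padding, rounded corners, lighter on hover.
    '  <button class="bg-gray-800 text-white px-5 py-2 rounded hover:bg-gray-700">' +
      escapeHtml(copy.buttonLabel) +
      "</button>",
    // Signature text: italic and slightly larger.
    '  <p class="italic text-lg">' + escapeHtml(copy.signature) + "</p>",
    "</div>",
  ].join("\n");
}

const connectHtml = renderConnectSection({
  title: "Let's Connect!",
  body: "Placeholder body copy.", // assumption: the design's body text is not quoted in this post
  buttonLabel: "Contact us",
  signature: "Say hello",
});
console.log(connectHtml);
```

In a real React project the same class strings would go on JSX elements via `className`, with Tailwind configured as Output 2 notes.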
Judge’s Feedback:
- Deanna: Despite output 2 including a waving emoji where the original design does not have one, output 2’s LLM has provided clearer implementation instructions and has called out needing to also add styling, responsiveness and interactivity.
LLM Reveal:
- Output 1: Claude
- Output 2: ChatGPT 4 🏆
Challenge #4: Text-to-Image Generation
In October 2018, we were outraged in New York. Apple released a bagel emoji that was too shiny and round. It was lacking substance. There was no cream cheese. Apple seemingly heard the outcry and updated the emoji in iOS 12.1. Here is the highlight in Entrepreneur magazine as a refresher.
Six years later, 2024 presents us with a new opportunity: Revamp Slack’s bagel emoji.
Could this be our chance to improve the reaction options of Slack users everywhere, and use AI to generate a :new-bagel:? Maybe. Are we one step away from losing the original premise of this blog post? Potentially.
Judges (all): Deanna, Todd, Sam
Prompt: Create an image of a superior New York bagel.
Output 1, 2, and 3:
Judges’ Feedback:
- Deanna: Output #1! Truly superior and most apt for emoji inspiration. (Why does output 3 look like there is cheese and mayo?!)
- Todd: Can I vote for none? :-)
- Sam: Output #1 if I had to choose. While it looks more like a store bought freezer bagel, it’s better than the bialys or the bagel paper sandwich.
LLM Reveal:
- Output 1: ChatGPT 4o 🏆 (But honestly, we think the slackmoji is probably the best)
- Output 2: Adobe Firefly
- Output 3: Runway
Challenge #5: LLM or Shel? (AKA Content Writing)
Instructions for Judges: Read the following poem and guess whether it was written by (1) American writer and poet Shel Silverstein or (2) OpenAI’s new flagship model, GPT-4o.
SILLY OLD JOE
In a house of socks, on a hill of shoes,
Lived a quirky old man who had nothing to lose.
He juggled his hats and he danced with his ties,
And he sang silly songs to the starry night skies.
He painted his walls with the colors of dreams,
And sailed paper boats down imaginary streams.
With a whisk and a spoon, he'd stir up delight,
Making moon-pie soufflés in the still of the night.
The townsfolk would laugh, call him "Silly Old Joe,"
But he’d smile and he'd wink, for they’d never quite know,
That the magic he shared, from his toes to his chin,
Was the secret of joy that came from within.
Judges’ Feedback:
- Deanna: ChatGPT. The paired rhyming structure seems too obvious.
- Sam: ChatGPT. There’s no such thing as moon-pie soufflés.
- Todd: KellyGPT?
Reveal:
- ChatGPT 4o’s poem fooled no one.
Final Takeaways
No single clear winner emerged from the Battle of the LLMs, which makes it fun for us to continue experimenting. Some interesting insights: in the social media content writing challenge, Meta.AI was called out for its substantive content and casual tone, whereas ChatGPT was noted for comprehensive captions that offered helpful guidance for the same prompt.
ChatGPT 4’s output for the UI design to front-end code conversion stood out for its clear implementation instructions and its callouts on styling, responsiveness, and interactivity. When it comes to content writing, however, ChatGPT 4o’s tone was easily detected.
(And we will continue the search for the :superior-bagel:).