Welcome to the future! The generative AI space is moving so fast, it can be a little overwhelming in terms of what's possible or how it works in the context of events. We’re doing our best to keep up ourselves, so we’ll keep this article up to date as a helpful reference.
The New Era of Photo Booths
We all know photo booths from corporate parties, weddings, and every industry event. They’re fun, generate great content, and continue to get more immersive. They also require a physical, in-person footprint, which became a problem when everything changed in 2020.
First with virtual, then hybrid, and now with generative AI, photo booths have remained a staple of events - but have completely transformed into immersive, digitally connected experiences.
In this article we’ll guide you through how AI photo booths work, the different types, what you can and can’t do, and what’s coming in the future.
How AI Photo Booths Work
Most simply, an AI photo booth can learn from a picture of a person, and then generate a portrait of that person from scratch - transformed into a new version of themselves. The style and context of the transformation are driven by a text prompt that is custom scripted for each experience.
The software is web-based, meaning it can be loaded on any device with a web browser. Usually this is someone’s phone, allowing them to use their high-powered phone camera (and camera roll), or an iPad that’s set up to provide an interactive, in-person experience.
It can also work with a more traditional physical photo booth as long as it’s able to load a web browser. At Snapbar we’re focused on software only and making it as accessible as possible.
Check out our AI Lookbook for quick inspo on what you can do
Understanding AI Photo Booth Technology
First, let’s define the general type of AI technology being used - generative AI, specifically Large Image Models (LIM).
Generative AI is a type of artificial intelligence (AI) that can create new content, such as images, text, and music. Generative AI models are trained on large datasets of existing content, such as text and images, and they use this data to learn the patterns and rules that govern how that content is created. Once the generative AI model has been trained, it can be used to create new content based on what it knows and understands.
There are two main functions of the AI technology being leveraged to power an AI Photo Booth.
On the larger scale, this is a massive AI model that ingested much of the internet and can now create stunning imagery based on text prompts. This becomes the background or context of the photo booth portrait.
On the smaller scale, this is an AI model trained on 1 or more photos of a person to learn everything it can - about the intricacies of facial features, hair, skin tone, eyes, teeth, clothing, etc. This model can then be used to generate a new version of that person in the pre-defined context.
The accuracy of the output, and how well it represents a user’s characteristics, are highly dependent on the quality of the input photo. If there is poor lighting or the photo is blurry from a low quality camera, the AI has to infer details to fill the gaps and may misinterpret what it sees. This is often why skin tone changes, or why the portrait lacks an overall likeness to the actual user.
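To make "poor lighting" and "blurry" a bit more concrete, here is a minimal sketch of the kind of pre-check that could flag a weak input photo before it reaches the AI. This is an illustration only, not Snapbar's actual pre-processing: the function names, the thresholds, and the simple neighbor-difference sharpness measure are all assumptions.

```python
from statistics import mean

# Hypothetical thresholds for illustration only; real values would need tuning.
MIN_BRIGHTNESS = 60    # average pixel value on a 0-255 scale
MIN_SHARPNESS = 10.0   # average difference between neighboring pixels


def brightness(gray):
    """Average pixel value of a grayscale image given as rows of 0-255 ints."""
    return mean(value for row in gray for value in row)


def sharpness(gray):
    """Average absolute difference between horizontally adjacent pixels.

    Blurry photos have soft edges, so neighboring pixels differ very little.
    """
    diffs = [abs(row[i + 1] - row[i]) for row in gray for i in range(len(row) - 1)]
    return mean(diffs)


def quality_warnings(gray):
    """Return a list of human-readable warnings for a grayscale photo."""
    warnings = []
    if brightness(gray) < MIN_BRIGHTNESS:
        warnings.append("too dark: add light or move away from shadows")
    if sharpness(gray) < MIN_SHARPNESS:
        warnings.append("too blurry: hold the camera steady or clean the lens")
    return warnings


# Tiny synthetic examples: a dark, flat image and a bright, high-contrast one.
dark_flat = [[20, 21, 20, 22] for _ in range(4)]
bright_crisp = [[40, 200, 50, 220] for _ in range(4)]

print(quality_warnings(dark_flat))
print(quality_warnings(bright_crisp))
```

In this sketch the dark, flat image trips both warnings and the bright, crisp one passes. A real check would work on actual camera frames and would likely look at the face region rather than the whole picture.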
Second, it’s important to explain prompt engineering - or simply, how to write the text prompts that get the AI to generate the image you want it to. Here’s an example of one:
waist up portrait of ___, ____ in the background, wearing ___, vibrant, symmetrical highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, cinematic lighting art by Artgerm and Greg Rutkowski and Alphonse Mucha
That’s an actual template Snapshot AI prompt we’ve used before.
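For readers who think in code, filling in a template like this is simple string substitution. The sketch below uses an abbreviated version of the template, and the placeholder names (subject, background, outfit) are our own labels for the three blanks, not part of the original prompt.

```python
# Abbreviated version of the template above; the placeholder names are hypothetical.
PROMPT_TEMPLATE = (
    "waist up portrait of {subject}, {background} in the background, "
    "wearing {outfit}, vibrant, highly detailed, digital painting, "
    "sharp focus, cinematic lighting"
)


def build_prompt(subject, background, outfit):
    """Fill the three blanks of the template for one experience."""
    return PROMPT_TEMPLATE.format(
        subject=subject, background=background, outfit=outfit
    )


prompt = build_prompt(
    subject="an astronaut",
    background="a desert canyon",
    outfit="a silver flight suit",
)
print(prompt)
```

Swapping any one of the three values changes the whole scene while the style keywords at the end stay fixed, which is what keeps outputs consistent across users of the same experience.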
If you ask ChatGPT to create “a good example of a text prompt for image generation that you'd share with someone that is trying to understand prompt engineering for the first time” you get:
Imagine a serene lakeside scene at dusk. The sky, awash with hues of pink, orange, and purple, reflects softly on the still water. In the foreground, a lone, ancient oak tree stands tall, its branches sprawling outwards, silhouetted against the colorful sky. Beneath the tree, a wooden bench faces the lake, inviting a moment of reflection. To the right, a small, wooden rowboat is tethered to a tiny dock, barely disturbing the water's surface. The far side of the lake is lined with distant mountains, their peaks just catching the last light of the sun. The entire scene conveys a sense of peaceful solitude and natural beauty.
This provides a good sense of the level of detail you can include, and how tweaking certain words can have a significant effect on the output image. Here’s what ChatGPT created for me with that prompt:
There are endless possibilities for what you can prompt, which is what’s so exciting about the technology. There’s also an endless number of ways to do it wrong or to get poor results, which is why our engineering team has spent countless hours tracking and testing everything we can - and developing standards to get consistent, quality results.
One last topic to touch on for prompt engineering is negative prompts.
Just as you can tell the AI what to create with text prompts, you can also tell it what not to create, or what to exclude, with negative prompts. This is a core part of our proprietary “secret sauce” that we refine regularly, and also further customize for each Snapshot AI instance.
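As a rough sketch of the idea, a generation request can carry a negative prompt alongside the main prompt, and many Stable Diffusion tools accept the two as separate text inputs. The example terms, the class, and its field names below are hypothetical, and the negative prompts we actually use are not shown here.

```python
from dataclasses import dataclass, field

# Hypothetical defaults; the actual negative prompts are not published.
BASE_NEGATIVES = ["blurry", "extra fingers", "watermark", "text"]


@dataclass
class GenerationRequest:
    """A prompt paired with the things the model should avoid."""
    prompt: str
    negatives: list = field(default_factory=list)

    @property
    def negative_prompt(self):
        # Shared defaults first, then per-experience additions, without duplicates.
        merged = list(dict.fromkeys(BASE_NEGATIVES + self.negatives))
        return ", ".join(merged)


request = GenerationRequest(
    prompt="portrait of a superhero flying through the desert",
    negatives=["cape", "watermark"],
)
print(request.negative_prompt)
# prints: blurry, extra fingers, watermark, text, cape
```

Keeping a shared base list and layering per-experience terms on top mirrors the split described above: a standard set that is refined regularly, plus customization for each instance.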
🤓 For a deeper dive into Generative AI Technology, read our article How an AI Photo Booth Works Using Generative AI & Custom Prompts
Comparing AI Photo Booth Technologies: Face Swap vs. Trained AI Models
Beyond the high level user experience outlined, the technology starts to diverge - between the two main AI services being leveraged by AI photo booth companies: Midjourney and Stable Diffusion.
“Midjourney is unparalleled for artistic, stylized imagery, while Stable Diffusion excels in producing realistic, detailed images efficiently while enabling you to have extensive customization and technical control.”
eWEEK: Midjourney vs. Stable Diffusion: Best AI Art Generator 2024
Most photo booth companies are using Midjourney, which as noted is fantastic for artistic, stylized imagery. It’s one of the most widely used text to image generators (though not the one behind ChatGPT’s images, like the one above, which come from OpenAI’s DALL-E). It’s fast and can create stunning visuals.
The downsides to Midjourney are accuracy and technical customization, which as noted are what Stable Diffusion excels at. Most Midjourney based photo or portrait booths are using what’s often called “face swap” - which maps and masks your face onto a pre-defined picture or graphic (like onto a superhero, for example).
It’s cool, but often has limited facial accuracy. It is also generally limited to the person’s face, meaning the hair, body, pose, context, etc. mostly remain the same for every user. This is also limiting in creativity - confining the experience to a set of ‘templated’ image or output styles. From a traditional photo booth perspective, it is a fun feature, but in the realm of generative AI it is pretty limiting.
It’s also important to note that it’s still very new technology that is evolving rapidly, so it will get better fast.
At Snapbar we opted to learn and implement Stable Diffusion as the backbone of Snapshot AI (we actually use several foundational models, including several Stable Diffusion models). The core difference is that instead of swapping or mapping a person onto a template image, we’re generating completely original portraits each time based on that person, and the context of the prompt we’ve created (like a superhero flying through the desert in the style of Picasso, for example). The technology creates a unique fine-tuned model of each user based on the input photo or uploads, and then can generate images of that person in unique contexts or styles.
On top of providing greater control and fine tuning of output styles, Stable Diffusion also supports additional services that enable more immersive outputs. For example, inserting a mascot or character into the portrait with the user, or placing the user within a known artwork like a movie poster.
This just scratches the surface of what’s possible, and as noted before, the technology is evolving rapidly.
The one downside is that Stable Diffusion is a bit slower than Midjourney, taking minutes rather than seconds, but it’s getting faster. The silver lining, however, is that the email delivery with Snapshot AI also creates another touchpoint for brands to customize.
Midjourney and Stable Diffusion are the most common AI models, but it’s hard to emphasize enough how fast the technology is advancing and evolving. We’re staying on the cutting edge of this GenAI tech, and are updating our models / pre-processing / post-processing and software infrastructure to continually get faster, higher quality, and better aligned outputs. We’re very excited for not only what generative AI will be able to create in the future, but also how AI technology in general will enhance photo experiences in novel ways.
Snapshot AI: The Most Advanced AI Photo Booth
Now that we’ve talked about the underlying technologies that have catapulted AI Photo Booths into the zeitgeist of events in record time, let’s go a little deeper on Snapshot AI. We like to think we were first to introduce an AI Photo Booth (circa March 2023), but we did have some friends in the industry right there alongside us. A few things that set us apart:
- Thousands of hours spent testing prompts and prompt parameters
- Direct relationships with engineering teams building the AI models we leverage
- Proprietary pre and post processing (before and after the foundational GenAI models) that we’ve built to improve output accuracy
- Unique tech stack that enables more granular customization, like:
  - Product or mascot placement within the image
  - Inserting people into existing artwork like a movie poster
  - Controlling the pose each person is in
  - Masking in specific branded elements, like a soccer jersey for example
At the time of writing we’ve processed and generated over 100,000 AI portraits - each one completely unique.
On top of the robust technical capabilities to deliver very custom output styles, Snapshot AI is also built with brands in mind. Some key features include:
- Accessibility: The experience is equally accessible and functional on a smartphone, computer, or iPad, adapting to any event format - hybrid, virtual, and in-person.
- Multi-prompt support: Give users a choice between multiple output styles, like a professional headshot and a cartoon caricature.
- Lead capture: Gather valuable user data seamlessly as part of the user flow. This is optional and customizable.
- Email delivery: While delivery takes minutes not seconds, the email delivery provides a highly valuable branded touchpoint in every user’s inbox.
- Moderation: Use forced moderation to only publish approved outputs, or moderate as needed in the convenient dashboard.
- Dynamic displays: Use the included live gallery to showcase content in real-time, and/or add more dynamic displays like slideshows, social walls, and mosaics.
The possibilities truly are limitless, which makes this an exciting time in general, and even more so in our niche of event and marketing photo experiences. It can also be overwhelming, so we’ve categorized some of the most popular output styles and use cases to help with inspiration.
→ AI Movie Posters & Marketing Moments
Snapshot AI: The First AI Photo Booth to Use DALL-E
The latest update in the saga of trying to keep up with AI is that we can now leverage DALL-E rather than Stable Diffusion (SD). SD is still what powers our core, premium AI photo booth to generate truly unique outputs as described throughout this article, but the option for DALL-E offers some interesting new angles:
- It's much faster, delivering user outputs via email in seconds, not minutes
- It uses GPT-4V(ision) to analyze the input photos and translate them to a highly detailed text description, which is then used to create the new image
- It generates the output photos based on a purely text description + prompt, rather than a fine-tuned image model
- It handles group photos much better as it's describing the whole input image rather than trying to learn individual faces
It has limitations, in that it doesn't handle photo-realism well, but it can also consistently create some more fun and whimsical style characters. It also does much better with groups or multi-subject input photos which adds another layer of fun for in-person events.
How the DALL-E Photo Booth works
Using GPT-4V(ision) we generate a highly detailed description of the submitted user photo, use additional logic and automation to clean it up, and then elaborate on the generation request with our proprietary prompt magic. That modified description + engineered prompt are then used to generate a brand new image. It’s image to text, then text back to image!
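The flow described above can be sketched in a few lines. This is a simplified illustration, not our production code: the `describe` and `generate` arguments are placeholders standing in for the vision and image models, `clean_description` is a hypothetical cleanup step, and the stand-in functions at the bottom exist only so the sketch runs without any external service.

```python
from typing import Callable


def run_booth(
    photo: bytes,
    describe: Callable[[bytes], str],
    generate: Callable[[str], bytes],
    style_prompt: str,
) -> bytes:
    """Image to text, then text back to image.

    `describe` and `generate` stand in for the vision and image models;
    they are placeholders, not real API calls.
    """
    description = describe(photo)                # 1. photo -> detailed text
    cleaned = clean_description(description)     # 2. tidy up the description
    full_prompt = f"{cleaned}. {style_prompt}"   # 3. add the engineered prompt
    return generate(full_prompt)                 # 4. text -> brand new image


def clean_description(text: str) -> str:
    """Hypothetical cleanup: collapse whitespace and drop a trailing period."""
    return " ".join(text.split()).rstrip(".")


# Stand-in models so the sketch runs without any external service.
def fake_describe(photo: bytes) -> str:
    return "Two people smiling,   one wearing a red scarf."


def fake_generate(prompt: str) -> bytes:
    return prompt.encode("utf-8")


result = run_booth(
    b"raw-photo-bytes",
    fake_describe,
    fake_generate,
    style_prompt="whimsical claymation characters",
)
print(result.decode("utf-8"))
```

Because the original photo never reaches the image model directly, the output is only as faithful as the text description, which helps explain both the strength with groups and the weaker photo-realism noted above.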
Aside from the different AI models and image interpretation to generation process, it works exactly the same as our other products built on the Snapshot platform. With a dedicated, custom microsite you can engage users from any device, including an iPad on a stand to create an in-person photo booth experience.
Learn more about our DALL-E Photo Booth here >
Limitations of AI Photo Booths and Prompt Engineering
It would be irresponsible to highlight all of the amazing things possible using AI, without recognizing and addressing some of the challenges. We’ll start with the big picture challenges, and then break down the common issues and look at why it’s an issue, and how it can be addressed.
The Big Picture
The AI models that are being used were trained on massive datasets (and continue to be), but they can only be trained on pre-existing data, meaning history and, largely, content available on the internet - including social media. We know that historical content often skews European / white, and male. It is also riddled with bias depending on who wrote the history. And social media is, well, social media.
With that very basic primer it’s easy to understand how these AI models might introduce assumptions that are offensive or off-putting. It’s also why we dedicate significant resources to product safety and guardrails to avoid inappropriate content.
Facial Recognition & User Privacy
- The issue: The idea of an AI model learning the intricacies of your face, and facial recognition in general, makes most people uneasy. How PII (personally identifiable information) is handled, like email, is also always important, particularly for corporate sponsored activations.
- Snapbar solutions:
  - We do not store any user data associated with the AI Photo Booth experience. That information is strictly used to power the experience and deliver the intended end result. The exception is lead capture which is provided to the client hosting the experience.
  - We lead the experience with a customizable disclaimer to inform the user of what their participation entails.
Race / Ethnicity
- The issue: AI can sometimes misinterpret a person’s race, ethnicity, or skin tone, which can then be translated to the output photo causing a misalignment, and upsetting users. Oftentimes this is a result of a poor quality photo input, whether it’s low resolution, has shadows that darken the person’s appearance, or apparel that hides a person’s features like hats or sunglasses.
- Snapbar solutions:
  - Photo quality is critical so we always advise testing in a ‘real environment’ to identify any lighting or quality issues before going live
  - We have recommended hardware to provide proper quality and lighting
  - We utilize negative prompts to avoid racial bias as best we can, constantly refining our prompts to deliver the best possible results for every client activation
  - In a critical situation, we can provide a user dropdown option to select desired skin tone
- In the future: we are exploring additional photo analysis technologies that can better analyze the input photo to enhance the prompt specificity
Gender
- The issue: Gender is very personal, and can be fluid, but as with race, the AI may be working from one low quality input image to learn about that person. Mistakes and assumptions can be made leading to misalignment in the output gender. AI unfortunately also has a history of sexualizing women (again based on what it has learned from history and the internet).
- Snapbar solutions:
  - As a default, we put the power of gender in the user’s hands with a dropdown menu for ‘Avatar Features’ with options for Masculine, Feminine, or I’d rather not say
  - We have strict testing and prompting standards in place to avoid any explicit or sexualized content
  - We work with each client to design prompts and output styles that inherently limit or avoid potential bias
Body Image
- The issue: This tends to be more situational than technology driven, but the AI tends to make people skinny - or “ideal” versions of themselves - when it has to make assumptions. Most often this occurs because people only input a picture of their face or shoulders up, but the output style is a full body pose. The AI has to infer everything about the person it can’t learn from the input photo, and therefore makes assumptions.
- Snapbar solutions:
  - If possible, we recommend capturing or uploading full body photos if the output style includes a full body pose. When this is the case the accuracy is generally quite high.
  - We advise that certain prompts, like a superhero or professional athlete, will skew towards fit and skinny people, because superheroes and athletes are generally fit and muscular, and that is what the AI has been trained on.
- In the future: we are exploring additional photo analysis technologies that can better analyze the input photo to enhance the prompt specificity
Stereotypes
- The issue: The AI may introduce stereotypical aspects based on the person’s appearance such as skin color, for example adding dreadlocks to a Black person. This can be exacerbated when the context of the particular prompt or output style may reinforce stereotypes, for example a Rastafarian theme, to reuse the previous example.
- Snapbar solutions:
  - We have robust testing and prompting controls in place to address this on a case by case basis
  - We work directly with the engineers of the AI models we use to address repeated issues and introduce systemic fixes
  - We advise clients to ensure the output experience and expected results are clear to users, so even when this happens on occasion, people understand it’s in fun and part of using an emerging technology
These noted issues can be sufficiently scary for brands, but it’s very important to note that the Generative AI industry at large is grappling with all of these same things. That includes AI photo booths and every AI image generator.
At Snapbar we’ve dedicated countless hours to testing and refining our prompts to eliminate the vast majority of issues - but always want to be transparent with our customers on what’s possible. We work with every client to test their AI prompts early and can refine or pivot as needed to deliver the best possible content and user experience - and have done so successfully for some massive events with some of the world’s largest brands.
The Future of AI Photo Booths
The future of AI photo booths is exciting and only just beginning. As AI technology continues to advance, we can anticipate even more intelligent and intuitive photo-taking experiences. As we’ve been reading about for years, developers are working on incorporating augmented reality (AR) and virtual reality (VR) technologies, and Apple’s launch of the Vision Pro may be the catalyst that soon makes them more common (as Apple launches have been many times before). Imagine navigating an event through AR, engaging with an AI photo booth to create your custom avatar, and using that avatar to interact with vendors and other attendees throughout the event to score points, download apps, network, and much more. I digress.
Aside from the not-so-distant dystopian future where everyone views the world through headsets, AI photo booths are going to get faster, more accurate, and cheaper. The technology has already advanced drastically just in the last year.
Here are some of the trends we’re tracking and predictions we’re bullish on:
- Faster processing reducing AI image generation time
- Cheaper processing reducing costs
- New levels of accuracy, and not just in 2D
- More ‘bolt-on’ experiences to enhance the resulting content (as an example we’ve tested tech that can take your AI portrait and animate it to sing a popular song, quite realistically)
- More extraordinary outputs accurately placing people in truly immersive content
- AI generated video, elaborated on in the next section
- AI suites of services that run the event user experience (avatar and profile creation + personalized scheduling + enhanced networking)
We’re quite excited and heavily invested in the future of photo booth technology at Snapbar, and undeniably AI will be a big part of it. We’ll continue to test the boundaries and new technologies to deliver cutting edge experiences to our clients!
Coming Soon: AI Video Booths
The early version of Snapshot AI Video is actually available now! But it’s a bit different from traditional video we think of - it’s more akin to stop-motion by stringing together a series of AI generated images into a progressive video. Like a new-school flip book:
It’s still very cool and each image is generated from scratch using the same kind of prompt engineering. On top of that, you can build storyline narrative and visual effects into the prompt to drive the progression. Get in touch with us if you’re interested in booking this kind of experience.
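To illustrate how a storyline can drive the progression, here is a minimal sketch that turns a list of story stages into an ordered series of per-frame prompts. The function name, the prompt wording, and the stages are hypothetical. Each prompt would then be sent to the image model, and the resulting images played back in order.

```python
def frame_prompts(subject, stages, frames_per_stage):
    """Build (frame_index, prompt) pairs, moving through the stages in order."""
    frames = []
    for stage in stages:
        for _ in range(frames_per_stage):
            # Each frame is its own from-scratch generation with the same subject.
            frames.append((len(frames), f"portrait of {subject}, {stage}"))
    return frames


# Hypothetical storyline: each stage becomes a few consecutive frames.
frames = frame_prompts(
    subject="the attendee",
    stages=["at sunrise", "at noon", "at sunset"],
    frames_per_stage=2,
)
for index, prompt in frames:
    print(index, prompt)
```

Repeating a stage across several frames gives the flip-book effect: every frame is generated independently, so consecutive images vary slightly even when the prompt is identical.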
AI Video Booths
We’ve seen video replace static content many times throughout history - including with photo booths, introducing 360 video and unique panning experiences to replace static photos over the last several years. It will be no different with AI, except you won’t need complex mechanical structures to create the resulting effect.
OpenAI recently released initial footage from their new text to video model - Sora. And they blew everyone’s minds:
“Sora does not merely churn out videos that fulfill the demands of the prompts, but does so in a way that shows an emergent grasp of cinematic grammar.”
WIRED: OpenAI’s Sora Turns AI Prompts Into Photorealistic Videos
“Sora shows how quickly AI generated video is evolving, coming just a year and a half after Meta and Google teased similar research projects featuring short, low-resolution videos.”
Bloomberg: OpenAI’s Sora Video Generator is Impressive, But Not Ready for Prime Time
“The tech is not there yet, but generative video has gone from zero to Sora in just 18 months.”
MIT Technology Review: OpenAI teases an amazing new generative video model called Sora
Considering that just one year ago our minds were getting blown by ChatGPT (and an early version of it at that), it’s wild to consider what the future holds for AI generated video. Looking forward to the image to video AI models in the works, here’s a short list of potential ideas to get the juices flowing:
- User-Generated Commercials: Attendees create their own commercials for a product or brand
- Customized Greetings: Offering attendees the opportunity to create personalized video greetings or messages (based on a picture and a text prompt), which can be branded and shared
- Event Highlights Reel: Utilizing AI (+ facial recognition tech) to compile personalized highlight reels for attendees, featuring moments they were captured in at the event
- Virtual Business Cards: Enabling attendees to create short introduction videos, acting as virtual business cards for networking
- Product Launches: Creating an immersive experience for new product launches, where attendees can interact with the product features through video
- Brand Stories: Allowing attendees to contribute to a collective brand story or campaign through their own video segments
- Influencer Collaborations: Partnering with influencers to create videos with both them and the user together, leveraging their reach for brand exposure
It’s not here yet, but it’s not far away. With the flexibility of the Snapshot platform, you will soon be able to build custom event experiences that combine candid user photography, AI generated portraits and collateral, and AI generated videos - all tied to one central theme or message.
Final thoughts...
AI technology is complex, convoluted, and moving very fast. This is our business and it’s hard to keep up, so we recognize the challenge our clients face leveraging the technology to deliver exciting experiences while adhering to corporate guidelines. Hopefully this article provides a basic understanding of some limitations, and more importantly, the vast possibilities for what can be created!
Whether an AI photo booth experience is a small part of a much larger event, or the central hook in a global marketing campaign, we’re excited to work with you to create something amazing.
Please contact us to provide comments, ask questions, or reach out about your own Snapshot AI activation.
🤓 Want to go even deeper?
Check out these recommended resources from our CTO
- A renowned Prompt Engineering Guide: https://www.promptingguide.ai/
- Andrew Ng’s “Generative AI for Everyone” course: https://www.deeplearning.ai/courses/generative-ai-for-everyone/
- AI Explained’s YouTube channel is the best source for updates in the GenAI / LLM/LIM/LVM industry: https://www.youtube.com/@aiexplained-official