
OpenAI's GPT-4 with vision still has flaws, paper reveals

When OpenAI first unveiled GPT-4, its flagship text-generating AI model, the company touted the model's multimodality -- in other words, its ability to understand the context of images as well as text. GPT-4 could caption -- and even interpret -- relatively complex images, OpenAI said, for example identifying a Lightning cable adapter from a picture of a plugged-in iPhone.

But since GPT-4's announcement in late March, OpenAI has held back the model's image features, reportedly over fears of abuse and privacy issues. Until recently, the exact nature of those fears remained a mystery. But earlier this week, OpenAI published a technical paper detailing its work to mitigate the more problematic aspects of GPT-4's image-analyzing tools.

To date, GPT-4 with vision, abbreviated "GPT-4V" by OpenAI internally, has only been used regularly by a few thousand users of Be My Eyes, an app to help low-vision and blind people navigate the environments around them. Over the past few months, however, OpenAI also began to engage with "red teamers" to probe the model for signs of unintended behavior, according to the paper.

In the paper, OpenAI claims that it's implemented safeguards to prevent GPT-4V from being used in malicious ways, like breaking CAPTCHAs (the anti-spam tool found on many web forms), identifying a person or estimating their age or race, and drawing conclusions based on information that's not present in a photo. OpenAI also says that it has worked to curb GPT-4V's more harmful biases, particularly those relating to a person's physical appearance, gender or ethnicity.


But as with all AI models, there's only so much that safeguards can do.

The paper reveals that GPT-4V sometimes struggles to make the right inferences, for example mistakenly combining two strings of text in an image to create a made-up term. Like the base GPT-4, GPT-4V is prone to hallucinating, or inventing facts in an authoritative tone. And it's not above missing text or characters, overlooking mathematical symbols and failing to recognize rather obvious objects and place settings.


Image Credits: OpenAI

It's not surprising, then, that OpenAI says in unambiguous terms that GPT-4V is not to be used to spot dangerous substances or chemicals in images. (This reporter hadn't even thought of the use case, but apparently, the prospect is concerning enough to OpenAI that the company felt the need to call it out.) Red teamers found that, while the model occasionally correctly identifies poisonous foods like toxic mushrooms, it misidentifies substances such as fentanyl, carfentanil and cocaine from images of their chemical structures.

When applied to the medical imaging domain, GPT-4V fares no better, sometimes giving the wrong response to the same question that it answered correctly in a previous context. It's also unaware of standard practices like viewing imaging scans as if the patient is facing you (meaning the right side of the image corresponds to the left side of the patient), which leads it to misdiagnose any number of conditions.
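
To make that convention concrete, here's a minimal sketch of the left-right flip it implies -- written in Python with a hypothetical helper name for illustration, not drawn from OpenAI's paper:

    # Radiological convention: a scan is displayed as if the patient is
    # facing the viewer, so the left side of the image shows the
    # patient's right side. Hypothetical helper, for illustration only.
    def image_side_to_patient_side(image_side: str) -> str:
        mapping = {"left": "right", "right": "left"}
        side = image_side.lower()
        if side not in mapping:
            raise ValueError(f"expected 'left' or 'right', got {image_side!r}")
        return mapping[side]

    # A shadow on the left of the image is in the patient's right lung.
    assert image_side_to_patient_side("left") == "right"

A model that reads the scan like an ordinary photo skips that flip, which is exactly the kind of misdiagnosis the paper describes.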


Image Credits: OpenAI

Elsewhere, OpenAI cautions, GPT-4V doesn't understand the nuances of certain hate symbols -- for instance, missing the modern association of the Templar Cross with white supremacy in the U.S. More bizarrely, and perhaps as a symptom of its hallucinatory tendencies, GPT-4V was observed to compose songs or poems praising certain hate figures or groups when provided a picture of them, even when the figures or groups weren't explicitly named.

GPT-4V also discriminates against certain sexes and body types -- albeit only when OpenAI's production safeguards are disabled. OpenAI writes that, in one test, when prompted to give advice to a woman pictured in a bathing suit, GPT-4V gave answers relating almost entirely to the woman's body weight and the concept of body positivity. One assumes that wouldn't have been the case if the image were of a man.


Image Credits: OpenAI

Judging by the paper's caveated language, GPT-4V remains very much a work in progress -- a few steps short of what OpenAI might've originally envisioned. In many cases, the company was forced to implement overly strict safeguards to prevent the model from spewing toxicity or misinformation, or compromising a person's privacy.

OpenAI claims that it's building "mitigations" and "processes" to expand the model's capabilities in a "safe" way, like allowing GPT-4V to describe faces and people without identifying those people by name. But the paper reveals that GPT-4V is no panacea, and that OpenAI has its work cut out for it.