Comparing how well the main AI tools deal with 5 difficult challenges
Part three of a seven-part series of blogs

In what is becoming almost an annual exercise, we revisit the main AI tools to see which (if any) has the edge in accomplishing 5 challenging tasks, each designed to test a different aspect of the use of AI.
The range of results from this test was much wider!
I think ChatGPT did a great job with this:

I particularly like the dragon and the flying pig!
ChatGPT also got the pictures just right for the audience. I'm impressed.
Claude produced two separate cards with well-chosen images, but the drawing and labels leave a bit to be desired:

Claude has gone for child-like drawings, rather than drawings for a child.
There are a few anomalies:

This isn't a great representation of a service station.
It's also worth noting that the ambulance picture is completely missing ...
Copilot gave a good answer, although irritatingly I only got one image at a time:

The Overhoay sign label is wrong, but everything else seems really good.
Copilot then explains that it can't create more than one image at a time, but goes on to give ideas for the second one with lots of useful follow-up text:

Copilot even then offers to personalise the cards with the children's names.
I think Copilot got the pictures and audience just right, but it's a shame it could only do one picture at a time.
Gemini's cards were by far the quickest - and it showed:

Superficially everything seems good, but ...
Gemini has created 4 cards, not 2, with subtle differences between each of the two pairs. Most of the pictures make sense, but not all:

The legend isn't a great description of the picture.
I also don't like the lack of subtlety in the labelling:

I think the children can work out this is meant to be impossible and funny.
On the plus side, Gemini was by some way the quickest AI tool for this task.
This is hard to judge - I've done my best!
| Tool | Score out of ten | Why |
|---|---|---|
| ChatGPT | 9 | The only answer which you could use without modification - good pictures and good labelling. |
| Claude | 6 | A good choice of pictures, but poor drawings and poor labelling (if this had been done by a human, their work could be called sloppy). |
| Copilot | 8 | Only one set of images at a time with a few glitches, but good choice of pictures and good labelling on the whole. |
| Gemini | 7 | Commendably quick, but with lots of mistakes (and also the only one not to get the audience right, I thought). |
What is really amazing about this is how far AI has come in a couple of years. I used to do a variation of this prompt on a course because I knew the answers would generate so much laughter, but sadly this is yet another instance where AI tools have learned not to hallucinate.
© Wise Owl Business Solutions Ltd 2026. All Rights Reserved.