Some other pages relevant to these blogs include:
You can also book hourly online consultancy for your time zone with one of our 7 expert trainers!
Comparing how well the main AI tools deal with 5 difficult challenges
Part two of a seven-part series of blogs

In what is becoming almost an annual exercise we revisit the main AI tools to see which (if any) has the edge in accomplishing 5 challenging tasks, each designed to test a different aspect of the use of AI.
This test involved uploading 10 Outlook junk mail messages and asking each AI tool to say who they were from, what they were about and how spammy each is, before giving an opinion on which message I should deal with first.
ChatGPT's answer seemed faultless to me:

ChatGPT has done a good job of summarising each email and presenting the information in a readable form.
Claude took the longest, but produced excellent results:

Claude's results are similar to ChatGPT's, although I prefer Claude's formatting.
Copilot lost a lot of points for being the only AI tool which limited the number of files that I could upload (to 3). On the plus side, it was lightning fast, and the answers seem accurate. Well, almost:

The sender name for the second email is different from ChatGPT's and Claude's. Having now inspected the email, I can see that it was sent by a Lithuanian company called UAB Convenity on behalf of Language Bridge Solutions, so I think ChatGPT's and Claude's answer is more accurate.
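One common way for a "sent by X on behalf of Y" message to arise is when an email's From and Sender headers name different parties, which may be why two tools can report different senders for the same message. The sketch below is only an illustration of that idea, using Python's standard email module. It assumes the message is available as plain RFC 5322 text (for example an exported .eml file rather than Outlook's own .msg format), and every name and address in it is made up rather than taken from the emails tested above.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message: all names and addresses are invented for illustration.
RAW_EMAIL = """\
From: Example Brand <offers@brand.example>
Sender: Mailing Service <bounce@mailer.example>
Subject: Hypothetical offer

Body text.
"""


def describe_senders(raw: str) -> dict:
    """Return the names and addresses found in the From and Sender headers."""
    msg = message_from_string(raw)
    # parseaddr splits "Display Name <address>" into (name, address).
    from_name, from_addr = parseaddr(msg.get("From", ""))
    sender_name, sender_addr = parseaddr(msg.get("Sender", ""))
    return {
        "from_name": from_name,
        "from_addr": from_addr,
        "sender_name": sender_name,
        "sender_addr": sender_addr,
        # True only when a Sender header exists and differs from From.
        "sent_on_behalf": bool(sender_addr) and sender_addr != from_addr,
    }


info = describe_senders(RAW_EMAIL)
print(info)
```

Whether a summary should quote the From party or the Sender party is a judgement call, so a tool that picks either one is not necessarily wrong, just less helpful if it does not mention the other.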
Gemini's results seem impressive at first, until you look at the authors chosen:

4 of the emails are attributed to me, despite the fact that I'm not in the habit of sending out spam emails to myself (or anyone else, for that matter).
I also don't like the fact that Gemini has tagged 3 of the emails as useful!
For this test there seem to be two clear winners and two clear losers:
| Who | Score out of ten | Why |
|---|---|---|
| ChatGPT | 9 | I couldn't see any fault with this (apart from the fact that it was slow to answer). |
| Claude | 9 | Another great answer, but another slow one too. |
| Copilot | 7 | I could only upload 3 files, and Copilot didn't do well for one of the authors. |
| Gemini | 7 | Although this was the fastest answer, it was also the worst (the authors were wrong). |
The clear takeaway from the above is that it's better for an AI tool to take longer, and think about the problem more (the same probably applies to humans too). It's a slightly unfair comparison in this respect, although each AI tool chose which model it would use for its answer.
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
6 Bevis Marks
LONDON
EC3A 7BA
c/o Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2026. All Rights Reserved.