Read our blogs, tips and tutorials
Try our exercises or test your skills
Watch our tutorial videos or shorts
Take a self-paced course
Read our recent newsletters
License our courseware
Book expert consultancy
Buy our publications
Get help in using our site
559 attributed reviews in the last 3 years
Refreshingly small course sizes
Outstandingly good courseware
Whizzy online classrooms
Wise Owl trainers only (no freelancers)
Almost no cancellations
We have genuine integrity
We invoice after training
Review 30+ years of Wise Owl
View our top 100 clients
Search our website
We also send out useful tips in a monthly email newsletter ...
Posted by Andrew Gould on 03 June 2019
This video shows you how to use VBA to scrape websites using Internet Explorer and XML HTTP Requests. You'll learn how to reference the correct object libraries, how to create an XML HTTP Request and capture the response text. You'll compare the performance of the XML HTTP Request with Internet Explorer by scraping a website of gambling odds and parsing an HTML table, writing the results to a new Excel worksheet.
See our full range of VBA training resources, or test your knowledge of VBA with one of our VBA skills assessment tests.
This video has the following accompanying files:
File name | Type | Description |
---|---|---|
Internet Explorer vs XML HTTP Request.xlsm | Excel workbook with macros |
Click to download a zipped copy of the above files.
There are no exercises for this video.
You can increase the size of your video to make it fill the screen like this:
Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown above.
When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.
To improve the quality of a video, first click on the Settings icon:
Make sure yoiu're playing your video so that the icons shown above appear, then click on this gear icon.
Choose to change the video quality:
Click as above to change your video quality.
The higher the number you choose, the better will be your video quality (but the slower the connection speed):
Don't choose the HD option shown unless your connection speed is fast enough to support it!
Is your Wise Owl speaking too slowly (or too quickly)? You can also use the Settings menu above to change your playback speed.
From: | William |
When: | 23 Jul 21 at 02:48 |
Hello, I've been watching your youtube videos and from them I managed to create a scraping to remove some information from the site https://www.racingpost.com/, using the XML HTTP Request method.
But I came across a problem that I couldn't overcome, I access the link of a specific race, for example:
https://www.racingpost.com/racecards/2/ascot/2021-07-23/788075
So I just want to collect the last race hyperlink for each runner on the card, which in this case is 8.
The code has the following configuration:
</thead> <tbody class="RC-runnerForm__body" data-test-selector="RC-runnerFormBody">
<tr class="ui-table__row RC-runnerForm__row js-form-row" data-test-selector="RC-runnerFormRow">
<td class="ui-table__cell RC-cell RC-runnerForm__bodyCell RC-runnerForm__linkCell">
<a class="ui-link ui-link_table js-popupLink RC-runnerForm__link js-form-hoverLink" href="/results/15/doncaster/2021-07-02/786225" title="Visit attheraces.com Fillies' Handicap" target="_blank" data-test-selector="RC-runnerFormLink__results">
02Jul21 </a>
<a class="ui-link ui-link_table js-popupLink RC-runnerForm__condLink_xn RC-runnerForm__link js-form-hoverLink hidden-sm-up truncate" href="/results/15/doncaster/2021-07-02/786225" title="left-handed, galloping track" target="_blank" data-test-selector="RC-runnerFormLink__results">
Don 5 GF 4HcF 4K </a>
I couldn't create a path to get to the tag<a and get access to the "href".
Could you clarify for me which variable I declare as MSHTML.IHTMLElementCollection and which I declare just as MSHTML.IHTMLElement, and how I Set to walk to the tag<a ?
If you can help me I appreciate it.
From: | Andrew G |
When: | 23 Jul 21 at 07:33 |
Hi William,
In general, you use an IHTMLElementCollection when you're trying to return a reference to multiple elements (all the rows in a table for example), and an IHTMLElement when you're trying to reference a single element (the table which contains the rows for example).
In this example it might be better to use some specific VBA classes to refer to particular types of element rather than the generic interfaces.
It looks as though the section you've referred to above is contained in a table which begins with this definition:
<table
class="RC-runnerForm RC-runnerForm_hidden"
A good starting point would be to refer to that table using its class. You can then loop through all of the table rows, find the <a> elements in each row and extract the href. Here's the basic pattern for the code, but I'll leave the details up to you
Sub ScrapeRacingPostUsingXMLHTTP()
Dim XMLRequest As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLTable As MSHTML.HTMLTable
Dim HTMLTableRow As MSHTML.HTMLTableRow
Dim HTMLAnchor As MSHTML.HTMLAnchorElement
XMLRequest.Open "GET", "https://www.racingpost.com/racecards/2/ascot/2021-07-23/788075", False
XMLRequest.send
If XMLRequest.Status <> 200 Then
MsgBox XMLRequest.Status & " - " & XMLRequest.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLRequest.responseText
Set HTMLTable = HTMLDoc.getElementsByClassName("RC-runnerForm")(0)
Debug.Print HTMLTable.className
Debug.Print HTMLTable.Rows.Length
For Each HTMLTableRow In HTMLTable.Rows
Debug.Print HTMLTableRow.Cells.Length
For Each HTMLAnchor In HTMLTableRow.getElementsByTagName("a")
Debug.Print HTMLAnchor.href
Next HTMLAnchor
Next HTMLTableRow
End Sub
From: | William |
When: | 23 Jul 21 at 20:00 |
Thank you so much, I finally got it.
It's ready!
Sub ScrapeRacingPostUsingXMLHTTP()
Dim XMLRequest As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLTable As MSHTML.HTMLTable
Dim HTMLTableRow As MSHTML.HTMLTableRow
Dim HTMLAnchor As MSHTML.HTMLAnchorElement
Const url As String = "https://www.racingpost.com"
XMLRequest.Open "GET", "https://www.racingpost.com/racecards/2/ascot/2021-07-23/788075", False
XMLRequest.send
If XMLRequest.Status <> 200 Then
MsgBox XMLRequest.Status & " - " & XMLRequest.StatusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLRequest.responseText
For Each HTMLTable In HTMLDoc.getElementsByClassName("RC-runnerForm")
Set HTMLTableRow = HTMLTable.Rows.Item(1)
Set HTMLAnchor = HTMLTableRow.getElementsByTagName("a").Item(0)
Nexthref = HTMLAnchor.href
Nexturl = url & Mid(HTMLAnchor.href, InStr(HTMLAnchor.href, ":") + 1)
Debug.Print Nexturl
Next HTMLTable
End Sub
Congratulations on the content available on YouTube, thank you very much.
From: | Andrew G |
When: | 25 Jul 21 at 07:31 |
Great stuff, happy it helped!
From: | M G Joshi |
When: | 09 Oct 19 at 21:21 |
Dear Sir -- when running the following code I get the VBA error
>>> Run Time error '91'
>>> Object variable or With block variable not set
Sub ScrapeOddsUsingXMLHTTP()
Dim XMLRequest As New MSXML2.XMLHTTP60
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLDiv As MSHTML.IHTMLElement
Dim HTMLTable As MSHTML.IHTMLElement
XMLRequest.Open "GET", "https://www.oddschecker.com/golf/houston-open/winner", False
XMLRequest.send
If XMLRequest.Status <> 200 Then
MsgBox XMLRequest.Status & " - " & XMLRequest.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLRequest.responseText ' <== Code breaks down at this point
Set HTMLDiv = HTMLDoc.getElementById("oddsTableContainer")
Set HTMLTable = HTMLDiv.getElementsByTagName("table")(0)
Debug.Print HTMLTable.className
End Sub
Any suggestions? Thank you.
From: | Andrew G |
When: | 11 Oct 19 at 10:26 |
Hi,
You haven't created a new instance of the HTMLDocument class before attempting to change its .body.innerHTML property.
You can do this either by adding this line to your code:
Set HTMLDoc = New MSHTML.HTMLDocument
Or by changing the variable declaration like so:
Dim HTMLDoc As New MSHTML.HTMLDocument
I hope that helps!
From: | AceAxe |
When: | 18 Jun 19 at 09:25 |
Dear Andrew,
Has anything changed in that web-site (https://www.oddschecker.com.....) since you pasted this video?
Because no matter how I try to run the macro using IE or HTTP Request result is the same (slightly different but).
IE gives me following messages:
Can’t reach this page
•Make sure the web address https://www.oddschecker.com is correct
•Search for this site on Bing
•Refresh the page
More information
Error Code: INET_E_DOWNLOAD_FAILURE
Can’t connect securely to this page
This might be because the site uses outdated or unsafe TLS security settings. If this keeps happening, try contacting the website’s owner.
Your TLS security settings aren’t set to the defaults, which could also be causing this error.
Try this:
•Go back to the last page
and HTTP Request shows me that funny msgbox:
Run-time error '-2146697208(800c0008)':
The download of the specified resource has failed.
I tryed to use my code that I wrote watching your lesson or when I ran your code that I downloaded from your web-site.
Is anything wrong with my machine or there's something else I do not understand.
Dear Andrew,
Thatk you so much
Sincerily yours
Alexander
From: | Andrew G |
When: | 18 Jun 19 at 13:10 |
Hi Alexander,
I'm not sure why you're receiving this error message but it seems that you're not the only person to experience it. There are some possible solutions listed in these forums
https://www.tenforums.com/browsers-email/90236-microsoft-edge-tls-security.html
Hopefully one of those will help!
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
99 Bishopsgate
LONDON
EC2M 3XD
Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.