VBA - scraping websites videos | Excel VBA Part 47.3 - Internet Explorer vs XML HTTP Request

Posted by Andrew Gould on 03 June 2019


This video shows you how to use VBA to scrape websites using Internet Explorer and XML HTTP Requests. You'll learn how to reference the correct object libraries, how to create an XML HTTP Request and capture the response text. You'll compare the performance of the XML HTTP Request with Internet Explorer by scraping a website of gambling odds and parsing an HTML table, writing the results to a new Excel worksheet.

This video has the following accompanying files:

File name Type Description
Internet Explorer vs XML HTTP Request.xlsm Excel workbook with macros

Click to download a zipped copy of the above files.

There are no exercises for this video.

Making a  video bigger

You can increase the size of your video to make it fill the screen like this:

View full screen

Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown at its bottom right-hand corner.

 

When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.

Improving the quality of a video

To improve the quality of a video, first click on the Settings icon:

Settings icon

Make sure you're playing your video so that the icons shown appear, then click on this gear icon at the bottom right-hand corner.

 

Choose to change the video quality:

Video quality

Click on Quality as shown to bring up the submenu.

 

The higher the number you choose, the better will be your video quality (but the slower the connection speed):

Connection speed

Don't choose the HD option unless you have a fast enough connection speed to support it!

 

Is your Wise Owl speaking too slowly (or too quickly)?  You can also use the Settings menu above to change your playback speed.

This page has 3 threads Add post
23 Jul 21 at 02:48

Hello, I've been watching your youtube videos and from them I managed to create a scraping to remove some information from the site https://www.racingpost.com/, using the XML HTTP Request method.

But I came across a problem that I couldn't overcome, I access the link of a specific race, for example:
https://www.racingpost.com/racecards/2/ascot/2021-07-23/788075

So I just want to collect the last race hyperlink for each runner on the card, which in this case is 8.
The code has the following configuration:

</thead>    <tbody class="RC-runnerForm__body" data-test-selector="RC-runnerFormBody">
                    <tr class="ui-table__row RC-runnerForm__row js-form-row" data-test-selector="RC-runnerFormRow">
    <td class="ui-table__cell RC-cell RC-runnerForm__bodyCell RC-runnerForm__linkCell">
        <a class="ui-link ui-link_table js-popupLink RC-runnerForm__link js-form-hoverLink" href="/results/15/doncaster/2021-07-02/786225" title="Visit attheraces.com Fillies' Handicap" target="_blank" data-test-selector="RC-runnerFormLink__results">
            02Jul21        </a>
        <a class="ui-link ui-link_table js-popupLink RC-runnerForm__condLink_xn RC-runnerForm__link js-form-hoverLink hidden-sm-up truncate" href="/results/15/doncaster/2021-07-02/786225" title="left-handed, galloping track" target="_blank" data-test-selector="RC-runnerFormLink__results">
            Don 5 GF 4HcF 4K        </a>

 

I couldn't create a path to get to the tag<a and get access to the "href".
Could you clarify for me which variable I declare as MSHTML.IHTMLElementCollection and which I declare just as MSHTML.IHTMLElement, and how I Set to walk to the tag<a ?

 

If you can help me I appreciate it.

 

23 Jul 21 at 07:33

Hi William,

In general, you use an IHTMLElementCollection when you're trying to return a reference to multiple elements (all the rows in a table for example), and an IHTMLElement when you're trying to reference a single element (the table which contains the rows for example).

In this example it might be better to use some specific VBA classes to refer to particular types of element rather than the generic interfaces.

It looks as though the section you've referred to above is contained in a table which begins with this definition:

<table
    class="RC-runnerForm RC-runnerForm_hidden"

A good starting point would be to refer to that table using its class. You can then loop through all of the table rows, find the <a> elements in each row and extract the href. Here's the basic pattern for the code, but I'll leave the details up to you

Sub ScrapeRacingPostUsingXMLHTTP()

    Dim XMLRequest As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLTable As MSHTML.HTMLTable
    Dim HTMLTableRow As MSHTML.HTMLTableRow
    Dim HTMLAnchor As MSHTML.HTMLAnchorElement
    
    XMLRequest.Open "GET", "https://www.racingpost.com/racecards/2/ascot/2021-07-23/788075", False
    XMLRequest.send
    
    If XMLRequest.Status <> 200 Then
        MsgBox XMLRequest.Status & " - " & XMLRequest.statusText
        Exit Sub
    End If
    
    HTMLDoc.body.innerHTML = XMLRequest.responseText
    
    Set HTMLTable = HTMLDoc.getElementsByClassName("RC-runnerForm")(0)
    
    Debug.Print HTMLTable.className
    Debug.Print HTMLTable.Rows.Length
    
    For Each HTMLTableRow In HTMLTable.Rows
    
        Debug.Print HTMLTableRow.Cells.Length
        
        For Each HTMLAnchor In HTMLTableRow.getElementsByTagName("a")
            Debug.Print HTMLAnchor.href
        Next HTMLAnchor
        
    Next HTMLTableRow
    
End Sub

23 Jul 21 at 20:00

Thank you so much, I finally got it.
It's ready!

 

Sub ScrapeRacingPostUsingXMLHTTP()

    Dim XMLRequest As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLTable As MSHTML.HTMLTable
    Dim HTMLTableRow As MSHTML.HTMLTableRow
    Dim HTMLAnchor As MSHTML.HTMLAnchorElement
    Const url As String = "https://www.racingpost.com"
    
    XMLRequest.Open "GET", "https://www.racingpost.com/racecards/2/ascot/2021-07-23/788075", False
    XMLRequest.send
    
    If XMLRequest.Status <> 200 Then
        MsgBox XMLRequest.Status & " - " & XMLRequest.StatusText
        Exit Sub
    End If
    
    HTMLDoc.body.innerHTML = XMLRequest.responseText
    
    For Each HTMLTable In HTMLDoc.getElementsByClassName("RC-runnerForm")
    
    Set HTMLTableRow = HTMLTable.Rows.Item(1)
        
     Set HTMLAnchor = HTMLTableRow.getElementsByTagName("a").Item(0)
            Nexthref = HTMLAnchor.href
            Nexturl = url & Mid(HTMLAnchor.href, InStr(HTMLAnchor.href, ":") + 1)
    Debug.Print Nexturl
            
        
    Next HTMLTable
        
    
    
End Sub
 

Congratulations on the content available on YouTube, thank you very much.

Andrew G  
25 Jul 21 at 07:31

Great stuff, happy it helped!

09 Oct 19 at 21:21

Dear Sir  -- when running the following code I get the VBA error
>>> Run Time error '91'
>>> Object variable or With block variable not set

Sub ScrapeOddsUsingXMLHTTP()

    Dim XMLRequest As New MSXML2.XMLHTTP60
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLDiv As MSHTML.IHTMLElement
    Dim HTMLTable As MSHTML.IHTMLElement
    
    XMLRequest.Open "GET", "https://www.oddschecker.com/golf/houston-open/winner", False
    XMLRequest.send
    
    If XMLRequest.Status <> 200 Then
        MsgBox XMLRequest.Status & " - " & XMLRequest.statusText
        Exit Sub
    End If
    
    HTMLDoc.body.innerHTML = XMLRequest.responseText   ' <== Code breaks down at this point
    
    Set HTMLDiv = HTMLDoc.getElementById("oddsTableContainer")
    Set HTMLTable = HTMLDiv.getElementsByTagName("table")(0)
    
    Debug.Print HTMLTable.className
End Sub

Any suggestions?  Thank you.

11 Oct 19 at 10:26

Hi,

You haven't created a new instance of the HTMLDocument class before attempting to change its .body.innerHTML property.

You can do this either by adding this line to your code:

    Set HTMLDoc = New MSHTML.HTMLDocument

Or by changing the variable declaration like so:

    Dim HTMLDoc As New MSHTML.HTMLDocument

I hope that helps!

18 Jun 19 at 09:25

Dear Andrew,

Has anything changed  in that web-site (https://www.oddschecker.com.....) since you pasted this video?

Because no matter how I try to run the macro using IE or HTTP Request result is the same (slightly different but). 

IE gives me following messages:

Can’t reach this page

•Make sure the web address https://www.oddschecker.com is correct
•Search for this site on Bing
•Refresh the page

More information  
Error Code: INET_E_DOWNLOAD_FAILURE


Can’t connect securely to this page

This might be because the site uses outdated or unsafe TLS security settings. If this keeps happening, try contacting the website’s owner.

Your TLS security settings aren’t set to the defaults, which could also be causing this error.

Try this:
•Go back to the last page

and HTTP Request shows me that funny msgbox:

Run-time error '-2146697208(800c0008)':

The download of the specified resource has failed.

I tryed to use my code that I wrote watching your lesson or when I ran your code that I downloaded from your web-site.

Is anything wrong with my machine or there's something else I do not understand. 

Dear Andrew,

Thatk you so much

Sincerily yours

Alexander

18 Jun 19 at 13:10

Hi Alexander,

I'm not sure why you're receiving this error message but it seems that you're not the only person to experience it.  There are some possible solutions listed in these forums

https://answers.microsoft.com/en-us/windows/forum/windows_10-networking/cant-connect-securely-to-this-page-site-uses/5f1e1e7c-dc02-434c-8ffb-83c3e6e8b512

https://www.tenforums.com/browsers-email/90236-microsoft-edge-tls-security.html

Hopefully one of those will help!