VBA - scraping websites videos | Excel VBA Part 47.1 - Browsing to Websites and Scraping Web Page

Posted by Andrew Gould on 14 November 2016

Scraping a web page in VBA involves getting a reference to an HTML document and then processing its elements to extract the useful parts and write them to another location in a readable format. This video shows you how to do that using both Internet Explorer and basic XML HTTP requests. You'll learn a bit about the Document Object Model, as well as how to identify HTML elements by name and by tag type. You'll also see how to loop through various collections of HTML elements and their children using an example involving exchange rate tables.

This video has the following accompanying files:

File name Type Description
Scraping Web Pages.xlsm Excel workbook with macros

Click to download a zipped copy of the above files.

Making a  video bigger

You can increase the size of your video to make it fill the screen like this:

View full screen

Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown at its bottom right-hand corner.

 

When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.

Improving the quality of a video

To improve the quality of a video, first click on the Settings icon:

Settings icon

Make sure you're playing your video so that the icons shown appear, then click on this gear icon at the bottom right-hand corner.

 

Choose to change the video quality:

Video quality

Click on Quality as shown to bring up the submenu.

 

The higher the number you choose, the better will be your video quality (but the slower the connection speed):

Connection speed

Don't choose the HD option unless you have a fast enough connection speed to support it!

 

Is your Wise Owl speaking too slowly (or too quickly)?  You can also use the Settings menu above to change your playback speed.

This page has 23 threads Add post
29 May 21 at 06:50

Hi Andrew,

I need to extract the data from multiple HTML files into excel using VBA. need to extract paramname and Paramvalue to excel. can you help me out this with a code. 

 

29 May 21 at 07:29

Hi Deepak!

Is there something specific that isn't working for you? What code have you written so far?

29 May 21 at 13:19

Hi Andrew,

           The HTML files are present in local folder, Till now i just done a code to locate the files in the folder, after that dont have a clue to proceed further.

         Could you please help me on this with the code to extract the data. TIA

Andrew G  
29 May 21 at 19:25

Hi Deepak! Can I offer a couple of links which might help you get the HTML file loaded into an object:

https://stackoverflow.com/a/59957320 

https://stackoverflow.com/questions/51114836/parse-saved-html-file-vba

From that point you can then work with the HTML document object just as with the websites shown in the video. You'll need to find elements, loop through the ones you're interested in and extract the InnerText of the elements to get the values you need. These techniques are shown in the video, you'll just need to adapt them to your specific example.

I hope it helps!

22 May 21 at 17:21

I am able execute code written by WiseOwl, but i am trying execute code for site - https://fedai.org.in even i am unable to find elements by id or by name etc...

 

Tab "Rates" > Report as under

*Revaluation Rates

Lates 2 excel reports to be extracted 

Need to create folder and report name as Revaluation Rates with respective date.

getting error at object variable or withblock variable not set at HTMLButtons(0).Click because need to close pop-up screen and then need to extract the files

Option Explicit

Sub BrowseToSiteToGetXML1()

    Dim IE As New SHDocVw.InternetExplorer

    Dim XetraHTMLDoc As MSHTML.HTMLDocument

    Dim HTMLButtons As MSHTML.IHTMLElementCollection

    Dim HTMLInput As MSHTML.IHTMLElement

   

    IE.Visible = True

    IE.Navigate "https://www.fedai.org.in/"

   

    Do While IE.ReadyState <> READYSTATE_COMPLETE Or IE.Busy  'Wait until site is ready

    Loop

    Set XetraHTMLDoc = IE.Document

    Set HTMLButtons = XetraHTMLDoc.getElementsByTagName("button")

    HTMLButtons(0).Click

End Sub

23 May 21 at 15:01

Hi there! I'm sorry I don't know the answer to this one I'm afraid. You may find it easier to use Google Chrome rather than Internet Explorer. We have some videos on how to use Chrome starting at Part 57.1 in this playlist https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/

I hope it helps!

17 May 21 at 21:16

Hello, The code failed at the last line ie the code failed to populate the text box...Help Please

I am using the 64 bit version of the 2010 Office Professional Plus 

 

Option Explicit

Sub BrowseToSite()

Dim IE As SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement

IE.Visible = True
IE.Navigate "https://www.wiseowl.co.uk/"


    Do While IE.ReadyState <> READYSTATE_COMPLETE 'Pausing Excel while the website is loading
    Loop

Set HTMLDoc = IE.Document

Set HTMLInput = HTMLDoc.getElementById("what")

HTMLInput.Value = "VBA"


End Sub

17 May 21 at 21:50

Hi there!

The input box you're attempting to access doesn't have an id attribute (although it may have had at some time in the past!). The input box has both a class and a name attribute so you could do this using either:

Set HTMLInput = HTMLDoc.getElementsByName("what")(0)

Or

Set HTMLInput = HTMLDoc.getElementsByClassName("search")(0)

I hope that helps! 

17 May 21 at 22:28

Hi Andrew,

Thanks for the prompt response. Loving the WiseOwl content!! 

Unfortunately both suggestions didn’t work. The Run-time error '91'. "Object variable or with block variable not set" came up under both scenarios. The code is in debug mode and line 11 is highlighted. I made the following observations:-

1) Each time I hovered over line 3, Dim HTMLInput As MSHTML.IHTMLElement,  i see a message box stating HTMLInput = Nothing

2) Each time I hovered over Set  line 10 HTMLInput = HTMLDoc.getElementsByClassName("search")(0), i see a message box stating HTMLInput = Nothing 

3) Each time I hovered  line 5, IE.Visible = True, i see a message box stating the remote server machine does not exist or is unavailable

Do you have any recommendations on how to resolve these errors?

The Internet controls, HTML object library under VBA Project references and all macros have been enabled.

 

 

 

Andrew G  
19 May 21 at 06:06

Excellent! Happy to hear that you got it working!

18 May 21 at 16:27

Thanks again for the prompt response! The Selenium solution worked beautifully!

Andrew G  
18 May 21 at 07:38

Hmm, I'm not sure, both of the alternatives I suggested work for me so I can't replicate the problem.

If memory serves, this issue can sometimes be resolved by declaring your IE variable As InternetExplorerMedium (I believe it changes the security level of the application). 

Alternatively, I wonder if you might have more success using Google Chrome and the SeleniumBasic library to do this? We have some videos on that topic starting at part 57.1 in this list of videos https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/

 

20 Mar 21 at 16:42

Hi Andrew,

Thank you for sharing these tutorials – they are really great.

I am trying to use the search button on https://www.boerse-frankfurt.de/en to find a specific stock by using code similar to the tutorial.

I can get the text into the search box, but the functionality on the site, when you type in manually, is that a comboBox is made with the link to the matching stock.

The problems seems to be that the search button will not bring you to the link without the comboBox. Is there a way to fix this with a code?

You help will be really appreciated.

Best regards, Kim

 

Sub BrowseToSiteToGetXML()

    Dim IE As New SHDocVw.InternetExplorer

    Dim XetraHTMLDoc As MSHTML.HTMLDocument

    Dim HTMLButtons As MSHTML.IHTMLElementCollection

    Dim HTMLInput As MSHTML.IHTMLElement

   

    IE.Visible = True

    IE.navigate "https://www.boerse-frankfurt.de/en"

   

    Do While IE.ReadyState <> READYSTATE_COMPLETE Or IE.Busy  'Wait until site is ready

    Loop

   

    Set XetraHTMLDoc = IE.Document

   

    Set HTMLInput = XetraHTMLDoc.getElementById("mat-input-0")

    HTMLInput.Click

    HTMLInput.Value = "IQQH"

   

    Do While IE.ReadyState <> READYSTATE_COMPLETE Or IE.Busy  'Wait until site is ready

    Loop

   

    Set HTMLButtons = XetraHTMLDoc.getElementsByTagName("button")

    HTMLButtons(0).Click

End Sub

21 Mar 21 at 20:08

Hi Kim,

I'm not even going to pretend that I know why this is working but here are some notes before the code:

1) I'm using the SeleniumBasic library to control Google Chrome rather than IE. I suspect that you can get this to work in IE as well but Selenium is easier. We have several videos on using Selenium starting at part 57.1 on this page

2) I don't know why, but the key to getting things to update on the website you're using appears to be activating a different application window. In the code below I've done this simply by activating the Excel application window. It apparently also works by adding code to open Notepad, PowerPoint, Word, etc. As long as the browser window isn't active it appears to update. I don't know enough about the topic to understand why!

3) Even after activating a different application it seems that the link in the combobox doesn't appear on the screen and so can't be interacted with. I've used the ExecuteScript method to execute the javascript code to click the link.

4) I've added a simple loop to process all the tables on the page you land on after following the link. Each table goes into a new worksheet so you'll end up with 14 new worksheets (just to warn you!).

Here's the code (of course it relies on having Selenium set up and configured as per the videos I linked to above).

Option Explicit

Private cd As Selenium.ChromeDriver

Sub GetTablesForSelectedItem()

    Dim NameInput As Selenium.WebElement
    Dim SuggestionsList As Selenium.WebElement
    Dim Suggestions As Selenium.WebElements
    Dim AllTables As Selenium.WebElements
    Dim SingleTable As Selenium.WebElement
    
    Set cd = New Selenium.ChromeDriver
    
    cd.Start
    cd.Get "https://www.boerse-frankfurt.de/en"

    Set NameInput = cd.FindElementByCss("#mat-input-0")
    
    NameInput.SendKeys "IQQH"
    cd.Wait 2000
    
    Set SuggestionsList = cd.FindElementByCss("#mat-autocomplete-0")
    Set Suggestions = SuggestionsList.FindElementsByCss("mat-option")
    
    Debug.Print Suggestions.Count
    
    If Suggestions.Count > 0 Then
        cd.ExecuteScript _
            Script:="arguments[0].click();", _
            arguments:=Suggestions(1)
    End If
    
    AppActivate ThisWorkbook.Name
    cd.Wait 2000
    
    Set AllTables = cd.FindElementsByCss("table")
    
    Debug.Print AllTables.Count
    
    For Each SingleTable In AllTables
        SingleTable.AsTable.ToExcel Worksheets.Add.Range("A1")
    Next SingleTable
    
End Sub

If you ever work out why the site behaves that way I'd love to hear about it!

22 Mar 21 at 18:42

Thank you Andrew for the quick reply.

Selenium works beautiful :-)

The AppActivate did not work for me. It gave me a run-time error: '5'
Instead I added some code that click the decline button in the the pop-up about coockies.
Then it works for me. See code below.

Now, I will actually like to work with the HTMLDocument without opening the browser. Is that also possible with Selenium?

Best regards, Kim

Option Explicit

Private cd As Selenium.ChromeDriver

Sub GetTablesForSelectedItem()

    Dim NameInput As Selenium.WebElement
    Dim SuggestionsList As Selenium.WebElement
    Dim Suggestions As Selenium.WebElements
    Dim PopupButton As Selenium.WebElement
    Dim AllTables As Selenium.WebElements
    Dim SingleTable As Selenium.WebElement
    
    Set cd = New Selenium.ChromeDriver
    
    cd.Start
    cd.Get "https://www.boerse-frankfurt.de/en"

    Set NameInput = cd.FindElementByCss("#mat-input-0")
    
    NameInput.SendKeys "IQQH"
    cd.Wait 2000
    
    Set SuggestionsList = cd.FindElementByCss("#mat-autocomplete-0")
    Set Suggestions = SuggestionsList.FindElementsByCss("mat-option")
    
    Debug.Print Suggestions.Count
    
    If Suggestions.Count > 0 Then
        cd.ExecuteScript _
            Script:="arguments[0].click();", _
            arguments:=Suggestions(1)
    End If
    
    Set PopupButton = cd.FindElementById("cookie-hint-btn-decline")
    If PopupButton.IsDisplayed Then
        PopupButton.Click
    End If
    
    'AppActivate ThisWorkbook.Name
    cd.Wait 2000
    
    Set AllTables = cd.FindElementsByCss("table")
    
    Debug.Print AllTables.Count
    
    'For Each SingleTable In AllTables
    '    SingleTable.AsTable.ToExcel Worksheets.Add.Range("A1")
    'Next SingleTable
    
End Sub
 

 

Andrew G  
22 Mar 21 at 20:41

Hi Kim, Selenium is great isn't it?!

Funny, I had added code to click the Decline cookies button as the first thing I did but it had no effect on the ability to click the option in the combobox so I removed the code. I'm pleased you got it to work!

You can run an invisible instance of Chrome by using Headless Chrome but you'll still need the browser running in order to create all the elements that are dynamically generated by the javascript for the website you're using. Here's a basic example:

Sub GetTableOfMovies()

    Dim cd As Selenium.ChromeDriver
    Dim t As Selenium.WebElement
    
    Set cd = New Selenium.ChromeDriver
    
    cd.AddArgument "--headless"
    
    cd.Start
    cd.Get "https://en.wikipedia.org/wiki/List_of_highest-grossing_films"
    
    Set t = cd.FindElementsByCss("table")(1)
    
    t.AsTable.ToExcel Worksheets.Add.Range("A1")
    
    cd.Quit
    
End Sub

You can also send http requests without using a browser using XMLHTTP requests - we have a video on that topic which you can see here.

The problem is that all you get back is the response text from the server. If that includes javascript code that is designed to run at the client to produce the elements on the page then you have no effective way to access those elements without using a browser.

It works great as long as the response text includes the elements you want to get:

Sub MovieTableUsingXMLHTTP()

    Dim XMLRequest As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLTable As MSHTML.HTMLTable
    
    XMLRequest.Open "GET", "https://en.wikipedia.org/wiki/List_of_highest-grossing_films", False
    XMLRequest.Send
    
    If XMLRequest.Status <> 200 Then
        MsgBox XMLRequest.Status & " - " & XMLRequest.statusText
        Exit Sub
    End If
    
    HTMLDoc.body.innerHTML = XMLRequest.responseText
    
    Set HTMLTable = HTMLDoc.getElementsByTagName("table")(0)
    
    WriteTableToWorksheet HTMLTable
    
End Sub

Sub WriteTableToWorksheet(TableToProcess As MSHTML.HTMLTable)

    Dim TableRow As MSHTML.HTMLTableRow
    Dim TableCell As MSHTML.HTMLTableCell
    Dim RowNum As Long, ColNum As Long
    Dim OutputSheet As Worksheet
    
    Set OutputSheet = ThisWorkbook.Worksheets.Add
    
    For Each TableRow In TableToProcess.Rows
        RowNum = RowNum + 1
        
        For Each TableCell In TableRow.Cells
            ColNum = ColNum + 1
            OutputSheet.Cells(RowNum, ColNum).Value = TableCell.innerText
        Next TableCell
        
        ColNum = 0
        
    Next TableRow
    
End Sub

I hope that helps!

08 Mar 21 at 15:25

Hi Andrew,

You tutorial is fantastically useful! It made a once daunting task appear to be within the realms of possibility! 

However, I'm struggling to get the code to work for a specific website. The getElementsByTagName isn't picking up the table on this site. Could you let me know what I'm doing wrong? Is it because the site is .aspx? Essentially, my next step will hopefully be to get the data per row ("tr"), but I can't even get the table to work.

Currently I have this:

Sub AustralianSuperRetriever()

Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument

    ' XMLPage.Open "Get", "https://x-rates.com/table/?from=GBP&amount=5", False
    XMLPage.Open "Get", "https://portal.australiansuper.com/investments-and-performance/what-we-invest-in/superannuation-premixed-investment-options/high-growth.aspx", False

XMLPage.send

HTMLDoc.body.innerHTML = XMLPage.responseText

ProcessHTMLPage HTMLDoc

End Sub

Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)


Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLTables As MSHTML.IHTMLElementCollection

Set HTMLTables = HTMLPage.getElementsByTagName("Table") '- this doesn't work

'Set HTMLTables = HTMLPage.getElementsByTagName("tr") 'this one doesn't work. TR refers to table row.

'Set HTMLTables = HTMLPage.getElementsByClassName("fn-table-feeder-trigger") '- this one works!

'Set HTMLTables = HTMLPage.getElementsByClassName("fn-table-container table-container") '- this one also seems to work

For Each HTMLTable In HTMLTables

Debug.Print HTMLTable.className, HTMLTable.ID, HTMLTable.innerText, HTMLTable.tagName, HTMLTable.Title 'noIDs

Next HTMLTable


End Sub

 

Any help will be really appreciated! 

 

Best,

08 Mar 21 at 16:42

Hi Woody!

Good try with the XMLHTTP request but sadly most of this particular page uses javscript functions to generate its content. If you check the contents of the responseText property you won't find any table elements at all. One solution is to use a web browser to allow the javascript code to generate the content and then scrape the page once this has happened. I'd recommend using the SeleniumBasic library so that you can use a modern browser like Chrome rather than IE. Here's some basic code to write the table on the page you've linked to into a new worksheet: 

Option Explicit

Private ch As Selenium.ChromeDriver

Sub ScrapeAustralianSuper()

    Dim AussieTable As Selenium.WebElement
    Dim ws As Worksheet
    
    Set ch = New Selenium.ChromeDriver
    
    ch.Start
    ch.Get "https://portal.australiansuper.com/investments-and-performance/what-we-invest-in/superannuation-premixed-investment-options/high-growth.aspx"
    
    Set AussieTable = ch.FindElementByCss(".tablesorter.complexTable")

    Set ws = ThisWorkbook.Worksheets.Add
    
    AussieTable.AsTable.ToExcel ws.Range("A1")
    
End Sub

We recently added some videos on working with SeleniumBasic starting with Part 57.1 on this page https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/

I hope that helps!

09 Mar 21 at 15:18

Hi Andrew,

Thank you so much for your quick response and great explanation! I will have a look at your tutorials on Selenium and Chrome, apply the code you gave, and give it another go! 

Thanks again!

Best,

Ward

Andrew G  
10 Mar 21 at 06:48

You're very welcome! Have fun and good luck!

07 Sep 20 at 02:46

Hi,

I'm testing this code but somehow it doesn't work for me, coming up with an Error (Expected variable or procedure, not module). Below is my code. Your hlep would be appreciated.

Sub BrowseToExchangeRatesWithQueryStringAndXML()

    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    XMLPage.Open "GET", "https://x-rates.com/table/?from=GBP&amount=5", False
    XMLPage.send

    HTMLDoc.body.innerHTML = XMLPage.responseText
    
    ProcessHTMLPage HTMLDoc
    
End Sub

_______________________

Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)

    Dim HTMLTable As MSHTML.IHTMLElement
    Dim HTMLTables As MSHTML.IHTMLElementCollection
    Dim HTMLRow As MSHTML.IHTMLElement

    Set HTMLTables = HTMLPage.getElementsByTagName("table")

    For Each HTMLTable In HTMLTables
        Debug.Print HTMLTable.className
        
        For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
            Debug.Print vbTab & HTMLRow.innerText
        Next HTMLRow
        
    Next HTMLTable

End Sub

 

07 Sep 20 at 07:58

Hi Bakir,

It sounds as though you've given the same name to both a module and a subroutine. Try changing the module name to something different.

I hope that helps!

08 Sep 20 at 04:00

Thank you, that was the issue, I had the module name same as the sub name, by changing that, everything works smoothly now.

16 May 20 at 20:19

Hello Andrew, this is Srikanth here, I am trying to get the data from a website "www.gst.gov.in". Once the login page is loaded the Internet Explorer ready state is complete but there are other things load after the Internet Explorer's ready state is complete due to which the I am getting error in finding the elements. How to wait untill the animation is complete and inner page is loaded? I don't want to use the wait function. InternetExplorer is not able to identify that the inner page is not yet loaded so its ready state shows as completed.

Below is my code, please help me in this.

Sub WebScraping()

Dim IE As New SHDocVw.InternetExplorer
Dim Documents As MSHTML.HTMLDocument
Dim Element As MSHTML.IHTMLElement
Dim Elements As MSHTML.IHTMLElementCollection
Dim UserName As MSHTML.IHTMLElement
Dim Password As MSHTML.IHTMLElement

IE.Visible = True
IE.navigate "https://www.gst.gov.in/"

Do Until IE.readyState = READYSTATE_COMPLETE
Loop

Set Documents = IE.document

'For Login Page Click
Set Elements = Documents.getElementsByTagName("a")
For Each Element In Elements
If Element.getAttribute("target") = "_self" And Element.getAttribute("href") = "//services.gst.gov.in/services/login" Then
Element.Click
Exit For
End If
Next Element

'UserID
Set UserName = Documents.getElementById("username")
UserName.Value = "Username"

End Sub

 

17 May 20 at 06:45

Hi Srikanth,

You could try this:

Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE

Otherwise, the Wait method might be your best option.

I hope that helps!

17 May 20 at 07:05

Hey Andrew, Thank you so much for your response. Since i didnt wanted the wait function I found below solution for this.

Do
Set UserName = Documents.getElementById("username")
Loop While UserName Is Nothing
UserName.Value = "Username"
 

Thank you once again for your wonderful course and your teaching technique.

Regards, Srikanth

 

Andrew G  
18 May 20 at 06:24

Hi Srikanth, that's great! Happy to hear that you found a solution that works for you!

20 Apr 20 at 21:13

I am solving the following issue: I want to scrape a title of website when link inserted in column A and put this value to relevant cell (next to it) in column B. The issue seems to be that once I paste the website in column A, the code reruns the entire list from column A2 to "last row" as defined in the code. Is there any way to only modify column B once a single column A is modified? I.e. if I paste a link in column A36 I get a title in B36, regardless of whether the cell is in the middle of the used range or at the very bottom (i.e. only that very cells gets affected..) I would like to use this without having to re-run multiple inputs as it currently stands; (i.e. the loop "for i =2 to last row")? Also, I would like to change the below from Modular macro i.e. sub to private sub reacting to change (i.e. intersect function) where the 'target' is any cell from A:A range. Many thanks!

the code is as follows:

Sub get_title_header()

Dim wb As Object

Dim doc As Object Dim sURL As String

Dim lastrow As Long

lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).row

For i = 2 To lastrow

Set wb = CreateObject("internetExplorer.Application")

sURL = Cells(i, 1) wb.navigate sURL

wb.Visible = False

While wb.Busy

DoEvents Wend ''HTML document

Set doc = wb.document

Cells(i, 2) = doc.Title

On Error GoTo err_clear

Cells(i, 3) = doc.GetElementsByTagName("h1")(0).innerText

err_clear: If Err <> 0 Then Err.Clear Resume

Next

End If

wb.Quit Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit

Next i

End Sub

21 Apr 20 at 07:16

Hi Jay, I think you will find event procedures useful in this case https://www.wiseowl.co.uk/vba-macros/videos/vba-basics/events/

You can use the Worksheet_Change event to detect when the value of a cell on a worksheet has changed and take the appropriate action.

Here's the Microsoft documentation for the Worksheet_Change event https://docs.microsoft.com/en-us/office/vba/api/excel.worksheet.change

I hope that helps!

29 Feb 20 at 17:17

Hi Andrew

I am very impressed with the web scraping course and it is presented very well.  i am however a bit stuck on the code where it stops on the For HTMLButton ... loop stating an object is required.   I have been over the code and cannot see anything different to what you have in the video, can you please help on this?

Sub GetHTMLDocument()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLInput As MSHTML.IHTMLElement
    Dim HTMLButtons As MSHTML.IHTMLElementCollection
    Dim HTMLButton As MSHTML.IHTMLElement
   
    IE.Visible = True
    IE.navigate "wiseowl.co.uk"
   
    Do While IE.ReadyState <> READYSTATE_COMPLETE
   
    Loop

    Set HTMLDoc = IE.Document
    Set HTMLInput = HTMLDoc.getElementById("what")
    HTMLInput.Value = "Excel VBA"
   
    Set HTMLBittons = HTMLDoc.getElementsByTagName("button")
   
    For Each HTMLButton In HTMLButtons
    Debug.Print HTMLButton.className, HTMLButton.tagName, HTMLButton.ID, HTMLButton.innerText
    Next HTMLButton
   
End Sub

 

Thanks in advance

Colin

 

02 Mar 20 at 07:41

Hi Colin,

It appears to be a typo in this line:

Set HTMLBittons = HTMLDoc.getElementsByTagName("button")

If you change HTMLBittons to HTMLButtons it will solve the issue.

Before you do that, try adding Option Explicit to the top of the module, above any subroutines you have written.  When you try to run the code, it will highlight misspelt variable names.

I hope that helps!

08 Oct 19 at 14:53

Dear Sir

Firstly thank you for your excellent videos

For the Example shown in Video 47 both your procedures …

1.      Sub BrowseToExchangeRatesWithQueryString() and

2.      Sub BrowseToExchangeRatesWithQueryStringAndXML()

produce the same results for the statement for the url, https://www.x-rates.com/table/?from=EUR&amount=3 … namely …

Debug.Print HTMLTables.Length --> displays 2 for the number of tags with the name “table” for both procedures.

However, for the following url, https://www.wsj.com/market-data/commodities with has 3 tables with the tag name “table”

Debug.Print HTMLTables.Length --> displays 3 for only the QueryString (procedure 1) and shows 0 for the QueryStringAndXML (procedure 2)

Could you kindly explain why this is the case?  Thank you

11 Oct 19 at 10:33

Hi,

From a brief look at the wsj site it appears as though the tables are being generated by javascript.  The javascript code is executed by a web browser, in this case to dynamically produce the tables and charts on the page.  When you don't use the browser method, the javascript code doesn't run and so you return only the basic response text.  Here's a more indepth explanation which you may find interesting https://stackoverflow.com/questions/5317190/how-do-browsers-execute-javascript

I hope that helps!

09 Mar 19 at 22:41

I have a problem with the code. The website I am trying to use is a ".jsp" site and there is no data to parse using HTMLDoc properties. 

22 Feb 19 at 03:17

Great video - just one thing ever since I went from windows 7 to windows ten I can’t figure out why I am getting a run object error- even when following your basic example of entering document object model in Wikipedia..... run into this yet?

25 Feb 19 at 07:10

I haven't encountered an error with Internet Explorer that's specifically related to Windows 10.  What's the exact message you receive and which line of code causes it?

07 Oct 18 at 07:18

I am very much enjoying your excellent video (Part 47 on scraping websites) but have encountered a problem.

I have referenced the workbook to Microsoft Internet Controls and Microsoft HTML Object Library.
I can compile the following code without a proble. It opens the web page but fails on the final line with a Runtime 13 mismatch error:

Sub GetHTMLDocument()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    
    IE.Visible = True
    IE.navigate "wiseowl.co.uk"
    
    Do While IE.ReadyState <> READYSTATE_COMPLETE
    
    Loop
      
    Set HTMLDoc = IE.Document
    
End Sub

I can't for the life of me see how to solve this.
Your help would be much appreciated

Thanks
Neil

07 Oct 18 at 19:47

Hi Neil,

I copy-pasted your code into a new workbook, set references to the two libraries you mentioned and executed the procedure with no errors.  The only things I can suggest are providing a more complete URL:

IE.navigate "https://www.wiseowl.co.uk"

You may also consider adding a further delay to allow the page to load before attempting to capture a reference to it using this code:

Application.Wait Now + TimeValue("00:00:02")

You can adjust the delay as required.  I hope that helps!

10 Oct 18 at 23:27

Andrew

I did as suggested (removed any reference to Microsoft Internet Controls and ran your code).

It errored at "Set HTMLDoc = IE.Document" with the same error message.

I'm afraid I have currently run out of ideas on this problem

Sorry for delay in reply.

Regards
Neil

Andrew G  
11 Oct 18 at 07:27

Hi Neil,

Thanks for the reply, that's really frustrating.  I'll attempt to investigate further and get back to you if I discover anything which may be of use.

08 Oct 18 at 10:46

Andrew

Many thanks for the response. I've tried both your suggestions, with the same results.

From what I can see, on my PC (running Windows 10 and Office 10) the conflict seems to be caused (I'm not experienced enough to understand why) in setting the following:

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument

In the Watch window, HTMLDoc type is set as document with no value and IE.Document type is set as Object/HTMLDocument with a value of "Object HTMLDocument"

Separately, I note that the priority of references can also impact on code; references priority on this workbook is set as:

Visual Basic For Applications
Microsoft Excel 14.0 Object Library
OLE Automation
Microsoft Office 14.0 Object Library
Microsoft Internet Controls
Microsoft HTML Object Library
Microsoft Forms 2.0 Object Library

There are queries relating to this problem on numerous webpages but I've yet to see a solution. I will do my best to get to the bottom of this. If you could possibly help I'd much appreciate it but would understand if you don't have the time.

Neil

Andrew G  
08 Oct 18 at 13:17

Hi Neil,

The issue is less to do with time and more to do with the fact that I can't replicate the problem!  One way we could elminate the object library reference as the issue is to use late-binding.  Remove the reference to the Microsoft Internet Controls library from the project and alter your subroutine so that it looks like this:

Public Const READYSTATE_COMPLETE As Integer = 4

Sub GetHTMLDocument()

    Dim IE As Object
    Dim HTMLDoc As MSHTML.HTMLDocument
    
    Set IE = CreateObject("InternetExplorer.Application")
    
    IE.Visible = True
    IE.navigate "https://www.wiseowl.co.uk"
    
    Do While IE.ReadyState <> READYSTATE_COMPLETE
    
    Loop
      
    Set HTMLDoc = IE.Document
    
    Debug.Print HTMLDoc.Title
    
End Sub

The suggestion here is that the issue may be due to incompatibilities between 32 and 64 bit editions of the software.  You may well encounter further issues even if the above code fixes the current one.  Let me know what happens!

14 Sep 18 at 20:39

Thanks by the excellent videos, I have a problem, when I try to access an application of our company I receive the following message:

"Automation error"

"The object invoked has disconnected from its clients"

My procedure is the following

Sub Browse()

    Dim IE As New SHDocVw.InternetExplorer

    IE.Visible = True
    IE.Navigate "www.idis.ups.com/login.asp?SystemID=UPSDIS"
   
    Do While IE.ReadyState <> READYSTATE_COMPLETE

    Loop
   
    Debug.Print IE.LocationName, IE.LocationURL
   
    IE.Document.forms("login").elements("userid").Value = "xxxxxx"
    IE.Document.forms("login").elements("password").Value = "xxxxxx"

End Sub

Could you please, tell me what the problem is?

Thank you for your help

15 Sep 18 at 09:33

If it's an internal website you might try the following:

Dim ie As New SHDocVw.InternetExplorerMedium

There are other suggestions here https://www.vba-market.com/2017/03/04/the-object-invoked-has-disconnected-from-its-clients/

I hope it helps!

17 Sep 18 at 05:02

Thanks Andrew for the information, with your suggestion I could solve the problem, I'm trying several methods to identify the "login" button, but I could not, Could you tell me how I refer to this button if I need to do click to enter this page?

Thanks

Andrew G  
17 Sep 18 at 07:14

Unfortunately I can't access the page that you've linked to so I can't offer any other suggestions than those shown in the video.

18 Jan 18 at 17:48

Hi Andrew

I have had great success scraping websites after studying your excellent videos but i have come across a line where I cannot get the element either by TagName or ClassName. I want to get "50" from the following:

<td class="ui-table__cell">50</td>

Any ideas?

Best regards

Tony95

 

19 Jan 18 at 08:16

Hi Tony,

I'm happy to hear that you've been having success with web scraping so far!  I'm not quite sure why that particular element isn't working for you.  Just to check, it looks as though you have two underscores in "ui-table__cell", was that intentional?

If you're having trouble getting at the specific element that you're interested in, perhaps you try looping through all the <td> tags as shown at ~ 1:04:45.  You may end up with a lot of information that you don't need but you could always add a bit of code to tidy up at the end.

I hope some of that helps!

19 Jan 18 at 14:06

Hi Andrew

Thanks for your prompt reply. I have tried:-

Set HTMLTables = HTMLPage.getElementsByTagName("td")

Debug.Print HTMLTables.Length

The return in the immediate window is 0

Greyed out at the end of the line <td class="ui-table__cell">50</td> is ==$0

Would that be the problem?

Best regards

Tony95

15 Nov 17 at 12:18

Hi Andy, I learnt so much from your vba classes and want to thank you so much for such great videos. I need your help to sove one of the issue is that i am running loop to open multiple pages on  differebt tabs in internet explorer using vba and checking 2 option on website but after i navigate to second page it still again checking the option on first page please check the code given below for your reference.

On my excel sheet  range("A2") = BOMDXB, range("B2") = 1-dec-17, range("C2") = 10-dec17

Sub mmtlink()

Dim IE As New SHDocVw.InternetExplorer
Dim st As String
Dim htmldoc As MSHTML.HTMLDocument
Dim all As MSHTML.IHTMLElementCollection
Dim one As MSHTML.IHTMLElement
Dim i As Long, d As Long, b As Long

Set IE = CreateObject("InternetExplorer.Application")

IE.Visible = True

d = Range("c2") - Range("b2") + 1
For i = 1 To d
st = Application.WorksheetFunction.Text(Range("b2") + i - 1, "ddmmmyyyy")
If i = 1 Then
    IE.navigate "https://www.makemytrip.com/air/search?tripType=O&itinerary=" _
     & Left(Range("A2"), 3) & "-" & Right(Range("A2"), 3) _
     & "-D-" & st & "&paxType=A-1&cabinClass=E"
          
Else
    IE.Navigate2 "https://www.makemytrip.com/air/search?tripType=O&itinerary=" _
     & Left(Range("A2"), 3) & "-" & Right(Range("A2"), 3) _
     & "-D-" & st & "&paxType=A-1&cabinClass=E", 2048
End If
Do While IE.Busy = True Or IE.readyState <> 4
 Loop
    
    Set htmldoc = IE.document
    
    htmldoc.getElementById("Non stop").Click
    Set all = htmldoc.getElementsByClassName("sortbytype")
  st = "from " & Left(Range("A2"), 3) & " (early)"
    For Each one In all
       If one.getAttribute("Innertext") = st Then
            one.Click
            Exit For
       End If
    Next one
        
Next i

End Sub

 

15 Aug 17 at 21:56

Hi Wise Owls!

Thank you for your very interesting videos regarding VBA. Learnt a lot, but much to learn remains. I´m working with a quite large workbook with football results and after I´ve watched your video (#47) I progressed quite a bit. By trial and error, and some help I solved the problem with results like 1-1 turning into a date in my workbook. Have couple of remaining issues, and if any of this issues has been adressed in previous videos, my apologies.

First, when running the code VBA creates a number of sheets, based on website I´m scraping I get three new sheets. The information I want is in the first sheet (Sheet1). Sheet2 and 3 contains some information, but of no use to me. I tried:

Dim Sheet1 as Worksheet 

Set Sheet1 = ActiveWorksheet

What then happens is that the values earlier placed in the new created Sheet1, 2 and 3 ends up as a mess in Sheeet1.I think I´m on to something but I´ve come to a stop.

Since it´s football I´m dealing with, it´s nice with livescores. how can I loop the VBA/scrape the website every minute and overwrite previous values?

The last one is probaly easy, but my patience is tempted to the limit. How can I assign a table to start in a specific cell?

Keep up the good work!

Freddie

 

16 Aug 17 at 11:45

if all you're doing is importing a table of data, I think there's a much, much easier way to do this.  Our network is being worked on at the moment, but I'll blog on this in the next day or two, and update this post when I've published the blog with a link.

21 Aug 17 at 13:40

As promised, here's the blog!

30 Jun 17 at 19:54

Hi

Taken from this web page:http://www.skysports.com/racing/racecards/beverley/27-06-2017/781785/watch-live-racing-and-bet-with-racing-uk-handicap

I have the following :

<p class="print-hide">

<span class="v5-txt-high">Dutch Artist (IRE) </span>

5-1 (9-7) Held up and always towards rear, never on terms, last of 11, 22 1/4l behind Dark Crystal (8-12) at Ayr 7f hcp (5) sft in Jun.

</p>

and can get either the name ("Dutch Artist") or the name & text in one cell , but cannot cannot split the two, ie, name in column A & text in column B

Can you help?

03 Jul 17 at 10:17

You could always read the entire contents into a cell and use some string functions to separate the two parts. Everything to the left of the first ( is the name and everything to the right is the text.  Something along these lines:

    Dim s As String, sName As String, sText As String
    
    s = Range("A1").Value
    
    sName = Left(s, InStr(s, "(") - 1)
    sText = Mid(s, InStr(s, "("))
    
    Debug.Print sName, sText

I hope that helps!

28 Jun 17 at 00:20

Hi Andy, I was following your lecture 47 on web scraping.  I am pretty certain the first time i ran it everthing worked fine however now i keep getting a runtime error 13 type mismatch  on the following line :

 Set HTMLDoc = IE.Document

I have tried both excel 2010 & 2016, and get the same problem. Here is the code:

Option Explicit
Sub BrowseTo()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As New MSHTML.HTMLDocument
IE.Visible = True
IE.Navigate ("http://x-rates.com")
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Debug.Print IE.Document, HTMLDoc     'produces output:  [object HTMLDocument]       [object]
Set HTMLDoc = IE.Document
End Sub

In debug mode if I hover the mouse over IE.Document it displays:

<The remote server machine does not exist or is unavailable> usually in excel 2016 it displays <Automation error The interface is unknown> or the first message.

This problem is driving me nuts as the code is so simple, but just will not work. Any help would be greatly appreciated.

Thanks Axel

28 Jun 17 at 06:56

Hi Axel,

I'm unable to replicate the problem after copy-pasting your code in Excel 2010 and 2016. Microsoft has this to say about the error message - is this possibly a case of bad timing? Could either your network or the XRates website have been experiencing issues at the point you were testing your code?

28 Jun 17 at 12:30

Hi Andrew, I apprecaite your prompt reply. I don't think that there is a problem with the website as i have tried a few different sites, and retired many times. Also the page appears in the exploere window that opens up, so i presume it should just be referencing something that has already been loaded. The example following this using XML HTTP requests seems to work fine. I thought it might be antivirus(kaspersky), but again the page is loaded by the internet explorer. Is it possible that it is something do with the enviroment.

Andrew G  
29 Jun 17 at 08:07

Hi Axel, it sounds like you've done all the sensible things to eliminate the potential simple causes. I'm afraid that I'm at a loss - I simply can't replicate the issue that you're having even with a straight copy-paste of your code. Other than Googling the exact text of the message and trawling the various forums to find people who've had the same issue I'm not sure what to suggest.

25 May 17 at 15:15

I'm trying to get information on all pages of a website, but I can only get it from the first page.  The pagination uses JavaScript links like this:

<a href="javascript:__doPostBack('ctl00$cphCenter$gvCompany','Page$1')"> 1 </a>
<a href="javascript:__doPostBack('ctl00$cphCenter$gvCompany','Page$1')"> 2 </a>


Could you please teach me how to click and follow these links to scape the information on the next page (57 pages in total)?  Thank you!

25 May 17 at 20:33

I'm not sure if you'll be able to do this.  The website was built using ASP.NET.  If you click on one of the paging buttons, it will submit another request to the server, which will then build a new page of HTML and send this back to you.  Have a look at the URL shown in the address bar of your browser.  If this changes, you'll be ableo to run your VBA scraping code on this page also, but if it doesn't everything is being done at the client side, and your VBA code won't be able to get at the information on the different pages.

 

26 May 17 at 15:25

Thanks Andy for your reply, I found the solution for my example, I have to execute javascript code from VBA by calling execScript command:

        i = 2
 
        For i = 2 To 57
            Call objIE.Document.parentWindow.execScript("javascript:__doPostBack('ctl00$cphCenter$gvCompany','Page$" & i & "');", "JavaScript")
        Next i

I discovered that this code can only run without error if I do not use any comment. If I set any comment in this code, the execution will give me error 80020101: Automation error. I really don't understand why the code is affected by comments. :)

Andy B  
26 May 17 at 15:47

Thanks for taking the time to post the reply - a really good solution, which I'm sure will help others in the same situation.

29 Apr 17 at 02:09

Hello Andrew! Greetings from Rio, Brazil!

I perfectly rounded your macro and was very happy with the results. But I found a wall that I could not overcome.

How do I pull data from a table that returns fields surrounded by {{field name}}?

Do you have a solution for this case? Ever seen something like that?

02 May 17 at 08:50

Hi! I'm sorry but I'm not sure - which web page are you attempting to get your data from?

12 Apr 17 at 12:58

Hi Andrew,

Many thanks for your reply, I have implemented the changes

and the routine works perfectly.

Best regards

Tony95

13 Apr 17 at 06:40

You're welcome, Tony!

11 Apr 17 at 14:46

Hi Andrew

I have been watching your videos for some time and must say thanks for the very

informative content. I am trying to adapt the code from part 47 but I only want to loop

over 1 table. I have tried adding (0) to the end of line "Set HTMLTables = HTMLPage.getElementsByTagName("table")

but it produces an error. Any ideas please?

Best regards

Tony95

11 Apr 17 at 21:44

Hi Tony,

When you add (0) to the end of the line you're changing the type returned by the instruction. The (0) indicates that you're returning a single item but the HTMLTables variable has a type of IHTMLElementCollection. Just change the type of variable to IHTMLElement (or use the HTMLTable variable instead), something like this:

Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)

    Dim HTMLTable As MSHTML.IHTMLElement
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim HTMLCell As MSHTML.IHTMLElement
    Dim RowNum As Long, ColNum As Integer
    
    Set HTMLTable = HTMLPage.getElementsByTagName("table")(0)
        
    Worksheets.Add
    Range("A1").Value = HTMLTable.className
    Range("B1").Value = Now
    
    RowNum = 2
    For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
        
        ColNum = 1
        For Each HTMLCell In HTMLRow.Children
            Cells(RowNum, ColNum) = HTMLCell.innerText
            ColNum = ColNum + 1
        Next HTMLCell
        
        RowNum = RowNum + 1
    Next HTMLRow
            
End Sub