VBA - scraping websites videos | Excel VBA Part 57.2 - Basic Web Scraping with Selenium and Google Chrome

Posted by Andrew Gould on 15 January 2021


In this video you'll learn how to perform basic web scraping tasks using Selenium Basic and Google Chrome. You'll see how to open Chrome, navigate to a web page and interact with items on the page including typing text into an input box and clicking a button. You'll learn how to check if a web element exists on the page to avoid run time errors and how to set a timeout value. You'll see how to reference a collection of elements and loop over them to identify the ones you're interested in. You'll also learn how to write a table into an Excel worksheet using two different techniques, as well as how to extract hyperlinks from a web table and assign them to cells in an Excel worksheet.

This video has the following accompanying files:

File name Type Description
Pt57_2 Basic Web Scraping with Selenium and Google Chrome.xlsm Excel workbook with macros

Click to download a zipped copy of the above files.

There are no exercises for this video.

Making a  video bigger

You can increase the size of your video to make it fill the screen like this:

View full screen

Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown at its bottom right-hand corner.

 

When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.

Improving the quality of a video

To improve the quality of a video, first click on the Settings icon:

Settings icon

Make sure you're playing your video so that the icons shown appear, then click on this gear icon at the bottom right-hand corner.

 

Choose to change the video quality:

Video quality

Click on Quality as shown to bring up the submenu.

 

The higher the number you choose, the better will be your video quality (but the slower the connection speed):

Connection speed

Don't choose the HD option unless you have a fast enough connection speed to support it!

 

Is your Wise Owl speaking too slowly (or too quickly)?  You can also use the Settings menu above to change your playback speed.

This page has 4 threads Add post
08 Jun 21 at 12:28

Thanks for sharing such great informative training videos. Even toughest of the tough topic are made easy to understand. Even a person without any coding knowledge can digest your topics and become an expert..

Im getting a

Run-time error '1004': Exception from HRESULT: 0x800A03EC

Table.AsTable.ToExcel ThisWorkbook.Worksheets("DirStatus").Cells(2, 10)

Suprisingly the same code was running fine for 30-35 days, however offlate from past two days without any changes being done to the code it authomatically started throwing up this error. If possible kindly give your thoughts on the same..

Thank again for sharing all your training vidoes.

 

 

08 Jun 21 at 13:40

Hi there!

I don't know the answer to that one but here are some suggested causes and fixes:

https://www.codeproject.com/Questions/470089/Exception-from-HRESULT-0x800A03EC-Error

https://stackoverflow.com/questions/7099770/hresult-0x800a03ec-on-worksheet-range

I hope it helps!

28 May 21 at 01:32

Hi Andrew,

Subsequent to my earier post, i made some adjustments. I am still stuck at how to get the output data into one results sheet in the pivot table format. Your assistance would be greatly appreciated. The sample input data is valid for this code. 

Thanks much!

 

Option Explicit

Private chromebrowser As Selenium.ChromeDriver ' declaration at the module level. note we changed DIM to Private


Sub LoopScrapePropTaxWebsiteNew()

Dim FindBy As New Selenium.By ' declaration of an auto instancing BY object to query any changes to HTML class name

Dim PropertyOwnerQueryResults As Selenium.WebElements ' Declaration of a variable that can take multiple web elements
Dim PropertyTaxResults As Selenium.WebElements   ' Declaration of a variable that can take only one web element
Dim PropertyOwnerQueryResult As Selenium.WebElement
Dim PropertyTaxResult As Selenium.WebElement
Dim PropInfoTable As Selenium.WebElement
Dim PropTaxtable As Selenium.WebElement
Dim ValNo As String
Dim StratNo As String

Set chromebrowser = New Selenium.ChromeDriver

chromebrowser.Start 'The base URL in google chrome
chromebrowser.Get "https://ptsqueryonline.fsl.org.jm/PTSOnlineWeb/ptsquery.jsp" ' access the home page. If we wanted to access any page, we would just indicate the page name after the forward slash

Application.ScreenUpdating = False  'This line will turn off the screen flickering

Worksheets("PropTaxData").Activate ' Go back to the property Tax Sheet
Range("A2").Select
    
'-----------Loop to download the Property Tax Information
    
    Do Until ActiveCell.Value = ""
        
        ValNo = ActiveCell.Value
        StratNo = ActiveCell.Offset(0, 1).Value
        
    If chromebrowser.IsElementPresent(FindBy.Name("valuationno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("valuationno").SendKeys ValNo

    If chromebrowser.IsElementPresent(FindBy.Name("stratalotno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("stratalotno").SendKeys StratNo

    If chromebrowser.IsElementPresent(FindBy.Class("btn"), 5000) = False Then ' this is to account for any changes button name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find submit button", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByClass("btn", 3000).Click

'---------------Retriving the results of the property tax query

    Set PropertyOwnerQueryResults = chromebrowser.FindElementsByClass("form-horizontal")

    If PropertyOwnerQueryResults.Count = 0 Then      ' Condition to account for an empty result set
        'chromebrowser.Quit
        MsgBox "No Results Found" & " " & "for Valuation No" & " " & ValNo, vbExclamation
        Exit Sub
    End If

        For Each PropertyOwnerQueryResult In PropertyOwnerQueryResults ' loops through the web elements

            Set PropInfoTable = PropertyOwnerQueryResult.FindElementByTag("tbody")
             'Debug.Print PropInfoTable.Text ' print the webeleemnt texts to the immediate window
             
        Next PropertyOwnerQueryResult

    ProcessTable PropInfoTable
    
    chromebrowser.FindElementById("back").Click 'Select the back button
     
    Worksheets("PropTaxData").Activate
        
    ActiveCell.Offset(1, 0).Select ' Go to the next cell
    
    Loop
    
    'Application.ScreenUpdating = True

End Sub


Sub ProcessTable(TableToProcess As Selenium.WebElement)

    Dim AllRows As Selenium.WebElements
    Dim SingleRow As Selenium.WebElement
    Dim AllRowCells As Selenium.WebElements
    Dim SingleCell As Selenium.WebElement
    Dim OutputSheet As Worksheet
    Dim RowNum As Long, ColNum As Long
    Dim TargetCell As Range
    

    Set OutputSheet = ThisWorkbook.Worksheets("Results")
    
    Set AllRows = TableToProcess.FindElementsByTag("tr")

    For Each SingleRow In AllRows
    
        RowNum = RowNum + 1
        Set AllRowCells = SingleRow.FindElementsByTag("td")
        
        If AllRowCells.Count = 0 Then
            Set AllRowCells = SingleRow.FindElementsByTag("th")
        End If
        
        For Each SingleCell In AllRowCells
        
            ColNum = ColNum + 1
            
            Set TargetCell = OutputSheet.Cells(RowNum, ColNum)
            
            TargetCell.Value = SingleCell.Text
            
            'Debug.Print SingleCell.Text
        Next SingleCell
    
        ColNum = 0
        
    Next SingleRow

End Sub

27 May 21 at 21:55

Hi Andrew,

I am loving your content. The videos are insightful, simple and easy to understand. Using Selenium, I am writing a webscraping script to loop over an excel array of values to login a website, copy the particulars from that website and write the contents to an excel sheet.  The data is returned across as a table. I know the script works as the data is generated in the immediate window.  I need your guidance how to write the details to an excel file in a pivot table format with the appropriate headers.

Here is some sample values to test the script

16402008114

17006012198

00604005040

06901022020

031D5Y27115

19002019045

031D5Y27129

031D5Y27134

105D5Z02004

031D5Y27109

16301002229

Looking forward to your kind support

 

Option Explicit

Private chromebrowser As Selenium.ChromeDriver ' declaration at the module level. note we changed DIM to Private


Sub LoopScrapePropTaxWebsite()

Dim FindBy As New Selenium.By ' declaration of an auto instancing BY object to query any changes to HTML class name

Dim PropertyOwnerQueryResults As Selenium.WebElements ' Declaration of a variable that can take multiple web elements
Dim PropertyTaxResults As Selenium.WebElements   ' Declaration of a variable that can take only one web element
Dim PropertyOwnerQueryResult As Selenium.WebElement
Dim PropertyTaxResult As Selenium.WebElement
Dim PropInfoTable As Selenium.WebElement
Dim PropTaxtable As Selenium.WebElement
Dim ValNo As String
Dim StratNo As String

Set chromebrowser = New Selenium.ChromeDriver

chromebrowser.Start 'The base URL in google chrome
chromebrowser.Get "https://ptsqueryonline.fsl.org.jm/PTSOnlineWeb/ptsquery.jsp" ' access the home page. If we wanted to access any page, we would just indicate the page name after the forward slash

Application.ScreenUpdating = False  'This line will turn off the screen flickering

Worksheets("PropTaxData").Activate ' Go back to the property Tax Sheet
Range("A2").Select
    
'-----------Loop to download the Property Tax Information
    
    Do Until ActiveCell.Value = ""
        
        ValNo = ActiveCell.Value
        StratNo = ActiveCell.Offset(0, 1).Value
        
    If chromebrowser.IsElementPresent(FindBy.Name("valuationno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("valuationno").SendKeys ValNo

    If chromebrowser.IsElementPresent(FindBy.Name("stratalotno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("stratalotno").SendKeys StratNo

    If chromebrowser.IsElementPresent(FindBy.Class("btn"), 5000) = False Then ' this is to account for any changes button name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find submit button", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByClass("btn", 3000).Click

'---------------Retriving the results of the property tax query

    Set PropertyOwnerQueryResults = chromebrowser.FindElementsByClass("topInfo")

    If PropertyOwnerQueryResults.Count = 0 Then      ' Condition to account for an empty result set
        'chromebrowser.Quit
        MsgBox "No Results Found" & " " & "for Valuation No" & " " & ValNo, vbExclamation
        Exit Sub
    End If

        For Each PropertyOwnerQueryResult In PropertyOwnerQueryResults ' loops through the web elements
            Debug.Print PropertyOwnerQueryResult.Text ' print the webeleemnt texts to the immediate window
        
        Next PropertyOwnerQueryResult


    Set PropertyTaxResults = chromebrowser.FindElementsByClass("middleTable")
    
    If PropertyTaxResults.Count = 0 Then          ' Condition to account for an empty result set
        chromebrowser.Quit
        MsgBox "No Results Found", vbExclamation
        Exit Sub
    End If

        For Each PropertyTaxResult In PropertyTaxResults  ' This For Loop goes through the elements of the results table and prints the text in the immediate window
    
            Debug.Print PropertyTaxResult.Text
            

        Next PropertyTaxResult
        
    
    chromebrowser.FindElementById("back").Click 'Select the back button
     
    Worksheets("PropTaxData").Activate
        
    ActiveCell.Offset(1, 0).Select ' Go to the next cell
    
    Loop
    
   

End Sub

28 May 21 at 06:22

Hi there!

I think that the best thing to do is take a look at some of our videos on creating pivot tables! You can see the list of pivot table videos here https://www.wiseowl.co.uk/vba-macros/videos/vba-pivot-tables-charts/

I hope it helps!

29 May 21 at 22:04

Success! Thanks alot Andrew!

Andrew G  
30 May 21 at 06:05

Excellent!

28 May 21 at 07:17

Hi Andrew,

Thanks for the prompt response. I asked the wrong question. I need guidance on how to consolidate the data from the website to one sheet. The data is currently downloaded to multiple tabs.

Thanks

Andrew G  
29 May 21 at 19:11

OK, that makes more sense! There are plenty of ways to build a list by finding the end of the current list (your suggestion is a good one) and we even have a video dedicated to discussing different techniques! https://www.wiseowl.co.uk/vba-macros/videos/vba-advanced/last-row-column-cell/

A quick way to avoid testing if the sheet is empty is to start at the bottom of column A, end upwards (this will stop at the first cell containing a value or cell A1 if the column is empty) and then offset one row downwards. 

Range("A1048576").End(xlUp).Offset(1, 0)

This will leave row 1 of the worksheet empty but you could use that for column headings or just delete the row at the end.

I hope it helps!

29 May 21 at 13:52

Hi Andrew,

I took out the code that adds a new work sheet and replaced it with this line:

Set OutputSheet = ThisWorkbook.Worksheets("Results"). 

The challenge now is to prevent the results from being overwritten each time a new data set is added.

In other words, I need a dynamic way for the results sheet to be populated with all the results. I was thinking to code the output sheet using the following conditions:

Condition 1:-

If the results sheet is blank, then start writing the results to the output sheet starting at cell A1 

Condition 2:-

If the output sheet is not blank, then go to the first empty cell in column A and then beginning writing the results.

I have been getting an errors each time I tried writing the code.

Your guidance would be greatly appreciated 

Andrew G  
29 May 21 at 06:47

Hi! I was going to say that you can remove the code which creates the new worksheet for each table that you're copying but I can't see anything in the two sets of sample code you've posted which does this. How are you creating the new worksheets that your tables are being written to?

12 Mar 21 at 07:24

Hi Andrew,

Thank you so much for your reply to my post under Excel VBA Part 47.1. I've managed to apply and adjust the code scroll through all pages and tabs on the AustralianSuper site. When I tried to use a similar approach on another website, I seem to be running into a challenge. On this page https://www.pfzw.nl/over-ons/zo-beleggen-we/waarin-we-beleggen/overzicht-aandelen.html I want to copy the table (like with the AustralianSuper). But all of these FindElementByXXX don't seem to find the table, while in the inspector I can see it there clearly:

 Set PFZWTable = ch.FindElementByClass("data-table style-scope p-data-table")
 Set PFZWTable = ch.FindElementByCss(".data-table.style-scope.p-data-table")
 Set PFZWTable = ch.FindElementByTag("Table")
 Set PFZWTable = ch.FindElementsByXPath("//*[@id='tabledatasource-642984829']//div/div/div/div/table")

Inspector element: <table class="data-table style-scope p-data-table" style="min-width: 830px">, XPath //*[@id="tabledatasource-642984829"]//div/div/div/div/table

I've also given the page a little while to load (ch.Wait "3000"), just in case it needed time to load. I seem to be struggling with FindElementBy the above options in general. On other pages, the only FindElement that works for me is FindElementByLinkText, and in one case ch.FindElementByCss(".next") to click the Next button. 

What am I doing wrong? Am I missing something? If an element seems to have the tag 'table' why does ch.FindElementByTag("Table") when there is only one table tag on the page? The same for FindElementByClass("data-table style-scope p-data-table"). I've watch a lot of (but admittedly not all of) your video 57.4, but still can't seem find what I'm doing wrong.

Your guidance on this is really appreciated! 

Best,

19 Mar 21 at 10:40

I thought you'd like to know that I've added a video answer to the question which you can see here. Thanks for the question!

12 Mar 21 at 17:40

Well I've learnt something about web design that I didn't previously know!  Working code below but you might find this useful as background information first

https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM

Here's the code which writes the table you're trying to get into a new worksheet:

Private ch As Selenium.ChromeDriver

Sub ScrapePFZW()

    Dim ShadowHost As Selenium.WebElement
    Dim ShadowRoot As Selenium.WebElement
    Dim PFZWTable As Selenium.WebElement
    Dim ws As Worksheet
    
    Set ch = New Selenium.ChromeDriver
    
    ch.Start
    ch.Get "https://www.pfzw.nl/over-ons/zo-beleggen-we/waarin-we-beleggen/overzicht-aandelen.html"
    
    'Get a reference to the parent element of the shadow root
    Set ShadowHost = ch.FindElementByCss("p-data-table")
    
    'Return a reference to the shadow root of the parent element
    Set ShadowRoot = ch.ExecuteScript("return arguments[0].shadowRoot", ShadowHost)
    
    'Search within the shadow DOM using normal FindElement methods
    Set PFZWTable = ShadowRoot.FindElementByCss("table")
    
    Set ws = ThisWorkbook.Worksheets.Add
    PFZWTable.AsTable.ToExcel ws.Range("A1")
    
End Sub

Hope that helps!