VBA - scraping websites videos | Excel VBA Part 46 - Querying Web Pages with Query Tables

Posted by Andrew Gould on 14 November 2016


Querying web pages in Excel VBA is simple when you use Query Tables. This video shows you how to create a query table linked to a web page, how to choose which tables from the web page to return, and how to pass values to a URL query string to create dynamic, updatable web queries in your Excel workbooks.

See our full range of VBA training resources, or test your knowledge of VBA with one of our VBA skills assessment tests.

This video has the following accompanying files:

File name Type Description
Web Query Tables.xlsm Excel workbook with macros

Click to download a zipped copy of the above files.

There are no exercises for this video.

Making a  video bigger

You can increase the size of your video to make it fill the screen like this:

View full screen

Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown at its bottom right-hand corner.

 

When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.

Improving the quality of a video

To improve the quality of a video, first click on the Settings icon:

Settings icon

Make sure you're playing your video so that the icons shown appear, then click on this gear icon at the bottom right-hand corner.

 

Choose to change the video quality:

Video quality

Click on Quality as shown to bring up the submenu.

 

The higher the number you choose, the better will be your video quality (but the slower the connection speed):

Connection speed

Don't choose the HD option unless you have a fast enough connection speed to support it!

 

Is your Wise Owl speaking too slowly (or too quickly)?  You can also use the Settings menu above to change your playback speed.

This page has 9 threads Add post
12 Apr 22 at 18:07

Good afternoon WiseOwl,

 

Thank you for the amazing videos on scraping websites. I am trying to scrape data from a website and have run into a problem. I followed your video: "Querying Web Pages using Query Tables" and keep recieving the following response in the output:

"Your browser is unsupported. upgrade your browser"

I am currently running chrome and I believe I am running the most up to date version but I am verifying that as we speak. Any help on this would be amazing. Thank you all, I have copy and pasted my code below for any questions.

Option Explicit

Sub Import_Weather_Data()

Dim ws As Worksheet
Dim qt As QueryTable
Dim URL As String

URL = "https://flightaware.com/resources/airport/DNYO/weather"

    
Set qt = wsWeather_Data.QueryTables("YolaWeather")

With qt
    .Connection = "URL;" & URL
    .RefreshOnFileOpen = True
    .Name = "YolaWeather"
    .WebFormatting = xlWebFormattingRTF
    .WebSelectionType = xlAllTables
    '.WebTables = "2"
    
    .Refresh
End With

End Sub

 

13 Apr 22 at 07:31

Hello!

The error message is not based on the browser you use personally but by the version of web browser used by Excel to create query table connections which, I believe, is based on IE8.

QueryTables have been superseded by WorkbookQuery objects in Excel using the Get Data tools to create Power Query queries. You can see some background information here https://docs.microsoft.com/en-us/office/vba/api/excel.queries.add

Here's some basic code to replicate what you're trying to do using a WorkbookQuery

Sub CreateQuery()

    Dim qry As WorkbookQuery
    Dim QueryText As String
    
    QueryText = _
        "let Source = Web.Page(Web.Contents(""https://uk.flightaware.com/resources/airport/DNYO/weather"")), " _
        & "Data1 = Source{1}[Data] in Data1"
    
    
    Set qry = ThisWorkbook.Queries.Add(Name:="MyQuery", Formula:=QueryText)
    
    With Sheet1.ListObjects.Add( _
        SourceType:=0, _
        Source:="OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=" & qry.Name, _
        Destination:=Range("$A$1")).QueryTable
            .CommandType = xlCmdDefault
            .CommandText = "SELECT * FROM [" & qry.Name & "]"
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = True
            .RefreshOnFileOpen = False
            .BackgroundQuery = True
            .RefreshStyle = xlInsertDeleteCells
            .SavePassword = False
            .SaveData = True
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .PreserveColumnInfo = False
            .Refresh BackgroundQuery:=False
    End With
    
End Sub

I hope it helps!

10 May 22 at 22:29

Hello Andrew,

 

Your comment about copying data from the query from each refresh is perfect thank you. I am working on setting that code up. I have been working on the code today but I got a little stuck. I have tried to comment my process on this and make it as simple as possible. I have three major areas I am struggling with:

1 - Can I use the "lastRow" dimiension in a Range("__,__") when specifiying the range of data to copy

2 - Can I use the "lastRow" dimension in a Range("__,__") when specifying the range of data to remove duplicates from and can I refresh that dimension before removing those duplicates

3 - Can I use the "lastRow" dimension in a Range("__,__") when specifiying the range of data to sort? 

4 - I am also not confident I am using the ".RemovingDuplicates" and ".Sort" functions properly.

 

Some extra info to help explain the code below:

 - I am using a userform to require users to run this code upon opening the spreadsheet

 - I have two worksheets "Sheet1" and "Query"

    - The "Query" worksheet houses the query which is contained in columns A:H

    - Next to the cells containing the actual query have a neighboring table which helps to filter, organize, and prepare the needed data, it is contained in cells A2:W17

- Data copied from the "Query" is pasted in the first avaliable row in "Sheet1"

- All data contains headers in row A and has filters

-I have not altered the original code you sent me other than minor location or website changes

 

Thank you again for your help. I look forward to hearing from you!

 

Code:

Private Sub CommandButton1_Click()


Dim lastRow As Long
Dim lastRowQuery As Long

'Move to Query worksheet
Sheets("Query").Select
    'Refresh Weather Data Query
    Range("A1").Select
    Selection.ListObject.QueryTable.Refresh BackgroundQuery:=False

'Move to Sheet1 worksheet
Sheets("Sheet1").Select
    'Identify last open row in Sheet1
    lastRow = Cells(Rows.Count, "A").End(xlUp).Row + 1 'Finds last used row in column A

'Move to Query worksheet
Sheets("Query").Select
    'Identify last open row in Query
    lastRowQuery = Cells(Rows.Count, "V").End(xlUp).Row 'Finds last used row in column v

'Copy Query data to Sheet1
Range("Q2:V,lastRowQuery").Copy Destination:=Sheets(Sheet1).Cells(lastRow, "A")
    Range("A2:A,lastRowQuery").RemoveDuplicates Columns:=Array(1,lastRow), Header:x1Yes
    Range("A2:A,lastRowQuery").Sort key1:=Range("A2:A,lastRowQuery"), order1:=xlDescending, Header:=xlYes


End Sub

 

Andrew G  
11 May 22 at 06:55

Hi!

You might find this video useful for finding the last used row in a worksheet, but the technique you're using is fine.

To make use of the last row number in the way you're attempting to use it, you'll need to concatenate the row number to the end of the cell reference. So, instead of this:

Range("A2:A,lastRowQuery")

Do this:

Range("A2:A" & lastRowQuery)

I hope it helps!

14 Apr 22 at 21:01

Hi Andrew!

 

Thank you for your help in trying to figure this out. I now understand why my previous code wasn't working and what you sent me is very nice! I am new to VBA and am trying to teach myself so forgive me for the stupid question here, but I am trying to import the data to an excel spreadsheet daily, and only the headers and data, no formatting or filters.

Esentially I need to import the Recent Yola METAR History, whenever the document is open, but only if there is new data. I would to keep it all in the same columns and run from newest to oldest. Esentially inserting new rows in each column for lines of data, and reorganizing from newest to oldest, all when the workbook is opened. I do know hwo to refersh on open upsing "RefreshOnFileOpen " and I do know how to offset data, so I could count the number of rows with each import and move the current data down that much with each import, but I am not sure how best to do that. Any help or advice?

Andrew G  
23 Apr 22 at 07:48

Hi! I'm not sure that I fully understand what you're trying to do so apologies if I've made the wrong assumption here. Can you not simply sort the table after refreshing the query? I suppose that you'd lose any historical data if the website table only contains recent values.

Alternatively, a simple approach would be to have one worksheet containing the "live" query and a separate worksheet to hold the static, historical data. The process would be something like this:

1) Refresh the query to return the recent live data.
2) Copy all the live data from the query table and paste it at the bottom of the historical data table.
3) Apply the Remove Duplicates feature to the historical data table.
4) Sort the historical data table.

I hope it helps!

15 Feb 22 at 11:19

Hello, I followed the video step by step but I can't find the way to get to click on this "Switch To Old" on

the site https://tipsterarea.com/

Can you help me? Thank you

15 Feb 22 at 13:07

Hi Enzo!

You might find this video on referring to web elements useful https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/find-elements-selenium/

I hope it helps!

15 Feb 22 at 15:18

Hi I tried with this .....

Sub esempio_due()
    Dim driver As New ChromeDriver
    Dim tabella As Object, r As Long, c As Integer
    With driver
         .Start "Chrome", ""
        .Get "https://tipsterarea.com/"
        .FindElementByName("search").SendKeys "Switch To Old Site"
        .FindElementByName("switch").Click
        Set tabella = driver.FindElementByTag("table")
        For Each th In tabella.FindElementsByTag("tr")
            r = r + 1: c = 1
            For Each td In th.FindElementsByTag("td")
                Cells(r, c).Value = td.text
                c = c + 1
            Next
        Next
    End With
End Sub
 

but it does not work Can you help me Thank you

Andrew G  
18 Feb 22 at 09:39

You're welcome * 5 :D

18 Feb 22 at 09:21

Thank you

thank you

thank you

thank you

thank you

Andrew G  
18 Feb 22 at 08:58

Hi Enzo!

You could try running Chrome using the user profile you used when you installed the ad blocker https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/chrome-user-profile/

I hope it helps!

17 Feb 22 at 11:03

Hi, I installed an extension on chrome to block ads and it works if I open chrome normally, but when I start the macro there is no extension and it tells me that the browser is managed by a third party. Anyway thanks for your help.

Andrew G  
16 Feb 22 at 16:23

Hi, thanks for the routine but, as you can see,
something is wrong It finds a popup page and it fails.
Can you solve? Thanks again for your availability

Hi Enzo! That's strange, I don't see a popup, it just goes straight to the old version of the page for me.

You could try inspecting the popup and find the element you need to click on to close it. Alternatively you can try finding a popup blocker to prevent them appearing.

I hope you find a solution!

16 Feb 22 at 11:56

Hi, thanks for the routine but, as you can see,

 

something is wrong It finds a popup page and it fails.

Can you solve? Thanks again for your availability

Andrew G  
16 Feb 22 at 08:36

Hi Enzo!

Here's the basic code to click the switch button

Private cd As Selenium.ChromeDriver

Sub Click_Button()

    Set cd = New Selenium.ChromeDriver
    
    cd.Start
    cd.Get "https://tipsterarea.com"
    
    cd.FindElementByCss("#switch").Click
    
End Sub

I hope it helps!

27 Apr 20 at 22:34

Hi. I am trying to connect this portal. first I can not connect withh Data>Get and Transform>from web

secondly how can tell vba if internet connection lost or for any reason couldnt get data from link, ignore any error and ask me do you want try again

SourceDataFile > http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0

Connection >   OLEDB;Provider=Microsoft.ACE.OLEDB.12.0;User ID=Admin;Data Source=C:\Users\Nima\AppData\Local\Microsoft\Windows\INetCache\Content.MSO\518F1202.xlsx;Mode=Share Deny Write;Extended Properties="HDR=YES;";Jet OLEDB:System database="";Jet OLEDB:Registry Path="";Jet OLEDB:Engine Type=37;Jet OLEDB:Database Locking Mode=0;Jet OLEDB:Global Partial Bulk Ops=2;Jet OLEDB:Global Bulk Transactions=1;Jet OLEDB:New Database Password="";Jet OLEDB:Create System Database=False;Jet OLEDB:Encrypt Database=False;Jet OLEDB:Don't Copy Locale on Compact=False;Jet OLEDB:Compact Without Replica Repair=False;Jet OLEDB:SFP=False;Jet OLEDB:Support Complex Data=False;Jet OLEDB:Bypass UserInfo Validation=False;Jet OLEDB:Limited DB Caching=False;Jet OLEDB:Bypass ChoiceField Validation=False

Connection name > MarketWatchPlus

thanks

28 Apr 20 at 07:25

Hi there, you might find this set of videos on error-handling useful https://www.wiseowl.co.uk/online-training/excel-vba/error-handling/

I'm afraid I can't offer any insight into why you can't connect to the site you've mentioned. It's not one I'm familiar with, sorry!

20 May 19 at 16:51

Andrew,

Quick question here:

Assuming I have some data (10 rows) on Sheet1 via a query, if I execute this:

Sheet1.Range("A1").ListObject.QueryTable.Refresh BackgroundQuery:=False

will it clear the the entire contents of Sheet1 first before populating it?

My concern is if the webpage which the query is fetching from has been updated / changed and now only contains fewer than 10 rows (say only 3 rows), I don't want the query to simply grab those 3 rows and paste onto Range("A1") (and leaving the old rows 4 to 10 data still on the sheet).

Thanks

 

21 May 19 at 21:29

Hi Duggie,

From memory (it's been a while) the query table will grow and shrink according to the number of rows returned when it's refreshed so it won't leave historical data sitting in the worksheet if the query returns fewer rows the next time it's refreshed.

I hope that helps!

22 May 19 at 23:21

Thanks Andrew.

What I have noticed is the columns DO NOT alter, ie if the query only contains 10 columns and I manually add an eleventh, after clicking refresh, the eleventh column remains.

 

 

08 May 19 at 20:58

Andrew,

Do you know if it's possible to delete date from a worksheet but still preserving the query behind it?

 

09 May 19 at 08:10

Hi duggie,

Yes, if you create a query table and then subsequently delete the cells the connection will still exist.  You can test that pretty easily with the code in the sample workbook:

Sub ImportXratesThenDeleteTable()

    Dim ws As Worksheet
    Dim qt As QueryTable
    Dim URL As String
    
    URL = "http://www.x-rates.com/table/?from=USD&amount=1"
    
    Set ws = Worksheets.Add
    
    Debug.Print "Count of query tables at start " & ws.QueryTables.Count
    
    Set qt = ws.QueryTables.Add( _
        Connection:="URL;" & URL, _
        Destination:=Range("A5"))
        
    With qt
        .RefreshOnFileOpen = True
        .Name = "XRates"
        .WebFormatting = xlWebFormattingRTF
        .WebSelectionType = xlSpecifiedTables
        .WebTables = "1"
    
        .Refresh
    End With
    
    Debug.Print "Count of query tables after creating one " & ws.QueryTables.Count
    
    Range("A5").CurrentRegion.Delete
    
    Debug.Print "Count of query tables after deleting cells " & ws.QueryTables.Count
    
End Sub

Output:

Count of query tables at start 0
Count of query tables after creating one 1
Count of query tables after deleting cells 1

I hope that helps!

09 May 19 at 11:01

Andrew,

I think I stumbled across a solution myself instead!

Instead of

    Range("A5").CurrentRegion.Delete

I changed it to 

     Range("A5").CurrentRegion.Clear

and after the worksheet was wiped, I could stil right-click and click refresh to retrieve the table.

Thanks again for your help.

 

 

09 May 19 at 10:54

Andrew,

Thanks for your prompt response.

I tried your code and it worked - sort of!

I put a break point in this line:

Range("A5").CurrentRegion.Delete

and could see in the immediate window that the count of query tables is 1.

After executing the Delete line, the count of query tables is still 1.

By now the worksheet contains no data. However, right-clicking on cell A5, I do not see the option to refresh the query.

I tested this in Excel 2010.

 

 

Andrew G  
10 May 19 at 06:52

Oh, but that's what you asked for!  The cells are deleted but the query remains in the workbook.  You didn't mention that you wanted to retain the ability to right-click the cells to refresh the table!

Anyway, glad that you found a solution that worked for you!

02 May 19 at 14:12

Hi,

If the table on a webpage contains 100 records, yet only the first 50 is shown on the first webpage and the remaining on the second webpage, how can I adapt the code so it brings back the full 100 records onto my worksheet?

Thanks

 

03 May 19 at 07:23

Hi duggie,

It's difficult to answer that one without seeing the pages you mention but can you just import each table separately and place one below the other in a worksheet?  It sounds like the simplest approach to what you've described!

04 Apr 19 at 16:46

I have followed your code and produced this, which works perfectly:

 

Sub ImportWebPage()

    Dim qt As QueryTable
    Dim URL As String
   
    URL = "SomeWebPage"
   
    Set qt = Sheet1.QueryTables.Add(Connection:="URL;" & URL, Destination:=Sheet1.Range("A1"))
   
    With qt
   
        .RefreshOnFileOpen = True
   
        .Name = "SomeName"
       
        .WebFormatting = xlWebFormattingRTF
       
        .WebSelectionType = xlSpecifiedTables
       
        .WebTables = "1"
       
        .Refresh
   
    End With
   
End Sub

 

The problem is that I have additional code running afterwards but the code needs to run AFTER the code above has refreshed.

 

Adding Wait or DoEvents hasn't helped.

How can I ensure the web page has completely refreshed before I continue with the next part of the code?

Thanks

05 Apr 19 at 07:03

You could adapt the code shown here https://docs.microsoft.com/en-us/office/vba/api/excel.querytable.refreshing to wait until the refresh has finished.  Perhaps a simple Do Loop immediately after the .Refresh statement:

Do While .Refreshing

Loop

Hope it helps!

05 Apr 19 at 14:11

Thanks.

I found adding this line seemed to work around 90% of the time:

.BackgroundQuery = False

though I'm not sure if it was a fluke!

Andrew G  
08 Apr 19 at 08:15

Hopefully 90% is good enough!

07 Sep 18 at 10:21

Hi,

What can we do If the URL does not include the specific structure to concatenate the parameters and select specific values from a table? For example the URL for Google Finance bellow:

https://www.google.com/search?stick=H4sIAAAAAAAAA...q=finance&tbm=fin#wptab=s:H4sIA...

Thanks in advance for the assistance.

11 Sep 17 at 23:42

I am curious as to getting to a web page that requires a log in, in my case I am attempting to retrieve data from my online brokerage account. Primarily my watch lists and maybe a couple of other tables that I can reference in the Excel workbook.

After watching several of your tutorials I have a basic understanding of VBA (and growing). HTML is completely beyond my abilities, therefore, I am attempting to watch the videos again an adapt them to my specific instance.

While this isn't really required, it is an attempt to keep learning Excel and VBA.

Thank you for all of the great lessons.

13 Sep 17 at 07:57

Hi, I haven't done this myself but here's an example of someone attempting the same thing using QueryTables and VBA which you might find useful:

https://stackoverflow.com/questions/23657843/excel-vba-passing-username-password-to-web-query-connection

If it's not important to use VBA then I'd recommend using Excel's Power Query tool instead (it's called Get & Transform in Excel 2016). This will almost certainly be the easiest way to achieve what you're attempting! 

https://support.office.com/en-gb/article/Connect-to-a-web-page-Power-Query-b2725d67-c9e8-43e6-a590-c0a175bd64d8

I hope that helps!

14 Sep 17 at 10:28

Thanks for the links, I will look them over.