VBA - scraping websites videos | Excel VBA Part 49 - Downloading Files from Websites

Posted by Andrew Gould on 21 November 2016

PLEASE NOTE - The design of the website used in this video has changed since the video was recorded. This means that the code shown in the video no longer works. The downloadable file contains both the original version of the code and a version which works with the current version of the website.

Excel VBA doesn't have a native method for downloading files from websites, but you can declare an API function that will enable you to do this. This video takes you through the process of declaring the API function and using it in your code, along with a bunch of other useful techniques such as using folder pickers, creating folders with FileSystemObjects and opening a Windows Explorer window using the Shell function.
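As a rough illustration of the approach covered in the video, here's a minimal sketch of declaring the URLDownloadToFile API function from urlmon.dll and combining it with a FileSystemObject and the Shell function. The URL, file name and folder path are placeholders, and the declaration shown assumes a version of Office which supports VBA7 (2010 or later):

```vba
'Declaring the urlmon API function (VBA7 / PtrSafe form)
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, ByVal szURL As String, _
    ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr) As Long

Sub DownloadExampleFile()

    'Placeholder URL and folder - substitute your own
    Const FileURL As String = "https://www.example.com/somefile.xlsx"
    Const SaveFolder As String = "C:\Downloads"

    'Late-bound FileSystemObject, so no reference to the
    'Microsoft Scripting Runtime library is needed
    Dim FSO As Object
    Set FSO = CreateObject("Scripting.FileSystemObject")

    'Create the destination folder if it doesn't already exist
    If Not FSO.FolderExists(SaveFolder) Then FSO.CreateFolder SaveFolder

    'URLDownloadToFile returns 0 when the download succeeds
    If URLDownloadToFile(0, FileURL, SaveFolder & "\somefile.xlsx", 0, 0) = 0 Then
        'Open a Windows Explorer window at the destination folder
        Shell "explorer.exe " & SaveFolder, vbNormalFocus
    Else
        MsgBox "Download failed"
    End If

End Sub
```

The video walks through each of these steps in detail, including choosing the destination folder with a folder picker rather than hard-coding it as this sketch does.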

You can download any files that you need to follow the video here.

You can increase the size of the video:

Full screen mode for YouTube

You can view the video in full screen mode as shown on the left, using the icon at the bottom right of the frame.

You can also increase the quality of the video:

Changing resolution

You can improve the resolution of the video using another icon at the bottom right of the frame. This will use more bandwidth, but will increase the display and sound quality. This icon only becomes visible when you start playing the video.

Finally, if nothing happens when you play the video, check that you're not using IE in compatibility view.

This page has 4 threads
13 Dec 17 at 08:25

Hi, I have one question. Partly by using your super-useful videos, I started trying to extract data from websites into Excel. There is one thing, though, that just keeps failing. Here is an example of the code I've written. The line:

Debug.Print Classnames.Length

keeps returning 0, whatever search type I use: class name, tag name or id. Can you spot any incorrect use of code in here?

Thanks in advance for your answer.

Jonathan

Const HappyEmpURL As String = "https://happyemployees.nle.nl/"

Sub GetDataFromWebsite()

    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim Article1 As MSHTML.IHTMLElement
    Dim Classnames As MSHTML.IHTMLElementCollection

    XMLReq.Open "GET", HappyEmpURL, False
    XMLReq.send

    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & "-" & XMLReq.statusText
        Exit Sub
    End If

    HTMLDoc.Body.innerHTML = XMLReq.responseText
    Set Classnames = HTMLDoc.getElementsByClassName("mx-layoutcontainer-center mx-scrollcontainer-center ")
    Debug.Print Classnames.Length
    For Each Article1 In Classnames
        Article1.getElementsByid ("159b6aef-dbc3-5b23-a735-cf99f8341771-1")
        Debug.Print Article1.getAttribute("href"), Article1.innerText, Article1.className
    Next Article1

End Sub

14 Dec 17 at 09:21

Hi Jonathan,

I think the main problem here is that most of the page content appears to be generated by JavaScript. I believe this means that an XML HTTP request won't contain the elements you're looking for, because the JavaScript won't be executed to create them in the first place. An alternative would be to use Internet Explorer, as described in this video.

So, for example, this code will return a couple of elements from the login form on the page you've linked to:

Sub GetHTMLDocument()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim Article1 As MSHTML.IHTMLElement
    Dim Classnames As MSHTML.IHTMLElementCollection
    
    IE.Visible = True
    IE.navigate "https://happyemployees.nle.nl/"
    
    Do While IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents 'Yield control so Excel stays responsive while the page loads
    Loop
    
    Application.Wait Now + TimeValue("00:00:02")
    
    Set HTMLDoc = IE.Document
    Set Classnames = HTMLDoc.getElementsByClassName("form-control")
    
    Debug.Print Classnames.Length
    
    For Each Article1 In Classnames
        Debug.Print Article1.className, Article1.getAttribute("name")
    Next Article1
    
End Sub

The results I see are:

form-control mx-focus       username
form-control  password

I hope that helps!

19 Jun 17 at 08:57

I'm trying to scrape data from the Racing Post website, but some of their class names are ridiculously long. For instance "ui-table__body
                  js-sortableTable__body
                  js-formFilterBody"

 

taken from https://www.racingpost.com/profile/horse/836785/danish-duke#race-id=676975.

I can't get my scraper to recognise these names.

Can you help?

19 Jun 17 at 11:49

I'd be tempted to use the VBA Replace function before you start scraping, to replace the long class names with shorter, more manageable ones. If you find this doesn't work, you may find that the long class names include some stray carriage-return character codes. If you find a good solution, please do reply to this thread to let me (and everyone else) know.
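For example, one way to tame a long, whitespace-riddled class name is to normalise each element's className before comparing it. This sketch reuses the class name quoted in the question above, assumes the HTML document has already been loaded (via XMLHTTP or Internet Explorer, as in the other examples on this page), and hasn't been tested against the site itself:

```vba
Sub FindLongClassName()

    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim Elem As MSHTML.IHTMLElement
    Dim CleanName As String

    '...assumes HTMLDoc has already been populated at this point...

    For Each Elem In HTMLDoc.getElementsByTagName("tbody")
        'Replace carriage returns and line feeds with spaces,
        'then collapse any runs of spaces down to single spaces
        CleanName = Replace(Replace(Elem.className, vbCr, " "), vbLf, " ")
        Do While InStr(CleanName, "  ") > 0
            CleanName = Replace(CleanName, "  ", " ")
        Loop
        If CleanName = "ui-table__body js-sortableTable__body js-formFilterBody" Then
            Debug.Print "Found the table body"
        End If
    Next Elem

End Sub
```

The same cleaning step works just as well if you'd rather use InStr to test for a single class name within the cleaned string, rather than comparing the whole thing.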

07 Jun 17 at 20:51

Using Excel 2007, and it doesn't recognise 'PtrSafe'. I downloaded the workbook and the "Private Declare PtrSafe Function" shows as all red. However, if I remove the 'PtrSafe', it's OK?

08 Jun 17 at 10:21

You could use conditional compilation:

#If VBA7 Then
    Declare PtrSafe Function ...
#Else
    Declare Function ...
#End If

I'd love to claim the credit for this, but I got it from this MSDN page.
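Applied to the download function used in this video, the conditional compilation block might look something like the sketch below. The URLDownloadToFileA signature is the standard one from urlmon.dll, but do check it against the declaration in your own copy of the code:

```vba
#If VBA7 Then
    'Office 2010 and later understand PtrSafe and LongPtr
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, ByVal szURL As String, _
        ByVal szFileName As String, ByVal dwReserved As Long, _
        ByVal lpfnCB As LongPtr) As Long
#Else
    'Earlier versions, such as Excel 2007, use the plain Declare syntax
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As Long, ByVal szURL As String, _
        ByVal szFileName As String, ByVal dwReserved As Long, _
        ByVal lpfnCB As Long) As Long
#End If
```

Only one branch is compiled, so the 2007 compiler never sees the PtrSafe keyword it doesn't understand.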

25 Nov 16 at 15:12

Kaspersky keeps blocking the file from being downloaded; the message I get says: HEUR: Trojan-Downloader.Script.Generic

Regards,

25 Nov 16 at 22:26

Hi Joni,

Do you see this message for a specific file or for every file that it downloads? If it's the former, I'd bet that it's a false positive from Kaspersky. If you're concerned, however, please don't use the file. You can easily recreate the code simply by following along with the video.

Thanks for bringing it to our attention!

11 Apr 17 at 17:11

I have the same problem.

Kaspersky Internet Security said: The object is infected by HEUR:Trojan-Downloader.Script.Generic.

This happens only for this one file. Other files can be downloaded successfully.

Andrew G  
11 Apr 17 at 21:31

It seems to be a Kaspersky-specific issue - the HEUR prefix indicates that a heuristic analysis has detected code in the file which matches a particular pattern of potentially malicious code. The file contains code which automatically downloads files from URLs and, as you can imagine, out of context that could be construed as malicious. As I wrote the code, I know that it isn't, and can only assume that Kaspersky is returning a false positive. But, as I said in the previous response, if you're at all concerned by it, don't download it. You can easily recreate the code by simply following along with the video.

I hope that helps!