VBA - scraping websites videos | How do I access elements in the Shadow DOM using Selenium in VBA?

Posted by Andrew Gould on 03 April 2021


Learn how to access elements in a Shadow DOM on a web page. You'll see how to use JavaScript to access the shadow root element and then search within the Shadow DOM using Selenium's FindElement methods.
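The function below is a minimal sketch of that technique. It finds a shadow host with a normal FindElement call, uses JavaScript to return its shadowRoot, and then searches inside it with FindElementByCss. It assumes Selenium Basic with a reference to the Selenium Type Library. `FindInShadowDom` and the selectors passed to it are hypothetical names used for illustration, not code from the video. Whether the returned shadow root converts cleanly to a WebElement may depend on the browser and driver versions in use.

```vba
' Minimal sketch: find an element inside an open shadow root.
' Assumes Selenium Basic, with a reference to the Selenium Type Library.
' FindInShadowDom is a hypothetical helper name.
Function FindInShadowDom(ch As Selenium.ChromeDriver, _
                         hostCss As String, _
                         innerCss As String) As Selenium.WebElement
    Dim ShadowHost As Selenium.WebElement
    Dim ShadowRoot As Selenium.WebElement

    ' 1) The shadow host lives in the normal DOM, so FindElement can reach it
    Set ShadowHost = ch.FindElementByCss(hostCss)

    ' 2) JavaScript returns the host's shadowRoot property (lower-case "s")
    Set ShadowRoot = ch.ExecuteScript("return arguments[0].shadowRoot", ShadowHost)

    ' 3) Search inside the shadow DOM with the usual FindElement methods
    Set FindInShadowDom = ShadowRoot.FindElementByCss(innerCss)
End Function
```

Example usage, where `ch` is a ChromeDriver showing a page that contains such elements: `Debug.Print FindInShadowDom(ch, "#host", "#inner").Text`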

There are no files which go with this video.

There are no exercises for this video.

Making a video bigger

You can increase the size of your video to make it fill the screen like this:

View full screen

Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown at its bottom right-hand corner.

 

When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.

Improving the quality of a video

To improve the quality of a video, first click on the Settings icon:

Settings icon

Make sure you're playing your video so that the icons shown appear, then click on this gear icon at the bottom right-hand corner.

 

Choose to change the video quality:

Video quality

Click on Quality as shown to bring up the submenu.

 

The higher the number you choose, the better your video quality will be (but the faster the connection you will need):

Connection speed

Don't choose the HD option unless you have a fast enough connection speed to support it!

 

Is your Wise Owl speaking too slowly (or too quickly)?  You can also use the Settings menu above to change your playback speed.

This page has 1 thread
22 Aug 21 at 23:19

Andrew,
I've watched all 57 of the videos in this series several times and have learned a great deal, and I keep coming back to them when I get stuck. However, finding this shadow host has me stumped, and I haven't found an answer here or anywhere else on the net, so I thought I'd go to the source and ask the guru himself.

Here's what I've tried, stepping through the macro with F8 to make sure everything has loaded. I'm using Selenium Basic with a PDF displayed in Google Chrome, which has a shadow DOM. I'm after a table in the Chrome browser window: at the moment, after my macro runs, I have to copy it with Ctrl+C and then run another macro to organize and format the table data as I need it. It's a secure website, and the rest of the macro runs fine, opening 5 tabs in the browser and finding all of the elements I need, but I'd like to automate this last repetitive task so that I don't have to stop, paste the data into Excel with Ctrl+C and Ctrl+V, and run another macro. The HTML (as shown in DevTools) is:

<body>
    <pdf-viewer id="viewer"></pdf-viewer>
        #shadow-root (open)
          <div id="container">
            <div id="sidenav-container">
            <div id="main">
                <embed id="plugin" type="application/x-google-chrome-pdf" ...>    '<- this is what I'm after

Dim ShadowHost As Selenium.WebElement
Dim ShadowRoot As Selenium.WebElement

With ch
    Set ShadowHost = .FindElement(FindBy.XPath("/html/body/pdf-viewer"))
    Set ShadowHost = .FindElement(FindBy.Css("#viewer")) 
End With

Both of the above, tried separately, give:

Run-Time error 7
NoSuchElementError
Element not found for XPath=/html/body/pdf-viewer
or Css=#viewer

'I tried this and didn't get an error:

    Set ShadowHost = .FindElementByXPath("/html/body/*")
    
'BUT the Locals window says ShadowHost tagname = "embed"
'which is the table I'm after!
'but searching with DevTools Find gives:
<pdf-viewer id="viewer"></pdf-viewer> == $0
Then I ran the following:
    Set ShadowRoot = .ExecuteScript(Script:="retutn arguments[0].ShadowRoot", arguments:=ShadowHost"))
'and I get:

Run-time error '17':
JavaScriptError
javascript error: Unexpected identifier

'So I tried:
    Set ShadowRoot = .ExecuteScript(Script:="retutn arguments[0].ShadowRoot", arguments:=.FindElementByXPath("/html/body/*")"))
'and got the same error 17
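As posted, the script in both calls would fail even on an ordinary page. `retutn` is a typo for `return`, and two identifiers in a row with nothing between them is exactly what JavaScript reports as "Unexpected identifier". The DOM property is also spelled `shadowRoot`, with a lower-case `s`, and the VBA lines end with an unbalanced `"))`. The sketch below corrects these. Instead of returning the shadow root itself, it searches inside it with querySelector and returns the matching element. `GetMainDiv` is a hypothetical helper, and `#main` is the div id from the DevTools tree in the question. Even written correctly, this only helps when the shadow host is part of the page's DOM, which the reply below suggests may not be true for Chrome's PDF viewer.

```vba
' Sketch of the corrected call: "return" (not "retutn"), the property
' spelled shadowRoot, and no stray quotes after the argument.
' Assumes ch is a started Selenium.ChromeDriver and ShadowHost has an OPEN
' shadow root. GetMainDiv is a hypothetical helper name.
Function GetMainDiv(ch As Selenium.ChromeDriver, _
                    ShadowHost As Selenium.WebElement) As Selenium.WebElement
    ' Search inside the shadow root in JavaScript and return the matching
    ' element, rather than returning the shadow root itself
    Set GetMainDiv = ch.ExecuteScript( _
        "return arguments[0].shadowRoot.querySelector('#main');", ShadowHost)
End Function
```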

'There is no other shadow root above this element, but I can't select anything except body.
'I've tried js ch.querySelector("#viewer")
'and a bunch more but can't get the ShadowHost selected.

'So I thought, OK, I'll just use Ctrl+A, Ctrl+C and GetFromClipboard, like this:
    Set ShadowHost = .FindElementByTag("body").SendKeys(.keys.Control, "a")
    Set ShadowHost = .FindElementByTag("body").SendKeys(.keys.Control, "c")
    Set dataObj = New MSForms.DataObject
    Debug.Print dataObj.GetFromClipboard
'This gives me a compile error: type mismatch
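Here is a corrected sketch of the clipboard approach. It calls `SendKeys` as a plain statement rather than assigning its result with `Set`. It also reads the text by calling `GetFromClipboard` first and then `GetText`, because `GetFromClipboard` only loads the clipboard and returns no value. It assumes references to the Selenium Type Library and the Microsoft Forms 2.0 Object Library. `CopyPageText` is a hypothetical name, and it has not been checked whether Chrome's PDF viewer responds to these keystrokes.

```vba
' Sketch: select all, copy, then read the clipboard text.
' Needs references to the Selenium Type Library and
' Microsoft Forms 2.0 Object Library (for MSForms.DataObject).
' CopyPageText is a hypothetical helper name.
Function CopyPageText(ch As Selenium.ChromeDriver) As String
    Dim pageBody As Selenium.WebElement
    Dim dataObj As MSForms.DataObject

    Set pageBody = ch.FindElementByTag("body")

    ' Call SendKeys as a statement; there is no need to Set its result
    pageBody.SendKeys ch.Keys.Control, "a"
    pageBody.SendKeys ch.Keys.Control, "c"

    ' GetFromClipboard loads the clipboard contents; GetText returns them
    Set dataObj = New MSForms.DataObject
    dataObj.GetFromClipboard
    CopyPageText = dataObj.GetText
End Function
```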

'So I tried this:
    .FindElementByXPath("/html/body/*").AsTable.ToExcel ShDstMLS_ReCOLORADO.Range("M13")
    
'and got:

Run-Time error "59":
UnexpectedTagNameError
Unrecognized tag name.
Expected=table.tbody
Got=embed
'which is what the Locals window says the tag name of the WebElement is.
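Error 59 appears because `AsTable` works on table markup: the message says it expected `table` or `tbody`, but the element found here is an `embed`. The small guard below is a sketch using the hypothetical helpers `IsTableTag` and `CopyTableIfPresent`. It reports the tag instead of raising the error. `ShDstMLS_ReCOLORADO` is the worksheet code name from the question.

```vba
' Sketch: only call AsTable when the element really is a table.
' IsTableTag and CopyTableIfPresent are hypothetical helper names.
Function IsTableTag(tagName As String) As Boolean
    ' Error 59 says AsTable expects a <table> or <tbody> element
    Select Case LCase$(tagName)
        Case "table", "tbody": IsTableTag = True
        Case Else: IsTableTag = False
    End Select
End Function

Sub CopyTableIfPresent(ch As Selenium.ChromeDriver)
    Dim el As Selenium.WebElement
    Set el = ch.FindElementByXPath("/html/body/*")

    If IsTableTag(el.TagName) Then
        ' ShDstMLS_ReCOLORADO is the worksheet from the question
        el.AsTable.ToExcel ShDstMLS_ReCOLORADO.Range("M13")
    Else
        Debug.Print "Not a table; got <" & el.TagName & ">"
    End If
End Sub
```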

So, in conclusion, I have 4 questions. Why do I get these errors?
7?
17?
Compile error mismatch?
59?
OK, 5 questions: any ideas as to what's going on, and can you help a faithful but frustrated follower?

Thanks for any of the questions you might answer,
geophysguy

PS: There is very little on the net about copying and pasting with Ctrl, using the Action or Actions class (do these 2 classes exist in Selenium Basic?) or SendKeys in Selenium Basic for VBA. There's plenty for JavaScript, C and Python, but there are enough differences to make trying to answer a question on your own very frustrating. I'd love to see a video on the Selenium Basic keyboard commands. I guess this counts as 6 questions.
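On the PS: Selenium Basic does include an `Actions` object, reached through the driver's `Actions` property, with `KeyDown`, `SendKeys`, `KeyUp` and `Perform` methods. A minimal sketch of Ctrl+A followed by Ctrl+C using it is below. `SelectAllAndCopy` is a hypothetical name, and `ch` is assumed to be a started ChromeDriver.

```vba
' Sketch: Ctrl+A then Ctrl+C using Selenium Basic's Actions object.
' Assumes ch is a started Selenium.ChromeDriver; SelectAllAndCopy is a
' hypothetical helper name.
Sub SelectAllAndCopy(ch As Selenium.ChromeDriver)
    With ch.Actions
        .KeyDown ch.Keys.Control   ' hold Ctrl down
        .SendKeys "a"              ' Ctrl+A: select all
        .SendKeys "c"              ' Ctrl+C: copy
        .KeyUp ch.Keys.Control     ' release Ctrl
        .Perform                   ' run the queued key actions
    End With
End Sub
```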

Thanks again.

23 Aug 21 at 15:42

That's a lot of questions!

I believe that working with a PDF is significantly different to working with regular web pages. The pdf-viewer element doesn't appear to be exposed to the DOM (as you've discovered!).

It seems that there are two main ways to do this (neither of which I've attempted):

1) Write a JavaScript method in your VBA code that uses a plugin to copy information from the PDF
https://stackoverflow.com/questions/56648375/how-to-scrape-a-particular-text-from-pdf-using-selenium-with-vba

2) Download the PDF file and use the Acrobat VBA library to manipulate it
https://stackoverflow.com/questions/28835164/copying-data-from-multiple-pdf-files
https://community.adobe.com/t5/acrobat-sdk/accessing-a-pdf-file-through-vba/m-p/9264497
https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/
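For option 2, here is a minimal sketch of reading a downloaded PDF through Acrobat's interapplication (IAC) objects and its JavaScript bridge. It assumes the full Adobe Acrobat product (not the free Reader) is installed and that the PDF has already been saved locally. `PdfText` and the path in the usage line are hypothetical. It returns each page's words in the order Acrobat reports them, so the table's rows and columns would still need to be rebuilt in Excel.

```vba
' Sketch: read the words of a saved PDF through Acrobat's IAC objects.
' Assumes full Adobe Acrobat (not the free Reader) is installed.
' PdfText is a hypothetical helper name.
Function PdfText(pdfPath As String) As String
    Dim pdDoc As Object, jso As Object
    Dim p As Long, w As Long
    Dim result As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then
        Err.Raise vbObjectError + 1, "PdfText", "Could not open " & pdfPath
    End If

    ' The Acrobat JavaScript object exposes the text-extraction methods
    Set jso = pdDoc.GetJSObject
    For p = 0 To jso.numPages - 1                   ' pages are zero-based
        For w = 0 To jso.getPageNumWords(p) - 1
            ' False keeps punctuation such as decimal points and commas
            result = result & jso.getPageNthWord(p, w, False) & " "
        Next w
        result = result & vbCrLf                    ' one line per page
    Next p

    pdDoc.Close
    PdfText = result
End Function
```

Example usage (hypothetical path): `Debug.Print PdfText("C:\Temp\report.pdf")`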

Sorry I can't be any more help than that but I hope it points you in the right direction!