Node.js - How to select nested HTML elements in Puppeteer using $$eval

I'm using Puppeteer to scrape content from a website. Each list item has this HTML markup structure:

<li>
  <div class="kode_ticket_text">
    <h6>Tennis</h6>
    <div class="ticket_title">
      <h2>ATP</h2>
      <span>VS</span>
      <h2>Monte Carlo (Monaco), terra battuta</h2>
    </div>
    <p>11:00 AM</p>
  </div>
  <div class="ticket_btn">
    <a href="https://example.com/event-live">Guarda Gratis</a>
  </div>
</li>

I need to get the link and all the other information: the name of the event, the streaming start time, and the type. Since the page contains a list of <li> elements, I've decided to use Puppeteer's page.$$eval() function, but I'm not sure how to reach everything I need. There are many nested HTML elements, and I don't think I can access them once I've selected all the <li> elements. Is that right?

This is the Node.js code I'm using at the moment:

const puppeteer = require('puppeteer')

puppeteer.launch({
    headless: false
}).then(async (browser) => {

    const page = await browser.newPage()

    await page.goto('https://example.com/')

    //await page.setViewport({ width: 1280, height: 607 })

    await page.waitForSelector('.form-content > .form-items > .form-button > a > .ibtn')

    // Start waiting for the navigation together with the click. A waitForNavigation()
    // promise created before page.goto() would resolve on the goto navigation instead
    await Promise.all([
        page.waitForNavigation(),
        page.click('.form-content > .form-items > .form-button > a > .ibtn')
    ])

    // await page.waitForSelector('.container > .row > .results-item > .kode_ticket_wraper > .container')
    // await page.click('.container > .row > .results-item > .kode_ticket_wraper > .container')

    // await page.waitForSelector('.container > .kode_ticket_list > li:nth-child(1) > .ticket_btn > a')
    // await page.click('.container > .kode_ticket_list > li:nth-child(1) > .ticket_btn > a')

    await page.waitForSelector('ul.kode_ticket_list > li')
    await page.$$eval('ul.kode_ticket_list > li', (items) => {
      // items is the array of all matched <li> elements;
      // here I want to extract all the information needed from each one
    })

    await browser.close()

})
7 months ago · Santiago Trujillo
1 answer

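page.$$eval() runs document.querySelectorAll() with your selector and passes the resulting array of elements to its pageFunction, so you can iterate over that array and dive deeper into the DOM of each <li> element. Build the object you want for each element, access the nested elements with querySelector(), and return plain values (strings, numbers, objects, arrays): Puppeteer serializes whatever pageFunction returns and sends it back to Node.js, so DOM elements themselves cannot be returned.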

For example:

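// Runs inside the async callback from the question, after page.waitForSelector()
const data = await page.$$eval('ul.kode_ticket_list > li', listElems =>
  // listElems is the array of matched <li> elements; map each one to a plain object
  listElems.map(li => {
    return {
      link: li.querySelector('.ticket_btn > a').href,
      title: li.querySelector('.kode_ticket_text > h6').innerText,
      time: li.querySelector('.kode_ticket_text > p').innerText
    }
  })
)
console.log(data)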

Output:

[
  {
    link: 'https://example.com/event-live-1',
    title: 'Tennis 1',
    time: '11:00 AM'
  },
  {
    link: 'https://example.com/event-live-2',
    title: 'Tennis 2',
    time: '9:00 AM'
  },
  {
    link: 'https://example.com/event-live-3',
    title: 'Tennis 3',
    time: '10:00 AM'
  }
]
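The code above throws a TypeError if any <li> lacks one of the nested elements, and its title field actually holds the <h6> text (the type, e.g. Tennis), so the event name inside .ticket_title is never extracted. Below is a minimal sketch of a more defensive page function, assuming the markup shown in the question. It returns null for missing elements and collects the texts of the <h2> elements inside .ticket_title as an array, since the markup alone doesn't say which of them is the event name. The function name extractEvents and the field names are hypothetical. Because Puppeteer serializes the page function and runs it in the browser, it must not reference variables from the surrounding Node.js scope.

```javascript
// Hypothetical page function for page.$$eval(). Puppeteer calls it inside the
// browser with an array of the <li> elements matched by the selector, so it must
// be self-contained: no references to variables from the Node.js scope.
function extractEvents(listElems) {
  // Trimmed innerText of the first element matching `selector`, or null if none.
  const textOf = (root, selector) => {
    const el = root.querySelector(selector);
    return el ? el.innerText.trim() : null;
  };

  return listElems.map((li) => {
    const anchor = li.querySelector('.ticket_btn > a');
    return {
      // The <h6> holds the type, e.g. "Tennis"
      type: textOf(li, '.kode_ticket_text > h6'),
      // Texts of the <h2> elements on either side of the "VS" span
      names: Array.from(li.querySelectorAll('.ticket_title > h2'), (h2) =>
        h2.innerText.trim()
      ),
      time: textOf(li, '.kode_ticket_text > p'),
      // .href is a plain string, so it serializes back to Node.js cleanly
      link: anchor ? anchor.href : null,
    };
  });
}
```

Pass the function by reference: const events = await page.$$eval('ul.kode_ticket_list > li', extractEvents). For the <li> in the question, the entry should look like { type: 'Tennis', names: ['ATP', 'Monte Carlo (Monaco), terra battuta'], time: '11:00 AM', link: 'https://example.com/event-live' }.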