Scraping table with puppeteer returns wrong results

I'm trying to scrape (this) product page, specifically the modal that shows up when you click "View all bids".

The HTML structure is just a simple table, and I'm trying to get every "Size" cell. The problem is that whenever I run my code, it opens the modal but returns only a few seemingly random shoe sizes, and they are out of order.

Example:

shoeSizeBids: [
      '14', '11.5', '10.5',
      '11', '8.5',  '11',
      '9',  '9',    '7',
      '13'
    ]

My code:

const bidsChartSel =
      '#market-summary > div.ask.ask-button-b > div.sale-size > div:nth-child(2)';
    await Promise.all([page.click(bidsChartSel)]);

    // Get all the shoe size bids 
    const shoeSizeBids= await page.evaluate(() =>
      Array.from(
        document.querySelectorAll('tbody > tr > td:nth-child(1)'),
        (element) => element.textContent
      )
    );
5 months ago · Juan Pablo Isaza
2 answers

The order comes from the page itself: the sizes are returned in the order in which they are rendered. To get them sorted properly, you'd need to:

  1. (optionally) get rid of duplicate sizes
  2. convert the array of strings to an array of floating point numbers
  3. sort the array

That can be achieved with the following:

const uniqueSortedSizes = Array.from(new Set(shoeSizeBids))
  .map(s => parseFloat(s)) // parseFloat takes a single argument (no radix)
  .sort((a, b) => a - b);  // numeric ascending order
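
As a quick check, the three steps can be applied to the sample output from the question. This is only a sketch: the function name sortUniqueSizes is made up for illustration, and it assumes every scraped string parses as a number.

```javascript
// Hypothetical helper (not part of Puppeteer): dedupe, convert, sort.
function sortUniqueSizes(sizes) {
  return Array.from(new Set(sizes)) // 1. drop duplicate size strings
    .map((s) => parseFloat(s))      // 2. convert strings to numbers
    .sort((a, b) => a - b);         // 3. sort numerically, ascending
}

// Sample data copied from the question.
const sample = ['14', '11.5', '10.5', '11', '8.5', '11', '9', '9', '7', '13'];

console.log(sortUniqueSizes(sample));
// → [ 7, 8.5, 9, 10.5, 11, 11.5, 13, 14 ]
```

Sorting after the conversion matters: the default sort() compares strings, so '10.5' would come before '7'.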
5 months ago · Juan Pablo Isaza

You are matching multiple HTML tables with the current selector (tbody > tr > td:nth-child(1)). The table inside the modal uses:

.activity-table > tbody > tr > td:nth-child(1)

You can also use page.$$eval, which is Puppeteer's shorthand for running Array.from(document.querySelectorAll(selector)) in the page and passing the result to your callback:

const shoeSizeBids = await page.$$eval('.activity-table > tbody > tr > td:nth-child(1)', elems => elems.map(el => el.innerText))
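
Putting the pieces together, a minimal sketch might look like the following. The function name getShoeSizeBids is hypothetical, and the waitForSelector call rests on an assumption that the modal's rows are rendered shortly after the click. That was not verified against the page in question.

```javascript
// Selector for the first column of the table inside the modal (from above).
const SIZE_CELL_SEL = '.activity-table > tbody > tr > td:nth-child(1)';

// Hypothetical helper: `page` is a Puppeteer Page, `openSel` is the selector
// of the element that opens the "View all bids" modal.
async function getShoeSizeBids(page, openSel) {
  await page.click(openSel);
  // Assumption: rows appear asynchronously, so wait until one is present.
  await page.waitForSelector(SIZE_CELL_SEL);
  // Runs in the browser context; returns the trimmed text of each cell.
  return page.$$eval(SIZE_CELL_SEL, (cells) =>
    cells.map((cell) => cell.textContent.trim())
  );
}
```

It would be called as `const shoeSizeBids = await getShoeSizeBids(page, bidsChartSel);` with the selector from the question.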
5 months ago · Juan Pablo Isaza