I want to web scrap a site but the problem with that site is this, content in the site load in 1s but the loader in the navbar kept loading for 30 to 1min so my puppeteer scrapper kept waiting for the loader in the navbar to stop?
Is there any way to run window.stop()
after a certain timeout
const checkBook = async () => {
await page.goto(`https://wattpad.com/story/${bookid}`, {
waitUntil: 'domcontentloaded',
});
const is404 = await page.$('#story-404-wrapper');
if (is404) {
socket.emit('error', {
message: 'Story not found',
});
await browser.close();
return {
error: true,
};
}
storyLastUpdated = await page
.$eval(
'.table-of-contents__last-updated strong',
(ele: any) => ele.textContent,
)
.then((date: string) => getDate(date));
};
You could strip the
waitUntil: 'domcontentloaded',
in favor of a timeout as documented here https://github.com/puppeteer/puppeteer/blob/v14.1.0/docs/api.md#pagegotourl-options
or set the timeout to zero and instead use one of the page.waitFor...
like this
await page.waitForTimeout(30000);
Similar approach to Marcel's answer. The following will do the job:
page.goto(url)
await page.waitForTimeout(1000)
await page.evaluate(() => window.stop())
// your scraper script goes here
await browser.close()
Notes:
page.goto()
is NOT awaited, so you save time compared to waiting until DOMContentLoaded
or Load
events...goto
was not awaited you need to make sure your script can start the work with the DOM. You can either use page.waitForTimeout()
or page.waitForSelector()
.window.stop()
within page.evaluate()
so you can avoid this kind of error: Error: Navigation failed because browser has disconnected!