Shared Conversation

Cheerio Web Scraping

Give me a code to scrape webpage using cheerio

Feb 8, 2026 08:54 PM

gemini · gemini-3-flash-preview

To scrape a webpage using Cheerio, you need two things:

Axios (or any HTTP client) to fetch the HTML of the page.
Cheerio to parse the HTML and extract data using jQuery-like syntax.

1. Prerequisites

First, initialize your project and install the necessary packages:

bash

npm init -y
npm install axios cheerio

2. The Scraper Code

Here is a complete example that scrapes a practice site (books.toscrape.com) to get each book's title, price, availability, and link.

javascript

const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeBooks() {
    try {
        // 1. Fetch the HTML from the URL
        const url = 'https://books.toscrape.com/';
        const { data } = await axios.get(url);

        // 2. Load the HTML into cheerio
        const $ = cheerio.load(data);

        // 3. Create an array to store our results
        const books = [];

        // 4. Use selectors to find the data you want
        // In this site, each book is inside an <article class="product_pod">
        $('.product_pod').each((index, element) => {
            const title = $(element).find('h3 a').attr('title');
            const price = $(element).find('.price_color').text();
            const availability = $(element).find('.availability').text().trim();
            const link = $(element).find('h3 a').attr('href');

            books.push({
                title,
                price,
                availability,
                link: url + link
            });
        });

        // 5. Output the results
        console.log(books);
    } catch (error) {
        console.error(`Error fetching the page: ${error.message}`);
    }
}

scrapeBooks();
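The scraper above keeps the price as display text and builds the link with `url + link`, which only holds while the `href` is relative to the URL being scraped. The following is a minimal sketch of a clean-up step using only built-in JavaScript. The helper names `parsePrice`, `resolveLink`, and `normalizeBook` are hypothetical, the sample record uses made-up values, and it assumes the raw `href` is stored in the record instead of the concatenated link.

```javascript
// Hypothetical helpers for cleaning up scraped records; names are illustrative.

// Turn a display price such as "£51.77" into a number, or null if no number is found.
function parsePrice(priceText) {
    const match = String(priceText).match(/\d+(?:\.\d+)?/);
    return match ? Number(match[0]) : null;
}

// Resolve a possibly relative href against the page it was scraped from.
// The built-in URL class handles "../", leading slashes, and absolute URLs.
function resolveLink(href, pageUrl) {
    return new URL(href, pageUrl).href;
}

// Assumes `book.link` holds the raw href taken from the page.
function normalizeBook(book, pageUrl) {
    return {
        title: book.title,
        price: parsePrice(book.price),
        availability: book.availability,
        link: resolveLink(book.link, pageUrl)
    };
}

// Sample record with made-up values, shaped like the scraper's output
const sample = {
    title: 'Example Book',
    price: '£51.77',
    availability: 'In stock',
    link: 'example-book_1/index.html'
};

console.log(normalizeBook(sample, 'https://books.toscrape.com/catalogue/page-2.html'));
```

Resolving against the page URL matters once the scraper moves beyond the home page, because plain string concatenation would append the path to the end of the page's own file name.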

3. Key Functions Explained

axios.get(url): Downloads the raw HTML source code of the page.
cheerio.load(data): Parses the HTML so you can use the $ variable just like jQuery.
$(selector).each(): Loops through every element that matches the CSS selector.
.find(selector): Searches the descendants of the current element for matches.
.text(): Returns the combined text content of the matched elements.
.attr('attributeName'): Gets the value of an attribute (like href, src, or title).

4. Important Limitations

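Static HTML Only: Cheerio does not execute JavaScript, so it only sees the HTML the server sends. It cannot scrape content that is rendered in the browser (as in many React or Vue apps). If the data you expect is missing from the fetched HTML, you might need a headless browser such as Puppeteer or Playwright.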
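One rough way to notice this situation is to measure how much visible text the fetched HTML contains. The sketch below is only a heuristic, not a Cheerio feature: the function names `visibleTextLength` and `looksClientRendered`, the 50-character threshold, and both sample pages are assumptions made for illustration.

```javascript
// Rough heuristic: estimate how much visible text the raw HTML contains
// once scripts, styles, and tags are removed.
function visibleTextLength(html) {
    return html
        .replace(/<script[\s\S]*?<\/script>/gi, ' ')
        .replace(/<style[\s\S]*?<\/style>/gi, ' ')
        .replace(/<[^>]+>/g, ' ')
        .replace(/\s+/g, ' ')
        .trim()
        .length;
}

// The default threshold is arbitrary; tune it for the site being scraped.
function looksClientRendered(html, minChars = 50) {
    return visibleTextLength(html) < minChars;
}

// Made-up sample pages: an empty app shell and a server-rendered listing
const shellPage = '<html><head><title>App</title></head><body><div id="root"></div><script src="/bundle.js"></script></body></html>';
const staticPage = '<html><body><h1>All products</h1><p>Showing 20 of 1000 results for books in every category.</p></body></html>';

console.log(looksClientRendered(shellPage));
console.log(looksClientRendered(staticPage));
```

A page that trips this check is a candidate for a headless browser, though short static pages can trip it too, so treat the result as a hint and inspect the HTML yourself.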

User-Agents: Some websites block requests that do not look like they come from a browser. Adding a "User-Agent" header to your axios request may help:

javascript

const { data } = await axios.get(url, {
    headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
});
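Since any HTTP client can do the fetching, the same header can also be sent with the `fetch` function built into Node 18 and later. This is a sketch under that assumption, not part of the original answer. `buildRequestOptions` and `fetchHtml` are hypothetical helper names, and the User-Agent string is simply copied from the axios example.

```javascript
// Sketch: sending a browser-like User-Agent with Node's built-in fetch
// (a global in Node 18 and later) instead of axios.

// Copied from the axios example; other browser-like values may work as well.
const BROWSER_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';

// Hypothetical helper: builds the options object passed to fetch.
function buildRequestOptions(userAgent = BROWSER_USER_AGENT) {
    return { headers: { 'User-Agent': userAgent } };
}

// Hypothetical helper: resolves to the page HTML, or throws on a non-2xx status.
async function fetchHtml(url) {
    const response = await fetch(url, buildRequestOptions());
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    return response.text();
}
```

The string returned by `fetchHtml(url)` can be passed to `cheerio.load()` in place of the `data` value from axios. Unlike axios, `fetch` does not reject on HTTP error statuses, which is why the sketch checks `response.ok` explicitly.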

5. How to run it

Save the code as scrape.js and run it in your terminal:

bash

node scrape.js

Feb 8, 2026 08:54 PM

Can you make the code look like it was written by a student while preserving accuracy? Remove all comments and use custom, unique variable names.

Feb 8, 2026 08:55 PM

gemini · gemini-3-flash-preview

javascript

const webRequest = require('axios');
const htmlParser = require('cheerio');

async function collectSiteData() {
    try {
        const targetUrl = 'https://books.toscrape.com/';
        const serverResponse = await webRequest.get(targetUrl);
        const reader = htmlParser.load(serverResponse.data);

        const finalResults = [];

        reader('.product_pod').each((idx, box) => {
            const bookTitle = reader(box).find('h3 a').attr('title');
            const bookPrice = reader(box).find('.price_color').text();
            const stockLevel = reader(box).find('.availability').text().trim();
            const pagePath = reader(box).find('h3 a').attr('href');

            const infoObject = {
                label: bookTitle,
                cost: bookPrice,
                inStock: stockLevel,
                source: targetUrl + pagePath
            };

            finalResults.push(infoObject);
        });

        console.log(finalResults);
    } catch (err) {
        console.log('Something went wrong: ' + err);
    }
}

collectSiteData();

Feb 8, 2026 08:55 PM
