Scrapeless
Scrapeless offers flexible and feature-rich data acquisition services with extensive parameter customization and multi-format export support. These capabilities empower LangChain to integrate and leverage external data more effectively. The core functional modules include:
DeepSerp
- Google Search: Enables comprehensive extraction of Google SERP data across all result types.
  - Supports selection of localized Google domains (e.g., `google.com`, `google.ad`) to retrieve region-specific search results.
  - Pagination supported for retrieving results beyond the first page.
  - Supports a search result filtering toggle to control whether to exclude duplicate or similar content.
- Google Trends: Retrieves keyword trend data from Google, including popularity over time, regional interest, and related searches.
  - Supports multi-keyword comparison.
  - Supports multiple data types: `interest_over_time`, `interest_by_region`, `related_queries`, and `related_topics`.
  - Allows filtering by specific Google properties (Web, YouTube, News, Shopping) for source-specific trend analysis.
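As a rough illustration of the Trends data types listed above, the sketch below validates a data type and assembles a query dictionary. The helper name `build_trends_query`, the key names `q` and `data_type`, and the comma-joined form for multi-keyword comparison are all assumptions made for illustration; they are not confirmed by this page.

```python
# Data types named in the Google Trends feature list above.
TRENDS_DATA_TYPES = (
    "interest_over_time",
    "interest_by_region",
    "related_queries",
    "related_topics",
)


def build_trends_query(keywords, data_type="interest_over_time"):
    """Hypothetical helper: the 'q' and 'data_type' key names are assumptions."""
    if data_type not in TRENDS_DATA_TYPES:
        raise ValueError(f"Unsupported data type: {data_type!r}")
    # Accept a single keyword or a list of keywords for comparison.
    if isinstance(keywords, str):
        keywords = [keywords]
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("At least one keyword is required.")
    # Assumption: multiple keywords are joined with commas for comparison.
    return {"q": ",".join(cleaned), "data_type": data_type}
```

Rejecting unknown data types locally surfaces typos before any request is made.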
Universal Scraping
- Designed for modern, JavaScript-heavy websites, allowing dynamic content extraction.
- Global premium proxy support for bypassing geo-restrictions and improving reliability.
Crawler
- Crawl: Recursively crawl a website and its linked pages to extract site-wide content.
  - Supports configurable crawl depth and scoped URL targeting.
- Scrape: Extract content from a single webpage with high precision.
  - Supports "main content only" extraction to exclude ads, footers, and other non-essential elements.
  - Allows batch scraping of multiple standalone URLs.
Overview
Integration details
Class | Package | Serializable | JS support |
---|---|---|---|
ScrapelessDeepSerpGoogleSearchTool | langchain-scrapeless | ✅ | ❌ |
ScrapelessDeepSerpGoogleTrendsTool | langchain-scrapeless | ✅ | ❌ |
Tool features
Native async | Returns artifact | Return data |
---|---|---|
✅ | ❌ | Search Results Based on Tool |
Setup
The integration lives in the `langchain-scrapeless` package.
!pip install langchain-scrapeless
Credentials
You'll need a Scrapeless API key to use this tool. You can set it as an environment variable:
import os
os.environ["SCRAPELESS_API_KEY"] = "your-api-key"
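If you would rather not hard-code the key, one option is to prompt for it only when the variable is missing. This is a minimal sketch; the helper name `ensure_scrapeless_api_key` is hypothetical and not part of the package.

```python
import getpass
import os


def ensure_scrapeless_api_key(prompt=getpass.getpass):
    """Return the API key, prompting only when the variable is unset or empty."""
    key = os.environ.get("SCRAPELESS_API_KEY", "").strip()
    if not key:
        # Prompt without echoing the key to the terminal.
        key = prompt("Enter your Scrapeless API key: ").strip()
        if not key:
            raise ValueError("A Scrapeless API key is required.")
        os.environ["SCRAPELESS_API_KEY"] = key
    return key


# Usage (interactive when the variable is not already set):
# ensure_scrapeless_api_key()
```

Passing the prompt function as a parameter keeps the helper usable in scripts and tests where no terminal is available.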
Instantiation
ScrapelessDeepSerpGoogleSearchTool
Here we show how to instantiate the `ScrapelessDeepSerpGoogleSearchTool`. It acts as a universal information search engine that:
- Retrieves information for arbitrary queries.
- Handles explanatory queries (e.g., "why", "how").
- Supports comparative analysis requests.
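A minimal sketch of instantiation follows. It assumes the package is importable as `langchain_scrapeless`, that the constructor needs no arguments and reads `SCRAPELESS_API_KEY` from the environment, and that the tool follows the standard LangChain `invoke` interface. The `create_search_tool` wrapper and the example query are illustrative only.

```python
def create_search_tool():
    """Build the search tool; the import is deferred until the tool is needed."""
    # Assumes `pip install langchain-scrapeless` has been run and
    # SCRAPELESS_API_KEY is set in the environment.
    from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool

    return ScrapelessDeepSerpGoogleSearchTool()


# Example input using parameters described in the list that follows.
example_query = {
    "q": "site:python.org asyncio",
    "hl": "en",
    "google_domain": "google.com",
}

# Uncomment to run a live search (requires network access and a valid key):
# tool = create_search_tool()
# print(tool.invoke(example_query))
```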
The tool accepts the following parameters:
- `q`: (str) The search query string. Supports advanced Google syntax like `inurl:`, `site:`, `intitle:`, `as_eq`, etc.
- `hl`: (str) Language code for result content, e.g., `en`, `es`, `fr`. Default: `'en'`.
- `gl`: (str) Country code for geo-specific result targeting, e.g., `us`, `uk`, `de`. Default: `'us'`.
- `google_domain`: (str) Which Google domain to use (e.g., `'google.com'`, `'google.co.jp'`). Default: `'google.com'`.
- `start`: (int) Defines the result offset. It skips the given number of results. Used for pagination. Examples:
  - `0` (default): the first page of results
  - `10`: the second page
  - `20`: the third page
- `num`: (int) Defines the maximum number of results to return. Examples:
  - `10` (default): returns 10 results
  - `40`: returns 40 results
  - `100`: returns 100 results
- `ludocid`: (str) Defines the ID (CID) of the Google My Business listing you want to scrape. Also known as Google Place ID.
- `kgmid`: (str) Defines the ID (KGMID) of the Google Knowledge Graph listing you want to scrape. Also known as Google Knowledge Graph ID. Searches with the `kgmid` parameter will return results for the originally encrypted search parameters. For some searches, `kgmid` may override all other parameters except `start` and `num`.
- `ibp`: (str) Responsible for rendering layouts and expansions for some elements. Example: `gwp;0,7` to expand searches with `ludocid` for expanded knowledge graph.
- `cr`: (str) Defines one or multiple countries to limit the search to. Uses format `country{two-letter country code}`, separated by `|`. Example: `countryFR|countryDE` only searches French and German pages.
- `lr`: (str) Defines one or multiple languages to limit the search to. Uses format `lang_{two-letter language code}`, separated by `|`. Example: `lang_fr|lang_de` only searches French and German pages.
- `tbs`: (str) Defines advanced search parameters not possible in the regular query field. Examples include advanced search for patents, dates, news, videos, images, apps, or text contents.
- `safe`: (str) Defines the level of filtering for adult content. Values:
  - `active`: blur explicit content
  - `off`: no filtering
- `nfpr`: (str) Defines exclusion of results from auto-corrected queries when the original query is misspelled. Values:
  - `1`: exclude these results
  - `0` (default): include them
  - Note: This may not prevent Google from returning auto-corrected results if no other results are available.
- `filter`: (str) Defines if `'Similar Results'` and `'Omitted Results'` filters are on or off. Values:
  - `1` (default): enable filters
  - `0`: disable filters
- `tbm`: (str) Defines the type of search to perform. Values:
  - `none`: regular Google Search
  - `isch`: Google Images
  - `lcl`: Google Local
  - `vid`: Google Videos
  - `nws`: Google News
  - `shop`: Google Shopping
  - `pts`: Google Patents
  - `jobs`: Google Jobs
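Several of the parameters above follow simple patterns that are easy to get wrong by hand, such as the `start` offset and the `|`-separated `cr` and `lr` values. The sketch below assembles a query dictionary from friendlier inputs. The helper `build_search_query` is hypothetical, it assumes each page holds `num` results, and it represents a regular search by omitting `tbm`, which is an assumption rather than documented behavior.

```python
from typing import Iterable, Optional

# Search types listed for `tbm` above; a regular search omits the key (assumption).
TBM_VALUES = {"isch", "lcl", "vid", "nws", "shop", "pts", "jobs"}


def build_search_query(
    q: str,
    page: int = 1,
    num: int = 10,
    countries: Optional[Iterable[str]] = None,
    languages: Optional[Iterable[str]] = None,
    tbm: Optional[str] = None,
) -> dict:
    """Hypothetical helper that maps friendly inputs onto the documented parameters."""
    if page < 1 or num < 1:
        raise ValueError("page and num must both be at least 1")
    # Assumption: each page holds `num` results, so page 3 with num=10 starts at 20.
    query = {"q": q, "start": (page - 1) * num, "num": num}
    if countries:
        # e.g. ["fr", "de"] -> "countryFR|countryDE"
        query["cr"] = "|".join(f"country{c.upper()}" for c in countries)
    if languages:
        # e.g. ["fr", "de"] -> "lang_fr|lang_de"
        query["lr"] = "|".join(f"lang_{code.lower()}" for code in languages)
    if tbm is not None:
        if tbm not in TBM_VALUES:
            raise ValueError(f"Unsupported tbm value: {tbm!r}")
        query["tbm"] = tbm
    return query
```

The resulting dictionary could then be passed to the tool, for example `tool.invoke(build_search_query("langchain", page=2, languages=["fr", "de"]))`, assuming the tool accepts these keys as input.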