Crawler Policy

PeaCrawler

PeaCrawler is the governed web crawler used by Pea for operator-directed corpus intake, research retrieval, and source archiving.

PeaCrawler is not a general-purpose search indexer and does not crawl continuously for public ranking, advertising, or profiling purposes.

Identity

Crawling Policy

  • PeaCrawler respects robots.txt.
  • PeaCrawler honors crawl-delay when present.
  • PeaCrawler uses bounded page, depth, byte, redirect, and rate limits.
  • PeaCrawler does not bypass login walls, bot challenges, paywalls, or access controls.
  • PeaCrawler records provenance, crawl limits, errors, and source URLs in manifests.
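The rules above can be sketched as a pre-fetch check. This is a minimal illustration built on Python's standard `urllib.robotparser`, not PeaCrawler's actual implementation. The user-agent token `PeaCrawler`, the `CrawlLimits` values, and the helper names `plan_fetch` and `response_within_limits` are all hypothetical. `FetchDecision` stands in for one manifest record.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token; the exact string PeaCrawler sends is an assumption.
USER_AGENT = "PeaCrawler"


@dataclass(frozen=True)
class CrawlLimits:
    """Illustrative bounds only; PeaCrawler's actual limits are not published."""
    max_pages: int = 100
    max_depth: int = 3
    max_bytes: int = 5_000_000
    max_redirects: int = 5
    default_delay_s: float = 1.0  # used when robots.txt sets no usable Crawl-delay


@dataclass(frozen=True)
class FetchDecision:
    """One manifest-style record of why a URL was or was not fetched."""
    url: str
    allowed: bool
    delay_s: float
    reason: str


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetch(parser: RobotFileParser, url: str, depth: int,
               pages_fetched: int, limits: CrawlLimits) -> FetchDecision:
    """Apply page/depth bounds, robots.txt rules, and Crawl-delay before a request."""
    if pages_fetched >= limits.max_pages:
        return FetchDecision(url, False, 0.0, "page limit reached")
    if depth > limits.max_depth:
        return FetchDecision(url, False, 0.0, "depth limit reached")
    if not parser.can_fetch(USER_AGENT, url):
        return FetchDecision(url, False, 0.0, "disallowed by robots.txt")
    delay: Optional[int] = parser.crawl_delay(USER_AGENT)
    delay_s = float(delay) if delay is not None else limits.default_delay_s
    return FetchDecision(url, True, delay_s, "ok")


def response_within_limits(body_bytes: int, redirects: int,
                           limits: CrawlLimits) -> bool:
    """Check the byte and redirect bounds for a received response."""
    return body_bytes <= limits.max_bytes and redirects <= limits.max_redirects


if __name__ == "__main__":
    rp = load_robots("User-agent: PeaCrawler\nDisallow: /private/\nCrawl-delay: 10\n")
    limits = CrawlLimits()
    print(plan_fetch(rp, "https://example.com/private/a", 1, 0, limits).reason)
    # prints "disallowed by robots.txt"
    print(plan_fetch(rp, "https://example.com/docs/", 1, 0, limits).delay_s)
    # prints 10.0
```

The standard-library parser only recognizes whole-number Crawl-delay values; a fractional value is ignored, and this sketch then falls back to `default_delay_s`. In a live crawler, `RobotFileParser.set_url()` and `read()` could fetch the file directly instead of parsing pre-downloaded text.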

Purpose

PeaCrawler is used when an operator asks Pea to retrieve public web material for research, indexing, evidence review, or governed corpus intake.

Development Status

PeaCrawler is currently under active development. During this phase, crawler activity is limited to operator-directed testing, research retrieval, and controlled corpus intake.

Opt Out / Contact

To request a reduced crawl rate, blocking, a correction, or more information, contact jessejr@decentre.io.

Requests are reviewed manually while PeaCrawler is under development.
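Because PeaCrawler states that it respects robots.txt and honors Crawl-delay, a site operator may also be able to express a block or a slower rate in robots.txt. The sketch below assumes PeaCrawler matches a robots.txt group named `PeaCrawler`; this page does not publish the exact user-agent token, so treat that name as an assumption. The helpers `robots_group` and `compliant_view` are hypothetical; the check uses Python's standard `urllib.robotparser` to show how a robots.txt-compliant crawler would read the rules.

```python
from typing import Optional, Sequence, Tuple
from urllib.robotparser import RobotFileParser

# Assumed robots.txt token; the real PeaCrawler user-agent string is not published here.
AGENT = "PeaCrawler"


def robots_group(disallow: Sequence[str], crawl_delay: Optional[int] = None,
                 agent: str = AGENT) -> str:
    """Build one robots.txt group; an empty Disallow list means "allow everything"."""
    lines = [f"User-agent: {agent}"]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")  # whole seconds
    if disallow:
        lines.extend(f"Disallow: {path}" for path in disallow)
    else:
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


def compliant_view(robots_txt: str, url: str,
                   agent: str = AGENT) -> Tuple[bool, Optional[int]]:
    """Return (may_fetch, crawl_delay) as a robots.txt-compliant crawler reads them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    block_all = robots_group(["/"])
    slow_down = robots_group([], crawl_delay=30)
    print(compliant_view(block_all, "https://example.com/page"))  # (False, None)
    print(compliant_view(slow_down, "https://example.com/page"))  # (True, 30)
```

A group that names the crawler explicitly takes precedence over a `User-agent: *` group in this parser, so a targeted rule does not affect other bots.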

Verification

Public IP ranges: PeaCrawler does not currently operate from a fixed, published IP range.

Request-signing public key: not currently published.