PyRank

Web Agents Python Packages

Python packages tagged with the GitHub topic web-agents, sorted by relevance and shown with their stars and monthly downloads.
ServiceNow
agentlab

AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.

3K 579 111
reacher-z
clawbench-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

2K 314 19
OSU-NLP-Group
uground-demo-test

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

1K 312 18
jstibal
langchain-openterms

OpenTerms tools for LangChain agents. Check machine-readable permission rules before agent tools execute.

877 0 0
reacher-z
clawbenchmark

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

776 314 19
reacher-z
clawbench-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

421 314 19
reacher-z
claw-harness

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

395 314 19
reacher-z
nail-clawbench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

377 314 19
reacher-z
openclawbench

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

354 314 19
reacher-z
clawbench-cli

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

320 314 19
ServiceNow
doomarena-taubench

TauBench extensions for DoomArena

232 58 6
ServiceNow
doomarena

A framework to test the security and robustness of AI agents

194 58 6
reacher-z
claw-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

137 314 19
ServiceNow
doomarena-promptceptor

Promptceptor tool

134 58 6
reacher-z
claw-ai

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

129 314 19
reacher-z
task-harness

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

122 314 19
reacher-z
nail-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

113 314 19
reacher-z
everyday-bench

ClawBench: Can AI Agents Complete Everyday Online Tasks? (alias of claw-bench)

110 314 19
reacher-z
claw-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

109 314 19
reacher-z
mcq-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

107 314 19
reacher-z
life-bench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

104 314 19
reacher-z
harness-hub

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

92 314 19
reacher-z
nail-agent

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

92 314 19
reacher-z
nail-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

92 314 19
Data from PyPI, GitHub, ClickHouse, and BigQuery