New: Midscene 1.0 is coming, now in beta

Midscene.js

Vision-Driven UI Automation SDK for All Platforms

10k+ GitHub stars
No. 2 on GitHub Trending
PLATFORMS
Web, iOS, Android, and more
Control browsers and mobile apps with natural language across multiple platforms
Unified API design for seamless cross-platform automation

Web

Integrate with Puppeteer or Playwright, or use Bridge Mode to control desktop browsers.

iOS

Use the JavaScript SDK with WebDriverAgent to control local iOS devices.

Android

Use the JavaScript SDK with adb to control local Android devices.

Any Interface

Visual modeling enables automation on any interface, beyond DOM limitations.
MODELS
AI Models for UI Automation
Support for Doubao Seed, Qwen3-VL, Gemini-2.5-Pro, and UI-TARS
Visual-language models recommended for reliable and cost-effective automation
OpenAI SDK-compatible interface for quick integration with major model providers
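
As a rough illustration of what an OpenAI-compatible interface implies, the sketch below builds one vision request in the OpenAI chat-completions shape, where switching providers only means changing the base URL, API key, and model name. The `ModelConfig` type, the `buildVisionRequest` helper, the example URL, and the model name are all hypothetical placeholders and are not part of Midscene; how Midscene itself reads model settings is described in its own documentation.

```typescript
// Connection details for any provider exposing an OpenAI-style API.
// All values used below are placeholders, not real endpoints or model names.
interface ModelConfig {
  baseUrl: string; // the provider's ".../v1" endpoint
  apiKey: string;
  model: string; // provider-specific model identifier
}

interface VisionRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    messages: Array<{
      role: "user";
      content: Array<
        | { type: "text"; text: string }
        | { type: "image_url"; image_url: { url: string } }
      >;
    }>;
  };
}

// Hypothetical helper: pairs a natural-language instruction with a screenshot
// in the chat-completions request format.
function buildVisionRequest(
  config: ModelConfig,
  instruction: string,
  screenshotBase64: string,
): VisionRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${config.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${config.apiKey}`,
    },
    body: {
      model: config.model,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: instruction },
            {
              type: "image_url",
              image_url: { url: `data:image/png;base64,${screenshotBase64}` },
            },
          ],
        },
      ],
    },
  };
}

const exampleConfig: ModelConfig = {
  baseUrl: "https://example.invalid/v1/",
  apiKey: "sk-placeholder",
  model: "placeholder-vl-model",
};

const request = buildVisionRequest(
  exampleConfig,
  "Locate the login button",
  "iVBORw0KGgo=",
);
```

Because only `exampleConfig` changes between providers, the same request-building code could in principle target any of the models listed above, provided the provider accepts this request format.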

Doubao Seed

Vision model from ByteDance, optimized for visual understanding and UI element recognition with excellent performance.

Qwen3-VL

Alibaba Cloud Qwen vision-language model with high-quality image understanding and UI element recognition at competitive pricing.

Gemini-2.5-Pro

Google's advanced multimodal model with powerful vision capabilities and comprehensive UI automation support.

UI-TARS

Vision-language model specifically designed for UI automation, providing precise element localization and operation capabilities.
DEVELOPER EXPERIENCE
Developer APIs & Tools
Interactive visual reports for understanding test execution
Built-in Playground for debugging and testing
Chrome Extension for in-browser experience

Rich APIs

Enables both smart automation workflows and fine-grained atomic control.

MCP Server

Exposes device operations as an MCP Server for collaboration with various models.

Reports & Playground

Enhances debugging with intuitive visualization and testing environments.

Flexible Integration

Supports multiple script formats, custom models, and extensible features.
View All APIs
aiAction, aiLocate, aiAssert...
Explore the complete API documentation for more automation capabilities.
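
To give a feel for how the three calls named above might combine in a script, here is a minimal sketch. Only the method names `aiAction`, `aiLocate`, and `aiAssert` come from this page; the `UiAgent` interface, the signatures, the return types, and the `RecordingAgent` stand-in are assumptions made for illustration, so check the API documentation for the actual shapes.

```typescript
// Hypothetical minimal agent shape. Method names come from the page;
// parameter and return types here are assumptions, not the documented API.
interface UiAgent {
  aiAction(instruction: string): Promise<void>;
  aiLocate(description: string): Promise<{ x: number; y: number }>;
  aiAssert(assertion: string): Promise<void>;
}

// Stand-in agent that only records calls, so the sketch runs
// without a browser, a device, or a model.
class RecordingAgent implements UiAgent {
  readonly calls: string[] = [];

  async aiAction(instruction: string): Promise<void> {
    this.calls.push(`action: ${instruction}`);
  }

  async aiLocate(description: string): Promise<{ x: number; y: number }> {
    this.calls.push(`locate: ${description}`);
    return { x: 0, y: 0 }; // placeholder coordinates
  }

  async aiAssert(assertion: string): Promise<void> {
    this.calls.push(`assert: ${assertion}`);
  }
}

// A flow written purely in natural language: act, locate, then verify.
async function searchFlow(agent: UiAgent): Promise<void> {
  await agent.aiAction('type "headphones" in the search box and press Enter');
  await agent.aiLocate("the first result card");
  await agent.aiAssert("a list of search results is visible");
}
```

Writing the flow against an interface rather than a concrete class means the same steps could be pointed at a web, iOS, or Android agent, which is the idea behind the unified API described earlier on this page.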