Skyvern vs UiPath for vision-RPA: license, cost, and team-fit decision framework

The pick is decided by one question, and it is not accuracy or build speed. It is whether external users will interact with your Skyvern deployment over a network. If yes and you have modified Skyvern, AGPL-3.0 Section 13 obliges you to offer those users your modified source, and that obligation, not feature parity, is what pushes most SaaS embedders to UiPath or to a paid Skyvern license. If your use is internal-only, the clause never fires and Skyvern self-host wins on cost and on selector maintenance for portals that change layout. Everything else in this comparison is a tie-breaker inside one of those two branches.

The analysis below is built on the full Skyvern AGPL-3.0 LICENSE text and both products' public docs and repos. The license reading is the part teams most often get wrong from memory, so it leads.

Tested companion (May 2026): We built a vision-RPA loop end-to-end on Gemini and published the artifacts in a paired build piece.

What "vision-based RPA" actually means

Traditional RPA records a script against the DOM of a web page or the accessibility tree of a desktop app. Find a button by its CSS selector or UIA element id, click, type, move on. It works right up until the selectors stop being stable. The day a vendor rebuilds their portal in a new component framework, every selector you depend on breaks at once, and the bot fails silently or loudly depending on how you wrote it.

Vision-based RPA swaps "find this selector" for "find what looks like the Submit button in this screenshot, then click it." The bot hands a screenshot to a vision-capable language model. The model returns an action plan: click coordinates, text to type, scroll deltas. Page changes? The model re-grounds against the new screenshot instead of dying on a missing selector. DOM access stays useful as a fallback when the visual cue is ambiguous, and the strongest stacks run both.
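The loop described above can be sketched as a small control flow. This is a minimal illustration, not Skyvern's or any vendor's implementation: `Action`, `plan_step`, the `vision_plan` and `dom_plan` callables, and the 0.6 confidence threshold are all hypothetical names and values chosen for the example, and the two stub planners stand in for a real model call and a real DOM query.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                 # "click", "type", or "scroll"
    x: int = 0
    y: int = 0
    text: str = ""
    confidence: float = 1.0   # planner's self-reported confidence, 0..1
    source: str = "vision"    # which path produced the action


def plan_step(
    screenshot: bytes,
    goal: str,
    vision_plan: Callable[[bytes, str], Action],
    dom_plan: Callable[[str], Optional[Action]],
    min_confidence: float = 0.6,  # hypothetical threshold
) -> Action:
    """Ask the vision planner first; fall back to the DOM when it is unsure."""
    action = vision_plan(screenshot, goal)
    if action.confidence >= min_confidence:
        return action
    fallback = dom_plan(goal)
    if fallback is None:
        # No DOM match either: keep the low-confidence vision action.
        return action
    fallback.source = "dom"
    return fallback


# Stub planners (assumptions for the demo, not real model or browser calls).
def fake_vision(screenshot: bytes, goal: str) -> Action:
    return Action(kind="click", x=412, y=288, confidence=0.35)


def fake_dom(goal: str) -> Optional[Action]:
    return Action(kind="click", x=410, y=290)


step = plan_step(b"<png bytes>", "click the Submit button", fake_vision, fake_dom)
print(step.source, step.kind, (step.x, step.y))
```

Because the stub vision planner reports a confidence below the threshold, the example takes the DOM path. On a redesigned page the same loop simply runs again against the new screenshot, with no stored selector to break.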

That premise is Skyvern's whole architecture. It drives an LLM with vision (OpenAI or Anthropic by default), drops to DOM extraction when vision is uncertain, and exposes a Python SDK and HTTP API (docs.skyvern.com). UiPath came to vision later, as the AI Computer Vision activity set, and its default authoring path is still selector-driven workflows. That ordering (vision-first vs vision-as-fallback) is the cultural fault line under the whole comparison.

AGPL-3.0 Section 13: what it means if you embed Skyvern in a SaaS

This is the clause that decides the pick for most teams shipping a product. Skyvern is licensed under the GNU Affero General Public License v3.0. The clause that matters for SaaS embedding is Section 13 (Remote Network Interaction). We quote the FSF source verbatim below, because paraphrasing this clause from memory is the single most common way teams walk into trouble:

Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.

Source: GNU AGPL-3.0 Section 13

Plain reading: modify Skyvern, run that modified version as a network-accessible service other people interact with, and you owe those users the source of your modifications. The trigger is "users interact with it over a network." It is not "users receive a binary."

Three scenarios cover almost everyone:

Internal self-host, your team only. No problem. Section 13 triggers when external users interact remotely with a modified version; employees using it on your own infrastructure do not set it off.
SaaS product using Skyvern under the hood. The clause triggers. You owe customers the Corresponding Source of your modified version, not your entire SaaS codebase. The unresolved question is where the modified version ends. Patch Skyvern's planner and that patch is plainly in scope. Call an unmodified Skyvern as a separate process over its HTTP API and you have an aggregation argument that your app is not a derivative work. Import Skyvern as a Python library and call into its internals from your service and the line blurs, because AGPL's copyleft follows the derivative-work boundary, which under copyright law turns on linkage and integration, not on whether you edited a file. That boundary is a fact question a court decides, not a setting you configure. Architect the integration as an out-of-process API call on purpose if you want the aggregation argument to hold, and have counsel confirm it before launch.
Skyvern Cloud or separate commercial license. The AGPL obligation is replaced by commercial license terms.
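The three scenarios can be written down as a small decision function. This is a sketch of this article's reading of Section 13, assuming the four yes/no inputs below capture your situation. The `Posture` enum, `section13_posture`, and the `in_process_integration` flag are hypothetical names introduced for illustration, and the output is a prompt for the conversation with counsel, not a substitute for it.

```python
from enum import Enum


class Posture(Enum):
    COMMERCIAL_TERMS = "commercial license terms replace the AGPL obligation"
    NO_TRIGGER = "Section 13 does not trigger (no external network users)"
    OFFER_SOURCE = "offer Corresponding Source of your modified version"
    BLURRED_BOUNDARY = "derivative-work boundary is blurred; ask counsel"
    AGGREGATION_ARGUMENT = "aggregation argument available; have counsel confirm"


def section13_posture(
    commercial_license: bool,
    external_network_users: bool,
    modified_skyvern: bool,
    in_process_integration: bool,
) -> Posture:
    """Map the scenarios in the text to the obligation the article describes."""
    if commercial_license:
        return Posture.COMMERCIAL_TERMS
    if not external_network_users:
        return Posture.NO_TRIGGER
    if modified_skyvern:
        return Posture.OFFER_SOURCE
    if in_process_integration:
        # Importing Skyvern as a library and calling its internals.
        return Posture.BLURRED_BOUNDARY
    # Unmodified Skyvern called as a separate process over its HTTP API.
    return Posture.AGGREGATION_ARGUMENT


internal_team = section13_posture(False, False, True, False)
saas_with_patch = section13_posture(False, True, True, False)
print(internal_team.value)
print(saas_with_patch.value)
```

The order of the checks mirrors the text: a commercial license short-circuits everything, internal-only use never reaches the clause, and only then does the question of modification and integration style matter.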

The AGPL is not a trap. It is a deliberate copyleft posture engineered to route commercial embedders toward paying for the hosted product instead of source-sharing. The cost of getting the derivative-work boundary wrong is retroactive, so fifteen minutes with counsel and a read of the Skyvern LICENSE file buy more than a forum thread does.

Decision matrix: four typical team profiles

This is the section we would actually use to recommend a pick.

Team profile	Pick	Why
Ops team running ten internal workflows against vendor portals that change layouts often	Skyvern (self-host)	Vision-first removes selector maintenance; per-bot pricing model does not match the workload; AGPL is fine for internal use.
Engineering team embedding RPA inside a SaaS product they sell to customers	UiPath (or Skyvern with a commercial license)	AGPL network-use clause kicks in for the SaaS case. Without a commercial Skyvern license, UiPath is the simpler legal posture.
Regulated enterprise (financial services, healthcare) replacing a Selenium estate	UiPath	Audit, role-based access control, and procurement story are not yet matched on Skyvern.
Indie or small team that wants to ship one or two browser agents without a per-bot bill	Skyvern (self-host)	Cost curve scales with task volume rather than bot count; LLM bill is the only variable.

Skyvern Cloud is the hosted version of the same engine. It removes the Section 13 self-host obligation for embedders willing to pay for it, which is precisely the commercial-license escape the AGPL is designed to sell.

Side-by-side rubric

Where a verdict depends on your stack rather than the tool, the dependency is named inline so you can resolve it against your own numbers.

Build speed for a single workflow

There is one crossover variable: how often the target page changes. Against stable selectors a UiPath developer in Studio matches a Skyvern prompt on first-build time, and the recorder-captured selectors cost nothing to maintain because nothing moves. The economics invert the moment the target is a third-party portal you do not control. Selector-driven automation pays a maintenance tax proportional to the target's redesign cadence; vision-first pays a per-run inference tax that is flat against redesigns. A vendor portal shipping a new component framework every quarter turns that maintenance tax into the dominant line item, and that is where Skyvern's vision path stops being a preference and becomes the cheaper system.
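The two taxes can be compared with a few lines of arithmetic. The sketch below is a back-of-envelope model, not measured data: every input (runs per year, hours per selector fix, hourly rate, per-run inference cost) is a hypothetical placeholder to replace with your own figures, and it deliberately ignores license and infrastructure costs so that only the maintenance-versus-inference trade-off is visible.

```python
def selector_annual_cost(redesigns_per_year: float,
                         hours_per_fix: float,
                         hourly_rate_usd: float) -> float:
    """Maintenance tax: scales with how often the target is redesigned."""
    return redesigns_per_year * hours_per_fix * hourly_rate_usd


def vision_annual_cost(runs_per_year: int,
                       inference_usd_per_run: float) -> float:
    """Inference tax: scales with run volume, flat against redesigns."""
    return runs_per_year * inference_usd_per_run


# Hypothetical inputs; substitute your own.
RUNS_PER_YEAR = 12_000
INFERENCE_USD_PER_RUN = 0.30
HOURS_PER_FIX = 16
HOURLY_RATE_USD = 100

vision = vision_annual_cost(RUNS_PER_YEAR, INFERENCE_USD_PER_RUN)
for redesigns in (0, 1, 4):
    selector = selector_annual_cost(redesigns, HOURS_PER_FIX, HOURLY_RATE_USD)
    cheaper = "selector-driven" if selector < vision else "vision-first"
    print(f"{redesigns} redesigns/yr: selector ${selector:,.0f} "
          f"vs vision ${vision:,.0f} -> {cheaper}")

# Redesign cadence at which the two taxes are equal.
break_even_redesigns = vision / (HOURS_PER_FIX * HOURLY_RATE_USD)
print(f"Break-even cadence: {break_even_redesigns:.2f} redesigns/yr")
```

With these placeholder numbers a stable target favors selectors and a quarterly redesign cadence favors vision. The useful output is the break-even cadence, which you can compare against the redesign history of the portals you actually automate.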

Total cost (per-bot vs self-host)

UiPath is per-bot priced for unattended automation, with separate Studio author seats and an Orchestrator runtime (uipath.com/pricing). Public list prices vary by region and contract; teams typically pay in the low four figures per unattended bot per month at modest volume. Skyvern self-host has no per-bot license cost; the variable cost is LLM inference per task, which scales with complexity rather than bot count.

# AI-generated cost-projection sketch.
# Treat the per-token math as a back-of-envelope, not a quote.

monthly_runs = 1_000

# Skyvern self-host: server + LLM inference per run.
infra_per_month_usd = 200             # small VM + Postgres
llm_tokens_per_run = 60_000           # vision + planning, varies by task
llm_usd_per_1k_tokens = 0.005         # blended vision + chat estimate
skyvern_total = infra_per_month_usd + (monthly_runs * llm_tokens_per_run / 1000) * llm_usd_per_1k_tokens

# UiPath: per-bot license, public-list anchor only.
uipath_unattended_bots = 2
uipath_per_bot_usd = 1_500
uipath_total = uipath_unattended_bots * uipath_per_bot_usd

print(f"Skyvern self-host (sketch): ${skyvern_total:,.0f}/mo")
print(f"UiPath two unattended bots (sketch): ${uipath_total:,.0f}/mo")

The shape of the curve decides this, not the point estimate. UiPath's cost is a step function in bot count and flat in run volume: two unattended bots cost the same whether they run 100 or 100,000 jobs a month. Skyvern's cost is near-zero fixed plus a linear per-run inference charge: cheap at low volume, unbounded as runs climb. With the sketch's inputs (~$0.30 per run, ~$200 infra) Skyvern undercuts two UiPath bots until roughly 9,300 runs a month, then crosses. That crossover, not a vendor list price, is the number to compute. A few low-volume workflows favor Skyvern decisively; a single very high-volume workflow can favor UiPath's flat-rate bot. Run the script against your real numbers before either side claims the win.
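The crossover itself is one line of algebra: set the fixed UiPath bill equal to Skyvern's infra plus per-run cost and solve for runs. The sketch below reuses the same back-of-envelope inputs as the cost script above. The function names are illustrative, and it assumes the UiPath bot count stays fixed as volume grows, which stops being true once two bots run out of capacity.

```python
def skyvern_monthly_usd(runs: int,
                        infra_usd: float = 200.0,
                        tokens_per_run: int = 60_000,
                        usd_per_1k_tokens: float = 0.005) -> float:
    """Fixed infra plus a linear per-run inference charge."""
    return infra_usd + runs * (tokens_per_run / 1000) * usd_per_1k_tokens


def uipath_monthly_usd(bots: int = 2, usd_per_bot: float = 1_500.0) -> float:
    """Step function in bot count, flat in run volume."""
    return bots * usd_per_bot


def break_even_runs(infra_usd: float = 200.0,
                    tokens_per_run: int = 60_000,
                    usd_per_1k_tokens: float = 0.005,
                    bots: int = 2,
                    usd_per_bot: float = 1_500.0) -> float:
    """Monthly run count at which the two bills are equal."""
    per_run_usd = (tokens_per_run / 1000) * usd_per_1k_tokens
    return (bots * usd_per_bot - infra_usd) / per_run_usd


crossover = break_even_runs()
print(f"Break-even at about {crossover:,.0f} runs/month")
for runs in (1_000, 9_000, 10_000):
    print(f"{runs:>6} runs: Skyvern ${skyvern_monthly_usd(runs):,.0f} "
          f"vs UiPath ${uipath_monthly_usd():,.0f}")
```

Changing any single input moves the crossover, so it is worth re-running with your own token counts and your negotiated bot price rather than the list-price anchor.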

Vision model and accuracy on dynamic UIs

Skyvern's posture is vision-first with a DOM fallback. The model is pluggable; the documented defaults are a vision-capable OpenAI model with Anthropic as an alternative. Accuracy on a given page therefore depends on the model you pick, not on Skyvern's plumbing. UiPath's AI Computer Vision activity uses UiPath's own hosted vision service (docs.uipath.com) and sits as a fallback inside a selector-driven workflow. For pages with shifting layouts, vision-first is the better fit; for ironclad selectors with no design churn, vision rarely decides.

License posture summary

The Section 13 reading above is the long version. Short version for the rubric: Skyvern is AGPL-3.0 (LICENSE) and UiPath is a closed-source EULA. AGPL is fine for self-hosted internal use, fine for selling Skyvern services if you offer modifications to your network users, and a real problem for embedding Skyvern inside a closed SaaS without a separate commercial license. UiPath's posture has none of that nuance, with the matching trade-off that you are paying enterprise license fees instead.

Audit and governance for regulated teams

UiPath has the deeper audit story today. Orchestrator ships role-based access control, asset vaulting, queue-level audit logs, and an enterprise compliance posture (SOC 2, ISO 27001, regional data residency). Skyvern's self-host gives you full control of the data plane (prompts, screenshots, and traces never leave your infrastructure), but the audit features are what you build on top of the artifact bundle. For a small ops team controlling the deployment, the bundle Skyvern produces per task (final screenshot, HAR, LLM trace per the docs) is enough; for a regulated enterprise replacing a UiPath estate, the audit gap is real and you should price the build cost in.
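To make "what you build on top of the artifact bundle" concrete, here is one possible starting point: a tamper-evident audit record that stores a SHA-256 digest of each artifact and chains each record to the previous one. The hash chain is a technique swapped in for illustration, not something the Skyvern docs describe, and `audit_record`, the task ids, and the artifact file names are hypothetical. Role-based access control, retention, and reviewer workflows would still be separate work.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(task_id: str, artifacts: dict, prev_hash: str = "") -> dict:
    """Build one audit entry: per-artifact digests plus a link to the prior entry."""
    digests = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in sorted(artifacts.items())
    }
    record = {
        "task_id": task_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "artifacts": digests,
        "prev_hash": prev_hash,
    }
    # Hash the canonical JSON form so any later edit changes the record hash.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_hash"] = hashlib.sha256(canonical).hexdigest()
    return record


# Placeholder bytes standing in for a real per-task bundle.
first = audit_record("task-001", {
    "final_screenshot.png": b"png-bytes",
    "session.har": b"har-bytes",
    "llm_trace.json": b"{}",
})
second = audit_record("task-002", {"final_screenshot.png": b"other"},
                      prev_hash=first["record_hash"])
print(json.dumps(second, indent=2))
```

Chaining `prev_hash` means an auditor can detect a deleted or altered record by recomputing the hashes in order, which is the minimum a reviewer will expect before the log counts as evidence.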

When UiPath still wins

The UiPath wins above are not edge cases. RPA's biggest installed base is in financial services, insurance, and healthcare, and those buyers are paying for the audit story, the partner ecosystem, and the multi-year contract structure as much as for the runtime.

Desktop-app automation is the other clean UiPath win. Skyvern is browser-shaped. If half your workflow is driving a Citrix-served Windows app, UiPath's desktop activity catalog is still the default answer, no contest. A hybrid stack (UiPath for desktop, Skyvern for the browser-native portion) is a pattern real teams ship in production, not a compromise.

How UiPath authoring differs in shape

UiPath's authoring shape is a flowchart, and that shape is what decides the team-fit question. You build the flowchart in Studio (a Windows desktop app), drag in Open Browser, Type Into, and Click activities, point each one at a UI element the recorder captured, and publish the workflow as an unattended job to UiPath Orchestrator (docs.uipath.com, uipath.com/product). Where a stable selector is hard to capture, AI Computer Vision activities sit underneath as a fallback.

The artifact in UiPath is a .xaml flowchart and an Orchestrator deployment, not a natural-language prompt. Skyvern's first user types a prompt and reads a screenshot per task; UiPath's first user holds a flowchart in their head and reasons about per-activity selectors and Orchestrator queues. That difference shows up in hiring, in the time-to-first-working-bot, and in what breaks first when the target page changes. It does not show up in any feature checklist; you only see it once you have the team trying to author a workflow.

How to evaluate this for your team (no sandbox required)

Before you commit to either tool, walk through these questions. The answers usually pick the tool for you.

License posture. Do you ship a SaaS product whose users interact with the RPA layer over a network? If yes, AGPL Section 13 is in scope and either you commit to source-sharing your Skyvern modifications, you license Skyvern Cloud, or you pick UiPath. If your use is internal-only, the clause does not trigger and Skyvern self-host is fine.
Target-page volatility. Pull the last twelve months of your "RPA bot broke" incidents (or your Selenium/Playwright equivalent if you have not deployed RPA yet). What fraction were "the page changed and the selector died"? At more than ~30%, vision-first pays for itself in maintenance time. Below ~10%, selector-driven UiPath is fine.
Volume shape. Run the cost script in the rubric above against your actual numbers: monthly task runs, average LLM token spend per run for the model you would pick, and the UiPath bot count you would need. If the curves cross above your volume, Skyvern wins on cost; if they cross below, UiPath wins.
Team headcount. Skyvern self-host requires people who can run a Docker compose stack, manage Postgres, rotate LLM API keys, and read an LLM trace when a task fails. UiPath requires Studio authors and an Orchestrator admin. A two-person ops team usually has the first set; a fifty-person enterprise RPA practice usually has the second.
Browser-vs-desktop split. What fraction of your target workflows are browser-only versus include desktop apps? Skyvern is browser-shaped. If more than ~20% of your workflow steps live in a Windows desktop app, UiPath (or a hybrid with Skyvern for the browser portion) is the default.
Audit and procurement. Does your security team require SOC 2, ISO 27001, role-based access control with audit logs, and a vendor-of-record for the runtime? UiPath ships those out of the box. Skyvern's artifact bundle is enough for a small ops team's internal audit; replicating the enterprise audit posture on Skyvern is a build project, and you should scope it before you commit.
Exit cost. A Skyvern workflow is a natural-language prompt plus a target URL; rewriting against a different vision-first agent (or against browser-use directly) is hours, not weeks. A UiPath estate is .xaml flowcharts, Orchestrator queues, and per-bot license commitments; migrating off is a real project. Price the lock-in into your decision.
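The quantitative questions in this checklist can be tallied mechanically. The sketch below encodes only the thresholds stated above (~30% and ~10% for selector breakage, ~20% for desktop steps, and the cost crossover) and leaves team headcount and exit cost out because they are judgment calls. `TeamAnswers`, `leanings`, and the field names are hypothetical, and "either" means the question does not constrain the pick.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TeamAnswers:
    external_users_over_network: bool   # SaaS users reach the RPA layer remotely
    will_share_source_or_license: bool  # willing to source-share or pay for a license
    selector_breakage_share: float      # fraction of incidents caused by page changes
    monthly_runs: int
    break_even_runs: float              # from your own cost projection
    desktop_step_share: float           # fraction of steps in desktop apps
    needs_enterprise_audit: bool


def leanings(a: TeamAnswers) -> dict:
    """Return which tool each checklist question leans toward."""
    lean = {}
    blocked = a.external_users_over_network and not a.will_share_source_or_license
    lean["license"] = "uipath" if blocked else "either"

    if a.selector_breakage_share > 0.30:
        lean["volatility"] = "skyvern"
    elif a.selector_breakage_share < 0.10:
        lean["volatility"] = "uipath"
    else:
        lean["volatility"] = "either"

    lean["volume"] = "skyvern" if a.monthly_runs < a.break_even_runs else "uipath"
    lean["desktop"] = "uipath" if a.desktop_step_share > 0.20 else "either"
    lean["audit"] = "uipath" if a.needs_enterprise_audit else "either"
    return lean


# Example: an internal ops team automating volatile vendor portals.
ops_team = TeamAnswers(
    external_users_over_network=False,
    will_share_source_or_license=False,
    selector_breakage_share=0.45,
    monthly_runs=1_000,
    break_even_runs=9_333,
    desktop_step_share=0.0,
    needs_enterprise_audit=False,
)
result = leanings(ops_team)
tally = Counter(result.values())
print(result)
print(dict(tally))
```

A tally is a conversation starter rather than a verdict: a single hard constraint, such as the license question, can outweigh several soft leanings the other way.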

If you answer most of these toward Skyvern, the next step is a self-host pilot on a non-production target page. If you answer most toward UiPath, the next step is a Studio Community Edition trial against the same target. Either way, run the pilot against your own form, not someone else's demo page.

How this fits with the rest of the agent stack

If your workflow is browser-based and the goal is "fill this form, click through this portal," Skyvern is the right shape. If the workflow is "an LLM agent that uses tools and occasionally drives a browser," you are looking at a multi-agent framework with browser-use as a tool, not a vision-RPA platform. Our LangGraph in production writeup covers the state-graph patterns orchestration sits inside, our n8n AI Agent nodes review covers the low-code path for ops teams, and the best ops-automation tools roundup catalogs the broader category.

FAQ

Is Skyvern's accuracy actually competitive with UiPath's selector-based bots? On stable pages with stable selectors, traditional UiPath bots are near-100% accurate; vision-first systems trade a little of that for resilience to layout change. For shifting third-party portals, vision-first is typically more reliable in production because the failure mode is not "the selector broke today."

Can we run Skyvern without an OpenAI or Anthropic key? The documented defaults are OpenAI and Anthropic, but the model layer is configurable, and any vision-capable chat completion endpoint matching the supported provider shape is wirable. Self-hosted vision models are a real option for teams that need the data plane fully on-prem.

How does the AGPL clause affect a regulated enterprise running Skyvern internally? It does not. Internal employee use over your own corporate network is not the network-interaction case AGPL Section 13 targets. The clause triggers when external users interact with a modified Skyvern remotely.

Does UiPath have an answer to vision-first RPA? Yes, the AI Computer Vision activity set (docs.uipath.com). It is positioned as a fallback inside an otherwise selector-driven workflow, not as the default authoring path. The cultural difference between "vision is the default" and "vision is a fallback" is real even when the underlying capability overlaps.

Should we wait for Skyvern to ship enterprise audit features before evaluating? For the regulated-enterprise profile, yes. For the ops-team profile, no. The artifact bundle Skyvern produces (screenshot, HAR, LLM trace per task per the docs) is already enough for a credible internal audit story.

Can we mix UiPath and Skyvern in one estate? Yes, and for a lot of teams it is the right answer. Run UiPath for the desktop-app and Citrix-served portions where its activity catalog and Orchestrator audit story are unmatched, and run Skyvern for the browser-native portion where vision-first removes the selector-maintenance loop. The seam to watch is operational, not legal: two runtimes means two monitoring stories and two on-call paths. Price that in. The AGPL Section 13 reading above still applies to whichever part of the workflow Skyvern touches if that part is network-exposed to external users.