Driving input
The interaction layer of ComposeAutomator sits on top of RobotDriver and dispatches
mouse, keyboard, and clipboard input to whatever surface the target node lives in.
All interaction methods (click, doubleClick, longClick, swipe, scrollWheel,
typeText, clearAndTypeText, pressKey, pressEnter) are suspend functions, so call them
from a coroutine. Real java.awt.Robot work runs inline when the caller is already
off the AWT event dispatch thread, and hops to Dispatchers.IO only when needed to
keep EDT callers from blocking the UI. Internal sleeps use delay rather than
Thread.sleep, so a cancelled coroutine cancels mid-longClick / mid-swipe rather
than parking the worker thread until the hold completes.
screenshot stays synchronous: it is a single framebuffer read, with no blocking I/O to
bury behind a coroutine boundary.
The snippets below are written as if they sit inside a suspend block (e.g. a JUnit
test wrapped in runBlocking { … }).
Mouse: clicks and drags
import kotlin.time.Duration.Companion.milliseconds
val send = automator.findOneByTestTag("Send") ?: error("button missing")
automator.click(send)
automator.doubleClick(send)
automator.longClick(send, holdFor = 600.milliseconds)
All click helpers resolve the node's centerOnScreen and dispatch through RobotDriver,
which compensates for HiDPI/display scaling.
Mouse: swipes and scrolling
import kotlin.time.Duration.Companion.milliseconds
val list = automator.findOneByTestTag("MessageList") ?: error("list missing")
val first = automator.findOneByText("First message") ?: error("first row missing")
val last = automator.findOneByText("Last message") ?: error("last row missing")
// node-to-node drag
automator.swipe(from = first, to = last)
// raw coordinates (HiDPI-corrected)
automator.swipe(
    startX = 100, startY = 400,
    endX = 100, endY = 100,
    steps = 16,
    duration = 200.milliseconds,
)
// mouse-wheel scrolling — drives Modifier.scrollable / LazyColumn on desktop
automator.scrollWheel(list, wheelClicks = 5) // scroll down
automator.scrollWheel(list, wheelClicks = -5) // scroll up
scrollWheel is the right helper for desktop scrollable containers — they respond to
wheel events rather than touch-style drags.
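For the raw-coordinate swipe shown earlier, the steps parameter suggests the drag is split into intermediate pointer moves. The sketch below shows one way such points could be computed, assuming plain linear interpolation. swipePoints is a hypothetical helper written for illustration, not part of Spectre, and the real driver may space or time its moves differently.

```kotlin
import java.awt.Point
import kotlin.math.roundToInt

// Hypothetical helper (not Spectre API): linearly interpolate between the
// start and end of a drag. Returns steps + 1 points, endpoints included.
fun swipePoints(startX: Int, startY: Int, endX: Int, endY: Int, steps: Int): List<Point> {
    require(steps >= 1) { "steps must be at least 1" }
    return (0..steps).map { i ->
        val t = i.toDouble() / steps // progress along the drag, 0.0 to 1.0
        Point(
            (startX + (endX - startX) * t).roundToInt(),
            (startY + (endY - startY) * t).roundToInt(),
        )
    }
}
```

With the values from the snippet above (steps = 16 over 300 pixels), consecutive points are roughly 19 pixels apart. Raising steps gives a smoother drag at the cost of more events.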
Keyboard: typing and key events
import java.awt.event.InputEvent
import java.awt.event.KeyEvent
val input = automator.findOneByTestTag("MessageInput") ?: error("input missing")
// click-then-type
automator.click(input)
automator.typeText("Hello, Spectre!")
// click-clear-type in one go (uses key events, not the clipboard)
automator.clearAndTypeText(input, "replacement text")
// clipboard paste for large or Unicode text
automator.pasteText("こんにちは, Spectre!")
// raw key events
automator.pressKey(KeyEvent.VK_TAB)
automator.pressKey(KeyEvent.VK_S, modifiers = InputEvent.CTRL_DOWN_MASK) // Ctrl+S
// shorthand
automator.pressEnter()
pressKey's modifiers parameter takes an AWT modifier mask (InputEvent.CTRL_DOWN_MASK,
InputEvent.SHIFT_DOWN_MASK, …) — not a KeyEvent constant. The driver translates the
mask into the right modifier-key presses around the main keyCode.
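Because modifiers is a bit mask, several modifiers combine with or, for example InputEvent.CTRL_DOWN_MASK or InputEvent.SHIFT_DOWN_MASK. The sketch below illustrates how a mask could be expanded into the modifier keys to hold around the main key. modifierKeysFor is a hypothetical helper, and Spectre's actual translation, including the order in which it presses keys, is an assumption here.

```kotlin
import java.awt.event.InputEvent
import java.awt.event.KeyEvent

// Hypothetical helper (not Spectre API): map each bit set in an AWT modifier
// mask to the key code of the modifier key a driver would hold down.
fun modifierKeysFor(mask: Int): List<Int> {
    val table = listOf(
        InputEvent.CTRL_DOWN_MASK to KeyEvent.VK_CONTROL,
        InputEvent.SHIFT_DOWN_MASK to KeyEvent.VK_SHIFT,
        InputEvent.ALT_DOWN_MASK to KeyEvent.VK_ALT,
        InputEvent.META_DOWN_MASK to KeyEvent.VK_META,
    )
    // Keep only the entries whose bit is present in the mask.
    return table.filter { (bit, _) -> (mask and bit) != 0 }.map { it.second }
}
```

Passing KeyEvent.VK_CONTROL as the modifiers argument would be a mistake for the same reason: it is a key code, not a mask bit.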
typeText dispatches key press/release pairs and does not touch the clipboard. It is
intentionally conservative and covers only ASCII letters, digits, space, newline, and
common US-keyboard punctuation. Use pasteText for large strings or arbitrary Unicode.
pasteText stashes the previous clipboard contents, writes the requested text, dispatches
the platform paste shortcut (Cmd+V on macOS, Ctrl+V elsewhere), waits for the paste
handler to drain, then restores the previous clipboard contents. See Troubleshooting
for macOS clipboard and apple.awt.UIElement=true caveats.
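When the text comes from test data rather than a literal, a small guard can pick between the two helpers. The sketch below is a rough approximation: it treats the printable ASCII range plus newline as typeable, which is an assumption, since the exact punctuation set typeText accepts is not listed here and may be narrower. isLikelyTypeable is a hypothetical name.

```kotlin
// Hypothetical guard (not Spectre API). Assumption: "common US-keyboard
// punctuation" is approximated by the printable ASCII range, space to tilde.
fun isLikelyTypeable(text: String): Boolean =
    text.all { ch -> ch == '\n' || ch in ' '..'~' }
```

A caller could route text through typeText when the guard returns true and fall back to pasteText otherwise. A length cut-off for "large" strings could be added in the same place.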
Screenshots
import java.awt.Rectangle
import java.io.File
import javax.imageio.ImageIO
// whole virtual screen
val full = automator.screenshot()
ImageIO.write(full, "png", File("screenshot.png"))
// a single window's Compose surface
val mainWindow = automator.screenshot(windowIndex = 0)
ImageIO.write(mainWindow, "png", File("main.png"))
// a single node
val send = automator.findOneByTestTag("Send") ?: error("button missing")
val sendShot = automator.screenshot(send)
// arbitrary screen region
val region = automator.screenshot(Rectangle(0, 0, 800, 600))
Each overload returns a BufferedImage you can save, hash, or compare against a baseline.
ComposeAutomator.screenshot is a screen-region capture
ComposeAutomator.screenshot(...) captures OS framebuffer pixels for a rectangle and
crops to the requested window, node, or region. Before capturing a window or node,
make sure the target window is visible and brought to the front. If another app
overlaps the rectangle, those overlapping pixels can appear in the image; if the
target is partially off-screen, Spectre can only capture the visible screen area.
For top-level windows, spectre-recording also exposes AutoScreenshotter,
which uses native/window-targeted backends on macOS and Windows, and the Linux
helper on Xorg/Xvfb and Wayland.
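To make the off-screen caveat concrete: a screen-region capture can only cover the intersection of the requested rectangle and the screen. The sketch below computes that intersection with java.awt.Rectangle. visiblePart is a hypothetical helper, and how Spectre itself clips is an assumption.

```kotlin
import java.awt.Rectangle

// Hypothetical helper (not Spectre API): the part of `target` that lies
// inside `screen`, or null when the two do not overlap at all.
fun visiblePart(target: Rectangle, screen: Rectangle): Rectangle? {
    val clipped = target.intersection(screen)
    // intersection() yields a rectangle with non-positive size when disjoint.
    return if (clipped.isEmpty) null else clipped
}
```

For a 400×300 window at x = 1800 on a 1920×1080 screen, only the left 120 pixels are visible, so a check like this before capturing can turn a silently cropped baseline into an explicit test failure.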
Captures are normalised to sRGB
The returned BufferedImage is always sRGB (TYPE_INT_ARGB with an sRGB
ColorModel), regardless of the source display's colour profile. Capturing on a
wide-gamut display (Display P3 on a modern Mac, Adobe RGB, etc.) goes through
the OS's display pipeline and lands in the buffer as sRGB pixels. This keeps
captures portable — a baseline collected on one machine compares meaningfully
against a capture from another — but it means the captured pixel values are
post-display-pipeline, not the raw Color(...) your Compose code passed.
Plan for ±1–2 per-channel rounding noise from the gamma round-trip when you
assert on colour, and use a tolerant comparator (see below).
Bitmap comparison needs tolerance
Don't compare screenshots byte-for-byte against a baseline. Identical-looking frames routinely differ at the pixel level because of:
- Encoder/decoder round-trips (PNG re-saves can shift LSBs).
- Text rendering: subpixel positioning, hinting, font fallback, font version.
- Antialiasing on edges, gradients, and blurs.
- OS- and GPU-driven differences in compositing, gamma, and colour profiles.
- HiDPI scaling at non-integer factors.
Always compare with a tolerance — perceptual diff (e.g., a small ΔE threshold), a per-channel allowance, or a structural metric like SSIM. Region-mask the parts of the UI that are inherently noisy (timestamps, cursors, animations).
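As an illustration of the per-channel allowance and region masking described above, here is a minimal comparator sketch that uses only the JDK. The names mismatchRatio and maxChannelDelta, the default tolerance of 2, and the choice to ignore alpha are all assumptions made for this example. It is not a Spectre API, and perceptual or structural metrics will be more robust for text-heavy UIs.

```kotlin
import java.awt.Rectangle
import java.awt.image.BufferedImage
import kotlin.math.abs

// Largest absolute difference across the R, G, and B channels of two ARGB
// pixels. Alpha is ignored in this sketch.
fun maxChannelDelta(a: Int, b: Int): Int {
    val dr = abs(((a shr 16) and 0xFF) - ((b shr 16) and 0xFF))
    val dg = abs(((a shr 8) and 0xFF) - ((b shr 8) and 0xFF))
    val db = abs((a and 0xFF) - (b and 0xFF))
    return maxOf(dr, dg, db)
}

// Hypothetical comparator: fraction of unmasked pixels whose largest channel
// difference exceeds `tolerance`. Pixels inside any mask are skipped.
fun mismatchRatio(
    expected: BufferedImage,
    actual: BufferedImage,
    tolerance: Int = 2,
    masks: List<Rectangle> = emptyList(),
): Double {
    require(expected.width == actual.width && expected.height == actual.height) {
        "image sizes differ"
    }
    var compared = 0
    var mismatched = 0
    for (y in 0 until expected.height) {
        for (x in 0 until expected.width) {
            if (masks.any { it.contains(x, y) }) continue // noisy region, skip
            compared++
            if (maxChannelDelta(expected.getRGB(x, y), actual.getRGB(x, y)) > tolerance) {
                mismatched++
            }
        }
    }
    return if (compared == 0) 0.0 else mismatched.toDouble() / compared
}
```

A test would then assert that the ratio stays below a small threshold chosen for the UI under test, with masks covering timestamps, cursors, and animated areas.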
Spectre intentionally doesn't ship a screenshot comparison suite — it returns
BufferedImage and lets you wire whatever comparator fits your stack. If
there's demand, a built-in tolerant comparator could land later; open an
issue describing the use case if you'd find it valuable.
For test output that records continuous video rather than per-step images, see Recording.
Real vs. synthetic input
The RobotDriver your automator wraps governs how input is actually dispatched. The
public surface:
- RobotDriver(): the default. Uses a fresh java.awt.Robot plus the system clipboard. Moves the real cursor, takes system-wide keyboard focus, and is visible to other applications. This is what end users experience and what ComposeAutomator.inProcess() wires up by default. On macOS, the first input or screenshot call lazily probes TCC permissions (Accessibility for input, Screen Recording for capture) and throws IllegalStateException with remediation guidance when either is denied (see Troubleshooting).
- RobotDriver(robot): same as the no-arg form, but reuses an existing java.awt.Robot you've already constructed (e.g., one targeted at a non-default GraphicsDevice).
- RobotDriver.synthetic(rootWindow): synthetic AWT events posted straight into the target window's AWT hierarchy. No real cursor motion, no global focus, and no fighting with other processes. Mouse and wheel events hit-test against rootWindow, its owned windows, and other visible top-level windows. Key events go to the current AWT focus owner when one exists; when AWT has no focus owner (for example, a macOS helper JVM launched with apple.awt.UIElement=true), Spectre falls back to the key-listening AWT descendant under the last pointer target or Compose host. That lets Compose Desktop's internal focus model route typeText into focused TextFields even when the host window is not the OS-foreground app. screenshot() under a synthetic driver still uses the OS framebuffer via Robot.createScreenCapture, so screenshots show the pixels the display compositor currently exposes rather than a Swing repaint of the Compose host. On macOS this still requires Screen Recording permission and an unlocked screen, but synthetic input itself does not need Accessibility permission.
- RobotDriver.headless(): for read-only flows in headless CI where real OS I/O is unavailable. Every input, clipboard, and screenshot call throws UnsupportedOperationException so an accidental automator.click(...) / typeText(...) / screenshot(...) surfaces at the call site instead of silently dropping. Semantics-tree reads still work; pair this with ComposeAutomator.performSemanticsClick(node) if you need to fire clicks without going through the OS. See The automator for the full picture.
Pass a non-default driver via the inProcess factory:
import dev.sebastiano.spectre.core.ComposeAutomator
import dev.sebastiano.spectre.core.RobotDriver
val automator = ComposeAutomator.inProcess(
    robotDriver = RobotDriver.synthetic(rootWindow = composeWindow),
)
Synthetic input is the right choice when you're running tests in parallel JVMs, when the
test machine also runs unrelated UI work, or when a macOS test helper runs with
apple.awt.UIElement=true to avoid a Dock icon. That macOS mode is safe for
per-character typeText with RobotDriver.synthetic(rootWindow = ...), but
clipboard-backed pasteText still requires a foreground-capable app
(apple.awt.UIElement=false). Stick to real input for end-to-end smoke tests where the
realism of the input matters (e.g., validating that a system shortcut reaches the app). See
Running on CI for the macOS CI trade-off.