Automate YouTube Shorts with CapCut: The CLI + Claude Pipeline

Automating YouTube Shorts is a four-step pipeline: pick the right 60-second segment, write a hook that holds attention, build the CapCut draft programmatically, and render. The first and second steps benefit from a model. The third step needs a deterministic tool that can write CapCut project files. The fourth is a render button. This guide is about steps one through three, and the open-source CLI I built to make step three possible.

Want the full pipeline ready to run? I package the complete viral-shorts system — story selection, hook templates, the Claude skill that drives capcut-cli end-to-end — as the Viral Story Shorts Blueprint. The CLI below is the engine. The blueprint is the workflow.

Why automate this at all

A daily YouTube Short is between forty-five minutes and two hours of work if you do it by hand. Pick a clip, scrub for the right twenty seconds, type subtitles, time them, add a hook card, position a CTA, render. Most of that work is mechanical. The parts that actually move the metric — segment selection and hook copy — take fifteen percent of the time. The other eighty-five percent is moving rectangles around inside CapCut.

If you want to ship one Short a day, that mechanical work compounds into a part-time job. If you want to ship five a day across three channels, it becomes a full-time job, and the quality drops because you are tired of moving rectangles. Automation flips the ratio: the model spends compute on the parts that need judgement, the CLI spends milliseconds on the parts that don’t.

The hard constraint is that CapCut has no public API. You can’t POST /drafts and get back a project file. The draft format is a deeply nested draft_content.json with timing in microseconds, subtitle text buried inside escaped JSON-in-JSON, and segment IDs that have to match material IDs that have to match track IDs. Manual editing is brittle. This is exactly the gap capcut-cli fills.
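To make the brittleness concrete, here is a heavily simplified Python sketch of the kind of structure involved. The field names and nesting are illustrative only, not CapCut's real schema — the point is the microsecond timing, the JSON-in-JSON subtitle text, and the IDs that must agree across sections:

```python
import json

# Heavily simplified sketch of the shape of a draft_content.json.
# Field names are illustrative, NOT CapCut's actual schema.
draft = {
    "duration": 60_000_000,  # microseconds, not seconds
    "materials": {
        "texts": [{
            "id": "a1b2c3",
            # Subtitle text lives inside a JSON string *inside* the JSON file
            "content": json.dumps({"text": "New subtitle", "styles": []}),
        }],
    },
    "tracks": [{
        "id": "track-1",
        "type": "text",
        "segments": [{
            "id": "seg-1",
            "material_id": "a1b2c3",  # must match a material id exactly
            "target_timerange": {"start": 0, "duration": 3_000_000},
        }],
    }],
}

# Reading a subtitle back means decoding JSON inside JSON:
inner = json.loads(draft["materials"]["texts"][0]["content"])
print(inner["text"])  # -> New subtitle
```

Edit the inner string without re-escaping it, or change a material ID without updating every segment that references it, and CapCut silently drops the element or refuses the draft. That is the class of mistake the CLI exists to make impossible.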

What capcut-cli does

capcut-cli is an open-source Node CLI that reads and writes CapCut and JianYing project files directly. It exposes the project as a list of commands instead of a JSON tree:

npm install -g capcut-cli

capcut info ./project              # overview
capcut texts ./project             # list subtitles
capcut set-text ./project a1b2c3 "New subtitle"
capcut shift-all ./project +0.3s --track text
capcut cut ./project 1:00 2:00 --out ./short.json

JSON output by default, so it pipes into jq and is callable from any language. A --human flag for table output when you read it with your eyes. Backups on every write. Same binary works on macOS, Windows, and Linux. The package and source live on npm and GitHub.
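"Callable from any language" is literal: because every command prints JSON on stdout, a wrapper is a few lines in whatever language runs your pipeline. A minimal Python sketch (the `runner` parameter is my addition so the wrapper can be exercised without the CLI installed):

```python
import json
import subprocess

def capcut_json(args, runner=("capcut",)):
    """Run a capcut-cli subcommand and parse its JSON stdout.

    `runner` defaults to the real binary; swap it out to test the
    wrapper on a machine without capcut-cli installed.
    """
    result = subprocess.run(
        [*runner, *args], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

# e.g. subtitles = capcut_json(["texts", "./project"])
```

`check=True` means a non-zero exit from the CLI raises instead of handing you half-parsed output, which is the behaviour you want in an unattended pipeline.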

The point is not the CLI by itself. The point is that with a deterministic CLI in front of CapCut’s project format, an LLM can drive the editing step without hallucinating timestamps or breaking the schema.

The four-step pipeline

Here is the shape of the automation. Each step is independent and replaceable.

1. Pick the segment

Long-form video has a few moments worth clipping. The rest is filler. Segment selection is judgement, but it is bounded judgement: given a transcript with timestamps, pick the sixty-second window with the strongest hook potential.

# Export the transcript from your CapCut project
capcut export-srt ./long-video > transcript.srt

# Send to Claude with a prompt that returns a JSON array of candidate ranges
claude --prompt "Rank candidate 60s windows from this SRT. Return JSON: [{start_ms, end_ms, hook_strength, rationale}]" \
       --input transcript.srt > candidates.json

# Pick the top window
START=$(jq -r '.[0].start_ms / 1000 | floor' candidates.json)
END=$(jq -r '.[0].end_ms / 1000 | floor' candidates.json)

# Cut the segment
capcut cut ./long-video "${START}s" "${END}s" --out ./short.json

capcut cut clips edge segments at the boundary, rebases timing to zero, drops empty tracks, and cleans up materials that no longer have segments. The output is a standalone draft, not a slice that secretly references the parent project.
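What "clips edge segments and rebases timing to zero" means in practice, as a simplified Python sketch — plain dicts stand in for the real draft structure, with times in microseconds as in the actual format:

```python
def cut_segments(segments, start_us, end_us):
    """Clip segments to [start_us, end_us) and rebase times to zero.

    Simplified sketch of what a cut has to do: each segment is a dict
    with `start` and `duration` in microseconds. Edge segments get
    trimmed; segments entirely outside the window are dropped.
    """
    out = []
    for seg in segments:
        seg_start = seg["start"]
        seg_end = seg_start + seg["duration"]
        clipped_start = max(seg_start, start_us)
        clipped_end = min(seg_end, end_us)
        if clipped_end <= clipped_start:
            continue  # entirely outside the window: drop it
        out.append({
            **seg,
            "start": clipped_start - start_us,  # rebase to zero
            "duration": clipped_end - clipped_start,
        })
    return out

# Source runs 0s-75s; we cut the 60s-120s window (the 1:00-2:00 example).
segments = [
    {"id": "a", "start": 0, "duration": 50_000_000},           # dropped
    {"id": "b", "start": 55_000_000, "duration": 10_000_000},  # trimmed
    {"id": "c", "start": 70_000_000, "duration": 5_000_000},   # kept, shifted
]
print(cut_segments(segments, 60_000_000, 120_000_000))
```

The real cut also has to garbage-collect materials and tracks that end up with no segments, which is exactly the bookkeeping you do not want to do by hand in a nested JSON file.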

2. Generate the hook

The opening three seconds of a Short determine whether a viewer scrolls. The Claude API with a sharp system prompt is good at this. Cheap, too — Haiku is enough.

HOOK=$(claude --model haiku-4.5 \
  --prompt "Write three opening hook variants for a YouTube Short whose body is: <SEGMENT_TRANSCRIPT>. \
            Constraints: under 8 words, no clickbait, must connect to the body. \
            Return JSON: [{hook, predicted_retention, why}]" \
  | jq -r '.[0].hook')

capcut add-text ./short.json 0s 3s "$HOOK" \
  --font-size 28 --color "#FFD700" --align 1 --y -0.4

Note the position parameters. --y -0.4 puts the hook in the upper third where the eye lands first. --align 1 centres it. These are the parts that an editor would do by hand and that a model has no spatial intuition about — the CLI carries them as defaults.
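The jq -r '.[0].hook' above takes the first variant blindly. If you want to enforce the word-limit constraint and rank by the model's retention guess instead, a small sketch — keeping in mind that predicted_retention is a model estimate, useful as a rough ordering, not a measurement:

```python
def pick_hook(variants, max_words=8):
    """Pick the strongest hook variant that respects the word limit.

    `variants` is the JSON array the prompt asks for:
    [{"hook": ..., "predicted_retention": ..., "why": ...}, ...]
    """
    valid = [v for v in variants if len(v["hook"].split()) <= max_words]
    if not valid:
        raise ValueError("no hook variant satisfies the word limit")
    return max(valid, key=lambda v: v["predicted_retention"])["hook"]

variants = [
    {"hook": "This mistake cost me 10k subscribers", "predicted_retention": 0.62},
    {"hook": "Watch this before you post your next Short today please",
     "predicted_retention": 0.71},  # over 8 words: filtered out
]
print(pick_hook(variants))  # -> This mistake cost me 10k subscribers
```

Failing loudly when no variant fits the constraint is deliberate: in an unattended pipeline you want a retry with a fresh generation, not a hook that spills off the card.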

3. Assemble the draft

Once you have the segment and the hook, the rest of the draft is deterministic. Add the lower-third, the call-to-action card at the end, the music bed, the CTA arrow if you use one. Every element is a one-liner.

# Lower-third with channel handle
capcut add-text ./short.json 5s 15s "@your-channel" \
  --font-size 14 --color "#FFFFFF" --align 0 --x -0.4 --y -0.45

# End-card CTA
capcut add-text ./short.json 55s 5s "Full video in description" \
  --font-size 14 --color "#FFFFFF" --align 1 --y 0

# Music bed at low volume
capcut add-audio ./short.json ./music/lofi-loop.mp3 0s 60s --volume 0.2

Or as a batch, which is faster because it parses the project file once:

echo '{"cmd":"add-text","start":"0s","duration":"3s","text":"Hook line","y":-0.4}
{"cmd":"add-text","start":"5s","duration":"15s","text":"@your-channel","y":-0.45,"x":-0.4}
{"cmd":"add-text","start":"55s","duration":"5s","text":"Full video in description","y":0}' \
  | capcut batch ./short.json
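The batch input is newline-delimited JSON, one command object per line, which is easier to generate from a script than to hand-write in a shell heredoc. A sketch:

```python
import json

def to_batch(commands):
    """Serialise a list of command dicts into the NDJSON that
    `capcut batch` reads on stdin: one JSON object per line."""
    return "\n".join(json.dumps(cmd) for cmd in commands)

overlays = [
    {"cmd": "add-text", "start": "0s", "duration": "3s",
     "text": "Hook line", "y": -0.4},
    {"cmd": "add-text", "start": "55s", "duration": "5s",
     "text": "Full video in description", "y": 0},
]
print(to_batch(overlays))
# Pipe it into the CLI, e.g.:
#   python gen_batch.py | capcut batch ./short.json
```

Using json.dumps per line also sidesteps shell-quoting bugs when hook text contains apostrophes or quotes, which generated copy regularly does.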

Templates make this even faster across multiple Shorts. Save a styled title once, stamp it into every new short:

# Once, save your house-style title
capcut save-template ./reference-short a1b2c3 "house-title" --out ~/.shorts/house-title.json

# Every new short
capcut apply-template ./short.json ~/.shorts/house-title.json 0s 3s "$HOOK"

4. Open, review, render

CapCut still owns the render step and a few keyframe-heavy effects the CLI doesn’t expose. Open the draft in CapCut, eyeball the timing, render to MP4. The automation puts you at the start of the render step, not all the way to upload. That is intentional. You want a human review gate before something with your name on it goes to YouTube.

For upload, the YouTube Data API works fine; a thin wrapper in the shape of yt-upload --file ./short.mp4 --title "$TITLE" --description "$DESCRIPTION" is the usual pattern. Don’t automate uploads without a review gate either — bad metadata is much harder to fix than bad pacing.

What makes Shorts go viral (the hard part)

The pipeline above gets you to ship velocity. It does not get you to viral. Viral is segment selection, hook quality, and packaging — all of which are creative judgements the CLI doesn’t make.

A few patterns that move the metric:

Story arc, not summary. A Short with a beginning, middle, and end retains better than a Short that summarises a longer video. The CLI lets you re-cut the same source material into different micro-narratives; pick the one with the strongest arc.

Front-load the payoff cue. Tell the viewer what they will get inside the first two seconds, even if the actual payoff is at 0:45. This is the hook’s job, and it is the highest-leverage line in the entire Short.

Subtitles always, large and high-contrast. Most Shorts are watched on mute. The CLI places subtitles via capcut add-text with --y controlling vertical position; default to the lower third, large font, white-on-shadow.

One CTA, end of clip, no link spam. A single “Full video in description” outperforms three competing CTAs. Add it via capcut add-text at the 55s mark, with a fade-in if you want polish.

These are creative decisions the CLI cannot make for you. The CLI makes them cheap to execute repeatedly so you can A/B them.

The complete blueprint

If you want the full pipeline — the prompts that pick segments well, the hook templates that have actually shipped, the Claude skill that orchestrates capcut-cli end-to-end with no manual glue — that is what I package as the Viral Story Shorts Blueprint. Drop-in for Claude Code, ready to run on top of the open-source CLI.

If you want only the CLI: it is on npm and GitHub, MIT-licensed, no telemetry, no upsell in the binary.

Author

I’m Rene Zander. I build AI-driven content automation systems for solo operators and small teams. More guides on renezander.com, or hire me for a custom build.