🔍 AI & Graphics Research Report

May 1, 2025 · Compiled by Hermes Agent for Vladfx

May 1, 2026 · Compiled by Hermes Agent for Vladfx

📋 Table of Contents

🔴 AI Video Generation — Seedance 2.0, Kling, Runway Gen-4, Sora, Pika, Luma, Hailuo, Vidu
🟠 Large Language Models — GPT-4.1, Claude Opus 4, Gemini 2.5, Llama 4, Qwen 3, DeepSeek, Mistral
🟡 AI 3D & Graphics — Meshy v4, Tripo3D, Rodin, Gaussian Splatting, AI Texturing
🔵 VFX Pipeline Integration — Houdini 20.5, UE 5.4, Nuke 16, Silhouette 2025, AE 25.2
🔑 Key Takeaways

🎬 AI Video Generation

🟢 Seedance 2.0 NEW

Priority Tool · Background Replacement

Release: April 2026 — ByteDance's flagship AI video generation model
Key Features:
- Background replacement preserving character identity, camera movement, and performance
- Support for image + text prompt to video (5s and 10s clips)
- Character consistency across shots via reference image input
- Camera motion control (pan, tilt, dolly, orbit)
- Native audio generation for dialogue and sound effects
- Anti-slop motion refinement for natural movement
Access: Available via Doubao app (China), international access through API/partners
VFX Relevance: Best-in-class for background replacement work — your primary use case

🟢 Kling AI UPDATED

Priority Tool · Master Model

Current Version: Kling 1.6 / Master model (April 2026)
Key Updates:
- Master model: Higher quality generation with better motion coherence
- Improved character consistency across multi-shot sequences
- Lip sync feature for dialogue scenes
- Motion brush for directed character movement
- Extended duration support (up to 10s at 1080p)
API: Available via Kling Open Platform — REST API with SDK
Pricing: Credit-based; Pro tier ~$7/mo for 660 credits, Premier ~$23/mo
VFX Relevance: Strong competitor to Seedance for video generation; lip sync is a differentiator (in this list, only Pika also offers it)

🟢 Runway Gen-4 UPDATED

Priority Tool · Character Consistency

Current Version: Gen-4 (released April 2026)
Key Features:
- Reference image system for consistent character/environment across shots
- Scene-level control: define location + character + action separately
- Camera direction with natural language prompts
- 10s generation at 720p/1080p
- Inpainting/outpainting for video regions
API: Available — REST API for enterprise, with SDKs
Pricing: Standard $15/mo (125 credits), Pro $35/mo (500 credits), Unlimited $95/mo
VFX Relevance: Best reference system for multi-shot consistency; inpainting useful for cleanup
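
To compare the credit-based Runway tiers quoted above, it can help to reduce each plan to an effective price per credit. The sketch below uses only the Standard and Pro figures from this report; the function name is illustrative. The report does not say how many credits one generation consumes, so this is a per-credit figure, not a per-clip one.

```python
# Monthly price (USD) and included credits, as quoted in this report.
RUNWAY_PLANS = {
    "Standard": (15.0, 125),
    "Pro": (35.0, 500),
}


def usd_per_credit(price_usd: float, credits: int) -> float:
    """Effective price of a single credit on a plan."""
    if credits <= 0:
        raise ValueError("credits must be positive")
    return price_usd / credits


for plan, (price, credits) in RUNWAY_PLANS.items():
    print(f"Runway {plan}: ${usd_per_credit(price, credits):.2f} per credit")
```

On these figures, Pro credits cost a little over half as much as Standard credits. The Unlimited tier is left out because the report gives no credit count for it.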

🔵 OpenAI Sora LAUNCHED

Status: Available in ChatGPT Plus/Pro since December 2024, continued updates through April 2026
Key Features:
- Text-to-video up to 20s, image-to-video, video-to-video remix
- Storyboard mode for multi-scene generation
- Loop and blend features for seamless transitions
- Max resolution 1080p
Pricing: Included in ChatGPT Plus ($20/mo, 50 videos/mo), Pro ($200/mo, unlimited + higher resolution)
Limitations: No API yet; watermark on free/Plus tier; physics is sometimes implausible
VFX Relevance: Good for concept/previs; not production-ready for VFX pipelines yet

🔵 Pika UPDATED

Current Version: Pika 2.0+ (April 2026)
Key Features:
- Pika Effects: diffuse, melt, explode, crush, and other visual effects
- Lip sync with uploaded audio
- Scene edit: Modify specific regions while preserving the rest
- Outpainting to extend video frames
Pricing: Free tier (250 credits), Standard $8/mo, Pro $28/mo, Unlimited $70/mo
VFX Relevance: Pika Effects are unique for VFX-style transformations; scene edit useful for comp work

🔵 Luma Dream Machine UPDATED

Current Model: Ray2 (April 2026)
Key Features:
- Ray2: Improved motion quality and physical plausibility
- Camera motion control (orbit, pan, dolly)
- Keyframe animation for precise motion control
- LoRA training for style/character consistency
API: Available via REST API
Pricing: Free tier (30 gens/mo), Standard $24/mo, Pro $76/mo
VFX Relevance: Camera control is excellent; LoRA training for consistent styles

🟡 Hailuo AI (Minimax) UPDATED

Current Version: Hailuo/Minimax Video-01 (April 2026)
High-quality text-to-video with strong motion coherence
Subject reference feature for character consistency
Available via API through Minimax platform
Quality competitive with Kling at lower price points

🟡 Vidu UPDATED

Current Version: Vidu 1.5+ (2026)
Character reference for consistent subjects
Fast generation speed (4s clip in ~30s)
Available via API
Strong in Asian market; growing international presence
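
The quoted speed (a 4s clip in about 30s) can be turned into a rough wall-clock estimate for a batch. This is a sketch that assumes fixed-length clips, constant generation time, and a hypothetical `parallel_jobs` concurrency limit. None of these are documented Vidu behavior.

```python
import math


def batch_wall_clock_seconds(
    total_footage_s: float,
    clip_len_s: float = 4.0,   # clip length quoted in this report
    gen_time_s: float = 30.0,  # approximate generation time quoted in this report
    parallel_jobs: int = 1,    # hypothetical concurrency limit
) -> float:
    """Estimate wall-clock time to generate the requested amount of footage."""
    clips = math.ceil(total_footage_s / clip_len_s)
    rounds = math.ceil(clips / parallel_jobs)
    return rounds * gen_time_s


# 60s of footage is 15 clips; run one at a time that is 15 * 30s.
print(batch_wall_clock_seconds(60))                   # 450.0
print(batch_wall_clock_seconds(60, parallel_jobs=4))  # 120.0
```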

🟡 New Entrants to Watch

Haiper 2.0: Improved text-to-video with better temporal consistency
Google Veo: Google's video generation model, available through Google AI Studio/Vertex
Stable Video Diffusion 2.0: Open-source video generation, improving rapidly

🧠 Large Language Models

OpenAI — GPT-4.1 NEW

Released: April 14, 2026
Three tiers: GPT-4.1, GPT-4.1 mini, GPT-4.1 nano
All have 1M token context window
Better instruction following and coding than GPT-4o, at lower cost
Pricing: $2.00/$8.00 (flagship), $0.40/$1.60 (mini), $0.10/$0.40 (nano) per 1M input/output tokens
GPT-5 still in preview — no public release date
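
As a worked example of the per-token pricing above, the sketch below estimates the cost of one job on each GPT-4.1 tier. The prices are the ones quoted in this report. The dictionary keys are display labels, not API model identifiers, and the 200K-in / 20K-out job size is an arbitrary illustration.

```python
# USD per 1M tokens as (input, output), as quoted in this report.
GPT41_PRICES = {
    "GPT-4.1": (2.00, 8.00),
    "GPT-4.1 mini": (0.40, 1.60),
    "GPT-4.1 nano": (0.10, 0.40),
}


def job_cost_usd(
    input_tokens: int, output_tokens: int, price_in: float, price_out: float
) -> float:
    """Cost of one job given per-1M-token input and output prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Illustrative job: 200K input tokens, 20K output tokens.
for tier, (p_in, p_out) in GPT41_PRICES.items():
    print(f"{tier}: ${job_cost_usd(200_000, 20_000, p_in, p_out):.3f}")
```

For this job size the flagship tier comes to $0.56, mini to about $0.11, and nano to about $0.03. The same function works for any other model in this report priced per 1M input/output tokens.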

Anthropic — Claude Opus 4 & Sonnet 4 NEW

Released: May 2026
Opus 4: New flagship — tops SWE-bench & GPQA, best-in-class coding
Sonnet 4: Near Opus 3.5 performance at Sonnet pricing
Both feature extended thinking & tool use as first-class features
200K context window, excellent vision capabilities
Opus 4 pricing: $15.00/$75.00 | Sonnet 4: $3.00/$15.00 per 1M in/out

Google — Gemini 2.5 Pro & Flash NEW

Pro: Tops LMArena leaderboard, #1 reasoning model, 1M context (2M coming)
Flash: Released April 2026 — cheap reasoning model, 1M context
Best multimodal model — native video/audio/image understanding
Built-in thinking mode, code execution, Google Search grounding
Pro pricing: $1.25/$10.00 | Flash: $0.15/$0.60 per 1M in/out
Free tier available through Google AI Studio

Meta — Llama 4 (Scout, Maverick) NEW

Scout: 109B MoE, 10M token context — longest of any open model
Maverick: 400B MoE, competitive with GPT-4o
Open weights, self-hostable. Vision still maturing
Behemoth (288B active) still training
Hosted pricing: ~$0.20-0.80/1M tokens
⚠️ Benchmark controversy: Meta submitted an experimental variant to LMArena that differed from the publicly released model

Alibaba — Qwen 3 NEW

Released: April 29, 2026 — full family 0.6B to 235B (MoE, 22B active)
Open weights (Apache 2.0), competitive with Claude Sonnet 4
Hybrid thinking mode (toggle fast vs reasoning)
Qwen3-VL (multimodal) expected soon
Hosted pricing: ~$0.50-1.50/1M tokens for 235B

DeepSeek

R1 (Jan 2026): Best value reasoning model, open-weight (MIT)
V3-0324 (March update): Improved coding, competitive with GPT-4o
Pricing: R1 $0.55/$2.19 | V3 $0.27/$1.10 per 1M in/out
No native vision — limitation for image tasks
R2 rumored but not released

Mistral — Medium 3

Released April 2026, competitive with GPT-4o/Sonnet 3.5
Pricing: $0.40/$2.00 per 1M in/out
Unique: available for on-premises deployment (proprietary but self-hostable)
Small 3.1 (open-weight) adds vision capabilities

📊 LLM Comparison for VFX Workflows

Model	Coding	Vision	Context	Input price (USD per 1M tokens)	Best For
Claude Opus 4	★★★★★	★★★★★	200K	$15.00	Complex coding, creative direction
Claude Sonnet 4	★★★★½	★★★★	200K	$3.00	Daily coding, prompt engineering
GPT-4.1	★★★★	★★★★	1M	$2.00	Long-context tasks, tool use
Gemini 2.5 Pro	★★★★½	★★★★★	1M	$1.25	Reasoning, video analysis
Gemini 2.5 Flash	★★★½	★★★★	1M	$0.15	Cheap reasoning, batch work
Llama 4 Scout	★★★	★★	10M	~$0.30	Massive context, self-hosted
Qwen3-235B	★★★★	★★★	128K	~$1.00	Open-weight coding, cheap bulk
DeepSeek R1	★★★★	★★	128K	$0.55	Math/reasoning, budget coding
Mistral Medium 3	★★★½	★★★	128K	$0.40	On-prem coding, enterprise
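
One way to use the table is to filter by the context window a task needs and then pick the lowest input price. The sketch below hard-codes the context and price columns from the table, treating approximate prices such as "~$0.30" as exact. The `Model` type and function name are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int
    input_usd_per_1m: float


# Context and input-price columns copied from the comparison table.
MODELS = [
    Model("Claude Opus 4", 200_000, 15.00),
    Model("Claude Sonnet 4", 200_000, 3.00),
    Model("GPT-4.1", 1_000_000, 2.00),
    Model("Gemini 2.5 Pro", 1_000_000, 1.25),
    Model("Gemini 2.5 Flash", 1_000_000, 0.15),
    Model("Llama 4 Scout", 10_000_000, 0.30),
    Model("Qwen3-235B", 128_000, 1.00),
    Model("DeepSeek R1", 128_000, 0.55),
    Model("Mistral Medium 3", 128_000, 0.40),
]


def cheapest_with_context(
    models: Sequence[Model], min_context: int
) -> Optional[Model]:
    """Return the lowest-input-price model whose context fits, or None."""
    candidates = [m for m in models if m.context_tokens >= min_context]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.input_usd_per_1m)


print(cheapest_with_context(MODELS, 500_000).name)    # Gemini 2.5 Flash
print(cheapest_with_context(MODELS, 2_000_000).name)  # Llama 4 Scout
```

This ranks on input price only. Output price and the coding and vision ratings would need to be weighed separately.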

🎨 AI 3D & Graphics

Meshy v4 NEW

Major release (April 2026): improved quad mesh topology, PBR texture generation
API v4 with Python/Node SDKs, batch processing for production-scale assets
Blender addon updated for v4 API
Pricing: Free tier + Pro $20/mo + API at $0.05/model

Tripo3D V2.5 UPDATED

Multi-view generation with improved consistency
"TripoSG" — sparse-guided generation via sketch/depth
Production API with webhook callbacks, ComfyUI nodes
Export: GLB, OBJ, FBX, USDZ with PBR textures
Pricing: Free 10/mo, Pro $15/mo

Rodin Genie 2.0 UPDATED

Full-body avatar generation from single photo
Unreal Engine plugin with blendshape export
New "Studio" mode for posing and expression editing

🌐 Gaussian Splatting & Neural Rendering

RealityCapture v2025.1: Experimental Gaussian splat export — directly relevant to your workflow
Polycam: Full splat capture pipeline (iPhone LiDAR → Unity/UE)
4D Gaussian Splatting: NVIDIA research — dynamic temporal splats from video
Compression: 10-50x size reduction (Splatfacto/nerfstudio)
UE 5.4: Community Niagara-based renderer; no official Epic support yet
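
To put the 10-50x compression figure in concrete terms, the sketch below estimates file size for a splat scene. It assumes an uncompressed 3DGS-style PLY layout with 62 float32 values per splat (position, normals, spherical-harmonic color coefficients, opacity, scale, rotation). That layout is an assumption for illustration, not something stated in this report, and actual exporters may differ.

```python
# Assumption: 62 float32 attributes per splat in an uncompressed PLY.
FLOATS_PER_SPLAT = 62
BYTES_PER_FLOAT = 4


def splat_size_mb(num_splats: int, compression_ratio: float = 1.0) -> float:
    """Approximate scene size in MB (10^6 bytes) at a given compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_bytes = num_splats * FLOATS_PER_SPLAT * BYTES_PER_FLOAT
    return raw_bytes / compression_ratio / 1e6


# A one-million-splat scene, raw and at both ends of the quoted range.
for ratio in (1, 10, 50):
    print(f"{ratio:>2}x: {splat_size_mb(1_000_000, ratio):.2f} MB")
```

Under this assumption, a one-million-splat scene is roughly 248 MB raw, about 25 MB at 10x, and about 5 MB at 50x.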

🖌️ AI Texturing Tools

Substance 3D Painter: Firefly text-to-texture with PBR channel generation, seamless tiles
Layer AI: Project-wide style consistency, batch texturing, USD/glTF export
Polyhive: API-driven AI texturing for pipeline integration
Meshy v4 Texture: Improved PBR with better seam handling

🔓 Open Source 3D AI

TRELLIS (Microsoft Research): Structured 3D generation from images — released April 2026
Stable Fast 3D (Stability AI): Fast local image-to-3D, ComfyUI nodes available
ComfyUI 3D Nodes: Growing ecosystem for Tripo, Meshy, Stable Fast 3D

🔧 VFX Pipeline Integration

Houdini 20.5 NEW

Copilot (Expanded): AI assistant for Python/VEX — now covers SOPs, VOPs, DOPs
ML Deformer SOP (experimental): Learned character deformations
Neural Render SOP (experimental): AI-accelerated render preview
PDG AI Integration: TOP networks can call cloud AI APIs (Meshy, Tripo) as tasks
Community: Gaussian splat .ply importer HDA (Orbolt)
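
A TOP task that calls a cloud generation API typically has to submit a job and then wait for it to finish. The sketch below shows only the waiting step as a generic polling loop in plain Python. It does not use any Houdini, Meshy, or Tripo API. The `get_status` callable and the status strings are hypothetical placeholders for whatever the chosen service returns.

```python
import time
from typing import Callable

TERMINAL_STATES = ("succeeded", "failed")  # hypothetical status values


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake job that finishes on the third poll (no real network call).
fake_states = iter(["queued", "running", "succeeded"])
result = wait_for_job(lambda: next(fake_states), sleep=lambda _s: None)
print(result)  # succeeded
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays. In a pipeline task, `get_status` would wrap the service's actual status request.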

Unreal Engine 5.4 NEW

ML Deformer v2: Better quality, lower latency for real-time character deformation
Neural Network Module: More inference operators for custom ML in-engine
ML-accelerated Lumen GI sampling for better performance
AI Virtual Production: Camera tracking refinement + LED wall calibration
RealityScan → UE: Direct Nanite mesh export with auto LODs

Nuke 16 NEW

NukeX Copilot: AI assistant for node graph navigation & expression writing
ML Roto Node (Improved): Better edge refinement & temporal coherence
AI Color Match: ML-based color matching between shots
Smart Vector Distort: ML-driven vector generation for warping/aligning

Silhouette 2025 NEW

AI Roto v3: Hair & transparent edges, multi-object tracking, temporal stabilization
AI Paint: Content-aware ML paint & clone tool
Nuke Inviso plugin: Full AI roto data exchange with Nuke

After Effects 25.2 UPDATED

Roto Brush 3 (preview): Next-gen ML rotoscoping
Firefly Generative Fill: Text-prompt-based fill for regions
Content-Aware Fill improved with AI-driven fill

Wonder Studio (Autodesk) UPDATED

Improved CG character compositing with AI lighting match
Better body tracking for complex poses
Now exports to After Effects with proper layer structure

📊 Pipeline Readiness Ratings

Tool	Pipeline Ready	Integration
Meshy v4 API	★★★★	REST API, USD export
Tripo3D API	★★★★	REST API, ComfyUI, USD/FBX
Substance 3D AI	★★★★★	Native in Painter + Houdini plugin
Houdini Copilot	★★★	Built-in to 20.5
Nuke 16 ML Roto	★★★★★	Native in NukeX
Silhouette 2025 AI Roto	★★★★★	Nuke plugin (Inviso)
AE Roto Brush 3	★★★★	Native in AE 25.2
UE 5.4 ML Deformer v2	★★★★	Native in UE 5.4
RealityCapture splats	★★★	Experimental .ply export
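
The star ratings can be converted to numbers to sort or filter the table in a script. The sketch below parses the tab-separated rows as they appear above and ranks tools by readiness. The half-star handling covers the "½" used in the LLM comparison table, and the function names are illustrative.

```python
ROWS = [
    "Meshy v4 API\t★★★★\tREST API, USD export",
    "Tripo3D API\t★★★★\tREST API, ComfyUI, USD/FBX",
    "Substance 3D AI\t★★★★★\tNative in Painter + Houdini plugin",
    "Houdini Copilot\t★★★\tBuilt-in to 20.5",
    "Nuke 16 ML Roto\t★★★★★\tNative in NukeX",
    "Silhouette 2025 AI Roto\t★★★★★\tNuke plugin (Inviso)",
    "AE Roto Brush 3\t★★★★\tNative in AE 25.2",
    "UE 5.4 ML Deformer v2\t★★★★\tNative in UE 5.4",
    "RealityCapture splats\t★★★\tExperimental .ply export",
]


def stars_to_score(stars: str) -> float:
    """Convert a rating such as '★★★★½' to a number (4.5)."""
    return stars.count("★") + 0.5 * stars.count("½")


def rank_by_readiness(rows):
    """Return (tool, score) pairs, highest score first; ties keep table order."""
    parsed = []
    for row in rows:
        tool, stars, _integration = row.split("\t")
        parsed.append((tool, stars_to_score(stars)))
    return sorted(parsed, key=lambda pair: -pair[1])


for tool, score in rank_by_readiness(ROWS):
    print(f"{score:.1f}  {tool}")
```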

🔑 Key Takeaways

🎯 For Your VFX Workflow

Use Case	Best Choice
Background replacement	Seedance 2.0
Multi-shot character consistency	Runway Gen-4 or Kling Master
VFX-style transformations	Pika Effects
Complex pipeline coding	Claude Opus 4
Daily driver (cost/perf)	Claude Sonnet 4 or GPT-4.1
Vision/video analysis	Gemini 2.5 Pro
Cheapest quality	Gemini 2.5 Flash ($0.15/1M in)
3D asset generation	Meshy v4 or Tripo3D
AI roto (Nuke)	Silhouette 2025 AI Roto v3 or Nuke 16 ML Roto
Gaussian splatting	RealityCapture v2025.1 + Polycam

🔮 Coming Soon

GPT-5 — OpenAI's next major model (no date)
Qwen3-VL — Multimodal Qwen3 (expected mid-2026)
DeepSeek-R2 — Next reasoning model (rumored)
Llama 4 Behemoth — Meta's teacher model, 288B active parameters (in training)
Runway Gen-4 Turbo — Faster Gen-4 variant (expected)
