AI & Graphics Research Report
May 1, 2026 · Compiled by Hermes Agent for Vladfx
Table of Contents
- 🔴 AI Video Generation: Seedance 2.0, Kling, Runway Gen-4, Sora, Pika, Luma, Hailuo, Vidu
- Large Language Models: GPT-4.1, Claude Opus 4, Gemini 2.5, Llama 4, Qwen 3, DeepSeek, Mistral
- 🟡 AI 3D & Graphics: Meshy v4, Tripo3D, Rodin, Gaussian Splatting, AI Texturing
- 🔵 VFX Pipeline Integration: Houdini 20.5, UE 5.4, Nuke 16, Silhouette 2025, AE 25.2
- Key Takeaways
🎬 AI Video Generation
🟢 Seedance 2.0 NEW
Priority Tool · Background Replacement
- Release: April 2026 (ByteDance's flagship AI video generation model)
- Key Features:
- Background replacement preserving character identity, camera movement, and performance
- Support for image + text prompt to video (5s and 10s clips)
- Character consistency across shots via reference image input
- Camera motion control (pan, tilt, dolly, orbit)
- Native audio generation for dialogue and sound effects
- Anti-slop motion refinement for natural movement
- Access: Available via Doubao app (China), international access through API/partners
- VFX Relevance: Best-in-class for background replacement work, your primary use case
🟢 Kling AI UPDATED
Priority Tool · Master Model
- Current Version: Kling 1.6 / Master model (April 2026)
- Key Updates:
- Master model: Higher quality generation with better motion coherence
- Improved character consistency across multi-shot sequences
- Lip sync feature for dialogue scenes
- Motion brush for directed character movement
- Extended duration support (up to 10s at 1080p)
- API: Available via Kling Open Platform (REST API with SDK)
- Pricing: Credit-based; Pro tier ~$7/mo for 660 credits, Premier ~$23/mo
- VFX Relevance: Strong competitor to Seedance for video generation; lip sync is a key differentiator (Pika also offers it)
🟢 Runway Gen-4 UPDATED
Priority Tool · Character Consistency
- Current Version: Gen-4 (released April 2026)
- Key Features:
- Reference image system for consistent character/environment across shots
- Scene-level control: define location + character + action separately
- Camera direction with natural language prompts
- 10s generation at 720p/1080p
- Inpainting/outpainting for video regions
- API: Available (REST API for enterprise, with SDKs)
- Pricing: Standard $15/mo (125 credits), Pro $35/mo (500 credits), Unlimited $95/mo
- VFX Relevance: Best reference system for multi-shot consistency; inpainting useful for cleanup
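Because the Runway tiers differ in both price and credit allowance, dividing one by the other gives a quick like-for-like comparison. The sketch below uses only the figures listed above and assumes they are current; since the credit cost of a single 10s clip is not stated here, it reports cost per credit rather than per clip.

```python
# Monthly price (USD) and credit allowance per Runway tier, as listed in this report.
# Unlimited ($95/mo) is left out because no credit count is given for it.
RUNWAY_PLANS = {
    "Standard": (15.00, 125),
    "Pro": (35.00, 500),
}


def cost_per_credit(price_usd: float, credits: int) -> float:
    """Effective USD cost of one credit on a monthly plan."""
    if credits <= 0:
        raise ValueError("credits must be positive")
    return price_usd / credits


for name, (price, credits) in RUNWAY_PLANS.items():
    print(f"{name}: ${cost_per_credit(price, credits):.3f} per credit")
```

With these figures Pro works out cheaper per credit ($0.07 versus $0.12 on Standard), provided the extra credits actually get used each month.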
🔵 OpenAI Sora LAUNCHED
- Status: Available in ChatGPT Plus/Pro since December 2024, continued updates through April 2026
- Key Features:
- Text-to-video up to 20s, image-to-video, video-to-video remix
- Storyboard mode for multi-scene generation
- Loop and blend features for seamless transitions
- Max resolution 1080p
- Pricing: Included in ChatGPT Plus ($20/mo, 50 vids/mo), Pro ($200/mo, unlimited + higher res)
- Limitations: No API yet; watermark on Plus tier; physics can be inconsistent
- VFX Relevance: Good for concept/previs; not production-ready for VFX pipelines yet
🔵 Pika UPDATED
- Current Version: Pika 2.0+ (April 2026)
- Key Features:
- Pika Effects: diffuse, melt, explode, crush, and other stylized visual effects
- Lip sync with uploaded audio
- Scene edit: Modify specific regions while preserving the rest
- Outpainting to extend video frames
- Pricing: Free tier (250 credits), Standard $8/mo, Pro $28/mo, Unlimited $70/mo
- VFX Relevance: Pika Effects are unique for VFX-style transformations; scene edit useful for comp work
🔵 Luma Dream Machine UPDATED
- Current Model: Ray2 (April 2026)
- Key Features:
- Ray2: Improved motion quality and physical plausibility
- Camera motion control (orbit, pan, dolly)
- Keyframe animation for precise motion control
- LoRA training for style/character consistency
- API: Available via REST API
- Pricing: Free tier (30 gens/mo), Standard $24/mo, Pro $76/mo
- VFX Relevance: Camera control is excellent; LoRA training for consistent styles
🟡 Hailuo AI (Minimax) UPDATED
- Current Version: Hailuo/Minimax Video-01 (April 2026)
- High-quality text-to-video with strong motion coherence
- Subject reference feature for character consistency
- Available via API through Minimax platform
- Competitive quality with Kling at lower price points
🟡 Vidu UPDATED
- Current Version: Vidu 1.5+ (2026)
- Character reference for consistent subjects
- Fast generation speed (4s clip in ~30s)
- Available via API
- Strong in Asian market; growing international presence
🟡 New Entrants to Watch
- Haiper 2.0: Improved text-to-video with better temporal consistency
- Google Veo: Google's video generation model, available through Google AI Studio/Vertex
- Stable Video Diffusion 2.0: Open-source video generation, improving rapidly
🧠 Large Language Models
OpenAI: GPT-4.1 NEW
- Released: April 14, 2026
- Three tiers: GPT-4.1, GPT-4.1 mini, GPT-4.1 nano
- All have 1M token context window
- Better instruction following & coding vs GPT-4o, cheaper too
- Pricing: $2.00/$8.00 (flagship), $0.40/$1.60 (mini), $0.10/$0.40 (nano) per 1M in/out
- GPT-5 still in preview; no public release date
Anthropic: Claude Opus 4 & Sonnet 4 NEW
- Released: May 2026
- Opus 4: New flagship; tops SWE-bench & GPQA, best-in-class coding
- Sonnet 4: Near-flagship performance at Sonnet pricing
- Both feature extended thinking & tool use as first-class features
- 200K context window, excellent vision capabilities
- Opus 4 pricing: $15.00/$75.00 | Sonnet 4: $3.00/$15.00 per 1M in/out
Google: Gemini 2.5 Pro & Flash NEW
- Pro: Tops LMArena leaderboard, #1 reasoning model, 1M context (2M coming)
- Flash: Released April 2026; cheap reasoning model with 1M context
- Best multimodal model, with native video/audio/image understanding
- Built-in thinking mode, code execution, Google Search grounding
- Pro pricing: $1.25/$10.00 | Flash: $0.15/$0.60 per 1M in/out
- Free tier available through Google AI Studio
Meta: Llama 4 (Scout, Maverick) NEW
- Scout: 109B MoE, 10M token context (the longest of any open model)
- Maverick: 400B MoE, competitive with GPT-4o
- Open weights, self-hostable. Vision still maturing
- Behemoth (288B active) still training
- Hosted pricing: ~$0.20-0.80/1M tokens
- ⚠️ Benchmark controversy: Meta submitted an unreleased experimental variant of Maverick to LMArena
Alibaba: Qwen 3 NEW
- Released: April 29, 2026; full family from 0.6B to 235B (the 235B flagship is MoE with 22B active)
- Open weights (Apache 2.0), competitive with Claude Sonnet 4
- Hybrid thinking mode (toggle fast vs reasoning)
- Qwen3-VL (multimodal) expected soon
- Hosted pricing: ~$0.50-1.50/1M tokens for 235B
DeepSeek
- R1 (Jan 2026): Best value reasoning model, open-weight (MIT)
- V3-0324 (March update): Improved coding, competitive with GPT-4o
- Pricing: R1 $0.55/$2.19 | V3 $0.27/$1.10 per 1M in/out
- No native vision, which limits it for image tasks
- R2 rumored but not released
Mistral: Medium 3
- Released April 2026, competitive with GPT-4o/Sonnet 3.5
- Pricing: $0.40/$2.00 per 1M in/out
- Notable: proprietary, but available for on-premises (self-hosted) deployment
- Small 3.1 (open-weight) adds vision capabilities
LLM Comparison for VFX Workflows
| Model | Coding | Vision | Context | Input price (USD / 1M tokens) | Best For |
|---|---|---|---|---|---|
| Claude Opus 4 | ★★★★★ | ★★★★★ | 200K | $15.00 | Complex coding, creative direction |
| Claude Sonnet 4 | ★★★★½ | ★★★★ | 200K | $3.00 | Daily coding, prompt engineering |
| GPT-4.1 | ★★★★ | ★★★★ | 1M | $2.00 | Long-context tasks, tool use |
| Gemini 2.5 Pro | ★★★★½ | ★★★★★ | 1M | $1.25 | Reasoning, video analysis |
| Gemini 2.5 Flash | ★★★½ | ★★★★ | 1M | $0.15 | Cheap reasoning, batch work |
| Llama 4 Scout | ★★★ | ★★ | 10M | ~$0.30 | Massive context, self-hosted |
| Qwen3-235B | ★★★★ | ★★★ | 128K | ~$1.00 | Open-weight coding, cheap bulk |
| DeepSeek R1 | ★★★★ | ★★ | 128K | $0.55 | Math/reasoning, budget coding |
| Mistral Medium 3 | ★★★½ | ★★★ | 128K | $0.40 | On-prem coding, enterprise |
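The per-1M-token prices listed in the model sections above turn into a per-request cost as (input tokens × input price + output tokens × output price) / 1,000,000. A minimal sketch follows, assuming one flat price per model (any tiered pricing by prompt length is not modeled) and assuming the context window has to hold both prompt and completion; the 50K-in / 5K-out request is a hypothetical workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    context: int         # context window in tokens


# Prices and context windows as listed in this report.
MODELS = {
    "GPT-4.1": ModelPrice(2.00, 8.00, 1_000_000),
    "GPT-4.1 mini": ModelPrice(0.40, 1.60, 1_000_000),
    "GPT-4.1 nano": ModelPrice(0.10, 0.40, 1_000_000),
    "Claude Opus 4": ModelPrice(15.00, 75.00, 200_000),
    "Claude Sonnet 4": ModelPrice(3.00, 15.00, 200_000),
    "Gemini 2.5 Pro": ModelPrice(1.25, 10.00, 1_000_000),
    "Gemini 2.5 Flash": ModelPrice(0.15, 0.60, 1_000_000),
    "DeepSeek R1": ModelPrice(0.55, 2.19, 128_000),
    "DeepSeek V3": ModelPrice(0.27, 1.10, 128_000),
    "Mistral Medium 3": ModelPrice(0.40, 2.00, 128_000),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed flat per-1M-token prices."""
    p = MODELS[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def fits_context(model: str, input_tokens: int, output_tokens: int) -> bool:
    """Assumes prompt plus completion must fit inside the listed context window."""
    return input_tokens + output_tokens <= MODELS[model].context


# Hypothetical request: a 50K-token pipeline script plus 5K tokens of output.
for name in sorted(MODELS, key=lambda m: request_cost(m, 50_000, 5_000)):
    print(f"{name:18s} ${request_cost(name, 50_000, 5_000):.4f}")
```

For that hypothetical request GPT-4.1 comes to $0.14 and Claude Opus 4 to $1.125, so output-heavy or repeated jobs are where the cheaper tiers pay off.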
🎨 AI 3D & Graphics
Meshy v4 NEW
- Major release (April 2026): improved quad mesh topology, PBR texture generation
- API v4 with Python/Node SDKs, batch processing for production-scale assets
- Blender addon updated for v4 API
- Pricing: Free tier + Pro $20/mo + API at $0.05/model
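At the listed $0.05 per model, budgeting a batch run is simple arithmetic; the sketch below also chunks an asset list into fixed-size batches for submission. The batch size of 50 and the asset names are hypothetical, not documented Meshy limits, and no Meshy SDK calls are shown.

```python
import math

API_PRICE_PER_MODEL_USD = 0.05  # listed Meshy v4 API price per generated model


def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split a list of asset prompts into consecutive batches of at most `size`."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def estimate_job(n_assets: int, batch_size: int) -> tuple[int, float]:
    """Return (number of batches, total API cost in USD) for n_assets models."""
    if batch_size <= 0:
        raise ValueError("batch size must be positive")
    return math.ceil(n_assets / batch_size), n_assets * API_PRICE_PER_MODEL_USD


prompts = [f"prop_{i:04d}" for i in range(1200)]  # hypothetical asset list
batches = chunk(prompts, 50)                       # hypothetical batch size
n_batches, cost = estimate_job(len(prompts), 50)
print(n_batches, len(batches), f"${cost:.2f}")
```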
Tripo3D V2.5 UPDATED
- Multi-view generation with improved consistency
- "TripoSG": sparse-guided generation via sketch/depth
- Production API with webhook callbacks, ComfyUI nodes
- Export: GLB, OBJ, FBX, USDZ with PBR textures
- Pricing: Free 10/mo, Pro $15/mo
Rodin Genie 2.0 UPDATED
- Full-body avatar generation from single photo
- Unreal Engine plugin with blendshape export
- New "Studio" mode for posing and expression editing
Gaussian Splatting & Neural Rendering
- RealityCapture v2025.1: Experimental Gaussian splat export, directly relevant to your workflow
- Polycam: Full splat capture pipeline (iPhone LiDAR → Unity/UE)
- 4D Gaussian Splatting: NVIDIA research on dynamic temporal splats from video
- Compression: 10-50x size reduction (Splatfacto/nerfstudio)
- UE5.4: Community Niagara-based renderer; no official Epic support yet
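The 10-50x compression figure is easier to judge against a raw size. In the .ply layout written by the original 3D Gaussian Splatting code, each Gaussian stores 62 float32 values (position, a normal slot, degree-3 spherical-harmonic color, opacity, scale, rotation). The sketch below assumes that layout and ignores the small text header; exporters that drop higher SH bands or quantize attributes will produce smaller files.

```python
# Per-Gaussian float32 properties in the reference 3DGS .ply layout (assumed):
# xyz (3) + normal (3, written as zeros) + SH DC color (3)
# + SH rest for degree 3 (15 coefficients x 3 channels = 45)
# + opacity (1) + scale (3) + rotation quaternion (4)
FLOATS_PER_GAUSSIAN = 3 + 3 + 3 + 45 + 1 + 3 + 4
BYTES_PER_GAUSSIAN = FLOATS_PER_GAUSSIAN * 4  # float32 = 4 bytes


def raw_ply_mb(n_gaussians: int) -> float:
    """Approximate uncompressed payload in MB (10^6 bytes), header excluded."""
    return n_gaussians * BYTES_PER_GAUSSIAN / 1_000_000


def compressed_range_mb(
    n_gaussians: int, min_ratio: float = 10.0, max_ratio: float = 50.0
) -> tuple[float, float]:
    """(smallest, largest) size after applying the report's 10-50x compression figure."""
    raw = raw_ply_mb(n_gaussians)
    return raw / max_ratio, raw / min_ratio


n = 1_000_000  # hypothetical scene size
low, high = compressed_range_mb(n)
print(f"raw ~{raw_ply_mb(n):.0f} MB, compressed ~{low:.1f}-{high:.1f} MB")
```

For a hypothetical one-million-Gaussian capture that is roughly 248 MB raw, or about 5 to 25 MB at the quoted ratios, which is the difference between an awkward and a comfortable asset for real-time playback.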
AI Texturing Tools
- Substance 3D Painter: Firefly text-to-texture with PBR channel generation, seamless tiles
- Layer AI: Project-wide style consistency, batch texturing, USD/glTF export
- Polyhive: API-driven AI texturing for pipeline integration
- Meshy v4 Texture: Improved PBR with better seam handling
Open Source 3D AI
- TRELLIS (Microsoft Research): Structured 3D generation from images, released April 2026
- Stable Fast 3D (Stability AI): Fast local image-to-3D, ComfyUI nodes available
- ComfyUI 3D Nodes: Growing ecosystem for Tripo, Meshy, Stable Fast 3D
🔧 VFX Pipeline Integration
Houdini 20.5 NEW
- Copilot (Expanded): AI assistant for Python/VEX, now covering SOPs, VOPs, and DOPs
- ML Deformer SOP (experimental): Learned character deformations
- Neural Render SOP (experimental): AI-accelerated render preview
- PDG AI Integration: TOP networks can call cloud AI APIs (Meshy, Tripo) as tasks
- Community: Gaussian splat .ply importer HDA (Orbolt)
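A TOP work item that calls a cloud generator usually has to submit a job and then wait for it to finish. The sketch below shows that submit-then-poll pattern with capped exponential backoff, in plain Python so it could be called from a Python-based TOP. `submit` and `get_status` are hypothetical callables standing in for a vendor client, and the status strings are placeholders, not Meshy's or Tripo's actual API values; the PDG wiring itself is not shown.

```python
import time
from typing import Callable


def run_generation_task(
    submit: Callable[[str], str],
    get_status: Callable[[str], str],
    prompt: str,
    poll_interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a job, then poll until it reports SUCCEEDED (placeholder status).

    Raises RuntimeError on FAILED and TimeoutError once `timeout` seconds of
    waiting have elapsed. The wait between polls doubles up to `max_interval`.
    """
    job_id = submit(prompt)
    waited = 0.0
    interval = poll_interval
    while waited < timeout:
        status = get_status(job_id)
        if status == "SUCCEEDED":
            return job_id
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")


# Example with a fake client that finishes on the third status check.
statuses = iter(["PENDING", "RUNNING", "SUCCEEDED"])
slept: list[float] = []
job = run_generation_task(lambda p: "job-1", lambda j: next(statuses), "wooden chair", sleep=slept.append)
print(job, slept)
```

Injecting `sleep` keeps the loop testable without real waits. Where a vendor offers webhook callbacks (as listed for Tripo3D), a callback can replace polling entirely; otherwise the backoff cap keeps status requests bounded for long jobs.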
Unreal Engine 5.4 NEW
- ML Deformer v2: Better quality, lower latency for real-time character deformation
- Neural Network Module: More inference operators for custom ML in-engine
- ML-accelerated Lumen GI sampling for better performance
- AI Virtual Production: Camera tracking refinement + LED wall calibration
- RealityScan → UE: Direct Nanite mesh export with auto LODs
Nuke 16 NEW
- NukeX Copilot: AI assistant for node graph navigation & expression writing
- ML Roto Node (Improved): Better edge refinement & temporal coherence
- AI Color Match: ML-based color matching between shots
- Smart Vector Distort: ML-driven vector generation for warping/aligning
Silhouette 2025 NEW
- AI Roto v3: Hair & transparent edges, multi-object tracking, temporal stabilization
- AI Paint: Content-aware ML paint & clone tool
- Nuke Inviso plugin: Full AI roto data exchange with Nuke
After Effects 25.2 UPDATED
- Roto Brush 3 (preview): Next-gen ML rotoscoping
- Firefly Generative Fill: Text-prompt-based fill for regions
- Content-Aware Fill: improved results from AI-driven fill
Wonder Studio (Autodesk) UPDATED
- Improved CG character compositing with AI lighting match
- Better body tracking for complex poses
- Now exports to After Effects with proper layer structure
Pipeline Readiness Ratings
| Tool | Pipeline Readiness | Integration |
|---|---|---|
| Meshy v4 API | ★★★★ | REST API, USD export |
| Tripo3D API | ★★★★ | REST API, ComfyUI, USD/FBX |
| Substance 3D AI | ★★★★★ | Native in Painter + Houdini plugin |
| Houdini Copilot | ★★★ | Built into 20.5 |
| Nuke 16 ML Roto | ★★★★★ | Native in NukeX |
| Silhouette 2025 AI Roto | ★★★★★ | Nuke plugin (Inviso) |
| AE Roto Brush 3 | ★★★★ | Native in AE 25.2 |
| UE 5.4 ML Deformer v2 | ★★★★ | Native in UE 5.4 |
| RealityCapture splats | ★★★ | Experimental .ply export |
Key Takeaways
🎯 For Your VFX Workflow
| Use Case | Best Choice |
|---|---|
| Background replacement | Seedance 2.0 |
| Multi-shot character consistency | Runway Gen-4 or Kling Master |
| VFX-style transformations | Pika Effects |
| Complex pipeline coding | Claude Opus 4 |
| Daily driver (cost/perf) | Claude Sonnet 4 or GPT-4.1 |
| Vision/video analysis | Gemini 2.5 Pro |
| Cheapest quality | Gemini 2.5 Flash ($0.15/1M in) |
| 3D asset generation | Meshy v4 or Tripo3D |
| AI roto (Nuke) | Silhouette 2025 AI Roto v3 or Nuke 16 ML Roto |
| Gaussian splatting | RealityCapture v2025.1 + Polycam |
🔮 Coming Soon
- GPT-5: OpenAI's next major model (no date)
- Qwen3-VL: Multimodal Qwen3 (expected mid-2026)
- DeepSeek-R2: Next reasoning model (rumored)
- Llama 4 Behemoth: Meta's teacher model with 288B active parameters (in training)
- Runway Gen-4 Turbo: Faster Gen-4 variant (expected)