Performance Optimization - Full PR Workflow
Workflow Overview
- Tools Layer: Unity (Profiler / Frame Debugger / Memory Profiler / Overdraw) + Platform Tools (e.g., Xcode/Metal) + Scripting (Lua Profiler) + Project-level Regression (UPR/Automated Reports)
- Metrics Layer: Memory / CPU / GPU (organized by “What to Collect → Common Signals → Common Fixes”)
- Process Layer: Reproduce → Baseline → Locate → Solution → Implementation → Comparison → Regression → PR Report → Monitoring & Rollback
Toolset Summary
Internal Unity Tools
Profiler
The most frequently used tool for investigating the causes of performance issues. Tutorial: https://zhanglan.pages.dev/en/blog/unity_profiler_usage
Frame Debugger
Mainly used for troubleshooting render order, Pass/Draw Call composition, batching status, and material/keyword switching.
Common cases:
- “Pink” rendering on mobile builds (Missing Shaders/Variants/Material references).
- Debugging Shader variable values for specific game objects. Tutorial: https://zhanglan.pages.dev/en/blog/unity-frame-debugger
Memory Profiler
Used for viewing top allocations, locating unreleased reference chains, and troubleshooting memory leaks or high retention. Tutorial: https://zhanglan.pages.dev/en/blog/unity_memory_profiler
Overdraw
Used for checking fill rate and overdraw (especially for UI, transparent VFX, fog/clouds, etc.). Tutorial: https://zhanglan.pages.dev/en/blog/rendering-debug-overdraw
Xcode
Mainly used for pinpointing mobile GPU bottlenecks (e.g., GPU Frame Capture / Metal System Trace), assisting in:
- Main rendering phase costs (Tile/Fragment/Compute, etc.)
- Render Pass / Texture Bandwidth / Overdraw tendencies
- Fill rate pressure from resolution, post-processing, and alpha blending
SLua
Used for locating CPU hotspots, memory allocations, and call stacks in Lua scripts. Tutorial: https://github.com/Tencent/sluaunreal/wiki/slua-Profiler-Instructions
Good for diagnosing Lua-specific performance issues.
UPR
UPR can be understood as a “Project-level Performance Health Check and Regression System”:
- Uniformly runs fixed cases (Scenes/Combat/Main City/Loading).
- Collects core metrics (FPS, CPU/GPU Frame time, memory peaks/drops, GC, loading spikes).
- Generates comparison reports (Version vs. Version / Branch vs. Branch) to quickly identify regressions and responsible modules.
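The version-vs-version comparison can also be approximated offline when a UPR report is not available. A minimal sketch, assuming averaged per-case metrics; the metric names and the 5% threshold are illustrative, not UPR's actual schema:

```python
# Flag regressions between two versions' averaged case metrics.
# Metric names and the 5% threshold are illustrative assumptions.

def find_regressions(before: dict, after: dict, threshold: float = 0.05) -> list:
    """Return metrics that worsened by more than `threshold` (relative).

    For metrics in `higher_is_better` (e.g. FPS), a drop is a regression;
    for the rest (frame time, memory, GC), an increase is a regression.
    """
    higher_is_better = {"fps"}
    regressions = []
    for metric, old in before.items():
        new = after.get(metric)
        if new is None or old == 0:
            continue
        change = (new - old) / old
        worsened = change < -threshold if metric in higher_is_better else change > threshold
        if worsened:
            regressions.append((metric, old, new, round(change * 100, 1)))
    return regressions

report = find_regressions(
    {"fps": 58.0, "cpu_main_ms": 12.0, "memory_peak_mb": 900.0},
    {"fps": 57.5, "cpu_main_ms": 15.0, "memory_peak_mb": 910.0},
)
# cpu_main_ms rose 25% -> flagged; fps and memory stayed within 5%
```

Each flagged metric then points back to the responsible module, where the Profiler/Memory/GPU tools complete the evidence chain.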
Performance Optimization Workflow
Notes (Ensuring Data Credibility)
- Controlled Variables: Same device model, same quality settings, same resolution/render scale, same scene and operation path, same build configuration (standardize on Development/Release).
- Sampling Standards:
- Spike issues: Capture at least 3–5 seconds before and after the spike on the Timeline (add custom Markers if needed).
- Steady-state issues: Run for 30–120 seconds to avoid “accidental sampling bias.”
- Recommended to run each case 3 times, tracking mean and variance.
- Device Coverage: Test on high, mid, and low-end devices. Selection should be based on the top models in the live monitoring backend.
- Comparison Method: Before and after optimization, you must provide Before/After data and visual evidence (Profiler/Memory Profiler/Frame Debugger/Xcode, etc.).
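The "3 runs, mean and variance" rule above is easy to mechanize. A minimal sketch (the FPS numbers are made-up sample data):

```python
import statistics

# Aggregate repeated runs of the same case, per the 3-run sampling standard.
# The FPS values below are illustrative sample data.

def summarize_runs(samples: list[float]) -> dict:
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
    }

fps_runs = [57.2, 58.1, 56.9]  # steady-state FPS from three runs
summary = summarize_runs(fps_runs)
# A large stdev relative to the mean means the case is too noisy for a
# credible Before/After comparison; re-run or lengthen the capture window.
```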
Testing Frequency Recommendations
- Before New Features/Big Events: Conduct specialized performance tests for new modules (focused on CPU/GPU peaks and memory retention).
- After Architectural/Core Logic Changes: Targeted testing is mandatory (Code efficiency, GC, resource retention, loading spikes).
Process
The following is a general outline of my workflow during performance optimization, without diving into a deep analysis of each step.
For New Functionality Modules (Feature/Specialized Optimization PR)
- Reproduce the issue: Provide a reproducible path (Device/Quality/Scene/Steps/Probability).
- Establish Baseline: Capture Profiler/Memory/GPU evidence for the “issue version.”
- Locate Bottleneck Ownership:
- Profiler: Confirm if CPU is over budget, identify hotspot functions and call chains.
- Memory Profiler: Confirm if there’s high retention/leakage/large resources.
- GPU: Run on high, mid, and low-end devices to judge whether the GPU is over budget; use Overdraw/fill-rate visualization.
- Propose Solutions & Trade-offs: Give reasons “why this solution was chosen” and list potential risks.
- Implement & Verify: Capture After data using the same case/metrics; compare gains vs. side effects.
- Regression Testing: Cover critical scenes (Login/Scene transition/Combat/Main City/Idle/Re-entry).
- Output PR Report: Solidify the evidence chain and conclusions for review and future traceability.
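The "locate bottleneck ownership" step above boils down to comparing CPU and GPU frame times against the frame budget. A hedged sketch: the 16.6 ms budget assumes a 60 FPS target, and the tie-break rule is a simplification of what the Profiler/Xcode evidence should confirm:

```python
# Classify which side owns the frame, given per-frame timings in ms.
# The 16.6 ms budget assumes a 60 FPS target; adjust for 30/120 FPS.

def classify_bottleneck(cpu_main_ms: float, gpu_ms: float, budget_ms: float = 16.6) -> str:
    if cpu_main_ms <= budget_ms and gpu_ms <= budget_ms:
        return "within budget"
    # Whichever side exceeds the budget by more owns the frame.
    if cpu_main_ms - budget_ms >= gpu_ms - budget_ms:
        return "cpu-bound"
    return "gpu-bound"

print(classify_bottleneck(22.0, 14.0))  # cpu-bound: profile hotspot functions first
print(classify_bottleneck(12.0, 21.0))  # gpu-bound: check Overdraw/fill rate
```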
Full Project Performance Audit (Version Health/Regression)
Mainly use UPR (or an equivalent automated/semi-automated flow) to run overall performance cases, then use the report to find regression points:
- Locate “cases/metrics with largest version variance.”
- Return to the corresponding module to complete the evidence chain with Profiler/Memory/GPU tools.
- Form a “Regression Checklist + Responsible Module + Fix Plan.”
Performance Module Summary
Memory
Audio/Sound
Collect: Total AudioClip count, long audio streaming status, decoding/format, scene cleanup.
Common Signals:
- Audio resources not dropping after scene/account transitions.
- High number of stereo tracks, excessive sample rates, too many simultaneous plays.
Common Fixes:
- Force To Mono (where applicable), lower sample rates, appropriate compression settings.
- Streaming for long clips, Decompress on Load for short SFX (balanced by scene).
- Audio pre-loading/pooling to avoid frequent load/unload spikes.
Mesh
Collect: Mesh count, total vertices/triangles, duplicate meshes, SkinnedMesh percentage.
Common Signals:
- Mesh peaks soaring in Main City/Large Worlds, redundant retention.
- Heavy SkinnedMeshes causing high CPU/GPU overhead.
Common Fixes:
- LOD / Impostor (for far-distance), merging strategies (caution: avoid “mega meshes”).
- Share meshes / reduce duplicate mesh instances.
Texture
Collect: Total texture memory, Top textures, Mipmap status, compression formats, redundant retention.
Common Signals:
- Peaks not dropping (duplicate textures remaining after transition).
- UI/VFX large textures causing abnormal peaks.
Common Fixes:
- Texture compression, resolution tiers (downscaling for low-end devices).
- Atlases/Texture2DArray (where applicable), reduce redundant textures.
- Check Read/Write Enabled, check if MipMap usage is appropriate.
Animation
Collect: Animator/Controller count, clip retention, Avatar/SkinnedMesh overhead.
Common Signals:
- Large number of Animators on screen causing significant Memory + CPU cost.
Common Fixes:
- Animation culling, reduce State Machine complexity.
- Massive units on screen: Evaluate GPU Animation/Baking solutions (already in my notes).
Shader
Collect: Shader/Variant count, runtime compilation spikes, shader-related memory.
Common Signals:
- Shader.CreateGPUProgram causing occasional large spikes.
Common Fixes:
- Variant stripping, Warmup, control timing of first appearance.
- Material reuse, reduce keyword/variant explosions.
Particle Systems
Collect: Particle instance count, texture/material count, transparency coverage (Overdraw correlation).
Common Signals:
- Peak memory and bandwidth soaring during heavy combat.
Common Fixes:
- Texture compression/atlasing, VFX tiers (particle caps for low-end devices).
- Control transparency layers and screen coverage.
CPU
Animation (CPU)
Collect: Animator.Update, SkinnedMesh, bone count, units on screen.
Common Signals:
- Animation modules consistently dominating the CPU Timeline.
Common Fixes:
- Lower bone/animation update frequency, LOD animation, visibility-driven updates.
- Massive units on screen: Evaluate Baking/GPU Animation.
Scripts
Collect: Main Thread Top Hotspots, GC Alloc, Update chain, synchronous waits.
Common Signals:
- Consistently over budget: Logic is too heavy.
- Occasional spikes: GC/IO/sync loading/lock contention/logging.
Common Fixes:
- Update → Event-driven/Frame-splitting; Cache references; Reduce temporary allocations.
- Switch to async resource loading and control its timing; add markers at spike points to prove gains.
- GC governance for UI/string concatenation/frequent “new” calls (e.g., NGUI optimizations).
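For the "occasional spikes" signal above, it helps to separate spikes from the steady state numerically before diving into the Timeline. A minimal sketch over a captured frame-time series; the 1.5× median rule is an illustrative threshold, not a standard:

```python
import statistics

# Find spike frames in a captured frame-time series (ms).
# "More than 1.5x the median frame time" is an illustrative rule;
# tune the factor per project and per target FPS.

def find_spikes(frame_times: list[float], factor: float = 1.5) -> list[int]:
    med = statistics.median(frame_times)
    return [i for i, t in enumerate(frame_times) if t > factor * med]

trace = [16.6, 16.8, 16.5, 48.0, 16.7, 16.6]  # one GC/IO-style spike at index 3
print(find_spikes(trace))  # [3]
```

The returned indices mark where to place custom markers and capture the 3–5 second window described in the sampling standards.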
GPU
Render
Collect: GPU Frame time, major Pass costs, camera stack, post-processing overhead.
Common Signals:
- GPU consistently over budget or significantly worse after certain post-effects are enabled.
Common Fixes:
- On/Off comparison: Post-effects, Shadows, extra lights, Render Scale, MSAA.
- Streamline camera stacks to avoid redundant render paths.
Fillrate
Collect: Full-screen post-effects, transparent objects, fog/clouds, pixel cost of UI overlays.
Common Signals:
- Performance drops sharply as resolution increases, or results are especially poor on low-end devices.
Common Fixes:
- Lower Render Scale, reduce post-effect iterations/sampling.
- Lower transparency coverage, reduce full-screen overlapping layers.
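Lowering Render Scale works because shaded pixel count scales with its square, so the fill-rate win is larger than the scale number suggests. A quick arithmetic sketch (the 1080p target is an example):

```python
# Shaded pixels scale with render_scale^2, so a modest scale drop
# cuts fill-rate pressure substantially. 1920x1080 is an example target.

def shaded_pixels(width: int, height: int, render_scale: float) -> int:
    return round(width * render_scale) * round(height * render_scale)

full = shaded_pixels(1920, 1080, 1.0)    # 2,073,600 pixels
scaled = shaded_pixels(1920, 1080, 0.8)  # 1536 * 864 = 1,327,104 pixels
print(round(1 - scaled / full, 2))       # 0.36 -> ~36% fewer shaded pixels
```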
Overdraw
Collect: Overdraw view (Scene/RenderDoc/Frame Debugger), focus on VFX/UI/Fog.
Common Signals:
- VFX on screen during combat causing sharp GPU performance drops.
Common Fixes:
- VFX tiers, control transparency sorting and lifecycle, lower screen coverage.
Draw Call
Collect: DrawCall/SetPass, SRP Batcher/Instancing effectiveness, material switch reasons.
Common Signals:
- High SetPass calls, frequent switching due to too many materials/keywords.
Common Fixes:
- Unify materials and keywords, enable SRP Batcher, use GPU Instancing for instanced objects.
- Large-world static objects: batching/chunking strategies (aligned with Repo B).
PR Report Output (Reusable Template)
It is recommended to attach the following to every performance optimization PR (even small ones) to greatly improve review efficiency and credibility.
1. Background & Problem Definition
- Description:
- Impact (Device/Scene/Gameplay/Frequency):
- Repro Steps:
- Goals (Budget/Metrics Targets):
2. Test & Sampling Metadata
- Devices: High/Mid/Low-end (Model + OS Version)
- Build Config: Development / Release (Script Debugging / Deep Profile status)
- Quality/Resolution/Render Scale/MSAA:
- Test Case: Scene name + Operation path
- Duration & Count:
3. Baseline (Before)
- Core Metrics (Table recommended):
| Metric | Before | Remarks |
|---|---|---|
| FPS (Steady) | | |
| CPU Main (ms) | | |
| GPU Frame (ms) | | |
| Memory Peak (MB) | | |
| Memory Drop (MB) | | |
| GC Alloc (KB/frame) | | |
| Spikes (ms) | | Trigger point / frequency |
- Evidence/Screenshots: Profiler, Memory Profiler, Xcode Capture, etc.
4. Root Cause (Based on evidence)
- Category: CPU / GPU / Memory / IO / Resource Retention / Variant Compilation / Script Allocation
- Key Evidence: Hotspot functions/call chains, allocation stacks, Render Pass breakdown, material switch reasons, etc.
- Root Logic: Explanation of logic vs. engine mechanics (avoid just quoting results).
5. Solution & Implementation
- Description:
- Key Changes (Modules/Classes/Resources/Config):
- Trade-offs & Risks: Compatibility, visual difference, bundle size, live rollback strategy, etc.
6. After (Comparison)
| Metric | Before | After | Change | Remarks |
|---|---|---|---|---|
| FPS (Steady) | | | | |
| CPU Main (ms) | | | | |
| GPU Frame (ms) | | | | |
| Memory Peak (MB) | | | | |
| Memory Drop (MB) | | | | |
| GC Alloc (KB/frame) | | | | |
| Spikes (ms) | | | | |
- Evidence (Same source/conditions):
7. Regression & Deployment
- Coverage: Login/Transitions/Combat/Main City/Long-run/Re-entry
- KPIs: Live FPS/Stutter rate/Memory peak/Crash rate
- Rollback: Strategy when visual/performance regressions or crash spikes occur.