Performance Optimization - Full PR Workflow
Workflow Overview
- Tools Layer: Unity (Profiler / Frame Debugger / Memory Profiler / Overdraw) + Platform Tools (e.g., Xcode/Metal) + Scripting (Lua Profiler) + Project-level Regression (UPR/Automated Reports)
- Metrics Layer: Memory / CPU / GPU (organized by “What to Collect → Common Signals → Common Fixes”)
- Process Layer: Reproduce → Baseline → Locate → Solution → Implementation → Comparison → Regression → PR Report → Monitoring & Rollback
Toolset Summary
Internal Unity Tools
Profiler
The most frequently used tool for investigating the causes of performance issues. Tutorial: https://zhanglan.pages.dev/en/blog/unity_profiler_usage
Frame Debugger
Mainly used for troubleshooting render order, Pass/Draw Call composition, batching status, and material/keyword switching.
Common cases:
- “Pink” rendering on mobile builds (Missing Shaders/Variants/Material references).
- Debugging Shader variable values for specific game objects. Tutorial: https://zhanglan.pages.dev/en/blog/unity-frame-debugger
Memory Profiler
Used for viewing top allocations, locating unreleased reference chains, and troubleshooting memory leaks or high retention. Tutorial: https://zhanglan.pages.dev/en/blog/unity_memory_profiler
Overdraw
Used for checking fill rate and overdraw (especially for UI, transparent VFX, fog/clouds, etc.). Tutorial: https://zhanglan.pages.dev/en/blog/rendering-debug-overdraw
Xcode
Mainly used for pinpointing mobile GPU bottlenecks (e.g., GPU Frame Capture / Metal System Trace), assisting in:
- Main rendering phase costs (Tile/Fragment/Compute, etc.)
- Render Pass / Texture Bandwidth / Overdraw tendencies
- Fill rate pressure from resolution, post-processing, and alpha blending
SLua
Used for locating CPU hotspots, memory allocations, and call stacks in Lua scripts. Tutorial: https://github.com/Tencent/sluaunreal/wiki/slua-Profiler-Instructions
Good for diagnosing Lua-specific performance issues.
UPR
UPR can be understood as a “Project-level Performance Health Check and Regression System”:
- Uniformly runs fixed cases (Scenes/Combat/Main City/Loading).
- Collects core metrics (FPS, CPU/GPU Frame time, memory peaks/drops, GC, loading spikes).
- Generates comparison reports (Version vs. Version / Branch vs. Branch) to quickly identify regressions and responsible modules.
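The version-vs-version comparison can also be approximated offline when a UPR report is not available. A minimal sketch, assuming averaged per-case metrics; the metric names and the 5% threshold are illustrative, not UPR's actual schema:

```python
# Flag regressions between two versions' averaged case metrics.
# Metric names and the 5% threshold are illustrative assumptions.

def find_regressions(before: dict, after: dict, threshold: float = 0.05) -> list:
    """Return metrics that worsened by more than `threshold` (relative).

    For metrics in `higher_is_better` (e.g. FPS), a drop is a regression;
    for the rest (frame time, memory, GC), an increase is a regression.
    """
    higher_is_better = {"fps"}
    regressions = []
    for metric, old in before.items():
        new = after.get(metric)
        if new is None or old == 0:
            continue
        change = (new - old) / old
        worsened = change < -threshold if metric in higher_is_better else change > threshold
        if worsened:
            regressions.append((metric, old, new, round(change * 100, 1)))
    return regressions

report = find_regressions(
    {"fps": 58.0, "cpu_main_ms": 12.0, "memory_peak_mb": 900.0},
    {"fps": 57.5, "cpu_main_ms": 15.0, "memory_peak_mb": 910.0},
)
# cpu_main_ms rose 25% -> flagged; fps and memory stayed within 5%
```

Each flagged metric then points back to the responsible module, where the Profiler/Memory/GPU tools complete the evidence chain.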
Performance Optimization Workflow
Notes (Ensuring Data Credibility)
- Controlled Variables: Same device model, same quality settings, same resolution/render scale, same scene and operation path, same build configuration (standardize on Development/Release).
- Sampling Standards:
- Spike issues: Capture at least 3–5 seconds before and after the spike on the Timeline (add custom Markers if needed).
- Steady-state issues: Run for 30–120 seconds to avoid “accidental sampling bias.”
- Recommended to run each case 3 times, tracking mean and variance.
- Device Coverage: Test on high, mid, and low-end devices. Selection should be based on the top models in the live monitoring backend.
- Comparison Method: Before and after optimization, you must provide Before/After data and visual evidence (Profiler/Memory Profiler/Frame Debugger/Xcode, etc.).
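The "3 runs, mean and variance" rule above is easy to mechanize. A minimal sketch (the FPS numbers are made-up sample data):

```python
import statistics

# Aggregate repeated runs of the same case, per the 3-run sampling standard.
# The FPS values below are illustrative sample data.

def summarize_runs(samples: list[float]) -> dict:
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
    }

fps_runs = [57.2, 58.1, 56.9]  # steady-state FPS from three runs
summary = summarize_runs(fps_runs)
# A large stdev relative to the mean means the case is too noisy for a
# credible Before/After comparison; re-run or lengthen the capture window.
```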
Testing Frequency Recommendations
- Before New Features/Big Events: Conduct specialized performance tests for new modules (focused on CPU/GPU peaks and memory retention).
- After Architectural/Core Logic Changes: Targeted testing is mandatory (Code efficiency, GC, resource retention, loading spikes).
Process
The following is a general outline of my workflow during performance optimization, without diving into a deep analysis of each step.
For New Functionality Modules (Feature/Specialized Optimization PR)
- Reproduce the issue: Provide a reproducible path (Device/Quality/Scene/Steps/Probability).
- Establish Baseline: Capture Profiler/Memory/GPU evidence for the “issue version.”
- Locate Bottleneck Ownership:
- Profiler: Confirm if CPU is over budget, identify hotspot functions and call chains.
- Memory Profiler: Confirm if there’s high retention/leakage/large resources.
- GPU: Run on high, mid, and low-end devices to judge whether the GPU is over budget; use Overdraw/fill-rate visualization.
- Propose Solutions & Trade-offs: Give reasons “why this solution was chosen” and list potential risks.
- Implement & Verify: Capture After data using the same case/metrics; compare gains vs. side effects.
- Regression Testing: Cover critical scenes (Login/Scene transition/Combat/Main City/Idle/Re-entry).
- Output PR Report: Solidify the evidence chain and conclusions for review and future traceability.
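The "locate bottleneck ownership" step above boils down to comparing CPU and GPU frame times against the frame budget. A hedged sketch: the 16.6 ms budget assumes a 60 FPS target, and the tie-break rule is a simplification of what the Profiler/Xcode evidence should confirm:

```python
# Classify which side owns the frame, given per-frame timings in ms.
# The 16.6 ms budget assumes a 60 FPS target; adjust for 30/120 FPS.

def classify_bottleneck(cpu_main_ms: float, gpu_ms: float, budget_ms: float = 16.6) -> str:
    if cpu_main_ms <= budget_ms and gpu_ms <= budget_ms:
        return "within budget"
    # Whichever side exceeds the budget by more owns the frame.
    if cpu_main_ms - budget_ms >= gpu_ms - budget_ms:
        return "cpu-bound"
    return "gpu-bound"

print(classify_bottleneck(22.0, 14.0))  # cpu-bound: profile hotspot functions first
print(classify_bottleneck(12.0, 21.0))  # gpu-bound: check Overdraw/fill rate
```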
Full Project Performance Audit (Version Health/Regression)
Mainly use UPR (or an equivalent automated/semi-automated flow) to run overall performance cases, then use the report to find regression points:
- Locate “cases/metrics with largest version variance.”
- Return to the corresponding module to complete the evidence chain with Profiler/Memory/GPU tools.
- Form a “Regression Checklist + Responsible Module + Fix Plan.”
Performance Module Summary
Memory
Audio/Sound
Collect: Total AudioClip count, long audio streaming status, decoding/format, scene cleanup.
Common Signals:
- Audio resources not dropping after scene/account transitions.
- High number of stereo tracks, excessive sample rates, too many simultaneous plays.
Common Fixes:
- Force To Mono (where applicable), lower sample rates, appropriate compression settings.
- Streaming for long clips, Decompress on Load for short SFX (balanced by scene).
- Audio pre-loading/pooling to avoid frequent load/unload spikes.
Mesh
Collect: Mesh count, total vertices/triangles, duplicate meshes, SkinnedMesh percentage.
Common Signals:
- Mesh peaks soaring in Main City/Large Worlds, redundant retention.
- Heavy SkinnedMeshes causing high CPU/GPU overhead.
Common Fixes:
- LOD / Impostor (for far-distance), merging strategies (caution: avoid “mega meshes”).
- Share meshes / reduce duplicate mesh instances.
Texture
Collect: Total texture memory, Top textures, Mipmap status, compression formats, redundant retention.
Common Signals:
- Peaks not dropping (duplicate textures remaining after transition).
- UI/VFX large textures causing abnormal peaks.
Common Fixes:
- Texture compression, resolution tiers (downscaling for low-end devices).
- Atlases/Texture2DArray (where applicable), reduce redundant textures.
- Check Read/Write Enabled, check if MipMap usage is appropriate.
Animation
Collect: Animator/Controller count, clip retention, Avatar/SkinnedMesh overhead.
Common Signals:
- Large number of Animators on screen causing significant Memory + CPU cost.
Common Fixes:
- Animation culling, reduce State Machine complexity.
- Massive units on screen: Evaluate GPU Animation/Baking solutions (already in my notes).
Shader
Collect: Shader/Variant count, runtime compilation spikes, shader-related memory.
Common Signals:
- Shader.CreateGPUProgram causing occasional large spikes.
Common Fixes:
- Variant stripping, Warmup, control timing of first appearance.
- Material reuse, reduce keyword/variant explosions.
Particle Systems
Collect: Particle instance count, texture/material count, transparency coverage (Overdraw correlation).
Common Signals:
- Peak memory and bandwidth soaring during heavy combat.
Common Fixes:
- Texture compression/atlasing, VFX tiers (particle caps for low-end devices).
- Control transparency layers and screen coverage.
CPU
Animation (CPU)
Collect: Animator.Update, SkinnedMesh, bone count, units on screen.
Common Signals:
- Animation modules consistently dominating the CPU Timeline.
Common Fixes:
- Lower bone/animation update frequency, LOD animation, visibility-driven updates.
- Massive units on screen: Evaluate Baking/GPU Animation.
Scripts
Collect: Main Thread Top Hotspots, GC Alloc, Update chain, synchronous waits.
Common Signals:
- Consistently over budget: Logic is too heavy.
- Occasional spikes: GC/IO/sync loading/lock contention/logging.
Common Fixes:
- Update → Event-driven/Frame-splitting; Cache references; Reduce temporary allocations.
- Switch to async resource loading and control its timing; add markers at spike points to prove gains.
- GC governance for UI/string concatenation/frequent “new” calls (e.g., NGUI optimizations).
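For the "occasional spikes" signal above, it helps to separate spikes from the steady state numerically before diving into the Timeline. A minimal sketch over a captured frame-time series; the 1.5× median rule is an illustrative threshold, not a standard:

```python
import statistics

# Find spike frames in a captured frame-time series (ms).
# "More than 1.5x the median frame time" is an illustrative rule;
# tune the factor per project and per target FPS.

def find_spikes(frame_times: list[float], factor: float = 1.5) -> list[int]:
    med = statistics.median(frame_times)
    return [i for i, t in enumerate(frame_times) if t > factor * med]

trace = [16.6, 16.8, 16.5, 48.0, 16.7, 16.6]  # one GC/IO-style spike at index 3
print(find_spikes(trace))  # [3]
```

The returned indices mark where to place custom markers and capture the 3–5 second window described in the sampling standards.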
GPU
Render
Collect: GPU Frame time, major Pass costs, camera stack, post-processing overhead.
Common Signals:
- GPU consistently over budget or significantly worse after certain post-effects are enabled.
Common Fixes:
- On/Off comparison: Post-effects, Shadows, extra lights, Render Scale, MSAA.
- Streamline camera stacks to avoid redundant render paths.
Fillrate
Collect: Full-screen post-effects, transparent objects, fog/clouds, pixel cost of UI overlays.
Common Signals:
- Performance drops sharply as resolution increases, or results are especially poor on low-end devices.
Common Fixes:
- Lower Render Scale, reduce post-effect iterations/sampling.
- Lower transparency coverage, reduce full-screen overlapping layers.
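Lowering Render Scale works because shaded pixel count scales with its square, so the fill-rate win is larger than the scale number suggests. A quick arithmetic sketch (the 1080p target is an example):

```python
# Shaded pixels scale with render_scale^2, so a modest scale drop
# cuts fill-rate pressure substantially. 1920x1080 is an example target.

def shaded_pixels(width: int, height: int, render_scale: float) -> int:
    return round(width * render_scale) * round(height * render_scale)

full = shaded_pixels(1920, 1080, 1.0)    # 2,073,600 pixels
scaled = shaded_pixels(1920, 1080, 0.8)  # 1536 * 864 = 1,327,104 pixels
print(round(1 - scaled / full, 2))       # 0.36 -> ~36% fewer shaded pixels
```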
Overdraw
Collect: Overdraw view (Scene/RenderDoc/Frame Debugger), focus on VFX/UI/Fog.
Common Signals:
- VFX on screen during combat causing sharp GPU performance drops.
Common Fixes:
- VFX tiers, control transparency sorting and lifecycle, lower screen coverage.
Draw Call
Collect: DrawCall/SetPass, SRP Batcher/Instancing effectiveness, material switch reasons.
Common Signals:
- High SetPass calls, frequent switching due to too many materials/keywords.
Common Fixes:
- Unify materials and keywords, enable SRP Batcher, use GPU Instancing for instanced objects.
- Large-world static objects: batching/chunking strategies (aligned with Repo B).
PR Report Output (Reusable Template)
It is recommended to attach the following to every performance optimization PR (even small ones) to greatly improve review efficiency and credibility.
1. Background & Problem Definition
- Description:
- Impact (Device/Scene/Gameplay/Frequency):
- Repro Steps:
- Goals (Budget/Metrics Targets):
2. Test & Sampling Metadata
- Devices: High/Mid/Low-end (Model + OS Version)
- Build Config: Development / Release (Script Debugging / Deep Profile status)
- Quality/Resolution/Render Scale/MSAA:
- Test Case: Scene name + Operation path
- Duration & Count:
3. Baseline (Before)
- Core Metrics (Table recommended):
| Metric | Before | Remarks |
|---|---|---|
| FPS (Steady) | | |
| CPU Main (ms) | | |
| GPU Frame (ms) | | |
| Memory Peak (MB) | | |
| Memory Drop (MB) | | |
| GC Alloc (KB/frame) | | |
| Spikes (ms) | | Trigger point / frequency |
- Evidence/Screenshots: Profiler, Memory Profiler, Xcode Capture, etc.
4. Root Cause (Based on evidence)
- Category: CPU / GPU / Memory / IO / Resource Retention / Variant Compilation / Script Allocation
- Key Evidence: Hotspot functions/call chains, allocation stacks, Render Pass breakdown, material switch reasons, etc.
- Root Logic: Explanation of logic vs. engine mechanics (avoid just quoting results).
5. Solution & Implementation
- Description:
- Key Changes (Modules/Classes/Resources/Config):
- Trade-offs & Risks: Compatibility, visual difference, bundle size, live rollback strategy, etc.
6. After (Comparison)
| Metric | Before | After | Change | Remarks |
|---|---|---|---|---|
| FPS (Steady) | | | | |
| CPU Main (ms) | | | | |
| GPU Frame (ms) | | | | |
| Memory Peak (MB) | | | | |
| Memory Drop (MB) | | | | |
| GC Alloc (KB/frame) | | | | |
| Spikes (ms) | | | | |
- Evidence (Same source/conditions):
7. Regression & Deployment
- Coverage: Login/Transitions/Combat/Main City/Long-run/Re-entry
- KPIs: Live FPS/Stutter rate/Memory peak/Crash rate
- Rollback: Strategy when visual/performance regressions or crash spikes occur.