Optimizing Node Graph for Parallel Evaluation
Using the Profiler
Optimization is a cat-and-mouse game: every speed-up is measured against a baseline.
Average frame time in milliseconds is a more helpful metric than FPS (at 24 fps, one frame is about 41.7 ms). The goal should be around 32 ms.
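The conversion between frame rate and frame time is plain arithmetic. A minimal sketch, assuming a 24 fps project (the function name is only illustrative):

```python
def frame_time_ms(fps):
    """Return the time budget of a single frame in milliseconds."""
    return 1000.0 / fps


# 24 fps leaves roughly 41.7 ms per frame, 30 fps roughly 33.3 ms.
for fps in (24, 30):
    print(f"{fps} fps = {frame_time_ms(fps):.1f} ms per frame")
```

Thinking in milliseconds makes savings comparable: shaving 5 ms off a 45 ms frame is the same amount of work saved regardless of how the FPS counter rounds it.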
Optimizing to a specific target, both user and hardware, is essential.
Maya is one of many apps using the CPU. Close your web browser and run nimby to kill all render processes. And if you are working on a Mac, close Visual Studio Code!
Setup
Turn OFF Caching!
Unload the cacheEvaluator plugin.
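The two steps above can be scripted. This is a sketch rather than a definitive setup: `cmds` stands for the `maya.cmds` module and is passed in as an argument (an assumption made here so the function can be read and dry-run outside Maya), and the function name is hypothetical.

```python
def disable_caching(cmds):
    """Turn off the cache evaluator and unload the cacheEvaluator plug-in.

    `cmds` is expected to be the maya.cmds module.
    Returns True if the plug-in was loaded and has been unloaded.
    """
    if not cmds.pluginInfo("cacheEvaluator", query=True, loaded=True):
        return False  # nothing to do, caching is not available at all
    # Disable the evaluator first so playback re-evaluates the rig every frame.
    cmds.evaluator(name="cache", enable=False)
    # Unloading can fail if the scene still depends on the plug-in.
    cmds.unloadPlugin("cacheEvaluator")
    return True
```

Inside Maya this would be called as `disable_caching(maya.cmds)` before any profiling run.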
The Parallel Graph drops nodes that don't contribute to the outcome or have static values.
Set at least two different keyframes on each control or tag controls with Maya's controller node system.
Be aware that you will get a different graph depending on whether:
You keyframe all the controls most animators will use (the best case is the average case)
Or you keyframe everything on the character that an animator is allowed to key (worst case)
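Keying or tagging a list of controls is easy to script. The sketch below assumes `cmds` is the `maya.cmds` module (passed in as an argument), that the chosen attribute is unlocked on every control, and that the frame numbers and values are placeholders; the function names are hypothetical.

```python
def key_controls(cmds, controls, attribute="translateY",
                 keys=((1, 0.0), (24, 1.0))):
    """Set two different keys on each control so the graph keeps it.

    `keys` holds (frame, value) pairs; the defaults are placeholders.
    Returns the number of keys that were set.
    """
    count = 0
    for ctrl in controls:
        for frame, value in keys:
            cmds.setKeyframe(ctrl, attribute=attribute, time=frame, value=value)
            count += 1
    return count


def tag_controls(cmds, controls):
    """Alternative: tag each control with Maya's controller node system."""
    for ctrl in controls:
        cmds.controller(ctrl)
```

Passing the "average case" list or the "worst case" list of controls to the same function gives the two graphs described above.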
Sampling
Start playing the animation and let it run for a few frames before you hit the record button in the Profiler.
Ignore the first few frames while Parallel Eval "warms" up.
There is no guarantee that the same node will run for precisely the same time every time.
Set the buffer size to get as many samples as possible without waiting too long or bringing down your machine. This depends on whether you evaluate a simple module or the entire character.
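Recording can also be driven from script with the `profiler` command. A minimal sketch, assuming `cmds` is the `maya.cmds` module passed in as an argument, that playback is started and stopped by hand between the two calls, and that the buffer size is given in megabytes as in the Profiler window (treat the unit as an assumption); the function names and output path are hypothetical.

```python
def start_recording(cmds, buffer_size=50):
    """Resize the profiler buffer and start sampling.

    `buffer_size` is assumed to be in megabytes, as in the Profiler window.
    """
    cmds.profiler(bufferSize=buffer_size)
    cmds.profiler(sampling=True)


def stop_recording(cmds, output_path):
    """Stop sampling and write the recorded events to `output_path`."""
    cmds.profiler(sampling=False)
    cmds.profiler(output=output_path)
```

Start playback, wait a few frames, call `start_recording`, and call `stop_recording` once enough frames have been captured.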
What do I see
There are a few nodes that come with the overhead from Maya, i.e., the viewport, which we can't do anything about.
Pick a frame that represents the average.
Identify the longest nodes.
Check how much time you spend uploading to the GPU.
Check how Cycle Clusters are affecting speed.
Cycle Clusters
Maya is scheduling node groups as one "übernode."
The übernode helps differentiate between DG and EM evaluation modes and how they differ in graph construction.
Check for clusters that have more than ten cycles.
Avoid Cycle Clusters as they result from either poorly designed custom nodes or bad old habits the DG allowed for.
In short, don't send data back upstream!
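To illustrate what "sending data back upstream" does to a graph, here is a toy model in plain Python. It is not how Maya builds its schedule; it only groups nodes that can reach each other through connections, which is the property that forces them into one cluster. The node names are made up.

```python
def reachable(connections, start):
    """Return every node that can be reached downstream of `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in connections.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def cycle_clusters(connections):
    """Group nodes that feed back into each other.

    `connections` maps a node name to the nodes it feeds (downstream).
    """
    nodes = set(connections)
    for targets in connections.values():
        nodes.update(targets)
    reach = {n: reachable(connections, n) for n in nodes}
    clusters, assigned = [], set()
    for n in sorted(nodes):
        if n in assigned or n not in reach[n]:
            continue  # node is not part of any loop
        cluster = {m for m in reach[n] if n in reach[m]}
        clusters.append(sorted(cluster))
        assigned.update(cluster)
    return clusters


graph = {
    "arm_ctrl": ["arm_ik"],
    "arm_ik": ["arm_jnt"],
    "arm_jnt": ["arm_fix", "skin"],
    "arm_fix": ["arm_ik"],  # sends data back upstream
}
print(cycle_clusters(graph))
```

Removing the `arm_fix` to `arm_ik` connection leaves no cluster at all, so every node can be scheduled on its own.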
Macro Optimization
Break apart cycle clusters bigger than 10
Move deformation to GPUOverride
Layer skinClusters --> Don't chain
Replace follicles with uvPin
Process
Profile rig for the baseline performance
Analyse and come up with an idea on how to adjust the graph
Script out your concept and apply it
Profile with your changes and compare to baseline.
If needed, iterate over these steps.
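Comparing a change against the baseline comes down to averaging frame times after the warm-up frames and dividing. A minimal sketch in plain Python; the sample numbers are made up for illustration and the function names are hypothetical.

```python
from statistics import mean


def average_frame_ms(samples_ms, warmup=5):
    """Average frame time, ignoring the first `warmup` frames."""
    steady = samples_ms[warmup:]
    if not steady:
        raise ValueError("not enough samples after the warm-up frames")
    return mean(steady)


def speedup(baseline_ms, candidate_ms, warmup=5):
    """Return how many times faster the candidate is than the baseline."""
    return (average_frame_ms(baseline_ms, warmup)
            / average_frame_ms(candidate_ms, warmup))


# Made-up frame times in milliseconds, first five frames are warm-up.
baseline = [60.0, 55.0, 50.0, 48.0, 47.0, 45.0, 44.0, 46.0, 45.0]
candidate = [58.0, 50.0, 45.0, 40.0, 38.0, 36.0, 35.0, 37.0, 36.0]
print(f"speed-up: {speedup(baseline, candidate):.2f}x")
```

A value above 1.0 means the change helped; a value close to 1.0 is likely within the run-to-run noise and worth re-profiling before drawing conclusions.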
Meso Optimization
This part of the optimization aims to find places to calculate less and get meshes onto the GPU using GPUOverrides for deformation.
Most of our work revolves around visible shape nodes (meshes & curves for controls) in character!
Be aware of old habits
Avoid expressions if you can use nodes like condition, multiplyDivide, addDoubleLinear, etc.
Avoid using set-driven keys; replace them with simple node connections, especially for linear mapping.
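For a linear set-driven key, the two keys define a straight line, so the same result can be produced by one multiply and one add. The sketch below only works out the two constants; feeding them into nodes such as multDoubleLinear and addDoubleLinear is left as an assumption about how you would wire it up. Note that a set-driven key holds its value outside the keyed range by default, whereas this mapping keeps extrapolating.

```python
def linear_mapping(key_a, key_b):
    """Return (multiplier, offset) so that driven = driver * multiplier + offset.

    Each key is a (driver_value, driven_value) pair taken from the
    set-driven key being replaced.
    """
    (x0, y0), (x1, y1) = key_a, key_b
    if x1 == x0:
        raise ValueError("the two keys need different driver values")
    multiplier = (y1 - y0) / (x1 - x0)
    offset = y0 - multiplier * x0
    return multiplier, offset


# Hypothetical example: driver 0..10 maps to driven 0..45 degrees.
multiplier, offset = linear_mapping((0.0, 0.0), (10.0, 45.0))
print(multiplier, offset)
```

When the offset is zero, a single multiply node (or a direct connection, if the multiplier is 1) is enough.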
Calculation Flow and Direction
Data should flow ONLY downstream! The graph should not loop back on itself - even if Maya allows it.
Parent relationships are data; parents must calculate before their children!
Division of Calculation
Meshes are an element of the rig where we have complete control and can divide or combine most calculations. But we have to be aware that:
We can't use animated input/orig shapes, i.e., blendShapes, for deformers.
We can't use animated topology.
We can't use non-GPU-approved deformers after a GPU-approved deformer.
We can't prune deformer sets - at least as of Maya 2020!
We don't want unrelated areas of the body relying on each other. For example, we can break the dependencies of all limbs to have one cluster for each of them.
Micro Optimization
Here we are counting microseconds and attempting to shave them off as they can add up over the whole character into milliseconds we can save.
Limit the testing to one module and track your changes using Profiler Tags. Using Profiler Tags allows you to group your nodes in the profiler interface and have them selectable.
Break apart the small cycle clusters which are still around. An excellent example is constraints, as they have a built-in loop. Note that replacing constraints with matrix math is only sometimes the answer. Profile and compare!
Nodes you need to remember
decomposeMatrix: get the translate, rotate, and scale values out of a matrix
composeMatrix: the opposite, allowing you to make a matrix from TRS components
multMatrix: multiplies ("concatenates") two or more matrices, for example to get the parent/child relationship.
wtAddMatrix: similar to multMatrix, but weighted so that you can blend between multiple objects
vectorProduct/pointMatrixMult: transform a point or vector by a matrix
inverseMatrix: create a matrix from the input that does the reverse transformation
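As an illustration of how these nodes combine, here is a sketch of a matrix-based replacement for a simple parent constraint. It assumes `cmds` is the `maya.cmds` module passed in as an argument, that no offset needs to be maintained, and that the driven node has no joint orient or non-default rotate order to account for; the function and node names are hypothetical. On older Maya versions the matrixNodes plug-in may need to be loaded first.

```python
def matrix_parent(cmds, driver, driven):
    """Drive `driven` from `driver` with multMatrix and decomposeMatrix.

    Returns the names of the two nodes that were created.
    """
    mult = cmds.createNode("multMatrix", name=driven + "_multMatrix")
    decomp = cmds.createNode("decomposeMatrix", name=driven + "_decomposeMatrix")
    # driver world matrix * inverse of the driven node's parent matrix
    # gives the local matrix the driven node needs.
    cmds.connectAttr(driver + ".worldMatrix[0]", mult + ".matrixIn[0]")
    cmds.connectAttr(driven + ".parentInverseMatrix[0]", mult + ".matrixIn[1]")
    cmds.connectAttr(mult + ".matrixSum", decomp + ".inputMatrix")
    # Split the result back into translate, rotate, and scale.
    for channel in ("Translate", "Rotate", "Scale"):
        cmds.connectAttr(decomp + ".output" + channel,
                         driven + "." + channel.lower())
    return mult, decomp
```

Data only flows downstream here, so the setup does not form a cycle cluster. Whether it is actually faster than the constraint on a given rig is something to profile and compare.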
To make things even more efficient, we can use matrices instead of joints to shorten the critical path. Joints are used in many Maya tools, such as when working with skin weights, but the skinCluster node keeps track of the current world matrices for the joints and the original matrix positions.