Corrective sculpts are difficult to beat because they're easy to create and very art directable. The only thing you have to figure out is how to activate them. It's fine to activate an elbow or knee corrective with a single rotation axis if you choose a good rotation order, but areas like the shoulder and hips are harder. In newer versions of Maya you have the Pose Editor. I don't know what type of interpolation it uses, but radial basis functions (RBF) are a popular choice in the community, although they can be tricky to understand at first. Obviously, if you're working in a game engine then you'll want to primarily use joints, but you can still use RBF interpolation to drive them.
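To make the RBF idea a bit more concrete, here is a minimal pure-Python sketch (no Maya required). It is not how the Pose Editor works, which I don't know; it just assumes a Gaussian kernel, describes each pose by a made-up upper-arm direction vector, and stores one weight per corrective for each pose. All the names and sample poses are hypothetical.

```python
import math

def gaussian(r, sigma=1.0):
    # Gaussian kernel (an assumption; other kernels are also used for rigs).
    return math.exp(-(r / sigma) ** 2)

def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting.
    # a is n x n, b is n x m (one column per output); returns the n x m solution.
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]

def rbf_fit(poses, values, sigma=1.0):
    # Build the kernel matrix between all sample poses and solve for the weights.
    phi = [[gaussian(math.dist(p, q), sigma) for q in poses] for p in poses]
    return solve(phi, values)

def rbf_eval(pose, poses, weights, sigma=1.0):
    # Weighted sum of kernel responses between the query pose and each sample pose.
    k = [gaussian(math.dist(pose, p), sigma) for p in poses]
    return [sum(k[i] * weights[i][j] for i in range(len(poses)))
            for j in range(len(weights[0]))]

# Hypothetical shoulder poses, described by the direction the upper arm points in.
poses = [(1.0, 0.0, 0.0),   # rest
         (0.0, 1.0, 0.0),   # arm up
         (0.0, 0.0, 1.0),   # arm forward
         (0.0, -1.0, 0.0)]  # arm down
# One row per pose: weights for the (up, forward, down) correctives.
values = [[0.0, 0.0, 0.0],
          [1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]

weights = rbf_fit(poses, values)
halfway_up = (math.sqrt(0.5), math.sqrt(0.5), 0.0)
print(rbf_eval(halfway_up, poses, weights))
```

At each sample pose the interpolation returns exactly the weights you stored for it, and in between it blends smoothly. The same outputs could just as well be joint translations or rotations instead of corrective weights.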
Judd Simantov used to have a really good video on his Game Character Academy website which demonstrated how to set up smart joints. Unfortunately the website no longer exists except for an archived version from 2014, which doesn't contain the video. The video was called "Introduction to Rigging: Arm Deformations Part 3"; maybe you could contact him and ask to see it.
Another nice trick is to play with the positions of the ribbon controls. Assuming you have ribbons set up for the upper and lower parts of your arms and legs, you can drive the positions of the controls based on the rotation/angle of the elbow and knee. For example, a ribbon on the lower arm attaches to the elbow and wrist; when you bend the arm you can translate the elbow attach point towards the wrist. This avoids a lot of intersections, and combined with some DQ blend weighting it can give excellent results even without correctives. If you do want to implement correctives, they're easier to sculpt because there are fewer intersections :)
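As a rough illustration of the sliding attach point, here is a small pure-Python sketch. The function name, the 140 degree maximum bend and the 15% maximum slide are all made-up values for the example; in a rig you would tune them by eye and wire the same mapping up with nodes or driven keys.

```python
def lerp(a, b, t):
    # Linear interpolation between two 3D points.
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def elbow_attach_position(elbow, wrist, bend_deg, max_bend_deg=140.0, max_slide=0.15):
    # Map the elbow bend angle to a 0..1 factor, clamped at both ends.
    t = min(max(bend_deg / max_bend_deg, 0.0), 1.0)
    # Slide the attach point from the elbow towards the wrist by at most
    # max_slide (a fraction of the forearm length).
    return lerp(elbow, wrist, t * max_slide)

elbow = (0.0, 0.0, 0.0)
wrist = (10.0, 0.0, 0.0)
for angle in (0.0, 70.0, 140.0):
    print(angle, elbow_attach_position(elbow, wrist, angle))
```

With a straight arm the attach point stays on the elbow, and at full bend it has moved 15% of the way to the wrist.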
I did a test using the Evaluation Toolkit in Maya 2018: I created 200 driver groups and 200 driven groups.
In the first test each driven group was parent and scale constrained to the driver group with an offset and I got the following results: DG: 283.019 fps, Serial: 23.1 fps, Parallel: 70.4 fps
In the second test I used matrix nodes and was careful to use as few as possible. Each constraint is built from an offset matrix (multMatrix) and a parent space matrix (multMatrix), and the result is then decomposed (decomposeMatrix) before being connected to the translate, rotate and scale attributes. I got the following results: DG: 319.1 fps, Serial: 39.6 fps, Parallel: 100.7 fps
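For anyone unsure what the matrix version actually computes, here is a pure-Python sketch of the math only (it does not create any Maya nodes). It assumes Maya's row-vector convention, with translation in the fourth row, and the helper names and sample values are made up for the example. The driven local matrix is offset * driver world * inverse of the driven parent's world matrix, which is then decomposed into translate, rotate and scale.

```python
import math

def mat_mul(a, b):
    # Plain 4x4 matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    # Row-vector convention: translation lives in the last row.
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]

def rotation_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, s, 0, 0], [-s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def constrained_local_matrix(offset, driver_world, driven_parent_inverse):
    # First product: apply the stored offset in the driver's space.
    world = mat_mul(offset, driver_world)
    # Second product: bring the result into the driven object's parent space.
    return mat_mul(world, driven_parent_inverse)

# Driver rotated 90 degrees around Z and placed at x = 5.
driver_world = mat_mul(rotation_z(90.0), translation(5.0, 0.0, 0.0))
# Driven object sits 2 units along the driver's local Y axis.
offset = translation(0.0, 2.0, 0.0)
# Driven object's parent is at (1, 1, 1), so its inverse is a translation by (-1, -1, -1).
parent_inverse = translation(-1.0, -1.0, -1.0)

local = constrained_local_matrix(offset, driver_world, parent_inverse)
print([round(v, 6) for v in local[3][:3]])  # translate part of the local matrix
```

The last row of the result is the translate value the driven group would receive; the upper 3x3 part carries the rotation and scale that the decompose step pulls apart.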
parent constraint
constraint: DG: 277.778 fps, Serial: 29.5276 fps, Parallel: 85.2273 fps
matrix nodes: DG: 288.462 fps, Serial: 42.6136 fps, Parallel: 104.895 fps
scale constraint
constraint: DG: 300 fps, Serial: 38.4615 fps, Parallel: 100.671 fps
matrix nodes: DG: 306.122 fps, Serial: 45.8716 fps, Parallel: 113.636 fps
orient constraint
constraint: DG: 283.019 fps, Serial: 34.4828 fps, Parallel: 95.5414 fps
matrix nodes: DG: 365.854 fps, Serial: 46.1538 fps, Parallel: 110.294 fps
Looks like matrix nodes are the way to go.
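To put a number on that, the per-constraint figures above can be turned into speedup ratios with a few lines of Python. This is just arithmetic on the posted results, assuming every figure is in fps; the dictionary layout is my own.

```python
MODES = ("DG", "Serial", "Parallel")

# fps figures copied from the per-constraint results above.
results = {
    "parent": {"constraint": (277.778, 29.5276, 85.2273),
               "matrix": (288.462, 42.6136, 104.895)},
    "scale": {"constraint": (300.0, 38.4615, 100.671),
              "matrix": (306.122, 45.8716, 113.636)},
    "orient": {"constraint": (283.019, 34.4828, 95.5414),
               "matrix": (365.854, 46.1538, 110.294)},
}

def speedups(kind):
    # Ratio of matrix-node fps to constraint fps for each evaluation mode.
    c = results[kind]["constraint"]
    m = results[kind]["matrix"]
    return {mode: mv / cv for mode, cv, mv in zip(MODES, c, m)}

for kind in results:
    ratios = ", ".join(f"{mode}: {r:.2f}x" for mode, r in speedups(kind).items())
    print(f"{kind}: {ratios}")
```

A ratio above 1.0 means the matrix setup was faster in that mode.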
I think the best thing to do is layer your deformers; you can even do this with skin clusters. You should definitely read this article: https://medium.com/@charles_76959/deformation-layering-in-mayas-parallel-gpu-world-15c2e3d66d82
I forgot about the rotate order issue with the decomposeMatrix; it still doesn't do anything.
=====
I ran the "Test performance" test six times. The results for the matrix setup with benign cycles are:
DG = 340.909 fps, Serial = 39.6825 fps, Parallel = 99.3377 fps
DG = 300 fps, Serial = 41.0959 fps, Parallel = 99.3377 fps
DG = 326.087 fps, Serial = 41.3223 fps, Parallel = 102.74 fps
DG = 288.462 fps, Serial = 41.2088 fps, Parallel = 100 fps
DG = 306.122 fps, Serial = 39.6825 fps, Parallel = 102.041 fps
DG = 306.122 fps, Serial = 39.6825 fps, Parallel = 98.0392 fps
DG average = 311.284 fps, Serial average = 40.446 fps, Parallel average = 100.249 fps
The results for the matrix setup without benign cycles are:
DG = 333.333 fps, Serial = 39.1645 fps, Parallel = 100 fps
DG = 333.333 fps, Serial = 39.5778 fps, Parallel = 98.6842 fps
DG = 312.5 fps, Serial = 39.8936 fps, Parallel = 98.0392 fps
DG = 326.087 fps, Serial = 40.9836 fps, Parallel = 98.6842 fps
DG = 365.854 fps, Serial = 38.4615 fps, Parallel = 100 fps
DG = 319.149 fps, Serial = 40.6504 fps, Parallel = 98.6842 fps
DG average = 331.709 fps, Serial average = 39.789 fps, Parallel average = 99.015 fps
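It's not exactly conclusive, but it seems the setup with benign cycles is a bit faster in Serial and Parallel (by roughly 0.7 and 1.2 fps on average), although it was slower in DG.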
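One quick way to judge how conclusive this is: compare the gap between the two averages with the run-to-run spread. The sketch below does that for the Serial and Parallel columns using only Python's statistics module. Using the sample standard deviation as the "spread" is my own choice, and six runs is a small sample, so treat it as a sanity check rather than a proper significance test.

```python
from statistics import mean, stdev

# fps figures copied from the six runs above (Serial and Parallel columns only).
with_cycles = {
    "Serial": [39.6825, 41.0959, 41.3223, 41.2088, 39.6825, 39.6825],
    "Parallel": [99.3377, 99.3377, 102.74, 100.0, 102.041, 98.0392],
}
without_cycles = {
    "Serial": [39.1645, 39.5778, 39.8936, 40.9836, 38.4615, 40.6504],
    "Parallel": [100.0, 98.6842, 98.0392, 98.6842, 100.0, 98.6842],
}

def compare(mode):
    a, b = with_cycles[mode], without_cycles[mode]
    diff = mean(a) - mean(b)              # positive means "with cycles" is faster
    spread = max(stdev(a), stdev(b))      # larger of the two sample standard deviations
    return diff, spread

for mode in with_cycles:
    diff, spread = compare(mode)
    print(f"{mode}: with - without = {diff:+.3f} fps, spread = {spread:.3f} fps")
```

For these numbers the gap comes out smaller than the spread in both modes, which fits with calling the result inconclusive; more runs would be needed to say anything stronger.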
These models are good and probably better for presentation, they're worth $12: https://www.characterrigs.com/shop.html