Accelerating Falcon
It’s been two years now since Falcon appeared as part of the Cheetah3D v7 public beta. The feature set is still incomplete, but it's coming along nicely at quite a pace, with lots of new features in every release, and it looks like it won’t be long before the last missing features like SSS (subsurface scattering) and volumetrics are finished and the feature set is on par with the competition.
But what then, what are the prospects for further development? Falcon, like every other path tracer, is by design a rather slow renderer and would profit from speed improvements. There are several possible ways to accelerate path tracers, and in this post I want to explore them and also find out what the community thinks and wants.
There are hardware and software accelerators, let’s start with hardware first:
Thankfully, Apple has already solved all problems regarding slow render speed by releasing the iMac Pro with a 14-core (28-thread) CPU for only 7500 bucks.
I assume there’s a small group of economic losers like me who just can’t afford that thing right now and are dabbling along with ordinary, maybe even outdated i7 Macs, so we’re not stopping here but will continue looking for other options (/sarc).
Some folks may have several Macs, or friends with Macs, and especially when rendering animations it's quite reasonable to render, say, frames 0 to 100 on one machine and frames 101 to 300 on another simultaneously. The clips then need to be joined, which requires editing software, with possible conversion problems arising. So my first wish here is to make joining easier by getting the frame numbering right: in the render history, each frame of an image sequence should get its proper frame number instead of always starting at Image_00000 when the animation render starts at a later frame.
Another option would be a macro script that joins selected render projects from the render manager, avoiding the need to deal with the image sequences in the render history. When I close Cheetah and add a folder with a render from a different system to the render history, Cheetah recognizes it when opened again. That would open the possibility of simple render farms: people on this forum who want more render power and are ready to offer their machines for others' projects as well could set up a “Cheetah3D Render Farm“ thread right here on the forum, exchange Dropbox links, and have a go at happy render farming.
A third possibility for hardware acceleration would be GPU support. There are commercially successful GPU-accelerated path tracers out there like Octane Render, and Cycles does it for free. But those use Nvidia's CUDA API, and since Nvidia graphics are not common on Macs, that's no option. There are also OpenCL path tracers like LuxRender and, recently, Cycles, but the wise people at Apple, after inventing OpenCL, decided not to develop it beyond v1.2 and are now deprecating it with macOS 10.14.
So the most successful hardware-acceleration strategies have to be discarded, but one option is left: Apple's Metal API. The advantage here is that it also works with the Intel Iris graphics included in most Intel Macs (except the Xeon-based ones), so Cheetah could get a boost even on dual-core machines like the MacBook Air. Metal also works with eGPUs connected over Thunderbolt 3. Cheetah currently uses OpenGL for the 3D viewport, and since Apple plans to move away from that, replacing viewport rendering with a Metal-based solution would open the possibility of realtime rendered previews, which would be a really great improvement. But moving to Metal means a complete overhaul of the render engine that would probably take years. Nonetheless, once Apple has decided Metal is the future, coders will have to follow, and so my wish is that with version 8 Falcon moves to the Metal API with GPU support. That probably means dropping support for pre-Sierra macOS, I think, but at least older Macs from 2012 onward would still work.
While that is more of a long-term prospect, let's now take a look at possible software accelerators, most of which are easier to implement.
There are two standard methods already implemented in Falcon: direct light sampling and (multiple) importance sampling. When a ray hits a diffuse surface, there is no automatic random recast; instead the renderer tests whether a light source is directly visible, and if so, the lighting can be calculated directly, which is way faster. If several lights are visible, importance sampling deals with those.
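To make that concrete, here is a minimal sketch of the idea (not Falcon's actual code; all function names and parameters are hypothetical): at a diffuse hit, one shadow ray is cast to the light, and if it is unoccluded the contribution is evaluated analytically. The power heuristic shown is the standard way multiple sampling strategies are weighted against each other.

```python
import math

def power_heuristic(pdf_a, pdf_b):
    """MIS power heuristic (beta = 2): weight for the sample drawn with pdf_a."""
    a2, b2 = pdf_a * pdf_a, pdf_b * pdf_b
    return a2 / (a2 + b2)

def direct_light(hit_point, normal, light_pos, light_intensity, occluded):
    """Next-event estimation at a diffuse hit: cast one shadow ray toward
    the light and, if it is unoccluded, evaluate the lighting analytically
    instead of waiting for a random bounce to stumble onto the light."""
    if occluded(hit_point, light_pos):
        return 0.0
    d = [l - p for l, p in zip(light_pos, hit_point)]
    dist2 = sum(c * c for c in d)
    dist = math.sqrt(dist2)
    wi = [c / dist for c in d]                      # unit direction to the light
    cos_theta = max(0.0, sum(a * b for a, b in zip(normal, wi)))
    # Lambertian BRDF (albedo 1) with inverse-square falloff
    return light_intensity * cos_theta / (math.pi * dist2)

# Usage: a point light straight above a horizontal surface, nothing in between.
radiance = direct_light((0, 0, 0), (0, 0, 1), (0, 0, 2),
                        light_intensity=10.0,
                        occluded=lambda p, l: False)
```

One shadow ray replaces what might otherwise take many random bounces to find, which is where the speedup comes from.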
There is another great acceleration method available in Falcon: adaptive sampling. Here the noise level is evaluated, and if an area is considered clean it is excluded from further sampling so that the renderer concentrates on the noisy parts. Currently this feature is not adjustable, and it also struggles with the caustic noise that comes with blurred reflections, so there is still room for improvement.
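A per-pixel version of that loop can be sketched like this (again an illustration, not Falcon's implementation; the threshold and sample limits are made-up knobs): sampling stops early for pixels whose noise estimate drops below a threshold, so effort concentrates on the noisy ones.

```python
import random
import statistics

def adaptive_pixel(sample_fn, noise_threshold=0.01,
                   min_samples=16, max_samples=4096):
    """Keep sampling a pixel only while its noise estimate (standard error
    of the running mean) stays above the threshold; clean pixels stop
    early, leaving the budget for the noisy ones."""
    values = []
    while len(values) < max_samples:
        values.append(sample_fn())
        if len(values) >= min_samples:
            stderr = statistics.stdev(values) / len(values) ** 0.5
            if stderr < noise_threshold:
                break
    return sum(values) / len(values), len(values)

random.seed(1)
# A "clean" pixel (constant radiance) converges immediately ...
_, n_clean = adaptive_pixel(lambda: 0.5)
# ... while a noisy one keeps the sampler busy far longer.
_, n_noisy = adaptive_pixel(lambda: random.random())
```

An adjustable `noise_threshold` is exactly the kind of control the paragraph above asks for: lower means cleaner but slower.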
A quite similar approach would be denoising. The great advantage of in-renderer denoising is that the renderer "knows" about geometry and material detail and can feed that into the algorithm, which makes it vastly more effective than anything provided by digital cameras or image editors. A denoiser like the one implemented in Cycles can easily cut render times to 25% without visible issues. Most other path tracers are adding this feature now, so this is my prime wish for this thread: please let's get a denoise function, because it is currently the most effective acceleration strategy and would work with all hardware/software combinations, so everyone would profit.
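Why the renderer's extra knowledge matters can be shown with a toy cross bilateral filter (a deliberately simplified 1-D sketch, not any shipping denoiser): neighbours only contribute when their auxiliary feature buffer (here a per-pixel normal value) is similar, so noise is smoothed while geometric edges survive. Renderers get these feature buffers for free, which a camera or image editor never has.

```python
import math

def cross_bilateral(radiance, normals, radius=3,
                    sigma_spatial=2.0, sigma_normal=0.2):
    """Denoise a 1-D radiance buffer guided by a normal buffer: the weight
    of each neighbour falls off with both spatial distance and difference
    in the auxiliary feature, preserving edges between surfaces."""
    out = []
    for i in range(len(radiance)):
        wsum = vsum = 0.0
        for j in range(max(0, i - radius), min(len(radiance), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_spatial ** 2)) *
                 math.exp(-((normals[i] - normals[j]) ** 2) /
                          (2 * sigma_normal ** 2)))
            wsum += w
            vsum += w * radiance[j]
        out.append(vsum / wsum)
    return out

# Noisy flat wall (normal 0.0) meeting a darker second wall (normal 1.0):
radiance = [0.9, 1.1, 0.95, 1.05, 0.2, 0.3, 0.25, 0.35]
normals  = [0.0, 0.0, 0.0,  0.0,  1.0, 1.0, 1.0,  1.0]
smoothed = cross_bilateral(radiance, normals)
```

The noise within each wall is averaged away, yet the hard edge between the two walls stays sharp because the normal difference kills the cross-wall weights.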
There are other tricks that offer smaller improvements, like tiling or branching. While the Cheetah renderer works with render tiles/buckets, starting at the top left and working its way to the bottom right, the Falcon renderer renders progressively across the whole frame (which is better for previewing). The talk on other renderers' forums is that properly sized tiles utilize the cores/threads better and so have a speed advantage. If that is really the case, I would like to see an option for tiled rendering in Falcon.
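The bucket scheme itself is simple enough to sketch (a toy illustration, not how Cheetah or Falcon is written): the frame is split into rectangles and whole tiles are handed to worker threads, so each thread works on a coherent block of pixels, which is the cache/thread-utilization argument usually made for bucket rendering.

```python
from concurrent.futures import ThreadPoolExecutor

def make_tiles(width, height, tile=32):
    """Split the frame into bucket rectangles, left to right, top to bottom."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def render_tiled(width, height, shade, tile=32, workers=4):
    """Render whole tiles on a thread pool; `shade` stands in for whatever
    per-pixel sampling the renderer actually does."""
    framebuffer = [[0.0] * width for _ in range(height)]

    def render_tile(rect):
        x0, y0, w, h = rect
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                framebuffer[y][x] = shade(x, y)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(render_tile, make_tiles(width, height, tile)))
    return framebuffer

# Usage: a 64x48 frame split into four 32x32-or-smaller buckets.
fb = render_tiled(64, 48, shade=lambda x, y: (x + y) / 110.0, tile=32)
```

The tile size is the tuning knob the forum discussions argue about: too small and scheduling overhead dominates, too large and threads idle at the end of the frame.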
Some materials, especially those with blurred reflections, are difficult to render and converge very slowly. Some renderers offer branched path tracing, where the number of samples per bounce can be specified per material category, so that difficult materials get a higher sample count and converge more evenly. Tweaking a branched path tracer is way more challenging than tweaking a simple one, though. I wonder if obstacles like a non-zero reflection blur value could be registered internally and countered by activating branching automatically? However it's done, it helps cut render times, and I would like to see an implementation.
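The automatic-branching idea from the paragraph above might look something like this sketch (purely hypothetical names and heuristics): a perfectly sharp reflection needs one deterministic ray, while a non-zero blur value triggers splitting into several local samples, so the hard material converges with extra local rays instead of extra full passes over the frame.

```python
import random

def shade_glossy(trace, roughness, base_splits=8):
    """Auto-branching heuristic: blur value 0 means a single mirror ray;
    any non-zero blur switches on splitting, with more blur earning more
    local samples per hit."""
    if roughness == 0.0:
        return trace(0.0), 1                       # sharp mirror: one ray
    splits = max(2, int(base_splits * roughness))  # more blur -> more rays
    total = sum(trace(random.uniform(-roughness, roughness))
                for _ in range(splits))
    return total / splits, splits

random.seed(7)
# `trace` stands in for casting one reflection ray with a jittered offset.
_, n_mirror = shade_glossy(lambda offset: 1.0, roughness=0.0)
_, n_rough = shade_glossy(lambda offset: 1.0, roughness=0.5)
```

The user never touches a branching control; the material's own blur value decides, which is exactly the "registered internally" behaviour wished for above.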
Another problem for path tracers is indirect lighting: when no part of the frame receives direct light, the fast direct-sampling method does not apply, for example when rendering an interior scene where the light only comes through a window from an exterior HDRI or skylight.
One possible solution is light portals, which guide the sampling algorithm toward the light. In Cycles this works very well and speeds up renders significantly. There are also newer solutions like visibility maps, which detect openings automatically. This would definitely be an advancement I'd like to see in Falcon.
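The portal trick boils down to this (a toy sketch, with a hypothetical axis-aligned window): instead of picking a random direction on the whole sky, most of which is blocked by the interior walls, the sampler picks a point on the portal rectangle marking the window and aims the light sample through it.

```python
import random

def sample_portal(portal_min, portal_max, shade_point):
    """Pick a uniform point on an axis-aligned portal rectangle (lying in
    a constant-z plane here, for simplicity) and return the unit direction
    from the shaded point through it, so every environment-light sample
    actually goes out the window."""
    x = random.uniform(portal_min[0], portal_max[0])
    y = random.uniform(portal_min[1], portal_max[1])
    target = (x, y, portal_min[2])                 # portal lies in a z-plane
    d = [t - p for t, p in zip(target, shade_point)]
    length = sum(c * c for c in d) ** 0.5
    return tuple(c / length for c in d)            # unit direction to sample

random.seed(3)
# A point mid-room; the window spans x, y in [1, 2] at z = 5.
direction = sample_portal((1.0, 1.0, 5.0), (2.0, 2.0, 5.0), (0.0, 0.0, 0.0))
```

A correct implementation also has to account for the portal's area and angle in the sample's probability density, otherwise the image would be biased; the sketch only shows where the rays go.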
Bidirectional path tracers find their way to the light by design, but that would be another render engine altogether rather than an addition. Playing around with LuxRender's BiDir I can see some advantages (like with caustics), but quick convergence is not among them, so I'm not requesting it for Falcon.
Then there is MLT, Metropolis Light Transport, a sophisticated algorithm that can sort out effective paths. Again, LuxRender has an MLT implementation with a path tracer, but it isn't faster than Cycles, so I don't know if it would help Falcon.
Generally, everything that guides rays toward the light helps. When I read the papers that the geeks publish at SIGGRAPH and elsewhere (not that I understand the math), it looks like everyone is now trying to combine bidirectional path tracing with photon mapping because they complement each other well. This is already implemented in LuxRender as BiDir VCM, but in my opinion not very convincingly (at least concerning render speed). Now I wonder what a combination of a unidirectional path tracer like Falcon or Cycles with photon mapping would look like. Photon mapping is already here in the Cheetah renderer. It produces a light map not of the frame but of the whole scene, including the hidden parts. Couldn't such a map be used with importance sampling to help sample the bright parts preferentially? A light cache could also improve such a map and increasingly improve sampling over time. But this is nothing easy to implement, more a speculation about long-term development.
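To illustrate the speculation (this is only a sketch of the general principle, not a claim about how Cheetah's photon map actually works): photon hits are binned into coarse grid cells with their flux summed, and sampling then draws cells proportionally to stored flux, steering path samples toward the bright regions of the scene.

```python
import bisect
import random
from collections import defaultdict

def build_flux_map(photons, cell=1.0):
    """Bin photon hit points into a coarse grid, summing their flux."""
    flux = defaultdict(float)
    for (x, y, z), power in photons:
        flux[(int(x // cell), int(y // cell), int(z // cell))] += power
    return flux

def sample_bright_cell(flux, rng=random.random):
    """Importance-sample a cell proportionally to its stored flux, the way
    a photon map could steer path samples toward bright scene regions."""
    cells = list(flux)
    cdf, total = [], 0.0
    for c in cells:
        total += flux[c]
        cdf.append(total)
    return cells[bisect.bisect_left(cdf, rng() * total)]

photons = [((0.2, 0.1, 0.3), 9.0),   # a brightly lit corner ...
           ((5.5, 0.2, 0.1), 1.0)]   # ... and a dim one
flux = build_flux_map(photons)
random.seed(0)
picks = [sample_bright_cell(flux) for _ in range(1000)]
```

With nine times the flux, the bright cell is drawn roughly nine times as often, which is the "sample the bright parts preferentially" idea in its simplest form.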
That’s basically about it.
Apart from Falcon I have Blender with Cycles, LuxRender, Mitsuba, and Appleseed installed (Radeon ProRender doesn't work with my Nvidia card), so I know what's currently possible with freeware path tracers on a Mac.
My conclusion is:
Integrated denoising would be the most effective tool to speed up render times.
In the long run there may be no way around porting Falcon to Metal which would enable GPU support and maybe live previews.
Now I’m curious what everyone else thinks about the future of Falcon!
:smile: