Accelerating Falcon

#1

It’s been two years since Falcon appeared as part of Cheetah3D v7, when the public beta testing started. The feature set is still incomplete, but it's coming along nicely and at quite a pace, with lots of new features in every release, and it looks like it won’t be long before the last missing features like SSS and volumetrics are finished and the feature set is on par with the competition.
But what then? What are the perspectives for further development? I think that Falcon, like every other path tracer, is by design a rather slow renderer and would profit from speed improvements. There are several possible ways to accelerate path tracers, and in this post I want to explore them and also find out what the community thinks and wants.


There are hardware and software accelerators; let’s start with hardware:
Thankfully, Apple has already solved all problems regarding slow render speed by releasing the iMac Pro with a 14-core (28-thread) CPU for only 7,500 bucks :rolleyes:
I assume there’s a small group of economic losers like me who just can’t afford that thing right now and are dabbling along with ordinary, maybe even outdated, i7 Macs, so we’re not stopping here but will continue looking for other options (/sarc).

Some folks may have several Macs, or friends with Macs, and especially when rendering animations it's quite reasonable to render, say, frames 0 to 100 on one machine and frames 101 to 300 on another simultaneously. Then the clips need to be joined, which requires editing software and may cause conversion problems. So my first wish here is to make joining easier by getting the frame numbering right: in the render history, each frame of an image sequence should get its proper frame number instead of always starting with Image_00000 when the animation render starts at a later frame.
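Until the render history numbers frames correctly, the renaming can be done outside Cheetah3D. Below is a minimal Python sketch, assuming each machine exported its frames into its own folder with names like `Image_00000.png` (the `.png` extension, the five-digit padding and the helper name `renumber_sequence` are assumptions for illustration, not Cheetah3D features). It shifts every frame number by the frame the render actually started at, after which the sequences from several machines can be merged into one folder.

```python
import re
from pathlib import Path

# Matches e.g. "Image_00042.png" -> prefix "Image_", number "00042", extension ".png".
FRAME_RE = re.compile(r"^(?P<prefix>.*?)(?P<num>\d+)(?P<ext>\.\w+)$")


def renumber_sequence(folder, start_frame, digits=5):
    """Shift frame numbers in `folder` so that frame 00000 becomes `start_frame`.

    Hypothetical helper: assumes the exported sequence starts at 00000.
    """
    if start_frame < 0:
        raise ValueError("start_frame must be >= 0")
    frames = []
    for path in Path(folder).iterdir():
        match = FRAME_RE.match(path.name)
        if match and path.is_file():
            frames.append((int(match.group("num")), path, match))
    renamed = []
    # Rename the highest frame first so a new name never overwrites
    # a file that has not been processed yet.
    for num, path, match in sorted(frames, key=lambda item: item[0], reverse=True):
        new_name = f"{match.group('prefix')}{num + start_frame:0{digits}d}{match.group('ext')}"
        path.rename(path.with_name(new_name))
        renamed.append(new_name)
    return sorted(renamed)
```

For the example above, the second machine's folder would be processed with `renumber_sequence(folder, 101)`, turning `Image_00000.png` into `Image_00101.png`.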
Another option would be for someone to write a macro script that joins selected render projects from the render manager, avoiding the need to deal with the image sequences in the render history. This should be feasible, because when I close Cheetah and add a folder with a render from a different system to the render history, Cheetah recognizes it when opened again. That would open the possibility of simple render farms. People on this forum who would like more render power and are ready to offer their machines for others' projects as well could set up a “Cheetah3D Render Farm” thread right here on the forum, exchange Dropbox links and have a go at happy render farming.
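With a hand-made render farm, someone has to decide who renders which frames. Here is a minimal Python sketch, assuming each machine's relative speed is known (the weights are user-supplied guesses, for example from timing one test frame on each Mac); `split_frames` is a hypothetical helper, not part of Cheetah3D.

```python
def split_frames(first, last, weights):
    """Divide frames first..last (inclusive) into contiguous ranges, one per machine.

    weights: relative speed of each machine; a faster machine gets more frames.
    A range with end < start means that machine gets no frames.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total_frames = last - first + 1
    total_weight = sum(weights)
    ranges, start, acc = [], first, 0.0
    for i, w in enumerate(weights):
        acc += w
        end = first + round(total_frames * acc / total_weight) - 1
        if i == len(weights) - 1:
            end = last  # the last machine always finishes the sequence
        ranges.append((start, end))
        start = end + 1
    return ranges
```

Each returned range maps directly to the start and end frame set in that machine's render settings.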

A third possibility for hardware acceleration would be GPU support. There are commercially successful GPU-accelerated path tracers out there like Octane Render, and Cycles does it for free. But those use nVidia's CUDA API, and since nVidia graphics are not common on Macs, that's no option. There are also OpenCL path tracers like LuxRender and, recently, Cycles, but the wise people at Apple decided, after inventing OpenCL, not to develop it beyond v1.2 and now to deprecate it with macOS 10.14.
So the most successful hardware acceleration strategies have to be discarded, but there is one option left: Apple’s Metal API. The advantage here is that it also works with the Intel Iris graphics included in most Intel Macs (except for Xeon-based machines), so Cheetah could get a boost even on dual-core i3 machines like the MacBook Air. Metal also works with eGPUs connected via Thunderbolt 3. I think Cheetah currently uses OpenGL for the 3D viewport, and as Apple moves away from that, replacing viewport rendering with a Metal-based solution would open the possibility of real-time rendered previews, which would be a really great improvement. But moving to Metal means a complete overhaul of the render engine that would probably take years. Nonetheless, since Apple has decided Metal is the future, coders will have to follow, and therefore I wish that with version 8 Falcon would move to the Metal API with GPU support. I think this would mean dropping support for pre-Sierra OS versions, but at least Macs from 2012 onward would work with it.


While that is more of a long-term prospect, let’s now take a look at possible software accelerators, most of which are easier to implement.
There are two standard methods already implemented in Falcon: direct light sampling and (multiple) importance sampling. When a ray hits a diffuse surface, instead of just recasting a random ray, the renderer tests whether a light source is directly visible; if so, its contribution can be calculated directly, which is much faster. If several lights are visible, importance sampling deals with those, and multiple importance sampling combines light samples with BSDF samples so that neither strategy's noise dominates.
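How the weighting works can be shown on a toy problem. The Python sketch below is not Falcon's code (its internals aren't public); it is a one-dimensional stand-in where strategy A plays the role of light sampling and strategy B the role of BSDF sampling, combined with Veach's power heuristic. Each sample is down-weighted where the other strategy would have produced it with higher probability, so the combined estimate stays unbiased while each strategy covers the cases it is good at.

```python
import math
import random


def power_heuristic(pdf_this, pdf_other, beta=2.0):
    """Veach's power heuristic: weight for a sample drawn by the 'this' strategy."""
    a, b = pdf_this ** beta, pdf_other ** beta
    return a / (a + b) if a + b > 0.0 else 0.0


def mis_estimate(f, n_samples, seed=1):
    """Toy one-dimensional MIS: integrate f over [0, 1] with two sampling strategies.

    Strategy A (stand-in for light sampling): uniform, pdf_a(x) = 1.
    Strategy B (stand-in for BSDF sampling): pdf_b(x) = 2x, drawn as sqrt(u).
    One sample from each strategy per iteration, combined with MIS weights.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xa = rng.random()
        pa, pb = 1.0, 2.0 * xa
        total += power_heuristic(pa, pb) * f(xa) / pa
        xb = math.sqrt(rng.random())
        pa, pb = 1.0, 2.0 * xb
        if pb > 0.0:  # pdf_b is zero only at x = 0, where strategy A gets full weight
            total += power_heuristic(pb, pa) * f(xb) / pb
    return total / n_samples
```

With `f = lambda x: x * x`, the estimate converges to the exact integral 1/3 as the sample count grows.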
There is another great acceleration method available in Falcon: adaptive sampling. Here the noise level is evaluated, and if an area is considered clean, it is excluded from further sampling so that the renderer concentrates on the noisy parts. Currently this feature is not adjustable, and it also struggles with caustic noise, which comes with blurred reflections, so there is still room for improvement.
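A common way to decide that a pixel is "clean" is to track the variance of its samples and stop once the estimated relative error drops below a threshold. Falcon's actual criterion isn't documented in this thread, so the Python sketch below is an assumption; `threshold`, `min_spp` and `max_spp` are illustrative names for exactly the kind of adjustable settings wished for above, and `sample_pixel` stands in for tracing one path.

```python
class PixelStats:
    """Running mean and variance of one pixel's samples (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def relative_error(self):
        """Standard error of the mean divided by the mean (inf if undefined)."""
        if self.n < 2 or self.mean == 0.0:
            return float("inf")
        variance = self.m2 / (self.n - 1)
        return (variance / self.n) ** 0.5 / abs(self.mean)


def adaptive_render(sample_pixel, num_pixels, threshold=0.02, min_spp=16, max_spp=1024):
    """Keep sampling only the pixels that are still noisy.

    A pixel stops once it has at least min_spp samples and its relative error is
    below threshold, or when it reaches max_spp. Pixels whose mean stays 0 run to max_spp.
    """
    stats = [PixelStats() for _ in range(num_pixels)]
    active = set(range(num_pixels))
    while active:
        for p in sorted(active):
            s = stats[p]
            s.add(sample_pixel(p))
            if s.n >= max_spp or (s.n >= min_spp and s.relative_error() < threshold):
                active.discard(p)
    means = [s.mean for s in stats]
    spp = [s.n for s in stats]
    return means, spp
```

A flat, noise-free pixel drops out after `min_spp` samples, while a noisy one keeps receiving samples. Caustic noise is hard for such a test because a few very bright samples can arrive late, after the pixel already looked converged.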

A quite similar approach would be denoising. The great advantage of render denoising is that the renderer “knows” about geometry and material detail and can include that in the algorithm, which makes it vastly more effective than anything provided with digital cameras or image editors. A denoiser like the one implemented in Cycles can easily cut render times down to 25% without visible issues. I see that most of the other path tracers are including this feature now, so this is my prime wish for this thread: please let’s get a denoise function, because it currently is the most effective acceleration strategy and would work with all hardware/software combinations, so everyone would profit.
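The "knowledge" comes from feature buffers: besides the noisy color, the renderer can output practically noise-free auxiliary passes such as albedo or normals, and the filter only averages neighbors whose features match, so edges survive. The Python sketch below is a simple cross (joint) bilateral filter on a single channel guided by one such buffer. It is a much cruder stand-in for what production denoisers like the one in Cycles do, and the parameter values are arbitrary.

```python
import math


def cross_bilateral(color, feature, radius=2, sigma_s=1.5, sigma_f=0.1):
    """Denoise `color` (2D list of floats) guided by a noise-free `feature` buffer.

    Neighbors are weighted by screen distance (sigma_s) and by how similar their
    feature value is (sigma_f), so pixels across a material edge barely contribute.
    """
    h, w = len(color), len(color[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, weight_sum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    spatial = (dx * dx + dy * dy) / (2.0 * sigma_s ** 2)
                    df = feature[ny][nx] - feature[y][x]
                    guide = (df * df) / (2.0 * sigma_f ** 2)
                    wgt = math.exp(-(spatial + guide))
                    total += wgt * color[ny][nx]
                    weight_sum += wgt
            out[y][x] = total / weight_sum  # the center pixel always has weight 1
    return out
```

Feeding it a noisy image whose left and right halves have different albedo smooths the noise within each half while keeping the boundary between them sharp.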

There are other tricks that offer minor improvements, like tiling or branching. While the Cheetah renderer works with render tiles/buckets, starting at the top left and working its way to the bottom right, the Falcon renderer does progressive rendering across the whole frame (which is better for previewing). The talk on the forums of other renderers is that properly sized tiles utilize the cores/threads better and so have a speed advantage. If that is really the case, I would like to see an option for tiled rendering in Falcon.
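Bucket rendering itself is simple to organize: cut the frame into tiles and let a pool of worker threads pull them from a queue, so a thread that finishes early just grabs the next tile. Here is a minimal Python sketch under those assumptions, where `shade` is a hypothetical stand-in for the per-pixel render. Because of Python's GIL this only illustrates the scheduling; it won't actually speed up pure-Python shading.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tiles(width, height, tile_size):
    """Split the frame into buckets, top left to bottom right; edge tiles may be smaller."""
    tiles = []
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            tiles.append((x0, y0, min(tile_size, width - x0), min(tile_size, height - y0)))
    return tiles


def render_tiled(width, height, tile_size, shade, workers=4):
    """Render every tile on a thread pool; each worker writes only its own pixels."""
    image = [[0.0] * width for _ in range(height)]

    def render_tile(tile):
        x0, y0, tw, th = tile
        for y in range(y0, y0 + th):
            for x in range(x0, x0 + tw):
                image[y][x] = shade(x, y)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(render_tile, make_tiles(width, height, tile_size)))  # re-raises errors
    return image
```

Tile size is the main tuning knob: tiles that are too large leave threads idle near the end of the frame, while tiles that are too small add scheduling overhead.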

Some materials, especially those with blurred reflections, are difficult to render and show very slow noise convergence. Some renderers offer branched path tracing, which means that a number of samples can be specified per material category, so that difficult materials get a higher sample count for more even convergence. However, tweaking a branched path tracer is far more challenging than tweaking a simple one. I wonder if obstacles like a reflection blur value ≠ 0 could be registered internally and countered by activating branching automatically? However it’s done, it helps cut down render times, and I would like to see an implementation.
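The automatic branching idea could look roughly like the following. This Python sketch is hypothetical: the material keys (`diffuse`, `reflection`, `reflection_blur`) and the boost factor are invented for illustration and are not Falcon's material parameters. At the first hit, each shading component gets its own sample count, and blurred reflections automatically get more.

```python
def branched_sample_counts(material, base=1, glossy_boost=4):
    """Hypothetical heuristic: per-component sample counts at the first hit."""
    counts = {"diffuse": base if material.get("diffuse", 0.0) > 0.0 else 0, "glossy": 0}
    if material.get("reflection", 0.0) > 0.0:
        blurred = material.get("reflection_blur", 0.0) != 0.0
        # Blurred reflections converge slowly, so they get extra samples automatically.
        counts["glossy"] = base * glossy_boost if blurred else base
    return counts


def shade_first_hit(counts, sample_component):
    """Average each component over its own samples, then add the components."""
    total = 0.0
    for component, n in counts.items():
        if n > 0:
            total += sum(sample_component(component) for _ in range(n)) / n
    return total
```

Because every component is averaged over its own samples, raising a count changes only the noise, not the expected brightness.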

Another problem for path tracers is indirect lighting: when no part of the frame is lit directly, the fast direct sampling method does not apply, for example when rendering an interior scene where the light only comes in through a window from an exterior HDRI or skylight.
One possible solution is light portals, which guide the sampling algorithm towards the light. In Cycles this works very well and speeds up render times significantly. On the other hand, there are newer solutions like visibility maps, which detect openings automatically. This would definitely be an advancement I’d like to see in Falcon.
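The core of portal sampling is small. Instead of picking a random direction, the renderer picks a random point on the window rectangle and converts that choice into a probability per solid angle, so the environment light seen through the portal can be weighted correctly. This Python sketch assumes a rectangular portal given by one corner and two edge vectors; `sample_portal` is a hypothetical helper, not Cycles' code. In a real renderer this sample would also be combined with BSDF sampling via MIS.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def sample_portal(p, corner, edge_u, edge_v, u1, u2):
    """Pick a point on the portal for random numbers u1, u2 in [0, 1).

    Returns (unit direction from p, pdf per solid angle). The uniform area pdf
    1/area is converted with dist^2 / cos(theta_portal).
    """
    q = tuple(corner[i] + u1 * edge_u[i] + u2 * edge_v[i] for i in range(3))
    d = _sub(q, p)
    dist2 = _dot(d, d)
    if dist2 == 0.0:
        return (0.0, 0.0, 0.0), 0.0
    dist = math.sqrt(dist2)
    direction = tuple(c / dist for c in d)
    n = _cross(edge_u, edge_v)
    area = math.sqrt(_dot(n, n))  # |edge_u x edge_v| is the portal area
    cos_portal = abs(_dot(n, direction)) / area
    if cos_portal == 0.0:
        return direction, 0.0  # grazing: the portal has no extent seen from p
    return direction, dist2 / (area * cos_portal)
```

The environment radiance fetched along `direction` would then be divided by the returned pdf, so all the samples land on the window instead of on the walls.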
Bidirectional path tracers find ways to the light by design, but that would be another render engine altogether rather than an addition. When playing around with LuxRender BiDir I can see some advantages (like with caustics), but quick convergence is not among them, so I’m not requesting it for Falcon.
Then there is MLT (Metropolis Light Transport), a sophisticated algorithm that can sort out effective paths. Again, LuxRender has an implementation of MLT with a path tracer, but it isn’t faster than Cycles, so I don’t know if it would help with Falcon.
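The core of Metropolis sampling is short. The sampler mutates the current sample and accepts the mutation with probability min(1, f(new)/f(old)), so once a bright path is found, its neighborhood gets explored instead of being lost again. The Python sketch below is a one-dimensional toy in the spirit of primary-sample-space MLT, heavily simplified. `f` stands in for a path's brightness, and the step sizes are arbitrary assumptions.

```python
import random


def metropolis_samples(f, n, large_step_prob=0.3, sigma=0.05, seed=0):
    """Generate n samples on [0, 1) distributed proportionally to f (toy Metropolis).

    Both mutations are symmetric, so the plain ratio f(y) / f(x) is the correct
    acceptance probability.
    """
    rng = random.Random(seed)
    x = rng.random()
    fx = f(x)
    samples = []
    for _ in range(n):
        if rng.random() < large_step_prob:
            y = rng.random()                        # large step: fresh independent sample
        else:
            y = (x + rng.gauss(0.0, sigma)) % 1.0   # small step: perturb the current path
        fy = f(y)
        accept = 1.0 if fx == 0.0 else min(1.0, fy / fx)
        if rng.random() < accept:
            x, fx = y, fy
        samples.append(x)  # a rejected mutation repeats the current sample
    return samples
```

For a "scene" where f is 1 on the first tenth of the domain and 0.01 elsewhere, the chain should spend about 0.1 / (0.1 + 0.9 × 0.01) ≈ 92% of its samples in the bright region. That concentration is the strength of MLT, but it comes at the cost of correlated samples, which is one reason it does not automatically converge faster.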

Generally, anything that guides rays towards the light helps. When I read the papers the geeks publish at SIGGRAPH and elsewhere (not that I understand the math), it looks like everyone is now trying to combine bidirectional path tracing with photon mapping, because they complement each other well. This is already implemented in LuxRender as BidirVCM, but in my opinion not very convincingly (at least concerning render speed). Now I wonder what a combination of a unidirectional path tracer like Falcon or Cycles with photon mapping would look like. Photon mapping is already here in the Cheetah renderer. It produces a light map not of the frame but of the whole scene, including the hidden parts. Couldn't such a map be used with importance sampling to preferentially sample the bright parts? A light cache could also refine such a map and increasingly improve sampling over time. But again, this is nothing easy to implement, more a speculation about long-term development.
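The speculation above can be made concrete with a discrete distribution built from a coarse light map. The Python sketch below assumes the scene has been divided into cells, each with a photon energy taken from such a map. The cell layout and the helpers `build_guide` and `sample_cell` are hypothetical, and Cheetah3D does not expose its photon map this way. Bright cells are picked more often, while a uniform fraction keeps every cell reachable, which is what keeps such guided sampling unbiased.

```python
import bisect


def build_guide(cell_energy, uniform_mix=0.1):
    """Turn per-cell photon energy into sampling probabilities and a CDF.

    uniform_mix keeps every cell's probability above zero, so regions the photon
    map missed can still be sampled and the estimator stays unbiased.
    """
    n = len(cell_energy)
    total = sum(cell_energy)
    probs = [(1.0 - uniform_mix) * (e / total if total > 0 else 1.0 / n) + uniform_mix / n
             for e in cell_energy]
    cdf, running = [], 0.0
    for p in probs:
        running += p
        cdf.append(running)
    return probs, cdf


def sample_cell(probs, cdf, u):
    """Pick a cell for a random number u in [0, 1); returns (index, probability)."""
    i = bisect.bisect_right(cdf, u * cdf[-1])
    i = min(i, len(probs) - 1)  # guard against floating-point round-off at the end
    return i, probs[i]
```

A sample drawn this way must be divided by the returned probability (times the pdf of the point chosen inside the cell). A light cache that keeps updating `cell_energy` during rendering would make the guide better over time.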


That’s basically about it.
Apart from Falcon, I have Blender with Cycles, LuxRender, Mitsuba and Appleseed installed (Radeon ProRender doesn’t work with my nVidia card), and from those I know what’s currently possible with freeware path tracers on a Mac.

My conclusion is:
Integrated denoising would be the most effective tool to speed up render times.
In the long run there may be no way around porting Falcon to Metal, which would enable GPU support and maybe live previews.

Now I’m curious what everyone else thinks about the future of Falcon!

:smile:
 
#2
Faster renders, faster workflow, more renders per day

First, thank you Misoversaturated for your many (446 in two years) helpful posts to this forum. This post is mostly over my head technically, but it’s interesting to get an idea of the overall state of the art from a C3D user with a deep knowledge of alternative renderers and recent SIGGRAPH papers. From what you have described, it sounds like a denoiser would be the most practical improvement, for two reasons:

1. Easier (?) for Martin to integrate into the current and future versions of Falcon
2. Functional on all Macs, regardless of OS version, CPU, or GPU

I have a couple of related suggestions, so I will add them to this thread for the benefit of current users who have the same basic issue: getting more renders done per unit of time. For simplicity’s sake I have stuck with C3D as my sole 3D software. As a hobbyist-level artist making still images only, suffice it to say I do not have cutting-edge hardware. I have never stopped to count them, but I am pretty sure my i5 does not have 14 cores.

1) An item that has been on the Wish List for a decade is the ability to select a small area of a completed render and re-render only that area, leaving the rest as is. Many times a render is 99% perfect but you just need to tweak one detail, sometimes repeatedly. Bryce had this feature 20 years ago, and it’s the one thing that still bugs me about C3D. There is a workaround using a solid white adjustable frame, but then you have to use a picture editor to combine the renders. Clunky. Slow. If it were built in, it would effectively accelerate the rendering process, which includes fine-tuning the scene.

Thanks again to Hiroto:
http://www.tres-graficos.jp/blog/files/article.php?id=72

https://www.cheetah3d.com/forum/showthread.php?t=10944
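Until such a tool is built in, the combining step that otherwise needs a picture editor can be scripted. Here is a minimal Python sketch, assuming both renders are already loaded as grids of pixel values and that the position of the re-rendered area is known. Loading and saving images is left out, `paste_region` is a hypothetical helper, and an image library would normally provide its own paste function.

```python
def paste_region(base, region, x0, y0):
    """Return a copy of `base` with the re-rendered `region` placed at column x0, row y0."""
    if (x0 < 0 or y0 < 0 or y0 + len(region) > len(base)
            or x0 + len(region[0]) > len(base[0])):
        raise ValueError("region does not fit inside the base image")
    out = [row[:] for row in base]  # leave the original render untouched
    for dy, row in enumerate(region):
        out[y0 + dy][x0:x0 + len(row)] = row
    return out
```

Repeated tweaks of the same detail then only require re-rendering the small area and pasting it again.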

2) Shifting paradigms, there is a great script that doesn’t produce more renders per hour, but rather more renders per day. This is another simple feature that was in Bryce 20 years ago: loading a number of files to be rendered into a folder and letting the program render them one after another, each with its own settings, in the background or when your Mac is otherwise not in use.

Tomas made a script using the built-in AppleScript and Automator to render a queue of files automatically, so you can fill a folder with .jas files and it will render them during downtime or overnight. It was posted two years ago, and I’m not sure whether it has been affected by the switch to Falcon. Maybe some of you can test it and report back.

https://www.cheetah3d.com/forum/showpost.php?p=102503&postcount=10
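The bookkeeping of such a queue is independent of the tool that drives Cheetah3D. The Python sketch below only decides which .jas files are still left and skips those already rendered, so an interrupted overnight run can simply be restarted. `render_one` is a hypothetical hook where the actual rendering would be triggered (for example via an AppleScript like Tomas's), and the `.png` output naming is also an assumption.

```python
from pathlib import Path


def render_queue(folder, render_one, output_suffix=".png"):
    """Render every .jas file in `folder` in alphabetical order, skipping finished ones.

    render_one(scene_path, target_path) is a user-supplied (hypothetical) hook
    that renders one scene and writes the image to target_path.
    """
    done = []
    for scene in sorted(Path(folder).glob("*.jas")):
        target = scene.with_suffix(output_suffix)
        if target.exists():
            continue  # already rendered in an earlier run
        render_one(scene, target)
        done.append(scene.name)
    return done
```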
 
#3
Thanks Joel for your feedback!

You're totally right: it's not only the fancy features requiring lots of coding that can reduce render times; relatively simple stuff like an area render tool can also do the trick.
Being able to render into a new layer on top of the last render (saving the effort of compositing different renders) would be an especially huge time saver.

Also, the batch render proposal is spot on and would complement my render farm suggestion.
I tried Tomas' Automator workflow and it works on my system (Mac OS 10.9.5 with Cheetah 7.3b1); there are some warning beeps, but it proceeds nonetheless and all the renders appear in the specified folder.

While the batch render and render farm stuff should be scriptable (though IIRC the scripters didn't react to the last script requests on the forum), an integrated area render tool referencing previous renders probably isn't, so we'll have to wait until Martin finds the time for it.


I'm somewhat concerned that so few Falcon renders by users appear in the forum gallery, and I think the relatively slow render speed might be a reason.
But I don't really know; that's why I'm asking :smile:
 

uncle808us

#4
I don't use Falcon; it always looks grainy to me.
It makes the fans run on my quad-core. The C3D renderer never does.
It always takes much longer.
Soft shadows are not worth listening to the fans. :smile: I rarely use caustics.
I want soft shadows in the C3D renderer. Please, if you're listening, Martin.
 
#5
Thanks Uncle for your reply, good to know.

You're right that in most cases the Cheetah renderer is faster, especially when photo-realism is not the goal.

I'd like to see soft skylight shadows in Cheetah too, given that there already is a sample amount property.

Cheers!
 
#6
While I was daydreaming about fancy new features for Falcon, reality struck at WWDC 2018, with Apple announcing the deprecation of OpenGL and OpenCL.
Martin has already commented, and though I'd like to see Cheetah move towards Metal, that's apparently a very tedious undertaking that would take a long time to accomplish.
And that probably means no free resources to add more features like a denoiser.
:frown:
 
#7
My Mac mini (Late 2012) uses integrated Intel HD 4000 graphics, which has only 16 GPU cores; its computing power is about the same as an Intel Core i5 2.5 GHz CPU.
Even if Martin ported OpenGL to Metal, the speed would not increase as much as we expect.
 
#8
The OpenGL → Metal port is a different thing. That's for the real-time preview viewport.

Metal, however, also supports compute and could be used for hardware path tracing. In theory, even if your GPU is only as fast as your CPU at this, you could use both simultaneously to render twice as fast.
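The arithmetic behind "twice as fast": if the CPU renders r_cpu samples per second and the GPU r_gpu, splitting a frame's W samples in proportion to those rates lets both devices finish together after T = W / (r_cpu + r_gpu). With equal rates that halves the render time. Here is a minimal Python sketch, assuming the rates were measured beforehand; `split_work` is a hypothetical helper.

```python
def split_work(total_samples, rates):
    """Split samples across devices in proportion to their throughput (samples per second).

    Returns (samples per device, time until the slowest device finishes).
    """
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")
    total_rate = sum(rates)
    shares = [int(total_samples * r / total_rate) for r in rates]
    shares[-1] += total_samples - sum(shares)  # give any rounding remainder to the last device
    finish_time = max(s / r for s, r in zip(shares, rates))
    return shares, finish_time
```

For two equally fast devices the finish time is half that of one device alone. A GPU three times faster than the CPU would get three quarters of the work.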

I'm on an iMac Pro with a Vega 64 and just finished a 20-second Falcon video render that took about 2.5 days to complete at 1080p, 60 fps and 64 samples (still very noisy). I wanted to render at 4K with 256 samples, but that would have taken half a year x). What I have right now is OK, but if we decide "We want that", I'll have to render at a higher resolution, and I can't just deliver in half a year with my workstation blocked the whole time. I'll probably write some AppleScripts and Cheetah scripts that distribute rendering across multiple Macs, but I'll need a lot of them.

I think it should be possible to do better, given that this machine can crank out 11 TFLOPS of single-precision performance. That's about 27 times more than, for example, an HD 4000. If we could leverage that, I could render high-quality 4K videos in a few days instead of half a year :)

Even on the go, when I work on my 15" MacBook Pro it would be great to be able to crank out preview renders in a second instead of 10 seconds.
 
#9
I have been experimenting with DaVinci Resolve lately and it looks promising.
There is a "super scale" functionality and it works quite well, same with the "optical flow" retiming.

I haven't used it on Falcon renders yet, but with Blender renders it was possible to turn 1920x1200 @ 15 fps into 2880x1800 @ 30 fps.
DaVinci Resolve is Metal-native and will run fast on your Vega; it's also free in the App Store.

I wasn't able to get Apple ProRes output working, though; whatever color management I tried, the result was always too dark.
The geeks in the forums recommend NOT using QuickTime formats.
The TIFF image sequence looks good, though that workflow is more tedious.
 

Swizl

#10
I agree with the posts in this thread. Nice information you've provided, as well as what some others have posted.

I'm wondering if the results are dark because of a linearized gamma? I know that Modo changed its default to a linear gamma curve in recent versions, and it was/is causing some confusion for some people. There are options to switch to sRGB or others, like the Nuke default. Do you know if there are any linear image settings or gamma settings that could be throwing the curve off?
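The size of that effect is easy to check. If renders are stored as linear values but displayed as if they were already sRGB-encoded (or the sRGB decode is applied twice), midtones come out much darker. The standard sRGB transfer functions (IEC 61966-2-1) are shown below in Python. Whether this is what actually happens in the Resolve export is only a guess.

```python
def linear_to_srgb(c):
    """Encode a linear-light value in [0, 1] with the sRGB transfer curve."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def srgb_to_linear(v):
    """Decode an sRGB-encoded value in [0, 1] back to linear light."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4
```

A linear mid grey of 0.18 should be stored as roughly 0.46 after encoding. If the encoding step is skipped, it is shown as 0.18 instead, which is the typical "always too dark" look.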

You may also want to look at applying a LUT in post for images that end up too dark. DaVinci may have some built in already. If you download Corona Renderer, it comes with a bunch of them that work in Photoshop too. https://filtergrade.com/use-3d-luts-photoshop/

 
#12
I tried DaVinci Resolve 15 on my video to upscale it from 1080p to 4K with noise reduction.

What settings are you using? It doesn't look significantly better here. Perhaps there's just too much noise?
 