HitFilm using about half of the resources available

AramM Website User Posts: 154 Enthusiast
edited February 2016 in HitFilm

I would not call this a bug, but I am puzzled about why Hitfilm does not seem to use anywhere near my computer's full power. As you can see in the screencap below, the CPU has not even hit 50% and the GPU has not gone over 50%. There is still plenty of RAM available and I am using SSDs (not the fastest, but still pretty fast).

Any idea of what could be the problem? 

Also worth noting: Hitfilm is never smooth on my computer, so it is not as if it is leaving resources idle because it is already performing amazingly well. To get smooth playback I often have to drop to 1/4 resolution.

Comments

  • NormanPCN Website User Posts: 3,947 Enthusiast
    edited February 2016

    I cannot really comment about specifics since you have given no specifics. Showing load percentages without any specifics does not really mean anything.

    Some functions are single threaded. To simplify discussion equate an execution thread to a CPU "core".

    For example, decoding a video stream is mostly single threaded. If that single thread is not fast enough to decode the stream in real time then you will get stuttering playback even with only 12-15% CPU load showing. 12.5%, or 1/8th, is roughly equal to a single 100% fully loaded thread on a common 4 core CPU with hyperthreading (8 logical cores).
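To put numbers on that, here is a quick sketch of the arithmetic, assuming the common 4-core/8-thread CPU mentioned above:

```python
# One fully loaded thread on a 4-core CPU with hyperthreading
# (8 logical cores) shows up as roughly 1/8th of the load meter.
logical_cores = 8
one_thread_load = 100.0 / logical_cores
print(one_thread_load)  # 12.5
```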

    Now if you are compositing video with various stock media clips and they are all blending together and thus all being displayed at the same time one will see the CPU load go way up. Roughly 3x. Basically one thread per simultaneous video stream.

    Computer generated graphics that are GPU bound will account for another single CPU thread. Something has to tell the GPU what to do and it is the CPU that does this. Generally here, the CPU is waiting (0% load) for the GPU to finish a task before it tells the GPU what to do next. A faster GPU here lets that CPU thread do more work (less 0% waiting) and thus register a higher CPU load percentage. The GPU has some idle time (0%) while the CPU figures out what to do next.

    Now take all these various functions operating on various threads and they must be organized and things done in an orderly manner. Any stage in that pipeline sequence that stalls out some can lower overall throughput which can affect the global CPU load you see with your utility.

    I have done particle simulator items, with animated particle textures, where a single frame takes around 5 seconds to compute. No video. Pure CGI. From memory I remember seeing about 15% CPU load and 40-50% GPU load (8 logical core CPU). In this case there was a single threaded function somewhere in the particle sim that was bottlenecking overall throughput.

    I have seen pure CGI at that same mid teens CPU load but my GPU in the 90%-100% range. For simple load meters anything above 90% is basically considered fully loaded.

    In your screenshot you show something during a file export render. That adds something else to the pot. Hitfilm first generates the video frames and then passes those to the file encoder. E.g. in some circumstances PNG image sequence output can lower overall utilization.

    None of what I have said is any answer per se, but just concepts of the type of things that go on and that even an app that is working "100%" may not show near 100% on a load meter.

    So why isn't everything multi-threaded all the time and everywhere? Anyone with a simple answer to that can make money as a gun for hire. 

  • NormanPCN Website User Posts: 3,947 Enthusiast
    edited February 2016

    While vacuuming my carpet I thought of a good example. Consider the following simplified argument. Our project is 30fps.

    You have a 30fps video file; all by itself, with no effects, you get a 12.5% CPU load during playback.

    You have a particle simulator layer that by itself has a load of 12.5% and computes frames at 1fps. Not real time.

    So will you get a load of 25% (12.5+12.5) with the two running simultaneously on screen? No, because the particle simulator in this example is only capable of computing frames at 1fps and the video decode load was 12.5% at 30fps. At 1fps it will effectively use 1/30th the CPU it was using at 30fps.

    This is a simple example of how you can have many tasks, none of which fully loads your CPU meter, yet adding one item can still slow the others down.
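The arithmetic in that example can be sketched like this (the numbers are the hypothetical ones from the example, not measurements):

```python
decode_load_at_30fps = 12.5  # video decode thread, playing at 30 fps
particle_load = 12.5         # particle sim thread, maxing out at 1 fps

# The particle sim caps overall playback at 1 fps, so the decoder
# now only does 1/30th of the work it did at 30 fps.
decode_load_at_1fps = decode_load_at_30fps * (1 / 30)
combined = decode_load_at_1fps + particle_load
print(round(combined, 2))  # 12.92 -- nowhere near 12.5 + 12.5 = 25
```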

  • AramM Website User Posts: 154 Enthusiast

    That was an excellent and comprehensive answer, @NormanPCN

    The screencap is from a system with an i7-4710HQ and a GTX 970M. As you noticed, it was taken while rendering a short video where I did some chroma keying and applied a few effects, but no 3D. The resource usage pattern is kind of typical for Hitfilm on my system, though.

    I would guess it is what you say: the bottleneck must be in the coordination of the multiple threads, because none of the cores has gone much over 60%.

    I was looking at it today and was quite confused because when I use Resolve it uses almost everything my computer has available.

    In any case, thank you for the lesson. I really appreciate it.

    Also, man, I hope they optimize the code further and Hitfilm starts running better on my computer soon.

  • chibi Website User Posts: 257 Just Starting Out

    I posted a similar thread some time ago related to encoding video.
    HF is slower because it's not fully multi-threaded, compared to Vegas, which uses 100% CPU.

  • NormanPCN Website User Posts: 3,947 Enthusiast
    edited February 2016

    Whew! I'm glad you picked up on some of the concepts I was trying to get across.

    Yes, Hitfilm does seem to have more overhead than other editors, even in basic playback. It is common to see playback issues with AVC video when other editors might not. That other editor might have been on the hairy edge of becoming stuttery as well. Who is to say.

    I'm sure FxHome knows all these things, but what to spend development money on? Existing feature performance or new features? Also, getting tricky with non-trivial threaded performance tweaks can create reliability issues that would need to be ironed out. Nobody likes crashes, and increased frequency of them is a step backwards.

    In the recent update FxHome had some real performance boosts in some particle simulator situations. I had a project that used those features and with the changes at Half preview it could do playback at realtime speed and close to realtime in Full. Previously it was not even close.

    I've seen various items with pure CGI stuff where the CPU is just one loaded thread which is fine but it would be nice to see the GPU pegged in those situations. Often it is not. I believe the GPU load % reported by utilities is a shader utilization which should be applicable to 2D effects functions.

  • chibi Website User Posts: 257 Just Starting Out

    "what to spend development money on? Existing feature performance or new features?"

    When the product is mature, performance and bugs.
    I hate it when devs add new things and leave old issues untouched. Then the new release comes around, and the new things that were added previously have issues that also don't get fixed. Vicious cycle.
    Tweaking performance and bugfixing is boring to do but sometimes some software needs them desperately.
    With HF, I think encoding speed is a major issue. Using only half of the CPU is really slow. The difference between an 8-hour encode and a 4-hour one is a lot.

  • NormanPCN Website User Posts: 3,947 Enthusiast

    Hard to do something about file encoders. FxHome/Hitfilm does not write the encoders. Neither does Sony/Vegas and likely most others. These things are usually licensed.

    I've only been around Hitfilm since July/August, but they seem to fix a good number of things in releases, and some things have directly affected me, so I actually feel it versus some item on a list.

    In the HF4 initial release I was bitten by the OFX issue. It was hotfixed before Update 1. Update 2 had a regression with the masking change and, while not in the fix list, I am not seeing the same issue in Update 3. Update 3 had a fix for keyframes I had only reported a couple of weeks before. This is a two-month time period. I still have a few outstanding issues reported recently, so we shall see what happens with those. Two are related to new features in HF4.

    Users bitch about the software. It's what we do. Be it performance, lack of features or clumsy UI. Everyone has something and their own priorities. My wishlist items are loaded with clumsy UI type things.

    Bugs are an honest gripe, and even then, there can be arguments about whether something is a "real bug". Yes, even real bugs get prioritized against feature additions. This is planet Earth. You can't have everything. Software must advance to stay in the conversation, or even get into the conversation. Only hindsight tells us about the choices we make.

  • kevin_n Website User Posts: 1,929 Enthusiast

    Does HitFilm not support multiple threads (100% during render/export)?

    That plays a big part for me when choosing a CPU. If it doesn't, then my plan of getting an AMD FX isn't a very good idea.

    Hopefully we get some clarification on this. And I agree that performance > features. But right now I think it's very balanced, if you look at the patch notes.

  • Ady Staff Administrator, HitFilm Beta Tester Posts: 1,436 Staff

    @AramM - as chibi has already stated, he did bring up this question a while back. You can see the discussion in the link below:

    Previous discussion

  • Triem23 Moderator, Website User, Ambassador, Imerge Beta Tester, HitFilm Beta Tester Posts: 18,291 Ambassador

    @NormanPCN that may be the most concise breakdown of how multithreading applies to Hitfilm, and now I need to bookmark this thread so the next time this question comes up I can refer that user here. 

  • chibi Website User Posts: 257 Just Starting Out
    edited February 2016

    "Hard to do something about file encoders. FxHome/Hitfilm does not write the encoders. Neither does Sony/Vegas and likely most others. These things are usually licensed."

    With vegas I'm using the mainconcept encoder.
    What does hf use for h.264?

  • Marcin Website User Posts: 132 Just Starting Out
    edited February 2016

    @NormanPCN

    "So why isn't everything multi-threaded all the time and everywhere?"

    Here is the answer:

    http://9gag.com/gag/amLpoEX/multithreaded-programming-theory-and-practice

    Marcin

  • NormanPCN Website User Posts: 3,947 Enthusiast
    edited February 2016

    "With vegas I'm using the mainconcept encoder.
    What does hf use for h.264?"

    The same Mainconcept AVC encoder. Vegas and Hitfilm both use Mainconcept for decoders and encoders. From what I gather a lot of software uses the Mainconcept suite.

    Vegas does have some non-Mainconcept decoders/encoders. Things like HDCAM SR, Sony AVC and some others.

  • NormanPCN Website User Posts: 3,947 Enthusiast
    edited February 2016

    Hitfilm certainly has room for improvement in computer utilization. This is both good and bad. Good in that a software update can bring seat-of-your-pants performance improvements. Bad in that better hardware gets less of a benefit than you might prefer. I've been thinking about buying a hulking video card (980 or Fury), but for some of the things I do that I want sped up, I wonder what it would get me, as I do not see my current 7950 GPU pegged to the wall.

    Of course it depends on exactly what you are doing. Some things can peg everything full and other things not so much.

    Seeing low CPU utilization in an app like Hitfilm is expected, again depending on what you are doing. The big work in any image editing application is processing the pixels. Processing pixels is an inherently parallel task - an easy multi-thread implementation. If an app uses the GPU to process those pixels, the CPU does not have much to do; the GPU dispatch thread is mostly waiting for the GPU to finish.

    I do see Hitfilm not able to peg the GPU with some things so there is an overhead there and maybe some pipelining needs to be done to overlap and hide some of the overhead. Pipelining is typically not trivial to implement and it has its hazards and performance drawbacks.

    A simple pipeline-like concept that FxHome could implement would make a real-world difference in some types of export renders. PNG image compression is very compute intensive, and therefore slow, and it can slow down the Hitfilm video engine to the speed of the PNG output. If FxHome simply had a small thread pool to buffer a frame and release the video engine to compute the next frame, multiple threads/cores could do the PNG compression in parallel with the video frame computation, increasing utilization and decreasing overall render time. Right now Hitfilm is serial and does one frame/PNG at a time, and the PNG encoder is single threaded. PNG is CPU bound and Hitfilm is commonly GPU bound, so there is typically extra unused CPU power available to the export. This applies to PNG and also EXR. Maybe even AVC/H.264, although given the nature of AVC and temporal compression, that particular encoder is likely buffering frames and immediately releasing the app to do more work (compute the next frame).
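A minimal sketch of that buffered-export idea. `render_frame` and `encode_png` are hypothetical stand-ins, not HitFilm's actual API; the point is only the structure - hand each finished frame to a small thread pool so the engine can start the next frame while compression runs in parallel. (In CPython, real PNG encoders release the GIL during compression, so threads get genuine parallelism for this kind of CPU-bound work.)

```python
from concurrent.futures import ThreadPoolExecutor

def render_frame(i):
    # Stand-in for the (largely GPU-bound) video engine computing frame i.
    return bytes([i % 256]) * 1024

def encode_png(frame):
    # Stand-in for the (CPU-bound, single-threaded) PNG compressor.
    return len(frame)

def export_serial(n_frames):
    # What the post describes as current behavior:
    # one frame, then its PNG, in strict lockstep.
    return [encode_png(render_frame(i)) for i in range(n_frames)]

def export_buffered(n_frames, workers=4):
    # Buffer each finished frame into a small pool of encoder threads,
    # freeing the engine to compute the next frame immediately.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(encode_png, render_frame(i))
                   for i in range(n_frames)]
        return [f.result() for f in futures]
```

Both functions produce the same output; the buffered version simply overlaps the CPU-bound encoding with frame generation.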

  • chibi Website User Posts: 257 Just Starting Out

    If both are using mainconcept HF should be hitting 100% cpu usage when encoding :)

  • NormanPCN Website User Posts: 3,947 Enthusiast

    @chibi That is not accurate. The answer is that it always depends.

    I have done CGI that takes 6 seconds per frame. Encoding something like that through the AVC/H.264 encoder will hardly bump the CPU load at all. This is because the AVC encoder is only getting a frame every 6 seconds. The file encoder time is swamped by the application computations.

    The opposite end of the scale is encoding a video file with nothing done to the file. Basically just transcoding. Even with that one cannot directly compare Hitfilm to Vegas for example. Even with both using Mainconcept AVC we do not have control over the precise encoder settings. Some can really affect encode speed.

    I just did an encode test of 1 minute of the same AVC input file in Hitfilm and Vegas 13. Mainconcept AVC, High profile, 16Mbps avg, 25Mbps max, single pass VBR.

    Hitfilm took 1:12 (65% CPU) and Vegas took 2:35 (85% CPU). Vegas certainly had higher CPU utilization but the encode took twice as long. We cannot really read anything into that since we have no idea how Hitfilm and Vegas setup Mainconcept relative to each other. In my mind it seems obvious Hitfilm is choosing some parameters for better encoding speed. Hitfilm may be using a new encoder version. Who is to say.

    All that said, the 65% of Hitfilm seems kinda low. Threads are spending too much time waiting for something to do.

    I think it is reasonable to state that Hitfilm has overhead or inefficiency issues in various ways. Even without source code and profiling, there is a lot of smoke surrounding this, and where there is smoke there usually is fire. Simple playback performance of media files with nothing going on is the most common issue on this forum. For FxHome, things like DNxHD are a solution. My opinion is that it is a crutch. It gets the job done. However, anyone can walk faster/better without a crutch than with one, if you follow my meaning.

    Certainly FxHome could devote time to not adding features and work on performance as you alluded to earlier. Consider this. Hitfilm 4 added Normal maps, a new lighting model and bezier keyframe animations. Without certain features, we are stopped cold from even starting something. With some features added, then we get to have fun bitching about how those features could be better. Maybe we are not running, but at least we are walking.

  • chibi Website User Posts: 257 Just Starting Out

    "The opposite end of the scale is encoding a video file with nothing done to the file. "

    That is what I'm referring to. The same video file with no editing done just pure encoding. In HF it uses 50% cpu. In vegas its hovering above 80%, hitting 100%.
    If they are both using mainconcept for encoding then it has to be the host app that's the bottleneck.

  • kevin_n Website User Posts: 1,929 Enthusiast
    edited February 2016

     @chibi

    HitFilm's H.264 encoder is multi-threaded and will make use of all cores and hyper-threading if available.

    The reason the CPU Usage doesn't reach (or get close to) 100% is because HitFilm does all of its timeline rendering on the GPU.  No matter how simple or complex your timeline is, it gets rendered on the GPU.

    The exporting works like this:  Render frame on GPU; encode it on CPU; render next frame on GPU; encode it on CPU.  And so on.  At periodic intervals a bunch of encoded frames will also be written into the file container on disk.  The encoding threads have to wait on the next frame to be rendered by the GPU before they can actually encode it.

    The speed of the GPU (not just in terms of processing power, but also how quickly the driver can upload textures to the GPU, and read textures from the GPU) will directly impact the utilization of the encoding threads.

    Below is a screenshot of exporting a simple timeline (single video clip) on my system.  The GPU averages about 30% load, while the CPU averaged between 50 and 60% load.  You can see in the graphs that all cores are running threads. 

    I didn't write this; I copied it from user @Hendo here: http://hitfilm.com/forum/discussion/6143/multi-threading-support-when-encoding/p1

    Kevin

  • Triem23 Moderator, Website User, Ambassador, Imerge Beta Tester, HitFilm Beta Tester Posts: 18,291 Ambassador

    Regarding Kevin's repost above, he copied a post by Hendo--a Hitfilm dev--therefore this information should be considered accurate. 

  • NormanPCN Website User Posts: 3,947 Enthusiast
    edited February 2016

    It is true that Hitfilm renders on the GPU. In my trivial example there is effectively nothing to "render", and thus no GPU action, and the GPU utilization indicates this fact. There is a GPU blurp every 5 seconds or so. Hitfilm does do a levels conversion on AVC/H.264 encodes, so that might offer some GPU function - maybe a source of the blurps. Video decode is done by the CPU.

    That said, this trivial setup is an anomaly, since no effects are used. The CPU-decoded frame needs to be transferred to the GPU and then pulled back from the GPU to main memory. In the trivial example that round trip is effectively a "waste", since not a single effect is executed. But nobody designs an app for someone to do nothing, so this setup is probably always done.

    Transferring data/frames across the bus to/from the GPU is done via DMA, and that does not show up on CPU/GPU meters. My installation is PCIe 3.0, so this transfer should be awfully fast. Some overhead that you cannot eliminate, like this transfer, goes back to my previous comment about pipelining.

    I did take my trivial example and added a few standard grading effects: curves, saturation, sharpening. It renders in the same time, the CPU utilization goes up by maybe 5 points or so, and the GPU now kicks in and registers utilization at about 25%.

    Most internet folk put too much into utilization numbers. Most do not know what does and does not register on those meters. The main thing a typical user should care about is how fast it is. Does it play back in real time? How long is a render or RAM preview? Something can be faster at a lower CPU utilization. Just look at my previous post of the Hitfilm/Vegas render: Hitfilm was twice as fast at a much lower utilization. As stated, that is most likely due to our lack of control over exact AVC encoder settings, but it does illustrate greater speed at lower utilization.

    The other thing most don't get is the CPU meter. Most of us have hyperthreaded CPUs. A typical 4-core + hyperthreading CPU counts as 8 cores on a meter, but there are really only 4 physical cores. The other 4 logical cores will only get you maybe a 10% boost. What this means is that 50% can mean the CPU is doing very close to as much work as it possibly can.

    Run a CPU performance test app with 4 threads and measure the wall-clock time to completion. It will show 50% utilization. Now do it with 8 threads. It will register 100%, but the real-world completion time will only be maybe 10% faster. GPUs don't have this physical/logical split like CPUs.
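A toy model of that hyperthreading point (the ~10% figure is the rough number from the post, not a measurement):

```python
physical_cores = 4
logical_cores = 8
ht_gain = 1.10  # hyperthreading buys roughly 10% extra throughput

# Four busy threads: the load meter reads 50%...
meter_reading = 4 / logical_cores * 100

# ...but relative to the machine's true maximum throughput
# (4 physical cores * 1.10), four threads already deliver ~91% of it.
true_utilization = physical_cores / (physical_cores * ht_gain)

print(meter_reading)               # 50.0
print(round(true_utilization, 2))  # 0.91
```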

    I'm okay if the GPU dispatch logic is essentially single threaded, as long as it can hammer the GPU. If the GPU is hammered and the CPU is at 15%, you are still at your limit. Thing is, I have seen HF not hammer the GPU in certain circumstances, most often in the particle simulator. I understand that graphics subsystems have overhead and latencies. This is where restructuring, pipelining, or other techniques to hide/overlap some of this can come into play.

    I remember reading a tech doc from the Sony guys about their new OpenCL GPU architecture used in the Catalyst line. Vegas is an old app, and they fit OpenCL GPU compute into its existing dataflow model. There were better ways to use the GPU, so they built a new subsystem for the new software line, letting Vegas die rather than attempting to update its internals. One thing was that they wanted to work on more than one frame at a time: setting up one frame while another is computing. The compute works the GPU while the setup works the CPU.

This discussion has been closed.