If you remember my last blog entry, I’ve been stress-testing Minefield using Chrome Experiments in an attempt to identify bugs and bottlenecks. Having failed to find any crashes or unusual behaviour that wasn’t performance-related, I was left with the task of investigating profiling tools to find the issue. And then things got messy.
I thought it would be as simple as installing the tool and running the test, and the tool would tell me exactly what was wrong. That assumed two things: 1) that I knew how to use the tool and 2) that the tool knew how to identify problems. It turns out both assumptions were wrong. I had never used a profiler before, and the Windows SDK-supplied xperf had a lot of options. Mozilla was great at giving me the setup, but there was more to learn. First lesson: profilers are not sentient, and their results still require analysis.
The idea behind a profiler is that it generally runs in the background, periodically taking samples of CPU and memory state (or other conditions, like registry accesses or file I/O) while all processes are running. Good ones like xperf can be configured to hook into kernel events and record a stack trace each time one fires, showing which chain of function calls led to it.
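To make the sampling idea concrete, here is a toy sampler in Python. It has nothing to do with how xperf is implemented (xperf hooks into the Windows kernel); it only illustrates the principle of waking up periodically, grabbing the current call stack, and counting how often each stack shows up. The function names and the 5 ms interval are my own choices for the sketch.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval_s, stop_event, counts):
    """Periodically record the call stack of the target thread."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            # Walk from the innermost frame outwards to rebuild the stack.
            stack = []
            while frame is not None:
                stack.append(frame.f_code.co_name)
                frame = frame.f_back
            counts[tuple(reversed(stack))] += 1
        time.sleep(interval_s)


def busy_work(duration_s):
    """Hypothetical hot function: spins the CPU for about duration_s seconds."""
    deadline = time.perf_counter() + duration_s
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def profile(func, *args, interval_s=0.005):
    """Run func(*args) while a background thread samples this thread's stack."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval_s, stop_event, counts),
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop_event.set()
        sampler.join()
    return counts


# Show the stacks that were seen most often during a short busy loop.
counts = profile(busy_work, 0.2)
for stack, hits in counts.most_common(3):
    print(" > ".join(stack), hits)
```

The output is only a pile of counts. Deciding what those counts mean is still up to whoever reads them, which is exactly the lesson I had to learn with xperf.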
This tool was great for discounting some of the theories I had formed earlier using the Windows-default perfmon tool. Because xperf can monitor both CPU and memory while stack-walking, it disproved some of my earlier causal theories and showed me that in most cases, CPU and memory were being eaten up within the DirectDraw drivers.
Figuring out where to find the information I wanted required a fair bit of googling. Even something as simple as viewing a stack walk took a bit of detective work. For those looking for the Coles notes, here’s how to capture both heap and CPU state along with stack walks (courtesy of the Mozilla and MSDN links above):
xperf -on latency -stackwalk profile
call xperf -start heapsession -heap -PidNewProcess "%platform% %siteToGo% %args%" -stackwalk HeapAlloc+HeapRealloc -BufferSize 512 -MinBuffers 128 -MaxBuffers 512
%platform% would be the absolute path for the executable to run (in this case, Minefield)
%siteToGo% is the web site you wish to test (saves on collecting unnecessary data from loading the home page or Google)
%args% would be any other arguments you wish to pass to the application. In this case, I opened Minefield with “-P Testing -no-remote” (no quotes) to use my test profile. The test profile was a very stripped-down profile to eliminate any background work in Firefox which may have thrown in extra data. The complete script can be found here.
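For anyone who would rather drive this from a script than from a batch file, here is a rough Python sketch that assembles the same two commands. It uses only the xperf flags shown above. The function name, the executable path and the URL are placeholders I made up for the example, and nothing is actually run.

```python
import subprocess


def build_start_commands(platform, site_to_go, args):
    """Return the two xperf start commands as argument lists."""
    # xperf receives the whole target command line as a single string,
    # mirroring the quoted "%platform% %siteToGo% %args%" above.
    target = f"{platform} {site_to_go} {args}".strip()
    kernel_cmd = ["xperf", "-on", "latency", "-stackwalk", "profile"]
    heap_cmd = [
        "xperf", "-start", "heapsession", "-heap",
        "-PidNewProcess", target,
        "-stackwalk", "HeapAlloc+HeapRealloc",
        "-BufferSize", "512", "-MinBuffers", "128", "-MaxBuffers", "512",
    ]
    return [kernel_cmd, heap_cmd]


# Placeholder values: substitute your own install path and test page.
commands = build_start_commands(
    r"C:\Minefield\firefox.exe",
    "http://example.com/",
    "-P Testing -no-remote",
)
for cmd in commands:
    # list2cmdline only formats the command for display; nothing is executed.
    print(subprocess.list2cmdline(cmd))
```

Like the batch version, this pastes the path straight into the target string, so an install path containing spaces would need extra quoting that the sketch does not attempt.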
Various options can be specified in the “-on” parameter when calling xperf, as Richard Russell shows, but I went with the latency option suggested above by Mozilla. The resulting files were large (1.7 GB for 5 minutes), so for CPU monitoring alone I might have done well to stick with the “PROC_THREAD” option. Most of the size was heap data, the result of the second call to xperf, but every bit helps. Once finished, I stopped the profiler instances and merged the results:
xperf -stop heapsession -d heap.etl
xperf -d main.etl
xperf -merge main.etl heap.etl result.etl
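The order matters here: stop the heap session, stop the kernel session, then merge the two traces. Here is a small Python sketch of that sequence, again using only the commands listed above. The function names and the dry_run switch are my own inventions for the example, and by default nothing is executed.

```python
import subprocess


def build_stop_commands(heap_session="heapsession", kernel_etl="main.etl",
                        heap_etl="heap.etl", merged_etl="result.etl"):
    """Return the stop-and-merge commands in the order they must run."""
    return [
        ["xperf", "-stop", heap_session, "-d", heap_etl],   # stop heap tracing
        ["xperf", "-d", kernel_etl],                        # stop kernel tracing
        ["xperf", "-merge", kernel_etl, heap_etl, merged_etl],
    ]


def run_all(commands, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in commands:
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            # check=True aborts the sequence if any xperf call fails.
            subprocess.run(cmd, check=True)


run_all(build_stop_commands())
```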
Once the report was compiled, I loaded it and saw some complex graphs. It took a while to get used to where everything was, but eventually I learned the following best practices:
To load symbols (so as to view symbolic function names), enter the symbol paths under “Trace -> Configure Symbol Paths” and then select “Trace -> Load Symbols” to actually load them. At this point the app will probably become non-responsive for a short time while it associates everything, and afterwards you’re left looking at the same graphs. Here’s where the magic happens:
Right click on a graph (“CPU Sampling by Thread” and “Heap Total Allocation Size” work well) and select “Summary Table” or “Simple Summary Table”. This brings up a tree view that lets you drill down from process to thread to DLL to individual function calls. Richard Russell’s blog entry was so helpful in describing how to use the GUI that I’ll link it here as well.
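If the drill-down idea is new to you, it may help to see it as a roll-up of sample weights along the process, thread, DLL, function path. The Python sketch below is my own illustration of that idea, not xperf’s actual implementation, and the sample rows are invented placeholders rather than numbers from my traces.

```python
from collections import defaultdict


def roll_up(samples):
    """Sum sample weights at every level of the process/thread/DLL/function path."""
    totals = defaultdict(int)
    for process, thread, module, function, weight in samples:
        path = (process, thread, module, function)
        # Credit the weight to the function and to each of its ancestors.
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += weight
    return dict(totals)


def print_tree(totals):
    """Print the totals as an indented tree, parents before children."""
    for path in sorted(totals):
        indent = "  " * (len(path) - 1)
        print(f"{indent}{path[-1]}: {totals[path]}")


# Invented placeholder rows: (process, thread, DLL, function, weight).
samples = [
    ("app.exe", "thread 1", "graphics.dll", "draw_a", 40),
    ("app.exe", "thread 1", "graphics.dll", "draw_b", 25),
    ("app.exe", "thread 1", "layout.dll", "reflow", 10),
    ("app.exe", "thread 2", "network.dll", "fetch", 5),
]
print_tree(roll_up(samples))
```

Each level shows the total of everything beneath it, which is why the heaviest branch is easy to follow from the top down.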
In the end, xperf helped identify some choke-points on some of the experiments I was testing, specifically isolating the need to test both with and without hardware acceleration. Speaking of which, hardware acceleration in Minefield and the upcoming Firefox 4 can be adjusted by use of the following options (enter about:config into your Firefox address bar to adjust):
gfx.direct2d.disabled
gfx.direct2d.force-enabled
gfx.font_rendering.directwrite.enabled
layers.accelerate-all
layers.accelerate-none
The first two are for Direct2D, which handles 2D graphics, and the third is for DirectWrite, which handles text, while the last two are for Direct3D, which handles 3D drawing. This is a lot to work with to test hardware acceleration, so Joe the graphics guru from Mozilla has an easier way to test hardware acceleration (Windows only at present).
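If you flip these often, one option is to keep two test profiles and drop a user.js file into each, since Firefox applies user_pref lines from that file at startup. The Python sketch below generates such files. It assumes all five options are booleans and that the combinations shown turn acceleration fully on or fully off. That is my reading of the option names rather than something I have confirmed, so check the result in about:config.

```python
# Assumed values: my reading of the option names, not verified behaviour.
ACCELERATION_ON = {
    "gfx.direct2d.disabled": False,
    "gfx.direct2d.force-enabled": True,
    "gfx.font_rendering.directwrite.enabled": True,
    "layers.accelerate-all": True,
    "layers.accelerate-none": False,
}

# The "off" profile simply inverts every switch above.
ACCELERATION_OFF = {name: not value for name, value in ACCELERATION_ON.items()}


def to_user_js(prefs):
    """Render a dict of boolean prefs as the contents of a user.js file."""
    lines = []
    for name, value in prefs.items():
        js_value = "true" if value else "false"
        lines.append(f'user_pref("{name}", {js_value});')
    return "\n".join(lines) + "\n"


print(to_user_js(ACCELERATION_OFF))
```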
At around 848 words this seems to be by far my largest blog post, which given that I’m blogging about what I’ve learned through bug filing must be a good thing. Ah yes, bug filing: the ends to these means. That’s the next entry in this blog post queue.