6 private links
Perf is probably the most widely used general purpose performance debugging tool on Linux. There are multiple contenders for the #2 spot, and, like perf, they’re sampling profilers. Sampling profilers are great. They tend to be easy-to-use and low-overhead compared to most alternatives. However, there are large classes of performance problems sampling profilers can’t debug effectively, and those problems are becoming more important.
Hardware performance monitoring counters have recently received a lot of attention. They have been used by diverse communities to understand and improve the quality of computing systems: for example, architects use them to extract application characteristics and propose new hardware mechanisms; compiler writers study how generated code behaves on particular hardware; software developers identify critical regions of their applications and evaluate design choices to select the best performing implementation. We propose that counters be used by all categories of users, in particular non-experts, and we advocate that a few simple metrics derived from these counters are relevant and useful. For example, a low IPC (number of executed instructions per cycle) indicates that the hardware is not performing at its best; a high cache miss ratio can suggest several causes, such as conflicts between processes in a multicore environment.
How do we write a lot of data to disk really fast?
Speed memory access by arranging data to take advantage of CPU caching.
Recently I’ve been doing some benchmarking and came upon a very surprising behavior from a number of different Intel i7 CPUs (it manifests on Sandy Bridge and Haswell desktop-class CPUs as well as Sandy Bridge-EP Xeon CPUs).
lockfree, waitfree, obstructionfree synchronization algorithms and data structures, scalability-oriented architecture, multicore/multiprocessor design