Vampir meets SC '12
Welcome to the autumn edition of Vampir News, designed to keep you informed about the latest developments in our performance analysis environment. This includes the visualization and analysis framework Vampir and the performance monitors Score-P and VampirTrace.
Score-P is the primary code instrumentation and run-time measurement framework for Vampir 8. The latest version 1.1 provides a number of improvements and new features:
- Profiling tasks and basic support for nested parallelism with OpenMP.
- Support for instrumenting and monitoring CUDA applications. Score-P records the CUDA API calls on the host processes as well as kernel invocations on the accelerator devices.
- More flexible performance metrics. They are no longer restricted to individual threads but can now be recorded at process level.
- The "rewind" functionality assists long-term trace recording. Recurring uniform phases of application traces can be discarded with minimal additional overhead during trace recording.
- Support for ARM-based systems and compilers.
- Two new command-line tools: scorep-info, which assists the user with the run-time configuration, and scorep-score, which provides hints for performance data reduction.
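The "rewind" idea from the list above can be pictured with a small toy model. The Python sketch below is not Score-P code and does not use its API; the `TraceBuffer` class, its methods, and the `threshold` criterion are hypothetical names chosen for illustration. It assumes that an iteration counts as "uniform" when its duration stays at or below a threshold, and that discarding simply means truncating the in-memory buffer back to a remembered position.

```python
from dataclasses import dataclass, field


@dataclass
class TraceBuffer:
    """Toy in-memory event buffer (hypothetical, not Score-P's data structure)."""
    events: list = field(default_factory=list)
    _marks: dict = field(default_factory=dict)

    def record(self, timestamp, name):
        self.events.append((timestamp, name))

    def rewind_point(self, label):
        # Remember how many events the buffer held when the phase began.
        self._marks[label] = len(self.events)

    def rewind(self, label):
        # Drop everything recorded since the matching rewind point.
        del self.events[self._marks.pop(label):]


def run_iterations(durations, threshold):
    """Record one enter/leave pair per iteration, keeping only the slow ones."""
    buf = TraceBuffer()
    clock = 0.0
    for i, duration in enumerate(durations):
        buf.rewind_point("iteration")
        buf.record(clock, f"enter iteration {i}")
        clock += duration
        buf.record(clock, f"leave iteration {i}")
        if duration <= threshold:
            # Assumed criterion: a short iteration is uninteresting.
            buf.rewind("iteration")
    return buf.events


kept = run_iterations([1.0, 1.0, 5.0, 1.0], threshold=2.0)
```

In this toy run only the third iteration exceeds the threshold, so only its two events remain in the buffer. The cost of discarding is a single list truncation, which mirrors the "minimal additional overhead" claim only in spirit.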
For more information and downloads, visit www.score-p.org.
Vampir 8 fully supports Score-P, the new monitoring infrastructure. This includes the new OTF2 file format as well as all new monitoring capabilities of Score-P. Compatibility with the earlier OTF and VampirTrace releases is maintained. New feature highlights include:
- The ability to pre-select the time interval of interest prior to loading the actual performance data. The resulting performance data subset limits data processing to the relevant run-time phases, so inspecting individual program iterations no longer involves long waiting times.
- Automated detection of identical or very similar task recordings. This feature saves additional memory, CPU, and display resources and targets the inspection of ultra-scalable application traces with 250,000 cores and above.
- Enhanced customization of search criteria, performance metrics, and the program flow structure. The new semi-transparent Performance Data Overlay mode allows users to define and study custom performance issues in combination with traditional event-based program flow graphs.
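The benefit of pre-selecting a time interval can be sketched in a few lines of Python. This is an illustration only, not how Vampir or OTF2 read trace data; the function `load_interval` and the `(timestamp, name)` event tuples are hypothetical. The sketch assumes that events arrive sorted by timestamp, so reading can stop as soon as the interval has been passed.

```python
from itertools import dropwhile, takewhile


def load_interval(event_stream, start, end):
    """Return the events with start <= timestamp <= end.

    Assumes event_stream yields (timestamp, name) tuples in ascending
    timestamp order. Events after the interval are never materialized.
    """
    after_start = dropwhile(lambda ev: ev[0] < start, event_stream)
    return list(takewhile(lambda ev: ev[0] <= end, after_start))


def fake_trace(n):
    # Hypothetical stand-in for a trace file: one event per time unit.
    for t in range(n):
        yield (t, f"event {t}")


subset = load_interval(fake_trace(1_000_000), start=3, end=5)
```

Because the generator is consumed lazily, the call stops reading shortly after timestamp 5 instead of walking through all one million events, which is the effect the interval pre-selection aims for on a much larger scale.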
Furthermore, various new features, performance improvements, and scalability and stability enhancements have been incorporated.