Releases: KernelTuner/kernel_tuner

Version 1.0.0b6

07 Dec 08:19

Pre-release

This is a beta release for early access to the new features. Not intended for production use.

The release contains:

  • Inclusion of tests in the source package, as requested in #225
  • Updated dependencies

Version 1.0.0b5

01 Nov 14:11

Pre-release

This is a beta release for early access to the new features. Not intended for production use.

The release contains:

  • Expanded documentation on backends by @benvanwerkhoven in #213
  • A fix for an issue that could cause incorrect conversion to Constraint
  • Extended tests to detect this
  • Bump urllib3 from 2.0.6 to 2.0.7 by @dependabot in #222
  • Updated dependencies

Full Changelog: 1.0.0b4...1.0.0b5

Version 1.0.0b4

22 Oct 14:11

Pre-release

This is a beta release for early access to the new features. Not intended for production use.

This release contains several improvements:

  • nvidia-ml-py added to tutorial extra dependencies.
  • Additional checks for coherent Poetry configuration and warning in case of outdated development environment.
  • Updated dependencies.

Version 1.0.0b3

12 Oct 13:02

Pre-release

This is a beta release for early access to the new features. Not intended for production use.

This version contains several bugfixes:

  • Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
  • Fixed an issue where some restrictions would not be recognized by the old check_restrictions function.
  • Fixed an issue where bayes_opt would not handle pruned parameters correctly.
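
To illustrate the kind of behavior the snap_to_nearest fix concerns, here is a minimal sketch of snapping a candidate configuration onto the allowed values of each tunable parameter. The function below is a hypothetical stand-in written for this note, not Kernel Tuner's implementation; the parameter names and the fallback rule for non-numeric values are assumptions.

```python
def snap_to_nearest(value, allowed):
    """Return the element of `allowed` closest to `value`.

    Hypothetical helper, not Kernel Tuner's actual code: numeric values are
    snapped by absolute distance, while non-numeric values are kept when
    valid and otherwise replaced by the first allowed value.
    """
    def is_number(x):
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    if is_number(value) and all(is_number(a) for a in allowed):
        return min(allowed, key=lambda a: abs(a - value))
    # Distance is undefined for strings and similar values
    return value if value in allowed else allowed[0]


# Assumed example parameters: one numeric, one non-numeric
tune_params = {
    "block_size_x": [32, 64, 128, 256],
    "layout": ["row_major", "col_major"],
}
candidate = {"block_size_x": 100, "layout": "tiled"}
snapped = {k: snap_to_nearest(v, tune_params[k]) for k, v in candidate.items()}
print(snapped)
```

The point of the sketch is only that a distance-based snap cannot be applied blindly to non-numeric parameters; how Kernel Tuner resolves that case may differ.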

Full Changelog: 1.0.0b2...1.0.0b3

Version 1.0.0b2

11 Oct 16:37

Pre-release

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: 1.0.0b1...1.0.0b2

Version 1.0.0 beta 1

11 Oct 07:03

Pre-release

This is a beta release for early access to the new features. Not intended for production use.

What's Changed

New Contributors

Full Changelog: 0.4.5...1.0.0b1

Version 0.4.5

01 Jun 20:11

Version 0.4.5 adds support for using PMT in combination with Kernel Tuner, enabling power and energy measurements on a wide range of devices. In addition, we have worked extensively on the internals of Kernel Tuner and on the interfaces between the separate components that together make up Kernel Tuner. The release also includes a few bugfixes and corrections of small errors in the examples and documentation.

[0.4.5] - 2023-06-01

Added

  • PMTObserver to measure power and energy on various platforms

Changed

  • Improved functionality for storing output and metadata files
  • Updated PowerSensorObserver to support PowerSensor3
  • Refactored internal interfaces of runners and backends
  • Bugfix in interface to set objective and optimization direction

Version 0.4.4

09 Mar 11:21

Version 0.4.4 adds extended support for energy efficiency tuning, in particular the new capability to fit a performance model to the target GPU's power-frequency curve. How to use these features is demonstrated in:
https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/going_green_performance_model.py

And described in the paper:

Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg
International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022
https://arxiv.org/abs/2211.07260

Other than that, we've implemented a new output and metadata JSON format that adheres to the 'T4' auto-tuning schema created by the auto-tuning community at the Lorentz Center workshop in March 2022.

From the changelog:

[0.4.4] - 2023-03-09

Added

  • Support for using time_limit in simulation mode
  • Helper functions for energy tuning
  • Example to show ridge frequency and power-frequency model
  • Functions to store tuning output and metadata

Changed

  • Changed what timings are stored in cache files
  • No longer inserting partial loop unrolling factor of 0 in CUDA

Version 0.4.3

19 Oct 15:45

The version 0.4.3 release consists of a large number of changes to the internals of Kernel Tuner, including the addition of a new backend based on Nvidia's official Python bindings for CUDA and improved functionality for tuning energy efficiency, e.g. measuring core voltages. The measurement of power and the interface with NVML have also improved a lot.

Some of the changes are also in the "externals" of Kernel Tuner, in the sense that we have migrated from https://github.com/benvanwerkhoven/ to https://github.com/KernelTuner. The goal of this move is to bring the collection of repositories belonging to the larger Kernel Tuner project under one organization.

From the Changelog:

[0.4.3] - 2022-10-19

Added

  • A new backend that uses Nvidia cuda-python
  • Support for locked clocks in NVMLObserver
  • Support for measuring core voltages using NVML
  • Support for custom preprocessor definitions
  • Support for boolean scalar arguments in PyCUDA backend

Changed

  • Migrated from github.com/benvanwerkhoven to github.com/KernelTuner
  • Significant update to the documentation pages
  • Unified benchmarking loops across backends
  • Backends are no longer context managers
  • Replaced the method for measuring power consumption using NVML
  • Improved NVML measurements of temperature and clock frequencies
  • Bugfix in parse_restrictions when using and/or in expressions
  • Bugfix in GreedyILS when using neighbor method "adjacent"
  • Bugfix in Bayesian Optimization for small problems

Version 0.4.2

23 May 14:59

Version 0.4.2 includes a lot of work on the search space representation, the application of restrictions, and the optimization strategies. Besides the addition of several new optimization strategies, most optimization strategies should see improved performance, both in the number of evaluated kernel configurations and in execution time.

Added

  • new optimization strategies: dual annealing, greedy ILS, ordered greedy MLS, greedy MLS
  • support for constant memory in cupy backend
  • constraint solver to cut down time spent in creating search spaces
  • support for custom tuning objectives
  • support for max_fevals and time_limit in strategy_options of all strategies
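
As a rough illustration of what max_fevals and time_limit in strategy_options mean for a search strategy, together with an objective that can be minimized or maximized, the sketch below runs a random search under both budgets. It is a standalone toy, not Kernel Tuner code: the function budgeted_random_search, the toy cost function, and the parameter names are assumptions, and time_limit is taken to be in seconds.

```python
import random
import time


def budgeted_random_search(evaluate, configs, strategy_options, higher_is_better=False):
    """Evaluate configs in random order until max_fevals or time_limit is reached."""
    max_fevals = strategy_options.get("max_fevals", len(configs))
    time_limit = strategy_options.get("time_limit", float("inf"))  # seconds (assumed)
    start = time.perf_counter()
    best_config, best_score, fevals = None, None, 0

    for config in random.sample(configs, len(configs)):
        # Stop as soon as either budget is exhausted
        if fevals >= max_fevals or time.perf_counter() - start >= time_limit:
            break
        score = evaluate(config)
        fevals += 1
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_config, best_score = config, score
    return best_config, best_score, fevals


def toy_time(config):
    # Stand-in for a kernel benchmark; lower is better
    return abs(config["block_size_x"] - 128) / 32 + 1 / config["tile"]


configs = [{"block_size_x": b, "tile": t} for b in (32, 64, 128, 256) for t in (1, 2, 4)]
best, score, fevals = budgeted_random_search(
    toy_time, configs, {"max_fevals": 5, "time_limit": 10}
)
print(best, score, fevals)
```

With max_fevals set to 5, the search stops after five evaluations even though twelve configurations exist. Passing higher_is_better=True flips the comparison, which is the essence of supporting objectives such as throughput next to execution time.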

Removed

  • alternative Bayesian Optimization strategies that could not be used directly
  • C++ wrapper module that was too specific and hardly used

Changed

  • string-based restrictions are compiled into functions for improved performance
  • genetic algorithm, MLS, ILS, random, and simulated annealing use new search space object
  • diff evo, firefly, PSO are initialized using population of all valid configurations
  • all strategies except brute_force strictly adhere to max_fevals and time_limit
  • simulated annealing adapts annealing schedule to max_fevals if supplied
  • minimize, basinhopping, and dual annealing start from a random valid config
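
The change to string-based restrictions can be pictured with a small sketch: the restriction string is compiled once, and the resulting function is then called for every candidate configuration instead of re-parsing the string each time. This is an assumption-laden illustration using Python's built-in compile and eval, not the code Kernel Tuner uses; compile_restriction and the parameter names are hypothetical.

```python
import itertools


def compile_restriction(expression, param_names):
    """Compile a restriction string once and return a predicate over configs."""
    code = compile(expression, "<restriction>", "eval")

    def check(config):
        # Only the tunable parameters are visible to the expression;
        # builtins are disabled to keep the evaluation scope small.
        scope = {name: config[name] for name in param_names}
        return bool(eval(code, {"__builtins__": {}}, scope))

    return check


# Assumed example parameters and restriction
tune_params = {
    "block_size_x": [16, 32, 64, 128],
    "block_size_y": [1, 2, 4, 8, 16, 32],
}
names = list(tune_params)
restrict = compile_restriction(
    "block_size_x * block_size_y <= 1024 and block_size_x % 32 == 0", names
)

# Filter the Cartesian product of all parameter values with the compiled check
all_configs = [dict(zip(names, values)) for values in itertools.product(*tune_params.values())]
valid = [config for config in all_configs if restrict(config)]
print(len(valid), "of", len(all_configs), "configurations satisfy the restriction")
```

Compiling up front pays off because the same expression is evaluated for every point in the search space, which can contain many thousands of configurations.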