8. API Reference

class mlsdk.MNDevice

Class to specify which device to use.

Parameters:

device_name (str) –

Device name, with components separated by ":" (e.g. "mncore2:auto"). The first component indicates which device to use, and the subsequent component modifies it. The available device names are as follows:

  • mncore, mncore2: Refer to the first and second generations of MN-Core, respectively, and require a modifier that is either a device index or auto (e.g. "mncore2:0").

  • pfvm: Used to run MLSDK with backends other than MN-Core. Must be modified with cpu or cuda (e.g. "pfvm:cpu").

  • emu, emu2: Refer to emulators for the first and second generations of MN-Core, respectively. No modifier is required (e.g. "emu2").
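The device names above can be sketched in code as follows. This is an illustrative sketch only; it assumes MNDevice is constructed directly from a device_name string, as the signature above indicates:

```python
import mlsdk

# Emulator for second-generation MN-Core; no modifier needed.
device = mlsdk.MNDevice("emu2")

# Alternatives, following the list above:
# mlsdk.MNDevice("mncore2:auto")  # hardware, index chosen automatically
# mlsdk.MNDevice("mncore2:0")     # hardware, explicit device index
# mlsdk.MNDevice("pfvm:cpu")      # non-MN-Core backend on CPU
```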

class mlsdk.Context(device: MNDevice)
compile(function: Callable[[Dict[str, Tensor]], Dict[str, Tensor]], inputs: Mapping[str, Tensor | TensorProxy], codegen_dir: Path, *, options: Dict[str, Any] | None = None, cache_options: CacheOptions | None = None, num_compiler_threads: int | None = None, quiet: bool = True, exit_after_generate_codegen_dir: bool = False, optimizers: List[Optimizer] | None = None, export_kwargs: Dict[str, Any] | None = None, training: bool = True, initialize: bool | None = True, optimizer_spec: List[OptimizerSpecParamGroup] | None = None, optional_options: Set[str] | None = None, group: ProcessGroup | None = None, predefined_symbols: Dict[str, MNDeviceBuffer] | None = None) CompiledFunction

Compile a Python callable to a function that can be executed on the device.

Parameters:
  • function – The Python callable to compile.

  • inputs – Sample inputs to the function.

  • codegen_dir – The directory to store intermediate and generated files.

  • options

    Specify compile options to control the compilation process. Predefined options are available (O0.json through O4.json in the preset_options directory, /opt/pfn/pfcomp/codegen/preset_options/ in MLSDK); higher numbers indicate more aggressive optimizations at the cost of longer compilation times.

    A crucial setting is float_dtype to prevent unintended precision degradation. This option controls the floating-point type assigned to torch.float32 tensors:

    • mixed (default): Uses half precision for GEMM operations (inputs and outputs) and float otherwise.

    • half, float, double: Assigns the specified type to all such tensors.

    To avoid the default mixed precision, set float_dtype to float.

  • cache_options – Options for caching. See CacheOptions for details.

  • num_compiler_threads – The number of threads to use for compilation. If None, the number of threads will automatically be determined.

  • quiet – If True, suppress output from the compiler.

  • exit_after_generate_codegen_dir – For internal use only. If True, exit after generating the codegen directory. This is useful for decomposed-layer tests.

  • optimizers – For internal use only. A list of PyTorch optimizers to use for training.

  • export_kwargs – For internal use only. kwargs related to exporting the model to ONNX.

  • training – For internal use only. If True, the function is used for training.

  • optimizer_spec – For internal use only. The optimizer spec to use for training.

  • initialize – For internal use only. TODO (akirakawata): Add description.

  • optional_options – For internal use only. TODO (akirakawata): Add description.

  • group – For internal use only. TODO (akirakawata): Add description.

  • predefined_symbols – For internal use only. TODO (akirakawata): Add description.
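A minimal compile sketch. Only the compile signature and the float_dtype option are taken from this reference; invoking the returned CompiledFunction with an input dictionary, and the function body itself, are illustrative assumptions:

```python
from pathlib import Path
import torch
import mlsdk

# The callable maps a dict of named input tensors to a dict of named
# output tensors, matching the documented signature.
def double_plus_one(inputs):
    return {"y": inputs["x"] * 2 + 1}

context = mlsdk.Context(mlsdk.MNDevice("emu2"))
sample = {"x": torch.zeros(4, 8, dtype=torch.float32)}

compiled = context.compile(
    double_plus_one,
    inputs=sample,
    codegen_dir=Path("./codegen"),
    options={"float_dtype": "float"},  # avoid the default mixed precision
    training=False,
)

# Inputs must use the same keys and shapes as `sample`.
result = compiled({"x": torch.ones(4, 8, dtype=torch.float32)})
```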

get_registered_value_proxy(value: Tensor) TensorProxy

Get the TensorProxy for the given value if it is registered in the context.

Parameters:

value – The torch.Tensor to get the proxy for.

Returns:

The TensorProxy for the given value.

load_codegen_dir(codegen_dir: Path) CompiledFunction

Load a function that can be executed on the device from codegen_dir without validation.

Parameters:

codegen_dir – The directory from which to load compilation results.

Note

This method will fail if the required compiled artifact, model.app.zst, is not found within the codegen_dir. Be aware that the returned function is strict; it requires an input dictionary with the exact same keys (variable names) and tensor shapes as the input used during the original compilation.
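A reload sketch under the same assumptions as above; the codegen_dir must already contain model.app.zst from a previous compilation:

```python
from pathlib import Path
import torch
import mlsdk

context = mlsdk.Context(mlsdk.MNDevice("emu2"))

# Loads without validation; fails if model.app.zst is missing.
compiled = context.load_codegen_dir(Path("./codegen"))

# Strict: keys and shapes must match the original compilation exactly.
result = compiled({"x": torch.ones(4, 8, dtype=torch.float32)})
```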

register_buffer(buffer: Tensor) None

Registers a buffer in the context.

Note

Before calling this method, you must set the name of the buffer using set_tensor_name_in_module or set_tensor_name.

register_optimizer_buffers(optimizer: MNCoreOptimizer) None

Registers optimizer buffers in the context.

Note

Before calling this method, you must set the names of the buffers using set_buffer_name_in_optimizer or set_tensor_name.

register_param(param: Parameter) None

Registers a parameter in the context.

Note

Before calling this method, you must set the name of the parameter using set_tensor_name_in_module or set_tensor_name.
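The name-then-register order required by the notes above can be sketched as follows (the module and the name "model" are illustrative):

```python
import torch
import mlsdk

model = torch.nn.BatchNorm1d(16)  # has both parameters and buffers
context = mlsdk.Context(mlsdk.MNDevice("emu2"))

# 1. Assign tensor names first, so registration (and the later ONNX
#    export) can identify each tensor.
mlsdk.set_tensor_name_in_module(model, "model")

# 2. Then register parameters and buffers with the context.
for param in model.parameters():
    context.register_param(param)
for buf in model.buffers():
    context.register_buffer(buf)
```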

static switch_context(new_context: Context) None

Switching a context causes all tensors currently on the device to be moved back to host memory, and the new context's tensors to be loaded onto MN-Core.

synchronize() None

Synchronizes the context by moving tensors to the torch framework and marks the context for initialization.

This method performs the following steps:

  1. Calls the synchronize() method of the device associated with the context.

  2. Iterates over all tensor names in the registry and moves each tensor to the torch framework.

Note

This differs from torch.cuda.synchronize(), which only waits for all kernels in all streams on a CUDA device to complete. This function also moves all tensors in the context’s registry from the device to the host.

class mlsdk.CompiledFunction(context: Context, code_block: _CompiledFunction, *, output_signature: ValueSignature | None = None)
allocate_input_proxy() Dict[str, TensorProxy]

Allocate input proxies for the function.

Returns:

A dictionary mapping input names to their corresponding TensorProxy objects.

class mlsdk.TensorProxy(context: Context, codegen_data: TensorProxyCodegenData, *, is_input: bool = False)
cpu() Tensor

Transfer the corresponding data to CPU (Host) to access as torch.Tensor.

load_from(value: Tensor | TensorProxy, *, clone: bool = True) None

Load data from a torch.Tensor or another TensorProxy to this TensorProxy.

Parameters:
  • value – The source tensor to copy data from.

  • clone – If True and value is a torch.Tensor, it will be cloned before copying, enabling the source tensor to be modified without affecting this TensorProxy.
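A data-movement sketch combining allocate_input_proxy, load_from, and cpu. It assumes a CompiledFunction named `compiled`, obtained from Context.compile as sketched earlier:

```python
import torch

# `compiled` is an existing CompiledFunction (see Context.compile).
proxies = compiled.allocate_input_proxy()

# Copy host data into a device-side proxy. With clone=True (the
# default), the source tensor can be modified afterwards without
# affecting the proxy.
x = torch.ones(4, 8)
proxies["x"].load_from(x)

# Transfer proxy data back to the host as a torch.Tensor.
host = proxies["x"].cpu()
```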

mlsdk.TensorLike = Union[torch.Tensor, TensorProxy]
class mlsdk.CacheOptions(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False)
__init__(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False) None

The options for specifying the cache directory and controlling cache behavior.

Parameters:
  • cache_dir_str – A path string of the root directory to store cache.

  • enable_app_cache – If True, cache compiled GPFNApp files from ONNX files. GPFNApp is the binary format the MN-Core compiler uses.

  • enable_onnx_cache – If True, cache the ONNX files exported from the given function.

  • enable_codegen_cache – If True, cache the codegen compilation. This option is mainly for developers.

  • enable_gpfn2obj_cache – If True, cache the GPFN object data. This option is mainly for developers.
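Constructing CacheOptions and passing it to Context.compile can be sketched as follows (the cache path is illustrative):

```python
import mlsdk

# Cache compiled GPFNApp binaries (on by default) and also cache
# exported ONNX files.
cache = mlsdk.CacheOptions(
    "./mlsdk_cache",
    enable_app_cache=True,
    enable_onnx_cache=True,
)

# Passed to Context.compile via the cache_options keyword:
# context.compile(f, inputs=sample, codegen_dir=codegen_dir,
#                 cache_options=cache)
```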

class mlsdk.MNCoreOptimizer(params: Iterator[Parameter], defaults: Dict[str, Any])
zero_grad(set_to_none: bool = True) None

Clear the gradient of the parameters.

Parameters:

set_to_none (bool) – If True, the gradients will be set to None instead of zero.

class mlsdk.MNCoreSGD(params: Iterator[Parameter], lr: float | Tensor = 0.001, momentum: float = 0, dampening: float = 0, weight_decay: float | Tensor = 0, nesterov: bool = False, *, maximize: bool = False, foreach: bool | None = None, differentiable: bool = False, fused: bool | None = None)
step(closure=None) None

Perform a single optimization step to update parameters.

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

class mlsdk.MNCoreAdam(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, *, foreach: bool | None = None, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None, decoupled_weight_decay: bool = False, chainer_use_torch: bool = True)
step(closure=None) None

Perform a single optimization step to update parameters.

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

class mlsdk.MNCoreAdamW(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0.01, amsgrad: bool = False, *, maximize: bool = False, foreach: bool | None = None, capturable: bool = False, differentiable: bool = False, fused: bool | None = None)
step(closure=None) None

Perform a single optimization step to update parameters.

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
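A training-step sketch using the optimizers above. Only zero_grad and step come from this reference; the model, loss, and backward pass are ordinary PyTorch and are illustrative:

```python
import torch
import mlsdk

model = torch.nn.Linear(8, 4)
optimizer = mlsdk.MNCoreAdam(model.parameters(), lr=1e-3)

num_steps = 10  # illustrative
for _ in range(num_steps):
    optimizer.zero_grad()                  # gradients set to None by default
    loss = model(torch.randn(2, 8)).sum()  # placeholder loss
    loss.backward()
    optimizer.step()                       # single optimization step
```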

class mlsdk.MNCoreLRScheduler(scheduler: LRScheduler, context: Context | None)
step() None

Perform a step.

mlsdk.set_buffer_name_in_optimizer(optimizer: MNCoreOptimizer, name: str) None

Set the buffer names in the optimizer.

This function sets the names of the tensors in the optimizer (i.e. buffers) according to the optimizer’s name, so that the ONNX exporter (FX2ONNX) can distinguish them later. You need to call this function before registering the optimizer buffers with the context.

Parameters:
  • optimizer – The optimizer to set buffer names.

  • name – The name of the optimizer.
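Naming optimizer buffers before registering them, in the order required above (the optimizer name "sgd" is illustrative):

```python
import torch
import mlsdk

model = torch.nn.Linear(8, 4)
optimizer = mlsdk.MNCoreSGD(model.parameters(), lr=0.01, momentum=0.9)

# Name the optimizer's state tensors (e.g. momentum buffers) first...
mlsdk.set_buffer_name_in_optimizer(optimizer, "sgd")

# ...then register them with the context.
context = mlsdk.Context(mlsdk.MNDevice("emu2"))
context.register_optimizer_buffers(optimizer)
```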

mlsdk.get_tensor_name(tensor: Tensor) str | None

Get the name of the tensor.

This function returns the name of the tensor set by set_tensor_name_in_module or set_buffer_name_in_optimizer.

Parameters:

tensor – The tensor to get the name of.

Returns:

The name of the tensor.

mlsdk.set_tensor_name_in_module(module: Module, module_name: str | None) None

Set the tensor names in the module.

This function sets the names of the tensors in the module (i.e. parameters and buffers such as BatchNorm statistics), so that the ONNX exporter (FX2ONNX) can distinguish them later. You need to call this function before registering the module’s parameters and buffers with the context.

Parameters:
  • module – The module to set tensor names.

  • module_name – The name of the module.
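Naming a module's tensors and reading a name back with get_tensor_name (the module name "encoder" is illustrative, and the exact format of the assigned names is an assumption):

```python
import torch
import mlsdk

model = torch.nn.Linear(8, 4)

# Assign names to the module's parameters and buffers.
mlsdk.set_tensor_name_in_module(model, "encoder")

# The assigned name can be read back; returns None for unnamed tensors.
name = mlsdk.get_tensor_name(model.weight)
```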

storage.path(target: str) Path
mlsdk.path(target: str) Path
mlsdk.trace_scope(output_filename: str | Path | None, ignore_if_traced: bool = False) Iterator[None]