5. FAQ

5.1. About MLSDK

5.1.1. What exactly does implementing MLSDK entail?

First, isolate the computational logic written in PyTorch as a function, representing its inputs and outputs with the Mapping[str, torch.Tensor] type. Next, compile this function using the mlsdk.Context API to obtain a compiled function. The compiled function can be called identically to the original, so simply replacing the original function with it completes the implementation.
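The steps above can be sketched as follows. This is a minimal illustration, not verbatim API usage: the Context constructor and compile arguments shown here are placeholders, and the actual signatures are described in Getting Started.

```python
from typing import Mapping

import torch

import mlsdk


# Step 1: isolate the computational logic as a function whose inputs and
# outputs both use the Mapping[str, torch.Tensor] type.
def step(inputs: Mapping[str, torch.Tensor]) -> Mapping[str, torch.Tensor]:
    x, w = inputs["x"], inputs["w"]
    return {"y": x @ w}


# Step 2: compile the function with the mlsdk.Context API
# (constructor and compile arguments are omitted placeholders here).
ctx = mlsdk.Context()
compiled_step = ctx.compile(step)

# Step 3: the compiled function is called identically to the original,
# so replacing the original call site completes the implementation.
outputs = compiled_step({"x": torch.randn(4, 8), "w": torch.randn(8, 2)})
```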

For detailed implementation steps, please refer to Getting Started.

5.1.2. Will programs with MLSDK integration run on GPUs?

The short answer is yes. When MLSDK is integrated, the original PyTorch-based program is represented as a computation graph in the FX2ONNX + PFVM environment, and the processing backend can be configured to MN-Core 2, CPU, or GPU (CUDA). This means that programs optimized for MN-Core 2 can run on GPUs as well.

Additionally, during conversion to the computation graph format, optimizations such as Common Subexpression Elimination are applied, so the compiled version is generally more memory-efficient and faster to execute. Of course, this process does not alter the computation results.
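As a plain-Python illustration (not MLSDK code) of what Common Subexpression Elimination does: a subexpression that appears more than once in the graph is computed a single time and reused, leaving the result unchanged.

```python
# Before CSE: the subexpression "a * b" is evaluated twice.
def before_cse(a, b):
    return (a * b) + (a * b)


# After CSE: the common subexpression is evaluated once and reused.
# A graph-level compiler applies this rewrite automatically.
def after_cse(a, b):
    t = a * b  # shared subexpression, computed a single time
    return t + t


# The optimization never changes the computed result.
assert before_cse(3, 4) == after_cse(3, 4) == 24
```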

For specific switching methods, please refer to Ecosystem.

5.2. Common Errors and Solutions

When encountering errors while using MLSDK, they generally fall into three categories:

  1. Errors related to the device hardware or runtime internals

  2. Errors in MLSDK API usage

  3. Errors during compilation

5.2.1. Device Hardware or Runtime Internal Errors

gpfn3-smi list command produces no output

This indicates that the system has not recognized the MN-Core 2 board. Recreating the workspace, or rebooting the system (DevKit only), may resolve this. When creating the workspace, verify that you have set the device request count to 1 or higher.

MLSDK fails to lock the device

If a program using MN-Core 2 becomes unresponsive, it may be due to an internal failure in device locking. Ensure no other programs are using MN-Core 2, then try executing gpfn3-smi reset <device>.

Warning

Resetting a device that is currently locked by another program may cause data loss for that program.

QFAIL in DeviceProcessor at …: Failed to allocate XXth IDMA chunk

The MLSDK runtime has failed to allocate pinned memory for DMA operations. In most cases, rerunning the program resolves the issue. If the problem occurs frequently, please report it for further investigation.

5.2.2. MLSDK API Errors

ModuleNotFoundError: No module named 'mlsdk'

The environment variable PYTHONPATH is not properly configured. Please refer to the Running Sample Programs documentation to set up the necessary environment variables.

AssertionError: … is not in inputs.

Not all expected inputs for the CompiledFunction have been provided. Also see Call CompiledFunction for reference.

what(): shape '[…]' is invalid for input of size …

The input passed to the CompiledFunction has a shape different from the one specified during compilation. Also refer to Call CompiledFunction for guidance.

Additionally, if this issue occurs just before the training loop completes, it may be due to improper configuration of drop_last. In such cases, please refer to Specifying drop_last.

Compilation runs repeatedly for the same operation

You can reuse compilation results by specifying cache_options for Context.compile. Also see Function Compilation for additional information.
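As a hedged sketch only: the shape of cache_options below (a directory-style field) is an assumption made for illustration, not a confirmed MLSDK option; the actual fields are documented in Function Compilation.

```python
# Hypothetical cache configuration: "cache_dir" is an assumed field name;
# consult Function Compilation for the real cache_options structure.
compiled_step = ctx.compile(
    step,
    cache_options={"cache_dir": "/path/to/compile-cache"},
)
```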

Check `is_tmp_region_safe' failed! in RunTasks

This indicates memory exhaustion caused by too many tasks queued in MLSDK's internal execution queue. You can resolve it by inserting calls to Context.synchronize or TensorProxy.cpu at appropriate points to drain the queue.
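For example, in a long-running loop you might drain the queue periodically. The loop structure and interval below are assumptions for illustration; only Context.synchronize (and TensorProxy.cpu, not shown) come from MLSDK.

```python
SYNC_INTERVAL = 100  # arbitrary choice; tune for your workload

for i, batch in enumerate(loader):
    outputs = compiled_step(batch)
    if (i + 1) % SYNC_INTERVAL == 0:
        # Block until the queued tasks complete so the internal
        # execution queue drains before more work is enqueued.
        ctx.synchronize()
```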

5.2.3. Compilation Errors

google.protobuf.message.DecodeError: Error parsing message with type 'onnx.ModelProto'

This error occurs when a large ONNX file is generated, typically because the serialized model exceeds protobuf's message size limit. Please rerun the process with the following environment variable set:

$ export MNCORE_USE_EXTERNAL_DATA_FORMAT=1

You can also reduce the size of the generated ONNX by configuring the following compilation options in Context.compile. This will prevent embedding Python backtrace information within the ONNX file.

options={"strip_doc_string": True},

TileGrad for dynamic input shape is not supported

This issue occurs when the environment variable MNCORE_USE_LEGACY_ONNX_EXPORTER=1 is set, causing Torch’s ONNX exporter to be used instead of the FX2ONNX exporter. Please rerun the process with the following environment variable configured:

$ export MNCORE_USE_LEGACY_ONNX_EXPORTER=0

Program termination due to OOM during emit_code/emit_mncore_code execution

This occurs when too many concurrent compilation threads consume excessive host memory. You can prevent OOM by limiting the number of threads (which defaults to the number of CPUs recognized by the system) through either the compilation option num_threads or the environment variable CODEGEN_NUM_THREADS. Since thread count trades off against compilation speed, it is set to 1 in this example; determine an appropriate value for your specific use case.

options={"num_threads": 1},
$ export CODEGEN_NUM_THREADS=1