Easy way to debug TensorFlow XLA Compiler using VSCode

Posted on 2021-08-04

Reading the source code is easier when runtime information, such as call stacks and variable values, is available. This tutorial shows how to use VSCode to trace the XLA compiler.

Preparing Environment

We first need to download the TensorFlow source code and install its dependencies. I suggest using Conda to manage the environment and the built-in GCC on Ubuntu 18.04 (or possibly later) to build the code. Note that building from source requires about 50GiB of free disk space.
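
Before cloning, it may be worth checking that the disk really has that much room. The snippet below is a minimal sketch: free_gib is a hypothetical helper name, and the 50 GiB threshold is simply the figure quoted above.

```shell
# Hypothetical helper: print the free space (in whole GiB) of the filesystem
# that holds the given directory (defaults to the current directory).
free_gib() {
  # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
  # the 4th column of the second line is the available space.
  df -Pk "${1:-.}" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

required_gib=50   # figure quoted in the text above
available_gib=$(free_gib .)
if [ "$available_gib" -lt "$required_gib" ]; then
  echo "Only ${available_gib} GiB free; the build needs about ${required_gib} GiB." >&2
else
  echo "OK: ${available_gib} GiB free."
fi
```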

# Fetch Source Code
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow

# Install dependencies
conda create -n tf_dev python numpy wheel -y
conda activate tf_dev
pip install keras_preprocessing
conda install -c conda-forge bazel -y

Compile the source code

First of all, configure the project and build it.

./configure
bazel build --config=dbg //tensorflow/tools/pip_package:build_pip_package

During configuration, I recommend accepting all the default options unless you must debug on GPU, since enabling GPU support needs additional configuration (please refer to this article) and takes much longer to compile.

As for the bazel build flags:

--config=dbg adds debugging symbols. Required.
--config=monolithic should generate the binary code as a single dynamic library. But this option seems to be buggy. Not recommended.

Compiling TensorFlow is quite time-consuming: it took about 20 minutes using 48 CPU threads on my server. Time for coffee now.

Pick a unit test to compile

In fact, we don't have to write anything in the Python frontend to trigger breakpoints inside the XLA compiler, as there are already plenty of unit tests that cover most of the code and demonstrate the capabilities of the compiler.

Let's pick a simple test first to validate that the code compiles correctly.

bazel test --config=dbg //tensorflow/compiler/xla/tests:tuple_test_cpu

From the build log, we can see that the executable is located at bazel-bin/tensorflow/compiler/xla/tests/tuple_test_cpu. Execute it! If everything works, the program's output will end with the message below.

[----------] Global test environment tear-down
[==========] 25 tests from 2 test suites ran. (3618 ms total)
[  PASSED  ] 25 tests.
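
The mapping from a Bazel label to the binary under bazel-bin/ used above can be sketched as a small shell function. label_to_binary is a hypothetical helper, and the 'TupleTest.*' filter pattern is an assumption about the test names; --gtest_list_tests and --gtest_filter are standard GoogleTest flags, and listing the tests first shows which names actually exist.

```shell
# Hypothetical helper: map a Bazel label such as //pkg/path:target
# to the path of the binary that Bazel places under bazel-bin/.
label_to_binary() {
  label=${1#//}        # drop the leading "//"
  pkg=${label%%:*}     # text before the colon: the package path
  target=${label##*:}  # text after the colon: the target name
  printf 'bazel-bin/%s/%s\n' "$pkg" "$target"
}

test_bin=$(label_to_binary //tensorflow/compiler/xla/tests:tuple_test_cpu)
echo "$test_bin"

# Only run the binary if it has already been built in this directory.
if [ -x "$test_bin" ]; then
  "$test_bin" --gtest_list_tests            # print the available test names
  "$test_bin" --gtest_filter='TupleTest.*'  # assumed pattern; adjust to the listed names
fi
```

Running a single test case this way keeps a debugging session short, since only the code path of interest is executed.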

Then pick a test that interests you and repeat the steps above.

Fix broken dependency (Optional)

Take spmd_partitioner_test as an example. This unit test compiles without any error, but when you run the executable directly, you will see this message.

[ RUN      ] SpmdPartitioningTest.BroadcastAsReplicate3
2021-08-04 10:44:13.324501: I tensorflow/compiler/xla/service/platform_util.cc:72] platform Host present but no XLA compiler available: could not find registered compiler for platform Host -- check target linkage (hint: try adding tensorflow/compiler/jit:xla_cpu_jit as a dependency)
[       OK ] SpmdPartitioningTest.BroadcastAsReplicate3 (6 ms)

This is because the executable is not linked against a valid backend, which means it doesn't contain the code of the JIT execution environment. The solution is to modify the BUILD file manually and add the dependency that the message suggests.

Open the BUILD file in the directory where the unit test is located. In this example, the test tensorflow/compiler/xla/service/spmd/spmd_partitioner_test.cc corresponds to tensorflow/compiler/xla/service/spmd/BUILD. Add the dependency //tensorflow/compiler/jit:xla_cpu_jit to the deps list of the test target.

tf_cc_test(
    name = "spmd_partitioner_test",
    srcs = ["spmd_partitioner_test.cc"],
    deps = [
        ":spmd_partitioner",
        "//tensorflow/compiler/xla:util",
        "//tensorflow/compiler/xla:xla_data_proto_cc",
        "//tensorflow/compiler/xla/service:hlo",
        "//tensorflow/compiler/xla/service:hlo_casting_utils",
        "//tensorflow/compiler/xla/service:hlo_matchers",
        "//tensorflow/compiler/xla/service:hlo_parser",
        "//tensorflow/compiler/xla/service:hlo_pass_pipeline",
        "//tensorflow/compiler/xla/service:hlo_verifier",
        "//tensorflow/compiler/xla/tests:hlo_test_base",
        "//tensorflow/compiler/xla/tests:xla_internal_test_main",
        "//tensorflow/compiler/jit:xla_cpu_jit",
        "//tensorflow/core:test",
    ],
)
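
After editing the BUILD file, the test has to be rebuilt before the change takes effect. The following is a rough sketch of rebuilding and checking that the warning is gone: missing_backend is a hypothetical helper that only looks for the message text quoted above, and the guard assumes the commands are run from the root of the TensorFlow checkout.

```shell
# Hypothetical helper: succeed if the log text on stdin contains the
# "no XLA compiler available" message quoted above.
missing_backend() {
  grep -q 'no XLA compiler available'
}

target=//tensorflow/compiler/xla/service/spmd:spmd_partitioner_test
binary=bazel-bin/tensorflow/compiler/xla/service/spmd/spmd_partitioner_test

# Rebuild and re-run only when bazel is installed and we are in the source root.
if command -v bazel >/dev/null 2>&1 && [ -f WORKSPACE ]; then
  bazel build --config=dbg "$target"
  log=$("$binary" 2>&1)   # the message is logged to stderr, so capture both streams
  if printf '%s\n' "$log" | missing_backend; then
    echo "Backend still missing: check the deps list in the BUILD file."
  else
    echo "Backend linked."
  fi
fi
```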

Configuring VSCode

Since the unit test was built as an executable with debugging symbols, there is nothing special about the VSCode configuration. Install the C/C++ extension and add the following entry to the configurations list in .vscode/launch.json.

You can open that JSON file by pressing Ctrl/Command+Shift+P, typing launch.json, and selecting Add Configuration -> C/C++: (gdb) Launch.

{
  "name": "(gdb) Launch",
  "type": "cppdbg",
  "request": "launch",
  "program": "${workspaceFolder}/bazel-bin/tensorflow/compiler/xla/service/spmd/spmd_partitioner_test",
  "args": [],
  "stopAtEntry": false,
  "cwd": "${workspaceFolder}",
  "environment": [],
  "externalConsole": false,
  "MIMode": "gdb",
  "setupCommands": [
    {
      "description": "Enable pretty-printing for gdb",
      "text": "-enable-pretty-printing",
      "ignoreFailures": true
    }
  ]
}
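
When switching between tests often, it can be convenient to generate the whole file from the shell. The sketch below wraps the entry above in the "version" and "configurations" fields that launch.json expects and passes a GoogleTest filter through "args". write_launch_json is a hypothetical helper, and it overwrites any existing .vscode/launch.json, so use it only in a workspace without other launch configurations.

```shell
# Hypothetical helper: write a complete .vscode/launch.json for one test binary.
# Usage: write_launch_json <binary path relative to the workspace> [gtest filter]
# WARNING: overwrites an existing .vscode/launch.json.
write_launch_json() {
  program=$1
  filter=${2:-*}   # default: run every test in the binary
  mkdir -p .vscode
  # \${workspaceFolder} is escaped so that VSCode, not the shell, expands it.
  cat > .vscode/launch.json <<EOF
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "(gdb) Launch",
      "type": "cppdbg",
      "request": "launch",
      "program": "\${workspaceFolder}/${program}",
      "args": ["--gtest_filter=${filter}"],
      "stopAtEntry": false,
      "cwd": "\${workspaceFolder}",
      "environment": [],
      "externalConsole": false,
      "MIMode": "gdb",
      "setupCommands": [
        {
          "description": "Enable pretty-printing for gdb",
          "text": "-enable-pretty-printing",
          "ignoreFailures": true
        }
      ]
    }
  ]
}
EOF
}

# Example call (run from the TensorFlow source root):
# write_launch_json bazel-bin/tensorflow/compiler/xla/service/spmd/spmd_partitioner_test \
#   'SpmdPartitioningTest.BroadcastAsReplicate3'
```

Restricting the run to one test case, as in the commented example, means a breakpoint is hit by exactly the case under study rather than by every test in the binary.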

Everything is all set! Press F5 to start debugging.
