Running a Simple CNN Inference on CIFAR-10
This tutorial demonstrates how to use TT-NN to perform inference with a simple Convolutional Neural Network (CNN) on the CIFAR-10 dataset.
We will:
Load the CIFAR-10 dataset.
Define a simple CNN using TT-NN operations.
Run inference on sample images.
Observe outputs and accuracy.
Setup and Imports
This script imports several libraries to support image classification with a simple CNN on the CIFAR-10 dataset. The os module checks whether the pretrained weights file exists on disk. torch loads the model weights, while torchvision and its transforms submodule download the CIFAR-10 dataset and apply preprocessing, for example converting images to tensors and normalizing pixel values. TT-NN is Tenstorrent’s neural network API and is responsible for interfacing with Tenstorrent hardware. It provides operations such as convolution, pooling, activation, and linear layers, and it handles data layout management and type conversions between PyTorch and TT-NN formats. Finally, loguru logs messages and debugging output, giving insight into model operations and predictions throughout the inference process.
[ ]:
import os
import torch
import torchvision
import torchvision.transforms as transforms
import ttnn
from loguru import logger
Open the Device
Create the device that runs the program, using a custom L1 memory configuration. The l1_small_size parameter allocates on-chip L1 memory for sliding-window operations such as convolutions, and for other kernels that need fast, scratchpad-like memory. 8 kB is enough for simple CNNs; more complex models may require 32 kB or more.
[ ]:
device = ttnn.open_device(device_id=0, l1_small_size=8192)
logger.info("\n--- Simple CNN Inference Using TT-NN on CIFAR-10 ---")
Load the CIFAR-10 Dataset
Normalize images and load the test set.
[ ]:
# Define input transforms: Convert to tensor and normalize
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Load CIFAR-10 test data
testset = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=False)
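As a rough illustration of what the transform above does to pixel values, the following sketch reproduces the arithmetic in plain Python. It assumes the usual behavior of ToTensor (scaling 8-bit values to the range 0 to 1) followed by Normalize with mean 0.5 and standard deviation 0.5; the helper name `normalize` is hypothetical and not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Apply (pixel - mean) / std to one channel value already scaled to [0, 1]."""
    return (pixel - mean) / std


# ToTensor scales 8-bit values 0..255 to 0.0..1.0 before Normalize runs.
raw_values = [0, 128, 255]
scaled = [v / 255 for v in raw_values]
normalized = [normalize(s) for s in scaled]

for raw, norm in zip(raw_values, normalized):
    print(f"raw={raw:3d} -> normalized={norm:+.4f}")
```

With these parameters, each channel ends up in the range -1 to 1, centered near zero.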
Load or Initialize Weights
Ideally, pretrained weights are loaded and used for the model. If the weights file is not found, the script falls back to random values, which will likely yield poor results. Run the provided train_and_export_cnn.py script to generate the weights file, simple_cnn_cifar10_weights.pt.
[ ]:
if os.path.exists("simple_cnn_cifar10_weights.pt"):
    weights = torch.load("simple_cnn_cifar10_weights.pt")
    weights = {
        k: ttnn.from_torch(v, layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device)
        for k, v in weights.items()
    }
    logger.info("Loaded pretrained weights")
else:
    logger.warning("Weights not found, using random weights")
    torch.manual_seed(0)
    weights = {
        "conv1.weight": ttnn.rand((16, 3, 3, 3), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
        "conv1.bias": ttnn.rand((16,), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
        "conv2.weight": ttnn.rand((32, 16, 3, 3), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
        "conv2.bias": ttnn.rand((32,), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
        "fc1.weight": ttnn.rand((128, 2048), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
        "fc1.bias": ttnn.rand((128,), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
        "fc2.weight": ttnn.rand((10, 128), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
        "fc2.bias": ttnn.rand((10,), layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device),
    }
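All weights above are stored as bfloat16, which keeps the sign and exponent of a 32-bit float but only the top 7 mantissa bits. The sketch below emulates that conversion in plain Python to give a feel for the precision involved. It assumes round-to-nearest-even and finite inputs; the rounding mode TT-NN actually uses is not stated in this tutorial, and `to_bfloat16` is a hypothetical helper, not a TT-NN function.

```python
import struct


def to_bfloat16(value: float) -> float:
    """Round a finite Python float to the nearest bfloat16 value (ties to even)."""
    # Reinterpret the value as the 32 bits of an IEEE 754 single-precision float.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add a bias so that masking off the low 16 bits rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for x in (1.0, 0.5, 0.1, 3.14159):
    y = to_bfloat16(x)
    print(f"{x} -> {y} (relative error {abs(y - x) / x:.2e})")
```

Values such as 1.0 and 0.5 survive unchanged, while a value such as 0.1 moves to the nearest representable neighbor, so small numerical differences from a float32 PyTorch reference are expected.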
Define Convolution and Pooling Stage
The conv_pool_stage function encapsulates a typical CNN stage in which an input tensor undergoes a 2D convolution, an activation, and a max pooling operation, all using Tenstorrent’s TT-NN API. It accepts an input tensor in NHWC layout, along with metadata such as the input shape, the number of output channels, the dictionary keys of the weight and bias tensors, the activation type (e.g., ReLU), and the target hardware device. First, it extracts the weight and bias tensors from the given dictionary and reshapes the bias to a broadcastable shape. It then defines the convolution parameters (kernel size, stride, and padding) and sets up a TT-NN specific configuration that includes the activation function. If logging is enabled, it records details such as tensor shapes and convolution parameters, which is useful for debugging the first sample. The convolution is performed with ttnn.conv2d, followed by a max pooling operation configured with a standard 2×2 kernel and stride. Again, if logging is enabled, the pooling parameters and resulting tensor shapes are recorded. Finally, the TT tensor produced by max pooling is returned for use in the next stage of the network. This function modularizes a common CNN pattern and can be reused for different layers, with optional debug logging.
For more information on convolution functions see: ttnn.Conv2dConfig.
[ ]:
def conv_pool_stage(
    input_tensor: ttnn.Tensor,
    input_NHWC: ttnn.Shape,
    conv_outchannels: int,
    weights: dict,
    weight_str: str,
    bias_str: str,
    activation: ttnn.UnaryWithParam,
    device: ttnn.Device,
    log_first_sample: bool = False,
) -> ttnn.Tensor:
    """
    Perform convolution + activation + max pooling using TT-NN.
    Args:
        input_tensor: Input TT tensor in NHWC format.
        input_NHWC: Tuple representing (Batch, Height, Width, Channels) of the input tensor.
        conv_outchannels: Number of output channels for the convolution layer.
        weights: Dictionary containing model weights and biases.
        weight_str: Key name for convolution weights in the weights dict.
        bias_str: Key name for convolution biases in the weights dict.
        activation: Activation function as UnaryWithParam to apply after conv.
        device: Target TT device to execute the operations on.
        log_first_sample: Whether to log detailed info (used for debugging first sample).
    Returns:
        Output tensor after conv + max pooling (TT format).
    """
    # Extract weight and bias tensors from weights dictionary
    W = weights[weight_str]
    B = weights[bias_str]
    B = ttnn.reshape(B, (1, 1, 1, -1))  # Ensure bias is in correct shape for TT-NN
    # Define convolution parameters
    conv_kernel_size = (3, 3)
    conv_stride = (1, 1)
    conv_padding = (1, 1)
    # Set up TT-NN convolution configuration including activation function
    conv_config = ttnn.Conv2dConfig(weights_dtype=ttnn.bfloat16, activation=activation)
    # Optional detailed logging for the first sample (shape, config, etc.)
    if log_first_sample:
        logger.info("=====================================================================")
        logger.info("Input parameters to conv2d:")
        logger.info(f" input_tensor shape: {input_tensor.shape}")
        logger.info(f" weight_tensor shape: {W.shape}")
        logger.info(f" bias_tensor shape: {B.shape}")
        logger.info(f" in_channels: {input_NHWC[3]}")
        logger.info(f" out_channels: {conv_outchannels}")
        logger.info(f" device: {device}")
        logger.info(f" kernel_size: {conv_kernel_size}")
        logger.info(f" stride: {conv_stride}")
        logger.info(f" padding: {conv_padding}")
        logger.info(f" batch_size: {input_NHWC[0]}")
        logger.info(f" input_height: {input_NHWC[1]}")
        logger.info(f" input_width: {input_NHWC[2]}")
        logger.info(f" conv_config: {conv_config}")
        logger.info(f" groups: {0}")
    # Perform convolution
    conv1_out = ttnn.conv2d(
        input_tensor=input_tensor,
        weight_tensor=W,
        bias_tensor=B,
        in_channels=input_NHWC[3],
        out_channels=conv_outchannels,
        device=device,
        kernel_size=conv_kernel_size,
        stride=conv_stride,
        padding=conv_padding,
        batch_size=input_NHWC[0],
        input_height=input_NHWC[1],
        input_width=input_NHWC[2],
        conv_config=conv_config,
        groups=0,
    )
    # Define max pooling parameters
    max_pool2d_kernel_size = [2, 2]
    max_pool2d_stride = [2, 2]
    max_pool2d_padding = [0, 0]
    max_pool2d_dilation = [1, 1]
    # Optional logging for max pooling input and parameters
    if log_first_sample:
        logger.info("Input parameters to max_pool2d:")
        logger.info(f" input shape: {conv1_out.shape}")
        logger.info(f" batch_size: {input_NHWC[0]}")
        logger.info(f" input_h: {input_NHWC[1]}")
        logger.info(f" input_w: {input_NHWC[2]}")
        logger.info(f" channels: {conv_outchannels}")
        logger.info(f" kernel_size: {max_pool2d_kernel_size}")
        logger.info(f" stride: {max_pool2d_stride}")
        logger.info(f" padding: {max_pool2d_padding}")
        logger.info(f" dilation: {max_pool2d_dilation}")
        logger.info(f" ceil_mode: {False}")
    # Perform max pooling
    max_pool2d_out = ttnn.max_pool2d(
        conv1_out,
        batch_size=input_NHWC[0],
        input_h=input_NHWC[1],
        input_w=input_NHWC[2],
        channels=conv_outchannels,
        kernel_size=max_pool2d_kernel_size,
        stride=max_pool2d_stride,
        padding=max_pool2d_padding,
        dilation=max_pool2d_dilation,
        ceil_mode=False,
    )
    # Log output shape after pooling
    if log_first_sample:
        logger.info(f"max_pool2d output shape: {max_pool2d_out.shape}")
        logger.info("=====================================================================")
    return max_pool2d_out
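To see how the parameters in conv_pool_stage determine the spatial size passed from one stage to the next, the following sketch applies the standard output-size formula for convolution and floor-mode pooling to a 32×32 CIFAR-10 image. It runs without TT-NN; the helper names `window_out` and `stage_out` are hypothetical and used only for illustration.

```python
def window_out(size, kernel, stride, padding, dilation=1):
    """Output length of a sliding-window op (conv or floor-mode pooling) along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def stage_out(height, width):
    """Spatial size after a 3x3/stride-1/pad-1 conv followed by 2x2/stride-2 max pooling."""
    height = window_out(height, kernel=3, stride=1, padding=1)
    width = window_out(width, kernel=3, stride=1, padding=1)
    height = window_out(height, kernel=2, stride=2, padding=0)
    width = window_out(width, kernel=2, stride=2, padding=0)
    return height, width


h1, w1 = stage_out(32, 32)   # after the first conv + pool stage
h2, w2 = stage_out(h1, w1)   # after the second conv + pool stage
flattened = 32 * h2 * w2     # 32 output channels from the second convolution

print(f"stage 1: {h1}x{w1}, stage 2: {h2}x{w2}, flattened features: {flattened}")
```

The 3×3 convolution with padding 1 preserves the spatial size and each pooling step halves it, so the flattened feature count matches the 2048 input features of fc1.weight, whose shape is (128, 2048).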
Run Inference on Test Samples
This code sample performs inference on the first five test samples from the CIFAR-10 dataset using a simple convolutional neural network (SimpleCNN) running on Tenstorrent hardware via the TT-NN API. It initializes counters that track correct predictions and the total number of samples processed. For each sample, it converts the input image from a PyTorch tensor to a TT-NN tensor and rearranges its layout from NCHW to NHWC. The image is then passed through two convolution and pooling stages using the conv_pool_stage function. The output is flattened and passed through two fully connected layers (FC1 and FC2), with ReLU applied after FC1. The weights and biases for these layers are converted to the appropriate TT-NN format, with transposing and tiling as needed. After the final logits are obtained from FC2, the output is converted back to a PyTorch tensor, and the predicted label is the index of the highest logit. The prediction is compared with the true label to update the accuracy counters, and the result for each sample is logged. Finally, the overall inference accuracy is logged after the five samples have been processed.
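The weight transpose mentioned above follows from how the weights are stored: the fc1.weight and fc2.weight shapes, (128, 2048) and (10, 128), are in (out_features, in_features) order, whereas the code transposes them before the linear call, which is consistent with an operation that computes x · W + b with W shaped (in_features, out_features). A minimal plain-Python sketch with tiny made-up matrices, using hypothetical helpers `transpose` and `linear`:

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def linear(x, w_in_out, bias):
    """Compute x @ w_in_out + bias, where w_in_out has shape (in_features, out_features)."""
    return [
        [sum(a * b for a, b in zip(row, col)) + bias[j] for j, col in enumerate(zip(*w_in_out))]
        for row in x
    ]


x = [[1.0, 2.0, 3.0]]                          # one sample, 3 input features
w_out_in = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # (out_features=2, in_features=3)
bias = [1.0, -1.0]

y = linear(x, transpose(w_out_in), bias)
print(y)
```

Each output value is the dot product of the sample with one row of the (out_features, in_features) matrix plus the corresponding bias entry.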
[ ]:
correct = 0
total = 0
# Run inference on a few test samples
for i, (image, label) in enumerate(testloader):
    if i >= 5:
        break
    # Convert image to TT tensor
    ttnn_image = ttnn.from_torch(image, layout=ttnn.ROW_MAJOR_LAYOUT, dtype=ttnn.bfloat16, device=device)
    ttnn_image_permuated = ttnn.permute(ttnn_image, (0, 2, 3, 1))  # NCHW -> NHWC
    # Only log details for first sample
    log_this = i == 0
    # Apply first conv + pool stage
    conv1_pool = conv_pool_stage(
        ttnn_image_permuated,
        ttnn_image_permuated.shape,
        16,
        weights,
        "conv1.weight",
        "conv1.bias",
        ttnn.UnaryWithParam(ttnn.UnaryOpType.RELU),
        device,
        log_first_sample=log_this,
    )
    # Apply second conv + pool stage
    conv2_pool = conv_pool_stage(
        conv1_pool,
        (1, 16, 16, 16),
        32,
        weights,
        "conv2.weight",
        "conv2.bias",
        ttnn.UnaryWithParam(ttnn.UnaryOpType.RELU),
        device,
        log_first_sample=log_this,
    )
    # Flatten for FC layers
    B, H, W, C = conv2_pool.shape
    out_flat = ttnn.to_torch(conv2_pool)  # Convert back to torch
    out_flat = out_flat.permute(0, 3, 1, 2).contiguous().view(B, -1)  # NHWC -> NCHW -> Flatten
    # Prepare fully connected layers
    W3 = weights["fc1.weight"]
    B3 = weights["fc1.bias"].reshape((1, -1))  # Reshape bias for broadcast compatibility
    W4 = weights["fc2.weight"]
    B4 = weights["fc2.bias"]
    # Convert to TT format for FC1
    W3_tt = ttnn.to_layout(ttnn.transpose(W3, 0, 1), ttnn.TILE_LAYOUT)
    B3_tt = ttnn.to_layout(B3.reshape((1, -1)), ttnn.TILE_LAYOUT)
    # Convert input to TT format
    x_tt = ttnn.from_torch(out_flat, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
    # Apply FC1 + ReLU
    out = ttnn.linear(x_tt, W3_tt, bias=B3_tt)
    out = ttnn.relu(out)
    # Convert to TT format for FC2
    W4_tt = ttnn.to_layout(ttnn.transpose(W4, 0, 1), ttnn.TILE_LAYOUT)
    B4_tt = ttnn.to_layout(B4.reshape((1, -1)), ttnn.TILE_LAYOUT)
    # Apply FC2 (output logits)
    out = ttnn.linear(out, W4_tt, bias=B4_tt)
    # Convert prediction back to torch
    prediction = ttnn.to_torch(out)
    predicted_label = torch.argmax(prediction, dim=1).item()
    correct += predicted_label == label.item()
    total += 1
    logger.info(f"Sample {i+1}: Predicted={predicted_label}, Actual={label.item()}")
logger.info(f"\nTT-NN SimpleCNN Inference Accuracy: {correct}/{total} = {100.0 * correct / total:.2f}%")
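The loop permutes the pooled output from NHWC back to NCHW before flattening. A likely reason, assuming the weights were trained in PyTorch on NCHW-flattened features, is that fc1 expects features in channel-major order. The sketch below uses plain index arithmetic on an 8×8×32 activation to show that the two flattening orders place the same element at different positions; the offset helpers are hypothetical names.

```python
H, W, C = 8, 8, 32  # spatial size and channels after the second conv + pool stage


def nchw_offset(c, h, w):
    """Position of element (c, h, w) in a row-major flattened (C, H, W) tensor."""
    return (c * H + h) * W + w


def nhwc_offset(h, w, c):
    """Position of element (h, w, c) in a row-major flattened (H, W, C) tensor."""
    return (h * W + w) * C + c


# For each slot of the NCHW-flattened vector, record where that element sits in NHWC order.
gather = [0] * (C * H * W)
for c in range(C):
    for h in range(H):
        for w in range(W):
            gather[nchw_offset(c, h, w)] = nhwc_offset(h, w, c)

nhwc_flat = list(range(C * H * W))           # stand-in data: each value is its own NHWC offset
nchw_flat = [nhwc_flat[i] for i in gather]   # reorder into the layout fc1 was trained on

print(nchw_flat[:4])  # [0, 32, 64, 96]: channel 0 at the first four pixels of row 0
```

Flattening the NHWC tensor directly would feed fc1 the same 2048 values in a different order, which would silently degrade predictions rather than raise an error.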
Close the Device
[ ]:
ttnn.close_device(device)
We have built and run a simple CNN using Tenstorrent’s TT-NN library on the CIFAR-10 dataset, observed predictions, and computed accuracy on a few samples.
For full-scale inference or training, pre-trained weights should be used, and additional optimization strategies may be applied.
Full Example and Output
Let's put everything together in a complete example that can be run directly.
Running this script will generate the following output:
$ python3 $TT_METAL_HOME/ttnn/tutorials/basic_python/ttnn_simplecnn_inference.py
2025-07-07 13:10:17.041 | info | SiliconDriver | Opened PCI device 7; KMD version: 1.34.0; API: 1; IOMMU: disabled (pci_device.cpp:198)
2025-07-07 13:10:17.043 | info | SiliconDriver | Opened PCI device 7; KMD version: 1.34.0; API: 1; IOMMU: disabled (pci_device.cpp:198)
2025-07-07 13:10:17.050 | info | Device | Opening user mode device driver (tt_cluster.cpp:190)
2025-07-07 13:10:17.050 | info | SiliconDriver | Opened PCI device 7; KMD version: 1.34.0; API: 1; IOMMU: disabled (pci_device.cpp:198)
2025-07-07 13:10:17.051 | info | SiliconDriver | Opened PCI device 7; KMD version: 1.34.0; API: 1; IOMMU: disabled (pci_device.cpp:198)
2025-07-07 13:10:17.057 | info | SiliconDriver | Opened PCI device 7; KMD version: 1.34.0; API: 1; IOMMU: disabled (pci_device.cpp:198)
2025-07-07 13:10:17.058 | info | SiliconDriver | Opened PCI device 7; KMD version: 1.34.0; API: 1; IOMMU: disabled (pci_device.cpp:198)
2025-07-07 13:10:17.064 | info | SiliconDriver | Harvesting mask for chip 0 is 0x100 (NOC0: 0x100, simulated harvesting mask: 0x0). (cluster.cpp:282)
2025-07-07 13:10:17.161 | info | SiliconDriver | Opened PCI device 7; KMD version: 1.34.0; API: 1; IOMMU: disabled (pci_device.cpp:198)
2025-07-07 13:10:17.224 | info | SiliconDriver | Opening local chip ids/pci ids: {0}/[7] and remote chip ids {} (cluster.cpp:147)
2025-07-07 13:10:17.235 | info | SiliconDriver | Software version 6.0.0, Ethernet FW version 6.14.0 (Device 0) (cluster.cpp:1039)
2025-07-07 13:10:17.321 | info | Metal | AI CLK for device 0 is: 1000 MHz (metal_context.cpp:128)
2025-07-07 13:10:17.889 | info | Metal | Initializing device 0. Program cache is enabled (device.cpp:428)
2025-07-07 13:10:17.891 | warning | Metal | Unable to bind worker thread to CPU Core. May see performance degradation. Error Code: 22 (hardware_command_queue.cpp:74)
2025-07-07 13:10:19.734 | INFO | __main__:main:15 -
--- Simple CNN Inference Using TT-NN on CIFAR-10 ---
Files already downloaded and verified
2025-07-07 13:10:20.471 | INFO | __main__:main:30 - Loaded pretrained weights
2025-07-07 13:10:21.075 | INFO | __main__:conv_pool_stage:86 - =====================================================================
2025-07-07 13:10:21.075 | INFO | __main__:conv_pool_stage:87 - Input parameters to conv2d:
2025-07-07 13:10:21.075 | INFO | __main__:conv_pool_stage:88 - input_tensor shape: Shape([1, 32, 32, 3])
2025-07-07 13:10:21.075 | INFO | __main__:conv_pool_stage:89 - weight_tensor shape: Shape([16, 3, 3, 3])
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:90 - bias_tensor shape: Shape([1, 1, 1, 16])
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:91 - in_channels: 3
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:92 - out_channels: 16
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:93 - device: MeshDevice(1x1 grid, 1 devices)
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:94 - kernel_size: (3, 3)
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:95 - stride: (1, 1)
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:96 - padding: (1, 1)
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:97 - batch_size: 1
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:98 - input_height: 32
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:99 - input_width: 32
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:100 - conv_config: Conv2dConfig(weights_dtype=DataType::BFLOAT16,activation=relu,deallocate_activation=0,reallocate_halo_output=1,act_block_h_override=0,act_block_w_div=1,reshard_if_not_optimal=0,override_sharding_config=0,shard_layout=std::nullopt,core_grid=std::nullopt,transpose_shards=0,output_layout=Layout::TILE,enable_act_double_buffer=0,enable_weights_double_buffer=0,enable_split_reader=0,enable_subblock_padding=0,in_place=0,enable_kernel_stride_folding=0)
2025-07-07 13:10:21.076 | INFO | __main__:conv_pool_stage:101 - groups: 0
2025-07-07 13:10:22.960 | INFO | __main__:conv_pool_stage:129 - Input parameters to max_pool2d:
2025-07-07 13:10:22.960 | INFO | __main__:conv_pool_stage:130 - input shape: Shape([1, 1, 1024, 16])
2025-07-07 13:10:22.960 | INFO | __main__:conv_pool_stage:131 - batch_size: 1
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:132 - input_h: 32
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:133 - input_w: 32
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:134 - channels: 16
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:135 - kernel_size: [2, 2]
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:136 - stride: [2, 2]
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:137 - padding: [0, 0]
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:138 - dilation: [1, 1]
2025-07-07 13:10:22.961 | INFO | __main__:conv_pool_stage:139 - ceil_mode: False
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:157 - max_pool2d output shape: Shape([1, 1, 256, 32])
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:158 - =====================================================================
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:86 - =====================================================================
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:87 - Input parameters to conv2d:
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:88 - input_tensor shape: Shape([1, 1, 256, 32])
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:89 - weight_tensor shape: Shape([32, 16, 3, 3])
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:90 - bias_tensor shape: Shape([1, 1, 1, 32])
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:91 - in_channels: 16
2025-07-07 13:10:24.026 | INFO | __main__:conv_pool_stage:92 - out_channels: 32
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:93 - device: MeshDevice(1x1 grid, 1 devices)
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:94 - kernel_size: (3, 3)
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:95 - stride: (1, 1)
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:96 - padding: (1, 1)
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:97 - batch_size: 1
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:98 - input_height: 16
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:99 - input_width: 16
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:100 - conv_config: Conv2dConfig(weights_dtype=DataType::BFLOAT16,activation=relu,deallocate_activation=0,reallocate_halo_output=1,act_block_h_override=0,act_block_w_div=1,reshard_if_not_optimal=0,override_sharding_config=0,shard_layout=std::nullopt,core_grid=std::nullopt,transpose_shards=0,output_layout=Layout::TILE,enable_act_double_buffer=0,enable_weights_double_buffer=0,enable_split_reader=0,enable_subblock_padding=0,in_place=0,enable_kernel_stride_folding=0)
2025-07-07 13:10:24.027 | INFO | __main__:conv_pool_stage:101 - groups: 0
2025-07-07 13:10:25.120 | INFO | __main__:conv_pool_stage:129 - Input parameters to max_pool2d:
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:130 - input shape: Shape([1, 1, 256, 32])
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:131 - batch_size: 1
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:132 - input_h: 16
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:133 - input_w: 16
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:134 - channels: 32
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:135 - kernel_size: [2, 2]
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:136 - stride: [2, 2]
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:137 - padding: [0, 0]
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:138 - dilation: [1, 1]
2025-07-07 13:10:25.121 | INFO | __main__:conv_pool_stage:139 - ceil_mode: False
2025-07-07 13:10:25.669 | INFO | __main__:conv_pool_stage:157 - max_pool2d output shape: Shape([1, 1, 64, 32])
2025-07-07 13:10:25.669 | INFO | __main__:conv_pool_stage:158 - =====================================================================
2025-07-07 13:10:30.120 | INFO | __main__:main:238 - Sample 1: Predicted=8, Actual=3
2025-07-07 13:10:30.136 | INFO | __main__:main:238 - Sample 2: Predicted=8, Actual=8
2025-07-07 13:10:30.151 | INFO | __main__:main:238 - Sample 3: Predicted=8, Actual=8
2025-07-07 13:10:30.166 | INFO | __main__:main:238 - Sample 4: Predicted=0, Actual=0
2025-07-07 13:10:30.181 | INFO | __main__:main:238 - Sample 5: Predicted=6, Actual=6
2025-07-07 13:10:30.181 | INFO | __main__:main:240 -
TT-NN SimpleCNN Inference Accuracy: 4/5 = 80.00%
2025-07-07 13:10:30.181 | info | Metal | Closing mesh device 1 (mesh_device.cpp:488)
2025-07-07 13:10:30.182 | info | Metal | Closing mesh device 0 (mesh_device.cpp:488)
2025-07-07 13:10:30.182 | info | Metal | Closing device 0 (device.cpp:468)
2025-07-07 13:10:30.182 | info | Metal | Disabling and clearing program cache on device 0 (device.cpp:783)
2025-07-07 13:10:30.183 | info | Metal | Closing mesh device 1 (mesh_device.cpp:488)