In this age of GPU compute, the haves and have-nots are defined by the GPU compute available to them. Accessing GPUs can be a challenge, especially since only two free options are at our disposal, namely Kaggle and Colab; Gradient requires storing credit-card information before it lets us access its RTX4000 and P4000 GPUs.¶

To check which one is better for our use, we can run a matrix multiplication (as suggested by Grok) and benchmark the details (Claude).¶

The code suggested by Grok and improved by Claude is given below, with little notes (me trying to understand what happens); the results from both Kaggle and Colab are reproduced as screengrabs for the different accelerators available on each platform.¶

In [ ]:
#Code suggested by Grok- 
import tensorflow as tf
import time

# Check GPU availability     # first we check whether a GPU is actually attached to this session
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

# Define matrix size
matrix_size = 10000      #a large matrix to imitate the deep learning operations

# Create two large matrices
a = tf.random.normal([matrix_size, matrix_size])
b = tf.random.normal([matrix_size, matrix_size])

# Benchmark GPU performance
def benchmark(gpu):
    with tf.device('/device:GPU:0' if gpu else '/device:CPU:0'):
        start_time = time.time()
        c = tf.matmul(a, b)
        delta = time.time() - start_time
        print(f"{device_name if gpu else 'CPU'} matrix multiplication took {delta:.2f} seconds")

# Run benchmarks
benchmark(True)  # GPU
benchmark(False) # CPU for comparison
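One caveat with timing eager TensorFlow ops this way: GPU kernels can be dispatched asynchronously, so the clock may stop before the multiply has actually finished (forcing the result back to the host, e.g. via `.numpy()`, is one way to be sure). Below is a minimal, framework-free sketch of the warm-up-and-repeat timing methodology; `time_op` is a helper name of my own, not from either AI's code:

```python
import time
import statistics

def time_op(fn, warmup=1, repeats=3):
    """Time a callable: run `warmup` untimed iterations first (to pay
    one-off costs like kernel compilation), then return the mean and
    population stdev of `repeats` timed runs, in seconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, higher resolution than time.time()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.pstdev(samples)

# Example with a pure-Python workload:
mean_s, std_s = time_op(lambda: sum(i * i for i in range(100_000)))
print(f"{mean_s:.4f}s ± {std_s:.4f}s")
```

With TensorFlow, the callable could be `lambda: tf.matmul(a, b).numpy()` so the measured time includes device-to-host synchronization.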

GPU P100 on Kaggle

[screenshot: P100 benchmark output]

GPU T4 x 2 on Kaggle

  • Found GPU at: /device:GPU:0
  • /device:GPU:0 matrix multiplication took 0.14 seconds
  • CPU matrix multiplication took 10.72 seconds

On Colab, though there are two options listed as free, only one (the T4 GPU) is actually available:¶

[screenshot: Colab T4 benchmark output]

Conclusion 1- for a matrix size of 10000, the GPU P100 is fastest, while Colab takes roughly three times as long as the P100; and as the name suggests, Kaggle offers T4 x 2 while Colab has only one T4. So, in the Grok-suggested test, Kaggle has the upper hand.¶
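A quick sanity check on these numbers: an n x n matrix multiply performs roughly 2·n³ floating-point operations, so the reported times can be converted into an achieved-throughput estimate. This is only a back-of-the-envelope figure (eager-mode timing may not capture full synchronization, as noted above), and `matmul_tflops` is a helper name of my own:

```python
def matmul_tflops(n: int, seconds: float) -> float:
    """Rough achieved throughput of an n x n matmul in TFLOP/s.

    A dense n x n matrix multiply costs about 2 * n**3 flops
    (n**3 multiplies plus roughly n**3 additions).
    """
    return 2 * n ** 3 / seconds / 1e12

# The 0.14 s GPU time reported above for n = 10000:
print(f"{matmul_tflops(10_000, 0.14):.1f} TFLOP/s")  # roughly 14 TFLOP/s
```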

Grok's code improved by Claude¶

In [ ]:
import tensorflow as tf
import time
import platform
import psutil
import numpy as np
from datetime import datetime  # importing all the required libraries - generally Claude is better at coding

def get_system_info():
    gpu_info = []
    try:
        gpu_devices = tf.config.list_physical_devices('GPU')
        for device in gpu_devices:
            gpu_details = tf.config.experimental.get_device_details(device)
            gpu_info.append(gpu_details)
    except Exception:  # not a bare except, so KeyboardInterrupt etc. still propagate
        gpu_info = "Error getting GPU info"
    
    return {
        "Platform": platform.platform(),
        "Python Version": platform.python_version(),
        "TensorFlow Version": tf.__version__,
        "RAM": f"{psutil.virtual_memory().total / (1024.0 ** 3):.1f} GB",
        "GPU Info": gpu_info
    }  # gathering platform, library-version, and hardware details in one place

def run_benchmark(matrix_sizes=[5000, 10000, 15000]):  # Claude's improvement: three matrix sizes instead of one, plus warm-up runs and repeated timings around the same matmul
    results = []
    
    for size in matrix_sizes:
        print(f"\nRunning benchmark for {size}x{size} matrix...")
        
        a = tf.random.normal([size, size])
        b = tf.random.normal([size, size])
        
        with tf.device('/device:GPU:0'):
            _ = tf.matmul(a, b)  # warm-up run, excluded from timing
            gpu_times = []
            for _ in range(3):
                start_time = time.time()
                _ = tf.matmul(a, b)
                gpu_times.append(time.time() - start_time)
        
        with tf.device('/device:CPU:0'):
            _ = tf.matmul(a, b)  # warm-up run, excluded from timing
            cpu_times = []
            for _ in range(3):
                start_time = time.time()
                _ = tf.matmul(a, b)
                cpu_times.append(time.time() - start_time)
        
        results.append({
            "matrix_size": size,
            "gpu_mean_time": np.mean(gpu_times),
            "gpu_std_time": np.std(gpu_times),
            "cpu_mean_time": np.mean(cpu_times),
            "cpu_std_time": np.std(cpu_times),
            "speedup": np.mean(cpu_times) / np.mean(gpu_times)
        })
        
        print(f"GPU mean time: {results[-1]['gpu_mean_time']:.3f}s ± {results[-1]['gpu_std_time']:.3f}s")
        print(f"CPU mean time: {results[-1]['cpu_mean_time']:.3f}s ± {results[-1]['cpu_std_time']:.3f}s")
        print(f"GPU Speedup: {results[-1]['speedup']:.1f}x")
    
    return results #saving and printing results

def save_results(system_info, results, platform_name):
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    result_str = f"# GPU Benchmark Results - {platform_name}\nRun at: {timestamp}\n\n## System Information\n```\n{system_info}\n```\n\n## Benchmark Results\n| Matrix Size | GPU Time (s) | CPU Time (s) | Speedup |\n|------------|--------------|--------------|---------|"
    
    for r in results:
        result_str += f"\n| {r['matrix_size']}x{r['matrix_size']} | {r['gpu_mean_time']:.3f} ± {r['gpu_std_time']:.3f} | {r['cpu_mean_time']:.3f} ± {r['cpu_std_time']:.3f} | {r['speedup']:.1f}x |"
    
    return result_str

if __name__ == "__main__":
    PLATFORM_NAME = "YOUR_PLATFORM_NAME"
    print("Getting system information...")
    system_info = get_system_info()
    print("\nRunning benchmarks...")
    results = run_benchmark()
    print("\nGenerating report...")
    report = save_results(system_info, results, PLATFORM_NAME)
    print("\nReport:")
    print(report) #again, quite a bit of improvement, detailing a comprehensive report
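One gap in the improved code: `save_results` builds the markdown report string but never writes it anywhere, so the report is lost when the session ends. A small sketch for persisting it with a timestamped filename (the `write_report` name and filename pattern are my own choices, not part of the Claude code):

```python
from datetime import datetime
from pathlib import Path

def write_report(report: str, platform_name: str, out_dir: str = ".") -> Path:
    """Write the markdown report to a timestamped .md file so runs
    from different platforms can be compared later. Returns the path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"benchmark_{platform_name}_{stamp}.md"
    path.write_text(report)
    return path

# Example with a dummy report string:
saved = write_report("# GPU Benchmark Results - demo\n", "demo")
print(f"Saved to {saved}")
```

On Kaggle and Colab the working directory is wiped when the session ends, so the file still needs to be downloaded or committed to persist across sessions.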

Kaggle GPU T4 x 2¶

[screenshot: benchmark output]

Kaggle GPU P100¶

[screenshot: benchmark output]

Colab T4¶

[screenshot: benchmark output]

Conclusion -2¶

As we can see, all the GPUs take almost no time across all three matrix sizes.¶

RAM size: ~32 GB on Kaggle, ~13 GB on Colab¶

GPU speedup (used for gauging the GPU's performance gain on intensive tasks): the Colab T4's speedup is more than double that of Kaggle's T4 x 2, though a higher speedup can also simply reflect a slower host CPU.¶

So, even though the GPU speedup is higher on Colab, the RAM size constraint favours running larger datasets and models on Kaggle rather than Colab.¶
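To see why RAM matters here, a quick back-of-the-envelope estimate (`matrix_gib` is my own helper name): the benchmark's largest case fits comfortably on both platforms, but real datasets and model activations grow the same way and will hit Colab's ~13 GB ceiling long before Kaggle's ~32 GB.

```python
def matrix_gib(n: int, bytes_per_elem: int = 4) -> float:
    """Memory for one n x n matrix of float32 (4-byte) elements, in GiB."""
    return n * n * bytes_per_elem / 1024 ** 3

# The benchmark's largest case needs three matrices: a, b, and the product.
total = 3 * matrix_gib(15_000)
print(f"~{total:.2f} GiB for three 15000x15000 float32 matrices")
```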
