While optimizing a transformer can lower time and memory requirements, it might result in a minor decrease in performance. Therefore, it’s crucial to evaluate the model’s performance after applying these optimization techniques.
Deploying transformers in production environments involves a trade-off among several constraints, the most common being model performance, inference latency, and memory footprint.
Let’s start by developing a basic benchmark that evaluates each metric for a specific pipeline and test set.
import numpy as np
import torch
from pathlib import Path
from time import perf_counter

class PerformanceBenchmark:
    def __init__(self, pipeline, dataset, optim_type="BERT baseline"):
        self.pipeline = pipeline
        self.dataset = dataset
        self.optim_type = optim_type

    def compute_accuracy(self):
        preds, labels = [], []
        for example in self.dataset:
            pred = self.pipeline(example["text"])[0]["label"]
            label = example["intent"]
            # `intents` is the dataset's ClassLabel feature (defined
            # elsewhere); it maps the predicted string to its integer id
            preds.append(intents.str2int(pred))
            labels.append(label)
        # `accuracy_score` is an accuracy metric object loaded beforehand
        accuracy = accuracy_score.compute(predictions=preds, references=labels)
        return accuracy

    def compute_size(self):
        state_dict = self.pipeline.model.state_dict()
        tmp_path = Path("model.pt")
        torch.save(state_dict, tmp_path)
        # Calculate size in megabytes
        size_mb = tmp_path.stat().st_size / (1024 * 1024)
        # Delete temporary file
        tmp_path.unlink()
        return {"size_mb": size_mb}

    def time_pipeline(self):
        # `query` is a sample input defined elsewhere
        latencies = []
        # Warmup
        for _ in range(10):
            _ = self.pipeline(query)
        # Timed run
        for _ in range(100):
            start_time = perf_counter()
            _ = self.pipeline(query)
            latency = perf_counter() - start_time
            latencies.append(latency)
        # Compute run statistics
        time_avg_ms = 1000 * np.mean(latencies)
        time_std_ms = 1000 * np.std(latencies)
        return {"time_avg_ms": time_avg_ms, "time_std_ms": time_std_ms}

    def run_benchmark(self):
        metrics = {}
        metrics[self.optim_type] = self.compute_size()
        metrics[self.optim_type].update(self.time_pipeline())
        metrics[self.optim_type].update(self.compute_accuracy())
        return metrics

The PerformanceBenchmark class evaluates a machine learning pipeline's performance along three axes: on-disk size, inference latency, and accuracy.
The compute_accuracy method calculates accuracy by comparing predicted and actual labels.
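In isolation, that comparison reduces to the fraction of predictions that match their reference labels. A minimal pure-Python equivalent of the quantity the metric object computes (the `accuracy` helper below is illustrative, not part of the class):

```python
def accuracy(preds, labels):
    # Fraction of predictions that match their reference labels --
    # the same quantity compute_accuracy delegates to its metric object.
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

print(accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
```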
The compute_size method determines the model’s size in megabytes by temporarily saving and measuring the state dictionary.
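This save-measure-delete pattern works for any serializable object, not just a PyTorch state dictionary. A standalone sketch using pickle in place of torch.save (`object_size_mb` is a hypothetical helper name):

```python
import pickle
from pathlib import Path

def object_size_mb(obj, tmp_path=Path("obj.pkl")):
    # Serialize to a temporary file, read its size from the filesystem,
    # then remove the file -- the same three steps compute_size performs
    # with the model's state_dict.
    with open(tmp_path, "wb") as f:
        pickle.dump(obj, f)
    size_mb = tmp_path.stat().st_size / (1024 * 1024)
    tmp_path.unlink()
    return size_mb

print(object_size_mb({"weights": [0.0] * 100_000}))
```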
The time_pipeline method measures inference latency, with a warm-up phase so that one-off initialization costs do not skew the timings. Finally, run_benchmark compiles these metrics into a comprehensive performance report, organizing the results by optimization type.
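The warm-up-then-measure pattern generalizes to any callable. A self-contained sketch that mirrors the timing logic (`time_callable` is a hypothetical helper, not part of the class):

```python
from time import perf_counter
import numpy as np

def time_callable(fn, warmup=10, runs=100):
    # Discard warm-up calls, then report the mean and standard deviation
    # of latency over repeated timed runs, as time_pipeline does.
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = perf_counter()
        fn()
        latencies.append(perf_counter() - start)
    return {"time_avg_ms": 1000 * np.mean(latencies),
            "time_std_ms": 1000 * np.std(latencies)}

print(time_callable(lambda: sum(range(10_000))))
```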