Written by Nickolay Shmyrev
on November 28, 2019

Spectre and deep learning

I noticed a big slowdown in RELU layer performance recently, essentially the RELU operation can now take up to 10% in the total CPU count. This is with kernel 4.15. On older machines everything is just fine.

RELU is a computation of max(x, 0) for a vector of floats, so I suspect a Spectre patch which should significantly slowdown CPU branch prediction. Who could think about that.

The solution seems to be:

diff --git a/src/matrix/kaldi-matrix.cc b/src/matrix/kaldi-matrix.cc
index faf23cdf0..3ef686310 100644
--- a/src/matrix/kaldi-matrix.cc
+++ b/src/matrix/kaldi-matrix.cc
@@ -2164,8 +2164,10 @@ void MatrixBase
   const Real *src_row_data = src.Data();
   for (MatrixIndexT row = 0; row < num_rows;
        row++,row_data += stride_, src_row_data += src.stride_) {
-    for (MatrixIndexT col = 0; col < num_cols; col++)
-      row_data[col] = (src_row_data[col] < floor_val ? floor_val : src_row_data[col]);
+    for (MatrixIndexT col = 0; col < num_cols; col++) {
+      Real diff = src_row_data[col] - floor_val;
+      row_data[col] = (src_row_data[col] + floor_val + std::abs(diff)) * 0.5;
+    }
   }
 }

← Top →