Written by Nickolay Shmyrev on
Spectre and deep learning
I recently noticed a big slowdown in RELU layer performance: the RELU operation can now take up to 10% of the total CPU time. This is with Linux kernel 4.15; on older machines everything is fine.
RELU computes max(x, 0) over a vector of floats, which means a data-dependent comparison for every element, so I suspect the Spectre patches, which are supposed to significantly slow down CPU branch prediction.
Who would have thought.
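To check whether this loop is really where the time goes on a given machine, one can time the comparison-based floor in isolation. The sketch below is a hypothetical standalone reduction of the Kaldi loop to a flat array; the names FloorSelect and TimeFloorSelect are illustrative, not Kaldi API. It uses random-sign inputs, which would be the worst case for the branch predictor if the compiler turns the ternary into a conditional jump. Whether it does so, rather than emitting a branch-free instruction such as maxss, depends on the compiler and optimization flags, so a timing like this is only meaningful together with a look at the generated assembly.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Comparison-based floor, mirroring the original Kaldi loop on a flat array
// (hypothetical helper). With floor_val = 0 this is RELU, max(x, 0).
template <typename Real>
void FloorSelect(const std::vector<Real> &src, std::vector<Real> &dst,
                 Real floor_val) {
  dst.resize(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    dst[i] = (src[i] < floor_val ? floor_val : src[i]);  // data-dependent select
}

// Times `repeats` passes of FloorSelect over n random-sign floats and returns
// the elapsed wall-clock time in seconds (hypothetical helper).
double TimeFloorSelect(std::size_t n, int repeats, unsigned seed = 42) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> dist(-1.0f, 1.0f);  // about half negative
  std::vector<float> src(n), dst;
  for (float &x : src) x = dist(gen);

  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repeats; ++r) FloorSelect(src, dst, 0.0f);
  auto stop = std::chrono::steady_clock::now();

  // Read the output so the compiler cannot discard the timed loop.
  volatile float sink = 0.0f;
  for (float x : dst) sink = sink + x;

  return std::chrono::duration<double>(stop - start).count();
}
```

Running something like TimeFloorSelect(1 << 20, 100) on the old and the new kernel, with the same binary, gives a rough comparison; absolute numbers will vary with CPU, compiler and flags.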
The solution seems to be to replace the comparison with a branch-free formula, max(x, f) = (x + f + |x - f|) / 2:
```diff
diff --git a/src/matrix/kaldi-matrix.cc b/src/matrix/kaldi-matrix.cc
index faf23cdf0..3ef686310 100644
--- a/src/matrix/kaldi-matrix.cc
+++ b/src/matrix/kaldi-matrix.cc
@@ -2164,8 +2164,10 @@ void MatrixBase
   const Real *src_row_data = src.Data();
   for (MatrixIndexT row = 0; row < num_rows;
        row++,row_data += stride_, src_row_data += src.stride_) {
-    for (MatrixIndexT col = 0; col < num_cols; col++)
-      row_data[col] = (src_row_data[col] < floor_val ? floor_val : src_row_data[col]);
+    for (MatrixIndexT col = 0; col < num_cols; col++) {
+      Real diff = src_row_data[col] - floor_val;
+      row_data[col] = (src_row_data[col] + floor_val + std::abs(diff)) * 0.5;
+    }
   }
 }
```
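The patch relies on the identity max(x, f) = (x + f + |x - f|) / 2: when x >= f, |x - f| = x - f and the sum is 2x; otherwise it is 2f. Since std::abs on a float only clears the sign bit, no comparison is needed. The sketch below isolates this in standalone functions (FloorSelectOne, FloorAbsOne and FloorAbs are hypothetical names, not Kaldi API). It multiplies by Real(0.5) instead of the double literal 0.5 used in the patch, so float values are not promoted to double.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Comparison-based floor for a single value, as in the original loop.
template <typename Real>
Real FloorSelectOne(Real x, Real floor_val) {
  return x < floor_val ? floor_val : x;
}

// Branch-free floor for a single value, as in the patch:
// max(x, f) = (x + f + |x - f|) / 2.
template <typename Real>
Real FloorAbsOne(Real x, Real floor_val) {
  Real diff = x - floor_val;
  return (x + floor_val + std::abs(diff)) * Real(0.5);
}

// Applies the branch-free floor to every element of src, writing into dst.
template <typename Real>
void FloorAbs(const std::vector<Real> &src, std::vector<Real> &dst,
              Real floor_val) {
  dst.resize(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    dst[i] = FloorAbsOne(src[i], floor_val);
}
```

With floor_val = 0, the RELU case, the two forms compare equal for finite inputs that do not overflow when doubled (a -0.0 input comes out as +0.0, which still compares equal). For a nonzero floor the rounding of x - f and x + f can change the result: with single-precision arithmetic, x = 1 and f = -1e8 gives 0 instead of 1.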