c++ - No speedup with OpenMP

- February 15, 2012

I am working with OpenMP in order to obtain an algorithm with a near-linear speedup. Unfortunately, I noticed that I could not get the desired speedup.

So, in order to understand the error in my code, I wrote another code, an easy one, just to double-check that the speedup was in principle obtainable on my hardware.

This is the toy example I wrote:

    #include <omp.h>
    #include <cmath>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <string.h>
    #include <cstdlib>
    #include <fstream>
    #include <sstream>
    #include <iomanip>
    #include <iostream>
    #include <stdexcept>
    #include <algorithm>
    #include "mkl.h"  // not used by this example; requires Intel MKL

    int main () {
        int number_of_threads = 1;
        int n = 600;                      // array size
        int m = 50;
        int N = n/number_of_threads;      // outer-loop length per thread
        int time_limit = 600;             // seconds
        double total_clock = omp_get_wtime();
        int time_flag = 0;                // shared: set once the time limit is reached

        #pragma omp parallel num_threads(number_of_threads)
        {
            int thread_id = omp_get_thread_num();
            int iteration_number_local = 0;
            // Each thread works on its own private arrays.
            double *C = new double[n]; std::fill(C, C+n, 3.0);
            double *D = new double[n]; std::fill(D, D+n, 3.0);
            double *CD = new double[n]; std::fill(CD, CD+n, 0.0);

            while (time_flag == 0){
                for (int i = 0; i < N; i++)
                    for(int z = 0; z < m; z++)
                        for(int x = 0; x < n; x++)
                            for(int c = 0; c < n; c++){
                                CD[c] = C[z]*D[x];
                                C[z] = CD[c] + D[x];
                            }
                iteration_number_local++;
                // The limit is only checked after a complete iteration.
                if ((omp_get_wtime() - total_clock) >= time_limit)
                    time_flag = 1;
            }
            #pragma omp critical
            std::cout<<"I am "<<thread_id<<" and I got "<<iteration_number_local<<" iterations."<<std::endl;
        }
    }

I want to highlight again that this code is only a toy example to try to see the speedup: the first for-cycle becomes shorter when the number of parallel threads increases (since its trip count, n/number_of_threads, decreases).

However, when I go from 1 to 2-4 threads, the number of iterations doubles as expected; but this is not the case when I use 8-10-20 threads: the number of iterations does not increase linearly with the number of threads.

Could you please help me with this? Is the code correct? Should I expect a near-linear speedup?

Results

Running the code above, I got the following results.

1 thread: 23 iterations.

20 threads: 397-401 iterations per thread (instead of 420-460).

Your measurement methodology is wrong, especially for a small number of iterations.

1 thread: 3 iterations.

3 reported iterations means that 2 iterations finished in less than 120 s. The third one took longer. The time of 1 iteration is between 40 and 60 s.

2 threads: 5 iterations per thread (instead of 6).

4 iterations finished in less than 120 s. The time of 1 iteration is between 24 and 30 s.

20 threads: 40-44 iterations per thread (instead of 60).

40 iterations finished in less than 120 s. The time of 1 iteration is between 2.9 and 3 s.

As you can see, your results do not contradict a linear speedup.
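The arithmetic behind these bounds can be sketched as a small helper. This is only an illustration: the function name iteration_time_bounds is hypothetical, and it assumes that all iterations take the same time and that the time limit is checked only at the end of an iteration, as in your toy example.

```cpp
#include <cassert>
#include <utility>

// Bounds on the duration of one iteration, given the iteration count k that
// a thread reports after a time limit of `limit` seconds.
// Assumption: k-1 iterations finished before the limit and the k-th one
// finished at or after it, so (k-1)*t < limit <= k*t.
// Returns {lower, upper} = {limit / k, limit / (k - 1)}.
std::pair<double, double> iteration_time_bounds(int k, double limit) {
    assert(k >= 2);  // with k == 1 there is no finite upper bound
    return {limit / k, limit / (k - 1)};
}
```

With a limit of 120 s, a reported count of 3 gives 40 to 60 s and a count of 5 gives 24 to 30 s. A count of 41 gives roughly 2.93 to 3 s, which is still compatible with one twentieth of the single-thread range (2 to 3 s).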

It would be simpler and more accurate to execute and time one single outer loop, and you will likely see an almost perfect linear speedup.
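A minimal sketch of that idea, assuming the same kernel as in your toy example: fix the total amount of work, split it into equal chunks, and measure the wall-clock time once. The names toy_kernel and time_fixed_work are hypothetical, and std::chrono is used here instead of omp_get_wtime only so that the sketch also builds without OpenMP.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// The kernel from the toy example, run for a fixed number of outer
// iterations. Returns C[0] so the compiler cannot discard the work.
// For large sizes the values overflow to infinity, which is harmless here.
double toy_kernel(int outer, int m, int n) {
    assert(m <= n);
    std::vector<double> C(n, 3.0), D(n, 3.0), CD(n, 0.0);
    for (int i = 0; i < outer; i++)
        for (int z = 0; z < m; z++)
            for (int x = 0; x < n; x++)
                for (int c = 0; c < n; c++) {
                    CD[c] = C[z] * D[x];
                    C[z] = CD[c] + D[x];
                }
    return C[0];
}

// Wall-clock seconds needed to run `outer_total` outer iterations, split
// into `threads` equal chunks. If OpenMP is not enabled at compile time,
// the pragma is ignored and the chunks simply run one after another.
double time_fixed_work(int threads, int outer_total, int m, int n) {
    assert(threads > 0 && outer_total % threads == 0);
    double checksum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for num_threads(threads) reduction(+ : checksum)
    for (int t = 0; t < threads; t++)
        checksum += toy_kernel(outer_total / threads, m, n);
    const auto stop = std::chrono::steady_clock::now();
    volatile double sink = checksum;  // keep the result observable
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

The speedup for p threads is then time_fixed_work(1, ...) divided by time_fixed_work(p, ...) with otherwise identical arguments. OpenMP has to be enabled when compiling (for example with -fopenmp on GCC), otherwise both calls run serially.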

Some reasons (non-exhaustive) why you might not see a linear speedup are:

Memory-bound performance. This is not the case in your toy example with n = 1000. More generally speaking: contention for a shared resource (main memory, caches, I/O).
Synchronization between threads (e.g. critical sections). This is not the case in your toy example.
Load imbalance between threads. This is not the case in your toy example.
Turbo mode will use lower frequencies when all cores are utilized. This can happen in your toy example.
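To illustrate the synchronization point, here is a hypothetical pair of functions (not taken from your code) that compute the same sum. In the first, every update goes through a critical section, so the threads take turns. In the second, the reduction clause lets each thread accumulate privately and combines the partial sums once at the end.

```cpp
#include <vector>

// Every update passes through a critical section: only one thread at a
// time can execute it, which serializes almost all of the loop body.
long sum_with_critical(const std::vector<int>& v) {
    long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        total += v[i];
    }
    return total;
}

// Each thread accumulates into a private copy of `total`; OpenMP adds the
// private copies together when the loop ends.
long sum_with_reduction(const std::vector<int>& v) {
    long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Both return the same value; the difference is only in how much the threads have to wait for each other, so timing them against each other on your hardware is the way to see the effect.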

From your toy example, I would say that your approach to OpenMP can be improved by making better use of the high-level abstractions, e.g. for.
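As a sketch of what the for construct looks like, assuming a loop whose iterations are independent (the function elementwise_product is a hypothetical example, not the kernel from your toy code): instead of dividing the iteration count by the number of threads by hand, a single directive lets OpenMP distribute the iterations.

```cpp
#include <cassert>
#include <vector>

// Element-wise product CD[i] = C[i] * D[i]. No iteration reads a value
// written by another one, so the loop can be shared among threads with a
// single worksharing directive. Without OpenMP it runs as a plain loop.
std::vector<double> elementwise_product(const std::vector<double>& C,
                                        const std::vector<double>& D) {
    assert(C.size() == D.size());
    const int n = static_cast<int>(C.size());
    std::vector<double> CD(n);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        CD[i] = C[i] * D[i];
    return CD;
}
```

The number of threads and the split of iterations are then left to the runtime, and can still be tuned with clauses such as num_threads or schedule if measurements suggest it.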

More general advice would be too broad for this format, and would require more specific information about the non-toy example.
