Wednesday, December 26, 2007

OpenMP rules, doesn't it?

So, timothy_g left a comment telling me to add schedule(dynamic,100) to the pragma in my code to improve performance.
I did, and WOW: OpenMP now takes roughly half the time of the serial version (e.g., for N=3000: TBB 15.8937 s, OpenMP 10.2118 s, serial 18.9615 s).
The "problem" is the poor load balance of my loop (iteration i runs i*i inner steps), so I must tell OpenMP to change its default schedule and have each thread dynamically grab the next chunk of 100 iterations once it finishes the current one. Pretty nice.
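To see why the schedule matters so much here, this sketch models the loop's cost on paper instead of timing it. It assumes iteration i costs i*i units (the length of the inner loop), that all threads run at the same speed, that the default schedule behaves like schedule(static) with one contiguous block per thread, and that schedule(dynamic, c) hands the next chunk of c iterations to whichever thread goes idle first. The function names are hypothetical; this is a model, not how any OpenMP runtime is implemented.

```cpp
#include <cstdint>

// Hypothetical cost model: iteration i of the benchmark loop runs i*i
// inner steps, so its cost is taken to be i*i "work units".
constexpr std::uint64_t cost(std::uint64_t i) { return i * i; }

// Total work of the loop over [0, n).
constexpr std::uint64_t total_work(std::uint64_t n) {
  std::uint64_t sum = 0;
  for (std::uint64_t i = 0; i < n; ++i) sum += cost(i);
  return sum;
}

// Model of schedule(static) without a chunk size: each of the p threads gets
// one contiguous block of about n/p iterations. Returns the busiest load.
constexpr std::uint64_t static_makespan(std::uint64_t n, unsigned p) {
  std::uint64_t worst = 0;
  for (unsigned t = 0; t < p; ++t) {
    std::uint64_t load = 0;
    for (std::uint64_t i = n * t / p; i < n * (t + 1) / p; ++i) load += cost(i);
    if (load > worst) worst = load;
  }
  return worst;
}

// Model of schedule(dynamic, c): the thread that becomes idle first takes the
// next chunk of c iterations. Assumes equally fast threads, p <= 64, c > 0.
constexpr std::uint64_t dynamic_makespan(std::uint64_t n, unsigned p,
                                         std::uint64_t c) {
  std::uint64_t busy_until[64] = {};  // time at which each thread goes idle
  for (std::uint64_t start = 0; start < n; start += c) {
    unsigned idle = 0;  // pick the thread that frees up first
    for (unsigned t = 1; t < p; ++t)
      if (busy_until[t] < busy_until[idle]) idle = t;
    const std::uint64_t end = (start + c < n) ? start + c : n;
    for (std::uint64_t i = start; i < end; ++i) busy_until[idle] += cost(i);
  }
  std::uint64_t worst = 0;
  for (unsigned t = 0; t < p; ++t)
    if (busy_until[t] > worst) worst = busy_until[t];
  return worst;
}

// N = 3000 on two threads: the contiguous split leaves one thread with more
// than 7/8 of the work, while chunks of 100 keep the busiest thread near 1/2.
static_assert(static_makespan(3000, 2) > 7 * total_work(3000) / 8,
              "static split is badly unbalanced");
static_assert(dynamic_makespan(3000, 2, 100) < static_makespan(3000, 2),
              "dynamic chunks balance better");
```

Other values of N, thread counts, or chunk sizes can be plugged in the same way; for very large N the compiler's constexpr evaluation limits may be hit, in which case the same functions can simply be called at run time.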
As I see it, to make OpenMP faster I have to know more and say more. Roland Barthes once said: "Language, as the performance of all speech, is neither reactionary nor progressive; it is quite simply fascist; for fascism does not consist in preventing speech, but in compelling speech."
What I like about this quote is the idea that fascism lies in compelling you to speak.
For me, and in this context, OpenMP is more fascist than TBB, because you must say more. It doesn't matter that it is just one line of code: what matters is the semantics of what you say, not the syntax.

So, here is version 2 of the code (in TBB I switched to an auto_partitioner(), in order to say less):
parallel_for(blocked_range<size_t>(0, N),
     ApplyFoo( a ), auto_partitioner() );

#pragma omp parallel for private (i,j) schedule(dynamic, 100)
for( i=0; i<N; i++ )
  for( j=0; j<i*i; j++ )
     b[i] += j;

Tuesday, December 25, 2007


I was reading this blog at Intel and decided to make my own stupid parallel_for benchmark.
The idea is to parallelize a not-so-simple loop, so I made the work in each iteration depend on the index:
for( i=0; i<N; i++ )
  for( j=0; j<i*i; j++ )
    b[i] += j;
The results are quite interesting (IMHO). I think that, since OpenMP doesn't have task stealing, it wastes a lot of time waiting for the other thread to finish its chunk of work: the inner loop runs i*i times, so if the iterations are split into contiguous blocks, the thread that gets the large values of i has far more to do. (At least, that's what my Activity Monitor suggests: just 1 core is busy most of the time for large i with OpenMP.)
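A back-of-the-envelope count shows how lopsided this gets, assuming two threads and a default schedule that gives each thread one contiguous half of the iterations (which is what schedule(static) without a chunk size does). Iteration i runs i² inner steps, so the total work W(N) and the share of the thread holding the upper half are:

```latex
W(N) = \sum_{i=0}^{N-1} i^2 = \frac{(N-1)\,N\,(2N-1)}{6} \approx \frac{N^3}{3},
\qquad
W_{\mathrm{upper}} = W(N) - W(N/2) \approx \frac{N^3}{3} - \frac{(N/2)^3}{3} = \frac{7}{8}\cdot\frac{N^3}{3}.
```

So under these assumptions the thread with i ≥ N/2 does about 7/8 of the work, the best possible speedup on two threads is about 8/7 ≈ 1.14, and the other thread sits idle for roughly 6/7 of the run, which would be consistent with seeing a single busy core. For N = 3000 the exact values are W = 8,995,500,500 and W_upper = 7,871,625,250, i.e. 87.5%.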
I would like to test this with more cores to see the scaling.
If you want to test it yourself, here's the code:

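#include <cstdio>
#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
#include "tbb/tick_count.h"

using namespace tbb;
using namespace std;

// Body for parallel_for: iteration i does i*i additions, so the work
// grows with the index and an equal split of the range is badly balanced.
class ApplyFoo {
  float *const my_a;
public:
  void operator() ( const blocked_range<size_t>& r ) const {
    float *a = my_a;
    for( size_t i=r.begin(); i!=r.end(); i++ )
      for( size_t j=0; j!=i*i; j++ )
        a[i] += j;
  }
  ApplyFoo( float a[] ) :
    my_a(a)
  {}
};

int main(int argc, char *argv[]) {
  task_scheduler_init init;          // start TBB's worker threads
  const int N = 8000;
  float a[N] = {}, b[N] = {};        // zero-initialized accumulators
  int i, j;

  // TBB version
  tbb::tick_count t0 = tbb::tick_count::now();
  parallel_for(blocked_range<size_t>(0, N),
     ApplyFoo( a ), auto_partitioner() );
  tbb::tick_count t1 = tbb::tick_count::now();

  // OpenMP version, default schedule (build with OpenMP enabled, e.g. -fopenmp)
#pragma omp parallel for private (i,j)
  for( i=0; i<N; i++ )
    for( j=0; j<i*i; j++ )
      b[i] += j;
  tbb::tick_count t2 = tbb::tick_count::now();

  // Serial version (reuses b; only the timing matters here)
  for( i=0; i<N; i++ )
    for( j=0; j<i*i; j++ )
      b[i] += j;
  tbb::tick_count t3 = tbb::tick_count::now();

  printf("TBB %g seconds\n",(t1 - t0).seconds());
  printf("OpenMP %g seconds\n",(t2 - t1).seconds());
  printf("Serial %g seconds\n",(t3 - t2).seconds());
  printf("%d %g %g %g\n",N,(t1 - t0).seconds(), (t2 - t1).seconds(), (t3 - t2).seconds());
  // Use the results so the compiler cannot drop the loops as dead code.
  printf("check %g %g\n", a[N-1], b[N-1]);

  return 0;
}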

Friday, December 14, 2007

Mex files with TBB (Multicore Matlab)

I finally got Matlab to run a MEX file that uses TBB.
Since I didn't find this on Google, and it involves a little hack to the Matlab launch script, I'm starting my blog with this entry.
The instructions are for Leopard (Mac OS X 10.5), but I think they should be just as straightforward on Linux (I don't know how to do it on Windows).
First, you need to install Intel TBB (or whatever other library you want) and set its environment variables. This must be done in every shell you start, so the easiest way is to add a few lines to your .profile:
MacBook:Soka~$ cat .profile
PS1='MacBook:Soka\w\$ ';
source /Library/Frameworks/TBB.framework/Versions/2.0/bin/ 
source /opt/intel/cc/10.1.006/bin/ 
source /Library/Frameworks/Intel_MKL.framework/Versions/ 

The source command runs these scripts in the current shell, setting the environment variables so that the programs you compile can find the libraries they need.
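Since a program only sees the variables of the process that launched it, it can be handy to check from inside a program that they really arrived. A minimal sketch, assuming the scripts put the library directory on DYLD_LIBRARY_PATH (the macOS dynamic linker's search path; LD_LIBRARY_PATH plays that role on Linux). The helper names and the /opt/tbb/lib path are hypothetical, since the real paths in the listing are cut off:

```cpp
#include <cstdlib>
#include <string_view>

// True if `dir` is one of the colon-separated entries of `list`
// (the format used by DYLD_LIBRARY_PATH and LD_LIBRARY_PATH).
constexpr bool path_list_contains(std::string_view list, std::string_view dir) {
  while (true) {
    const std::size_t colon = list.find(':');
    if (list.substr(0, colon) == dir) return true;  // whole entry matches
    if (colon == std::string_view::npos) return false;
    list.remove_prefix(colon + 1);  // move on to the next entry
  }
}

// True if environment variable `var`, as seen by the *current* process,
// lists `dir`. Hypothetical helper for checking that `source` took effect.
inline bool env_path_contains(const char* var, std::string_view dir) {
  const char* value = std::getenv(var);
  return value != nullptr && path_list_contains(value, dir);
}

// Hypothetical directory names, for illustration only.
static_assert(path_list_contains("/usr/lib:/opt/tbb/lib", "/opt/tbb/lib"),
              "entry found");
static_assert(!path_list_contains("/usr/lib:/opt/tbb/lib", "/opt/tbb"),
              "a prefix of an entry is not a match");
```

For example, a small test program (or a MEX file, printing with mexPrintf) could report env_path_contains("DYLD_LIBRARY_PATH", "/opt/tbb/lib") at startup; matching whole entries rather than substrings avoids false positives from directories that merely share a prefix.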
So, the idea is to make Matlab set these variables before it starts. You can do it by opening an X11 terminal, setting the variables, and then running the Matlab script. But I am a lazy person, so I want Matlab to do this for me.
I grepped for DYLD in the Matlab directory and found that the only file that matched was bin/matlab. I opened this file with my favorite fast text editor, joe, and added the source commands above at the beginning of the script.
Incredibly, when I started Matlab from Spotlight, everything just worked. So I compiled my helloworld.cpp (which uses Intel TBB to run a stupid parallel for) like this at the Matlab prompt:

>> mex helloworld.cpp -ltbb

and then I was able to run my parallel multicore MEX file.
OK, it's not so nice, because if there is some kind of error in your MEX file, Matlab will die silently without any warning or error message.
More next time.