So, I have this comment by timothy_g telling me to add schedule(dynamic,100) to the pragma of my code to increase performance.
I did it, and WOW, OpenMP is almost half the time of serial version (e.g. for N=3000, TBB 15.8937 s, OpenMP 10.2118 s, Serial 18.9615 s).
The "problem" is the poor load balance of my code, so I must tell to OMP to change its default schedule and obtain dynamically the next chunk of work once finished the current. Pretty nice.
As I see this,in order of making OMP faster I have to know and say more. Roland Barthes said once that "La langue, comme performance de tout langage, n'est ni réactionnaire, ni progressiste ; elle est tout simplement : fasciste ; car le fascisme, ce n'est pas d'empêcher de dire, c'est d'obliger à dire."
What I like of this quote is that the fascism is to bind to say.
For me, and in this contex, OpenMP is more fascist than TBB, because you must say more. And don't matter if is just a line of code, the semantics of what you say is the important, not the syntax.
So, here is the version 2 of the code (I changed to an auto_partitioner() in TBB in order of say less):
parallel_for(blocked_range(0, N), ApplyFoo( a ), auto_partitioner() );
#pragma omp parallel for private (i,j) schedule(static, 100) for(i=0;i<=n;i++){ for(j=0;j!=i*i;j++) b[i] += j; }