Question: Trading off the efficiencies of add() vs. a customized for ... do...end do

Hello wizards,

I'm given to understand that using add() (or, where possible, Threads:-Add()) is more efficient than a FOR...DO loop. Today I'm asking about the limits of that generalization. My illustration is probably missing some evalf's, but hopefully the concept is clear:

A computation I'm working on involves somewhere between 10^4 and 10^6 sums of the form

add(F[i-disp]*((i+i-1)*A*x^(B/(i+i-1))*G[(1/2)*floor(lnx2Dlnx1*(i+i-1))-disp]-C)/((i+i-1)*A-B), i = First .. Last)

in which each add() has at least a thousand terms. If I rewrote this as a FOR loop, I could maintain a couple of loop variables in parallel with the summation, leaving only a few multiplies, a floor() call, two divides, and a fractional exponentiation per term. I would guess that the floor(), the fractional exponentiation, and the divisions dominate the effort, so that add() remains preferable to FOR...DO.
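Here is a minimal sketch of the FOR...DO rewrite I have in mind (assuming F, G, A, B, C, x, disp, lnx2Dlnx1, First, and Last are the same objects as in the add() above; k is just my stand-in for 2*i-1, maintained by addition instead of recomputed each term):

S := 0;
k := 2*First - 1;                 # runs in parallel with i: k = 2*i-1
for i from First to Last do
    S := S + evalf( F[i-disp]
             * ( k*A*x^(B/k)*G[(1/2)*floor(lnx2Dlnx1*k) - disp] - C )
             / ( k*A - B ) );
    k := k + 2;                   # one addition replaces recomputing 2*i-1
end do;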

Question 1: As the intermediate values get more repetitive and the potential savings less obvious to an interpreter, however, I wonder whether a FOR...DO will become more efficient. What additional overhead does a FOR loop impose that an add() does not, and is the interpreter clever enough to avoid repeating calculations that don't change within a term (e.g., 2*i-1 above)?
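For what it's worth, my plan for testing this empirically is something like the following, where Fsum_add wraps the add() version and Fsum_for is a hypothetical wrapper around the FOR...DO rewrite above; CodeTools:-Usage reports real time and memory allocated, which should make any loop overhead visible directly:

Fsum_add := proc(First, Last)
    local i;
    add( F[i-disp]*((2*i-1)*A*x^(B/(2*i-1))
         *G[(1/2)*floor(lnx2Dlnx1*(2*i-1))-disp]-C)/((2*i-1)*A-B),
         i = First .. Last )
end proc:

CodeTools:-Usage( Fsum_add(First, Last) );   # add() version
CodeTools:-Usage( Fsum_for(First, Last) );   # hypothetical FOR...DO wrapper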

Question 2: Now about evalf() and Threads:-Add. I'm working on a 6-core AMD processor and would love to distribute the computations, but my understanding from Maple support is that evalf() is not a thread-safe call. When I tried to use it in the Task model on an embarrassingly parallelizable software floating-point problem, I quickly lost contact with the kernel and had a memory explosion unless I reduced numcpus to 2, which sometimes but not always worked. I might try this with hardware floats (hfloat), but I'm concerned about rounding. Can anyone shed some light on the safe use of evalf(), or on avoiding it entirely in the Task model (or even on using Threads directly outside of Tasks, although I'm not at all excited about going there)?
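To make the hfloat route concrete, this is the kind of thing I was considering; it is only a sketch, not something I have verified to be safe, and it assumes F, G, A, B, C, and x already hold hardware-float values so that no evalf() call is needed inside the parallel region:

Digits := 15:                     # double precision, so hfloats should suffice
UseHardwareFloats := true:
kernelopts(numcpus = 2):          # the reduced worker count mentioned above

S := Threads:-Add( F[i-disp]*((2*i-1)*A*x^(B/(2*i-1))
         *G[(1/2)*floor(lnx2Dlnx1*(2*i-1))-disp]-C)/((2*i-1)*A-B),
         i = First .. Last );

My rounding worry is whether accumulating a few thousand hfloat terms in whatever order the workers finish loses noticeably more precision than a sequential evalf() sum at higher Digits.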

Question 3: If I can't exploit my extra cores, should I even think about trying to exploit the CUDA capabilities of an NVIDIA graphics card? I think not, but would be glad to be corrected.
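For completeness, my understanding is that Maple's CUDA package only accelerates certain hardware-float LinearAlgebra operations (essentially large matrix multiplications), so a scalar sum like the one above wouldn't benefit unless it were recast as linear algebra. Checking whether the card is even usable would be something like:

CUDA:-IsEnabled();      # is GPU acceleration currently on?
CUDA:-Enable(true);     # turn it on (errors if no supported device/driver)
CUDA:-Properties();     # report the detected device(s)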

I hope that others will learn from this post. I appreciate and look forward to your thoughts.

 - Jimmy
