dohashi

I am a Senior Software Developer in the Kernel Group, working on the Maple language interpreter. I have been working at Maplesoft since 2001 on many aspects of the Kernel; recently, however, I have been focusing on enabling parallel programming in Maple. I have added various parallel programming tools to Maple, and have been trying to teach parallel programming techniques to Maple programmers. I have a Master's degree in Mathematics (although really Computer Science) from the University of Waterloo. My research focused on algorithm and data structure design and analysis.

MaplePrimes Activity


These are replies submitted by dohashi

OK, I was able to reproduce this issue.  It seems to be specific to 64-bit Windows.  We will investigate further.

@mthkvv I was able to run this example on my Linux box, and it ran fine, with a peak alloc of around 4.6 GB.  Just to double-check, could you show the output of:

kernelopts( system )

and

kernelopts( datalimit )

 

@mthkvv What platform?  Sometimes Linux/Mac will have default shell limits, etc.

What platform are you running on?  Any chance you are running a 32-bit version of Maple?  Or imposing a shell or command-line memory limit?
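If it helps, these kernelopts queries should show what you are running (a quick sketch; the exact values and strings depend on your installation):

kernelopts( wordsize );    # 32 or 64: whether this is a 32-bit or 64-bit kernel
kernelopts( platform );    # which operating system family the kernel was built for
kernelopts( datalimit );   # any memory limit currently imposed on the kernel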

On Windows the maximum stack size is set at compile time, so that may be why -Xss is not helping.  The Maple kernel is built with a large maximum stack size so that we can handle deeply recursive function calls.  I'm not sure what Java's maximum stack size is.

Unfortunately, for OpenMaple the stack size is outside of Maple's control.

If you can do your work under Linux (or OS X), I'd suggest using those platforms.  They allow user control over the process's stack limit, as you have already discovered.

 

Darin

@itsme 

Awesome!

I'm glad we were (eventually) able to find a work-around for you.

Darin

@itsme 

Try anames.  That should list all the assigned names in your Maple session.  anames( 'user' ) should list all the names that were assigned values by you.  That will probably be everything you need to send.
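For example (purely illustrative; the names listed will be whatever you have assigned in your own session):

a := 1:
myData := [1, 2, 3]:
anames( 'user' );    # lists the user-assigned names, here a and myData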

 

Darin

@itsme I can't really give specific timelines for issues getting fixed.  What I can say is that (1) is a bug and looks relatively easy to fix, which makes it more likely to be fixed sooner rather than later.  (2) is a feature request, so it will take more time to implement and is thus likely to take longer.

Sorry I can't be more specific, but I don't want to tell you something that I can't be sure is correct.

Darin

 

@Alejandro Jakubi 

Grid:-Map calls Grid:-Launch and uses the same defaults.  The difference is that Launch requires the user to specify how they want the work distributed around the nodes and to coordinate that work, whereas Map, because of its simpler calling sequence (evaluating a function on each element of a structure), can do that low-level work for the user (although imperfectly, as @itsme discovered).  This means that Map is much easier to use than Launch.

For some cases it is possible to be more efficient using Launch; however, in many cases using Map will not be significantly slower than doing the same thing with Launch.  The main efficiency difference arises in cases where there is a lot of data that needs to be shared, but not every kernel needs all the data (kernel 1 needs data A, kernel 2 data B, 3 C, 4 D).  With Launch you can specify exactly which data is shared with which kernel.  With Map, all the data will be shared with each kernel.  Of course, there is also the issue that Map's attempt to identify which data is needed by the grid kernels is imperfect, so there are cases that don't work with Map but could be coded with Launch, as @itsme's issue illustrates.
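To make that concrete, here is a rough sketch (local grid, made-up data; "worker" is just an illustrative name, not a library routine) of the kind of explicit distribution Launch allows, where node 0 sends each grid kernel only its own portion of the data:

worker := proc()
    local me, n, i, j, results;
    me := Grid:-MyNode();
    n := Grid:-NumNodes();
    if me = 0 then
        # node 0 hands each other node only the data that node needs
        for i from 1 to n-1 do
            Grid:-Send( i, [seq( 10*i + j, j = 1 .. 5 )] );
        end do;
        # ... and then collects the results
        results := [seq( Grid:-Receive( i ), i = 1 .. n-1 )];
    else
        # a worker receives just its portion and sends back its result
        Grid:-Send( 0, map( x -> x^2, Grid:-Receive( 0 ) ) );
    end if;
end proc:

Grid:-Launch( worker, numnodes = 4 );

The rough equivalent with Map would just be Grid:-Map( x -> x^2, theWholeList ), with all of the data shared with each kernel.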

What you are seeing with my example is expected.  Notice that the length of the mapped list goes from 1 to 2*N.  If we say that L is the length of the list for a particular call to Map, then if L = i*N for some integer i, each grid kernel has the same amount of work.  However, if L != i*N (which is most of the time), then some grid kernels will have to do more work than others, so some grid kernels will be idle while others are still running.  What is more important is the real time it takes to execute each Map.  For L = 1 to N you should see times around 5 seconds, whereas for L = N+1 to 2*N you should see 10 seconds.  In general the time should be approximately ceil( L/N )*5 seconds.  If you are seeing that, the work is being spread around the grid kernels relatively evenly.

You may also be running into the shortcomings of the process monitor program.  It may update too infrequently to really show what is happening.  The easy way to work around this is to increase the time spent in the busy-wait loop in the f function, as sketched below.
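For reference, here is a rough reconstruction of the kind of test I mean (the 5 second busy wait and the loop bounds are just illustrative):

f := proc( x )
    local t0;
    t0 := time[real]();
    while time[real]() - t0 < 5 do    # busy wait for roughly 5 seconds
        NULL;
    end do;
    x;
end proc:

N := kernelopts( numcpus ):
for L from 1 to 2*N do
    t0 := time[real]();
    Grid:-Map( f, [seq( i, i = 1 .. L )] );
    printf( "L = %d: %.1f seconds real time (expect about %d)\n",
            L, time[real]() - t0, ceil( L/N )*5 );
end do: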

Darin

@Alejandro Jakubi 

Case 1:  The main kernel creates N grid kernels, for a total of N+1 kernels.  The main kernel is not taking part in the grid computation, so it is idle, waiting for the grid computation to end.  Thus the main kernel does not require significant CPU time, and so each grid kernel gets exclusive access to a CPU.  N CPUs, N running kernels.

As @Carl Love says, the main kernel will be swapped out, so it will not interfere with the running kernels.  If you look at a process monitor you should see N+1 kernels, but only N of them actively running.  If you run the simple example I posted near the top of this thread you should see that behaviour.  Now, if you look at the process monitor, you'll see that there are dozens if not hundreds of processes on your system, the vast majority of which are not actively running.  These idle processes are not assigned a specific CPU.  When they need to run, the OS will put them onto a CPU when one is available, and move whatever process was there off.  However, this happens so quickly that if there is a process that wants to run at 100%, you'll rarely notice the other processes getting swapped in and out.

Case 2: The distinction between case 1 and case 2 is whether the kernel calling Map is the main kernel or a grid kernel.  The main kernel is case 1; a grid kernel is case 2.  For case 2 to happen (Map called by a grid kernel), there must already be grid kernels, so Maple will not create more grid kernels in case 2.  Some other command, for example Launch, was called, and that call created the grid and launched a computation.  As part of that computation the grid kernels call Map, which is case 2.  Case 2 happens on an existing grid; it does not create one.

When Maple creates a local grid, it follows the same default behaviour no matter which Grid command is used to create it.  Maple will create N grid kernels (for a total of N+1 kernels). 

Darin

@Alejandro Jakubi 

This comment applies to local grid.

Whenever any Grid command (including Map) needs to start a grid, by default it will create N kernels, where N is the number of CPUs on the local machine.  Each of these kernels is a separate process and so is scheduled according to however the current operating system schedules processes.  Basically this means that if a process is ready to execute and there is an available CPU, it will run on that CPU.  Thus on an N-CPU machine there will be one CPU for each grid kernel.  If there are no other processes requiring significant CPU time, then each grid kernel will be able to run at 100%.  If you want more details about how operating systems schedule processes, I suggest you read the link I provided above.
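If you want to see what N will be on your machine, kernelopts should report it:

kernelopts( numcpus );    # the number of CPUs Maple detects, and hence the default grid size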

The above description applies to Case 1 from the Map help page.  As Case 2 can only happen after a grid is running (Case 2 occurs when Map is called from a kernel in a grid), the above is not really relevant to it.  However, it will apply to whatever command was used to launch the grid in the first place.

Darin

@Alejandro Jakubi

In the local case, the controller kernel and the main kernel are the same.

My comment "With local grid the default is always to create N grid kernels." applies to Map.  Only case 1 can cause the main kernel to launch kernels, because case 2 can only occur if the grid is already running.  If the grid has not already been started, a call to Grid:-Map from the main kernel (local, case 1) will cause N local grid kernels to be started.  If the grid has already been started, it will simply use the existing grid kernels.

You have said something to this effect a few times now:

there are N+1 kernels, the controller and a grid kernel in one cpu, and a grid kernel in each of the other cpus.

This is not how multitasking operating systems schedule processes to cpus.

Darin

@Alejandro Jakubi 

There is a slight difference between controller and main.  There is a main kernel, the kernel associated with your Maple session, in both local and distributed grid.  However, only with the local grid does that main kernel also manage (control) the grid itself.  In distributed grid the grid is managed outside of Maple.

The two cases affect how Grid:-Map behaves; they do not affect the layout of the grid itself.  In case 2, Grid:-Map is called by a kernel in the grid, so the grid must already exist for this to happen.  With local grid the default is always to create N grid kernels.

Darin

p.s. You may want to read up on multitasking.  Modern operating systems don't assign processes to CPUs in the way you seem to believe.

@Alejandro Jakubi 

That first paragraph is not about local vs distributed.

If you open a new Maple session, you get a kernel that executes your commands.  This kernel is not part of any grid; I will refer to it as the main kernel.  If you start a local grid computation, Maple launches multiple local grid kernels.  These local grid kernels can run Maple commands, including Grid commands.  The two cases discussed in the first paragraph you quoted are the main kernel calling Grid:-Map vs a grid kernel calling Grid:-Map, not local Grid vs distributed Grid (aka the Grid Computing Toolbox).  You have "main" kernels and grid kernels in both local and distributed grid.

In case 1, the common case for most users, you start Maple and call Map in your worksheet (using the main kernel).  This in turn creates some local grid nodes and executes the action described in the Map command across those local nodes.
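For example (illustrative data only), case 1 is just a plain call from your worksheet:

Grid:-Map( x -> x^2, [seq( i, i = 1 .. 8 )] );    # the main kernel starts the local grid kernels and spreads the work across them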

In case 2, a grid node, as part of a running grid computation, calls Grid:-Map.  This is the advanced case shown in the last example of the help page.  So in that example the main kernel creates a grid computation by calling Launch, and the grid nodes, as part of that computation, call Map.

Both of these cases work with or without the Grid Computing Toolbox.

Darin

@Alejandro Jakubi 

I find that you are not addressing this question. Once again: can distributed grid mode computations be done within a single machine by means of the Grid package alone, without the Grid Computing Toolbox?

 

I think we are having a terminology issue here.  Distributed means computing on more than one computer.  So no, you can't use "distributed" on a single machine, by definition.  The local grid mode (which does not require the Grid Computing Toolbox) creates multiple kernel processes on the local machine.  Thus you can get parallel processing on a single machine using the local grid mode.

Without the Grid Computing Toolbox you have parallel processes running on a single machine.  With the Grid Computing Toolbox you can have parallel processes running on many machines (aka distributed computation).  The same Grid commands work in both cases.  If you are only interested in the local grid mode, issues like the cost of communication are less important; it is (relatively) inexpensive to communicate with local nodes.  However, with distributed nodes you may be sending data across the internet to computers thousands of kilometers away.  In that case the cost of communication can be absolutely critical, so managing it properly (using commands like Send and Receive) is very important.
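As a minimal sketch of that kind of explicit communication (local grid here, and "pingpong" is just an illustrative name, but the same commands work on a distributed grid, which is where controlling exactly what gets sent really matters):

pingpong := proc()
    local me, msg;
    me := Grid:-MyNode();
    if me = 0 then
        Grid:-Send( 1, "a question" );          # send node 1 only what it needs
        msg := Grid:-Receive( 1 );              # wait for node 1's reply
        msg;
    elif me = 1 then
        msg := Grid:-Receive( 0 );
        Grid:-Send( 0, cat( msg, ", answered" ) );
    end if;
end proc:

Grid:-Launch( pingpong, numnodes = 2 );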

Darin
