I tried to use cython.parallel prange, but I can only see 2 cores being used, at about 50% each. How can I make use of all the cores, i.e. send the loops to the cores simultaneously while sharing the arrays volume and mc_vol?
Edit: I also wrote a purely sequential for-loop version, which is about 30 seconds faster than the cython.parallel prange version. Both of them use only 1 core. Is there a way to parallelize this?
```cython
cimport cython
from cython.parallel import prange, parallel, threadid
from libc.stdio cimport sprintf
from libc.stdlib cimport malloc, free
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef mc_surface(np.ndarray[np.int_t, ndim=3] volume,
                 np.ndarray[np.float32_t, ndim=3] mc_vol):
    cdef int vol_len = len(volume) - 1
    cdef int k, j, i
    cdef char* pattern  # string pointer - allocate later

    perm_area = {
        "00000000": 0.000000,
        ...
        "00011101": 1.515500
    }

    try:
        pattern = <char*>malloc(sizeof(char) * 260)
        for k in range(vol_len):
            for j in range(vol_len):
                for i in range(vol_len):
                    sprintf(pattern, "%i%i%i%i%i%i%i%i",
                            volume[i, j, k],
                            volume[i, j + 1, k],
                            volume[i + 1, j, k],
                            volume[i + 1, j + 1, k],
                            volume[i, j, k + 1],
                            volume[i, j + 1, k + 1],
                            volume[i + 1, j, k + 1],
                            volume[i + 1, j + 1, k + 1])
                    mc_vol[i, j, k] = perm_area[pattern]
                    # if perm_area[pattern] > 0:
                    #     print pattern, 'area: ', perm_area[pattern]
                    # total_area += perm_area[pattern]
    finally:
        free(pattern)
    return mc_vol
```
Edit: following DavidW's suggestion, but prange is considerably slower:
```cython
cpdef mc_surface(np.ndarray[np.int_t, ndim=3] volume,
                 np.ndarray[np.float32_t, ndim=3] mc_vol):
    cdef int vol_len = len(volume) - 1
    cdef int k, j, i
    cdef char* pattern  # string pointer - allocate later

    perm_area = {
        "00000000": 0.000000,
        ...
        "00011101": 1.515500
    }

    with nogil, parallel():
        try:
            pattern = <char*>malloc(sizeof(char) * 260)
            for k in prange(vol_len):
                for j in range(vol_len):
                    for i in range(vol_len):
                        sprintf(pattern, "%i%i%i%i%i%i%i%i",
                                volume[i, j, k],
                                volume[i, j + 1, k],
                                volume[i + 1, j, k],
                                volume[i + 1, j + 1, k],
                                volume[i, j, k + 1],
                                volume[i, j + 1, k + 1],
                                volume[i + 1, j, k + 1],
                                volume[i + 1, j + 1, k + 1])
                        with gil:
                            mc_vol[i, j, k] = perm_area[pattern]
                            # if perm_area[pattern] > 0:
                            #     print pattern, 'area: ', perm_area[pattern]
                            # total_area += perm_area[pattern]
        finally:
            free(pattern)
    return mc_vol
```
My setup file looks like:
```python
setup(
    name='surfacearea',
    ext_modules=[
        Extension('c_marchsurf', ['c_marchsurf.pyx'],
                  include_dirs=[numpy.get_include()],
                  extra_compile_args=['-fopenmp'],
                  extra_link_args=['-fopenmp'],
                  language="c++")
    ],
    cmdclass={'build_ext': build_ext},
    requires=['cython', 'numpy', 'matplotlib', 'pathos', 'scipy', 'cython.parallel']
)
```
The problem is the with gil: block, which defines a block that can only run on one core at once. You aren't doing much else inside your loop, so you shouldn't really expect any speed-up.
In order to avoid using the GIL you need to avoid using Python features where possible. You can avoid it in the string formatting part by using C sprintf to create your string. For the dictionary lookup part, the easiest thing is probably to use the C++ standard library, which contains a map class with similar behaviour. (Note that you'll need to compile it in Cython's C++ mode.)
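To make the key format concrete, here is a minimal pure-Python sketch of what the sprintf call builds for a single cell. The helper name cube_pattern and the nested-list volume are illustrative assumptions, not part of the original code; the corner order simply follows the sprintf arguments in the question.

```python
def cube_pattern(volume, i, j, k):
    """Return the 8-character key for the 2x2x2 cell whose corner is (i, j, k).

    Hypothetical helper: the corner order mirrors the sprintf arguments
    used in the question's code.
    """
    corners = (
        volume[i][j][k],
        volume[i][j + 1][k],
        volume[i + 1][j][k],
        volume[i + 1][j + 1][k],
        volume[i][j][k + 1],
        volume[i][j + 1][k + 1],
        volume[i + 1][j][k + 1],
        volume[i + 1][j + 1][k + 1],
    )
    # Assuming every voxel is 0 or 1, each "%i" contributes exactly one digit.
    return "".join(str(int(v)) for v in corners)


# A tiny 2x2x2 binary volume, indexed as volume[i][j][k].
volume = [
    [[0, 0], [0, 1]],
    [[1, 0], [1, 1]],
]
key = cube_pattern(volume, 0, 0, 0)
```

If the voxels really are only 0 or 1, the key is 8 characters plus the terminating NUL, so 9 bytes is the minimum buffer size for sprintf; the larger allocations in the code are just generous padding.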
```cython
# at the top of your file
from cython.parallel import prange, parallel
from libc.stdio cimport sprintf
from libc.stdlib cimport malloc, free
from libcpp.map cimport map
from libcpp.string cimport string

import numpy as np
cimport numpy as np

# ... code omitted ....

cpdef mc_surface(np.ndarray[np.int_t, ndim=3] volume,
                 np.ndarray[np.float32_t, ndim=3] mc_vol):
    # note that above I've defined volume as a numpy array so that
    # I can do fast, GIL-less direct array lookup
    cdef char* pattern  # string pointer - allocate it later

    perm_area = {}  # some dictionary, as before

    # depending on the size of perm_area, this conversion to
    # a C++ object is potentially quite slow (it involves a lot
    # of string copies)
    cdef map[string, float] perm_area_m = perm_area

    # ... code omitted ...

    with nogil, parallel():
        try:
            # assigning pattern here makes it thread local;
            # it's assigned once per thread, which isn't too bad
            pattern = <char*>malloc(sizeof(char) * 50)
            # when you allocate pattern you need to make it big enough,
            # either by calculating the size or by making it overly big

            # ... more code omitted ...

            # later, inside your loops
            sprintf(pattern, "%i%i%i%i%i%i%i%i",
                    volume[i, j, k],
                    volume[i, j + 1, k],
                    volume[i + 1, j, k],
                    volume[i + 1, j + 1, k],
                    volume[i, j, k + 1],
                    volume[i, j + 1, k + 1],
                    volume[i + 1, j, k + 1],
                    volume[i + 1, j + 1, k + 1])
            # and now do the dictionary lookup without the GIL,
            # because we're using the C++ class instead.
            # Unfortunately, we also need a string copy (which might slow things down)
            mc_vol[i, j, k] = perm_area_m[string(pattern)]
            # be aware that, unlike a Python dict, map's operator[] does not
            # raise for a missing pattern: it inserts a zero-valued entry,
            # which is also not safe to do from several threads at once.
            # Use perm_area_m.find() if a pattern might be missing.
        finally:
            free(pattern)
```
I've also had to change volume to being a numpy array, since if it were just a Python object I'd need the GIL to index its elements.
(Edit: changed to take the dictionary lookup out of the GIL block by using a C++ map.)
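If the string copy turns out to matter, one possible alternative (not what the code above does) is to skip strings altogether: pack the eight corner values into a single integer and use it to index a 256-entry table built once from perm_area. The pure-Python sketch below only illustrates the idea. The names build_table and cube_index are hypothetical, and giving patterns that are missing from perm_area an area of 0.0 is an assumption.

```python
def build_table(perm_area):
    """Turn {"00011101": area, ...} into a 256-entry list indexed by int(key, 2)."""
    table = [0.0] * 256  # assumption: patterns missing from perm_area get 0.0
    for key, area in perm_area.items():
        table[int(key, 2)] = area
    return table


def cube_index(corners):
    """Pack 8 binary corner values into one integer (first corner = highest bit)."""
    index = 0
    for v in corners:
        index = (index << 1) | (1 if v else 0)
    return index


# Only the two entries shown in the question; the real dictionary has more.
perm_area = {"00000000": 0.000000, "00011101": 1.515500}
table = build_table(perm_area)

# Corner values in the same order as the sprintf arguments.
area = table[cube_index([0, 0, 0, 1, 1, 1, 0, 1])]
```

In Cython the table could be a C array or a typed memoryview and the index a plain C int, so the lookup would need neither the GIL nor a string copy. Whether that is actually faster would need measuring.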