Python - pandas cut() with infinite upper/lower bounds


The pandas cut() documentation states that: "Out of bounds values will be NA in the resulting Categorical object." This makes it difficult when the upper bound is not always clear or important. For example:

cut(weight, bins=[10, 50, 100, 200])

will produce the bins:

[(10, 50] < (50, 100] < (100, 200]] 

So cut(250, bins=[10, 50, 100, 200]) will produce NaN, as will cut(5, bins=[10, 50, 100, 200]). I'm trying to produce something like > 200 for the first example and < 10 for the second.
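A minimal sketch of the behavior described above. The sample values in `weights` are made up for illustration, and they are wrapped in a list because cut() expects 1-dimensional array-like input rather than a bare scalar.

```python
import pandas as pd

# Hypothetical sample: 5 and 250 lie outside the outermost bin edges
weights = [5, 30, 250]
result = pd.cut(weights, bins=[10, 50, 100, 200])

# Only 30 falls inside (10, 200]; the other two come back as missing
print(pd.isna(result))
```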

I realize I could do cut(weight, bins=[float("-inf"), 10, 50, 100, 200, float("inf")]) or the equivalent, but the report style I'm following doesn't allow things like (200, inf]. I realize too that I could specify custom labels via the labels parameter on cut(), but that means remembering to adjust them every time I adjust the bins, which could be often.
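As a rough illustration of the infinite-edge workaround, the sketch below pads the bin edges with -inf and inf. The `weights` values and the `edges` name are assumptions for the example. Nothing is missing afterwards, but the default labels are interval objects that include the infinite edges, which is the format the report style rules out.

```python
import pandas as pd

# Hypothetical sample values
weights = [5, 30, 250]
edges = [float("-inf"), 10, 50, 100, 200, float("inf")]
result = pd.cut(weights, bins=edges)

print(result.isna().any())   # every value now lands in some bin
print(result.categories)     # default labels are intervals with infinite edges
```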

Have I exhausted all the possibilities, or is there something in cut() or elsewhere in pandas that would help me do this? I'm thinking about writing a wrapper function for cut() that would automatically generate the labels in the desired format from the bins, but I wanted to check here first.

After waiting a few days, still no answers were posted - I think that's probably because there's really no way around this other than writing the cut() wrapper function. I am posting my version of it here and marking the question as answered. I will change that if new answers come along.

import pandas as pd


def my_cut(x, bins,
           lower_infinite=True, upper_infinite=True,
           **kwargs):
    r"""Wrapper around pandas cut() to create infinite lower/upper bounds with proper labeling.

    Takes all the same arguments as pandas cut(), plus two more.

    Args:
        lower_infinite (bool, optional): set whether the lower bound is infinite.
            Default is True. If True, and the first bin element is something like 20,
            the first bin label will be like '<= 20' (depending on other cut() parameters).
        upper_infinite (bool, optional): set whether the upper bound is infinite.
            Default is True. If True, and the last bin element is something like 20,
            the last bin label will be like '> 20' (depending on other cut() parameters).
        **kwargs: any standard pandas cut() labeled parameters.

    Returns:
        out: same as pandas cut() return value
        bins: same as pandas cut() return value
    """

    # Quick passthru if no infinite bounds
    if not lower_infinite and not upper_infinite:
        return pd.cut(x, bins, **kwargs)

    # Setup
    num_labels = len(bins) - 1
    include_lowest = kwargs.get("include_lowest", False)
    right = kwargs.get("right", True)

    # Prepend/append infinities where indicated
    bins_final = list(bins)  # copy, so the caller's bins are not modified
    if upper_infinite:
        bins_final.append(float("inf"))
        num_labels += 1
    if lower_infinite:
        bins_final.insert(0, float("-inf"))
        num_labels += 1

    # Decide all boundary symbols based on traditional cut() parameters
    symbol_lower = "<=" if include_lowest and right else "<"
    left_bracket = "(" if right else "["
    right_bracket = "]" if right else ")"
    symbol_upper = ">" if right else ">="

    # Inner function reused in multiple clauses for labeling
    def make_label(i, lb=left_bracket, rb=right_bracket):
        return "{0}{1}, {2}{3}".format(lb, bins_final[i], bins_final[i + 1], rb)

    # Create custom labels, one per bin
    labels = []
    for i in range(num_labels):
        if i == 0:
            if lower_infinite:
                new_label = "{0} {1}".format(symbol_lower, bins_final[i + 1])
            elif include_lowest:
                new_label = make_label(i, lb="[")
            else:
                new_label = make_label(i)
        elif upper_infinite and i == (num_labels - 1):
            new_label = "{0} {1}".format(symbol_upper, bins_final[i])
        else:
            new_label = make_label(i)

        labels.append(new_label)

    # Pass thru to pandas cut()
    return pd.cut(x, bins_final, labels=labels, **kwargs)
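To make the effect of the wrapper concrete: with its default arguments and bins of [10, 50, 100, 200], it should end up issuing a plain cut() call roughly like the one sketched below. The `weights` values are invented for the example, and the `edges` and `labels` lists are written out by hand here to show what the wrapper is expected to generate automatically.

```python
import pandas as pd

# Hypothetical sample values, one per resulting bin
weights = [5, 30, 75, 150, 250]

# Edges padded with infinities, and labels in the report-friendly format
edges = [float("-inf"), 10, 50, 100, 200, float("inf")]
labels = ["< 10", "(10, 50]", "(50, 100]", "(100, 200]", "> 200"]

out = pd.cut(weights, edges, labels=labels)
print(list(out))
```

The advantage of the wrapper over this hand-written form is that the labels are rebuilt from the bins on every call, so changing the bins cannot leave stale labels behind.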
