The pandas `cut()` documentation states that: "Out of bounds values will be NA in the resulting Categorical object." This makes things difficult when the upper bound isn't clear or important. For example, `cut(weight, bins=[10,50,100,200])` will produce the bins:

`[(10, 50] < (50, 100] < (100, 200]]`

So `cut(250, bins=[10,50,100,200])` will produce a `NaN`, as will `cut(5, bins=[10,50,100,200])`. I'm trying to produce `> 200` for the first example and `< 10` for the second.
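A minimal sketch of the behavior described above, assuming a recent pandas version. The sample `weights` values are made up for illustration, and because `cut()` expects 1-dimensional array-like input, the scalars from the examples are wrapped in a Series.

```python
import pandas as pd

# cut() expects array-like input, so the sample weights go in a Series
weights = pd.Series([5, 30, 75, 150, 250])
binned = pd.cut(weights, bins=[10, 50, 100, 200])

# 5 and 250 fall outside (10, 200] and come back as NaN
print(binned)
print(binned.isna().tolist())  # [True, False, False, False, True]
```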
I realize I could do `cut(weight, bins=[float("-inf"),10,50,100,200,float("inf")])` or the equivalent, but the report style I'm following doesn't allow things like `(200, inf]`.
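For comparison, here is a sketch of the infinite-edge approach, again with illustrative values. It removes the NaNs, but the open-ended categories are still labeled with their infinite edges, which is exactly what the report style rules out.

```python
import math
import pandas as pd

weights = pd.Series([5, 30, 250])
bins = [float("-inf"), 10, 50, 100, 200, float("inf")]
binned = pd.cut(weights, bins=bins)

# Nothing is NaN any more, but the first and last categories keep their
# infinite edges, e.g. the last one is the interval (200.0, inf]
print(binned.cat.categories)
last = binned.cat.categories[-1]
print(last.left, math.isinf(last.right))
```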
I also realize I could specify custom labels via the `labels` parameter on `cut()`, but that means remembering to adjust them every time I adjust the bins, which could be often.
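A sketch of the hand-written `labels` alternative, with illustrative values, showing why it is fragile: the labels list is tied to the number of bin edges, and pandas raises a `ValueError` as soon as the two drift apart.

```python
import pandas as pd

bins = [float("-inf"), 10, 50, 100, 200, float("inf")]
# One hand-written label per interval: len(labels) must be len(bins) - 1
labels = ["<= 10", "(10, 50]", "(50, 100]", "(100, 200]", "> 200"]

weights = pd.Series([5, 30, 250])
binned = pd.cut(weights, bins=bins, labels=labels)
print(binned.tolist())  # ['<= 10', '(10, 50]', '> 200']

# Adding a bin edge without also updating labels makes cut() raise ValueError
wider_bins = [float("-inf"), 10, 50, 100, 200, 500, float("inf")]
try:
    pd.cut(weights, bins=wider_bins, labels=labels)
    labels_out_of_sync = False
except ValueError as err:
    labels_out_of_sync = True
    print("labels no longer match bins:", err)
```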
Have I exhausted all the possibilities, or is there something in `cut()` or elsewhere in pandas that would help me do this? I'm thinking about writing a wrapper function for `cut()` that would automatically generate the labels in the desired format from the bins, but I wanted to check here first.
After waiting a few days, there are still no answers posted. I think that's because there is no way around this other than writing a `cut()` wrapper function. I'm posting my version of it here and marking the question as answered; I'll change that if new answers come along.
```python
import pandas as pd


def my_cut(x, bins, lower_infinite=True, upper_infinite=True, **kwargs):
    r"""Wrapper around pandas cut() to create infinite lower/upper bounds with proper labeling.

    Takes all the same arguments as pandas cut(), plus two more.

    Args:
        lower_infinite (bool, optional): set whether the lower bound is infinite.
            Default is True. If True, and the first bin element is 20, the
            first bin label will be '<= 20' (depending on other cut() parameters).
        upper_infinite (bool, optional): set whether the upper bound is infinite.
            Default is True. If True, and the last bin element is 20, the
            last bin label will be '> 20' (depending on other cut() parameters).
        **kwargs: any standard pandas cut() labeled parameters.

    Returns:
        out: same as pandas cut() return value.
        bins: same as pandas cut() return value.
    """
    # Quick passthru if there are no infinite bounds
    if not lower_infinite and not upper_infinite:
        return pd.cut(x, bins, **kwargs)

    # Setup
    num_labels = len(bins) - 1
    include_lowest = kwargs.get("include_lowest", False)
    right = kwargs.get("right", True)

    # Prepend/append infinities where indicated; list() also accepts tuples
    # and NumPy arrays, which have no insert()/append()
    bins_final = list(bins)
    if upper_infinite:
        bins_final.append(float("inf"))
        num_labels += 1
    if lower_infinite:
        bins_final.insert(0, float("-inf"))
        num_labels += 1

    # Decide boundary symbols based on the traditional cut() parameters.
    # With right=True the open-ended first bin is (-inf, b] and contains b,
    # so it is labeled '<= b'; with right=False it is [-inf, b), i.e. '< b'.
    symbol_lower = "<=" if right else "<"
    left_bracket = "(" if right else "["
    right_bracket = "]" if right else ")"
    symbol_upper = ">" if right else ">="

    # Inner function reused in multiple labeling clauses
    def make_label(i, lb=left_bracket, rb=right_bracket):
        return "{0}{1}, {2}{3}".format(lb, bins_final[i], bins_final[i + 1], rb)

    # Create the custom labels, one per interval
    labels = []
    for i in range(0, num_labels):
        if i == 0:
            if lower_infinite:
                new_label = "{0} {1}".format(symbol_lower, bins_final[i + 1])
            elif include_lowest:
                new_label = make_label(i, lb="[")
            else:
                new_label = make_label(i)
        elif upper_infinite and i == (num_labels - 1):
            new_label = "{0} {1}".format(symbol_upper, bins_final[i])
        else:
            new_label = make_label(i)
        labels.append(new_label)

    # Pass through to pandas cut()
    return pd.cut(x, bins_final, labels=labels, **kwargs)
```
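As a variation on the wrapper above (not part of the original answer), the labels can also be derived after the fact: cut with infinite edges, then rename each category by inspecting its `Interval`. This is a sketch assuming a pandas version where `Series.cat.rename_categories` accepts a callable; `relabel_open_ends` and `_fmt` are hypothetical helper names, and the weights are illustrative.

```python
import math
import pandas as pd


def _fmt(v):
    # Hypothetical helper: print 10.0 as "10" so labels match the report style
    return str(int(v)) if float(v).is_integer() else str(v)


def relabel_open_ends(binned):
    """Hypothetical helper: rename the categories of a pd.cut() result so
    infinite-edged intervals read '<= x' / '> x' (or '< x' / '>= x')."""
    def label(iv):
        if math.isinf(iv.left):
            return f"{'<=' if iv.closed_right else '<'} {_fmt(iv.right)}"
        if math.isinf(iv.right):
            return f"{'>' if iv.closed_right else '>='} {_fmt(iv.left)}"
        lb = "[" if iv.closed_left else "("
        rb = "]" if iv.closed_right else ")"
        return f"{lb}{_fmt(iv.left)}, {_fmt(iv.right)}{rb}"

    # rename_categories calls label() once per existing category
    return binned.cat.rename_categories(label)


bins = [float("-inf"), 10, 50, 100, 200, float("inf")]
weights = pd.Series([5, 30, 75, 150, 250])

out = relabel_open_ends(pd.cut(weights, bins=bins))
print(out.tolist())  # ['<= 10', '(10, 50]', '(50, 100]', '(100, 200]', '> 200']

out_left = relabel_open_ends(pd.cut(weights, bins=bins, right=False))
print(out_left.tolist())  # ['< 10', '[10, 50)', '[50, 100)', '[100, 200)', '>= 200']
```

Because the labels are read from the intervals pandas actually built, changing the bins or `right=` needs no label bookkeeping. One caveat: this sketch does not special-case `include_lowest`, so a finite first edge that pandas shifts for `include_lowest=True` may show up with its shifted value.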