python - UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 894: invalid start byte


I am using scikit-learn in a project. While performing feature extraction (following the working_with_text_data tutorial), I get a UnicodeDecodeError: 'utf8' codec can't decode byte.

I am using Python 2.7.8 and have built scikit-learn using make.

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
x_train_counts = count_vect.fit_transform(dataset.data)
print(x_train_counts.shape)

Could you kindly advise on how to resolve this?
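
For context, the error message suggests that at least one input file is not valid UTF-8: the byte 0xb5 cannot start a UTF-8 sequence, but it is the micro sign in Latin-1. The sketch below reproduces the failure with the standard library only and shows three ways of decoding around it. The sample bytes are made up for illustration, and Latin-1 is only a guess at the real encoding of the data. If the guess holds, CountVectorizer also accepts `encoding` and `decode_error` arguments (in reasonably recent scikit-learn versions) that apply the same choices.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample bytes: 0xb5 is the micro sign in Latin-1,
# but it is not a valid start byte in UTF-8.
raw = b"5 \xb5m"

# Strict UTF-8 decoding fails, which is the error seen in the question.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Option 1: decode with the encoding the data was (assumed to be) written in.
as_latin1 = raw.decode("latin-1")

# Option 2: keep UTF-8 but silently drop the bytes that cannot be decoded.
ignored = raw.decode("utf-8", "ignore")

# Option 3: keep UTF-8 and substitute U+FFFD for undecodable bytes.
replaced = raw.decode("utf-8", "replace")
```

Decoding with the correct encoding preserves the original characters, while "ignore" and "replace" lose information, so they are best treated as a fallback when the real encoding is unknown.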

