python - How to import a CSV file as training data with labels and test data with targets for a classifier in scikit-learn?

July 15, 2014

I have two CSV files, one with training data and one with testing data. Both have the same format and the same attribute names, so I show only one of them:

    full,id,id & ppdb,id & words sequence,id & synonyms,id & hypernyms,id & hyponyms,gold standard
    1.667,0.476,0.952,0.476,1.429,0.952,0.476,2.345
    3.056,1.111,1.667,1.111,3.056,1.389,1.111,1.9
    1.765,1.176,1.176,1.176,1.765,1.176,1.176,2.2
    0.714,0.714,0.714,0.714,0.714,0.714,0.714,0.0
    1.538,0.769,0.769,0.769,1.538,0.769,0.769,2.586
    2.188,1.875,1.875,1.875,1.875,2.188,1.875,1.667
    3.333,1.333,1.333,1.333,3.333,2.0,1.333,2.8
    2.5,1.667,1.667,1.667,2.222,1.944,1.667,2.481
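A quick way to check that this format loads cleanly is to read a few of the sample rows with pandas from an in-memory string. This is a minimal sketch: io.StringIO stands in for the real file, and only the first three rows are used. It shows that the header names, spaces and "&" included, become the column labels unchanged and that every column is parsed as a float.

```python
import io

import pandas as pd

# First three rows of the sample above, pasted as a string for illustration;
# with a real file you would pass the file name to pd.read_csv instead.
sample = (
    "full,id,id & ppdb,id & words sequence,id & synonyms,id & hypernyms,id & hyponyms,gold standard\n"
    "1.667,0.476,0.952,0.476,1.429,0.952,0.476,2.345\n"
    "3.056,1.111,1.667,1.111,3.056,1.389,1.111,1.9\n"
    "1.765,1.176,1.176,1.176,1.765,1.176,1.176,2.2\n"
)

df = pd.read_csv(io.StringIO(sample))

print(df.shape)        # (3, 8): three rows, eight columns
print(df.columns[-1])  # gold standard  (names keep their spaces and '&')
print(all(pd.api.types.is_float_dtype(t) for t in df.dtypes))  # True
```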

I'm a newbie in scikit-learn. I learned from an example whose input is training data with labels and test data with target names:

    import numpy as np

    x_train = np.array(["new york hell of town",
                        "new york dutch",
                        "the big apple great",
                        "new york called big apple",
                        "nyc nice",
                        "people abbreviate new york city nyc",
                        "the capital of great britain london",
                        "london in uk",
                        "london in england",
                        "london in great britain",
                        "it rains lot in london",
                        "london hosts british museum",
                        "new york great , london",
                        "i london better new york"])
    y_train_text = [["new york"], ["new york"], ["new york"], ["new york"], ["new york"],
                    ["new york"], ["london"], ["london"], ["london"], ["london"],
                    ["london"], ["london"], ["new york", "london"], ["new york", "london"]]

    x_test = np.array(['nice day in nyc',
                       'welcome london',
                       'london rainy',
                       'it raining in britian',
                       'it raining in britian , big apple',
                       'it raining in britian , nyc',
                       'hello welcome new york. enjoy here , london too'])
    target_names = ['new york', 'london']

Is it possible to import CSV files containing float numbers as training data with labels and testing data with targets? I also want the gold standard attribute to be the label of the training data and the target of the testing data. If it's possible, how do I build the input? Thanks.
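One way to do this, sketched under the assumption that both files share the header shown above, is to select the label column by its name rather than by position, and to apply the same loader to the training and the testing file. The helper load_xy and the file names train.txt/test.txt are hypothetical; to keep the sketch self-contained, the eight sample rows are split into a six-row "training" string and a two-row "testing" string.

```python
import io

import pandas as pd

LABEL = "gold standard"  # label for the training data, target for the testing data


def load_xy(source, label=LABEL):
    """Hypothetical helper: read one CSV (path or file-like) and split X from y.

    Assumes every column except `label` is a numeric feature.
    """
    df = pd.read_csv(source)
    X = df.drop(columns=label).to_numpy(dtype=float)  # feature matrix
    y = df[label].to_numpy(dtype=float)               # gold standard scores
    return X, y


HEADER = ("full,id,id & ppdb,id & words sequence,id & synonyms,"
          "id & hypernyms,id & hyponyms,gold standard\n")

# Illustration only: the sample rows split into a "training" and a "testing" file.
train_csv = HEADER + (
    "1.667,0.476,0.952,0.476,1.429,0.952,0.476,2.345\n"
    "3.056,1.111,1.667,1.111,3.056,1.389,1.111,1.9\n"
    "1.765,1.176,1.176,1.176,1.765,1.176,1.176,2.2\n"
    "0.714,0.714,0.714,0.714,0.714,0.714,0.714,0.0\n"
    "1.538,0.769,0.769,0.769,1.538,0.769,0.769,2.586\n"
    "2.188,1.875,1.875,1.875,1.875,2.188,1.875,1.667\n"
)
test_csv = HEADER + (
    "3.333,1.333,1.333,1.333,3.333,2.0,1.333,2.8\n"
    "2.5,1.667,1.667,1.667,2.222,1.944,1.667,2.481\n"
)

# With real files this would be load_xy("train.txt") and load_xy("test.txt").
X_train, y_train = load_xy(io.StringIO(train_csv))
X_test, y_test = load_xy(io.StringIO(test_csv))

print(X_train.shape, y_train.shape)  # (6, 7) (6,)
print(X_test.shape, y_test.shape)    # (2, 7) (2,)
```

Selecting the label by name keeps working even if the column order in the two files ever differs, which positional slicing would not.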

As suggested in @Vivek Kumar's comment, the job can be done using pandas' read_csv and iloc, like this:

    In [12]: import pandas as pd

    In [13]: import numpy as np

    In [14]: df = pd.read_csv('train.txt')

    In [15]: x_train = np.asarray(df.iloc[:, :-1])

    In [16]: y_train = np.asarray(df.iloc[:, -1])

    In [17]: x_train
    Out[17]:
    array([[ 1.667,  0.476,  0.952, ...,  1.429,  0.952,  0.476],
           [ 3.056,  1.111,  1.667, ...,  3.056,  1.389,  1.111],
           [ 1.765,  1.176,  1.176, ...,  1.765,  1.176,  1.176],
           ...,
           [ 2.188,  1.875,  1.875, ...,  1.875,  2.188,  1.875],
           [ 3.333,  1.333,  1.333, ...,  3.333,  2.   ,  1.333],
           [ 2.5  ,  1.667,  1.667, ...,  2.222,  1.944,  1.667]])

    In [18]: y_train
    Out[18]: array([ 2.345,  1.9  ,  2.2  ,  0.   ,  2.586,  1.667,  2.8  ,  2.481])
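These arrays can go straight into an estimator, with one caveat: the gold standard values are continuous scores, and scikit-learn classifiers reject continuous targets with a ValueError. A regressor is the more natural fit for this data. The sketch below hard-codes the eight sample rows so it runs without train.txt; DecisionTreeClassifier and LinearRegression are only example estimators chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# The eight sample rows: seven feature columns followed by the gold standard.
data = np.array([
    [1.667, 0.476, 0.952, 0.476, 1.429, 0.952, 0.476, 2.345],
    [3.056, 1.111, 1.667, 1.111, 3.056, 1.389, 1.111, 1.9],
    [1.765, 1.176, 1.176, 1.176, 1.765, 1.176, 1.176, 2.2],
    [0.714, 0.714, 0.714, 0.714, 0.714, 0.714, 0.714, 0.0],
    [1.538, 0.769, 0.769, 0.769, 1.538, 0.769, 0.769, 2.586],
    [2.188, 1.875, 1.875, 1.875, 1.875, 2.188, 1.875, 1.667],
    [3.333, 1.333, 1.333, 1.333, 3.333, 2.0, 1.333, 2.8],
    [2.5, 1.667, 1.667, 1.667, 2.222, 1.944, 1.667, 2.481],
])
x_train, y_train = data[:, :-1], data[:, -1]  # same split as iloc[:, :-1] / iloc[:, -1]

# Classifiers expect discrete class labels, so continuous scores raise ValueError.
classifier_rejected = False
try:
    DecisionTreeClassifier().fit(x_train, y_train)
except ValueError as err:
    classifier_rejected = True
    print("classifier:", err)

# A regressor accepts the continuous gold standard scores as they are.
reg = LinearRegression().fit(x_train, y_train)
print(reg.predict(x_train[:2]))
```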

Please note that I have saved the data you provided in the file train.txt.
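If a classifier is genuinely required, the continuous scores first have to be turned into discrete classes. One possible choice, assumed here purely for illustration, is to round each gold standard score to the nearest integer; whether such classes are meaningful for your task is a judgment you would need to make. KNeighborsClassifier with n_neighbors=3 is likewise an arbitrary example estimator.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# The eight sample rows from the question: seven features, then the gold standard.
data = np.array([
    [1.667, 0.476, 0.952, 0.476, 1.429, 0.952, 0.476, 2.345],
    [3.056, 1.111, 1.667, 1.111, 3.056, 1.389, 1.111, 1.9],
    [1.765, 1.176, 1.176, 1.176, 1.765, 1.176, 1.176, 2.2],
    [0.714, 0.714, 0.714, 0.714, 0.714, 0.714, 0.714, 0.0],
    [1.538, 0.769, 0.769, 0.769, 1.538, 0.769, 0.769, 2.586],
    [2.188, 1.875, 1.875, 1.875, 1.875, 2.188, 1.875, 1.667],
    [3.333, 1.333, 1.333, 1.333, 3.333, 2.0, 1.333, 2.8],
    [2.5, 1.667, 1.667, 1.667, 2.222, 1.944, 1.667, 2.481],
])
x_train, y_train = data[:, :-1], data[:, -1]

# Illustrative binning: round each score to the nearest integer class.
# Note that np.rint rounds exact halves to the nearest even integer.
y_classes = np.rint(y_train).astype(int)
print(y_classes)  # [2 2 2 0 3 2 3 2]

clf = KNeighborsClassifier(n_neighbors=3).fit(x_train, y_classes)
print(clf.predict(x_train[:2]))
```

The same rounding would have to be applied to the target column of the testing file before comparing predictions with it.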
