python - NLTK: classify based on a single parameter
I am trying to use NaiveBayesClassifier to classify the times spent in areas of a smart home.
My training data looks like this:
[[{'time': '00:00'}, 'in'], [{'time': '00:01'}, 'in'], [{'time': '00:02'}, 'out'], [{'time': '00:03'}, 'out'], [{'time': '00:04'}, 'out'], [{'time': '00:05'}, 'out'], [{'time': '00:06'}, 'out'], ......, [{'time': '08:06'}, 'in'], [{'time': '08:07'}, 'in'], [{'time': '08:08'}, 'in'], ... ]
This is the code:
    import nltk
    from datetime import datetime, timedelta

    # training_data is the list of [features, label] pairs shown above
    classifier = nltk.NaiveBayesClassifier.train(training_data)

    start_date = datetime.strptime('2010-11-19 00:00', '%Y-%m-%d %H:%M')
    end_date = datetime.strptime('2010-11-19 23:59', '%Y-%m-%d %H:%M')

    # Build one feature dict per minute of the day
    test_data = []
    while start_date < end_date:
        test_data.append(dict(time=start_date.strftime('%H:%M')))
        start_date += timedelta(0, 60)

    test = classifier.classify_many(test_data)
    print(test)
The result looks like this:
['out', 'out', 'out', 'out', 'out', 'out', 'out', 'out', 'out',....]
I never get an 'in' result. Can you see what is wrong with the classifier?
As medali suggested, the problem is that only 11% of the dataset is labelled 'in', so I had to adjust the dataset according to: http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
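A quick way to see this kind of imbalance is to count the labels before training. The following is only a minimal sketch: the helper name `label_share` and the four-sample list are made up for illustration, and the data is assumed to have the same `[features, label]` shape as the training data shown earlier.

```python
from collections import Counter


def label_share(data):
    """Return the fraction of samples carrying each label."""
    counts = Counter(label for _features, label in data)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical sample, same shape as the real training data
sample = [
    [{'time': '00:00'}, 'in'],
    [{'time': '00:01'}, 'out'],
    [{'time': '00:02'}, 'out'],
    [{'time': '00:03'}, 'out'],
]

shares = label_share(sample)
print(shares)
```

If one label has a very small share, a classifier that leans on the label prior may end up predicting the majority label almost everywhere.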
I changed the dataset to hourly data (if a sensor was activated during an hour, I added 'in' for that hour).
This is not a perfect solution, but it is good enough for my case.
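The hourly conversion could look roughly like the sketch below. This is not the exact code I used: the function name `to_hourly` and the two-digit hour string used as the new 'time' value are assumptions made for illustration.

```python
def to_hourly(minute_data):
    """Collapse minute-level samples into one sample per hour."""
    hours_in = {}
    for features, label in minute_data:
        hour = features['time'].split(':')[0]
        # An hour counts as 'in' as soon as any of its minutes is 'in'
        hours_in[hour] = hours_in.get(hour, False) or label == 'in'
    return [[{'time': hour}, 'in' if active else 'out']
            for hour, active in sorted(hours_in.items())]


# Hypothetical minute-level sample
sample = [
    [{'time': '00:00'}, 'out'],
    [{'time': '00:01'}, 'in'],
    [{'time': '01:00'}, 'out'],
    [{'time': '01:30'}, 'out'],
]

hourly = to_hourly(sample)
print(hourly)
```

With this shape the test features have to be built the same way, for example with `strftime('%H')` instead of `strftime('%H:%M')`.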