Is it good to create a Spark batch job for every new use case?
I run hundreds of computers in a network, and hundreds of users access those machines. Every day, thousands or more syslog entries are generated by these machines. The syslog entries include system failures, network and firewall events, application errors, etc. A sample log entry is shown below:
may 11 11:32:40 scrooge sg_child[1829]: [id 748625 user.info] m:wr-sg-block-111- 00 c:y th:block , no allow rule matched request entryurl:http:url on mapping:bali [ rid:t6zcuh8aaaeaagxyaqyaaaaq sid:a6bbd3447766384f3bccc3ca31dbd50n ip:192.24.61.1]
From the logs I extract fields such as timestamp, loghost, msg, process, facility, etc., and store them in HDFS. The logs are stored in JSON format.
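For illustration only, here is a minimal sketch of what that extraction step could look like as a Spark (Scala) batch job. The regex, field names, and HDFS paths are assumptions based on the sample line above, not the actual pipeline.

```scala
import org.apache.spark.sql.SparkSession

object SyslogToHdfs {

  // Extracted record; field names follow the question (timestamp, loghost, process, facility, msg).
  case class LogRecord(timestamp: String, loghost: String, process: String,
                       pid: String, facility: String, msg: String)

  // Hypothetical pattern for lines like the sample:
  // "may 11 11:32:40 scrooge sg_child[1829]: [id 748625 user.info] m:wr-sg-block..."
  private val SyslogLine =
    """^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (\S+)\[(\d+)\]: \[id \d+ (\S+)\] (.*)$""".r

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("syslog-to-hdfs").getOrCreate()
    import spark.implicits._

    // Raw syslog lines collected to a landing directory on HDFS (assumed path).
    val raw = spark.read.textFile("hdfs:///landing/syslog/*.log")

    // Keep only lines that match the assumed syslog layout.
    val parsed = raw.flatMap {
      case SyslogLine(ts, host, proc, pid, fac, msg) =>
        Seq(LogRecord(ts, host, proc, pid, fac, msg))
      case _ => Seq.empty[LogRecord]
    }

    // Store the extracted fields in HDFS as JSON, as described in the question.
    parsed.write.mode("overwrite").json("hdfs:///data/syslog/json")

    spark.stop()
  }
}
```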
Now I want to build a system where I can type a query in a web application and run analysis on the logs. I want to be able to run queries like:
- Get logs whose message contains the keywords "firewall blocked".
- Get logs generated by the user jason.
- Get logs containing the message "access denied".
- Get log counts grouped by user, process, loghost, etc.

There are thousands of different types of analytics I want to do (a sketch of what a few of these could look like as SQL queries follows below). On top of that, I want combined results of historical data and real-time data, i.e. combining batch and real-time results.
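Purely to make the requirement concrete, here is a spark-shell style sketch of how a few of these queries might be expressed in Spark SQL over the JSON records described above; the view name, field names (user, msg, process, loghost), and path are assumptions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("adhoc-log-queries").getOrCreate()

// Assumed location of the JSON records produced by the extraction job.
val logs = spark.read.json("hdfs:///data/syslog/json")
logs.createOrReplaceTempView("logs")

// Logs whose message contains the "firewall blocked" keywords.
val firewallBlocked = spark.sql(
  "SELECT * FROM logs WHERE msg LIKE '%firewall blocked%'")

// Logs generated by the user jason (assumes a `user` field was extracted).
val jasonLogs = spark.sql(
  "SELECT * FROM logs WHERE `user` = 'jason'")

// Log counts grouped by user, process, and loghost.
val counts = spark.sql(
  """SELECT `user`, process, loghost, COUNT(*) AS cnt
    |FROM logs
    |GROUP BY `user`, process, loghost""".stripMargin)

counts.show()
```

Whether running queries like these as Spark jobs is the right approach is exactly the question below.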
Now my questions are:
- To get batch results, I need to run batch Spark jobs. Should I be creating a batch job for every unique query a user makes? If so, I would end up creating thousands of batch jobs. If not, what kind of batch jobs should I run so that I can get results for any type of analytics?
- Am I thinking about this the right way? If my approach is wrong, please share what the correct procedure would be.
While it is possible (via the Thrift server, for example), Apache Spark's main objective is not to be a query engine but to build data pipelines for streaming and batch data sources.
If your transformation is only projecting fields and you want to enable ad-hoc queries, it sounds like you need a data store instead, such as Elasticsearch, for example. An additional benefit is that Kibana comes with it, which enables analytics to some extent.
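As a rough sketch of that option, the elasticsearch-hadoop (elasticsearch-spark) connector can write a DataFrame straight into an index, which Kibana can then search and aggregate. The index name, host, and paths below are illustrative assumptions, and the connector library must be on the Spark classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._   // from the elasticsearch-spark connector

val spark = SparkSession.builder
  .appName("syslog-to-es")
  .config("es.nodes", "localhost")     // assumed Elasticsearch host
  .config("es.port", "9200")
  .getOrCreate()

// Read the extracted JSON records from HDFS (assumed path).
val logs = spark.read.json("hdfs:///data/syslog/json")

// Write them into an Elasticsearch index ("syslog" is an illustrative name);
// Kibana can then be pointed at this index for keyword search and aggregations.
logs.saveToEs("syslog")
```

The ad-hoc keyword searches and grouped counts from the question then become Elasticsearch or Kibana queries rather than one Spark batch job per query.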
Another option is to use a SQL engine such as Apache Drill.