Unescape JSON string in Python
I am getting the following string from a log file, and I want to remove the backslashes from the string.
String from the file:
This is the exact string from the log file, except that sensitive info has been replaced with dummy values.
2017-08-17 17:29:49.249 error org.foo.bar.logging.applicationlogger - apierror={"input":"{\"requestbody\":\"{\\\"request\\\":{\\\"drequests\\\":{\\\"items\\\":[{\\\"description\\\":\\\"i add additional card.\\\",\\\"fields\\\":{\\\"field\\\":[{\\\"fieldname\\\":\\\"severity\\\",\\\"fieldvalue\\\":\\\"4\\\"},{\\\"fieldname\\\":\\\"contact\\\",\\\"fieldvalue\\\":\\\"phone\\\"},{\\\"fieldname\\\":\\\"callbacknumber\\\",\\\"fieldvalue\\\":\\\"1 (123) 123456\\\"},{\\\"fieldname\\\":\\\"version\\\",\\\"fieldvalue\\\":\\\"11.1\\\"},{\\\"fieldname\\\":\\\"language\\\",\\\"fieldvalue\\\":\\\"english\\\"}]},\\\"product\\\":\\\"visa\\\",\\\"subject\\\":\\\"adding addition card\\\",\\\"serial_number\\\":\\\"123456789\\\"}]},\\\"email\\\":\\\"someone@gmail.com\\\",\\\"first_name\\\":\\\"foo\\\",\\\"last_name\\\":\\\"bar\\\"}}\"}"}

Python code:

    str = read_from_file()
    print str.replace('\\"', '"')

I tried the line of code above, but it is not having any effect. How can I get rid of the backslashes in the JSON string?
Edit
I tried the solution of recursively calling json.loads, but it didn't remove the backslashes.
Just to give better context: I am not processing the JSON string as JSON. Instead, I am writing it to a file in a form that is more readable for a human. Below is the complete code.
    import re
    import json
    from tailf import tailf

    def parserecursive(obj):
        if isinstance(obj, str):
            try:  # see whether it is JSON: if so, parse it
                obj = json.loads(obj)
            except json.JSONDecodeError:
                pass  # if not, leave it as is
        if isinstance(obj, dict):  # perform recursion
            for prop, val in obj.items():
                obj[prop] = parserecursive(val)
        return obj

    for line in tailf("/var/log/test.log"):
        m = re.search('([\d\-:\s]+).*error.*apierror=(.*)', line)
        if m is None:
            print "no match"
        else:
            print m.group(1)
            print parserecursive(m.group(2))

When I run the script, it prints the string below, and the backslashes are not removed. Also notice the u' characters at the beginning of the second line.
Output:

    2017-08-17 17:29:49
    {u'input': u'{"requestbody":"{\\"request\\":{\\"drequests\\":{\\"items\\":[{\\"description\\":\\"i add additional card.\\",\\"fields\\":{\\"field\\":[{\\"fieldname\\":\\"severity\\",\\"fieldvalue\\":\\"4\\"},{\\"fieldname\\":\\"contact\\",\\"fieldvalue\\":\\"phone\\"},{\\"fieldname\\":\\"callbacknumber\\",\\"fieldvalue\\":\\"1 (123) 123456\\"},{\\"fieldname\\":\\"version\\",\\"fieldvalue\\":\\"11.1\\"},{\\"fieldname\\":\\"language\\",\\"fieldvalue\\":\\"english\\"}]},\\"product\\":\\"visa\\",\\"subject\\":\\"adding addition card\\",\\"serial_number\\":\\"123456789\\"}]},\\"email\\":\\"someone@gmail.com\\",\\"first_name\\":\\"foo\\",\\"last_name\\":\\"bar\\"}}"}'}

Update
It was a silly mistake! I had to type cast the object to a string and then call the replace function. The code below worked.
    import re
    from tailf import tailf

    for line in tailf("/var/log/test.log"):
        m = re.search('([\d\-:\s]+).*error.*apierror=(.*)', line)
        if m is None:
            print "no match"
        else:
            print m.group(1)
            encodedstring = m.group(2) + ''
            # strip every backslash from the matched text
            print str(encodedstring).replace('\\', '')
The JSON represented in the string needs those backslashes, because the represented object apparently has property values that are themselves JSON-encoded strings. These embedded strings need their double quotes escaped. Removing the backslashes would make the overall JSON invalid.
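To illustrate why the backslashes matter, here is a minimal sketch in Python 3. The data is hypothetical and much shorter than the log line in the question, and `is_valid_json` is a helper name introduced only for this example.

```python
import json

# Hypothetical data: an object whose "input" value is itself a JSON-encoded string.
inner = json.dumps({"email": "someone@gmail.com"})
outer = json.dumps({"input": inner})
print(outer)  # the quotes of the inner document appear escaped as \"

# Removing the backslashes leaves bare quotes inside a string literal.
stripped = outer.replace("\\", "")


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_valid_json(outer))     # True
print(is_valid_json(stripped))  # False
```

The stripped text may look tidier to a human reader, but it can no longer be parsed, so it is only suitable for display.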
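What you may need is to resolve the embedded JSON strings into the objects they represent. It is best to use a recursive function that uses the json.loads method to parse each of the nested JSON strings.
NB: It is not a good idea to use the name str for your data, as it is the name of a Python data type.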
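As a small illustration of that naming problem, the sketch below (Python 3, with hypothetical function names and values) shows what happens when a local variable called str hides the built-in type, and how a different name avoids it.

```python
def shadowing_demo():
    """Bind a local variable named str, then try to use str as a type."""
    str = "some log line"  # hypothetical value; hides the built-in in this function
    try:
        str(42)  # this now calls a string object, not the type
    except TypeError as exc:
        return exc
    return None


def safe_demo():
    """Use a different variable name so the built-in str stays available."""
    line = "some log line"
    return str(42) + " " + line


print(shadowing_demo())
print(safe_demo())
```

The same issue would affect any later call such as str(encodedstring) made in a scope where str still refers to the data.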
Here is a suggested solution:

    import json
    import re

    def parserecursive(obj):
        # basestring matches both str and unicode (json.loads returns unicode in Python 2)
        if isinstance(obj, basestring):
            try:  # see whether it is JSON: if so, parse it
                obj = json.loads(obj)
            except ValueError:  # raised by json.loads for text that is not valid JSON
                pass  # if not, leave it as is
        if isinstance(obj, dict):  # perform recursion
            for prop, val in obj.items():
                obj[prop] = parserecursive(val)
        return obj

    # sample input
    msg = r'2017-08-17 17:29:49.249 error org.foo.bar.logging.applicationlogger - apierror={"input":"{\"requestbody\":\"{\\\"request\\\":{\\\"drequests\\\":{\\\"items\\\":[{\\\"description\\\":\\\"i add additional card.\\\",\\\"fields\\\":{\\\"field\\\":[{\\\"fieldname\\\":\\\"severity\\\",\\\"fieldvalue\\\":\\\"4\\\"},{\\\"fieldname\\\":\\\"contact\\\",\\\"fieldvalue\\\":\\\"phone\\\"},{\\\"fieldname\\\":\\\"callbacknumber\\\",\\\"fieldvalue\\\":\\\"1 (123) 123456\\\"},{\\\"fieldname\\\":\\\"version\\\",\\\"fieldvalue\\\":\\\"11.1\\\"},{\\\"fieldname\\\":\\\"language\\\",\\\"fieldvalue\\\":\\\"english\\\"}]},\\\"product\\\":\\\"visa\\\",\\\"subject\\\":\\\"adding addition card\\\",\\\"serial_number\\\":\\\"123456789\\\"}]},\\\"email\\\":\\\"someone@gmail.com\\\",\\\"first_name\\\":\\\"foo\\\",\\\"last_name\\\":\\\"bar\\\"}}\"}"}'

    # extract the JSON part: take the "{ ... }" part of the message string
    match = re.search(r"\{.*\}", msg)
    if match:
        # parse the nested JSON
        obj = parserecursive(match.group(0))
        print obj

See it run on repl.it. Note that the output obj is a dict. The u prefix in the output means that the keys are Unicode strings. You can access a nested value in it, like:

    print obj['input']['requestbody']['request']['email']

Output:

    someone@gmail.com
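For readers on Python 3, the same idea might be sketched as follows. This is a variant under stated assumptions, not the code from the answer above: the name `parse_nested` is hypothetical, the sample payload is a shortened stand-in built with json.dumps, and two behaviours are added (recursing into lists, and leaving strings such as "4" untouched instead of turning them into numbers).

```python
import json


def parse_nested(obj):
    """Recursively decode strings that contain JSON objects or arrays."""
    if isinstance(obj, str):
        try:
            decoded = json.loads(obj)
        except ValueError:  # not JSON: keep the original string
            return obj
        # Only unwrap containers, so that "4" or "11.1" stay strings.
        if not isinstance(decoded, (dict, list)):
            return obj
        obj = decoded
    if isinstance(obj, dict):
        return {key: parse_nested(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [parse_nested(item) for item in obj]
    return obj


# Hypothetical, shortened stand-in for the payload in the log line.
request = {"request": {"items": [{"fieldvalue": "4"}], "email": "someone@gmail.com"}}
payload = json.dumps({"input": json.dumps({"requestbody": json.dumps(request)})})

parsed = parse_nested(payload)

# Pretty-print the fully decoded structure for a human reader.
print(json.dumps(parsed, indent=2, sort_keys=True))
```

Since the stated goal is a more readable log rather than further processing, writing the json.dumps(..., indent=2) result to the file keeps the output valid JSON while showing no escaped quotes.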