Functional Design in Clojure

Ep 029: Problem Unknown: Log Lines

35 min • 17 May 2019

Nate is dropped in the middle of a huge log file and hunts for the source of the errors.

  • We dig into the world of DonutGram.
  • "We are the devops."
  • Problem: we start seeing errors in the log.
  • "The kind of error that makes total sense to whoever wrote the log line."
  • (04:10) We want to use Clojure to characterize the errors.
  • Why not grep?
  • Well...we grep out the lines and count them: 10,351 occurrences. Is that a lot? Hard to say. We need more info.
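The grep-and-count baseline can be sketched in Clojure as well. This is our illustration, not code from the episode; count-code-357 and the filename are hypothetical:

```clojure
(ns devops.grep-sketch
  (:require [clojure.string :as string]))

;; Count lines mentioning the error, like `grep "code 357" app.log | wc -l`.
(defn count-code-357
  [log-text]
  (->> (string/split-lines log-text)
       (filter #(string/includes? % "code 357"))
       (count)))

;; e.g. (count-code-357 (slurp "app.log")) — "app.log" is a stand-in filename.
```

A raw count like this is exactly where grep stops: to answer "is that a lot?" we need the structured data built below.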
  • (05:50) Developers think it's safe to ignore, but it's bugging us.
  • We want to come up with an incident count by user.
  • "Important thing we do a lot in the devops area: we see data and then we have to figure out, what does it mean?"
  • "What story is the data telling us?"
  • The "forensic data" versus "operating data" (Ep 022, 027)
  • "With forensic data, you're trying to save the state of the imprint of the known universe over time."
  • (07:40) The log file is the most common forensic data for applications.
  • To characterize, we need to manipulate the data at a higher level than grep.
  • First goal: make the data useful in Clojure
  • Get a nice sample from the production log, put it in a file, load up the REPL, and start writing code to parse it.
  • Need to parse out the data from the log file and turn it into something more structured.
  • (12:05) Let's make a function, parse-line, that separates out the common elements of each line:
    • timestamp
    • log level
    • thread name
    • package name
    • freeform "message"
  • "A message has arrived from the developers, lovingly bundled in a long log line."
  • (13:10) We can parse every line generically, but we need to parse the message and make sense of that.
  • "We want to lift up the unstructured data into structured data that we can map, filter, and reduce on."
  • We will have different kinds of log lines, so we need to detect them and parse appropriately.
  • We want to amplify the map to include the details for the specific kind of log line.
  • We'll use a kind field to identify which kind of log line it is.
  • There are two steps:
    1. recognize which kind of log line the "message" is for
    2. parse the data out of that message
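For the first sample log line at the end of these notes, the two steps might produce a map like this. The key names come from the episode's code sample; assembling them into one literal is our illustration:

```clojure
;; Step 1 yields the generic :log/* fields; step 2 amplifies the map
;; with :kind and the detail fields for that kind of log line.
(def example-entry
  {:raw/line "2019-05-14 16:48:55 | process-Poster | INFO  | com.donutgram.poster | transaction failed while updating user joe: code 357"
   :log/date "2019-05-14"
   :log/time "16:48:55"
   :log/thread "process-Poster"
   :log/level "INFO"
   :log/namespace "com.donutgram.poster"
   :log/message "transaction failed while updating user joe: code 357"
   :kind :code-357
   :code-357/user "joe"})
```

With every entry in this shape, map, filter, and group-by work directly on the keys.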
  • "Maybe we'll have a podcast that's constantly about constantly. It's not just juxt."
  • (18:45) How do we do this concisely?
  • Let's use cond:
    • flatten out all the cases
    • code to detect kind (the condition)
    • code to parse the details (the expression)
  • Can use includes? in the condition to detect the kind and re-matches to parse.
  • (20:00) Why not use the regex to detect the kind too?
  • Can avoid writing the regex twice by using a def.
  • It's less code, but now we're running the regex twice: once in the condition and once to parse.
  • We can't capture the result of the test in cond. No "cond-let" macro.
  • We could write our own cond-let macro, but should we?
  • "When you feel like you should write a macro, you should step back and assess your current state of being. Am I tired? Did I have a bad day? Do I really need this?"
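The flattened cond approach might look like this sketch, with a single error kind for illustration; parse-message is a hypothetical name, not from the episode's final code. Note how the error text effectively appears twice, once in the includes? test and once in the regex:

```clojure
(ns devops.cond-sketch
  (:require [clojure.string :as string]))

;; Detect the kind in the condition, parse the details in the expression.
(defn parse-message
  [message]
  (cond
    (string/includes? message "code 357")
    (when-let [[_whole user] (re-matches
                               #"transaction failed while updating user ([^:]+): code 357"
                               message)]
      {:kind :code-357
       :code-357/user user})))
```

Swapping the includes? test for the regex itself would run the regex twice per line, which is what motivates the table of regex/handler pairs below.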
  • (24:05) New goals for our problem:
    • one regex literal
    • only run the regex once
    • keep the code that uses the matches close to the regex
  • Similar problem to "routes" for web endpoints: want the route definition next to the code that executes using the data for that route.
  • Feels like an index or table of options.
  • (25:20) Let's make a "table". A vector of regex and handler-code pairs.
  • We need a coffee mug: "Clojure. It's just a list."
  • The code can be a function literal that takes the matches and returns a map of parsed data.
  • Write a function parse-details which goes through each regex until one matches and then invokes the handler for that one. (See below.)
  • (30:15) Once we have higher-level data, it's straightforward to filter and group-by to get our user count.
  • Once again, the goal is to take unstructured data and turn it into structured data.
  • "You have just up-leveled unstructured information into a sequence of structured information."
  • Can slurp, split, map, filter, and then aggregate.
  • (32:10) What happens when we try to open a 10 GB log file?
  • Sounds like a problem for next week.
  • "Fixing production is always a problem for right now."
  • "If a server falls over and no one outside of the devops team knows about it, did the server really fall over?"
  • "Who watches the watchers?"

Clojure in this episode:

  • #""
  • re-matches
  • case
  • cond
  • if-let
  • ->>
  • slurp
  • filter
  • group-by
  • def
  • constantly
  • juxt
  • clojure.string/
    • includes?
    • split

Code sample from this episode:

(ns devops.week-01
  (:require
    [clojure.java.io :as io]
    [clojure.string :as string]
    ))

;; General parsing

(def general-re #"(\d\d\d\d-\d\d-\d\d)\s+(\d\d:\d\d:\d\d)\s+\|\s+(\S+)\s+\|\s+(\S+)\s+\|\s+(\S+)\s+\|\s(.*)")

(defn parse-line
  [line]
  (when-let [[whole dt tm thread-name level ns message] (re-matches general-re line)]
    {:raw/line whole
     :log/date dt
     :log/time tm
     :log/thread thread-name
     :log/level level
     :log/namespace ns
     :log/message message
     }))

(defn general-parse
  [lines]
  (->> lines
       (map parse-line)
       (filter some?)))


;; Detailed parsing

(def detail-specs
  [[#"transaction failed while updating user ([^:]+): code 357"
    (fn [[_whole user]] {:kind :code-357 :code-357/user user})]
   ])

(defn try-detail-spec
  [message [re handler]]
  (when-some [matches (re-matches re message)]
    (handler matches)))

(defn parse-details
  [entry]
  (let [{:keys [log/message]} entry]
    (if-some [extra (->> detail-specs
                         (map (partial try-detail-spec message))
                         (filter some?)
                         (first))]
      (merge entry extra)
      entry)))


;; Log analysis

(defn lines
  [filename]
  (->> (slurp filename)
       (string/split-lines)))

(defn summarize
  [filename calc]
  (->> (lines filename)
       (general-parse)
       (map parse-details)
       (calc)))


;; data summarizing

(defn code-357-by-user
  [entries]
  (->> entries
       (filter #(= :code-357 (:kind %)))
       (map :code-357/user)
       (frequencies)))


(comment
  (summarize "sample.log" code-357-by-user)
  )

Log file sample:

2019-05-14 16:48:55 | process-Poster | INFO  | com.donutgram.poster | transaction failed while updating user joe: code 357
2019-05-14 16:48:56 | process-Poster | INFO  | com.donutgram.poster | transaction failed while updating user sally: code 357
2019-05-14 16:48:57 | process-Poster | INFO  | com.donutgram.poster | transaction failed while updating user joe: code 357