Christoph and Nate lift concepts from the raw log-parsing series.
- Reflecting on the lessons learned in the log series.
- (01:15) Concept 1: We found Clojure to be useful for DevOps.
- Everything is a web application these days.
- "The only UIs in DevOps are dashboards."
- For most of the series, our UI was our connected editor.
- We grabbed a chunk of the log file and were fiddling with the data in short order.
- We talk about connected editors in our REPL series, starting with Episode 12.
- Being able to iteratively work on the log parsing functions in our editor was key to exploring the data in the log files.
- (04:04) Concept 2: Taking a lazy approach is essential when working with a large data set.
- Lazily going through a sequence is reminiscent of database cursors. You are at some point in a stream of data.
- We ran into some initial downsides.
- When using with-open, fully lazy processing results in an I/O error, because the file has already been closed.
- Don't be too eager too early, or the entire data set will end up in memory.
- Two kinds of functions: lazy and eager.
- Lazy functions only take from a sequence as they need more values.
- Eager functions consume the whole sequence before returning.
- Ensure that only the last function in the processing chain is eager.
- "It only takes one eager to get everybody unlazy."
- (08:38) Concept 3: Clojure helps you make your own lazy sequences using lazy-seq.
- Clojure has a deep library of functions for making and processing lazy sequences.
- We were able to make our own lazy sequences that could then be used with those functions.
- Wrap the body in lazy-seq and return either nil (to indicate the end) or a sequence created by calling cons on a real value and a recursive call to itself.
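A minimal sketch of that pattern (the countdown example is ours, not from the episode):

```clojure
;; lazy-seq wraps the body; the body returns either nil (via when,
;; signaling the end) or a cons of a real value onto a recursive call.
(defn countdown [n]
  (lazy-seq
    (when (pos? n)
      (cons n (countdown (dec n))))))

;; Only the elements actually taken are realized, so even a huge
;; starting value is cheap.
(take 3 (countdown 1000000))
```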
- (12:41) Concept 4: We work with information at different levels, and that forms an information hierarchy.
- The data goes from bits to characters to lines, and then we get involved.
- We move from lines on up to more meaningful entities. Parsed lines are maps that have richer information, and then errors are richer still.
- Our parsers take a sequence and emit a new sequence that is at a higher level of information.
- We first explored this concept in the Time series.
- The transformations from one level to the next are all pure.
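A sketch of one such pure lifting step, assuming a hypothetical line format of "2024-01-15 ERROR Something failed" (the series' real format differs, but the shape is the same):

```clojure
(require '[clojure.string :as str])

;; Pure function: lifts a raw line up to a richer map with
;; namespaced keys. No I/O, no state.
(defn parse-line [line]
  (let [[date level & words] (str/split line #" ")]
    {:log/date    date
     :log/level   (keyword (str/lower-case level))
     :log/message (str/join " " words)}))

;; A parser takes a sequence and emits a new sequence at a
;; higher level of information.
(defn parse-lines [lines]
  (map parse-line lines))
```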
- (14:53) Concept 5: Sometimes you have to go down before you can go up again another way.
- We pre-abstracted a little bit, and only accepted lines that had all of the data we were looking for (time, log level, etc.).
- Exceptions broke that abstraction, so we reworked our "parsed line" map to make the missing keys optional.
- (15:54) Concept 6: Maps are flexible bags of dimensions. They are a set of attributes rather than a series of rigid slots that must be filled.
- Functions only need to look at the parts of the map that they need.
- Every time we amplify the data, we add a new set of dimensions.
- Thanks to namespacing, all of these dimensions coexist peacefully.
- Multiple levels of dimensions give you more to filter/map/reduce on.
- Just because you distill doesn't mean you want to lose the essence.
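One way to picture those coexisting dimensions (the :log/* and :error/* keys here are illustrative, not the episode's actual keys):

```clojure
;; Each amplification step assoc's a new set of namespaced keys
;; onto the same map, so earlier dimensions survive alongside the
;; new ones and nothing is lost by distilling.
(defn amplify-service [entry]
  (cond-> entry
    (re-find #"billing" (:log/message entry))
    (assoc :error/service "billing")))

(def entry {:log/level :error :log/message "timeout calling billing"})

;; Later functions read only the dimensions they need:
;; (:error/service (amplify-service entry)) ignores the :log/* keys.
```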
- (21:09) Concept 7: Operating within a level of information is a different concern than lifting up to a higher level of information.
- Within a level, functions aid in filtering and aggregating.
- Between levels, functions recognize patterns and groupings to produce higher levels of information.
- Make the purpose of the function clear in how you name it.
- Separate functions that "lift" the data from functions that operate at the same level of information.
- When exploring data, you don't know where it will lead, so start by moving the data up a level in small steps.
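The naming split above might look like this (hypothetical names and keys, sketching the convention rather than the series' code):

```clojure
;; Same-level function: entries in, entries out. Filters within
;; one level of information.
(defn errors-only [entries]
  (filter #(= :error (:log/level %)) entries))

;; Lifting function: the name signals a move up the hierarchy,
;; from a sequence of entries to a summary map.
(defn entries->summary [entries]
  {:summary/error-count (count (errors-only entries))})
```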
Related episodes:
Clojure in this episode: