Christoph finds `map` doesn't let him be lazy enough.
- Last week, we were dealing with multi-line sprinkle errors.
- We were able to get more context using `partition`.
- (01:33) Problem: the component lines had to be adjacent.
- Solution last week was to create larger partitions to hopefully get the rest of the error.
- This became a magic number problem, guessing how far we had to look ahead.
- "If there's anything I've learned in my career, telling the future is one of the hardest things to do."
- What number should be big enough? 100? 1000?
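
To make the magic-number problem concrete, here is a minimal sketch. The keyword "lines" are hypothetical stand-ins for parsed log lines, not the data from the episode.

```clojure
;; Hypothetical log lines: an error start, two unrelated lines, then the reason.
(def lines [:start :noise :noise :reason :other])

;; A sliding window of 2 only ever pairs adjacent lines, so :start and
;; :reason never land in the same partition.
(partition 2 1 lines)
;; => ((:start :noise) (:noise :noise) (:noise :reason) (:reason :other))

;; Widening the window to 4 happens to be enough for this data...
(first (partition 4 1 lines))
;; => (:start :noise :noise :reason)

;; ...but the 4 is a guess. One more interleaved line and it fails again.
```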
- (04:00) The other problem is that the function is handed a pre-selected set of lines.
- The decision about how many lines is appropriate is made outside the function.
- Wouldn't it be nice if the function had control over how far to look ahead?
- "The function can't function."
- "Functions are all we've got in functional programming. Well, that and lists."
- It would be great if the function itself could take a sequence and look as far as it needs to.
- How about handing the function the entire lazy sequence?
- (05:52) Problem: Handing in the entire sequence means we can't use `map` to convert lines into sprinkle errors anymore.
- We can write a function that gives us just one sprinkle error from the sequence, but we want to convert the sequence into a sequence of all the sprinkle errors.
- We're going from something that operates on a subset of the sequence to something that operates on the entire sequence, which is too much control.
- We need a way for it to look ahead but still
- It's no longer just working on a chunk of the sequence, but on the unbounded sequence itself.
- We need to elevate it to the same power as other sequence operators, like `map` and `filter`.
- We don't, however, want the function to eagerly find all sprinkle errors in the sequence. It needs to be lazy.
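
A small illustration of the two properties in play, using made-up inputs: `map` hands its function one element at a time, while sequence operators such as `filter` take a whole sequence and still only compute what is asked for.

```clojure
;; map calls its function with a single element, so the function
;; cannot see what follows in the sequence.
(map (fn [line] (str "saw " line)) ["a" "b" "c"])
;; => ("saw a" "saw b" "saw c")

;; Sequence operators are lazy: only the requested part of an
;; unbounded sequence is ever computed.
(take 3 (filter even? (range)))
;; => (0 2 4)
```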
- (09:26) Solution 1: How can we just get one sprinkle error out?
- If the first line isn't the error start, `recur` with the tail until found.
- Do a `take-while` to find the second half of the error.
- When both found, return the value.
- We need to terminate the search if we hit the end of the sequence, so we only continue if `(seq lines)` is not nil.
- "There's no sense in looking in an empty bucket."
- But we don't want just one, we want the entire sequence.
- It would be really nice to return the value when we find it and then wait to find the next one until it is requested.
- Conceptually, we could tell the calling function an index of where to start looking for the next error.
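
One way the "just get one error out" approach might look is sketched below. The predicates `error-start?` and `error-reason?`, the `:kind` key, and the returned map shape are hypothetical stand-ins, and the exact use of `take-while` is a guess at what the episode describes.

```clojure
;; Hypothetical predicates standing in for the real line matching.
(defn error-start? [line] (= :start (:kind line)))
(defn error-reason? [line] (= :reason (:kind line)))

(defn first-sprinkle-error
  "Returns the first start/reason pair found in lines, or nil if none."
  [lines]
  (loop [lines lines]
    (when (seq lines)                       ; stop at the end of the sequence
      (let [[line & tail] lines]
        (if (error-start? line)
          ;; take-while collects the lines before the reason line;
          ;; the reason is whatever comes right after them.
          (let [before (take-while (complement error-reason?) tail)
                reason (first (drop (count before) tail))]
            (if reason
              {:start line :reason reason}
              (recur tail)))
          (recur tail))))))

(first-sprinkle-error [{:kind :noise}
                       {:kind :start :donut 7}
                       {:kind :noise}
                       {:kind :reason :error "out of sprinkles"}])
;; => {:start {:kind :start, :donut 7},
;;     :reason {:kind :reason, :error "out of sprinkles"}}
```

This returns a single value and then forgets its place, which is the limitation the next part addresses.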
- (13:44) Solution: In Clojure, we keep our place using the `lazy-seq` macro.
- A `lazy-seq` is a sequence, but it hasn't been realized yet.
- It's like being able to hand back a value and a function to call for the next value.
- When you find a value, you can `cons` it onto the head of an invocation of `lazy-seq` to make a new sequence.
- Step 1. Wrap your entire function body in `lazy-seq`.
- This is similar to using `delay`, because it wraps the code in something that will only be evaluated when it is first accessed.
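
The similarity to `delay` can be observed with a counter. This is a small sketch; the `realized` atom is only there to make the evaluation visible.

```clojure
;; Counts how many wrapped bodies have actually run.
(def realized (atom 0))

(def d (delay (swap! realized inc) :value))
(def s (lazy-seq (swap! realized inc) [1 2 3]))

@realized   ;; => 0, neither body has run yet

@d          ;; => :value, forces the delay
(first s)   ;; => 1, forces the lazy sequence

@realized   ;; => 2

;; Both cache their result, so a second access does not re-run the body.
@d
(first s)
@realized   ;; => 2
```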
- Step 2. Ensure that the body obeys the contract. It must return either:
- `nil`, which indicates that the sequence is complete.
- a sequence, usually constructed by calling `cons` on a value and a call to `lazy-seq`.
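
A minimal function that follows both steps, using a hypothetical `countdown` rather than log lines:

```clojure
(defn countdown
  "Lazy sequence of n, n-1, ..., 1."
  [n]
  (lazy-seq                          ; Step 1: wrap the whole body
    (when (pos? n)                   ; Step 2: nil ends the sequence...
      (cons n (countdown (dec n)))))) ; ...otherwise a value consed onto a lazy call

(countdown 3)
;; => (3 2 1)

(seq (countdown 0))
;; => nil
```

Because the recursive call is itself wrapped in `lazy-seq`, it does not run until a consumer asks for the rest of the sequence.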
- The top of the body is a call to `(when (seq lines) ...)`, to ensure that the sequence terminates when there is no data left.
- Since the top of our function is `lazy-seq`, we can `cons` the found value onto a recursive call to the function.
- In the recursive call, we must pass the next section of the sequence, so that when evaluated it will pick up at the right place.
- If we don't find the start of the error, we recurse with the `rest` of the sequence to try parsing from there.
- This function will go through the sequence eagerly until it finds something.
- Instead of operating on single elements in the sequence, we can take a sequence and produce a sequence, powered by `lazy-seq`.
- With this capability, you can build a higher level sequence that consumes this sequence and produces a new summary, all done lazily.
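
As a rough illustration of layering lazy steps, the sketch below uses a simplified, hypothetical `error-pairs` function in place of the episode's `sprinkle-error-seq`, with made-up line maps as input. The input is unbounded, yet only the requested results are computed.

```clojure
(defn error-pairs
  "Lazily pairs each :start line with the :reason line right after it.
  Simplified, hypothetical stand-in for sprinkle-error-seq."
  [lines]
  (lazy-seq
    (when (seq lines)
      (let [[a b & tail] lines]
        (if (and (= :start (:kind a)) (= :reason (:kind b)))
          (cons {:donut (:donut a) :error (:error b)}
                (error-pairs tail))
          (error-pairs (rest lines)))))))

;; An unbounded input: the same three lines repeated forever.
(def endless-lines
  (cycle [{:kind :noise}
          {:kind :start :donut 7}
          {:kind :reason :error "out of sprinkles"}]))

;; A higher-level lazy step built on top of the lazy error sequence.
(def error-messages
  (map :error (error-pairs endless-lines)))

(take 2 error-messages)
;; => ("out of sprinkles" "out of sprinkles")
```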
Clojure in this episode:
- `partition`
- `seq`, `cons`, `rest`
- `lazy-seq`, `delay`
- `map`, `filter`, `take-while`
- `recur`
Code sample from this episode:
(ns devops.week-04
  (:require
    [devops.week-01 :refer [parse-line]]
    [devops.week-02 :refer [process-log]]
    [devops.week-03 :refer [sprinkle-errors-by-type]]))

(defn sprinkle-error-seq
  [lines]
  ;; Wrap the whole body so nothing runs until the sequence is consumed.
  (lazy-seq
    ;; Returning nil here ends the sequence when the input runs out.
    (when (seq lines)
      (let [[first-line second-line & tail] lines
            [_whole donut-id] (some->> first-line :log/message (re-matches #"failed to add sprinkle to donut (\d+)"))
            [_whole error] (some->> second-line :log/message (re-matches #"sprinkle fail reason: (.*)"))]
        (if (and donut-id error)
          ;; Found both halves: emit the error, then continue after them.
          (cons (merge first-line
                       {:kind :sprinkle
                        :sprinkle/donut-id donut-id
                        :sprinkle/error error})
                (sprinkle-error-seq tail))
          ;; No match at this position: try again one line further on.
          (sprinkle-error-seq (next lines)))))))

(comment
  (process-log "sample.log" #(->> % (map parse-line) sprinkle-error-seq doall))
  (process-log "sample.log" #(->> % (map parse-line) sprinkle-error-seq sprinkle-errors-by-type))
  )