Christoph and Nate discuss the flavor of pure data.
- "The reduction of the good stuff."
- "We `filter` the points and `reduce` the good ones."
- Concept 1: To use the power of Clojure core, you give it functions as the
"vocabulary" to describe your data.
- "predicate" function: produce truth values about your data
- "view" or "extractor" function: returns a subset or calculated value from your data
- "mapper" function: transforms your data into different data
- "reduction" (or "reducer") function: combines your data together
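As a minimal sketch of the four kinds of "vocabulary" functions in Concept 1, assuming a hypothetical time-log entry shape (the `:date`, `:project`, and `:minutes` keys and all function names below are made up for illustration, not taken from the episode):

```clojure
;; Hypothetical time-log entries; the keys are assumptions for illustration.
(def entries
  [{:date "2019-01-07" :project "alpha" :minutes 90}
   {:date "2019-01-07" :project "beta"  :minutes 30}
   {:date "2019-01-08" :project "alpha" :minutes 45}])

;; Predicate: produces a truth value about one entry.
(defn alpha? [entry]
  (= "alpha" (:project entry)))

;; View/extractor: returns a calculated value from one entry.
(defn hours [entry]
  (/ (:minutes entry) 60.0))

;; Mapper: transforms an entry into different data.
(defn with-hours [entry]
  (assoc entry :hours (hours entry)))

;; Reduction function: combines an accumulator with one more entry.
(defn add-minutes [total entry]
  (+ total (:minutes entry)))

;; Clojure core supplies the machinery; these functions supply the vocabulary.
(def alpha-minutes
  (reduce add-minutes 0 (filter alpha? entries)))

(def alpha-hours
  (map hours (filter alpha? entries)))
```

Read aloud, the last two forms say roughly what they do: "reduce with add-minutes over the entries filtered by alpha?".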
- Concept 2: Don't ignore the linguistic aspect of how you name your functions.
- Reading the code can describe what it is doing.
- Good naming is for humans. Clojure doesn't care.
- Concept 3: Transform the data source into a big "bag" of data that is true
to the structure and information of the source.
- Source data describes the source information well and is not concerned with
the processing aspects.
- Transform into data that is useful for processing.
- Concept 4: Using `loop` + `recur` for data transformation is a code smell.
- Not composable: encourages shoving everything together in one place.
- "End up with a ball of mud instead of a bag of data you can sift through."
- "You know what mud sticks to really well? More mud! It's very cohesive! And
what couldn't be better than cohesive programs!"
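To illustrate the smell in Concept 4, here is a rough comparison of the same calculation written both ways. The entry shape and function names are hypothetical, chosen only for this sketch:

```clojure
;; Hypothetical entries; the keys are assumptions for illustration.
(def entries
  [{:project "alpha" :minutes 90}
   {:project "beta"  :minutes 30}
   {:project "alpha" :minutes 45}])

;; The smell: selection, extraction, and combination are all tangled
;; together inside one loop body.
(defn alpha-minutes-loop [entries]
  (loop [remaining entries
         total 0]
    (if (empty? remaining)
      total
      (let [entry (first remaining)]
        (recur (rest remaining)
               (if (= "alpha" (:project entry))
                 (+ total (:minutes entry))
                 total))))))

;; The same answer from small pieces that can be reused and tested alone.
(defn alpha? [entry]
  (= "alpha" (:project entry)))

(defn alpha-minutes [entries]
  (reduce + 0 (map :minutes (filter alpha? entries))))
```

Adding a second condition to the loop version means editing the loop body (more mud), while the composed version only needs another predicate.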
- Concept 5: Use `loop` + `recur` for recursion or blocking operations (like `core.async`).
- Data shows up asynchronously
- Useful when logic is more naturally expressed as recursion than
`filter` + `map` + `reduce`.
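A small sketch of logic that reads more naturally as recursion, where each step depends on the result of the previous one. The `parent` map and `root-of` function are hypothetical; the `core.async` case is not shown here because it needs that library on the classpath.

```clojure
;; Hypothetical parent links: each node points at its parent.
(def parent {:d :c, :c :b, :b :a})

;; Walk upward until a node has no parent. There is no collection to
;; filter or map over up front; the next step is only known after
;; the current one.
(defn root-of [node]
  (loop [current node]
    (if-let [p (parent current)]
      (recur p)
      current)))
```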
- Concept 6: Duality: stepwise vs aggregate
- Stepwise problem: advance a game state, apply async event, stream processing, etc.
- Stepwise: `reduce`, `loop` + `recur`
- Aggregate problem: selecting the right data and combining it together.
- Aggregate: `filter` + `map` + `reduce`
- Aggregate problems tend to be eager--they want to process the whole data set.
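The stepwise/aggregate duality in Concept 6 might look like this over one list of events. The event shapes, the `step` function, and the state keys are assumptions made up for this sketch:

```clojure
;; Hypothetical game events.
(def events
  [{:type :move  :by 2}
   {:type :score :points 10}
   {:type :move  :by -1}
   {:type :score :points 5}])

(def initial-state {:position 0 :score 0})

;; Stepwise: each event advances the state by one step.
(defn step [state event]
  (case (:type event)
    :move  (update state :position + (:by event))
    :score (update state :score + (:points event))
    state))

(def final-state
  (reduce step initial-state events))

;; `reductions` exposes every intermediate state, which suits
;; stepwise problems such as replaying a game.
(def all-states
  (reductions step initial-state events))

;; Aggregate: select the right data, then combine it.
(def total-points
  (reduce + 0 (map :points (filter #(= :score (:type %)) events))))
```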
- Concept 7: Use your bag of granular data to work toward a bag of higher-level data.
- We went from lines → entries → days → weeks
- "Each level of data allows you to answer different questions."
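A sketch of the first levels of that progression (lines → entries → days), assuming JVM Clojure and a hypothetical space-separated line format of date, project, and minutes. The function names are illustrative, and weeks would follow the same pattern with a different grouping key.

```clojure
(require '[clojure.string :as str])

;; Hypothetical source lines: "date project minutes".
(def lines
  ["2019-01-07 alpha 90"
   "2019-01-07 beta 30"
   "2019-01-08 alpha 45"])

;; lines -> entries: stay true to the source, one map per line.
(defn parse-entry [line]
  (let [[date project minutes] (str/split line #" ")]
    {:date date
     :project project
     :minutes (Long/parseLong minutes)}))

;; entries -> days: a higher-level record built from granular ones.
(defn summarize-day [[date day-entries]]
  {:date date
   :entries day-entries
   :total-minutes (reduce + 0 (map :minutes day-entries))})

(def entries (map parse-entry lines))

(def days
  (->> entries
       (group-by :date)
       (map summarize-day)
       (sort-by :date)))
```

The `entries` level can answer "how long did I spend on alpha?", while the `days` level answers "how long was each day?" without recomputing.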
- Concept 8: Duality: higher-level data vs granular data with lots of dimensions
- E.g. having a single "day" record vs a bunch of "entry" records that all
share the same "date" field.
- The "right" choice depends on your usage pattern.
- Dimensional data tends to stay flat, but high-level data tends toward nesting.
- A high-level record is a pre-calculated answer you can use over and over quickly.
- A highly dimensional, granular record allows you to "ask" questions spanning
arbitrary dimensions. E.g. "What weeknights in January did I work past
midnight?"
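The weeknight question could be asked of flat, dimensional records roughly like this. The record keys are hypothetical, and treating Sunday through Thursday as weeknights is an assumption of this sketch:

```clojure
;; Hypothetical granular records with one key per dimension.
(def entries
  [{:date "2019-01-08" :month 1 :weekday :tue :past-midnight? true}
   {:date "2019-01-12" :month 1 :weekday :sat :past-midnight? true}
   {:date "2019-01-15" :month 1 :weekday :tue :past-midnight? false}
   {:date "2019-02-05" :month 2 :weekday :tue :past-midnight? true}])

;; Assumption: a "weeknight" is a night before a working day.
(def weeknights #{:sun :mon :tue :wed :thu})

(defn january? [entry]
  (= 1 (:month entry)))

(defn weeknight? [entry]
  (contains? weeknights (:weekday entry)))

;; `every-pred` combines predicates; a keyword works as a predicate too.
(def late-weeknights-in-january
  (->> entries
       (filter (every-pred january? weeknight? :past-midnight?))
       (map :date)))
```

Because each dimension is its own key, a new question is just a new combination of predicates.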
- Concept 9: Keep it pure. Avoid side effects as much as possible.
- Pure functions are the bedrock of functional programming.
- REPL and unit test friendly.
- "You can use data without hidden attachments. You remember side effects
when you're writing them, but you don't remember them three months later."
- Concept 10: Keep I/O at the "edges" with pure functions in the "middle".
- "I/O should be performed by functions that you didn't write."
- Use pure functions to format your data so you only have to hand it off to
the I/O function. E.g. create a list of "line" strings to emit with
`(run! println lines)`.
- You can describe your I/O operations in data and make a "boring" function
that just follows them. This allows you to unit test the complicated logic
that determines the operations.
- Separates I/O-specific problems (e.g. retries, I/O exceptions) from
business logic problems.
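A minimal sketch of keeping I/O at the edges, showing both ideas from Concept 10: formatting lines purely before handing them to `println`, and describing I/O operations as data for a "boring" function to follow. The day record shape, the `:print-line` operation, and all function names are hypothetical.

```clojure
;; Hypothetical higher-level records.
(def days
  [{:date "2019-01-07" :total-minutes 120}
   {:date "2019-01-08" :total-minutes 45}])

;; Pure middle: turns data into the exact strings to emit.
(defn format-day [{:keys [date total-minutes]}]
  (str date ": " total-minutes " min"))

(defn report-lines [days]
  (map format-day days))

;; Edge: the only I/O is a core function we did not write.
(defn emit-report! [days]
  (run! println (report-lines days)))

;; Alternative: describe the I/O operations as plain data...
(defn plan-ops [days]
  (for [day days]
    {:op :print-line :text (format-day day)}))

;; ...and let a "boring" function follow them.
(defn run-op! [{:keys [op text]}]
  (case op
    :print-line (println text)))

(defn run-ops! [ops]
  (run! run-op! ops))
```

Unit tests can check `report-lines` and `plan-ops` directly, with no output capturing, while retries and I/O exceptions would live only in `run-op!`.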
Clojure in this episode:
- `filter`, `map`, `reduce`
- `loop`, `recur`
- `group-by`
- `run!`
- `println`