Functional Design in Clojure

Ep 022: Evidence of Attempted Posting

34 min • 29 March 2019

Christoph questions his attempts to post to Twitter.

  • This week, continuing to dig into the "Twitter problem". We want to post to Twitter on a schedule.
  • "Writing code to help out with laziness."
  • Start with data to keep track of: inside (our data) and outside (Twitter data)
  • "Data from a foreign land."
  • We need to determine our "working view" of Twitter's data.
  • What is in our data? For each "scheduled tweet":
    • Text to post: the "status"
    • Timestamp of when to post
  • Timestamps are nice
    • Milliseconds since the epoch
    • Universal instant
    • Allows the client to localize
  • How do we know a scheduled tweet has been posted? A "posted?" boolean?
  • Boolean says, "Yes! It has been posted somewhere on the Internet."
  • Correlating identifiers are more useful than a Boolean.
  • The tweet ID is a correlating identifier. We can use it to look up all of Twitter's data about it.
  • "We don't need to store all of Twitter in our database."
  • What is the story you need to tell about what happened?
    • A record of all the attempts allows us to tell a story about what happened.
    • Useful to have the timestamp of when our application posted it.
  • Make a separate log for attempts.
    • Attempting to post is a separate concern from what to post.
    • Don't complicate the scheduled tweet information by embedding the log.
  • "Once you have all the data, it allows you to ask new questions you didn't originally think of."
  • Clojure makes it easy to work with a large tree of data that came from an external source. We don't have to care about the structure of that data. We can just write it down.
  • Simply attempt to post the next scheduled tweet that does not have a Twitter ID recorded.
  • If it fails, just record the attempt, and go back to sleep.
  • "Handle the brick in front of you, and if you keep doing that, you'll eventually build the wall."
  • What if we don't hear the success response from Twitter, but it did get posted?
  • Idea: Try to detect if a tweet has already been posted.
  • If we can uniquely identify something by its content, we can know two things are the same without having a common ID.
  • Problem: Twitter can alter the contents.
  • Idea: fuzzy "measure of similarity" between our recent tweets and the next scheduled tweet.
  • We can record the fuzzy match in our attempt log too!
  • If we can correlate by contents, we could even identify when we manually post in advance.
  • As soon as you can determine equality by the substance of the thing itself, you can have more than one writer.
  • How "recent" is "recent"? Is it 100? Is it 200? Is it 500?
  • Even better, fetch all the tweets since the last ID we recorded.
    • we know we're seeing all of the tweets
    • can scan each of those for a match (in the case of a manual post)
    • know when the tweet stream ends, so we can know a posting is still needed
  • The worker will get there eventually. Can just give up on an error. No complex retry and recovery logic.
  • With more than one writer, we can still have a race condition. Ultimately, Twitter has to handle deduplication to avoid a double post in a short interval.
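
The notes above describe a data model (scheduled tweets plus a separate attempt log) and a worker that handles one tweet at a time. A minimal sketch of how that might look follows. Every name here is an assumption made for illustration: the keys, the sample data, the word-overlap `similarity` measure, the 0.8 threshold, and the caller-supplied `fetch-since` and `post!` functions standing in for real Twitter calls. None of it comes from the episode's actual code.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

;; Hypothetical scheduled tweets: status text, post time in epoch millis,
;; and the Twitter ID (the correlating identifier) once it is known.
(def scheduled
  [{:local/id 1 :local/text "Ep 021 is out!"
    :local/post-at 1553500000000 :twitter/id "1001"}
   {:local/id 2 :local/text "Ep 022 is out! Evidence of Attempted Posting"
    :local/post-at 1553850000000}
   {:local/id 3 :local/text "Ep 023 is coming soon"
    :local/post-at 1554450000000}])

(defn next-due
  "Earliest tweet that is due at `now` and has no Twitter ID recorded."
  [tweets now]
  (->> tweets
       (remove :twitter/id)
       (filter #(<= (:local/post-at %) now))
       (sort-by :local/post-at)
       first))

(defn words [s]
  (set (re-seq #"\w+" (str/lower-case s))))

(defn similarity
  "Jaccard overlap of the two word sets, from 0.0 to 1.0.
  One possible fuzzy measure, chosen here only for brevity."
  [a b]
  (let [wa (words a)
        wb (words b)
        union (set/union wa wb)]
    (if (empty? union)
      0.0
      (double (/ (count (set/intersection wa wb)) (count union))))))

(defn find-match
  "First fetched tweet whose status is at least `threshold` similar to `text`."
  [fetched text threshold]
  (first (filter #(>= (similarity (:twitter/status %) text) threshold)
                 fetched)))

(defn attempt
  "One worker step. Returns an attempt-log entry, or nil when nothing is due.
  `fetch-since` and `post!` are hypothetical functions supplied by the caller."
  [tweets now fetch-since post!]
  (when-let [tweet (next-due tweets now)]
    (let [last-id (last (keep :twitter/id tweets))
          base    {:attempt/local-id (:local/id tweet) :attempt/at now}]
      (if-let [match (find-match (fetch-since last-id) (:local/text tweet) 0.8)]
        ;; Already on Twitter (manual post, or a success we never heard about).
        (assoc base :attempt/outcome :already-posted
                    :twitter/id (:twitter/id match))
        (try
          (assoc base :attempt/outcome :posted
                      :twitter/id (:twitter/id (post! (:local/text tweet))))
          (catch Exception e
            ;; No retry logic: record the failure and go back to sleep.
            (assoc base :attempt/outcome :failed
                        :attempt/error (.getMessage e))))))))
```

Each call to `attempt` yields one entry to append to the log, whatever the outcome. The scheduled tweet itself only ever gains a `:twitter/id`, which keeps the "what to post" data separate from the "what happened" data.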

Message Queue discussion:

  • Namespacing in a map is really useful
  • A flat, namespaced map is easier to traverse than a nested map.
  • One use for namespaces: indicate the origin of the data
  • E.g. :twitter/id, :twitter/status vs. :local/id, :local/text
  • You see the namespace in your code, so it makes the data origin very visible.
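
To make the namespacing point concrete, here is a small sketch of a flat map whose keys carry their origin. The sample values (including the shortened link) and the `from-origin` helper are made up for illustration.

```clojure
;; A flat map: the namespace on each key shows where the value came from.
(def posted-tweet
  {:local/id       2
   :local/text     "Ep 022 is out!"
   :twitter/id     "1002"
   :twitter/status "Ep 022 is out! https://t.co/abc"})

(defn from-origin
  "Keep only the entries whose key has the given namespace."
  [m origin]
  (into {} (filter (fn [[k _]] (= origin (namespace k)))) m))

;; Everything Twitter told us, without walking a nested structure.
(def twitter-view (from-origin posted-tweet "twitter"))

;; Namespaced destructuring keeps the origin visible at the point of use.
(defn twitter-summary [{:twitter/keys [id status]}]
  (str id ": " status))
```

Because the map is flat, separating "our data" from "Twitter's data" is a single filter over the keys.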

Clojure in this episode:

  • pr-str
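
As a sketch of how `pr-str` can "just write down" a tree of external data without caring about its shape, the example below serializes a map to a string and reads it back with `clojure.edn/read-string`. The `response` map is an invented stand-in, not a real Twitter payload.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical response data; the real structure is whatever Twitter sends.
(def response
  {:id_str   "1002"
   :text     "Ep 022 is out!"
   :entities {:urls [] :hashtags ["clojure"]}})

;; pr-str renders the whole tree as a string that the reader can parse again.
(def log-line (pr-str response))

;; Reading the line back recovers an equal data structure.
(def restored (edn/read-string log-line))
```

Storing the printed form in the attempt log keeps everything Twitter returned, so new questions can be asked of it later.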