What is an abstraction barrier?

What is an abstraction barrier? This is a concept from “Structure and Interpretation
of Computer Programs.” In this episode, we’re going to talk about
what it is, why to use it, and the limits of its usefulness. My name is Eric Normand, and I help people
thrive with functional programming. Like I said, this is a concept that I was
first introduced to in Structure and Interpretation of Computer Programs. I imagine it goes back further than that just
because it’s a natural concept to develop. What it is basically is instead of doing some
complex operations inline, you move them, you extract them out into a function, you
name that function. Now you have a barrier, where you don’t really
have to think about the internals of how this thing gets calculated, you have an operation
that’s got a nice, clear, meaningful name that you’re working with. You can do this with a data structure, too. In
Structure and Interpretation of Computer Programs, they’re using Scheme. The data structure they use is basically
these cons cells, just pairs. They’re like tuples, but always pairs. They’re building these intricate little data
structures out of them. For instance, in one example I can remember,
they build an associative data structure where you can put keys and values and replace keys
and values and look up values by keys. It’s all just cons cells. It’s all just pairs, and deeply nested stuff. They build the operations, like add a new
key-value pair, or find the value for this key. These are operations. They’re giving them nice names. When you look at the code it’s just like…I
have to explain this. In Scheme, to get the first element from a
cons cell, you use a function called car, C-A-R, and to get the second element from a cons cell
you use a function called cdr, which is C-D-R. You would string these together, like the
car of the cdr of the cdr of the car of the car, and that will give you the element
you’re looking for. As you’re looking at these operations, they’re
not very meaningful, except as something like first, second, second, first, first. It’s just hard to wrap your head around what
is that actually doing. It would be really nice if someone gave it
a very meaningful name, like, “Find value given a key.” [laughs] Or just “get,” something like
that. You do this to help your mind encapsulate it,
to put a barrier around the meaning. You can say, I don’t have to think about this
anymore. There’s three operations I’m going to do on
these things. I have them well named. I don’t have to go digging around myself and
remember which cdr, or how many cdrs, I need to get the value
of something. This is the what and the why. It’s because sometimes you get these deeply
nested things. Just for your mental capacity it’s hard when
you inline those to be able to reason about the code. What’s happening is you’re just doing that
basic naming operation. You’re naming this thing that you’re going
to use a lot. You’re taking what are meaningless operations
like car and cdr. They have a meaning but their level of meaning
is very low. You’re elevating that function into a new
level based on the name. The name is something much more meaningful
at a higher level of meaning. It’s not data hiding. It is different from data hiding in one very
important respect. It’s still all there. It’s still transparent. If you want to pierce your abstraction barrier,
go right ahead. You can still map over it. It’s still a list. It’s still cons cells. You can still call car on it. You are not forced to use those abstraction
barriers, those defined operations. Some people say it’s a fine line, because
you’re saying you should use them. There are some arguments, like if this data structure
escapes, no one’s going to know to use those operations, or they’re going to have to use them
anyway, so you have to hand the operations over with it. All that is true. I don’t want to argue about that. What I just want to argue is that there is
this very important difference, which is that you don’t have to use those operations. It’s not like data hiding in an object-oriented
system, where you’ve got all this private stuff. Then the three operations that you want to
do on it are the public methods, and you don’t know how it’s implemented and you’re supposed
to not care. You’re supposed to not even be able to mess
with the data. You can mess with the data if you want to. This is important, because you want to be
able to move through those levels. The trouble is, cons cells are not a very
rich data structure. They basically suck. I’m just going to say it like that. They’re really neat. They’re complete. You can build trees and lists and a whole
variety of data structures with them. But they’re not self-describing. You don’t know what you have. If you
printed them out, they’d just be parentheses with stuff inside. They don’t have…They’re not human-readable. They are, but not very. You need to know what the different levels
of the parentheses mean. You need some out-of-band communication because it’s
not self-describing. A lot of problems that we have with these
data structures are solved by having self-describing names, like a hash map, where the keys are
strings. You can have a nice name. Before, you had to build this associative data
structure. Now you can have a data structure that says, “Hey, I am an associative data structure. I have the curly braces in JSON. My keys are strings. My values are also values that you can
understand.” This is amazing. Now we don’t need to have these abstraction
barriers to do the same thing. A lot of problems are solved just by having
better data structures. You don’t need the abstraction. It removes a whole slew of reasons for needing
abstraction barriers especially around these intricate data structures. There’s a thing where if you have public-facing
data, you shouldn’t really use abstraction barriers. You should design that data to be easy to
consume, easy to produce, not necessarily using specific operations. You want someone to be able to type in this literal
JSON and have it be correct. You don’t want something where you need some
complex operations that they have to define and basically copy from your code base in
order to build the thing up. You don’t want that. A public-facing thing, you want very clear
names. When you design them, and you put it out into
the world, those names are a commitment. They’re a commitment on your part as an implementer
that you’re going to honor those names. If you send this JSON to my API endpoint,
I’m going to read it. I’m telling you what this key means and
how I’m going to interpret the value. That is a commitment that you’re making. The self-describing nature of it is really
important. You shouldn’t rely on an abstraction barrier. However, besides using data structures for
public-facing data, like a public-facing schema or spec, we also use data structures internally in
our software. If you need some intermediate index of something,
you’ll use a hash map to index it. Sometimes when you make the
index, you want to keep track of when you added the thing and when was the last
time you accessed it. You’ve got all these other bits of information
that you have to maintain. Sometimes, you want to maintain the order,
and it’s in a hash map that doesn’t have order. You want to keep them in sync. Now, you’re starting to talk about this intricate
data structure nested in other data structures. At some point, you’re back to the same problem
that you had with cons cells, which is deep nesting. You’re forgetting how many levels
deep you are. You’re in-lining all these cdrs and cars,
except they’re not cdrs and cars. They’re like, “I’ll get this internal map. Inside of that, get this thing.” Then that’s going to give you a map. Then you need the value out of that map. It’s all deeply nested again. It’s easy to get wrong. When you add a thing to the index, there’s
like five things that you need to do, and they all have to be right. You want to make it easy to get those things
right. It’s like doing five things. It’s probably five lines of code. You’re repeating that everywhere. You want to DRY this up. You want to take that duplication, put it
in a function, give it a good name. All of a sudden, you’re doing abstraction barriers
again. It’s just the way it is. It just happens when you have these complex
things. Like I said, this isn’t for something that’s
going to go external. External, you want to be nice, and clean,
and neat, and human-writable, human-readable. When you’re working internally, sometimes
you need an index that’s really tricky and complicated, or you need some data structure
that’s super weirdly nested. You want to start extracting out all those
operations again. I want to say another thing. A lot of times…I think SICP gives two reasons, and I disagree with one of
them. I disagree with SICP, that is, Structure and Interpretation
of Computer Programs. One of the reasons they give for using abstraction
barriers is so that you can change the data structure if you need to. This is just so overused. We write code today that is more complicated than
it needs to be because maybe one day in the future, we might want to change it. I just think that that’s wrong. Why complicate your life today for something
that might or might not happen in a way that you can’t even predict? If you know how it’s going to change, if you’re
saying, “Look, I know I’m going to swap out my database in one year. I’m using this one database now because I
can’t afford the one I really want. When my company does better, I’ll have an
income. I’ll be able to pay for that database I do
want. I want to be able to swap it out easily.” Sure. If you’ve got some plan for changing it and
you don’t want to have to change all the code again, sure, fine. If that’s part of your plan, you need to be
able to change it, yes. Put some kind of indirection in there. If you’re just doing it like a just in case,
like maybe we’ll need it, no, do not do that. People say it, but I think you should not
put abstraction barriers just because you might want to change it. You should put abstraction barriers to make
it clear what’s going on, especially when you’ve got these intricate, tricky things that are hard to get right. You shouldn’t use it for public-facing data. That should be well-designed, clean, simple,
something that another person could write code to generate and not rely on your perfect
implementation of all these operations. You should use it for these intricate data
structures that never leave. These indexes aren’t meant to be printed out
and sent over a wire. They’re meant to be stored in memory for some
algorithm or something you’re doing where you need constant-time access. All right, so abstraction barriers, I’m going
to recap real fast. Abstraction barriers are simply taking operations
that you’re doing on some data structure or some piece of data. You’re repeating it. It doesn’t have enough meaning, so you extract
it out and give it a name. If you can count all the operations you’re
doing on this data structure…Let’s say there’s three or four of them. You extract all of them out, give them good
names. Now, you no longer have to go down into the
data structure and manipulate things at the low level. You can operate on it at a higher level. That sounds like a good thing to me. It differs from data hiding in that you can
always pierce the barrier. You can look at it, and it’s just raw data. It’s not some encapsulated class or object
that has some bespoke methods on it where you can’t see how it’s implemented inside. You can pierce it. Hash maps and the other, much more descriptive
data structures that we have in modern languages are better because they have literals. They have descriptive names. They have more well-understood properties,
like an array has a certain order to it. You don’t need to use a cons cell, which has
almost no meaning behind it. You’ve got higher-level stuff, self-describing. You have literal versions of it. You don’t have to even think about constructors
so much anymore. It’s much nicer, but we still build up these
intricate, highly nested things for internal use. I believe that abstraction barriers, as I’ve
defined them here, are useful for that. You want to be able to operate at a higher
level even though it’s this really intricate turning of machines and stuff. It’s just normal. Let me say it a different way. Hash maps, descriptive names and stuff remove
a huge need for the abstraction barriers. We reinvent the problem, because we have all
these highly-nested intricate data structures. Again, they are now made of hash maps, and
vectors, and sets, and whatever else we have. They’re still there. They’re just not cons cells anymore. They’re just some other thing. I’ve seen so many messes in languages like
Clojure that use these data structures a lot. When they start getting nested, people
forget what they have. They start coupling code together because
they’re coupling “where a value lives,” deeply nested in this map,
with the operation, “what they want to do,” by using some path into the nested
data structure. The “where” and the “what they want to do”
get coupled together. Having a few barriers when you have
a mess is like having little bins to put all your stuff in instead of having it all in one
big bin. It’s just a way to organize it and a way to
keep a little bit of sanity when things start to get into a mess. All right. This has been all about abstraction barriers. This might be a tad controversial. Abstraction barriers shouldn’t be used all
the time. I’m not saying that. I still think that they’re really useful,
especially when you’ve got nested data structures and you’re starting to get into a mess. If you liked this episode, please subscribe. Go to lispcast.com/podcast. There you’re going to find all the past episodes. Listen to the one where I talk about building
your interface first. Listen to the one where I say just use data. I really think these are subtle issues and
it’s not as simple as use this, don’t use that. You’ve got to allow for some subtlety in there. You’ll find all the past episodes with audio,
video and text transcripts. You can listen to it however you want, watch it,
or you can even read it if that’s how you like to do it. You can also subscribe. There you’ll find links to subscribe in the
various platforms, and also links to find me on social media. That’s email, Twitter, LinkedIn. Get in touch with me. If you disagree with me, I would love to have
a discussion about this because I think it is a bit controversial. I’d love to hear more arguments for and against. If you’ve got one of those, or you’ve got
a question because I didn’t go over something clearly enough, come on, just hit me up, and
we’ll talk. Awesome. My name is Eric Normand. This has been my thought on functional programming. Thank you for listening and rock on.
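
As a rough illustration of the cons-cell example discussed in this episode, here is a minimal sketch in Python rather than Scheme. It assumes 2-tuples as stand-ins for cons cells and None as the empty list. The names cons, car, cdr, put, and get are hypothetical helpers chosen for this sketch, not code taken from SICP.

```python
# Illustrative sketch only: Python 2-tuples stand in for Scheme cons cells.
# All names here (cons, car, cdr, put, get, EMPTY) are assumptions made for this example.

EMPTY = None  # stands in for Scheme's empty list


def cons(a, b):
    """Build a pair."""
    return (a, b)


def car(pair):
    """First element of a pair."""
    return pair[0]


def cdr(pair):
    """Second element of a pair."""
    return pair[1]


def put(alist, key, value):
    """Return a new association list with key bound to value (replacing any old binding)."""
    if alist is EMPTY:
        return cons(cons(key, value), EMPTY)
    entry = car(alist)
    if car(entry) == key:
        return cons(cons(key, value), cdr(alist))
    return cons(entry, put(cdr(alist), key, value))


def get(alist, key, default=None):
    """Find the value for a key, or return default if the key is absent."""
    while alist is not EMPTY:
        entry = car(alist)
        if car(entry) == key:
            return cdr(entry)
        alist = cdr(alist)
    return default


# Build a table using only the named operations.
table = put(put(put(EMPTY, "a", 1), "b", 2), "a", 10)

# Above the barrier: the name says what is meant.
value_via_barrier = get(table, "b")

# Piercing the barrier: the same lookup as raw pair-walking.
value_via_pairs = cdr(car(cdr(table)))
```

Both lookups return the same value. The table is still just nested pairs, so nothing stops you from walking it by hand, but `get` says what you mean and `cdr(car(cdr(table)))` does not.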

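The internal index described in the episode might look something like the following sketch. The field names (by_id, order, added_at, accessed_at) and the functions make_index, index_add, index_get, and index_items_in_order are all assumptions made up for illustration. Python dicts happen to preserve insertion order already; the separate order list is kept only to mirror the case of a hash map that does not.

```python
import time

# Illustrative sketch only: every name below is hypothetical.
# The index is plain data (dicts and a list), so the barrier can always be pierced.


def make_index():
    """Create an empty index made of ordinary dicts and a list."""
    return {"by_id": {}, "order": [], "added_at": {}, "accessed_at": {}}


def index_add(index, item_id, item, now=None):
    """Add or replace an item, keeping all the bookkeeping in sync in one place."""
    now = time.time() if now is None else now
    if item_id not in index["by_id"]:           # 1. is this a new entry?
        index["order"].append(item_id)          # 2. remember insertion order
        index["added_at"][item_id] = now        # 3. remember when it was added
    index["by_id"][item_id] = item              # 4. store the item itself
    index["accessed_at"][item_id] = now         # 5. update the last-access time
    return index


def index_get(index, item_id, now=None):
    """Look up an item and record the access time; returns None if absent."""
    now = time.time() if now is None else now
    item = index["by_id"].get(item_id)
    if item is not None:
        index["accessed_at"][item_id] = now
    return item


def index_items_in_order(index):
    """Items in the order they were first added."""
    return [index["by_id"][item_id] for item_id in index["order"]]


idx = make_index()
index_add(idx, "u1", {"name": "Ada"}, now=100.0)
index_add(idx, "u2", {"name": "Grace"}, now=101.0)
first = index_get(idx, "u1", now=105.0)

# Piercing the barrier is still possible, because it is all just data:
last_access_u1 = idx["accessed_at"]["u1"]
```

Callers that go through `index_add` and `index_get` only say what they want to do. Callers that reach in with a path like `idx["by_id"]["u1"]["name"]` also have to know where the value lives, which is the coupling of "where" and "what" mentioned above.
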