Git Demystified

Time

Thursday, 9:15 am CDT - Thursday, 10:15 am CDT

Location

Room 314B

Description

Nearly all Drupal developers use git to track changes to their code, but how does git actually work? In this session, we will take a peek under the hood at what git does when you run various commands. Starting with git init and working through git add, we will explore how versions of code are tracked throughout your project. We will also review merging and rebasing branches, as well as the conflicts that arise when we do so.

To get the most out of this session, attendees should have prior experience using git.

Slides and code can be found at https://ramir.dev/mc23

Presenter Slides

Git demystified.pptx

Speakers

Matthew Ramir

Senior Web Developer @ Bounteous

Matthew is a product of Chicago’s Andersonville neighborhood transplanted to the mountains of Denver. He cultivated his passion for website development while apprenticing with a former computer science teacher in high school. He is a jack-of-all-trades, employing his skills in back-end development, server management, database management, and front-end development to expertly tackle a wide range of client projects. Matthew stayed local to earn a degree in computer science from the University of Illinois in Chicago and has been working with Drupal since his graduation in 2013. He is also an Acquia Certified Grand Master.

Track

Back-End

DevOps

Transcript

LECTURER:
Alright, y'all, let's get at it. Welcome to Git Demystified. I am Matthew. Happy to be here. Super excited about this talk. I'm gonna get real nerdy, so I hope you all can keep up. A little bit about me. I am currently living in Denver, originally from Chicago. We were there back in March 2020 when COVID hit. Good timing. I stayed in Chicago for a degree in computer science at UIC. (INAUDIBLE) my whole life up until a couple of years ago. I started playing with Drupal back in 2012 when I was in college. My first Drupal website was for the LGBTQ group in college, back on Drupal 7. That was painful. I'm currently a lead developer. I love DevOps. I have a Linux tattoo. I play with Linux all day and love being in the command line. And I have a disability called Trundle Cerebral Palsy. Obviously it makes it hard to talk sometimes when I get excited, so if you have any trouble understanding me, let me know and I will calm down, hopefully (LAUGHTER). I work at Bounteous, which is based in Chicago. We're a digital consultancy.

We work directly with our clients coming to meeting. We do everything from technology, Drupal, (UNKNOWN) to design, marketing, UI, UX, all that fun stuff. Yeah. I did put some links to the slides. There's gonna be a lot of code in this talk. So if you wanna follow along with the code, you can hit that link. It's also in the session description. There's a shell file that you can look at on GitHub if you wanna come back and look at this stuff. So what we're gonna talk about today, I'll give us a little bit of an intro to Git, what it's for, how to get it initialized. And then I'm gonna talk about Git objects, which are the brain of Git, how it tracks data, how it remembers your information. And then I'll get into some Git commands and talk about how those commands are altering your working tree and keeping track of your history. So that being said, this is a really technical talk. I'm gonna get into the (UNKNOWN). Talk about what's happening under the hood in the world of Git. And previous use of Git is helpful, it's not required though.

Hopefully you know Git, but if you're not super familiar with it, there will be a couple of spots for questions. Happy to fill people in if you get lost. And then as far as these slides, there's going to be a lot of code on the slides, so I tried to make it a little clearer what's happening. The commands that I'm running are going to be uncommented; the output from those commands will be commented for clarity. So the green text is just the output from whatever we're running. And again, all of this is in the shell file if you wanna take a look at it now or later. So that being said, let's get in the groove. I promise you, this is the only pun in this talk (LAUGHTER). Wink, wink. So Git is an open source version control system used to track code over the history of your project, to track changes in files over time. It was created in 2005 by Linus Torvalds for the development of the Linux kernel. It's really not that old. If that name sounds familiar, it's the guy who created Linux, and he created Git to keep track of changes to the kernel over time.

There are other version control systems out there, like SVN, that people can use that are a little bit older. Git is used heavily by the Drupal project to track changes to Drupal. So, you know, if you've ever looked at the current code or opened a pull request for anything in Drupal, you're working in GitLab currently, and that's obviously driven by Git. So it's used heavily in Drupal, and it's the most widely used version control system currently. I think I saw something like 70% of projects are currently using Git to track their development. So, a little basics about how Git works. I grabbed this image from the Git docs; Git has really good documentation online, and this is one of their diagrams. So when you're working in Git. Wrong button. So you have your working directory, you're making changes to code. You know, all that is great. When you're ready to finalize your changes, you're gonna add them into a staging area. It's kind of a middle area between your working directory and the history of your project.

So once you're ready to finalize your changes, you will commit them to your repository, at which point all the changes you made are put into your .git directory and stored there for everyone to use, which makes them permanent in your history. So if you remember the old Drupal 7 days, or if you've ever done a pull request review, this might look a little familiar. This is a diff. It shows you what line was added in green and what line was taken away, and it communicates to us how a file has changed over the course of our repository. And really, what I want you to take away from this talk is that Git stores snapshots. It does not store diffs. A diff is not how Git thinks, and it's not what Git is doing. It's just how it communicates to us what has changed over time. So when we create a repository we use the git init command, and we can take a quick peek. On the right here, on line nine, I'm creating a new repository. I list out the repository, and we see a bunch of files here. The first one, that config file, is our information about our remotes and our branches.

So whenever you do, like, a git push or git pull, you're sending and receiving code based on what's in that config file. We're not gonna talk about that too much. Likewise, the HEAD file here represents your currently checked out commit. So whatever commit is currently active in your working directory is referenced in that HEAD file. It's how Git knows what the changes are in comparison to your current working directory. And then .git/index. The index file is a single file, and it's our staging area. Whenever we do git add, or we're adding stuff to our staging area, it's actually putting it in that index file and updating it. We'll show how that works. And then objects. We're gonna spend a lot of time talking through these; they're kind of like the brain of Git. It's how Git actually remembers what files have changed and what the state of your data is at any given moment. So anytime you make a change to your file and commit it, it's gonna end up in that objects directory.

And then refs are pointers to Git objects. We'll see what those are for. So we're gonna focus in on objects and talk about objects for a little bit. All objects are identified by an object ID, also known as a hash. Git gives us this really cool command where we can actually calculate the hash of a given string. So on line 26 here, we're taking the string 'Hello World' and passing it to hash-object, which is a Git command that will take that data and give you the hash for it. The -w flag means create this as an object in your database. So if you do that with our Hello World string, we have this object, 557db0. That's something I'll be referencing for these first slides. We can actually see that object as a file in our file system, at .git/objects/55/ followed by the rest of that hash. And we can also get the contents of an object using the cat-file command. The -p says pretty-print this object; give it that object ID and it gives us that Hello World string back, based off of that hash-object command that we just ran.
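To make the hash-object and cat-file steps concrete, here is a minimal Python sketch of what those two commands do for a blob. It assumes the string on the slide was written with a trailing newline (as `echo` would add) and uses a throwaway temporary directory rather than a real repository; the function names are illustrative, not Git's own.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def hash_object(data: bytes, obj_type: str = "blob") -> tuple[str, bytes]:
    """Return (object_id, stored_bytes): SHA-1 over a small header plus the content."""
    header = f"{obj_type} {len(data)}\0".encode()
    store = header + data
    return hashlib.sha1(store).hexdigest(), store


def write_object(git_dir: Path, data: bytes) -> str:
    """Rough stand-in for `git hash-object -w`: save a zlib-compressed loose object."""
    oid, store = hash_object(data)
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return oid


def cat_file(git_dir: Path, oid: str) -> bytes:
    """Rough stand-in for `git cat-file -p` on a blob: drop the header, return the content."""
    raw = zlib.decompress((git_dir / "objects" / oid[:2] / oid[2:]).read_bytes())
    _header, _, content = raw.partition(b"\0")
    return content


git_dir = Path(tempfile.mkdtemp()) / ".git"    # scratch directory, not a real repo
oid = write_object(git_dir, b"Hello World\n")  # assumption: trailing newline from echo
print(oid)                                     # should begin with 557db0, as in the talk
print(cat_file(git_dir, oid))
```

The first two characters of the ID become the directory name and the remaining 38 the file name, which is the .git/objects/55/... layout described above.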

These hashes are SHA-1; newer versions of Git, depending on your configuration, might use SHA-256. A SHA-1 hash is 160 bits long, and in hex that equals 40 characters. So you will see the length of that hash that we're given is actually 40 characters. And we can reference those objects by a partial hash. So we can run git cat-file with our whole object ID. We can reference our object by 557db0. We could also reference it by just 557d; as long as you have a unique string at the front of your reference, it'll find it. As you have more objects in your database, you'll have to add more characters to that hash to identify it. A lot of the time on GitLab and GitHub you'll see that six-character hash, so that's what I'll be referencing when I'm talking about this Hello World object. So objects really are the brain of Git. Here I'm just listing out our objects directory after a few commits. You'll see we have like seven different objects.
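The 160-bit arithmetic and the "unique prefix" rule can be sketched in a few lines of Python. The `resolve_prefix` helper and the second and third object IDs are hypothetical, made up only to show an ambiguous and an unambiguous prefix; Git's real lookup is more involved.

```python
import hashlib

# A SHA-1 digest is 20 bytes = 160 bits; each hex character encodes 4 bits.
digest_bits = hashlib.sha1().digest_size * 8
hex_chars = digest_bits // 4


def resolve_prefix(prefix: str, object_ids: list[str]) -> str:
    """Return the single object ID that starts with `prefix`, or raise."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} objects match")
    return matches[0]


object_ids = [
    "557db03de997c86a4a028e1ebd3a1ceb225be238",  # blob for "Hello World\n"
    "557d" + "1" * 36,                           # hypothetical ID sharing the first 4 chars
    "a1b2c3" + "0" * 34,                         # hypothetical unrelated ID
]

print(digest_bits, hex_chars)
print(resolve_prefix("557db0", object_ids))      # six characters are enough here
try:
    resolve_prefix("557d", object_ids)           # four characters now match two objects
except ValueError as err:
    print(err)
```

This is why a larger object database needs longer abbreviations: the more IDs there are, the more likely a short prefix matches more than one.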

They're all different types of objects. So, all objects are stored as files. You'll notice that there's not any file extension on these. So if you're adding a README.txt file to your project, it's not gonna show up as .txt in the objects directory. It's just gonna be stored under whatever the hash of that object is. And there are different types of objects. We're gonna talk through each of them and how they reference each other. There are blob objects, which are a file. There are tree objects, which are a directory, and commit objects, which are a commit. Let's see how these come together. So starting with blob objects: a blob object is the contents of a file. It is only the contents of a file, and it is a snapshot in time. It is not a diff between one commit and another. It's gonna be the entire contents of a file at a specific point in time. You're also not gonna see any metadata in a blob object. You're not gonna see file permissions. You're not gonna see a file name. You're just gonna see the content of whatever is in that file at a given point in time.

So we have a single file, the whole file, and nothing about the file. Well, it's the content plus some additional stuff that helps us identify what that object is, so that 557db0 hash is not the hash of Hello World. It's the hash of Hello World plus some additional data that tells you that it is an object. We're not gonna worry about that too much. And then blob objects get created when you run the git add command. So let's see what that looks like. With git add, we have our staging area and we have our working directory. We're making changes in our working directory, ready to turn them into a commit and add them into our project. So I'm gonna take that Hello World string, put it into README.txt, and run git add on that file. And we're gonna run git status to see whether or not things are in our staging area, ready to be committed. So we'll see on line 52 that it says we have a new file that we're adding, and it tells us what our next commit is gonna look like. So if we take that Hello World string that we know the hash for and put it into a README.txt file, Git gives us this really cool command called ls-files --stage.

This prints out the contents of that staging area, which is stored in that index file. So we'll see now we have file permissions and metadata. So the first part we have our. Yeah. So we have our permission bits. We have the object ID that that file is pointing to, so it's our Hello World string. We have a flag after that hash that tells us the conflict status of a file; we'll see how that shows up in the future. And then we have the file name. And again, this blob is a file, and this is still made of Git objects: 557db0, which has that Hello World string. So when we're ready. Oh, the staging area can have multiple files in it. So if we make a directory called docroot, take that same Hello World string, put it into index.html, and run that same git ls-files, we will see that we now have two files in our staging area, README.txt and docroot/index.html. They're both referencing the same blob because they have the same data in them. So you can kind of start to see here how Git does some magic to save space and save calculations as it's determining the state of your code.
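The point about two staged files sharing one blob can be illustrated with a toy staging area in Python. This is a sketch, not Git's actual index format (the real index is a binary file); the `git_add` and `ls_files_stage` helpers are illustrative names, and the file contents assume a trailing newline.

```python
import hashlib

objects: dict[str, bytes] = {}                 # toy object database: ID -> content
index: dict[str, tuple[str, str, int]] = {}    # toy staging area: path -> (mode, ID, stage)


def blob_id(data: bytes) -> str:
    """Object ID of a blob: SHA-1 over 'blob <size>\\0' plus the content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def git_add(path: str, data: bytes) -> None:
    """Store the content once by ID and record the path in the staging area."""
    oid = blob_id(data)
    objects[oid] = data
    index[path] = ("100644", oid, 0)           # stage 0 means "no conflict"


def ls_files_stage() -> list[str]:
    """Lines shaped like `git ls-files --stage`: mode, ID, stage, tab, path."""
    return [f"{mode} {oid} {stage}\t{path}"
            for path, (mode, oid, stage) in sorted(index.items())]


git_add("README.txt", b"Hello World\n")
git_add("docroot/index.html", b"Hello World\n")

for line in ls_files_stage():
    print(line)
print(len(objects), "blob stored for", len(index), "staged paths")
```

Because the object database is keyed by content hash, identical content is stored only once no matter how many paths point at it.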

So if we take that staging area, we can turn it into a commit and add it to our Git history via the git commit command. And we'll see some output that tells us what Git just created. We have a list of changes. We have two different files here: one is README.txt, one is index.html. We also get this fancy object ID. So when you run a commit, it's also creating an object in your objects directory, and that commit object is what identifies the history of your project. So again, we can take the hash for that commit object. So ecdb, I made that really hard to say for me. This is awesome. We can run that same cat-file command and see what the contents of that object are. We get something like this: we have our committer information. We have that commit message that we just added, which tells us what this set of changes represents. And then we also get this really cool tree object, which is another object with a different hash. And if you take a peek at that tree object, again with the cat-file command, we'll get something like this.

And this is where we start to see metadata about the history of our project. So we can see that this tree object is pointing to our 557db0 blob via README.txt. But then we also have another tree object for docroot. And again, we can go a level deeper and take a look at that object, and we see our index.html file pointing to 557db0. And that ends up looking something like this. So at the top there, we have our commit, and that commit object is pointing to a tree. That tree object is pointing to a blob as well as another tree, and that other tree references a blob object. So this is how Git represents the data in your project whenever you're looking at a commit, and it uses that to determine the exact contents of your code at that point in time. So I'm gonna pause in a second, but I wanted to add a little bit of complexity before we do that with a second commit. So if we take a different string, 'Git rules', put it in README.txt, and run that same git add and ls-files --stage.
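The commit, tree, blob chain described here can be sketched in Python by building the three kinds of objects by hand. The author name, email and timestamp are placeholders, the commit message is invented, and sorting entries by plain name is a simplification that happens to be fine for these two entries; treat it as an illustration of the layout rather than a drop-in replacement for Git.

```python
import hashlib


def obj_id(obj_type: str, body: bytes) -> str:
    """Every object ID is SHA-1 over '<type> <size>\\0' followed by the body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Each entry is '<mode> <name>\\0' followed by the 20 raw bytes of the target ID."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):  # simplified ordering
        body += f"{mode} {name}\0".encode() + bytes.fromhex(oid)
    return body


hello = obj_id("blob", b"Hello World\n")        # assumption: trailing newline

# docroot/ holds index.html; the root tree holds README.txt and the docroot tree.
docroot = obj_id("tree", tree_body([("100644", "index.html", hello)]))
root_body = tree_body([("100644", "README.txt", hello), ("40000", "docroot", docroot)])
root = obj_id("tree", root_body)

commit_body = (
    f"tree {root}\n"
    "author Example Author <author@example.com> 0 +0000\n"     # placeholder identity
    "committer Example Author <author@example.com> 0 +0000\n"
    "\n"
    "Add README and docroot\n"                                 # hypothetical message
).encode()
commit = obj_id("commit", commit_body)

print("commit", commit)
print("  tree", root)
print("    100644 blob", hello, "README.txt")
print("    040000 tree", docroot, "docroot")
print("      100644 blob", hello, "index.html")
```

Note that file names and permission bits live in the tree entries, not in the blob, which matches the earlier point that a blob is content only.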

We see something a little bit different now. So the hash for README.txt has now changed. It's pointing to a different blob object. We still have our docroot/index.html file in our staging area even though it hasn't changed. So whenever you do ls-files --stage, it gives you the state of every file in your code and the blob that it points to. So if we make a second commit with that 'Git rules' string, we'll get another commit object. If we take a peek at that commit object, we have our same commit information as well as a different tree that we're pointing to. It's a different tree because it's pointing to different files with different content. So you can see on the left here we have an additional tree. That tree is still pointing to your old Hello World blob via the docroot tree, so it's still pointing to that old tree, but it's also pointing to a new blob via the README.txt file. So this is how Git is representing data under the hood. It's how it determines the state of your code at any given time.
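A small Python sketch can show why the second snapshot gets a new root tree while reusing the old docroot tree. The `snapshot` helper is hypothetical and hard-codes the two-file layout from the talk; file contents assume trailing newlines.

```python
import hashlib


def obj_id(obj_type: str, body: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def entry(mode: str, name: str, oid: str) -> bytes:
    return f"{mode} {name}\0".encode() + bytes.fromhex(oid)


def snapshot(readme: bytes, index_html: bytes) -> dict[str, str]:
    """Object IDs for a project with README.txt and docroot/index.html."""
    readme_blob = obj_id("blob", readme)
    index_blob = obj_id("blob", index_html)
    docroot_tree = obj_id("tree", entry("100644", "index.html", index_blob))
    root_tree = obj_id(
        "tree",
        entry("100644", "README.txt", readme_blob) + entry("40000", "docroot", docroot_tree),
    )
    return {"readme_blob": readme_blob, "index_blob": index_blob,
            "docroot_tree": docroot_tree, "root_tree": root_tree}


first = snapshot(b"Hello World\n", b"Hello World\n")
second = snapshot(b"Git rules\n", b"Hello World\n")   # only README.txt changed

for key in first:
    status = "reused" if first[key] == second[key] else "new"
    print(f"{key:13} {second[key][:7]} ({status})")

# Distinct blobs and trees needed to hold both snapshots:
print(len(set(first.values()) | set(second.values())))
```

Each snapshot is complete on its own, yet unchanged files and directories cost nothing extra because their IDs, and therefore their stored objects, are identical.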

It's not remembering what it's changed, it's just remembering the snapshot of a file at a specific point in time. So I'm gonna give you all a second to catch up. I know this is a lot I just threw at you. Anyone like, have questions about this first part? I'm gonna hop into merging workstreams together, git rebase, all that fun stuff. But this is kind of the setup for how we're thinking about what Git is doing. Yes.

STUDENT:
Drink first, I might have misunderstood. You said that when you run ls-files --stage, it shows you the current state of every file in the project?

LECTURER:
Yeah. So git ls-files --stage shows you every file in your project, regardless of whether or not it has changed in your current commit or you've added it to your staging area.

STUDENT:
So there's a flag on one of those if it is actually staged.

LECTURER:
Yeah. Well it just looks at that blob so it doesn't know whether or not a file is different. It's just pointing to that blob object. And it'll look when you do a commit and compare it to the previous tree and tell you what has changed.

STUDENT:
Gotcha. Cool.

LECTURER:
Any other questions? This is fun (LAUGHTER). So let's look at branches. Who wants to guess what a branch is? Yeah.

STUDENT:
A tree.

LECTURER:
No. So great. Sorry.

STUDENT:
A deviation.

LECTURER:
Not a deviation, a branch. Also a little bit of a scenario here. So leaving our old commits behind, I'm gonna simplify things a little bit and give us a new tree. So we have our root commit at our main branch on the left. The squares are our branches, the circles are commits, and then we have two separate branches here: branchA, which says Git sucks, and branchB, which says Git rules. We can take that branch and see what it is. Branches are stored in .git/refs. They are not objects, they are just pointers to objects. So you don't need to do git cat-file. You can just cat out, print out, the contents of that file, and you'll get whatever hash that branch is pointing to. So branchA is pointing to commit c, branchB is pointing to commit e. So Git doesn't care what branches are. And likewise, we can create new branches. We can also create tags. Both are references to commit objects. The difference between a branch and a tag is that a branch can be changed, a tag cannot be changed.

So branches are mutable. Tags are immutable, but they're both stored in the same way. So here's a tag pointing to commit d. We can tell Git to create a tag or branch at a specific point by doing git checkout -b or git tag and giving it that commit hash. Git will go look up that commit hash, make a new reference, and put that hash in that file. We got it. So I'm gonna talk about merging and combining these two workstreams. We have two different workstreams and a few different ways to combine them. So we're gonna work on combining branchA and branchB. We'll talk about merging, which is on the left. We'll talk about rebasing, which is in the middle. Everyone has fun with that. And then we'll talk about how cherry picking works as well. So let's start with a merge. If we check out branchB, we're on branchB and we wanna merge branchA. You can run git merge branchA. And since those two branches both have changes in the same file, we get a really fun merge conflict that everyone loves (LAUGHTER).
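The earlier point that branches and tags are just small files holding a commit hash can be sketched in Python. The directory is a scratch folder, the 40-character IDs are placeholders standing in for commits c, d and e, and the tag name `v1` is hypothetical. Real repositories may also keep refs in a packed-refs file, and annotated tags are objects in their own right; this sketch only covers the simple loose-file case.

```python
import tempfile
from pathlib import Path

git_dir = Path(tempfile.mkdtemp()) / ".git"     # scratch directory, not a real repo
heads = git_dir / "refs" / "heads"
tags = git_dir / "refs" / "tags"
heads.mkdir(parents=True)
tags.mkdir(parents=True)

# Placeholder IDs for the commits named c, d and e in the talk's diagram.
COMMITS = {"c": "c" * 40, "d": "d" * 40, "e": "e" * 40}


def write_ref(path: Path, commit_id: str) -> None:
    """A ref is a text file whose whole content is one object ID."""
    path.write_text(commit_id + "\n")


def read_ref(path: Path) -> str:
    return path.read_text().strip()


write_ref(heads / "branchA", COMMITS["c"])
write_ref(heads / "branchB", COMMITS["e"])
write_ref(tags / "v1", COMMITS["d"])            # hypothetical lightweight tag

print("branchA ->", read_ref(heads / "branchA"))
print("branchB ->", read_ref(heads / "branchB"))

# Moving a branch means rewriting that one small file; no objects are touched.
write_ref(heads / "branchB", COMMITS["d"])
print("branchB ->", read_ref(heads / "branchB"))
```

Nothing in the file format distinguishes a branch from a tag; the difference is in how Git's commands are willing to update them.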

Right? Maybe. So what's actually happening when we run git merge? Git does what's called a three-way merge between these two branches. It starts by finding their common ancestor. On branchB we have two commits after commit a; on branchA we have two commits after commit a. It's gonna go back through our history, go through each one of those commit objects, and follow the parents until it finds two parents that match. So it grabs that, and then it runs a diff. A diff is what we saw in that patch file. It'll compare the content of commit e to commit a and give us that as a diff. And it'll do the same for commit a and commit c, and then we get two separate diffs like this. git diff allows us to run that same process and see what kind of looks like a patch file. If we do git diff from a to branchB, we'll see that we're adding the string Git rules. If we do git diff from a to branchA, we'll see that we're adding Git stinks. And there's a little bit of information here about the file that changed. It's gonna look and see that the same lines in the same file changed on both sides, which is why there's a merge conflict.
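Finding the common ancestor by walking parent pointers can be sketched in Python as follows. The commit graph is an assumption based on the diagram described in the talk (a is the root, b and c are on branchA, d and e are on branchB), and real Git uses a more careful algorithm for histories with several candidate ancestors.

```python
from collections import deque

# Assumed shape of the talk's diagram: commit -> list of parent commits.
PARENTS = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"], "e": ["d"]}
BRANCHES = {"branchA": "c", "branchB": "e"}


def ancestors(commit: str) -> set[str]:
    """Every commit reachable from `commit` through parent links, itself included."""
    seen: set[str] = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(PARENTS[current])
    return seen


def merge_base(ours: str, theirs: str):
    """Walk back from `theirs` until we hit a commit that `ours` can also reach."""
    reachable = ancestors(ours)
    seen: set[str] = set()
    queue = deque([theirs])
    while queue:
        current = queue.popleft()
        if current in reachable:
            return current
        if current not in seen:
            seen.add(current)
            queue.extend(PARENTS[current])
    return None


base = merge_base(BRANCHES["branchB"], BRANCHES["branchA"])
print("common ancestor:", base)
```

With the base in hand, the merge compares base to each branch tip, which is the pair of diffs described above.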

So if there's not a conflict, it'll pull in the changes from both sides. If there is a conflict, it'll yell at us and tell us that we need to fix it. When we're in this state, if we actually take a look at the README.txt file, we'll see something like the following, which might look a little familiar to you. This is telling us that in our current commit we have Git rules, and in the branch that we're trying to merge we have Git sucks, and it's up to us to come in here and actually update this file and put the correct contents in it. You can do that in an IDE; some IDEs have conflict resolution built in. You can also just go and update the file directly and change it to whatever is correct. It's also possible to commit this without updating anything, in which case you'll break all your stuff. We won't talk about that. But I think this is really interesting: if we run that same git ls-files --stage, we'll see we now have the README.txt file listed three times, and we have that merge conflict indicator set.

So the first line is the object for commit a, our Hello World string, 557db0. Then we have branchB's version, which is Git rules, and then branchA's, which is Git sucks. And these all come from running that same hash-object on the contents of that file. So this is how Git knows that you're currently in a merge conflict. It looks at this index and it says, I have these files flagged as conflicted, and it'll also update that file to show what the conflict is. So if we put in whatever content we want and resolve those conflicts, I'm gonna put Git rules, and run that same git add command and ls-files --stage, we'll see that we no longer have those three entries because that conflict has been marked as resolved. So we're back to our normal state. We just have a blob representing whatever's in that file, ready to be committed to our repository. So when we resolve that conflict and run git commit, we're given another commit hash. In this case, we're at commit f, which represents our merge between branchA and branchB.
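The conflict state walked through above can be sketched in Python with a whole-file three-way merge and a toy index. This is a simplification: real Git merges hunk by hunk rather than treating the file as one unit, and the tuple layout of the toy index is an assumption for illustration. The stage numbers do follow Git's convention of 1 for the common ancestor, 2 for the current branch and 3 for the branch being merged.

```python
import hashlib


def blob_id(text: str) -> str:
    raw = text.encode()
    return hashlib.sha1(b"blob %d\0" % len(raw) + raw).hexdigest()


def three_way(base: str, ours: str, theirs: str, their_label: str = "branchA"):
    """Whole-file three-way merge. Returns (text, conflicted)."""
    if ours == theirs:
        return ours, False          # both sides agree
    if ours == base:
        return theirs, False        # only their side changed
    if theirs == base:
        return ours, False          # only our side changed
    text = f"<<<<<<< HEAD\n{ours}=======\n{theirs}>>>>>>> {their_label}\n"
    return text, True               # both changed differently: conflict markers


base, ours, theirs = "Hello World\n", "Git rules\n", "Git sucks\n"
merged, conflicted = three_way(base, ours, theirs)
print(merged, end="")

# While conflicted, the path appears three times, once per stage.
conflict_index = [
    ("100644", blob_id(base), 1, "README.txt"),    # stage 1: common ancestor
    ("100644", blob_id(ours), 2, "README.txt"),    # stage 2: current branch (branchB)
    ("100644", blob_id(theirs), 3, "README.txt"),  # stage 3: branch being merged (branchA)
]

# After editing the file and running git add, a single stage-0 entry remains.
resolved_index = [("100644", blob_id("Git rules\n"), 0, "README.txt")]

for mode, oid, stage, path in conflict_index + resolved_index:
    print(f"{mode} {oid} {stage}\t{path}")
```

In this example the resolved entry ends up with the same blob ID as the stage 2 entry, because the resolution chosen is exactly branchB's content.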

Again, we can cat out that object. We'll see that we have a tree object that it's referencing that tells us the state of the files, but then we'll see that we actually have two parents instead of one parent. So this tells us that this commit came from these two places. And there are actually ways to merge more than one branch at a time. There are different algorithms for merging. If you really wanted to get crazy, you can have however many parents you want in there and however many branches merged together at once. So that's merge. Let's talk about rebase really quick. Rebase is a really fun one. The difference between merge and rebase is that in a rebase you're not making a merge commit, you're copying history from one branch to another. And the way it does this is, again, it will find that common ancestor, and then it checks out the branch that we are rebasing onto. So branchB moved from commit e up to commit c. So at this point we're just starting our rebase. Both our branches are pointing to the same place.

And then it runs through that diff process again. Instead of comparing commit e to commit a, it's gonna compare commit a to commit d. So it gets that diff, and instead of going across the whole chain, it's just gonna do one commit at a time. It gets that diff, adds it on to the end of branchA, and updates our branchB pointer to point to that, and it'll use the same commit information. So you'll see the same author, you'll see the same commit time, and it'll be pointing to a different tree. And then it'll repeat that process for the next commit. So it's gonna take commit d, compare it to commit e, where we have that Git rules string, apply it on top of the current branchB, and move our pointer forward. We're also gonna get a conflict here, and it works in the same way. So if we rebase and we have a conflict between our two changes, it'll yell at us. We have to go and update that file and make sure to resolve the conflict, run git add and then git rebase --continue, and it'll continue on with the process.
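The replay-one-commit-at-a-time idea can be sketched with a toy object store in Python. Several things here are assumptions made to keep it short: trees are plain dictionaries of path to content, commit IDs are hashes of a JSON dump rather than Git's real commit format, the file names other than README.txt are hypothetical, and the two branches touch different files so no conflict handling is needed (unlike the talk's example, where both branches edit README.txt).

```python
import hashlib
import json

store: dict[str, dict] = {}     # toy object database: commit ID -> commit data


def commit(tree: dict, parent, author: str, message: str) -> str:
    obj = {"tree": tree, "parent": parent, "author": author, "message": message}
    oid = hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    store[oid] = obj
    return oid


def diff(old: dict, new: dict) -> dict:
    """Paths whose content differs, mapped to the new content (None = deleted)."""
    return {p: new.get(p) for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


def apply(tree: dict, changes: dict) -> dict:
    result = dict(tree)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def rebase(tip: str, base: str, onto: str) -> str:
    """Replay each commit after `base` (oldest first) on top of `onto`."""
    chain = []
    current = tip
    while current != base:
        chain.append(current)
        current = store[current]["parent"]
    new_tip = onto
    for oid in reversed(chain):
        old = store[oid]
        changes = diff(store[old["parent"]]["tree"], old["tree"])
        new_tree = apply(store[new_tip]["tree"], changes)
        new_tip = commit(new_tree, new_tip, old["author"], old["message"])
    return new_tip


a = commit({"README.txt": "Hello World\n"}, None, "A. Dev", "commit a")
b = commit({**store[a]["tree"], "a1.txt": "one\n"}, a, "A. Dev", "commit b")
c = commit({**store[b]["tree"], "a2.txt": "two\n"}, b, "A. Dev", "commit c")
d = commit({**store[a]["tree"], "b1.txt": "three\n"}, a, "B. Dev", "commit d")
e = commit({**store[d]["tree"], "b2.txt": "four\n"}, d, "B. Dev", "commit e")

branches = {"branchA": c, "branchB": e}
branches["branchB"] = rebase(e, base=a, onto=c)

new_tip = branches["branchB"]
print("old e:", e[:7], "new e':", new_tip[:7])
print("old commits still stored:", d in store and e in store)
```

The replayed commits keep their author and message but get new IDs, because their parent and tree changed; the originals stay in the store even though no branch points at them any more.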

So differentiating merge and rebase: merge combines everything into one commit; rebase takes those changes and puts them on top of another branch. But what's really cool here in this diagram is that those old commits are still there. They're still in the objects directory. They don't go away even though our branch isn't referencing them anymore. So if we want to, we can move branchB back to where it was via the git reset command. So we took it from e' and moved it back to e, and now branchB is pointing to commit e again. Likewise, our rebased commits are also still in the history. So if we wanted to, we can move it back and forth. Git doesn't care where that branch is; it just knows that that branch is pointing to a specific commit, so we can move it around and put it wherever we want. And if we do something that we don't like, we can always go back in history, find the old objects that it used to reference, and update that pointer to the old objects. And you can do git reset --hard or git reset --soft.

The difference between hard and soft is that hard will get rid of your changes. Soft will just update the pointer, but it won't reset your changes, so you'll still have that same working directory. You'll still have that same tree, you'll have the same blob objects. It's just pointing to a different place. I'm not gonna jump into that too much. But from here, let's take a look at cherry picking. And of course, I'm like flying through this. So the difference is: a merge combines two trees, a rebase takes a tree and puts it on top of another one, and a cherry pick takes a single commit and puts it somewhere else. So if we were to cherry pick the e commit from branchB onto branchA, again, it's gonna take that diff between e and its parent. It's not gonna go back to commit a; it's not going to compare anything beyond that. It'll just compare commit e to commit d, get a diff between those two things, and then put it at the end of your branch. And in this case I'm cherry picking a commit onto branchA, so I'm not doing it on to commit b but it will.

You'll run into those same conflicts because the strings are different. The same file changed and then. Yeah. You will have another commit with a new tree, with the new contents of that file, on top of branchA. So that's how cherry pick works. And of course, what's really interesting is that, since Git sees these two commits as two different objects, they have different parents, so they'll have a different hash. You could actually cherry pick a commit twice. So we cherry pick e, and we'll have it on branchA. We could actually cherry pick that again, and then we'll have it in our branchA twice. And again, we may have to resolve conflicts. It might yell at you and say nothing has actually changed when we do this, because it's actually the same state of code. But, you know, the point being that it doesn't really care what your objects are. It doesn't care where your branches are pointing. All it cares about is the state of your code at that time and how to get from one commit to another.
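Why a commit that is cherry picked twice shows up as two different commits can be sketched in Python: the commit ID covers the parent as well as the tree and message. The tree ID, the ID standing in for commit c, and the author line are all placeholders, and real cherry picks also record a fresh committer timestamp, which this sketch leaves out.

```python
import hashlib

AUTHOR = "Example Author <author@example.com> 0 +0000"   # placeholder identity


def commit_id(tree_id: str, parent_id: str, message: str) -> str:
    """Hash a commit body laid out like Git's: tree, parent, author, committer, message."""
    body = (
        f"tree {tree_id}\n"
        f"parent {parent_id}\n"
        f"author {AUTHOR}\n"
        f"committer {AUTHOR}\n"
        f"\n{message}\n"
    ).encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


tree_after_pick = "1" * 40      # placeholder: tree where README.txt says "Git rules"
commit_c = "c" * 40             # placeholder ID for the tip of branchA

# First cherry pick of e lands on top of c.
pick1 = commit_id(tree_after_pick, commit_c, "commit e")
# Second cherry pick lands on top of the first; the content, and so the tree, is unchanged.
pick2 = commit_id(tree_after_pick, pick1, "commit e")

print(pick1)
print(pick2)
print("same tree, different commit:", pick1 != pick2)
```

Both commits point at the same tree, so the code is identical, but Git has no notion that they are "the same change".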

So in summary, that's fine. So Git is a file-based version control system. There's other version control systems. It uses objects, and then references to those objects, to determine the history of code. Branches are refs; they're just pointing to objects. Git doesn't care what's actually on that branch. They're just references. Git stores an entire snapshot of a file. So when you go from one commit to another, it's not saying, oh, this line was added here. It's saying this is the exact state of my code at this point in time in the commit. We talked through the types of objects: commit objects pointing to tree objects, tree objects pointing to blob objects. And that's how it's determining the state of things. Branches are pointers. Git doesn't care what they're pointers to. So it's a really interesting technology, and I always thought it was, like, remembering the diff between various commits, but it's not actually doing that. It's actually remembering the exact state of your code at any given point in time.

And that's what I got. There's my other pun. (UNKNOWN). Yeah that was, that was a lot. I can go back if people have questions. I think there are certain things that you all want me to touch on, talk through. What's up, Bob?

STUDENT:
So you've got multiple people working on a team. You're committing everything and you end up with conflicts between different branches.

LECTURER:
Yep.

STUDENT:
What would cause it if you have duplicate events going up in your changes?

LECTURER:
Yeah. So if we go back to that. Yeah. So if we go back a couple of slides here, how do I do things? So in this example, not this example, this example not that one, that one. So this is an example of having a duplicate commit. So we cherry picked e once we'll have that on branchA if we cherry pick it again, it'll be there duplicated. Git's not gonna say that like they're the same thing. It's just looking at them. It's two different objects in your history based off of like what you told it to commit whatever the parents are. And it's just getting to know the state of files at that point in time.

STUDENT:
So if you get those duplicates and you haven't been using cherry pick. Yeah, it can come from... Maybe 'cause you've got two parents out there.

LECTURER:
It can come from, like, if you merge a branch, or I guess it depends. So if you merge a branch twice it won't make that duplicate commit; it will just have a merge commit. But if you're rebasing or something you could end up with those duplicates. I think you can you can't rebase the branch by yourself. But if you were to revisit a branch twice and then cherry pick or do something weird with merging, you can definitely end up with duplicate commits. Git doesn't know they're duplicates, so it just thinks they are different commits. They might be pointing to the same tree object, but they're gonna have different parent objects. So Git's gonna see them as two different things. And you can actually rewrite that history, like we did with the git reset command. Where did that go. So no, Yeah, so if your branch is pointing to commit e' or e'' and we have a duplicate commit there, we can say, hey Git, take this branch pointer, point it to an earlier point in history, and you'll have just that single commit at whatever time.

Your duplicate commit is still gonna be in your object database. It won't push it up to your repo if there's not a branch pointing to it, but you're still gonna have that old object, Git doesn't care where your branch is pointing, it's just pointing to a specific commit in time. Cool. Other questions.

STUDENT:
Squash. When you squash commits, does it remove the references?

LECTURER:
Yeah. So, a squash. What it does is, well, I guess it would just grab the tree of the last commit, because it doesn't care what the commits are. It's just gonna look at that tree object. So when you squash, it's gonna make a new commit pointing to that same tree object, with whatever kind of squash method you want. And you'll still have your old commits that you can go back and reference. But it cleans up the history so that you only have a single commit object versus multiple.

STUDENT:
So the object still exists, but it just cleans up the commit.

LECTURER:
Right, yeah.

STUDENT:
Two different things.

LECTURER:
Yeah. So,

STUDENT:
Objects are not history.

LECTURER:
Yeah, exactly.

STUDENT:
(UNKNOWN).

LECTURER:
That's a good question (LAUGHTER). I don't know. My guess would be, when you're stashing, it's probably gonna make another object in your database and add a reference somewhere to it. So maybe, I could imagine, when you do a git stash, it makes a new tree object, puts that tree object in your object database, and then probably adds a reference somewhere to say, when you wanna pop it, apply this tree to the current working directory. So kind of like checking out a commit.

STUDENT:
OK. Nice. Yeah. How do you choose between, like, a merge or a rebase?

LECTURER:
Yeah. So the question is, how do you choose between merge, rebase, cherry pick? So typically a merge is when your work is coming together. So when you open a pull request, you're gonna merge two branches together and it'll have that new tree. When you're doing a rebase, typically that's when you're working on something locally, you've made a lot of changes, but the upstream branch has changed or something. You wanna take all your work and put it on that other branch that has all the other changes in it. And then cherry pick is if you only want one change that you made in your history; it'll take that one change and put it onto the branch. Well, did I answer your question?

STUDENT:
Yeah.

LECTURER:
OK, cool. Anyone else?

STUDENT:
So when you're doing a git pull it's gonna grab your whatever your origin branch is, right?

LECTURER:
Yeah.

STUDENT:
And it will fetch it and it will merge it.

LECTURER:
So when you do. Yeah.

STUDENT:
There's also the option to set it up to rebase all the time.

LECTURER:
I don't know if you can pull... actually, I think you can pull with rebase. But when you do just a regular git pull, it'll treat it like a merge. So if you have changes locally that are different than what's upstream, it'll merge those two together and make a new merge commit. It actually allows you to, like, update the commit message. Yeah.

STUDENT:
Occasionally we've been getting a message saying, actually, that you can set it up to automatically rebase every time you look at (INAUDIBLE).

LECTURER:
Yeah.

STUDENT:
Merge or merge and rebase I guess is. Right.

LECTURER:
Yeah.

STUDENT:
It's doing both of those things or not.

LECTURER:
Yeah. So, do you follow this merge? When you do a merge, if your two branches are coming from the same place, so like if you have changes locally that are on top of whatever is remote and you do a git pull, it'll do what's called a fast-forward. So instead of making a merge commit locally, it's gonna do more of like a rebase, where it will take those commits and just apply them on top of whatever the remote branch is and update the pointer to whatever that latest commit is.
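A toy model of the fast-forward decision, offered as a sketch rather than Git's implementation. It assumes the usual rule that a fast-forward is possible when the local branch tip is an ancestor of the tip being pulled in, so no merge commit is needed and the branch pointer simply moves. The graph, the commit names, and the `integrate` helper are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, commit: str) -> bool:
    """Walk parent links from commit and report whether ancestor is reachable."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def integrate(parents: Dict[str, List[str]], local_tip: str,
              remote_tip: str) -> Tuple[str, Optional[str]]:
    """Decide what a merge-style pull would do with two branch tips."""
    if is_ancestor(parents, remote_tip, local_tip):
        return "already up to date", local_tip
    if is_ancestor(parents, local_tip, remote_tip):
        # Nothing to combine: just move the branch pointer forward.
        return "fast-forward", remote_tip
    return "merge commit needed", None


# a <- b <- c   (remote moved ahead), and b <- d (local work that diverged)
graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}

print(integrate(graph, "b", "c"))  # local is behind remote
print(integrate(graph, "d", "c"))  # both sides have new commits
```

In the first call the local tip `b` is an ancestor of `c`, so the pointer can move straight to `c`; in the second, `d` and `c` have diverged, which is the case where a merge commit (or a rebase) comes into play.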

STUDENT:
I think this comes up usually when we're running into conflicts on that whole.

LECTURER:
Yeah. So you've got merge conflicts. It's going to look at those two commits in the history, find the common parent, and then do that same diff process between the two, oh my God, commits. That's where you're gonna get a conflict: if, you know, something upstream has changed and something in your branch has changed. It's gonna do that same merge process and see that there's a conflict. It'll pause, so it won't completely merge and make you commit. It'll give you a chance to resolve those conflicts and then you can commit directly. Cool. Anything else? Was this helpful?
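The "find the common parent and compare both sides" idea can be sketched as a three-way merge. This is a deliberate simplification: it compares whole file contents, whereas Git merges at the level of changed regions within a file. The `merge_file` helper and the sample contents are hypothetical, and each content string is assumed to end with a newline.

```python
from typing import Tuple


def merge_file(base: str, ours: str, theirs: str) -> Tuple[str, bool]:
    """Three-way merge of one file; returns (content, has_conflict)."""
    if ours == theirs:
        return ours, False      # both sides agree (or neither changed)
    if ours == base:
        return theirs, False    # only the other side changed the file
    if theirs == base:
        return ours, False      # only our side changed the file
    # Both sides changed the file differently: stop and ask a human.
    marked = f"<<<<<<< ours\n{ours}=======\n{theirs}>>>>>>> theirs\n"
    return marked, True


base = "Hello World\n"

# Only upstream changed: taken automatically.
print(merge_file(base, base, "Hello MidCamp\n"))

# Both changed differently: conflict, the merge pauses.
content, conflict = merge_file(base, "Hello Chicago\n", "Hello Denver\n")
print(conflict)
print(content)
```

The common ancestor is what lets the merge tell "only one side changed this" apart from "both sides changed this", and only the second case needs manual resolution.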

STUDENT:
Very helpful.

LECTURER:
Awesome. Love it.

STUDENT:
So most of these presentations that I've seen don't go into what Git does in the background.

LECTURER:
Yeah, this is a lot of fun, (CROSSTALK)

STUDENT:
Figuring out what is it doing and why am I.

LECTURER:
Yeah.

STUDENT:
This problem.

LECTURER:
Yeah, exactly. Yeah. And I mean, like I said, I was really surprised that it's not like storing diffs. It's just storing whatever the exact state of your code is at any given time.

STUDENT:
That seems so surprising. Like, counterintuitive, because it seems like that would be more expensive than SVN, which does store diffs, right? (UNKNOWN)

LECTURER:
I didn't look. I think SVN is also snapshot based. I'm not sure.

STUDENT:
It makes sense though because otherwise you'd have to compare tons of diffs like commit a commit d commit c and you'd have to store the diffs for each variation. That's a snapshot, makes sense and then need to run a function or something to get the diff. And that makes sense. The contents of the files are stored as hashes, not.

LECTURER:
Yeah. As blob objects with a specific hash.

STUDENT:
Yeah. (UNKNOWN) question should go back from earlier from one of the first two bits where you're pointing to the blobs.

LECTURER:
Oh, yeah. Love that. I'm just gonna leave that here.

STUDENT:
And the main question I had, you had both the README.txt and the index.html, pointing to the same blob.

LECTURER:
Right.

STUDENT:
And I just don't understand why. I don't quite understand that.

LECTURER:
Yeah, I can get to that really quick here. Oh, technology. Oh, there it is. So, see, in our last commit here, they are pointing to the same blob, right? We had two different files. And that's because those two files both have the Hello World string. So the hashed blob object is just the contents of the file. It's not gonna store the file name or the permissions or anything. It's just the literal contents of that file. So since those two files have the same content, they have the same hash. So they point to the same place.
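A small sketch of why two files with the same contents share one blob. It assumes a SHA-1 repository, where a blob ID is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the raw contents. The exact bytes in each file are an assumption for illustration, since the slides' contents are not reproduced here.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute a Git blob ID: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# File names are not part of the hashed data, only the bytes are.
files = {
    "README.txt": b"Hello World\n",   # assumed contents
    "index.html": b"Hello World\n",   # assumed contents
}
ids = {name: blob_id(data) for name, data in files.items()}

print(ids["README.txt"] == ids["index.html"])  # True: same bytes, same blob
```

The file name and permissions live in the tree object that points at the blob, which is why two differently named files can reference a single stored object.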

STUDENT:
Not often gonna happen (UNKNOWN) sort of thing.

LECTURER:
Yeah. Well as you're updating history, it'll still point to that old object, right? So it's gonna keep that. If you were to, revert a file in the future, it'll point back to that old blob because of the same file contents. Cool. Any other questions? Awesome. Thank you all. This is a lot of fun to me. Hope it was helpful.

MidCamp 2023

DePaul University - Lincoln Park Student Center
