- [Dan] Hi, so this is
Understanding migration,
- [Producer] Good, you're good.
- We're good, okay.
This is Understanding migration
development in Drupal 8.
Hopefully you're in the right room.
The talk is gonna cover
some strategies and tools
that I use and we use
in our group to develop migrations.
So there's probably gonna be a lot of stuff
and none of it's gonna go too deep.
The intended audience are people that know
a bit about Drupal 8 development
but not necessarily that
much about migration.
So we will cover some high-level topics
and then kinda talk about a
lot of things that are involved
in this sort of development process.
I get a lot of value from
sort of getting awareness from talks,
so these slides are all available online,
on the session page, so if you
wanna look at any specifics,
don't feel like you have to write it down,
or take pictures or anything.
Some of the pages we'll go
through a little bit quickly
to just say, Here's a thing.
But the details of it, if you
wanna reference back later
you can kinda take a look.
Yeah, so thanks, everyone,
for coming and we'll jump in.
So, again, my name's Dan Montgomery
and I'm a senior engineer
and technical architect
at Palantir.net.
This is a picture of Palantir.
We are a, I'm gonna try to get this right,
so we are a digital
consultancy with a focus
on higher education,
government, and healthcare
with, you know, pretty
much anything as well,
but that's the focus.
And there are currently job
postings on the MidCamp site
so if you're interested
there's other sessions
that we're having as well
and there's a booth out at the front.
All right, so these are the
topics that we're gonna go over.
At the beginning, we're
gonna cover a brief overview
of sort of what is
involved with migration.
We'll talk about automated testing
and using automated
testing as a way of doing
test-driven development in a sense.
Talk about the process of
actually building a migration
and then how to plan for manual
content migration cleanup
after you kind of do that automated work
and you pass it onto content
editors that are gonna have to
fill in the gaps.
All right, so for the overview,
migrations are basically,
I might get some of this
a little bit off from
the official definitions
and Mike's here so I'm a
little bit nervous about it,
but this is how I understand things.
You basically have your source data
and you wanna take that source data
and create destination content or entities
in the Drupal 8 site.
And migration is the
process of doing that.
So what does that look like?
So your source, what
we mean by source data,
it could be source database,
Drupal 6 and 7 databases
are very well-supported.
It could be a set of flat files
or some combination of all these things.
So, YAML files,
we're pretty used to
dealing with in Drupal 8.
You can use CSV files, JSON files,
pretty much anything that you want.
The destinations
that we're talking about,
it's a pretty flexible
system that we'll talk about
but primarily nodes, like if
you're making articles or pages
translations of nodes, individual files,
how they bundle up into media
files or media entities,
it could be custom modules
that provide their own entities
like paragraphs, where if
you create your own entities
as well, that extend certain classes,
you can migrate into those.
The migration that we're
gonna be talking about
is primarily a custom migration module
that you'll be developing
and then it'll be supported
by core and contributed
plugins, essentially,
pieces from core and contrib
including Migrate Plus.
We'll kinda assume that most
of this is using Migrate Plus
as well as the core Drupal migrate module.
And then, for things like
paragraphs, you know,
paragraphs comes with
some of its own plugins.
All right, so automated testing.
The use case for this that we've done
is creating tests first
with example content.
And then running those tests,
write the migration,
keep running those tests
until the tests pass.
And at that point you've
kind of shown that
with your example content, you know,
that exists in the new system
exactly as you'd expected it.
This can also be used
to reduce regressions.
So, if you're writing custom
plugins within a system
and you wanna make sure
they're still working,
those automated tests are really helpful
to just kind of repeatedly run
throughout the development process
or maybe you've made some
changes to content types
and you wanna kinda run it again.
It can be pretty difficult
to test content migration
because of all the content pieces,
so having something to do
that in an automated fashion
can be really helpful.
We use Behat as a testing framework
that's part of Drupal 8.
Well, I don't know if
it's part of Drupal 8.
It's a pretty common framework.
We use static source data,
so maybe like a snapshot
of the database that's
from a given point in time,
that we know is gonna be consistent
and then we test the data
directly on the new site.
So this isn't how it's
being presented on the page
but just the actual data source.
So, one thing that's helpful with that is
the open source Palantir Behat extension
which lets you look at the data directly.
So we'll show some examples.
This is an example test.
So this is a feature with a scenario
and the highlighted bits are the,
kinda the important parts here.
So you can load example
nodes of a specific type,
so in this case, an article
node with a certain title,
and then you can test
field values against it.
So this is really easy to write
before you've even written the migration.
It's not looking at specific,
I mean you're just looking at field names,
but you can kinda fill those in as you go.
And you can test that just
sort of like the core value
of that field is a specific thing.
This can also work with paragraphs.
So you can test that a certain field
has a type of paragraph
and you can also test
that a certain field on a
paragraph has certain values.
So this can be really, you know,
you can kind of extend it as you need
to handle different types of fields
or different types of entities.
All right, so that's how we kind of use
the automated testing framework.
When we're actually building the migration
to pass those tests,
we, like one of the
processes that I go through
is to kinda scaffold out the module
just like I would kind of a lot of work.
We'll look at what a
completed migration looks like
so that we know kinda what
we're building towards.
Explore a lot, a lot of the
work is exploring the data,
either the source or the destination data.
And then making a map
between that source and destination data.
And then lastly, the migrations
that have dependencies
on other migrations, you know,
how do we kinda manage
those relationships?
So if there's articles that
require media entities,
like how do you make sure
that those are all kinda being
connected together correctly?
Okay, so the migration module
is gonna be made up of mostly
configuration files, just
like a lot of Drupal code
and custom plugins.
So the configuration files, most of them.
Sorry.
For the configuration
files, they're gonna include
a lot of content in them,
but the core content
that they're gonna describe
is the source, the destination
and then kinda how do
we process each field
so that we get the correct
values in the destination
from the correct values in the source.
The plugins that we might
develop are gonna match up
to those three parts of the process.
So we might develop,
we will either find or create our own
source, destination, and process plugins.
So if we have a special kind of file
that's not supported or
special kind of database,
you might create a source plugin.
If you have a special kind of entity
that you need handling for,
you might create a destination one.
And we'll cover some of
the process stuff as well.
So, just like most modules,
you know, it's an info file,
we can set up a folder
where we're gonna put
all of our configuration files
and a folder where we're
gonna put all our plugins.
We, one thing that I will usually do is
set up an uninstall hook so that,
because we're dealing with
a lot of configuration,
it's helpful to be able to uninstall that
when you want to reinstall the module.
And so just having a system
for doing that automatically
without, you know, that's
a common bottleneck
that you might hit, you
disable the module, enable it again
to try to get the configuration back
and it gives you an error
because there's a conflict,
because the configuration already exists.
So that can be helpful.
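A related approach, offered here as an alternative to the uninstall hook rather than as what the slide showed, is to add an enforced module dependency to each configuration file. Drupal then deletes that configuration when the module is uninstalled. The module name example_migrate and the file name are assumptions for illustration.

```yaml
# Fragment of config/install/migrate_plus.migration.example_article.yml
# Assumption: the custom migration module is named example_migrate.
# The enforced dependency ties this configuration to the module, so
# uninstalling the module removes it and a reinstall does not conflict.
dependencies:
  enforced:
    module:
      - example_migrate
```

The same fragment would go into every migration and migration group file that the module ships.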
You can also set up migration groups.
And that's something I'll usually do
before I even write the migrations.
And this can kinda tie
together shared configuration
including database configuration.
So let's say you have a Drupal 7 database
in your Drupal 8 environment.
You can set up the connection
details in settings.php,
then create a
configuration file that
kind of links the two and
defines a migration group
and then in each of your migration files,
you can specify that
it's part of that group
and it'll inherit that connection.
So a lot of that work is just
kind of connected for you.
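A minimal sketch of what such a group file might look like with Migrate Plus. The group ID example_d7, the file name, and the connection key migrate are assumptions; the key has to match whatever connection is defined in settings.php.

```yaml
# config/install/migrate_plus.migration_group.example_d7.yml
# Assumption: settings.php defines $databases['migrate']['default']
# with the Drupal 7 connection details.
id: example_d7
label: 'Example Drupal 7 import'
description: 'Migrations that read from the legacy Drupal 7 database.'
source_type: 'Drupal 7'
shared_configuration:
  source:
    # Database connection key inherited by every migration in the group.
    key: migrate
```

Each migration then only needs a line such as migration_group: example_d7 to pick up the connection.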
All right, so that's kind of a lot.
What does the actual migration look like?
So, these configuration files,
more or less look like this.
So, there's one file per migration.
They've got an ID, they're
part of that group,
if it makes sense.
They might have
dependencies, so for example,
like if an article's
dependent on a media file
you might, we'll come back to that,
but you might wanna add
that as a dependency.
And then they have the same sort of
destination, source, and process
that we've been talking
about a little bit.
So the destination is the first
thing that I'll usually add
and figure out like, What am I
creating from this migration?
So in this case, we're
gonna create a node.
Where am I getting this data from?
So the source in this
case is a Drupal 7 node.
So this ID, d7_node,
is an ID for a plugin
that's gonna be able to
read in intelligently
a Drupal 7 node.
And then the process
that includes, you know,
this is the destination field
and then this is the source field,
so, How am I gonna map the two together?
So this is a very simple example
where we have a title
field in the destination
and a title field in the source,
and that's kinda all that's needed.
If there's additional
plugins that are being used,
you specify that there's a plugin
and then what the plugin's name is
and then the plugin usually has
parameters that you pass in.
This is a simple one where,
if you just wanna set
a value for everything,
you can just use the default_value plugin
and just say that the type
is always gonna be article.
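Put together, the file described here could look roughly like the following. This is a sketch, not the slide itself: the migration ID example_article, the group example_d7, and the dependency example_media are made-up names, while d7_node, default_value, and entity:node are the plugin IDs named in the talk.

```yaml
# config/install/migrate_plus.migration.example_article.yml
id: example_article
label: 'Example articles'
# Hypothetical group that supplies the shared database connection.
migration_group: example_d7
# Hypothetical media migration that should run before this one.
migration_dependencies:
  required:
    - example_media
source:
  # Reads Drupal 7 nodes; node_type limits it to one content type.
  plugin: d7_node
  node_type: article
process:
  # Destination field on the left, source field on the right.
  title: title
  # Every migrated node gets the same bundle.
  type:
    plugin: default_value
    default_value: article
destination:
  plugin: 'entity:node'
```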
All right, so this is, more or less,
what we'll be creating.
But like, how do you
fill in all these values,
is what we'll talk through next.
So, the first thing you know,
we're gonna have to
identify the destination,
what do I type in there?
If possible, we want to copy some examples
and then we wanna identify the source
and figure out, like, where's
this data coming from?
I need to know what the ID is.
So for all this, it's
kinda the same thing.
Like, Drupal 8 has this
nice system of annotation
and it has this sort of
structured method for its plugins.
So a lot of these plugins
that we're, sorry.
For destinations, they're all
gonna be defined by classes
in Drupal 8.
And you can find a lot of these classes
in the Drupal migrate module.
You can also find a lot in other modules
by looking in this common path,
src/Plugin/migrate/destination.
So, this stage is kinda
like an exploratory phase.
It's like you can either look online
or you can just kinda look
around in your code editor
and try to find like,
What are my options for creating things?
Do any of these, you know, make sense?
Like, if I'm tryin' to
migrate into a node,
like, is there one that
exists that it's called node?
Or if it's a paragraph, does
it, is it called paragraph?
And then this @MigrateDestination
annotation
is another way you can find.
That'll be present in all the
classes that kinda define it.
So once you've found a
class that makes sense
or that might make sense,
you'll look for this
part of the annotation
that says, This is the ID for that.
And this is the name that's
used in those migration files.
So, the ID, yeah.
So it'll say id equals,
in this case, url_alias,
so if you're creating URL aliases,
this is the one that you want to use.
And you can find example
migrations that might be
in the code base that
people have written like
test migrations, to say like,
Okay, this is the source, so
if we go back a couple pages
and say like, the plugin,
in this case it would say
URL alias, you might be able
to find one that somebody
has already written and use
that as a starting point.
So, you might not always
find ones that match 100%.
So, one thing that's
important to be aware of is
derivative classes.
So, they're very common,
particularly because entities
and any entity revisions, like nodes,
use this deriver class.
So before we saw that it was,
I think it was called entity:node,
and so, in this case,
we have this colon here
and that's indicating that
the ID is actually the entity
and then the parameter
it's getting is node.
So, in this case, you're
gonna find an entity class
but you're not gonna find
one that's just called node
and you're not gonna find
one that's called article.
All right, so that's
something to be aware of.
Once you do find that, yeah,
you'll see, for example,
entity or entity revision and
then the colon and the name.
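In a migration file, a derived destination ID might be written as below. This is only an illustration of the base ID plus colon plus derivative pattern; a real file has one destination, so the alternatives are left as comments.

```yaml
destination:
  # Base plugin ID "entity", derivative "node".
  plugin: 'entity:node'
  # To write node revisions instead:
  # plugin: 'entity_revision:node'
  # A plain, non-derived ID, like the URL alias destination:
  # plugin: url_alias
```

The quotes are there because of the colon in the value, which YAML could otherwise misread.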
So once you do that, you
can do a search for plugin
just like we saw before with the ID
and you'll probably find
something that already exists.
In this case, with the URL alias,
there's something called d7_url_alias
and that's a good example
of like how you migrate
URL aliases from Drupal 7 to Drupal 8.
And so that's gonna be
a really useful example
to copy from.
If you can't find one in there,
you probably can find something online
that has a good starting point.
So, the same process is what you'll do
for identifying the source.
So pretty much, you
need to find, you know,
this is what it looks like,
and you're gonna need to
understand what the ID is.
And the ID can be found
in the same sort of places
with the same annotations
as the destination classes.
All right, so, once we
kinda make that scaffolding
like, what's the process
of running a migration?
So when we're developing
migrations, we run them
and then we kinda rerun them repeatedly
like, as we add additional fields,
as things are working, and
things are maybe not working.
So, a lot of this can
be done pretty easily
with Drush.
There's the admin interface
that you can use as well,
but Drush has a lot of useful tools.
So, probably the most
important one here is
drush migrate:import, or
the short version is mim.
And then you can list the
name of your migration
which was the ID that you defined.
And it'll just run the whole thing.
So that's pretty useful.
If you need to rerun it, you could,
cuz maybe you've updated the
files since its configuration,
an easy way to do that is
to uninstall the module
and then reinstall the module
and you'll get an update.
If you don't do that, you can make changes
and it's not gonna reimport your changes,
so you won't see your
migration get updated.
Another way to do this is
to use a contributed module
called Config Devel which
will provide an easier way
of reloading that configuration.
It has a screen where you
can put in all the files
that you want to reload.
And then whenever you
load a page in the browser
it will just reload them for you.
So that goes a lot quicker
than the other process.
So that's one way to speed things up.
You can also speed things up
by limiting the data
that you're migrating.
So, in this case, we
talked about sample data.
So some common parameters
are doing things like
specifying a list of source IDs.
So you could say, I only
wanna import source 123
because that one has a lot of the fields
and I don't need to process all of them
cuz that takes too long.
You can also limit it, in this case.
This could be slow because the way,
it is faster than the
alternative, but it can be slow
because it reads through all
of the source data fields
until, all the source entities,
until it gets to that specific one.
So there's actually a way
that I will usually use
which is modifying the
source plugin directly
either in a contrib module,
which is probably not the best idea,
or by making my own extended
version of the plugin
and then adding a condition to say,
as part of the query stage,
I'm only gonna ever query that one node,
and then I just delete
this line at the end.
So that kinda speeds the whole thing up
because the system's only ever aware
of that one piece of content
that I'm interested in
and the whole thing just
runs very, very quickly.
All right, so we kinda have
in this sense, like we
know how to run migrations,
we know like the overall structure,
but before we can make that mapping
we need to really understand
the destination and the source.
So the ways that you
can do that, you know,
you can open up the admin structure page
and just kinda look at the
content in your Drupal 8 site
and start listing the fields, you know,
what are the fields?
What are the field types
that I need to create?
Devel has a really nice
way of having a tab.
You can create an example piece of content
and then you can go look at that
and see what's there.
So this is an example of the devel output
and you can see that there's a body field
and it has a value and a format.
And the summary is sort
of an optional field.
So we know that, if we're
gonna be migrating into this,
this is basically the structure
that we're gonna need to create.
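Knowing that structure, the process section for the body field might be sketched like this. It assumes the d7_node source exposes body as a list of items with value and summary keys, and that basic_html is a text format on the new site; both are assumptions to check against your own data.

```yaml
process:
  # Sub-fields of the destination are addressed with a slash.
  # Assumption: the source row holds body as a list, so 0 is the first item.
  'body/value': 'body/0/value'
  'body/summary': 'body/0/summary'
  # Assumption: basic_html exists as a text format on the Drupal 8 site.
  'body/format':
    plugin: default_value
    default_value: basic_html
```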
So, devel has a limitation
that those tabs only show up
on page-based content.
So if you have pageless
content like paragraphs,
there is a tab, if you
wanna use the browser,
you can enable devel and PHP
and you can just paste PHP in there.
And you can start printing messages.
So this is something that's
useful for like paragraphs
where you're loading a specific entity
in a specific paragraph
and it just prints it to the screen.
So that's helpful.
You don't need to install any
extra tools except for devel.
Another option is
you could use,
if it's a Drupal site, you
could use those methods.
If it's a non-Drupal site,
you can't use those methods
for the source, like,
oh sorry.
If you're exploring the source
you could use the same methods.
They're available on Drupal 7 or Drupal 6,
but they're not available
for maybe, you know,
a WordPress site or a CSV file.
So a common pattern that we use is to use
a debug process plugin.
So creating process
plugins is pretty simple.
The key is that you provide
an ID, you create a class,
and then you have a transform method.
And this takes in a
value that it gets passed
and then you kinda do something
to it and return a value.
We're not really using
it for that in this case.
We're just printing to the screen
or to the terminal, whatever
is the content of that value.
So this can be used to say,
Hey, we're gonna have a field name,
and we're gonna get this
field that's called name
and then we're just gonna print it.
And so when you run that Drush command,
it's just gonna print
every value that it hits
to the screen in sort of
like a nice-ish format.
And you can look at, you know,
what is the source data that I'm getting?
What was the destination data?
How can I match them?
This is the same, this is an option for it
in that specific one that we created
where you can also just print
what are all the values of the row?
Instead of just the specific one,
I wanna know what the
whole content looks like.
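Wiring such a plugin into a migration might look like the sketch below. The plugin ID example_debug and the print_row option are hypothetical names for the custom plugin described here, not something that ships with core or Migrate Plus.

```yaml
process:
  name:
    # Hypothetical custom plugin: prints the incoming value to the
    # terminal and returns it unchanged.
    plugin: example_debug
    source: name
    # Hypothetical option: also print every value in the source row.
    print_row: true
```

Because the plugin passes the value through, it can be dropped in and removed again without changing what gets migrated.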
So the third option is to use Xdebug.
So if you use Xdebug as a debugger,
this is a really good
option that you can use
with migration.
Things to keep in mind are that
you're gonna need somewhere
to put a break point.
So process plugins are
really a good place for that.
You can add that debug
plugin and then you can
put a break point in it.
Because it's Xdebug, one of
the easier ways to trigger that
is to run the migrations
through the user interface.
So mostly I run them through the console,
but if you run it through
the user interface,
you can add that URL
parameter, start up Xdebug,
and then trigger the migration
and it'll stop where you are
and you can look at the destination,
you can look at the source,
you can look at the values,
kind of explore anything as you want
instead of kinda
continually rerunning things.
All right, so once we have an idea
of the source and the destination,
we need to kind of map the two.
So the first thing I do
in that process is say,
Here's all my destination fields,
I'm gonna just list them all
in that configuration file.
Once I've done that, I kind of
look through the source data and then
piece by piece map them up.
Refresh the page, run the migration again.
And then just see if that's working.
So like, field by field,
make sure each one's going.
You can use that automated
testing framework
to do that a little bit quicker for you,
and then just keep running
that automated migration
and eventually it's gonna
say, Everything passed.
And you have a pretty good
idea that things are working.
Yeah, so at this point, the key thing
that you're gonna be dealing with,
cuz you've already identified
the source and the destination
is process plugins.
So just like source and
destination plugins,
process plugins can be
found in the same way.
And there's a lot of
really good documentation
online for that.
So here are some links to drupal.org.
They'll define every single version
of the process plugins that you can use,
examples of how those can be used,
what the parameters are, things like that.
If you write a custom one, that could be,
if you're not finding something
that's working for you
or it's becoming way too complicated,
custom process plugins are
not that hard to write.
And you have access to things
like the value that
you're being passed in,
and also the whole source row.
So, if you need to combine
multiple source fields
into a specific value,
that might be a way that you can do it.
There are ways that
you can kind of combine
existing process plugins,
but if you're struggling with that,
it might be worth trying
a custom process plugin.
All right, so, yeah.
The last topic, sorry,
second to last topic
is managing relationships.
So, let's say you've got
your migration completed
and you've written the node migration
and it needs those
media entities to exist.
So there's kind of like two main ways
that you can go about doing this.
You can either specify
them all as dependencies
and say that I need to run
them in a specific order.
So I need to run that
media migration first
so that when the node migration is running
and those references exist,
it can find the other entities
and it can recreate those references.
That's definitely a valid option.
There are some limitations.
Another option is to say, I'm
gonna run them in any order
and I'm gonna use the stub process.
So when you write those
references you can say,
If it doesn't exist just create a stub.
And it's an empty version
of that with just an ID.
And then, let's say you
ran the article migration first,
it'll create those stubs
and when it runs the media migration
it'll see that there's
already a stub created
with the ID that matches
and it'll just fill in all
the fields for that content.
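In configuration, that reference is usually handled by the core migration_lookup process plugin, roughly as sketched here. The field name field_media, the source property legacy_image_id, and the migration ID example_media are made up for the example.

```yaml
process:
  field_media:
    # Looks up the destination ID that example_media produced
    # for the given source ID.
    plugin: migration_lookup
    migration: example_media
    # Hypothetical source property holding the legacy ID.
    source: legacy_image_id
    # A stub is created by default when nothing is found yet.
    # Uncomment to turn stubbing off:
    # no_stub: true
```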
So, some reasons that you might,
some things that you might
hit in that case is that
you know, maybe neither of those works.
And in that case you
might just run migrations
multiple times with the update flag.
One particular case where I
found this to be very helpful
is if nodes are referencing other nodes.
It might not know what
kind of stub to create
and so you might wanna just
have things run multiple times.
And that can kind of, you know,
not create stubs when they're not needed
and it just kinda fills stuff in.
Yeah, so, the last piece of this is that,
eventually like you're
gonna write your migration
and you're gonna pass it on,
and it's gonna be somewhere between like
nothing was migrated
in an automated fashion
and everything was, but it's
never gonna hit that everything
so a thing that we found valuable
is to tag content as we're migrating it
and to create a vocabulary
for that purpose.
So we create a vocabulary, create a field
on all the content types
that we're migrating
and then as we're
processing those entities,
identify things like,
this has embedded HTML,
or this has an image,
or this has a link
that's an absolute link.
And these are flags that
people can come back to
as they're doing content editing,
and kind of filter all their content
and then say like, Oh,
I need to review this
one by one, by hand.
Yeah, so,
that is it for now.
Hopefully I've left
enough time for comments.
Again, my name's Dan Montgomery
and I wanted to kinda
open it up to questions.
Thank you.
(applause)
- [Participant] One
quick question I have is,
is there a good resource
online where you can
find examples of, cuz, I know
you did a lot of high-level,
but if there's, you know, some examples of
how do you write the code for
migrating from seven to eight?
- [Dan] Yeah, I would recommend,
I think there are a
lot of good blog posts.
So there's some information
I think on drupal.org,
a lot of the references
I mentioned are kind of
like API documentation.
So it's like you have
a list of the plugins
and then you would kinda
look at specific things.
A lot of the Drupal 7 migrations
also have documentation
within, like, within those migration files
so they provide an example of like,
Here's a very basic version
or representative version of doing like,
this type of migration.
And you can usually just copy those over
and then change some
things that apply to you.
I don't have, nothing's jumping out like,
for that particular purpose,
but I think there is a lot
of stuff around that specific topic.
Cool. Mike.
- [Mike] How, so you know, when
you have the node migration,
you know, then you need to
create a media migration,
you have media, what's your
process for figuring out
how many different migrations you need?
And then, part of that is
if you have to do estimate,
estimations, for like
how long the migration's
gonna take you to build.
How do you work that out?
- [Dan] Yeah, migrations
are really tricky
for the estimation process.
I think for the number of the entities,
it kinda comes down to like,
yeah, it really, it really depends.
I think there is a lot of
similarity between projects
but some of the things,
is like you're splitting
apart content types or
you're combining things,
there might be tricky
ways that you can like,
write one migration
that dynamically handles
a bunch of things at once,
or just like split 'em up to keep things
a little bit easier to read.
I think there is some like,
figuring what works for
you and iterating on it.
In terms of the estimation,
I think when projects,
like there's typically a budget
that's associated with migration.
And you can give like a high-level,
These are the primary
content types based on
you know, this is the quantity of them
and the complexity of them
that you wanna allocate to migration.
And I think there's
just a lot of discussion
that has to happen of like,
Okay, if we were to
spend more effort, like,
what are we gonna get the benefit from?
And making sure that that works
with the clients that you have.
So, it's not like a great answer,
but it is definitely one where,
it's a very different thing
from like, Drupal development,
and so I think there is
a lot of communication
that has to happen with
the content editors
who might not be the people
you're normally talking to
on a project.
Yeah.
- [Participant] I have
a blog post that kinda
talks about that, so
how to manage this, so,
come find me if you wanna
read it, or be pointed to it.
- [Dan] Yeah, thanks.
- [Participant] Have you ever
run across a scenario where
your Behat tests are looking
at the actual database content
but the Drupal formats are filtering out
so you're not getting the
same result that you set,
the taxonomy that you talked about?
Flag those or how do you do that?
- [Dan] Yeah, that's a really good point.
So, like, we're,
as a developer at this
stage of the process,
we're interested in like,
the value came across.
But yes, it is really
good to kinda work
with the content team
and let them know that
what's presented on
the page passes through
like several other stages.
And so you might think that
all your content got deleted
but you need to look at
like, the source code
or something like that.
So I think, I might have
missed your question,
but I think the thing that
I would recommend doing
is just kind of working with that team
and as they identify
examples for you to work with
like, they might provide
examples that you're then using
and they're expecting to see things,
so as long as you have that
like shared set of examples
and show them like, this is how to look,
this is how to verify it on your end,
cuz they're probably not gonna run
those automated migrations,
although you could show that
as part of a demo, and
clients usually like to see
things that they don't
necessarily understand,
but they kind of see the things working
and they're like, Oh, that's cool.
I don't know if that
answered your question.
- [Participant] I think he
was, sorry, you were asking if
the Behat test itself is
testing for live content
- [Dan] Yeah
- [Participant] that
might have changed so,
I think that answer to that...
- [Dan] The answer for the
ones that I use is, No.
But you could definitely do it.
So those are just like, the
standard Drupal Behat libraries
will test for what's
presented on the page.
So if you wanna test for those as well,
I think those would be good tests.
But, yeah, in this case, because, maybe
you're doing it before
all that work's been done
or because, you know,
there's all that other stuff.
Yeah, I think it's just
something to keep in mind.
All right, I think we're at
the time, but I'm also around.
And, you know, happy to
answer questions and stuff.
So, yeah.
Thanks, everyone.
Captions made possible by ClarityPartners.com
Chicago area Drupal consultants
(applause)