>> APRIL SIDES: Hey, everybody. And welcome to custom Drupal data migrations: A Georgia GovHUB story. My name is April Sides. I'm a senior developer at Lullabot, and I've been working at Lullabot for about a year and a half now. The GeorgiaGov project was my first project and my first migration, so I'm really excited to show you what we did.
I'm also the lead organizer of DrupalCamp Asheville, which is currently scheduled for July 10th through 12th this year. So visit DrupalAsheville.com, and follow us on social media to stay up to date on what's to come. We're discussing options at this point, but there are still a lot of unknowns. So, let us know what you think of the virtual MidCamp, and we might look into doing something similar.
So, at Lullabot, we provide strategy, design, and Drupal development for large-scale publishers. So if you're interested in working with us, reach out to me in the MidCamp Slack.
All right. So, first, I want to introduce the migration team. Me, of course. Karen Stevenson, who is our chief technology officer. She had a lot of migration experience and provided a lot of guidance and direction. Marcos Canno is a senior developer. He did all the file migrations and was instrumental in keeping our code clean and organized. Darren Peterson, senior technical product manager, was our fearless leader and was key to keeping us moving. And James Sansbury, who is now a Tugboat technical account executive, was responsible for the DevOps magic and made all of our work possible.
Also want to give a shoutout to the Digital Services Georgia team. They're definitely one of my favorite clients, for sure.
So what we'll cover. We'll talk about the discovery and planning, the strategies and workflow, and a little magical Nerdery.
So, discovery and planning. In a migration, we move content from a source to a destination. In this case, the source was a Drupal 7 multisite with 85-plus sites. Each site had its own database, which was actually very beneficial in this move. It was hosted on Acquia. They had 27 content types, 15 of which we migrated content from, and 14 taxonomy vocabularies, nine of which we migrated. And the architecture was paragraphs, field collections, and entity embeds.
So our destination site is a Drupal 8 multisite, and we were migrating the sites in groups of about six at a time. It is also hosted on Acquia.
There were 20 content types in the Drupal 8 site, 14 of which we populated through the migration, and 17 taxonomy vocabularies, ten of which were populated. And we moved the architecture to more of a microcontent-types-and-entity-embeds style of architecture.
So, microcontent types are actual content types, but they're not viewable as standalone pages; they're only viewable as entity embeds or references. So instead of using paragraphs, we used content types in this format.
So, this is a very complex migration. Mostly because it was multisite, and it had a completely new architecture.
So if you're looking to migrate, don't be scared off by this particular migration; it was a very complex one. So, kicking off with the discovery, Karen Stevenson collected all the field instances for all entity types using a script, then converted that to a Google sheet, and this was very useful in determining our source fields for field mapping.
At the same time, our content strategy team delivered content model documentation for the Drupal 8 architecture. So this documentation was used to manually build the new content types in Drupal 8. And we also used it to help determine what our destination fields were going to be for field mappings.
So this document was created to track the field mappings. This screenshot is from toward the end of the project, so you're seeing which field is going to which, and whether it has migrated. We worked in phases, going from basic to complex, so we could keep a high-level track of the status of the field migrations for each content type.
So this is actually a new iteration of the field mapping documentation that we're using on a current project. So this one is based more on Karen's field instance document, and we added some conditional formatting, so you can see that one of these fields is going to be a new field, so it highlights in yellow. Some of these things aren't migrating, so they're grayed out. And there is more duplication on the content on the left, but it was more focused on field mapping or field migration instead of entity migration.
But, you can also in this case create these filter views, because we pulled it into Google sheets, so you can create a filter view for each content type, and so you're only seeing those field rows when you're looking at it.
So, I actually used this document during the discovery phase of this project, and I was able to hand it off to our developer with clear directions on what needed to be migrated, and it's also been useful to share with the client to make sure that our assumptions are correct, that we're migrating everything that they want us to migrate. Yeah.
So I actually created a module on Drupal.org called Migration Planner, which provides a command that will generate this kind of document for your site. You can run it in Drupal 8 with your Drupal 7 database connected through the settings.php file, and it will create the tabs for various entities and give you at least a good starting point. I also have a Google Sheets link on the Drupal.org page that you can use to copy the formatting if you want to use the conditional formatting.
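For reference, this is roughly what that Drupal 7 source connection looks like in a Drupal 8 site's settings.php. The credentials below are placeholders, and 'migrate' is the conventional database key that the Drupal 7 source plugins reference:

```php
// In the Drupal 8 site's settings.php: a second database connection that
// points at the Drupal 7 source database. Credentials are placeholders.
$databases['migrate']['default'] = [
  'database' => 'drupal7_source',
  'username' => 'dbuser',
  'password' => 'dbpass',
  'host'     => 'localhost',
  'port'     => '3306',
  'driver'   => 'mysql',
  'prefix'   => '',
];
```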
So there's a dev branch available, so I encourage you to check it out. Let me know how it is.
So another tool that we developed for the discovery is a custom command called SQueaLer. It runs in the Drupal 8 site using the Drupal 7 database, configured in the settings file similar to the one I just talked about. It runs scans to determine whether there are going to be any issues during the migration, identifying things like image tags or absolute links. We had a scan for each different type of case that we were looking for.
And so we used this to determine if things needed to be changed in Drupal 7 before the migration, or just to inform the migration logic to identify edge cases that we needed to consider.
So as we went along, the list grew, and we had other questions throughout the development process, so we could just add new reports and scan each site and integrate it into part of the migration workflow, so we could scan the site, make that available to the client, and then they could take a look at any issues that they want to fix, and we could take a look at identifying, okay, this one didn't migrate properly. I can actually go and force the migration of this particular ID and see what's going on.
So you would just run it in Drush, and it generates an Excel file, with a tab for each report, in the site's files directory, in a folder called GA SQueaLer. It was a custom module.
So the report would look like this. The goal was to trace any issues back to the nodes. That's what the alias links are for: they trace back to the node where the problem or the edge case lives, and then you can investigate from there. And you can see we have different tabs at the bottom for the different reports.
If nothing was found for a particular scan, there wouldn't be a tab, so you'd see only the reports that found something. We put this in Google Sheets so we could share it with the client.
So I've created a project space for SQueaLer. It's not yet ready for primetime on just any Drupal 7 site, but I am working on that, so stay tuned.
All right. Strategies and workflows. There are things that we did not migrate and things that we did. We did not migrate structural elements: the content types and field definitions, the vocabularies themselves, or the paragraph and field collection bundles. We didn't migrate views, and we didn't migrate webform submissions. We did migrate the data elements: select nodes and field data, select taxonomy terms, menus, webforms, and files.
So this is a list of the migration modules that we used. We used the core migrate and migrate Drupal modules. And then these are the contributed modules that we used. And I added migrate source UI on here. This was used because the client wanted a way to import data from CSV files during and after the migration so that they could manage data that way. So it wasn't integrated necessarily to part of the automated migration of the site, but was something that, you know, used the migration system.
So you can see we have a ga_migrate_source_ui custom module as well; it's just how you create a migration that's selectable, so they can upload the data and import it that way. Our main migration module was ga_migrate, and the second, ga_migrate_site, was used for site-specific overrides, which I'll talk about a little bit later.
For our development tools for local development, we used Lando. Our QA environment was Tugboat.qa. And the DevOps magic was CircleCI and Quay.io, used to manage our Docker containers for Tugboat and local development.
Tugboat is a Lullabot product. Each pull request that you submit generates a preview site, so you can test your code prior to merge or show it to the client for review, and it was very much integrated into our workflow for this project.
So as far as our development workflow, we worked through field mappings by complexity and content type. So we would migrate, you know, blog posts with just the basic fields, like title and maybe some basic text fields, and we would submit a PR for that and have it reviewed, approved, and merged. Then we would do another round where we would come back and take a look at rich fields or files and images. We just moved down the pike of complexity. That way the client had something to look at, they could see our progress happening incrementally, and we could keep our PRs small and reviewable.
So our migration development strategy: we created and edited the migration configuration files directly in the config sync directory. We preserved the node IDs for standalone content. We migrated unpublished content because we didn't want to have broken links and menu issues. And we wanted to prioritize the ability to roll back and reimport. We were doing this in groups of six, with the migration running across 85 different sites, so we wanted the ability to roll back the migration. That's why we didn't use things like entity_generate as a process plugin: entities it creates aren't tracked by the migration, so they can't be rolled back.
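To make that concrete, here's a minimal sketch of what one of those hand-edited migration configuration files could look like. The migration ID, group, and field names are illustrative, not the project's actual config:

```yaml
# Hypothetical file in the config sync directory.
id: ga_d7_node_blog
label: 'D7 blog nodes'
migration_group: ga_migrate
source:
  plugin: d7_node
  node_type: blog
process:
  nid: nid          # preserve the Drupal 7 node ID for standalone content
  title: title
  status: status    # carry over published AND unpublished content
  created: created
destination:
  plugin: 'entity:node'
  default_bundle: blog
```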
We used ga_migrate_site for specific site overrides and log skips and exceptions using a custom logging solution, which I'll show you in a second.
Our solution order for how we tried to solve a problem: first we tried to do it with configuration, using core and contrib modules. If we couldn't do it there, we would look at custom source and process plugins and services in the ga_migrate module. If we couldn't do it that way, that's when we defaulted to using the hook system.
Custom logging. This is something that was very helpful for the migration. We were able to share logs with the client while migrating all of these sites. Marcos Canno developed it so that we could track skips and exceptions throughout the migration process.
So we used this function if our logic skipped a row, if we felt we needed a record of something, or if something unexpected occurred. All sorts of varying cases, with varying severity. We had this function, ga_migrate_log, that we could use anywhere in our code, and it let us track down issues. It could provide a detailed message with any IDs that could help us track down the problem, plus the migration ID of the current migration that's running. The audience was who needed to look at the issue: the client, Digital Services Georgia, or the dev team. Sometimes we just wanted a record that something had happened.
And then severity: was it a warning, notice, or error. The category was just a short descriptor that we could use to sort things out. And the row ID: the current item being migrated.
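Putting those parameters together, a call might have looked something like this. This is a reconstruction from the description above, not the project's actual code; the function name is the one mentioned in the talk, but the argument order and values are my guesses:

```php
// Hypothetical example call; arguments follow the parameters described.
ga_migrate_log(
  'Skipped paragraph 4521: unknown bundle "legacy_slideshow".', // detailed message with IDs
  $migration->id(),   // ID of the migration currently running
  'dev_team',         // audience: Digital Services Georgia or the dev team
  'warning',          // severity: warning, notice, or error
  'unknown_bundle',   // short category used to sort the report
  $row_id             // the current item being migrated
);
```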
There's a Drush command along with this module that created a TSV file from a table of messages. We would import this into Google Sheets, and it was useful for tracking down those edge cases. You could filter by various things, you could look up an ID, and it helped us fine-tune the migrations. So how do we migrate 85-plus sites? The answer is: in phases.
So the migration phases were staggered in groups of about six sites over two week periods. In each phase, this is an overview of the procedures. So we would add the new sites to Tugboat and remove the old group of sites. We'd run the migrations on Tugboat. And then let the client QA the migration on Tugboat. Any issues that they found they would give to the development team. The development team would fix those issues. Then the migration would run on production and the client would prepare the site for launch, meaning any manual migrations, any new content that needed to be created, and then the site would launch.
All right. Magical Nerdery time. Karen told me this was the most complex migration she'd ever done. Most migrations are not this complicated. But I wanted to pick out some of the more complex problems that we solved and show you what we did, so it's time to put on your technical hats.
Here we go. Site specific overrides. So you've got 85 sites, there's going to be some sites that want to do things a little bit differently. These are all government agencies. So I'm going to give credit to Karen for the solution, she came up with this.
So we have a ga_migrate_site module in our top-level modules directory. That module is mostly empty. It had an interface with some constants that we were using in some of our logic, but mostly it was empty.
So, if any site needed to override anything in the migration process, they would copy that module into their site specific module's directory, and from there, they can make changes. They could use the hook system to make some modifications to the way that the migration was happening.
So, since the overriding module is in the site-specific directory, that's the one Drupal is going to find for that particular site. All the other sites will be using the top-level base module. And we set it up so that the ga_migrate_site module is enabled for all sites and is set to run after ga_migrate so it can override.
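This works because of how Drupal discovers extensions in a multisite: a module found in a site-specific directory takes precedence over one with the same name at the top level. Sketched out with a made-up site name:

```
modules/custom/ga_migrate_site/          <- base module, mostly empty, used by all sites
sites/example.georgia.gov/modules/
    ga_migrate_site/                     <- overriding copy, found first for this one site
```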
This is really the only code that was in the module: an interface that just had some constants. We had some logic that would change the state of press releases after the press release migration if they were older than three years. We also had a skip list where you could say which nodes needed to be skipped; that was mostly used if a node page was being replaced by a Views page. So just a few things there.
And then the .module file was actually completely empty, but here's where, if you needed to tap into the migration process, you could use the hooks. All right. How are we doing? Thumbs up? Yeah? I'm not seeing your reactions. But this is Zoom.
All right. Nested paragraphs. So, in Drupal 7, the site had two types of paragraphs, container paragraphs that were used for layout purposes, your one column, two column, three column, four column.
And then they had other content or other paragraphs were called content paragraphs. That's what I identified them as. Things that would hold the actual content, like an image, the content reference, the text area, things like that.
So if we look at an example in box-diagram form: we have a node, and the node has a field, like field content. Field content holds container paragraphs, and those container paragraphs hold the actual content paragraphs. So, fun stuff.
So our paragraph strategy was to convert the paragraphs into Drupal 8 mark up, including entity embeds and stack that new content in the body field during the migration.
So in the migration configuration, we processed field content using a custom process plugin called GA microcontent to text, and we stored the result in a pseudo field, or temporary field, which is prepared field content in this case. It's kind of like a custom variable specific to this migration. So we had that rendered content. And what do I mean by rendered?
So what I wanted was: what should the Drupal 8 markup be for this paragraph when placed in the WYSIWYG body field? Is it text markup? Entity embed code? A file download link? For the container paragraphs, are there alignment adjustments we need to make sure carry over to Drupal 8?
So using a function that can call itself, the process plugin iterates deep until it hits a content paragraph, renders at that level, and then renders its way back out until it reaches the field level.
So in this case, the first thing that gets rendered is the related links paragraph. Then the image, and then we come back out to the container paragraph, because maybe the image needs to be aligned right or left. Next, we go inside to the text area, render that, and come back out to the one-column container paragraph.
And then we take all of those strings and smash them into one value for field content. Then that string can be concatenated into the body field. This is how you reference that pseudo field, with the at symbol, and it's placed above anything that might already be in the body field from Drupal 7.
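In migration configuration terms, that looks roughly like the sketch below. The pseudo field is just a process destination that no real field claims, and core's concat plugin joins it with the original body. The plugin and field names are approximations of what the talk describes:

```yaml
process:
  # Pseudo field: exists only during this migration; no real field uses it.
  prepared_field_content:
    plugin: ga_microcontent_to_text   # the custom plugin described in the talk
    source: field_content
  'body/value':
    plugin: concat
    source:
      - '@prepared_field_content'     # '@' references the pseudo field above
      - 'body/0/value'                # existing Drupal 7 body content goes below it
  'body/format':
    plugin: default_value
    default_value: full_html          # placeholder text format
```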
Last one. Circular dependencies.
This is an example of a site page that we were migrating. It had a field called field related links, and we wanted to migrate that field data into its own microcontent node called a link collection. And then we wanted to reference that link collection in the body field when we migrated the site page.
So we broke it out into a migration of site page with just that field related links, migrating to a link collection node.
And then when the site page node migration happens, we can use the node ID to look up the link collection node and then generate the entity embed code and put it in the body field, then migrate it to a topic page node.
So we did the pseudo field again. This is the prepared field related links; this is in the site page migration. We're doing a migration lookup to that field related links migration, using the node ID, and then we're using another custom process plugin to figure out what the entity embed code needed to look like. So that is now a string value in the prepared field related links pseudo field. We take that value and concatenate it to the bottom of the body field in this case. So our dependencies look something like this: the site page is dependent on the field related links migration, because we want that link collection to exist so that we can reference it and inject it into the body field.
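As a sketch, the site page migration's process section could have looked something like this, chaining core's migration_lookup into the custom embed-code plugin. The plugin and migration IDs here are assumptions, not the project's actual names:

```yaml
process:
  prepared_field_related_links:
    - plugin: migration_lookup
      migration: ga_d7_field_related_links   # the link collection migration
      source: nid                            # look up by the D7 node ID
    - plugin: ga_entity_embed_code           # custom plugin: builds the embed markup
  'body/value':
    plugin: concat
    source:
      - 'body/0/value'
      - '@prepared_field_related_links'      # embed code appended at the bottom
```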
But as we got into the migration, we found an edge case. We found the catch. So what if the site page is linking to a site page that has not migrated yet? What happens is that we end up with a null or empty link. We lose those links when we're migrating field related links.
We added this to the field related links migration. The only purpose for this pseudo field is to stub content for the site pages. We're not even going to use this value; we just want the stub process that the migration system has, where it will generate the node with a garbled title or whatever. But when the site page migration runs afterwards, it's going to fill in all that content, and the references will all make their connections.
So we're doing a migration lookup to see if it exists, and we're processing the references. We're just stubbing the site pages in the field related links references, if they don't already exist.
So if this happens, this is a stubbed entity that we can link to. So this link will migrate during the field related links migration, and then it will be filled in by the site page. So the tricky part here is that when we are stubbing the site pages, we're looking at the migration to see if that site page has migrated already. So we're doing a look up, and then stubbing those site page nodes using that migration look up.
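This leans on a behavior of core's migration_lookup plugin: when the looked-up row hasn't been migrated yet, the plugin creates a stub entity by default (setting no_stub: true turns that off). A rough sketch of the stubbing pseudo field in the field related links migration, with assumed names:

```yaml
process:
  # Pseudo field used only for its side effect: stubbing site pages that
  # haven't migrated yet, so links to them aren't lost.
  stub_site_pages:
    plugin: migration_lookup
    migration: ga_d7_node_site_page   # referencing this creates the circular dependency
    source: linked_nids               # assumed source property: D7 node IDs being linked to
    # no_stub: true would disable stubbing; here we rely on the default stub creation
```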
So we're referencing the site page migration. Which means that now the site page migration is a dependency of the field related links. So we've got circular dependencies. Both are dependent on each other. So that was fun.
So we found out why it works. With a non-specified dependency, the site page dependency in this case, it works if that migration's ID sorts alphabetically before the current migration's ID. Because ga_d7_node_site_page sorted before the field migration's ID, it worked. It didn't work when we tried the same thing in another case where the IDs sorted the other way; the reason was just that alphabetical listing.
So the fix, or the hack, was to prefix the ID with a 1 to make it sort alphabetically before. I created an issue on Drupal.org for this. It's probably very much an edge case, but it's very interesting, and I didn't have time to dive into it; maybe there's a way to identify all of the migration IDs up front. The issue was that the migration didn't exist yet because it hadn't been registered, since it wasn't alphabetically before. So that was an interesting case.
Oh, we made it. All right. We made it through the magical Nerdery. If you want to know more about the GeorgiaGov project, there is a case study that was presented by Darren Peterson last summer, and it's also coming to DrupalCon Minneapolis 2020. Mark Drummond also did a presentation about the front end part of this site at GovCon. We had a podcast where we talked about migrations and a little bit about this project. We had some other articles from Juampy, and we interviewed Mauricio Dinarte on his blog. More to come. I need to write some more articles. Hopefully, I'll be able to write more in-depth articles about the things I talked about today and give you an idea of how to do some similar things.
Thank you.
So, please provide your feedback. I'd love to hear your feedback. That's the link, 6294. Contribution Day is on Saturday from 10:00 a.m. to 4:00 p.m. So, yeah. Any questions? I see one.
Is GA microcontent to text available anywhere to take a look at? That was a custom process plugin. With all of my spare time now, I will try to get these articles written and share more details on how we did it. We did use some services that we shared between various parts of the migration, and hopefully I'll be able to share some of that information.
So, another question: what was the purpose of converting the paragraphs into microcontent types? I'm not sure; that decision was actually part of the content strategy team's work. We just operated on that decision: we were going to be creating these microcontent types and populating them with the content, and those nodes would then be embedded using entity embeds. So, yeah, I don't have the background on that decision, but that was the way we went.
For a D7 site with heavy use of container paragraphs, would migration to layout builder be conceivable?
So we are using Layout Builder, but we didn't migrate anything to Layout Builder. So I'm not really sure what that would look like. It would be pretty interesting, for sure. But we didn't do that.
All right. Any other questions? I think I'm out of time. I'm not sure what my time is. Have a little bit of time? Which one is better, paragraph or embedded content types?
So there's actually an article by Jeff Eaton on Lullabot.com where he talks about various ways to think about the architecture and how you want to lay out pages, things like that. I highly suggest checking that article out, because he was part of the content strategy team that decided we needed to do microcontent types. I know there are some drawbacks with paragraphs in cases like translation and, if I remember right, revisioning. There are some issues. Sometimes it works fine for you; it really depends on your use case.
Can SQueaLer and Migration Planner be used from a D7 site with Drush 8? I've thought about that. Right now, for our purposes, whenever I ran them, I was already in the Drupal 8 site and had the Drupal 7 database available. So that's just the way it worked.
It might be nice to port that back to Drupal 7 so that you can run it before you even have the Drupal 8 site available. And maybe with all the spare time I have, that will happen. We'll see.
All right. Any more questions? All right. Thanks, everybody, for joining me in my office.
>> Thank you, April!