Today we're talking about hosting Drupal on AWS. So, that's me — I think everybody knows me, right? Okay. I've been a software developer for a very long time, but I've only been a Drupal dev since about 2013. Mike and I work for Breakthrough Technologies; we're an agency that specializes in Drupal and TAO development for K-12 and professional certification assessments. I was a Microsoft stack developer for a very long time — probably the first 20 years of my career — and what has kept me in this is all of you, this community: being involved in things, being able to matter. You can't really contribute to .NET unless you work for Microsoft, and that is not the case here. I love building out this community, building out parts of it with you, and that's why I do this. I'm primarily a back-end developer — I mean, I'll theme something if I need to, but I don't take joy from writing CSS; there are people who do, and they can do it. I do everything on the command line and I'm happy, so: mostly a back-end developer. I've got some certifications, I do some contributions, and I used to be a motorcycle rider.
And this is Mike. He also works for Breakthrough and is currently a motorcycle rider. I'll let you say just a word or two about yourself here. — Yeah, I'm the person who talks to computers more than to people, so I'll stammer through some of this. I've been working with a variety of UNIX and various flavors of *nix for quite a while. I started working in AWS about six years ago, maybe almost seven, and that's mostly what I do now in my career at Breakthrough. I do ride motorcycles now, and I'm very into home automation. — I'm into home automation too; my home is mostly automated, except for my blinds, which we were just talking about.
All right. Today we're going to be talking about hosting Drupal on AWS: kind of why we do it — I'll get to that in a minute — how we set up the environments, a brief overview of our dev workflow just so that our deployment processes make a little more sense, how we do those deployments, and then what else we need and how we set those things up. What we will not be covering is selling you on AWS. I don't care how you host — this is what we had to do for some particular requirements on a couple of projects, and how we more or less standardized on it for some of our larger projects that need any sort of scalability. So this is not AWS versus Acquia versus Pantheon versus self-hosting, none of that; this is just how and why we do this. All the platform hosts are amazing for regular websites. Most of what we're talking about here isn't just websites — a lot of what we do is custom web apps. We're going to talk a little bit about Varnish and things like that, but I don't ever use Varnish, because the only unauthenticated page you'll ever see is the login page; everything else is authenticated.
So why did we have to go to AWS? Well, we knew that we would lose some of the great features that some of these platforms have. Deploying on Acquia from environment to environment is so easy — it's a drag and drop, maybe a couple of drush commands, snip snap, it's done. Pantheon is the same way. Ours are not quite that easy, but we've got this down to a science and it approaches it — though you're going to need a terminal to do it. A bunch of our platforms use apps that aren't just Drupal. For a lot of our test-taking stuff we integrate with TAO, which is probably the premier open source test-taking and test-authoring engine. It's a PHP app that is very particular about what version of PHP it wants, what version of Redis it wants, what version of MySQL it wants — we can't just throw a server at it; it has to be set up in a very specific way. So with that in mind — that we're not just running a Drupal app here — we needed a good, consistent, manageable way to deal with all of our platforms, and this gave us that.

We wanted to make sure that we could manage our own config and infrastructure, and do that through code, with commit history, and manage it just like any of our other code. So we've got config and infrastructure in code: Chef and Terraform. And we have needs every so often for custom stuff that isn't available on some of the big hosts. For a while we were talking about a NoSQL document store solution, and we were debating between Mongo and Couchbase. Mongo now has a managed service with AWS — you can just spin it up; it's amazing. Couchbase, on the other hand, doesn't have just a nice composer-installable PHP driver; there's a custom PHP extension you have to turn on, and there are some builds you have to do — it's really a pain — and we can't just do that on Pantheon. We also have a need for more modern versions — the nicer way to say that is our preferred versions — of things like Solr. I'm sure a lot of us in here have stumbled against those old bespoke versions of Solr that the platforms give us, and it's not ideal.

We have our own dev workflow. It's git-flow based — again, we'll get to that later — but there are no pull requests in any of the remote git repos for any of the platform hosts, and that's a big part of our workflow. We've got a team lead, a dev lead, who manages pull requests and does merges based on the other team members' work. So what we would have to do is have our own remote, our own hosted private repository, and then maintain another remote and push all those things up to Acquia or Pantheon or whatever, and that's a pain.
But one of the most important things is tuning our setup for heavy use. One of my projects right now is a professional certification board's exams — practice exams — and this is one of our smaller test-taking platforms: we might have to deal with eight or nine hundred concurrent users launching tests, and thousands of tests a day, and we just can't get that sort of usage-based scalability on these other platforms. We also have, for a State Board of Education, a full statewide standardized testing system, and — I haven't checked since I was in the office, so that was a couple of days ago — even through mid-March our month to date is so far 70,000-some tests. We can't get the sort of custom scalability we need to run one of those.

Question? — Is that a problem with the test software — is it very heavy on resources? — It is. TAO is — not that Drupal isn't resource-intensive — but TAO is not optimal in any way in its use of its databases, really, and in some ways, as it accumulates data over time, it gets progressively worse. You have to adjust for that.
Another part of that is that for this project we only need that level of hardware at very particular times, and customizing your tiers of usage on the platforms isn't nice and rule-based like what we have — we'll get to that. We do a lot of cool stuff with Amazon storage and backup solutions, we do some amazing things with logs and alarms and log metrics — we'll get to some of that too; it's really cool — and we can control our firewall and network security to a really fine level, which is important to us because we like to lock these things down.
We're going to briefly talk about cost control. Again, this is not really the focus of what we're talking about here, but it was a benefit for us. Anything that we need for these couple of apps lands in the most expensive tier on the big platforms, and that didn't give us a lot of fine control — and for TAO especially, we can't just run that on one of those platforms; we need to host it ourselves.

Then there's consolidated billing. This is not something I deal with, but apparently it's very important. I know that we do occasionally build these things and then, when we're done with a project and we're handing it off to the client, we also hand off the Amazon account that goes with it, so they get to control their own billing. I know that just a couple of weeks ago we were working on a Pantheon project that we'd finished, and now the client wants to take it over, and getting that transferred over to them was not as easy as I hoped it would be. — Oh, neat — apparently there's a process for that now and it goes more smoothly.
First, backups. We have our full daily backups, and we can do point-in-time recoveries, which is super cool. I don't know if you've ever started a deploy and realized, as your database update is failing, that you forgot to back up the database. Mm-hmm. We can manage all of that and restore to just before the deployment went out, and it's great. We've had our practice data recoveries, and then an actual data recovery — which, again, was totally my own fault, but we still needed it — and it worked exactly the way we wanted it to, and we've got a lot of security there. We do these snapshots of the DB servers, of the disks, and of our cloud storage — for the most part we're using EFS, but it's true for S3 as well.

And TAO has custom backup needs we have to handle outside of the Drupal realm. The TAO backup is really cranky about what it wants: it wants its database, its file system, and its Redis backed up at the same moment in time, so we had to create a custom solution for that. We actually have to disable TAO — put TAO into maintenance mode — before we can do all of these backups, because you can't restore anything in TAO that was in use. For Drupal that's not such a big deal; our Drupal backups here are real simple — we didn't really gain anything special for our Drupal backups here.
— I'm just going to ask: are you going to cover, briefly, how you set up your databases to use RDS? — Yeah, we will talk about that.

One of the more exciting things, I think, is the log aggregation stuff — it's really cool. We can set up all kinds of custom alerts and reports and alarms for anything that gets recorded in a log, which is really neat. We'll talk about that; we have examples of it later. We grow that all the time: if someone comes up and says, hey, I'd like to know when this happens, we use a series of Terraform scripts to create those metric alarms and then start sending them to whoever asked for that information. And the cool thing is, since it's all scripted, it's in Terraform — if we need that for another project or another platform, it stays in our toolbox and it just propagates itself as we spin up new projects. So we've got distribution lists for different types of alarms — usage, 500 errors, whatever — and there are different groups of people that get those. These are just some of the things we check: CPU usage; errors and warnings (although on some projects we turn the warnings off because there are a lot of them, but the errors we always want to know about); all of our disk and memory utilization; different service statuses; and so on.
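For reference, a minimal Terraform sketch of the kind of log-driven alert described here — the topic name, log group, filter pattern, and threshold are illustrative placeholders rather than the project's actual values:

# Assumed names throughout; the log group would already exist and receive the Apache/Drupal logs.
resource "aws_sns_topic" "php_errors" {
  name = "example-php-errors"
}

resource "aws_cloudwatch_log_metric_filter" "php_fatal" {
  name           = "php-fatal-errors"
  log_group_name = "example-apache-error-log"
  pattern        = "\"PHP Fatal error\""

  metric_transformation {
    name      = "PhpFatalErrorCount"
    namespace = "Example/Drupal"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "php_fatal" {
  alarm_name          = "example-php-fatal"
  namespace           = "Example/Drupal"
  metric_name         = "PhpFatalErrorCount"
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.php_errors.arn]   # the distribution list subscribes to this topic
}

Subscribing an email distribution list to the SNS topic is what turns a matching log line into the kind of notification described above.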
For the network security, you're able to really tweak this; you can fine-tune it however you want. Do you want to talk a little bit about security groups? — Sure. Security groups were kind of AWS's initial idea of a firewall: they provide control of what is allowed between the internet and a load balancer, or a load balancer and a server, or one server and another server. You can define different rules for inbound and outbound protocols and ports. And then more recently they added a service called Web Application Firewall, which acts before the traffic hits your system. We had a situation where we had a denial-of-service attack on this testing platform, and we wanted to come up with a way to prevent that before it actually hit our load balancer and started impacting our servers — and therefore our cost, because the servers have to accommodate it: that many more servers spun up just to withstand the attack. The Web Application Firewall is able to stop it before it gets into the servers, so AWS takes the brunt of it.
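As a rough illustration of the layered rules being described — the VPC variable, names, and ports are assumptions, not the actual setup — here is what two such security groups might look like in Terraform, with the web servers only reachable from the load balancer:

resource "aws_security_group" "alb" {
  name   = "example-alb-sg"
  vpc_id = var.vpc_id          # assumed variable

  ingress {
    description = "HTTPS from anywhere (the WAF sits in front of this)"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "web" {
  name   = "example-web-sg"
  vpc_id = var.vpc_id

  # Only the load balancer's security group may reach Apache on the EC2 boxes.
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}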
We've had a couple of challenges here and there with tightening things down like this. The AWS SDK is amazing, but it's not super well documented, especially for PHP. We have things like two internal applications where one is hitting web services on the other, and we can get that all locked down so it only accepts traffic from our other AWS account. We have to generate a certificate, sign the requests, generate the requests, and then make them, and we can do all of that through the SDK. Again, it's wonderful, but you have to figure out a little bit how to do it. We're going to have a series of blog posts that go into that, because there just isn't much out there. — Another reason we ended up going with AWS: Drupal and TAO talk to one another on the back end, and having them running in the same place allows us to restrict their communication to a private network that only they use. So it's much lower latency, and it also doesn't traverse the internet — it's all encased in that secure network.
Our configuration management — we talked a little bit about this. It's really wonderful to be able to store all of your config as code in a repository: you get commit history, and any time you update or change an environment setting you always have something to roll back to, you always have a map of what you've done. We use both Terraform and Chef for this. We have a Chef server that manages environment-specific stuff — installs, config settings, all that kind of thing. We start out with a basic template and then add stuff in as we need it. This gives us our base image, Apache — people have asked us about nginx; we can do that too, but we use Apache, and again this is mostly because some of the other apps we're supporting are really finicky about where they run, so to keep them all nice and level and all the same we're still on Apache — and then we load up cookbooks and modules from our own repository, turn on the features that we want, exclude the ones that we don't, and then spin it up. The cool thing about this is that if devs do have something they need — again, something like "we need to use Couchbase for this project" — we've got some config code that we can use, or if we don't have it, a dev or a devops person can write it. That gets submitted, we make a pull request, then ops — mostly Mike — can review and tweak, and then it gets committed, and for the next build, or whatever we want to do, it'll spin up.
And it can be complex or simple, whatever you need. One example: maybe you just need to adjust an Apache directive, so Jack can tweak that in the Chef cookbook, and I can then pull that change, or merge it, and put it on the Chef server, and then the six Drupal servers will pull it down to themselves, reconverge, and restart Apache to apply the change. We also use this for some custom Drupal settings. Now, we use Config Split to better manage those, but there are things like any custom config that has secrets in it — our SMTP config, stuff like that — that we don't want just out in a git repo. Our git repos are private for the most part, but we do do open source stuff: we had done the teacher system, the portal for PARCC — if you remember that standardized-test thing that failed a few years ago — so that's an open source repository for the platform itself, but we have specific things we need to set up, and we manage those extra configs separately. We have GitLab for that — a private git repo for it — and then Chef pulls in the files we have configured for that project, sets them up where they're supposed to be, does an import if it needs to, and then, again, it just sets itself up.
Yeah, we chose to run a GitLab server in-house — they have a Community Edition that's free — and we keep things that are private, that we don't want exposed anywhere, even as a possibility, in that git repository, as opposed to GitHub. Even though the repositories we use on GitHub are private, they're still out there, so this keeps it on our network. It also makes handoff easier: a client can fork their codebase if they want to, and we maintain our own set of secrets, so they can maintain their own set of secrets however they want. And this gives us a really good handle on drift control. Nothing really gets out of hand, because any time something looks like it's getting out of hand you can pull down an unhealthy server and spin a new one up, and it's transparent. — Yeah, and Chef will reconverge periodically on all the things in its purview, so if Jack happens to change something temporarily to test — had to change a PHP setting, say — Chef will revert it. — And I'm sure this has happened; it happened not long ago. — Yeah. I mean, I like that effect; some people don't like having their fix reverted out from under them when they're not expecting it.
So, to set up Drupal on AWS. I know we've been talking a lot about TAO, but that's just sort of setting expectations — it was a big driver for us; we just wanted everything to be the same. We're really going to focus on Drupal now. So what do we need for Drupal? Well, we need an EC2 server. EC2 is the web server part, the app server — it's the Linux box, it's got the web server on it, it's what runs PHP, that sort of thing. We run our EC2 on an Amazon Linux 1 AMI, so we've got a couple of challenges there. You'll see in our dev workflow we do a lot of Lando and Docker stuff, and the only place to get Amazon Linux 1 is on Amazon — there are no Docker images for it; it's locked down, it's proprietary. We're looking at moving to the newer Amazon Linux 2, which does have Docker image support, but we don't have any production apps on it yet. We haven't had a lot of issues with our dev OS and our EC2 OS not being exactly the same — we've got that stuff pretty well sorted — but again, this is probably something we will be changing at some point.

So we've got our web server; now we need a shared file system. Again, we'll talk about this a bit later. For Amazon you have two main choices, which are S3 and EFS. S3 is powerful and cool, but boy is it a pain: you need a module to hit it from Drupal, and if something gets slightly out of configuration between the module and the actual S3 service, it'll take down your whole site — Drupal can't serve up any files without it — and then getting in there to disable it is a pain. It got to be painful enough, for the little gain we got from using it, that we switched over to Amazon's EFS, which is really just a normal file system: you mount it like a file system, and the server doesn't even know the difference. It's wonderful.

You need RDS, which is where we have all of our databases set up — again, we'll get into that in a little bit. We use the AWS Certificate Manager, which we'll also go into in more detail; it's wonderful to have everything nicely consolidated, and we can do things like manage free HTTPS certificates on non-permanent boxes, which is wonderful. And then we need to set up whatever else we need — Solr, Varnish, SMTP,
that kind of stuff. So: we use a soloist, blue-green-style deployment for our environments. A soloist is basically one master image that you build up — you get it all set up, and then it's what's used to clone the other images that we deploy out to the various environments. And it's a blue-green deployment, meaning we build up the master image — the new image for whatever we're trying to deploy — we convert that into an AMI, an Amazon machine image, and then when we start the deployment it does what's called a blue-green deployment: it leaves the old servers up for as long as it can, sets up the new ones in parallel, and then once the new servers are green — they're healthy, they're good, they're taking traffic — it will start pulling down the old servers until all of the servers are green. — (Inaudible question from the audience.)
Yeah, I do, actually. Part of this setup, and our custom Chef secrets management, is that we can pull secrets management out of the repository entirely. We have standardized on 1Password in our organization, so every project gets its own vault, and then API keys, the database passwords, the Drupal user-one password, that kind of stuff — those things go in there and are secured so that certain people can or can't get to them, and it's all secure. — Yeah. It used to be that when we spun up a new project we had an instance of Drupal with, usually, the one account, for example, and I would set the password initially, and then we had various methods over the years to securely translate it, or get it to whoever needed it. Now we just have, in 1Password, a project-specific vault; we add the devs on that team, or whoever's involved, and they know they can go to that vault in 1Password to get all the secrets and service credentials they might need. It's great — we don't need to worry about sending or emailing or texting or post-it notes with passwords on them; everything is done in a nice controlled manner, so nothing gets lost or propagated the wrong way.
So, EC2. Again, this is our web server. For Drupal 8 we usually have two medium-sized servers, which we'll dial up or down depending on the load the project is going to see. They're all high availability, which means we always have at least two servers running in different availability zones, which in Amazon — AWS speak — are kind of like data centers. They have multiple regions throughout the country and around the world, but within each region they have availability zones, which roughly correlate to data centers. They're physically separated from one another, so if one data center has an issue — a network issue, or a fire, or whatever — the other data center continues to serve our application. — Right, so if that Northern Virginia volcano erupts, we can still serve up tests out of the other Virginia data center. And then we can do a lot of host-based routing trickery to make sure the traffic gets to the right place. We haven't had to really take advantage of some of the power we have here, but we could, and it can do some really cool stuff. And the way we have our EFS — our file stores — shared, we can hop back and forth between servers behind the load balancer all the time and everything's fine; we just manage everything. — Yeah, one thing the ALB, the Application Load Balancer, in AWS has let us do: before this was around they just had the classic ELB, and we would run one of those for, say, Drupal and another one for TAO. Now we can collapse those into a single load balancer, and requests with a host header saying the requester wants to talk to Drupal get shipped to that set of servers, or vice versa. So we only have to run one load balancer now instead of two or three per environment.
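A small Terraform sketch of that host-header routing on a shared ALB — the hostnames, listener, and target groups here are hypothetical stand-ins for whatever the real environment defines:

resource "aws_lb_listener_rule" "drupal" {
  listener_arn = aws_lb_listener.https.arn   # assumed existing HTTPS listener
  priority     = 10

  condition {
    host_header {
      values = ["exams.example.org"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.drupal.arn   # assumed Drupal target group
  }
}

resource "aws_lb_listener_rule" "tao" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 20

  condition {
    host_header {
      values = ["tao.example.org"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tao.arn       # assumed TAO target group
  }
}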
So all of our LAMP stack — except for the M — goes onto the EC2. This is where we have our operating system, Apache, PHP, PHP-FPM, any of our extensions or middleware that we need; that all goes on the EC2 box. And again, we use Terraform to spin that up and then Chef to actually manage its config, so if we need custom timeouts, custom memory limits, all of that is managed on the EC2 box through these Chef scripts.

The shared file system, again, is S3 or EFS. If you need S3, go for S3, but it has challenges. If you just need a file system to span your web farm, go EFS — it's absolutely transparent and very easy to set up. And if you have more than one Drupal server, you do have to have something like this so that they can all read from and write to common places. If someone uploads a file on one server, and the next request ends up on another server, it has to be able to get that file somehow; EFS seems to be the smoothest way to do that on AWS. — Yeah, and we serve public, private, and temp all through EFS, so again, if your session hops from one node to another, they all pull up the same thing.
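A hedged sketch of that shared file system in Terraform — the subnet variables and the NFS security group are assumed, and the actual mounting of the volume onto each web server (for example via /etc/fstab) would be handled by the Chef side described earlier:

resource "aws_efs_file_system" "drupal_files" {
  creation_token = "example-drupal-files"
  encrypted      = true

  tags = {
    Name = "example-drupal-files"
  }
}

# One mount target per availability zone so every web server can reach it locally.
resource "aws_efs_mount_target" "az_a" {
  file_system_id  = aws_efs_file_system.drupal_files.id
  subnet_id       = var.private_subnet_a_id          # assumed variable
  security_groups = [aws_security_group.efs.id]      # assumed SG allowing NFS (2049) from the web servers
}

resource "aws_efs_mount_target" "az_b" {
  file_system_id  = aws_efs_file_system.drupal_files.id
  subnet_id       = var.private_subnet_b_id
  security_groups = [aws_security_group.efs.id]
}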
For RDS: this is where we set up the database, and it's also high availability, so it's a minimum of two instances in different geographical areas. We've got a nightly backup schedule plus point-in-time recovery, and we get to pick the database engine we're using for any particular RDS instance. A lot of ours is MySQL — 5.6 or whatever it is — but we can do MariaDB, we can do Aurora; there's more RDS stuff now, MongoDB has a managed instance you can set up — anything database-related can go in one of these. And the cool thing is that it's just managed, and a lot of it really just works. It used to be — especially going back through my background — that we had a Microsoft SQL Server and we had Oracle, and the people who maintained each of those didn't do them the same way, and they often wouldn't talk, which put us, the devs, right in the middle of it: I need a backup of this and I need a backup of that, and they weren't in sync. Setting this stuff up, setting failover up on Oracle — I suppose it's easier now, but it took whole teams of people dedicated to it. — Yeah.
The Certificate Manager is pretty cool. This gives us free SSL certs, and now we can spin up a new QA box — let's say we want to test some new load balancing rules on QA — and it will immediately, automatically get a new cert, and we don't have to worry about QA not having the same HTTPS behavior as prod. The certificates are all basically managed the same way, so they're auto-approved, and the renewals all happen based on DNS. Do you want to talk about that? — Yeah. You request SSL certificates — they're free as long as you use them at AWS; you can't really take them out to use somewhere else — but in order to approve them, the domain owner has to agree. You can either do it the traditional way, via an email where you click "yes, I approve this certificate's creation and use," or you can do it with DNS, where the Certificate Manager will give you a list of entries to create in your DNS system, and their certificate system will check your DNS server, look up those records, and if those records exist, it will assume that you, the owner of the DNS domain, put them there, and it creates the certificate. At the end of that certificate's life, if those renewal credentials are still in DNS, it will automatically generate new certificates and install them in the load balancer for you. So there are no more surprises when certificates expire and requests are coming in for broken websites. — Yeah, it's nice when you suddenly realize that you haven't had to deal with those in a long time. — Is it using Let's Encrypt behind the scenes? — Let me repeat that for the mic: is it using Let's Encrypt behind the scenes? I don't think so. I remember when they started offering this — I think they started providing the service before Let's Encrypt actually came out, or before people really started using it — and I remember AWS became its own certificate authority, so no, they don't. No, I remember that.
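To make that DNS-validation flow concrete, here is one common Terraform pattern for it — the domain name and hosted zone are examples, and this assumes the zone lives in Route 53 as described later:

resource "aws_acm_certificate" "site" {
  domain_name       = "exams.example.org"
  validation_method = "DNS"
}

# Create the validation CNAME that ACM asks for, in the hosted zone we control.
resource "aws_route53_record" "site_validation" {
  for_each = {
    for dvo in aws_acm_certificate.site.domain_validation_options :
    dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id = var.zone_id      # assumed hosted zone ID
  name    = each.value.name
  type    = each.value.type
  ttl     = 300
  records = [each.value.record]
}

# Wait until ACM sees the records and issues the certificate.
resource "aws_acm_certificate_validation" "site" {
  certificate_arn         = aws_acm_certificate.site.arn
  validation_record_fqdns = [for r in aws_route53_record.site_validation : r.fqdn]
}

Because the validation records stay in the zone, renewals can keep happening automatically, which is the behavior described above.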
And this last point is, again, important for testing: these are externally signed and authenticated certificates — they're not self-signed anymore. That's something we've had to deal with a lot. One of the things we do for some of our exam projects: we have an in-house-built secure exam browser, and we like to be able to replicate the full user experience of setting up the secure exam browser, getting into the test rostering, and using it — without having to acknowledge a self-signed cert every time. Again, for QA, and for showing this to the business stakeholders, it means we can duplicate this workflow in every one of the environments. We don't have to say, "well, this is QA; you won't have to do this in prod" — it will be exactly the same.
So, DNS management. This is more Mike's area than mine, but before, we used external registrar services — Network Solutions, GoDaddy, stuff like that — and we used many of them, I don't know why, so when we wanted to manage DNS for a particular project you might have to go to several places. Now everything is nicely consolidated on AWS using Route 53, which does the registrar part and the actual DNS serving, and we get to manage all of it through Terraform. So again, we get to manage all of our DNS and our registrar management through trackable code. There's no more figuring out "well, we have this on GoDaddy, we need to change something, so here's the flow for that" — and of course whatever we documented isn't necessarily right the next time — "but on this other provider you have to do it this way," and so on. Our documentation got out of sync really easily, and every different place we did this we had to maintain differently.
Yeah, and having the history is great too. If I, or someone else, amend the Terraform template that establishes the records for a domain, we can see when that happened and what we did. I've also found this really useful when I want to see all the entries — the CNAMEs — for a particular domain or project: I just pull up these files and do a quick grep on them, rather than having to log into somewhere and go scrolling and clicking. It makes it a lot more convenient. — Is that checked into the project? — The way we actually do it, we have a master account at AWS and we run all of our Route 53 service through that. We might have a different account per project; some of them are in the projects themselves if we expect to hand everything over to the client, but a lot of it we do in one place. — Yeah, like all of our internal DNS, or non-project things, we keep there.
— Sure. So you're using Route 53; if, for political reasons in an organization, or for whatever reason, you want to use the DNS management features of, say, Network Solutions, or whoever your registrar is, is there a problem doing that? Are there any significant disadvantages to the way it's provisioned? — No. In fact, there are definitely some advantages here, like propagation is a lot faster: if your DNS change is for a server in AWS, your load balancer, native AWS stuff, it gets picked up right away, and maybe it takes longer somewhere else. Stuff like that is a consideration, but no — we used to run them everywhere and we didn't really have any issues in terms of hosting the application; it was more that I was always trying to remember where a given domain was managed. — And Terraform isn't strictly AWS, either: it'll work with different platforms, different cloud providers, not just AWS. It has pluggable providers that talk to different services, and I think it even has providers for, say, GoDaddy or Network Solutions, so the same kind of code could point at AWS to create this record, or at GoDaddy.
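For example, a DNS record kept in code might look like this — the zone, hostname, and load balancer reference are placeholders:

resource "aws_route53_zone" "example" {
  name = "example.org"
}

# An alias A record pointing a hostname at the application load balancer.
resource "aws_route53_record" "exams" {
  zone_id = aws_route53_zone.example.zone_id
  name    = "exams.example.org"
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name   # assumed existing load balancer
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}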
And I don't have the slides up on the presentation page yet; we will, so you can see everything that we have up here and look at the code samples, that kind of stuff.
So we have additional services we use, like ElastiCache Redis, Varnish, Solr; we use SES, the Simple Email Service, to run our SMTP; and then there's CloudWatch, which does our logs, and SNS for delivering alarms on those logs. And then GuardDuty, which is really cool and a little bit annoying sometimes. It monitors all of the traffic — both the actual app traffic as well as configuration and management traffic — so if I'm here at the camp and I need to sign in to the AWS dashboard to fix something, check a log, make a change, whatever, Mike and my whole team know that I accessed it from somewhere else. But then, after I've been here a couple of times and I keep doing it, it decides that that's okay and doesn't let anybody know anymore. — Yeah, we'll get an email that says the principal jack-login is accessing or making suspicious requests from such-and-such, or something like that. It also fires sometimes when we've added something new to an older Drupal project: the first time it requests, say, update checks against a new endpoint, it'll say, hey, the server is talking to a new target, you should check it out. — Yeah, the funny thing is I always think something is amiss when I'm out at the track and there's a 500 error at the same time.
So, to set up our environments. Again, we've talked a lot about Terraform — all of our infrastructure is code. We use it to set up our clouds, our subnets, the security groups, all the load balancers, the web servers, the database servers; we use it to spin up everything, and we have a commit for every change, so everything is reproducible, it's all consistent, and if we need to troubleshoot something we have other projects as baselines. There's no drift: everything that's done on one of the servers is set up through these files, so nothing should come as a surprise. If something isn't working on some particular server, it's because we built it wrong — not because something got a little bit out of our control.

This graphic is just an example of that. Our repository has a section per project where we control these AWS services: there are general services in there, like S3; we have the database service configuration; and these are the application servers. In the general section, if we change a server's class — to a more powerful or less powerful system or whatever — we can see the history of what we've changed that server to. It used to take us three or four or five days, or a couple of weeks, to set some of this up; with this system we just run a series of Terraform commands, and it takes a totally raw AWS account and creates everything in a tried-and-true, reproducible way. That's helped us a lot to reduce time and errors. And on reducing errors — like we were talking about, making a config change to some environment — we get to eliminate that "oh, right, I made that change, and we're doing a new deploy, and now the change is gone because I forgot to put it into Terraform" situation. There's none of that now. We make the change right in Terraform, spin up the new server, and we can test it right there. We don't have to worry about making a change, remembering to persist it, and then remembering to bring it over to our new build; we don't have to worry about deploying a build on top of an out-of-date server. It just spins up a new image every time.
Did I see a hand go up? No? Okay. So here's what some Terraform looks like. This spins up our Drupal soloist — again, that's the master image — and creates an alarm for CPU utilization. You can see there are tokens in here, so it's nicely parameter-driven, and we have custom modules that we can reuse from project to project, so we don't have to reinvent a CPU-utilization metric alarm every time; we just tokenize whatever we're recording on and add it into whatever our Terraform is. Do you want to walk through this real quick? — Well, do you want a line-by-line? Do you want to know what this does? Nobody cares? We'll just leave it up. It's basically just spinning up one server that acts as our master server, puts it in a particular subnet, says what server class it is — like how powerful a server — and gives it a security group, so it's like a network-layer firewall. And then this is one example of many this system would have: this is CPU utilization. It checks over this period of seconds, and if during that period the average CPU utilization is over 80, it'll send an alarm to this SNS topic, which generates an email to the project team.
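The slide itself isn't reproduced here, but a stripped-down Terraform equivalent of what's being walked through — one instance plus a CPU alarm wired to an SNS topic — might look like this, with the AMI variable, subnet, instance size, and topic all assumed:

resource "aws_instance" "soloist" {
  ami                    = var.soloist_ami_id        # updated each deployment
  instance_type          = "t3.medium"
  subnet_id              = var.private_subnet_a_id
  vpc_security_group_ids = [aws_security_group.web.id]

  tags = {
    Name = "example-drupal-soloist"
  }
}

resource "aws_cloudwatch_metric_alarm" "soloist_cpu" {
  alarm_name          = "example-soloist-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  threshold           = 80
  comparison_operator = "GreaterThanOrEqualToThreshold"

  dimensions = {
    InstanceId = aws_instance.soloist.id
  }

  alarm_actions = [aws_sns_topic.project_team.arn]   # assumed topic that emails the project team
}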
All right, so setting up the actual EC2 server: we set up Apache and PHP and PHP-FPM with Chef. Again, our rough division between Terraform and Chef: Terraform is our infrastructure management, Chef is our config management. So PHP variables, the PHP configuration — that goes into our Chef cookbooks, and each project has its own set. Some of them are exactly the same, because they're based on our core template, but we customize them per project based on what we need. And we've already talked about the secrets management: all of the secrets we need to track go into 1Password, so that devs and ops and whoever needs them can pull API keys, passwords, whatever out of a vault, and we never have to send them unsecured.
So here's what some of the Chef looks like. The one on the left is setting up PHP; the one on the right is setting some config for it. — Yeah, so when a new server spins up based on that soloist AMI, the first thing it does is reach out to the Chef server and register itself — enroll itself — as a Chef client, and it'll pull down a series of cookbooks and start applying them. Some of these things are baked into the AMI, so it checks: do I need to install PHP or FPM? If it's already there, it doesn't; if it does happen to install it, then it will restart Apache. And over here, these are just some of the settings that we keep for Drupal. These files live in the project; they get compiled into the cookbook, it knows where they need to go on the server itself, and it creates them, provisions them, and sets the permissions the web server actually needs. So it's kind of like these servers come up instantiated with nothing, and then they pull all the software they need, and the configuration for Drupal that they need, automatically, and upon booting they're running the application.
So, to do our actual server updates — any of our yum updates, whatever — we do those manually on some projects; for others we use the AWS Systems Manager. When we have a security update, or some sort of server component update, the next time we deploy, those just get pushed out — whether there's an actual app change or not, it's part of the image and it comes online that way. For Drupal updates, that's part of our dev workflow; we don't do those automatically. They've got to go through QA, and for some of our real high-availability systems, where even a five-minute outage matters, we still have to schedule it. All of this is done through Composer: our composer.lock gets updated, we push that out, do our composer install and our DB updates and whatever else we need to do, and then that's done through a normal deployment process.
For RDS — guess what — we use Terraform to spin it up. We create our users and grants and permissions and things manually, with some scripts; there's just not a lot of bang for your buck in spending time automating that, because it's so repeatable, it's so easy. Again, all of those secrets get stored in the project's vault in 1Password. And here's our little Terraform script — you can see it looks much like the other one; it sets up some settings, defines our engines. — Yeah, it's similar. Like you said, it's basically saying which version of which database engine — MySQL; the class, or size, of the server that's running MySQL; whether it's multi-AZ. You can spin up from an initial snapshot if you want, when you're standing up a new project or rebuilding one. Encryption at rest or in transit. The engine settings here just get placed into parameter groups — the my.cnf-type settings. The backup window, where you define how many days back you keep. And the maintenance window, where it's able to patch itself — minor releases and stuff like that — late at night on Sunday.
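Again, the slide isn't reproduced here, but an RDS definition along the lines described might look like this — the identifiers, sizes, windows, and the referenced subnet group, security group, and parameter group are illustrative assumptions:

resource "aws_db_instance" "drupal" {
  identifier        = "example-drupal"
  engine            = "mysql"
  engine_version    = "5.7"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  multi_az          = true
  storage_encrypted = true

  db_subnet_group_name   = aws_db_subnet_group.main.name      # assumed
  vpc_security_group_ids = [aws_security_group.db.id]         # assumed
  parameter_group_name   = aws_db_parameter_group.mysql.name  # my.cnf-style settings

  username = "admin"
  password = var.db_master_password   # stored in 1Password, passed in as a variable

  backup_retention_period = 14                    # days of point-in-time recovery
  backup_window           = "06:00-07:00"         # UTC
  maintenance_window      = "sun:07:30-sun:08:30" # late night Sunday

  skip_final_snapshot       = false
  final_snapshot_identifier = "example-drupal-final"
}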
All right, EFS — we already mostly talked about this. Once again we use Terraform to set it up, and then each of the soloists knows where to mount it, and that gets pushed out to the hosts during deployment. For cron, we use an external call to Drupal's tokenized cron URL; we have an AWS Lambda function that does this, so it's all internal — it doesn't need to go out into the public space, so it's all secure, all nice and private.
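A possible shape for that scheduled trigger in Terraform — the rate, names, and the Lambda function itself (which would make the HTTP request to Drupal's keyed cron URL over the private network) are assumptions:

resource "aws_cloudwatch_event_rule" "drupal_cron" {
  name                = "example-drupal-cron"
  schedule_expression = "rate(15 minutes)"
}

resource "aws_cloudwatch_event_target" "drupal_cron" {
  rule = aws_cloudwatch_event_rule.drupal_cron.name
  arn  = aws_lambda_function.drupal_cron.arn   # assumed existing function that calls /cron/{key}
}

# Allow the Events rule to invoke the function.
resource "aws_lambda_permission" "drupal_cron" {
  statement_id  = "AllowCloudWatchCron"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.drupal_cron.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.drupal_cron.arn
}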
Our development workflow — I'm going to breeze through this because we're almost out of time. It's a slightly lightened git-flow workflow; this diagram is from drupal.org, really. Most of our dev environments are Docker-based, and we have our own private code repositories. We use a feature-branch structure; hotfixes and security fixes that go out against other releases are branched off the actual release tags instead of from master, so that we can maintain each of those separately. The tech lead approves the pull requests, we do a lot of very frequent QA deployments, and then we test those. Once that's all done, we push to our staging environment, which is really like prod one version ahead — running HEAD, rather — and this is where we validate all of our deployment instructions and all that kind of stuff.

For project setup, we just do a composer create-project; all of it is managed through Composer. We initialize a Lando container, customize some stuff, and put in what we need — in this case, here's a bunch of tooling to set up WebDriver so we can do functional testing — and this way every dev is using the same image. I haven't really heard an environment-based "works on my machine" in a little while now, and that's really refreshing.
Now, for the actual deployment: it's super easy — not that most Drupal deployments should be very difficult, but it really is just this easy. It's soloist-based, so we run this one time and then push everything out. It's just pulling the remote code down, checking out whatever release we're doing, updating all of our Composer dependencies, doing a database update (which you don't always need to do), an entity schema update (which you don't always need to do), a config import (which we almost always need to do), and then a cache rebuild. Okay, now we're into the meat of the deployment — so I think this speaks for itself; we don't have to spend any time on this, right?
We've got a few cloud environments here. You can see we have a prod and a stage, and the top box with the arrow — that's our soloist. So come deployment day, I can spin this up, go in, do the whole deploy, and get it all set up, and then as soon as I'm done, the image gets made from that — and that's on the next slide. Okay, so this shows where the soloist and the actual EC2 boxes sit and how the load balancer sits between them. We've got an availability zone on the top and another on the bottom — different data centers — where these Drupal services run. Okay, we have a question: are your slides available? Yeah — I haven't attached these yet; in my talk yesterday I didn't follow my own advice about not making changes the day of, and we did, so we'll have these up today. So here's how the actual deployment works. There we go.
So: we spin up the soloist, deploy the code, and if there are any config changes that need to be made, we update those in Chef. We snapshot it — make an image of it — and then the Terraform scripts need to be updated with the ID of that new image. This could be a little more automated, but it's not like it adds a whole lot of time; again, there's not a lot of bang for your buck there, because most of our things are not a continuous-deployment kind of thing — they're real high stakes, and any sort of service change, especially an outage, has to be scheduled in advance. Then running a terraform apply kicks off all of this and sets itself up. It does this blue-green deployment: it starts bringing up the new servers with all of our new code, all of our new config, and then as each one comes up it will start tearing down the blue ones, so that eventually it's all the new servers that are serving.
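One common Terraform pattern that produces this kind of rolling blue-green swap — not necessarily the exact mechanism used here — is a launch configuration that is replaced before destruction, with the auto scaling group keyed to its name; every value below is a placeholder:

resource "aws_launch_configuration" "drupal" {
  name_prefix     = "example-drupal-"
  image_id        = var.soloist_ami_id          # updated to the new snapshot's AMI ID
  instance_type   = "t3.medium"
  security_groups = [aws_security_group.web.id]

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "drupal" {
  # Interpolating the launch configuration name forces a new "green" group per AMI.
  name                 = "example-drupal-${aws_launch_configuration.drupal.name}"
  launch_configuration = aws_launch_configuration.drupal.name
  min_size             = 2
  max_size             = 4
  vpc_zone_identifier  = [var.private_subnet_a_id, var.private_subnet_b_id]
  target_group_arns    = [aws_lb_target_group.drupal.arn]
  health_check_type    = "ELB"
  min_elb_capacity     = 2   # keep the old group until the new servers are healthy in the load balancer

  lifecycle {
    create_before_destroy = true
  }
}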
Most of the time this actually makes our downtime due to deployments zero. Again, some of these are really high stakes, so even if it is 60 seconds, that's a big deal — but most of the time we can set this up and it's totally transparent; nobody even knows. And it's all automated, so it reduces a lot of our error. I remember the last one of these that we had a problem with was probably over a year ago, and it was that we had accidentally left a blue server up — wait, we, the team, accidentally left a blue server up — in the load balancer, so for some reason every fourth request got weird failures. It was because, as requests bounced through, one of the servers was still running old code, and something in APC couldn't find a class because it was still old code. That doesn't happen anymore; it tears itself down automatically. And we could use Jenkins to automate this — we just don't have a lot of call right now for a CD kind of solution. This is something Mike and I have both been trying to push a little bit, because even if we need permission to do a deploy, we still want to not have to do anything more than just tell it to go — but we haven't finalized that yet.
Oh boy, we still have a long way to go and only two minutes — all right, we're going to have to go faster here. We don't use auto-scaling for everything, but it's really cool when we do, and for the smaller projects the auto-scaling gives us some self-healing: if a server starts to misbehave, it will automatically tear itself down and build up a new one, and again, it uses the same soloist images that we use for regular deployments. — Is that based on AWS code or your own code? — It's AWS's own feature; it's their software that watches your server, and if something's not responding, it says you're dead, you're gone. You can do really sophisticated stuff with their auto-scaling, but in this case we use it in a real simple way: we say we want two servers running at all times, and if one doesn't respond in a certain amount of time, it gets replaced.
Okay — so again, different projects have different auto-scaling needs; 7,000 concurrent users is different from 600 concurrent users. But our auto-scaling rules are pretty cool. We have multiple tiers: if CPU utilization starts to get too high, it'll add a couple of instances when it passes that threshold, and when it goes a little bit higher it'll add four instances at a time, until it gets to a threshold of 16 servers; then it goes to a busy step, which adds four at a time and then eight at a time until it maxes out at 40. So we can customize all of this stuff.
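A hedged sketch of tiered scale-out rules like the ones described — the thresholds, step sizes, and alarm wiring below are illustrative, not the actual project numbers:

resource "aws_autoscaling_policy" "scale_out" {
  name                      = "example-drupal-scale-out"
  autoscaling_group_name    = aws_autoscaling_group.drupal.name
  policy_type               = "StepScaling"
  adjustment_type           = "ChangeInCapacity"
  estimated_instance_warmup = 300

  # Steps are measured relative to the alarm threshold (say, 60% CPU).
  step_adjustment {
    metric_interval_lower_bound = 0
    metric_interval_upper_bound = 20
    scaling_adjustment          = 2   # a little hot: add two instances
  }

  step_adjustment {
    metric_interval_lower_bound = 20
    scaling_adjustment          = 4   # well over: add four at a time
  }
}

resource "aws_cloudwatch_metric_alarm" "asg_cpu_high" {
  alarm_name          = "example-drupal-asg-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  threshold           = 60
  comparison_operator = "GreaterThanOrEqualToThreshold"

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.drupal.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}

A second policy with larger step sizes and a higher alarm threshold would correspond to the "busy" tier mentioned above.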
We've got — I think lunch is next; do we mind if we go an extra couple of minutes? We can stop on time if we want. Okay, yeah.
We're almost done. So again, I am so excited about all the log and alert stuff — this is so cool. We can store all of these logs up in CloudWatch, in the cloud. Because we're bringing up and tearing down servers all the time, if we persisted these logs locally, every time a server goes away those logs would disappear. We don't have to worry about that now: everything goes into a nice central repository, and we can keep it for as long as we want — which right now is forever, because we haven't run out of space on logs. We can log all kinds of stuff: load balancer requests, any of these alarms that we set up. We have the actual requests going into the log, so we can do alerts based on the details of a request: any time some URL gets hit, for instance, or some URL gets hit with some parameter, or some URL gets hit with some parameter from some domain, we can send out an alert based on that. If a PHP fatal shows up, we send an alarm — that never happens, of course. Some of these go to — well, I think they all go to devops, and most of them come to dev — but again, we can customize those, and we can set who gets notified for each particular type of alarm.

I think we only have two of these left. So here's a sample of some of the diagnostics. We've got blue for TAO and green for Drupal — although we should have done that backwards, blue for Drupal — and, again, this demonstrates part of our problem: this is the same user load, they're doing operations in both systems, and we can see how much more resource-intensive TAO is. Being able to standardize this and handle both systems more or less the same has saved us a ton of time and money. We set all of these diagnostics up out of our CloudWatch logs, and we can get real-time information on any of them.
Now, this is really cool: this is the new CloudWatch Insights. You can have these report queries based on your log entries. This is for a professional certification practice exam system — if students buy the textbook for whatever the exam is, it will automatically go and send them an access code to activate the tests for that book in the portal system. So here we can see how many products get purchased; we can break these down by hour, by day, we can group them however we want. We track how many tests get launched anew and how many resumed test launches there are, so this tracks the difference between who's starting up tests for the first time versus who has paused them and is going back into them. And again, all of this is done right through CloudWatch. We have an example of that query: we tell it what log group we want it to come from; we make sure to ignore our own internal testing, because come release day, all of a sudden we have two or three IP addresses which will have forty tests in one hour, and that's not real, so we want to make sure that gets filtered out. We can do these filters and groupings right on here — it's not ANSI SQL, but it's not hard to pick up. And this is the result of that query.
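In the spirit of that query, a saved CloudWatch Logs Insights definition managed from Terraform could look like this — the log group, message pattern, field name, and IP list are all made up for illustration:

resource "aws_cloudwatch_query_definition" "test_launches" {
  name            = "example-test-launches-per-hour"
  log_group_names = ["example-drupal-watchdog"]

  # Count new test launches per hour, ignoring the internal tester IPs.
  query_string = <<-EOT
    fields @timestamp, @message
    | filter @message like /test launched/
    | filter client_ip not in ["203.0.113.10", "203.0.113.11"]
    | stats count(*) as launches by bin(1h)
  EOT
}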
I think that finishes us up, so thank you. Any questions? — Does Terraform invoke Chef as part of the image build in AWS? — No, they don't talk to one another. We use Terraform for most of the infrastructure, and when the infrastructure comes up it registers itself with Chef, which then handles all of the config. — We use AWS for everything now; I used to shop around — I need a server somewhere, all right, where does it go — and that's always a problem; that's why we standardized on some of this. — With the usage you've described here — this is on AWS, the load that TAO was putting on versus a normal Drupal install — where would you put that decision point? How many users per day, or per month, or simultaneous users — where would you go with that? — So the question is, what's our metric for moving off of a platform host and onto an AWS host? It's actually less about CPU utilization or concurrent users — it's really page throughput. Once it starts getting to the point where it's slowing down and we can't scale up to catch it, we know that we need to move it.
So sometimes it's concurrent-user based, sometimes it's utilization based; every one of these apps is a little bit different, they take traffic a little bit differently. Again, for just a regular website that's harder to tell, because you've got Varnish or whatever sitting in front of it, right? Like I said, I just don't have anonymous traffic, so I don't have to worry about any of that. Every one of those metrics is going to be a little bit different; again, we look for latency and site-wide performance in what we can skim.
— We were considering AWS for our site, because it gets a lot of users at a particular time of the day or the week or whatever. From what I've seen of your deployment — we have contributors working on the site all day long; is this something where they have to basically publish at one time during the day and then you deploy the data, or can they actually work live? — Sure, you can work live. Like I said, a lot of the time our deploys are transparent, so you just switch over from server to server. I guess it hasn't happened in a long time, and the couple of systems I'm working on right now are not super content-heavy, but even having your form ID expire when you switch over servers during a deployment — even that doesn't happen a whole lot. So yeah, for the most part we can do our deploys while users are in the system. But again — I know I've said this a few times — some of these systems, like the testing systems, can be very high stakes, so we can't risk having someone's session interrupted; those have to be scheduled in advance, and downtime has to be scheduled, and so on. For a content contributor like that — for just content — that's just normal system usage; most deployments shouldn't even affect them. — So you would be logged in to one target server, and then that would ripple through the system and update all the other servers when they save the file or save the article or something? — Well, it saves out to RDS — it's a shared database, everything hits that same shared database — and same thing with the file system: if they're working on a node and they upload a file, that goes into shared file storage, and everywhere that hits it, it's the same file store. — Yeah, I think you're asking about when content escapes — someone's working within Drupal and updates something? — Yeah, all the servers are talking to the same database and the same file system, so anything like that, they all get it. — In our Varnish setup there's a lag between content being published and being live unless we refresh the cache — it's like three to four hours. What would that be like natively? Would it be the same thing, or is there a way to handle it? — I think there's a way to do it, but like I said, I don't do a lot of Varnish, so I can't just tell you; there's a way to trigger it, and I don't know what it is. Okay. — There's a module for that. — That was it.
Other questions? — Am I understanding right: are you deploying your Drupal updates by going onto the servers in AWS and doing a git fetch? — Not exactly. So the question is: are we doing our deployment by logging into the AWS servers and doing a git fetch? We spin up this soloist image — again, that master image — and we do it one time: on there, we get the soloist up to date, and then that gets automatically propagated; that image gets pushed out, behind the load balancer, and sets up, so we don't have to do this on multiple servers. We do it one time and it just takes. — We take an image of that soloist with the new code on it, update that parameter in Terraform, and then issue the terraform apply command. It'll go out to AWS, enumerate everything there, and say, oh, this server is not running the correct AMI; then it will say it wants to spin up two new servers based on this AMI that we just updated the code on, and it does that. When those two new servers come into service and are handling requests correctly, it'll take the old servers out of service. — We'll do one more question; that's on you.
— When you were comparing EFS versus S3, you were talking strictly about user content, not PHP scripts, which wouldn't run there? — Right, right. Again, it's really the public, private, and temp directories — that's what it gets us. — So when would you need S3? What would you use it for? — It depends on the content. The question is, why S3 versus EFS. If we're doing things like video streaming — large files that are just requested — S3 is a lot better at serving some of that up. It really depends on the usage of the particular content, but even then, the penalty — it's not that much faster than EFS in most of these cases — I just don't think it's worth the headache. — Is it cheaper? — Oh yeah, S3 is cheaper. There was also a time before EFS existed, so we did have to use S3; that was the multi-Drupal-server shared storage back then. — Can you use secure S3 with AWS? — You can, if you put something like a CDN — a CloudFront distribution — in front of it; then you can have the application use it. I just use the bucket access policies. — Yeah, I mean, at that level we just let the Drupal servers talk to a particular place in S3 for their shared storage, and the other apps can get in there too. All right — thanks for letting us go over, and thank you for coming.