Today we're talking about hosting Drupal on AWS. So, that's me — I think everybody knows me, right? Okay. I've been a software developer for a very long time, but I've only been a Drupal dev since about 2013. Mike and I work for Breakthrough Technologies; we're an agency that specializes in Drupal and TAO development for K-12 and professional certification assessments. I was a Microsoft stack developer for a very long time — probably the first 20 years of my career — and what has kept me in this is all of you, this community: being involved in things, being able to matter. You can't really contribute to .NET unless you work for Microsoft, and that is not the case here. I love building out this community, building out parts of it with you, and that's why I do this. I'm primarily a back-end developer — I mean, I'll theme something if I need to, but I don't take joy from writing CSS; there are people who do, and they can do it. I do everything on the command line and I'm happy, so: mostly a back-end developer. I've got some certifications, I do some contributions, and I used to be a motorcycle rider.
And this is Mike. He also works for Breakthrough and is currently a motorcycle rider. I'll let you say just a word or two about yourself here. — Yeah, I'm the person who talks to computers more than to people, so I'll stammer through some of this. I've been working with a variety of UNIX and various flavors of *nix for quite a while. I started working in AWS about six years ago, maybe almost seven, and that's mostly what I do now in my career at Breakthrough. I do ride motorcycles now, and I'm very into home automation. — I'm into home automation too; my home is mostly automated, except for my blinds, which we were just talking about.
All right. Today we're going to be talking about hosting Drupal on AWS: kind of why we do it — I'll get to that in a minute — how we set up the environments, a brief overview of our dev workflow just so that our deployment processes make a little more sense, how we do those deployments, and then what else we need and how we set those things up. What we will not be covering is selling you on AWS. I don't care how you host — this is what we had to do for some particular requirements on a couple of projects, and how we more or less standardized on it for some of our larger projects that need any sort of scalability. So this is not AWS versus Acquia versus Pantheon versus self-hosting, none of that; this is just how and why we do this. All the platform hosts are amazing for regular websites. Most of what we're talking about here isn't just websites — a lot of what we do is custom web apps. We're going to talk a little bit about Varnish and things like that, but I don't ever use Varnish, because the only unauthenticated page you'll ever see is the login page; everything else is authenticated.
So why did we have to go to AWS? Well, we knew that we would lose some of the great features that some of these platforms have. Deploying on Acquia from environment to environment is so easy — it's a drag and drop, maybe a couple of drush commands, snip snap, it's done. Pantheon is the same way. Ours are not quite that easy, but we've got this down to a science and it approaches it — though you're going to need a terminal to do it. A bunch of our platforms use apps that aren't just Drupal. For a lot of our test-taking stuff we integrate with TAO, which is probably the premier open source test-taking and test-authoring engine. It's a PHP app that is very particular about what version of PHP it wants, what version of Redis it wants, what version of MySQL it wants — we can't just throw a server at it; it has to be set up in a very specific way. So with that in mind — that we're not just running a Drupal app here — we needed a good, consistent, manageable way to deal with all of our platforms, and this gave us that.

We wanted to make sure that we could manage our own config and infrastructure, and do that through code, with commit history, and manage it just like any of our other code. So we've got config and infrastructure in code: Chef and Terraform. And we have needs every so often for custom stuff that isn't available on some of the big hosts. For a while we were talking about a NoSQL document store solution, and we were debating between Mongo and Couchbase. Mongo now has a managed service with AWS — you can just spin it up; it's amazing. Couchbase, on the other hand, doesn't have just a nice composer-installable PHP driver; there's a custom PHP extension you have to turn on, and there are some builds you have to do — it's really a pain — and we can't just do that on Pantheon. We also have a need for more modern versions — the nicer way to say that is our preferred versions — of things like Solr. I'm sure a lot of us in here have stumbled against those old bespoke versions of Solr that the platforms give us, and it's not ideal.

We have our own dev workflow. It's git-flow based — again, we'll get to that later — but there are no pull requests in any of the remote git repos for any of the platform hosts, and that's a big part of our workflow. We've got a team lead, a dev lead, who manages pull requests and does merges based on the other team members' work. So what we would have to do is have our own remote, our own hosted private repository, and then maintain another remote and push all those things up to Acquia or Pantheon or whatever, and that's a pain.
But one of the most important things is tuning our setup for heavy use. One of my projects right now is a professional certification board's exams — practice exams — and this is one of our smaller test-taking platforms: we might have to deal with eight or nine hundred concurrent users launching tests, and thousands of tests a day, and we just can't get that sort of usage-based scalability on these other platforms. We also have, for a State Board of Education, a full statewide standardized testing system, and — I haven't checked since I was in the office, so that was a couple of days ago — even through mid-March our month to date is so far 70,000-some tests. We can't get the sort of custom scalability we need to run one of those.

Question? — Is that a problem with the test software — is it very heavy on resources? — It is. TAO is — not that Drupal isn't resource-intensive — but TAO is not optimal in any way in its use of its databases, really, and in some ways, as it accumulates data over time, it gets progressively worse. You have to adjust for that.
Another part of that is that for this project we only need that level of hardware at very particular times, and customizing your tiers of usage on the platforms isn't nice and rule-based like what we have — we'll get to that. We do a lot of cool stuff with Amazon storage and backup solutions, we do some amazing things with logs and alarms and log metrics — we'll get to some of that too; it's really cool — and we can control our firewall and network security to a really fine level, which is important to us because we like to lock these things down.
We're going to briefly talk about cost control. Again, this is not really the focus of what we're talking about here, but it was a benefit for us. Anything that we need for these couple of apps lands in the most expensive tier on the big platforms, and that didn't give us a lot of fine control — and for TAO especially, we can't just run that on one of those platforms; we need to host it ourselves.

Then there's consolidated billing. This is not something I deal with, but apparently it's very important. I know that we do occasionally build these things and then, when we're done with a project and we're handing it off to the client, we also hand off the Amazon account that goes with it, so they get to control their own billing. I know that just a couple of weeks ago we were working on a Pantheon project that we'd finished, and now the client wants to take it over, and getting that transferred over to them was not as easy as I hoped it would be. — Oh, neat — apparently there's a process for that now and it goes more smoothly.
First, backups. We have our full daily backups, and we can do point-in-time recoveries, which is super cool. I don't know if you've ever started a deploy and realized, as your database update is failing, that you forgot to back up the database. Mm-hmm. We can manage all of that and restore to just before the deployment went out, and it's great. We've had our practice data recoveries, and then an actual data recovery — which, again, was totally my own fault, but we still needed it — and it worked exactly the way we wanted it to, and we've got a lot of security there. We do these snapshots of the DB servers, of the disks, and of our cloud storage — for the most part we're using EFS, but it's true for S3 as well.

And TAO has custom backup needs we have to handle outside of the Drupal realm. The TAO backup is really cranky about what it wants: it wants its database, its file system, and its Redis backed up at the same moment in time, so we had to create a custom solution for that. We actually have to disable TAO — put TAO into maintenance mode — before we can do all of these backups, because you can't restore anything in TAO that was in use. For Drupal that's not such a big deal; our Drupal backups here are real simple — we didn't really gain anything special for our Drupal backups here.
— I'm just going to ask: are you going to cover, briefly, how you set up your databases to use RDS? — Yeah, we will talk about that.

One of the more exciting things, I think, is the log aggregation stuff — it's really cool. We can set up all kinds of custom alerts and reports and alarms for anything that gets recorded in a log, which is really neat. We'll talk about that; we have examples of it later. We grow that all the time: if someone comes up and says, hey, I'd like to know when this happens, we use a series of Terraform scripts to create those metric alarms and then start sending them to whoever asked for that information. And the cool thing is, since it's all scripted, it's in Terraform — if we need that for another project or another platform, it stays in our toolbox and it just propagates itself as we spin up new projects. So we've got distribution lists for different types of alarms — usage, 500 errors, whatever — and there are different groups of people that get those. These are just some of the things we check: CPU usage; errors and warnings (although on some projects we turn the warnings off because there are a lot of them, but the errors we always want to know about); all of our disk and memory utilization; different service statuses; and so on.
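For reference, a minimal Terraform sketch of the kind of log-driven alert described here — the topic name, log group, filter pattern, and threshold are illustrative placeholders rather than the project's actual values:

# Assumed names throughout; the log group would already exist and receive the Apache/Drupal logs.
resource "aws_sns_topic" "php_errors" {
  name = "example-php-errors"
}

resource "aws_cloudwatch_log_metric_filter" "php_fatal" {
  name           = "php-fatal-errors"
  log_group_name = "example-apache-error-log"
  pattern        = "\"PHP Fatal error\""

  metric_transformation {
    name      = "PhpFatalErrorCount"
    namespace = "Example/Drupal"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "php_fatal" {
  alarm_name          = "example-php-fatal"
  namespace           = "Example/Drupal"
  metric_name         = "PhpFatalErrorCount"
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.php_errors.arn]   # the distribution list subscribes to this topic
}

Subscribing an email distribution list to the SNS topic is what turns a matching log line into the kind of notification described above.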
For the network security, you're able to really tweak this; you can fine-tune it however you want. Do you want to talk a little bit about security groups? — Sure. Security groups were kind of AWS's initial idea of a firewall: they provide control of what is allowed between the internet and a load balancer, or a load balancer and a server, or one server and another server. You can define different rules for inbound and outbound protocols and ports. And then more recently they added a service called Web Application Firewall, which acts before the traffic hits your system. We had a situation where we had a denial-of-service attack on this testing platform, and we wanted to come up with a way to prevent that before it actually hit our load balancer and started impacting our servers — and therefore our cost, because the servers have to accommodate it: that many more servers spun up just to withstand the attack. The Web Application Firewall is able to stop it before it gets into the servers, so AWS takes the brunt of it.
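As a rough illustration of the layered rules being described — the VPC variable, names, and ports are assumptions, not the actual setup — here is what two such security groups might look like in Terraform, with the web servers only reachable from the load balancer:

resource "aws_security_group" "alb" {
  name   = "example-alb-sg"
  vpc_id = var.vpc_id          # assumed variable

  ingress {
    description = "HTTPS from anywhere (the WAF sits in front of this)"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "web" {
  name   = "example-web-sg"
  vpc_id = var.vpc_id

  # Only the load balancer's security group may reach Apache on the EC2 boxes.
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}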
We've had a couple of challenges here and there with tightening things down like this. The AWS SDK is amazing, but it's not super well documented, especially for PHP. We have things like two internal applications where one is hitting web services on the other, and we can get that all locked down so it only accepts traffic from our other AWS account. We have to generate a certificate, sign the requests, generate the requests, and then make them, and we can do all of that through the SDK. Again, it's wonderful, but you have to figure out a little bit how to do it. We're going to have a series of blog posts that go into that, because there just isn't much out there. — Another reason we ended up going with AWS: Drupal and TAO talk to one another on the back end, and having them running in the same place allows us to restrict their communication to a private network that only they use. So it's much lower latency, and it also doesn't traverse the internet — it's all encased in that secure network.
Our configuration management — we talked a little bit about this. It's really wonderful to be able to store all of your config as code in a repository: you get commit history, and any time you update or change an environment setting you always have something to roll back to, you always have a map of what you've done. We use both Terraform and Chef for this. We have a Chef server that manages environment-specific stuff — installs, config settings, all that kind of thing. We start out with a basic template and then add stuff in as we need it. This gives us our base image, Apache — people have asked us about nginx; we can do that too, but we use Apache, and again this is mostly because some of the other apps we're supporting are really finicky about where they run, so to keep them all nice and level and all the same we're still on Apache — and then we load up cookbooks and modules from our own repository, turn on the features that we want, exclude the ones that we don't, and then spin it up. The cool thing about this is that if devs do have something they need — again, something like "we need to use Couchbase for this project" — we've got some config code that we can use, or if we don't have it, a dev or a devops person can write it. That gets submitted, we make a pull request, then ops — mostly Mike — can review and tweak, and then it gets committed, and for the next build, or whatever we want to do, it'll spin up.
And it can be complex or simple, whatever you need. One example: maybe you just need to adjust an Apache directive, so Jack can tweak that in the Chef cookbook, and I can then pull that change, or merge it, and put it on the Chef server, and then the six Drupal servers will pull it down to themselves, reconverge, and restart Apache to apply the change. We also use this for some custom Drupal settings. Now, we use Config Split to better manage those, but there are things like any custom config that has secrets in it — our SMTP config, stuff like that — that we don't want just out in a git repo. Our git repos are private for the most part, but we do do open source stuff: we had done the teacher system, the portal for PARCC — if you remember that standardized-test thing that failed a few years ago — so that's an open source repository for the platform itself, but we have specific things we need to set up, and we manage those extra configs separately. We have GitLab for that — a private git repo for it — and then Chef pulls in the files we have configured for that project, sets them up where they're supposed to be, does an import if it needs to, and then, again, it just sets itself up.
Yeah, we chose to run a GitLab server in-house — they have a Community Edition that's free — and we keep things that are private, that we don't want exposed anywhere, even as a possibility, in that git repository, as opposed to GitHub. Even though the repositories we use on GitHub are private, they're still out there, so this keeps it on our network. It also makes handoff easier: a client can fork their codebase if they want to, and we maintain our own set of secrets, so they can maintain their own set of secrets however they want. And this gives us a really good handle on drift control. Nothing really gets out of hand, because any time something looks like it's getting out of hand you can pull down an unhealthy server and spin a new one up, and it's transparent. — Yeah, and Chef will reconverge periodically on all the things in its purview, so if Jack happens to change something temporarily to test — had to change a PHP setting, say — Chef will revert it. — And I'm sure this has happened; it happened not long ago. — Yeah. I mean, I like that effect; some people don't like having their fix reverted out from under them when they're not expecting it.
So, to set up Drupal on AWS. I know we've been talking a lot about TAO, but that's just sort of setting expectations — it was a big driver for us; we just wanted everything to be the same. We're really going to focus on Drupal now. So what do we need for Drupal? Well, we need an EC2 server. EC2 is the web server part, the app server — it's the Linux box, it's got the web server on it, it's what runs PHP, that sort of thing. We run our EC2 on an Amazon Linux 1 AMI, so we've got a couple of challenges there. You'll see in our dev workflow we do a lot of Lando and Docker stuff, and the only place to get Amazon Linux 1 is on Amazon — there are no Docker images for it; it's locked down, it's proprietary. We're looking at moving to the newer Amazon Linux 2, which does have Docker image support, but we don't have any production apps on it yet. We haven't had a lot of issues with our dev OS and our EC2 OS not being exactly the same — we've got that stuff pretty well sorted — but again, this is probably something we will be changing at some point.

So we've got our web server; now we need a shared file system. Again, we'll talk about this a bit later. For Amazon you have two main choices, which are S3 and EFS. S3 is powerful and cool, but boy is it a pain: you need a module to hit it from Drupal, and if something gets slightly out of configuration between the module and the actual S3 service, it'll take down your whole site — Drupal can't serve up any files without it — and then getting in there to disable it is a pain. It got to be painful enough, for the little gain we got from using it, that we switched over to Amazon's EFS, which is really just a normal file system: you mount it like a file system, and the server doesn't even know the difference. It's wonderful.

You need RDS, which is where we have all of our databases set up — again, we'll get into that in a little bit. We use the AWS Certificate Manager, which we'll also go into in more detail; it's wonderful to have everything nicely consolidated, and we can do things like manage free HTTPS certificates on non-permanent boxes, which is wonderful. And then we need to set up whatever else we need — Solr, Varnish, SMTP,
that kind of stuff. So: we use a soloist, blue-green-style deployment for our environments. A soloist is basically one master image that you build up — you get it all set up, and then it's what's used to clone the other images that we deploy out to the various environments. And it's a blue-green deployment, meaning we build up the master image — the new image for whatever we're trying to deploy — we convert that into an AMI, an Amazon machine image, and then when we start the deployment it does what's called a blue-green deployment: it leaves the old servers up for as long as it can, sets up the new ones in parallel, and then once the new servers are green — they're healthy, they're good, they're taking traffic — it will start pulling down the old servers until all of the servers are green. — (Inaudible question from the audience.)
Yeah, I do, actually. Part of this setup, and our custom Chef secrets management, is that we can pull secrets management out of the repository entirely. We have standardized on 1Password in our organization, so every project gets its own vault, and then API keys, the database passwords, the Drupal user-one password, that kind of stuff — those things go in there and are secured so that certain people can or can't get to them, and it's all secure. — Yeah. It used to be that when we spun up a new project we had an instance of Drupal with, usually, the one account, for example, and I would set the password initially, and then we had various methods over the years to securely translate it, or get it to whoever needed it. Now we just have, in 1Password, a project-specific vault; we add the devs on that team, or whoever's involved, and they know they can go to that vault in 1Password to get all the secrets and service credentials they might need. It's great — we don't need to worry about sending or emailing or texting or post-it notes with passwords on them; everything is done in a nice controlled manner, so nothing gets lost or propagated the wrong way.
So, EC2. Again, this is our web server. For Drupal 8 we usually have two medium-sized servers, which we'll dial up or down depending on the load the project is going to see. They're all high availability, which means we always have at least two servers running in different availability zones, which in Amazon — AWS speak — are kind of like data centers. They have multiple regions throughout the country and around the world, but within each region they have availability zones, which roughly correlate to data centers. They're physically separated from one another, so if one data center has an issue — a network issue, or a fire, or whatever — the other data center continues to serve our application. — Right, so if that Northern Virginia volcano erupts, we can still serve up tests out of the other Virginia data center. And then we can do a lot of host-based routing trickery to make sure the traffic gets to the right place. We haven't had to really take advantage of some of the power we have here, but we could, and it can do some really cool stuff. And the way we have our EFS — our file stores — shared, we can hop back and forth between servers behind the load balancer all the time and everything's fine; we just manage everything. — Yeah, one thing the ALB, the Application Load Balancer, in AWS has let us do: before this was around they just had the classic ELB, and we would run one of those for, say, Drupal and another one for TAO. Now we can collapse those into a single load balancer, and requests with a host header saying the requester wants to talk to Drupal get shipped to that set of servers, or vice versa. So we only have to run one load balancer now instead of two or three per environment.
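A small Terraform sketch of that host-header routing on a shared ALB — the hostnames, listener, and target groups here are hypothetical stand-ins for whatever the real environment defines:

resource "aws_lb_listener_rule" "drupal" {
  listener_arn = aws_lb_listener.https.arn   # assumed existing HTTPS listener
  priority     = 10

  condition {
    host_header {
      values = ["exams.example.org"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.drupal.arn   # assumed Drupal target group
  }
}

resource "aws_lb_listener_rule" "tao" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 20

  condition {
    host_header {
      values = ["tao.example.org"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tao.arn       # assumed TAO target group
  }
}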
So all of our LAMP stack — except for the M — goes onto the EC2. This is where we have our operating system, Apache, PHP, PHP-FPM, any of our extensions or middleware that we need; that all goes on the EC2 box. And again, we use Terraform to spin that up and then Chef to actually manage its config, so if we need custom timeouts, custom memory limits, all of that is managed on the EC2 box through these Chef scripts.

The shared file system, again, is S3 or EFS. If you need S3, go for S3, but it has challenges. If you just need a file system to span your web farm, go EFS — it's absolutely transparent and very easy to set up. And if you have more than one Drupal server, you do have to have something like this so that they can all read from and write to common places. If someone uploads a file on one server, and the next request ends up on another server, it has to be able to get that file somehow; EFS seems to be the smoothest way to do that on AWS. — Yeah, and we serve public, private, and temp all through EFS, so again, if your session hops from one node to another, they all pull up the same thing.
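A hedged sketch of that shared file system in Terraform — the subnet variables and the NFS security group are assumed, and the actual mounting of the volume onto each web server (for example via /etc/fstab) would be handled by the Chef side described earlier:

resource "aws_efs_file_system" "drupal_files" {
  creation_token = "example-drupal-files"
  encrypted      = true

  tags = {
    Name = "example-drupal-files"
  }
}

# One mount target per availability zone so every web server can reach it locally.
resource "aws_efs_mount_target" "az_a" {
  file_system_id  = aws_efs_file_system.drupal_files.id
  subnet_id       = var.private_subnet_a_id          # assumed variable
  security_groups = [aws_security_group.efs.id]      # assumed SG allowing NFS (2049) from the web servers
}

resource "aws_efs_mount_target" "az_b" {
  file_system_id  = aws_efs_file_system.drupal_files.id
  subnet_id       = var.private_subnet_b_id
  security_groups = [aws_security_group.efs.id]
}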
For RDS: this is where we set up the database, and it's also high availability, so it's a minimum of two instances in different geographical areas. We've got a nightly backup schedule plus point-in-time recovery, and we get to pick the database engine we're using for any particular RDS instance. A lot of ours is MySQL — 5.6 or whatever it is — but we can do MariaDB, we can do Aurora; there's more RDS stuff now, MongoDB has a managed instance you can set up — anything database-related can go in one of these. And the cool thing is that it's just managed, and a lot of it really just works. It used to be — especially going back through my background — that we had a Microsoft SQL Server and we had Oracle, and the people who maintained each of those didn't do them the same way, and they often wouldn't talk, which put us, the devs, right in the middle of it: I need a backup of this and I need a backup of that, and they weren't in sync. Setting this stuff up, setting failover up on Oracle — I suppose it's easier now, but it took whole teams of people dedicated to it. — Yeah.
The Certificate Manager is pretty cool. This gives us free SSL certs, and now we can spin up a new QA box — let's say we want to test some new load balancing rules on QA — and it will immediately, automatically get a new cert, and we don't have to worry about QA not having the same HTTPS behavior as prod. The certificates are all basically managed the same way, so they're auto-approved, and the renewals all happen based on DNS. Do you want to talk about that? — Yeah. You request SSL certificates — they're free as long as you use them at AWS; you can't really take them out to use somewhere else — but in order to approve them, the domain owner has to agree. You can either do it the traditional way, via an email where you click "yes, I approve this certificate's creation and use," or you can do it with DNS, where the Certificate Manager will give you a list of entries to create in your DNS system, and their certificate system will check your DNS server, look up those records, and if those records exist, it will assume that you, the owner of the DNS domain, put them there, and it creates the certificate. At the end of that certificate's life, if those renewal credentials are still in DNS, it will automatically generate new certificates and install them in the load balancer for you. So there are no more surprises when certificates expire and requests are coming in for broken websites. — Yeah, it's nice when you suddenly realize that you haven't had to deal with those in a long time. — Is it using Let's Encrypt behind the scenes? — Let me repeat that for the mic: is it using Let's Encrypt behind the scenes? I don't think so. I remember when they started offering this — I think they started providing the service before Let's Encrypt actually came out, or before people really started using it — and I remember AWS became its own certificate authority, so no, they don't. No, I remember that.
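To make that DNS-validation flow concrete, here is one common Terraform pattern for it — the domain name and hosted zone are examples, and this assumes the zone lives in Route 53 as described later:

resource "aws_acm_certificate" "site" {
  domain_name       = "exams.example.org"
  validation_method = "DNS"
}

# Create the validation CNAME that ACM asks for, in the hosted zone we control.
resource "aws_route53_record" "site_validation" {
  for_each = {
    for dvo in aws_acm_certificate.site.domain_validation_options :
    dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id = var.zone_id      # assumed hosted zone ID
  name    = each.value.name
  type    = each.value.type
  ttl     = 300
  records = [each.value.record]
}

# Wait until ACM sees the records and issues the certificate.
resource "aws_acm_certificate_validation" "site" {
  certificate_arn         = aws_acm_certificate.site.arn
  validation_record_fqdns = [for r in aws_route53_record.site_validation : r.fqdn]
}

Because the validation records stay in the zone, renewals can keep happening automatically, which is the behavior described above.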
And this last point is, again, important for testing: these are externally signed and authenticated certificates — they're not self-signed anymore. That's something we've had to deal with a lot. One of the things we do for some of our exam projects: we have an in-house-built secure exam browser, and we like to be able to replicate the full user experience of setting up the secure exam browser, getting into the test rostering, and using it — without having to acknowledge a self-signed cert every time. Again, for QA, and for showing this to the business stakeholders, it means we can duplicate this workflow in every one of the environments. We don't have to say, "well, this is QA; you won't have to do this in prod" — it will be exactly the same.
So, DNS management. This is more Mike's area than mine, but before, we used external registrar services — Network Solutions, GoDaddy, stuff like that — and we used many of them, I don't know why, so when we wanted to manage DNS for a particular project you might have to go to several places. Now everything is nicely consolidated on AWS using Route 53, which does the registrar part and the actual DNS serving, and we get to manage all of it through Terraform. So again, we get to manage all of our DNS and our registrar management through trackable code. There's no more figuring out "well, we have this on GoDaddy, we need to change something, so here's the flow for that" — and of course whatever we documented isn't necessarily right the next time — "but on this other provider you have to do it this way," and so on. Our documentation got out of sync really easily, and every different place we did this we had to maintain differently.
Yeah, and having the history is great too. If I, or someone else, amend the Terraform template that establishes the records for a domain, we can see when that happened and what we did. I've also found this really useful when I want to see all the entries — the CNAMEs — for a particular domain or project: I just pull up these files and do a quick grep on them, rather than having to log into somewhere and go scrolling and clicking. It makes it a lot more convenient. — Is that checked into the project? — The way we actually do it, we have a master account at AWS and we run all of our Route 53 service through that. We might have a different account per project; some of them are in the projects themselves if we expect to hand everything over to the client, but a lot of it we do in one place. — Yeah, like all of our internal DNS, or non-project things, we keep there.
— Sure. So you're using Route 53; if, for political reasons in an organization, or for whatever reason, you want to use the DNS management features of, say, Network Solutions, or whoever your registrar is, is there a problem doing that? Are there any significant disadvantages to the way it's provisioned? — No. In fact, there are definitely some advantages here, like propagation is a lot faster: if your DNS change is for a server in AWS, your load balancer, native AWS stuff, it gets picked up right away, and maybe it takes longer somewhere else. Stuff like that is a consideration, but no — we used to run them everywhere and we didn't really have any issues in terms of hosting the application; it was more that I was always trying to remember where a given domain was managed. — And Terraform isn't strictly AWS, either: it'll work with different platforms, different cloud providers, not just AWS. It has pluggable providers that talk to different services, and I think it even has providers for, say, GoDaddy or Network Solutions, so the same kind of code could point at AWS to create this record, or at GoDaddy.
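For example, a DNS record kept in code might look like this — the zone, hostname, and load balancer reference are placeholders:

resource "aws_route53_zone" "example" {
  name = "example.org"
}

# An alias A record pointing a hostname at the application load balancer.
resource "aws_route53_record" "exams" {
  zone_id = aws_route53_zone.example.zone_id
  name    = "exams.example.org"
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name   # assumed existing load balancer
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}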
And I don't have the slides up on the presentation page yet; we will, so you can see everything that we have up here and look at the code samples, that kind of stuff.
So we have additional services we use, like ElastiCache Redis, Varnish, Solr; we use SES, the Simple Email Service, to run our SMTP; and then there's CloudWatch, which does our logs, and SNS for delivering alarms on those logs. And then GuardDuty, which is really cool and a little bit annoying sometimes. It monitors all of the traffic — both the actual app traffic as well as configuration and management traffic — so if I'm here at the camp and I need to sign in to the AWS dashboard to fix something, check a log, make a change, whatever, Mike and my whole team know that I accessed it from somewhere else. But then, after I've been here a couple of times and I keep doing it, it decides that that's okay and doesn't let anybody know anymore. — Yeah, we'll get an email that says the principal jack-login is accessing or making suspicious requests from such-and-such, or something like that. It also fires sometimes when we've added something new to an older Drupal project: the first time it requests, say, update checks against a new endpoint, it'll say, hey, the server is talking to a new target, you should check it out. — Yeah, the funny thing is I always think something is amiss when I'm out at the track and there's a 500 error at the same time.
So, to set up our environments. Again, we've talked a lot about Terraform — all of our infrastructure is code. We use it to set up our clouds, our subnets, the security groups, all the load balancers, the web servers, the database servers; we use it to spin up everything, and we have a commit for every change, so everything is reproducible, it's all consistent, and if we need to troubleshoot something we have other projects as baselines. There's no drift: everything that's done on one of the servers is set up through these files, so nothing should come as a surprise. If something isn't working on some particular server, it's because we built it wrong — not because something got a little bit out of our control.

This graphic is just an example of that. Our repository has a section per project where we control these AWS services: there are general services in there, like S3; we have the database service configuration; and these are the application servers. In the general section, if we change a server's class — to a more powerful or less powerful system or whatever — we can see the history of what we've changed that server to. It used to take us three or four or five days, or a couple of weeks, to set some of this up; with this system we just run a series of Terraform commands, and it takes a totally raw AWS account and creates everything in a tried-and-true, reproducible way. That's helped us a lot to reduce time and errors. And on reducing errors — like we were talking about, making a config change to some environment — we get to eliminate that "oh, right, I made that change, and we're doing a new deploy, and now the change is gone because I forgot to put it into Terraform" situation. There's none of that now. We make the change right in Terraform, spin up the new server, and we can test it right there. We don't have to worry about making a change, remembering to persist it, and then remembering to bring it over to our new build; we don't have to worry about deploying a build on top of an out-of-date server. It just spins up a new image every time.
Did I see a hand go up? No? Okay. So here's what some Terraform looks like. This spins up our Drupal soloist — again, that's the master image — and creates an alarm for CPU utilization. You can see there are tokens in here, so it's nicely parameter-driven, and we have custom modules that we can reuse from project to project, so we don't have to reinvent a CPU-utilization metric alarm every time; we just tokenize whatever we're recording on and add it into whatever our Terraform is. Do you want to walk through this real quick? — Well, do you want a line-by-line? Do you want to know what this does? Nobody cares? We'll just leave it up. It's basically just spinning up one server that acts as our master server, puts it in a particular subnet, says what server class it is — like how powerful a server — and gives it a security group, so it's like a network-layer firewall. And then this is one example of many this system would have: this is CPU utilization. It checks over this period of seconds, and if during that period the average CPU utilization is over 80, it'll send an alarm to this SNS topic, which generates an email to the project team.
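The slide itself isn't reproduced here, but a stripped-down Terraform equivalent of what's being walked through — one instance plus a CPU alarm wired to an SNS topic — might look like this, with the AMI variable, subnet, instance size, and topic all assumed:

resource "aws_instance" "soloist" {
  ami                    = var.soloist_ami_id        # updated each deployment
  instance_type          = "t3.medium"
  subnet_id              = var.private_subnet_a_id
  vpc_security_group_ids = [aws_security_group.web.id]

  tags = {
    Name = "example-drupal-soloist"
  }
}

resource "aws_cloudwatch_metric_alarm" "soloist_cpu" {
  alarm_name          = "example-soloist-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  threshold           = 80
  comparison_operator = "GreaterThanOrEqualToThreshold"

  dimensions = {
    InstanceId = aws_instance.soloist.id
  }

  alarm_actions = [aws_sns_topic.project_team.arn]   # assumed topic that emails the project team
}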
All right, so setting up the actual EC2 server: we set up Apache and PHP and PHP-FPM with Chef. Again, our rough division between Terraform and Chef: Terraform is our infrastructure management, Chef is our config management. So PHP variables, the PHP configuration — that goes into our Chef cookbooks, and each project has its own set. Some of them are exactly the same, because they're based on our core template, but we customize them per project based on what we need. And we've already talked about the secrets management: all of the secrets we need to track go into 1Password, so that devs and ops and whoever needs them can pull API keys, passwords, whatever out of a vault, and we never have to send them unsecured.
So here's what some of the Chef looks like. The one on the left is setting up PHP; the one on the right is setting some config for it. — Yeah, so when a new server spins up based on that soloist AMI, the first thing it does is reach out to the Chef server and register itself — enroll itself — as a Chef client, and it'll pull down a series of cookbooks and start applying them. Some of these things are baked into the AMI, so it checks: do I need to install PHP or FPM? If it's already there, it doesn't; if it does happen to install it, then it will restart Apache. And over here, these are just some of the settings that we keep for Drupal. These files live in the project; they get compiled into the cookbook, it knows where they need to go on the server itself, and it creates them, provisions them, and sets the permissions the web server actually needs. So it's kind of like these servers come up instantiated with nothing, and then they pull all the software they need, and the configuration for Drupal that they need, automatically, and upon booting they're running the application.
So, to do our actual server updates — any of our yum updates, whatever — we do those manually on some projects; for others we use the AWS Systems Manager. When we have a security update, or some sort of server component update, the next time we deploy, those just get pushed out — whether there's an actual app change or not, it's part of the image and it comes online that way. For Drupal updates, that's part of our dev workflow; we don't do those automatically. They've got to go through QA, and for some of our real high-availability systems, where even a five-minute outage matters, we still have to schedule it. All of this is done through Composer: our composer.lock gets updated, we push that out, do our composer install and our DB updates and whatever else we need to do, and then that's done through a normal deployment process.
For RDS — guess what — we use Terraform to spin it up. We create our users and grants and permissions and things manually, with some scripts; there's just not a lot of bang for your buck in spending time automating that, because it's so repeatable, it's so easy. Again, all of those secrets get stored in the project's vault in 1Password. And here's our little Terraform script — you can see it looks much like the other one; it sets up some settings, defines our engines. — Yeah, it's similar. Like you said, it's basically saying which version of which database engine — MySQL; the class, or size, of the server that's running MySQL; whether it's multi-AZ. You can spin up from an initial snapshot if you want, when you're standing up a new project or rebuilding one. Encryption at rest or in transit. The engine settings here just get placed into parameter groups — the my.cnf-type settings. The backup window, where you define how many days back you keep. And the maintenance window, where it's able to patch itself — minor releases and stuff like that — late at night on Sunday.
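Again, the slide isn't reproduced here, but an RDS definition along the lines described might look like this — the identifiers, sizes, windows, and the referenced subnet group, security group, and parameter group are illustrative assumptions:

resource "aws_db_instance" "drupal" {
  identifier        = "example-drupal"
  engine            = "mysql"
  engine_version    = "5.7"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  multi_az          = true
  storage_encrypted = true

  db_subnet_group_name   = aws_db_subnet_group.main.name      # assumed
  vpc_security_group_ids = [aws_security_group.db.id]         # assumed
  parameter_group_name   = aws_db_parameter_group.mysql.name  # my.cnf-style settings

  username = "admin"
  password = var.db_master_password   # stored in 1Password, passed in as a variable

  backup_retention_period = 14                    # days of point-in-time recovery
  backup_window           = "06:00-07:00"         # UTC
  maintenance_window      = "sun:07:30-sun:08:30" # late night Sunday

  skip_final_snapshot       = false
  final_snapshot_identifier = "example-drupal-final"
}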
All right, EFS — we already mostly talked about this. Once again we use Terraform to set it up, and then each of the soloists knows where to mount it, and that gets pushed out to the hosts during deployment. For cron, we use an external call to Drupal's tokenized cron URL; we have an AWS Lambda function that does this, so it's all internal — it doesn't need to go out into the public space, so it's all secure, all nice and private.
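A possible shape for that scheduled trigger in Terraform — the rate, names, and the Lambda function itself (which would make the HTTP request to Drupal's keyed cron URL over the private network) are assumptions:

resource "aws_cloudwatch_event_rule" "drupal_cron" {
  name                = "example-drupal-cron"
  schedule_expression = "rate(15 minutes)"
}

resource "aws_cloudwatch_event_target" "drupal_cron" {
  rule = aws_cloudwatch_event_rule.drupal_cron.name
  arn  = aws_lambda_function.drupal_cron.arn   # assumed existing function that calls /cron/{key}
}

# Allow the Events rule to invoke the function.
resource "aws_lambda_permission" "drupal_cron" {
  statement_id  = "AllowCloudWatchCron"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.drupal_cron.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.drupal_cron.arn
}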
Our development workflow — I'm going to breeze through this because we're almost out of time. It's a slightly lightened git-flow workflow; this diagram is from drupal.org, really. Most of our dev environments are Docker-based, and we have our own private code repositories. We use a feature-branch structure; hotfixes and security fixes that go out against other releases are branched off the actual release tags instead of from master, so that we can maintain each of those separately. The tech lead approves the pull requests, we do a lot of very frequent QA deployments, and then we test those. Once that's all done, we push to our staging environment, which is really like prod one version ahead — running HEAD, rather — and this is where we validate all of our deployment instructions and all that kind of stuff.

For project setup, we just do a composer create-project; all of it is managed through Composer. We initialize a Lando container, customize some stuff, and put in what we need — in this case, here's a bunch of tooling to set up WebDriver so we can do functional testing — and this way every dev is using the same image. I haven't really heard an environment-based "works on my machine" in a little while now, and that's really refreshing.
Now, for the actual deployment: it's super easy — not that most Drupal deployments should be very difficult, but it really is just this easy. It's soloist-based, so we run this one time and then push everything out. It's just pulling the remote code down, checking out whatever release we're doing, updating all of our Composer dependencies, doing a database update (which you don't always need to do), an entity schema update (which you don't always need to do), a config import (which we almost always need to do), and then a cache rebuild. Okay, now we're into the meat of the deployment — so I think this speaks for itself; we don't have to spend any time on this, right?
We've got a few cloud environments here. You can see we have a prod and a stage, and the top box with the arrow — that's our soloist. So come deployment day, I can spin this up, go in, do the whole deploy, and get it all set up, and then as soon as I'm done, the image gets made from that — and that's on the next slide. Okay, so this shows where the soloist and the actual EC2 boxes sit and how the load balancer sits between them. We've got an availability zone on the top and another on the bottom — different data centers — where these Drupal services run. Okay, we have a question: are your slides available? Yeah — I haven't attached these yet; in my talk yesterday I didn't follow my own advice about not making changes the day of, and we did, so we'll have these up today. So here's how the actual deployment works. There we go.
So: we spin up the soloist, deploy the code, and if there are any config changes that need to be made, we update those in Chef. We snapshot it — make an image of it — and then the Terraform scripts need to be updated with the ID of that new image. This could be a little more automated, but it's not like it adds a whole lot of time; again, there's not a lot of bang for your buck there, because most of our things are not a continuous-deployment kind of thing — they're real high stakes, and any sort of service change, especially an outage, has to be scheduled in advance. Then running a terraform apply kicks off all of this and sets itself up. It does this blue-green deployment: it starts bringing up the new servers with all of our new code, all of our new config, and then as each one comes up it will start tearing down the blue ones, so that eventually it's all the new servers that are serving.
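One common Terraform pattern that produces this kind of rolling blue-green swap — not necessarily the exact mechanism used here — is a launch configuration that is replaced before destruction, with the auto scaling group keyed to its name; every value below is a placeholder:

resource "aws_launch_configuration" "drupal" {
  name_prefix     = "example-drupal-"
  image_id        = var.soloist_ami_id          # updated to the new snapshot's AMI ID
  instance_type   = "t3.medium"
  security_groups = [aws_security_group.web.id]

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "drupal" {
  # Interpolating the launch configuration name forces a new "green" group per AMI.
  name                 = "example-drupal-${aws_launch_configuration.drupal.name}"
  launch_configuration = aws_launch_configuration.drupal.name
  min_size             = 2
  max_size             = 4
  vpc_zone_identifier  = [var.private_subnet_a_id, var.private_subnet_b_id]
  target_group_arns    = [aws_lb_target_group.drupal.arn]
  health_check_type    = "ELB"
  min_elb_capacity     = 2   # keep the old group until the new servers are healthy in the load balancer

  lifecycle {
    create_before_destroy = true
  }
}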
Most of the time this actually makes our downtime due to deployments zero. Again, some of these are really high stakes, so even if it is 60 seconds, that's a big deal — but most of the time we can set this up and it's totally transparent; nobody even knows. And it's all automated, so it reduces a lot of our error. I remember the last one of these that we had a problem with was probably over a year ago, and it was that we had accidentally left a blue server up — wait, we, the team, accidentally left a blue server up — in the load balancer, so for some reason every fourth request got weird failures. It was because, as requests bounced through, one of the servers was still running old code, and something in APC couldn't find a class because it was still old code. That doesn't happen anymore; it tears itself down automatically. And we could use Jenkins to automate this — we just don't have a lot of call right now for a CD kind of solution. This is something Mike and I have both been trying to push a little bit, because even if we need permission to do a deploy, we still want to not have to do anything more than just tell it to go — but we haven't finalized that yet.
Oh boy, we still have a long way to go and only two minutes — all right, we're going to have to go faster here. We don't use auto-scaling for everything, but it's really cool when we do, and for the smaller projects the auto-scaling gives us some self-healing: if a server starts to misbehave, it will automatically tear itself down and build up a new one, and again, it uses the same soloist images that we use for regular deployments. — Is that based on AWS code or your own code? — It's AWS's own feature; it's their software that watches your server, and if something's not responding, it says you're dead, you're gone. You can do really sophisticated stuff with their auto-scaling, but in this case we use it in a real simple way: we say we want two servers running at all times, and if one doesn't respond in a certain amount of time, it gets replaced.
Okay — so again, different projects have different auto-scaling needs; 7,000 concurrent users is different from 600 concurrent users. But our auto-scaling rules are pretty cool. We have multiple tiers: if CPU utilization starts to get too high, it'll add a couple of instances when it passes that threshold, and when it goes a little bit higher it'll add four instances at a time, until it gets to a threshold of 16 servers; then it goes to a busy step, which adds four at a time and then eight at a time until it maxes out at 40. So we can customize all of this stuff.
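A hedged sketch of tiered scale-out rules like the ones described — the thresholds, step sizes, and alarm wiring below are illustrative, not the actual project numbers:

resource "aws_autoscaling_policy" "scale_out" {
  name                      = "example-drupal-scale-out"
  autoscaling_group_name    = aws_autoscaling_group.drupal.name
  policy_type               = "StepScaling"
  adjustment_type           = "ChangeInCapacity"
  estimated_instance_warmup = 300

  # Steps are measured relative to the alarm threshold (say, 60% CPU).
  step_adjustment {
    metric_interval_lower_bound = 0
    metric_interval_upper_bound = 20
    scaling_adjustment          = 2   # a little hot: add two instances
  }

  step_adjustment {
    metric_interval_lower_bound = 20
    scaling_adjustment          = 4   # well over: add four at a time
  }
}

resource "aws_cloudwatch_metric_alarm" "asg_cpu_high" {
  alarm_name          = "example-drupal-asg-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  threshold           = 60
  comparison_operator = "GreaterThanOrEqualToThreshold"

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.drupal.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}

A second policy with larger step sizes and a higher alarm threshold would correspond to the "busy" tier mentioned above.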
We've got — I think lunch is next; do we mind if we go an extra couple of minutes? We can stop on time if we want. Okay, yeah.
We're almost done. So again, I am so excited about all the log and alert stuff — this is so cool. We can store all of these logs up in CloudWatch, in the cloud. Because we're bringing up and tearing down servers all the time, if we persisted these logs locally, every time a server goes away those logs would disappear. We don't have to worry about that now: everything goes into a nice central repository, and we can keep it for as long as we want — which right now is forever, because we haven't run out of space on logs. We can log all kinds of stuff: load balancer requests, any of these alarms that we set up. We have the actual requests going into the log, so we can do alerts based on the details of a request: any time some URL gets hit, for instance, or some URL gets hit with some parameter, or some URL gets hit with some parameter from some domain, we can send out an alert based on that. If a PHP fatal shows up, we send an alarm — that never happens, of course. Some of these go to — well, I think they all go to devops, and most of them come to dev — but again, we can customize those, and we can set who gets notified for each particular type of alarm.

I think we only have two of these left. So here's a sample of some of the diagnostics. We've got blue for TAO and green for Drupal — although we should have done that backwards, blue for Drupal — and, again, this demonstrates part of our problem: this is the same user load, they're doing operations in both systems, and we can see how much more resource-intensive TAO is. Being able to standardize this and handle both systems more or less the same has saved us a ton of time and money. We set all of these diagnostics up out of our CloudWatch logs, and we can get real-time information on any of them.
Now, this is really cool: this is the new CloudWatch Insights. You can have these report queries based on your log entries. This is for a professional certification practice exam system — if students buy the textbook for whatever the exam is, it will automatically go and send them an access code to activate the tests for that book in the portal system. So here we can see how many products get purchased; we can break these down by hour, by day, we can group them however we want. We track how many tests get launched anew and how many resumed test launches there are, so this tracks the difference between who's starting up tests for the first time versus who has paused them and is going back into them. And again, all of this is done right through CloudWatch. We have an example of that query: we tell it what log group we want it to come from; we make sure to ignore our own internal testing, because come release day, all of a sudden we have two or three IP addresses which will have forty tests in one hour, and that's not real, so we want to make sure that gets filtered out. We can do these filters and groupings right on here — it's not ANSI SQL, but it's not hard to pick up. And this is the result of that query.
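In the spirit of that query, a saved CloudWatch Logs Insights definition managed from Terraform could look like this — the log group, message pattern, field name, and IP list are all made up for illustration:

resource "aws_cloudwatch_query_definition" "test_launches" {
  name            = "example-test-launches-per-hour"
  log_group_names = ["example-drupal-watchdog"]

  # Count new test launches per hour, ignoring the internal tester IPs.
  query_string = <<-EOT
    fields @timestamp, @message
    | filter @message like /test launched/
    | filter client_ip not in ["203.0.113.10", "203.0.113.11"]
    | stats count(*) as launches by bin(1h)
  EOT
}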
I think that finishes us up, so thank you. Any questions? — Does Terraform invoke Chef as part of the image build in AWS? — No, they don't talk to one another. We use Terraform for most of the infrastructure, and when the infrastructure comes up it registers itself with Chef, which then handles all of the config. — We use AWS for everything now; I used to shop around — I need a server somewhere, all right, where does it go — and that's always a problem; that's why we standardized on some of this. — With the usage you've described here — this is on AWS, the load that TAO was putting on versus a normal Drupal install — where would you put that decision point? How many users per day, or per month, or simultaneous users — where would you go with that? — So the question is, what's our metric for moving off of a platform host and onto an AWS host? It's actually less about CPU utilization or concurrent users — it's really page throughput. Once it starts getting to the point where it's slowing down and we can't scale up to catch it, we know that we need to move it.
So sometimes it's concurrent-user based, sometimes it's utilization based; every one of these apps is a little bit different, they take traffic a little bit differently. Again, for just a regular website that's harder to tell, because you've got Varnish or whatever sitting in front of it, right? Like I said, I just don't have anonymous traffic, so I don't have to worry about any of that. Every one of those metrics is going to be a little bit different; again, we look for latency and site-wide performance in what we can skim.
— We were considering AWS for our site, because it gets a lot of users at a particular time of the day or the week or whatever. From what I've seen of your deployment — we have contributors working on the site all day long; is this something where they have to basically publish at one time during the day and then you deploy the data, or can they actually work live? — Sure, you can work live. Like I said, a lot of the time our deploys are transparent, so you just switch over from server to server. I guess it hasn't happened in a long time, and the couple of systems I'm working on right now are not super content-heavy, but even having your form ID expire when you switch over servers during a deployment — even that doesn't happen a whole lot. So yeah, for the most part we can do our deploys while users are in the system. But again — I know I've said this a few times — some of these systems, like the testing systems, can be very high stakes, so we can't risk having someone's session interrupted; those have to be scheduled in advance, and downtime has to be scheduled, and so on. For a content contributor like that — for just content — that's just normal system usage; most deployments shouldn't even affect them. — So you would be logged in to one target server, and then that would ripple through the system and update all the other servers when they save the file or save the article or something? — Well, it saves out to RDS — it's a shared database, everything hits that same shared database — and same thing with the file system: if they're working on a node and they upload a file, that goes into shared file storage, and everywhere that hits it, it's the same file store. — Yeah, I think you're asking about when content escapes — someone's working within Drupal and updates something? — Yeah, all the servers are talking to the same database and the same file system, so anything like that, they all get it. — In our Varnish setup there's a lag between content being published and being live unless we refresh the cache — it's like three to four hours. What would that be like natively? Would it be the same thing, or is there a way to handle it? — I think there's a way to do it, but like I said, I don't do a lot of Varnish, so I can't just tell you; there's a way to trigger it, and I don't know what it is. Okay. — There's a module for that. — That was it.
Other questions? — Am I understanding right: are you deploying your Drupal updates by going onto the servers in AWS and doing a git fetch? — Not exactly. So the question is: are we doing our deployment by logging into the AWS servers and doing a git fetch? We spin up this soloist image — again, that master image — and we do it one time: on there, we get the soloist up to date, and then that gets automatically propagated; that image gets pushed out, behind the load balancer, and sets up, so we don't have to do this on multiple servers. We do it one time and it just takes. — We take an image of that soloist with the new code on it, update that parameter in Terraform, and then issue the terraform apply command. It'll go out to AWS, enumerate everything there, and say, oh, this server is not running the correct AMI; then it will say it wants to spin up two new servers based on this AMI that we just updated the code on, and it does that. When those two new servers come into service and are handling requests correctly, it'll take the old servers out of service. — We'll do one more question; that's on you.
— When you were comparing EFS versus S3, you were talking strictly about user content, not PHP scripts, which wouldn't run there? — Right, right. Again, it's really the public, private, and temp directories — that's what it gets us. — So when would you need S3? What would you use it for? — It depends on the content. The question is, why S3 versus EFS. If we're doing things like video streaming — large files that are just requested — S3 is a lot better at serving some of that up. It really depends on the usage of the particular content, but even then, the penalty — it's not that much faster than EFS in most of these cases — I just don't think it's worth the headache. — Is it cheaper? — Oh yeah, S3 is cheaper. There was also a time before EFS existed, so we did have to use S3; that was the multi-Drupal-server shared storage back then. — Can you use secure S3 with AWS? — You can, if you put something like a CDN — a CloudFront distribution — in front of it; then you can have the application use it. I just use the bucket access policies. — Yeah, I mean, at that level we just let the Drupal servers talk to a particular place in S3 for their shared storage, and the other apps can get in there too. All right — thanks for letting us go over, and thank you for coming.