Luke Faccin
https://www.lukefaccin.com/

Retrofitting Automated Roller Shades

https://www.lukefaccin.com/automating-roller-shades/
Mon, 12 Jun 2023 01:31:40 GMT

Earlier this year I decided to automate an existing bedroom roller blind in the hopes of encouraging a more natural wake-up with sunlight. After exploring a few different options, I landed on the Aqara Roller Shade Driver E1. It supports a wide range of window chains, has good reviews, comes from a reliable vendor and is fairly priced.

Getting hands-on, the design and build quality are great. Since it's compact and white, it blends in well with white window frames, and it can lift our 180cm x 210cm Ikea blind without too much effort. It doesn't come with a solar panel like some competitors but only needs to be charged once every two months (when used up and down once per day). In my experience, this holds up in the real world: I've had it set up for about 4.5 months and have only charged it twice. This isn't much of a burden with one blind, but I can imagine that with a whole house of blinds the charging would start to get tedious. Fitting out a whole house, I'd lean towards an integrated solution that's ideally tied into mains power.

Blind Adjustment

First up, the blinds need to be rolling smoothly. If they're in a state of chaos, you'll be automating chaos into your life. The E1 does a decent job of sensing the additional friction of a jammed blind and stopping, but sometimes manual intervention is required to release it. Thankfully, I was able to get my blind rolling straight with a bit of duct tape on the cylinder, thanks to the video below.

The E1

Setting up the Aqara E1 is straightforward: simply take off the existing chain clip and replace it with the E1 bracket. I'm not sure if they've done anything special with the sizing of the bracket, or maybe I just got lucky, but the device lined up perfectly with the existing hole! Once fitted, set your upper and lower limits and you can start using it like a regular electric blind!

Home Assistant

Since the E1 is a Zigbee device, it can pair directly to Home Assistant via a Zigbee coordinator without installing any third-party apps or creating accounts for the initial setup. Zigbee is also great because it means less clutter on your WiFi network and a more reliable mesh for your Zigbee devices!

When I first connected the E1, I found it dropping off the network. It would work immediately after pairing but drop off once idle for an extended period. This was because the E1 uses a modern implementation of Zigbee, whereas my coordinator was running older firmware. An update to my Zigbee coordinator quickly fixed the connectivity issues.

The only downside of pairing with Home Assistant (via ZHA) is the lack of battery information. This prevents setting a low-battery alert for when the driver needs charging. Admittedly, I haven't bothered to investigate this as it requires charging so infrequently, although it's on my to-do list.

Automation

Initially, I set up morning and evening automation through Home Assistant. This worked well but over time became frustrating because the automation had to be updated whenever I changed my morning alarm. Instead, I now expose the blinds through Apple HomeKit and have a personal automation to open the blinds when my morning alarm goes off.
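
For reference, the original time-based version was nothing more than a small Home Assistant automation along the lines of the sketch below. This is a minimal illustration rather than my exact config; the entity ID cover.bedroom_blind and the trigger time are assumptions.

alias: Open the bedroom blind in the morning
trigger:
  - platform: time
    at: "07:00:00"   # had to be kept in sync with my alarm, hence the frustration
action:
  - service: cover.open_cover
    target:
      entity_id: cover.bedroom_blind   # assumed entity name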

Waking up

When I purchased the E1 I naively assumed I would be woken by the sunlight beaming in through the window from day one. However, for the first few weeks, it was the noise of the blinds moving that woke me up. It's surprising how sensitive we are to noise, particularly noises we're not used to hearing while asleep. Although the motor isn't silent, it's definitely not loud by any means; in fact, the blinds themselves make quite a bit of noise. Now that I'm acclimatised to it, I barely notice.

Overall I've been really happy with this purchase. It's an awesome bit of tech that requires minimal effort to set up.

Backing up with Restic

https://www.lukefaccin.com/backing-up-with-restic/
Mon, 15 May 2023 00:56:40 GMT

In early 2018 I purchased my first Raspberry Pi, a Pi Zero. After seeing the platform’s potential, I eagerly purchased a 3B+ for some extra grunt in the hopes of replacing my home server. Low upfront cost and power efficiency are pretty appealing attributes when it comes to a device that’s running 24/7 in your home! Thankfully my compute demands aren’t extreme and so the humble Raspberry Pi fit the bill for many years.

Before the Pi, I ran everything on an HP ProLiant MicroServer (N36L). Back in the day, this was the value sweet spot. You got the form factor of a NAS at the price point of a fully loaded Pi!

Unfortunately for the MicroServer, mobile compute improved dramatically and it was dethroned by the Pi and other alternatives as the choice for homelabbers. A quick CPU head-to-head shows the current Pi 4B offers twice the compute of the humble N36L thanks to two extra cores. Despite the decade of difference, it’s impressive to see how far tech has progressed and the new form factors we’ve unlocked as a result.

After many years and variants of a humble home server, I’ve managed to evade the infamous disk failure. Every year a backup solution sits on my list of things to do but never quite reaches the top. I’ve done a decent job of rotating SD cards on fresh installs and, for the most part, it’s gotten me through unscathed; however, the risk of losing Home Assistant is an increasing worry as I become more dependent on it. The time has finally come to sit down and do the boring but responsible thing… backups.

What will it look like?

A Pi 4B currently hosts all my home services, such as Home Assistant. These were previously running on a 3B+ which was subsequently relegated to the cupboard. Now is the time for its illustrious return to production!

Backup Selection

When it comes to backup solutions, there is no shortage of options on the market. I considered throwing back to my sysadmin days and taking advantage of the free tier of Veeam Backup, but over the last year or so Restic has been mentioned on the Self-Hosted podcast, so I thought I’d give it a try.

Restic is an open source, Go-based backup client with support for various destinations such as SFTP, S3-compatible storage and its own rest server, to name a few. After exploring further, there were a few things that sold me:

  • Open source allows the code to be forked if the project goes subscription in the future.
  • Client/Server architecture available.
  • Server available as a container image.
  • Server offers an ‘append only’ mode which should cover crypto-locker protection.

The Spike

Deploying Restic was a surprisingly smooth process! Initially I planned to run the client against an SFTP endpoint, but after discovering the rest server and its myriad benefits the decision was a no-brainer. The most significant advantage of the rest server is that it’s distributed as a container image. This is a big win because almost everything on my primary Pi runs in a container. The increased predictability of standardised configuration between hosts and services means less mental gymnastics and less time spent on maintenance.

To spike Restic I kicked off by running the server on my backup Pi and the client on my primary Pi. I ran both natively from the command line. This allowed me to get familiar with Restic and know what to expect from the product when the time came to deploy it properly.

  1. Server - Stand up the server:
    ./rest-server --path /store/backups/here --no-auth --append-only
  2. Client - Initialise the repository:
    restic -r rest:http://your-backup-server.local:8000/ init
  3. Client - Run a backup to the repository:
    restic -r rest:http://your-backup-server:8000/ --verbose backup /files-to-backup
  4. Client - List the backups:
    restic snapshots -r rest:http://your-backup-server:8000

Backups ran successfully and I could see them on the server; however, the server was throwing the warning: WARNING: fsync is not supported by the data storage. This can lead to data loss, if the system crashes or the storage is unexpectedly disconnected.

At first this was alarming; we want our backup system to be reliable after all! Digging deeper, I discovered that Restic offers an integrity check. In the rare chance of a power outage mid-backup, the integrity check can be run to provide certainty around the state of the backup. Satisfied with this solution, I didn't bother investing additional time here.
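
For completeness, the check is run from the client against the same repository used in the spike above (the hostname below is the placeholder from the earlier commands):

# verify the repository structure and metadata
restic -r rest:http://your-backup-server:8000/ check

# optionally also re-read a sample of the stored data
restic -r rest:http://your-backup-server:8000/ check --read-data-subset=10%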

The Deployment

When deciding how to deploy Restic I broke the solution down into two pieces: the client and the server. I chose to run the client natively on the primary Pi because (1) it’s not publishing a service for consumption and (2) backup software can benefit from the flexibility of running natively on a host. The rest server, however, made sense to run via Docker since it’s hosting a service for the client.
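
On the server side, the compose definition can stay very small. The sketch below is illustrative only: the image name is a placeholder (see the ARM note in the next paragraph), it assumes the image's entrypoint is the rest-server binary, and the host path mirrors the one used in the spike.

version: "3"
services:
  rest-server:
    image: your-registry/rest-server:latest   # placeholder image name
    command: --path /data --no-auth --append-only
    ports:
      - "8000:8000"
    volumes:
      - /store/backups/here:/data
    restart: unless-stopped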

Excited by the potential of standardisation on the server front, I was quickly brought back down to earth upon realising the published container image isn’t available for ARM-based processors. Thankfully the project is open source! Leveraging the source code and Dockerfile contained within, I created a repo on GitHub with an Action to build an ARM-native image with Docker and BuildKit. This runs on a weekly schedule, pulling in and publishing the latest changes from the original restic server repo. I should take a moment to highlight the awesomeness that is Go. I don’t need to do anything here! Go detects the architecture under which it’s being built (in this case ARM via the BuildKit emulator) and builds a native ARM binary which can run inside the native ARM container. The fact that it's so trivial to solve for architecture compatibility at both the container and application layers is awesome! Obviously the build process isn’t efficient due to the emulation layer, but until GitHub-hosted runners support native ARM instances this is a win!
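
The workflow itself is only a handful of steps. The sketch below shows the general shape using the standard Docker actions; the schedule, secret names and image tag are assumptions rather than my exact setup.

name: build-arm-image
on:
  schedule:
    - cron: "0 2 * * 0"   # weekly
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          repository: restic/rest-server   # pull in the upstream source
      - uses: docker/setup-qemu-action@v2       # ARM emulation
      - uses: docker/setup-buildx-action@v2     # BuildKit builder
      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v4
        with:
          context: .
          platforms: linux/arm64,linux/arm/v7
          push: true
          tags: your-namespace/rest-server:latest   # placeholder image tag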

Next up, I needed to schedule the restic client to periodically run backups from the primary Pi to the backup Pi hosting the rest server. I was tempted to schedule this with a cron job, but this was during #100DaysOfHomeLab, and it’s about trying something new, right? So I decided to register the backup as a SystemD service and timer instead. This was a little more involved, but it allows the backup job to be called via SystemD like any other installed service. This is achieved by creating a your-thing.service file that defines the service (in this case the backup command) and a your-thing.timer file which defines the intervals at which it should automatically run. By default, SystemD associates the service with the timer based on the matching prefix your-thing.
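
As a rough illustration of the your-thing pattern, the pair of unit files can look like the sketch below. The unit name, repository password file and backup path are assumptions for the example, not my exact configuration.

# /etc/systemd/system/restic-backup.service
[Unit]
Description=Restic backup to the rest server

[Service]
Type=oneshot
Environment=RESTIC_PASSWORD_FILE=/root/.restic-password   # assumed location of the repo password
ExecStart=/usr/local/bin/restic -r rest:http://your-backup-server:8000/ backup /files-to-backup

# /etc/systemd/system/restic-backup.timer
[Unit]
Description=Run the restic backup nightly

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target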

One of the great things about scheduling with SystemD is the ability to test the backup in place by simply starting the service and checking /var/log/syslog for the log output. In contrast, cron feels a little clumsy for troubleshooting despite being easier to initially configure with a one-liner.
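
Using the unit names from the sketch above, testing and enabling the schedule looks something like this:

# run the backup once, in place, without waiting for the timer
sudo systemctl start restic-backup.service

# follow the output
tail -f /var/log/syslog
journalctl -u restic-backup.service -f   # or query the journal directly

# enable the nightly schedule
sudo systemctl enable --now restic-backup.timer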

Overall I've been impressed with Restic. It's been running reliably for a few weeks now and I'm glad to know I won't need to resort to a rebuild if something goes wrong with Home Assistant. A cloud backup target is on the cards at some stage as a secondary location but for now I can rest easy knowing I'm covered for hardware failure. Last but not least, shoutout to the crew over at Self-Hosted for mentioning Restic, it's a great pod and worth a listen!

Increasing availability with Zigbee Bindings

https://www.lukefaccin.com/increasing-availability-with-zigbee-bindings/
Sat, 04 Mar 2023 02:09:00 GMT

Zigbee devices have been running throughout my home for a few years now. Overall, they are pretty reliable; however, even a basic setup entails a significant chain of dependencies. In my case, I use Home Assistant (HA) as a hub with a Zigbee dongle attached. Everything pairs to HA via the dongle and can be interacted with via Siri and the Home App. The catch is that if anything in the chain fails (network, WiFi, Raspberry Pi, etc.), the automated experience breaks in weird and wonderful ways.

With so many potential points of failure, a home can quickly start to feel like a production environment, and you are the unhappy customer if it fails! Expectations quickly get set that your smart home will take care of all the trivial tasks, and it’s a rough time when it doesn’t. This is partly because you lose muscle memory for tasks like turning on the lights, so you’re forced to think about them when reverting to the old-fashioned way. It's valuable to remember that as our smart home dependency increases, so does our expectation of availability.

As someone who enjoys introducing unnecessary changes into my home network, it's no surprise that availability can sometimes be a problem. As you can imagine, it’s no fun making a last-minute, sleep-deprived change right before bed that requires an extra hour for recovery.

Solving for this, my mind jumped immediately to adding redundancy: routers, switches and deploying k3s. But besides cost, this approach results in substantial maintenance overhead, and I don't want to spend my time keeping things alive. What I’m trying to solve here is the ability to control a few key devices when Home Assistant is degraded. Enter Zigbee bindings.

There are two common options when connecting Zigbee devices (like a remote to a bulb). Firstly, you can pair them directly with each other. In this configuration, they only talk to each other and you have no visibility or control within Home Assistant. Alternatively, you pair them with Home Assistant and create an automation between them. As you might imagine, this can increase latency as a button press from a remote needs to find its way to HA and back to the light. That said, you gain the significant feature set available in HA (mobile apps, voice assistants, etc).

Instead, Zigbee bindings give you the best of both worlds! You can pair both devices to HA and then use it to configure an additional direct link between the devices. This way it’s possible to maintain visibility and control of the devices within HA and also benefit from the fast response times and reliability offered by a direct connection. This means I'm free to accidentally `rm -rf /` the Raspberry Pi running HA right before bed and worry about it in the morning because I can still turn off my bedside lamp using the paired remote!

Set it up

Getting this set up in HA is pretty straightforward once you know where to look. Some of the menus are hidden away so it’s not super intuitive, although that's reasonable given the niche nature of the config. For reference, I’m using the ZHA integration to manage my Zigbee network, but other integrations such as zigbee2mqtt also appear to offer device binding configuration.

Binding Support

It’s important to note that not all Zigbee devices support bindings, and support can differ according to the manufacturer and even the device firmware version. An easy way to get concrete information is to look up the device docs on zigbee2mqtt. As an example, looking at my Ikea remote we can see the type of binding support varies based on the firmware version. I was able to identify the firmware version by opening Zigbee management for the device (see "Device Management") and selecting read attributes with the following options:

  • Tab: Clusters
  • Clusters: Basic
  • Attributes of the selected cluster: sw_build_id

Note: You may be required to activate the device (e.g. button press) to see the value.

Device Management

You can reach Zigbee device management by:
Open a device > Ellipses under “Device info” > Manage Zigbee device

This device management menu can be used to read/write device settings. Importantly, if you’re using a battery-operated device like my Ikea remote, you might need to activate the device (mine was via a button press) to see the result appear after requesting it in HA. This is because the battery-operated remote disconnects after each exchange to save power, which means HA can’t initiate the request to retrieve the value and instead must wait for the device to wake and check in.

Configure a binding

From a configuration standpoint, you’ll likely want to set your bindings on a controlling device (like a remote). If you’d like to target a group of devices you can create one by going to Settings > Devices & Services > Open ZHA Integration > Groups Tab > Create Group.

Once you’re ready to link a remote, open the device's Zigbee management menu. From here we can get under the hood and configure the remote without disconnecting it from ZHA. Select a bindable device or group from the dropdown menu and use the associated bind button. If you go with a group, you’ll need to select the relevant checkboxes.

Reminder: Don’t forget, after clicking the bind button, to activate the device (e.g. a button press) so it receives the new configuration.

Once complete, you're ready to go! You can now temporarily disable the ZHA integration and test whether the device can control another via the binding!

Security keys for consumers

https://www.lukefaccin.com/security-keys-for-consumers/
Thu, 19 Jan 2023 08:32:28 GMT

I recently picked up a few Yubikeys, and after rolling them out I realised the security key space has matured quite a bit since the last time I took a look. I've put together a summary of the common standards, what to expect from each, plus some buying tips.

When thinking about two-factor (2FA) or multi-factor (MFA) authentication, it’s important to remember that each additional factor we add to our authentication flow should be of a different type. There are three main types of authentication factor to consider:

  1. Something you know (passwords, security questions)
  2. Something you have (phone, email, authenticator app)
  3. Something you are (fingerprint, face scan)

Let's take a look at some common authentication mechanisms we might use for online services.

OTP

OTP or One Time Password refers to a uniquely generated code that you’re prompted to enter after your username and password. They can be generated by an application (like Google Authenticator) or sent to you via a text message or other means. They act as a second factor of authentication (something you have) in addition to your username and password (something you know).

WebAuthn

WebAuthn is a standard published by the FIDO Alliance which allows security keys to be natively integrated into a website's authentication flow. It supports a range of security key functionality, most importantly U2F and FIDO2, which we'll explore below. The big win with U2F and FIDO2 is that they make you resistant to phishing attacks, since your key will only offer accounts valid for the URL of the site you're browsing.

FIDO U2F

Also referred to as U2F or CTAP1.

FIDO U2F or FIDO Universal 2nd Factor is a standard that offers a more convenient experience than one-time passwords. Instead of entering a generated code after your username and password, you present a security key (something you have). If the key is protected by a PIN (something you know), simply unlock it and touch the pad to have the key cryptographically prove your second factor of authentication to the website.

U2F originally had its own API integrated into web browsers; however, this has since been deprecated. Support for U2F continues under the WebAuthn API. There was also a branding change from U2F to FIDO U2F, although the names are used interchangeably.

FIDO2

Also referred to as CTAP2.

FIDO2 is the FIDO Alliance’s extension of U2F which maintains the strong two-factor authentication offered by U2F while also supporting additional features such as passwordless sign-in. The user experience is similar to the U2F flow; however, FIDO2 takes things one step further by allowing a security key to replace your username and password at login!

Passwordless sign-in is usually implemented with an alternative sign-in button somewhere on the login page. Instead of entering your username and password, click the security key link and you'll be prompted to enter the PIN and touch the top of your security key. Once this is done you'll get a list of valid accounts for the website you're currently browsing.

PIV

PIV or Personal Identity Verification is certificate-based authentication. It's the least relevant to the consumer web and has its strength in computer and corporate authentication. That said, as a consumer it can be used to enhance the security of your computer logon, so I've kept it in here.

PIV is stereotypically what you would see in a movie, used by characters working for a government agency. They wear ID badges around their necks that can also be used to log in to their computers. It works by storing a certificate on the swipe pass and protecting it with a PIN code. The swipe pass is something you have and the PIN is something you know. This means someone would need to steal your physical security key and know the accompanying PIN to unlock it before they get access to your account. Also, since the PIN is tied to the security key, there's less chance of someone stealing your password by looking over your shoulder as you enter it.

Risks

Get ready for this sentence... Security keys keep their secrets secret by only allowing secrets to be saved and never copied! When a security key generates a one-time password or interacts with a website using FIDO2, the underlying secret used to generate the output is never revealed. "Great, I'm protected!" I hear you say. Yes, but now you can't back up your security key. Solution? Buy two! No, seriously.

If you buy two, then you can register each key individually so if you lose or break one, you can still log in to your account and remove it. Ideally, this prevents you from relying on less secure authentication methods as a backup that might be exploited.

When you're shopping around, don't forget to check the feature set of each security key! Not all brands and models have the same functionality. For example, at the time of writing, the "Security Key" line of Yubikeys supports U2F and FIDO2 but not OTP. This may be a problem if you're looking for coverage of some accounts that offer OTP but not U2F or FIDO2. Other considerations include supported connectivity (NFC is useful for authenticating mobile apps on the go) and the number of secrets each key can hold (this limit may differ for each function).

All in all, security keys are pretty cool tech. That said, they're not cheap so make sure you've done your research before locking in a purchase.

Stay secure when using GitHub

https://www.lukefaccin.com/stay-secure-when-using-github/
Mon, 02 Jan 2023 08:34:26 GMT

There's no shortage of options for interacting with GitHub. This is a summary to help keep you secure when connecting all the things. I assume you've already got 2FA enabled; if not, do that first!

SSH Keys

SSH keys are cryptographic keys that can act on behalf of your account. They don't respect 2FA, so be careful with them. The only exception to their reach is SSO-enabled organisations, which may require keys to be individually authorised before they can access org resources.

SSH keys can also be used to sign your commits. This allows others to cryptographically verify that you were the individual who made a given commit.

Since SSH keys can act as your cryptographically verifiable identity and access resources on your behalf, you'll want to protect them as best you can.

You can protect SSH keys by adding a passphrase when generated. Alternatively, you can use a third-party SSH agent, like 1Password's SSH Agent to securely store and access your keys.
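
As a rough example, generating a passphrase-protected key and pointing Git at it for commit signing looks something like the commands below (the email address and key path are placeholders):

# generate a new key pair; you'll be prompted for a passphrase
ssh-keygen -t ed25519 -C "you@example.com"

# sign commits with the SSH key instead of GPG
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true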

Personal Access Token (PAT)

Similar to SSH keys, Personal Access Tokens are authorised to act on your behalf (without 2FA). They're often used programmatically and offer an additional layer of control, letting you define the scope of access for each token. There are two types of PAT: Classic and Fine-grained.

Treat PATs like you would any other secret. Keep them out of plain text files and avoid committing them to repositories.

Classic Tokens

Classic tokens allow you to scope by action but not by resource. This means a token can apply its permission set against any resource to which your account has access. Note: SSO organisations may require tokens to be authorised before access to resources is permitted.

Fine-grained Tokens

Fine-grained tokens are an evolution of the classic token. They acknowledge and correct the inability to scope permissions by resource. In the screenshot below, you can see we're able to filter not only by actions but also by repository access.

Applications

As of writing, there are three methods by which applications can interact with GitHub on your behalf:

  1. Installed GitHub Apps
  2. Authorised GitHub Apps
  3. Authorised OAuth Apps

As you try various tools and products you may find that you accumulate quite a number of apps. Taking a moment to review and remove any unused apps is worthwhile.

Installed GitHub Apps

Installed GitHub Apps are those which you've installed directly to your GitHub account, a GitHub org or a repository. This gives a GitHub App access to the resources you define, based on the permission set it requests. The application can then interact with those resources directly.

Authorised GitHub Apps

Authorised GitHub Apps are GitHub Apps that you've approved to act on your behalf. They can do so when all of the following conditions are true:

  • The GitHub App must be installed on the appropriate organisation or repository.
  • Your GitHub account has access to the resources.
  • The GitHub App requested the right permissions.

Authorised OAuth Apps

Authorised OAuth Apps are OAuth Apps that you've approved to act on your behalf. They can access any resource on behalf of your account except for organisations requiring prior admin approval.

Review 2 - 100 Days of Home Lab

https://www.lukefaccin.com/100daysofhomelab-review2/
Sun, 28 Aug 2022 01:43:00 GMT

It's been a little while since the last update so it's time for a check-in, some reflection and a tactical change in direction.

Looking back

Day 8 - 21

Energy was generally high during this period and I learned a lot. I explored a few IaC options for managing network devices and decided to stick with Ansible and continue building out the playbook. I moved quickly through the network config but slowed as I hit the automated deployment of my self-hosted GitHub Actions runner. Getting this set up allows the entire environment to be deployed from a local device on the network, after which the runner takes care of future runs going forward. Achieving this required making the distinction between a first run and future runs of the Ansible playbook. After some troubleshooting, I realised the slowdown was caused by some particularly heavy containers on my Docker host chewing up system resources.

Day 22 -33

Returning to the original goal (home automation stability and recoverability) I started focusing on what Home Assistant and Zigbee could offer. I discovered that some Zigbee lights can be bound to two devices simultaneously. In my case, this would be Home Assistant and a physical remote. This means that even if Home Assistant is offline, the lights can continue to be controlled via a remote.

I also spent some of this time tidying and exploring my Home Assistant config. I tried a few variations of circadian rhythm lighting and noticed I'm approaching the device limit of my Zigbee coordinator. Luckily Home Assistant is coming out with a Zigbee + Matter/Thread compatible coordinator. Holding my breath for this one!

What's next?

Reflecting on the experience so far, #100DaysOfHomeLab has been a great opportunity to learn Ansible, deep dive into tech such as Zigbee and rethink how to consider availability in my automated home. That said, motivation is fading.

When I think about what motivates me to pursue something like #100DaysOfHomeLab, it's usually (1) a learning opportunity and/or (2) an outcome. Starting out, both of these were solid ticks, but as I progressed, the learning opportunity reached diminishing returns, and then I solved the availability problem in the most extreme way possible: I don't need Home Assistant to be online at all! This diluted some of the benefits I would gain from fast redeployment via GitOps-ing all the things, and with that my motivation to continue was zapped.

With 67 days left, there are still plenty of opportunities to learn. I think it's time to throw some attention at a coding challenge in the open-source community. More to come soon!

Review 1 - 100 Days of Home Lab

https://www.lukefaccin.com/100daysofhomelab-review1/
Mon, 27 Jun 2022 09:21:53 GMT

One week into 100 Days of Homelab and the experience has been great! I dropped two days this week due to personal commitments but made more progress than I expected despite the missing days. Going in, I was convinced that one hour per day wouldn’t be enough to see substantial progress over a week; however, the time constraint has been more impactful than I expected. The time pressure of a one-hour session requires that it’s properly scoped and crystal clear in intent. As a result, I’m focusing on smaller, more manageable objectives which offer regular boosts of motivation to keep going.

Day 1

Day one was an effort of exploration. A plan was set to GitOps all my home tech. Starting with container technologies, I thought this might be a good opportunity to get hands-on with Kubernetes (K8s). I spent some time researching K8s options and trying to get a better sense of whether it might be viable on a Raspberry Pi. Some initial research led me to believe I’d have some luck on a Pi 4 but couldn’t find much on a Pi 3.

Hardware limitations were something to consider as I’ve got both a Pi 3 and Pi 4 (the latter of which is much more powerful); however, K8s requires three devices for high availability. In light of supply chain shortages, getting my hands on another Pi 4 was going to be a challenge.

Knowledge nuggets from the K8s exploration:

  • K8s is made up of worker nodes that run containers, while control plane components manage and coordinate the cluster. Both can run on a host simultaneously, which means I could have two Pi 4s acting as both control plane and worker nodes, plus a Pi 3 running just the control plane components. The Pi 3 would then act as the tiebreaker if one of the Pi 4s went offline, signalling to the remaining Pi 4 that it should take over node operations.
  • K8s requires a datastore to keep track of everything. Luckily it supports the distributed etcd as a backing store. This avoids the need to run a separate database and eliminates the resulting high-availability concerns.
  • There are quite a few flavours of K8s; some interesting lightweight alternatives include K3s and MicroK8s.

The questions that remained open at the end of this session were how to maintain a highly available proxy server for the control plane API and another highly available proxy for exposing services running on the nodes. In light of time constraints, I thought it best to shelve these questions and first investigate viability on the Pi 3.

Day 2

Despite supply chain issues weighing on the decision to pursue K8s, I was still curious about the potential of running the control plane components on a Pi 3. I narrowed the flavours of K8s down to two options, K3s and MicroK8s. Both had surprisingly high idle CPU utilisation with no containers running; as an example, MicroK8s sat at around 30%. Although the Pi 3 is much less powerful than the Pi 4, I was surprised coming from Docker, which is almost unnoticeable with no containers running. That said, it does make sense since you’re not running control plane services with Docker.

Ultimately, the high CPU utilisation on the Pi 3 had me doubting whether it could keep up as demand on the control plane increased over time. That was the decider: shelve K8s for another time.

Day 3 - 7

With K8s out of the equation, I returned to none other than Docker! The revised goal is two Docker hosts with mirrored configurations that sync data nightly. Failover would be manual, but the Pi 3 could be reintroduced to run critical containers while optional containers are left stopped during a failover event.

My current solution for deploying a new Docker host is a janky shell script that requires manual intervention. Surely this experience could be faster and better. Thankfully, the Raspberry Pi Imager enables the configuration of some basic host options when imaging the SD card. This eliminates some of the clumsy initial setup and configures the Pi into a state where Ansible can take over.

The Raspberry Pi Imager is used to configure:

  • Hostname
  • Default user (inc SSH key)
  • Network
  • SSH
  • Locale

Ansible is then used to:

  • Install Docker
  • Add scripts and docker-compose file
  • Schedule scripts with cron
  • Run docker-compose to bring the containers online

The biggest win from all of this is the ability to deploy a new Pi host running Docker without ever opening a shell to the host. This feels like it might be as low touch as it gets without having pre-existing infrastructure like an imaging server on the network.
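
For a sense of scale, the Ansible side of the steps above fits comfortably in a single small playbook. The sketch below is illustrative rather than my actual playbook; the host group, paths and file names are placeholders.

# docker-host.yml
- hosts: docker_hosts
  become: true
  tasks:
    - name: Install Docker via the convenience script
      ansible.builtin.shell: curl -fsSL https://get.docker.com | sh
      args:
        creates: /usr/bin/docker          # skip if Docker is already installed

    - name: Copy scripts and the docker-compose file
      ansible.builtin.copy:
        src: files/
        dest: /opt/stack/
        mode: preserve

    - name: Schedule the sync script with cron
      ansible.builtin.cron:
        name: nightly-sync
        minute: "0"
        hour: "2"
        job: /opt/stack/sync.sh

    - name: Bring the containers online
      ansible.builtin.shell: docker compose up -d
      args:
        chdir: /opt/stack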

Overall I’m excited with how the first week turned out and energised for the next!

Day 1 - 100 Days of Home Lab

https://www.lukefaccin.com/100-days-of-homelab/
Sun, 19 Jun 2022 04:07:38 GMT

It's been quite some time since my last post and, as it turns out, the convenience of having a Ghost instance doesn't eliminate the effort required to start writing. As a wise co-worker recently reminded me, "It's about impact, not efficiency". Reflecting on those words in the context of this blog, I realise its inception was the perfect example of optimising under the belief that efficiency will eliminate the work. Unsurprisingly, a perfect production line still needs materials going through it to produce a product.

A few days ago I was listening to a podcast called Self-Hosted with special guest TechnoTim, who recently started #100DaysOfHomeLab. Similar to the popular #100DaysOfCode, the intention is to set a technical goal and spend 1 hour per day for 100 days learning and building. Sharing progress via the hashtag on Twitter helps to add some nice elements of accountability and community.

Although I'm not convinced I'll get a perfectly uninterrupted 100-day run, I'm hopeful the exercise will still be valuable. For the sake of practicality, I'll bundle multiple days into regular summary posts and provide shorter daily updates via Twitter.

So what to focus on?

The podcast touched on the idea of GitOps: defining and configuring your infrastructure and applications via code. The benefit is that your environment is version controlled via Git and therefore easily reproducible. It's common to see this at tech companies, particularly those that rely heavily on public cloud providers (AWS, Azure, GCP), since most resources are just an API call away. It got me wondering how difficult it would be to reproduce on-prem at home. Since there's no underlying API to call for new resources, how much initial setup is required before the automation can take over? More importantly, how small can that initial setup get?

Why bother?

It's a good question and the driving motivator is the availability of Home Assistant. Although my usage is currently limited to a few lights and switches I expect dependency to increase over time. Also, if you've ever had outages of your home automation solution you'll know that an unreliable solution is much worse than no solution. With this in mind, increasing resiliency and shortening rebuild/recovery time is pretty important.

Self-Hosted Blog - Part 2

https://www.lukefaccin.com/self-hosted-blog-part-2/
Sun, 28 Feb 2021 05:50:00 GMT

GitHub repository: lukefaccin/ghost - Terraform template to deploy ghost on AWS with Cloudflare.

Welcome back! In this part of our Self-Hosted Blog series, we'll build out the core network and server infrastructure using Terraform. Since I've commented the code I won't delve into every line but instead highlight some more interesting bits of the deployment.

Terraform

In its simplest form, a Terraform template can be written as a single file of resource declarations. This may be fine if you don't plan to reuse elements of your template, but what if you decide to later? Having a monolithic file can be inefficient because you're forced to tear out resources and rework them into a new project. To make the code more modular, we can use Terraform modules, which allow us to write mini templates that expose configuration options via variables. This results in each module becoming a self-contained block of infrastructure that can be cleanly integrated into another stack by simply defining a few parameters.


To help put this into context, let's take a look at the files and folder structure:
main.tf - Stores our configuration definitions.
variables.tf - Stores our variables.
outputs.tf - Allows us to export values from a module for use elsewhere in the project.

Folder Structure:

As you can see below, we have a hierarchy of the same types of files.

- main.tf
- variables.tf
- modules/
	- network/
		- main.tf
		- variables.tf
		- outputs.tf
	- server/
		- main.tf
		- variables.tf
		- outputs.tf

The 'main.tf' file in the root directory will orchestrate the modules below it by defining which modules to include and which configuration values are to be passed into each module. Modules can pass outputs up to the parent main.tf for use elsewhere throughout the template. A main.tf file will look like the below when it's used to orchestrate other modules:

provider "aws" {
  profile = var.aws_profile
  region  = var.aws_region
}

module "network" {
  source                = "./modules/network"
  tag_terraform-stackid = var.tag_terraform-stackid
  vpc_cidr              = var.vpc_cidr
  subnet_cidr           = var.subnet_cidr
  your_public_ip        = var.your_public_ip
}
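
The outputs side of this is equally small. As an illustration (the resource and variable names below are placeholders rather than the exact ones in the repo), a module exposes a value in its outputs.tf and the parent references it as module.<name>.<output>:

# modules/network/outputs.tf - expose a value to the parent template
output "subnet_id" {
  value = aws_subnet.main.id   # illustrative resource name
}

# root main.tf - consume the module output in another module
module "server" {
  source    = "./modules/server"
  subnet_id = module.network.subnet_id
}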

Depending on our project, we may find that some variables are common throughout multiple modules. To simplify this and avoid typos, we can centralise the values passed into the modules by setting them as variables in the root 'variables.tf' file. This provides one place to populate all variables for the project. We achieve this by setting the 'default' value for a given variable. A variables file looks like this:

variable "aws_profile" {
  type = string
  description = "The name of the AWS Profile to use on your machine:"
  default = "default"
}

variable "aws_region" {
  type = string
  description = "AWS Region (e.g. us-east-2, ap-southeast-2):"
  default = "us-east-2"
}

variable "tag_terraform-stackid" {
  type = string
  description = "The AWS resource tag 'Stack ID' applied to resources in this deployment:"
  default = "blog"
}
variable "vpc_cidr" {
  type = string
  description = "Network block for your VPC in CIDR format (X.X.X.X/XX):"
  default = "10.10.0.0/16"
}
variable "subnet_cidr" {
  type = string
  description = "Network block for the subnet within the VPC in CIDR format (X.X.X.X/XX):"
  default = "10.10.1.0/24"
}

Network

Location: /modules/network/main.tf
Before we define a server we need to create a network for it to communicate over. Since this is a pretty basic build we'll just need the essentials.

  • VPC: A network
  • Subnet: A smaller slice of the network.
  • Route Table: Instructions on how traffic should flow in and out of the network.
  • Internet Gateway: Connection to the internet.
  • Security Group: Effectively AWS level firewall rules attached to the server.

If we had requirements for increased availability we could also throw in:

  • Additional Subnets in other Availability Zones.
  • Auto-Scaling Group to automatically create and remove servers as load increases or decreases.
  • Load Balancer to manage the connections to the group of servers in the autoscaling group.

Server

Location: /modules/server/main.tf
We'll define our server in a module the same way we did the network so we can reuse it later. Here are a few highlights from the server module.

Dynamic AMI Retrieval:

When building an EC2 instance, you need to select an Amazon Machine Image (AMI). This tells AWS what type of virtual machine you want (Windows, Linux, etc) and which version it should be running (Ubuntu 18.04, 20.04, etc). As you change AWS regions, this AMI ID will change, so Ubuntu 20.04 in Sydney won't necessarily have the same ID as Ubuntu 20.04 in Northern Virginia even though they're the same operating system. Since I don't know where you're going to deploy this stack, the data source below will detect the region you're running the template in and find the appropriate AMI ID automagically!

Warning: You may want to consider hard-coding this after the initial deployment. Since the latest available AMI will change over time, if you modify the Terraform template when a newer version is available, Terraform will see that the existing instance's AMI doesn't match and suggest replacing the instance with a new one based on the latest AMI. You can avoid this by taking note of the AMI used in the initial deployment and hard-coding it into your template. If you didn't take note during the initial deployment, running a 'terraform plan' will show you the expected changeset, including the old and new AMI IDs. You can copy the old AMI into your template and re-run 'terraform plan' to verify the changeset before running 'terraform apply' to avoid complications.

data "aws_ami" "ubuntu" {
  most_recent = true
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
  owners = ["099720109477"] # Canonical
}

SSH Key-Pair Injection

Instead of using passwords to protect the admin account on our server, we'll stick with best practice and use SSH keys. If you don't already have an SSH key pair, you'll need to generate one. If you already have one, just drop the text of your public key into the allocated variable of the root 'variables.tf' file as a string. This will inject it into the EC2 instance when it's built.
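
For illustration, the injection boils down to a key pair resource referenced by the instance, roughly like the snippet below (the resource, variable and instance names here are placeholders and may not match the repo exactly):

resource "aws_key_pair" "admin" {
  key_name   = "blog-admin"
  public_key = var.ssh_public_key   # the string from the root variables.tf
}

resource "aws_instance" "blog" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"                  # illustrative size
  key_name      = aws_key_pair.admin.key_name
  # ... networking, tags, user_data, etc.
}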

Cloud-Init

Awesome, we have a server... but it's not doing anything. Most cloud providers and major operating systems support Cloud-Init which will run whatever code you want on the first system startup. This means we can automate the install of Ghost when our server first starts up. Thankfully, Ghost has a comprehensive CLI to deploy the entire product including free Let's Encrypt SSL certificates which we'll use for backend encryption between Cloudflare and the server.

Just a word on database security: if you're particularly cautious, you can change your 'root' database password after you've deployed the stack. The database password in our 'variables.tf' file that gets populated into the Ghost CLI is only used temporarily by Ghost to generate its own set of credentials, so changing the 'root' database password after the fact won't break anything.

locals {
  userDataScript = <<EOF
#cloud-config
system_info:
  default_user:
    name: ${var.sys_username}
repo_update: true
repo_upgrade: all

runcmd:
  - export PATH=$PATH:/usr/local/bin
  - apt-get update
  - apt-get upgrade -y
  - apt-get install nginx -y
  - ufw allow 'Nginx Full'
  - apt-get install mysql-server -y
  - mysql --host="localhost" --user="root" --execute="ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '${var.db_pass}';"
  - curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash
  - apt-get install -y nodejs
  - npm install ghost-cli@latest -g
  - mkdir -p /var/www/${var.cf_zone}
  - chown ${var.sys_username}:${var.sys_username} /var/www/${var.cf_zone}
  - chmod 775 /var/www/${var.cf_zone}
  - cd /var/www/${var.cf_zone}
  - sudo su ${var.sys_username} --command "cd /var/www/${var.cf_zone} && ghost install"
  - sudo su ${var.sys_username} --command "cd /var/www/${var.cf_zone} && ghost setup --url '${var.subdomain}.${var.cf_zone}' --sslemail '${var.ssl_email}' --db 'mysql' --dbhost 'localhost' --dbuser 'root' --dbpass '${var.db_pass}' --dbname '${var.db_name}'"
  - sudo su ${var.sys_username} --command "cd /var/www/${var.cf_zone} && ghost start"
EOF
}

Also, for those of us who struggle to deal with code that's not properly indented inline, sorry, but you'll have to close your eyes for this one. If you indent the cloud-init script in line with the rest of the code, the first few characters of the file will be spaces and cloud-init won't interpret it correctly.

Up Next

So that's our core network and server deployed. Next, we'll build out Cloudflare and deploy some automation!

Self-Hosted Blog - Part 1

https://www.lukefaccin.com/self-hosted-blog-part-1/
Sun, 14 Feb 2021 03:24:06 GMT

GitHub repository: lukefaccin/ghost - Terraform template to deploy ghost on AWS with Cloudflare.

In this multi-part series, I'll run through my process for architecting and deploying a self-hosted blog. When starting this journey, I asked myself whether I wanted to do Ops or write posts. The thought being, if I get caught up with maintaining the systems to run the blog, how much time will be left over to write? I went back and forth a few times. On one hand, rolling my own meant having to secure and maintain the infrastructure, while a hosted solution would allow me to start writing immediately without any prior groundwork. Pricing for the platform I chose isn't exactly cheap for a passion project with no revenue, but the creators did open source it to allow self-hosting. Then there's the elephant in the room: content. What will I write about? After weighing this all up, I decided on what will hopefully be a silver bullet. I'll self-host an off-the-shelf blogging platform in a public cloud with additional layers of security where possible and automate the entire process so someone else in my position doesn't have to bother! To top it all off, this can be my first few posts. So here we are. And the best bit: if I or anyone else attempting this later decides to move to the hosted version, it's a simple export and re-import.

Architecture

As a starting point for this project let's outline some high-level objectives that we want to meet. This will help us stay consistent and on track throughout the implementation process.

Requirements:

  • Blogging platform (no statically generated sites or cumbersome upload process)
  • Minimal ongoing maintenance
  • Cost-effective
  • Reasonably secure
  • Best effort availability
  • Fully automated deployment

Blogging Platform

TL;DR: Ghost

When it came to the blogging platform, I wanted something pleasant to use and opinionated. The idea being, if it's opinionated, someone has already thought through a workflow. Highly flexible products are great for complex use cases but for this, I just want to jump in and write without distractions. WordPress and Ghost were the first two options that came to mind. For me, the writing experience in Ghost seemed more polished and less distracting; plus, seeing various complaints online about plugin conflicts and upgrade compatibility with WordPress gave me a preview of the type of Ops headache I knew I didn't want to support. Don't get me wrong, WordPress is a great product. It runs 39% of the top 10 million websites according to Wikipedia but for my narrow use case I just like Ghost better. No hard feelings WordPress!

Automation - Infrastructure as Code (IaC)

TL;DR: Terraform

Infrastructure as Code (IaC) lets you define servers, networks and other supporting infrastructure and services in a config file. This file tells a provider (usually a cloud provider like AWS or Azure) what you want to be built and how it should be configured. This is super powerful because when written properly it allows you to deploy your application/solution very quickly and in a repeatable fashion.

For example, say you’ve got an application and a customer who wants an isolated deployment. Easy done, just run the template to deploy your application again!

I’ve also found IaC great for helping create new projects faster since you can pre-write base configurations. If every stack you build has a network, server and firewall why should you write that every time? Just take a copy of an old template and update it to fit your new use case. This avoids you having to start from scratch each time.

IaC tools are available from several providers, some tools are vendor-specific such as AWS’s CloudFormation while others are open source and vendor agnostic such as HashiCorp’s Terraform. The benefit of vendor created tools like CloudFormation is that they’re natively supported by the company (in this case AWS) and therefore generally support new features quickly. On the other hand, vendor-agnostic tools support a wide array of vendors giving you the flexibility to choose tools and services from different companies or projects.

Terraform will be the IaC solution to champion the automation and repeatability requirements of our project. The choice of Terraform was a pretty simple one. It's open-source, free and supports a wide range of providers so we can pick and choose the products that deliver the best 'bang for buck' without impacting our ability to offer a fully automated solution.

Cloud Provider

TL;DR: AWS

You've probably noticed there's no shortage of cloud providers on the market. There's AWS, Azure, GCP, Linode, Digital Ocean just to name a few. I've decided to go with Amazon Web Services. I could list several reasons why I chose AWS but in reality, it was the platform I was most comfortable with when building the solution.

CDN, WAF, DNS

TL;DR: Cloudflare

This probably comes as no surprise to tech enthusiasts. Cloudflare is the provider of choice for our Content Delivery Network (CDN), Web Application Firewall (WAF) and Domain Name System (DNS) services. I have no idea how they manage to offer this many services for free, but kudos to the team at Cloudflare. You guys truly are exceptional!

Content Delivery Network (CDN)

A CDN service is a bunch of servers located all over the world that keep a copy of your website as close as possible to your users so they can load your website quicker. When someone accesses your website for the first time the CDN will keep a copy of your website so anyone else in the area asking for the site doesn't need to go back to your server to get it. This not only increases performance but can also save you money by minimising the network traffic to your server because the CDN is sending it for you instead!

https://www.cloudflare.com/en-gb/learning/cdn/what-is-a-cdn/

Web Application Firewall (WAF)

A WAF is a firewall built specifically for handling web requests. This will allow us to set up specific rules around how web traffic to our server is handled. We will use this predominantly for securing the administrative pages of our blog by IP address.

WAFs can also protect against DDoS attacks. This is when a large number of computers send a large number of requests in an attempt to bring a server offline. Luckily for us, Cloudflare will act as the entry point for requests to our web server. Since they have such huge capacity, they can absorb or deflect that traffic and prevent our server from being overwhelmed.

https://www.cloudflare.com/en-au/learning/ddos/glossary/web-application-firewall-waf/

Domain Name System (DNS)

DNS resolves web names like "www.lukefaccin.com" to addresses that computers understand like "1.1.1.1". This lets your computer know where it needs to go to get the website you're asking to load.

https://www.cloudflare.com/en-au/learning/dns/what-is-dns/

Putting it all together

So now that we've picked our tech stack let's put it together.

We'll start out by creating an EC2 Instance, wrapping it in a network (Security Group, Subnet, VPC, etc) and then building out supporting services (Cloudflare) and automations.

Up Next

So that's our project! In the next post, we'll get our hands dirty with the technical build.
