Recent Posts

Foreman Systemd Export

Posted on 12 Feb 2017

I switched my side project to a single Linode instance this weekend in an effort to simplify its setup. In the process I decided to try foreman as a way of keeping the rails server up and running. I was very pleased with what came out of it.

Export scripts

To start, you need to export systemd unit files into the /etc/systemd/system folder. Make sure to replace --user deploy with the user your app should run as. This Capistrano task handles that.

namespace :app do
  desc "Reload systemd"
  task :systemd do
    on roles(:web) do
      within release_path do
        execute :sudo, :foreman, :export, :systemd, "/etc/systemd/system", "--user deploy"
        execute :sudo, :systemctl, "daemon-reload"
      end
    end
  end
end

On the first deploy you won't need the daemon-reload, but whenever a unit file changes, systemd needs to reload to pick up the changes.

The export creates files named app-web@.service, where web is the name of the Procfile line. In systemd, an @ service lets you pass a parameter to the unit; in this case, that parameter becomes the PORT environment variable.
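For reference, a Procfile for this kind of setup might have lines like the following; the commands are placeholders for whatever your app actually runs:

web: bundle exec puma -p $PORT
worker: bundle exec sidekiq
clock: bundle exec clockwork lib/clock.rb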

Start/Stop/Restart

Once the scripts are in place, you need Capistrano tasks to start, stop, and restart the services. These tasks are all essentially the same and look like this:

namespace :app do
  desc "Start web server"
  task :start do
    on roles(:web) do |host|
      within release_path do
        execute :sudo, :systemctl, :start, "app-web@5000.service"
        execute :sudo, :systemctl, :start, "app-worker@5100.service"
      end
    end
  end
end

This starts a web and a worker instance on the default ports that foreman assigns. As you add new Procfile lines, you will need to add matching lines here.

Adding to a deploy

Finally, you need to insert systemd into the deploy cycle. This is done by hooking the tasks in after the appropriate Capistrano task, deploy:publishing.

after 'deploy:publishing', 'app:systemd'
after 'app:systemd', 'app:restart'

Enhancements

Capistrano connects as the deploy user, and that user has sudo rights. It would be more secure to allow only these specific commands to run under sudo. Add this to your sudoers file to enable just those commands.

deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart app-web@5000.service
deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart app-worker@5100.service
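One way to put these rules in place is with visudo, which validates the syntax before saving (the drop-in file name is just a suggestion):

sudo visudo -f /etc/sudoers.d/deploy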

The full config/deploy.rb file

namespace :app do
  desc "Start web server"
  task :start do
    on roles(:web) do |host|
      within release_path do
        execute :sudo, :systemctl, :start, "app-web@5000.service"
        execute :sudo, :systemctl, :start, "app-worker@5001.service"
        execute :sudo, :systemctl, :start, "app-clock@5002.service"
      end
    end
  end

  desc "Stop web server"
  task :stop do
    on roles(:web) do |host|
      within release_path do
        execute :sudo, :systemctl, :stop, "app-web@5000.service"
        execute :sudo, :systemctl, :stop, "app-worker@5001.service"
        execute :sudo, :systemctl, :stop, "app-clock@5002.service"
      end
    end
  end

  desc "Restart web server"
  task :restart do
    on roles(:web) do |host|
      within release_path do
        execute :sudo, :systemctl, :restart, "app-web@5000.service"
        execute :sudo, :systemctl, :restart, "app-worker@5001.service"
        execute :sudo, :systemctl, :restart, "app-clock@5002.service"
      end
    end
  end

  desc "Reload systemd"
  task :systemd do
    on roles(:web) do
      within release_path do
        execute :sudo, :foreman, :export, :systemd, "/etc/systemd/system", "--user deploy"
        execute :sudo, :systemctl, "daemon-reload"
      end
    end
  end
end

after 'deploy:publishing', 'app:systemd'
after 'app:systemd', 'app:restart'

Generating a Guardian Secret Key

Posted on 29 Dec 2016

I've been working on a Phoenix project at work and I added authentication for the API via Guardian. Guardian is easy to use except for setting up the secret key. It took a few tries to finally get a key that worked, and it wasn't easy to find out how.

Generate the key

To start, generate a key in iex -S mix.

jwk_384 = JOSE.JWK.generate_key({:ec, "P-384"})
JOSE.JWK.to_file("password", "file.jwk", jwk_384)

I placed this file at priv/repo/dev.jwk for development purposes. In staging/production I put the file contents in an environment variable.
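For example, the environment variables referenced in the production config below could be set like this on the server (the path and values are placeholders):

export SECRET_KEY_PASSPHRASE="password"
export SECRET_KEY="$(cat /path/to/prod.jwk)"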

Configure Guardian

Once the key is generated, configure the application to load it.

config/config.exs
config :guardian, Guardian,
  allowed_algos: ["ES384"],
  issuer: "MytApp",
  ttl: { 30, :days },
  secret_key: fn ->
    secret_key = MyApp.config(Application.get_env(:my_app, :secret_key))
    secret_key_passphrase = MyApp.config(Application.get_env(:my_app, :secret_key_passphrase))

    {_, jwk} = secret_key_passphrase |> JOSE.JWK.from_binary(secret_key)
    jwk
  end,
  serializer: MyApp.GuardianSerializer

lib/my_app.ex
defmodule MyApp do
  def config({:system, env}), do: System.get_env(env)
  def config(value), do: value
end

I use this set of functions to allow loading config via environment variables.

Configure Environments

In dev, read the key file straight from the repository and use the plain passphrase.

config/dev.exs
config :my_app, :secret_key_passphrase, "password"
config :my_app, :secret_key, File.read!("priv/repo/dev.jwk")

In production, pull both values from environment variables.

config/prod.exs
config :my_app, :secret_key_passphrase, {:system, "SECRET_KEY_PASSPHRASE"}
config :my_app, :secret_key, {:system, "SECRET_KEY"}

Conclusion

I hope this helps better explain how to set up Guardian for your Phoenix project.

Elasticsearch Cluster Snapshot & Restore

Posted on 01 Nov 2016

We recently needed to do a cross-cluster snapshot and restore for Elasticsearch. We were hosted on Elastic Cloud and found out the hard way that the tooling in place there does not work for this. We did find that Elasticsearch has a snapshot repository system that works very well. We would have saved ourselves some time if we had just started with it.

Create a user in AWS IAM

First off, create a user in AWS IAM that has access to S3. You can attach full S3 access or limit it to a single bucket, as shown in the Elastic Cloud documentation.

{
  "Statement": [
    {
      "Action": [
        "s3:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}

Create the Repository

Create the repository in each cluster; this is also taken from the Elastic Cloud guide. You also need the repository-s3 plugin installed in each cluster for this to work.

sudo bin/elasticsearch-plugin install repository-s3

curl -X PUT localhost:9200/_snapshot/bucket-name -d '{
  "type": "s3",
  "settings": {
    "bucket": "bucket-name",
    "region": "us-east1",
    "access_key": "AKIAYOURKEYHERE",
    "secret_key": "secret-key",
    "compress": true
  }
}'
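You can confirm the repository registered correctly by fetching it back:

curl -X GET localhost:9200/_snapshot/bucket-name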

Snapshot

On the old cluster, create a snapshot. You can check on the status as it processes.

curl -X PUT localhost:9200/_snapshot/bucket-name/snapshot-backup-name
curl -X GET localhost:9200/_snapshot/bucket-name/snapshot-backup-name/_status

There are a lot of options you can provide to the snapshot, including limiting it to certain indices, as shown below.
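For example, a snapshot limited to specific indices (the index names here are placeholders) looks like this:

curl -X PUT localhost:9200/_snapshot/bucket-name/snapshot-backup-name -d '{
  "indices": "one-index,two-index",
  "include_global_state": false
}'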

Restore

On the new cluster, restore the snapshot. You can easily view the status of the restore with regular Elasticsearch monitoring tools; the index health will update as shards come online.

curl -X POST localhost:9200/_snapshot/bucket-name/snapshot-backup-name/_restore -d '{
  "indices": "one-index"
}'

Conclusion

Hopefully this is of use to others. Snapshotting and restoring manually is a very simple process and was much easier than trying to work out a custom solution with our Elasticsearch host.

Backup with Duply and Duplicity

Posted on 17 Jun 2016

I recently stopped using Crashplan as my backup service of choice. I still wanted backups, so I started looking for alternatives, and duply with duplicity came up as a nice option. This is how I set them up.

This assumes you already have duplicity and duply installed. My Linux of choice is Arch; duplicity is in the official repos, and duply is an AUR package that is easily installed.
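If you're starting from scratch, the install is roughly this (assuming the AUR package is named duply and you build it manually):

sudo pacman -S duplicity
git clone https://aur.archlinux.org/duply.git
cd duply && makepkg -si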

GPG

We need to start by generating a GPG key. This will be used to encrypt backups.

gpg --gen-key

After generating the key, make sure to save the key ID.
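The ID is the one shown by gpg's key listing:

gpg --list-keys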

Issues

I had trouble with pinentry on a GUI-less system. You may need to symlink the pinentry-curses version of pinentry, a fix found in this forum post.

sudo ln -s /usr/bin/pinentry-curses /usr/bin/pinentry

I also needed to set up GPG to allow passing the GPG password from an environment variable. This is required with GPG 2.1.

~/.gnupg/gpg.conf

use-agent
pinentry-mode loopback
# ...

~/.gnupg/gpg-agent.conf

allow-loopback-pinentry
# ...

Duply

Start by creating a profile, then edit the configuration.

$ duply eric create

~/.duply/eric/conf

This file sets up the basic configuration for duply. The file contains a lot of commented-out options; these are the only ones I have set right now, with inline comments on what they do.

GPG_KEY='...' # your gpg key id, get from `gpg --list-keys`
GPG_PW='...' # your gpg password
GPG_OPTS='--pinentry-mode loopback' # required to use GPG_PW

TARGET='sftp://eric@hostname/backups/hostname' # backing up to another linux machine via SSH
SOURCE='/home/eric/' # grab my home folder

PYTHON="python2" # arch has python 3 default, set for python 2 instead

# run a full backup once a week
MAX_FULLBKP_AGE=1W
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "

~/.duply/eric/exclude

I wanted to include only certain folders in my home directory. Duplicity has a nice exclude-file format to handle this, and duply has it baked in; the exclude file is created along with your profile.

# keep out junk folders
- /home/eric/prog/*/*/log/
- /home/eric/prog/*/*/tmp/

# Include Folders
+ /home/eric/bin
+ /home/eric/Desktop
+ /home/eric/Documents
+ /home/eric/dotfiles
+ /home/eric/Music
+ /home/eric/ownCloud
+ /home/eric/Pictures
+ /home/eric/prog
+ /home/eric/.ssh

# Exclude everything else
- /home/eric/

The important thing to remember is that duplicity goes down the list in order when deciding whether something is excluded or included. That's why the log and tmp folders are excluded first; otherwise the /home/eric/prog line would include them.

Automatic backup

Set cron to run the backup script nightly. I use keychain and needed to source the correct file to get SSH working.

crontab -e

@daily . ~/.keychain/$(hostname)-sh && /usr/bin/duply eric backup
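Outside of cron, duply's status command is a quick way to confirm that backup chains are actually being created:

duply eric status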

Improvements

Sometime in the future I want to switch to Google Cloud Storage instead of SSH. I'm currently only backing up between local machines while I test this out. I would like to have both an onsite and a cloud backup.

Reindexing Elasticsearch with Ruby

Posted on 31 May 2016

For a project at work we needed to reindex a large Elasticsearch index and couldn't do it via the _reindex API, because each document's source needed to be processed slightly on its way into the new index. We were reindexing to gain more shards.

This is the rake task that helped reindex. It is based on this Elasticsearch Guide.

Usage

First create the new index via Postman.

POST /new_index
{
  "settings": {
    "index": {
      "number_of_shards": 3
    }
  }
}

Next, run the rake task. It handles the reindex from one index to the other via scan/scroll and the bulk API. I ran it on my local machine to avoid Heroku timeouts, and simply set the correct environment variables to have Chewy point at the production Elasticsearch. This is risky, but I was not worried because we were only ever writing to the new index.

bundle exec rake elasticsearch:reindex[old_index,new_index]

This rake task comes with a nice progress bar to track how far along the reindex is.

Once the reindex is complete, you can update the alias you use so production starts reading from the new index.

POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_index", "alias": "alias" } },
    { "add": { "index": "new_index", "alias": "alias" } }
  ]
}

Rake Task

namespace :elasticsearch do
  desc "Reindex a index"
  task :reindex, [:old_index_name, :new_index_name] => [:environment] do |t, args|
    client = Chewy.client
    results = client.search({
      index: args[:old_index_name],
      scroll: '10m',
      body: {
        "query" => { "match_all" => {} },
        "sort" => ["_doc"],
        "size" => 1000,
      },
    })

    progressbar = ProgressBar.create({
      :title => "Documents (thousands)",
      :total => results["hits"]["total"] / 1000 + 1,
      :format => '%a |%B| %p%% %t %c of %C',
    })

    loop do
      break if results["hits"]["hits"].empty?

      bulk_body = results["hits"]["hits"].map do |result|
        source = result["_source"]

        # process the source

        {
          index: {
            _index: args[:new_index_name],
            _type: result["_type"],
            _id: result["_id"],
            data: source,
          },
        }
      end

      response = client.bulk(body: bulk_body)

      if response["errors"]
        raise "Problem reindexing - #{response.inspect}"
      end

      progressbar.increment

      results = client.scroll({
        scroll: '10m',
        scroll_id: results["_scroll_id"],
      })
    end

    progressbar.finish
  end
end

Improvements

This rake task isn't perfect, but it gets most of the way there. Future improvements would be better handling of timeouts when talking to Elasticsearch, and dealing more gracefully with indexing errors when bulk importing into the new index.
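As a rough sketch of what the timeout handling could look like, the scroll (and bulk) calls could be wrapped in a small retry helper; the exact exception class to rescue depends on the transport your client uses (Faraday::TimeoutError here is an assumption):

# Hypothetical retry helper; adjust the rescued error class to match
# the transport your Elasticsearch client actually uses.
def with_retries(attempts: 3)
  tries = 0
  begin
    yield
  rescue Faraday::TimeoutError
    tries += 1
    retry if tries < attempts
    raise
  end
end

# Inside the loop, the scroll call would become:
results = with_retries do
  client.scroll(scroll: '10m', scroll_id: results["_scroll_id"])
end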
