Rethinking Rails API Serialization - Part 1

Posted on 21 Mar 2016

For the last few months I’ve been working on the API for Artifact. I started out the project using ActiveModel::Serializers and chose collection+JSON as the media type. I wanted to give collection+json a try after using it at REST Fest 2015. This series of posts will show not why ActiveModel::Serializers is bad, but bad for what I tried using it for.

AMS + JSON API

ActiveModel::Serializer is heavily geared towards using JSON API. Because of this it feels like I’ve had to bend backwards to get it to do collection+json. You can see what I did in my Collection+JSON with ActiveModel::Serializers post.

I use and abuse the meta key in that post. Based on what I saw in the source code for ActiveModel::Serializer I don’t think I used the key as intended. While I misuse it, it was the only way I could find to get variables into the serializer from the render method.

Serializer Context

Another issue I ran across was trying to get general context inside of a serializer. ActiveModel::Serializer does current_user for you, extending this to include extra methods resulted in a fairly hacky monkey patch.

class ApplicationController < ActionController::Base
  # Extend what the Serializer can see by injecting the
  # current application.
  def _render_with_renderer_json(resource, options)
    options[:current_application] = current_application
    super(resource, options)
  end
end

This is what I added in order to push the current OAuth application inside of the serializer. I found this by browsing the source of ActiveModel::Serializer enough to finally see them do it for current_user. Inside of the serializer this is required:

def current_application
  @instance_options[:current_application]
end

It took a good deal of poking around in order to figure out how to inject this extra context into the serializer.

Collection Serializers

Collection serializers are second citizens in ActiveModel::Serializers. With collection+json, everything is a collection. Every resource I have in Artifact has an item and a collection serializer. While writing the collection serializers you start to notice things that are missing in the ActiveModel::Serializer::ArraySerializer. A lot of the context isn’t present and you see missing method errors more than you would expect.

Implicit Resources

One thing that seems like a nice feature of ActiveModel::Serializers is automatically detecting the serializer class. In practice I rarely use this feature. I have started treating API endpoints as resources and typically create a special serializer for the one endpoint. I found that I don’t create just a BookSerializer and use it everywhere.

For example, in Artifact you can create a collection of books. For this I would create a CollectionBookSerializer because the URLs contained within will be different than just a BookSerializer. I would also need the associated collection serializers.

Given this here is how a typical controller method renders:

def index
  render({
    :json => books,
    :each_serializer => CollectionBookSerializer,
    :serializer => CollectionBooksSerializer,
    :meta => { }, # "..."
  })
end

I personally now prefer explicitly referring to a serializer I want to use on an endpoint.

Next Time

These are all fairly minor pain points, but it has recently gotten me to thinking how I can make it better. In the next post I’ll go over what I came up with.

Rethinking Rails API Serializations Series

Signing Google Cloud Storage URLs

Posted on 29 Jan 2016

For my side project, Worfcam, I use Google Cloud Storage to host files. I have files that I want to be private most of the time, but allow anonymous access when I want. S3 has this by allowing you to sign a URL so it has an expiration. Luck for me Google Cloud Storage does this as well, the only problem was that the ruby SDK didn’t do this for you like the S3 SDK does.

After a lot of searching I finally figured out how to generated signed URLs that expired for Google Cloud Storage.

signer.rb

private_key = "..."
client_email = "..."
bucket = "..."
path = "..."

full_path = "/#{bucket}/#{path}"
expiration = 5.minutes.from_now.to_i

signature_string = [
  "GET",
  "",
  "",
  expiration,
  full_path,
].join("\n")

digest = OpenSSL::Digest::SHA256.new
signer = OpenSSL::PKey::RSA.new(private_key)
signature = Base64.strict_encode64(signer.sign(digest, signature_string))
signature = CGI.escape(signature)

"https://storage.googleapis.com#{full_path}?GoogleAccessId=#{client_email}&Expires=#{expiration}&Signature=#{signature}"

In the end this is pretty simple, but it took a long time to figure out exactly what I needed. This uses the private key that Google lets you download from the console, not the interoperability keys you can also downlown. This is very well explained in a Cloud Storage support page, but it only lists Java, Python, Go, and C# as the languages.

I couldn’t find anything for ruby so hopefully this helps someone else when they try to implement signed urls with ruby. This is also available as a gist.

Redis Cache for Google Cloud Storage Access

Posted on 19 Jan 2016

In my side project, worfcam, I host photos on Google Cloud Storage. When viewing pages that had multiple thumbnails on them I noticed loading photos took a while. Each photo took 400-500ms which isn’t bad but comes with a noticeable flicker of them loading. My application serves photos out, users do not go directly to GCS to load them. Since users are coming to my application, I was able to add a simple caching layer in front of the ruby GCS service.

This is a cache layer above the google-ruby-api-client gem.

`gcs_service_cacher.rb`

class GcsServiceCacher
  SIXTY_MINUTES = 60 * 60

  def initialize(service, redis_pool)
    @service = service
    @redis_pool = redis_pool # This is a ConnectionPool
  end

  def method_missing(method, *args)
    if @service.respond_to?(method)
      @service.send(method, *args)
    else
      super
    end
  end

  def insert_object(bucket, name:, upload_source:, cache: false)
    if cache
      @redis_pool.with do |redis|
        redis.setex("photo-cache:#{name}:binary", SIXTY_MINUTES, upload_source.read)
      end

      upload_source.rewind
    end

    @service.insert_object(bucket, name: name, upload_source: upload_source)
  end

  def get_object(bucket, path, download_dest:, cache: false)
    if cache
      data = @redis_pool.with do |redis|
        redis.get("photo-cache:#{path}:binary")
      end

      return StringIO.new(data) unless data.nil?
    end

    body = @service.get_object(bucket, path, download_dest: download_dest)
    body.rewind

    if cache
      @redis_pool.with do |redis|
        redis.setex("photo-cache:#{path}:binary", SIXTY_MINUTES, body.read)
      end
      body.rewind
    end

    body
  end
end

This is a pretty simple class. The two important methods are #insert_object and #get_obejct, everything else is delegated down to the actual service class via #method_missing. The redis_pool object is a ConnectionPool instance loaded with Redis.

The #insert_method class just sets the upload_source binary data to a redis key and delegates down. I have photos expire after 60 minutes to not clog up redis with old photos.

It should be noted that these two methods keep the same method signature as the service class itself, except for the cache keyword. I needed a way to let objects above this one hint that a file should not be cached. I only wanted to cache small thumbnails in redis to not completely blow away the memory. Thumbnails are the most accessed file as well.

The #get_object class is a little more complicated. If cacheing is enabled, it will check redis to see if the key exists and return that if it does. Otherwise it will pull it down from the service and cache it for quicker access next read.

Final results

With the caching layer in place, I have noticed extremely quick reads compared to loading from GCS each page load. Photo loading dropped to about 100-200ms total. Adding in correct caching headers completely reduces load time as well.

A nice side benefit to this is potentially reducing GCS bandwidth. Worfcam is completely hosted inside of the Google Cloud so I don’t have bandwidth fees between the Compute Engine and Cloud Storage, but if you weren’t hosted inside of Compute Engine then you could read from your caching layer to prevent huge bandwidth bills. Worfcam pulls down each file at least once and rolls every hour into a gif so this would help me out if I ever decide to host elsewhere.

Hosting Your Own Kubernetes NodePort Load Balancer

Posted on 11 Jan 2016

I recently switched from using a regular Loadbalancer in kubernetes to using a NodePort load balancer. I switched because I wanted to get away from using the fairly expensive network load balancer in Google Cloud Compute in a personal project.

I did this by creating a new service for nginx inside of the cluster, and setting it to use a NodePort load balancer. Then creating a micro instance that hosts another nginx that proxies to the private IPs of the cluster nodes.

nginx-service

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  type: NodePort
  ports:
    - port: 80
      nodePort: 30080
      name: http
    - port: 443
      nodePort: 30443
      name: https
  selector:
    name: nginx

The import pieces here are type and the nodePort line inside of ports. This will cause kubernetes to listen on 30080 and 30443 respectively. This port has to be between the range of 30000-32767.

Once the service is defined as a NodePort load balancer you need to set up the micro instance.

/etc/nginx/sites-enabled/example

upstream example-https {
  server 10.0.0.1:30443; # private node ip
  server 10.0.0.2:30443; # private node ip
}

server {
  # ... ssl setup ...

  location / {
    # ... proxy_pass configuration ...
    proxy_pass https://example-https;
  }

  # ... other nginx setup ...
}

Here we have a stripped down sample file sites-enabled/example that nginx will load and start serving. I have SSL set up and proxy_pass to the private IP of each kubernetes node. I do keep SSL between this front nginx and the nginx inside of the cluster.

There isn’t anything special about this file so I’ve stripped out the unimportant pieces. You can see a more complete example of my nginx files in the post hosting nginx in docker.

Drawbacks

There are a few drawbacks using this approach. I’ve had at least once the private IP change (I think the nodes rebooted themselves after a crash) and I had to update this frontend nginx. You also have to manually add in an new nodes. The normal load balancer would have handled each case for me.

Overall I’ve been happy with this setup. It was nice to see that kubernetes let me handle my own external load balancing.

Collection+JSON Ruby Client

Posted on 30 Nov 2015

At REST Fest this year I did a small Collection+JSON ruby client to work against a hypermedia server. The full source code is hosted on github.

Client classes

class CollectionJSONMiddleware < Faraday::Middleware
  def initialize(app)
    @app = app
  end

  def call(env)
    env[:request_headers]["Accept"] = "application/vnd.collection+json"
    @app.call(env)
  end
end

A small middleware class to shove the correct accept header in.

class Client
  def initialize(url)
    @url = url
  end

  def get(url)
    response = connection.get(url)
    json = JSON.parse(response.body)
    Collection.new(json)
  end

  def delete(item)
    response = connection.delete(item.href)
    json = JSON.parse(response.body)
    Collection.new(json)
  end

  def update(item, template)
    response = connection.put(item.href) do |req|
      req.headers["Content-Type"] = "application/vnd.collection+json"
      req.body = template.to_json
    end
    json = JSON.parse(response.body)
    Collection.new(json)
  end

  def add_template(collection, template)
    response = connection.post(collection.href) do |req|
      req.headers["Content-Type"] = "application/vnd.collection+json"
      req.body = template.to_json
    end
    json = JSON.parse(response.body)
    Collection.new(json)
  end

  private

  def connection
    @connection ||= Faraday.new(@url) do |conn|
      conn.use CollectionJSONMiddleware
      conn.adapter Faraday.default_adapter
    end
  end
end

Model Classes

These are Collection+JSON classes that are not specific to the underlying data at all.

class Collection
  attr_reader :version, :href, :links, :items, :queries, :template

  def initialize(json)
    collection = json.fetch("collection")

    @version = collection.fetch("version", nil)
    @href = collection.fetch("href", nil)
    @links = collection.fetch("links", []).map { |link| Link.new(link) }
    @items = collection.fetch("items", []).map { |item| Item.new(item) }
    @queries = collection.fetch("queries", []).map do |query|
      Query.new(query)
    end
    template = collection.fetch("template", nil)
    @template = Template.new(template) if template
  end

  def query(rel)
    queries.detect do |query|
      query.rel == rel
    end
  end
end

class Query
  attr_reader :rel, :href, :prompt, :data

  def initialize(attributes)
    @rel = attributes.fetch("rel")
    @href = attributes.fetch("href")
    @prompt = attributes.fetch("prompt")
    @data = attributes.fetch("data", []).map do |data|
      DataItem.new(data)
    end
  end

  def set(data, value)
    @values ||= Faraday::Utils::ParamsHash.new
    @values[data.name] = value
  end

  def to_url
    uri = URI.parse(href)
    uri.query = @values.to_query
    uri.to_s
  end
end

class Template
  attr_reader :prompt, :rel, :data

  def initialize(attributes)
    @prompt = attributes.fetch("prompt", nil)
    @data = attributes.fetch("data", []).map do |data|
      DataItem.new(data)
    end
  end

  def set(data, value)
    @values ||= {}
    @values[data.name] = value
  end

  def to_json
    {
      :template => {
        :data => data.map do |data|
          {
            :name => data.name,
            :value => @values[data.name],
          }
        end
      },
    }.to_json
  end
end

class Link
  def initialize(attributes)
    @attributes = attributes
  end

  def to_s
    "#{rel}: #{href}"
  end

  def [](key)
    @attributes[key]
  end

  def rel
    @attributes.fetch("rel")
  end

  def href
    @attributes.fetch("href")
  end
end

class Item
  attr_reader :href, :data, :links

  def initialize(attributes)
    @href = attributes.fetch("href")
    @data = attributes.fetch("data", []).map do |data|
      DataItem.new(data)
    end
    @links = attributes.fetch("links", []).map { |link| Link.new(link) }
  end

  def attribute(name)
    data.detect { |attribute| attribute.name == name }
  end
end

class DataItem
  def initialize(attributes)
    @attributes = attributes
  end

  def to_s
    "#{name}: #{value}"
  end

  def [](key)
    @attributes[key]
  end

  def name
    @attributes.fetch("name")
  end

  def value
    @attributes.fetch("value", nil)
  end

  def prompt
    @attributes.fetch("prompt", nil)
  end
end

Putting it together

Here is a class that lets you perform general collection+json actions on the command line. It is slightly specific for the app at REST Fest, but there isn’t that much specific.

class CommandLine
  attr_reader :collection

  def run
    @collection ||= client.get("/")

    puts "What do you want to do?"
    puts "  - Refresh Items (refresh)"
    puts "  - Items (items)"
    puts "  - Queries (queries)"
    puts "  - Template '#{collection.template.prompt}' (template)" if collection.template
    puts "  - Delete (delete)"
    puts "  - Edit (edit)"
    puts "  - Exit (exit)"

    choice = gets.chomp
    puts

    case choice
    when "refresh"
      refresh
    when "items"
      print_items(collection.items)
    when "queries"
      queries
    when "template"
      template
    when "delete"
      delete
    when "edit"
      edit
    when "exit"
      return
    end

    run
  end

  def refresh
    @collection = client.get(collection.href)
  end

  def queries
    queries = collection.queries.map do |query|
      "#{query.prompt} (#{query.rel})"
    end
    puts queries

    puts "Choose a query"
    query = gets.chomp

    search_query = collection.query(query)

    unless search_query
      puts "Query not found"
      return
    end

    puts

    if search_query.data.all? { |data| data.value.empty? }
      edit = "yes"
    else
      search_query.data.each do |data|
        puts "#{data.prompt}: #{data.value}"
      end

      puts "Edit? (yes/no)"
      edit = gets.chomp
      puts
    end

    case edit
    when "yes"
      search_query.data.each do |data|
        if data.value.present?
          puts "#{data.prompt} (#{data.value}):"
        else
          puts "#{data.prompt}:"
        end
        search_query.set(data, gets.chomp)
      end
      puts
    when "no"
      search_query.data.each do |data|
        search_query.set(data, data.value)
      end
    end

    collection = client.get(search_query.to_url)

    print_items(collection.items)
  end

  def template
    template = collection.template
    puts template.prompt
    template.data.each do |data|
      if data.value.present?
        puts "#{data.prompt} (#{data.value}):"
      else
        puts "#{data.prompt}:"
      end
      template.set(data, gets.chomp)
    end
    puts

    @collection = client.add_template(collection, template)
  end

  def delete
    item = choose_item
    @collection = client.delete(item)
  end

  def edit
    item = choose_item

    @collection = client.get(item.href)

    item = collection.items.first
    template = collection.template

    item.data.each do |data|
      next unless template.data.any? { |td| td.name == data.name }
      if data.value.present?
        puts "#{data.prompt} (#{data.value}):"
      else
        puts "#{data.prompt}:"
      end
      template.set(data, gets.chomp)
    end

    @collection = client.update(item, template)
  end

  private

  def client
    @client ||= Client.new("http://hyper-hackery.herokuapp.com/")
  end

  def choose_item
    print_items(collection.items, true)

    puts "Choose an item"
    item_number = gets.chomp.to_i

    collection.items[item_number]
  end

  def print_items(items, include_numbers = false)
    puts "Returned items"
    items.each_with_index do |item, index|
      value = item.attribute("completed").value
      finished = value == "true" || value == true ? "(X)" : "( )"
      number = " (#{index})" if include_numbers
      puts "  - #{finished}#{number} #{item.attribute("title").value}"
    end
    puts
  end
end

CommandLine.new.run

Here is a sample run:

ruby perform_search.rb
What do you want to do?
  - Refresh Items (refresh)
  - Items (items)
  - Queries (queries)
  - Template 'Add ToDo' (template)
  - Delete (delete)
  - Edit (edit)
  - Exit (exit)
items

Returned items
  - (X) one more test again
  - ( ) danny boy
  - ( ) fishing
  - ( ) goofing around
  - ( ) one more simple test
  - (X) one more minor test
  - ( ) update these docs
  - ( ) wheee
  - ( ) asdasd
  - ( ) something to test
  - (X) update cj parser
  - ( ) trying to test
  - ( ) one more simple test
  - (X) additional stuff
  - ( ) ssssss
  - (X) more stuff
  - ( ) one more simple test

What do you want to do?
  - Refresh Items (refresh)
  - Items (items)
  - Queries (queries)
  - Template 'Add ToDo' (template)
  - Delete (delete)
  - Edit (edit)
  - Exit (exit)
queries

Active ToDos (active collection)
Completed ToDos (completed collection)
Search ToDos (search)
Choose a query
search

Title:
one

Returned items
  - (X) one more test again
  - ( ) one more simple test
  - (X) one more minor test
  - ( ) one more simple test
  - ( ) one more simple test

What do you want to do?
  - Refresh Items (refresh)
  - Items (items)
  - Queries (queries)
  - Template 'Add ToDo' (template)
  - Delete (delete)
  - Edit (edit)
  - Exit (exit)
exit

Eric Oestrich

Recent Posts