Redis Cache for Google Cloud Storage Access

Posted on 19 Jan 2016 by Eric Oestrich

In my side project, worfcam, I host photos on Google Cloud Storage. When viewing pages that had multiple thumbnails on them I noticed loading photos took a while. Each photo took 400-500ms which isn’t bad but comes with a noticeable flicker of them loading. My application serves photos out, users do not go directly to GCS to load them. Since users are coming to my application, I was able to add a simple caching layer in front of the ruby GCS service.

This is a cache layer above the google-ruby-api-client gem.

gcs_service_cacher.rb
class GcsServiceCacher
  SIXTY_MINUTES = 60 * 60

  def initialize(service, redis_pool)
    @service = service
    @redis_pool = redis_pool # This is a ConnectionPool
  end

  def method_missing(method, *args)
    if @service.respond_to?(method)
      @service.send(method, *args)
    else
      super
    end
  end

  def insert_object(bucket, name:, upload_source:, cache: false)
    if cache
      @redis_pool.with do |redis|
        redis.setex("photo-cache:#{name}:binary", SIXTY_MINUTES, upload_source.read)
      end

      upload_source.rewind
    end

    @service.insert_object(bucket, name: name, upload_source: upload_source)
  end

  def get_object(bucket, path, download_dest:, cache: false)
    if cache
      data = @redis_pool.with do |redis|
        redis.get("photo-cache:#{path}:binary")
      end

      return StringIO.new(data) unless data.nil?
    end

    body = @service.get_object(bucket, path, download_dest: download_dest)
    body.rewind

    if cache
      @redis_pool.with do |redis|
        redis.setex("photo-cache:#{path}:binary", SIXTY_MINUTES, body.read)
      end
      body.rewind
    end

    body
  end
end

This is a pretty simple class. The two important methods are #insert_object and #get_obejct, everything else is delegated down to the actual service class via #method_missing. The redis_pool object is a ConnectionPool instance loaded with Redis.

The #insert_method class just sets the upload_source binary data to a redis key and delegates down. I have photos expire after 60 minutes to not clog up redis with old photos.

It should be noted that these two methods keep the same method signature as the service class itself, except for the cache keyword. I needed a way to let objects above this one hint that a file should not be cached. I only wanted to cache small thumbnails in redis to not completely blow away the memory. Thumbnails are the most accessed file as well.

The #get_object class is a little more complicated. If cacheing is enabled, it will check redis to see if the key exists and return that if it does. Otherwise it will pull it down from the service and cache it for quicker access next read.

Final results

With the caching layer in place, I have noticed extremely quick reads compared to loading from GCS each page load. Photo loading dropped to about 100-200ms total. Adding in correct caching headers completely reduces load time as well.

A nice side benefit to this is potentially reducing GCS bandwidth. Worfcam is completely hosted inside of the Google Cloud so I don’t have bandwidth fees between the Compute Engine and Cloud Storage, but if you weren’t hosted inside of Compute Engine then you could read from your caching layer to prevent huge bandwidth bills. Worfcam pulls down each file at least once and rolls every hour into a gif so this would help me out if I ever decide to host elsewhere.

comments powered by Disqus
Creative Commons License
This site's content is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise specified. Code on this site is licensed under the MIT License unless otherwise specified.