RailsConf Day 2

I couldn’t keep up with writing these every day while attending all the conference festivities, so here we are, a solid week late, just getting down to writing the day 2 wrap-up. Oh well.

This was the rainy day, to give context to those of you who were there. Now onward to the talks...

Ruby on Rails: Tasty Burgers – Aaron Patterson (@tenderlove)

This talk was billed as a deep dive into some of the lower level libraries in Rails to see how the pieces all fit together. I was expecting that to mean ActionPack and ActiveModel, but turns out it was more like SQLite, ERb and YAML. That’s cool, but I still hunger for someone with deep knowledge of these pieces to explain core parts of Rails (like ActionPack and ActiveModel).

SQLite

  • the sqlite3-ruby gem was written by Jamis Buck
  • default Rails database
  • ex:
    connection.prepare('...')
    connection.execute
  • prepared statements are better because they can be cached
  • AR does not use prepare, it uses connection.execute only

So, how does AR connect to a database?

Roughly:

AR -> Connection Pool -> db adapter

The connection pool is an abstraction layer on top of the adapters; it passes back a connection object.
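To make that layering a bit more concrete, here is a minimal sketch of a pool handing out connection objects. TinyPool and FakeAdapter are names I made up for illustration; this is not how ActiveRecord implements its pool, just the general idea, using Ruby's built-in Queue.

```ruby
# A toy connection pool: not ActiveRecord's implementation, just the idea.
class FakeAdapter
  # Stands in for a real database adapter (hypothetical).
  def execute(sql)
    "ran: #{sql}"
  end
end

class TinyPool
  def initialize(size)
    @connections = Queue.new
    size.times { @connections << FakeAdapter.new }
  end

  # Check a connection out, yield it, and always check it back in.
  def with_connection
    connection = @connections.pop
    yield connection
  ensure
    @connections << connection if connection
  end

  # Number of connections currently checked in.
  def available
    @connections.size
  end
end

pool = TinyPool.new(2)
result = pool.with_connection { |conn| conn.execute('SELECT 1') }
```

The caller never talks to an adapter class directly; it only ever sees whatever connection object the pool passes back.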

SQLite can be easily used outside of Rails too:

connection = SQLite3::Database.new('file.sqlite3')

Twisting SQLite3

See the slides for some SQLite3 hacks.

ERb

  • erb templates get converted to ruby code and stored in memory
  • one cool trick is to see the actual converted code with

    erb.src

    erb = ERB.new(File.read(FILE), nil, nil, '@foo')
    erb.src
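Here is a small self-contained version of that trick, as a sketch. It assumes a recent Ruby, where the output variable is passed as the eoutvar keyword rather than as the fourth positional argument; the template text and the @out name are mine, not from the talk.

```ruby
require 'erb'

template = "Hello, <%= name %>!"

# Newer Ruby versions take the output variable as a keyword argument.
erb = ERB.new(template, eoutvar: '@out')

compiled = erb.src              # the generated Ruby source, as a String
name = 'RailsConf'
rendered = erb.result(binding)  # evaluate the compiled code with access to local variables
```

Printing compiled shows plain Ruby that appends to @out, which is all a template really is once ERb is done with it.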

Rack

  • stands for Ruby Webserver interface ACK! (jk)
  • can be used outside of Rails pretty easily
  • see slides for ERb handler written in Rack
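I don't have the handler from the slides, so this is my own rough sketch of what an ERb handler in Rack might look like. The only thing it relies on is the Rack contract: an object that responds to call(env) and returns a status, a headers hash, and a body that responds to each. ErbHandler and the template are hypothetical.

```ruby
require 'erb'

# A Rack application is any object that responds to #call(env) and returns
# [status, headers, body], where body responds to #each.
class ErbHandler
  def initialize(template)
    @erb = ERB.new(template)
  end

  def call(env)
    path = env['PATH_INFO']        # standard Rack env key
    body = @erb.result(binding)    # the template can see the local variable path
    [200, { 'content-type' => 'text/html' }, [body]]
  end
end

app = ErbHandler.new('<h1>You asked for <%= path %></h1>')

# No web server needed to try it: just call it with a minimal env hash.
status, headers, body = app.call({ 'PATH_INFO' => '/burgers' })
```

With the rack gem installed, putting run app in a config.ru should be enough to serve it, though I haven't wired this exact snippet up to a server.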

YAML

  • The new YAML parser is Psych, the old one is Syck and is unmaintained
  • JSON is a subset of YAML
  • With Psych we can just call to_json to get JSON
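A quick sketch of both points, assuming a Ruby where YAML is backed by Psych. The sample data is made up. Since typical JSON documents are also valid YAML, the YAML parser reads the JSON text directly, and Psych.to_json goes the other way.

```ruby
require 'yaml'
require 'json'

json_text = '{"talk": "Tasty Burgers", "tags": ["sqlite", "erb", "yaml"]}'

# A YAML parser can read this JSON document directly.
data = YAML.load(json_text)

# Psych can also emit JSON from a Ruby object.
emitted = Psych.to_json(data)

# Parse it back with the JSON library to confirm it really is JSON.
round_trip = JSON.parse(emitted)
```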

Redis, Rails, and Resque – Background Job Bliss by Chris Wanstrath

Redis

Redis is a key-value store for real data structures. So you can store strings, lists, sets, hashes, etc., and they will be of that type when you retrieve them.

How does Redis differ from Memcached?

  • Redis stores data in memory, writes to disk for persistence
  • Therefore, the data you intend to store has to all fit in memory
  • and it’s possible to store real data structures

How does Github use Redis?

  • routing table, used to retrieve routes like users/:repo -> what machine does this user live on
  • When a new repo is created, determine what FS to put new user on
  • nice admin interface for stuff like routing table

Unicorn

As opposed to Mongrel behind HAProxy, where requests get pushed to the app servers, the Unicorn workers pull requests, and load balancing is done by the Linux kernel. This is key.

Resque

  • Every queue is a list
  • jobs are stored as JSON in Redis
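To show what "a queue is a list of JSON" means in practice, here is a sketch that uses a plain in-memory Hash of Arrays as a stand-in for Redis lists, so it runs without a Redis server. The class/args payload shape matches what Resque stores; the enqueue and work_one helpers and the Archive job are hypothetical names of mine, not Resque's API.

```ruby
require 'json'

# In-memory stand-in for Redis lists: queue name => array of JSON strings.
LISTS = Hash.new { |hash, key| hash[key] = [] }

# A job is just a class with a perform method.
class Archive
  def self.perform(repo_id, branch)
    "archived repo #{repo_id} at #{branch}"
  end
end

# Push a job onto the tail of the queue's list as a JSON payload.
def enqueue(queue, klass, *args)
  LISTS[queue].push(JSON.generate({ 'class' => klass.name, 'args' => args }))
end

# Pop the oldest job from the head of the list and run it.
def work_one(queue)
  raw = LISTS[queue].shift
  return nil unless raw

  job = JSON.parse(raw)
  Object.const_get(job['class']).perform(*job['args'])
end

enqueue('file_serve', Archive, 42, 'master')
payload = LISTS['file_serve'].first   # the raw JSON sitting in the "list"
outcome = work_one('file_serve')
```

Because the payload is only a class name and some arguments, anything that can read the list and parse JSON can inspect or process the queue.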

One cool thing GitHub has done is to augment the process list output to contain info about the URL that initiated a job to be pushed into the background queue. This is easy to do by assigning to $0. This allows GitHub to see the URLs currently being served via the output of ps -awx.
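A minimal sketch of the $0 trick. The helper name and the title string are made up; I don't know the exact format GitHub uses.

```ruby
# Temporarily rename the current process so tools like ps show what it is doing.
def with_process_title(title)
  original = $0
  $0 = title
  yield
ensure
  $0 = original   # restore the old title even if the block raises
end

seen = nil
with_process_title('resque: job enqueued by /some/url') do
  seen = $0   # while the block runs, ps would show this title
end
```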

Resque queues within Github

Every 5 seconds a queue opens a connection to Redis, collects the messages that represent jobs to run, then closes the connection. This means jobs get processed as soon as they are created, instead of polling every 5 seconds, which is less ideal.

Handling failures within Resque

Resque can handle failures automatically. It supports reporting them to Hoptoad or storing them in Redis out of the box.

Scaling to hundreds of millions of requests: what worked and what didn’t – James Golick

James walked us through the trials and tribulations of scaling an app that was averaging 250 req/s. He went through 3 different setups before finding something that worked.

Hosting

Starting point

  • Engine Yard, mongrel, Nginx, haproxy

First Try

  • Amazon EC2 and MongoDB

The constant need to add more app servers started to seem crazy. Also, the cloud had general stability issues.

Second Try

  • non-cloud hosting with Softlayer

Softlayer was chosen because you don’t have to enter into a 1-year contract in order to sign up for service, and they have a quick provisioning time for adding new slices (2-4 hrs).

Result: response times dropped from 300-500ms to 98ms.

NoSQL

First Try

  • MongoDB

Chosen because of the ease of use, and it’s just intuitive for developers.

This worked great at first. However, at a certain number of rows (2 million, I believe, but don’t quote me), MongoDB started slowing down, which caused queue times to go up, which caused the site to go down. Attempts to remedy this included pruning rows from the database, but deleting rows locks the DB, which also crashed the site. Ultimately, deleting the entire database was the only solution. Fortunately, the data was only activity logging for admin use.

Second Try

  • “friendly” plugin which treats MySQL as a key/value store

[At this point I started researching this plugin and missed what the result of its use was, however, I can only assume that it didn't meet all the requirements]

Third Try

  • Cassandra

Cassandra has worked pretty well for them. The main complaints are that reads can be slow, and that running Cassandra with a small number of nodes can become tedious because of Cassandra’s resource-intensive re-balancing process when nodes are added or removed.

Major lesson: the most important thing to bring to any engineering problem is knowledge and understanding of your tools. Believing the hype and drinking the Kool-Aid will only get you into trouble.

Introduction to Cassandra and CassandraObject – Michael Koziarski

Cassandra has a lot of nice features, but to me it seems very restrictive. For example, any query you may want to run on your data needs to be determined in advance, and then a special column family needs to be created within Cassandra to handle each of these queries. So, if you ever want to find Users by last name, you need to have a UsersByLastName column family in Cassandra. Think of it as adding indexes in MySQL, except that if you don’t have a certain index added, you can’t query on that field at all. That being said, Cassandra seems very robust and designed for scaling and stability.
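Here is the idea as I understood it, sketched with plain Ruby hashes standing in for Cassandra storage. None of this is Cassandra's or CassandraObject's API; the names and sample data are purely illustrative.

```ruby
# Plain hashes standing in for Cassandra storage; names are illustrative only.
users = {}                                          # user id => attributes
users_by_last_name = Hash.new { |h, k| h[k] = [] }  # last name => user ids

# Every write has to maintain the lookup structure by hand.
add_user = lambda do |id, first, last|
  users[id] = { first: first, last: last }
  users_by_last_name[last] << id
end

add_user.call(1, 'Ada', 'Lovelace')
add_user.call(2, 'Grace', 'Hopper')
add_user.call(3, 'Augusta', 'Lovelace')

# Answering "users by last name" is then a direct lookup, not a scan.
lovelaces = users_by_last_name['Lovelace'].map { |id| users[id][:first] }
```

The catch is visible right away: if nobody thought to maintain a lookup by first name, there is simply no cheap way to ask that question later.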

CassandraObject

This is the ActiveRecord type of class for Cassandra. It looks nice, and copies ActiveRecord as much as it can, but isn’t as simple a drop-in as something like MongoMapper, purely because of the limitations of how Cassandra does stuff.

RailsConf Day 1

I’m a bit delirious from late nights and early mornings and lots of code talk. We all went out to Pratt’s Ale House the night before and weren’t our usual chipper selves this morning. Nonetheless, this is the first official day of RailsConf so let’s do it.

Note: I just want to get this stuff out there while it’s fresh in my mind. Once I catch up on everything I will refactor this.

Keynote: DHH

The grand master of ceremonies kicked off the day with an inspiring and very opinionated list of the best of the best features in Rails 3. And the winners are:

  • Bundler
  • ActiveRecord queries (chainable and lazily evaluated)
  • routes (nicer API)
  • ActionMailer (total rewrite)

Keynote: Michael Feathers

I hadn’t heard of Michael before, but he seems to be big in the industry. Wrote a book called “Working Effectively with Legacy Code”. He was a very sensible, even-keeled speaker. He spoke rationally, calmly and intelligently about his experiences working on various projects and things he’s learned to utilize and things he’s learned to avoid. One example of the types of wisdom droplets he disseminated:

“Novices start out writing good code, but as they become experts they write complex code as they understand too much.”

Interesting take on an effect I’ve definitely seen. Novices write better (in terms of clarity to others) code than the dude who has been developing on the project for 5+ years.

Check out Michael’s book, and @mfeathers on Twitter.

And Now The Real Presentations Start…

Building an API with Rails

This talk was a panel format composed of @joeferris (Thoughtbot, representing Hoptoad), @bitsweat (37 Signals), @noradio (Twitter), @technoweenie (GitHub) and @nytimes.

I chose this talk mainly based on the members of the panel. It was loosely moderated and had the feel of a political debate. The moderator would give a topic, then the members took turns rambling about their take on how to handle whatever the current topic was. My notes sum up their responses pretty well:

Authentication

Pretty much everyone agrees that OAuth is the way to go here, and all favored OAuth 2.0 over 1.0, though Twitter will continue to support OAuth 1.0 as well as 2.0 for now.

Response Formats

Hoptoad

  • only accepts xml
  • reasons are how write-heavy and data-heavy the API is

NY Times

  • an unusual use case: since NY Times has desirable data, there are lots of sites that aggregate that data and refresh every night, which consists of hitting all parts of the API to download new data.
  • supports xml and json

Twitter

  • their ideal is to only support json
  • fragment caches responses in memcached
  • caches subfragments of fragments, think something like text of a tweet that will never change, but # of retweets does change, so they can cache the text as a subfragment and reload the retweet count more often

Versioning

Twitter

  • versioning is hard
  • main questions are how to split logic within the app and how to alter request params to signify version

37 signals

  • advocates for not needing versioning, calls it over-engineering
  • rely on devs to update their stuff to make things work with new versions

NY Times

  • no versioning, more dev-centric crowd

Scaling

  • varnish for cache invalidation
  • HTTP caching – expires headers

Code Separation

  • GitHub uses separate controllers for API
  • also might move API to sinatra
  • Twitter used to have API, mobile and web req’s all in same controller action, but found the different requests required diff logic so code got complex, as conditionals were added for each variation. The conclusion was that it’s ok to have duplication in basic controller logic in order to separate the API out.
  • 37 Signals advocates for the golden path approach, respond_with(@resource), and life is really that simple

Security

  • Twitter failed to sanitize string when eval’ing JSON response from API and got XSS’ed. (crazy, what?)
  • authentication vs authorization – 37 Signals made a point of realizing the difference between these two.

Dev Communication

  • good to have a status page
  • keep docs up to date
  • use your API, helps to fine tune documentation
  • build community around API, GitHub

Don’t Repeat Yourself Repeat Others – John Nunemaker

This was the second time I’ve seen a presentation from John, so I knew it would be good. John is energetic and passionate, but his best trait is his ability to explain fundamental programming concepts in ways that aren’t demeaning to newcomers but still provide value to more experienced devs. For example, even while teaching us what include and extend do, he gave anecdotes about the stumbling blocks he hit when learning Ruby, which every developer can relate to and possibly even learn from when tackling future obstacles. The theme of this talk was something along the lines of: I wrote MongoMapper and learned a ton about ActiveRecord, and even though people thought I was reinventing the wheel, it was totally worth it, so reinvent the wheel for your own betterment.

Here’s what I have from my notes.

Some highlights

  • autoload Person, 'path/to/person'
    lazy loader used in Rails and MongoMapper
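  • hash1 = {:foo => "bar"}
    hash2 = hash1.clone
    hash2[:foo] << " surprise!"
    hash1[:foo]
    => "bar surprise!"

    The reason for this is that clone and dup do a shallow copy, so the internals of both hashes point to the same instances. Mutating a value in place through one hash shows up in the other. (Plain assignment, hash2[:foo] = ..., only changes hash2.)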

  • 2 book recommendations “Ruby Design Patterns” and “Refactoring”
  • MongoMapper plugin architecture is cool
  • Plucky, the underlying class that translates MongoMapper finder calls into queries
  • Writing/blogging is super important. It forces you to recollect, rethink, and solidify your understanding of whatever you are writing about
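Going back to the clone/dup item in the list above: one common way around the shallow copy is a Marshal round trip, which is my addition here and not something from John's talk. A small sketch comparing the two (it only works for objects Marshal can serialize):

```ruby
original = { :foo => String.new('bar') }   # an unfrozen String as the value

shallow = original.clone                        # new Hash, same String inside
deep    = Marshal.load(Marshal.dump(original))  # new Hash, new String inside

shallow[:foo] << '!'   # mutates the String shared with original

original[:foo]  # => "bar!"
deep[:foo]      # => "bar"
```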

John’s slides [link coming soon]

The Present Future of OAuth – Michael Bleigh

OAuth is mysterious, but Michael came up with an analogy that totally made it click for me. Here’s how it works. We’ll use the Twitter API as an example. Say you want to give users the option to authorize your app to pull in their friends timeline. This requires an authenticated API call to Twitter. Using OAuth, the steps are:

  1. User clicks “login” or “authorize” from your app and is redirected to Twitter. With this request are passed along a client id and a client secret that identifies your app to Twitter.
  2. User logs into Twitter and chooses to accept or deny access to your app.
  3. Twitter creates a key card (like a hotel) and gives it to your app.

Why is it a key card? Really it’s a token that you can pass back to Twitter when you want to make authenticated API calls, but you can think of it as a key card because it’s only valid as long as Twitter keeps it around. They can choose to invalidate your card at any time, but while it is active it is your key to gain special access.
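The key card idea can be sketched as a toy, in-memory stand-in for the provider side. This is not the OAuth wire protocol and not Twitter's API; FrontDesk and its methods are hypothetical names that only model issuing, checking, and revoking a token.

```ruby
require 'securerandom'

# Toy stand-in for an OAuth provider's token store (names are hypothetical).
class FrontDesk
  def initialize
    @cards = {}   # token => user it grants access to
  end

  # Step 3: the user approved the app, so hand the app a key card.
  def issue(user)
    token = SecureRandom.hex(16)
    @cards[token] = user
    token
  end

  # Every authenticated API call presents the card.
  def timeline_for(token)
    user = @cards[token]
    raise ArgumentError, 'invalid or revoked token' unless user

    "#{user}'s friends timeline"
  end

  # The provider can invalidate the card at any time.
  def revoke(token)
    @cards.delete(token)
  end
end

desk  = FrontDesk.new
token = desk.issue('alice')
first_call = desk.timeline_for(token)

desk.revoke(token)
second_call = begin
  desk.timeline_for(token)
rescue ArgumentError => e
  e.message
end
```

The app never sees the user's password at any point; all it holds is a card that works until the desk says it doesn't.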

OAuth 2.0 > OAuth 1.0

No one should be using OAuth 1.0 any more. The biggest part of OAuth 2.0 is its support for multiple flows that accommodate granting authorization from multiple devices like phones, native apps, Xbox 360, etc. There are 6 flows available. Some of them are:

  • user agent flow (your app authenticates itself using a client secret, but you don’t require user authorization)
  • device flow (Xbox – OAuth provider sends back a code that the user then has to enter into a browser)
  • password flow (request user password, send to app, app sends back key card)
  • assertion (you have a certificate)

Michael then walked us through an example of creating an OAuth 2.0 consumer and an OAuth 2.0 provider. Here’s the GitHub repo for that example.

Beyond Git Push Heroku: Battle Stories from Cloud Samurais – Teich (Heroku), Morten Bagai (Heroku)

Heroku is awesome. Use it. For those that don’t know, Heroku makes deploying your app as easy as:

git push heroku master

Here are some cool apps on Heroku:

  • scvngr – came out of nowhere, grew fast
  • Syphir – define more complex filters for gmail, nicer UI, delayed mail (awesome)
  • CloudApp – makes file sharing easy, take screenshot, app uploads to s3 in background and creates short url, paste url to your friend

Heroku add-ons

Heroku will be launching a way for developers to build add-ons. Some examples of ones currently available are Memcached and MongoDB.

Cloud Keiretsu FTW (small independent parts working together).

ZOMG: Domain-driven Test-assisted Production Rails Crisis Interventions – Rick Bradley

This talk was all about the steps to take when being contracted onto a project that is in a “rescue mission” state:

  • start by doing light fixes, UI bugs and whatnot, easing your way in
  • always commit something cleaner than how you found it
  • don’t dig deeper than absolutely necessary
  • acts_as_ass (avoid “the guru”, funny story, I’ll tell you about it later)

The rinse and repeat process is: reason about the code, characterize the code (via code comments) and refactor the code. There are no giant “fix everything” style commits. It’s a process of characterizing the code bit by bit until you understand it, and then refactoring.

When coming into an untested project that you know will be refactored, they drop in an rspec extension to add a currently method to specs, so you know what functionality will be disappearing.

currently('it does this crazy thing') do
  CrazyThing.crazy?.should be_true
end

Rick’s slides

Closing Keynote

Yehuda gave a talk about how to contribute to Rails and the reasons people shy away from it. It can be as simple as removing warnings. The #6 top committer to Rails did nothing but focus on removing the warnings that Rails was generating.

2010 Ruby Heroes are:

  • Jose Valim – agnostic generators, respond_with
  • Nick Quaranto – RubyGems.org
  • Xavier Noria
  • Aaron Patterson – nokogiri
  • Wayne Seguin – RVM
  • Gregory Brown – prawn, ruby mendicant

BOF Talks

Finally, I attended talks on Spree, Restfulie and an ActionPack deep dive led by Yehuda. No time to write about these now as I have to head back to the convention center.

Exclude Pages by Slug in WordPress get_pages

The typical way to exclude pages when using get_pages is like this:

get_pages(array(
  "exclude" => "1,2,3"
));

Passing in a comma-separated list of the page IDs that you wish to exclude is easy for a machine to handle, but makes no sense for people. Also, page IDs can change when migrating the site from your local dev setup onto the production server, and as a client starts editing content and creating/deleting pages. For a million reasons, this approach is flaky at best.

After lots of looking for a way to hack this in using WordPress hooks, I found none. The issue is that the get_pages filter only passes along the resulting array of pages, but there is no way to access the initial query parameters. There is also a way that requires hacking wordpress core files. Anyway, here’s how I ended up doing it:

Add this function to your functions.php file in your theme directory:

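function get_ids_from_slugs($slugs){
  // Split the comma-separated slug list, tolerating a space after each comma
  $slugs = preg_split("/,\s?/", $slugs);
  $ids = array();
  foreach($slugs as $page_slug){
    $page = get_page_by_path($page_slug);
    // Skip slugs that don't match a page, so we never read ->ID on null
    if ($page) {
      array_push($ids, $page->ID);
    }
  }
  // get_pages takes the excluded IDs as a comma-separated string
  return implode(",", $ids);
}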

Then call it when you call get_pages with an exclude option:

get_pages(array(
  "exclude" => get_ids_from_slugs("home, hidden-page")
));

It’s not the most elegant solution, but it won’t get stomped if the client updates WordPress, and it doesn’t rely on any obscure facets of WordPress that may change and break its functionality, so it works for me.

Using named_scope with finder_sql

Named scopes and associations using finder_sql don’t mix. And there really is no solution to that problem. Rails has no way to inject conditions into raw SQL that may or may not follow Active Record conventions. But, the need may come up that you want to return a set of records from an association driven by finder_sql, but also have the freedom to chain that result set with other named_scopes in your app. Here’s how I do it:

Let’s pretend that this query needs to be expressed in SQL:

class Owner < ActiveRecord::Base
  has_many :things, :finder_sql=>'SELECT things.id FROM things where (things.owner_id = #{id}) '
end

So this will return a collection of ActiveRecord Thing objects as an array, but you can’t call any other named_scopes on that array. Here’s the way around that:

class ThingsController < ApplicationController
  def index
    @things = Thing.scoped({:conditions => {:id => @owner.things.map(&:id)}})
  end
end

So, Thing.scoped will return a named_scope so you can then take advantage of both a custom association driven by finder_sql and still utilize named_scopes. This solution isn’t perfect, but it just may get you out of a jam sometime.

Useful Regular Expressions for Ruby

I intend to make this post a running tally of useful regex’s that I use, but for now I just have one.

This will extract a URL from a string. I’m using it to tinyurl URLs in a twitter post that are posted from the Firebelly CMS.

/(http|https):\/\/([a-z0-9]+)[-.]?([a-z0-9]+).[a-z]{2,5}(([0-9]{1,5})?[\/\?][[\S]]?)?/ix
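For comparison, here is a much more permissive sketch that is not the regex above. It grabs everything from the scheme up to the next whitespace and then trims trailing punctuation. URL_PATTERN, extract_urls and the sample text are my own names and data for illustration; it will happily accept things that aren't valid URLs, which is fine for shortening links in a tweet but not for validation.

```ruby
# A deliberately permissive pattern: scheme, then everything up to whitespace.
URL_PATTERN = %r{https?://\S+}i

def extract_urls(text)
  # Trim punctuation that usually belongs to the sentence, not the URL.
  text.scan(URL_PATTERN).map { |url| url.sub(/[.,;:!?)]+\z/, '') }
end

urls = extract_urls('New post at http://example.com/blog?id=3, mirrored on https://example.org.')
# => ["http://example.com/blog?id=3", "https://example.org"]
```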

Installing ffmpeg on Ubuntu Hardy

For the most part I followed this guide, but I kept getting this error:

ffmpeg: error while loading shared libraries: libavfilter.so.0: cannot open shared object file: No such file or directory

Turns out that the answer was also in a comment on that blog post:

ldconfig

Then test the install again with ffmpeg -v and it should work. Feel free to e-mail me if you need help figuring this out as I just wasted 3 hours of my life on it :)

Parse Query String Into Associative Array With JavaScript

So, there are lots of functions out there to do this, but this one is my favorite for a couple of reasons.

  1. It parses the entire query string as soon as you call the function, whereas other functions require you to pass in a parameter name and then extract only that parameter’s value from the query string
  2. It parses the query string into an associative array so key value pairs are easy to access later on.

Read More »

So You’ve Got A New Computer…

Here’s a quick checklist of the steps I just took to get all my web dev stuff set up on my new MacBook with Leopard. I do mostly Ruby on Rails work with some PHP stuff thrown in there, so we’ll go through installing MySQL, PHP, Rails, Git and Passenger.

Read More »

I like to ride bikes

commuter

This is what I ride to work every day.

Introduction

I’ve been developing websites for about 2 years, which makes me a relative newcomer to this scene. I currently work for Firebelly Design in Chicago as the lead developer. You can check us out here: http://www.firebellydesign.com. Read More »