Couldn’t keep up with writing these every day while attending all the conference festivities, so here we are, a solid week late, just getting down to write the day 2 wrap-up. Oh well.
This was the rainy day, to give context to those of you who were there. Now onward to the talks.
Ruby on Rails: Tasty Burgers – Aaron Patterson (@tenderlove)
This talk was billed as a deep dive into some of the lower-level libraries in Rails to see how the pieces all fit together. I was expecting that to mean ActionPack and ActiveModel, but it turned out to be more like SQLite, ERb, Rack and YAML. That’s cool, but I still hunger for someone with deep knowledge of these pieces to explain core parts of Rails (like ActionPack and ActiveModel).
SQLite
- the sqlite3 Ruby gem was written by Jamis Buck
- SQLite is the default Rails database
- ex: connection.prepare('...') vs. connection.execute; prepared statements are better because they can be cached and reused
- ActiveRecord (AR) does not use prepare; it only uses connection.execute
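To make the caching point concrete, here is a minimal sketch in plain Ruby. StubConnection and StatementCache are hypothetical names made up for illustration, not ActiveRecord or sqlite3 gem classes; the stub only counts how often SQL gets prepared, so the example runs without a database. With the real sqlite3 gem, SQLite3::Database#prepare returns a statement object that can be executed repeatedly.

```ruby
# Hypothetical stand-in for a database connection. It only counts how many
# times SQL had to be prepared (parsed/planned), so no database is needed.
class StubConnection
  attr_reader :prepare_count

  def initialize
    @prepare_count = 0
  end

  # Returns a callable that plays the role of a prepared statement.
  def prepare(sql)
    @prepare_count += 1
    ->(*binds) { "#{sql} with #{binds.inspect}" }
  end
end

# Hypothetical cache: prepare each distinct SQL string once, then reuse it.
class StatementCache
  def initialize(connection)
    @connection = connection
    @statements = {}
  end

  def execute(sql, *binds)
    statement = (@statements[sql] ||= @connection.prepare(sql))
    statement.call(*binds)
  end
end

conn = StubConnection.new
cache = StatementCache.new(conn)
3.times { |id| cache.execute('SELECT * FROM users WHERE id = ?', id) }
puts conn.prepare_count # => 1: one prepare, three executions
```

The trade-off is memory: a real cache would also need to bound how many statements it keeps around.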
So, how does AR connect to a database?
Roughly:
AR -> Connection Pool -> db adapter
The connection pool is an abstraction layer on top of the adapters; it passes back a connection object.
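A rough sketch of what a connection pool does, assuming nothing about ActiveRecord’s internals: ToyPool and FakeConnection are made-up names, and a Ruby Queue stands in for the bookkeeping a real pool does (timeouts, reaping dead connections and so on are left out).

```ruby
# Toy connection pool, NOT ActiveRecord's ConnectionPool: it just shows the
# idea of handing out a fixed set of connections and taking them back.
class ToyPool
  def initialize(size, &factory)
    @available = Queue.new # thread-safe FIFO from Ruby's core
    size.times { @available << factory.call }
  end

  # Check a connection out, yield it, and always check it back in.
  def with_connection
    conn = @available.pop # blocks while every connection is in use
    yield conn
  ensure
    @available << conn if conn
  end
end

# Hypothetical connection object standing in for a database adapter.
FakeConnection = Struct.new(:id) do
  def execute(sql)
    "conn #{id} ran: #{sql}"
  end
end

next_id = 0
pool = ToyPool.new(2) { FakeConnection.new(next_id += 1) }

# Five threads share two connections; each waits its turn if needed.
threads = 5.times.map do |i|
  Thread.new { pool.with_connection { |conn| conn.execute("SELECT #{i}") } }
end
results = threads.map(&:value)
puts results
```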
SQLite can easily be used outside of Rails too:
require 'sqlite3'
connection = SQLite3::Database.new('file.sqlite3')
Twisting SQLite3
See the slides for some SQLite3 hacks.
ERb
- ERb templates get converted to Ruby code, which is kept in memory
One cool trick is to look at the actual generated code with erb.src (the fourth argument to ERB.new names the output buffer variable):
require 'erb'
erb = ERB.new(File.read(FILE), nil, nil, '@foo')
erb.src
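A small self-contained version of the erb.src trick, using an inline template instead of a file. The template string and the name variable are made up for illustration; the eoutvar: keyword form of ERB.new needs Ruby 2.6 or newer (older Rubies take it as the fourth positional argument instead).

```ruby
require 'erb'

# An inline template stands in for a view file (made up for this example).
template = "Hello <%= name %>!\n"

# eoutvar: names the buffer variable the generated code appends to
# (the default is _erbout). The keyword form needs Ruby 2.6+.
erb = ERB.new(template, eoutvar: '@foo')

# erb.src is the plain Ruby source that ERB compiled the template into.
puts erb.src

# Evaluating that source against a binding renders the template.
name = 'RailsConf'
puts erb.result(binding)
```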
Rack
- stands for “Ruby Webserver Interface, ACK!” (just kidding)
- can be used outside of Rails pretty easily
- see the slides for an ERb handler written with Rack
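The handler in the slides is Aaron’s own; what follows is only a minimal sketch of the same idea under the basic Rack contract (an object with call(env) returning a status, a headers hash, and a body that responds to each). ErbHandler is a hypothetical name, and the env hash is faked so the example runs without the rack gem or a server.

```ruby
require 'erb'

# Hypothetical Rack-style app: any object with call(env) that returns
# [status, headers, body], where body responds to each.
class ErbHandler
  def initialize(template)
    @erb = ERB.new(template)
  end

  def call(env)
    path = env['PATH_INFO'] # standard Rack key for the request path
    html = @erb.result(binding) # the template can see the local `path`
    [200, { 'content-type' => 'text/html' }, [html]]
  end
end

app = ErbHandler.new('<h1>You asked for <%= path %></h1>')

# A Rack server would build env; a minimal fake one is enough here.
status, headers, body = app.call({ 'PATH_INFO' => '/hello' })
puts status
puts headers['content-type']
body.each { |chunk| puts chunk }
```

To serve it for real, a config.ru containing run ErbHandler.new('...') could be started with rackup.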
YAML
- The new YAML parser is Psych; the old one is Syck, which is unmaintained
- JSON is a subset of YAML
- With Psych we can just call to_json to get JSON
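Since JSON is essentially a subset of YAML, Psych can parse JSON text directly, and it also ships a Psych.to_json helper that emits JSON-style output. A quick check with made-up sample data:

```ruby
require 'yaml' # in modern Ruby this loads Psych
require 'json'

# JSON text is (practically) valid YAML, so the YAML parser reads it as-is.
json_text = '{"name": "Psych", "tags": ["yaml", "json"]}'
data = YAML.safe_load(json_text)
p data

# Psych can also emit JSON-style output directly.
json_out = Psych.to_json(data)
puts json_out

# Round-trip through the stdlib JSON parser to confirm it is real JSON.
p JSON.parse(json_out) == data
```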
Redis, Rails, and Resque: Background Job Bliss – Chris Wanstrath
Redis
Redis is a key-value store for real data structures. Values can be strings, lists, sets, sorted sets and hashes, and they come back as that type when you retrieve them.
How does Redis differ from Memcached?
- Redis stores data in memory and writes it to disk for persistence
- Therefore, all the data you intend to store has to fit in memory
- It’s possible to store real data structures
How does GitHub use Redis?
- a routing table, used to look up routes like users/:repo -> which machine this user lives on
- when a new repo is created, to determine which file server (FS) to put the new user on
- a nice admin interface for things like the routing table
Unicorn
With Mongrel behind HAProxy, requests get pushed to each app server. Unicorn workers instead pull requests off a shared socket, so load balancing is done by the Linux kernel. This is key.
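A minimal sketch of the pull model using only Ruby’s standard library on a Unix system (fork is unavailable on Windows). It is not Unicorn’s code: the parent binds one listening socket, forks workers that share it, and each worker calls accept only when it is ready, so the kernel decides which idle worker gets each connection.

```ruby
require 'socket'

# Bind once in the parent; port 0 lets the OS pick a free port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Fork workers that share the listening socket. Each worker pulls work by
# calling accept only when it is free; the kernel hands every pending
# connection to exactly one waiting worker.
workers = Array.new(2) do
  fork do
    trap('TERM') { exit!(0) } # skip at_exit handlers in the child
    loop do
      client = server.accept
      client.puts "handled by worker #{Process.pid}"
      client.close
    end
  end
end

# Play the client: make a few requests and note which worker answered.
replies = Array.new(4) do
  sock = TCPSocket.new('127.0.0.1', port)
  reply = sock.gets
  sock.close
  reply
end
puts replies

# Shut the workers down and reap them.
workers.each { |pid| Process.kill('TERM', pid) }
workers.each { |pid| Process.wait(pid) }
```

No worker is ever handed a request while it is busy, which is the property the push model behind a proxy has to approximate.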
Resque
- Every queue is a list
- Jobs are stored as JSON in Redis
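A minimal in-memory sketch of the “a queue is a list of JSON jobs” idea. A Ruby Hash of Arrays stands in for Redis (real Resque talks to a Redis server and namespaces its keys, e.g. resque:queue:archive), and Archive is a hypothetical job class; Resque jobs do follow the convention of a class with a class-level perform method.

```ruby
require 'json'

# Stand-in for Redis: list keys mapped to Ruby arrays. Real Resque pushes
# onto and pops off Redis lists; this only mimics the shape of the data.
REDIS = Hash.new { |hash, key| hash[key] = [] }

# Hypothetical job class with a class-level perform.
class Archive
  def self.perform(repo_id, branch)
    "archived repo #{repo_id} at #{branch}"
  end
end

# Enqueue: serialize the job as JSON and push it onto the queue's list.
def enqueue(queue, klass, *args)
  REDIS["queue:#{queue}"].push(JSON.generate({ 'class' => klass.name, 'args' => args }))
end

# Reserve: pop the oldest payload, decode it, look up the class, run it.
def work_one(queue)
  payload = REDIS["queue:#{queue}"].shift or return nil
  job = JSON.parse(payload)
  Object.const_get(job['class']).perform(*job['args'])
end

enqueue(:archive, Archive, 42, 'master')
puts REDIS['queue:archive'].first
puts work_one(:archive)
```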
One cool thing GitHub has done is augment the process list output with information about the URL that caused a job to be pushed onto the background queue. This is easy to do by assigning to $0. It lets GitHub see the URLs currently being served in the output of ps -awx.
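A sketch of the $0 trick; the URL is made up for illustration, and exactly what ps displays varies by platform (long titles may be truncated).

```ruby
# Hypothetical example: name the process after what it is working on.
url = '/some-user/some-repo/archive'
$0 = "resque: processing job queued by #{url}"

# $0 now reports the new title; ps shows it too on most Unix systems.
puts $0

# system returns nil instead of raising if ps is not installed.
system('ps', '-o', 'pid,command', '-p', Process.pid.to_s)
```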
Resque queues within GitHub
Every 5 seconds, a queue opens a connection to Redis, collects the messages that represent jobs to run, and then closes the connection. This means jobs get processed as soon as they are created, instead of polling every 5 seconds, which is less ideal.
Handling failures within Resque
Resque handles failures automatically; out of the box, failed jobs can be stored in Redis or reported to Hoptoad.
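A minimal sketch of what a Redis-style failure backend records, with an in-memory array standing in for Resque’s “failed” list. FlakyJob and perform_with_failure_tracking are hypothetical names; a Hoptoad-style backend would send the same information to that service instead of storing it.

```ruby
require 'json'
require 'time'

# In-memory stand-in for the Redis list that failed jobs are appended to.
FAILED = []

# Hypothetical job that always blows up.
class FlakyJob
  def self.perform(id)
    raise ArgumentError, "bad id #{id}"
  end
end

def perform_with_failure_tracking(queue, payload)
  job = JSON.parse(payload)
  Object.const_get(job['class']).perform(*job['args'])
rescue StandardError => e
  # Record enough to inspect and retry the job later.
  FAILED << {
    'failed_at' => Time.now.utc.iso8601,
    'payload'   => job,
    'exception' => e.class.name,
    'error'     => e.message,
    'backtrace' => e.backtrace,
    'queue'     => queue
  }
  nil
end

perform_with_failure_tracking('flaky', JSON.generate({ 'class' => 'FlakyJob', 'args' => [7] }))
puts JSON.pretty_generate(FAILED.last.reject { |key, _| key == 'backtrace' })
```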
Scaling to hundreds of millions of requests: what worked and what didn’t – James Golick
James walked us through the trials and tribulations of scaling an app that was averaging 250 req/s. He went through 3 different setups before finding something that worked.
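For a sense of scale, a steady average of 250 req/s adds up to hundreds of millions of requests within a month (simple arithmetic, assuming a 30-day month):

```latex
250\ \frac{\text{requests}}{\text{s}} \times 86\,400\ \frac{\text{s}}{\text{day}}
  = 21\,600\,000\ \frac{\text{requests}}{\text{day}},
\qquad
21.6 \times 10^{6} \times 30\ \text{days} \approx 6.5 \times 10^{8}\ \text{requests}
```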
Hosting
Starting point
- Engine Yard, Mongrel, Nginx, HAProxy
First Try
- Amazon EC2 and MongoDB
The constant need to add more app servers started to seem crazy. The cloud also had general stability issues.
Second Try
- non-cloud hosting with SoftLayer
SoftLayer was chosen because you don’t have to sign a one-year contract to get service, and they provision new slices quickly (2-4 hours).
Result: response times dropped from 300-500ms to 98ms.
NoSQL
First Try
- MongoDB
Chosen because of its ease of use; it’s just intuitive for developers.
This worked great at first. However, at a certain number of rows (2 million, I believe, but don’t quote me), MongoDB started slowing down, which pushed queue times up and eventually took the site down. Attempts to remedy this included pruning rows from the database, but deleting rows locks the DB, which also crashed the site. Ultimately, deleting the entire database was the only solution; fortunately, the data was only activity logging for admin use.
Second Try
- the “friendly” plugin, which treats MySQL as a key/value store
[At this point I started researching this plugin and missed what the result of using it was; I can only assume that it didn’t meet all the requirements.]
Third Try
- Cassandra
Cassandra has worked pretty well for them. The main complaints are that reads can be slow, and that running Cassandra with a small number of nodes can be tedious because of its resource-intensive rebalancing process when nodes are added or removed.
Major lesson: the most important thing to bring to any engineering problem is knowledge and understanding of your tools; believing the hype and drinking the Kool-Aid will only get you into trouble.
Introduction to Cassandra and CassandraObject – Michael Koziarski
Cassandra has a lot of nice features, but to me it seems very restrictive. For example, any query you may want to run on your data needs to be planned in advance, and then special column families need to be created within Cassandra to handle each of these queries. So, if you ever want to find Users by last name, you need a UsersByLastName column family in Cassandra. Think of it as adding indexes in MySQL, except that if you don’t have a certain index, you can’t query on that field at all. That being said, Cassandra seems very robust and designed for scaling and stability.
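A toy illustration of that constraint, with Ruby hashes standing in for two column families. Users and UsersByLastName follow the example above; the data and the save_user helper are invented for illustration and are not CassandraObject’s API. Every write has to update the index by hand, and only the pre-planned query is cheap.

```ruby
# Ruby hashes standing in for two Cassandra column families (illustrative
# only): row key => { column name => value }.
users = {}
users_by_last_name = Hash.new { |hash, key| hash[key] = {} }

# Every write has to maintain the "index" column family by hand.
def save_user(users, index, id, first_name, last_name)
  users[id] = { 'first_name' => first_name, 'last_name' => last_name }
  index[last_name][id] = '' # the column name is the user id; value unused
end

save_user(users, users_by_last_name, 'u1', 'Jane', 'Smith')
save_user(users, users_by_last_name, 'u2', 'John', 'Smith')
save_user(users, users_by_last_name, 'u3', 'Alan', 'Jones')

# "Find users by last name" is cheap only because that index row exists.
smiths = users_by_last_name['Smith'].keys.map { |id| users[id]['first_name'] }
p smiths # => ["Jane", "John"]

# There is no UsersByFirstName, so finding users by first name would mean
# scanning every row: the query had to be planned before the data was written.
```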
CassandraObject
CassandraObject is an ActiveRecord-style class for Cassandra. It looks nice and mimics ActiveRecord as closely as it can, but it isn’t as simple a drop-in as something like MongoMapper, purely because of the limitations of how Cassandra works.

