by Kevin Schroeder | 4:39 pm

I started this series back in December. In fact, I wrote three or four blog posts the day before I took two weeks of vacation. It’s now approaching the end of the next quarter, so I figured I should actually make some progress on it.

The last posting dealt with the concept of storage in the cloud. In this one we are going to talk about database access. You have probably heard about document databases. While an RDBMS is awesome when you have related data and need ACID compliance, it is hard to scale. When I was a consultant I was onsite with a customer who had a large Oracle implementation with some performance issues, and they had an Oracle consultant there at the same time. The Oracle consultant was flabbergasted that I could get my work done in a week, whereas their analysis could take anywhere from several weeks to months. The nature of a relational database dictates that it will take a LOT of logic, horsepower and consultant dollars to make it scale.

So, accessing data in a scalable environment will generally be easier (possible?) if you use non-relational data. Well... not NON-relational, just not enforcing those relations the way an ACID-compliant RDBMS would. So a document database makes a lot of sense, and Amazon’s SimpleDB fits the bill nicely. If you’re on EC2 it really makes the most sense, unless you need immediate consistency of data. One of the ways you make data access scalable and highly available is by having many, many machines that can serve that data. But it takes time to propagate the data to those machines and, as with the relational database I was talking about earlier, if you need immediate consistency across those nodes you need a lot of logic and horsepower, plus a bit of luck that you don’t accidentally deadlock the whole thing. It’s just not worth it. SimpleDB has what’s called “Eventual Consistency”. In other words, when you update, insert or delete, the change will eventually (within 2 seconds according to AWS, I think) be reflected consistently across all the nodes. Most of the time you can stand having data out of date for a little bit.
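For the rare spots where you can’t stand stale data even for a couple of seconds, one common workaround is to re-read until the write becomes visible, backing off between attempts. Here’s a minimal, self-contained PHP sketch of that idea. The function `readUntilConsistent()` and its parameters are hypothetical (not part of Zend_Cloud or SimpleDB), and the simulated `$read` closure merely stands in for a real query against the document adapter.

```php
<?php
/**
 * Hypothetical helper: call $read until $isFresh accepts the result,
 * sleeping with exponential backoff between attempts.
 * Returns the fresh result, or null if it never showed up.
 */
function readUntilConsistent(
    callable $read,
    callable $isFresh,
    int $maxAttempts = 5,
    int $initialDelayMs = 100
) {
    $delayMs = $initialDelayMs;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $read();
        if ($isFresh($result)) {
            return $result;
        }
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000); // usleep() takes microseconds
            $delayMs *= 2;           // double the wait after each miss
        }
    }
    return null;
}

// Simulated eventually consistent store: the new value only becomes
// visible on the third read.
$reads = 0;
$read = function () use (&$reads) {
    $reads++;
    return $reads >= 3 ? 'new' : 'old';
};

$value = readUntilConsistent($read, fn($v) => $v === 'new', 5, 1);
echo $value, ' after ', $reads, " reads\n";
```

Polling like this costs extra requests, so it only makes sense for the few reads that truly need read-after-write behavior; everywhere else, tolerating a little staleness is the cheaper choice.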

We will create our configuration just like we did with the storage adapter.

cloud.document_adapter="Zend_Cloud_DocumentService_Adapter_SimpleDb"
cloud.aws_accesskey="XXXXXXXXXXXXXXXXXXX"
cloud.aws_secretkey="XXXXXXXXXXXXXXXXXXX"
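Zend_Config_Ini treats the dot in a key like `cloud.document_adapter` as a nesting separator, which is why `$config->cloud` comes back as a sub-object holding the adapter class name and both AWS keys. The sketch below approximates that behavior with plain `parse_ini_string()` so you can see the resulting structure without Zend Framework installed. The `nestIniKeys()` helper is hypothetical and much simpler than Zend_Config_Ini itself (no sections, no inheritance).

```php
<?php
// The same settings as config.ini above (credentials are placeholders).
$ini = <<<'INI'
cloud.document_adapter="Zend_Cloud_DocumentService_Adapter_SimpleDb"
cloud.aws_accesskey="XXXXXXXXXXXXXXXXXXX"
cloud.aws_secretkey="XXXXXXXXXXXXXXXXXXX"
INI;

/**
 * Hypothetical helper: turn flat "a.b.c" keys into nested arrays,
 * roughly like Zend_Config_Ini's default '.' nest separator.
 */
function nestIniKeys(array $flat): array
{
    $nested = [];
    foreach ($flat as $key => $value) {
        $ref = &$nested;
        foreach (explode('.', $key) as $part) {
            if (!isset($ref[$part]) || !is_array($ref[$part])) {
                $ref[$part] = [];
            }
            $ref = &$ref[$part];
        }
        $ref = $value;
        unset($ref); // break the reference before the next key
    }
    return $nested;
}

$config = nestIniKeys(parse_ini_string($ini));
print_r($config['cloud']);
```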

And when we want to get our document adapter, we do just as we did before:

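// Load config.ini; Zend_Config_Ini nests the "cloud.*" keys under $config->cloud
$config = new \Zend_Config_Ini(__DIR__.'/../config/config.ini');

// Build the adapter named by cloud.document_adapter and store it in the registry
\Zend_Registry::set(
    'DocumentAdapter',
    \Zend_Cloud_DocumentService_Factory::getAdapter(
        $config->cloud
    )
);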
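Conceptually, the factory reads the `document_adapter` entry to decide which class to build and hands the remaining options (the AWS keys here) to that class’s constructor. Here’s a stripped-down sketch of that pattern. `makeAdapter()`, `FakeSimpleDbAdapter` and the array-based `$registry` are hypothetical stand-ins for `Zend_Cloud_DocumentService_Factory` and `Zend_Registry`, not their real implementations.

```php
<?php
// Hypothetical adapter standing in for Zend_Cloud_DocumentService_Adapter_SimpleDb.
class FakeSimpleDbAdapter
{
    public $options;

    public function __construct(array $options)
    {
        $this->options = $options;
    }
}

/**
 * Hypothetical factory: instantiate the class named under $adapterKey
 * and pass it every other option.
 */
function makeAdapter(array $options, string $adapterKey = 'document_adapter')
{
    if (empty($options[$adapterKey])) {
        throw new InvalidArgumentException("Missing '$adapterKey' option");
    }
    $className = $options[$adapterKey];
    unset($options[$adapterKey]);
    if (!class_exists($className)) {
        throw new InvalidArgumentException("Unknown adapter class '$className'");
    }
    return new $className($options);
}

// A plain array plays the role of Zend_Registry here.
$registry = [];
$registry['DocumentAdapter'] = makeAdapter([
    'document_adapter' => 'FakeSimpleDbAdapter',
    'aws_accesskey'    => 'XXXXXXXXXXXXXXXXXXX',
    'aws_secretkey'    => 'XXXXXXXXXXXXXXXXXXX',
]);
```

Because the class name lives in the config rather than the code, switching to a different document service is, in principle, a config change rather than a code change.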

Now that we have our document adapter in the registry, we can work with it. I used it in two different places: first, in the job itself, so that the job can insert references to the completed images for us to query later on; and second, in the code that does that querying.

The code in the asynchronous job is:

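$documentAdapter = \Zend_Registry::get('DocumentAdapter');
// Ask the adapter which document class to use (it may be adapter-specific)
$docClass = $documentAdapter->getDocumentClass();
$doc = new $docClass(
  array(
    'filename' => $fileName,
    'name'     => $this->_name,
    'height'   => $height,
    'width'    => $width,
    'size'     => filesize($tmpfname),
    'date'     => date('c')
  ),
  // Optional primary key: one document per source image and width
  $this->_sourceId . '_' . $width
);

// Store the document in the "images" collection
$documentAdapter->insertDocument("images", $doc);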

This asks the document adapter for the document class (just in case there is some adapter-specific functionality), creates the new document and inserts it into the DB. When creating a new document object, the first parameter of the constructor is an array of name=>value pairs holding the data you want to store, and the second parameter is the optional primary key for that data. When you insert the document, you specify the collection to insert it into (images, in this case), followed by the document object itself.
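To make that shape concrete, here’s a tiny in-memory stand-in that mirrors the pieces just described: a document is an array of fields plus an optional ID, and inserting one means naming a collection. `Document` and `InMemoryDocumentStore` are hypothetical classes for illustration only; the real SimpleDB adapter talks to AWS rather than a PHP array. The ID follows the same `sourceId_width` scheme as the job above, so each resized image gets its own key.

```php
<?php
// Hypothetical document: a set of named fields plus an optional ID.
class Document
{
    private $fields;
    private $id;

    public function __construct(array $fields, $id = null)
    {
        $this->fields = $fields;
        $this->id = $id;
    }

    public function getId()
    {
        return $this->id;
    }

    public function getFields(): array
    {
        return $this->fields;
    }
}

// Hypothetical store: collection name => (document ID => fields).
class InMemoryDocumentStore
{
    private $collections = [];

    public function insertDocument(string $collection, Document $doc): void
    {
        if ($doc->getId() === null) {
            throw new InvalidArgumentException('This sketch requires an ID');
        }
        $this->collections[$collection][$doc->getId()] = $doc->getFields();
    }

    public function fetch(string $collection, string $id): ?array
    {
        return $this->collections[$collection][$id] ?? null;
    }
}

$store = new InMemoryDocumentStore();
$sourceId = 'abc123'; // hypothetical source image ID
foreach ([100, 640] as $width) {
    $store->insertDocument('images', new Document(
        ['name' => 'sunset', 'width' => $width],
        $sourceId . '_' . $width
    ));
}
print_r($store->fetch('images', 'abc123_640'));
```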

When querying the collection, we do so by simply… well… querying the collection.

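$session = new \Zend_Session_Namespace('ProcTask');
$adapter = \Zend_Registry::get('DocumentAdapter');
// Ask the adapter for its (possibly adapter-specific) query object
$query = $adapter->select();
// "?" is bound to the session's name; from() names the collection
$query->where('name = ?', array($session->name))->from("images");
// The collection is named again when running the query
$results = $adapter->query('images', $query);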

Notice a few things. First, we’re not creating our select object directly; we’re asking the adapter for it. Just like with the document object, the select object may have some adapter-specific logic. Actually, that’s quite likely. Then you provide your query parameters, which can be done with a prepared-statement-like syntax using ? placeholders. Before passing the query object to the adapter, you must also give the query object the collection name (that’s what from() does). Then, to get your data, you pass the collection name to the adapter along with the query. Why do you need to give it to both the query object and the adapter? I dunno. Maybe it’s a bug, or maybe it’s a feature. I haven’t looked.
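The ? placeholder is what keeps user input, like the session name here, from being pasted into the query unescaped. SimpleDB’s select expressions quote string values in single quotes and escape an embedded quote by doubling it, so a binder has to do something along these lines. `bindPlaceholders()` is a hypothetical helper for illustration, not the adapter’s actual code, and it ignores edge cases such as a ? appearing inside a quoted literal.

```php
<?php
/**
 * Hypothetical helper: replace each '?' in $where with the next argument,
 * wrapped in single quotes with any embedded single quote doubled.
 */
function bindPlaceholders(string $where, array $args): string
{
    $parts = explode('?', $where);
    if (count($parts) - 1 !== count($args)) {
        throw new InvalidArgumentException('Placeholder/argument count mismatch');
    }
    $sql = array_shift($parts);
    foreach ($parts as $i => $tail) {
        $quoted = "'" . str_replace("'", "''", (string) $args[$i]) . "'";
        $sql .= $quoted . $tail;
    }
    return $sql;
}

$where = bindPlaceholders('name = ?', ["O'Phinney"]);
echo "select * from images where $where\n";
// prints: select * from images where name = 'O''Phinney'
```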

Once you have your data, you can simply iterate over it and read each field as a property, just as you would on a stdClass object.

foreach ($results as $result) {
  // Each field stored with the document is readable as a property
  echo $result->height;
  echo $result->width;
  echo $result->size;
  echo $result->date;
}
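Reading `$result->height` works because the document object maps property reads onto its stored fields. The sketch below shows one way an object can support that, using PHP’s `__get()` magic method, and also escapes values before echoing them into a page. `ResultDocument` and its sample data are hypothetical; this illustrates the mechanism rather than reproducing Zend’s document class.

```php
<?php
// Hypothetical result document: property reads are looked up in $fields.
class ResultDocument
{
    private $fields;

    public function __construct(array $fields)
    {
        $this->fields = $fields;
    }

    // Called for $doc->someField, since no such public property exists
    public function __get(string $name)
    {
        return $this->fields[$name] ?? null;
    }

    // Makes isset($doc->someField) reflect the stored fields
    public function __isset(string $name): bool
    {
        return isset($this->fields[$name]);
    }
}

$results = [
    new ResultDocument([
        'height' => 480,
        'width'  => 640,
        'size'   => 51234,
        'date'   => '2011-01-01T00:00:00+00:00',
    ]),
];

foreach ($results as $result) {
    // Escape values before echoing them into HTML
    printf(
        "%s x %s, %s bytes, %s\n",
        htmlspecialchars((string) $result->width),
        htmlspecialchars((string) $result->height),
        htmlspecialchars((string) $result->size),
        htmlspecialchars((string) $result->date)
    );
}
```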

Done

Comments

Matthew Weier O'Phinney

Regarding your question about why you need to provide the collection name to both the adapter and the select object: overlooked during development. Mind creating an issue and/or patch for us? 🙂

Mar 28.2011 | 10:43 am
