Merge pull request #154 from danhunsaker/readme/howitworks

Created HOWITWORKS; cleanup of README
2025-07-21 15:09:17 +00:00 · 2013-12-13 02:27:28 -08:00 · 2013-12-13 02:27:28 -08:00 · 610c4dcdbf
commit 610c4dcdbf
parent 02809d6632 3a62e47f16
2 changed files with 243 additions and 43 deletions
--- a/HOWITWORKS.md
+++ b/HOWITWORKS.md
@ -0,0 +1,157 @@
+*For an overview of how to __use__ php-resque, see `README.md`.*
+
+The following is a step-by-step breakdown of how php-resque operates.
+
+## Enqueue Job ##
+
+What happens when you call `Resque::enqueue()`?
+
+1. `Resque::enqueue()` calls `Resque_Job::create()` with the same arguments it
+   received.
+2. `Resque_Job::create()` checks that your `$args` (the third argument) are
+   either `null` or an array
+3. `Resque_Job::create()` generates a job ID (a "token" in most of the docs)
+4. `Resque_Job::create()` pushes the job to the requested queue (first
+   argument)
+5. `Resque_Job::create()`, if status monitoring is enabled for the job (fourth
+   argument), calls `Resque_Job_Status::create()` with the job ID as its only
+   argument
+6. `Resque_Job_Status::create()` creates a key in Redis with the job ID in its
+   name, and the current status (as well as a couple of timestamps) as its
+   value, then returns control to `Resque_Job::create()`
+7. `Resque_Job::create()` returns control to `Resque::enqueue()`, with the job
+   ID as a return value
+8. `Resque::enqueue()` triggers the `afterEnqueue` event, then returns control
+   to your application, again with the job ID as its return value
+
+## Workers At Work ##
+
+How do the workers process the queues?
+
+1. `Resque_Worker::work()`, the main loop of the worker process, calls
+   `Resque_Worker->reserve()` to check for a job
+2. `Resque_Worker->reserve()` checks whether to use blocking pops or not (from
+   `BLOCKING`), then acts accordingly:
+  * Blocking Pop
+    1. `Resque_Worker->reserve()` calls `Resque_Job::reserveBlocking()` with
+       the entire queue list and the timeout (from `INTERVAL`) as arguments
+    2. `Resque_Job::reserveBlocking()` calls `Resque::blpop()` (which in turn
+       calls Redis' `blpop`, after prepping the queue list for the call, then
+       processes the response for consistency with other aspects of the
+       library, before finally returning control [and the queue/content of the
+       retrieved job, if any] to `Resque_Job::reserveBlocking()`)
+    3. `Resque_Job::reserveBlocking()` checks whether the job content is an
+       array (it should contain the job's type [class], payload [args], and
+       ID), and aborts processing if not
+    4. `Resque_Job::reserveBlocking()` creates a new `Resque_Job` object with
+       the queue and content as constructor arguments to initialize the job
+       itself, and returns it, along with control of the process, to
+       `Resque_Worker->reserve()`
+  * Queue Polling
+    1. `Resque_Worker->reserve()` iterates through the queue list, calling
+       `Resque_Job::reserve()` with the current queue's name as the sole
+       argument on each pass
+    2. `Resque_Job::reserve()` passes the queue name on to `Resque::pop()`,
+       which in turn calls Redis' `lpop` with the same argument, then returns
+       control (and the job content, if any) to `Resque_Job::reserve()`
+    3. `Resque_Job::reserve()` checks whether the job content is an array (as
+       before, it should contain the job's type [class], payload [args], and
+       ID), and aborts processing if not
+    4. `Resque_Job::reserve()` creates a new `Resque_Job` object in the same
+       manner as above, and also returns this object (along with control of
+       the process) to `Resque_Worker->reserve()`
+3. In either case, `Resque_Worker->reserve()` returns the new `Resque_Job`
+   object, along with control, up to `Resque_Worker::work()`; if no job is
+   found, it simply returns `FALSE`
+  * No Jobs
+    1. If blocking mode is not enabled, `Resque_Worker::work()` sleeps for
+       `INTERVAL` seconds; it calls `usleep()` for this, so fractional seconds
+       *are* supported
+  * Job Reserved
+    1. `Resque_Worker::work()` triggers a `beforeFork` event
+    2. `Resque_Worker::work()` calls `Resque_Worker->workingOn()` with the new
+       `Resque_Job` object as its argument
+    3. `Resque_Worker->workingOn()` does some reference assignments to help keep
+       track of the worker/job relationship, then updates the job status from
+       `WAITING` to `RUNNING`
+    4. `Resque_Worker->workingOn()` stores the new `Resque_Job` object's payload
+       in a Redis key associated with the worker itself (this keeps the job from
+       being lost indefinitely, but relies on that PID never being allocated
+       on that host to a different worker process), then returns control
+       to `Resque_Worker::work()`
+    5. `Resque_Worker::work()` forks a child process to run the actual `perform()`
+    6. The next steps differ between the worker and the child, now running in
+       separate processes:
+      * Worker
+        1. The worker waits for the job process to complete
+        2. If the exit status is not 0, the worker calls `Resque_Job->fail()` with
+           a `Resque_Job_DirtyExitException` as its only argument
+        3. `Resque_Job->fail()` triggers an `onFailure` event
+        4. `Resque_Job->fail()` updates the job status from `RUNNING` to `FAILED`
+        5. `Resque_Job->fail()` calls `Resque_Failure::create()` with the job
+           payload, the `Resque_Job_DirtyExitException`, the internal ID of the
+           worker, and the queue name as arguments
+        6. `Resque_Failure::create()` creates a new object of whatever type has
+           been set as the `Resque_Failure` "backend" handler; by default, this is
+           a `Resque_Failure_Redis` object, whose constructor simply collects the
+           data passed into `Resque_Failure::create()` and pushes it into Redis
+           in the `failed` queue
+        7. `Resque_Job->fail()` increments two failure counters in Redis: one for
+           a total count, and one for the worker
+        8. `Resque_Job->fail()` returns control to the worker (still in
+           `Resque_Worker::work()`) without a value
+      * Job
+        1. The job calls `Resque_Worker->perform()` with the `Resque_Job` as its
+           only argument
+        2. `Resque_Worker->perform()` sets up a `try...catch` block so it can
+           properly handle exceptions by marking jobs as failed (by calling
+           `Resque_Job->fail()`, as above)
+        3. Inside the `try...catch`, `Resque_Worker->perform()` triggers an
+           `afterFork` event
+        4. Still inside the `try...catch`, `Resque_Worker->perform()` calls
+           `Resque_Job->perform()` with no arguments
+        5. `Resque_Job->perform()` calls `Resque_Job->getInstance()` with no
+           arguments
+        6. If `Resque_Job->getInstance()` has already been called, it returns the
+           existing instance; otherwise:
+        7. `Resque_Job->getInstance()` checks that the job's class (type) exists
+           and has a `perform()` method; if either check fails, it throws an
+           exception, which will be caught by `Resque_Worker->perform()`
+        8. `Resque_Job->getInstance()` creates an instance of the job's class, and
+           initializes it with a reference to the `Resque_Job` itself, the job's
+           arguments (which it gets by calling `Resque_Job->getArguments()`, which
+           in turn simply returns the value of `args[0]`, or an empty array if no
+           arguments were passed), and the queue name
+        9. `Resque_Job->getInstance()` returns control, along with the job class
+           instance, to `Resque_Job->perform()`
+        10. `Resque_Job->perform()` sets up its own `try...catch` block to handle
+            `Resque_Job_DontPerform` exceptions; any other exceptions are passed
+            up to `Resque_Worker->perform()`
+        11. `Resque_Job->perform()` triggers a `beforePerform` event
+        12. `Resque_Job->perform()` calls `setUp()` on the instance, if it exists
+        13. `Resque_Job->perform()` calls `perform()` on the instance
+        14. `Resque_Job->perform()` calls `tearDown()` on the instance, if it
+            exists
+        15. `Resque_Job->perform()` triggers an `afterPerform` event
+        16. The `try...catch` block ends, suppressing `Resque_Job_DontPerform`
+            exceptions by returning control, and the value `FALSE`, to
+            `Resque_Worker->perform()`; otherwise, it returns control along
+            with the value `TRUE`
+        17. The `try...catch` block in `Resque_Worker->perform()` ends
+        18. `Resque_Worker->perform()` updates the job status from `RUNNING` to
+            `COMPLETE`, then returns control, with no value, to the worker (again
+            still in `Resque_Worker::work()`)
+        19. `Resque_Worker::work()` calls `exit(0)` to terminate the job process
+            cleanly
+      * SPECIAL CASE: Non-forking OS (Windows)
+        1. Same as the job above, except it doesn't call `exit(0)` when done
+    7. `Resque_Worker::work()` calls `Resque_Worker->doneWorking()` with no
+       arguments
+    8. `Resque_Worker->doneWorking()` increments two processed counters in Redis:
+       one for a total count, and one for the worker
+    9. `Resque_Worker->doneWorking()` deletes the Redis key set in
+       `Resque_Worker->workingOn()`, then returns control, with no value, to
+       `Resque_Worker::work()`
+4. `Resque_Worker::work()` returns control to the beginning of the main loop,
+   where it will wait for the next job to become available, and start this
+   process all over again
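The job-side steps 10 through 16 above (the inner `try...catch`, the optional `setUp()`/`tearDown()` calls, and the `TRUE`/`FALSE` return value) can be sketched in isolation. This is a minimal stand-in, not php-resque's code: `Sketch_DontPerform`, `Sketch_Job`, and `sketch_perform()` are hypothetical names invented for illustration, and the event system is reduced to one optional callback.

```php
<?php
// Illustrative stand-ins only; none of these names exist in php-resque.
class Sketch_DontPerform extends Exception {}

class Sketch_Job
{
    public $log = array();

    public function setUp()    { $this->log[] = 'setUp'; }
    public function perform()  { $this->log[] = 'perform'; }
    public function tearDown() { $this->log[] = 'tearDown'; }
}

/**
 * Returns true when the job ran and false when the "beforePerform"
 * callback vetoed it. Any other exception propagates to the caller,
 * which is where a worker would mark the job as failed.
 */
function sketch_perform($instance, $beforePerform = null)
{
    try {
        if ($beforePerform !== null) {
            call_user_func($beforePerform, $instance); // stands in for the beforePerform event
        }
        if (method_exists($instance, 'setUp')) {
            $instance->setUp();
        }
        $instance->perform();
        if (method_exists($instance, 'tearDown')) {
            $instance->tearDown();
        }
        // An afterPerform event would be triggered at this point.
    } catch (Sketch_DontPerform $e) {
        return false; // suppressed: the job is skipped, not failed
    }
    return true;
}

// A job that runs normally.
$job = new Sketch_Job();
$ran = sketch_perform($job);

// A job vetoed by the callback before setUp() is reached.
$skipped = new Sketch_Job();
$vetoed = sketch_perform($skipped, function ($instance) {
    throw new Sketch_DontPerform('skip this job');
});
```

Because the veto is thrown before `setUp()`, the skipped job's log stays empty, which matches the ordering described in steps 11 and 12.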
--- a/README.md
+++ b/README.md
@ -2,7 +2,7 @@ php-resque: PHP Resque Worker (and Enqueue) [![Build Status](https://secure.trav
 ===========================================

 Resque is a Redis-backed library for creating background jobs, placing
-those jobs on multiple queues, and processing them later.
+those jobs on one or more queues, and processing them later.

 ## Background ##

@ -24,7 +24,7 @@ The PHP port provides much the same features as the Ruby version:

 * Workers can be distributed between multiple machines
 * Includes support for priorities (queues)
-* Resilient to memory leaks (fork)
+* Resilient to memory leaks (forking)
 * Expects failure

 It also supports the following additional features:
@ -53,9 +53,9 @@ If you're not familiar with Composer, please see <http://getcomposer.org/>.

 ```json
 {
-    //...
+    // ...
    "require": {
-        "chrisboulton/php-resque": "1.2.x"
+        "chrisboulton/php-resque": "1.2.x"	// Most recent tagged version
    },
    // ...
 }
@ -88,7 +88,7 @@ Resque::enqueue('default', 'My_Job', $args);

 ### Defining Jobs ###

-Each job should be in it's own class, and include a `perform` method.
+Each job should be in its own class, and include a `perform` method.

 ```php
 class My_Job
@ -111,7 +111,7 @@ result in a job failing.

 Jobs can also have `setUp` and `tearDown` methods. If a `setUp` method
 is defined, it will be called before the `perform` method is run.
-The `tearDown` method if defined, will be called after the job finishes.
+The `tearDown` method, if defined, will be called after the job finishes.


 ```php
@ -138,7 +138,7 @@ class My_Job

 php-resque has the ability to perform basic status tracking of a queued
 job. The status information will allow you to check if a job is in the
-queue, currently being run, has finished, or failed.
+queue, is currently being run, has finished, or has failed.

 To track the status of a job, pass `true` as the fourth argument to
 `Resque::enqueue`. A token used for tracking the job status will be
@ -185,9 +185,11 @@ not having a single environment such as with Ruby, the PHP port makes
 *no* assumptions about your setup.

 To start a worker, it's very similar to the Ruby version:
+
 ```sh
 $ QUEUE=file_serve php bin/resque
 ```
+
 It's your responsibility to tell the worker which file to include to get
 your application underway. You do so by setting the `APP_INCLUDE` environment
 variable:
@ -203,6 +205,10 @@ your application too!*
 Getting your application underway also includes telling the worker your job
 classes, by means of either an autoloader or including them.

+Alternatively, you can always `include('bin/resque')` from your application and
+skip setting `APP_INCLUDE` altogether.  Just be sure the various environment
+variables are set (`putenv`) before you do.
+
 ### Logging ###

 The port supports the same environment variables for logging to STDOUT.
@ -236,18 +242,23 @@ All queues are supported in the same manner and processed in alphabetical
 order:

 ```sh
-$ QUEUE=* bin/resque
+$ QUEUE='*' bin/resque
 ```

 ### Running Multiple Workers ###

-Multiple workers ca be launched and automatically worked by supplying
-the `COUNT` environment variable:
+Multiple workers can be launched simultaneously by supplying the `COUNT`
+environment variable:

 ```sh
 $ COUNT=5 bin/resque
 ```

+Be aware, however, that each worker is its own fork, and the original process
+will shut down as soon as it has spawned `COUNT` forks.  If you need to keep
+track of your workers using an external application such as `monit`, you'll
+need to work around this limitation.
+
 ### Custom prefix ###

 When you have multiple apps using the same Redis database it is better to
@ -272,9 +283,9 @@ the job.
 Signals also work on supported platforms exactly as in the Ruby
 version of Resque:

-* `QUIT` - Wait for child to finish processing then exit
-* `TERM` / `INT` - Immediately kill child then exit
-* `USR1` - Immediately kill child but don't exit
+* `QUIT` - Wait for job to finish processing then exit
+* `TERM` / `INT` - Immediately kill job then exit
+* `USR1` - Immediately kill job but don't exit
 * `USR2` - Pause worker, no new jobs will be processed
 * `CONT` - Resume worker.

@ -286,11 +297,12 @@ and any forked children also set their process title with the job
 being run. This helps identify running processes on the server and
 their resque status.

-**PHP does not have this functionality by default.**
+**PHP does not have this functionality by default before version 5.5.**

 A PECL module (<http://pecl.php.net/package/proctitle>) exists that
-adds this funcitonality to PHP, so if you'd like process titles updated,
-install the PECL module as well. php-resque will detect and use it.
+adds this functionality to PHP before 5.5, so if you'd like process
+titles updated, install the PECL module as well. php-resque will
+automatically detect and use it.

 ## Event/Hook System ##

@ -310,7 +322,7 @@ Resque_Event::listen('eventName', [callback]);
 * A string with the name of a function
 * An array containing an object and method to call
 * An array containing an object and a static method to call
-* A closure (PHP 5.3)
+* A closure (PHP 5.3+)

 Events may pass arguments (documented below), so your callback should accept
 these arguments.
@ -342,20 +354,20 @@ Called before php-resque forks to run a job. Argument passed contains the instan
 `Resque_Job` for the job about to be run.

 `beforeFork` is triggered in the **parent** process. Any changes made will be permanent
-for as long as the worker lives.
+for as long as the **worker** lives.

 #### afterFork ####

 Called after php-resque forks to run a job (but before the job is run). Argument
 passed contains the instance of `Resque_Job` for the job about to be run.

-`afterFork` is triggered in the child process after forking out to complete a job. Any
-changes made will only live as long as the job is being processed.
+`afterFork` is triggered in the **child** process after forking out to complete a job. Any
+changes made will only live as long as the **job** is being processed.

 #### beforePerform ####

 Called before the `setUp` and `perform` methods on a job are run. Argument passed
-contains the instance of `Resque_Job` about for the job about to be run.
+contains the instance of `Resque_Job` for the job about to be run.

 You can prevent execution of the job by throwing an exception of `Resque_Job_DontPerform`.
 Any other exceptions thrown will be treated as if they were thrown in a job, causing the
@ -384,28 +396,59 @@ Called after a job has been queued using the `Resque::enqueue` method. Arguments
 * Class - string containing the name of scheduled job
 * Arguments - array of arguments supplied to the job
 * Queue - string containing the name of the queue the job was added to
-* Id - string containing the new token of the enqueued job
+* ID - string containing the new token of the enqueued job
+
+## Step-By-Step ##
+
+For a more in-depth look at what php-resque does under the hood (without
+needing to directly examine the code), have a look at `HOWITWORKS.md`.

 ## Contributors ##

-* chrisboulton 
-* thedotedge
-* hobodave
-* scraton
-* KevBurnsJr
-* jmathai
-* dceballos
-* patrickbajao
-* andrewjshults
-* warezthebeef
-* d11wtq
-* hlegius
-* salimane
-* humancopy
-* pedroarnal
-* chaitanyakuber
-* maetl
-* Matt Heath
-* jjfrey
-* scragg0x
-* ruudk
+### Project Lead ###
+
+* @chrisboulton
+
+### Others ###
+
+* @acinader
+* @ajbonner
+* @andrewjshults
+* @atorres757
+* @benjisg
+* @cballou
+* @chaitanyakuber
+* @charly22
+* @CyrilMazur
+* @d11wtq
+* @danhunsaker
+* @dceballos
+* @ebernhardson
+* @hlegius
+* @hobodave
+* @humancopy
+* @JesseObrien
+* @jjfrey
+* @jmathai
+* @joshhawthorne
+* @KevBurnsJr
+* @lboynton
+* @maetl
+* @matteosister
+* @MattHeath
+* @mickhrmweb
+* @Olden
+* @patrickbajao
+* @pedroarnal
+* @ptrofimov
+* @rajibahmed
+* @richardkmiller
+* @Rockstar04
+* @ruudk
+* @salimane
+* @scragg0x
+* @scraton
+* @thedotedge
+* @tonypiper
+* @trimbletodd
+* @warezthebeef
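The README hunk above that adds the `include('bin/resque')` alternative to `APP_INCLUDE` could look roughly like the following. This is a sketch under stated assumptions: the queue name, the interval, and the `$workerScript` path are placeholders chosen for illustration, and the actual location of `bin/resque` depends on how php-resque was installed. In PHP, environment variables are set for the current process with `putenv()`.

```php
<?php
// Assumed values; replace them with whatever your application needs.
putenv('QUEUE=default');
putenv('INTERVAL=5');

// Hypothetical path: adjust to wherever bin/resque lives in your project.
$workerScript = __DIR__ . '/vendor/chrisboulton/php-resque/bin/resque';

if (is_file($workerScript)) {
    // Hands control to the worker script, which reads the variables set above.
    include $workerScript;
}
```

Setting the variables before the `include` matters because the worker script is expected to read them as soon as it starts.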