Merge pull request #154 from danhunsaker/readme/howitworks

Created HOWITWORKS; cleanup of README
Chris Boulton 2013-12-13 02:27:28 -08:00
commit 610c4dcdbf
2 changed files with 243 additions and 43 deletions

157
HOWITWORKS.md Normal file

@ -0,0 +1,157 @@
*For an overview of how to __use__ php-resque, see `README.md`.*
The following is a step-by-step breakdown of how php-resque operates.
## Enqueue Job ##
What happens when you call `Resque::enqueue()`?
1. `Resque::enqueue()` calls `Resque_Job::create()` with the same arguments it
received.
2. `Resque_Job::create()` checks that your `$args` (the third argument) are
either `null` or an array
3. `Resque_Job::create()` generates a job ID (a "token" in most of the docs)
4. `Resque_Job::create()` pushes the job to the requested queue (first
argument)
5. `Resque_Job::create()`, if status monitoring is enabled for the job (fourth
argument), calls `Resque_Job_Status::create()` with the job ID as its only
argument
6. `Resque_Job_Status::create()` creates a key in Redis with the job ID in its
name, and the current status (as well as a couple of timestamps) as its
value, then returns control to `Resque_Job::create()`
7. `Resque_Job::create()` returns control to `Resque::enqueue()`, with the job
ID as a return value
8. `Resque::enqueue()` triggers the `afterEnqueue` event, then returns control
to your application, again with the job ID as its return value
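
The steps above can be sketched roughly as follows. This is only an illustration, not php-resque's actual code: plain PHP arrays stand in for Redis, the `EnqueueSketch` class and its properties are hypothetical names, the string `'WAITING'` and the timestamp field names are assumed stand-ins for whatever the library really stores, and the `afterEnqueue` event is left out.

```php
<?php
// Illustrative sketch only, NOT php-resque's source. Plain PHP arrays stand in
// for Redis, and every name below (EnqueueSketch, its properties) is hypothetical.

final class EnqueueSketch
{
    /** @var array<string, string[]> queue name => JSON-encoded job payloads */
    public static $queues = [];

    /** @var array<string, array> job ID => status record */
    public static $statuses = [];

    public static function enqueue(string $queue, string $class, ?array $args = null, bool $trackStatus = false): string
    {
        // Step 2: the ?array type declaration stands in for the "null or array" check.

        // Step 3: generate a job ID (the "token").
        $id = bin2hex(random_bytes(16));

        // Step 4: push the encoded job onto the requested queue.
        self::$queues[$queue][] = json_encode([
            'class' => $class,
            'args'  => [$args],
            'id'    => $id,
        ]);

        // Steps 5-6: optionally record a status entry keyed by the job ID.
        if ($trackStatus) {
            $now = time();
            self::$statuses[$id] = ['status' => 'WAITING', 'updated' => $now, 'started' => $now];
        }

        // Steps 7-8: hand the job ID back to the caller.
        return $id;
    }
}
```

Calling `EnqueueSketch::enqueue('default', 'My_Job', ['name' => 'Chris'], true)` in this sketch returns the generated ID and leaves one encoded payload in the `default` queue, which mirrors the order of operations described above without touching Redis.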
## Workers At Work ##
How do the workers process the queues?
1. `Resque_Worker::work()`, the main loop of the worker process, calls
`Resque_Worker->reserve()` to check for a job
2. `Resque_Worker->reserve()` checks whether to use blocking pops or not (from
`BLOCKING`), then acts accordingly:
* Blocking Pop
1. `Resque_Worker->reserve()` calls `Resque_Job::reserveBlocking()` with
the entire queue list and the timeout (from `INTERVAL`) as arguments
2. `Resque_Job::reserveBlocking()` calls `Resque::blpop()` (which in turn
calls Redis' `blpop`, after prepping the queue list for the call, then
processes the response for consistency with other aspects of the
library, before finally returning control [and the queue/content of the
retrieved job, if any] to `Resque_Job::reserveBlocking()`)
3. `Resque_Job::reserveBlocking()` checks whether the job content is an
array (it should contain the job's type [class], payload [args], and
ID), and aborts processing if not
4. `Resque_Job::reserveBlocking()` creates a new `Resque_Job` object with
the queue and content as constructor arguments to initialize the job
itself, and returns it, along with control of the process, to
`Resque_Worker->reserve()`
* Queue Polling
1. `Resque_Worker->reserve()` iterates through the queue list, calling
`Resque_Job::reserve()` with the current queue's name as the sole
argument on each pass
2. `Resque_Job::reserve()` passes the queue name on to `Resque::pop()`,
which in turn calls Redis' `lpop` with the same argument, then returns
control (and the job content, if any) to `Resque_Job::reserve()`
3. `Resque_Job::reserve()` checks whether the job content is an array (as
before, it should contain the job's type [class], payload [args], and
ID), and aborts processing if not
4. `Resque_Job::reserve()` creates a new `Resque_Job` object in the same
manner as above, and also returns this object (along with control of
the process) to `Resque_Worker->reserve()`
3. In either case, `Resque_Worker->reserve()` returns the new `Resque_Job`
object, along with control, up to `Resque_Worker::work()`; if no job is
found, it simply returns `FALSE`
* No Jobs
1. If blocking mode is not enabled, `Resque_Worker::work()` sleeps for
`INTERVAL` seconds; it calls `usleep()` for this, so fractional seconds
*are* supported
* Job Reserved
1. `Resque_Worker::work()` triggers a `beforeFork` event
2. `Resque_Worker::work()` calls `Resque_Worker->workingOn()` with the new
`Resque_Job` object as its argument
3. `Resque_Worker->workingOn()` does some reference assignments to help keep
track of the worker/job relationship, then updates the job status from
`WAITING` to `RUNNING`
4. `Resque_Worker->workingOn()` stores the new `Resque_Job` object's payload
in a Redis key associated with the worker itself (this prevents the job
from being lost indefinitely, but relies on that PID never being
allocated on that host to a different worker process), then returns control
to `Resque_Worker::work()`
5. `Resque_Worker::work()` forks a child process to run the actual `perform()`
6. The next steps differ between the worker and the child, now running in
separate processes:
* Worker
1. The worker waits for the job process to complete
2. If the exit status is not 0, the worker calls `Resque_Job->fail()` with
a `Resque_Job_DirtyExitException` as its only argument
3. `Resque_Job->fail()` triggers an `onFailure` event
4. `Resque_Job->fail()` updates the job status from `RUNNING` to `FAILED`
5. `Resque_Job->fail()` calls `Resque_Failure::create()` with the job
payload, the `Resque_Job_DirtyExitException`, the internal ID of the
worker, and the queue name as arguments
6. `Resque_Failure::create()` creates a new object of whatever type has
been set as the `Resque_Failure` "backend" handler; by default, this is
a `Resque_Failure_Redis` object, whose constructor simply collects the
data passed into `Resque_Failure::create()` and pushes it into Redis
in the `failed` queue
7. `Resque_Job->fail()` increments two failure counters in Redis: one for
a total count, and one for the worker
8. `Resque_Job->fail()` returns control to the worker (still in
`Resque_Worker::work()`) without a value
* Job
1. The job calls `Resque_Worker->perform()` with the `Resque_Job` as its
only argument
2. `Resque_Worker->perform()` sets up a `try...catch` block so it can
properly handle exceptions by marking jobs as failed (by calling
`Resque_Job->fail()`, as above)
3. Inside the `try...catch`, `Resque_Worker->perform()` triggers an
`afterFork` event
4. Still inside the `try...catch`, `Resque_Worker->perform()` calls
`Resque_Job->perform()` with no arguments
5. `Resque_Job->perform()` calls `Resque_Job->getInstance()` with no
arguments
6. If `Resque_Job->getInstance()` has already been called, it returns the
existing instance; otherwise:
7. `Resque_Job->getInstance()` checks that the job's class (type) exists
and has a `perform()` method; if either check fails, it throws an
exception, which will be caught by `Resque_Worker->perform()`
8. `Resque_Job->getInstance()` creates an instance of the job's class, and
initializes it with a reference to the `Resque_Job` itself, the job's
arguments (which it gets by calling `Resque_Job->getArguments()`, which
in turn simply returns the value of `args[0]`, or an empty array if no
arguments were passed), and the queue name
9. `Resque_Job->getInstance()` returns control, along with the job class
instance, to `Resque_Job->perform()`
10. `Resque_Job->perform()` sets up its own `try...catch` block to handle
`Resque_Job_DontPerform` exceptions; any other exceptions are passed
up to `Resque_Worker->perform()`
11. `Resque_Job->perform()` triggers a `beforePerform` event
12. `Resque_Job->perform()` calls `setUp()` on the instance, if it exists
13. `Resque_Job->perform()` calls `perform()` on the instance
14. `Resque_Job->perform()` calls `tearDown()` on the instance, if it
exists
15. `Resque_Job->perform()` triggers an `afterPerform` event
16. The `try...catch` block ends, suppressing `Resque_Job_DontPerform`
exceptions by returning control, and the value `FALSE`, to
`Resque_Worker->perform()`; in any other case, it returns control
along with the value `TRUE`
17. The `try...catch` block in `Resque_Worker->perform()` ends
18. `Resque_Worker->perform()` updates the job status from `RUNNING` to
`COMPLETE`, then returns control, with no value, to the worker (again
still in `Resque_Worker::work()`)
19. `Resque_Worker::work()` calls `exit(0)` to terminate the job process
cleanly
* SPECIAL CASE: Non-forking OS (Windows)
1. Same as the job above, except it doesn't call `exit(0)` when done
7. `Resque_Worker::work()` calls `Resque_Worker->doneWorking()` with no
arguments
8. `Resque_Worker->doneWorking()` increments two processed counters in Redis:
one for a total count, and one for the worker
9. `Resque_Worker->doneWorking()` deletes the Redis key set in
`Resque_Worker->workingOn()`, then returns control, with no value, to
`Resque_Worker::work()`
4. `Resque_Worker::work()` returns control to the beginning of the main loop,
where it will wait for the next job to become available, and start this
process all over again
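
As a rough illustration of the queue-polling branch and the job lifecycle described above, here is a minimal sketch. It is not php-resque's code: in-memory arrays replace Redis, there is no forking, events, or status tracking, and every name ending in `Sketch` is hypothetical. Skipping a malformed payload and moving on to the next queue is a choice made for this sketch.

```php
<?php
// Illustrative sketch only, NOT php-resque's source. In-memory arrays replace
// Redis, there is no forking, and all names ending in "Sketch" are hypothetical.

final class DontPerformSketch extends Exception {}

/** Example job class with the optional setUp()/tearDown() hooks. */
final class RecordingJobSketch
{
    public $args;
    public $queue;
    public static $calls = [];

    public function setUp() { self::$calls[] = 'setUp'; }
    public function perform() { self::$calls[] = 'perform:' . $this->args['name']; }
    public function tearDown() { self::$calls[] = 'tearDown'; }
}

/**
 * Queue polling: try each queue in order and return the first usable job,
 * or false when every queue is empty.
 */
function reserveSketch(array &$queues, array $queueNames)
{
    foreach ($queueNames as $name) {
        if (empty($queues[$name])) {
            continue;
        }
        // array_shift() plays the role of Redis' LPOP here.
        $payload = json_decode(array_shift($queues[$name]), true);
        if (is_array($payload)) {
            return ['queue' => $name, 'payload' => $payload];
        }
        // Malformed content: this sketch just moves on to the next queue.
    }
    return false;
}

/** Job lifecycle: returns false when the job opts out, true otherwise. */
function performSketch(array $job): bool
{
    $class = $job['payload']['class'];
    if (!class_exists($class) || !method_exists($class, 'perform')) {
        throw new RuntimeException("Job class {$class} cannot be run");
    }

    $instance = new $class();
    // args[0] holds the job's arguments; fall back to an empty array.
    $instance->args = $job['payload']['args'][0] ?? [];
    $instance->queue = $job['queue'];

    try {
        if (method_exists($instance, 'setUp')) {
            $instance->setUp();
        }
        $instance->perform();
        if (method_exists($instance, 'tearDown')) {
            $instance->tearDown();
        }
    } catch (DontPerformSketch $e) {
        return false;
    }
    return true;
}
```

In a real worker the `performSketch()` part would run in the forked child, with the parent waiting on its exit status; the sketch keeps everything in one process so it can run anywhere, including on systems without forking.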

129
README.md

@ -2,7 +2,7 @@ php-resque: PHP Resque Worker (and Enqueue) [![Build Status](https://secure.trav
===========================================
Resque is a Redis-backed library for creating background jobs, placing
those jobs on multiple queues, and processing them later.
those jobs on one or more queues, and processing them later.
## Background ##
@ -24,7 +24,7 @@ The PHP port provides much the same features as the Ruby version:
* Workers can be distributed between multiple machines
* Includes support for priorities (queues)
* Resilient to memory leaks (fork)
* Resilient to memory leaks (forking)
* Expects failure
It also supports the following additional features:
@ -53,9 +53,9 @@ If you're not familiar with Composer, please see <http://getcomposer.org/>.
```json
{
//...
// ...
"require": {
"chrisboulton/php-resque": "1.2.x"
"chrisboulton/php-resque": "1.2.x" // Most recent tagged version
},
// ...
}
@ -88,7 +88,7 @@ Resque::enqueue('default', 'My_Job', $args);
### Defining Jobs ###
Each job should be in it's own class, and include a `perform` method.
Each job should be in its own class, and include a `perform` method.
```php
class My_Job
@ -111,7 +111,7 @@ result in a job failing.
Jobs can also have `setUp` and `tearDown` methods. If a `setUp` method
is defined, it will be called before the `perform` method is run.
The `tearDown` method if defined, will be called after the job finishes.
The `tearDown` method, if defined, will be called after the job finishes.
```php
@ -138,7 +138,7 @@ class My_Job
php-resque has the ability to perform basic status tracking of a queued
job. The status information will allow you to check if a job is in the
queue, currently being run, has finished, or failed.
queue, is currently being run, has finished, or has failed.
To track the status of a job, pass `true` as the fourth argument to
`Resque::enqueue`. A token used for tracking the job status will be
@ -185,9 +185,11 @@ not having a single environment such as with Ruby, the PHP port makes
*no* assumptions about your setup.
To start a worker, it's very similar to the Ruby version:
```sh
$ QUEUE=file_serve php bin/resque
```
It's your responsibility to tell the worker which file to include to get
your application underway. You do so by setting the `APP_INCLUDE` environment
variable:
@ -203,6 +205,10 @@ your application too!*
Getting your application underway also includes telling the worker your job
classes, by means of either an autoloader or including them.
Alternately, you can always `include('bin/resque')` from your application and
skip setting `APP_INCLUDE` altogether. Just be sure the various environment
variables are set (`putenv`) before you do.
### Logging ###
The port supports the same environment variables for logging to STDOUT.
@ -236,18 +242,23 @@ All queues are supported in the same manner and processed in alphabetical
order:
```sh
$ QUEUE=* bin/resque
$ QUEUE='*' bin/resque
```
### Running Multiple Workers ###
Multiple workers ca be launched and automatically worked by supplying
the `COUNT` environment variable:
Multiple workers can be launched simultaneously by supplying the `COUNT`
environment variable:
```sh
$ COUNT=5 bin/resque
```
Be aware, however, that each worker is its own fork, and the original process
will shut down as soon as it has spawned `COUNT` forks. If you need to keep
track of your workers using an external application such as `monit`, you'll
need to work around this limitation.
### Custom prefix ###
When you have multiple apps using the same Redis database it is better to
@ -272,9 +283,9 @@ the job.
Signals also work on supported platforms exactly as in the Ruby
version of Resque:
* `QUIT` - Wait for child to finish processing then exit
* `TERM` / `INT` - Immediately kill child then exit
* `USR1` - Immediately kill child but don't exit
* `QUIT` - Wait for job to finish processing then exit
* `TERM` / `INT` - Immediately kill job then exit
* `USR1` - Immediately kill job but don't exit
* `USR2` - Pause worker, no new jobs will be processed
* `CONT` - Resume worker.
@ -286,11 +297,12 @@ and any forked children also set their process title with the job
being run. This helps identify running processes on the server and
their resque status.
**PHP does not have this functionality by default.**
**PHP does not have this functionality by default before version 5.5.**
A PECL module (<http://pecl.php.net/package/proctitle>) exists that
adds this funcitonality to PHP, so if you'd like process titles updated,
install the PECL module as well. php-resque will detect and use it.
adds this functionality to PHP before 5.5, so if you'd like process
titles updated, install the PECL module as well. php-resque will
automatically detect and use it.
## Event/Hook System ##
@ -310,7 +322,7 @@ Resque_Event::listen('eventName', [callback]);
* A string with the name of a function
* An array containing an object and method to call
* An array containing an object and a static method to call
* A closure (PHP 5.3)
* A closure (PHP 5.3+)
Events may pass arguments (documented below), so your callback should accept
these arguments.
@ -342,20 +354,20 @@ Called before php-resque forks to run a job. Argument passed contains the instan
`Resque_Job` for the job about to be run.
`beforeFork` is triggered in the **parent** process. Any changes made will be permanent
for as long as the worker lives.
for as long as the **worker** lives.
#### afterFork ####
Called after php-resque forks to run a job (but before the job is run). Argument
passed contains the instance of `Resque_Job` for the job about to be run.
`afterFork` is triggered in the child process after forking out to complete a job. Any
changes made will only live as long as the job is being processed.
`afterFork` is triggered in the **child** process after forking out to complete a job. Any
changes made will only live as long as the **job** is being processed.
#### beforePerform ####
Called before the `setUp` and `perform` methods on a job are run. Argument passed
contains the instance of `Resque_Job` about for the job about to be run.
contains the instance of `Resque_Job` for the job about to be run.
You can prevent execution of the job by throwing an exception of `Resque_Job_DontPerform`.
Any other exceptions thrown will be treated as if they were thrown in a job, causing the
@ -384,28 +396,59 @@ Called after a job has been queued using the `Resque::enqueue` method. Arguments
* Class - string containing the name of scheduled job
* Arguments - array of arguments supplied to the job
* Queue - string containing the name of the queue the job was added to
* Id - string containing the new token of the enqueued job
* ID - string containing the new token of the enqueued job
## Step-By-Step ##
For a more in-depth look at what php-resque does under the hood (without
needing to directly examine the code), have a look at `HOWITWORKS.md`.
## Contributors ##
* chrisboulton
* thedotedge
* hobodave
* scraton
* KevBurnsJr
* jmathai
* dceballos
* patrickbajao
* andrewjshults
* warezthebeef
* d11wtq
* hlegius
* salimane
* humancopy
* pedroarnal
* chaitanyakuber
* maetl
* Matt Heath
* jjfrey
* scragg0x
* ruudk
### Project Lead ###
* @chrisboulton
### Others ###
* @acinader
* @ajbonner
* @andrewjshults
* @atorres757
* @benjisg
* @cballou
* @chaitanyakuber
* @charly22
* @CyrilMazur
* @d11wtq
* @danhunsaker
* @dceballos
* @ebernhardson
* @hlegius
* @hobodave
* @humancopy
* @JesseObrien
* @jjfrey
* @jmathai
* @joshhawthorne
* @KevBurnsJr
* @lboynton
* @maetl
* @matteosister
* @MattHeath
* @mickhrmweb
* @Olden
* @patrickbajao
* @pedroarnal
* @ptrofimov
* @rajibahmed
* @richardkmiller
* @Rockstar04
* @ruudk
* @salimane
* @scragg0x
* @scraton
* @thedotedge
* @tonypiper
* @trimbletodd
* @warezthebeef