From 3a62e47f1654a0999c33ac7bdacc3526ee682ec0 Mon Sep 17 00:00:00 2001 From: Daniel Hunsaker Date: Sat, 7 Dec 2013 00:38:16 -0700 Subject: [PATCH] Created HOWITWORKS; cleanup of README Addresses a request from @chrisboulton in GitHub issue #149 Slight grammar cleanup and content update in README.md Mention of HOWITWORKS.md in README.md, referring those who want to know more that direction Expanded and slightly cleaner version of a comment made in #149 that prompted this commit/PR was placed in HOWITWORKS.md Signed-off-by: Daniel Hunsaker --- HOWITWORKS.md | 157 ++++++++++++++++++++++++++++++++++++++++++++++++++ README.md | 129 +++++++++++++++++++++++++++-------------- 2 files changed, 243 insertions(+), 43 deletions(-) create mode 100644 HOWITWORKS.md diff --git a/HOWITWORKS.md b/HOWITWORKS.md new file mode 100644 index 0000000..ec85fa3 --- /dev/null +++ b/HOWITWORKS.md @@ -0,0 +1,157 @@ +*For an overview of how to __use__ php-resque, see `README.md`.* + +The following is a step-by-step breakdown of how php-resque operates. + +## Enqueue Job ## + +What happens when you call `Resque::enqueue()`? + +1. `Resque::enqueue()` calls `Resque_Job::create()` with the same arguments it + received. +2. `Resque_Job::create()` checks that your `$args` (the third argument) are + either `null` or an array +3. `Resque_Job::create()` generates a job ID (a "token" in most of the docs) +4. `Resque_Job::create()` pushes the job to the requested queue (first + argument) +5. `Resque_Job::create()`, if status monitoring is enabled for the job (fourth + argument), calls `Resque_Job_Status::create()` with the job ID as its only + argument +6. `Resque_Job_Status::create()` creates a key in Redis with the job ID in its + name, and the current status (as well as a couple of timestamps) as its + value, then returns control to `Resque_Job::create()` +7. `Resque_Job::create()` returns control to `Resque::enqueue()`, with the job + ID as a return value +8. 
`Resque::enqueue()` triggers the `afterEnqueue` event, then returns control + to your application, again with the job ID as its return value + +## Workers At Work ## + +How do the workers process the queues? + +1. `Resque_Worker::work()`, the main loop of the worker process, calls + `Resque_Worker->reserve()` to check for a job +2. `Resque_Worker->reserve()` checks whether to use blocking pops or not (from + `BLOCKING`), then acts accordingly: + * Blocking Pop + 1. `Resque_Worker->reserve()` calls `Resque_Job::reserveBlocking()` with + the entire queue list and the timeout (from `INTERVAL`) as arguments + 2. `Resque_Job::reserveBlocking()` calls `Resque::blpop()` (which in turn + calls Redis' `blpop`, after prepping the queue list for the call, then + processes the response for consistency with other aspects of the + library, before finally returning control [and the queue/content of the + retrieved job, if any] to `Resque_Job::reserveBlocking()`) + 3. `Resque_Job::reserveBlocking()` checks whether the job content is an + array (it should contain the job's type [class], payload [args], and + ID), and aborts processing if not + 4. `Resque_Job::reserveBlocking()` creates a new `Resque_Job` object with + the queue and content as constructor arguments to initialize the job + itself, and returns it, along with control of the process, to + `Resque_Worker->reserve()` + * Queue Polling + 1. `Resque_Worker->reserve()` iterates through the queue list, calling + `Resque_Job::reserve()` with the current queue's name as the sole + argument on each pass + 2. `Resque_Job::reserve()` passes the queue name on to `Resque::pop()`, + which in turn calls Redis' `lpop` with the same argument, then returns + control (and the job content, if any) to `Resque_Job::reserve()` + 3. `Resque_Job::reserve()` checks whether the job content is an array (as + before, it should contain the job's type [class], payload [args], and + ID), and aborts processing if not + 4. 
`Resque_Job::reserve()` creates a new `Resque_Job` object in the same + manner as above, and also returns this object (along with control of + the process) to `Resque_Worker->reserve()` +3. In either case, `Resque_Worker->reserve()` returns the new `Resque_Job` + object, along with control, up to `Resque_Worker::work()`; if no job is + found, it simply returns `FALSE` + * No Jobs + 1. If blocking mode is not enabled, `Resque_Worker::work()` sleeps for + `INTERVAL` seconds; it calls `usleep()` for this, so fractional seconds + *are* supported + * Job Reserved + 1. `Resque_Worker::work()` triggers a `beforeFork` event + 2. `Resque_Worker::work()` calls `Resque_Worker->workingOn()` with the new + `Resque_Job` object as its argument + 3. `Resque_Worker->workingOn()` does some reference assignments to help keep + track of the worker/job relationship, then updates the job status from + `WAITING` to `RUNNING` + 4. `Resque_Worker->workingOn()` stores the new `Resque_Job` object's payload + in a Redis key associated with the worker itself (this prevents the job + from being lost indefinitely, but relies on that PID never being + allocated on that host to a different worker process), then returns control + to `Resque_Worker::work()` + 5. `Resque_Worker::work()` forks a child process to run the actual `perform()` + 6. The next steps differ between the worker and the child, now running in + separate processes: + * Worker + 1. The worker waits for the job process to complete + 2. If the exit status is not 0, the worker calls `Resque_Job->fail()` with + a `Resque_Job_DirtyExitException` as its only argument. + 3. `Resque_Job->fail()` triggers an `onFailure` event + 4. `Resque_Job->fail()` updates the job status from `RUNNING` to `FAILED` + 5. `Resque_Job->fail()` calls `Resque_Failure::create()` with the job + payload, the `Resque_Job_DirtyExitException`, the internal ID of the + worker, and the queue name as arguments + 6. 
`Resque_Failure::create()` creates a new object of whatever type has + been set as the `Resque_Failure` "backend" handler; by default, this is + a `Resque_Failure_Redis` object, whose constructor simply collects the + data passed into `Resque_Failure::create()` and pushes it into Redis + in the `failed` queue + 7. `Resque_Job->fail()` increments two failure counters in Redis: one for + a total count, and one for the worker + 8. `Resque_Job->fail()` returns control to the worker (still in + `Resque_Worker::work()`) without a value + * Job + 1. The job calls `Resque_Worker->perform()` with the `Resque_Job` as its + only argument. + 2. `Resque_Worker->perform()` sets up a `try...catch` block so it can + properly handle exceptions by marking jobs as failed (by calling + `Resque_Job->fail()`, as above) + 3. Inside the `try...catch`, `Resque_Worker->perform()` triggers an + `afterFork` event + 4. Still inside the `try...catch`, `Resque_Worker->perform()` calls + `Resque_Job->perform()` with no arguments + 5. `Resque_Job->perform()` calls `Resque_Job->getInstance()` with no + arguments + 6. If `Resque_Job->getInstance()` has already been called, it returns the + existing instance; otherwise: + 7. `Resque_Job->getInstance()` checks that the job's class (type) exists + and has a `perform()` method; if not, in either case, it throws an + exception which will be caught by `Resque_Worker->perform()` + 8. `Resque_Job->getInstance()` creates an instance of the job's class, and + initializes it with a reference to the `Resque_Job` itself, the job's + arguments (which it gets by calling `Resque_Job->getArguments()`, which + in turn simply returns the value of `args[0]`, or an empty array if no + arguments were passed), and the queue name + 9. `Resque_Job->getInstance()` returns control, along with the job class + instance, to `Resque_Job->perform()` + 10. 
`Resque_Job->perform()` sets up its own `try...catch` block to handle + `Resque_Job_DontPerform` exceptions; any other exceptions are passed + up to `Resque_Worker->perform()` + 11. `Resque_Job->perform()` triggers a `beforePerform` event + 12. `Resque_Job->perform()` calls `setUp()` on the instance, if it exists + 13. `Resque_Job->perform()` calls `perform()` on the instance + 14. `Resque_Job->perform()` calls `tearDown()` on the instance, if it + exists + 15. `Resque_Job->perform()` triggers an `afterPerform` event + 16. The `try...catch` block ends, suppressing `Resque_Job_DontPerform` + exceptions by returning control, and the value `FALSE`, to + `Resque_Worker->perform()`; any other situation returns the value + `TRUE` along with control, instead + 17. The `try...catch` block in `Resque_Worker->perform()` ends + 18. `Resque_Worker->perform()` updates the job status from `RUNNING` to + `COMPLETE`, then returns control, with no value, to the worker (again + still in `Resque_Worker::work()`) + 19. `Resque_Worker::work()` calls `exit(0)` to terminate the job process + cleanly + * SPECIAL CASE: Non-forking OS (Windows) + 1. Same as the job above, except it doesn't call `exit(0)` when done + 7. `Resque_Worker::work()` calls `Resque_Worker->doneWorking()` with no + arguments + 8. `Resque_Worker->doneWorking()` increments two processed counters in Redis: + one for a total count, and one for the worker + 9. `Resque_Worker->doneWorking()` deletes the Redis key set in + `Resque_Worker->workingOn()`, then returns control, with no value, to + `Resque_Worker::work()` +4. 
`Resque_Worker::work()` returns control to the beginning of the main loop, + where it will wait for the next job to become available, and start this + process all over again \ No newline at end of file diff --git a/README.md b/README.md index 2e13a76..06afc72 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ php-resque: PHP Resque Worker (and Enqueue) [![Build Status](https://secure.trav =========================================== Resque is a Redis-backed library for creating background jobs, placing -those jobs on multiple queues, and processing them later. +those jobs on one or more queues, and processing them later. ## Background ## @@ -24,7 +24,7 @@ The PHP port provides much the same features as the Ruby version: * Workers can be distributed between multiple machines * Includes support for priorities (queues) -* Resilient to memory leaks (fork) +* Resilient to memory leaks (forking) * Expects failure It also supports the following additional features: @@ -53,9 +53,9 @@ If you're not familiar with Composer, please see . ```json { - //... + // ... "require": { - "chrisboulton/php-resque": "1.2.x" + "chrisboulton/php-resque": "1.2.x" // Most recent tagged version }, // ... } @@ -88,7 +88,7 @@ Resque::enqueue('default', 'My_Job', $args); ### Defining Jobs ### -Each job should be in it's own class, and include a `perform` method. +Each job should be in its own class, and include a `perform` method. ```php class My_Job @@ -111,7 +111,7 @@ result in a job failing. Jobs can also have `setUp` and `tearDown` methods. If a `setUp` method is defined, it will be called before the `perform` method is run. -The `tearDown` method if defined, will be called after the job finishes. +The `tearDown` method, if defined, will be called after the job finishes. ```php @@ -138,7 +138,7 @@ class My_Job php-resque has the ability to perform basic status tracking of a queued job. 
The status information will allow you to check if a job is in the -queue, currently being run, has finished, or failed. +queue, is currently being run, has finished, or has failed. To track the status of a job, pass `true` as the fourth argument to `Resque::enqueue`. A token used for tracking the job status will be @@ -185,9 +185,11 @@ not having a single environment such as with Ruby, the PHP port makes *no* assumptions about your setup. To start a worker, it's very similar to the Ruby version: + ```sh $ QUEUE=file_serve php bin/resque ``` + It's your responsibility to tell the worker which file to include to get your application underway. You do so by setting the `APP_INCLUDE` environment variable: @@ -203,6 +205,10 @@ your application too!* Getting your application underway also includes telling the worker your job classes, by means of either an autoloader or including them. +Alternatively, you can always `include('bin/resque')` from your application and +skip setting `APP_INCLUDE` altogether. Just be sure the various environment +variables are set (`putenv`) before you do. + ### Logging ### The port supports the same environment variables for logging to STDOUT. @@ -236,18 +242,23 @@ All queues are supported in the same manner and processed in alphabetical order: ```sh -$ QUEUE=* bin/resque +$ QUEUE='*' bin/resque ``` ### Running Multiple Workers ### -Multiple workers ca be launched and automatically worked by supplying -the `COUNT` environment variable: +Multiple workers can be launched simultaneously by supplying the `COUNT` +environment variable: ```sh $ COUNT=5 bin/resque ``` +Be aware, however, that each worker is its own fork, and the original process +will shut down as soon as it has spawned `COUNT` forks. If you need to keep +track of your workers using an external application such as `monit`, you'll +need to work around this limitation. 
+ ### Custom prefix ### When you have multiple apps using the same Redis database it is better to @@ -272,9 +283,9 @@ the job. Signals also work on supported platforms exactly as in the Ruby version of Resque: -* `QUIT` - Wait for child to finish processing then exit -* `TERM` / `INT` - Immediately kill child then exit -* `USR1` - Immediately kill child but don't exit +* `QUIT` - Wait for job to finish processing then exit +* `TERM` / `INT` - Immediately kill job then exit +* `USR1` - Immediately kill job but don't exit * `USR2` - Pause worker, no new jobs will be processed * `CONT` - Resume worker. @@ -286,11 +297,12 @@ and any forked children also set their process title with the job being run. This helps identify running processes on the server and their resque status. -**PHP does not have this functionality by default.** +**PHP does not have this functionality by default before 5.5.** A PECL module () exists that -adds this funcitonality to PHP, so if you'd like process titles updated, -install the PECL module as well. php-resque will detect and use it. +adds this functionality to PHP before 5.5, so if you'd like process +titles updated, install the PECL module as well. php-resque will +automatically detect and use it. ## Event/Hook System ## @@ -310,7 +322,7 @@ Resque_Event::listen('eventName', [callback]); * A string with the name of a function * An array containing an object and method to call * An array containing an object and a static method to call -* A closure (PHP 5.3) +* A closure (PHP 5.3+) Events may pass arguments (documented below), so your callback should accept these arguments. @@ -342,20 +354,20 @@ Called before php-resque forks to run a job. Argument passed contains the instan `Resque_Job` for the job about to be run. `beforeFork` is triggered in the **parent** process. Any changes made will be permanent -for as long as the worker lives. +for as long as the **worker** lives. 
#### afterFork #### Called after php-resque forks to run a job (but before the job is run). Argument passed contains the instance of `Resque_Job` for the job about to be run. -`afterFork` is triggered in the child process after forking out to complete a job. Any -changes made will only live as long as the job is being processed. +`afterFork` is triggered in the **child** process after forking out to complete a job. Any +changes made will only live as long as the **job** is being processed. #### beforePerform #### Called before the `setUp` and `perform` methods on a job are run. Argument passed -contains the instance of `Resque_Job` about for the job about to be run. +contains the instance of `Resque_Job` for the job about to be run. You can prevent execution of the job by throwing an exception of `Resque_Job_DontPerform`. Any other exceptions thrown will be treated as if they were thrown in a job, causing the @@ -384,28 +396,59 @@ Called after a job has been queued using the `Resque::enqueue` method. Arguments * Class - string containing the name of scheduled job * Arguments - array of arguments supplied to the job * Queue - string containing the name of the queue the job was added to -* Id - string containing the new token of the enqueued job +* ID - string containing the new token of the enqueued job + +## Step-By-Step ## + +For a more in-depth look at what php-resque does under the hood (without +needing to directly examine the code), have a look at `HOWITWORKS.md`. 
## Contributors ## -* chrisboulton -* thedotedge -* hobodave -* scraton -* KevBurnsJr -* jmathai -* dceballos -* patrickbajao -* andrewjshults -* warezthebeef -* d11wtq -* hlegius -* salimane -* humancopy -* pedroarnal -* chaitanyakuber -* maetl -* Matt Heath -* jjfrey -* scragg0x -* ruudk +### Project Lead ### + +* @chrisboulton + +### Others ### + +* @acinader +* @ajbonner +* @andrewjshults +* @atorres757 +* @benjisg +* @cballou +* @chaitanyakuber +* @charly22 +* @CyrilMazur +* @d11wtq +* @danhunsaker +* @dceballos +* @ebernhardson +* @hlegius +* @hobodave +* @humancopy +* @JesseObrien +* @jjfrey +* @jmathai +* @joshhawthorne +* @KevBurnsJr +* @lboynton +* @maetl +* @matteosister +* @MattHeath +* @mickhrmweb +* @Olden +* @patrickbajao +* @pedroarnal +* @ptrofimov +* @rajibahmed +* @richardkmiller +* @Rockstar04 +* @ruudk +* @salimane +* @scragg0x +* @scraton +* @thedotedge +* @tonypiper +* @trimbletodd +* @warezthebeef