Control policies for off-line adaptive radiation therapy 

Mustafa Y. Sir, Marina A. Epelman, Stephen M. Pollock

University of Michigan

In intensity-modulated radiation therapy (IMRT), a common form of cancer treatment, radiation is delivered to cancerous regions in order to damage the cells in the area being treated, interfering with their ability to divide and grow. Since both cancer and healthy cells are affected by radiation, any treatment plan should be designed in such a way that the dose delivered to the tumor(s) is high enough to stop the cancer cells from regenerating, while simultaneously avoiding the delivery of excessive doses of radiation to surrounding healthy tissue.

In IMRT, each radiation beam can be regarded as composed of many “beamlets” that deliver radiation at different intensities. Existing planning methods (usually based on nonlinear optimization algorithms) typically produce beamlet intensity vectors that result in sharp dose gradients between a tumor and its neighboring healthy tissue. However, if random shifts occur during treatment, the tumor may end up in a low-dose radiation field while healthy tissue is exposed to a high-dose field. The dose distribution actually delivered to the patient may then differ significantly from the one calculated for an “optimal” treatment plan, resulting in complications and possibly failure of the treatment.

Once an IMRT plan is decided on, it is delivered as a series of small daily dosages, or fractions, over a period of time (typically 35 days). It has recently become technically possible to measure fraction-to-fraction variations in the patient setup, shapes of the organs (e.g., tumor expansion or shrinkage) and the actual delivered dose distributions using devices such as electronic portal imaging and CT scanning after each fraction. We will report on techniques that exploit the dynamic nature of radiation therapy and information gathering by adapting the treatment plan to fraction-to-fraction variations measured during the treatment course of an individual patient. This problem, called off-line adaptive radiation therapy, has been structured within a dynamic programming (DP) framework.

We will present and compare several (suboptimal) control policies, obtained using an approximation to the underlying DP, that are computationally feasible to implement. All of these policies 1) re-optimize the beamlet intensities before each delivery session to compensate for delivery errors measured in previous fractions, 2) use Bayesian updating of the individual patient's positional uncertainty distribution, and 3) use a multiple-instances-of-geometry approximation (MIGA) as the model of uncertainty. The policies differ in their re-optimization schemes, which include:

Certainty equivalent control (CEC): In this policy, at every re-optimization, the future stochastic objects (i.e., the patient's fraction-to-fraction setup variation and the resulting dose deposition matrix) are replaced by nominal deterministic ones (e.g., the nominal position of the patient and the dose deposition matrix at the nominal position). The resulting optimization problem is simply a deterministic optimal open-loop control problem from the present fraction to the end of the treatment course.
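A minimal sketch of one certainty-equivalent re-optimization step follows. It is an illustration under stated assumptions, not the authors' formulation: the cost is assumed to be the squared deviation of the total dose from a target, the only constraint kept is nonnegativity of the beamlet intensities, the same intensities are assumed for all remaining fractions, and the solver is a plain projected-gradient iteration. The names `cec_reoptimize`, `D_nominal`, `delivered` and all numbers are hypothetical.

```python
def matvec(D, x):
    """Dose per voxel for beamlet intensities x under dose deposition matrix D."""
    return [sum(d * v for d, v in zip(row, x)) for row in D]

def cec_reoptimize(D_nominal, delivered, target, fractions_left, iters=2000):
    """Minimize ||delivered + fractions_left * D_nominal x - target||^2 over x >= 0.

    Uncertainty is ignored: every future fraction is assumed to be delivered
    at the nominal patient position (certainty equivalence).
    """
    n_vox, n_beam = len(D_nominal), len(D_nominal[0])
    f = float(fractions_left)
    # The squared Frobenius norm bounds the squared spectral norm, so 1/lipschitz
    # is a safe step size for this quadratic objective.
    lipschitz = 2.0 * f * f * sum(v * v for row in D_nominal for v in row)
    x = [0.0] * n_beam
    for _ in range(iters):
        dose = matvec(D_nominal, x)
        resid = [a + f * d - t for a, d, t in zip(delivered, dose, target)]
        grad = [2.0 * f * sum(D_nominal[i][j] * resid[i] for i in range(n_vox))
                for j in range(n_beam)]
        # Gradient step followed by projection onto the nonnegative orthant.
        x = [max(0.0, xj - gj / lipschitz) for xj, gj in zip(x, grad)]
    return x

# Illustrative data (assumed): voxel 0 is tumor, voxel 1 is healthy tissue.
D_nominal = [[1.0, 0.2],
             [0.2, 1.0]]
delivered = [10.0, 0.0]   # dose accumulated over the fractions already given
target = [70.0, 0.0]      # prescribed total dose per voxel
x_cec = cec_reoptimize(D_nominal, delivered, target, fractions_left=30)
```

In a full treatment course this step would be repeated before each fraction, with `delivered` replaced by the dose actually measured so far. In this toy instance the beamlet aimed at the healthy voxel is driven to zero by the nonnegativity projection.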

Open-loop feedback control (OLFC): In this policy, the stochastic nature of the doses delivered in future fractions is incorporated into the re-optimization, but it is assumed that no further measurements will be taken and therefore that no further adjustments will be made to the beamlet intensities. This results in an optimization problem that minimizes the expected cost from the present fraction to the end of the treatment course and that can be solved using stochastic programming techniques.
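The open-loop feedback idea can be sketched as follows, under assumptions that go beyond the text: a finite set of geometry instances with known probabilities, independent shifts across fractions, a squared-deviation cost on the total dose, the same intensities in every remaining fraction, and a projected-gradient solver standing in for a stochastic programming method. Under these assumptions the expected cost splits into a squared-bias term plus a variance term that grows linearly with the number of remaining fractions. All names and numbers are hypothetical.

```python
def matvec(D, x):
    return [sum(d * v for d, v in zip(row, x)) for row in D]

def rmatvec(D, r):
    """Transpose product D^T r."""
    return [sum(D[i][j] * r[i] for i in range(len(D))) for j in range(len(D[0]))]

def mean_dose(x, instances, probs):
    doses = [matvec(D, x) for D in instances]
    mean = [sum(p * d[i] for d, p in zip(doses, probs)) for i in range(len(doses[0]))]
    return doses, mean

def expected_cost(x, instances, probs, delivered, target, f):
    """E||delivered + sum of f random fraction doses - target||^2 for i.i.d. shifts."""
    doses, m = mean_dose(x, instances, probs)
    bias = sum((a + f * mi - t) ** 2 for a, mi, t in zip(delivered, m, target))
    var = sum(p * sum((di - mi) ** 2 for di, mi in zip(d, m))
              for d, p in zip(doses, probs))
    return bias + f * var

def olfc_reoptimize(instances, probs, delivered, target, f, iters=5000):
    """Minimize expected_cost over x >= 0 by projected gradient descent."""
    n_beam = len(instances[0][0])
    # Conservative upper bound on the gradient's Lipschitz constant.
    lipschitz = 2.0 * (f * f + f) * sum(
        p * sum(v * v for row in D for v in row) for D, p in zip(instances, probs))
    x = [0.0] * n_beam
    for _ in range(iters):
        doses, m = mean_dose(x, instances, probs)
        resid = [a + f * mi - t for a, mi, t in zip(delivered, m, target)]
        grad = [0.0] * n_beam
        for D, p, d in zip(instances, probs, doses):
            v = [ri + di - mi for ri, di, mi in zip(resid, d, m)]
            for j, g in enumerate(rmatvec(D, v)):
                grad[j] += 2.0 * f * p * g
        x = [max(0.0, xj - gj / lipschitz) for xj, gj in zip(x, grad)]
    return x

# Illustrative data (assumed): nominal geometry and one shifted geometry.
instances = [[[1.0, 0.2], [0.2, 1.0]],
             [[0.6, 0.1], [0.6, 1.0]]]
probs = [0.7, 0.3]
delivered, target, f = [10.0, 0.0], [70.0, 0.0], 30
x_olfc = olfc_reoptimize(instances, probs, delivered, target, f)
```

With a single instance of probability one, the variance term vanishes and the problem reduces to a deterministic re-optimization at that geometry.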

Cost-to-go approximation via Lagrangian relaxation: An approximation to the DP can be obtained by replacing the optimal cost-to-go functions in the DP algorithm with approximate functions. We propose an approximation based on a Lagrangian dual obtained by “dualizing” the constraints (e.g., nonnegativity constraints on beamlet intensities and upper and lower bounds on the dose delivered to various regions). The resulting optimization problem, similar to that of OLFC, minimizes the expected cost from the present fraction to the end of the treatment course, while taking the future adjustment of the beamlet intensities into account through Lagrange multipliers.
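To illustrate what “dualizing” a constraint means, the sketch below treats a deliberately tiny, static problem rather than the authors' multi-stage model: a single beamlet of intensity x, a tumor voxel with a quadratic cost, and an upper bound on the dose to one healthy voxel that is moved into the objective with a multiplier. The multiplier is found by projected dual ascent, and the dual value is a lower bound on the constrained cost, which is the kind of quantity that could stand in for a cost-to-go function. All names and numbers are hypothetical, and nonnegativity of x is not enforced here because it is inactive in this instance.

```python
def inner_minimizer(lam, a_t, a_h, target):
    """argmin over x of (a_t*x - target)^2 + lam*(a_h*x - u); closed form in x."""
    return (2.0 * a_t * target - lam * a_h) / (2.0 * a_t ** 2)

def dual_value(lam, a_t, a_h, target, u):
    """Lagrangian dual function g(lam), a lower bound on the constrained optimum."""
    x = inner_minimizer(lam, a_t, a_h, target)
    return (a_t * x - target) ** 2 + lam * (a_h * x - u)

def dual_ascent(a_t, a_h, target, u, step=1.0, iters=500):
    """Maximize g(lam) over lam >= 0 using the constraint violation as gradient."""
    lam = 0.0
    for _ in range(iters):
        x = inner_minimizer(lam, a_t, a_h, target)
        lam = max(0.0, lam + step * (a_h * x - u))
    return lam, inner_minimizer(lam, a_t, a_h, target)

# Illustrative data (assumed): tumor dose per unit intensity 1.0 with target 2.0,
# healthy-voxel dose per unit intensity 0.5 with upper bound 0.8.
lam_star, x_star = dual_ascent(a_t=1.0, a_h=0.5, target=2.0, u=0.8)
lower_bound = dual_value(lam_star, 1.0, 0.5, 2.0, 0.8)
```

In this convex toy problem the dual bound is tight: the recovered intensity makes the healthy-voxel constraint hold with equality, and the dual value equals the constrained optimal cost.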

Computational experiments show that the resulting individualized adaptive radiation therapy plans promise a considerable improvement over static treatment plans while remaining computationally feasible to implement.
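The Bayesian updating of a patient's positional uncertainty distribution, shared by all of the policies above, can be sketched for a finite set of geometry instances. This is a minimal illustration that assumes a Dirichlet prior over the instance probabilities and assumes that each measured setup can be assigned to exactly one instance; neither assumption is taken from the abstract, and the function name and numbers are hypothetical.

```python
def update_shift_posterior(prior_counts, observed_instances):
    """Dirichlet-multinomial update over a finite set of geometry instances.

    prior_counts[k] is a pseudo-count for instance k (e.g., from population data);
    observed_instances lists the instance index measured after each fraction.
    Returns the posterior counts and the posterior mean probabilities.
    """
    counts = [float(c) for c in prior_counts]
    for k in observed_instances:
        counts[k] += 1.0
    total = sum(counts)
    probs = [c / total for c in counts]
    return counts, probs

# Three illustrative instances: shifts of -1, 0 and +1 voxel (assumed).
prior = [1.0, 2.0, 1.0]
measured = [2, 2, 1]  # instance indices observed after the first three fractions
counts, probs = update_shift_posterior(prior, measured)
```

The posterior mean probabilities can then serve as the instance weights in the next re-optimization, so that the uncertainty model becomes more patient-specific as the treatment course proceeds.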