AZP (Dev notes)
AZP (Automatic Zoning Procedure)
Original ClusterPy pacakge (2014) only works on python2.x.
It is under revising and will become a submodule in PySAL.
I am working on a C++ version of AZP and Max-p, which will be a regionalization feature in GeoDa.
- AZP
- AZP-Simulated Annealing
- AZP-Tabu
- AZP-Reactive Tabu
AZP
AZP is a mildly steepest descent algorithm that aggregates N zones (areas) into M regions. “The M output regions should be formed of internally connected, contiguous, zones.” ([Openshaw_Rao1995]_ pp 428).
A steepest descent algorithm would be an algorithm which follows the above update rule, where at each iteration, the direction ∆x(k) is the steepest direction we can take. That is, the algorithm continues its search in the direction which will minimize the value of function, given the current point.
Objective function
SS (sum-of-squares cost function) of selected variables: within-cluster sum of squares from each observation sto the attribute centroid of its cluster
Distance function
Used to compute SS: Euclidean (default)
Algorithm
- Construction 1.1. Exclude neighborless observations 1.2. Initial p starting observations using K-Means 1.3. Construct potential regions by “growing” from each starting observation, compute initial SS 1.4. Get the intra-bordering observations (those could be shared by more than two regions)
- Local Search 2.1 (step3) Select a random region 2.2 (step4) Get bordering observations of the selected region 2.3 (step5) Swap any bordering observation to neighbor region if SS improves 2.4 Repeat step3-4-5 until no improvement
Implementation
Input/Output
Input
-
data: n x m, n is the number of observation, m is the dimension of variable space which is used to determine the clusters
-
k: the number of “zones” or clusters
-
w: the contiguity weights or the connectivity structure defines who’s who’s neighbor
-
controls: variable and a limitation value that a cluster or zone has to be restricted on (e.g. population (sum) should be at least 10K but less than 50k)
ZoneControl
A ZoneControl is for one variable (e.g. population), and it contains the values of the variable. One can setup more than one ZoneControl for AZP.
A ZoneControl can have more than one control, which defines the Operation (Sum/mean/max/min), Comparator (less/more) and a restrict value. For example: the following two AddControl() function calls define two restrictions when building a zone
// pop is std::vector<double> with population values for all observations.
ZoneControl zc(pop);
zc.AddControl(Operation::SUM, Comparator::MORE_THAN, 10000);
zc.AddControl(Operation::SUM, Comparator::LESS_THAN, 50000);
When constructing a zone, it will check if the zone satisfies the following controls:
-
- SUM of population is MORE than value 10,000
-
- SUM of population is LESS than value 50,000
// zc_list is a std::vector<ZoneControl> array
// obs_list is a std::vector<int> array indicates which observations
// compose this zone
Zone zone(zc_list, obs_list);
bool is_valid = zone.CheckControl();
or
bool is_zone_valid = zc.CheckZone(zone);
1. AZP
Initial using K-Means
The logic starts by finding p starting areas using K-Means. Then, processing the neighbors of the p areas. Try to add neighbor to a nearby region, and adding neighor’s neighbors to process later. Repeat the previous step, until all areas have been processed.
Local Boundary Optimiser
a. Select and remove a random region K from p regions b. Identify a list of areas that are on border of region K c. Randomly select bordering area, and swap it to neighbor region if making improvement; Repeat until the list is exhausted d. Repeat a-c until no improvement
1. AZP-Simulated Annealing
c. Randomly select bordering area, and swap it to neighbor region if making improvement; Otherwise, make a move with a probability R(0,1) = f(temperature, cooling_rate) Repeat until the list is exhausted
Step 5 Randomly sample this list until there is a local improvement in the objective function or an equivalently good move. Then make the move. Otherwise make the move with a probability given Boltzmann’s equation
NOTE: the clusterpy did it incorreclty: for each possible move, if