Group of Employees v. Ontario Public Service Employees Union, 1999 CanLII 14827

0622-96 Group of Employees, Applicant v. Ontario Public Service Employees Union and Crown in Right of Ontario (Management Board Secretariat), Respondents

Before: Phyllis Gordon, Former Chair and Members Margaret Kvetan and Bruce Budd

Appearances: Leslie Dizgun, Rita Schreiber and Susan Dunbar for the Group of Employees; Elizabeth Shilton and Janet Wright for Ontario Public Service Employees Union; David Costen and Pat Weiner for the Crown in Right of Ontario (Management Board Secretariat)

Cite as: Management Board Secretariat (No. 6) (March 5, 1999) 0622-96 (P.E.H.T)

DECISION OF PHYLLIS GORDON, FORMER CHAIR AND MARGARET KVETAN, MEMBER, MARCH 5, 1999.

Introduction

The Applicant is a group of employees who work as psychiatric nurses in the Nurse 2 General position in psychiatric institutions of the Ontario Public Service (“OPS”). The Respondents are the Crown in Right of Ontario, Management Board Secretariat (“MBS”) and the Ontario Public Service Employees Union (“OPSEU”), the union representing the Crown’s public service employees. The Applicant employees are members of OPSEU and work at the Queen Street Mental Health Centre, the London Psychiatric Hospital, the Whitby Psychiatric Hospital, the Brockville Psychiatric Hospital, the North Bay Psychiatric Hospital and the Ministry of Correctional Services.
The Applicant employees allege that the pay equity plan negotiated by the Respondents violates sections 4, 5, 6, 7, 12, 13 and 14 of the Pay Equity Act, R.S.O. 1990, c.P.7, as amended (the “Act”). They seek various remedies including: an order appointing a review officer to prepare an amended pay equity plan with specific attention to the job content characteristics in nurses’ work; an order for adjustments to their compensation, including retroactive adjustments to January 1, 1990; an order that the review officer may retain the services of experts considered necessary to prepare the plan; an order that the remedies apply to all members of the affected job class whether or not they were parties to the Application; an order that the Respondent Crown pay the Applicant employees’ legal fees and disbursements; and, any other order thought appropriate by the Tribunal.

Legal Issues

Standing

When this Application was filed in 1992, the Respondents raised a preliminary objection that the Applicant employees did not have standing to challenge a plan which the Respondents had negotiated and executed and which was therefore deemed approved pursuant to section 14 of the Act. The Tribunal panel hearing the case first determined the preliminary objection and at the same time dismissed the Application. (Management Board Secretariat (1993), 4 P.E.R. 58) While all three members of the panel found that the Applicant employees had standing to bring a complaint, the majority dismissed the Application because they found it did not raise a prima facie case. The Applicant employees then requested that the earlier decision be reconsidered. The request was granted by another panel of the Tribunal, which held that the parties had not previously been afforded the opportunity to make submissions about the appropriate test to be used. That panel also held that the original decision should be reconsidered in its entirety, because any attempt to segment the elements of the original decision for the purposes of reconsideration might unduly restrict a new panel in its determination. (Management Board Secretariat (No.2) (1994), 5 P.E.R. 10 at paragraphs 8 and 10) This is the reconsideration decision.
In an interim decision dealing with the process to be adopted in this case, the Tribunal determined that the complex issues involved would be better analyzed in the context of evidence. It held that the Respondents’ preliminary objection on standing would be dealt with only after hearing the evidence on the merits of the Application. In addition, and in light of the evolving jurisprudence, the Tribunal characterized the key issue in the preliminary objection to be whether a deemed approved plan can be challenged pursuant to s. 22(1) of the Act. (Management Board Secretariat (No. 3) (1995), 6 P.E.R. 105 at paragraphs 11 and 12) (Where there is a bargaining agent, a pay equity plan is “deemed approved” once it has been executed by the employer and the bargaining agent pursuant to section 14(5); where there is no bargaining agent, a plan posted by the employer is deemed approved if no objection to it has been filed pursuant to section 15(8) of the Act.)
The Respondent Crown urged us to find that individual employees who are represented by a bargaining agent have no standing to complain about the plan negotiated and executed by their union and employer pursuant to s. 22(1) in any situation. The Applicant employees argued that they have broad standing to complain about any provision regardless of where it is situated in the Act. The Respondent OPSEU suggested that it is clear that the Tribunal will deal with complaints from individual employees in unionized work places, notwithstanding their represented status, in appropriate cases. Both Respondents urged us to find that the Applicant employees’ standing to complain did not extend to sections 12, 13 and 14.
The issues of standing and challenges to deemed approved plans have now been canvassed by the Tribunal and the Divisional Court in several decisions since they were first raised in the litigation between these parties: York Region Board of Education (1993), 4 P.E.R. 51; Ontario Northland Transportation Commission (1994), 4 P.E.R. 19 (Ont. Div. Ct.); Hamilton Civic Hospitals (No.1) (1995), 6 P.E.R. 86; Ottawa Board of Education (No.2) (1996), 7 P.E.R. 9; and, Parry Sound District General Hospital (No.2) (1996), 7 P.E.R. 73. Applying the principles developed in these cases, we find that the Applicant employees have standing to complain under s. 22(1) that the Act has been contravened. We agree, however, with the Respondents that individual employees do not have unlimited rights to complain about every aspect of the Act. Both the majority and the dissenting vice-chair in Management Board Secretariat (1993), 4 P.E.R. 58 concluded that the right to complain does not include standing to complain about contraventions of Part II of the Act where the plan is deemed approved. We share this view as expressed in the reasoning of the majority in paragraphs 29 and 30 as well as in paragraphs 13 and 14 of the dissent. A similar result is found in Ottawa Board of Education (No. 2) (1996), 7 P.E.R. 9 at paragraph 41, where the Tribunal considered a union’s complaint about a plan which had been deemed approved prior to the union’s certification. The Tribunal held that complaints regarding deemed approved plans, in general, must be grounded in Part I of the Act. Therefore, we find, on the basis of Tribunal jurisprudence, that the Applicant employees do not have standing to challenge the deemed approved plan on the basis that it contravenes sections 12, 13 and 14, which are found in Part II of the Act.
The Applicant employees also allege that sections 4(1), 4(2), 5(1), 6(1), 7(1) and 7(2), found in Part I of the Act, have been contravened. These sections read as follows:

4.(1) Purpose. The purpose of this Act is to redress systemic gender discrimination in compensation for work performed by employees in female job classes.

4.(2) Identification of systemic gender discrimination. Systemic gender discrimination in compensation shall be identified by undertaking comparisons between each female job class in an establishment and the male job classes in the establishment in terms of compensation and in terms of the value of the work performed. R.S.O. 1990, c. P.7, s. 4.

5.(1) Value determination. For the purposes of this Act, the criterion to be applied in determining value of work shall be a composite of the skill, effort and responsibility normally required in the performance of the work and the conditions under which it is normally performed.

6.(1) Achievement of pay equity. For the purposes of this Act, pay equity is achieved under the job-to-job method of comparison when the job rate for the female job class that is the subject of the comparison is at least equal to the job rate for a male job class in the same establishment where the work performed in the two job classes is of equal or comparable value. R.S.O. 1990, c. P.7, s. 6 (1); 1993, c. 4, s. 4 (1).

7.(1) Pay equity required. Every employer shall establish and maintain compensation practices that provide for pay equity in every establishment of the employer.

7.(2) Idem. No employer or bargaining agent shall bargain for or agree to compensation practices that, if adopted, would cause a contravention of subsection (1). R.S.O. 1990, c. P.7, s. 7.

Section 4(1) is the purpose provision. It informs and provides the context for the interpretation of the Act but does not confer substantive rights upon which a complaint of contravention can be grounded. In this regard, we adopt the reasoning found in the majority decision of the original panel in this case. (Management Board Secretariat (1993), 4 P.E.R. 58 at paragraph 11) Section 4(2) is a general direction regarding the identification of systemic gender discrimination in compensation. How this is to be done, for this employer, is set out in Part II. We find, therefore, that the substantive claims in this Application are based in sections 5(1) and 6(1). Whether sections 7(1) and 7(2) have been contravened will depend on our conclusions respecting sections 5(1) and 6(1), which will determine whether the relevant agreed-to compensation practices do provide for pay equity in this case.

Standard of Review

The standard of review to be used when there is an allegation that a deemed approved plan contravenes the Act has now been established by Tribunal jurisprudence. Correctness is the appropriate standard when reviewing whether a plan contravenes a precise provision of the Act, and reasonableness is the appropriate standard when deciding whether a plan contravenes a provision that is not capable of exact application, but implies a range or an exercise of discretion. Parry Sound District General Hospital (No.2) (1996), 7 P.E.R. 73; Ottawa Board of Education (No.2) (1996), 7 P.E.R. 9; Parry Sound District General Hospital (No.1) (1995), 6 P.E.R. 124. These cases depart from the standard of review established by earlier Tribunal jurisprudence for plans that had not yet been deemed approved. Counsel for the Applicant employees urged us to revisit this issue and adopt the same standard of review whether or not the plan has been deemed approved. We have considered his submissions carefully and conclude that, within the context of the complex regime established under the Act, the Tribunal’s approach to the review of deemed approved plans is appropriate and justified.

In this case we are concerned with the allegations that the deemed approved plan contravenes sections 5(1) and 6(1). In our view, neither of these provisions sets out a precise standard; each implies a range or an exercise of discretion. Section 5(1) requires that the criterion used to value work be a composite of the skill, effort and responsibility normally required to do the work and the conditions under which it is normally performed. However, how that composite is arrived at is not precisely set out. The word “composite”, unqualified as it is, implies that the factors can be combined in different ways. Similarly, the language of section 6(1), “where the work performed in the two job classes is of equal or comparable value”, implies that a range may exist and that an exercise of discretion may occur.

Onus

The Applicant employees bear the onus and thus must demonstrate that the compensation practices agreed to by the Respondents do not provide for pay equity for their job class. In our view, they need to establish either that the criterion the Respondents used to determine the value of work was unreasonable (section 5(1)), or that the choice of male comparators for their job class was unreasonable because the value of the comparators’ work was not equal or reasonably comparable (section 6(1)).

Policy-Capturing - General

The Respondents - MBS and OPSEU - decided to use a job evaluation methodology known as policy-capturing as the basis for their pay equity plan. Policy-capturing is one of the two basic approaches used in developing and applying factors and weights; the other is an a priori methodology¹. This is the first case before the Tribunal involving a pay equity plan that used policy-capturing. We describe in some detail how the Respondents implemented this highly technical statistical approach in the section “What Occurred in this Case”; a brief overview follows here.
The Respondents had as their reference a pioneering New York State study on comparable worth in the public service. It was conducted by the State University of New York at Albany and was published as The New York State Comparable Worth Study Final Report, October 1, 1985 authored by Steinberg, Haignere, Possin, Treiman, Chertos and Maisel (“the SUNY study”). Unlike the pay equity exercise conducted by the Respondents, this report was not undertaken pursuant to a statute with prescribed time-frames, process and content requirements. Its stated goal was:

The goal of the study is to assess whether the wages paid for jobs traditionally held by women and minorities accurately reflect their productive value to New York State or are artificially depressed because the work had been and continues to be performed by women and minorities.²

While it is not our task to evaluate the SUNY study, it is important to note that it was very influential in the application of the policy-capturing methodology before us and was frequently referred to in the course of the evidence.

Very generally, policy-capturing is the development of a statistical model in which specific job content features are grouped into factors. These factors are weighted in such a way that they statistically ‘predict’ the current wage structure. This is the pay policy, or pay prediction model, of the organization without accounting for the gender composition of the positions evaluated. The next step takes into account the gender composition, and in doing so, picks up the effect of gender bias so that it can be removed from the pay policy.
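For readers who find a concrete form helpful, the two steps just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the working group's program: it assumes an ordinary least squares regression of pay on job-content factor scores plus percent female, and treats "removing" gender bias as predicting pay with the percent-female term set aside. The function names (`fit_ols`, `policy_capture`) and any sample figures are hypothetical.

```python
# Hypothetical illustration only: not the working group's actual model or data.
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        div = m[col][col]
        m[col] = [v / div for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_ols(x: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [intercept, b1, ...]."""
    rows = [[1.0, *r] for r in x]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def policy_capture(
    factor_scores: Sequence[Sequence[float]],
    pct_female: Sequence[float],
    pay: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Regress pay on factor scores plus percent female, then drop the gender term.

    Returns (coefficients, adjusted_pay). The last coefficient is the
    percent-female effect; adjusted_pay is each job class's predicted pay
    with that term removed (equivalently, with percent female set to zero).
    """
    x = [[*f, p] for f, p in zip(factor_scores, pct_female)]
    coefs = fit_ols(x, pay)
    content = coefs[:-1]  # intercept followed by the job-content factor weights
    adjusted = [content[0] + sum(w * s for w, s in zip(content[1:], f))
                for f in factor_scores]
    return coefs, adjusted
```

A design point worth noting: because percent female is in the equation when the factor weights are estimated, those weights are estimated holding gender composition constant, which is not the case if pay is regressed on the factors alone.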
The focus of the case was disagreement on three major methodological aspects of the plan. The first of these is a critique of the modification of the factors carried out by the working group who developed the plan; the second is the significance of negative coefficients; and, the third is whether an additional a priori step is essential at the end of the process (as was recommended by the authors of the SUNY study although not used in New York State). The Applicant employees say that the single statistical adjustment carried out by the Respondents was inadequate to remove gender bias. They submit that the plan is fundamentally flawed because it fails to adequately measure female work and leaves some female job content invisible and nursing work undervalued. Therefore, the Applicant employees state that the plan deviates from the process, is inadequate and should be amended by the addition of an a priori evaluation of the factor weights.
The choice of a policy-capturing methodology per se was not challenged by the Applicant employees and is not the subject of this decision. We note at this time, however, that to all but skilled statisticians who are also conversant with issues of systemic discrimination and job evaluation, policy-capturing is difficult to understand. The evidence before us was almost entirely expert evidence. This decision is largely a review of that expert evidence and is therefore highly technical.³

Witnesses

Four witnesses testified at the hearing. Ms Nancy Caney (formerly Ms Nancy Robinson) testified on behalf of MBS. Now the Bureau Commander of the Ontario Provincial Police, Ms Caney has had a long career as a senior manager in the OPS in various capacities. She was the Manager of Pay Equity in the Pay and Classification Branch of the Employee Relations and Compensation Division of the Human Resources Secretariat from April 1988 through to 1990. Her responsibilities included managing a section of employees who were involved in developing three pay equity plans: the OPSEU plan; the plan for management and employees excluded from the OPSEU bargaining unit; and, the Ontario Provincial Police plan. She had a staff of approximately 15 who were assigned to three different working groups. We found her evidence to be clear and concise. Her credibility was not challenged in any significant way.
Dr. Lynda Ames, who testified for the Applicant employees, is an acknowledged expert in pay equity and has appeared before the Tribunal in the past. She is a professor in the Department of Sociology at the State University of New York at the Plattsburgh campus. She has been a consultant in equity projects, a research director, and has authored many papers with respect to organizations, women and work. She was hired by the Centre for Women in Government in 1988 - the organization that had developed the SUNY study in the mid 1980s - and has participated in a policy-capturing study in Manitoba and another in a county in New York State. She was very familiar with the SUNY study and referred to it frequently in her testimony and reports.
Dr. Nan Weiner, also qualified as a pay equity expert, testified on behalf of MBS. She has likewise given expert testimony at other proceedings. Dr. Weiner’s graduate training is in the field of industrial relations. She is a human resources professional who works primarily as a consultant in employment equity, pay equity, and compensation. Job evaluation is an important part of her work. From 1987 to 1990 Dr. Weiner was employed at the Pay Equity Commission of Ontario as the Job Evaluation Consultant, the Manager of Research and the Acting Director of Policy and Research. She was not involved in the negotiation of the OPS/OPSEU plan in any way. She also teaches at the university level and is the author of many articles in her field. Dr. Weiner was also very familiar with the SUNY study.
Dr. John Kervin was qualified as an expert in the area of data collection, analysis methodology and statistics, and testified on behalf of MBS. He is a statistician and a professor at the Centre for Industrial Relations at the University of Toronto. Dr. Kervin has published many articles in these areas, as well as major reports, books and chapters in other texts. He has developed a particular expertise in the application of statistics to the consideration of workplace issues. Examples of this work include a study explaining nursing turnover, a statistical comparison of pay equity awards for the Northwest Territories, and a study of the problems of part-time work for the Commission of Inquiry into Part-Time Work. In this case, Dr. Kervin used the SUNY study as one of his evaluation criteria.
Counsel for the Applicant employees made submissions with respect to the expertise of the three experts. He emphasized that Dr. Kervin is not a pay equity expert but was called as an expert in data collection methodology and statistics. Counsel suggested that, as such, Dr. Kervin does not have the necessary expertise to offer opinions on fundamental pay equity concerns or on what constitutes gender bias. Counsel therefore submitted that little weight should be given to Dr. Kervin’s evidence as it relates to the work of women, which has historically been invisible or undervalued in the workplace.
We find that Dr. Kervin’s evidence was fundamentally statistical and methodological in nature and that he did not put himself forward as a judge of the job content issues. However, the Applicant employees’ central allegations are founded in a statistical critique of the plan, particularly the appropriateness of the factor modification and the significance of regression coefficients in the regression analysis. We find that the debate about statistics and methodology is central to the dispute and, in that regard, Dr. Kervin’s expertise is useful and relevant.
Counsel for the Applicant employees also submitted that Dr. Ames’ evidence was to be preferred over that of Dr. Weiner. He relied on the fact that Dr. Ames had conducted a focus group with nurses working in the psychiatric hospitals and was thus more familiar with nursing work than was Dr. Weiner. As well, Dr. Weiner admitted that Dr. Ames was an expert in policy-capturing whom she might, herself, have called upon.
We find that the evidence of each of Dr. Weiner and Dr. Ames, both pay equity experts, was useful in specific discussions. It is not necessary to prefer the overall evidence of one of them over the other. We refer in this decision to specific points of each and comment on them in context.
OPSEU did not call any evidence although its Counsel participated at the hearing and made full submissions at the conclusion of the hearing.

What Occurred in this Case

The pay equity plan for the OPSEU bargaining unit was negotiated by the Respondents from early 1988 and was posted in March of 1990, with the first wage adjustments retroactive to January 1, 1990. At the time the plan was negotiated, OPSEU represented approximately 65,000 Crown employees in a single bargaining unit, who were organized for pay purposes into eight wage categories, 605 job classes, and further sub-divided for organizational purposes into 27,461 positions. The eight wage categories were: Institutional Care, Administrative, Correctional, General Operational, Maintenance, Office Administration, Scientific and Professional and Technical.
The outline of the steps taken by the negotiating parties in the development of their plan is based on the evidence of Ms Caney, who testified about how the working group went about its task, and of Dr. Kervin, who described the statistical methodology both in his reports and in his testimony. The description of the methodology is also based on the evidence of Dr. Ames. Dr. Kervin assessed each stage of the process using three criteria: sound statistical practice; the procedures of the SUNY study; and, the removal of what he refers to as “gender-composition wage bias”.
Prior to Ms Caney’s arrival at the Employee Relations and Compensation Branch of MBS in April of 1988, there had been preliminary and informal discussions with OPSEU about the Act and about what methodology should be used to comply with it. The negotiating parties were keenly aware that their plan was to be posted by January 1, 1990, a very stringent time-line. As well, the plan had to encompass a huge diversity of positions and work. The working group believed that constructing an entirely new job evaluation system from the ground up would take too long, given the posting deadline. They needed a credible, rational and well-analyzed approach, ready to implement in January of 1990. In the end, the negotiating parties agreed to use the policy-capturing methodology for their plan.
The process began with a preliminary job audit of 150 different jobs comprising a cross-section of the diverse work in the OPS. This sample assisted the working group to determine the elements of work about which information would need to be gathered. This information was not itself used in the analysis.
Another early task was the identification of the female and male job classes. The working group first considered incumbency numbers, with reference to the definitions for male and female job classes in the Act. If a position was more than 60 percent female it was automatically a female job class and if it was more than 70 percent male it was automatically a male job class. In addition, they discussed whether each job class with a female incumbency of less than 60 percent might have been historically female or stereotypically female. If so, these were then included as female classes for the purposes of the plan.
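The incumbency rules just described amount to a simple decision procedure, sketched below in Python. This is an illustration only: the thresholds follow the wording used in this decision ("more than 60 percent female", "more than 70 percent male") rather than a quotation of the Act's definitions, the function name `classify_job_class` is hypothetical, and the "historically or stereotypically female" flag stands in for what was, in fact, a discussion and judgement by the working group.

```python
# Hypothetical sketch of the incumbency rules described above.
from typing import Optional


def classify_job_class(pct_female: float, historically_female: bool = False) -> Optional[str]:
    """Return "female", "male" or None (neither) for a job class.

    pct_female is the percentage of incumbents who are female (0 to 100).
    Assumes every incumbent is recorded as either female or male, so the
    male percentage is 100 minus pct_female.
    """
    pct_male = 100.0 - pct_female
    if pct_female > 60.0:
        return "female"
    if historically_female:
        # Working group's judgement: a class below the numeric threshold that was
        # historically or stereotypically female was included as a female class.
        return "female"
    if pct_male > 70.0:
        return "male"
    return None
```

For example, a class that is 45 percent female is neither female nor male on the numbers alone, but becomes a female class if the working group judged it to be historically female.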

Questionnaire

The working group then constructed a questionnaire. This involved careful consideration of the questions, and detailed discussions about what the questions were probing and what elements of work they covered. The working group attempted to make the questions clear, unambiguous, and worded so that those answering the questionnaire would think them applicable to their own work. They were attentive to the need to develop questions which were gender-inclusive.⁴ The union and the employer signed off at this stage of the process.
The next stage was the administration of a test of the questionnaire. It was sent to 150 employees representing diverse jobs in both male and female job classes. Following completion of the questionnaire, the respondents to it were interviewed to find out whether they had understood the questions and to assess the form and the content of the questionnaire. Significant revisions were made following these interviews and another test of the questionnaire was completed with another 150 employees.
The working group then undertook what was called the pilot pre-test. This was a quantitative test to assess statistical reliability and validity as well as the logistics of administering the questionnaire, including the mail provisions and a telephone hotline. They conducted statistical procedures, including factor and regression analyses on the sample, to confirm that the questions asked were appropriate, that they would lead to statistically reasonable factors, and that a multiple regression against pay was possible⁵.
The working group then prepared the final revision of the questionnaire. The result was a closed-ended survey that consisted of seventy-six questions about job and work characteristics and included five preliminary items to identify the respondent’s job class, seniority and education level. Several of the questions had more than one part. Each part, whether a stand-alone question, or a part of a question, is referred to as an ‘item’. In all, there were 172 separate items inquiring about job content. Where there was more than one possible response, respondents were either to select the highest response, use what is called a frequency scale (how often something is done in a job), or use a duration scale (how much time a certain aspect of a job takes).
Dr. Kervin found no problems with this stage of the process. Dr. Ames also did not raise any criticisms about the questionnaire development or content.
The next step was the administration of the questionnaire. The working group selected a method for the distribution to respondents and sent the survey to 22,000 employees in the OPS represented by OPSEU. The surveys were sent to employees at their place of work and they were completed during work hours. Toll-free assistance lines in seven languages and for the hearing-impaired were established. A copy of the survey on tape was prepared for people with visual impairment. Return was via mail in pre-paid, pre-addressed envelopes.
The response rate was 59%. Dr. Kervin testified that a good response rate is 70% and that the minimum acceptable rate is 50%. While the response rate was somewhat low, he felt that this was mitigated because the unit of analysis was the job class, not the individual employee. He concluded that there was a sufficient response to obtain valid information about the job classes. Responses were not received from all of the 605 job classes surveyed. If a job class had fewer than 3 responses, these were discarded as not being statistically viable. In any event, the Applicant employees did not put the response rate in issue.
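The response-rate benchmarks and the minimum-response rule in the preceding paragraph can be expressed as a short Python sketch. It is illustrative only: the thresholds (70%, 50%, three responses) are those described in the evidence, while the function names and the returned count of 12,980 (simply 59% of the 22,000 surveys sent; the decision does not give the actual count) are hypothetical.

```python
# Hypothetical illustration; thresholds are those Dr. Kervin described.
from collections import Counter
from typing import Dict, Iterable

GOOD_RATE = 0.70   # a "good" response rate
MIN_RATE = 0.50    # the minimum acceptable response rate
MIN_RESPONSES = 3  # job classes with fewer responses were discarded


def response_rate(returned: int, sent: int) -> float:
    return returned / sent


def assess_rate(rate: float) -> str:
    """Place a response rate against the two benchmarks."""
    if rate >= GOOD_RATE:
        return "good"
    if rate >= MIN_RATE:
        return "acceptable"
    return "too low"


def viable_job_classes(job_class_of_each_response: Iterable[str]) -> Dict[str, int]:
    """Count responses per job class and keep classes with at least MIN_RESPONSES."""
    counts = Counter(job_class_of_each_response)
    return {jc: n for jc, n in counts.items() if n >= MIN_RESPONSES}
```

With these assumptions, `assess_rate(response_rate(12_980, 22_000))` returns `"acceptable"`: above the minimum, below the level Dr. Kervin regarded as good.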
The working group then averaged the questionnaire answers for all respondents in each job class to get job class (rather than individual employee) scores for each item. The Office Administration Group (OAG) was aggregated into 13 job classes. The averages for each job class had to be standardized or scaled before the next step, as they had different ranges⁶. During the process, the 605 job classes were reduced to 470. This reduction took place for a number of reasons including the dropping of job classes with too few responses, those dropped automatically by the computer on account of missing information, and, most significantly, the aggregation of OAG jobs down to 13 job classes. Dr. Kervin repeated the standardizing exercise, using a different statistical program and came up with the same results. Dr. Ames did not have any concerns with this stage of the process.
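The averaging and standardizing step can be sketched as follows for a single questionnaire item. The decision says only that the averages were "standardized or scaled"; the use of z-scores (subtract the mean across job classes, divide by the standard deviation) is an assumption made here for illustration, and the function names are hypothetical.

```python
# Hypothetical sketch: job-class means for one item, then z-score standardization.
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def job_class_means(answers: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average one item's answers over all respondents in each job class."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for job_class, value in answers:
        grouped[job_class].append(value)
    return {jc: mean(vals) for jc, vals in grouped.items()}


def standardize(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert job-class means to z-scores so items with different ranges are comparable."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("item does not vary across job classes")
    return {jc: (s - mu) / sigma for jc, s in scores.items()}
```

The unit of analysis after this step is the job class, not the individual respondent, which is why the response rate was judged against the job class rather than the employee.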

Factor Analysis

The next stage was the factor analysis. This is a statistical technique that reduces a large number of items of information to a smaller number of factors which “capture” the meaning of the larger group of items. It is used to overcome the difficulties in manipulating and making sense of all the information identified by a large number of items. The factor analysis groups together items that are answered similarly across jobs. The items that correlate in similar ways are found to cluster together. For example, if jobs that score high on question 1 also score high on question 2, or, if they score low on both question 1 and question 2, then, question 1 and 2 seem to vary together. They tap different elements of the same underlying factor of job content.
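The idea of questions that "vary together" across jobs is usually measured by a correlation coefficient; a factor analysis works from the correlations among all of the items at once. The Python sketch below shows only that building block, the Pearson correlation between two items' job-class scores, and is not a factor analysis in itself. The scores are invented for illustration.

```python
# Hypothetical illustration: correlation between two items across the same jobs.
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two items' scores across the same jobs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Five jobs: question 1 and question 2 rise and fall together; question 3 does not.
q1 = [1, 2, 3, 4, 5]
q2 = [2, 4, 5, 8, 10]
q3 = [3, 1, 4, 1, 5]
```

Here `pearson(q1, q2)` is about 0.99, so questions 1 and 2 would tend to cluster on the same factor, while `pearson(q1, q3)` is about 0.35.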
Dr. Kervin outlined four stages that are generally present in factor analysis. First, a set of underlying factors is extracted from the original items. Then, the decision-maker determines the optimum number of factors needed to represent the items, usually on the basis of statistical criteria. Next, the factors are adjusted or rotated, (a statistical procedure) to make them easier to interpret. Finally, the decision-maker interprets the results, decides what each factor represents, and names the factor. Different statistical techniques are available to carry out these stages and a choice is made about which to utilize.⁷
The factor analysis analysed the 172 questionnaire items for the 470 job classes. The unadulterated computer output of the data resulted in a factor solution having 15 statistical factors on which the 172 items “loaded”. A “loading” indicates how strongly each questionnaire item relates to each statistical factor, and items “load” on factors to a measurable degree. For example, 51 of the items loaded highly on the first statistical factor, 27 loaded highly on the second statistical factor, and so on.
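Counting how many items load most highly on each statistical factor, as in the figures of 51 and 27 items above, can be sketched in Python as follows. The loadings are invented for illustration, and comparing absolute values (because loadings can be negative) is an assumption, not something stated in the evidence.

```python
# Hypothetical illustration: count items by the factor on which they load highest.
from typing import Dict, List


def primary_factor(loadings: List[float]) -> int:
    """Index of the factor on which an item has its highest (absolute) loading."""
    return max(range(len(loadings)), key=lambda k: abs(loadings[k]))


def count_primary_loadings(item_loadings: Dict[str, List[float]], n_factors: int) -> List[int]:
    """Number of items whose primary loading falls on each factor."""
    counts = [0] * n_factors
    for loadings in item_loadings.values():
        counts[primary_factor(loadings)] += 1
    return counts


# Four items, three statistical factors (invented loadings).
example = {
    "q1": [0.80, 0.10, 0.20],
    "q2": [0.70, 0.35, 0.10],
    "q3": [0.20, 0.60, 0.25],
    "q4": [0.10, 0.20, 0.15],
}
```

In this example `count_primary_loadings(example, 3)` gives `[2, 2, 0]`: the third factor has no items with a primary loading on it.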
The factor solution with 15 statistical factors was found to explain 65% of the total variance. In other words, about two-thirds of the variation or spread in the answers to the 172 items was represented by these 15 factors. The working group found that solutions with fewer factors tended to blur distinctions that were potentially valuable in accounting for differences in pay among classes, while solutions with more factors did not improve interpretability. Dr. Kervin stated that the factor analysis presented no problems. Dr. Ames was also satisfied with this part of the process.
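As a worked illustration of the 65% figure, assuming standardized items (each with variance 1) and an orthogonal factor solution, the share of total variance explained by the retained factors is the sum of the squared loadings divided by the number of items. The symbols (l_ik for the loading of item i on factor k, p items, K factors) are introduced here for illustration and do not appear in the evidence.

```latex
\text{proportion explained} = \frac{1}{p}\sum_{k=1}^{K}\sum_{i=1}^{p} l_{ik}^{2},
\qquad p = 172,\; K = 15

\sum_{k=1}^{15}\sum_{i=1}^{172} l_{ik}^{2} \approx 0.65 \times 172 = 111.8
```

On these assumptions, the 15 factors together account for roughly 112 of the 172 units of standardized item variance.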

Factor Modification

The next stage was the construction of modified factors. Although the modification of factors is not always carried out, it is a fairly common procedure. According to Dr. Kervin, several steps are entailed. The starting point is a review of the loadings. The designers/researchers typically refine the factors to better reflect the meanings they want to be represented by each factor. They may divide the items comprising a factor into two groups, drop items or move items from one factor to another. The objective is a set of modified factors that should have greater overall validity (better represent the meaning assigned by the researchers) and greater reliability (less random measurement error).
Before the working group modified the factors, they agreed to the general objectives of making the nature of the dimension of work (measurement validity) clearer and of strengthening weak factors to improve their reliability. They adopted specific criteria that would govern the modifications. These included: not interfering with the primary loading of items, particularly if high; only moving items to factors on which they had a secondary loading of .30 or more; not moving items which loaded only on one factor; considering the apparent correspondence between the “face validity” of the item and the interpretation given to the factor; considering the degree to which employees would expect a given item to be included in a particular factor; retaining items which correlated highly with salary; and retaining items which correlated highly with the percent female in a job class.
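The two numerical criteria for moving items (a secondary loading of .30 or more, and no moves for items that load on only one factor) lend themselves to a mechanical check. The Python sketch below encodes only those two rules; the loadings are invented for illustration, the function name `movable_items` is hypothetical, and the judgment-based criteria (face validity, employee expectations, correlation with salary or percent female) are not modelled.

```python
import numpy as np

def movable_items(loadings, threshold=0.30):
    """Hypothetical helper: map each movable item (row) to the factors (columns) it could move to.

    The primary factor is the one with the item's highest loading. Any other
    factor with a loading of at least `threshold` is a permitted destination.
    Items with no such secondary loading load on only one factor and stay put.
    """
    moves = {}
    for item, row in enumerate(loadings):
        primary = int(np.argmax(row))
        targets = [f for f, value in enumerate(row)
                   if f != primary and value >= threshold]
        if targets:
            moves[item] = targets
    return moves

# Invented loadings for four items on three factors (not OPS data).
loadings = np.array([
    [0.72, 0.35, 0.10],  # secondary loading of .35 on factor 1
    [0.65, 0.12, 0.08],  # loads only on factor 0, so it is not moved
    [0.20, 0.55, 0.41],  # secondary loading of .41 on factor 2
    [0.31, 0.30, 0.62],  # secondary loadings on factors 0 and 1
])
print(movable_items(loadings))  # {0: [1], 2: [2], 3: [0, 1]}
```

Whether an eligible item is actually moved would still depend on the remaining, qualitative criteria.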
The working group agreed to split two factors and drop one. They divided a factor that reflected “caring” into two factors representing “physical caring” and “emotional caring”. They divided a factor representing “administration” into two factors representing “responsibility for staff” and “administrative responsibility”. They also dropped a factor that had no questionnaire items with primary loadings (that is, no questionnaire item had its highest loading on this factor). The result was 16 modified job content factors. (For ease of reference, the 15 factors described in paragraphs 41 and 42 are called statistical factors and the 16 modified factors are called job content factors.)
Dr. Kervin and Dr. Weiner thought that the working group’s construction of modified factors was appropriate. However, a major aspect of Dr. Ames’ critique is based on this stage of the process. This divergence of view is reviewed in paragraphs 62 to 68.
The next step was to calculate the score of each job class on each factor. The working group did this for each factor by summing the standardized item scores for each job class. The scores for each factor were then rescaled from 0 to 1. This was done to avoid giving extra weight to factors with more items (their sums would naturally tend to be higher). Dr. Kervin refers to this as necessary statistical practice that requires a simple arithmetic adjustment so that the highest score in each factor equals one, and the lowest equals zero. He states this is a fairly common practice and notes that it was the procedure utilized in the SUNY study. Dr. Ames does not critique this step of the process.
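The scoring step can be illustrated with a short Python sketch. It assumes a matrix of item responses (job classes by items) and a hypothetical mapping of items to factors. Each item is standardized as a z-score (use of the population standard deviation is an assumption; the evidence does not say which was used), the standardized items within each factor are summed, and each factor is rescaled so that its lowest-scoring job class receives 0 and its highest receives 1, regardless of how many items the factor contains.

```python
import numpy as np

def factor_scores(items, groups):
    """Hypothetical helper. items: job classes x items; groups: factor name -> item columns."""
    # Standardize each item across job classes (population standard deviation assumed).
    z = (items - items.mean(axis=0)) / items.std(axis=0)
    scores = {}
    for name, cols in groups.items():
        total = z[:, cols].sum(axis=1)          # sum of standardized items in the factor
        span = total.max() - total.min()
        scores[name] = (total - total.min()) / span   # lowest -> 0, highest -> 1
    return scores

# Invented responses: 4 job classes, 5 items; factor "A" uses 3 items, factor "B" uses 2.
items = np.array([
    [1.0, 2.0, 3.0, 5.0, 1.0],
    [2.0, 2.0, 4.0, 4.0, 2.0],
    [3.0, 5.0, 4.0, 2.0, 2.0],
    [4.0, 6.0, 6.0, 1.0, 3.0],
])
scores = factor_scores(items, {"A": [0, 1, 2], "B": [3, 4]})
for name, s in scores.items():
    print(name, np.round(s, 3))
```

Without the rescaling, factor “A” would tend to produce larger sums simply because it contains more items.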

Statistical Analysis

The next undertaking of the working group was to run a statistical operation called multiple regression analysis. Multiple regression analysis is a statistical technique used to show how two or more independent variables, in combination, predict a dependent variable. A multiple regression generates a prediction model consisting of a constant and a coefficient for each independent variable in the model. Each independent variable is multiplied by its coefficient, and these weighted independent variables are added together with the constant. The result is the “best” prediction of the dependent variable for each case or instance on which the analysis is run.
The working group did a multiple regression analysis of the job content factors against pay. This resulted in numerical coefficients for each of the 16 job content factors and provided a weighting of the factors. Ms Caney testified that, when they started this part of the work, they had initially thought that by adjusting the weights of the factors they would be able to eliminate gender bias from the system. The working group made several different kinds of adjustments to the weights (numerical coefficients), including bringing the negative weights to a positive value. Their analysis of these efforts indicated that the gender bias had not been reduced throughout the system. In addition, there were many anomalies that were not rational or logical and the adjusted weights did not result in the kind of payouts that anybody believed addressed gender bias. The most important problem, according to Ms Caney, was that gender bias in the percent female factor was not reduced. The working group decided to discard this approach.
They then carried out a multiple regression analysis which included the percent female factor. In this case, the working group’s dependent variable was wages, which they considered to be a proxy measure of the current value of a job. They used a scaled measure of 1989 wages (the maximum hourly rate for each job class, adjusted to range from 100 to 200). The independent variables were the set of scores for the 16 job content factors for each job class and the percent female in each job class (the percentage of women incumbents out of the total number of incumbents in a job class). In the end, the analysis was run on 394 job classes. The output of this analysis was a prediction of each job’s worth in job value points, based on the job content of the job class, as well as the percent female in a job class at that time. There was no correction for gender bias at this stage.
The multiple regression output also included an R-squared, a statistical measure of how much of the variation in wages (the dependent variable) between job classes was explained by the prediction model. In this case the R-squared was .79, which Dr. Kervin considered an acceptable value.
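For readers unfamiliar with R-squared, the Python sketch below computes it as one minus the ratio of the residual sum of squares to the total sum of squares. The wage and prediction figures are invented; when the predictions come from a least-squares fit that includes a constant, this quantity is the share of the variation in the dependent variable explained by the model.

```python
import numpy as np

def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted` (illustrative helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_residual = np.sum((actual - predicted) ** 2)      # unexplained variation
    ss_total = np.sum((actual - actual.mean()) ** 2)     # total variation around the mean
    return 1.0 - ss_residual / ss_total

# Invented scaled wages (100 to 200 range) and hypothetical model predictions.
wages = [110, 125, 150, 170, 195]
predicted = [115, 120, 152, 168, 195]
print(round(r_squared(wages, predicted), 3))  # 0.988
```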
In addition, numerical coefficients were generated for each of the 16 job content factors and for the percent female in a job class (the independent variables). The coefficient for the percent female in the job class was negative (-0.096703), meaning that the value of the job class was reduced by that amount for every additional percent female in the job class. So, a job class with 10% female would lose, on average, .97 job value points. A job class with 60% female would lose, on average, 5.80 job value points, and, a job class which was 100% female would lose, on average, 9.67 job value points. Neither Dr. Kervin nor Dr. Ames identified problems with the regression procedure.
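The figures in the preceding paragraph follow directly from the reported coefficient. A minimal Python check, using only the coefficient of -0.096703 from the evidence (the function name is illustrative):

```python
PCT_FEMALE_COEF = -0.096703  # percent female coefficient from the initial regression

def points_lost(pct_female, coef=PCT_FEMALE_COEF):
    """Illustrative helper: average job value points associated with a class's percent female."""
    return -coef * pct_female

for pct in (10, 60, 100):
    print(f"{pct:>3}% female: {points_lost(pct):.2f} points lower")
# 0.97, 5.80 and 9.67 points respectively
```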

The next step undertaken by the working group was to repeat the multiple regression, this time with an adjustment to remove the percent female predictor (independent variable). The result was a prediction of the value of each job class in the absence of what Dr. Kervin refers to as gender-composition wage bias.

Dr. Kervin found the removal of the one predictor (percent female) to be a fairly straightforward application of a prediction model. It provides predicted values in the absence of the direct effect of that predictor. Dr. Ames agrees with this step but asserts that it is insufficient to remove gender bias.
The working group then checked the above procedures and ascertained that the coefficient for percent female was 0.000000125⁸, when the adjusted job class values (adjusted by removing the percent female) were used as the dependent variable. Dr. Kervin replicated these results and found no errors. His assessment was that the extremely small percent female coefficient verified that the adjusted job values were free of gender-composition wage bias.
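The sequence of regression, removal of the percent female predictor, and re-regression can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the working group’s procedure: the data are randomly generated, there are three hypothetical content factors rather than 16, and “removing” percent female is modelled as predicting each job class’s value with the percent female term left out. Because the adjusted values are then an exact linear combination of the content factors, the check regression in this sketch returns a percent female coefficient of zero apart from floating-point error.

```python
import numpy as np

def fit(y, X):
    """Least squares with a constant; returns (constant, coefficients). Illustrative helper."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(1)
n = 50
content = rng.random((n, 3))            # hypothetical content factor scores (0 to 1)
pct_female = rng.uniform(0, 100, n)     # hypothetical percent female
wage = (100 + content @ np.array([60.0, 25.0, 15.0])
        - 0.1 * pct_female + rng.normal(0, 2, n))

# Step 1: regress wages on the content factors and percent female together.
X = np.column_stack([content, pct_female])
const, coefs = fit(wage, X)

# Step 2: predict each class's value with the percent female term left out.
adjusted = const + content @ coefs[:-1]

# Step 3: regress the adjusted values on the same predictors as a check.
_, check = fit(adjusted, X)
print(f"percent female: {coefs[-1]:+.4f} before, {check[-1]:+.2e} after")
```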

Identification of Male Comparators

The working group then identified the male comparator for each of the female job classes. They first selected a representative female job class for each group of jobs judged to be in a sequence or series. Then they prepared two lists: one for the adjusted value of the female-dominated job classes and the other for the adjusted value of the male-dominated job classes.
They next established floating bands around the point value (the value predicted on the basis of the 16 job content factors only) of each female job class. Each band extended plus or minus five percent from the value of the female job class. Any male job classes falling within the band were considered to be of comparable value. The working group then listed all the male job classes with an adjusted value within the five percent range.
Instead of using the lowest paid male job class as the comparator, as required by the Act, the parties agreed to use what they called the “fairest paid” male comparator, which they felt was both fairer and more logical. They stipulated that the fairest paid male comparator was the male job class, within the five percent band, whose actual wage was closest to what the position would receive if it were compensated solely on the basis of the 16 job content factors. They determined that the smaller the difference between predicted and actual value, the more fairly paid the job.
MBS and OPSEU then negotiated the selection of the male comparator for each of the female-dominated job classes, using the fairest paid male comparator. However, in the case of the Nurse 2 General position, the wage of the fairest paid male comparator was less than the Nurse 2 General wage rate. They therefore agreed to select the Scientist 2 position, the highest paid male job class in the band. This resulted in an hourly increase of 54 cents for the nurses.
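The comparator selection can be expressed as a short Python sketch. The class names and figures are invented, the `JobClass` record and helper names are hypothetical, and the sketch assumes that a job class’s adjusted value and its actual wage are expressed in the same scaled wage units, so that the “fairest paid” comparator is simply the one with the smallest gap between the two. It shows the five percent band test alongside three selection rules: the lowest paid comparator required by the Act, the fairest paid comparator the parties adopted, and the highest paid comparator used for the Nurse 2 General.

```python
from dataclasses import dataclass

@dataclass
class JobClass:            # hypothetical record
    name: str
    value: float           # adjusted value predicted from job content factors only
    actual: float          # actual wage, assumed to be in the same scaled units

def in_band(female_value, male_value, pct=0.05):
    """True if the male class lies within +/- pct of the female class's value."""
    return female_value * (1 - pct) <= male_value <= female_value * (1 + pct)

def candidates(female_value, male_classes, pct=0.05):
    return [m for m in male_classes if in_band(female_value, m.value, pct)]

def lowest_paid(group):
    return min(group, key=lambda m: m.actual)

def fairest_paid(group):
    # Smallest difference between actual pay and content-predicted value.
    return min(group, key=lambda m: abs(m.actual - m.value))

def highest_paid(group):
    return max(group, key=lambda m: m.actual)

# Hypothetical female class valued at 140 points, so the band runs from 133 to 147.
males = [
    JobClass("A", 134.0, 128.0),
    JobClass("B", 141.0, 139.5),
    JobClass("C", 146.0, 150.0),
    JobClass("D", 152.0, 151.0),   # outside the band
]
group = candidates(140.0, males)
print([m.name for m in group])     # ['A', 'B', 'C']
print(lowest_paid(group).name, fairest_paid(group).name, highest_paid(group).name)  # A B C
```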

Bias Check

The final step was to check for gender-composition wage bias in the overall results of the negotiated adjustments to the wages of all female-dominated job classes (rather than in the predicted job values). In this instance, the regression results showed that the coefficient for percent female was -0.024214, reduced from -0.096703. Dr. Kervin’s view is that, while this figure is not statistically significant, it is not insubstantial. On average, a job class with 10 percent female composition would lose 0.24 job value points, one with 60 percent would lose 1.45 job value points, and one with 100 percent female would lose 2.42 job value points, or approximately 58.5 cents per hour on average in wages. This suggests that some gender-composition wage bias (about 25% of the original effect) remained in the overall adjusted wages.
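The residual figures can be checked the same way. The Python sketch below uses the two coefficients reported in the evidence; reading “about 25%” as the ratio of the post-adjustment coefficient to the original coefficient is an assumption about how that figure was derived.

```python
BEFORE = -0.096703  # percent female coefficient before any adjustment
AFTER = -0.024214   # percent female coefficient after the negotiated adjustments

remaining = AFTER / BEFORE   # assumed reading of "about 25%"
print(f"share of the original coefficient remaining: {remaining:.1%}")  # 25.0%
for pct in (10, 60, 100):
    print(f"{pct:>3}% female: about {-AFTER * pct:.2f} job value points lower")
```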

Dr. Kervin explained that the remaining gender-composition wage bias was somewhat inevitable, given the job-to-job procedure for finding male comparators and making pay equity adjustments. He testified that the most likely reason for this is that pay equity adjustments under the Act are provided only for employees in female-dominated jobs. This means that employees in jobs with less than 60% female incumbents received no pay equity adjustments, even where a majority of incumbents may have been female. As a result, not all of the effect of “percent female” on wages was removed. In addition, had the working group adjusted wages on the basis of the lowest-paid comparator in the five percent band (rather than the fairest paid comparator), the residual gender-composition wage bias, and any residual pay inequity arising from it, would have been even greater.

Experts’ Critique of the Modification of Factors

Dr. Ames does not take issue with the process of shifting items from one factor to another in general. She agrees with Dr. Kervin that the modification of factors is an expected feature of the summation method of assembling items into factors and notes that the authors of the SUNY study also modified factors. In this case, however, she states that the working group’s decisions were flawed, gender-biased and resulted in an undervaluation of the work performed by nurses. The significance she attributes to negative coefficients (discussed in detail in the next section of this decision) leads her to this conclusion.
Dr. Ames’ evidence focused on two aspects of the factor modification process referred to in paragraph 45. The first was the deletion of some items that had originally loaded relatively highly on the knowledge and mental demands factor, on which the nurses had scored highly. As this content was not placed elsewhere, the allegation is that the nurses were penalized. She claimed that, when taken cumulatively with other deficiencies, these deletions prevented important aspects of nursing work from being made visible, explicit and positively valued. The second was the shifting of items to other job content factors that had smaller or negative regression coefficients than the factor where the item originally loaded. As a result, the overall value of the particular item was reduced or negatively valued.
Dr. Weiner’s approach to this critique was to evaluate the twenty items that were of concern to Dr. Ames. Dr. Weiner agreed that if enough items that measure aspects of work unique to nursing (or other women’s work) are omitted, are not considered or are put into a less powerful factor, there is the potential that women’s work will be undervalued. She, therefore, examined the twenty items. It was her view that sixteen items measured work in nursing jobs, as well as in female jobs in general and in male jobs. Three related to hazards likely found in nursing work and in some male jobs, but she was unsure of their presence in other female jobs. Only one item, respecting comforting, measured the content of female jobs, including nursing work, but not the content of male jobs. She felt it most unlikely that a single question could severely affect the overall outcome. Dr. Weiner, therefore, concluded that the impact of the shifting and deletion of items did not lead to a gender bias or an undervaluing of nursing work.
Dr. Ames testified that the factor modification decisions were apparently made in the absence of any gender-neutral objective criteria. It became clear that she had been unaware of the criteria (set out in paragraph 44) which the working group had established before modifying the factors and had, therefore, concluded that there was no adequate rationale for the decisions taken. Despite learning during the hearing of the criteria used by the working group, Dr. Ames continued to describe the decisions as subjective and to maintain that subjective decisions should be analysed to ascertain whether their outcomes resulted in gender bias. Although she makes this critique, she did not undertake the analysis herself.
Dr. Kervin did so. He re-ran the data without shifting any items, dropping only those items with low loadings as suggested by Dr. Ames (loadings of less than .40, the criterion adopted in the SUNY study). He then compared these results with those of the working group after the factor modification process. His analysis shows that the Nurse 2 General position scored 136.9 using Dr. Ames’ approach and 137.9 using the working group’s approach. The Scientist 2 position scored 140.7 using Dr. Ames’ approach and 143.7 using the working group’s approach.
Under Dr. Ames’ approach, the Scientist 2 position continues to fall within the floating band of plus or minus five percent. In the challenged plan, the parties negotiated the Scientist 2 position as comparator to the Nurse 2 General position even though it was paid more than the fairest paid comparator. So, although Dr. Ames raises an interesting theoretical possibility, the evidence regarding the application of the suggestions she made (which was not rebutted in any significant way) does not indicate that the choice made by the working group was unreasonable. We find that, when her suggestions were tested, they did not demonstrate the inadequacies she had anticipated would occur.
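The band test can be verified from the scores reported in Dr. Kervin’s re-analysis. A short Python check (the helper name `within_band` is illustrative):

```python
def within_band(female_value, male_value, pct=0.05):
    """Illustrative helper: True if male_value lies within +/- pct of female_value."""
    return female_value * (1 - pct) <= male_value <= female_value * (1 + pct)

# (Nurse 2 General, Scientist 2) scores as reported in the evidence.
approaches = {
    "Dr. Ames' approach": (136.9, 140.7),
    "working group's approach": (137.9, 143.7),
}
for label, (nurse, scientist) in approaches.items():
    gap = scientist / nurse - 1
    print(f"{label}: Scientist 2 is {gap:.1%} above Nurse 2 General; "
          f"inside band: {within_band(nurse, scientist)}")
```

Under both approaches the Scientist 2 score lies within five percent of the Nurse 2 General score (about 2.8% and 4.2% above it, respectively).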
In light of Dr. Kervin’s analysis, our conclusion regarding negative coefficients set out in paragraph 86, and Dr. Weiner’s observations, we find that the Applicant employees have not demonstrated that the modification of the factors led to an undervaluation of nursing work or to the selection of an unreasonable male comparator.

Experts’ Views Regarding the Significance of Regression Coefficients

The Applicant employees and the Respondents agree that the results of the initial regression analysis indicated that there was gender bias in the OPS compensation system. The heart of the technical dispute between them is the interpretation of the numerical coefficients of the job content factors. This dispute is key to one of the central claims made by the Applicant employees: that the work of the Nurse 2 General position is negatively valued and that they are therefore penalized monetarily.
While the Applicant employees agree that the policy-capturing system used by the Respondents removed a measure of gender bias, they allege in their Application that “the system did not positively value the negative gender effects of important work characteristics of female dominated job classes. Therefore, the Respondents failed to employ a composite measure of skill, effort, responsibility and working conditions, which positively valued the work of the Applicants.” They claim that the nurses were penalized because their work, which is historically identified with women, has higher requirements for physical care, emotional care, extra effort, and the impact of errors. In the regression analysis, these job content factors had negative coefficients and, therefore, some of the content of their work was not positively valued.
This position is based on Dr. Ames’ opinion that if a job content factor has a negative coefficient, its value is deducted from the total value for the job class. Dr. Kervin’s opinion is very different. He reminds us that a regression prediction model is not a compensation model. He states that regression coefficients do not necessarily measure the impact of job factors on wages but rather measure the unshared (unique or direct) impact of each job factor. They may need to be negative in order to avoid double-counting. As this difference is fundamental to the dispute, we review the two expert views regarding the significance of the numerical coefficients generated by the methodology used by the working group.
As noted earlier, the working group used multiple regression analysis to generate a numerical coefficient for each job content factor (paragraph 49). The score for each job class on each factor (the loading) was multiplied by the factor’s coefficient in order to determine a value for each job class on that factor. Then, they summed these scores together and the result was the “worth” of the job class used to select the respective comparators (paragraph 47).
In order to illustrate what the debate between the experts is about, we set out Table 1 from Dr. Ames’ August 1996 report, which is based on the data she received from MBS. This table sets out the computation of worth for the Nurse 2 General and Scientist 2 positions using the coefficients generated by the working group’s regression. The left-hand column lists the score (loading) the position earned on each of the job content factors. The second column sets out the regression coefficient (or weight) generated by the computer for each of the factors. These coefficients are the same for all jobs. The next column is the name of each factor. The number in the last column is arrived at when the job class score (left-hand column) of a factor is multiplied by the regression coefficient for the factor (second column). It is agreed that these were not results in ‘dollars’ of worth, but were points of value by which comparators were selected.

Table 1: Computation of worth for Nurse 2 General and Scientist 2, using OPS-generated factors (as reported in OPS printouts).

Nurses’ Regression Result

Score on Factor   Coefficient (Weight)   Factor           Dollars of Worth
                  106.31                 Constant         $106.31
.51212          *  82.47                 Knowl/Mental     + 42.23
.28761          *  -7.48                 Phys Demands     -  2.15
.58186          *  -2.95                 Phys Care        -  1.72
.80751          * -11.05                 Emotional Care   -  8.92
.72697          * -11.59                 Extra Effort     -  8.43
.09098          *  21.13                 Language         +  1.92
.42871          *   6.16                 Resp Staff       +  2.64
.08078          *  -7.28                 Computers        -   .59
.07151          * -13.18                 Outdoors         -   .94
.16818          *  -8.40                 Impact Errors    -  1.41
.29390          *  -7.68                 Resp Admin       -  2.26
.46828          *   0.88                 Concentration    +   .41
.28139          *   5.31                 Methods          +  1.49
.64373          *  -5.12                 Plan/Info        -  3.30
.83054          *  15.40                 Confidential     + 12.79
.06845          *  -3.15                 Financial Proc   -   .22
Nurses Worth                                              $137.85

Scientists’ Regression Result

Score on Factor   Coefficient (Weight)   Factor           Dollars of Worth
                  106.31                 Constant         $106.31
.54478          *  82.47                 Knowl/Mental     + 44.93
.25406          *  -7.48                 Phys Demands     -  1.90
.04616          *  -2.95                 Phys Care        -   .14
.01532          * -11.05                 Emotional Care   -   .17
.35134          * -11.59                 Extra Effort     -  4.07
.06431          *  21.13                 Language         +  1.36
.38643          *   6.16                 Resp Staff       +  2.38
.62172          *  -7.28                 Computers        -  4.53
.06869          * -13.18                 Outdoors         -   .91
.13740          *  -8.40                 Impact Errors    -  1.15
.23561          *  -7.68                 Resp Admin       -  1.81
.65661          *   0.88                 Concentration    +   .58
.28816          *   5.31                 Methods          +  1.53
.43093          *  -5.12                 Plan/Info        -  2.21
.24066          *  15.40                 Confidential     +  3.71
.07173          *  -3.15                 Financial Proc   -   .23
Scientists Worth                                          $143.68

It can be seen that, of these computer-generated coefficients, ten were negative and six were positive. Dr. Ames states that each of the job content factors with negative coefficients has a devaluing effect on the value of work performed by a job class, while those with positive coefficients contribute positively to the value of the work. She proposed two alternative approaches to remedy the problem she has identified, which are considered below in the section ‘Amendment of the Pay Equity Plan’. Dr. Kervin disagrees that there is a problem as stated by Dr. Ames.
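The arithmetic of Table 1 can be reproduced in a few lines of Python. The figures are taken from the table; the names `worth`, `COEFFICIENTS` and `SCORES` are illustrative. Summing the products after rounding each to two decimals, as the table’s rows are presented, gives the reported totals of 137.85 and 143.68; summing the unrounded products gives totals a few hundredths higher (about 137.88 and 143.69).

```python
CONSTANT = 106.31
# Regression coefficients (weights) from Table 1; identical for every job class.
COEFFICIENTS = {
    "Knowl/Mental": 82.47, "Phys Demands": -7.48, "Phys Care": -2.95,
    "Emotional Care": -11.05, "Extra Effort": -11.59, "Language": 21.13,
    "Resp Staff": 6.16, "Computers": -7.28, "Outdoors": -13.18,
    "Impact Errors": -8.40, "Resp Admin": -7.68, "Concentration": 0.88,
    "Methods": 5.31, "Plan/Info": -5.12, "Confidential": 15.40,
    "Financial Proc": -3.15,
}
# Each position's score on each factor, in the same order as COEFFICIENTS.
SCORES = {
    "Nurse 2 General": [.51212, .28761, .58186, .80751, .72697, .09098, .42871, .08078,
                        .07151, .16818, .29390, .46828, .28139, .64373, .83054, .06845],
    "Scientist 2": [.54478, .25406, .04616, .01532, .35134, .06431, .38643, .62172,
                    .06869, .13740, .23561, .65661, .28816, .43093, .24066, .07173],
}

def worth(scores, round_products=False):
    """Constant plus the sum of score x coefficient over the 16 factors.

    With round_products=True each product is rounded to two decimals before
    summing, matching how the rows of Table 1 are presented.
    """
    products = [s * c for s, c in zip(scores, COEFFICIENTS.values())]
    if round_products:
        products = [round(p, 2) for p in products]
    return CONSTANT + sum(products)

for position, scores in SCORES.items():
    print(f"{position}: {worth(scores, round_products=True):.2f} "
          f"(unrounded products: {worth(scores):.2f})")

negatives = sum(1 for c in COEFFICIENTS.values() if c < 0)
print(negatives, "negative and", len(COEFFICIENTS) - negatives, "positive coefficients")
```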

Dr. Ames’ Reasoning

It is Dr. Ames’ opinion that regression analysis allows us to estimate the influence of each factor on the current wage structure. It measures the amount of variation in current salaries, and how much each factor, all else held constant, can predict that current salary. Each job content factor has an identifiable effect (the numerical value of the coefficient) on the worth attributed to a job, and consequently, the wages which attach to it.
For Dr. Ames, when a regression coefficient for a factor is positive, that factor has been positively valued and is rewarded monetarily. Conversely, when the coefficient is negative, the factor has been valued negatively, resulting in a lower total worth for the position and a corresponding monetary penalty. Thus, where the Nurse 2 General scores higher than the Scientist 2 on factors with negative coefficients, the nurses are penalized more heavily than the scientists. Because the value of a position is obtained by summing the weighted scores on all of the factors, any negative product is deducted from the total and lessens the value attributed to the position. This, Dr. Ames claims, results in significant gender bias in the plan. She writes:

Several of the factors have a negative coefficient. This means just what it seems to mean. A job scoring high on these factors will be penalized, not rewarded. The predicted wage will be less than for a job which is low on that factor, but otherwise similar.⁹

Dr. Kervin’s Reasoning

In Dr. Kervin’s view, this is a misunderstanding of what can be attributed to each independent variable and the regression coefficient attached to it. Each of the job content factors (the independent variables) in the regression analysis does not necessarily have a causal connection to wages (the dependent variable). In some instances, there may be a co-variance or a correlation only, without a causal connection. The numerical coefficients do not indicate the nature of the relationship between the independent and dependent variables, whether it be a causal relationship, one of co-variance or correlation, or one which arises because of impure measures.
In his view, the regression coefficients indicate the parameters of the best prediction equation. They are numbers that describe the multi-dimensional best-prediction surface that regression analysis produces. As well, they indicate how much of the change in the dependent variable is associated with a unit of change in an independent variable, when all other independent variables are held constant. However, in the real world, as opposed to a laboratory setting, it is not possible to hold variables constant. Thus, when independent variables are related among themselves, they cannot be held constant. If two jobs that differ on one independent variable are compared, they are very likely to differ on other independent variables as well. When one changes, others also change.
According to Dr. Kervin, when one variable is related to another, the first variable goes into the equation and is held constant and cannot be changed. It may have picked up something else (now loaded on it) attributable to a second variable. If this is the case, the coefficient of the second variable needs to reflect that some of it has already been accounted for. It is already over-measured and needs to be corrected. That is, some must be taken away as there is too much of it, and this results in a negative coefficient, an artifact of correlated independent variables. It is not that the content of the second variable was not measured, but that it was measured elsewhere, along with another variable with which it correlated. Thus, when the arithmetic summing of the factor scores for a job class occurs, what appears to be a subtraction of value is not really so, only an adjustment. Rather, the subtraction of points occurs because that job content has already been counted in, or picked up by, one or more of the other job content factors.¹⁰
Dr. Kervin provided examples of equations from statistics texts demonstrating that when successive variables are added to an equation, regression coefficients may change from positive to negative and/or vice versa. The regression procedure simply adjusts the prediction model: as one adds a variable, the other coefficients may change. At the same time as the prediction improves, the coefficients are adjusted by the regression procedure, taking into account that the later variable was related to other variables in the model. He suggests that if the coefficients in the prediction model are going to change as more information is added, one cannot rely on those coefficients to precisely indicate the impact of a variable. If the later variable is related to a variable already added, the prediction model may have to reduce the weight it attaches to the second variable. In other words, there is already too much of it and it must now have a negative coefficient.
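Dr. Kervin’s point about sign changes can be illustrated with a small invented example in Python (NumPy). Here a second predictor, `x2`, largely duplicates a first predictor, `x1`. On its own, `x2` has a positive slope against the outcome; once `x1` is entered as well, its coefficient becomes negative because most of what it measures has already been counted through `x1`. The data and the helper `ols` are hypothetical and are not drawn from the OPS data.

```python
import numpy as np

def ols(y, *predictors):
    """Hypothetical helper: least squares with a constant; returns [constant, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + 0.5 * np.array([1.0, -1.0, -1.0, 1.0, -1.0, 1.0])  # mostly a copy of x1
y = 3 * x1 - 2 * x2                                          # invented outcome

alone = ols(y, x2)          # [constant, slope of x2]
together = ols(y, x1, x2)   # [constant, coef of x1, coef of x2]
print(f"x2 alone:        {alone[1]:+.2f}")     # +0.70
print(f"x2 alongside x1: {together[2]:+.2f}")  # -2.00
```

Neither coefficient is “wrong”: each is the best-fitting weight given which other predictors are in the equation.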
In summary, Dr. Kervin’s view is that the size and the direction (positive or negative) of any regression coefficient is determined in part by its real impact on the dependent variable and, in part, by what other variables are in the equation and how they relate to that particular variable.

Analysis of the Expert Evidence

We note that while Dr. Ames and Dr. Kervin disagree on their views of the plan and the significance of some of the statistical operations, Dr. Ames seems to agree with Dr. Kervin’s views, at least in theory. She writes in her first report that the summation method of computing the factors (the one adopted by the Respondents)

...allows the factors to be correlated with each other. It is quite possible, even likely, that several factors end up measuring the same phenomena in slightly different ways. This could prove to be a statistical problem for later regression analyses (multicollinearity), and could be a problem for the gender neutrality of the job comparison system (double-counting)¹¹.

In the same report she also notes:

...because the factors in this plan were constructed by addition of standardized question scores rather than from factor loadings, it is likely that the factors are correlated with each other. If this is true to even a moderate degree, the regression model may be unstable. Collinearity among the independent variables creates statistical difficulties for the analysis. Especially since the coefficients from the model are directly used to make pay equity adjustments, the instability needs to have been investigated¹².

In his first report, Dr. Kervin responded to this critique by running tests assessing the degree of correlation among the factors in the OPS data, in order to ensure that the data were not subject to multicollinearity. He describes multicollinearity as a problem that occurs whenever an independent variable in a regression analysis is fairly strongly related to a combination of one or more of the other independent variables, and that can lead to potentially erroneous and unstable results in the regression analysis. He testified that multicollinearity is generally regarded as a threat when correlations among factors are .90 or greater. Dr. Kervin ran bivariate correlations among all the factors (that is, two at a time, in pairs); the highest value was .77, and only six of the correlations exceeded .50. On this basis, and after running other standard tests, he concluded that multicollinearity was not a problem among the job content factors in the data. Dr. Kervin refers to the same data in his testimony and in his second report in order to establish that there was a significant amount of intercorrelation among the 120 pairs of the 16 job content factors. On this occasion, the 120 correlations ranged from .005 (virtually none) to .77 (very highly related), and sixteen of them had values of .40 or greater, a level he interprets as a strong correlation. Thus, although the factors do not overlap so much that they are statistically unstable, they do correlate or co-vary.
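A pairwise correlation screen of this kind can be sketched in Python with NumPy. The factor scores below are random placeholders rather than OPS data, the helper name is hypothetical, the .90 threshold is the figure mentioned in the evidence, and screening on the absolute value of each correlation is an assumption. With 16 factors there are 16 x 15 / 2 = 120 distinct pairs. A pairwise screen cannot by itself detect a factor that is related to a combination of several others, which is one reason additional diagnostic tests are commonly used.

```python
from itertools import combinations

import numpy as np

def pairwise_screen(scores, threshold):
    """Hypothetical helper: (i, j, r) for every pair of factor columns with |r| >= threshold."""
    r = np.corrcoef(scores, rowvar=False)   # factors are columns
    return [(i, j, r[i, j]) for i, j in combinations(range(scores.shape[1]), 2)
            if abs(r[i, j]) >= threshold]

rng = np.random.default_rng(0)
scores = rng.random((394, 16))   # placeholder: 394 job classes x 16 factors

n_pairs = len(list(combinations(range(scores.shape[1]), 2)))
print(n_pairs, "pairs")
print(len(pairwise_screen(scores, 0.90)), "pairs at or above .90")
```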
Our review of the evidence indicates that Dr. Ames’ theoretical understanding is, at times, similar to that of Dr. Kervin. However, as was the case with her critique of the factor modification process, we find that she asserted a view but did not offer an analysis of the data to support her theoretical position. In this case, while she acknowledged the possibilities of multicollinearity and co-variation, she did not examine the data in order to see whether they actually occurred. Her final conclusions, in effect, dismiss the possible impact of co-variation or correlation of factors on the size or direction of their numerical regression coefficients and she remained quite categorical about the significance of the negative regression coefficients on job content factors in this case¹³.
Our own examination of the regression coefficients for the job content factors in Table 1 does not support the proposition that a negative coefficient necessarily indicates an undervaluing of that particular job content. If this were the case, some of the factors with negative coefficients would not make sense. For example, it would be surprising if the job content factors of computers, impact of errors, administrative responsibilities and planning/information (which have negative coefficients) had a penalizing impact on wages. Yet, to be consistent, that is the conclusion that must follow from Dr. Ames’ analysis.
On balance, we find the evidence of Dr. Kervin more convincing. In particular, we note the thoroughness and cogency of his presentation, the various statistical analyses and tests he carried out which illustrated his points as well as how Dr. Ames’ approach would likely work in practice. In contrast, we find Dr. Ames somewhat inflexible in her approach, on occasion without explanation. While her theoretical evidence certainly had a different emphasis from that of Dr. Kervin with respect to the significance of the coefficients, it overlapped with his understanding in some important ways. However, in her analysis of the coefficients in the context of this case, her views were categorical and did not take account of the complexities she had acknowledged in theory. We do not find, on the basis of this evidence, that the Applicant employees have established that a negative coefficient necessarily indicates an undervaluing of a job content factor within the policy-capturing system before us, or that the value of the Nurse 2 General job class was thereby penalized.
Although we arrive at this conclusion on the basis of the Applicant employees’ evidence, there is an aspect of Dr. Kervin’s evidence that needs mention. He extended the debate about regression coefficients to further demonstrate that Dr. Ames’ view of their significance was inaccurate. He suggested that, apart from the complicated procedure of structural equation modeling, which could provide an unambiguous measure of the “true” effect of an independent variable, there are no simple methods to do so. However, he testified that some options come close, including the relatively simple “hierarchical regression” (also referred to as ‘bivariate’ or ‘trivariate’ regression). He undertook this analysis by running a separate regression for each of the factors, with the wage regressed only on that factor and percent female, thereby obtaining a bivariate coefficient. In this way, he testified, the relationship of the independent variable to the dependent variable (wages) is examined with no other variables controlled. As there are no other variables in the equation, no negative relationship can be attributed to intercorrelations. All negative relationships will be “true” ones.
The results were that, for each job content factor, the bivariate coefficient differed significantly from the multivariate coefficient found in the working group’s regression analysis. Some factors, such as input into methods, extra effort, planning and impact of errors (all of which had either a negative coefficient or a very low positive value in the working group’s multivariate regression), were strongly and positively related to wages.
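The divergence between bivariate and multivariate coefficients can be illustrated with a small simulation built on Dr. Kervin’s lemon/lime analogy, set out in the footnotes. This is a hypothetical sketch only: the variable names and numbers are invented, it omits the percent female term, and it reproduces neither expert’s analysis of the OPS/OPSEU data. The measured “lemon” factor is constructed to pick up half a lime for every lemon, so the multivariate coefficient on the “lime” factor turns negative even though limes add to wages, while the bivariate coefficient on limes stays positive. Because the simulated lemons and limes are independent by construction, the bivariate coefficient here happens to recover the lime effect; with correlated ingredients it would also absorb other effects, which is the substance of Dr. Ames’ “amalgamations” critique.

```python
import numpy as np

# Hypothetical simulation; no OPS/OPSEU data is used.
rng = np.random.default_rng(seed=1)
n = 10_000

# Independent "true" ingredients of each job (the lemons and limes).
lemon_true = rng.normal(size=n)
lime_true = rng.normal(size=n)

# Synthetic wages: the recipe calls for 6 lemons and 2 limes, plus a little noise.
wages = 6.0 * lemon_true + 2.0 * lime_true + rng.normal(scale=0.1, size=n)

# Measured factors: the "lemons" factor also picks up half a lime per lemon unit.
lemon_factor = lemon_true + 0.5 * lime_true
lime_factor = lime_true


def ols(y, *predictors):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# Multivariate: both factors in the equation together.
multi = ols(wages, lemon_factor, lime_factor)
# Bivariate: wages regressed on the lime factor alone.
bi = ols(wages, lime_factor)

print(f"multivariate lime coefficient: {multi[2]:+.2f}")
print(f"bivariate lime coefficient:    {bi[1]:+.2f}")
```

Algebraically, the simulated wages equal about 6 × lemon factor − 1 × lime factor, which is the “negative lime” of the analogy: the negative coefficient corrects for limes already carried in by the lemon factor rather than signalling that limes lower wages.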
Of greater concern to us, there were three job content factors with negative relationships to wages: physical demands, physical caring and emotional caring. It was Dr. Kervin’s view that the coefficient for emotional caring was so low that it was statistically insignificant and equivalent to “no relationship”. That is, the amount of emotional caring in a job is relatively unrelated to wages, as jobs with both relatively high, and relatively low, wages have elements of emotional caring. With respect to the other two factors, his interpretation of the results was that the greater the physical demands of a job, or the more physical caring it entails, the lower its wages tend to be.
This concern is tempered because Dr. Ames was of the very firm view that the bivariate coefficients are of no use. In her reply evidence and a final report filed with us, she emphasized that when many variables are involved, it is a misrepresentation of reality to measure the effect of only one variable; such coefficients tell us little. In her words, “The coefficients produced by bivariate regressions are not ‘clean’ but are ‘amalgamations’ of other related variables.”¹⁴ On this issue, Dr. Ames provided a more in-depth analysis of the statistics to illustrate her point. The net result of this debate is that we draw no conclusions about the meaning of the bivariate regression coefficients presented by Dr. Kervin. In light of Dr. Ames’ critique, we cannot find that the three job content factors have an absolute negative impact in the ultimate evaluation of value. Dr. Kervin’s evidence regarding the bivariate coefficients is insufficient for us to conclude that the composite of effort, skill, responsibility and working conditions used to compare the value of male and female job classes was unreasonable.

Amendment of the Pay Equity Plan

The Applicant employees allege that the single statistical correction (removing the effect of percent female in a job class) does not adequately eliminate gender bias in the workplace as required by the Act. They submit that it is essential to go beyond this step in order to improve the plan and eliminate gender bias.

Adjustment of Negative Coefficients

Dr. Ames first demonstrated her concern with the negative coefficients by setting them to zero. She argued that by doing so, the value of the work of a job class would not be diminished on the basis of a high score on a job content factor with a negative coefficient. Setting the negative coefficients to zero would neutralize their effect and so remove what she interprets as their penalizing effect, in particular for the Nurse 2 General position. Table 2 of her August 1996 report shows the impact of this change.

Table 2: Computation of worth for Nurse 2 General and Scientist 2, using OPS-generated factors but adjusting for negative weights

Nurses’ Regression Result

Factor            Score on Factor    Coefficient (Weight)    Dollars of Worth
Constant                                                      $106.31
Knowl/Mental      .51212           *  82.47                 + 42.23
Phys Demands      .28761           *  0                         0
Phys Care         .58186           *  0                         0
Emotional Care    .80751           *  0                         0
Extra Effort      .72697           *  0                         0
Language          .09098           *  21.13                 +  1.92
Resp Staff        .42871           *  6.16                  +  2.64
Computers         .08078           *  0                         0
Outdoors          .07151           *  0                         0
Impact Errors     .16818           *  0                         0
Resp Admin        .29390           *  0                         0
Concentration     .46828           *  0.88                  +   .41
Methods           .28139           *  5.31                  +  1.49
Plan/Info         .64373           *  0                         0
Confidential      .83054           *  15.40                 + 12.79
Financial Proc    .06845           *  0                         0
                                                              _______
Nurses’ Worth                                                 $167.79

Scientists’ Regression Result

Factor            Score on Factor    Coefficient (Weight)    Dollars of Worth
Constant                                                      $106.31
Knowl/Mental      .54478           *  82.47                 + 44.93
Phys Demands      .25406           *  0                         0
Phys Care         .04616           *  0                         0
Emotional Care    .01532           *  0                         0
Extra Effort      .35134           *  0                         0
Language          .06431           *  21.13                 +  1.36
Resp Staff        .38643           *  6.16                  +  2.38
Computers         .62172           *  0                         0
Outdoors          .06869           *  0                         0
Impact Errors     .13740           *  0                         0
Resp Admin        .23561           *  0                         0
Concentration     .65661           *  0.88                  +   .58
Methods           .28816           *  5.31                  +  1.53
Plan/Info         .43093           *  0                         0
Confidential      .24066           *  15.40                 +  3.71
Financial Proc    .07173           *  0                         0
                                                              _______
Scientists’ Worth                                             $160.80

It can be seen (in Table 1) that the computer generated negative coefficients for ten job content factors and positive coefficients for the remaining six factors. So, after the adjustment is made (as shown in Table 2), only six factors contribute to the value of the job classes.
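The arithmetic in Table 2 is a weighted sum: each job class’s worth is the constant plus, for each factor, its score multiplied by the factor’s coefficient. A minimal Python sketch using the figures printed in Table 2 follows. The `max(coef, 0.0)` line stands in for Dr. Ames’ adjustment of setting negative coefficients to zero; because Table 2 already shows the adjusted coefficients (the original Table 1 values are not reproduced in this section), it leaves these inputs unchanged. Each product is rounded to the cent before summing, as Table 2 appears to do; summing unrounded products gives totals that differ by about a cent. The function and variable names are illustrative only.

```python
# Figures copied from Table 2; names are illustrative.
CONSTANT = 106.31

# Coefficients (weights) after Dr. Ames' adjustment; the ten negative ones already read 0.
COEFFICIENTS = {
    "Knowl/Mental": 82.47, "Phys Demands": 0.0, "Phys Care": 0.0,
    "Emotional Care": 0.0, "Extra Effort": 0.0, "Language": 21.13,
    "Resp Staff": 6.16, "Computers": 0.0, "Outdoors": 0.0,
    "Impact Errors": 0.0, "Resp Admin": 0.0, "Concentration": 0.88,
    "Methods": 5.31, "Plan/Info": 0.0, "Confidential": 15.40,
    "Financial Proc": 0.0,
}

NURSE_SCORES = {
    "Knowl/Mental": .51212, "Phys Demands": .28761, "Phys Care": .58186,
    "Emotional Care": .80751, "Extra Effort": .72697, "Language": .09098,
    "Resp Staff": .42871, "Computers": .08078, "Outdoors": .07151,
    "Impact Errors": .16818, "Resp Admin": .29390, "Concentration": .46828,
    "Methods": .28139, "Plan/Info": .64373, "Confidential": .83054,
    "Financial Proc": .06845,
}

SCIENTIST_SCORES = {
    "Knowl/Mental": .54478, "Phys Demands": .25406, "Phys Care": .04616,
    "Emotional Care": .01532, "Extra Effort": .35134, "Language": .06431,
    "Resp Staff": .38643, "Computers": .62172, "Outdoors": .06869,
    "Impact Errors": .13740, "Resp Admin": .23561, "Concentration": .65661,
    "Methods": .28816, "Plan/Info": .43093, "Confidential": .24066,
    "Financial Proc": .07173,
}


def worth(scores, coefficients, constant=CONSTANT):
    """Constant plus the sum of score * coefficient, negative coefficients set to zero."""
    total = constant
    for factor, coef in coefficients.items():
        coef = max(coef, 0.0)                     # Dr. Ames' zeroing adjustment
        total += round(scores[factor] * coef, 2)  # round each term to the cent, as in Table 2
    return total


nurse_worth = worth(NURSE_SCORES, COEFFICIENTS)
scientist_worth = worth(SCIENTIST_SCORES, COEFFICIENTS)
contributing = sum(1 for c in COEFFICIENTS.values() if c > 0)

print(f"{nurse_worth:.2f} {scientist_worth:.2f}")  # 167.79 160.80
print(contributing)                                 # 6
```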
Dr. Weiner and Dr. Kervin each examined this critique and concluded that, if it were implemented, the adjustment indicated would have no effect on the result. For the Nurse 2 General, the Scientist 2 position remained a legitimate comparator, still within the floating band of plus or minus five percent of the nurses’ band.
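Under one reading of that test, the comparison reduces to a percentage gap between the two Table 2 worth figures. The sketch below assumes, for illustration only, that the floating band is measured as plus or minus five percent of the nurses’ computed worth; the decision does not spell out the exact reference point, and the helper name `within_band` is hypothetical.

```python
NURSE_WORTH = 167.79      # Table 2, Nurse 2 General
SCIENTIST_WORTH = 160.80  # Table 2, Scientist 2
BAND = 0.05               # plus or minus five percent (assumed reference: nurses' worth)


def within_band(candidate, reference, band=BAND):
    """True if candidate lies within +/- band (as a fraction) of reference."""
    return abs(candidate - reference) <= band * reference


gap = (NURSE_WORTH - SCIENTIST_WORTH) / NURSE_WORTH
print(f"gap: {gap:.2%}, within band: {within_band(SCIENTIST_WORTH, NURSE_WORTH)}")
# gap: 4.17%, within band: True
```

On that assumption the Scientist 2 figure sits about 4.2 percent below the Nurse 2 General figure, inside the five percent band, which is consistent with the experts’ conclusion that the comparator would not change.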
Dr. Kervin also assessed this approach by comparing the ‘fit’ of the values found in Table 2 with the ‘fit’ obtained by the working group (paragraph 51). He expected that the alteration of the coefficients of a prediction model would change the regression ‘surface’, and that this surface would not ‘fit’ the original data as well as would the initial prediction model. He found that the resulting R-squared measure, in fact, indicated a less accurate fit. When he assessed the relative values of the two positions using the adjusted coefficients, the Nurse 2 General position was valued more highly than the Scientist 2 position, but the latter still fell within the nurses’ band and remained a legitimate comparator.
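Dr. Kervin’s expectation follows from how least squares works: the regression coefficients are, by construction, the set that minimizes the squared prediction errors for the data used, so any altered set (such as one with negative coefficients set to zero) cannot fit those data better. The sketch below uses simulated, hypothetical data; it assumes R-squared is computed as one minus the ratio of residual to total sum of squares and leaves the intercept as estimated, since the decision does not describe Dr. Kervin’s exact computation.

```python
import numpy as np

# Hypothetical simulation; the data and coefficients are invented, not the OPS/OPSEU data.
rng = np.random.default_rng(seed=7)
n_jobs, n_factors = 300, 4

# Intercorrelated job content factors: each shares a common component.
common = rng.normal(size=(n_jobs, 1))
factors = common + 0.6 * rng.normal(size=(n_jobs, n_factors))
wages = 20.0 + factors @ np.array([3.0, 1.0, -0.5, 0.5]) + rng.normal(size=n_jobs)

X = np.column_stack([np.ones(n_jobs), factors])
beta, *_ = np.linalg.lstsq(X, wages, rcond=None)  # least-squares fit, intercept first


def r_squared(y, X, coefs):
    """1 - SSR/SST for a given set of coefficients (not necessarily the fitted ones)."""
    resid = y - X @ coefs
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


beta_zeroed = beta.copy()
beta_zeroed[1:] = np.clip(beta_zeroed[1:], 0.0, None)  # negative slopes -> 0; intercept kept

r2_fitted = r_squared(wages, X, beta)
r2_zeroed = r_squared(wages, X, beta_zeroed)
print(f"R-squared, fitted: {r2_fitted:.3f}; negatives zeroed: {r2_zeroed:.3f}")
```

Whatever the data, the zeroed version can only match or fall below the fitted R-squared; it falls strictly below whenever at least one fitted coefficient was negative.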

A Priori Amendment of the Coefficients (Reweighting of the Factors)

Dr. Ames testified that her second recommendation was essential to develop a plan free of gender bias. In her view, the single correction for percent female provided consistency in the evaluation of factors for male-dominated and female-dominated jobs: “This allows a determination of one kind of gender bias in the existing system – to what degree the proportion of women in a job affects the pay of the job, net of job content.”¹⁵ What was still needed, she said, was a methodology to ensure that implicit gender-biased values are removed from the evaluation. The way to achieve this would be for the working group to design and negotiate a set of positive coefficients derived from her review of a priori job evaluation systems. In this way, she said, an inclusive, gender-neutral and equitable result could occur. She divided the 16 job content factors into six groups and suggested a range of coefficients (or weights) for each group. In her view, the use of these weights would produce job values entirely free of gender bias. However, Dr. Ames did not work this idea out in detail or present any calculations herself.
Dr. Kervin, on the other hand, tested her suggestion. As Dr. Ames was not specific about the weights to be assigned to each factor, he assigned them using procedures set out in his report which seem reasonable to us (and which were not rebutted in reply). He then compared the fit of this model with the fit obtained by the OPS/OPSEU model; examined the specific values it predicted for a series of job classes, as well as for the Nurse 2 General and Scientist 2 job classes; and examined the relative value of the Nurse 2 General position compared to all other OPS/OPSEU positions. He also compared the working group’s model with the ‘negative coefficient to zero’ model and the ‘suggested weights’ model, and demonstrated that Dr. Ames’ suggested weights model fared poorly in predicting or explaining the variation in actual wages.
He concluded that this model fit the data much less accurately than the original model and, therefore, was a less accurate predictor of wages. It produced instability that resulted in dramatic changes in job values and inconsistent alterations in the value of jobs within job sequences. For example, even within series composed only of female job classes, there were serious reversals: Translator 2 was valued less than Translator 1, and X-Ray Technician 1B was valued less than X-Ray Technician 1A. It also resulted in a very considerable increase in the value of nursing work compared to all other OPSEU positions. These anomalies cannot be ignored.
Dr. Kervin states that this a priori amendment of the coefficients ignores the fact that the factors measuring job content were interrelated, and thus any single factor measured some of the content of other factors as well as its own job characteristics. He concluded that it is highly problematic to combine a statistically derived regression-based prediction model to find male comparators with a determination of job value based on assigned weights in an a priori compensation model. He writes:

Positive weights in a compensation model are perfectly appropriate when the component factors of the model are estimated by, for example, an a priori job evaluation committee. Such a committee can decide how each job rates on each factor, independently of other factors. Statistical intercorrelations are not an issue in these decisions. There is no factor analysis, there is no regression analysis – simply an agreed-upon compensation model. However, when a policy-capturing approach is used to estimate the value of jobs and to find male comparators, it is important not to mistake the statistically-based prediction equation for a compensation model.¹⁶

Ms Caney’s evidence was that the working group actually attempted to adjust the negative coefficients (paragraph 49). She testified that, in attempting to do so, they found that the factors were interconnected. When they reweighted one factor, “strange things” would happen to the others, and anomalies resulted. We note that Ms Caney testified that the most significant problem was that gender bias in the percent female factor was not reduced. The parties found the results to be bizarre and jointly rejected the procedure.
In her reply, Dr. Ames referred to brief statements about policy-capturing in publications of both the Equity Bureau of Manitoba Labour and the Pay Equity Commission of Ontario which refer to a second stage (the adjustment of the factors and weights) in the policy-capturing process. These statements are not helpful to our determination, except to indicate that others shared her view. We suspect that the authors of these statements, written early in the history of pay equity implementation, were also influenced by the SUNY study’s approach. We were not referred to any academic critique of the statistical foundations of the SUNY study, nor are we even certain that they are the same as those underpinning the policy-capturing system reviewed in this decision.¹⁷
Based on this evidence, we do not agree that it is essential that a priori coefficients be designed for the job content factors established by the factor analysis. At the very least, the evidence indicates that adding an a priori step to the policy-capturing plan before us would be highly problematic. The theoretical arguments for doing so were not supported by the evidence. Nor does the evidence lead us to conclude that the failure to make a priori amendments to the deemed approved plan has resulted in an unreasonable choice of male comparator for the Nurse 2 General job class.

Decision

We find that, in all the circumstances of this case, including the size and variation of the bargaining unit as well as the brief time-frames the Respondents had to develop the plan, the results they achieved were reasonable. Of all the many steps they undertook, only three were ultimately challenged. While the Applicant employees have raised theoretical issues of interest, the evidence does not establish their claims. The allegation that the deemed approved plan contravenes sections 5(1) and 6(1) of the Act is based on three critiques of the policy-capturing methodology: the modification of the factors; the alleged penalizing impact of negative coefficients; and, the failure to amend the plan with a priori factor coefficients. For the reasons set out in this decision, we do not find that the Applicant employees have established that the Act has been contravened or that the decisions of the working group were unreasonable. We uphold the negotiated agreement of the Respondents and dismiss the Application.
We have considered the request for an order that the Respondent MBS pay the Applicant employees’ legal costs, as well as compensate those who lost wages and incurred expenses in order to attend the Tribunal hearing. In light of Tribunal jurisprudence, we decline to make the requested orders.

DECISION OF BRUCE BUDD, MEMBER, MARCH 5, 1999

I dissent from the majority conclusion that the Applicant group of employees has not demonstrated that the OPS pay equity plan is a violation of sections 5, 6 and 7 of the Pay Equity Act, R.S.O. 1990, c.P.7, as amended (the “Act”). Further, I find that the use by the OPS/OPSEU working group of the policy-capturing methodology with only a single statistical correction, for percent female, is not a sufficient and reasonable adjustment to achieve a gender-neutral comparison system and is, therefore, a breach of section 12 of the Act.
I acknowledge that the OPS/OPSEU working group had a large group of job classes to analyse within a limited time-frame, hence their selection of the SUNY policy-capturing methodology. In addition, the Ontario government was not just an ordinary employer: it had drafted and adopted this legislation, including the time-frames therein. It ran into initial difficulties when it attempted to follow the SUNY methodology. I believe that, due to the time constraints, this methodology was abandoned prematurely in favour of an unsubstantiated and ad hoc solution of adjusting to the “fairest” male comparator. Where that produced unsatisfactory results, the parties resorted to a second ad hoc solution of negotiating another male comparator. Neither of these solutions has been shown by OPS to adequately compensate for the female work characteristics that were invisible or undervalued in the employer’s original pay policy.
The majority correctly points out that “policy-capturing is the development of a statistical model in which specific job content features are grouped into factors. These factors are weighted in such a way that they statistically ‘predict’ the current wage structure.”¹ The majority goes on to state that “the score for each job class on each factor (the loading) was multiplied by the factor’s coefficient in order to determine a value for each job class on that factor. Then, they summed these scores together and the result was the ‘worth’ of the job class used to select the respective comparators”.² Thus, the weights for the job content factors were derived from the statistical model’s regression coefficients, made explicit what was currently implicitly valued and, after adjustment for percent female, ultimately determined the choice of male comparator.
The fundamental disagreement between the parties’ experts concerns the role of regression coefficients, particularly negative ones, in determining value, and the necessity of positively valuing all job content factors.³ Although, on balance, I prefer the evidence of Dr. Ames, I do not believe it is necessary for the panel to choose between the competing statistical interpretations. That is because, even on Dr. Kervin’s own analysis, the OPS plan fails to value, and indeed penalizes, nurses on at least three job content factors: physical caring, emotional caring and physical demands. These factors remain negative in Dr. Kervin’s additional analysis using hierarchical regression, and all three are important in valuing traditionally female work. Nurses had higher loadings on these three factors, resulting in a much greater negative impact on their score than on that of their male comparator.
On the sufficiency of the single statistical adjustment for percent female undertaken by the OPS/OPSEU working group, Dr. Ames consistently maintained that adjusting for percent female was a necessary but insufficient step to eliminate gender bias, because much of the gender bias in traditional pay systems comes from the undervaluing of the characteristics of women’s work, which were largely invisible.⁴ Dr. Kervin was initially very careful in his reports and testimony to describe this exercise as an adjustment for “gender-composition wage bias”. This term quickly evolved into an adjustment that eliminated “gender bias”, with no qualification or explanation by Dr. Kervin. Yet when it comes to pay equity expertise, the majority acknowledges that Dr. Kervin was qualified as “an expert in the area of data collection, analysis methodology and statistics”, while Dr. Ames “is an acknowledged expert in pay equity” whom the OPS’ other expert, Dr. Weiner, admitted she might have called on for her expertise in policy-capturing. I believe that the OPS process did adjust for most of the “gender-composition wage bias”, but not for the undervaluing of traditionally women’s work in the current pay policies. As such, a considerable amount of gender bias remained in the comparison system and in the results.

Dr. Kervin even points out that, after testing the final results for the effectiveness of the single adjustment of removing percent female, some 25% of the original “gender-composition wage bias” remained in the overall adjusted wages. He states that “by my computations, for a job class with 100 percent female composition, this translates to approximately 58.5 cents per hour on average in wages”. In other words, the pay equity adjustment of 54 cents that the nurses received was less than the bias that Dr. Kervin estimated remained in the OPS plan (and the OPS methodology did not adjust for the gender bias resulting from the undervaluing of women’s work).
For the above reasons I believe that the Applicant group of employees has established that the comparison system used for the OPS/OPSEU plan was not gender-neutral and, therefore, did not comply with the provisions of the Act. It could not have resulted in the selection of an appropriate male comparator, except by accident. The Applicant employees are entitled to conclude a pay equity plan which would meet the requirements of the Act, with the assistance of a review officer and a mutually acceptable consultant, if necessary.

Dated at Toronto this 5th day of March, 1999:

Phyllis Gordon

Former Chair

Margaret Kvetan

Member

Bruce Budd

Member

The authors of the SUNY study indicate that they extracted the job content factors in an orthogonal manner and similarly note that this method results in factors that are statistically independent of each other. They give the example of a factor called ‘working conditions’. If derived from an orthogonal solution, it will clearly represent working conditions and no other factors in the overall factor solution. On the other hand, they refer to ‘oblique’ solutions, which produce factors that are not independent and are correlated with each other. “With oblique solutions, this lack of clarity about what the factors are measuring makes it difficult to interpret these factors and the results of any regression equation in which they are included”. (SUNY Study, Footnote, p. 154) 

None of the experts before us commented on this and so, in the absence of evidence or argument on this point, we do not base any of our conclusions on this apparent distinction. We do note, however, that the SUNY study influenced the thinking of the working group as well as the commentary of Dr. Ames. We speculate that if the two policy-capturing projects are, indeed, different in this regard, this would explain, in part, the very different perspectives we heard.

Footnotes

An a priori method of job evaluation begins with a set of factors which are agreed upon before jobs are evaluated. Examples of factors include: experience, judgement, supervision of employees, financial responsibility, the number and levels of people an employee interacts with and why, impact of errors, complexity of work, mental effort, physical effort and working conditions. Each factor is usually subdivided into several levels describing degrees of the factor so that a broad range of jobs can be evaluated, e.g. the first level of the education factor may begin at grade 10, with levels ranging up to a post-graduate degree. Each factor is assigned a weight and each level in each factor is assigned points. An organization may purchase an a priori job evaluation system from a management consulting firm. Or, the organization may choose to create its own system, selecting factors which reflect what the organization values for the purposes of compensation. Once the system has been selected and the factor weights and overall point system agreed upon, the jobs are evaluated. In an a priori system, this usually involves the use of a job evaluation committee which reviews job descriptions or job questionnaire results against the factors and must arrive at a decision on the overall point score for each job.
SUNY Study at p.2.
While not relevant to the outcome of the case, we comment that policy-capturing lacks transparency to such an extent that the people it was intended to assist must also rely on sophisticated technical expertise and are thereby removed from an appreciation of what gender-neutral job evaluation can really mean.
For example, one of the two questions measuring experience illustrates how sensitivity to potential gender bias shaped the questions. The question reads: “Different sorts of work prepare people for their jobs. This may be paid work or unpaid work such as volunteer work, driving or homemaking...”
Dr. Kervin states that the statistical analysis of the pilot pre-test results included: multiple comparison analysis (to look for differences among job classes); inspection for zero-order correlations of all item pairs (to check for expected relationships among similar items); correlations of each item and percent female (in job class and class series) to look for unanticipated results; and, factor analysis (to get an initial indication of whether the items would group into factors).
"Standardizing the scores for a variable means adjusting the values over all the cases so that their mean is zero and their standard deviation, a measure of ‘spread’ or ‘dispersion’ is 1.0. This allows variables to be added without biasing the result; without standardization, more weight is given to variables with higher means or standard deviations.” (Kervin, Report of June 1997 at p. 8)
The methodology selected involved principal factor extraction with squared multiple correlations on the diagonal and varimax rotation.
In order to achieve this they undertook the following: listed the value of each job class based on actual 1989 wages; added to the value the product of percent female in the job class and -0.096703 (the coefficient for percent female); used the resulting adjusted value as the dependent variable in a regression with the same independent variables - 16 job content factors and percent female; and, examined the new coefficient for the percent female variable to see how much gender-composition wage bias existed in the adjusted values.
Ames, Report of April 1992 at p. 20
Dr. Kervin illustrated how this works with an analogy: the making of a lemon/lime punch. A punch recipe calls for six lemons and two limes, equivalent to the regression prediction equation. The punch-maker has a can labeled “lemons” and another labeled “limes”, believing them to contain the appropriate amounts needed. When the first can (prepared by someone else) is added, the punch-maker discovers that it contained six lemons and three limes. The can had picked up, erroneously or perhaps inadvertently, not only the lemons that were intended for the punch, but three limes as well. This was more than was needed and represents what has happened to the lemons coefficient. So, instead of adding two limes, the punch-maker removes one of them. This is equivalent to a negative lime, resulting in a perfect punch, but a negative coefficient.
Ames, Report of April 1992 at p.6.
ibid., p.21
We note what may be a key distinction between the methodology used in the SUNY study and the OPS/OPSEU plan. Dr. Kervin states that there are two main approaches to the factor extraction part of the process. He briefly refers to the ‘orthogonal’ solution, and states that it was not the one used by the working group. He points out that where factors are extracted in an orthogonal manner, they will be uncorrelated and independent of other predictors.
Ames, Report of November 1997 at p. 4.
Ames, Report of August 1996 at p. 9.
Kervin, Report of June 1997 at p. 57.
See footnote 13.
At paragraph 14 (emphasis added).
At paragraph 72.
In an interesting statistical aside, the majority, in its footnote 13, notes “what may be a key distinction between the methodology used in the SUNY study and the OPS/OPSEU plan” - that of extracting the job content factors in an orthogonal manner which results in factors that are statistically independent of each other. This does not occur with the oblique manner of extraction used by the OPS team. Thus these exhibits seem to indicate that if the proper extraction technique had been used, the coefficients would have indicated the real weighting of the factors. Since the experts did not comment on this aspect, only a redoing of these parts of the process will allow this and other statistical disputes to be resolved.
At no time did Dr. Ames suggest that just zeroing out the negative coefficients was a sufficient adjustment to achieve pay equity. She just used it to demonstrate statistically the reversal in scores that would occur with such a simple procedure.
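The standardization described in the footnote quoting Dr. Kervin’s June 1997 report (p. 8) can be sketched in a few lines of Python. This is an illustration with invented item scores, not the working group’s procedure; it uses the population standard deviation (NumPy’s default `ddof=0`), which is an assumption, since the report excerpt does not say which form was used.

```python
import numpy as np


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


# Two hypothetical questionnaire items on very different scales.
item_a = [1, 2, 3, 4, 5]        # standard deviation about 1.41
item_b = [10, 40, 20, 50, 30]   # standard deviation about 14.14

raw_sum = np.add(item_a, item_b)                      # dominated by item_b's larger spread
std_sum = standardize(item_a) + standardize(item_b)   # each item now carries equal spread

print(standardize(item_a).round(3))
```

Standardizing first means that adding the two items no longer gives item_b ten times the influence simply because it is recorded on a wider scale, which is the bias the footnote describes.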