Testimony Regarding the Quality of the North American Registry of Midwives Certification Procedures
Robert A. Mahlman
Associate Director of Assessment
Vocational Instructional Materials Laboratory
The Ohio State University
Executive Summary
Substantial thought and resources were invested in the foundation of the certification system, i.e., the job analysis. Development of the preliminary list of tasks associated with the role of the entry-level, direct-entry midwife yielded comprehensive coverage of the job. The comprehensiveness of this list is evidenced by the relatively few "add-on" tasks indicated in the task analysis survey. In other words, few, if any, important functions of the direct-entry midwife were not represented by this list. The large sample of midwives completing the task analysis survey (N=817) contained a broad representation of individuals serving in this capacity. The job analysis performed by Schroeder Measurement Technologies is representative of "best practice" for its purpose in terms of job analysis procedures. The task analysis survey was used to identify those tasks which are critical to performance of the job of entry-level, direct-entry midwife. Once identified, these critical tasks defined the content specifications for both the written and performance components of the certification assessment procedures. The development of content specifications such as these, which define the knowledge, skills, and abilities (KSAs) needed to demonstrate minimum proficiency, is crucial to test fairness and coincides with the primary purpose of certification: the protection of public health and safety.
The content specifications were created in a manner to ensure a representative sampling of critical KSAs defining the content domain. The use of detailed job and task analysis information as primary test specifications again represents "best practice" methodology in the development of certification assessments.
The item-writing process itself provided training to the writers and emphasized adherence to format and content specifications. Items were thoroughly reviewed by the item writing team as well as the NARM Board for adherence to these specifications. Item edits or item deletions were performed as necessary. Unquestionably, the procedures should ensure that the final forms of the tests exhibit content validity, and that the tests should, in theory, measure the KSAs needed for minimum proficiency as defined by the NARM board. There is also no question that the tests will exhibit job relatedness.
As in any valid certification program, the minimum proficiency needed to protect the safety of the public is defined by a panel of experts within the occupation. Widely accepted procedures were employed in determining pass/fail cut scores on both the written and performance components of the certification assessment procedures. While all cut-score methodologies rely on human judgment, SMT took significant precautions to minimize the error inherent in these judgments. As such, these cut-scores are likely to be as accurate as those associated with any quality certification program for the classification of competent versus non-competent candidates.
However, as with most newly formed certification tests, there is a lack of empirical evidence describing the performance of the tests at and around the cut-scores, and a lack of criterion-related validity data showing evidence of the tests' accuracy in classifying candidates.
This type of data takes time and resources to collect, and I would recommend that this issue be further researched by SMT and NARM. Given SMT's and NARM's planned attention to the maintenance and continuous improvement of the system over time, I have no doubt that this data will eventually be available. Other identified "weaknesses" in the certification system also pertain to the lack of statistically-related psychometric information, which again is to be expected in a newly developed certification system.
The certification process incorporates information and procedures to ensure accuracy and fairness in testing. These include well-defined test specifications provided to the candidates; thorough standardization of test administration procedures; training for Qualified Evaluators administering performance assessments; feedback to examinees regarding test performance; automatic re-scoring of tests with scores near the cut-scores; adequate procedures for test review and re-testing for failing candidates; and procedures for appeal for those not meeting certification requirements. Additionally, the screening of certification applicants performed via rigorous prerequisites should ensure that only individuals with the appropriate training and experience will be allowed to even attempt the certification assessments.
The overall quality of the processes used by Schroeder Measurement Technologies, Inc. to develop certification tests and testing procedures is very high and represents best practice within industry standards. I would venture to say that the quality is at least as good as comparable certification programs and likely better than most. The procedures followed were clearly based on established standards for educational and psychological testing as set forth by the American Educational Research Association, American Psychological Association, and the National Council on Measurement in Education.
Outlines of Findings in the Review Leading To Testimony
Some General Issues in Certification Testing
- The primary objective of certification/licensure is the protection of the public health and safety.
- Protection of public safety and health is a function of the ability of the certification procedures to identify and classify individuals who possess, versus those who do not possess, the minimum knowledge, skills, and abilities (KSAs) required to competently perform the tasks necessary to the occupation.
- Ability of certification procedures to classify individuals as described above is dependent upon
- the quality of the process used to identify the KSAs (or tasks) critical to performance of the job;
- the accuracy of the definition of minimum required competence;
- the quality of the process used to develop and refine measures of the KSAs that are critical to performance of the job;
- the ability of the measures to discriminate between those who do and do not possess the necessary KSAs; and
- the ability of the established pass/fail cut-score to accurately classify individuals as competent or non-competent with reference to those KSAs.
- The most vital "litmus test" of the quality of any employment testing system (including those used for selection, classification, promotion, training decisions, and certification/licensure) is job relevance. The content of the measure must be related to the job in question in order for employment-related inferences derived from test scores to be considered valid.
- Job relevance is established by showing a direct link between job analysis information and the employment testing system.
- Job relevance is established via judgments by Subject Matter Experts (SMEs) regarding the relationship between the test or test items and the job.
- For a certification test to be fair, it must measure KSAs critical to the specific job in question, not those expected to be possessed by some "ideal" candidate that are over and above what are needed to perform the job. An entry-level certification test must measure entry-level skills rather than those expected of individuals with many years of experience. The test must measure KSAs that the candidate has had the opportunity to learn.
Strengths of the North American Registry of Midwives Certification Procedures
- Multi-hurdle process that ensures candidates hold the necessary Knowledge, Skills, and Abilities to perform the role of entry-level midwife.
- Meeting stringent prerequisites before testing
- Experience
- minimum 20 births
- minimum 75 prenatal examinations
- 20 newborn examinations
- 40 postpartum examinations
- Knowledge
- Documentation of relevant education
- Skills
- Evidence indicated by preceptor/supervisor/mentor that proficiency has been attained for each skill listed in a comprehensive skill list
- Holder of a current CPR certificate
- Detailed reference forms completed by two clients and a professional.
- Passing the written certification test before performance assessment
- Passing the performance-based certification assessment
- Based on very thorough and appropriate job analysis
- Appropriate instructions to respondents regarding entry-level direct-entry midwives.
- Good rating scale: the "extremely important" task anchor addresses the purpose of defining critical tasks for certification, namely the health and safety of the mother, child, and midwife.
- Respondent sampling as good as or better than can be expected. Large number of respondents.
- Content domain well-covered as evidenced by the relatively few "additional tasks" provided by the respondents.
- Appropriate use of job analysis data for the development of test specifications
- Including only tasks deemed as critical to the occupation
- Balancing test content coverage with the relative quantity of critical tasks within each of the content areas, on both the written and performance assessments.
- Separation of tasks to be addressed by performance assessment versus written assessment.
- Performance assessments developed as checklists based on steps involved in critical tasks.
- Adequate training for item writers with thorough item writing guidelines
- Substantial attention paid to item writing specifications including item layout and content specifications
- Attention paid to the level of knowledge that the test is designed to measure, forcing the candidate to demonstrate the required minimum KSAs as specified by the governing authority, and focusing only on content defined by the critical tasks identified in the job analysis.
- Appropriate content validation / item review procedures for both the written and performance assessments.
- Clearly defined content specifications served as the basis for item writing
- All items were reviewed and content validated by a panel of item writers
- All items reviewed and content validated by NARM Board
- High quality cut-score setting procedures for written and performance assessments
- Significant attention paid to the selection of the SME panel for the establishment of cut-scores.
- Utilization of what is probably the most widely accepted methodology for setting cut-scores: the Angoff method.
- Minimum competency adequately defined and discussed.
- Opportunity to discuss and refine item-level cut estimates. Thorough discussion of items where large SDs were encountered.
- Appropriate training for Qualified Evaluators for administering performance assessments
- Focus on identifying performance which fails to meet standards for safe practice
- Focus on standardization of assessment administration
- Focus on making Evaluators aware of various rating errors
- High quality test administration procedures
- Certification testing based on both written and performance assessments
- Clear test specifications provided to all candidates
- Substantial attention paid to test security issues
- Clear and detailed standardized test administration procedures for written and performance assessments.
- Substantial attention paid to scoring accuracy (e.g., re-scoring of tests near cut-score)
- Well-defined rights to appeal and review and re-testing
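To make the Angoff procedure named above concrete, the following is a minimal sketch of the basic calculation, not SMT's actual implementation. Each SME judge estimates, for every item, the probability that a minimally competent candidate would answer it correctly; the cut-score is the sum of the item-level mean estimates, and items on which judges disagree widely (large SDs) are flagged for discussion. The function name, the ratings, and the 0.15 SD threshold are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def angoff_cut_score(ratings, sd_flag=0.15):
    """Basic Angoff calculation (illustrative sketch).

    ratings[j][i] is judge j's estimate of the probability that a
    minimally competent candidate answers item i correctly.
    sd_flag is an arbitrary, assumed threshold for "large" disagreement.
    Returns (cut_score, indices_of_items_flagged_for_discussion).
    """
    n_items = len(ratings[0])
    item_means = []
    flagged = []
    for i in range(n_items):
        estimates = [judge[i] for judge in ratings]
        item_means.append(mean(estimates))
        # Large spread among judges suggests the item needs discussion.
        if stdev(estimates) > sd_flag:
            flagged.append(i)
    # Expected raw score of a minimally competent candidate.
    return sum(item_means), flagged

# Made-up ratings: 3 judges, 4 items.
ratings = [
    [0.90, 0.60, 0.75, 0.50],
    [0.85, 0.70, 0.80, 0.90],
    [0.95, 0.65, 0.70, 0.40],
]
cut, flagged = angoff_cut_score(ratings)
print(f"cut-score = {cut:.2f} of {len(ratings[0])} items; discuss items {flagged}")
```

In practice the flagged items would be re-rated after discussion and the cut-score recomputed from the refined estimates.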
Weaknesses of the North American Registry of Midwives Certification Procedures
- While content coverage (content validity) has been thoroughly addressed, strong evidence is not provided regarding whether or not the test is capable of discriminating between competent and incompetent candidates. (This does not mean that the assessments fail to discriminate appropriately; it simply means that empirical evidence was not provided.)
- A good certification test will have the majority of the items testing ability levels at or near the cut-score. The technical manual does not indicate the frequency distribution of Angoff item weights, which would at least provide evidence regarding the number of items predicted to surround the judgment point. Actual item difficulties for individuals who are borderline competent or incompetent have not been obtained, as described below.
- The initial examinees completing the assessments are described as "extremely experienced compared to the minimum experience requirements established to take the examination." As such, p-values can be expected (and were found) to be quite high. A more appropriate sample, especially for piloting, would have included individuals expected to be classified as non-competent. Given the stringent prerequisites to candidacy for certification, it is unlikely that many non-competent individuals will advance to the testing stages of certification.
- A large number of items show item-total correlations near zero, and reliability coefficients are low for such a long written test. (These are not major concerns, since they are likely caused by the lack of variability in the pilot sample. In addition, internal consistency is not as important a concern with competency tests or when measuring a construct as broad as occupational knowledge. Content coverage and the ability to correctly classify individuals are the primary concerns with this type of assessment.)
- A thorough piloting of the assessments before live administration would have been desirable.
- If it is expected that the examinee population will be homogeneous (i.e., little variability in scores), equipercentile equating of scores (for the creation of parallel test forms) may be unstable, although I cannot identify a more acceptable alternative given the relatively low number of examinees.
- Interrater reliabilities obtained from performance assessments have not been provided. (Although training exists for Qualified Evaluators administering performance assessments, I have not received enough detail on this training to provide comment.)
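The item statistics discussed in this list (p-values, item-total correlations, and internal-consistency reliability) can be illustrated with a short sketch. It assumes dichotomously scored items (1 = correct, 0 = incorrect), uses KR-20 as the reliability coefficient (an assumption; the review does not say which coefficient was reported), and runs on made-up responses from a small, high-scoring sample. All names are hypothetical.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation; None when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def item_analysis(responses):
    """responses[e][i] is 1 if examinee e answered item i correctly, else 0.

    Returns (p_values, corrected_item_total_correlations, kr20).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    p_values, item_total = [], []
    for i in range(n_items):
        item = [row[i] for row in responses]
        # Corrected item-total: correlate the item with the score on the
        # remaining items so the item does not correlate with itself.
        rest = [t - s for t, s in zip(totals, item)]
        p_values.append(mean(item))  # proportion answering correctly
        item_total.append(pearson(item, rest))
    var_total = pvariance(totals)
    sum_pq = sum(p * (1 - p) for p in p_values)
    # KR-20 is undefined when every examinee earns the same total score.
    kr20 = None if var_total == 0 else (n_items / (n_items - 1)) * (1 - sum_pq / var_total)
    return p_values, item_total, kr20

# Made-up data: 5 highly experienced examinees, 4 items, little score variability.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
]
p_values, item_total, kr20 = item_analysis(responses)
print("p-values:", p_values)
print("item-total correlations:", item_total)
print("KR-20:", kr20)
```

With a toy sample this homogeneous, the p-values are high and the correlations and reliability come out low, negative, or undefined, which is the pattern the parenthetical note above attributes to limited variability in the pilot sample rather than to defects in the items.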
Summary of Testimony Regarding the Quality of the North American Registry of Midwives Certification Procedures
Robert A. Mahlman
Associate Director of Assessment Services
Vocational Instructional Materials Laboratory
The Ohio State University
The overall quality of the processes used to develop the certification tests and testing procedures is very high and represents "best practice" within certification industry standards. The procedures followed were clearly based on established standards for educational and psychological testing as set forth by the American Educational Research Association, American Psychological Association, and the National Council on Measurement in Education. Substantial thought and resources were invested in the foundation of the certification system, i.e., the job analysis. The development of the content specifications based on the job analysis information was focused on the primary purpose of certification: the protection of public health and safety. The procedures used for test item writing, content validation, setting pass/fail cut scores, and the development of the overall certification process represent a system explicitly designed to optimize fairness and accuracy in certification testing.