Working with these research files requires advanced data management skills. Many of the district and county research files may be too large for spreadsheet applications such as MS Excel. A database or statistical package such as MS Access, SAS, or SPSS will be required to manage these files fully.
For each entity (school, district, county, or state), there are numerous records. Each record represents a different combination of demographic subgroups and grade levels. With multiple records per entity, it is critical that the desired combination of characteristics is accurately selected.
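Selecting one combination of characteristics amounts to filtering the records on every characteristic at once. The Python sketch below is a minimal illustration. It assumes the rows have already been read into dictionaries keyed by field name and that `grade` and `subgroup_ID` (field names taken from the import instructions below) are the selection fields. The function name `select_records` and the sample rows, including the two-character grade values, are hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]

def select_records(rows: Iterable[Row], grade: str, subgroup_id: str) -> List[Row]:
    """Keep only the rows for one grade and one subgroup.

    Values are compared as text, so zero-padded codes such as "000"
    must be given exactly as they appear in the file.
    """
    return [
        row for row in rows
        if row["grade"] == grade and row["subgroup_ID"] == subgroup_id
    ]

# Hypothetical sample: one school with several grade/subgroup records.
sample = [
    {"school_code": "0123456", "grade": "02", "subgroup_ID": "000"},
    {"school_code": "0123456", "grade": "02", "subgroup_ID": "001"},
    {"school_code": "0123456", "grade": "03", "subgroup_ID": "000"},
]

print(select_records(sample, grade="02", subgroup_id="000"))
```

Because the comparison is exact text matching, a grade typed as "2" instead of the file's own value would return no rows. This is the kind of silent mismatch that the text-import advice below is meant to prevent.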
Copying individual report pages into a spreadsheet application is possible if the target computer is running current versions of its operating system and spreadsheet application.
The research files contain the aggregate score data for the Aprenda 3 test. They are available in plain-text, comma-delimited format, and the first line of each file lists the field names. A statewide research file containing the state, county, district, and school data for "All Students" (no demographic subgroup data) is available. A similar statewide research file containing the data for "All Subgroups" is also available.
Files can also be downloaded for any single county, district, or school. Depending on the user's choice at the time of download, these files contain either all data (all subgroups) or only the selected subgroup, for every entity within the selected entity. For example, if a district file is selected, the file includes the data for all schools in that district. The research files are comma-delimited and may be zipped to make downloading and importing easier.
The Entities File contains all school, district, and county names for which test results are available. This file must be merged with the research file to join these entity names with the appropriate score data. A database program such as MS Access is most appropriate for this purpose.
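A database join is the usual way to do this merge. The same idea can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions. The join key is assumed to be `county_code`, `district_code`, and `school_code`, and the entity-name column is assumed to be called `name`. The helper names `read_rows` and `merge_entity_names` and all sample data are hypothetical. Consult the file layout for the actual field names.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

# Assumed join key: the entity codes present in both files.
KEY_FIELDS = ("county_code", "district_code", "school_code")

def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse comma-delimited text whose first line holds the field names.

    csv.DictReader leaves every value as a string, so zero padding survives.
    """
    return list(csv.DictReader(io.StringIO(text)))

def merge_entity_names(scores: List[Dict[str, str]],
                       entities: List[Dict[str, str]],
                       name_field: str = "name") -> List[Dict[str, Optional[str]]]:
    """Attach the entity name to each score row (None when no entity matches)."""
    names: Dict[Tuple[str, ...], str] = {
        tuple(e[k] for k in KEY_FIELDS): e[name_field] for e in entities
    }
    return [
        {**row, name_field: names.get(tuple(row[k] for k in KEY_FIELDS))}
        for row in scores
    ]

# Hypothetical sample data.
entities_text = (
    "county_code,district_code,school_code,name\n"
    "01,61119,0000000,Example District\n"
    "01,61119,0123456,Example School\n"
)
scores_text = (
    "county_code,district_code,school_code,grade,subgroup_ID\n"
    "01,61119,0123456,02,000\n"
    "01,61119,0000000,02,000\n"
)

for row in merge_entity_names(read_rows(scores_text), read_rows(entities_text)):
    print(row["school_code"], row["name"])
```

This is a left join: a score row with no matching entity keeps a `None` name instead of being dropped. That makes key mismatches, such as codes that lost their leading zeros, easy to spot.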
Files are named using the following convention:
The filename always contains a complete County-District-School (CDS) code plus the subgroup ID. Higher-level entities are padded with zeros (e.g., a district-level file has a school code of "0000000"). A Subgroup_ID of "000" in the filename indicates All Subgroups. The file extension is either TXT or ZIP, depending on whether the file was large enough to require compression.
When you download a research file with the ".txt" extension, most browsers offer to open it in the default text viewer (Notepad on Windows, TextEdit on Mac). Research file users typically need more than a text viewer, so do not select Open. Instead, save the file to a known location on your hard disk.
If you plan to use Microsoft Excel to view and manipulate the Aprenda research data, you must make sure that Excel does not automatically reformat the data. Aprenda research files are in comma-separated format but intentionally have the extension ".txt" rather than the more common ".csv". This is because Microsoft Excel automatically imports and formats files ending in ".csv". That auto-formatting removes all leading zeros from the County, District, Charter, and School codes, as well as from the grade and subgroup ID fields. Without these zeros, you will run into problems if you try to link the data to an external source such as the Entities file.
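The Python sketch below is a rough illustration of why the lost zeros break linking. The `auto_format` function is a hypothetical stand-in that turns digit-only text into a number. It is not how Excel actually converts values, and the codes shown are made up.

```python
def auto_format(value: str):
    """Illustrative stand-in for a spreadsheet's "General" format:
    digit-only text becomes a number. Not Excel's actual algorithm."""
    return int(value) if value.isdigit() else value

# Hypothetical codes as they appear in a research file and in the Entities file.
research_row = {"county_code": "01", "district_code": "06111", "subgroup_ID": "000"}
entity_key = ("01", "06111")

formatted = {field: auto_format(v) for field, v in research_row.items()}
print(formatted)

# Compared as text, the keys match; after auto-formatting they no longer do.
text_key = (research_row["county_code"], research_row["district_code"])
formatted_key = (str(formatted["county_code"]), str(formatted["district_code"]))
print(text_key == entity_key, formatted_key == entity_key)
```

Once "01" has become 1, converting it back to text gives "1", which no longer equals the "01" stored in the other file, so the link fails silently.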
To open a research file correctly, start Microsoft Excel and open the ".txt" file you saved to your hard disk. Excel then starts the Text Import Wizard, which lets you set the format of each column:
1. On the first screen, make sure the Original Data Type radio button is set to "Delimited," then select Next.
2. In Step 2, check only the "Comma" box in the Delimiters section and set the "Text Qualifier" to "none" (the research and entity files contain no quoted fields). The preview box at the bottom should now show the data split into columns. Select Next.
3. Step 3 is critical to formatting the file correctly. This screen shows a data preview in the lower box and a Column Data Format section in the upper right-hand corner. Click each column that must be treated as text (county_code, district_code, charter_number, school_code, grade, and subgroup_ID) and change its Column Data Format from "General" to "Text." You may need to scroll the data preview sideways to see all of the columns.
4. Once each of these columns is set to "Text," select Finish.
The import is now complete. We recommend that you then save the newly formatted file as an Excel workbook.
If you are using a database or statistical analysis program to manipulate the data, follow that application's text import procedure. Use a comma as the delimiter, treat the first row as the field names, and give every zero-padded field (county_code, district_code, charter_number, school_code, grade, and subgroup_ID) a text/varchar data type. This applies to both the research data files and the entities files.
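A database or statistics package handles this through its own import settings. The same rule can be sketched in Python with the standard `csv` module, which reads every field as text. The width check is an added safeguard, not part of the official procedure. The county (2), district (5), and school (7) widths follow the CDS code. The 3-character `subgroup_ID` is assumed from the "000" example. The function name and the sample content are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Widths of the zero-padded codes: county (2), district (5), and school (7)
# follow the CDS code; subgroup_ID (3) is assumed from the "000" example.
# grade and charter_number are kept as text but their widths are not checked.
EXPECTED_WIDTHS = {"county_code": 2, "district_code": 5, "school_code": 7, "subgroup_ID": 3}

def load_research_file(text: str) -> List[Dict[str, str]]:
    """Read comma-delimited text (first row = field names) with every value
    as text, and fail loudly if a zero-padded code has lost its padding."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for field, width in EXPECTED_WIDTHS.items():
            value = row.get(field)
            if value is not None and len(value) != width:
                raise ValueError(
                    f"line {line_no}: {field}={value!r} should be {width} characters"
                )
    return rows

# Hypothetical sample content.
sample = (
    "county_code,district_code,school_code,grade,subgroup_ID\n"
    "01,06111,0123456,02,000\n"
)
rows = load_research_file(sample)
print(rows[0]["district_code"])
```

Raising an error on a short code is deliberate. A file that has already passed through an auto-formatting step is better rejected than joined incorrectly.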
For a full description of the individual field definitions, see the 2008 Research File Layout (PDF).
The Research File Layout link provides the following information:
Users of the comma-delimited research files will find these layouts useful for confirming the sequence of fields and for looking up coded values. Users may view and/or download any of the layouts and tables.
Large files downloaded from this site may be compressed in ZIP format. If decompression software is not already installed on the target computers, it is available at the following locations:
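If you process the files with a script, you may not need separate decompression software. Python's standard `zipfile` module can read ZIP archives directly. The sketch below returns the file's text whether it arrived as TXT or ZIP. It assumes that a ZIP contains the research data as a `.txt` member (the first one is used if there are several). It also assumes the file encoding is unknown and defaults to latin-1 because that codec never fails to decode. The function name is hypothetical.

```python
import zipfile
from pathlib import Path

def read_research_text(path: str, encoding: str = "latin-1") -> str:
    """Return the text of a research file delivered as either .txt or .zip.

    Assumptions: a .zip archive contains the research data as a .txt member
    (the first one is used if there are several), and latin-1 is the default
    encoding only because it never fails to decode; change it if needed.
    """
    p = Path(path)
    if p.suffix.lower() == ".zip":
        with zipfile.ZipFile(p) as archive:
            members = [n for n in archive.namelist() if n.lower().endswith(".txt")]
            if not members:
                raise ValueError(f"no .txt file found inside {p.name}")
            return archive.read(members[0]).decode(encoding)
    return p.read_text(encoding=encoding)
```

The returned string can be passed straight to a CSV reader, so the rest of a workflow does not need to know which format was downloaded.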
Achieving accurate results requires an understanding of the structure and content of the two primary tables: the Entities table and the Test Data table. The research files have many rows for each entity. To work with the data correctly, you must apply constraints that limit the data you report. These constraints are discussed below.
In order to protect student confidentiality, scores are not included for any group of ten or fewer students. Suppressed scores are indicated by an asterisk (*) in the appropriate field.
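Because a suppressed field contains an asterisk rather than a number, a naive numeric conversion fails (in Python, `float("*")` raises `ValueError`). The minimal Python sketch below maps suppressed values to a missing value before any calculation. Treating an empty field as missing is an added assumption. The function name and sample values are hypothetical.

```python
from typing import List, Optional

SUPPRESSED = "*"  # marker used for groups of ten or fewer students

def parse_score(value: str) -> Optional[float]:
    """Return the field as a number, or None if it is suppressed.

    Treating an empty field as missing is an added assumption.
    """
    value = value.strip()
    if value in (SUPPRESSED, ""):
        return None
    return float(value)

# Hypothetical values from one score column.
column = ["612.4", "*", "598.0", "*"]
parsed: List[Optional[float]] = [parse_score(v) for v in column]
reported = [v for v in parsed if v is not None]
print(f"{len(reported)} reported, {parsed.count(None)} suppressed")
```

Keeping suppressed values as `None` instead of zero prevents them from dragging down any totals or averages computed afterward.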
Entities Table – This table consists of the state and all counties, districts, and schools in California for which Aprenda 3 results are reported. The table contains school-level records as well as district, county, and state summary records, so every analysis must select the correct type of record. Doing so avoids the double or triple counting that occurs when a school's count is also included in its district (and county) summary record. The Entities table includes only counties, districts, and schools in which at least one student took the Aprenda 3 in the selected year. Entities files therefore vary from year to year, and it is important to link the Entities table for the same year to the Test Data table.
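The Python sketch below shows one way to pick the record type and illustrates the double and triple counting the paragraph warns about. It classifies each record by which codes are zero-padded. The school code "0000000" for district-level records comes from the file-naming description. Reading district code "00000" as a county record and county code "00" as the state record are assumptions. The `students_tested` field, the helper names, and the sample rows are hypothetical.

```python
from typing import Dict, List

def entity_level(row: Dict[str, str]) -> str:
    """Classify a record by which of its codes are zero-padded.

    A school code of "0000000" marks a district-level record. Reading a
    district code of "00000" as a county record and a county code of "00"
    as the state record are assumptions made for this sketch.
    """
    if row["school_code"] != "0000000":
        return "school"
    if row["district_code"] != "00000":
        return "district"
    if row["county_code"] != "00":
        return "county"
    return "state"

def records_at_level(rows: List[Dict[str, str]], level: str) -> List[Dict[str, str]]:
    """Keep only the records of one type (school, district, county, or state)."""
    return [r for r in rows if entity_level(r) == level]

# Hypothetical county with one district of two schools; "students_tested"
# is an illustrative field name, not taken from the file layout.
rows = [
    {"county_code": "01", "district_code": "06111", "school_code": "0123456", "students_tested": "40"},
    {"county_code": "01", "district_code": "06111", "school_code": "0654321", "students_tested": "60"},
    {"county_code": "01", "district_code": "06111", "school_code": "0000000", "students_tested": "100"},
    {"county_code": "01", "district_code": "00000", "school_code": "0000000", "students_tested": "100"},
]

naive_total = sum(int(r["students_tested"]) for r in rows)
school_total = sum(int(r["students_tested"]) for r in records_at_level(rows, "school"))
print(naive_total, school_total)
```

Summing every record counts each student three times (once in the school record, once in the district summary, and once in the county summary). Restricting the sum to one record type gives the correct total.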
Test Data Table – This table consists of the school, district, county, and state aggregate Aprenda counts and scores.
To accurately analyze and report from these research files, the appropriate constraints must be applied to the following elements:
Providing accurate and meaningful reports from the research files generally requires the "linking" of the Entities and Test Data tables. Additional efforts might include linking to the "lookup" tables discussed above. Working with these tables requires an understanding of "relational" data tables and their manipulation.