Bi-directional interface between micro-CDS/ISIS rel.3.0 and micro-IDAMS

User Guide

December 1993

(c) UNESCO CII/PGI
7, Place de Fontenoy
75700 Paris, FRANCE


Contents

1. INTRODUCTION
2. CONCEPTS AND TERMS
   2.1. Repeatability and subfields in data base export
   2.2. Missing data and code labels in import
   2.3. Code labels in export
3. DATA BASE EXPORT TO IDAMS
   3.1. Restore, save, and erase previous selection
   3.2. Field and variable selection
   3.3. Code label record generation
   3.4. Dictionary construction
   3.5. Data export
4. DATASET IMPORT FROM IDAMS
   4.1. Restore, save, and erase previous selection
   4.2. Variable selection
   4.3. Field definition table (FDT) construction and updating
   4.4. Data import
   4.5. Code label transformation
5. RESTRICTIONS

APPENDIX
   1. Menus used by the interface
   2. Program files
   3. Messages


1. INTRODUCTION
===============

The interface performs data description and data transfer between the
micro-CDS/ISIS and micro-IDAMS packages in both directions. This means a
transfer:

   - of a selected part of an ISIS data base into an IDAMS dataset, or
   - of a selected part of an IDAMS dataset into an ISIS data base.

The transfer is essentially controlled by the data description files of the
respective packages, namely:

   - in the case of an ISIS to IDAMS transfer by the ISIS field definition
     table (FDT), and
   - in the case of an IDAMS to ISIS transfer by the IDAMS dictionary file.

When going from ISIS to IDAMS, a new IDAMS dictionary and a new data file
are always constructed. They can be match-merged with other data using the
IDAMS data management facilities.

When going from IDAMS to ISIS, there are three basic possibilities:

   1. a completely new data base can be constructed,
   2. imported records can be added to an existing data base as new data
      base records,
   3. records of an existing data base can be updated with the imported data.
In the case of options 2 or 3, the FDT can be modified with the description
of the imported fields or can be left as it is, since ISIS can also handle
fields not defined in the FDT.

When importing an IDAMS dataset, the ISIS FDT, and therefore the master file
records as well, can be constructed almost automatically. The other vital
components of an ISIS data base (FST, display formats, etc.) must still be
defined by the user.

When exporting an ISIS data base to IDAMS, the user is requested to complete
the IDAMS dictionary.

The interface, written in CDS/ISIS Pascal, is an optional part of the
micro-ISIS package. It adds the following function to the main menu:

   F - ISIS-IDAMS Interface

while the menu to be connected is EXIDN.


2. CONCEPTS AND TERMS
=====================

In the subsequent text, terms from both the IDAMS and the ISIS packages are
used. Instead of repeating their definitions here, we refer the user to the
user manuals:

   - Mini-micro CDS/ISIS, Reference Manual (current version).
     UNESCO, 7, Place de Fontenoy, 75700 Paris
   - IDAMS-MF User Manual (current version).
     UNESCO, 7, Place de Fontenoy, 75700 Paris

Nevertheless, we draw the reader's attention to the basic data storage
concepts and terms of ISIS and IDAMS, and we show the correspondence between
them in Table 1. These terms are used in what follows, and the name of the
package (i.e. ISIS or IDAMS) is mentioned only where its omission would lead
to misunderstanding.

                        | ISIS                      | IDAMS
==============================================================================
integrated information  | data base                 | dataset
storage unit            |                           |
------------------------------------------------------------------------------
data description        | field definition table    | dictionary
------------------------------------------------------------------------------
data                    | master file               | data file
------------------------------------------------------------------------------
structured data unit    |                           |
to be                   | data record               | data record/
retrieved/processed     |                           | case
------------------------------------------------------------------------------
elementary data unit    | field, subfield           | variable
(data element)          |                           |
------------------------------------------------------------------------------
data element reference  | tag, subfield identifier, | variable number
                        | occurrence                |
------------------------------------------------------------------------------
data element name       | field name                | variable name
------------------------------------------------------------------------------
data element type       | alphanumeric, alphabetic, | alphabetic, numeric
                        | numeric, pattern          |
------------------------------------------------------------------------------
storage mode            | character                 | character
------------------------------------------------------------------------------
                        |                           | missing data code,
special features        | repeatability             | number of decimal places,
                        |                           | code label records
------------------------------------------------------------------------------

Table 1. Data storage concepts and terms in ISIS and IDAMS

Generally speaking, ISIS fields (subfields) are equivalent to IDAMS
variables, both in export and in import.


2.1.
Repeatability and subfields in data base export
====================================================

Depending on the existence of subfields in, and the repeatability of, a
field, the correspondence between fields and variables can be described in a
two-way table (see Table 2).

If a field has no subfields then a single variable is constructed. If it is
subfielded, a group of variables can be constructed.

If a field is repeatable then its first 'n' occurrences are taken into
account, where 'n' is specified by the user. The occurrences may produce
separate IDAMS variables or they may be placed into the same variable but in
repeated records. The first way of handling repetition is called
variable-wise repetition, the second case-wise repetition.

Care is needed when specifying case-wise repetition for more than one
repeatable field, because every combination of occurrences yields a separate
record and the number of records thus created may be very high. In addition,
statistics calculated on such a dataset may be difficult to interpret, since
the repeated cases may weight the variables in an undesirable manner.

If variable-wise repetition is specified together with a variable group
derived from the subfields of the repeated field, attention must also be
paid to the number of variables created: 'n' repetitions with 'm' subfields
will produce n x m variables.

The possible subfield-repetition handling combinations are as follows:

a) No subfields, no repetition:
   One variable is created from one field.
   One IDAMS case is built from one ISIS data record.

b) Subfielded field, no repetition:
   A variable group with more than one variable is created from one field,
   according to the selected subfields.
   One IDAMS case is built from one ISIS data record.

c) No subfields, variable-wise repetition:
   A series of variables is created from the occurrences of a field.
   The number of export variables is equal to the maximum number of
   occurrences accepted.
   One IDAMS case is built from one ISIS data record.

d) Subfielded field, variable-wise repetition:
   A series of variable groups is created from the occurrences of a field.
   The number of export variables is equal to the maximum number of accepted
   occurrences multiplied by the number of selected subfields.
   One IDAMS case is built from one ISIS data record.

e) No subfields, case-wise repetition:
   One variable is created from one field.
   The occurrences of the field produce separate IDAMS cases, while the
   contents of the other export variables are held "constant".
   The number of exported IDAMS cases is at most the maximum repetition
   number specified in the field selection step.

f) Subfielded field, case-wise repetition:
   One variable group is created from one field.
   The number of exported IDAMS cases is at most the maximum repetition
   number specified in the field selection step.

-----------------------------------------------------------------------------
                   |                      Repetition                       |
                   |-------------------------------------------------------|
                   | No              | Variable-wise       | Case-wise     |
-----------------------------------------------------------------------------
           |       | variable        | series of variables | variable      |
           | no    |        *        |          *          |       *       |
           |       | one export case | one export case     | more than one |
           |       |                 |                     | export case   |
 subfields |-------|-----------------|---------------------|---------------|
           |       | group of        | series of variable  | group of      |
           |       | variables       | groups              | variables     |
           | yes   |        *        |          *          |       *       |
           |       | one export case | one export case     | more than one |
           |       |                 |                     | export case   |
-----------------------------------------------------------------------------

Table 2. Summary table of correspondence between fields and variables
         in export


2.2. Missing data and code labels in import
===========================================

There are two special features in IDAMS which are of particular interest for
data import.

Missing data in IDAMS can be detected on the basis of missing data values
defined in the dictionary.
We can specify whether we wish to recognize these when importing data. If
so, data are checked for the first, the second or both missing data values.
If missing data is detected in a variable, then this variable is not
transferred into the ISIS data record, i.e. the corresponding field will not
exist for that data record.

Code labels defined in the IDAMS dictionary can be used to substitute code
values by a character string in the ISIS data record.


2.3. Code labels in export
==========================

When exporting an ISIS data base, IDAMS dictionary code label records can be
generated automatically for those fields (subfields) which describe
categorical properties of objects in text format (category labels). With the
help of these code label records, the numerical codes defined in them can be
exported instead of the original category labels stored in the ISIS records.


3. DATA BASE EXPORT TO IDAMS  (select E option from ISIS-IDAMS Interface menu)
============================

The export menu has the following items:

   (C) - Change data base - recall previous export specification
   (V) - Save field and variable selection
   (R) - Erase saved field and variable selection
   (L) - Code label generation
   (S) - Field and variable selection
   (D) - IDAMS dictionary construction
   (E) - Data export
   (X) - Exit

The name of the data base to be exported can be specified before entering
the export function. If no data base has previously been specified, use of
'S', 'D', or 'E' will generate a request for one.


3.1. Restore, save, and erase previous selection
================================================
(options 'C', 'V' and 'R' in export menu)

If a data base has already been specified but a different one is required,
then use the 'C' option, which changes the data base and, in addition,
restores the previously defined export parameters if they have been saved.

The currently defined export specification can be saved for later use with
the help of the 'V' option.
Previously saved export specifications may be erased by using the 'R'
option.

Whenever a recall, save or erase operation is initiated, the user is
requested to confirm the action.


3.2. Field and variable selection  (option 'S' from main export menu)
=================================

This function allows for the selection of the fields to be exported and the
specification of the corresponding variables to be constructed. Only fields
defined in the Field Definition Table can be exported.

We can decide whether we wish to continue a previously interrupted selection
step or start a new selection. A worksheet called the Field Selection screen
is then displayed:

                                                      Tag Number
 ----------------------------------                  -----------
 | F I E L D   S E L E C T I O N  |                  |    0    |
 ----------------------------------                  -----------
 Field Name                              Subfield/Pattern
 ----------------------------------      ---------------------------
 | Master File Number             |      |                         |
 ----------------------------------      ---------------------------
 Repeatability
 ---------------------
 | Non-repeatable    |
 ---------------------
 Selection For Export
 -------
 | No  |                                 Field Type
 -------                                 ---------------------
                                         | Numeric           |
                                         ---------------------

 S - Select | U - Unselect | N - Next field | P - Previous field | J - Jump
 T - Terminate selection | L - List selected fields and variables | ->

The message area at the bottom of the screen shows the following options
(control characters):

S - Select. Displays the variable specification window (described later)
    and selects the field for export.

U - Unselect. Cancels field selection, which means that the field is
    deleted from the selected fields list. Variable descriptor parameters
    already specified are not erased, but they are not valid and are
    invisible until the field is selected again.

N - Next field. Displays next field from the FDT.

P - Previous field. Displays previous field from the FDT.

T - Terminate selection.
    A summary of fields and variables is displayed (see function L) before
    returning to the export menu.

L - List selected fields and variables. A list of fields selected and the
    corresponding IDAMS variables to be generated is displayed.

J - Jump. Jumps n fields forwards or backwards depending on the sign of n.

When we choose the 'S' option to select the current field, a new set of
control characters appears in the message area and the IDAMS variable
specification window is displayed asking if a variable or a repetition is to
be defined. There are two active control characters at this point:

R - Repetition parameters to be shown in the window. This can only be used
    if the currently displayed field is repeatable.

V - Variable specification parameters to be shown in the window. Parameters
    appearing on the screen depend on the field selected for export because
    only the appropriate ones are displayed.

Having selected 'R' or 'V', the following control characters are active for
both window types:

M - Modify repetition or variable specification window, as selected. This
    option opens the window for entry or modification of parameters.

C - Cancel the last parameter entry/modification. The original or default
    values are restored.

D - Delete the last variable specification.

E - Exit to the Field Selection window without saving parameters
    entered/modified for this field.

X - Exit and save parameters entered/modified (concerns whole group of
    variables if the corresponding field is subfielded).

The next two control characters are active only for the variable
specification window:

N - Parameters of the variable corresponding to the next subfield are
    presented if the field in question is subfielded.

P - Parameters of the variable corresponding to the previous subfield are
    presented if the field in question is subfielded.

Repetition window:  (select 'R' in the Field Selection screen message area)
-----------------

This can only be accessed if the currently selected field is repeatable.
Repetition can be handled in a variable-wise or case-wise manner. Type 'M'
to modify the parameters in the window. Then type 'V' or 'C' as appropriate,
followed by a value for the maximum number of repetitions.

When variable-wise repetition handling is selected, this maximum, 'n',
determines the number of variables or variable groups (see subfield
selection) to be created for the field. If in an ISIS data record there are
fewer than 'n' occurrences of the field, then the first missing data code
(or space) will be assigned to the rest of the variables. If more than 'n'
occurrences are found, then only the first 'n' will be used.

When case-wise repetition handling is selected, a maximum of 'n' means that
a maximum of 'n' IDAMS data records will be created from one ISIS data
record according to the actual number of occurrences of the field. If more
than one field is selected for case-wise repetition with n1,n2,...,nk
occurrences respectively, then n1 x n2 x ... x nk records will be produced.

Variable description window  (select 'V' in the Field Selection screen
---------------------------   message area)

If the field is subfielded, the first item in this window is the subfield
selection. We can choose one of the subfield identifiers shown above the
window on the screen to create one variable in the variable group derived
from this field.

The second item is always present and requests the type for the output
variable. This type can be numeric or alphanumeric for any field type except
alphabetic. Variables derived from the same subfielded field may have
different types from each other.

Missing data codes are requested only if numeric type is specified (if the
variable type is alphanumeric, missing fields are always stored as blanks).
Missing data codes must be unsigned integers. If no missing data code is
required, leave the field empty.

The number of decimals is also only specified for numeric variables. The
default is 0.
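
The two repetition modes described for the Repetition window can be
illustrated with a short sketch. It is written in Python for readability
only and is not the interface's own code (the interface is written in
CDS/ISIS Pascal). The function names variable_wise and case_wise, and the
dictionary layout of a record, are hypothetical. Treating a case-wise field
without occurrences as a single blank value is an assumption; the guide does
not describe that situation.

```python
from itertools import product


def variable_wise(occurrences, n, missing=""):
    """Return exactly n values for one repeatable field.

    Only the first n occurrences are kept; if fewer exist, the remaining
    variables receive the first missing data code (or a blank).
    """
    kept = list(occurrences[:n])
    return kept + [missing] * (n - len(kept))


def case_wise(fixed, repeated, limits):
    """Yield one export case per combination of occurrences.

    fixed    -- values of the other export variables, held constant
    repeated -- occurrences of each field handled case-wise
    limits   -- maximum number of repetitions accepted for each such field
    """
    names = list(repeated)
    # Assumption: a field with no occurrences contributes one blank value.
    pools = [repeated[name][:limits[name]] or [""] for name in names]
    for combination in product(*pools):
        case = dict(fixed)
        case.update(zip(names, combination))
        yield case


# One ISIS record with two repeatable fields, at most 2 occurrences each.
cases = list(case_wise({"id": "1"},
                       {"kw": ["x", "y", "z"], "au": ["p", "q"]},
                       {"kw": 2, "au": 2}))
print(len(cases))  # 2 x 2 combinations
```

The product over the occurrence lists is what makes the number of exported
cases grow as n1 x n2 x ... x nk when several fields are handled case-wise.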
Variable width is an obligatory parameter and may vary between 1-9 for
numeric and between 1-255 for alphanumeric variable type.

Missing data code widths, number of decimals and variable width are
cross-checked and are not accepted if they are inconsistent.

In the case of alphabetic or alphanumeric fields (subfields), automatic code
label record generation and recoding of category labels into numerical codes
can be requested. If so, the maximum number of different category labels
must also be defined. This feature can only be used if the exported variable
is numerical.

After entering all parameters, press Enter to exit to the message area and
then type 'X' to save this specification.

When all required fields have been selected, type 'T' to exit to the main
export menu.


3.3. Code label generation  (option 'L' from main export menu)
============================

The basic code label generation screen (variable list) shows those variables
for which code label record generation has been requested. The question
appearing in the message area offers two choices:

E - Extract category labels from data base records.

L - Load code labels and codes (code label records) generated in a previous
    step.

The extracted category labels are sorted in alphabetic order and code values
starting from 1 are assigned to them.

After extracting category labels or loading code label records, they can be
reviewed and the code values can be modified. Variables can be selected by
moving up and down in the variable list, and the code label records for the
selected variable can be shown. Control characters in the message area are
as follows:

N - Position to the next line (next variable).

P - Position to the previous line (previous variable).

C - Show code labels in a window (code label window).

X - Terminate and complete code label record generation.

The code label window (option 'C') displays code labels and code values
related to a selected variable.
Control characters now in the message area are as follows:

N - Position to the next line (next code label and value).

P - Position to the previous line (previous code label and value).

M - Modify selected code value.

X - End overview of code labels and modification of code values.

After selecting option 'X' the variable list will reappear.

Before leaving code label record generation, the user can decide (by
answering the question "Sort on code value?") whether the code label records
should be sorted on code value:

Y - Sort on code value is requested.

N - No sort is requested.


3.4. Dictionary construction  (option D from main export menu)
============================

Once the field selection step and, optionally, the code label generation
are complete, option 'D' is used from the main export menu in order to
create an IDAMS dictionary. Before finalizing the dictionary, the variable
descriptor records are displayed and the variable names can be modified.
Control characters in the message area are as follows:

N - Position to the next line (next variable).

P - Position to the previous line (previous variable).

M - Modify variable name.

E - End correction and show next page of variables. (The page following the
    last one is the first one.)

X - End correction and execute dictionary construction using the modified
    variable descriptor records and exit.

After making the required modifications, type 'X' to create the dictionary.
You will be asked for the DOS directory in which to put the dictionary. The
default directory is that of the data base which is being exported. The
dataset name is then requested, for which the data base name is the default.
The actual dictionary name will be:

   \.DIC

The content of the dictionary will be as follows:

Dictionary descriptor record:

   - First variable number: 1
   - Last variable number: number of variables created for export
   - Variable location format: 1 (starting location and field width)

Variable descriptor records:

   - Variable number: sequence number of variable in the set of variables
     created for export
   - Variable name: field name in FDT (truncated as necessary)/
     occurrence / subfield identifier
     (Names can be modified by the user during dictionary creation)
   - Location
        starting position: filled by interface during the IDAMS dictionary
                           building phase
        field width:       field width in FDT (decreased as necessary or
                           modified by user during field selection)
   - Number of decimal places: requested from user during field selection
   - Type and storage mode: defaults are as follows (in character mode):
        alphanumeric (X) -> 1 (alphabetic)
        alphabetic (A)   -> 1 (alphabetic)
        numeric (N)      -> Blank (numeric)
        pattern (P)      -> 1 (alphabetic)
   - First missing data code: requested from user during field selection
   - Second missing data code: requested from user during field selection
   - Reference number: field tag number
   - Study ID: first three characters of ISIS data base name


3.5. Data export  (option 'E' from main export menu)
================

   D - Export all data base records
   R - Export retrieved records
   X - Exit

Data export can be requested directly after the field selection step or,
more usually, after the dictionary construction step.

When option 'D' is selected, the user is asked to specify a starting and
optionally an ending master file record number (MFN), e.g.

   1 5    would select the first 5 records;
   10     would use all records starting from MFN 10.

The default is to use all records.

If option 'R' is selected, a regular ISIS search function can be specified.
After specifying a search function, the user is then asked for the required
MFN range as above.
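
The MFN range reply can be read as in the following sketch. It is written in
Python purely as an illustration and is not part of the interface; the names
parse_mfn_range and select_mfns are hypothetical, and rejecting replies with
more than two numbers or with an end below the start is an assumption about
validation that the guide does not spell out.

```python
def parse_mfn_range(reply):
    """Turn the reply to the MFN range prompt into (start, end).

    ""     -> (1, None)   all records (the default)
    "10"   -> (10, None)  all records starting from MFN 10
    "1 5"  -> (1, 5)      MFN 1 to 5 inclusive
    An end of None means "up to the last record".
    """
    parts = reply.split()
    if not parts:
        return 1, None
    if len(parts) > 2:
        raise ValueError("expected a starting MFN and at most one ending MFN")
    start = int(parts[0])
    end = int(parts[1]) if len(parts) == 2 else None
    if start < 1 or (end is not None and end < start):
        raise ValueError("invalid MFN range")
    return start, end


def select_mfns(mfns, reply):
    """Keep only the MFNs that fall inside the requested range."""
    start, end = parse_mfn_range(reply)
    return [m for m in mfns if m >= start and (end is None or m <= end)]


# With option 'R', the range is applied to the retrieved records only.
retrieved = [2, 7, 12, 30]
print(select_mfns(retrieved, "5 20"))
```

The same filter serves both menu options: for 'D' the list holds every MFN
of the data base, for 'R' only those returned by the search.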
The DOS directory and the dataset name are specified in the same way as for
the dictionary. The actual data file name will be:

   \.DAT

The field selection and the IDAMS dictionary construction process provide a
mapping from variable length ISIS data base records into fixed length IDAMS
records. The contents of the ISIS fields are transferred according to this
mapping with proper conversions and/or truncations.

Non-existing (empty) ISIS fields will appear in the IDAMS data file as first
missing data values for numeric variables and as a field filled with spaces
for alphanumeric variables. A missing data value is also used if an invalid
numeric value is encountered for a numeric variable or if the width of a
numeric value is greater than the variable width. A value of -1 is used in
the data record if no first missing data code was specified for the
variable. If a value contains decimals, it is edited according to IDAMS
rules.


4. DATASET IMPORT FROM IDAMS  (select 'I' option from ISIS-IDAMS Interface
============================   menu)

The import menu has the following items:

   (C) - Recall previous import specification
   (V) - Save variable selection
   (R) - Erase saved variable selection
   (S) - Variable selection
   (D) - FDT construction/update
   (I) - Data import
   (L) - Code label transformation
   (X) - Exit


4.1. Restore, save, and erase previous selection
================================================
(options 'C', 'V' and 'R' in import menu)

Variable selections made in previous sessions can be restored by using
option 'C' and specifying a dataset name.

The currently defined import specification can be saved for later use with
the help of the 'V' option.

Previously saved import specifications may be erased by using the 'R'
option.

Whenever a recall, save or erase operation is initiated, the user is
requested to confirm the action.


4.2.
Variable selection  (option 'S' in import menu)
=======================

This function allows for the selection of variables to be imported to ISIS
and the specification of data transfer parameters.

We can decide whether we wish to continue a previously interrupted selection
step or start a new one. In the latter case the name of the DOS directory
and the name of the dataset to be imported are requested. Note that the
dictionary and data files of the IDAMS dataset must be named with a common
dataset name followed by the extension DIC and DAT respectively, e.g.
EGG.DIC, EGG.DAT, where only EGG will be given as the dataset name.

After the dataset name has been defined, the starting tag value is
requested. It is used in automatic field tag assignment, which provides tag
values according to the formula:

   field tag = starting tag value + variable number - 1.

For selecting/unselecting the variables the following worksheet is used:

                                                      Variable Number
 ----------------------------------------            ----------
 | V A R I A B L E   S E L E C T I O N  |            |    1   |
 ----------------------------------------            ----------
 Variable Name                           Missing Data Codes
 ----------------------------------      ---------------------------
 | UNIT IDENTIFICATION NO         |      |   NONE         NONE     |
 ----------------------------------      ---------------------------
 Number Of Decimals
 -------
 |  0  |
 -------
 Selection For Import
 -------
 | No  |                                 Variable Type
 -------                                 ---------------------
                                         | Numeric           |
                                         ---------------------

 S - Select | U - Unselect | N - Next variable| P - Previous variable| J - Jump
 T - Terminate selection | L - List selected variables and field | ->

The message area at the bottom of the screen shows the following options
(control characters):

S - Select. Displays the data transfer specification window (described
    below) and selects the variable for import.

U - Unselect. Cancels variable selection, which means that the variable is
    deleted from the selected variables list.
    Data transfer parameters already specified are not erased, but they are
    not valid and are invisible until the variable is selected again.

N - Next variable. Displays next variable from the dictionary.

P - Previous variable. Displays previous variable from the dictionary.

T - Terminate selection. This is the exit for the variable selection
    function, with a summary listing (see function L) before quitting.

L - List selected variables and fields. An overview list is displayed with
    the selected variables and the corresponding fields constructed from
    them.

J - Jump. Jumps n variables forwards or backwards depending on the sign
    of n.

Choosing option 'S' produces the data transfer specification window and a
new set of control characters in the message area:

M - Modify data transfer specification window. This option allows for
    modification of parameters in the data transfer specification window.

C - Cancel the last parameter modification. The original or default values
    are restored.

E - Exit without saving modified parameters.

X - Exit and save modified parameters.

Data transfer specification window
----------------------------------

If the selected variable has code label records in the dictionary, then its
numeric code values can be recoded into the corresponding labels in the data
base. This option is activated by answering Y (Yes) to the question about
the code label recode.

The missing data code handling is the second parameter to be specified in
the window. There are four possibilities:

N - None of the missing data codes is recognized as missing data. All data
    values are handled as valid data.

1 - First missing data code is recognized.

2 - Second missing data code is recognized.

B - Both missing data codes are recognized.

The third and fourth parameters specify the field and subfield into which
the variable will be imported. The default for the field tag is the
automatic tag value assignment ("AUTO"), the formula for which is given
above.
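
As a worked illustration of the automatic tag assignment, the rule (starting
tag value plus variable number minus one) can be sketched as below. The
sketch is in Python for readability only and is not the interface's code;
the function name field_tag and its arguments are hypothetical.

```python
def field_tag(tag_entry, starting_tag, variable_number):
    """Resolve the field tag for one imported variable.

    tag_entry is what the user typed in the field tag parameter:
    "AUTO" or an empty entry requests automatic assignment, any
    number is taken as a user defined tag.
    """
    entry = tag_entry.strip().upper()
    if entry in ("", "AUTO"):
        # Automatic assignment: starting tag + variable number - 1
        return starting_tag + variable_number - 1
    return int(entry)


# With a starting tag of 100, variables 1, 2 and 7 receive
# tags 100, 101 and 106 unless the user overrides them.
print([field_tag("AUTO", 100, v) for v in (1, 2, 7)])
print(field_tag("250", 100, 7))
```

Because the tag depends on the variable number in the dictionary and not on
the order of selection, unselected variables leave gaps in the tag sequence.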
A user defined tag value can be typed in as well as a subfield identifier.

   Field tag: AUTO   - automatic field tag assignment.
              empty  - automatic field tag assignment.
              number - user defined tag number.

   Subfield:  letter - subfield identifier.
              space  - no subfield will be used.

Only the parameters relevant to the selected variable are presented in the
window.


4.3. Field definition table (FDT) construction and updating
===========================================================
(option 'D' in import menu)

In the FDT construction menu we find the following options:

C - Change data base. If a data base has already been selected, this option
    can be used to load a different one.

N - New data base construction. A new data base is constructed on the basis
    of the fields defined in the variable selection step. The result is a
    data base with an empty master file. Data base files
    FST,FMT,PFT,XRF,CNT,N01,L01,N02,L02,IFP,ANY are copied from prototype
    files into files with the proper data base name.

M - Modifying existing FDT. The field definition table of the selected data
    base is modified. The fields defined in the variable selection step are
    added to the FDT. The starting tag value defined in data transfer
    specification can be overridden by the user.

X - Exit. Quit FDT construction.

Content of the FDT records built will be as described below:

   - Field tag: starting tag value plus variable number in IDAMS dictionary
     minus one (implied by AUTO), or user defined tag value
   - Field name: variable name in IDAMS dictionary
   - Field length: field width in IDAMS dictionary
   - Field type: (character mode)
        alphanumeric -> X (alphanumeric)
        numeric      -> N (numeric) if there are no decimals
                     -> X (alphanumeric) if number of decimals is greater
                        than zero or when code values are recoded to code
                        labels
   - Repeatability: blank
   - Subfield/pattern: blank or as specified by the user

Due to user defined tags (and subfields) a nonstandard FDT can be obtained.
There can be more than one field having the same tag value, with different
subfields, according to the source variables, preserving the original
variable names. This nonstandard FDT, however, does not influence the use of
standard ISIS features. Conflicting tag/subfield specifications are
reported, but not prevented.


4.4. Data import  (option I in import menu)
================

After the previous preparatory steps, or independently of the second one, we
can perform the data import. In any case, a target data base and the work
files prepared in the variable selection step must exist before executing
import. The default data base is determined on a priority basis using the
following sources:

   1. data base constructed/modified in the FDT construction step,
   2. data base with the name of the import dataset,
   3. currently opened data base.

The default data base name can be changed by the user (only an existing data
base name is accepted).

The following options are in the data import menu:

U - Update data base records (imported variables are added as new fields).
    When we select this option, imported cases are used for updating data
    base records. The variable number of the variable containing a value
    that matches the MFN of the corresponding record in the data base must
    be specified. If the content of this variable is not a valid numeric
    value or there is no data base record with the current MFN value, then
    the import case is skipped.

A - Add new records. All the cases read are added to the data base. When a
    new data base is being created, this means that imported cases are added
    to the empty data base built in the FDT construction step (see option N
    in 4.3.).

X - Exit. Quit data import from IDAMS.


4.5. Code label transformation  (option 'L' in import menu)
==============================

If we specified code label recode in the variable selection step for at
least one variable, then the interface replaces the numeric code values of
this (these) variable(s) with the corresponding code labels.
Code value/code label pairs, if any, are defined in the IDAMS dictionary.
The last work files prepared in the variable selection step are always used.
The data base to be "recoded" is selected exactly as in the data import
step.

The data base name is the only requested parameter when we start code label
transformation.


5. RESTRICTIONS
===============

1. Maximum number of variables that can be exported is 150.
2. Maximum number of variables that can be imported is 150.
3. Maximum number of code value/code label pairs in code label recode in
   import is 300.
4. Maximum number of code label records per variable to be generated is 199.
5. Maximum total number of code label records is 400.
6. Maximum number of variables with code label generation is 20.
7. Maximum total length of code labels used in code label transformation in
   import is 8192.


APPENDIX

1. Menus used by the interface
==============================

   EXIDN - Main IDIS menu
   EXIDE - Menu of the export function
   EXIDI - Menu of the import function
   EXEXP - Menu used within program IDEXP
   EXFDT - Menu used within program FDTCON
   EXIMP - Menu used within program IDIMP


2. Program files
================

Export function:

   EXVARS - Field selection and variable specification
   CONPR  - Code label generation
   DICCON - Dictionary construction
   IDEXP  - Data export

Import function:

   IMVARS - Variable selection and data transfer specification
   FDTCON - FDT construction or update
   IDIMP  - Data import
   CODLAB - Code label recode

Other:

   CHANG  - Data base change and export specification restore
   SAVEX  - Saving of export specification
   SCREX  - Erasing saved export specification
   RESTR  - Import specification restore
   SAVIM  - Saving of import specification
   SCRIM  - Erasing saved import specification


3. Messages
===========

   EMSG.MST - English message data base master file
   EMSG.XRF - English message data base cross-reference file