Informatica Notes
These are my personal notes that I use as a quick help in my work.
You are welcome to read them.


All areas that need formatting or other work are marked with "todo".


Basics

connect to the repository

 

to view: drag into the workspace

When exiting: close all tools (menu repository > close all tools), otherwise some objects will be locked.

Open the folder with right-click then "open" (even if the contents are visible)
Choose tool with little icons above the grey area

 

Use the session start time for time stamps, not sysdate.
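A minimal sketch in an output port (the port name o_LOAD_TS is hypothetical): SESSSTARTTIME is the built-in variable for the session start time and stays constant for the whole run, while SYSDATE is re-evaluated as rows flow through, so rows could get different stamps.

o_LOAD_TS (output, date/time):  SESSSTARTTIME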

 

Stored procedure:
Bug in SQL Server: remove the "I" checks for output parameters.
The "R" column indicates the returned value. A check should show in the "O" column too.
Remember to set the "connection information" in the next tab ($Source or $Target).
Then, this variable must be set in the task properties, under mapping > connection $Source/$Target connection value.
Status codes are also sent back by the stored procedure (success, failure); these are not visible but are acted upon by PowerCenter.
Use an unconnected stored procedure transformation for running a stored procedure before or after a session, pre- or post-load of the source or target, handling null values, or running nested stored procedures.
Call an unconnected procedure with :SP.the_procedure(PROC_RESULT). PROC_RESULT is a placeholder that captures the procedure's return value; it does not correspond to any port.

mapping variable:
menu mapping > parameters and variables

Shared folder:
Copy object to the shared folder. Save. Then copy back to original folder.
Note that with the ctrl key down, it copies instead of creating a shortcut.

Copy mapping:
open the mapping, then use menu mappings > "copy as...".
Use SMALLINT for booleans.

Debugger:
in menu > mapping: start debugger
Choose "use an existing session".
Use "discard.." so as not to write to the database
Step through. See results in the "session log" tab
When done, stop the debugger.

source qualifier:
sort: add "number of sorted ports": this will sort the first n ports.
or use SQL override.
If the source has only a few tables, put the join in the "User-Defined Join" field instead of doing a SQL override.

Classify a port's datatype as character ('C'), numeric ('N'), or date ('D'). The trick: REPLACESTR removes the listed substrings, so the result differs from DATATYPE only if one of them was present.

DECODE(TRUE,
  DATATYPE != REPLACESTR(0, DATATYPE, 'char', 'text', 'lob', 'string', ''), 'C',
  DATATYPE != REPLACESTR(0, DATATYPE, 'int', 'decimal', 'long', 'float', 'money', 'numeric', 'number', 'real', 'double', ''), 'N',
  DATATYPE != REPLACESTR(0, DATATYPE, 'datetime', 'date', 'timestamp', ''), 'D',
  '?')

Precision (see Designer Guide)

When dragging from one folder to another:

  Source folder              Target folder       Result
  Open, object not saved     Open                Copy object
  Open, object saved         Open (has to be)    Create a shortcut

Error management

Functions that can be used in an expression to handle errors: ERROR('message') skips the current row and writes the message to the session log; ABORT('message') stops the session.
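A sketch (the port name in_AMOUNT is hypothetical): reject rows with a null amount, otherwise pass the value through.

IIF(ISNULL(in_AMOUNT), ERROR('amount is null'), in_AMOUNT)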

 

Propagate the error status along the chain: CA7 --> script on Windows --> pmcmd .... -paramfile ...
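A sketch of the Windows script step, assuming the service, domain, folder, and workflow names: with -wait, pmcmd returns a non-zero exit code when the workflow fails, and the script passes it back up to the scheduler.

pmcmd startworkflow -sv THE_INT_SVC -d THE_DOMAIN -u %INFA_USER% -p %INFA_PASS% -f THE_FOLDER -paramfile C:\params\wf_the_flow.txt -wait wf_the_flow
if %errorlevel% neq 0 exit /b %errorlevel%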

Propagate failure: check "Fail parent if this task fails" on the tasks, then check the status in each link between the sessions.

tip: default session load type: tools > options > misc > target load type

tasks: sequential buffer length: the buffer for just one line.

paths for server, sessions, values for $variables: menu server > server configuration > choose config > button edit > button advanced

 

Mapping Parameters and Variables

Mapping parameters do not change. Set them at the beginning. The parameter file is generally used for this. Otherwise set the value as a default value in menu mappings > parameters and variables > Initial value. To pass workflow value into the mapping, in the task properties, choose components tab > "Pre-session variable assignment."
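A sketch of a parameter file (folder, workflow, and session names are hypothetical); the [folder.WF:workflow.ST:session] header scopes the values:

[TheFolder.WF:wf_the_flow.ST:s_m_the_mapping]
$$THE_PARAM=2024-01-01
$$WKFL_THE_FILE_NAME=extract_today.txt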

Keep the column "IsExprVar" as false. Set to true if the variable contains an expression to be evaluated, such as a condition or an expression that varies.

To return a value up to the workflow level, use a mapping variable (not parameter). Set the value with SETVARIABLE($$RETURN_VALUE, 'something') in a place where the value is actually used, not in a port with no input and no output. In the "Post-session on success variable assignment," assign the value to a workflow parameter. The mapping variables change in the mapping and the value is kept from session to session, if not overridden by a parameter file. Use these functions to change the value: SetMaxVariable, SetMinVariable, SetCountVariable, and SetVariable.
SetVariable($$VARIABLE_NAME, new_value) (the corresponding port will take on the value).

I can't assign variables to a reusable task. Need to look into this more.

String values in the parameter file do not need quotes around them.

Workflow variables are created in menu workflows > edit > variables tab. They get initial values and can be defined as persistent. Values can also be assigned in an "Assignment" task.

A workflow variable can be used as the name of a target file: $$WKFL_THE_FILE_NAME. Then put it in an email (below are the attachment and the simple display of the variable's value): %a<$PMTargetFileDir/$$WKFL_THE_FILE_NAME> The value of the variable is $$WKFL_THE_FILE_NAME

The parameter files are defined in the session properties, first tab, in "Parameter filename".


Design Document

 

The design document will typically contain the following elements:

For each mapping, provide the following in addition to the mapping spreadsheet:

As a developer, ask these questions (as a designer, make sure the answers are defined):

Include the information described in the Change Management section below.

 


Transformations

Active or passive transformations: active transformations change the number of rows; passive transformations keep the existing number of rows.

Connected transformations are connected to the dataflow and to other transformations. Unconnected transformations are called from within another transformation.

Most transformations are connected; some can be unconnected too. Most have two groups of ports: input and output.

Aggregator (active): Performs aggregate calculations. Define an expression for the aggregate. Can use local variables (see below). Try to use pre-sorted data if possible. Aggregators do not sort, but use a clustering algorithm that appears to sort the output.

Application Source Qualifier (active): Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session.

Custom (active or passive): Calls a procedure in a shared library or DLL.

Expression (passive): Calculates a value. Define the row-level calculation. Can use local variables (see below).

External Procedure (passive): Calls a procedure in a shared library or in the COM layer of Windows. Can be unconnected.

Filter (active): Filters data. Define the expression that returns true or false, with true indicating that the data can go through.

HTTP (passive): Connects to an HTTP server to read or update data. Contains input, output and header groups.

Mapplet Input (passive): Defines mapplet input rows. Available in the Mapplet Designer.

Java (active or passive): Executes user logic coded in Java. The byte code for the user logic is stored in the repository.

Joiner (active): Joins data from different databases or flat file systems. See more below.

Lookup (passive): Looks up values. See more below.

Normalizer (active): Source qualifier for COBOL sources. Can also be used in the pipeline to normalize data from relational or flat file sources. For bringing n columns into one, define one column as occurring n times. Link the n columns to the n input occurrences of the one column. This goes into one output column. The GCID_col goes from 1 to n and identifies which column was the source. The GK_col outputs a unique identifier, which is probably redundant with a primary key used for the target.

Mapplet Output (passive): Defines mapplet output rows. Available in the Mapplet Designer.

Rank (active): Limits records to a top or bottom range. Define an expression for the ranking. Can use local variables (see below).

Router (active): Routes data into multiple transformations based on group conditions. One input group, multiple output groups, with one default group. Expressions returning true or false define the groups. A row for which all expression results are false goes to the default group. When linking, drag the input fields to get the names right, then drag the appropriate output fields to make the links.

Sequence Generator (passive): Generates primary keys. Normally, "start value" = 0, "increase by" = 1, "current value" = 1, "reset" = no. To start at 1 at each execution, set "current value" to 1 and put a check in "reset". Best practice is to make sequence generators reusable. Connect only the NEXTVAL port in mappings.

Sorter (active): Sorts data based on a sort key.

Source Qualifier (active): Represents the rows that the Integration Service reads from a relational or flat file source when it runs a session. Only ports that are linked to the next transformation are generated in the select statement; if fields do not show, it is because a link should be made with the next transformation. It is best to first generate the SQL before changing the "from" and "where" clauses so that the ports will line up with the fields of the SQL statement.

SQL (active or passive): Executes SQL queries against a database.

Stored Procedure (passive): Calls a stored procedure. Can be unconnected. Best practice is to make stored procedure transformations reusable. See more information in the "basics" section.

Transaction Control (active): Defines commit and rollback transactions. An expression determines what type of transaction is performed (commit before or after, rollback before or after, or no change).

Union (active): Merges data from different databases or flat file systems. Multiple input groups, one output group.

Update Strategy (active): Determines whether to insert, delete, update, or reject rows. An expression returns a numerical value that determines which of the four strategies to use. Select "forward rejected rows" to pass the rejects to another transformation. Use these variables in the "update strategy expression": DD_INSERT (0), DD_UPDATE (1), DD_DELETE (2), DD_REJECT (3). For updates, the key is determined by the definition of the target, i.e. define the primary key in the target definition.

Session-level update strategy is defined by the session's "treat source rows as" property. The insert, update, and delete options act on all rows; the data driven option uses an update strategy transformation within the mapping.
Only inserts, or only deletes, or only updates (mutually exclusive): set "treat source rows as" to insert, delete, or update, and make sure the insert/delete/update option is selected for all target instances in the session.
Different operations on different rows: add an update strategy transformation, select "data driven" for "treat source rows as", and make sure that insert, delete, and one of the update options are selected for each target instance in the session.

XML Generator (active): Reads data from one or more input ports and outputs XML through a single output port. Multiple input groups and one output group.

XML Parser (active): Reads XML from one input port and outputs data to one or more output ports. One input group and multiple output groups.

XML Source Qualifier (active): Represents the rows that the Integration Service reads from an XML source when it runs a session. Multiple input and output groups.

XML Target Definition: Multiple input groups.
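A minimal "update strategy expression" sketch (the port name in_EXISTING_KEY is hypothetical): insert when an upstream lookup found no existing key, otherwise update.

IIF(ISNULL(in_EXISTING_KEY), DD_INSERT, DD_UPDATE)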

What you do in a transformation:

When renaming ports, the ports are automatically renamed in the expressions.

Add comments with "--" or "//", or with the comments button.

Expressions are linked to the output port.

Ports are evaluated in the following order: input ports, variable ports and output ports.

Equivalence between numeric and boolean: true is non-zero, false is zero.

Initial values: 0 for numeric, '' for string, a low date for date/time. Input ports are null by default.
A blank default value means NULL.

Local variables: variable ports (the "V" column) are evaluated after input ports and before output ports, in port order, and they keep their values from row to row. This makes previous-row comparisons possible (see the example at the end of the Functions section).

 


Lookups

 

Lookup:
Define the lookup table.
Then bring in an outside field (or fields) for comparison; prefix them with "IN_".
Create the condition (see separate tab)
For unconnected lookups, the column that you want back must be checked in the "R" column. It must be unique.
Lookups should be made reusable.

Unconnected Lookup:
Import the definition
Define the input ports
Define one of the ports as the return port by clicking in the "R" column
Only one column can be returned for unconnected lookups
Write the following expression: :LKP.the_transf_name(input_ports)
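A sketch of the call from an expression transformation (names hypothetical): do the lookup once in a variable port, then default to -1 when no row matches.

v_CUST_KEY (variable):  :LKP.lkp_customer(in_CUSTOMER_ID)
o_CUST_KEY (output):    IIF(ISNULL(v_CUST_KEY), -1, v_CUST_KEY)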

Dynamic lookup:
turn on "dynamic lookup chache" and "insert else update"
Bring in the key and the type 2 fields. Then assoicate the lookup fields with the type 2 fields.
Put a selection on current data in the sql override (end_effective_date is null or end_effetive_date = 31/12/9999)
Only equality is allowed in the condition.
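With a dynamic cache the lookup also exposes a NewLookupRow port (0 = no change, 1 = row inserted into the cache, 2 = row updated). A sketch of the downstream update strategy expression, assuming that port is wired through:

IIF(NewLookupRow = 1, DD_INSERT, IIF(NewLookupRow = 2, DD_UPDATE, DD_REJECT))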

Lookup Tips:

Must have at least one input port.
Properties: SQL override, table name, caching (flat files always cached), cache directory, cache persistence, cache size,
policy on multiple matches, connection information, and many others.
Best practice is to make lookups reusable.

 


Joiners

Joins data from different databases or flat file systems.
Contains master input group, detail input group and output group. Try to use pre-sorted data if possible.
Define the data set with lower cardinality as the master.
Normal joins and master outer joins perform faster.
A master outer join discards unmatched master records and keeps all records from the detail source.
A detail outer join discards unmatched detail records and keeps all records from the master source.
The error "Joiner Transformation has invalid join condition: not all of the ports could be found" disappeared when the join types were tweaked.
Join type summary (the master checkbox marks which input is the master; make it the lower-cardinality set, the detail is the higher-cardinality set):

  Join type      Unmatched master rows   Unmatched detail rows   Performance
  Normal         dropped                 dropped                 faster
  Master outer   dropped                 kept                    faster
  Detail outer   kept                    dropped
  Full outer     kept                    kept

 

MASTER outer join (better performance): keeps all detail (higher-cardinality) records, drops unmatched master records.
DETAIL outer join: keeps all master (lower-cardinality) records, drops unmatched detail records.

 

 


Flat Files

Create a flat file definition from a table:


Database Connections

To renew the definition of a source or a target, simply re-import the table and choose the "replace" option. Remember to renew the definition of the primary key.

The difficulty is in updating the lookups. This has to be done manually. See the Lookups section.

 


Functions

 

substr(a_string, start_pos, length)
Returns a substring. start_pos = 0 or 1: first character. start_pos = 2: second character.
start_pos < 0: count from the end of the string.
If length <= 0, returns an empty string. If length is omitted, returns from start_pos to the end.
instr(a_string, b_string [, start [, occ_num]])
Searches in a_string for an occurrence of b_string.
Starts at start (1=beginning of string, negative counts from the end)
If occ_num > 1, then take the 2nd, 3rd... occurrence. Must be 1 or more.
instr('whatever', chr(39), 1, 1)
Result: 0 --> not found; 1 or more --> position of b_string in a_string
is_date(a_date, 'MM/DD/YYYY')
The date format is mandatory: you must supply a format! Be exact: if hours exist, use 'MM/DD/YYYY HH24:MI'. Note that 0:00 exists in the 24-hour format but not in the 12-hour format. If the month or day has just one digit, MM or DD still works; however, a single M or a single D will not work.
iif(isnull(THE_DATE), '', iif(IS_DATE(THE_DATE, 'MM/DD/YYYY HH12:MI AM'), TO_CHAR(TO_DATE(THE_DATE, 'MM/DD/YYYY HH12:MI AM'), 'MM/DD/YYYY'), ''))
Conversion functions. To prevent transformation errors, test the validity of the string as a date before converting. Notice the 12-hour format with the AM/PM indicator
in(in_port, 'A', 'B', 'C', case_flag)
Returns 1 (true) if the input value is in the enumeration.
case_flag=0 ==> case INsensitive;
case_flag=1 ==> case sensitive (actually, case_flag != 0)
decode(value, first_equiv, first_result, second_equiv, second_result [, default])
Returns the first matching result
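For example (the port name in_STATUS is hypothetical), map status codes to labels with a default:

DECODE(in_STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown')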
variable port: iif(instr(ltrim(in_port), ' ', 1, 1)>0, instr(ltrim(in_port), ' ', 1, 1), length(ltrim(in_port)))
out port: iif(IS_NUMBER(substr(ltrim(in_port), 1, v_the_var_port)),
        rtrim(substr(ltrim(in_port), v_the_var_port + 1, length(in_port))),
        ltrim(rtrim(in_port)) )
Remove the first word only if it is a number. The variable holds the length of the first word.
lpad(string, up_to_length, 'x')
Pad the string on the left with 'x' until it reaches a total length of up_to_length.
REPLACECHR(caseFlag, input_string, old_char_set, new_char_set)
Replace a character with another
caseFlag: 0 --> case INsensitive; not 0 --> case sensitive
Example: REPLACECHR(0, field_name, chr(10) || chr(13), '  ')
REPLACESTR(caseFlag, input_string, old_string1, [old_string2, ... old_stringn,] new_char_set)
Replace one of several strings with another
caseFlag: 0 (or null) --> case INsensitive; not 0 --> case sensitive
Example: REPLACESTR(0, field_name, '"', '""')
'My family' || CHR(39) || 's house' --> My family's house
Insert a single quote in a string with CHR(39)
MAKE_DATE_TIME(y, m, d [, h, m, s])
Create a date. Hours, minutes and seconds are optional
Port         I  O  V   Expression
the_key      x  x  .   the_key
v_the_test   .  .  x   IIF(the_key = v_prev_key, 'same', 'new')    <-- this is a variable!
v_prev_key   .  .  x   the_key
o_the_test   .  x  .   v_the_test
Compare a value to that of the previous row: v_the_test is evaluated before v_prev_key is updated, and output ports are evaluated after variables. You may want to test for a null v_prev_key (first row) too.

 

Comma-separated values (csv)

First option is with no text delimiter in the file properties. Selectively surround fields containing commas and end-of-line characters with double quotes, and repeat any existing double quotes:
'"' || REPLACESTR(0, descr, '"', '""') || '"'
Handle any fields with leading zeros or spaces by making them a formula (only works with an Excel target):
'="' || field_with_leading_zeros || '"'

Another option is to set the text delimiter to double quotes in the file properties. Repeat any existing double quotes in both the descriptions and the formula, but don't explicitly surround fields with double quotes:
REPLACESTR(0, descr, '"', '""')
Handle any fields with leading zeros or spaces by making them a formula (only works with an Excel target). Use repeated double quotes:
'=""' || field_with_leading_zeros || '""'

See also notes on Excel

 

Aggregate functions:

Nest up to two aggregate functions, e.g. SUM(AVG(a_field))
Nulls are treated as either null or 0 depending on the setting of the Integration Service.
All of the aggregate functions take a second argument that is a filter (percentile takes up to three arguments): FCTN(value [, filter condition]). The Integration Service selects the row if the filter evaluates to true, and not if the filter evaluates to false or null.
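For example (the port name in_AMOUNT is hypothetical), sum only the positive amounts:

SUM(in_AMOUNT, in_AMOUNT > 0)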


Change Management

There is no convenient functionality to detect changes to the underlying sources and to determine the impacts of the changes. (Long live Genio!) A certain amount of discipline is needed to document the mappings in such a way that the impacts of changes can be determined. These are ideas for documentation.

Ideally, add these to the Design Document Section.

Some common problems:


Workflows

In the workflow manager:
create a workflow, then inside that create a task.
There is always a "start" task
In the "mapping" tab of the task, define the source and the target.
In the target, define the "target load type" as normal (not bulk)
Use "truncate target table option" to reload from start each time
There is also a "delete" option, but what does this do?
Do not override in the session
Generally make reusable sessions.

Define source and target in "connections". Define source file location and name in "files, directories and commands".
Use variables $Source and $Target and define the target in the "$Target Connection Value" (put $Target in the places where Informatica put the actual value).

Schedule a workflow

In the Workflow Designer, drag the workflow into the workspace. Then: menu workflows > edit > scheduler tab > click the button next to the scheduler box. Remember to save.

Workflow monitor:
execute: connect to the repository then to the scheduler

To find the session log in the Workflow Monitor,
right-click on the session (not the workflow, not the green bar).
Partition points are in the second "tab" of:
workflow manager, task, task properties, tab "mapping", button "partitions" in the lower left corner.

Error Detection

Attachment in an email:

See attached file.
%a</u01/usr/target/e_file.txt>

Other codes:

 

 

Extras in Tasks

Generally, all of this happens in the "Components" tab of the tasks editor.

Rename a file before running the task
Non re-usable pre-session command:
cp /u01/after_cdc/export_file.out /u01/arch/export_file`date '+%Y_%m_%d_%H_%M_%S'`.out
Or copy the current file to the previous file before doing a Change Data Capture
cp /u01/curr/export_file.out /u01/prev/export_file.out
Send email on failure
Configure an on-error email in "components"
command task
In the commands, call a script: /u01/shell_scripts/check_import.sh
Send an email with an attachment
Create a dummy mapping. In the success email or the failure email, define the body as:
See attachment: %a</u01/the_file>

 

 


Repository Manager

Create a folder:
In repository manager, menu folder > create

 


Profiling

For group-by in auto-profile:

If a field is too long, the auto-profile does not do the min/max and distinct/duplicate analysis. Under some conditions, the domain analysis is not done. Is it when the profile runs inside a workflow with other profiles?

This function (oops, which function?) can infer domains for columns with a numeric datatype with a precision of 28 digits or less or a string datatype with a precision of 200 characters or less. This function can also infer domains for columns of the Date/Time datatype.

 


XML Sources and Targets

 

Creating an XML View

 


Problems and Solutions


Tips

Reset a sequence generator:

-- Instructions: see below
-- Don't worry: nothing is updated by this script. Copy and paste to update.
declare @the_widget_type numeric(5), @the_attr_id numeric(5), @the_widget_id numeric(10)
set @the_widget_type = 7
set @the_attr_id = 4

-- To use this, set the variable "@the_widget_id" to 0
-- Look at the list and choose
-- Then set the variable to the appropriate ID
-- Don't worry: nothing is updated by this script. Copy and paste to update.
set @the_widget_id = 122

print 'Widgets'
select wa.WIDGET_ID
     , left(w.widget_name, 20) as widget_name
     , left(subj.SUBJ_NAME, 20) as subj_name
     , left(a.ATTR_NAME, 15) as attr_name
     , wa.ATTR_VALUE
   from opb_widget_attr wa
     inner join
   opb_attr a
     on wa.widget_type = a.object_type_id and wa.attr_id = a.attr_id
     inner join
   opb_widget w
     on w.widget_id = wa.widget_id
     inner join
   opb_subject subj
     on w.subject_id = subj.subj_id
   where wa.widget_type = @the_widget_type
     and wa.attr_id = @the_attr_id
     and (wa.widget_id = @the_widget_id or 0 = @the_widget_id);


print '--This is the update sql to use:'
print 'update opb_widget_attr set attr_value = ''1'' '
print ' where widget_type = ' + cast(@the_widget_type as varchar(10))
print ' and attr_id = ' + cast (@the_attr_id as varchar(10))
print ' and widget_id = ' + cast(@the_widget_id as varchar(10)) + ';'

print ' '

print ' '
print ' '
print 'These queries are just to verify that the widget ID and the attribute ID have not changed'
select left(object_type_name, 20) as object_type_name
     , object_type_id
     , 'assumed to be 7'
   from opb_object_type
   where object_type_id = @the_widget_type;
select left(attr_name, 20) as attr_name
     , attr_id
     , 'assumed to be 4'
   from opb_attr
   where object_type_id = @the_widget_type
     and attr_id = @the_attr_id;

Dependencies in a mapping:

It is possible to see dependencies between widgets. However, it is not easy to see the link between input ports and output ports inside the widgets. The challenge is with the expressions: they are stored but not parsed. A possible option: select the expressions and manually parse them to create links between input ports and output ports. Another option: find where the "analyze" functionality is located in the repository.

A widget is a transformation. An instance is a widget inside a mapping. Here are some of the views:

Export Target and Import as Source

 

Some auditing values in mapping

 

 


Informatica Data Quality

Six dimensions of data quality. Compare to notes on data quality

Data analysis:
  Analysis: identify quality-related features of fields, with emphasis on completeness, conformity, and consistency.
  Matching: identify equivalent or related data records; identify inaccurate records by comparing to a reference data set.
Data enhancement:
  Standardization: address issues of completeness, conformity, and consistency.
  Consolidation: address issues of accuracy, duplication, and integrity.

Scorecarding: measures data quality for one field
Grading: measures data quality for records

A plan consists of


Administration

Modify the configuration:
infasetup command_name arguments
echo %errorlevel%
    (if errorlevel > 0, the command failed)

To define a domain:

 

Create an environment variable called INFA_DOMAINS_FILE and set it to the domains file, e.g. C:\informatica\PowerCenter8.1.0\domains.infa on Windows or /u01/informatica/powercenter/domains.infa on Unix.

 

To see session logs in a file format, choose the option for backward compatible session logs. The default location for session logs is /u01/informatica/powercenter/server/infa_shared/sessLogs.

The node configuration, including database connection details, is in /u01/informatica/powercenter/server/config/nodemeta.xml

 

Informatica security: permissions of a folder:

A folder is linked to one group and users are linked to several groups.