Project Management

Project Management

These are personal notes. You are welcome to read them.

Index page

Top-level home page  
Management hints
HTML, javascript
Other
     
More pages here ("technical" pages) Ham radio pages Contents of current page

 


Contents

Note: see page without "_"

 

 


 

 


Choosing Software and Services

 

Le produit

 

Editeur et / ou fournisseur

 

Coût

 

Buying

Steps:

Total Cost of Ownership

Costs vs investements

A purchase can incur an immediate cost or it can be considered an investment. In this case, the cost is spread over several years (the write-off period) with a write-off percentage for each year (the percentage may not be the same for each year). The write-offs correspond to depreciation.

Types of Costs

Purchase costs are one thing. Project costs are another: remember to add costs for analyzing, developing, installing, testing and rolling out hardware and software. Additional costs in the form of annual maintenance on software licences.

 

 

 


Delegation

 

Delegation

Give the opportunity, motives and means
(See the project management diagram)

P. Define the objective
A. Select the person/team
Transfer
  • Request
  • Information/give training
  • Tools
  • Budget
  • Deadline
  • Give authority
C. Follow-up
A. Coach

 

Delegator's Dozen [5]

  1. Set a clear objective, no details yet
  2. Select the delegtee
  3. Train the delegatee
  4. Get input from delegatee, get people involved, get commitment
  5. Assign project and deadline
  6. Provide necessrary guidance: Praise behavior you want repeated
  7. Make Delegation contract: 1 take action, 2 stay in touch, how often, 3 get approval for next phase 4 do only what I tell you to do.
  8. Establish controls
  9. Maintain controls
  10. Provide feedback
  11. Identify lessons learned
  12. Evaluate performance

Motivation

Prevent job from coming back to me:

Bridging the Gap between Business and IT

The gap is a cultural divide between analytical modelers and IT administrators [4].

The way analytical modelers see IT:

IT sees the business side as CPU hogs (because of big or runaway queries) who slow query response times for other users

Solutions mostly involve changing attitudes and behaviors

  1. Liaisons who understand and talk the language of both sides and who can build partnerships with people on both sides.
  2. Foster Dialogue between the two groups throug formal meetings with exchange of grievances and building of common objectives.
  3. Compromise by reaching out to the other side: Analysts should learn more IT and IT should give more access to the data.
  4. to increase mutual understanding, through socializing and
  5. Training: SQL for analysts and better tuning for DBAs in a a mixed environment with casual users, power users, and analysts.
  6. Establish an Analytic Sandbox: Read-only access to any data in the warehouse in s separate environment

 


Sponsorship

 

Do not get in a situation where the IT sponsors outnumber the business users.

 

 


Internet Best Practices

Web site indicators

Even in a web site, group changes into versions and test fully before implementing a version.

Web site success factors:

Two aspects of age-old work methods are disappearing: secrecy and restrictive access to information.

 

 


Data Warehouse Best Practises

 

Notes from Inmon [6]

Four levels in the DWH architecture:

The requirements for the data warehouse are not known in advance. The users (analysts) discover what is available as the data is loaded into the data warehouse. I guess this means that we can be more agile. With my experience at FMC, this means bringing data into an area that looks like the source, with as few transformations as possible:

Monitoring of the data warehouse includes monitoring of the data and of the users, to understand how the warehouse is growing, what data is used and by whom, and what performance is like (query response times).

Documentation includes:

Granularity affects the volume of data and the type of queries. The lower the level of granularity, with more detail, allows more types of analysis of the data. Many details are lost in summarized data. One option is to store detailed data in less expensive storage and summarized data in storage with faster access.

Partitioning breaks data up into smaller, more manageable chunks. This is often done by date, although other criteria can be used too. Partitioning can be done at the system or database level, or in the application. Partitioning in the application makes it easier to have a different physical layout for different years.

Although auditing can be done in the data warehouse, it makes more sense to do it elsewhere, in particular in the source systems.

No big bang approach:

[6, p.39]: Data warehouses are not built all at once. Instead, they are designed and populated one step at a time, and, as such, are evolutionary, not revolutionary. The costs of building a data warehouse all at once, the resources required, and the disruption to the environment all dictate that the data warehouse be built in an orderly, iterative, step-at-a-time fashion. The "big bang" approach to data warehouse development is simply an invitation to disaster and is never an appropriate alternative.

[6, p.41]: The path ... through the building of data marts first ... is short-sighted and leads to a great deal of waste.

Some More Tips

Overview of the Data Warehouse Project

Scope Assessment
of needs
Design Development Rollout Maintenance and further development
   

Technical architecture

Dimensional model then physical model

User application

Installation of products

Development of loading processes

User applications

Put all together and roll out  

 

Notes

 

Some Thoughts on Organizing Business Intelligence

Books have been written on this subject. Here is my humble contribution:

 

Design

Debate of dimensional model versus relational (normalized) model: Over the long term, a relational model is more robust and can handle changing requirements better. Use views to make a relational model look like a dimensional model. However, this assumes that the database server is powerful enough to support the use of many joins. Selective denormalization could be an option, knowing that it is risky to undo normalization efforts. Note that, if data marts are considered sub-sets of a data warehouse, each data mart should be in a dimensional model (star or snowflake).

There is a lot of data out there. But people want information. Yet, we (the designers) do not know what the users want, and nor do the users. The process is iterative. [Loosely from Micheal Scofield]

Note that some reports serve the purpose of making people feel good that everything is under control. This is fine. But if the report is expensive to produce, then point it out. [Michael Scofield]

Integration: when two or more databases are involved, agree on labels (the labels or categories have to have the same meanings) and on measures. Labels are the desciptions of the dimensions and these labels should be consistent across the enterprise. IT cannot enforce consistent labels, only senior management can. Master dimensions that contain common labels are called conformed dimensions.

Ideally, the most detailed or atomic dimensions should be stored. You can always summarize, but you can't re-create details. And don't mix the granularities.

Produce a diagram with the fact table in the middle, and its grain clearly shown, and the dimensions laid around.

Other important parts of the design are treated elsewhere:

International situations:

 

Thoughts about Kimball's kitchen in times of the horsemeat scandal
Kimball's kitchen analogy allows too much complication. Restaurant diners want taste and pleasure. Data warehouse users want to know what is in their plate. With complicated and refined dishes, it is difficult to establish lineage for the data elements. I like horsemeat, but I want it labelled as such. I like other meats, but I want to know which one it is.
--> SIMPLICITY
Yes, it is hard to model the real world. So simplify the concepts and establish clear source to target mappings.

 

Some Thoughts on Master Data Management

Master data management (MDM) helps maintain reference data about important business entities across and organization: single and consistent set of data. The MDM can be the source of data or it can simply store the keys to equivalent records (loosely taken from TDWI, multiple documents)

Dan Power in "Ten Mistakes to Avoid When Designing and Building Your MDM and Data Governance Initiative" (TDWI publication) says that the "build it and they shall come" "does not work in the real world of Master Data Management" (page 2). Instead, start with the problem and address existing issues. Corporate sponsorship is also very important. Concerning ownership, Dan Power says what I have seen very clearly in my experience: "The business has to own MDM and data governance. As tempting as it is to start and finish with the technology, it just doesn’t work. When MDM is driven by the IT group, the business may not understand or buy in (or even realize it’s there)." The business has to own the overall process, and IT simply provides a tool. The MDM is part of data governance "that happens to be facilitated by MDM technology."

 

Some Thoughts on Agile Development in Business Intelligence and Data Integration

Report writing can more easily use agile methods than data integration. Data integration spends a lot of time mapping source to target, both fields and keys. In addition, to be agile, test data for data integration is needed and this is not always easy to generate. (Notes loosely taken from TDWI material.)

Conformed dimensions enable more agile data warehouse development. If senior management, who wants the projects to go faster, supports the effort needed to define conformed dimensions, then this will help speed up development. A minimal number of agreed-upon attributes is enough to start and more can be added as time goes on. Start with a small number of attributes from a few sources, for the customer dimension for example. (See Kimball tips #73, #106, #111 and #135.)

Some notable aspects of agile development

A user review meeting:
"Developers should not solve problems, troubleshoot, or make commitments based on user feedback. The action taken as a result of feedback is prioritized based on an assessment of the effort required to make changes. Others may observe but should not participate in these meetings."[5]

In iteration 0 (see Highsmith according to [5]), the team sets up the environment

In data warehousing, the users are not interested in the ETL, but in the resulting reports. Yet, technical features need to be implemented too. Keep them small so that they fit in one iteration.

Highsmith in [5] says that after each cycle should leave a complete product so that "if project funding were stopped immediately, users would at least have a working product that meets some requirements." "Done" means "done."

For testing, build a copy of the source systems and keep a subset of meaningful samples. Keep a catalog of test cases with descriptions.

Difficulty with evolving data models: developers need them locked down, the process needs flexibility. So, keep the model simple without contingencies for all sorts of possible things in the future. Shortcuts and bad design increases maintenance efforts: try to continuously improve the design. Use refactoring techniques and automated testing to successfully evolve database models

Having a customer on the team is essential. He/she must be the appropriate customer.

 

Facts

One of the first steps is to determine the grain. Facts with different grains can be compared if the dimensions are conformed, but not stored on the same fact table.
There are only three types of grains in datawarehouse [Kimball]:

Generally, facts are additive. Inventories and bank balances are semi-additive, that is the data is added and averaged over the periods by dividing the sum by the number of periods (or returning the max or min). Events are represented by factless facts where the fact table contains only foreign keys and no facts.

In the bus matrix, the columns are the nouns and correspond to the dimensions. The rows are verbs describing activities or processes and correspond to the facts. Be sure to group all elements of a process together even if they cross departments. In the case of hierarchies, the column header shows the most detailed level; a fact requiring a rollup can show with a note in the cell.

Combining two different measures in a query (drilling across in [Kimball]) is best done by sending two queries to the database and combining the results. The dimensions need to be conformed (leitmotiv of Kimball). The results are sort/merged.

The accumulating snapshot has mutiple timestamps. Note that if a time dimension is used, then join multiple aliases of the time dimension table to the fact table. For each of the dates, add a numeric "fact", a 0 or 1, indicating whether the step has been completed. These additive numeric facts make reporting easier. We can also add a field with the lag or time difference between important steps.

Counts:
A way of getting around counts in an outer join: create a case statements to count when nothing is found in the join on one side, when nothing is found on the other side in the case of a full outer join, and when a row would have been returned by an inner join. Kimball's Tip #12

 

Dimensions

See diagrams for types further below.

Type 1 consists of overwriting any existing values with the new value.

Type 2 SCD (Slowly Changing Dimension): use a field for begin_effective_date and end_effective_date. The end_effective_date contains null or a maximum date such as 2099 (preferred by [Kimball]). For ease of query writing, add a flag indicting the current row (redundant with end effective date = 2099). An advantage of the type 2 SCD is that it partitions history (See and Kimball's Tip #8). This means that a fact record knows what the dimension looked like when the fact record was inserted. The historical rows in the fact tables reference the non-current rows in the dimension. Therefore, do not exclude the historical records with a condition on the flag mentionned above. The historical rows in the dimension table are NOT updated. Therefore, to get the current values for historical rows, see below.

The most a type 2 slowly changing dimension would need is:

To report on a dimension with current values, add extra attributes to the dimension that are updated each time a new row is added to the dimension. Kimball calls this type 6 (Kimbal Tip #15). This is in contract to the type 2 SCD that reports on facts linked to the dimensions as they were at the time of insertion (see above).

Aggregate dimension: pre-calculate facts with one dimension less or higher up in the hierarchy

"Junk" dimension: dimension table with miscellaneous, unrelated columns

Snowflake compared to star: snowflakes conform to third normal form and stars to second normal form. Star schemas are more optimal for DW/BI query performance. The update issues linked to the second normal form are handled through the data warehouse update process. Space saved through snowflakes is generally negligeable compared to the increased query performance.

Outriggers are dimension tables linked to other dimension tables, but not in a snowflake schema. They have their purpose, but use with moderation.

Bridge tables are used to represent many-to-many relationships or ragged hierarchies. They increase complexity. Using a primary sales person or a top member in the hierarchy represents a possible compromise to remove the need for a bridge table in most cases.

Degenerate dimensions are dimensions with just only one attribute. It is not worth building a table with this attribute and a surrogate key. So put the value directly in the fact table.

Store the data in the lowest granularity possible, and build summary reports. If only summaries are stored, you will never be able to drill down.

If each department defines dimensions in the Kimball approach, yes, there is a risk of having silos. That is why, if I understand correctly, Kimall et al. advocate building process-centric dimensions dimensional models.

Kimball et al. say that starting the data warehouse design from a report puts you at risk of unrealistic expectations from the users. Instead, add dimensions one at a time.

 

Type 1

KEY VALUE
123 ABC
789 XYZ

Insert 456, update 123, delete 789

VALUE
123 BCD
456 GHI

SELECT KEY, VALUE FROM TABL;

No history

 

Type 2

KEY VALUE BEGIN_EFF_DT END_EFF_DT CURR_IND
123 ABC 1988/01/01 2099/12/31 Y
789 XYZ 1988/01/01 2099/12/31 Y

Insert 456, update 123, delete 789, on 23 Apr 2022

KEY VALUE BEGIN_EFF_DT END_EFF_DT CURR_IND
123 ABC 1988/01/01 2022/04/23 N
123 BCD 2022/04/23 2099/12/31 Y
456 GHI 2022/04/23 2099/12/31 Y
789 XYZ 1988/01/01 2022/04/23 N

To get the same results as for type 1:
SELECT KEY, VALUE FROM TABL WHERE CURR_IND='Y';

To get the historical results from a previous date, e.g. 2020:
SELECT KEY, VALUE FROM TABL WHERE TO_DATE('2020/01/01', 'YYYY/MM/DD') BETWEEN BEGIN_EFF_DT AND END_EFF_DT;

 

Type 3

KEY VALUE OLD_VALUE
123 ABC
789 XYZ

Insert 456, update 123, delete 789

KEY VALUE OLD_VALUE
123 BCD ABC
456 GHI
789 XYZ

To get the same results as for type 1:
SELECT KEY, VALUE FROM TABL;

To get the historical results, but without any distinction of date:
SELECT KEY, OLD_VALUE FROM TABL;

 

Data Warehouse Loading Processes

 

Date and Time Dimension

Use a calendar dimension (with holidays, etc). When hours/minutes/sec is needed, then have a separate timestamp in the fact table: consider this a special type of fact. Therefore two fields: foreign key to the calendar dimension and timestamp field. [Kimball]

If there is no need to keep the time of day in the data warehouse, then keep the date rounded to the day as a "meaningful" surrogate key (the only exception to non-meaningful keys) and point to a date dimension that keeps track of the days of the week and the company holidays. Note that Ralph Kimball says that even the time dimension should not be meaningful, but he does admit that "the time dimension is the ONLY dimension that has any logic to the surrogate keys and is the only one we dare place application constraints." Tip #14.

Although Kimball insists that the keys should be meaningless surrogate keys, I think that dates should be an exception. It is tedious to have to join to a separate table to get the date for every date field. Handle dates like degenerate dimensions and keep the meaningful dates in the fact table, with a calendar dimension that you can join to if necessary.

If there is a need to keep track of the time of day, then keep one field as above with the date without the time and another field either with just the time or with the date and time. If the field contains just the time, then consider a time dimension with indications of work hours, shifts, etc.

 

History

The tracking of history requires that each piece of data have a defined begin and end dates of validity.

Source System Analysis

See notes on:

It is generally a good idea to create the role of Source Data Steward, especially for complex systems with data important to other business applications.

 

Terms related to next generation data warehouses (DWH)

 

Changed Data Capture

Work with technical staff who know how the source system updates data

 

Lessons Learnt from the Trenches

 

Data Lakes

 

 

 


Data Integration Design

With Source System Analysis, Target System Analysis, and mapping.

See also notes on Structured Analysis and Data Warehouse Best Practices.

Methodology

Documentation should contain:

 


Enterprise Architecture

 

Zachman Framework for Enterprise Architecture
Order Layer Stakeholder group What (Data) How (Function) Where (Network) Who (People) When (Time) Why (Motivation)
1 Scope (contextual boundary) Visionary /
Planner to the business
List of things important to the business
Entity = a business thing
List of processes the business performs
Process = a business process
List of locations in which the business operates
Node = major business location
List of organizations important to the business
People = major organizational unit
List of events significant to the business
Time = major business event or cycle
List of business goals/strategies
Ends/means = major business goal or strategy
2 Conceptual Business Model Owner Semantic or Entity-relationship Model
Business entities and business relationships
Business Process Model
Business processes and I/O as business resources
Business Logistics System
business locations as nodes and business linkage as links
Work Flow Model
organizational units as people and work products
Master Schedule business events as time, business cycles Business Plan
End = business objective; means = business strategy
3 Logical System Model Designer Logical Data Model
data entities and data relationships
Application Architecture
process = application function; I/O = user views
Distributed System Architecture
Nodes are processors, storage units, etc; links are line characteristics
Human Interface Architecture
People = roles; Work = deliverable
Processing Structure
Time = system event; cycle = processing cycle
Business Rule Model
end = structural assertion; means = action assertion
4 Pysical Technology Model Builder Physical Data Model
segments/tables and pointers, DB relationships
System Design
Computer functions and data elements
Technology Architecture
Nodes are hardware or software systems, links are line specifications
Presentation Architecture
people are users, work are the screen formats
Control Structure time = execute, cycle = component cycle Rule Design
end = condition, means = action
5 Component Configuration Implementer Data Definition Program Network Architecture Security Architecture Timing Definition Rule Specification
6 Functioning Enterprise Instances Worker Data Function Network Organization Schedule Strategy

References:

 


Risks

IT Risks

Project Risks

Attitude towards risk

In relation to technological innovation, there are the following types of profiles:

Risks

Risk = impact (e.g. delay or costs) * probability
This should lead to action to avert the risk or reduce the impact.

Stakeholders

Requirements

 

Risk Response Strategies:

 

 

 


Miscellaneous

The amateur knows what to do;
the professional knows what not to do.

Sources:

 

Some language tips: