Tuesday, 29 July 2014

Types of Dimensions in Data warehouse

A dimension table consists of the attributes about the facts. Dimensions store the textual descriptions of the business. With out the dimensions, we cannot measure the facts. The different types of dimension tables are explained in detail below.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined.
Eg: The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In the fact table we need to maintain two keys referring to these dimensions. Instead of that create a junk dimension which has all the combinations of gender and marital status (cross join gender and marital status table and create a junk table). Now we can maintain only one key in the fact table.
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and doesn’t have its own dimension table.
Eg: A transactional code in a fact table.
Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are called role-playing dimensions. For example, a date dimension can be used for “date of sale”, as well as “date of delivery”, or “date of hire”.

SCD type 4 – Fast growing dimension

In normal scenarios the dimension tables tend to grow slowly. That is the reason, they are called slowly changing dimensions. Example: Location attribute of a customer changes very rarely. However the salary band of a customer is likely to change every year. These type of attributes causes the customer dimension table to grow rapidly. SCD type 4 provides a solution to handle the rapid changes in the dimension tables. The concept lies in creating a junk dimension or a small dimension table with all the possible values of the rapid growing attributes of the dimension.
Example: Take a look at the following dimension attributes of customer
C_Id
Name
Location
Age_band
Salary_band
The age band and salary band are going to change frequently. So create a separate small dimension table with only these attributes. The number of possible values of age band will be around 20 and salary band will be around 30. The total number of rows in the new dimension table are 20×30=60.
The new tables are
Table name : customer
C_Id
Name
Location

Table name: customer_mini
M_id
Age_band
Salary_band

Fact table:
Id
C_Id
M_Id
----
The dimension key of the new table should be maintained in the fact table. This way we can handle the rapid changes in the dimension table.

Data Warehouse Design Approaches

Data warehouse design is one of the key technique in building the data warehouse. Choosing a right data warehouse design can save the project time and cost. Basically there are two data warehouse design approaches are popular.
Bottom-Up Design:
In the bottom-up design approach, the data marts are created first to provide reporting capability. A data mart addresses a single business area such as sales, Finance etc. These data marts are then integrated to build a complete data warehouse.  The integration of data marts is implemented using data warehouse bus architecture. In the bus architecture, a dimension is shared between facts in two or more data marts. These dimensions are called conformed dimensions. These conformed dimensions are integrated from data marts and then data warehouse is built.
Advantages of bottom-up design are:
  • This model contains consistent data marts and these data marts can be delivered quickly.
  • As the data marts are created first, reports can be generated quickly.
  • The data warehouse can be extended easily to accommodate new business units. It is just creating new data marts and then integrating with other data marts.
Disadvantages of bottom-up design are:
  • The positions of the data warehouse and the data marts are reversed in the bottom-up approach design.
Top-Down Design:
In the top-down design approach the, data warehouse is built first. The data marts are then created from the data warehouse.
Advantages of top-down design are:
  • Provides consistent dimensional views of data across data marts, as all data marts are loaded from the data warehouse.
  • This approach is robust against business changes. Creating a new data mart from the data warehouse is very easy.
Disadvantages of top-down design are:
  • This methodology is inflexible to changing departmental needs during implementation phase.
  • It represents a very large project and the cost of implementing the project is significant.

Data Warehouse Dimensional Modelling (Types of Schemas)

There are four types of schemas are available in data warehouse. Out of which the star schema is mostly used in the data warehouse designs. The second mostly used data warehouse schema is snow flake schema. We will see about these schemas in detail.
Star Schema:
A star schema is the one in which a central fact table is sourrounded by denormalized dimensional tables. A star schema can be simple or complex. A simple star schema consists of one fact table where as a complex star schema have more than one fact table.
Snow Flake Schema:
A snow flake schema is an enhancement of star schema by adding additional dimensions. Snow flake schema are useful when there are low cardinality attributes in the dimensions.
Galaxy Schema:
Galaxy schema contains many fact tables with some common dimensions (conformed dimensions). This schema is a combination of many data marts.
Fact Constellation Schema:
The dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy like teritary, region, country, state and city; constellation schema would have five dimensions instead of one.

Types of Facts in Data Warehouse

A fact table is the one which consists of the measurements, metrics or facts of business process. These measurable facts are used to know the business value and to forecast the future business. The different types of facts are explained in detail below.
Additive:
Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Eg: Daily balances fact can be summed up through the customers dimension but not through the time dimension.
Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Eg: Facts which have percentages, ratios calculated.
Factless Fact Table:
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”. Eg: A fact table which has only product key and date key is a factless fact. There are no measures in this table. But still you can get the number products sold over a period of time.
A fact tables that contain aggregated facts are often called summary tables.

Data Warehouse

A data warehouse is a relational database that is designed for query and business analysis rather than for transaction processing. It contains historical data derived from transaction data. This historical data is used by the business analysts to understand about the business in detail.
A data warehouse should have the following characteristics:
Subject oriented: A data warehouse helps in analyzing the data.  For example, to know about a company’s sales, a data warehouse needs to build on sales data. Using this data warehouse we can find the last year sales. This ability to define a data warehouse by subject (sales) makes it a subject oriented.
Integrated: Bringing data from different sources and putting them in to a consistent format. This includes resolving the units of measures, naming conflicts etc.
Non volatile: Once the data enters into the data warehouse, the data should not be updated.
Time variant: To analyze the business, analysts need large amounts of data. So, the data warehouse should contain historical data.

Logical and Physical Design of Data Warehouse

Logical design:
Logical design deals with the logical relationships between objects. Entity-relationship (ER) modeling technique can be used for logical design of data warehouse. ER modeling involves identifying the entities (important objects), attributes (properties about objects) and the relationship among them.
An entity is a chunk of information, which maps to a table in database. An attribute is a part of an entity, which maps to a column in database.
A unique identifier can be used to make sure the data is consistent.
Physical design:
Physical design deals with the effective way of storing and retrieving the data. In the physical design, the logical design needs to be converted into a description of the physical database structures.
Physical design involves creation of the database objects like tables, columns, indexes, primary keys, foreign keys, views, sequences etc.

Why to create Datamart

ETL (Extract Transform Load) extract data from the Data Warehouse and populate one or more Data Marts for use by groups of decision makers in the organizations.  The Data Marts can be Dimensional (Star Schemas) or relational, depending on how the information is to be used and what “front end” Data Warehousing Tools will be used to present the information.
Each Data Mart can contain different combinations of tables, columns and rows from the Enterprise Data Warehouse.  For example, an business unit or user group that doesn’t require a lot of historical data might only need transactions from the current calendar year in the database. The Personnel Department might need to see all details about employees, whereas data such as “salary” or “home address” might not be appropriate for a Data Mart that focuses on Sales.

Why to create Datamart

The following are the reasons to create Datamart:
  • To partition data in order to impose access control strategies.
  • To speed up the queries by reducing the volume of data to be scanned.
  • To segment data into different hardware platforms.
  • To structure data in a form suitable for a user access tool.
Typical Data Warehousing Environment

Some Data Mart might need to be refreshed from the Data Warehouse daily, whereas user groups might want refreshes only monthly.

Friday, 25 July 2014

Cognos Interview Questions | Quontra Solutions

1. What is called Cognos Reporting tool?
Cognos is an IBM reporting tool which is used for reporting and analysis of various data from different datawarehouse.
2. What is Report item?
Report item is defined as a query item which can be dragged and dropped into a work area
3. What is Bitmap Index?
A bitmap index is used for replacement of list of rowids for each key value. This index is mainly efficient for data warehousing as it has low cardinality and low updates.
4. What is a Folder?
A folder consists of repository objects like sources, targets, transformation and mapping that help to organize data warehouse.What is
5.Cognos Connection?
It is a Web portal for Cognos 8 and a component which interracts with the Content Store. It is a frontend to publish, find, manage, organize, and view organization’s business intelligence data
6. What are the types of prompt in Cognos?
  • Value prompt
  • Text Prompt
  • Date prompt
  • Time prompt
  • Date and time prompt
7. How to join multiple db in catalog?
Mutiple database could not be connected in a single catalog. So we use hot file for this purpose.
8. How to generate cubes in cognos?
Power Play Transformer contain dimension, measure, model and cube. We can generate cube different way. Just right click on cube name and build. We can write script in unix. Using that we can generate cube.
9. What is snapshot?
A Snapshot is the copy of data, when we create a snapshot it copies the exact data that’s related to the at particular report, we use snapshot whenever we want to compare reports.(Example: we want to compare this month’s report with previous months).
10.What are the responsibilities of Cognos Administrator?
A Cognos Administrator is assigned with following responsibilities.
1.     Installations and configurations in distributed network environment.
2.     Creating the Repository (Content Store).
3.     Perform back up and recovery of Meta Data.
4.     Developing the user Administration.
5.     Tuning the servers.
6.     Deployment.
11. What are the responsibilities of a Cognos Architect?
An Architect is responsible for designing the interface by fulfilling the user requirements.
Once the interface has been designed it should be rigorously tested before distributing to the end user population.
12. What are the roles of an Application Developer?
Design the reports according to the report requirement templates.
Testing the each report with the following types of tests.
a.    Unit Testing
b.    System Testing
c.    Performance Testing
13.What is the information present in the Cognos configuration file?
The configuration file of Cognos is the normal configuration file that includes Cognos Dispatcher and IBM Cognos Report Server processes that is allocated to give it to the processor for faster processing of the resources. The configuration file consists of three to four low affinity threads and one high affinity thread. The threads are managed by using the traffic, which an application can handle. Affinity represents the report service that is provided to the original client from whom the request has come and there can be multiple interactions allowed to satisfy the request. High affinity connections are provided to process the requests from the service report.
14. How many numbers of cubes can we create on a single model? How can we navigate between those cubes?
Using a single Model, we can create as many number of cubes we want. By using the dimension views…etc. Regarding the navigation, when we save cubes, these act as separator multidimensional databases. There won’t be any relations. Navigate means opening the cubes separately.
15. What is the difference between drill down and drill through?
Drill down means it will navigate from detail information to summary level information within report. Drill through means it will navigate from summary to detail from report to report.
16. What is Cardinality?
Cardinality is nothing but relation between tables.
One to One
One to Many
Many to Many
Many to One
17. What is OLAP?
OLAP stands for On Line Analytical Processing, a series of protocols used mainly for business reporting.
18.What are components of report studio?
Insertable Objects pane
Properties pane
Explorer bar
Report Viewer
19. What are the components of Report Net?
 Framework Manager
Cognos Connection
Query Studio
Report Studio
20.Can report net supports cubes?
No