Experimental design and statistical analyses of data


1 Experimental design and statistical analyses of data
Lesson 3: Analysis of variance (ANOVA)

2 Randomized block design
All treatments are allocated to the same experimental units. Treatments are allocated at random (for example in the order C, D, A, B). Blocks: b = 3. Treatments: a = 4.

3 Treatments (drugs) and blocks (patients)
Layout of the data table: rows are the patients 1, 2 and 3 (blocks), columns are the treatments A, B, C and D (drugs), and the margins hold the row and column averages.

4 Treatments
Patient     A        B        C        D        Average
1           5.17     5.21     4.91     4.74     5.008
2           6.23     7.34     6.18     6.31     6.515
3           4.93     4.55     4.64     4.61     4.683
Average     5.443    5.700    5.243    5.220    5.402

5 Patient 3 Drug B

6 Predicted values of y

7 Treatments
Each cell shows the observed value of y followed by the predicted value of y in parentheses.
Patient     A               B               C               D               Average
1           5.17 (5.049)    5.21 (5.306)    4.91 (4.849)    4.74 (4.826)    5.008
2           6.23 (6.557)    7.34 (6.813)    6.18 (6.357)    6.31 (6.333)    6.515
3           4.93 (4.724)    4.55 (4.981)    4.64 (4.524)    4.61 (4.501)    4.683
Average     5.443           5.700           5.243           5.220           5.402
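The predicted values in the table are consistent with an additive model without interaction, where each prediction is the patient average plus the drug average minus the overall average. The following is a minimal sketch under that assumption; the variable names are illustrative and not taken from the slides.

```python
# Observed responses y[patient][drug], copied from the data table.
y = {
    1: {"A": 5.17, "B": 5.21, "C": 4.91, "D": 4.74},
    2: {"A": 6.23, "B": 7.34, "C": 6.18, "D": 6.31},
    3: {"A": 4.93, "B": 4.55, "C": 4.64, "D": 4.61},
}
drugs = ["A", "B", "C", "D"]

grand_mean = sum(y[p][d] for p in y for d in drugs) / (len(y) * len(drugs))
patient_mean = {p: sum(y[p].values()) / len(drugs) for p in y}
drug_mean = {d: sum(y[p][d] for p in y) / len(y) for d in drugs}

# Assumed additive model: predicted = patient mean + drug mean - grand mean.
predicted = {
    p: {d: patient_mean[p] + drug_mean[d] - grand_mean for d in drugs}
    for p in y
}
# Residual = observed minus predicted.
residual = {p: {d: y[p][d] - predicted[p][d] for d in drugs} for p in y}

for p in y:
    print(p, [round(predicted[p][d], 3) for d in drugs])
```

For patient 3 receiving drug B, the sketch gives a prediction close to the tabulated 4.981, so the residual for that cell is about 4.55 - 4.981.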

8 Residuals and residual variance

9 Variances and covariances
Orthogonal design

10 95% confidence limits for parameters
Patients Drugs

11 Are there any differences between drugs?
Table of the six pairwise differences between drugs (A-B, A-C, A-D, B-C, B-D, C-D), with the columns Estimate and Variance.
Example, B-D: 0.1 < P < 0.2

12 All pairwise differences
Comparison          t        P
Pat 1 - Pat 2       6.224    0.0008
Pat 1 - Pat 3       1.342    0.2282
Pat 2 - Pat 3       7.566    0.0003
Drug A - Drug B     0.918    0.3942
Drug A - Drug C     0.715    0.5014
Drug A - Drug D     0.799    0.4550
Drug B - Drug C     1.644    0.1536
Drug B - Drug D     1.716    0.1369
Drug C - Drug D     0.083    0.9362
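The t values above can be reproduced, to rounding, by dividing each difference between two averages by its standard error. The sketch below assumes the residual variance s² = SSE/df = 0.704/6 that the transcript reports for the full model, and a standard error of sqrt(2·s²/m), where m is the number of observations behind each average. The P values are not computed, since the Python standard library has no t distribution.

```python
import math
from itertools import combinations

drugs = ["A", "B", "C", "D"]
# Rows are patients 1-3, columns are drugs A-D (data table of the transcript).
y = {
    1: [5.17, 5.21, 4.91, 4.74],
    2: [6.23, 7.34, 6.18, 6.31],
    3: [4.93, 4.55, 4.64, 4.61],
}
a, b = len(drugs), len(y)

patient_mean = {p: sum(values) / a for p, values in y.items()}
drug_mean = {d: sum(y[p][j] for p in y) / b for j, d in enumerate(drugs)}

# Assumption: residual variance taken from the ANOVA of the full model.
s2 = 0.704 / 6


def t_statistic(mean_1, mean_2, m):
    """Absolute difference between two averages of m observations each, over its SE."""
    return abs(mean_1 - mean_2) / math.sqrt(2 * s2 / m)


t_patients = {
    (p, q): t_statistic(patient_mean[p], patient_mean[q], a)
    for p, q in combinations(y, 2)
}
t_drugs = {
    (c, d): t_statistic(drug_mean[c], drug_mean[d], b)
    for c, d in combinations(drugs, 2)
}

for pair, t in {**t_patients, **t_drugs}.items():
    print(pair, round(t, 3))
```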

13 Why are pairwise comparisons not wise?
Pairwise comparisons are unwise for two reasons:
(1) They often require many tests.
(2) They may increase the risk of a type I error, i.e. rejection of H0 even when H0 is true.

14 Multiple comparisons
A factor has a levels. If we want to compare all possible differences between the averages of the a levels, the total number of pairwise tests becomes k = a(a - 1)/2:
a = 2: k = 1
a = 4: k = 6
a = 10: k = 45
a = 20: k = 190

15 If α = 0.05 for a single test, then the probability of committing at least one type I error (rejecting H0 when it is in fact true) is seen to be
Probability of a type I error if k = 1: α
Probability of no type I error if k = 1: 1 - α
Probability of no type I error if k > 1: (1 - α)^k
Probability of at least one type I error: P = 1 - (1 - α)^k
a = 2, k = 1: P = 0.05
a = 4, k = 6: P = 0.265
a = 10, k = 45: P = 0.901
a = 20, k = 190: P ≈ 1.000
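As a quick check of these numbers, the sketch below computes the number of pairwise tests k for a levels and the probability of at least one type I error. It assumes that the k tests are independent, which is what the formula 1 - (1 - α)^k requires; the function name is illustrative.

```python
from math import comb


def familywise_error(alpha, k):
    """Probability of at least one type I error in k independent tests."""
    return 1 - (1 - alpha) ** k


alpha = 0.05
for a in (2, 4, 10, 20):
    k = comb(a, 2)  # number of pairwise comparisons, a(a - 1)/2
    print(f"a = {a:2d}  k = {k:3d}  P = {familywise_error(alpha, k):.3f}")
```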

16 The Bonferroni adjustment
The Bonferroni adjustment is an emergency solution to the problem of multiple tests.
Experimentwise error: if we want P(at least one type I error) ≤ α, then we need to find α' so that
1 - (1 - α')^k ≤ α → α' ≤ 1 - (1 - α)^(1/k) ≈ α/k
a = 4, k = 6: α' ≤ 1 - (1 - 0.05)^(1/6) = 0.00851; α/k = 0.05/6 = 0.00833
a = 10, k = 45: α' ≤ 1 - (1 - 0.05)^(1/45) = 0.00114; α/k = 0.05/45 = 0.00111
A disadvantage of the Bonferroni adjustment is that it is conservative, i.e. it increases the risk of type II errors (accepting H0 when it is in fact false).
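The two per-test levels can be compared numerically. In the sketch below, the exact bound 1 - (1 - α)^(1/k) is the one usually called the Šidák correction, and α/k is the Bonferroni approximation; the function names are illustrative.

```python
def sidak_alpha(alpha, k):
    """Exact per-test level so that k independent tests have familywise level alpha."""
    return 1 - (1 - alpha) ** (1 / k)


def bonferroni_alpha(alpha, k):
    """Bonferroni approximation to the per-test level."""
    return alpha / k


alpha = 0.05
for k in (6, 45):
    exact = sidak_alpha(alpha, k)
    approx = bonferroni_alpha(alpha, k)
    # Familywise error actually attained when each test uses alpha / k.
    attained = 1 - (1 - approx) ** k
    print(f"k = {k:2d}  exact = {exact:.5f}  alpha/k = {approx:.5f}  attained = {attained:.4f}")
```

Because α/k is slightly smaller than the exact bound, the attained experimentwise error stays just below α, which is the conservatism the slide refers to.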

17 The anova solution to the problem
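Full model: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε, where β1 and β2 belong to the blocks (patients) and β3, β4 and β5 belong to the treatments (drugs).
Question 1: Are there any differences between patients?
Question 2: Are there any differences between drugs?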

18 Answer to question 1
If there are no differences between patients, then β1 and β2 will both be 0.
H0: No differences between patients, β1 = β2 = 0
H1: Patients are different
Full model: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε
If H0 is correct, then
Reduced model: y = β0 + β3x3 + β4x4 + β5x5 + ε

19 Answer to question 2
If there are no differences between treatments, then β3, β4 and β5 will all be 0.
H0: No differences between treatments, β3 = β4 = β5 = 0
H1: Treatments have an effect
Full model: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε
If H0 is correct, then
Reduced model: y = β0 + β1x1 + β2x2 + ε

20 Finally, if neither treatments nor patients differ we get
Full model: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε
Reduced model: y = β0 + ε

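21 Degrees of freedom of the models
Model 1 (intercept only): df = n - 1 = 11
Model 2a (patients only): df = n - p = 9
Model 2b (drugs only): df = n - p = 8
Full model (patients and drugs): df = n - p = 6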

22 Test for effects of drugs
Reduced model: s₁² = SSE1/(n - p1)
Full model: s₂² = SSE2/(n - p2)
Difference between reduced and full model: s₃² = (SSE1 - SSE2)/(p2 - p1)
If H0 is true, then s₁², s₂² and s₃² will all be estimates of σ².
If H0 is not true, then s₃² > σ².

23 Degrees of freedom for F
Since F is the ratio between s₃² with p2 - p1 df and s₂² with n - p2 df, F has p2 - p1 df in the numerator and n - p2 df in the denominator, i.e.
F = (MS due to omitting the factor) / (MS due to the full model)
The F-test is one-tailed (only values larger than 1 lead to rejection of H0).

24 Explained and unexplained variation
SSE1: unexplained variation for the model without the factor
SSE2: unexplained variation for the model with the factor
SSE1 - SSE2: variation explained by including the factor = SS(factor)

25 Test for effect of drugs
Model 1 (intercept only): df = n - 1 = 11
Model 2a (patients only): df = n - p = 9
Model 2b (drugs only): df = n - p = 8
Full model (patients and drugs): df = n - p = 6

26 Explained and unexplained variation for drugs
Unexplained variation without drugs: 1.151
Unexplained variation with drugs: 0.704
Variation explained by drugs: 1.151 - 0.704 = 0.447 = SS(drugs)

27 Test for effect of patients
Model 1 (intercept only): df = n - 1 = 11
Model 2a (patients only): df = n - p = 9
Model 2b (drugs only): df = n - p = 8
Full model (patients and drugs): df = n - p = 6

28 Explained and unexplained variation for patients
Unexplained variation without patients: 8.352
Unexplained variation with patients: 0.704
Variation explained by patients: 8.352 - 0.704 = 7.648 = SS(patients)
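The F ratio for a factor follows directly from the unexplained variation of the reduced and the full model and their residual degrees of freedom. The sketch below uses the values quoted in the transcript (SSE 1.151 on 9 df without drugs, 8.352 on 8 df without patients, 0.704 on 6 df for the full model); the function name is illustrative, and P values are left out because the Python standard library has no F distribution.

```python
def f_ratio(sse_reduced, df_reduced, sse_full, df_full):
    """F = MS due to omitting the factor / MS of the full model."""
    ms_factor = (sse_reduced - sse_full) / (df_reduced - df_full)
    ms_error = sse_full / df_full
    return ms_factor / ms_error


# Drugs: reduced model contains patients only (df = 9).
f_drugs = f_ratio(1.151, 9, 0.704, 6)
# Patients: reduced model contains drugs only (df = 8).
f_patients = f_ratio(8.352, 8, 0.704, 6)

print(f"F(drugs)    = {f_drugs:.2f} on 3 and 6 df")
print(f"F(patients) = {f_patients:.2f} on 2 and 6 df")
```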

29 Sum of Squares (SS)
Total variation = Variation explained by model + Unexplained variation
Variation explained by model = Variation due to patients + Variation due to drugs
SS(total) = SS(model) + SS(residual) = SS(patients) + SS(drugs) + SSE

30 Analysis of variance
Source      SS           df           MS                  F               P
Patients    SS(pat)      b-1          SS(pat)/(b-1)       MS(pat)/s2
Drugs       SS(drugs)    a-1          SS(drugs)/(a-1)     MS(drugs)/s2
Error       SSE          n-a-b+1      SSE/(n-a-b+1)
Total       SS(total)    n-1

31 Analysis of variance
Source      SS       df    MS       F         P
Model       8.095    5     1.619    13.838    0.003     **
Patients    7.648    2     3.824    32.68     0.0006    ***
Drugs       0.447    3     0.149    1.27      0.366
Error       0.704    6     0.117
Total       8.799    11
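The sums of squares in the table can be recomputed from the raw data. The sketch below assumes the balanced, additive two-way layout of the example, in which SSE can be obtained by subtraction; small differences in the last digit compared with the table come from rounding.

```python
# Rows are patients 1-3, columns are drugs A-D (data table of the transcript).
y = [
    [5.17, 5.21, 4.91, 4.74],
    [6.23, 7.34, 6.18, 6.31],
    [4.93, 4.55, 4.64, 4.61],
]
b, a = len(y), len(y[0])
n = a * b

grand = sum(map(sum, y)) / n
row_mean = [sum(row) / a for row in y]
col_mean = [sum(row[j] for row in y) / b for j in range(a)]

ss_total = sum((v - grand) ** 2 for row in y for v in row)
ss_patients = a * sum((m - grand) ** 2 for m in row_mean)
ss_drugs = b * sum((m - grand) ** 2 for m in col_mean)
# Subtraction is valid here because the design is orthogonal.
sse = ss_total - ss_patients - ss_drugs

df = {"Patients": b - 1, "Drugs": a - 1, "Error": n - a - b + 1}
ms_error = sse / df["Error"]

print(f"{'Source':<10}{'SS':>8}{'df':>4}{'MS':>8}{'F':>8}")
for name, ss in (("Patients", ss_patients), ("Drugs", ss_drugs)):
    ms = ss / df[name]
    print(f"{name:<10}{ss:8.3f}{df[name]:4d}{ms:8.3f}{ms / ms_error:8.2f}")
print(f"{'Error':<10}{sse:8.3f}{df['Error']:4d}{ms_error:8.3f}")
print(f"{'Total':<10}{ss_total:8.3f}{n - 1:4d}")
```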

32 Analysis of variance with patients as the only factor
Source      SS       df    MS       F        P
Model       7.648    2     3.824    29.92    0.0001    ***
Patients    7.648    2     3.824    29.92    0.0001    ***
Error       1.151    9     0.128
Total       8.799    11

33 Orthogonal designs

34 Orthogonal designs
A multifactorial experiment is said to be orthogonal if the estimates of the parameters associated with each factor are independent of each other. In that case
SS(total) = SS1 + SS2 + ... + SSk + SSE
An experiment is orthogonal if each level of one factor occurs the same number of times together with every level of the other factor, and if this applies to all the factors.
If an experiment is not orthogonal, then the parameter estimates will change each time a factor is removed from the model, and the SS of a factor depends on the order in which the factors are included in the model.
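For two factors, the balance condition can be checked by counting how often each combination of levels occurs. The sketch below treats equal, non-zero cell counts as the criterion, which is a sufficient condition for orthogonality rather than the only one; the function name and the example designs are illustrative.

```python
from collections import Counter
from itertools import product


def is_balanced(design):
    """design: list of (level of factor 1, level of factor 2), one tuple per observation."""
    counts = Counter(design)
    levels_1 = {first for first, _ in design}
    levels_2 = {second for _, second in design}
    # Counter returns 0 for combinations that never occur.
    cell_counts = {counts[cell] for cell in product(levels_1, levels_2)}
    return len(cell_counts) == 1 and 0 not in cell_counts


# Randomized block design of the example: every patient receives every drug once.
full_design = list(product([1, 2, 3], "ABCD"))
# The same design with one observation missing is no longer balanced.
incomplete_design = full_design[:-1]

print(is_balanced(full_design))
print(is_balanced(incomplete_design))
```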

35 How to do it with SAS

36 /* Example 5.1 in G. Nachman: Forsøgsplanlægning og statistisk analyse af eksperimentelle data */
DATA eks5_1;
  /* The program performs a two-way analysis of variance with patient and treatment as factors. The design is fully factorial. */
  /* Note that treatment is a fixed factor, while patient is a random factor. */
  /* The analysis assumes that there is no interaction between drug and patient. */
  INPUT pat $ treat $ y;   /* reads the data */
  /* pat = patient (qualitative variable), treat = treatment (qualitative variable), y = response (quantitative variable) */
  /* The data follow after CARDS. They can also be read from a file. */
  CARDS;
1 A 5.17
2 A 6.23
3 A 4.93
1 B 5.21
2 B 7.34
3 B 4.55
1 C 4.91
2 C 6.18
3 C 4.64
1 D 4.74
2 D 6.31
3 D 4.61
;
PROC GLM;                  /* General Linear Models procedure */
  TITLE 'Eksempel 5.1';    /* include if a title is wanted */
  CLASS pat treat;         /* pat and treat are class (qualitative) variables */
  MODEL y = pat treat / CLM SOLUTION;   /* the model assumes that y depends on patient and treatment */
  /* CLM gives the confidence limits for the mean of a given combination of patient and treatment */
  /* SOLUTION prints the parameter estimates */
  OUTPUT OUT=new P=pred R=res;   /* creates a new data set called new, containing the predicted values (pred) and the residuals (res) */
  /* Test pairwise differences between treatments */
  CONTRAST 'A versus B' treat 1 -1 0 0;
  CONTRAST 'A versus C' treat 1 0 -1 0;
  CONTRAST 'A versus D' treat 1 0 0 -1;
  CONTRAST 'B versus C' treat 0 1 -1 0;
  CONTRAST 'B versus D' treat 0 1 0 -1;
  CONTRAST 'C versus D' treat 0 0 1 -1;
RUN;
PROC PLOT DATA=new;        /* plotting procedure */
  TITLE 'Eksempel 5.1';
  TITLE2 'Residuals plotted against predicted values';
  PLOT res*pred = '*';     /* res is plotted against pred with * as the symbol */
RUN;
PROC UNIVARIATE FREQ PLOT NORMAL DATA=new;
  /* PROC UNIVARIATE gives information about the variable or variables named in the VAR statement below */
  /* Options: FREQ = number of observations of a given value, PLOT = plot of the observations, NORMAL = test for normal distribution */
  TITLE 'Eksempel 5.1';
  VAR res;                 /* information about the variable res */
RUN;

37 PROC GLM;               /* General Linear Models procedure */
  TITLE 'Eksempel 5.1';    /* include if a title is wanted */
  CLASS pat treat;         /* pat and treat are class (qualitative) variables */
  MODEL y = pat treat / CLM SOLUTION;   /* the model assumes that y depends on patient and treatment */
  /* CLM gives the confidence limits for the mean of a given combination of patient and treatment */
  /* SOLUTION prints the parameter estimates */
  OUTPUT OUT=new P=pred R=res;   /* creates a new data set called new, containing the predicted values (pred) and the residuals (res) */
RUN;

38 Eksempel 5.1, 13:18 Monday, November 5, 2001
General Linear Models Procedure
Class Level Information
Class    Levels    Values
PAT      3         1 2 3
TREAT    4         A B C D
Number of observations in data set = 12

39 SAS output: overall significance of the model and explained variation
General Linear Models Procedure, dependent variable: Y
Analysis of variance table (columns Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows Model, Error, Corrected Total): overall significance of the model.
Summary line (R-Square, C.V., Root MSE, Y Mean): explained variation.
Type I SS and Type III SS tables (rows PAT and TREAT): patients are significantly different; drugs are not significantly different.

40 SAS output: parameter estimates
Columns: Parameter, Estimate, T for H0: Parameter=0, Pr > |T|, Std Error of Estimate.
Rows: INTERCEPT; PAT 1, 2, 3; TREAT A, B, C, D. Every estimate is followed by the letter 'B'.
NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

41 Difference between SAS and GN
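SAS sets the last level of each factor to zero (patient 3, drug D), whereas GN sets the first level to zero (patient 1, drug A). The middle column is the intermediate step with patient 3 and drug A as reference; moving the reference patient from 3 to 1 adds 0.325 to the intercept and subtracts 0.325 from each patient effect.
Parameter    SAS estimate    Intermediate    GN estimate
INTERCEPT    4.5008          4.7242          5.0492
PAT 1        0.3250          0.3250          0.0000
PAT 2        1.8325          1.8325          1.5075
PAT 3        0.0000          0.0000          -0.3250
TREAT A      0.2233          0.0000          0.0000
TREAT B      0.4800          0.2567          0.2567
TREAT C      0.0233          -0.2000         -0.2000
TREAT D      0.0000          -0.2233         -0.2233
Ex: Patient 2 receiving drug C
SAS: 4.5008 + 1.8325 + 0.0233 = 6.357
GN: 5.0492 + 1.5075 - 0.2000 = 6.357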
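That the two parameterizations describe the same fitted model can be checked by predicting every patient and drug combination from both sets of estimates. In the sketch below, the SAS estimates are those listed in the transcript; the GN values for drug A (0) and drug D (-0.2233) are not legible in the transcript and are assumptions derived from the SAS column.

```python
# SAS parameterization: last level of each factor is the reference (patient 3, drug D).
sas = {
    "intercept": 4.5008,
    "pat": {1: 0.325, 2: 1.8325, 3: 0.0},
    "treat": {"A": 0.2233, "B": 0.48, "C": 0.0233, "D": 0.0},
}
# GN parameterization: first level is the reference (patient 1, drug A).
# Assumption: treat A = 0 and treat D = -0.2233 are inferred, not read from the slide.
gn = {
    "intercept": 5.0492,
    "pat": {1: 0.0, 2: 1.5075, 3: -0.325},
    "treat": {"A": 0.0, "B": 0.2567, "C": -0.2, "D": -0.2233},
}


def predict(params, patient, drug):
    """Predicted response: intercept + patient effect + drug effect."""
    return params["intercept"] + params["pat"][patient] + params["treat"][drug]


print(round(predict(sas, 2, "C"), 3), round(predict(gn, 2, "C"), 3))

# Largest disagreement between the two parameterizations over all 12 cells.
max_gap = max(
    abs(predict(sas, p, d) - predict(gn, p, d))
    for p in (1, 2, 3)
    for d in "ABCD"
)
print(max_gap < 1e-3)
```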

