Probability, Statistics, and Random Signals, by Charles Boncelet

HALF TITLE

TITLE PAGE

COPYRIGHT

CONTENTS

PREFACE

Chapter 1 PROBABILITY BASICS

Chapter 2 CONDITIONAL PROBABILITY

Chapter 3 A little COMBINATORICS

Chapter 4 DISCRETE PROBABILITIES AND RANDOM VARIABLES

Chapter 5 MULTIPLE DISCRETE RANDOM VARIABLES

Chapter 6 BINOMIAL PROBABILITIES

Chapter 7 A CONTINUOUS RANDOM VARIABLE

Chapter 8 MULTIPLE CONTINUOUS RANDOM VARIABLES

Chapter 9 THE GAUSSIAN AND RELATED DISTRIBUTIONS

Chapter 10 ELEMENTS OF STATISTICS

Chapter 11 GAUSSIAN RANDOM VECTORS AND LINEAR REGRESSION

Chapter 12 HYPOTHESIS TESTING

Chapter 13 RANDOM SIGNALS AND NOISE

Chapter 14 SELECTED RANDOM PROCESSES

APPENDIX A COMPUTATION EXAMPLES

APPENDIX B ACRONYMS

APPENDIX C PROBABILITY TABLES

APPENDIX D BIBLIOGRAPHY

Index



PROBABILITY, STATISTICS, AND RANDOM SIGNALS


CHARLES G. BONCELET JR. University of Delaware

New York • Oxford OXFORD UNIVERSITY PRESS

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam Copyright © 2016 by Oxford University Press

For titles covered by Section 112 of the U.S. Higher Education Opportunity Act, please visit www.oup.com/us/he for the latest information about pricing and alternate formats.

Published by Oxford University Press, 198 Madison Avenue, New York, NY 10016, http://www.oup.com. Oxford is a registered trademark of Oxford University Press. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Names: Boncelet, Charles G.
Title: Probability, statistics, and random signals / Charles G. Boncelet Jr.
Description: New York : Oxford University Press, [2017] | Series: The Oxford series in electrical and computer engineering | Includes index.
Identifiers: LCCN 2015034908 | ISBN 9780190200510
Subjects: LCSH: Mathematical statistics-Textbooks. | Probabilities-Textbooks. | Electrical engineering-Mathematics-Textbooks.
Classification: LCC QA276.18 .B66 2017 | DDC 519.5-dc23
LC record available at http://lccn.loc.gov/2015034908

Printing number: 9 8 7 6 5 4 3 2 1. Printed in the United States of America on acid-free paper.

CONTENTS

PREFACE xi

1 PROBABILITY BASICS 1
1.1 What Is Probability? 1
1.2 Experiments, Outcomes, and Events 3
1.3 Venn Diagrams 4
1.4 Random Variables 5
1.5 Basic Probability Rules 6
1.6 Probability Formalized 9
1.7 Simple Theorems 11
1.8 Compound Experiments 15
1.9 Independence 16
1.10 Example: Can S Communicate With D? 17
1.10.1 List All Outcomes 18
1.10.2 Probability of a Union 19
1.10.3 Probability of the Complement 20
1.11 Example: Now Can S Communicate With D? 21
1.11.1 A Big Table 21
1.11.2 Break Into Pieces 22
1.11.3 Probability of the Complement 23
1.12 Computational Procedures 23
Summary 24
Problems 25

2 CONDITIONAL PROBABILITY 29
2.1 Definitions of Conditional Probability 29
2.2 Law of Total Probability and Bayes Theorem 32
2.3 Example: Urn Models 34
2.4 Example: A Binary Channel 36
2.5 Example: Drug Testing 38
2.6 Example: A Diamond Network 40
Summary 41
Problems 42

3 A LITTLE COMBINATORICS 47
3.1 Basics of Counting 47
3.2 Notes on Computation 52
3.3 Combinations and the Binomial Coefficients 53
3.4 The Binomial Theorem 54
3.5 Multinomial Coefficient and Theorem 55
3.6 The Birthday Paradox and Message Authentication 57
3.7 Hypergeometric Probabilities and Card Games 61
Summary 66
Problems 67

4 DISCRETE PROBABILITIES AND RANDOM VARIABLES 75
4.1 Probability Mass Functions 75
4.2 Cumulative Distribution Functions 77
4.3 Expected Values 78
4.4 Moment Generating Functions 83
4.5 Several Important Discrete PMFs 85
4.5.1 Uniform PMF 86
4.5.2 Geometric PMF 87
4.5.3 The Poisson Distribution 90
4.6 Gambling and Financial Decision Making 92
Summary 95
Problems 96

5 MULTIPLE DISCRETE RANDOM VARIABLES 101
5.1 Multiple Random Variables and PMFs 101
5.2 Independence 104
5.3 Moments and Expected Values 105
5.3.1 Expected Values for Two Random Variables 105
5.3.2 Moments for Two Random Variables 106
5.4 Example: Two Discrete Random Variables 108
5.4.1 Marginal PMFs and Expected Values 109
5.4.2 Independence 109
5.4.3 Joint CDF 110
5.4.4 Transformations With One Output 110
5.4.5 Transformations With Several Outputs 112
5.4.6 Discussion 113
5.5 Sums of Independent Random Variables 113
5.6 Sample Probabilities, Mean, and Variance 117
5.7 Histograms 119
5.8 Entropy and Data Compression 120
5.8.1 Entropy and Information Theory 121
5.8.2 Variable Length Coding 123
5.8.3 Encoding Binary Sequences 127
5.8.4 Maximum Entropy 128
Summary 131
Problems 132

6 BINOMIAL PROBABILITIES 137
6.1 Basics of the Binomial Distribution 137
6.2 Computing Binomial Probabilities 141
6.3 Moments of the Binomial Distribution 142
6.4 Sums of Independent Binomial Random Variables 144
6.5 Distributions Related to the Binomial 146
6.5.1 Connections Between Binomial and Hypergeometric Probabilities 146
6.5.2 Multinomial Probabilities 147
6.5.3 The Negative Binomial Distribution 148
6.5.4 The Poisson Distribution 149
6.6 Binomial and Multinomial Estimation 151
6.7 Alohanet 152
6.8 Error Control Codes 154
6.8.1 Repetition-by-Three Code 155
6.8.2 General Linear Block Codes 157
6.8.3 Conclusions 160
Summary 160
Problems 162

7 A CONTINUOUS RANDOM VARIABLE 167
7.1 Basic Properties 167
7.2 Example Calculations for One Random Variable 171
7.3 Selected Continuous Distributions 174
7.3.1 The Uniform Distribution 174
7.3.2 The Exponential Distribution 176
7.4 Conditional Probabilities 179
7.5 Discrete PMFs and Delta Functions 182
7.6 Quantization 184
7.7 A Final Word 187
Summary 187
Problems 189

8 MULTIPLE CONTINUOUS RANDOM VARIABLES 192
8.1 Joint Densities and Distribution Functions 192
8.2 Expected Values and Moments 194
8.3 Independence 194
8.4 Conditional Probabilities for Multiple Random Variables 195
8.5 Extended Example: Two Continuous Random Variables 198
8.6 Sums of Independent Random Variables 202
8.7 Random Sums 205
8.8 General Transformations and the Jacobian 207
8.9 Parameter Estimation for the Exponential Distribution 214
8.10 Comparison of Discrete and Continuous Distributions 214
Summary 215
Problems 216

9 THE GAUSSIAN AND RELATED DISTRIBUTIONS 221
9.1 The Gaussian Distribution and Density 221
9.2 Quantile Function 227
9.3 Moments of the Gaussian Distribution 228
9.4 The Central Limit Theorem 230
9.5 Related Distributions 235
9.5.1 The Laplace Distribution 236
9.5.2 The Rayleigh Distribution 236
9.5.3 The Chi-Squared and F Distributions 238
9.6 Multiple Gaussian Random Variables 240
9.6.1 Independent Gaussian Random Variables 240
9.6.2 Transformation to Polar Coordinates 241
9.6.3 Two Correlated Gaussian Random Variables 243
9.7 Example: Digital Communications Using QAM 246
9.7.1 Background 246
9.7.2 Discrete Time Model 247
9.7.3 Monte Carlo Exercise 253
9.7.4 QAM Recap 258
Summary 259
Problems 260

10 ELEMENTS OF STATISTICS 265
10.1 A Simple Election Poll 265
10.2 Estimating the Mean and Variance 269
10.3 Recursive Calculation of the Sample Mean 271
10.4 Exponential Weighting 273
10.5 Order Statistics and Robust Estimates 274
10.6 Estimating the Distribution Function 276
10.7 PMF and Density Estimates 278
10.8 Confidence Intervals 280
10.9 Significance Tests and p-Values 282
10.10 Introduction to Estimation Theory 285
10.11 Minimum Mean Squared Error Estimation 289
10.12 Bayesian Estimation 291
Problems 295

11 GAUSSIAN RANDOM VECTORS AND LINEAR REGRESSION 298
11.1 Gaussian Random Vectors 298
11.2 Linear Operations on Gaussian Random Vectors 303
11.3 Linear Regression 304
11.3.1 Linear Regression in Detail 305
11.3.2 Statistics of the Linear Regression Estimates 309
11.3.3 Computational Issues 311
11.3.4 Linear Regression Examples 313
11.3.5 Extensions of Linear Regression 317
Summary 319
Problems 320

12 HYPOTHESIS TESTING 324
12.1 Hypothesis Testing: Basic Principles 324
12.2 Example: Radar Detection 326
12.3 Hypothesis Tests and Likelihood Ratios 331
12.4 MAP Tests 335
Summary 336
Problems 337

13 RANDOM SIGNALS AND NOISE 340
13.1 Introduction to Random Signals 340
13.2 A Simple Random Process 341
13.3 Fourier Transforms 342
13.4 WSS Random Processes 346
13.5 WSS Signals and Linear Filters 350
13.6 Noise 352
13.6.1 Probabilistic Properties of Noise 352
13.6.2 Spectral Properties of Noise 353
13.7 Example: Amplitude Modulation 354
13.8 Example: Discrete Time Wiener Filter 357
13.9 The Sampling Theorem for WSS Random Processes 357
13.9.1 Discussion 358
13.9.2 Example: Figure 13.4 359
13.9.3 Proof of the Random Sampling Theorem 361
Summary 362
Problems 364

14 SELECTED RANDOM PROCESSES 366
14.1 The Lightbulb Process 366
14.2 The Poisson Process 368
14.3 Markov Chains 372
14.4 Kalman Filter 381
14.4.1 The Optimal Filter and Example 381
14.4.2 QR Method Applied to the Kalman Filter 384
Summary 386
Problems 388

A COMPUTATION EXAMPLES 391
A.1 Matlab 391
A.2 Python 393
A.3 R 395

B ACRONYMS 399

C PROBABILITY TABLES 401
C.1 Tables of Gaussian Probabilities 401

D BIBLIOGRAPHY 403

INDEX 405

PREFACE

I have many goals for this book, but this is foremost: I have always liked probability and have been fascinated by its application to predicting the future. I hope to encourage this generation of students to study, appreciate, and apply probability to the many applications they will face in the years ahead.

To the student: This book is written for you. The prose style is less formal than many textbooks use. This more engaging prose was chosen to encourage you to read the book. I firmly believe a good textbook should help you learn the material. But it will not help if you do not read it.

Whenever I ask my students what they want to see in a text, the answer is: "Examples. Lots of examples." I have tried to heed this advice and included "lots" of examples. Many are small, quick examples to illustrate a single concept. Others are long, detailed examples designed to demonstrate more sophisticated concepts. Finally, most chapters end with one or more longer examples that illustrate how the concepts of that chapter apply to engineering or scientific applications.

Almost all the concepts and equations are derived using simple algebra. Read the derivations, and reproduce them yourselves. A great learning technique is to read through a section, then write down the salient points. Read a derivation, and then reproduce it yourself. Repeat the sequence, read, then reproduce, until you get it right.

I have included many figures and graphics. The old expression, "a picture is worth a thousand words," is still true. I am a believer in Edward Tufte's graphics philosophy: maximize the data-ink ratio.¹ All graphics are carefully drawn. They each have enough ink to tell a story, but only enough ink.

To the instructor: This textbook has several advantages over other textbooks. It is the right size, not too big and not too small. It should cover the essential concepts for the level of the course, but should not cover too much.

Part of the art of textbook writing is to decide what should be in and what should be out. The choice of topics is, of course, a decision on the part of the author and represents the era in which the book is written. When I first started teaching my course more than two decades ago, the selection of topics favored continuous random variables and continuous time random processes. Over time, discrete random variables and discrete time random processes have grown in importance. Students today are expected to understand more statistics than in the past. Computation is much more important and more immediate. Each year I add a bit more computation to the course than the prior year.

I like computation. So do most students. Computation gives a reality to the theoretical concepts. It can also be fun. Throughout the book, there are computational examples and exercises. Unfortunately, not everyone uses the same computational packages. The book uses

¹ Edward Tufte, The Visual Display of Quantitative Information, 2nd ed. Cheshire, CT: Graphics Press, 2001. A great book, highly recommended.


three of the most popular: Matlab, Python, and R. For the most part, we alternate between Matlab and Python and postpone discussion of R until the statistics chapters.

Most chapters have a common format: introductory material, followed by deeper and more involved topics, and then one or more examples illustrating the application of the concepts, a summary of the main topics, and a list of homework problems. The instructor can choose how far into each chapter to go. For instance, I usually cover entropy (Chapter 5) and Aloha (Chapter 6), but skip error-correcting codes (also Chapter 6).

I am a firm believer that before statistics or random processes can be understood, the student must have a solid knowledge of probability. A typical undergraduate class can cover the first nine chapters in about two-thirds of a semester, giving the student a good understanding of both discrete and continuous probability. The instructor can select topics from the later chapters to fill out the rest of the semester. If students have had basic probability in a prior course, the first nine chapters can be covered quickly and greater emphasis placed on the remaining chapters. Depending on the focus of the course, the instructor can choose to emphasize statistics by covering the material in Chapters 10 through 12. Alternatively, the instructor can emphasize random signals by covering Chapters 13 and 14.

The textbook can be used in a graduate class. Assuming the students have seen some probability as undergraduates, the first nine chapters can be covered quickly and more attention paid to the last five chapters. In my experience, most new graduate students need to refresh their probability knowledge. Reviewing the first nine chapters will be time well spent. Graduate students will also benefit from doing computational exercises and learning the similarities and differences in the three computational packages discussed: Matlab, Python, and R.

Chapter Coverage

Chapters 1 and 2 are a fairly standard introduction to probability. The first chapter introduces the basic definitions and the three axioms, proves a series of simple theorems, and concludes with detailed examples of calculating probabilities for simple networks. The second chapter covers conditional probability, Bayes theorem and the law of total probability, and several applications.

Chapter 3 is a detour into combinatorics. A knowledge of combinatorics is essential to understanding probability, especially discrete probability, but students frequently confuse the two, thinking combinatorics to be a branch of probability. The two are different, and we emphasize that. Much of the development of probability throughout history was driven by gambling. I, too, use examples from gambling and game play in this chapter (and in some later chapters as well). Students play games and occasionally gamble. Examples from these subjects help bring probability into the student's life experience, and we show that gambling is unlikely to be profitable!

Chapters 4 and 5 introduce discrete probability mass functions, distribution functions, expected values, change of variables, and the uniform, geometric, and Poisson distributions. Chapter 4 culminates with a discussion of the financial considerations of gambling versus buying insurance. Chapter 5 ends with a long section on entropy and data compression. (It still amazes me that most textbooks targeting an electrical and computer engineering


audience exclude entropy.) Chapter 6 presents binomial, multinomial, negative binomial, hypergeometric, and Poisson probabilities and considers the connections between these important discrete probability distributions. It is punctuated by two optional sections, the first on the Aloha protocol and the second on error-correcting codes.

Chapters 7 and 8 present continuous random variables and their densities and distribution functions. Expected values and changes of variables are also presented, as is an extended example on quantization.

Chapter 9 presents the Gaussian distribution. Moments, expected values, and change of variables are also presented here. The central limit theorem is motivated by multiple examples showing how the probability mass function or density function converges to the Gaussian density. Some of the related distributions, including the Laplace, Rayleigh, and chi-squared, are presented. The chapter concludes with an extended example on digital communications using quadrature amplitude modulation. Exact and approximate error rates are computed and compared to a Monte Carlo simulation.

The first nine chapters are typically covered in order at whatever speed is comfortable for the instructor and students. The remaining chapters can be divided into two subjects, statistics and random processes. Chapters 10, 11, and 12 contain an introduction to statistics, linear regression, and hypothesis testing. Chapters 13 and 14 introduce random processes and random signals. These chapters are not sequential; the instructor can pick and choose whichever chapters or sections to cover.

Chapter 10 presents basic statistics. At this point, the student should have a good understanding of probability and be ready to understand the "why" behind statistical procedures. Standard and robust estimates of mean and variance, density and distribution estimates, confidence intervals, and significance tests are presented. Finally, maximum likelihood, minimum mean squared error, and Bayes estimation are discussed.

Chapter 11 takes a linear algebra approach (vectors and matrices) to multivariate Gaussian random variables and uses this approach to study linear regression.

Chapter 12 covers hypothesis testing from a traditional engineering point of view. MAP (maximum a posteriori), Neyman-Pearson, and Bayesian hypothesis tests are presented.

Chapter 13 studies random signals, with particular emphasis on those signals that appear in engineering applications. Wide sense stationary signals, noise, linear filters, and modulation are covered. The chapter ends with a discussion of the sampling theorem.

Chapter 14 focuses on the Poisson process and Markov processes and includes a section on Kalman filtering.

Let me conclude this preface by repeating my overall goal: that the student will develop not only an understanding and appreciation of probability, statistics, and random processes but also a willingness to apply these concepts to the various problems that will occur in the years ahead.

Acknowledgments

I would like to thank the reviewers who helped shape this textbook during its development. Their many comments are much appreciated. They are the following:

Deva K. Borah, New Mexico State University
Petar M. Djuric, Stony Brook University
Jens Gregor, University of Tennessee


Eddie Jacobs, University of Memphis
JeongHee Kim, San Jose State University
Nicholas J. Kirsch, University of New Hampshire
Joerg Kliewer, New Jersey Institute of Technology
Sarah Koskie, Indiana University-Purdue University Indianapolis
Ioannis (John) Lambadaris, Carleton University
Eric Miller, Tufts University
Ali A. Minai, University of Cincinnati
Robert Morelos-Zaragoza, San Jose State University
Xiaoshu Qian, Santa Clara University
Danda B. Rawat, Georgia Southern University
Rodney Roberts, Florida State University
John M. Shea, University of Florida
Igor Tsukerman, The University of Akron

I would like to thank the following people from Oxford University Press who helped make this book a reality: Nancy Blaine, John Appeldorn, Megan Carlson, Christine Mahon, Daniel Kaveney, and Claudia Dukeshire.

Lastly, and definitely not least, I would like to thank my children, Matthew and Amy, and my wife, Carol, for their patience over the years while I worked on this book.

Charles Boncelet

CHAPTER 1

PROBABILITY BASICS

In this chapter, we introduce the formalism of probability, from experiments to outcomes to events. The three axioms of probability are introduced and used to prove a number of simple theorems. The chapter concludes with some examples.

1.1 WHAT IS PROBABILITY?

Probability refers to how likely something is. By convention, probabilities are real numbers between 0 and 1. A probability of 0 refers to something that never occurs; a probability of 1 refers to something that always occurs. Probabilities between 0 and 1 refer to things that sometimes occur. For example, an ordinary coin, when flipped, will land heads up about half the time and land tails up about half the time. We say the probability of heads is 0.5; the probability of tails is also 0.5. As another example, a typical telephone line has a probability of sending a data bit correctly of around 0.9999, or 1 − 10⁻⁴. The probability the bit is incorrect is 10⁻⁴. A fiber-optic line may have a bit error rate as low as 10⁻¹⁵.

Imagine Alice sends a message to Bob. For Bob to receive any information (any new knowledge), the message must be unknown to Bob. If Bob knew the message before receiving it, then he gains no new knowledge from hearing it. Only if the message is random to Bob will Bob receive any information.

There are a great many applications where people try to predict the future. Stock markets, weather, sporting events, and elections all are random. Successful prediction of any of these would be vastly profitable, but each seems to have substantial randomness. Engineers worry about reliability of devices and systems. Engineers control complex systems, often without perfect knowledge of the inputs. People are building self-driving automobiles and aircraft. These devices must operate successfully even though all sorts of unpredictable events may occur.
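As an aside, the bit error rates above lend themselves to a quick computation. The sketch below is mine, not the book's; it assumes bit errors occur independently (an idea made precise later, in Section 1.9) and computes the probability that an n-bit message arrives with no errors at all:

```python
def prob_all_correct(p_error: float, n_bits: int) -> float:
    """Probability that every one of n bits is received correctly,
    assuming each bit errs independently with probability p_error."""
    return (1.0 - p_error) ** n_bits

# Telephone line (error rate 1e-4) vs. fiber-optic line (1e-15), 1000 bits:
phone = prob_all_correct(1e-4, 1000)   # roughly 0.905
fiber = prob_all_correct(1e-15, 1000)  # essentially 1.0
```

Even a "good" telephone line drops below a 91% chance of delivering a 1000-bit message cleanly, while the fiber-optic line is essentially perfect at this length.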

Probabilities may be functions of other variables, such as time and space. The probability of someone getting cancer is a function of lots of things, including age, sex, genetics, dietary habits, whether the person smokes, and where the person lives. Noise in an electrical circuit is a function of time and temperature. The number of questions answered correctly on an exam is a function of what questions are asked, and how prepared the test taker is! In some problems, time is the relevant measure. How many flips of a coin are required before the first head occurs? How many before the 100th head?

The point of this is that many experiments have randomness, where the result of the experiment is not known in advance. Furthermore, repetitions of the same experiment may produce different results. Flipping a coin once and getting heads does not mean that a second flip will be heads (or tails). Probability is about understanding and quantifying this randomness.

Comment 1.1: Is a coin flip truly random? Is it unpredictable?

Presumably, if we knew the mass distribution of the coin, the initial force (both linear and rotational) applied to the coin, the density of the air, and any air currents, we could use physics to compute the path of the coin and how it will land (heads or tails). From this point of view, the coin is not random.

In practice, we usually do not know these variables. Most coins are symmetric (or close to symmetric). As long as the number of rotations of the coin is large, we can reasonably assume the coin will land heads up half the time and tails up half the time, and we cannot predict which will occur on any given flip. From this point of view, the coin flip is random.

However, there are rules, even if they are usually unstated. The coin flipper must make no attempt to control the flip (i.e., to control how many rotations the coin undergoes before landing). The flipper must also make no attempt to control the catch of the coin or its landing. These concerns are real. Magicians have been known to practice flipping coins until they can control the flip. (And sometimes they simply cheat.) Only if the rules are followed can we reasonably assume the coin flip is random.

EXAMPLE 1.1

Let us test this question: How many flips are required to get a head? Find a coin, and flip it until a head occurs. Record how many flips were required. Repeat the experiment again, and record the result. Do this at least 10 times. Each of these is referred to as a run, a sequence of tails ending with a head. What is the longest run you observed? What is the shortest? What is the average run length? Theory tells us that the average run length will be about 2.0, though of course your average may be different.
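The experiment above is also easy to simulate. Here is a minimal sketch of my own (not the book's code), in Python, one of the three computational packages the book uses, that repeats the flip-until-heads experiment many times and averages the run lengths:

```python
import random

def run_length(rng: random.Random) -> int:
    """Flip a fair coin until a head appears; return the number of flips."""
    flips = 1
    while rng.random() < 0.5:  # treat this half of the time as tails
        flips += 1
    return flips

rng = random.Random(1)  # fixed seed so the experiment is repeatable
lengths = [run_length(rng) for _ in range(100_000)]
average = sum(lengths) / len(lengths)  # theory predicts about 2.0
```

With 100,000 trials the sample average lands very close to the theoretical value of 2.0, illustrating the point of the example at much larger scale than 10 hand flips.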


1.2 EXPERIMENTS, OUTCOMES, AND EVENTS

An experiment is whatever is done. It may be flipping a coin, rolling some dice, measuring a voltage or a person's height and weight, or numerous others. The experiment results in outcomes. The outcomes are the atomic results of the experiment. They cannot be divided further. For example, for a coin flip, the outcomes are heads and tails; for a counting experiment (e.g., the number of electrons crossing a PN junction), the outcomes are the nonnegative integers, 0, 1, 2, 3, ….

Outcomes are denoted with italic lowercase letters, possibly with subscripts, such as x, n, a₁, a₂, etc. The number of outcomes can be finite or infinite, as in the two examples mentioned in the paragraph above. Furthermore, the experiment can result in discrete outcomes, such as the integers, or continuous outcomes, such as a person's weight. For now, we postpone continuous experiments to Chapter 7 and consider only discrete experiments.

Sets of outcomes are known as events. Events are denoted with italic uppercase Roman letters, possibly with subscripts, such as A, B, and Aᵢ. The outcomes in an event are listed with braces. For example, A = {1, 2, 3, 4} or B = {2, 4, 6}. A is the event containing the outcomes 1, 2, 3, and 4, while B is the event containing outcomes 2, 4, and 6.

The set of all possible outcomes is the sample space and is denoted by S. For example, the outcomes of a roll of an ordinary six-sided die¹ are 1, 2, 3, 4, 5, and 6. The sample space is S = {1, 2, 3, 4, 5, 6}. The set containing no outcomes is the empty set and is denoted by ∅. The complement of an event A, denoted Ā, is the event containing every outcome not in A. The sample space is the complement of the empty set, and vice versa.

The usual rules of set arithmetic apply to events. The union of two events, A ∪ B, is the event containing outcomes in either A or B. The intersection of two events, A ∩ B or more simply AB, is the event containing all outcomes in both A and B. For any event A, A ∩ Ā = AĀ = ∅ and A ∪ Ā = S.

EXAMPLE 1.2

Consider a roll of an ordinary six-sided die, and let A = {1, 2, 3, 4} and B = {2, 4, 6}. Then, A ∪ B = {1, 2, 3, 4, 6} and A ∩ B = {2, 4}. Ā = {5, 6} and B̄ = {1, 3, 5}.

Consider the following experiment: A coin is flipped three times. The outcomes are the eight flip sequences: hhh, hht, …, ttt. If A = {first flip is heads} = {hhh, hht, hth, htt}, then Ā = {ttt, tth, tht, thh}. If B = {exactly two heads} = {hht, hth, thh}, then A ∪ B = {hhh, hht, hth, htt, thh} and AB = {hht, hth}.
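Events as sets of outcomes map directly onto set objects in code. A small sketch (my illustration, not the book's) reproduces the three-flip example with Python sets:

```python
# Sample space for three coin flips and the two events from the example.
S = {"hhh", "hht", "hth", "htt", "thh", "tht", "tth", "ttt"}
A = {"hhh", "hht", "hth", "htt"}  # first flip is heads
B = {"hht", "hth", "thh"}         # exactly two heads

union = A | B         # A ∪ B
intersection = A & B  # AB
complement_A = S - A  # every outcome not in A
```

The set operators `|`, `&`, and `-` play the roles of union, intersection, and complement (relative to the sample space), and the results match the example's hand calculations.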

Comment 1.2: Be careful in defining events. In the coin flipping experiment above, an event might be specified as C = {two heads}. Is this "exactly two heads" or "at least two heads"? The former is {hht, hth, thh}, while the latter is {hht, hth, thh, hhh}.

¹ One is a die; two or more are dice.


Set arithmetic obeys DeMorgan's laws:

$\overline{A \cup B} = \bar{A} \cap \bar{B}$ (1.1)

$\overline{A \cap B} = \bar{A} \cup \bar{B}$ (1.2)
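DeMorgan's laws are easy to spot-check numerically. A quick sketch of my own (not from the text), using the die events A = {1, 2, 3, 4} and B = {2, 4, 6} from Example 1.2:

```python
S = set(range(1, 7))  # sample space of a six-sided die
A = {1, 2, 3, 4}
B = {2, 4, 6}

def complement(E):
    """Complement of event E relative to the sample space S."""
    return S - E

# Equation (1.1): complement of a union is the intersection of complements.
law1 = complement(A | B) == complement(A) & complement(B)
# Equation (1.2): complement of an intersection is the union of complements.
law2 = complement(A & B) == complement(A) | complement(B)
```

Both comparisons evaluate to `True` for these events; a skeptical reader can swap in any other pair of subsets of S and check again.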

DeMorgan ‘s laws are handy when the complements of events are easier to define and specify than the events themselves. A is a subset of B, denoted A speed of light B, if each result in A is besides in B. For example, if A = { 1, 2 } and B = { 1, 2, 4, 6 }, then A coulomb B. note that any stage set is a subset of itself, A calcium. If A deoxycytidine monophosphate B andBcA, then A =B. Two events are disjoint ( besides known as mutually exclusive ) if they have no outcomes in park, that is, if AB =¢.A collection of events, A ; fori= 1,2, …, is pairwise disjoint if each pair of events is disjoin, i.e., A ; Aj = ¢ for all one fluorine :. j. A collection of events, A ; for i = 1, 2, …, forms a partition of S if the events are pairwise disjoint and the union of all events is the sample space :

A ; Aj = ¢

for one f :. j

00

UA ; =S i= fifty

In the next chapter, we introduce the law of total probability, which uses a partition to divide a problem into pieces, with each Aᵢ representing a piece. Each piece is solved, and the pieces are combined to get the total solution.
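The two partition conditions translate directly into a membership check. The helper below is a sketch of my own for finite collections of events (the name `is_partition` is mine, not the book's):

```python
def is_partition(events, sample_space):
    """True if the events are pairwise disjoint and their union is S."""
    union = set()
    for i, Ai in enumerate(events):
        for Aj in events[i + 1:]:
            if Ai & Aj:  # a shared outcome violates pairwise disjointness
                return False
        union |= Ai
    return union == sample_space

S = set(range(1, 7))
print(is_partition([{1, 2}, {3, 4}, {5, 6}], S))     # a valid partition
print(is_partition([{1, 2}, {2, 3}, {4, 5, 6}], S))  # {2} is shared
```

The first collection passes both tests; the second fails because the outcome 2 appears in two pieces, so the events are not pairwise disjoint.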

1.3 VENN DIAGRAMS

A useful tool for visualizing relationships between sets is the Venn diagram. Typically, Venn diagrams use a box for the sample space and circles (or circle-like figures) for the various events. In Figure 1.1, we show a simple Venn diagram. The outer box, labeled S, denotes the sample space. All outcomes are in S. The two circles, A and B, represent two events. The

FIGURE 1.1 A Venn diagram of A ∪ B.

FIGURE 1.2 A Venn diagram "proof" of the second of DeMorgan's laws (Equation 1.2). The "dark" parts show $\overline{AB} = \bar{A} \cup \bar{B}$, while the "light" parts show $AB = A \cap B$.

shaded area is the union of these two events. One can see that A = AB ∪ AB̄, that B = AB ∪ ĀB, and that A ∪ B = AB̄ ∪ AB ∪ ĀB.

Figure 1.2 presents a simple Venn diagram proof of Equation (1.2). The dark shaded area in the leftmost box represents $\overline{AB}$, and the shaded areas in the two rightmost boxes represent Ā and B̄, respectively. The leftmost box is the logical OR of the two rightmost boxes. On the other hand, the light area on the left is AB. It is the logical AND of A and B.

Figure 1.3 shows a portion of the Venn diagram of A ∪ B ∪ C. The shaded area, representing the union, can be divided into seven pieces. One piece is ABC, another piece is ABC̄, etc. Problem 1.13 asks the reader to complete the picture.

FIGURE 1.3 A Venn diagram of A ∪ B ∪ C. See Problem 1.13 for details.

1.4 RANDOM VARIABLES

It is often convenient to refer to the outcomes of experiments as numbers. For example, it is convenient to refer to "heads" as 1 and "tails" as 0. The faces of most six-sided dice are labeled with pips (dots). We refer to the side with one pip as 1, to the side with two pips as 2, etc.

In other experiments, the mapping is clearer because the outcomes are naturally numbers. A coin can be flipped n times and the number of heads counted. Or a large number


of bits can be transmitted across a wireless communication network and the number of bits received in error counted. A randomly chosen person's height, weight, age, temperature, and blood pressure can be measured. All these quantities are represented by numbers.

Random variables are mappings from outcomes to numbers. We denote random variables with bold-italic uppercase Roman letters (or sometimes Greek letters), such as X and Y, and sometimes with subscripts, such as X₁, X₂, etc. The outcomes are denoted with italic lowercase letters, such as x, y, and n. For example,

X(heads) = 1
X(tails) = 0

Events, sets of outcomes, become relations on the random variables. For example, { heads } = { X ( heads ) =

1 } = { X= 1 }

where we simplify the note and write good { X= 1 }. As another example, let Y denote the number of heads in three flips of a mint. then, versatile events are written as follows : { hhh } = ! Y = 3 } { hht, hth, thh } = ! Y = 2 } { hhh, hht, hth, thh } = { 2 :5 Y :53 } = ! Y = 2 } u { Y = 3 }

In some experiments, the variables are discrete ( for example, counting experiments ), and in others, the variables are continuous ( for example, stature and weight ). In hush others, both types of random variables can be introduce. A person ‘s height and weight unit are continuous quantities, but a person ‘s gender is discrete, say, 0 = male and 1 = female. A crucial distinction is that between the random variable, say, N, and the outcomes, say, k = 0, 1,2, 3. Before the experiment is done, the value of N is obscure. It could be any of the outcomes. After the experiment is done, N is one of the values. The probabilities of N refer to before the experiment ; that is, Pr [ N = potassium ] is the probability the experiment results in the result thousand ( i.e., that result k is the selected result ). discrete random variables are considered in detail in Chapters 4, 5, and 6 and continuous random variables in Chapters 7, 8, and 9.
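The mapping view of a random variable is easy to make concrete. The short Python sketch below (the variable names are mine, not the book's) builds Y, the number of heads in three coin flips, and recovers the event {Y = 2}:

```python
from itertools import product

# All 8 outcomes of three coin flips, as tuples of 'h' and 't'.
outcomes = list(product("ht", repeat=3))

# The random variable Y maps each outcome to its number of heads.
Y = {w: w.count("h") for w in outcomes}

# The event {Y = 2} is the set of outcomes that map to 2.
event_Y2 = {w for w in outcomes if Y[w] == 2}

# With a fair coin, every outcome has probability 1/8.
prob_Y2 = len(event_Y2) / len(outcomes)
print(sorted(event_Y2))  # the three outcomes hht, hth, thh
print(prob_Y2)           # 0.375
```

Note that the event is a set of outcomes, exactly as in the text; the relation {Y = 2} simply selects the outcomes the mapping sends to 2.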

1.5 BASIC PROBABILITY RULES

In this section, we take an intuitive approach to the basic rules of probability. In the next section, we give a more formal approach to the basic rules. When the experiment is performed, one outcome is selected. Any event or events containing that outcome are true; all other events are false. This can be a confusing point: even though only one outcome is selected, many events can be true because many events can contain the selected outcome. For example, consider the experiment of rolling an ordinary six-sided die. The outcomes are the numbers 1, 2, 3, 4, 5, and 6. Let A = {1,2,3,4}, B = {2,4,6}, and C = {2}. Then, if the roll results in a 4, events A and B are true while C is false.


Comment 1.3: The operations of set arithmetic are analogous to those of Boolean algebra. Set union is analogous to Boolean OR, set intersection to Boolean AND, and set complement to Boolean complement. For example, if C = A ∪ B, then C contains the selected outcome if either A or B (or both) contain the selected outcome. Alternatively, we say C is true if A is true or B is true.

Probability is a function of events that yields a number. If A is some event, then the probability of A, denoted Pr[A], is a number; that is,

Pr[A] = number   (1.3)

Probabilities are computed as follows: Each outcome in S is assigned a probability between 0 and 1 such that the sum of all the outcome probabilities is 1. Then, for example, if A = {a1, a2, a3}, the probability of A is the sum of the outcome probabilities in A; that is,

Pr[A] = Pr[a1] + Pr[a2] + Pr[a3]

A probability of 0 means the event does not occur. The empty set ∅, for example, has probability 0, or Pr[∅] = 0, since it has no outcomes. By definition, whatever outcome is selected is not in the empty set. Conversely, the sample space contains all outcomes. It is always true. Probabilities are normalized so that the probability of the sample space is 1:

Pr[S] = 1

The probability of any event A is between 0 and 1; that is, 0 ≤ Pr[A] ≤ 1. Since A ∪ Ā = S, it is reasonable to expect that Pr[A] + Pr[Ā] = 1. This is indeed true and can be handy. Sometimes one of these probabilities, Pr[A] or Pr[Ā], is much easier to compute than the other one. Reiterating, for any event A,

0 ≤ Pr[A] ≤ 1
Pr[A] + Pr[Ā] = 1

The probabilities of nonoverlapping events add: if AB = ∅, then Pr[A ∪ B] = Pr[A] + Pr[B]. If the events overlap (i.e., have outcomes in common), then we must modify the formula to eliminate any double counting. There are two main ways of doing this. The first adds the two probabilities and then subtracts the probability of the overlapping region:

Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[AB]

The second avoids the overlap by breaking the union into nonoverlapping pieces:

Pr[A ∪ B] = Pr[AB̄ ∪ AB ∪ ĀB] = Pr[AB̄] + Pr[AB] + Pr[ĀB]

Both formulas are useful. A crucial notion in probability is that of independence. Independence means two events, A and B, do not affect each other. For example, flip a coin twice, and let A represent the event


the first coin is heads and B the event the second coin is heads. If the two coin flips are done in such a way that the result of the first flip does not affect the second flip (as coin flips are normally done), then we say the two flips are independent. When A and B are independent, the probabilities multiply:

Pr[AB] = Pr[A] Pr[B]   if A and B are independent

Another way of thinking about independence is that knowing A has occurred (or not occurred) does not give us any information about whether B has occurred, and conversely, knowing B does not give us information about A. See Chapter 2 for further discussion of this view of independence.
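The multiplication rule is easy to check by simulation. The sketch below (the helper names and the seed are my own choices) flips two fair coins many times and compares the relative frequency of {both heads} with the product of the individual frequencies:

```python
import random

rng = random.Random(12345)   # fixed seed so the run is reproducible
trials = 100_000

count_A = count_B = count_AB = 0
for _ in range(trials):
    a = rng.random() < 0.5   # event A: first flip is heads
    b = rng.random() < 0.5   # event B: second flip is heads, independent of the first
    count_A += a
    count_B += b
    count_AB += a and b

pr_A = count_A / trials
pr_B = count_B / trials
pr_AB = count_AB / trials
# For independent flips, pr_AB should be close to pr_A * pr_B = 0.25.
print(pr_A, pr_B, pr_AB)
```

Because the two calls to rng.random() do not share any state beyond the generator itself, the empirical Pr[AB] lands near Pr[A]Pr[B], illustrating the multiplication rule.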

Comment 1.4: Sometimes probabilities are expressed as percentages. A probability of 0.5 might be expressed as a 50% chance of occurring. The notation Pr[A] is shorthand for the more complex "probability that event A is true," which itself is shorthand for the even more complex "probability that one of the outcomes in A is the result of the experiment." Similarly, intersections and unions can be thought of in terms of Boolean algebra: Pr[A ∪ B] means "the probability that event A is true or event B is true," and Pr[AB] means "the probability that event A is true and event B is true."

EXAMPLE 1.4

In Example 1.2, we defined two events, A and B, but said nothing about the probabilities. Assume each side of the die is equally likely. Since there are six sides and each side is equally likely, the probability of any one side must be 1/6:

Pr[A] = Pr[{1,2,3,4}]                      (list the outcomes of A)
      = Pr[1] + Pr[2] + Pr[3] + Pr[4]      (break the event into its outcomes)
      = 1/6 + 1/6 + 1/6 + 1/6 = 4/6        (each side equally likely)

Pr[B] = Pr[{2,4,6}] = 3/6 = 1/2

Continuing, A ∪ B = {1,2,3,4,6} and AB = {2,4}. Thus,

Pr[A ∪ B] = Pr[{1,2,3,4,6}] = 5/6          (first, solve directly)
          = Pr[A] + Pr[B] - Pr[AB]         (second, solve with the union formula)
          = 4/6 + 3/6 - 2/6 = 5/6

Alternatively, AB̄ = {1,3}, ĀB = {6}, and

Pr[A ∪ B] = Pr[AB̄] + Pr[AB] + Pr[ĀB] = 2/6 + 2/6 + 1/6 = 5/6

EXAMPLE 1.5

In Example 1.4, we assumed all sides of the die are equally likely. The probabilities do not have to be equally likely. For example, consider the following probabilities:

Pr[1] = 0.5
Pr[k] = 0.1   for k = 2, 3, 4, 5, 6

Then, repeating the above calculations,

Pr[A] = Pr[{1,2,3,4}]                      (list the outcomes of A)
      = Pr[1] + Pr[2] + Pr[3] + Pr[4]      (break the event into its outcomes)
      = 1/2 + 1/10 + 1/10 + 1/10 = 8/10    (unequal probabilities)

Pr[B] = Pr[{2,4,6}] = 3/10

Continuing, A ∪ B = {1,2,3,4,6} and AB = {2,4}. Thus,

Pr[A ∪ B] = Pr[{1,2,3,4,6}] = 9/10         (first, solve directly)
          = Pr[A] + Pr[B] - Pr[AB]         (second, solve with the union formula)
          = 8/10 + 3/10 - 2/10 = 9/10

Alternatively,

Pr[A ∪ B] = Pr[AB̄] + Pr[AB] + Pr[ĀB] = 6/10 + 2/10 + 1/10 = 9/10
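Both union formulas are easy to check mechanically. The sketch below (a check of my own, using exact fractions to avoid roundoff) encodes the unequal die of Example 1.5 and computes Pr[A ∪ B] three ways:

```python
from fractions import Fraction as F

# Outcome probabilities from Example 1.5: Pr[1] = 1/2, Pr[k] = 1/10 otherwise.
P = {1: F(1, 2), 2: F(1, 10), 3: F(1, 10), 4: F(1, 10), 5: F(1, 10), 6: F(1, 10)}

def pr(event):
    # Probability of an event = sum of its outcome probabilities.
    return sum(P[w] for w in event)

A = {1, 2, 3, 4}
B = {2, 4, 6}

direct = pr(A | B)                             # solve directly
union_formula = pr(A) + pr(B) - pr(A & B)      # Pr[A] + Pr[B] - Pr[AB]
disjoint_pieces = pr(A - B) + pr(A & B) + pr(B - A)
print(direct, union_formula, disjoint_pieces)  # 9/10 9/10 9/10
```

The set operations |, &, and - mirror union, intersection, and "A but not B," so the code follows the two formulas in the text line by line.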

1.6 PROBABILITY FORMALIZED

A formal development begins with three axioms. Axioms are truths that are unproved but assumed. We present the three axioms of probability, then use these axioms to prove several basic theorems. The first two axioms are simple, while the third is more complicated:

Axiom 1: Pr[A] ≥ 0 for any event A.


Axiom 2: Pr[S] = 1, where S is the sample space.

Axiom 3: If A_i for i = 1, 2, ... are pairwise disjoint, then

Pr[∪_{i=1}^∞ A_i] = Σ_{i=1}^∞ Pr[A_i]   (1.4)

From these three axioms, the basic theorems about probability are proved. The first axiom states that all probabilities are nonnegative. The second axiom states that the probability of the sample space is 1. Since the sample space contains all possible outcomes (by definition), the result of the experiment (the outcome that is selected) is contained in S. Thus, S is always true, and its probability is 1. The third axiom says that the probabilities of nonoverlapping events add; that is, if two or more events have no outcomes in common, then the probability of the union is the sum of the individual probabilities. Probability is like mass. The first axiom says mass is nonnegative, the second says the mass of the universe is 1, and the third says the masses of nonoverlapping bodies add. In advanced texts, the word "measure" is often used in discussing probabilities. This third axiom is handy in computing probabilities. Consider an event A containing outcomes a1, a2, ..., an. Then,

A = {a1, a2, ..., an} = {a1} ∪ {a2} ∪ ··· ∪ {an}
Pr[A] = Pr[{a1}] + Pr[{a2}] + ··· + Pr[{an}]

since the events {ai} are disjoint. When the context is clear, the awkward notation Pr[{ai}] is replaced by the simpler Pr[ai]. In words, the paradigm is clear: divide the event into its outcomes, compute the probability of each outcome (technically, of the event containing only that outcome), and sum the probabilities to obtain the probability of the event.

Comment 1.5: The third axiom is often presented in a finite form:

Pr[∪_{i=1}^n A_i] = Σ_{i=1}^n Pr[A_i]

when the A_i are pairwise disjoint. A common special case holds for two disjoint events: if AB = ∅, then Pr[A ∪ B] = Pr[A] + Pr[B]. Both of these are special cases of the third axiom (just let the excess A_i = ∅). But, for technical reasons that are beyond this text, the finite version does not imply the infinite version.


1.7 SIMPLE THEOREMS

In this section, we use the axioms to prove a series of "simple theorems" about probabilities. These theorems are so simple that they are often mistaken to be axioms themselves.

Theorem 1.1: Pr[∅] = 0.

Proof:

1 = Pr[S]              (the second axiom)
  = Pr[S ∪ ∅]          (S = S ∪ ∅)
  = Pr[S] + Pr[∅]      (by the third axiom)
  = 1 + Pr[∅]          (by the second axiom)

The last line implies Pr[∅] = 0. ∎

This theorem provides a symmetry to Axiom 2, which states the probability of the sample space is 1. This theorem states the probability of the null space is 0. The next theorem relates the probability of an event to the probability of its complement. The importance lies in the simple observation that one of these probabilities may be easier to compute than the other.

Theorem 1.2: Pr[Ā] = 1 - Pr[A]. In other words, Pr[A] + Pr[Ā] = 1.

Proof: By definition, A ∪ Ā = S. Combining the second and third axioms, one obtains

1 = Pr[S] = Pr[A ∪ Ā] = Pr[A] + Pr[Ā]

A simple rearrangement yields Pr[Ā] = 1 - Pr[A]. ∎

This theorem is useful in practice. Calculate Pr[A] or Pr[Ā], whichever is easier, and then subtract from 1 if necessary.

Theorem 1.3: Pr[A] ≤ 1 for any event A.

Proof: Since 0 ≤ Pr[Ā] (by Axiom 1) and Pr[Ā] = 1 - Pr[A], it follows immediately that Pr[A] ≤ 1. ∎

Combining Theorem 1.3 and Axiom 1, one obtains

0 ≤ Pr[A] ≤ 1


for any event A. This bears repeating: all probabilities are between 0 and 1. One can combine this result with Axiom 2 and Theorem 1.1 to create a simple ordering:

0 = Pr[∅] ≤ Pr[A] ≤ Pr[S] = 1

The probability of the null event is 0 (it contains no outcomes, so the null event can never be true). The probability of the sample space is 1 (it contains all outcomes, so the sample space is always true). All other events are somewhere between 0 and 1 inclusive. While it may seem counterintuitive, it is reasonable in many experiments to define outcomes that have zero probability. Then, nontrivial events, A and B, can be defined such that A ≠ ∅ but Pr[A] = 0 and B ≠ S but Pr[B] = 1. Many probability applications depend on parameters. This theorem provides a sanity check on whether a supposed solution can be correct. For example, let the probability of a head be p. Since probabilities are between 0 and 1, it must be true that 0 ≤ p ≤ 1. Now, one might ask what is the probability of getting three heads in a row? One might guess the answer is 3p, but this answer is obviously incorrect since 3p > 1 when p > 1/3. If the coin flips are independent (discussed below in Section 1.9), the probability of three heads in a row is p³. If 0 ≤ p ≤ 1, then 0 ≤ p³ ≤ 1. This answer is possibly correct: it is between 0 and 1 for all permissible values of p. Of course, lots of incorrect answers are also between 0 and 1. For example, p², p/3, and cos(pπ/2) are all between 0 and 1 for 0 ≤ p ≤ 1, but none is correct.

Theorem 1.4: If A ⊂ B, then Pr[A] ≤ Pr[B].

Proof:

B = (A ∪ Ā)B = AB ∪ ĀB = A ∪ ĀB      (A ⊂ B implies AB = A)
Pr[B] = Pr[A] + Pr[ĀB] ≥ Pr[A]       (since Pr[ĀB] ≥ 0) ∎

Probability is an increasing function of the outcomes in an event. Adding more outcomes to the event may cause the probability to increase, but it will not cause the probability to decrease. (It is possible the extra outcomes have zero probability. Then, the events with and without those extra outcomes have the same probability.)

Theorem 1.5: For any two events A and B,

Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[AB]   (1.5)

This theorem generalizes Axiom 3. It does not require A and B to be disjoint. If they are, then AB = ∅, and the theorem reduces to Axiom 3.


PROOF: This proof uses a series of basic results from set arithmetic. First,

A = AS = A(B ∪ B̄) = AB ∪ AB̄

Second, for B,

B = BS = B(A ∪ Ā) = AB ∪ ĀB

and

A ∪ B = AB̄ ∪ AB ∪ AB ∪ ĀB = AB̄ ∪ AB ∪ ĀB

Thus,

Pr[A ∪ B] = Pr[AB̄] + Pr[AB] + Pr[ĀB]   (1.6)

Similarly,

Pr[A] + Pr[B] = Pr[AB̄] + Pr[AB] + Pr[AB] + Pr[ĀB] = Pr[A ∪ B] + Pr[AB]

Rearranging the last equation yields the theorem: Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[AB]. (Note that Equation 1.6 is a useful alternative to this theorem. In some applications, Equation 1.6 is easier to use than Equation 1.5.) ∎

The following theorem is known as the inclusion-exclusion formula. It is a generalization of the previous theorem. Logically, the theorem fits in this section, but it is not a "small" theorem given the complexity of its statement.

Theorem 1.6 (Inclusion-Exclusion Formula): For any events A1, A2, ..., An,

Pr[A1 ∪ A2 ∪ ··· ∪ An] = Σ_{i=1}^n Pr[A_i] - Σ_{i=1}^n Σ_{j=1}^{i-1} Pr[A_i A_j] + Σ_{i=1}^n Σ_{j=1}^{i-1} Σ_{k=1}^{j-1} Pr[A_i A_j A_k] - ··· ± Pr[A1 A2 ··· An]

This theorem is actually easier to state in words than in symbols: the probability of the union is the sum of all the individual probabilities, minus the sum of all pair probabilities, plus the sum of all triple probabilities, etc., until the last term, which is the probability of the intersection of all events.

PROOF: This proof is by induction. Induction proofs consist of two major steps: the basis step and the inductive step. An analogy is climbing a ladder. To climb a ladder, one must first get on the ladder. This is the basis step. Once on, one must be able to climb from the (n-1)-st step to the nth step. This is the inductive step. One gets on the first step, then climbs to the second, then climbs to the third, and so on, as high as one desires. The basis step is Theorem 1.5. Let A = A1 and B = A2; then Pr[A1 ∪ A2] = Pr[A1] + Pr[A2] - Pr[A1 A2].


The inductive step assumes the theorem is true for n-1 events. Given the theorem is true for n-1 events, one must show that it is then true for n events. The argument is as follows (though we skip some of the more tedious steps). Let A = A1 ∪ A2 ∪ ··· ∪ A_{n-1} and B = A_n. Then, Theorem 1.5 yields

Pr[A1 ∪ A2 ∪ ··· ∪ An] = Pr[A ∪ B]
  = Pr[A] + Pr[B] - Pr[AB]
  = Pr[A1 ∪ A2 ∪ ··· ∪ A_{n-1}] + Pr[An] - Pr[(A1 ∪ A2 ∪ ··· ∪ A_{n-1})An]

This last equation contains everything needed. The first term is expanded using the inclusion-exclusion theorem (which is true by the inductive assumption). Similarly, the last term is expanded the same way. Finally, the terms are regrouped into the pattern in the theorem's statement. But we skip these tedious steps. ∎

While the inclusion-exclusion theorem is true (it is a theorem after all), it may be tedious to keep track of which outcomes are in which intersecting events. One alternative to the inclusion-exclusion theorem is to write the complicated event as a union of disjoint events.

EXAMPLE 1.6

Consider the union A ∪ B ∪ C. Write it as follows:

A ∪ B ∪ C = ABC ∪ ABC̄ ∪ AB̄C ∪ ĀBC ∪ AB̄C̄ ∪ ĀBC̄ ∪ ĀB̄C

Pr[A ∪ B ∪ C] = Pr[ABC] + Pr[ABC̄] + Pr[AB̄C] + Pr[ĀBC] + Pr[AB̄C̄] + Pr[ĀBC̄] + Pr[ĀB̄C]

In some problems, the probabilities of these events are easier to calculate than those in the inclusion-exclusion formula. Also, note that it may be easier to use DeMorgan's first law (Eq. 1.1) and Theorem 1.2:

Pr[A ∪ B ∪ C] = 1 - Pr[ĀB̄C̄]

EXAMPLE 1.7

Let us look at the various ways we can compute Pr[A ∪ B]. For the experiment, select one card from a well-shuffled deck. Each card is equally likely, with probability equal to 1/52. Let A = {card is a diamond} and B = {card is a Queen}. The event A ∪ B is the event the card is a diamond or is a Queen (Q). There are 16 cards in this event (13 diamonds plus 3 additional Q's). The straightforward solution is

Pr[A ∪ B] = Pr[one of 16 cards selected] = 16/52

Using Theorem 1.5, AB = {card is the Q of diamonds}, and

Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[AB] = 13/52 + 4/52 - 1/52 = 16/52
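Example 1.6's three routes to Pr[A ∪ B ∪ C] (direct, inclusion-exclusion, and the complement via DeMorgan's law) all give the same answer, which is easy to check mechanically. A small sketch (the fair die and the three events are my own choices for illustration):

```python
from fractions import Fraction as F

S = set(range(1, 7))     # fair die: each outcome has probability 1/6
def pr(E):
    return F(len(E), 6)

A, B, C = {1, 2, 3}, {2, 4}, {3, 4}

direct = pr(A | B | C)

# Inclusion-exclusion (Theorem 1.6 for three events).
incl_excl = (pr(A) + pr(B) + pr(C)
             - pr(A & B) - pr(A & C) - pr(B & C)
             + pr(A & B & C))

# Complement route: DeMorgan's law plus Theorem 1.2.
complement = 1 - pr((S - A) & (S - B) & (S - C))

print(direct, incl_excl, complement)  # 2/3 2/3 2/3
```

Here A ∪ B ∪ C = {1,2,3,4}, so all three computations return 4/6 = 2/3.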

Note that the "if" statement above compares one number to another number (so it is either always true or always false, depending on the values of p, ε, and v). In the common special case when ε = v, the rule simplifies further to set X̂(Y = 1) = 1 if p > ε. In other words, if the most common route to the observation Y = 1 is across the top, set X̂(Y = 1) = 1; if the most common route to Y = 1 starts at X = 0 and goes up, set X̂(Y = 1) = 0. This example illustrates an important problem in statistical inference: hypothesis testing. In this case, there are two hypotheses. The first is that X = 1, and the second is that X = 0. The receiver receives Y and must decide which hypothesis is true. Often Bayes theorem and the LTP are crucial tools in calculating the decision rule. Hypothesis testing is discussed further in Section 2.5 and in Chapter 12.

CHAPTER 2 CONDITIONAL PROBABILITY

2.5 EXAMPLE: DRUG TESTING

Many employers test potential employees for illegal drugs. Fail the test, and the applicant does not get the job. Unfortunately, no test, whether for drugs, pregnancy, disease, incoming missiles, or anything else, is perfect. Some drug users will pass the test, and some nonusers will fail. Therefore, some drug users will in fact be hired, while other perfectly innocent people will be refused a job. In this section, we will analyze this effect using hypothetical tests. This text is not about blood or urine chemistry. We will not delve into how drug tests actually work, or into actual failure rates, but we will see how the general problem arises. Assume the test, T, outputs one of two values. T = 1 indicates the person is a drug user, while T = 0 indicates the person is not a drug user. Let U = 1 if the person is a drug user and U = 0 if the person is not a drug user. Note that T is the (possibly faulty) result of the test and that U is the correct value. The performance of a test is defined by conditional probabilities. The false positive (or false alarm) rate is the probability of the test indicating the person is a drug user when in fact that person is not, or Pr[T = 1 | U = 0]. The false negative (or miss) rate is the probability the test indicates the person is not a drug user when in fact that person is, or Pr[T = 0 | U = 1]. A successful result is either a true positive or a true negative. See Table 2.1, below.

TABLE 2.1 A simple confusion matrix showing the relationships between true and false positives and negatives.

          U = 1             U = 0
T = 1     True Positive     False Positive
T = 0     False Negative    True Negative

Now, a miss is unfortunate for the employer (it is arguable how deleterious a drug user might be in most jobs), but a false alarm can be devastating to the applicant. The applicant is not hired, and his or her reputation may be ruined. Most tests try to keep the false positive rate low. Pr[U = 1] is known as the a priori probability that a person is a drug user. Pr[U = 1 | T = 1] and Pr[U = 1 | T = 0] are known as the a posteriori probabilities of the person being a drug user. For a perfect test, Pr[T = 1 | U = 1] = 1 and Pr[T = 1 | U = 0] = 0. However, tests are not perfect. Let us assume the false positive rate is ε, or Pr[T = 1 | U = 0] = ε; the false negative rate is v, or Pr[T = 0 | U = 1] = v; and the a priori probability that a given person is a drug user is Pr[U = 1] = p. For a good test, both ε and v are fairly small, often 0.05 or less. However, even 0.05 (i.e., the test is 95% accurate) may not be good enough if one is among those falsely accused. One important question is what is the probability that a person is not a drug user given that he or she failed the test, or Pr[U = 0 | T = 1]. Using Bayes theorem,

Pr[U = 0 | T = 1] = Pr[T = 1 | U = 0] Pr[U = 0] / Pr[T = 1]
                  = Pr[T = 1 | U = 0] Pr[U = 0] / (Pr[T = 1 | U = 0] Pr[U = 0] + Pr[T = 1 | U = 1] Pr[U = 1])
                  = ε(1 - p) / (ε(1 - p) + (1 - v)p)

This formula is hard to appreciate, so let us put in some numbers. First, assume that the false alarm rate equals the miss rate, or ε = v (often tests that minimize the total number of errors have this property). In Figure 2.3, we plot Pr[U = 0 | T = 1] against Pr[U = 1] for two values of false positive and false negative rates. The first curve is for ε = v = 0.05, and the second curve is for ε = v = 0.20. One can see that when p is small, the probability that a person who fails the test is nevertheless innocent is large.
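The posterior formula above is one line of code. The sketch below (the function name is mine) evaluates Pr[U = 0 | T = 1] for a 95%-accurate test:

```python
def prob_innocent_given_positive(p, eps, v):
    # Pr[U=0 | T=1] = eps*(1-p) / (eps*(1-p) + (1-v)*p), from Bayes theorem, where
    #   p   = a priori probability of being a drug user, Pr[U=1]
    #   eps = false positive rate, Pr[T=1 | U=0]
    #   v   = false negative rate,  Pr[T=0 | U=1]
    return eps * (1 - p) / (eps * (1 - p) + (1 - v) * p)

# With eps = v = 0.05 and 5% of the population using drugs, fully half of
# the people who fail the test are innocent.
print(prob_innocent_given_positive(p=0.05, eps=0.05, v=0.05))  # 0.5
```

Lowering p while holding the error rates fixed drives the posterior toward 1, which is exactly the behavior plotted in Figure 2.3.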

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1

FIGURE 3.6 Pascal's triangle for n up to 5. Each number is the sum of the two numbers above it. The rows are the binomial coefficients for a given value of n.

3.4 THE BINOMIAL THEOREM

The binomial coefficients are so named because they are central to the binomial theorem.

Theorem 3.1 (Binomial Theorem): For integer n ≥ 0,

(a + b)^n = Σ_{k=0}^n C(n,k) a^k b^{n-k}   (3.7)

The binomial theorem is usually proved recursively. The basis, n = 0, is trivial: the theorem reduces to 1 = 1. The recursive step assumes the theorem is true for n-1 and uses this to show the theorem is true for n. The proof starts like this:

(a + b)^n = (a + b)(a + b)^{n-1}   (3.8)


Now, substitute Equation (3.7) into Equation (3.8):

Σ_{k=0}^n C(n,k) a^k b^{n-k} = (a + b) Σ_{k=0}^{n-1} C(n-1,k) a^k b^{n-1-k}   (3.9)

The rest of the proof is to use Equation (3.6) and rearrange the right-hand side to look like the left. The binomial theorem gives some simple properties of binomial coefficients:

0 = ((-1) + 1)^n = Σ_{k=0}^n C(n,k) (-1)^k 1^{n-k} = C(n,0) - C(n,1) + C(n,2) - ··· + (-1)^n C(n,n)   (3.10)

2^n = (1 + 1)^n = Σ_{k=0}^n C(n,k) 1^k 1^{n-k} = C(n,0) + C(n,1) + C(n,2) + ··· + C(n,n)   (3.11)

We can check Equations (3.10) and (3.11) using the first few rows of Pascal's triangle:

0 = 1 - 1 = 1 - 2 + 1 = 1 - 3 + 3 - 1 = 1 - 4 + 6 - 4 + 1
2^1 = 2 = 1 + 1
2^2 = 4 = 1 + 2 + 1
2^3 = 8 = 1 + 3 + 3 + 1
2^4 = 16 = 1 + 4 + 6 + 4 + 1

Since all the binomial coefficients are nonnegative, Equation (3.11) gives a bound on the binomial coefficients:

C(n,k) ≤ 2^n   for all k = 0, 1, ..., n   (3.12)

The binomial theorem is central to the binomial distribution, the subject of Chapter 6.
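Equations (3.10) through (3.12) are quick to verify numerically with Python's math.comb (a check of my own, not from the text):

```python
from math import comb

for n in range(1, 16):
    # (3.10): the alternating sum of the nth row is 0 (for n >= 1).
    assert sum((-1) ** k * comb(n, k) for k in range(n + 1)) == 0
    # (3.11): the plain sum of the nth row is 2^n.
    assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n
    # (3.12): every coefficient in the row is bounded by 2^n.
    assert all(comb(n, k) <= 2 ** n for k in range(n + 1))

print("identities hold for n = 1, ..., 15")
```

The same loop, run with larger n, also illustrates how loose the 2^n bound is away from k = n/2.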

3.5 MULTINOMIAL COEFFICIENT AND THEOREM

The binomial coefficient tells us the number of ways of dividing n distinct objects into two piles. What if we want to divide the n objects into more than two piles? This is the multinomial coefficient. In this section, we introduce the multinomial coefficient and show how it helps analyze various card games. How many ways can n distinct objects be divided into three piles with k1 in the first pile, k2 in the second, and k3 in the third, with k1 + k2 + k3 = n? This is a multinomial coefficient and is denoted C(n; k1, k2, k3). We develop this count in two stages. First, the n objects are divided into two piles, the first with k1 objects and the second with n - k1 = k2 + k3 objects. Then, the second pile is divided into two piles with k2 in one pile and k3 in the other. The number of ways of doing this is the product of the two binomial coefficients:

C(n; k1, k2, k3) = C(n, k1) C(n - k1, k2) = n! / (k1! k2! k3!)

If there are more than three piles, the formula extends simply:

C(n; k1, k2, ..., km) = n! / (k1! k2! ··· km!)   (3.13)

The binomial coefficient can be written in this form as well: C(n, k) = C(n; k, n-k). The binomial theorem is extended to the multinomial theorem:

Theorem 3.2 (Multinomial Theorem):

(a1 + a2 + ··· + am)^n = Σ C(n; k1, k2, ..., km) a1^{k1} a2^{k2} ··· am^{km}   (3.14)

where the sum is taken over all values of ki ≥ 0 such that k1 + k2 + ··· + km = n. The sum seems confusing but really is not too bad. Consider (a + b + c)²:

(a + b + c)² = C(2; 2,0,0) a²b⁰c⁰ + C(2; 0,2,0) a⁰b²c⁰ + C(2; 0,0,2) a⁰b⁰c²
             + C(2; 1,1,0) a¹b¹c⁰ + C(2; 1,0,1) a¹b⁰c¹ + C(2; 0,1,1) a⁰b¹c¹
             = a² + b² + c² + 2ab + 2ac + 2bc

Let a1 = a2 = ··· = am = 1. Then,

m^n = Σ C(n; k1, k2, ..., km)

For example,

3² = C(2; 2,0,0) + C(2; 0,2,0) + C(2; 0,0,2) + C(2; 1,1,0) + C(2; 1,0,1) + C(2; 0,1,1)
   = 1 + 1 + 1 + 2 + 2 + 2
   = 9   (3.15)

In the game of bridge, an ordinary 52-card deck of cards is dealt into four hands, each with 13 cards. The number of ways this can be done is

C(52; 13, 13, 13, 13) = 5.4 × 10^28
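Both the check in Equation (3.15) and the bridge count are easy to reproduce (the helper function below is mine):

```python
from math import factorial, prod

def multinomial(*ks):
    # C(n; k1, ..., km) = n! / (k1! k2! ... km!), with n = k1 + ... + km.
    return factorial(sum(ks)) // prod(factorial(k) for k in ks)

# Check Equation (3.15): the coefficients for n = 2, m = 3 sum to 3^2 = 9.
coeffs = [multinomial(*ks) for ks in
          [(2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1)]]
print(coeffs, sum(coeffs))  # [1, 1, 1, 2, 2, 2] 9

# Number of ways to deal a 52-card deck into four 13-card bridge hands.
deals = multinomial(13, 13, 13, 13)
print(f"{deals:.2e}")  # about 5.4e+28
```

Integer division (//) keeps the result exact; the quotient is always an integer because the multinomial coefficient counts arrangements.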

3.6 THE BIRTHDAY PARADOX AND MESSAGE AUTHENTICATION

A classic probability paradox is the birthday problem: How large must k be for a group of k people to be likely to have at least two people with the same birthday? In this section, we solve this problem and show how it relates to the problem of transmitting secure messages. We assume three things:

• Years have 365 days. Leap years have 366, but they occur only once every four years.
• Birthdays are independent. Any one person's birthday does not affect anyone else's. In particular, we assume the group of people includes no twins, triplets, etc.
• All days are equally likely to be birthdays. This is not quite true, but the assumption spreads out the birthdays and minimizes the possibility of common birthdays. Births (in the United States anyway) are about 10% higher in the late summer than in winter.¹ There are fewer births on weekends and holidays. Ironically, "Labor Day" (in September) has a relatively low birthrate.

We also take "likely" to mean a probability of 0.5 or more. How large must k be for a group of k people to have a probability of at least 0.5 of having at least two with the same birthday? Let A_k be the event that at least two people out of k have the same birthday. This event is complicated. Multiple pairs of people could have common birthdays, triples of people, etc. The complementary event, that no two people have the same birthday, is much simpler. Let q(k) = Pr[Ā_k] = Pr[no common birthday with k people]. Then, q(1) = 1 since one person cannot have a pair. What about q(2)? The first person's birthday is one of 365 days. The second person's birthday differs from the first's if it is one of the remaining 364 days. The probability of this happening is 364/365:

q(2) = q(1) · 364/365 = 364/365

1. National Vital Statistics Reports, vol. 55, no. 1 (September 2006). http://www.cdc.gov


The third person does not match either of the first two with probability 363/365:

q(3) = q(2) · 363/365 = (364/365) · (363/365)

We can continue this process and get a recursive formula for q(k):

q(k) = q(k-1) · (365 + 1 - k)/365 = 365 · 364 ··· (366 - k) / 365^k = (365)_k / 365^k   (3.16)

Note that the numerator is the number of ways of making a permutation of k objects (days) taken from n = 365, and the denominator is the number of ways of making an ordered with-replacement choice of k objects from 365. How large does k have to be so that q(k) < 0.5? A little arithmetic shows that k = 23 suffices since q(22) = 0.524 and q(23) = 0.493. The q(k) sequence is shown in Figure 3.7.

FIGURE 3.7 The probability of no pair of birthdays for k people in a year with 365 days.

Why is such a small number of people sufficient? Most people intuitively assume the number would be much closer to 182 = 365/2. To see why a small number of people is sufficient, first we generalize and let n denote the number of days in a year:

q(k) = (n)_k / n^k
     = n(n-1)(n-2)···(n-k+1) / (n · n · n ··· n)
     = (1 - 1/n)(1 - 2/n)(1 - 3/n)···(1 - (k-1)/n)

The product of all these terms is awkward to manipulate. However, taking logs converts those multiplications to additions:

log(q(k)) = log(1 - 1/n) + log(1 - 2/n) + ··· + log(1 - (k-1)/n)

Now, use an important fact about logs: when x is small,

log(1 + x) ≈ x   for x ≈ 0   (3.17)

This is shown in Figure 3.8. (We use this approximation several more times in later chapters.)

FIGURE 3.8 Plot showing log(1 + x) ≈ x when x is small.

Since k ≪ n, we can use this approximation:

-log(q(k)) ≈ 1/n + 2/n + ··· + (k-1)/n = (1/n) Σ_{l=0}^{k-1} l = k(k-1)/(2n)

Finally, invert the logarithm:

q(k) ≈ exp(-k(k-1)/(2n))   (3.18)

When k = 23 and n = 365, the approximation evaluates to 0.50 (actually 0.499998), close to the actual 0.493.
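The recursion (3.16) and the approximation (3.18) can be compared directly (a short check of my own):

```python
from math import exp

def q_exact(k, n=365):
    # Equation (3.16): probability that k people all have distinct birthdays.
    prob = 1.0
    for i in range(k):
        prob *= (n - i) / n
    return prob

def q_approx(k, n=365):
    # Equation (3.18): q(k) is approximately exp(-k(k-1) / (2n)).
    return exp(-k * (k - 1) / (2 * n))

# k = 23 is the first k with q(k) < 0.5.
print(round(q_exact(22), 3), round(q_exact(23), 3))  # 0.524 0.493
print(round(q_approx(23), 3))                        # 0.5
```

The approximation is within about 0.007 of the exact value at k = 23, consistent with the text's 0.499998 versus 0.493.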
Alternatively, we can set q(k) = 0.5 and solve for k:

0.5 = q(k) ≈ exp(-k(k-1)/(2n))
log(0.5) = -0.693 ≈ -k(k-1)/(2n)   (3.19)

Thus, k(k-1)/2 ≈ 0.693n. For n = 365, k(k-1)/2 ≈ 253. Hence, k = 23. The intuition in the birthday problem should be that k people define k(k-1)/2 pairs of people and, at most, about 0.693n pairs can have different birthdays.

Comment 3.3: We use "log" or "log_e" to denote the natural logarithm. Some texts and many calculators use "ln" instead. That is, if y = e^x, then x = log(y). Later, we need log_10 and log_2. If y = 10^x, then x = log10(y), and if y = 2^x, then x = log2(y). See also Comment 5.9.

EXAMPLE 3.3

The birthday problem is a convenient example to introduce simulation, a computational procedure that mimics the randomness in real systems. The Python command trial = randint(n, size=k) (from numpy.random) returns a vector of k random integers. All integers between 0 and n-1 are equally likely. These are the birthdays. The command unique(trial) returns a vector with the unique elements of trial. If the size of the unique vector equals k, then no birthdays are repeated. Sample Python code follows:

    from numpy import sqrt, unique
    from numpy.random import randint

    k, n = 23, 365
    numtrials = 10000
    count = 0
    for i in range(numtrials):
        trial = randint(n, size=k)           # k random birthdays
        count += (unique(trial).size == k)   # 1 if no repeats in this trial
    phat = count / numtrials
    std = 1.0 / sqrt(2 * numtrials)
    print(phat, phat - 1.96 * std, phat + 1.96 * std)

A sequence of 10 trials with n = 8 and k = 3 looks like this: [0 7 6], [3 6 6], [5 2 0], [0 4 4], [5 6 0], [2 5 6], [6 2 5], [1 7 0], [2 2 7], [6 0 3]. We see the second, fourth, and ninth trials have repeated birthdays; the other seven do not. The probability of no repeated birthday is estimated as p̂ = 7/10 = 0.7. The exact probability is p = 1 · (7/8) · (6/8) = 0.66.
The confidence interval is an estimate of the range of values in which we expect to find the correct value. If N is large enough, we expect the correct value to be in the interval (phat - 1.96/√(2N), phat + 1.96/√(2N)) approximately 95% of the time (where N is the number of trials). In 10,000 trials with n = 365 and k = 23, we observe 4868 trials with no repeated birthdays, giving an estimate phat = 4868/10000 = 0.487, which is reasonably close to the exact value p = 0.493. The confidence interval is (0.473, 0.501), which does indeed contain the correct value. More information on probability estimates and confidence intervals can be found in Sections 5.6, 6.6, 8.9, 9.7.3, 10.1, 10.6, 10.7, and 10.8.

What does the birthday problem have to do with secure communications? When security is important, users often attach a message authentication code (MAC) to a message. The MAC is a b-bit signature with the following properties:

• The MAC is computed from the message and a password shared by sender and receiver.
• Two different messages should, with high probability, have different MACs.
• The MAC algorithm should be one-way. It should be relatively easy to compute a MAC from a message but difficult to compute a message that has a particular MAC.

Various MAC algorithms exist. These algorithms use cryptographic primitives. Typical values of b are 128, 196, and 256. The probability that two different messages have the same MAC is small, approximately 2^-b.

When Alice wants to send a message (for example, a legal contract) to Bob, Alice computes a MAC and sends it along with the message. Bob receives both and computes a MAC from the received message. He then compares the computed MAC to the received one. If they are the same, he concludes the received message is the same as the one sent. However, if the MACs disagree, he rejects the received message. The sequence of steps looks like this:

1.
Alice and Bob share a secret key, K.
2. Alice has a message, M. She computes a MAC h = H(M, K), where H is a MAC function.
3. Alice transmits M and h to Bob.
4. Bob receives M' and h'. These possibly differ from M and h due to channel noise or an attack by an enemy.
5. Bob computes h'' = H(M', K) from the received message and the secret key.
6. If h'' = h', Bob assumes M' = M. If not, Bob rejects M'.

If, however, Alice is dishonest, she may try to deceive Bob with a birthday attack. She computes a large number k of messages, trying to find two with the same MAC. Of the two with the same MAC, one contract is favorable to Bob, and one cheats Bob. Alice sends the favorable contract to Bob, and Bob approves it. Sometime later, Alice produces the cheating contract and falsely accuses Bob of reneging. She argues that Bob approved the cheating contract because the MAC matches the one he approved.

How many contracts must Alice produce before she finds two that match? The approximation (Equation 3.18) answers this question. With n = 2^b and k large, the approximation indicates (ignoring multiplicative constants that are close to 1) that k ≈ √n = 2^(b/2). The birthday attack is much more effective than trying to match a specific MAC. For b = 128, Alice would have to create about n/2 = 1.7 × 10^38 contracts to match a specific MAC. However, to find any two that match, she has to create only about k ≈ √n = 1.8 × 10^19 contracts, a factor of 10^19 faster. Making birthday attacks difficult is one important reason why b is normally chosen to be relatively large.

3.7 HYPERGEOMETRIC PROBABILITIES AND CARD GAMES

Hypergeometric probabilities use binomial coefficients to calculate probabilities of experiments that involve selecting items from several groups. In this section, we develop the hypergeometric probabilities and show how they are used to analyze various card games.
Consider the task of making an unordered selection without replacement of k1 items from a group of n1 items, selecting k2 items from another group of n2 items, etc., through km items from nm items. The number of ways this can be done is the product of m binomial coefficients:

C(n1, k1) C(n2, k2) ··· C(nm, km)    (3.20)

Consider an unordered without replacement selection of k items from n. Let the n items be divided into groups with n1, n2, ..., nm in each. Similarly, divide the selection as k1, k2, ..., km. If all selections are equally likely, then the probability of this particular selection is a hypergeometric probability:

Pr[k1, k2, ..., km] = C(n1, k1) C(n2, k2) ··· C(nm, km) / C(n, k)    (3.21)

The probability is the number of ways of making this particular selection divided by the number of ways of making any selection.

EXAMPLE 3.4: Consider a box with four red marbles and five blue marbles. The box is shaken, and someone reaches in and blindly selects six marbles. The probability of two red marbles and four blue ones is

Pr[2 red and 4 blue marbles] = C(4,2) C(5,4) / C(9,6) = (6·5) / ((9·8·7)/(3·2·1)) = 30/84 = 5/14
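Equation (3.21) reduces Example 3.4 to one line with the standard library's math.comb (our check, not the text's code):

```python
# Pr[2 red and 4 blue] = C(4,2) C(5,4) / C(9,6), kept exact with Fraction.
from math import comb
from fractions import Fraction

p = Fraction(comb(4, 2) * comb(5, 4), comb(9, 6))
print(p)   # 5/14
```

Using Fraction keeps the answer exact, so it can be compared directly with the hand calculation.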

For the remainder of this section, we illustrate hypergeometric probabilities with some poker hands. In most versions of poker, each player must make his or her best five-card hand from some number of cards dealt to the player and some number of community cards. For example, in Seven-Card Stud, each player is dealt seven cards, and there are no community cards. In Texas Hold'em, each player is dealt two cards, and there are five community cards. The poker hands are listed here from best to worst (assuming no wild cards are being used):

Straight Flush: The five best cards are all of the same suit (a flush) and in rank order (a straight). For example, 5

A full house is three cards of one rank and two cards of another. For example, the probability of being dealt three queens and two 8s in five cards is

Pr[three Qs, two 8s] = C(4,3) C(4,2) / C(52,5) = (4·6) / 2,598,960 = 9.2 × 10^-6
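This probability can be confirmed with math.comb (our check):

```python
# Pr[three Qs, two 8s] = C(4,3) * C(4,2) / C(52,5)
from math import comb

p = comb(4, 3) * comb(4, 2) / comb(52, 5)
print(p)   # about 9.2e-06
```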

Comment 3.4: Sometimes it is easier to remember this formula if it is written as

Pr[three Qs, two 8s] = C(4,3) C(4,2) C(44,0) / C(52,5)

The idea is that the 52 cards are divided into three groups: Q's, 8's, and Others. Select three Q's, two 8's, and zero Others. The mnemonic is that 3 + 2 + 0 = 5 and 4 + 4 + 44 = 52.

The number of different full houses is (13)_2 = 156 (13 ranks for the triple and 12 ranks for the pair). Consequently, the probability of getting any full house is

Pr[any Full House] = 156 · 9.2 × 10^-6 = 1.4 × 10^-3 ≈ 1 in 700 hands

If the player chooses his or her best five-card hand from seven cards, the probability of a full house is much higher (about 18 times higher). However, the calculation is much more involved. First, list the different ways of getting a full house in seven cards. The most obvious way is to get three cards of one rank, two of another, one of a third rank, and one of a fourth. The second way is to get three cards in one rank, two in a second rank, and two in a third rank. (This full house is the triple and the higher-ranked pair.) The third way is to get three cards of one rank, three cards of a second rank, and one card in a third rank. (This full house is the higher-ranked triple and a pair from the lower triple.) We use the shorthand 3211, 322, and 331 to describe these three possibilities.


Second, calculate the number of ways of getting a full house. Let n(3211) be the number of ways of getting a specific 3211 full house, N(3211) the number of ways of getting any 3211 full house, and similarly for 322 and 331 full houses. Since there are four cards in each rank,

n(3211) = C(4,3) C(4,2) C(4,1) C(4,1) = 4·6·4·4 = 384
n(322) = C(4,3) C(4,2) C(4,2) = 4·6·6 = 144
n(331) = C(4,3) C(4,3) C(4,1) = 4·4·4 = 64

Consider a 3211 full house. There are 13 ways of selecting the rank for the triple, 12 ways of selecting the rank for the pair (since one rank is unavailable), and C(11,2) = 55 ways of selecting the two ranks for the single cards. Therefore,

N(3211) = 13 · 12 · C(11,2) · n(3211) = 8580 · 384 = 3,294,720

Comment 3.5: The number of ways of selecting the four ranks is not C(13,4) since the ranks are not equivalent. That is, three Q's and two K's is different from two Q's and three K's. The two extra ranks are equivalent, however. In other words, three Q's, two K's, one A, and one J is the same as three Q's, two K's, one J, and one A.

For a 322 full house, there are 13 ways of selecting the rank of the triple and C(12,2) = 66 ways of selecting the ranks of the pairs:

N(322) = 13 · C(12,2) · n(322) = 858 · 144 = 123,552

Finally, for a 331 full house, there are C(13,2) = 78 ways of selecting the ranks of the two triples and 11 ways of selecting the rank of the single card:

N(331) = C(13,2) · 11 · n(331) = 858 · 64 = 54,912

Now, let N denote the total number of full houses:

N = N(3211) + N(322) + N(331) = 3,294,720 + 123,552 + 54,912 = 3,473,184
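These counts are easy to reproduce in a few lines; the variable names n3211, N3211, and so on mirror the text's n(3211) and N(3211):

```python
# Seven-card full house count: per-pattern card choices n(.) times the
# number of rank selections gives N(3211), N(322), and N(331).
from math import comb

n3211 = comb(4, 3) * comb(4, 2) * comb(4, 1) * comb(4, 1)   # 384
n322  = comb(4, 3) * comb(4, 2) * comb(4, 2)                # 144
n331  = comb(4, 3) * comb(4, 3) * comb(4, 1)                # 64

N3211 = 13 * 12 * comb(11, 2) * n3211   # 3,294,720
N322  = 13 * comb(12, 2) * n322         # 123,552
N331  = comb(13, 2) * 11 * n331         # 54,912

N = N3211 + N322 + N331
print(N, N / comb(52, 7))   # 3473184 and about 0.02596
```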


Third, the probability of a full house is N divided by the number of ways of selecting seven cards from 52:

Pr[full house] = 3,473,184 / C(52,7) = 3,473,184 / 133,784,560 = 0.02596 ≈ 1 in 38.5 hands

So, getting a full house in a seven-card poker game is about 18 times as likely as getting one in a five-card game. Incidentally, the conditional probability of a 3211 full house given one has a full house is

Pr[3211 full house | any full house] = 3,294,720 / 3,473,184 = 0.95

About 19 of every 20 full houses are of the 3211 variety.

Comment 3.6: We did not have to consider the (truly improbable) hand of three cards in one rank and four cards in another rank. This hand is not a full house. It is four of a kind, and four of a kind beats a full house. See also Problem 3.30.

In Texas Hold'em, each player is initially dealt two cards. The best starting hand is two Aces, the probability of which is

Pr[two Aces] = C(4,2) / C(52,2) = 6/1326 = 1/221

The worst starting hand is considered to be the "7 2 off-suit," or a 7 and a 2 in two different suits. (This hand is unlikely to make a straight or a flush, and any pairs it might make would likely lose to other, higher, pairs.) There are four ways to choose the suit for the 7 and three ways to choose the suit for the 2 (since the suits must differ), giving a probability of

Pr[7 2 off-suit] = (4·3) / C(52,2) = 12/1326 = 2/221

In a cruel irony, getting the worst possible starting hand is twice as likely as getting the best starting hand.
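Both starting-hand probabilities can be confirmed by direct counting (a small sketch of our own):

```python
# Pairs of Aces are C(4,2) of the C(52,2) = 1326 possible deals;
# 7-2 off-suit has 4 suit choices for the 7 and 3 for the 2.
from math import comb
from fractions import Fraction

deals = comb(52, 2)                      # 1326 two-card starting hands
p_aces = Fraction(comb(4, 2), deals)     # 1/221
p_72o  = Fraction(4 * 3, deals)          # 2/221
print(p_aces, p_72o)
```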

Comment 3.7: Probability analysis can answer many poker questions, but not all, including some truly important ones. Because of the betting sequence and the fact that a player's cards are hidden from the other players, simple questions like "Do I have a winning hand?" often cannot be answered with simple probabilistic analysis. In most cases, the answer depends on the playing decisions made by the other players at the table.


In many experiments, the outcomes are equally likely. The probability of an event then is just the number of outcomes in the event divided by the number of possible outcomes. Accordingly, we need to know how to count the outcomes.

Consider a selection of k items from n items. The selection is ordered if the order of the selection matters (i.e., if ab differs from ba). Otherwise, it is unordered. The selection is with replacement if each item can be selected multiple times (i.e., if aa is possible). Otherwise, the selection is without replacement. The four possibilities for a selection of k items from n are considered below:

• Ordered With Replacement: The number of selections is n^k.
• Ordered Without Replacement: These are called permutations. The number of permutations is (n)_k = n!/(n-k)!, where n! = n(n-1)(n-2)···2·1, 0! = 1, and n! = 0 for n < 0. n! is pronounced "n factorial."
• Unordered Without Replacement: These are combinations. The number of combinations of k items selected from n is C(n,k). This count is also known as a binomial coefficient:

C(n,k) = n! / ((n-k)! k!) = C(n, n-k)

• Unordered With Replacement: The number of selections is C(n+k-1, k).

The binomial coefficients are central to the binomial theorem:

(x + y)^n = Σ_{k=0}^{n} C(n,k) x^k y^(n-k)

The binomial coefficient can be generalized to the multinomial coefficient:

C(n; k1, k2, ..., km) = n! / (k1! k2! ··· km!)

The binomial theorem is extended to the multinomial theorem:

(x1 + x2 + ··· + xm)^n = Σ C(n; k1, k2, ..., km) x1^k1 x2^k2 ··· xm^km

where the sum is taken over all values of ki ≥ 0 such that k1 + k2 + ··· + km = n.

Consider an unordered selection without replacement of k1 items from n1, k2 from n2, through km from nm.
The number of selections is the product C(n1,k1) C(n2,k2) ··· C(nm,km). The hypergeometric probabilities measure the likelihood of making this selection (assuming all selections are equally likely):

Pr[k1, k2, ..., km] = C(n1,k1) C(n2,k2) ··· C(nm,km) / C(n,k)

where n = n1 + n2 + ··· + nm and k = k1 + k2 + ··· + km.

The birthday problem is a classic probability paradox: How large must k be for a group of k people to have two with the same birthday? The surprising answer is that a group of 23 is more likely than not to have at least two people with the same birthday. The general answer for k people in a year of n days is

Pr[no pair] = (n)_k / n^k ≈ exp(-k(k-1)/(2n))

Setting this probability equal to 0.5 and solving results in

k(k-1)/2 ≈ 0.693n

If n is large, k ≈ √n.

Combinatorics can get tricky. Perhaps the best advice is to check any formulas using small problems for which the answer is known. Simplify the problem, count the outcomes, and make sure the count agrees with the formulas.

PROBLEMS

3.1 List the sequences of three 1's and two 0's.

3.2 Show Equation (3.22). This result comes in handy later when discussing binomial probabilities.

3.3 Prove the computational sequence in Figure 3.5 can be done using only integers.

3.4 Prove Equation (3.6) algebraically using Equation (3.4).

3.5 Write a computer function to calculate the nth row of Pascal's triangle iteratively from the (n-1)-st row. (The calculation step, using vector operations, can be done in one line.)

3.6 The sum of the integers 1 + 2 + ··· + n = n(n+1)/2 is a binomial coefficient.
a. Which binomial coefficient is the sum?
b. Using Pascal's triangle, can you think of an argument why this is so?

3.7 Complete the missing steps in the proof of the binomial theorem, starting with Equation (3.9).

3.8 Using the multinomial theorem as in Equation (3.15):
a. Expand 4^2 = (1+1+1+1)^2.
b. Expand 4^2 = (2+1+1)^2.
3.9 Write a function to compute the multinomial coefficient for an arbitrary number of groups, with k1 in the first, k2 in the second, etc. Demonstrate your program works by showing it gets the correct answer on several interesting examples.

3.10 If Alice can generate a billion contracts per second, how long will it take her to mount a birthday attack for b = 32, 64, 128, and 256? Express your answer in human units (for example, years, days, hours, or seconds) as appropriate.

3.11 Assume Alice works for a well-funded organization that can purchase a million computers, each capable of generating a billion contracts per second. How long will a birthday attack take for b = 32, 64, 128, and 256?

3.12 For a year of 366 days, how many people are required to make it likely that a pair of birthdays exists?

3.13 A year on Mars is 687 (Earth) days. If we lived on Mars, how many people would need to be in the room for it to be likely that at least two of them have the same birthday?

3.14 A year on Mars is 669 (Mars) days. If we were born on and lived on Mars, how many people would need to be in a room for it to be likely that at least two of them have the same birthday?

3.15 Factorials can get big, so big that computing them can be problematic. In your answers below, clearly specify what computing platform you are using and how you are doing your calculation.
a. What is the largest value of n for which you can compute n! using integers?
b. What is the largest value of n for which you can compute n! using floating point numbers?
c. How should you compute log(n!) for large n? Give a short program that computes log(n!). What is log(n!) for n = 50, 100, 200?

3.16 Stirling's formula gives an approximation for n!:

n! ≈ √(2πn) (n/e)^n    (3.23)

The approximation in Equation (3.23) is asymptotic: the ratio between the two terms goes to 1 as n → ∞. Plot n!
and Stirling's formula on a log-log plot for n = 1, 3, 10, 30, 100, 300, 1000. Note that n! can get huge. You probably cannot compute n! and then log(n!). It is better to compute log(n!) directly (see Problem 3.15).

3.17 Show the following:

3.18 The approximation log(1+x) ≈ x (Equation 3.17) is actually an inequality:

log(1 + x) ≤ x for all x > -1    (3.24)

as careful examination of Figure 3.8 shows. Repeat the derivation of the approximation to q(k), but this time, develop an inequality.
a. What is that inequality?
b. Evaluate the exact probability and the inequality for several values of k and n to demonstrate the inequality.

3.19

The log inequality above (Equation 3.24) is sometimes written differently.
a. Show the log inequality (Equation 3.24) can be written as

y - 1 ≥ log(y) for y > 0    (3.25)

b. Recreate Figure 3.8 using Equation (3.25).

3.20 Plot j(x) = x - log(1+x) versus x.

3.21 One way to prove the log inequality (Equation 3.24) is to minimize j(x) = x - log(1+x).
a. Use calculus to show that x = 0 is a possible minimum of j(x).
b. Show that x = 0 is a minimum (not a maximum) by evaluating j(0) and any other value, such as j(1).

3.22 Solve a different birthday problem:
a. How many other people are required for it to be likely that someone has the same birthday as you do?
b. Why is this number so much larger than 23?
c. Why is this number larger than 365/2 = 182?

3.23 In most developed countries, more children are born on Tuesdays than on any other day. (Wednesdays, Thursdays, and to a lesser extent, Fridays are close to Tuesdays, while Saturdays and Sundays are much less so.) Why?

3.24 Using the code in Example 3.3, or a simple rewrite in Matlab or R, estimate the probabilities of no common birthday for the following:
a. n = 365 and k = 1, 2, ..., 30. Plot your answers and the computed probabilities on the same plot (see Figure 3.7).
b. n = 687 (Mars) and k = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50. Plot your answers and the computed probabilities on the same plot.

3.25 Sometimes "unique" identifiers are determined by keeping the least significant digits of a large number. For example, patents in the United States are numbered with seven-digit numbers (soon they will need eight digits). Lawyers refer to patents by their final three digits; for example, Edison's light bulb patent is No. 223,898. Lawyers might refer to it as the "898" patent. A patent infringement case might involve k patents. Use the approximation in Equation (3.18) to calculate the probability of a label collision for k = 2, 3, ..., 10 patents.


3.26 Continuing Problem 3.25, organizations once commonly used the last four digits of a person's Social Security number as a unique identifier. (This is not done so much today for fear of identity theft.)
a. Use the approximation in Equation (3.18) to calculate the probability of an identifier collision for k = 10, 30, 100 people.
b. What value of k people gives a probability of 0.5 of an identifier collision?

3.27 The binomial coefficient can be bounded as follows (k > 0):

(n/k)^k ≤ C(n,k) ≤ (ne/k)^k

a. Prove the left inequality. (The proof is straightforward.)
b. The proof of the right-hand inequality is tricky. It begins with the following:

C(n,k) x^k ≤ Σ_{j=0}^{n} C(n,j) x^j = (1 + x)^n for x > 0

Justify the inequality and the equality.
c. The next step in the proof is to show (1 + x)^n ≤ e^(xn). Use the inequality log(1+x) ≤ x to show this.
d. Finally, let x = k/n. Complete the proof.
e. Evaluate all three terms for n = 10 and k = 0, 1, ..., 10. The right-hand inequality is best when k is small. When k is large, use the simple inequality C(n,k) ≤ 2^n from Equation (3.12).

3.28 In the game Yahtzee, sold by Hasbro, a player throws five ordinary six-sided dice and tries to obtain various combinations of poker-like hands. In a throw of five dice, compute the probabilities of the following:
a. All five dice showing the same number.
b. Four dice showing one number and the other die showing a different number.
c. Three dice showing one number and the other two dice showing the same number (in poker parlance, a full house).
d. Three dice showing one number and the other two dice showing different numbers (three-of-a-kind).
e. Two dice showing one number, two other dice showing a different number, and the fifth die showing a third number (two pair).
f. Five different numbers on the five dice.
g. Show the probabilities above sum to 1.

3.29 Write a short program to simulate the throw of five dice. Use the program to simulate a million throws of five dice and estimate the probabilities calculated in Problem 3.28.

3.30 What is the probability of getting three cards in one rank and four cards in another rank in a selection of seven cards from a standard deck of 52 cards if all combinations of 7 cards from 52 are equally likely?

3.31 In Texas Hold'em, each player initially gets two cards, and then five community cards are dealt in the center of the table. Each player makes his or her best five-card hand from the seven cards.
a. How many initial two-card starting hands are there?
b. Many starting hands play the same way. For example, suits do not matter for a pair of Aces. Each pair of Aces plays the same (on average) as any other pair of Aces. The starting hands can be divided into three playing groups: both cards have the same rank (a "pair"), both cards are in the same suit ("suited"), or neither ("off-suit"). How many differently playing starting hands are in each group? How many differently playing starting hands are there in total? (Note that this answer is only about 1/8th the answer to the question above.)
c. How many initial two-card hands correspond to each differently playing starting hand in the question above? Add them up, and show they total the number in the first question.

3.32 You are playing a Texas Hold'em game against one other player. Your opponent has a pair of 9's (somehow you know this). The five community cards have not yet been dealt.
a. Which two-card hand gives you the best chance of winning? (Hint: the answer is not a pair of Aces.)
b. If you do not have a pair, which (nonpair) hand gives you the best chance of winning?

3.33 In Texas Hold'em, determine:
a. The probability of getting a flush given your first two cards are the same suit.
b. The probability of getting a flush given your first two cards are in different suits.
c. Which of the two hands above is more likely to result in a flush?

3.34 In the game of Blackjack (also known as "21"), both the player and dealer are initially dealt two cards. Tens and face cards (i.e., 10's, Jacks, Queens, and Kings) count as 10 points, and Aces count as either 1 or 11 points. A "blackjack" occurs when the first two cards sum to 21 (counting the Ace as 11). What is the probability of a blackjack?

3.35 Three cards are dealt without replacement from a well-shuffled, standard 52-card deck.
a. Directly calculate Pr[three of a kind].
b. Directly calculate Pr[a pair and one other card].
c. Directly calculate Pr[three cards of different ranks].
d. Show the three probabilities above sum to 1.

3.36 Three cards are dealt without replacement from a well-shuffled, standard 52-card deck.
a. Directly calculate Pr[three of the same suit].
b. Directly calculate Pr[two of one suit and one of a different suit].
c. Directly calculate Pr[three cards in three different suits].
d. Show the three probabilities above sum to 1.


3.37 A typical "Pick Six" lottery run by some states and countries works as follows: six numbered balls are selected (unordered, without replacement) from 49. A player selects six balls. Calculate the following probabilities:
a. The player matches exactly three of the six selected balls.
b. The player matches exactly four of the six selected balls.
c. The player matches exactly five of the six selected balls.
d. The player matches all six of the six selected balls.

3.38 "Powerball" is a popular two-part lottery. The rules when this book was written are as follows: In the first part, five numbered white balls are selected (unordered, without replacement) from 59; in the second part, one numbered red ball is selected from 35 red balls (the selected red ball is the "Powerball"). Similarly, the player has a ticket with five white numbers selected from 1 to 59 and one red number from 1 to 35. Calculate the following probabilities:
a. The player matches the red Powerball and none of the white balls.
b. The player matches the red Powerball and exactly one of the white balls.
c. The player matches the red Powerball and exactly two of the white balls.
d. The player matches the red Powerball and all five of the white balls.

3.39 One "Instant" lottery works as follows: The player buys a ticket with five winning numbers selected from 1 to 50. For example, the winning numbers might be 4, 12, 13, 26, and 43. The ticket has 25 "trial" numbers, also selected from 1 to 50. If any of the trial numbers matches any of the winning numbers, the player wins a prize. The ticket, perhaps surprisingly, does not indicate how the trial numbers are chosen. We do know two things (about the ticket we purchased): first, the 25 trial numbers contain no duplicates, and second, none of the trial numbers match any of the winning numbers.
a. Calculate the probability there are no duplicates in an unordered selection with replacement of k = 25 numbers chosen from n = 50.
b. In your best judgment, was the selection of possibly winning numbers likely done with or without replacement?
c. Calculate the probability a randomly chosen ticket has no winners if all selections of k = 25 numbers from n = 50 (unordered, without replacement) are equally likely.
d. Since the actual ticket had no winners, would you conclude all selections of 25 winning numbers were equally likely?

3.40 Consider selecting two cards from a well-shuffled deck (unordered and without replacement). Let K1 denote the event that the first card is a King and K2 the event that the second card is a King.
a. Calculate Pr[K1 ∩ K2] = Pr[K1] Pr[K2 | K1].
b. Compare to the formula of Equation (3.21) for calculating the same probability.

3.41 Continuing Problem 3.40, let X denote any card other than a King.
a. Use the LTP to calculate the probability of getting a King and any other card (i.e., exactly one King) in an unordered and without replacement selection of two cards from a well-shuffled deck.
b. Compare your answer in part a to Equation (3.21).

3.42 Show Equation (3.21) can be written as Equation (3.26).

3.43 For m = 2, derive Equation (3.26) directly from first principles.

3.44 In World War II, Germany used an electromechanical encryption machine called Enigma. Enigma was an excellent machine for its time, and breaking its encryption was an important challenge for the Allied countries. The Enigma machine consisted of a plugboard, three (or, near the end of the war, four) rotors, and a reflector (and a keyboard and lights, but these do not affect the security of the system).
a. The plugboard consisted of 26 holes (labeled A to Z). Part of each day's key was a specification of k wires that connected one hole to another. For example, one wire might connect B to R, another might connect J to K, and a third might connect A to W. How many possible connections can be made with k wires, where k = 0, 1, ..., 13? Evaluate the count for k = 10 (the most common value used by the Germans). Note that the wires were interchangeable; that is, a wire from A to B and one from C to D is the same as a wire from C to D and another from A to B. (Hint: for k = 2 and k = 3, there are 44,850 and 3,453,450 configurations, respectively.)
b. Each rotor consisted of two parts: a wiring matrix from 26 inputs to 26 outputs and a movable ring. The wiring consisted of 26 wires, with each wire connecting one input to one output. That is, each input was wired to one and only one output. The wiring of a rotor was fixed at the time of manufacture and not changed afterward. How many possible rotors were there? (The Germans obviously did not manufacture this many different rotors. They merely manufactured a few different rotors, but the Allies did not know which rotors were actually in use.)
c. For most of the war, Enigma used three different rotors placed left to right in order. One rotor was chosen for the first position, another rotor different from the first was chosen for the second position, and a third rotor different from the first two was chosen for the third position. How many possible selections of rotors were there? (Hint: this is a very large number.)
d. In operation, the three rotors were rotated to a daily starting configuration. Each rotor could be started in any of 26 positions. How many starting positions were there for the three rotors?
e. The two leftmost rotors had a movable ring that could be placed in any of 26 positions. (The rightmost rotor rotated one position on each key press. The movable rings determined when the middle and left rotors turned. Think of the dials in a mechanical car odometer or water meter.) How many different configurations of the two rings were there?


f. The reflector was a fixed wiring of 13 wires, with each wire connecting a letter to another letter. For example, a wire might connect C to G. How many ways could the reflector be wired? (Hint: this is the same as the number of switchboard connections with 13 wires.) g. Multiply the following numbers together to get the overall complexity of the Enigma machine: (i) the number of possible selections of three rotors, (ii) the number of daily starting positions of the three rotors, (iii) the number of daily positions of the two rings, (iv) the number of switchboard configurations (assume k = 10 wires were used), and (v) the number of reflector configurations. What is the number? (Hint: this is a really, really big number.) h. During the course of the war, the Allies captured several Enigma machines and learned several important things: The Germans used five different rotors (later, eight different rotors). Each day, three of the five rotors were placed left to right in the machine (this was part of the daily key). The Allies also learned the wiring of each rotor and were able to copy the rotors. They learned the wiring of the reflector. How many ways could three rotors be selected from five and placed into the machine? (Order mattered; a rotor configuration of 123 operated differently from 132, etc.) i. After learning the wiring of the rotors and the wiring of the reflector, the remaining configuration variables (parts of the daily secret key) were (i) the placement of three rotors from five into the machine, (ii) the number of starting positions for three rotors, (iii) the positions of the two rings (on the leftmost and middle rotors), and (iv) the switchboard configuration. Assuming k = 10 wires were used in the switchboard, how many possible Enigma configurations remained?

j. How important was capturing the several Enigma machines to breaking the encryption? For more information, see A. Ray Miller, The Cryptographic Mathematics of Enigma (Fort Meade, MD: Center for Cryptologic History, 2012).

CHAPTER 4

DISCRETE PROBABILITIES AND RANDOM VARIABLES

Flip a coin until a head appears. How many flips are required? What is the average number of flips required? What is the probability that more than 10 flips are required? That the number of flips is even? Many questions in probability concern discrete experiments whose outcomes are most conveniently described by one or more numbers. This chapter introduces discrete probabilities and random variables.

4.1 PROBABILITY MASS FUNCTIONS

Consider an experiment that produces a discrete set of outcomes. Denote those outcomes as x₀, x₁, x₂, .... For example, a binary bit is a 0 or 1. An ordinary die is 1, 2, 3, 4, 5, or 6. The number of voters in an election is an integer, as are the number of electrons crossing a PN junction in a unit of time or the number of telephone calls being made at a given instant of time. In many situations, the discrete values are themselves integers or can be easily mapped to integers; for example, xₖ = k or a simple function of k. In these cases, it is convenient to refer to the integers as the outcomes, or 0, 1, 2, .... Let X be a discrete random variable denoting the (random) result of the experiment. It is discrete if the experiment results in discrete outcomes. It is a random variable if its value is the result of a random experiment; that is, it is not known until the experiment is done.

Comment 4.1: Advanced texts distinguish between countable and uncountable sets. A set is countable if an integer can be assigned to each element (i.e., if one can count the elements). All the discrete random variables considered in this text are countable. The most important example of an uncountable set is an interval (e.g., the set of values x such that 0 ≤ x ≤ 1). For our purposes, sets are either discrete or continuous (or an obvious combination of the two).

Comment 4.2: Random variables are denoted with uppercase letters, sometimes with subscripts, such as X, Y, N, X₁, and X₂. The values random variables take on are denoted with lowercase, italic letters, sometimes with subscripts, such as x, y, n, x₁, and x₂.

A probability mass function (PMF) is a map of probabilities to outcomes:

p(k) = Pr[X = xₖ] for all values of k.

PMFs possess two properties:

1. Each value is nonnegative:

p(k) ≥ 0    (4.1)

2. The sum of the p(k) values is 1:

Σₖ p(k) = 1    (4.2)

Some comments about PMFs are in order:

• A PMF is a function of a discrete argument, k. For each value of k, p(k) is a number between 0 and 1. That is, p(1) is a number, p(2) is a possibly different number, p(3) is yet another number, etc.

• The simplest way of describing a PMF is to simply list the values: p(0), p(1), p(2), etc. Another way is to use a table:

  k     | 0    1    2    3
  p(k)  | 0.4  0.3  0.2  0.1

Still a third way is to use a case statement:

  p(k) = 0.4  k = 0
         0.3  k = 1
         0.2  k = 2
         0.1  k = 3

• Any collection of numbers satisfying Equations (4.1) and (4.2) is a PMF. In other words, there are an infinity of PMFs. Of this infinity, throughout the book we focus on a few PMFs that frequently occur in applications.
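The two defining properties, (4.1) and (4.2), are easy to check mechanically. A minimal sketch in Python, using the example PMF from the table above:

```python
# The example PMF from the table above, stored as outcome -> probability.
pmf = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

def is_valid_pmf(p, tol=1e-12):
    """Check (4.1): all values nonnegative, and (4.2): values sum to 1."""
    return all(v >= 0 for v in p.values()) and abs(sum(p.values()) - 1.0) < tol

print(is_valid_pmf(pmf))               # True
print(is_valid_pmf({0: 0.5, 1: 0.6}))  # False: sums to 1.1
```

Any dictionary of outcome-probability pairs passing this check is a PMF.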


4.2 CUMULATIVE DISTRIBUTION FUNCTIONS

A cumulative distribution function (CDF) is another map of probabilities to outcomes, but unlike a PMF, a CDF measures the probabilities of {X ≤ u} for all values of −∞ < u < ∞:

F_X(u) = Pr[X ≤ u] = Σ_{k: xₖ ≤ u} p(k)

where the sum is over all k such that xₖ ≤ u. Note that even though the outcomes are discrete, u is a continuous argument. All values from −∞ to ∞ are allowed. CDFs possess several important properties:

1. The distribution function starts at 0: F_X(−∞) = 0.
2. The distribution function is nondecreasing. For u₁ < u₂, F_X(u₁) ≤ F_X(u₂).
3. The distribution function ends at 1: F_X(∞) = 1.

A CDF is useful for calculating probabilities. The event {X ≤ u₁} can be written as {X ≤ u₀} ∪ {u₀ < X ≤ u₁} …

… The moment generating function (MGF) is

M_X(u) = E[e^{uX}] = Σₖ e^{uk} p(k)    (4.17)

When it is clear which random variable is being discussed, we will drop the subscript on M_X(u) and write simply M(u).

The MGF is the Laplace transform of the PMF except that s is replaced by −u. Apart from this (largely irrelevant) difference, the MGF has all the properties of the Laplace transform. Later on, we will use the fact that the MGF uniquely determines the PMF, just as the Laplace transform uniquely determines the signal.

In signal processing, we rarely compute moments of signals, but in probability, we often compute moments of PMFs. We show how the MGF helps compute moments by two different arguments. The first argument begins by expanding the exponential in a Maclaurin series* and taking expected values, term by term:

M(u) = E[e^{uX}] = 1 + uE[X] + (u²/2!)E[X²] + (u³/3!)E[X³] + ···

Now, take a derivative with respect to u:

(d/du)M(u) = 0 + E[X] + uE[X²] + (u²/2!)E[X³] + ···

Finally, set u = 0:

(d/du)M(u)|ᵤ₌₀ = 0 + E[X] + 0 + 0 + ··· = E[X]

Notice how the only term that "survives" both steps (taking the derivative and setting u = 0) is E[X].
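The derivative trick above can be sanity-checked numerically: approximate dM/du at u = 0 by a central difference and compare with E[X]. A sketch, reusing the example PMF from Section 4.1:

```python
import math

# Example PMF from Section 4.1: outcome -> probability.
pmf = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

def mgf(u):
    """M(u) = E[e^{uX}] = sum over k of e^{uk} p(k)."""
    return sum(p * math.exp(u * k) for k, p in pmf.items())

h = 1e-6
deriv = (mgf(h) - mgf(-h)) / (2 * h)     # central difference at u = 0
ex = sum(k * p for k, p in pmf.items())  # E[X] computed directly

print(mgf(0.0))          # ~1.0: the M(0) = 1 check
print(round(deriv, 6))   # matches E[X] = 1.0
```

The finite difference agrees with the direct computation of E[X] to well within the step-size error.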
The second moment is found by taking two derivatives, then setting u = 0:

E[X²] = (d²/du²)M(u)|ᵤ₌₀

In general, the kth moment can be found as

E[Xᵏ] = (dᵏ/duᵏ)M(u)|ᵤ₌₀    (4.18)

The second argument showing why the MGF is useful in computing moments is to use properties of expected values directly:

(d/du)M(u) = (d/du)E[e^{uX}] = E[(d/du)e^{uX}] = E[Xe^{uX}]

(d/du)M(u)|ᵤ₌₀ = E[Xe^{0·X}] = E[X]

Similarly, for the kth moment,

(dᵏ/duᵏ)M(u)|ᵤ₌₀ = E[Xᵏe^{0·X}] = E[Xᵏ]

*The Maclaurin series of a function is f(x) = f(0) + f′(0)x + f″(0)x²/2! + ···. See Wolfram MathWorld, http://mathworld.wolfram.com/TaylorSeries.html.

The above argument also shows a useful fact for checking your calculations:

M(0) = E[e^{0·X}] = E[1] = 1

Whenever you compute an MGF, take a moment and verify M(0) = 1. The MGF is helpful in computing moments for many, but not all, PMFs (see Example 4.5). For some distributions, the MGF either cannot be computed in closed form or evaluating the derivative is tricky (requiring L'Hôpital's rule, as discussed in Example 4.5). A general rule is to try to compute moments by Equation (4.3) first, then try the MGF if that fails.

Comment 4.8: Three transforms are widely used in signal processing: the Laplace transform, the Fourier transform, and the Z transform. In probability, the analogs are the MGF, the characteristic function, and the generating function. The MGF is the Laplace transform with s replaced by −u:

M(u) = E[e^{uX}] = Σₖ e^{uk} p(k) = L(−u)

The characteristic function is the Fourier transform with the sign of the exponent reversed:

Φ(ω) = E[e^{jωX}] = Σₖ e^{jωk} p(k) = F(−ω)

The generating function (or the probability generating function) is the Z transform with z⁻¹ replaced by s:

G(s) = E[sˣ] = Σₖ sᵏ p(k) = Z(s⁻¹)

Mathematicians use the generating function extensively in analyzing combinatorics and discrete probabilities. In this text, we generally use the MGF in analyzing both discrete and continuous probabilities and the Laplace, Fourier, or Z transform, as appropriate, for analyzing signals.

4.5 SEVERAL IMPORTANT DISCRETE PMFS

There are many discrete PMFs that appear in diverse applications. Four of the most important are the uniform, geometric, Poisson, and binomial PMFs. The uniform, geometric, and Poisson are discussed below. The binomial is presented in some detail in Chapter 6.

4.5.1 Uniform PMF

For a uniform PMF, all nonzero probabilities are the same. Typically, X takes on integer values, with k = 1, 2, ..., m:

Pr[X = k] = 1/m for k = 1, 2, ..., m, and 0 for all other k

The uniform PMF with m = 6 (e.g., for a fair die) places probability 1/6 on each of k = 1, 2, ..., 6 [stem plot omitted]. Computing the mean and variance is straightforward (using Equations 4.6 and 4.7). The mean is the weighted average of the PMF values (note that if m is even, the mean is not one of the outcomes):

E[X] = Σ_{k=1}^{m} k/m = (m+1)/2    (4.19)

Now, compute the variance:

σ²_X = E[X²] − E[X]² = (2m+1)(m+1)/6 − (m+1)²/4 = (m+1)(m−1)/12    (4.20)

EXAMPLE 4.5

The uniform distribution is a bad example for the utility of computing moments with the MGF, requiring a level of calculus well beyond that required elsewhere in the text. With that said, here goes:

M(u) = (1/m) Σ_{k=1}^{m} e^{uk}

Ignore the 1/m for now, and focus on the sum. Multiplying by (e^u − 1) allows us to sum the telescoping series:
(e^u − 1) Σ_{k=1}^{m} e^{uk} = Σ_{k=2}^{m+1} e^{uk} − Σ_{k=1}^{m} e^{uk} = e^{u(m+1)} − e^u

so

Σ_{k=1}^{m} e^{uk} = (e^{u(m+1)} − e^u)/(e^u − 1)

Then, set u = 0 to check our calculation:

(e^{u(m+1)} − e^u)/(e^u − 1) |ᵤ₌₀ = (1 − 1)/(1 − 1) = 0/0

Warning bells should be clanging: danger ahead, proceed with caution! In situations like this, we can use L'Hôpital's rule to evaluate the limit as u → 0. L'Hôpital's rule says to differentiate the numerator and denominator separately, set u = 0 in each, and take the ratio:

((m+1)e^{u(m+1)} − e^u)|ᵤ₌₀ / (e^u|ᵤ₌₀) = (m + 1 − 1)/1 = m

After dividing by m, the check works: M(0) = 1. Back to E[X]: take a derivative of M(u) (using the quotient rule), then set u = 0:

(d/du)M(u) = (m e^{u(m+2)} − (m+1)e^{u(m+1)} + e^u) / (m(e^u − 1)²)

Back to L'Hôpital's rule, but this time we need to take two derivatives of the current numerator and denominator separately and set u = 0 (two derivatives are necessary because one still results in 0/0):

E[X] = (m(m+2)² − (m+1)³ + 1)/(2m) = m(m+1)/(2m) = (m+1)/2

Fortunately, the moments of the uniform distribution can be calculated easily by the direct formula, as the MGF is surprisingly complicated. For many other distributions, however, the MGF is much easier to use than the direct formula.

4.5.2 Geometric PMF

The geometric PMF captures the notion of flipping a coin until a head appears. Let p be the probability of a head on an individual flip, and assume the flips are independent (one flip does not affect any other flips). A sequence of flips might look like the following: 00001. This sequence has four tails followed by a head. In general, a sequence of length k will have k − 1 zeros followed by a 1.
Letting X denote the number of flips required, the PMF of X is

p(k) = p(1 − p)^{k−1}    for k = 1, 2, ...

The PMF values are clearly nonnegative. They sum to 1 as follows:

Σ_{k=1}^{∞} p(1 − p)^{k−1} = p Σ_{l=0}^{∞} (1 − p)^l = p · 1/(1 − (1 − p)) = 1

The first 12 values of a geometric PMF with p = 0.3 are shown in a stem plot [omitted]: Pr[X = 1] = 0.3, Pr[X = 2] = 0.3 × 0.7 = 0.21, Pr[X = 3] = 0.3 × 0.7² = 0.147, and so on.

The mean is difficult to calculate directly:

E[X] = Σ_{k=1}^{∞} k p(1 − p)^{k−1}    (4.21)

It is not obvious how to compute the sum. However, it can be calculated using the MGF:

M(u) = E[e^{uX}] = Σ_{k=1}^{∞} e^{uk} p(1 − p)^{k−1}
     = p e^u Σ_{k=1}^{∞} e^{u(k−1)} (1 − p)^{k−1}
     = p e^u Σ_{l=0}^{∞} ((1 − p)e^u)^l    (changing variables l = k − 1)    (4.22)

Let r = (1 − p)e^u, and using Equation (4.8),

Σ_{l=0}^{∞} ((1 − p)e^u)^l = Σ_{l=0}^{∞} r^l = 1/(1 − r) = 1/(1 − (1 − p)e^u)    for |r| < 1

Comment 4.9: We should verify that |r| = (1 − p)e^u < 1 in a neighborhood around u = 0 (to show the series converges and that we can take a derivative of the MGF). Solving for u results in u < log(1/(1 − p)). If p > 0, log(1/(1 − p)) > 0. Thus, the series converges in a region around u = 0. We can therefore take the derivative of the MGF at u = 0.

Substituting this result into Equation (4.22) allows us to finish computing M(u):

M(u) = p e^u / (1 − (1 − p)e^u) · (e^{−u}/e^{−u}) = p/(e^{−u} − 1 + p)    (4.23)

Check the calculation: M(0) = p/(e⁰ − 1 + p) = p/(1 − 1 + p) = p/p = 1.

Computing the derivative and setting u = 0,

(d/du)M(u) = p(−1)(e^{−u} − 1 + p)^{−2}(−1)e^{−u} = p e^{−u}(e^{−u} − 1 + p)^{−2}

E[X] = (d/du)M(u)|ᵤ₌₀ = p/p² = 1/p

On average, it takes 1/p flips to get the first head. If p = 0.5, then it takes an average of 2 flips; if p = 0.1, it takes an average of 10 flips. Taking a second derivative of M(u) and setting u = 0 yields E[X²] = (2 − p)/p². The variance can now be computed easily:

σ²_X = E[X²] − E[X]² = (2 − p)/p² − 1/p² = (1 − p)/p²

The standard deviation is the square root of the variance, σ_X = √(1 − p)/p.

A sequence of 50 random bits with p = 0.3 is shown below:

10100010100011110001000001111101000000010010110001

The runs end with the 1's. This particular sequence has 20 runs:

1·01·0001·01·0001·1·1·1·0001·000001·1·1·1·1·01·00000001·001·01·1·0001

The run lengths are the number of digits in each run:

1 2 4 2 4 1 1 1 4 6 1 1 1 1 2 8 3 2 1 4

The average run length of this particular sequence is

(1+2+4+2+4+1+1+1+4+6+1+1+1+1+2+8+3+2+1+4)/20 = 50/20 = 2.5

This average is close to the expected value, E[X] = 1/p = 1/0.3 ≈ 3.33.

EXAMPLE 4.6

When a cell phone wants to connect to the network (e.g., make a phone call or access the Internet), the phone performs a "random access" procedure. The phone transmits a message. If the message is received correctly by the cell tower, the tower responds with an acknowledgment. If the message is not received correctly (e.g., because another cell phone is transmitting at the same time), the phone waits a random period of time and tries again. Under reasonable conditions, the success of each try can be modeled as independent Bernoulli random variables with probability p. Then, the expected number of tries needed is 1/p. In actual practice, when a random access message fails, the next message usually is sent at a higher power (the original message might not have been "loud" enough to be heard). In this case, the tries are not independent with a common probability of success, and the analysis of the expected number of messages needed is more complicated.
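Both the coin-flip story and the independent-retry model of Example 4.6 are geometric. A quick simulation sketch (the sample size and seed are arbitrary choices) confirms that the average number of trials approaches E[X] = 1/p:

```python
import random

def flips_until_head(p, rng):
    """Flip a p-biased coin until the first head; return the flip count."""
    n = 1
    while rng.random() >= p:  # tails: flip again
        n += 1
    return n

rng = random.Random(0)  # fixed seed for a reproducible run
p = 0.3
trials = [flips_until_head(p, rng) for _ in range(100_000)]
avg = sum(trials) / len(trials)
print(avg)  # close to 1/0.3 = 3.33
```

With 100,000 trials, the sample mean lands within a few hundredths of the theoretical 1/p.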


Comment 4.10: People are notoriously bad at generating random sequences by themselves. Let us demonstrate this. Write down a "random" sequence of 1's and 0's with a length of at least 50 (more is better) and ending with a 1 (probability of a 1 is 0.5). Do not use coins or a computer; just let the sequence flow from your head. Compute the run lengths, and compare the lengths of your runs to the expected lengths (half the runs should be length 1, one quarter of length 2, and so forth). If you are like most people, you will have too many short runs and too few long runs.
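The run-length bookkeeping in Comment 4.10 is easy to automate. A sketch that reproduces the 20 runs of the 50-bit example from Section 4.5.2:

```python
def run_lengths(bits):
    """Split a bit string into runs, where each run is a (possibly empty)
    block of 0's terminated by a 1; the string must end with a 1."""
    lengths, count = [], 0
    for b in bits:
        count += 1
        if b == "1":  # a 1 ends the current run
            lengths.append(count)
            count = 0
    return lengths

bits = "10100010100011110001000001111101000000010010110001"
runs = run_lengths(bits)
print(len(runs))              # 20 runs
print(sum(runs) / len(runs))  # average run length 2.5
```

Running it on your own hand-written sequence makes the comparison with the geometric PMF immediate.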

4.5.3 The Poisson Distribution

The Poisson distribution is widely used to describe counting experiments, such as the number of cancers in a certain area or the number of phone calls being made at any given time. The PMF of a Poisson random variable with parameter λ is

Pr[X = k] = p(k) = (λᵏ/k!)e^{−λ}    for k = 0, 1, 2, ...    (4.24)

The first 14 points of a Poisson PMF with λ = 5 are shown in a stem plot [omitted]; the peak value is Pr[X = 5] ≈ 0.175.

To show the Poisson probabilities sum to 1, we start with the Taylor series for e^λ:

e^λ = 1 + λ + λ²/2! + λ³/3! + ···

Now, multiply both sides by e^{−λ}:

1 = e^{−λ} + λe^{−λ} + (λ²/2!)e^{−λ} + (λ³/3!)e^{−λ} + ···

The terms on the right are the Poisson probabilities. The moments can be computed with the MGF:

M(u) = E[e^{uX}]    (definition of MGF)
     = Σ_{k=0}^{∞} e^{uk} p(k)    (definition of expected value)
     = Σ_{k=0}^{∞} e^{uk} e^{−λ} λᵏ/k!

…

The ratio of successive probabilities, p(k)/p(k−1) = λ/k, can be rearranged to yield a convenient method for computing the p(k) sequence:

p(0) = e^{−λ};
for k = 1, 2, ... do
    p(k) = (λ/k)·p(k−1);
end

A sequence of 30 Poisson random variables with λ = 5 is

5665857344578656785822121743448

There are one 1, two 2's, two 3's, five 4's, six 5's, four 6's, four 7's, five 8's, and one 12. For example, the probability of a 5 is 0.175. In a sequence of 30 random variables, we would expect to get about 30 × 0.175 = 5.25 fives. This sequence has six, which is close to the expected number. As another example, the expected number of 2's is

30 · Pr[X = 2] = 30 · (5²/2!)e^{−5} = 2.53

The actual number is 2, again close to the expected number. It is not always true, however, that the agreement between expected and actual is so good. For example, this sequence has five 8's, while the expected number of 8's is

30 · Pr[X = 8] = 30 · (5⁸/8!)e^{−5} = 1.95

The sequence is random after all. While we expect the count of each number to be approximately equal to the expected number, we should not expect to observe exactly the "right" number. This example looked ahead a bit to multiple random variables. That is the topic of the next chapter.

EXAMPLE 4.7

The image sensor in a digital camera is made up of pixels. Each pixel in the sensor counts the photons that hit it. If the shutter is held open for t seconds, the number of photons counted is Poisson with mean λt. Let X denote the number of photons counted. Then,

E[X] = λt    Var[X] = λt

The camera computes an average Z = X/t. Therefore, E[Z] = λ and Var[Z] = (1/t²)Var[X] = λ/t. The performance of systems like this is typically measured by the signal-to-noise ratio (SNR), defined as the average power of the signal divided by the average noise power. We can write Z in signal-plus-noise form as

Z = λ + (Z − λ) = λ + N = signal + noise

where λ is the signal and N = Z − λ is the noise. Put another way,

SNR = signal power / noise power = λ²/Var[N] = λ²/(λ/t) = λt

We see the SNR improves as λ increases (i.e., as there is more light) and as t increases (i.e., longer shutter times). Of course, long shutter times only work if the subject and the camera are reasonably stationary.
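Returning to Section 4.5.3, the recursion p(0) = e^{−λ}, p(k) = (λ/k)p(k−1) is a convenient way to compute Poisson probabilities without evaluating large powers and factorials. A sketch:

```python
import math

def poisson_pmf(lam, kmax):
    """Compute p(0..kmax) by the recursion p(k) = (lam/k) * p(k-1)."""
    p = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        p.append((lam / k) * p[-1])
    return p

p = poisson_pmf(5.0, 30)
print(round(p[5], 3))    # 0.175, the peak value quoted in the text
print(round(sum(p), 6))  # ~1.0; the tail beyond k = 30 is negligible
```

Each step costs one multiplication and one division, so the whole sequence is computed in linear time and without overflow.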

4.6 GAMBLING AND FINANCIAL DECISION MAKING

Decision making under uncertainty is sometimes called gambling, or sometimes investing, but is often necessary. Sometimes we must make decisions about future events that we can only partially predict. In this section, we consider several examples.


Imagine you are presented with the following choice: spend 1 dollar to purchase a ticket, or not. With probability p, the ticket wins and returns w + 1 dollars (w dollars represent your winnings, and the extra 1 dollar is the return of your original ticket price). If the ticket loses, you lose your original dollar. Should you buy the ticket?

This example represents many gambling situations. Bettors at horse races buy tickets on one or more horses. Gamblers in casinos can wager on various card games (e.g., blackjack or baccarat), dice games (e.g., craps), roulette, and others. People can buy lottery tickets or place bets on sporting events.

Let X represent the gambler's gain (or loss). With probability p, you win w dollars, and with probability 1 − p, you lose one dollar. Thus,

X = −1 with probability 1 − p
X =  w with probability p

E[X] = pw + (1 − p)(−1) = pw − (1 − p)
E[X²] = pw² + (1 − p)(−1)²
Var[X] = E[X²] − E[X]² = p(1 − p)(w + 1)²

The bet is fair if E[X] = 0. If E[X] > 0, you should buy the ticket. On average, your winnings exceed the cost. Conversely, if E[X] < 0, do not buy the ticket. Let pₑ represent the break-even probability for a given w, and wₑ the break-even win for a given p. Then, at the break-even point,

E[X] = 0 = pw − (1 − p)

Rearranging yields wₑ = (1 − p)/p or pₑ = 1/(w + 1). The ratio (1 − p)/p is known as the odds ratio. For example, if w = 3, the break-even probability is pₑ = 1/(3 + 1) = 0.25. If the actual probability is greater than this, make the bet; if not, do not.

EXAMPLE 4.8

Imagine you are playing Texas Hold 'em poker and know your two cards and the first four common cards. There is one more common card to come. You currently have a losing hand, but you have four cards to a flush. If the last card makes your flush, you judge you will win the pot. There are w dollars in the pot, and you have to bet another dollar to continue. Should you? Since there are 13 − 4 = 9 cards remaining in the flush suit (these are "outs" in poker parlance) and 52 − 6 = 46 unseen cards overall, the probability of making the flush is p = 9/46 ≈ 0.196. If the pot holds more than wₑ = (1 − p)/p = 37/9 ≈ 4.1 dollars, make the bet.

Comment 4.11: In gambling parlance, a bet that returns a win of w dollars for each dollar wagered offers w-to-1 odds. For example, "4-to-1" odds means the break-even probability is pₑ = 1/(4 + 1) = 1/5. Some sports betting, such as baseball betting, uses money lines. A money line is a positive or negative number. If positive, say, 140, the money line represents the gambler's potential win on a $100 bet; that is, the gambler bets $100 to win $140. If negative, say, −120, it represents the amount of money a gambler needs to wager to potentially win $100; that is, a $120 bet on the favorite would win $100 if successful. A baseball game might be listed as −140, +120, meaning a bet on the favorite requires $140 to win $100 and a bet of $100 on the underdog might win $120. (That the numbers are not the same represents a profit potential for the bookmaker.)

When placing a bet, the gambler risks a certain thing (the money in his or her pocket) to potentially win w dollars. The bookmaker will adjust the payout w so that, on average, the expected return to the gambler is negative. In effect, the gambler trades a lower expected value for an increased variance. For a typical bet,

E[X] = wp − (1 − p) < 0

Buying insurance is the opposite bet, trading a lower expected value for a lower variance. Let c be the cost of the policy, p the probability of something bad happening (e.g., a tree falling on your house), and v the cost when the bad thing happens. Typically, p is small, and v relatively large. Before buying insurance, you face an expected value of −vp and a variance of v²p(1 − p). After buying insurance, your expected value is −c whether or not the bad thing happens, and your variance is reduced to 0. Buying insurance reduces your expected value by c − vp > 0 but reduces your variance from v²p(1 − p) to 0.

In summary, buying insurance is a form of gambling, but the tradeoff is different. The insurance buyer replaces a potential large loss (of size v) with a guaranteed small loss (of size c). The insurer profits, on average, by c − vp > 0.
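The break-even rule is easy to apply numerically. A sketch using the flush-draw numbers from Example 4.8:

```python
def expected_gain(p, w):
    """E[X] = p*w - (1 - p) for a 1-dollar ticket returning w in winnings."""
    return p * w - (1 - p)

p = 9 / 46         # probability of completing the flush (Example 4.8)
w_e = (1 - p) / p  # break-even pot size: the odds ratio
print(round(w_e, 2))            # 4.11 dollars
print(expected_gain(p, 5) > 0)  # True: a 5-dollar pot is worth the bet
print(expected_gain(p, 4) > 0)  # False: a 4-dollar pot is not
```

Any pot above the odds ratio (1 − p)/p makes the expected gain positive.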

Comment 4.12: Many financial planners believe that reducing your variance by buying insurance is a good bet (if the insurance is not too costly). Lottery tickets (and other forms of gambling) are considered to be bad bets, as lowering your expected value to increase your variance is considered poor financial planning. For many people, however, buying lottery tickets and checking for winners is fun. Whether or not the fun factor is worth the cost is a personal decision outside the realm of probability. It is perhaps a sad comment to note that government-run lotteries tend to be more costly than lotteries run by organized crime. For instance, in a "Pick 3" game with p = 1/1000, the typical government payout is 500, while the typical organized-crime payout is 750.

In real-world insurance problems, both p and v are unknown and must be estimated. Assessing risk like this is called actuarial science. Someone who does so is an actuary.
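The insurance tradeoff in Comment 4.12 can be made concrete. A sketch with made-up values of c, v, and p (these numbers are illustrative, not from the text):

```python
# Hypothetical policy: cost c, loss v occurring with probability p.
c, v, p = 1200.0, 100_000.0, 0.01

ev_no, var_no = -v * p, v**2 * p * (1 - p)  # without insurance
ev_yes, var_yes = -c, 0.0                   # with insurance

print(ev_no)           # -1000.0: expected loss without insurance
print(ev_yes - ev_no)  # -200.0: insurance costs c - vp = 200 on average
print(var_no)          # ~9.9e7: the variance insurance removes
```

The buyer pays c − vp in expectation to eliminate a variance of v²p(1 − p); whether that trade is worthwhile depends on risk tolerance.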

Summary


A random variable is a variable whose value depends on the result of a random experiment. A discrete random variable can take on one of a discrete set of outcomes. Let X be a discrete random variable, and let xₖ for k = 0, 1, 2, ... be the discrete outcomes. In many experiments, the outcomes are integers; that is, xₖ = k.

A probability mass function (PMF) is the collection of discrete probabilities, p(k):

p(k) = Pr[X = xₖ]

The p(k) satisfy two important properties:

p(k) ≥ 0    and    Σ_{k=0}^{∞} p(k) = 1

A cumulative distribution function (CDF) measures the probability that X ≤ u for all values of u:

F_X(u) = Pr[X ≤ u] = Σ_{k: xₖ ≤ u} p(k)

For all u < v,

0 = F_X(−∞) ≤ F_X(u) ≤ F_X(v) ≤ F_X(∞) = 1

Expected values are probabilistic averages:

E[g(X)] = Σ_{k=0}^{∞} g(xₖ) p(k)

Expected values are also linear:

E[ag₁(X) + bg₂(X)] = aE[g₁(X)] + bE[g₂(X)]

However, they are not multiplicative in general:

E[g₁(X)g₂(X)] ≠ E[g₁(X)]E[g₂(X)]

The mean is the probabilistic average value of X:

μ_X = E[X] = Σ_{k=0}^{∞} xₖ p(k)

The variance of X is a measure of spread about the mean:

σ²_X = E[(X − μ_X)²] = E[X²] − μ_X²

The moment generating function (MGF) is the Laplace transform of the PMF (except that the sign of the exponent is flipped):

M_X(u) = E[e^{uX}] = Σ_{k=0}^{∞} e^{u xₖ} p(k)

The MGF can be used for computing moments:

E[Xᵏ] = (dᵏ/duᵏ)M_X(u)|ᵤ₌₀

Four important discrete distributions are the Bernoulli, uniform, geometric, and Poisson:

• The Bernoulli PMF models binary random variables with Pr[X = 1] = p and Pr[X = 0] = 1 − p, E[X] = p, and Var[X] = p(1 − p).

• The uniform PMF is used when the outcomes are equally likely; that is, p(k) = 1/n for k = 1, 2, ..., n, E[X] = (n + 1)/2, and Var[X] = (n + 1)(n − 1)/12.

• The geometric PMF is used to model the number of trials needed for a result to occur (e.g., the number of flips required to get a head): p(k) = p(1 − p)^{k−1} for k = 1, 2, ..., E[X] = 1/p, and Var[X] = (1 − p)/p².

• The Poisson distribution is used in many counting experiments, such as counting the number of cancers in a city: p(k) = λᵏe^{−λ}/k! for k = 0, 1, 2, ..., E[X] = λ, and Var[X] = λ.

PROBLEMS

4.1 X has the probabilities listed in the table below. What are E[X] and Var[X]?

  k          | 1    2    3    4
  Pr[X = k]  | 0.5  0.2  0.1  0.2

4.2 For the probabilities in Problem 4.1, compute the MGF, and use it to compute the mean and variance.

4.3 X has the probabilities listed in the table below. What is the CDF of X?

  k          | 1    2    3    4
  Pr[X = k]  | 0.5  0.2  0.1  0.2

4.4 For the probabilities in Problem 4.3, compute the MGF, and use it to compute the mean and variance.

4.5 X has the probabilities listed in the table below. What are E[X] and Var[X]?

  k          | 1    2    3    4    5
  Pr[X = k]  | 0.1  0.2  0.3  0.2  0.2

4.6 X has the probabilities listed in the table below. What is the CDF of X?

  k          | 1    2    3    4    5
  Pr[X = k]  | 0.1  0.2  0.3  0.2  0.2

4.7 Imagine you have a coin that comes up heads with probability 0.5 (and tails with probability 0.5).
  a. How can you use that coin to generate a bit with the probability of a 1 equal to 0.25?
  b. How might you generalize this to a probability of k/2ⁿ for any k between 0 and 2ⁿ? (Hint: you can flip the coin as many times as you want and use those flips to determine whether the generated bit is 1 or 0.)

4.8 If X is uniform on k = 1 to k = m:
  a. What is the distribution function of X?
  b. Plot the distribution function.

4.9 We defined the uniform distribution on k = 1, 2, ..., m. In some cases, the uniform PMF is defined on k = 0, 1, ..., m − 1. What are the mean and variance in this case?

4.10 Let N be a geometric random variable with parameter p. What is Pr[N ≥ k] for arbitrary integer k > 0? Give a simple interpretation of your answer.

4.11 Let N be a geometric random variable with parameter p. Calculate Pr[N = l | N ≥ k] for l ≥ k.

4.12 Let N be a geometric random variable with parameter p. Calculate Pr[N ≤ M] for M a positive integer, and Pr[N = k | N ≤ M] for k = 1, 2, ..., M.

4.13 Let N be a geometric random variable with parameter p = 1/3. Calculate Pr[N ≤ 2], Pr[N = 2], and Pr[N ≥ 2].

4.14 Let N be a Poisson random variable with parameter λ = 1. Calculate and plot Pr[N = 0], Pr[N = 1], ..., Pr[N = 6].

4.15 Let N be a Poisson random variable with parameter λ = 2. Calculate and plot Pr[N = 0], Pr[N = 1], ..., Pr[N = 6].

4.16 Using plotting software, plot the distribution function of a Poisson random variable with parameter λ = 1.

4.17 For N Poisson with parameter λ, show E[N²] = λ + λ².

4.18 The expected value of the Poisson distribution can be calculated directly (without the MGF).
  a. Use this technique to compute E[X(X − 1)]. (This is known as the second factorial moment.)
  b. Use the mean and second factorial moment to compute the variance.
  c. You have computed the first and second factorial moments (E[X] and E[X(X − 1)]). Continue this pattern, and guess the kth factorial moment, E[X(X − 1)···(X − k + 1)].

4.19 Consider a roll of a fair six-sided die.
  a. Calculate the mean, second moment, and variance of the roll using Equations (4.19) and (4.20).
  b. Compare the answers above to the direct calculations in Examples 4.2 and 4.4.


4.20 Show the variance of Y = aX + b is σ²_Y = a²σ²_X. Note that the variance does not depend on b.

4.21 Let X be a discrete random variable.
  a. Show that the variance of X is nonnegative.
  b. Show the even moments E[X^{2k}] are nonnegative.
  c. What about the odd moments? Find a PMF whose mean is positive but whose third moment is negative.

4.22 What value of a minimizes E[(X − a)²]? Show this two ways.
  a. Write E[(X − a)²] in terms of σ², μ, and a (no expected values at this point), and find the value of a that minimizes the expression.
  b. Use calculus and Equation (4.14) to find the minimizing value of a.

4.23

Often random variables are normalized. Let Y = ( X − μ_X ) / σ_X. What are the mean and variance of Y ?

4.24

Generate your own sequence of 20 Bernoulli trials ( twenty 0 's and 1 's ) with p = 0.3. ( The Matlab command rand ( 1, n ) < p generates such a sequence. )

E [ g ( X, Y ) ] = Σ_k Σ_l g ( k, l ) p_XY ( k, l )    ( 5.3 )

Comment 5.2 : In Chapter 8, we consider multiple continuous random variables. The generalization of Equation ( 5.3 ) to continuous random variables replaces the double sum with a double integral over the two-dimensional density function ( Equation 8.3 in Section 8.2 ) :

E [ g ( X, Y ) ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g ( x, y ) f_XY ( x, y ) dy dx

With this change, all the properties developed below hold for continuous as well as discrete random variables.

Expected values have some basic properties :

1. If g ( X, Y ) = g_1 ( X, Y ) + g_2 ( X, Y ), then the expected value is additive :

E [ g_1 ( X, Y ) + g_2 ( X, Y ) ] = E [ g_1 ( X, Y ) ] + E [ g_2 ( X, Y ) ]

In particular, means add : E [ X + Y ] = E [ X ] + E [ Y ].

2. In general, the expected value is multiplicative only if g ( X, Y ) factors and X and Y are independent, which means the PMF factors ; that is, p_XY ( k, l ) = p_X ( k ) p_Y ( l ). If g ( X, Y ) = g_1 ( X ) g_2 ( Y ) and X and Y are independent,


CHAPTER 5 MULTIPLE DISCRETE RANDOM VARIABLES

E [ g_1 ( X ) g_2 ( Y ) ] = E [ g_1 ( X ) ] E [ g_2 ( Y ) ]    ( 5.4 )

If X and Y are dependent, then Equation ( 5.4 ) is, in general, not true.

5.3.2 Moments for Two Random Variables

In addition to the moments ( e.g., mean and variance ) for the individual random variables, there are joint analogs of the mean and variance. These are the correlation and covariance. The correlation of X and Y is r_xy = E [ XY ]. X and Y are uncorrelated if r_xy = μ_x μ_y ( i.e., if E [ XY ] = E [ X ] E [ Y ] ).

Are X and Y in Example 5.1 uncorrelated ? We need to compute three moments :

E [ X ] = Σ_{k=0}^{3} k · p_X ( k ) = 0 × 0.1 + 1 × 0.1 + 2 × 0.5 + 3 × 0.3 = 2.0

E [ Y ] = Σ_{l=0}^{1} l · p_Y ( l ) = 0 × 0.7 + 1 × 0.3 = 0.3

E [ XY ] = Σ_{k=0}^{3} Σ_{l=0}^{1} k · l · p_XY ( k, l )
= 0 × 0 × 0.0 + 0 × 1 × 0.1 + 1 × 0 × 0.1 + 1 × 1 × 0.0 + 2 × 0 × 0.4 + 2 × 1 × 0.1 + 3 × 0 × 0.2 + 3 × 1 × 0.1 = 0.5

Since E [ XY ] = 0.5 ≠ E [ X ] E [ Y ] = 2.0 × 0.3 = 0.6, X and Y are not uncorrelated ; they are correlated.

Independence implies uncorrelated, but the converse is not necessarily true. If X and Y are independent, then p_XY ( k, l ) = p_X ( k ) p_Y ( l ) for all k and l. This implies that E [ XY ] = E [ X ] E [ Y ]. Hence, X and Y are uncorrelated. See Problem 5.8 for an example of two uncorrelated random variables that are not independent.

The covariance of X and Y is σ_xy = Cov [ X, Y ] = E [ ( X − μ_x ) ( Y − μ_y ) ]. Like the variance, the covariance obeys a decomposition theorem :

Theorem 5.1 : For all X and Y,

Cov [ X, Y ] = E [ XY ] − E [ X ] E [ Y ] = r_xy − μ_x μ_y

This theorem is an extension of Theorem 4.1. One corollary of Theorem 5.1 is an alternate definition of uncorrelated : if X and Y are uncorrelated, then σ_xy = 0, and vice versa. To summarize, if X and Y are uncorrelated, then all the following are true ( they all say the same thing, just in different ways or with different notations ) :

r_xy = μ_x μ_y
E [ XY ] = E [ X ] E [ Y ]

5.3 Moments and Expected Values


σ_xy = 0
E [ ( X − μ_x ) ( Y − μ_y ) ] = 0
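These moment computations are easy to check numerically ; a short sketch using the joint PMF of Example 5.1 as tabulated in the computation above :

```python
# Joint PMF p_XY(k, l) of Example 5.1, as listed in the E[XY] computation above.
pmf = {(0, 0): 0.0, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.0,
       (2, 0): 0.4, (2, 1): 0.1, (3, 0): 0.2, (3, 1): 0.1}

EX  = sum(k * p for (k, l), p in pmf.items())        # E[X]
EY  = sum(l * p for (k, l), p in pmf.items())        # E[Y]
rxy = sum(k * l * p for (k, l), p in pmf.items())    # correlation r_xy = E[XY]
cov = rxy - EX * EY                                  # covariance, by Theorem 5.1

print(EX, EY, rxy, cov)   # 2.0, 0.3, 0.5, -0.1 (up to float rounding)
```

Since the covariance is nonzero ( −0.1 ), the code confirms that X and Y are correlated.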

Comment 5.3 : If Y = X, then σ_xx = Cov [ X, X ] = E [ ( X − μ_x )² ] = Var [ X ]. In other words, the covariance between X and itself is the variance of X.

Linear combinations of random variables occur frequently. Let Z = aX + bY. Then,

E [ Z ] = E [ aX + bY ] = aE [ X ] + bE [ Y ] = aμ_x + bμ_y

Var [ Z ] = Var [ aX + bY ]
= E [ ( ( aX + bY ) − ( aμ_x + bμ_y ) )² ]
= E [ a² ( X − μ_x )² + 2ab ( X − μ_x ) ( Y − μ_y ) + b² ( Y − μ_y )² ]
= a² Var [ X ] + 2ab Cov [ X, Y ] + b² Var [ Y ]
= a² σ_x² + 2ab σ_xy + b² σ_y²
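The decomposition Var [ aX + bY ] = a²σ_x² + 2ab σ_xy + b²σ_y² can be verified numerically on the joint PMF of Example 5.1 ; a sketch ( the choice a = 2, b = 3 is arbitrary ) :

```python
# Joint PMF of Example 5.1.
pmf = {(0, 0): 0.0, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.0,
       (2, 0): 0.4, (2, 1): 0.1, (3, 0): 0.2, (3, 1): 0.1}

def mean(g):
    # E[g(X, Y)], the double sum of Equation (5.3)
    return sum(g(k, l) * p for (k, l), p in pmf.items())

a, b = 2, 3
mx, my = mean(lambda k, l: k), mean(lambda k, l: l)
vx  = mean(lambda k, l: (k - mx) ** 2)            # Var[X]
vy  = mean(lambda k, l: (l - my) ** 2)            # Var[Y]
cxy = mean(lambda k, l: (k - mx) * (l - my))      # Cov[X, Y]

# Var[aX + bY] two ways: directly, and via the decomposition above.
direct  = mean(lambda k, l: (a * k + b * l - (a * mx + b * my)) ** 2)
formula = a * a * vx + 2 * a * b * cxy + b * b * vy
print(direct, formula)   # both equal 3.89 (up to float rounding)
```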

The correlation coefficient is

ρ_xy = σ_xy / ( σ_x σ_y )    ( 5.5 )

Sometimes this is rearranged as σ_xy = ρ_xy σ_x σ_y.

Theorem 5.2 : The correlation coefficient is between −1 and 1 :

−1 ≤ ρ_xy ≤ 1    ( 5.6 )

Proof : Define normalized versions of X and Y as follows : X̃ = ( X − μ_x ) / σ_x and Ỹ = ( Y − μ_y ) / σ_y. Then, both X̃ and Ỹ have zero mean and unit variance, and the correlation between X̃ and Ỹ is the correlation coefficient between X and Y :

E [ X̃ Ỹ ] = E [ ( ( X − μ_x ) / σ_x ) ( ( Y − μ_y ) / σ_y ) ] = Cov [ X, Y ] / ( σ_x σ_y ) = ρ_xy

The variance of X̃ ± Ỹ is nonnegative ( as are all variances ) :

0 ≤ Var [ X̃ ± Ỹ ] = Var [ X̃ ] ± 2 Cov [ X̃, Ỹ ] + Var [ Ỹ ]


= 1 ± 2ρ_xy + 1 = 2 ( 1 ± ρ_xy )

The only way both of these are nonnegative is if −1 ≤ ρ_xy ≤ 1. ∎

Comment 5.4 : For two random variables, we focus on five moments, μ_x, μ_y, σ_x², σ_y², and σ_xy, and the two related values, r_xy and ρ_xy. These five ( or seven ) appear in many applications. Much of statistics is about estimating these moments or using them to estimate other things.

5.4 EXAMPLE : TWO DISCRETE RANDOM VARIABLES

We can illustrate these ideas with an example. Let X and Y be discrete random variables whose PMF is uniform on the 12 points shown below :

( Figure : 12 dots in the k-l plane, with k ranging over 0, 1, 2, 3, 4 and l over 0, 1, 2. )

The idea is that X and Y are restricted to these 12 points. Each of these points is equally likely :

Pr [ X = k ∩ Y = l ] = p ( k, l ) = 1/12 for k and l highlighted above, and 0 otherwise

Probabilities of events are computed by summing the probabilities of the outcomes that comprise the event. For example,

{ 2 ≤ X ≤ 3 ∩ 1 ≤ Y ≤ 2 } = { X = 2 ∩ Y = 1 } ∪ { X = 2 ∩ Y = 2 } ∪ { X = 3 ∩ Y = 1 }

The dots within the square shown below are those that satisfy the event in which we are interested :

( Figure : the 12 dots, with the three dots inside the event region boxed. )


The probability is the sum of the dots :

Pr [ 2 ≤ X ≤ 3 ∩ 1 ≤ Y ≤ 2 ] = Pr [ X = 2 ∩ Y = 1 ] + Pr [ X = 2 ∩ Y = 2 ] + Pr [ X = 3 ∩ Y = 1 ]
= 1/12 + 1/12 + 1/12 = 3/12

5.4.1 Marginal PMFs and Expected Values

The first exercise is to calculate the marginal PMFs. p_X ( k ) is calculated by summing p_XY ( k, l ) along l ; that is, by summing columns :

p_X ( k ) = 3/12, 3/12, 3/12, 2/12, 1/12 for k = 0, 1, 2, 3, 4

For example, p_X ( 0 ) = p_X ( 1 ) = p_X ( 2 ) = 3/12, p_X ( 3 ) = 2/12, and p_X ( 4 ) = 1/12. p_Y ( l ) is computed by summing over the rows :

p_Y ( l ) = 5/12, 4/12, 3/12 for l = 0, 1, 2

Computing the means of X and Y is straightforward :

E [ X ] = 0 · ( 3/12 ) + 1 · ( 3/12 ) + 2 · ( 3/12 ) + 3 · ( 2/12 ) + 4 · ( 1/12 ) = 19/12

E [ Y ] = 0 · ( 5/12 ) + 1 · ( 4/12 ) + 2 · ( 3/12 ) = 10/12
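These marginals and means can be reproduced in a few lines of Python. A sketch ; the 12-point set used here is inferred from the marginal counts above ( k = 0, 1, 2 each paired with l = 0, 1, 2, plus the points ( 3, 0 ), ( 3, 1 ), and ( 4, 0 ) ) :

```python
from fractions import Fraction

# The 12 equally likely points (k, l), inferred from the marginals above.
points = [(k, l) for k in range(3) for l in range(3)] + [(3, 0), (3, 1), (4, 0)]
p = Fraction(1, 12)   # uniform PMF on the 12 points

# Marginals: sum the joint PMF along l (columns) and along k (rows).
px = {k: sum(p for (i, l) in points if i == k) for k in range(5)}
py = {l: sum(p for (k, j) in points if j == l) for l in range(3)}

EX = sum(k * p for (k, l) in points)
EY = sum(l * p for (k, l) in points)
print(px, py, EX, EY)   # E[X] = 19/12, E[Y] = 10/12
```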

5.4.2 Independence

Are X and Y independent ? Independence requires the PMF, p_XY ( k, l ), to factor as p_X ( k ) p_Y ( l ) for all k and l. On the other hand, showing X and Y are dependent ( i.e., not independent ) takes only one value of k and l that does not factor. For example, p_XY ( 0, 0 ) = 1/12 ≠ p_X ( 0 ) p_Y ( 0 ) = ( 3/12 ) · ( 5/12 ). Accordingly, X and Y are dependent.


5.4.3 Joint CDF

Computing the joint CDF of X and Y is straightforward, though a bit tedious :

F_XY ( u, v ) = Pr [ X ≤ u ∩ Y ≤ v ] = Σ_{k : x_k ≤ u} Σ_{l : y_l ≤ v} p_XY ( k, l )

In the figure below, the region where { X ≤ 2.6 ∩ Y ≤ 1.4 } is highlighted in gray :

( Figure : the 12 dots, with the six dots satisfying x ≤ 2.6 and y ≤ 1.4 inside the gray region. )

The probability of this event ( the gray region ) is computed by summing the probabilities of the dots within the region :

F_XY ( 2.6, 1.4 ) = 6/12

Since the PMF is discrete, the CDF is constant for all ( 2.0 ≤ u < 3.0 ) ∩ ( 1.0 ≤ v < 2.0 ). One way to illustrate the CDF is to replace the dots with the CDF value at that point, as shown below :

EXAMPLE 5.3

( Figure : the 12 dots, with the dot at ( 2, 1 ) replaced by its CDF value, 6/12. ) Replace the rest of the dots with their CDF values. Describe the region where the CDF is equal to 1.

5.4.4 Transformations With One Output

Let Z = X + Y. What is p_Z ( m ) ? The first step is to note that 0 ≤ Z ≤ 4. Next, look at the event { Z = m } and then at the probability of that event :

{ Z = m } = { X + Y = m }
= { X = m ∩ Y = 0 } ∪ { X = m − 1 ∩ Y = 1 } ∪ ··· ∪ { X = 0 ∩ Y = m }

p_Z ( m ) = Pr [ Z = m ] = Pr [ X + Y = m ]
= Pr [ X = m ∩ Y = 0 ] + Pr [ X = m − 1 ∩ Y = 1 ] + ··· + Pr [ X = 0 ∩ Y = m ]
= p ( m, 0 ) + p ( m − 1, 1 ) + ··· + p ( 0, m )

Thus, the level curves of Z = m are diagonal lines, as shown below :

( Figure : the 12 dots with diagonal level curves ; p_Z ( m ) = 1/12, 2/12, 3/12, 3/12, 3/12 for m = 0, 1, 2, 3, 4. )

The mean of Z can be calculated two different ways :

E [ Z ] = ( 1/12 ) · 0 + ( 2/12 ) · 1 + ( 3/12 ) · 2 + ( 3/12 ) · 3 + ( 3/12 ) · 4 = 29/12

E [ Z ] = E [ X + Y ] = E [ X ] + E [ Y ]    ( additivity of expected value )
= 19/12 + 10/12 = 29/12

One transformation that occurs in many applications is the max function ; that is, W = max ( X, Y ). For example, max ( 3, 2 ) = 3, max ( −2, 5 ) = 5, etc. Often, we are interested in the largest or worst of many observations, and the max ( · ) transformation gives this value :

Pr [ W = w ] = Pr [ max ( X, Y ) = w ]
= Pr [ ( X = w ∩ Y < w ) ∪ ( X < w ∩ Y = w ) ∪ ( X = w ∩ Y = w ) ]

The level curves are right angles, as shown below :

( Figure : the 12 dots with right-angle level curves ; p_W ( w ) = 1/12, 3/12, 5/12, 2/12, 1/12 for w = 0, 1, 2, 3, 4. )

For example, Pr [ W = 2 ] = 5/12.
Thus,

E [ W ] = ( 1/12 ) · 0 + ( 3/12 ) · 1 + ( 5/12 ) · 2 + ( 2/12 ) · 3 + ( 1/12 ) · 4 = 23/12

5.4.5 Transformations With Several Outputs

Transformations can also have as many outputs as inputs. Consider a joint transformation Z = X + Y and Z′ = X − Y. What is the joint PMF of Z and Z′ ? The simplest solution to this problem is to compute a table of all the values, listing z = k + l and z′ = k − l for each of the 12 equally likely points. These Z and Z′ values can be plotted as dots :

( Figure : the 12 ( z, z′ ) pairs plotted in the z-z′ plane, with z running from 0 to 4 and z′ from −2 to 4. )

Now, one can compute the individual PMFs, expected values, and joint CDF as before.

5.4.6 Discussion

In this example, we worked with a two-dimensional distribution. The same techniques work in higher dimensions as well. The level curves become subspaces ( e.g., surfaces in three dimensions ). One aspect to pay attention to is the probability distribution. In this example, the distribution was uniform. All values were the same. Summing the probabilities simply amounted to counting the dots and multiplying by 1/12. In more general examples, the probabilities will vary. Instead of counting the dots, one must sum the probabilities of each dot.

5.5 SUMS OF INDEPENDENT RANDOM VARIABLES

Sums of independent random variables occur frequently. Here, we show three important results. First, the mean of a sum is the sum of the means. Second, the variance of the sum is the sum of the variances. Third, the PMF of the sum is the convolution of the individual PMFs.

Let X_1, X_2, ..., X_n be random variables, and let S = X_1 + X_2 + ··· + X_n be the sum of the random variables. The means add even if the random variables are not independent :

E [ S ] = E [ X_1 ] + E [ X_2 ] + ··· + E [ X_n ]

This result does not require the random variables to be independent. It follows from the additivity of expected values and applies to any set of random variables. The variances add if the covariances between the random variables are 0.
The usual way this occurs is if the random variables are independent, because independent random variables are uncorrelated. We show this below for three random variables, S = X_1 + X_2 + X_3, but the result extends to any count :

Var [ S ] = E [ ( S − E [ S ] )² ]
= E [ ( ( X_1 − μ_1 ) + ( X_2 − μ_2 ) + ( X_3 − μ_3 ) )² ]
= E [ ( X_1 − μ_1 )² ] + 2E [ ( X_1 − μ_1 ) ( X_2 − μ_2 ) ] + 2E [ ( X_1 − μ_1 ) ( X_3 − μ_3 ) ]
+ E [ ( X_2 − μ_2 )² ] + 2E [ ( X_2 − μ_2 ) ( X_3 − μ_3 ) ] + E [ ( X_3 − μ_3 )² ]
= Var [ X_1 ] + 2 Cov [ X_1, X_2 ] + 2 Cov [ X_1, X_3 ] + Var [ X_2 ] + 2 Cov [ X_2, X_3 ] + Var [ X_3 ]

If the X_i are uncorrelated, then the covariances are 0, and the variance is the sum of the variances :

Var [ S ] = Var [ X_1 ] + Var [ X_2 ] + Var [ X_3 ]

This result generalizes to the sum of n random variables : if the random variables are uncorrelated, then the variance of the sum is the sum of the variances.
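For the dependent X and Y of Section 5.4, the PMF of a transformation such as Z = X + Y or W = max ( X, Y ) must be obtained by summing the joint PMF directly, as in Section 5.4.4. A sketch, using the 12-point set inferred from the marginals of Section 5.4 :

```python
from collections import Counter
from fractions import Fraction

# The 12 equally likely points of Section 5.4 (inferred from its marginals).
points = [(k, l) for k in range(3) for l in range(3)] + [(3, 0), (3, 1), (4, 0)]
p = Fraction(1, 12)

pz = Counter()
pw = Counter()
for k, l in points:
    pz[k + l] += p        # Z = X + Y: sum along the diagonal level curves
    pw[max(k, l)] += p    # W = max(X, Y): sum along the right-angle level curves

EZ = sum(z * q for z, q in pz.items())
EW = sum(w * q for w, q in pw.items())
print(dict(pz), EZ)   # p_Z = 1/12, 2/12, 3/12, 3/12, 3/12; E[Z] = 29/12
print(dict(pw), EW)   # p_W = 1/12, 3/12, 5/12, 2/12, 1/12; E[W] = 23/12
```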


Computing PMFs of a sum of random variables is, in general, difficult. However, in the special case when the random variables are independent, the PMF of the sum is the convolution of the individual PMFs. Similarly, the MGF of the sum is the product of the individual MGFs.

Let X and Y be possibly dependent random variables with joint PMF p_XY ( k, l ), and let S = X + Y. The PMF of S is

p_S ( n ) = Pr [ S = n ] = Pr [ X + Y = n ]
= Σ_l Pr [ X + Y = n | Y = l ] Pr [ Y = l ]    ( LTP )
= Σ_l Pr [ X = n − l | Y = l ] Pr [ Y = l ]    ( using Y = l )

If we further assume that X and Y are independent, the conditional PMF reduces to the PMF of X, Pr [ X = n − l | Y = l ] = Pr [ X = n − l ], and we get a fundamental result :

p_S ( n ) = Σ_l Pr [ X = n − l ] Pr [ Y = l ] = Σ_l p_X ( n − l ) p_Y ( l )    ( 5.7 )

Thus, we see the PMF of S is the convolution of the PMF of X with the PMF of Y. This result generalizes : if S = X_1 + X_2 + ··· + X_n and the X_i are independent, each with PMF p_i ( · ), then the PMF of S is the convolution of the respective PMFs :

p_S = p_1 * p_2 * ··· * p_n    ( 5.8 )

Consider a simple example of two four-sided dice being thrown. Let X_1 and X_2 be the numbers from each die, and let S = X_1 + X_2 equal the sum of the two numbers. ( A four-sided die is a tetrahedron. Each corner corresponds to a number. When thrown, one corner is up. ) Assume each X_i is uniform and the two dice are independent. Then,

Pr [ S = 2 ] = Pr [ X_1 = 1 ] Pr [ X_2 = 1 ] = 1/4 · 1/4 = 1/16
Pr [ S = 3 ] = Pr [ X_1 = 1 ] Pr [ X_2 = 2 ] + Pr [ X_1 = 2 ] Pr [ X_2 = 1 ] = 2/16
Pr [ S = 4 ] = 1/4 · 1/4 + 1/4 · 1/4 + 1/4 · 1/4 = 3/16
Pr [ S = 5 ] = 4/16
Pr [ S = 6 ] = 3/16
Pr [ S = 7 ] = 2/16
Pr [ S = 8 ] = 1/16

This is shown schematically below :

( Figure : the two four-sided-die PMFs and their convolution, the triangular PMF of S. )

Comment 5.5 : Many people get confused about computing convolutions. It is easy, however, if you organize the computation the right way. Assume we have two vectors of data ( either from signal processing or PMFs ) that we wish to convolve : z = x * y. We will use a simple vector notation, such as x = [ 1, 2, 3, −4 ] and y = [ 3, 4, 5 ]. Define two vector operations : sum [ x ] = 1 + 2 + 3 − 4 = 2 is the sum of the elements of x, and length [ x ] = 4 is the number of elements of x ( from the first nonzero element to the last nonzero element ). Convolutions obey two simple relations that are useful for checking one's computations. If z = x * y, then

sum [ z ] = sum [ x ] · sum [ y ]    ( 5.9 )

length [ z ] = length [ x ] + length [ y ] − 1    ( 5.10 )

It is convenient to assume length [ x ] ≥ length [ y ] ; if not, reverse the roles of x and y. Then :

1. Make a table with x along the top and y going down the left side.
2. Fill in the rows of the table by multiplying x by each value of y and shifting the rows over one place for each place down.
3. Sum the columns. Do not " carry " as if you were doing long multiplication. The sum is z = x * y.

For example, for x and y as above, the table looks like this :

        1     2     3    -4
  3     3     6     9   -12
  4           4     8    12   -16
  5                 5    10    15   -20
        3    10    22    10    -1   -20

Thus, z = [ 3, 10, 22, 10, −1, −20 ] = [ 1, 2, 3, −4 ] * [ 3, 4, 5 ] = x * y

sum [ z ] = 24 = 2 · 12 = sum [ x ] · sum [ y ]
length [ z ] = 6 = 4 + 3 − 1 = length [ x ] + length [ y ] − 1
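The same example, along with the two checking relations ( 5.9 ) and ( 5.10 ), can be verified with np.convolve :

```python
import numpy as np

x = np.array([1, 2, 3, -4])
y = np.array([3, 4, 5])
z = np.convolve(x, y)

print(z)                               # z = [3, 10, 22, 10, -1, -20]
assert z.sum() == x.sum() * y.sum()    # Equation (5.9): 24 = 2 * 12
assert len(z) == len(x) + len(y) - 1   # Equation (5.10): 6 = 4 + 3 - 1
```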


EXAMPLE 5.4

Pascal's triangle is really the repeated convolution of [ 1, 1 ] with itself. Here is the table :

1   1
1   2   1
1   3   3   1
1   4   6   4   1

[ 1, 4, 6, 4, 1 ] = [ 1, 1 ] * [ 1, 1 ] * [ 1, 1 ] * [ 1, 1 ]

sum [ 1, 4, 6, 4, 1 ] = 16 = 2⁴ = sum [ 1, 1 ] · sum [ 1, 1 ] · sum [ 1, 1 ] · sum [ 1, 1 ]

Comment 5.6 : Another way of organizing the table is like long multiplication :

              1     2     3    -4
        x           3     4     5
              5    10    15   -20
        4     8    12   -16
  3     6     9   -12
  3    10    22    10    -1   -20

Multiply the terms as if doing long multiplication, but do not carry.

As we mentioned earlier, the MGF corresponds to the Laplace transform in signal processing ( except for the change of sign in the exponent ). Just as the Laplace transform converts convolution to multiplication, so does the MGF :

M_S ( u ) = E [ e^{uS} ]
= E [ e^{u ( X_1 + X_2 + ··· + X_n )} ]
= E [ e^{uX_1} e^{uX_2} ··· e^{uX_n} ]
= E [ e^{uX_1} ] E [ e^{uX_2} ] ··· E [ e^{uX_n} ]    ( by independence )
= M_1 ( u ) M_2 ( u ) ··· M_n ( u )    ( 5.11 )

The MGF of the sum is the product of the individual MGFs, all with the same value of u.

As an exercise, let X_i, i = 1, 2, ..., n, be a sequence of independent random variables with the Poisson distribution, each with a different λ_i. For k = 0, 1, 2, ... and λ_i > 0,

Pr [ X_i = k ] = ( λ_i^k / k! ) e^{−λ_i}

M_{X_i} ( u ) = E [ e^{uX_i} ] = e^{λ_i ( e^u − 1 )}

Then, the MGF of the sum S = X_1 + X_2 + ··· + X_n is the product of the individual MGFs,

M_S ( u ) = e^{( λ_1 + λ_2 + ··· + λ_n ) ( e^u − 1 )}

so S is Poisson with parameter λ_1 + λ_2 + ··· + λ_n.

5.6 SAMPLE PROBABILITIES, MEAN, AND VARIANCE

Consider estimating the probability p of an event from n independent trials. Let Y_i = 1 if the event occurs on trial i, and Y_i = 0 otherwise. The sample probability is

p̂ = ( Y_1 + Y_2 + ··· + Y_n ) / n = fraction of true samples

This idea generalizes to other probabilities. Simply let Y_i = 1 if the outcome is true and Y_i = 0 if false. The Y_i are independent Bernoulli random variables ( see Example 4.1 ). Sums of independent Bernoulli random variables are binomial random variables, which will be studied in Chapter 6.


The most popular estimate ( by far ) of the mean is the sample average. We denote the estimate of the mean as μ̂ and the sample mean as X̄_n :

μ̂ = X̄_n = ( X_1 + X_2 + ··· + X_n ) / n

If the X_i have mean μ and variance σ², then the mean and variance of X̄_n are the following :

E [ X̄_n ] = ( 1/n ) ( E [ X_1 ] + E [ X_2 ] + ··· + E [ X_n ] ) = nμ / n = μ

Var [ X̄_n ] = ( 1/n² ) ( Var [ X_1 ] + Var [ X_2 ] + ··· + Var [ X_n ] ) = nσ² / n² = σ² / n

The mean of X̄_n equals ( the unknown value ) μ. In other words, the expected value of the sample average is the mean of the distribution. An estimator with this property is unbiased. The variance of X̄_n decreases with n. As n increases ( more observations are made ), the squared error E [ ( X̄_n − μ )² ] = Var [ X̄_n ] = σ² / n tends to 0. An estimator whose variance goes to 0 as n → ∞ is consistent. This is a good thing. Gathering more data means the estimate gets better ( more accurate ). ( See Section 10.1 for more on unbiasedness and consistency. )

For instance, the sample probability above is the sample average of the Y_i values. Accordingly,

E [ p̂ ] = nE [ Y_i ] / n = np / n = p

Var [ p̂ ] = n Var [ Y_i ] / n² = p ( 1 − p ) / n

As expected, the sample probability is an unbiased and consistent estimate of the underlying probability.

The sample variance is a common estimate of the underlying variance, σ². Let σ̂² denote the estimate of the variance. Then,

σ̂² = ( 1 / ( n − 1 ) ) Σ_{k=1}^{n} ( X_k − X̄_n )²    ( 5.12 )

σ̂² is an unbiased estimate of σ² ; that is, E [ σ̂² ] = σ². For example, assume the observations are 7, 4, 4, 3, 7, 2, 4, and 7. The sample average and sample variance are

X̄_n = ( 7 + 4 + 4 + 3 + 7 + 2 + 4 + 7 ) / 8 = 38 / 8 = 4.75

σ̂² = ( ( 7 − 4.75 )² + ( 4 − 4.75 )² + ··· + ( 7 − 4.75 )² ) / ( 8 − 1 ) = 3.93

The data were generated from a Poisson distribution with parameter λ = 5. Hence, the sample mean is close to the distribution mean, 4.75 ≈ 5. The sample variance, 3.93, is a bit further from the distribution variance, 5, but is reasonable ( especially for only eight observations ).


The data above were generated with the following Python code :

import scipy.stats as st
import numpy as np

l, n = 5, 8
x = st.poisson(l).rvs(n)
muhat = np.sum(x) / len(x)
sighat = np.sum((x - muhat)**2) / (len(x) - 1)
print(muhat, sighat)

We will revisit mean and variance estimation in Sections 6.6, 8.9, 9.7.3, and 10.2.

5.7 HISTOGRAMS

Histograms are frequently used techniques for estimating PMFs. Histograms are part graphical and part analytical. In this section, we introduce histograms in the context of estimating discrete PMFs. We revisit histograms in Chapter 10.

Consider a sequence X_1, X_2, ..., X_n of n IID random variables, with each X_i uniform on 1, 2, ..., 6. In Figure 5.1, we show the uniform PMF with m = 6 as bars ( each bar has area 1/6 ) and two different histograms. The first is a sequence of 30 random variables. The counts of each outcome are 4, 7, 8, 1, 7, and 3. The second sequence has 60 random variables with counts 10, 6, 6, 14, 12, and 12. In both cases, the counts are divided by the number of observations

FIGURE 5.1 Comparison of uniform PMF and histograms of uniform observations. The histograms have 30 ( top ) and 60 ( bottom ) observations. In general, as the number of observations increases, the histogram looks more and more like the PMF.

( 30 or 60 ) to obtain estimates of the PMF. As the number of observations increases, the histogram estimates generally get closer and closer to the PMF. Histograms are often drawn as bar graphs. Figure 5.2 shows the same data as Figure 5.1 with the observations as a bar graph and the true PMF shown as a line.

FIGURE 5.2 Histograms drawn as bar graphs. The height of each bar equals the number of observations of that value divided by the total number of observations ( 30 or 60 ). The bottom histogram for n = 60 is more uniform than the upper histogram for n = 30.

Comment 5.8 : In all three computing environments, Matlab, R, and Python, the histogram command is hist ( x, options ), where x is a sequence of values. The options vary between the three versions. Both Matlab and Python default to 10 evenly spaced bins. The default in R is a bit more complicated. It chooses the number of bins dynamically based on the number of points and the range of values. Note that if the data are discrete on a set of integers, the bins should be integer widths, with each bin ideally centered on the integers. If the bins have non-integer widths, some bins might be empty ( if they do not include an integer ) or may include more integers than other bins. Both problems are forms of aliasing. Bottom line : do not rely on the default bin values for computing histograms of discrete random variables.
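With NumPy ( whose np.histogram does the binning that the plotting commands typically wrap ), integer-centered bins can be forced explicitly ; a sketch :

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=60)    # 60 rolls of a fair six-sided die

# Bin edges at k - 0.5 for k = 1..7, so each bin is centered on an integer.
edges = np.arange(0.5, 7.5, 1.0)
counts, _ = np.histogram(x, bins=edges)
pmf_hat = counts / len(x)          # histogram estimate of the PMF
print(pmf_hat.sum())               # approximately 1.0; every roll lands in a bin
```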

5.8 ENTROPY AND DATA COMPRESSION

Many engineering systems involve large data sets. These data sets are often compressed before storage or transmission. The compression is either lossless, meaning the original data can be reconstructed exactly, or lossy, meaning some information is lost and the original data cannot be reconstructed exactly. For instance, the standard fax compression standard is lossless ( the scanning process introduces loss, but the black-and-white dots are compressed and transmitted losslessly ). The Joint Photographic Experts Group ( JPEG ) image compression standard is a lossy technique. Gzip and Bzip2 are lossless, while MP3 is lossy. Interestingly, most lossy compression algorithms incorporate lossless methods internally. In this section, we take a little detour, discussing the problem of lossless data compression and presenting a measure of complexity called the entropy. A famous theorem says the expected number of bits needed to encode a source is lower bounded by the entropy. We also develop Huffman coding, an optimal coding technique.

5.8.1 Entropy and Information Theory

Consider a source that emits a sequence of symbols, X_1, X_2, X_3, etc. Assume the symbols are independent. ( If the symbols are dependent, such as for English text, we will ignore that dependence. ) Let X denote one such symbol. Assume X takes on one letter, a_k, from an alphabet of m letters. Example alphabets include { 0, 1 } ( m = 2 ), { a, b, c, ..., z } ( m = 26 ), { 0, 1, ..., 9 } ( m = 10 ), and many others. Let p ( k ) = Pr [ X = a_k ] be the PMF of the symbols. The entropy of X, H [ X ], is a measure of how unpredictable ( i.e., how random ) the data are :

H [ X ] = E [ − log ( p ( X ) ) ] = − Σ_{k=1}^{m} p ( k ) log ( p ( k ) )    ( 5.13 )

Random variables with higher entropy are " more random " and, therefore, more unpredictable than random variables with lower entropy. The " log " is usually taken to base 2. If so, the entropy is measured in bits. If the log is to base e, the entropy is in nats. Whenever we evaluate an entropy, we use bits.

Comment 5.9 : Most calculators compute logs to base e ( often denoted ln ( x ) ) or base 10 ( log10 ( x ) ). Converting from one base to another is simple :

log_2 ( x ) = log ( x ) / log ( 2 )

where log ( x ) and log ( 2 ) are base e or base 10, whichever is convenient. For instance, log_2 ( 8 ) = 3 = log_e ( 8 ) / log_e ( 2 ) = 2.079 / 0.693.

It should be emphasized the entropy is a function of the probabilities, not of the alphabet. Two different alphabets having the same probabilities have the same entropy. The entropy is nonnegative and upper bounded by log ( m ) :

0 ≤ H [ X ] ≤ log ( m )

The lower bound follows from the basic notion that all probabilities are between 0 and 1, 0 ≤ p ( k ) ≤ 1, which implies log ( p ( k ) ) ≤ 0. Accordingly, − log ( p ( k ) ) ≥ 0. The lower bound is achieved if each term in the sum is 0. This happens if p ( k ) = 0 ( the limit as p → 0 of p log ( p ) is 0 ; see the figure below ) or p ( k ) = 1 ( since log ( 1 ) = 0 ). The distribution that achieves this is degenerate : one outcome has probability 1 ; all other outcomes have probability 0.

( Figure : − p log_2 ( p ) versus p ; the maximum value, 0.531, occurs at p = 1/e. )

We show the upper bound in Section 5.8.4, where we solve an optimization problem : maximize H [ X ] over all probability distributions. The maximizing distribution is the uniform distribution. In this sense, the uniform distribution is the most random of all distributions over m outcomes.

For example, consider an m = 4 alphabet with probabilities 0.5, 0.3, 0.1, and 0.1. Since m = 4, the entropy is upper bounded by log ( 4 ) = 2 bits. The entropy of this distribution is

H [ X ] = −0.5 log_2 ( 0.5 ) − 0.3 log_2 ( 0.3 ) − 0.1 log_2 ( 0.1 ) − 0.1 log_2 ( 0.1 ) = 1.69 bits ≤ log_2 ( 4 ) = 2 bits
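Entropy computations like these are one-liners in Python. A sketch, with the convention that p ( k ) = 0 terms contribute 0 ( since p log p → 0 ) :

```python
import math

def entropy(probs):
    # H[X] = -sum p(k) log2 p(k), in bits; terms with p(k) = 0 contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0, the maximum for m = 4
print(entropy([0.5, 0.3, 0.1, 0.1]))       # about 1.69 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0, the degenerate case
```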

Entropy is a crucial concept in communications. The entropy of X measures how much information X contains. When Alice transmits X to Bob, Bob receives H [ X ] bits of information. For useful communications to take place, it is necessary that H [ X ] > 0.

The special case when X is binary ( i.e., m = 2 ) occurs in many applications. Let X be binary with Pr [ X = 1 ] = p. Then,

H [ X ] = − p log_2 ( p ) − ( 1 − p ) log_2 ( 1 − p ) = h ( p )    ( 5.14 )

h ( p ) is known as the binary entropy function. It is shown in Figure 5.3.

FIGURE 5.3 The binary entropy function h ( p ) = − p log_2 ( p ) − ( 1 − p ) log_2 ( 1 − p ) versus p.


The binary entropy function obeys some simple properties :

• h ( p ) = h ( 1 − p ) for 0 ≤ p ≤ 1
• h ( 0 ) = h ( 1 ) = 0
• h ( 0.5 ) = 1
• h ( 0.11 ) = h ( 0.89 ) = 0.5

The joint entropy function of X and Y is defined similarly to Equation ( 5.13 ) :

H [ X, Y ] = E [ − log ( p ( X, Y ) ) ] = − Σ_k Σ_l p_XY ( k, l ) log ( p_XY ( k, l ) )

The joint entropy is bounded from above by the sum of the individual entropies : H [ X, Y ] ≤ H [ X ] + H [ Y ]. For example, consider X and Y as defined in Section 5.4 :

H [ X ] = − ( 3/12 ) log_2 ( 3/12 ) − ( 3/12 ) log_2 ( 3/12 ) − ( 3/12 ) log_2 ( 3/12 ) − ( 2/12 ) log_2 ( 2/12 ) − ( 1/12 ) log_2 ( 1/12 ) = 2.23 bits

H [ Y ] = − ( 5/12 ) log_2 ( 5/12 ) − ( 4/12 ) log_2 ( 4/12 ) − ( 3/12 ) log_2 ( 3/12 ) = 1.55 bits

H [ X, Y ] = − 12 × ( 1/12 ) log_2 ( 1/12 ) = 3.58 bits ≤ H [ X ] + H [ Y ] = 2.23 + 1.55 = 3.78 bits

If X and Y are independent, the joint entropy is the sum of the individual entropies. This follows because log ( ab ) = log ( a ) + log ( b ) :

H [ X, Y ] = E [ − log ( p ( X, Y ) ) ]    ( definition )
= E [ − log ( p ( X ) p ( Y ) ) ]    ( independence )
= E [ − log ( p ( X ) ) ] + E [ − log ( p ( Y ) ) ]    ( additivity of E [ · ] )
= H [ X ] + H [ Y ]

In summary, the entropy is a measure of the randomness of one or more random variables. All entropies are nonnegative. The distribution that maximizes the entropy is the uniform distribution. Entropies of independent random variables add.
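The three entropies of the Section 5.4 example can be checked numerically ; a sketch ( the marginals are the ones computed in Section 5.4 ) :

```python
import math

def entropy(probs):
    # H = -sum p log2 p, in bits; zero-probability terms contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

px = [3/12, 3/12, 3/12, 2/12, 1/12]    # marginal of X from Section 5.4
py = [5/12, 4/12, 3/12]                # marginal of Y from Section 5.4

HX, HY = entropy(px), entropy(py)
HXY = entropy([1/12] * 12)             # uniform joint PMF: log2(12) bits
print(HX, HY, HXY)                     # about 2.23, 1.55, and 3.58 bits
```

The output confirms the bound H [ X, Y ] ≤ H [ X ] + H [ Y ] for this dependent pair.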

5.8.2 Variable Length Coding

The entropy is closely related to the problem of efficiently encoding a data sequence for communication or storage. For example, consider a five-letter alphabet { a, b, c, d, e } with probabilities 0.3, 0.3, 0.2, 0.1, and 0.1, respectively. The entropy of this source is

H [ X ] = −0.3 log_2 ( 0.3 ) − 0.3 log_2 ( 0.3 ) − 0.2 log_2 ( 0.2 ) − 0.1 log_2 ( 0.1 ) − 0.1 log_2 ( 0.1 ) = 2.17 bits per symbol

Since this source is not uniform, its entropy is less than log_2 ( 5 ) = 2.32 bits. Consider the problem of encoding this source with binary codes. There are five symbols, so each one can be encoded with three bits ( 2³ = 8 ≥ 5 ). We illustrate the code with a binary tree :

( Figure : a depth-3 binary tree whose first five leaves are labeled a, b, c, d, e. )

Code words start at the root of the tree ( top ) and proceed to a leaf ( bottom ). For example, the sequence aabec is encoded as 000 · 000 · 001 · 100 · 010 ( the " dots " are shown only for exposition and are not transmitted ). This seems wasteful, however, since only five of the eight possible code words would be used. We can prune the tree by eliminating the three unused leaves and shortening the remaining branch :

( Figure : the pruned tree ; a, b, c, and d keep three-bit codes, while e is encoded with a single bit. )

Now, the final letter has a different code length than the others. The sequence aabec is now encoded as 000 · 000 · 001 · 1 · 010 for a savings of two bits.

Define a random variable L representing the code length for each symbol. The average code length is the expected value of L :

E [ L ] = 0.3 × 3 + 0.3 × 3 + 0.2 × 3 + 0.1 × 3 + 0.1 × 1 = 2.8 bits

This is a savings in bits. Rather than 3 bits per symbol, this code requires an average of only 2.8 bits per symbol. This is an example of a variable length code. Different letters can have different code lengths. The expected length of the code, E [ L ], measures the code performance. A broad class of variable length codes can be represented by binary trees, as this one is.

Is this the best code ? Clearly not, since the shortest code is for the letter e even though e is the least frequently occurring letter. Using the shortest code for the most frequently occurring letter, a, would be better. In this case, the tree might look like this :

( Figure : a code tree giving a a one-bit code and b, c, d, e three-bit codes. )

The ask code distance is now E [ L ] = 0.3 adam 1 +0.3 x 3 +0.2 x 3 +0.1 x 3 +0.1 x 3 = 2.4 bits Is this the best code ? No. The best code is the Huffman code, which can be found by a simple recursive algorithm. First, number the probabilities in grouped order ( it is convenient, though not necessary, to sort the probabilities ) :

0.3   0.3   0.2   0.1   0.1

Combine the two smallest probabilities:

0.3   0.3   0.2   [0.1 + 0.1 → 0.2]

Do it again:

0.3   0.3   [0.2 + 0.2 → 0.4]

And again:

[0.3 + 0.3 → 0.6]


CHAPTER 5 MULTIPLE DISCRETE RANDOM VARIABLES

Finally, the last step merges the 0.6 and 0.4 nodes into the root, with probability 1.0:

[Completed Huffman tree: the root 1.0 splits into 0.6 and 0.4; the 0.6 node splits into leaves 0.3 (a) and 0.3 (b); the 0.4 node splits into leaf 0.2 (c) and a node 0.2, which splits into leaves 0.1 (d) and 0.1 (e).]

This is the optimal tree. The letters have lengths 2, 2, 2, 3, and 3, respectively. The expected code length is

E[L] = 0.3×2 + 0.3×2 + 0.2×2 + 0.1×3 + 0.1×3 = 2.2 bits

Comment 5.10: If there is a tie in merging the nodes (i.e., three or more nodes have the same minimal probability), merge any pair. The specific trees will vary depending on which pair is merged, but each tree will result in an optimal code. In other words, if there are ties, the optimal code is not unique.
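The merging procedure described above translates directly into code. The following sketch (ours, not from the text; the function name `huffman_lengths` is our own) builds the Huffman code lengths with a priority queue and checks the expected code length for the five-symbol source:

```python
import heapq

def huffman_lengths(probs):
    """Return Huffman code lengths for each symbol by repeatedly
    merging the two least probable nodes."""
    # Each heap entry: (probability, tiebreak, list of symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, t, s2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, t, s1 + s2))
    return lengths

probs = [0.3, 0.3, 0.2, 0.1, 0.1]
lengths = huffman_lengths(probs)
expected_length = sum(p * l for p, l in zip(probs, lengths))
print(sorted(lengths))            # [2, 2, 2, 3, 3]
print(round(expected_length, 2))  # 2.2
```

The lengths 2, 2, 2, 3, 3 and the expected length 2.2 bits match the optimal tree above.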

There is a celebrated theorem about coding efficiency and entropy due to Claude Shannon.¹

Theorem 5.3 (Shannon, 1948): For any decodable (tree) code, the expected code length is lower bounded by the entropy:

E[L] ≥ H[X]    (5.15)

In the example above, the expected code length is 2.2 bits per symbol, which is slightly greater than the entropy of 2.17 bits per symbol. When does the expected code length equal the entropy? To answer this question, we can equate the two and compare the expressions term by term:

E[L] = Σ_{k=1}^m p(k) l(k)  ≟  H[X] = Σ_{k=1}^m p(k)(−log2 p(k))

We have explicitly used log2 because coding trees are binary. We see the expressions are equal if

l(k) = −log2(p(k))

¹Claude Elwood Shannon (1916–2001) was an American mathematician, electrical engineer, and cryptographer known as "the father of information theory."


or, equivalently, if

p(k) = 2^(−l(k))

For example, consider an m = 4 source with probabilities 0.5, 0.25, 0.125, and 0.125. These correspond to lengths 1, 2, 3, and 3, respectively. The Huffman tree is shown below:

[Huffman tree: leaf 0.5 at depth 1, leaf 0.25 at depth 2, leaves 0.125 and 0.125 at depth 3.]

The expected code length and the entropy are both 1.75 bits per symbol.
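This dyadic case, where l(k) = −log2(p(k)) exactly, is easy to verify with a short standard-library script (our own illustration, not from the text):

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]  # l(k) = -log2(p(k)) for each symbol

# Entropy H[X] = -sum p(k) log2 p(k)
entropy = -sum(p * math.log2(p) for p in probs)
# Expected code length E[L] = sum p(k) l(k)
expected_length = sum(p * l for p, l in zip(probs, lengths))

print(entropy)          # 1.75
print(expected_length)  # 1.75
```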

5.8.3 Encoding Binary Sequences

The encoding algorithm discussed above needs to be modified to work with binary data. The problem is that there is only one tree with two leaves:

[Two-leaf tree with probabilities 1 − p and p.]

The expected length is E[L] = (1 − p)·1 + p·1 = 1 for all values of p. There is no compression gain. The trick is to group consecutive input symbols together to form a supersymbol. If the input symbols are grouped two at a time, the sequence 0001101101 would be parsed as 00 · 01 · 10 · 11 · 01. Let Y denote a supersymbol formed from two input symbols. Y then takes on one of the "letters" 00, 01, 10, and 11, with probabilities (1 − p)², (1 − p)p, (1 − p)p, and p², respectively. As an example, the binary entropy function equals 0.5 when p = 0.11 or p = 0.89. Let us take p = 0.11 and see how well pairs of symbols can be encoded. Since the letters are independent, the probabilities of pairs are the products of the individual probabilities:

Pr[00] = Pr[0] · Pr[0] = 0.89 × 0.89 = 0.79
Pr[01] = Pr[0] · Pr[1] = 0.89 × 0.11 = 0.10
Pr[10] = Pr[1] · Pr[0] = 0.11 × 0.89 = 0.10
Pr[11] = Pr[1] · Pr[1] = 0.11 × 0.11 = 0.01


The Huffman tree looks like this:

[Huffman tree: leaf 0.79 at depth 1, leaf 0.1 at depth 2, leaves 0.1 and 0.01 at depth 3.]

The expected length per input symbol is

E[L] = (0.79×1 + 0.1×2 + 0.1×3 + 0.01×3)/2 = 1.32/2 = 0.66 bits per symbol

By combining two input symbols into one supersymbol, the expected code rate drops from 1 bit per symbol to 0.66 bits per symbol; about one-third fewer bits are required. However, the code rate, 0.66 bits per symbol, is still 32% higher than the theoretical rate of H(0.11) = 0.5 bits per symbol.
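The numbers in this subsection can be reproduced with a short script (our own sketch; the helper name `binary_entropy` is not from the text):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), the entropy of a biased coin."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.11
# Probabilities of the four supersymbols 00, 01, 10, 11 (pairs of bits)
pairs = [(1 - p) * (1 - p), (1 - p) * p, p * (1 - p), p * p]
# Huffman lengths for the pair code found in the text: 1, 2, 3, 3
lengths = [1, 2, 3, 3]
# Expected code length per input bit (two bits per supersymbol)
rate = sum(q * l for q, l in zip(pairs, lengths)) / 2

print(round(binary_entropy(p), 3))  # 0.5
print(round(rate, 2))               # 0.66
```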

5.8.4 Maximum Entropy

What distribution has maximum entropy? This question arises in many applications, including spectral analysis, imaging, and signal reconstruction. Here, we consider the simplest problem, that of maximizing entropy without other constraints, and use the method of Lagrange multipliers to find the maximum. The main entropy theorem is the following:

Theorem 5.4: 0 ≤ H[X] ≤ log(m). Furthermore, the distribution that achieves the maximum is the uniform.

To prove the upper bound, we set up an optimization problem and solve it using the method of Lagrange multipliers. Lagrange multipliers are widely used in economics, operations research, and engineering to solve constrained optimization problems. Unfortunately, the Lagrange multiplier method gets insufficient attention in many undergraduate calculus sequences, so we review it here. Consider the following constrained optimization problem:

max over the p(k):  H[X] = −Σ_{k=1}^m p(k) log(p(k))
subject to  Σ_{k=1}^m p(k) = 1

The function being maximized (in this case, H[X]) is the objective function, and the constraint is the requirement that the probabilities sum to 1. The entropy of any distribution is upper bounded by the maximum entropy (i.e., by the entropy of the distribution that solves this optimization problem).


Now, rewrite the constraint as 1 − Σ_{k=1}^m p(k) = 0, introduce a Lagrange multiplier λ, and change the optimization problem from a constrained one to an unconstrained problem:

max over the p(k) and λ:  −Σ_{k=1}^m p(k) log(p(k)) + λ(1 − Σ_{k=1}^m p(k))

This unconstrained optimization problem can be solved by taking derivatives with respect to each variable and setting the derivatives to 0. The Lagrange multiplier λ is a variable, so we have to differentiate with respect to it as well. First, note that

(d/dp)(−p log(p)) = −log(p) − p/p = −log(p) − 1

The derivative with respect to p(l) (where l is arbitrary) looks like

0 = −log(p(l)) − 1 − λ    for all l = 1, 2, …, m

Solve this equation for p(l):

p(l) = e^(−λ−1)    for all l = 1, 2, …, m

Note that p(l) is a constant independent of l (the right-hand side of the equation above is not a function of l). In other words, all the p(l) values are the same. The derivative with respect to λ brings back the constraint:

0 = 1 − Σ_{k=1}^m p(k)

Since the p's are constant, the constant must be p(l) = 1/m. The last step is to evaluate the entropy:

H[X] ≤ −Σ_{k=1}^m p(k) log(p(k)) = −Σ_{k=1}^m (1/m) log(1/m) = m·(log(m)/m) = log(m)

Thus, the theorem is proved. Let us repeat the main result. The entropy is bounded as follows:

0 ≤ H[X] ≤ log(m)

The distribution that achieves the lower bound is degenerate, with one p(k) = 1 and the rest equal to 0. The distribution that achieves the maximum is the uniform distribution, p(k) = 1/m for k = 1, 2, …, m.

130

CHAPTER 5 MULTIPLE DISCRETE RANDOM VARIABLES

Comment 5.11: The maximum entropy optimization problem above also includes inequality constraints: p(k) ≥ 0. We ignored those (treated them implicitly) and solved the problem anyway. The resulting solution satisfies these constraints, p(k) = 1/m > 0, thus rewarding our laziness. Had the optimal solution not satisfied the inequality constraints, we would have had to impose them explicitly and solve the optimization problem with more sophisticated search algorithms. As one might gather, such problems are considerably harder.

EXAMPLE 5.5

Here is a simple example to demonstrate the Lagrange multiplier method. Minimize x² + y² subject to x + 2y = 3. Using a Lagrange multiplier λ, convert the problem to an optimization over three variables:

min over x, y, λ:  x² + y² + λ(3 − x − 2y)

Differentiate the function with respect to each of the three variables, and obtain three equations:

0 = 2x − λ
0 = 2y − 2λ
3 = x + 2y

The solution is x = 3/5, y = 6/5, and λ = 6/5, as shown in Figure 5.4.

FIGURE 5.4 Illustration of a Lagrange multiplier problem: Find the point on the line x + 2y = 3 that minimizes the distance to the origin. That point is (0.6, 1.2), and it lies at a distance √1.8 from the origin.

Summary


Let X and Y be two discrete random variables. The joint probability mass function is

p_XY(k, l) = Pr[X = x_k ∩ Y = y_l]

for all values of k and l. The PMF values are nonnegative and sum to 1:

p_XY(k, l) ≥ 0        Σ_k Σ_l p_XY(k, l) = 1

The marginal probability mass functions are found by summing over the unwanted variable:

p_X(k) = Pr[X = x_k] = Σ_l Pr[X = x_k ∩ Y = y_l] = Σ_l p_XY(k, l)
p_Y(l) = Pr[Y = y_l] = Σ_k Pr[X = x_k ∩ Y = y_l] = Σ_k p_XY(k, l)

The joint distribution function is

F_XY(u, v) = Pr[X ≤ u ∩ Y ≤ v]

X and Y are independent if the PMF factors,

p_XY(k, l) = p_X(k) p_Y(l)    for all k and l

or, equivalently, if the distribution function factors:

F_XY(u, v) = F_X(u) F_Y(v)    for all u and v

The expected value of g(X, Y) is the probabilistic average:

E[g(X, Y)] = Σ_k Σ_l g(x_k, y_l) p_XY(k, l)

The correlation of X and Y is r_xy = E[XY]. X and Y are uncorrelated if r_xy = μ_x μ_y. The covariance of X and Y is σ_xy = Cov[X, Y] = E[(X − μ_x)(Y − μ_y)] = r_xy − μ_x μ_y. If X and Y are uncorrelated, then σ_xy = 0. Let Z = aX + bY. Then,

E[Z] = aE[X] + bE[Y]
Var[Z] = a² Var[X] + 2ab Cov[X, Y] + b² Var[Y]

If X and Y are independent, the PMF of S = X + Y is the convolution of the two marginal PMFs, and the MGF of S is the product of the MGFs of X and Y:

p_S(n) = Σ_k p_X(k) p_Y(n − k)

M_S(u) = M_X(u) M_Y(u)
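The convolution formula for the PMF of a sum, p_S(n) = Σ_k p_X(k) p_Y(n − k), can be sketched in code (our own helper, not from the text); convolving a Bernoulli PMF with itself recovers the two-flip binomial PMF:

```python
def convolve(px, py):
    """PMF of S = X + Y for independent X, Y supported on 0, 1, 2, ...:
    p_S(n) = sum_k p_X(k) * p_Y(n - k)."""
    ps = [0.0] * (len(px) + len(py) - 1)
    for k, a in enumerate(px):
        for j, b in enumerate(py):
            ps[k + j] += a * b
    return ps

p = 0.5
bern = [1 - p, p]          # Bernoulli PMF on {0, 1}
s2 = convolve(bern, bern)  # PMF of the sum of two fair coin flips
print(s2)                  # [0.25, 0.5, 0.25]
```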


Probabilities and moments can be estimated from samples. Let X_i be n IID samples, and let Y_i = 1 if the event is true and Y_i = 0 if the event is false. Then, the relative frequency Ȳ_n estimates the probability of the event, and

μ̂ = X̄_n = (1/n)(X_1 + X_2 + ⋯ + X_n)

σ̂² = (1/(n−1)) Σ_{k=1}^n (X_k − X̄_n)²

The entropy is a measure of the randomness of X:

H[X] = E[−log(p(X))] = −Σ_k p(k) log(p(k))

If X has m outcomes, the entropy is between 0 and log(m); that is, 0 ≤ H[X] ≤ log(m). The distribution that achieves the upper bound is the uniform distribution. Maximum entropy problems can often be solved by the method of Lagrange multipliers. The log is normally taken to base 2, log2(x) = log(x)/log(2). The expected length of a lossless compression code is lower bounded by the entropy:

E[L] ≥ H[X]

The Huffman code is an optimal code. It builds a code tree by repeatedly combining the two least probable nodes.

Problems

5.1

Let X and Y have the following joint PMF:

           x = 0   x = 1   x = 2   x = 3
  y = 2:    0.1     0.0     0.1     0.0
  y = 1:    0.1     0.0     0.1     0.1
  y = 0:    0.0     0.1     0.2     0.2

a. What are the marginal PMFs of X and Y?
b. What are the conditional probabilities of X given Y and of Y given X (compute them directly)?
c. What are the conditional probabilities of Y given X obtained from the conditional probabilities of X given Y using Bayes theorem?

5.2 Using the probabilities in Problem 5.1:
a. What are E[X], E[Y], Var[X], Var[Y], and Cov[X, Y]?
b. Are X and Y independent?

5.3

Let X and Y have the following joint PMF:

           x = 0   x = 1   x = 2   x = 3
  y = 1:    0.1     0.1     0.1     0.1
  y = 0:    0.0     0.1     0.2     0.3

a. What are the marginal PMFs of X and Y?
b. What are the conditional probabilities of X given Y and of Y given X (compute them directly)?
c. What are the conditional probabilities of Y given X obtained from the conditional probabilities of X given Y using Bayes theorem?

5.4 Using the probabilities in Problem 5.3:
a. What are E[X], E[Y], Var[X], Var[Y], and Cov[X, Y]?
b. Are X and Y independent?

5.5

Continue the example in Section 5.4, and consider the joint transformation U = min(X, Y) (e.g., min(3, 2) = 2) and W = max(X, Y). For each transformation:
a. What are the level curves (draw pictures)?
b. What are the individual PMFs of U and W?
c. What is the joint PMF of U and W?

5.6

Continue the example in Section 5.4, and consider the joint transformation V = 2X − Y and V′ = 2Y − X. For each transformation:
a. What are the level curves (draw pictures)?
b. What are the individual PMFs of V and V′?
c. What is the joint PMF of V and V′?

5.7

X and Y are jointly distributed as in the figure below. Each dot is equally likely.

[Figure: scatter of equally likely (X, Y) points; X takes values between 0 and 4, and Y takes values between 0 and 2.]

a. What are the first-order PMFs of X and Y?
b. What are E[X] and E[Y]?
c. What is Cov[X, Y]? Are X and Y independent?
d. If W = X − Y, what are the PMF of W and the mean and variance of W?

5.8 Find a joint PMF for X and Y such that X and Y are uncorrelated but not independent. (Hint: find a simple table of PMF values as in Example 5.1 such that X and Y are uncorrelated but not independent.)


5.9

Prove Theorem 5.1.

5.10

Let S = X₁ + X₂ + X₃, with each Xᵢ IID uniform on the outcomes k = 1, 2, 3, 4. What is the PMF of S?

5.11

What are the first four terms of the convolution of the infinite sequence [1/2, 1/4, 1/8, 1/16, …] with itself?

5.12

What is the convolution of the infinite sequence [ 1, 1, 1, … ] with itself ?

5.13

If X and Y are independent random variables with means μ_x and μ_y, respectively, and variances σ_x² and σ_y², respectively, what are the mean and variance of Z = aX + bY for constants a and b?

5.14

Let X₁, X₂, and X₃ be IID Bernoulli random variables with Pr[Xᵢ = 1] = p and Pr[Xᵢ = 0] = 1 − p. What are the PMF, mean, and variance of S = X₁ + X₂ + X₃?

5.15

Let X₁ and X₂ be independent geometric random variables with the same p. What is the PMF of S = X₁ + X₂?

5.16

Let X₁ and X₂ be independent Poisson random variables with the same λ. What is the PMF of S = X₁ + X₂?

5.17

Suppose X₁, X₂, and X₃ are IID uniform on k = 0, 1, 2, 3 (i.e., Pr[Xᵢ = k] = 0.25 for k = 0, 1, 2, 3). What is the PMF of S = X₁ + X₂ + X₃?

5.18

Generate a sequence of 50 IID Poisson random variables with λ = 5. Compute the sample mean and variance, and compare these values to the mean and variance of the Poisson distribution.

5.19

Generate a sequence of 100 IID Poisson random variables with λ = 10. Compute the sample mean and variance, and compare these values to the mean and variance of the Poisson distribution.

5.20

A sequence of 10 IID observations from a U(0, 1) distribution is the following: 0.76, 0.92, 0.33, 0.81, 0.37, 0.05, 0.19, 0.10, 0.09, 0.31. Compute the sample mean and sample variance of the data, and compare these to the mean and variance of a U(0, 1) distribution.

5.21

In the m = 5 Huffman coding example in Section 5.8.2, we showed codes with efficiencies of 3, 2.8, 2.4, and 2.2 bits per symbol.
a. Can you find a code with an efficiency of 2.3 bits per symbol?
b. What is the worst code (tree with five leaves) for these probabilities you can find?

5.22 Let a four-letter alphabet have probabilities p = [0.7, 0.1, 0.1, 0.1].
a. What is the entropy of this alphabet?
b. What is the Huffman code?
c. What is the Huffman code when symbols are taken two at a time?

5.23

Continue the binary Huffman coding example in Section 5.8.3, but with three input symbols per supersymbol.
a. What is the Huffman tree?
b. What is the expected code length?
c. How far is this code from the theoretical limit?


5.24

Write a program to compute the Huffman code for a given input probability vector.

5.25

What is the entropy of a geometric random variable with parameter p?

5.26

Let X have mean μ_x and variance σ_x². Let Y have mean μ_y and variance σ_y². Let Z = X with probability p and Z = Y with probability 1 − p. What are E[Z] and Var[Z]? Here is an example to help understand this question: You flip a coin that has probability p of coming up heads. If it comes up heads, you select an item from box X and measure some quantity that has mean μ_x and variance σ_x²; if it comes up tails, you select an item from box Y and measure some quantity that has mean μ_y and variance σ_y². What are the mean and variance of the measurement, taking into account the effect of the coin flip? In practice, most experimental designs (for example, polls) try to avoid this problem by sampling and measuring X and Y separately and not relying on the whims of a coin flip.

5.27

The conditional entropy of X given Y is defined as

H(X|Y) = −Σ_k Σ_l p_XY(k, l) log(p_X|Y(k|l))

Show H(X, Y) = H(Y) + H(X|Y). Interpret this result in words.

5.28

Consider two probability distributions, p(k) for k = 1, 2, …, m and q(k) for k = 1, 2, …, m. The Kullback–Leibler divergence between them is the following:

KL(P‖Q) = Σ_{i=1}^m p(i) log(p(i)/q(i))    (5.16)

The KL divergence is always nonnegative:

KL(P‖Q) ≥ 0    (5.17)

Use the log inequality given in Equation (3.25) to prove the KL inequality given in Equation (5.17). (Hint: show −KL(P‖Q) ≤ 0 instead, rearrange the equation to "hide" the minus sign, and then apply the inequality.) One application of the KL divergence: When data X have probability p(k) but are encoded with lengths designed for distribution q(k), the KL divergence tells us how many extra bits are required. In other words, the coding lengths −log(q(k)) are best if q(k) = p(k) for all k.

5.29

Solve the following optimization problems using Lagrange multipliers:
a. min over x, y:  x² + y²  such that x − y = 3
b. max over x, y:  x + y  such that x² + y² = 1
c. min over x, y, z:  x² + y² + z²  such that x + 2y + 3z = 6


5.30

Prove the Cauchy–Schwarz inequality:

(Σ_{i=1}^n x_i y_i)² ≤ (Σ_{i=1}^n x_i²)(Σ_{i=1}^n y_i²)

where the x's and y's are arbitrary numbers. Hint: start with the following inequality (why is this true?):

0 ≤ Σ_{i=1}^n (x_i − a y_i)²    for all values of a

Find the value of a that minimizes the right-hand side above, substitute that value into the same inequality, and rearrange the terms into the Cauchy–Schwarz inequality at the top.

5.31

Complete an alternative proof of Equation (5.6).
a. Show E[XY]² ≤ E[X²]E[Y²] for any X and Y using the methods in Problem 5.30.
b. Show this implies Cov[X, Y]² ≤ Var[X] Var[Y] and hence ρ_xy² ≤ 1.

5.32

Consider the following maximum entropy problem: Among all distributions over the integers k = 1, 2, 3, … with known mean μ = Σ_{k=1}^∞ k p(k), which one has the maximum entropy? Clearly, the answer is not the uniform distribution. A uniform distribution over m = ∞ does not make sense, and even if it did, its mean would be ∞. The constrained optimization problem looks like the following:

max over the p(k):  H[X] = −Σ_{k=1}^∞ p(k) log(p(k))
subject to  Σ_{k=1}^∞ p(k) = 1  and  Σ_{k=1}^∞ k p(k) = μ

a. Introduce two Lagrange multipliers, λ and ψ, and convert the constrained problem to an unconstrained problem over the p(k), λ, and ψ. What is the unconstrained problem?
b. Show the p(k) satisfy the following:

0 = −log(p(k)) − 1 − λ − kψ    for k = 1, 2, 3, …

c. What two other equations do the p(k) satisfy?
d. Show the p(k) correspond to the geometric distribution.

CHAPTER 6

BINOMIAL PROBABILITIES

Two teams, say, the Yankees and the Giants, play an n-game series. If the Yankees win each game with probability p independently of any other game, what is the probability the Yankees win the series (i.e., more than half the games)? This probability is a binomial probability. Binomial probabilities arise in numerous applications, not just baseball. In this chapter, we examine binomial probabilities and develop some of their properties. We also show how binomial probabilities apply to the problem of correcting errors in a digital communications system.

6.1 BASICS OF THE BINOMIAL DISTRIBUTION

In this section, we introduce the binomial distribution and compute its PMF by two methods. The binomial distribution arises from the sum of IID Bernoulli random variables (for example, flips of a coin). Let Xᵢ for i = 1, 2, …, n be IID Bernoulli random variables with Pr[Xᵢ = 1] = p and Pr[Xᵢ = 0] = 1 − p (throughout this chapter, we use the convention q = 1 − p):

S = X₁ + X₂ + ⋯ + Xₙ

Then, S has a binomial distribution. The binomial PMF can be determined as follows:

Pr[S = k] = Σ over sequences with k 1's of Pr[sequence with k 1's and n − k 0's]

Consider an arbitrary sequence with k 1's and n − k 0's. Since the flips are independent, the probabilities multiply. The probability of the sequence is p^k q^(n−k). Note that each sequence with k 1's and n − k 0's has the same probability.


CHAPTER 6 BINOMIAL PROBABILITIES

The number of sequences with k 1's and n − k 0's is (n choose k). Thus,

Pr[S = k] = (n choose k) p^k q^(n−k)    (6.1)

PMF values must satisfy two properties. First, the PMF values are nonnegative, and second, the PMF values sum to 1. The binomial probabilities are clearly nonnegative, as each is the product of three nonnegative terms, (n choose k), p^k, and q^(n−k). The probabilities sum to 1 by the binomial theorem (Equation 3.7):

Σ_{k=0}^n (n choose k) p^k q^(n−k) = (p + q)^n = 1

The binomial PMF for n = 5 and p = 0.7 is shown in Figure 6.1. The PMF values are represented by the heights of each stem. An alternative is to use a bar graph, as shown in Figure 6.2, wherein the same PMF values are shown as bars. The height of each bar is the PMF value, and the width of each bar is 1 (so the area of each bar equals the probability). Also shown in Figure 6.1 are the mean (μ = 3.5) and standard deviation (σ = 1.02).

FIGURE 6.1 Binomial PMF for n = 5 and p = 0.7. The probabilities are proportional to the heights of each stem. Also shown are the mean (μ = 3.5) and standard deviation (σ = 1.02). The largest probability occurs for k = 4 and is equal to 0.360.

FIGURE 6.2 Binomial probabilities for n = 5 and p = 0.7 as a bar graph. Since the bars have width equal to 1, the area of each bar equals the probability of that value. Bar graphs are especially useful when comparing discrete probabilities to continuous probabilities.
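The numbers behind Figures 6.1 and 6.2 are easy to reproduce. The sketch below (ours, standard library only) evaluates Equation (6.1) for n = 5, p = 0.7 and checks the mean and standard deviation:

```python
from math import comb, sqrt

def binomial_pmf(n, k, p):
    """Pr[S = k] = C(n, k) p^k (1-p)^(n-k), as in Equation (6.1)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.7
pmf = [binomial_pmf(n, k, p) for k in range(n + 1)]
mean = sum(k * prob for k, prob in enumerate(pmf))
var = sum(k**2 * prob for k, prob in enumerate(pmf)) - mean**2

print(round(pmf[4], 3))     # 0.36  (the largest value)
print(round(mean, 4))       # 3.5   (= np)
print(round(sqrt(var), 2))  # 1.02
```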

EXAMPLE 6.1

Consider a sequence of 56 IID Bernoulli p = 0.7 random variables: 1111111 · 1110101 · 1110010 · 1000111 · 0011011 · 0111111 · 0110011 · 0100110. Each group of seven bits is summed, yielding eight binomial observations: 7, 5, 4, 4, 4, 6, 4, 3 (sum across the rows in the table below).

Bernoullis   Binomial
1111111      7
1110101      5
1110010      4
1000111      4
0011011      4
0111111      6
0110011      4
0100110      3

For example, the probability of getting a 4 is

Pr[S = 4] = (7 choose 4)(0.7)⁴(0.3)³ = 35 × 0.2401 × 0.027 ≈ 0.23

In a sequence of eight binomial random variables, we expect to get about 8 × 0.23 = 1.81 fours. In this sequence, we observed 4 fours.

The binomial probabilities satisfy an interesting and useful recursion. It is convenient to define a few quantities:

S_{n−1} = X₁ + X₂ + ⋯ + X_{n−1}
S_n = S_{n−1} + X_n
b(n, k, p) = Pr[S_n = k]

Note that S_{n−1} and X_n are independent since X_n is independent of X₁ through X_{n−1}. The recursion is developed through the LTP:

Pr[S_n = k] = Pr[S_n = k | X_n = 1] Pr[X_n = 1] + Pr[S_n = k | X_n = 0] Pr[X_n = 0]
            = Pr[S_{n−1} = k − 1 | X_n = 1] p + Pr[S_{n−1} = k | X_n = 0] q
            = Pr[S_{n−1} = k − 1] p + Pr[S_{n−1} = k] q

We used the independence of S_{n−1} and X_n to simplify the conditional probabilities. Using the b(n, k, p) notation gives a simple recursion:

b(n, k, p) = b(n − 1, k − 1, p) · p + b(n − 1, k, p) · q    (6.2)

Equation 6.2 gives us a Pascal's triangle-like method to calculate the binomial probabilities.
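Equation (6.2) translates directly into a table-filling routine. A small sketch of ours (the function name is hypothetical):

```python
def binomial_pmf_recursive(n, p):
    """Build Pr[S_n = k] for k = 0..n using Equation (6.2):
    b(n, k) = b(n-1, k-1) * p + b(n-1, k) * q."""
    q = 1 - p
    b = [1.0]  # n = 0: Pr[S_0 = 0] = 1
    for _ in range(n):
        prev = [0.0] + b + [0.0]  # pad so out-of-range terms contribute 0
        b = [prev[k] * p + prev[k + 1] * q for k in range(len(b) + 1)]
    return b

pmf = binomial_pmf_recursive(5, 0.7)
print([round(x, 4) for x in pmf])
```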


FIGURE 6.3 Binomial probabilities organized in Pascal's triangle for n up to 5. Each entry is a weighted sum, with weights p and q = 1 − p, of the two entries above it. For example, 10p²q³ = (4pq³)p + (6p²q²)q.

Comment 6.1: The recursive development is just a long-winded way of saying the binomial probabilities are the repeated convolution of the Bernoulli probabilities. Here is the result of repeatedly convolving [q, p] with itself, for n = 2, n = 3, and n = 4:

n = 1:   q     p
n = 2:   q²    2pq     p²
n = 3:   q³    3pq²    3p²q    p³
n = 4:   q⁴    4pq³    6p²q²   4p³q    p⁴

The various rows list the binomial probabilities (for example, for n = 3, Pr[N = 1] = 3pq² and Pr[N = 2] = 3p²q).

To demonstrate binomial probabilities, here is a sequence of 30 observations from an n = 5, p = 0.7 binomial distribution: 4, 3, 4, 5, 3, 4, 4, 4, 5, 2, 3, 5, 3, 3, 3, 4, 3, 5, 4, 3, 3, 3, 0, 4, 5, 4, 4, 2, 1, 4. There are one 0, one 1, two 2's, ten 3's, eleven 4's, and five 5's. Note the histogram approximates the PMF fairly well. Some differences are apparent (for example, the sequence has a 0


even though a 0 is improbable, Pr[X = 0] = 0.002), but the overall shapes are pretty similar. The histogram and the PMF are plotted in Figure 6.4.

FIGURE 6.4 Plot of a binomial n = 5, p = 0.7 PMF (as a bar graph) and histogram (as dots) of 30 observations. Note the histogram reasonably approximates the PMF.

In summary, for independent flips of the same coin (IID random variables), Bernoulli probabilities answer the question of how many heads we get in one flip. Binomial probabilities answer the question of how many heads we get in n flips.

EXAMPLE 6.2

How likely is it that a sequence of 30 IID binomial n = 5, p = 0.7 random variables would have at least one 0? First, calculate the probability of getting a 0 in a single observation:

Pr[X = 0] = (0.3)⁵ = 0.00243

Second, calculate the probability of getting at least one 0 in 30 tries. Each trial has six possible outcomes: 0, 1, 2, 3, 4, and 5. However, we are only interested in 0's or not-0's. We just calculated the probability of a 0 in a single trial as 0.00243. Consequently, the probability of a not-0 is 1 − 0.00243. Thus, the probability of at least one 0 is 1 minus the probability of no 0's:

Pr[no 0's in 30 trials] = (1 − 0.00243)³⁰ = 0.9296
Pr[at least one 0 in 30 trials] = 1 − 0.9296 = 0.0704

Thus, about 7% of the time, a sequence of 30 trials will contain at least one 0.
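The complement trick in Example 6.2 can be checked numerically (our own snippet, not from the text):

```python
p_zero = 0.3 ** 5            # Pr[X = 0] for one binomial(n=5, p=0.7) draw
p_none = (1 - p_zero) ** 30  # probability of no 0's in 30 independent trials
p_at_least_one = 1 - p_none

print(round(p_zero, 5))          # 0.00243
print(round(p_at_least_one, 4))  # 0.0704
```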

6.2 COMPUTING BINOMIAL PROBABILITIES

To compute the probability of an interval, say l ≤ S ≤ m, one must sum the PMF values:

Pr[l ≤ S ≤ m] = Σ_{k=l}^m b(n, k, p)


This calculation is facilitated by computing the b(n, k, p) recursively. First, look at the ratio:

b(n, k, p) / b(n, k−1, p) = [(n choose k) p^k q^(n−k)] / [(n choose k−1) p^(k−1) q^(n−k+1)] = ((n − k + 1)/k)(p/q)

Thus,

b(n, k, p) = b(n, k−1, p) · ((n − k + 1)/k) · (p/q)    (6.3)

Using this recursion, it is easy for a computer to calculate binomial probabilities with thousands of terms. The same argument allows one to analyze the sequence of binomial probabilities. b(n, k, p) is larger than b(n, k−1, p) if

(n − k + 1)p / (kq) > 1

Rearranging the terms gives

k < (n + 1)p    (6.4)

Similarly, the terms b(n, k, p) and b(n, k−1, p) are equal if k = (n + 1)p, and b(n, k, p) is less than b(n, k−1, p) if k > (n + 1)p. For example, for n = 5, p = 0.7, and (n + 1)p = 4.2, the b(5, k, 0.7) sequence reaches its maximum at k = 4, then decreases for k = 5. This rise and fall are shown in Figure 6.1.
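Equation (6.3) gives an efficient way to sum many PMF terms without recomputing binomial coefficients. A sketch of ours computing the interval probability Pr[lo ≤ S ≤ hi]:

```python
def binomial_interval(n, p, lo, hi):
    """Pr[lo <= S <= hi] using the ratio recursion of Equation (6.3):
    b(n, k, p) = b(n, k-1, p) * ((n - k + 1) / k) * (p / q)."""
    q = 1 - p
    b = q ** n  # b(n, 0, p)
    total = b if lo == 0 else 0.0
    for k in range(1, hi + 1):
        b *= (n - k + 1) / k * (p / q)
        if k >= lo:
            total += b
    return total

# Pr[4 <= S <= 5] for n = 5, p = 0.7: the two largest PMF values
print(round(binomial_interval(5, 0.7, 4, 5), 4))  # 0.5282
```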

6.3 MOMENTS OF THE BINOMIAL DISTRIBUTION

The mean of the binomial distribution is np and the variance npq. In this section, we derive these values three ways. First, we use the fact that a binomial is a sum of IID Bernoulli random variables. Second, we perform a direct calculation using binomial probabilities. Third, we use the MGF. Of course, all three methods lead to the same answers.

First, since S is a sum of IID Bernoulli random variables, the mean and variance of S are the sum of the means and variances of the Xᵢ (see Section 5.5):

μ_S = E[S] = E[X₁] + E[X₂] + ⋯ + E[Xₙ] = np
Var[S] = Var[X₁] + Var[X₂] + ⋯ + Var[Xₙ] = npq

since E[Xᵢ] = p and Var[Xᵢ] = pq. Thus, we see the mean and variance of the binomial distribution are np and npq, respectively. For example, Figure 6.1 shows the PMF for a binomial distribution with n = 5 and p = 0.7. The mean is μ = np = 5 × 0.7 = 3.5, and the variance is σ² = 5 × 0.7 × 0.3 = 1.05 = (1.02)².


Second, compute the mean directly using binomial probabilities, taking advantage of Equation (3.22), k(n choose k) = n(n−1 choose k−1):

E[S] = Σ_{k=0}^n k · Pr[S = k]
     = Σ_{k=1}^n k · Pr[S = k]                            (k = 0 term is 0)
     = Σ_{k=1}^n k (n choose k) p^k q^(n−k)
     = Σ_{k=1}^n n (n−1 choose k−1) p^k q^(n−k)           (using Equation 3.22)
     = np Σ_{l=0}^{n−1} (n−1 choose l) p^l q^(n−1−l)      (change of variables, l = k − 1)
     = np (p + q)^(n−1)                                   (using the binomial theorem)
     = np                                                 (since p + q = 1)

It is tricky to compute E[S²] directly. It is easier to compute E[S(S−1)] first and then adjust the formula for computing the variance:

Var[S] = E[S(S−1)] + E[S] − E[S]²

We will also need to extend Equation (3.22):

k(k−1)(n choose k) = n(n−1)(n−2 choose k−2)    (6.5)

Using this formula, we can compute E[S(S−1)]:

E[S(S−1)] = Σ_{k=0}^n k(k−1)(n choose k) p^k q^(n−k)
          = n(n−1)p² Σ_{k=2}^n (n−2 choose k−2) p^(k−2) q^(n−k)
          = n(n−1)p²

Now, finish the computation:

σ_S² = E[S(S−1)] + E[S] − E[S]² = n(n−1)p² + np − n²p² = np(1 − p) = npq

See Problem 4.18 for a similar approach in computing moments from a Poisson distribution. Third, compute the mean and variance using the MGF:

M_S(u) = E[e^(uS)] = (pe^u + q)^n


Now, compute the mean:

E[S] = (d/du) M_S(u) |_(u=0) = n(pe^u + q)^(n−1) pe^u |_(u=0) = n(p + q)^(n−1) p = np

In Problem 6.15, we continue this development and compute the variance using the MGF. In summary, using three different methods, we have computed the mean and variance of the binomial distribution. The first method exploits the fact that the binomial is the sum of IID Bernoulli random variables. This method is immediate because the moments of the Bernoulli distribution are computed easily. The second method calculates the moments directly from the binomial PMF. This straightforward method needs two nontrivial binomial coefficient identities (Equations 3.22 and 6.5). However, for many other distributions, the direct calculation proceeds quickly and easily. The third method uses the MGF: calculate the MGF, differentiate, and evaluate the derivative at u = 0. For other problems, it is handy to be able to apply all three of these methods. It is often the case that at least one of the three is easy to apply, though sometimes it is not obvious beforehand which one.

6.4 SUMS OF INDEPENDENT BINOMIAL RANDOM VARIABLES

Consider the sum of two independent binomial random variables, N = N1 + N2, where N1 and N2 use the same value of p. The first might represent the number of heads in n1 flips of a coin, the second the number of heads in n2 flips of the same (or an identical) coin, and the sum the number of heads in n = n1 + n2 flips of the coin (or coins). All three of these random variables are binomial. By this counting argument, the sum of two independent binomial random variables is binomial. This is most easily shown with MGFs:

M_N(u) = M_N1(u) M_N2(u) = (p e^u + q)^n1 (p e^u + q)^n2 = (p e^u + q)^(n1+n2)

Since the latter expression is the MGF of a binomial random variable, N is binomial. Now, let us ask the opposite question. Given that N1 and N2 are independent binomial random variables and the sum N = N1 + N2, what can we say about the conditional probability of N1 given the sum N = m? It turns out the conditional probability is not binomial:

Pr[N1 = k | N = m]
  = Pr[N1 = k ∩ N = m] / Pr[N = m]         (definition)
  = Pr[N1 = k ∩ N2 = m-k] / Pr[N = m]
  = Pr[N1 = k] Pr[N2 = m-k] / Pr[N = m]    (by independence)
  = [C(n1, k) p^k q^(n1-k)] [C(n2, m-k) p^(m-k) q^(n2-m+k)] / [C(n, m) p^m q^(n-m)]
  = C(n1, k) C(n2, m-k) / C(n, m)          (6.7)

In fact, the conditional probability is hypergeometric. This has a simple interpretation. There are C(n, m) sequences of m heads in n trials. Each sequence is equally likely. There are C(n1, k) C(n2, m-k) sequences with k heads in the first n1 positions and m-k heads in the last n2 = n - n1 positions. Thus, the probability is the number of sequences with k heads in the first n1 flips and m-k heads in the next n2 = n - n1 flips, divided by the number of sequences with m heads in n flips.

For instance, let n1 = n2 = 4 and m = 4, so four of the eight flips are heads. The probability of all four heads in the first four positions is

Pr[N1 = 4 | N = 4] = C(4, 4) C(4, 0) / C(8, 4) = 1/70 ≈ 0.014

The probability of an equal split, two heads in the first four flips and two in the second four flips, is

Pr[N1 = 2 | N = 4] = C(4, 2) C(4, 2) / C(8, 4) = 36/70 ≈ 0.514

Clearly, an equal split is much more likely than having all the heads in the first four (or the last four) positions.

Comment 6.2: It is useful to note what happened here. N1 by itself is binomial, but N1 given the value of N1 + N2 is hypergeometric. By conditioning on the sum, N1 is restricted. A simple example of this is when N = 0: it follows that N1 must also be 0 (since N1 + N2 = 0 implies N1 = 0). When N = 1, N1 must be 0 or 1.
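The identity behind this comment is easy to verify numerically. The sketch below uses the illustrative values n1 = n2 = 4 and p = 0.3 (p is arbitrary because it cancels in the ratio) to check that the conditional PMF matches the hypergeometric formula (6.7), and reproduces the all-heads versus even-split comparison.

```python
from math import comb

# Check Pr[N1 = k | N1 + N2 = m] against C(n1,k) C(n2,m-k) / C(n,m)
n1, n2, m, p = 4, 4, 4, 0.3   # p is arbitrary; it cancels out
n = n1 + n2

def binom_pmf(nn, kk, pp):
    return comb(nn, kk) * pp**kk * (1 - pp)**(nn - kk)

pr_sum = binom_pmf(n, m, p)   # Pr[N = m]
for k in range(m + 1):
    cond = binom_pmf(n1, k, p) * binom_pmf(n2, m - k, p) / pr_sum
    hyper = comb(n1, k) * comb(n2, m - k) / comb(n, m)
    assert abs(cond - hyper) < 1e-12

# The text's example: all four heads in the first four flips vs. an even split
print(comb(4, 4) * comb(4, 0) / comb(8, 4))   # 1/70, about 0.014
print(comb(4, 2) * comb(4, 2) / comb(8, 4))   # 36/70, about 0.514
```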


6.5 DISTRIBUTIONS RELATED TO THE BINOMIAL

The binomial is related to a number of other distributions. In this section, we discuss some of these: the hypergeometric, the multinomial, the negative binomial, and the Poisson. In Chapter 9, we discuss the connection between the binomial and the Gaussian distribution.

6.5.1 Connections Between Binomial and Hypergeometric Probabilities

The binomial and hypergeometric distributions answer similar questions. Consider a box containing n items, with n0 labeled with 0's and n1 labeled with 1's (n = n0 + n1). Make a selection of m items without replacement, and let N denote the number of 1's in the selection. Then, the probabilities are hypergeometric:

Pr[N = k] = C(n1, k) C(n0, m-k) / C(n, m)

The first item has probability n1/n of being a 1. The second item has conditional probability (n1 - 1)/(n - 1) or n1/(n - 1), depending on whether the first item selected was a 1 or a 0. In contrast, if the items are selected with replacement, then the probabilities are constant: the probability of a 1 is n1/n regardless of previous selections. In this case, the probability of a selection is binomial. If n0 and n1 are large, then the probabilities are approximately constant, and the hypergeometric probabilities are approximately binomial. For instance, let n0 = n1 = 5 and m = 6. Selected probabilities are listed below:

Probability    Binomial                              Hypergeometric

Pr[N = 3]      C(6,3) 0.5^3 0.5^3 = 20/64 = 0.313    C(5,3) C(5,3) / C(10,6) = 100/210 = 0.476

Pr[N = 5]      C(6,5) 0.5^5 0.5^1 = 6/64 = 0.094     C(5,5) C(5,1) / C(10,6) = 5/210 = 0.024

In summary, if the selection is made with replacement, the probabilities are binomial; if the selection is made without replacement, the probabilities are hypergeometric. If the number of items is large, the hypergeometric probabilities are approximately binomial. As the selection size gets large, the hypergeometric distribution favors balanced selections (e.g., half 1's and half 0's) more than the binomial distribution does. Conversely, unbalanced selections (e.g., all 1's or all 0's) are much more likely with the binomial distribution.


Binomial probabilities tend to be easier to manipulate than hypergeometric probabilities. It is sometimes useful to approximate hypergeometric probabilities by binomial probabilities. The approximation is valid when the number of each item in the box is large compared with the number of each selected.
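A small sketch illustrates the approximation: holding the selection size fixed at m = 6 with half the items labeled 1, the hypergeometric probability of N = 3 approaches the binomial value 0.3125 as the box grows. The box sizes below are illustrative choices.

```python
from math import comb

# Hypergeometric probabilities approach binomial ones as the box
# grows with the fraction of 1's held fixed at one half.
def hyper_pmf(n0, n1, m, k):
    return comb(n1, k) * comb(n0, m - k) / comb(n0 + n1, m)

def binom_pmf(m, k, p):
    return comb(m, k) * p**k * (1 - p)**(m - k)

m, k = 6, 3
print(binom_pmf(m, k, 0.5))        # 0.3125, the limiting value
for half in (5, 50, 500):          # n0 = n1 = half, so the fraction is 1/2
    print(half, hyper_pmf(half, half, m, k))
# half = 5 gives 100/210, about 0.476; larger boxes move toward 0.3125
```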

6.5.2 Multinomial Probabilities

Just as the multinomial coefficient (Equation 3.13) generalizes the binomial coefficient (Equation 3.3), the multinomial distribution generalizes the binomial distribution. The binomial distribution occurs in counting experiments with two outcomes in each trial (e.g., heads or tails). The multinomial distribution occurs in similar counting experiments but with two or more outcomes per trial. For example, the English language uses 26 letters (occurring in both uppercase and lowercase versions), 10 digits, and various punctuation symbols. We can ask questions like "What is the probability of a letter t?" and "How many letter t's can we expect in a string of n letters?" These questions lead to multinomial probabilities.

Consider an experiment that generates a sequence of n symbols X1, X2, ..., Xn. For example, the symbols might be letters from an alphabet or the colors of a series of automobiles or many other things. For convenience, we will assume each symbol is an integer in the range from 0 to m - 1. (In other words, n is the sum of the counts, and m is the size of the alphabet.) Let the probability of outcome k be pk = Pr[Xi = k]. The probabilities sum to 1; that is, p0 + p1 + ··· + p(m-1) = 1. Let Nk equal the number of Xi = k for k = 0, 1, ..., m - 1. Thus, N0 + N1 + ··· + N(m-1) = n. For example, n is the total number of cars, and N0 might be the number of red cars, N1 the number of blue cars, etc. The probability of a particular collection of counts is the multinomial probability:

Pr[N0 = k0 ∩ N1 = k1 ∩ ··· ∩ N(m-1) = k(m-1)] = C(n; k0, k1, ..., k(m-1)) p0^k0 p1^k1 ··· p(m-1)^k(m-1)    (6.8)

For example, a source emits symbols from a four-letter alphabet with probabilities p0 = 0.4, p1 = 0.3, p2 = 0.2, and p3 = 0.1. One sequence of 20 symbols is 1, 2, 0, 2, 0, 3, 2, 1, 1, 0, 1, 1, 2, 1, 0, 2, 2, 3, 1, 1.¹ The counts are N0 = 4, N1 = 8, N2 = 6, and N3 = 2. The probability of this particular set of counts is

Pr[N0 = 4 ∩ N1 = 8 ∩ N2 = 6 ∩ N3 = 2] = C(20; 4, 8, 6, 2) 0.4^4 0.3^8 0.2^6 0.1^2 = 0.002

This probability is small for two reasons. First, with n = 20, there are many possible sets of counts; any particular one is unlikely. Second, this particular sequence has relatively few 0's despite 0 being the most probable symbol. For comparison, the expected counts (8, 6, 4, 2) have probability 0.013, about six times as likely, but still occurring only about once every 75 trials.

¹ The first sequence I generated.
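The two probabilities in this example can be reproduced directly from Equation (6.8); a sketch:

```python
from math import factorial

# Evaluate the multinomial probability (6.8) for given counts and
# outcome probabilities.
def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)          # multinomial coefficient (exact)
    prob = float(coef)
    for k, pk in zip(counts, probs):
        prob *= pk**k
    return prob

probs = [0.4, 0.3, 0.2, 0.1]
p_observed = multinomial_pmf([4, 8, 6, 2], probs)   # the observed counts
p_expected = multinomial_pmf([8, 6, 4, 2], probs)   # the expected counts
print(round(p_observed, 3), round(p_expected, 3))   # 0.002 0.013
```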


The mean and variance of each count are

E[Ni] = n pi    (6.9)

Var[Ni] = n pi (1 - pi)    (6.10)

These are the same as for the binomial distribution. The covariance between Ni and Nj for i ≠ j is

Cov[Ni, Nj] = -n pi pj    (6.11)

The covariance is a measure of how one variable varies with changes to the other. For the multinomial, the N's sum to a constant, N0 + N1 + ··· + N(m-1) = n. If Ni is greater than its mean, it is likely that Nj is less than its mean. That Cov[Ni, Nj] < 0 for the multinomial follows from this simple observation. For instance, in the example above, E[N0] = 20 × 0.4 = 8, E[N1] = 20 × 0.3 = 6, Var[N0] = 20 × 0.4 × 0.6 = 4.8, and Cov[N0, N1] = -20 × 0.4 × 0.3 = -2.4.

6.5.3 The Negative Binomial Distribution

The binomial distribution helps answer the question "In n independent flips, how many heads can one expect?" The negative binomial distribution helps answer the reverse question "To get k heads, how many independent flips are needed?" Let N be the number of flips required to obtain k heads. The event {N = n} is a sequence of n - 1 flips with k - 1 heads, followed by a head on the last flip. The probability of this sequence is

Pr[N = n] = C(n-1, k-1) p^k q^(n-k)   for n = k, k+1, k+2, ...    (6.12)

The first 12 terms of the negative binomial distribution for p = 0.5 and k = 3 are shown below.

[Figure: PMF of the negative binomial distribution with p = 0.5 and k = 3, for n = 3 to 14.]

For example, Pr[N = 5] = C(5-1, 3-1) 0.5^3 0.5^(5-3) = C(4, 2) 0.5^5 = 6 × 0.03125 = 0.1875.

Just as the binomial is a sum of n Bernoulli random variables, the negative binomial is a sum of k geometric random variables. (Recall that the geometric is the number of flips required to get one head.) Consequently, the MGF of the negative binomial is the MGF of the geometric raised to the kth power:

M_N(u) = (p e^u / (1 - q e^u))^k    (by Equations 4.23 and 5.11)    (6.13)

Moments of the negative binomial can be calculated easily from the geometric. Let X be a geometric random variable with mean 1/p and variance (1 - p)/p^2.
Then,

E[N] = k · E[X] = k/p    (6.14)

Var[N] = k · Var[X] = k(1 - p)/p^2    (6.15)

EXAMPLE 6.3: In a baseball inning, each team sends a succession of players to bat. In a simplified version of the game, each batter either gets on base or makes an out. The team bats until there are three outs (k = 3). If we assume each batter makes an out with probability p = 0.7 and the batters are independent, then the number of batters is negative binomial. The first few probabilities are

Pr[N = 3] = C(3-1, 3-1) 0.7^3 0.3^(3-3) = 0.343
Pr[N = 4] = C(4-1, 3-1) 0.7^3 0.3^(4-3) = 0.309
Pr[N = 5] = C(5-1, 3-1) 0.7^3 0.3^(5-3) = 0.185
Pr[N = 6] = C(6-1, 3-1) 0.7^3 0.3^(6-3) = 0.093

The mean number of batters per team per inning is E[N] = 3/0.7 = 4.3.

6.5.4 The Poisson Distribution

The Poisson distribution is a counting distribution (see Section 4.5.3). It is the limit of the binomial distribution when n is large and p is small. The advantage of the approximation is that binomial probabilities can be replaced by the easier-to-compute Poisson probabilities. Let N be binomial with parameters n and p, and let λ = np = E[N]. We are interested in cases when n is large, p is small, but λ = np is moderate. For example, the number of telephones in an area code is in the hundreds of thousands, the probability of any given phone being in use at a particular time is small, but the average number of phones in use is moderate. As another example, consider transmitting a large file over a noisy wireless channel. The file contains millions of bits, the probability of any given bit being in error is small, but the number of errors per file is moderate.
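Before the derivation, the quality of the approximation can be previewed numerically. The sketch below compares binomial and Poisson PMFs at the same λ = np; the values of n and the cutoff k ≤ 30 are illustrative choices (both PMFs are negligible beyond that).

```python
from math import comb, exp, factorial

# With λ = np held fixed, the binomial PMF approaches the Poisson PMF
# as n grows.
def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    return lam**k * exp(-lam) / factorial(k)

lam = 5.0
errs = {}
for n in (10, 50, 500):
    p = lam / n
    errs[n] = max(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k))
                  for k in range(31))
print(errs)  # the worst-case discrepancy shrinks as n grows
```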
The probability N = k is

Pr[N = k] = C(n, k) p^k (1 - p)^(n-k) = [n! / (k!(n-k)!)] p^k (1 - p)^(n-k)

Let us look at what happens when n is large, p is small, and λ = np is moderate. First,

n!/(n-k)! ≈ n^k,   so   C(n, k) p^k ≈ (np)^k / k! = λ^k / k!

Second,

log((1 - p)^(n-k)) = (n - k) log(1 - λ/n)    (λ = np)
                   ≈ n log(1 - λ/n)          (n - k ≈ n)
                   ≈ -λ                      (log(1 + x) ≈ x)

so (1 - p)^(n-k) ≈ e^(-λ). Putting it all together, for k = 0, 1, 2, ...,

Pr[N = k] ≈ (λ^k / k!) e^(-λ)    (6.16)

These are the Poisson probabilities. To summarize, the limit of binomial probabilities when n is large, p is small, and λ = np is moderate is Poisson. Figure 6.5 shows the convergence of the binomial to the Poisson. The top graph shows relatively poor agreement when n = 10 and p = 0.5. The bottom graph shows much better agreement with n = 50 and p = 0.1.

[FIGURE 6.5 Comparison of binomial and Poisson PMFs, both with the same λ = np. The top graph compares a binomial with n = 10, p = 0.5 to a Poisson with λ = 5; the agreement is poor. The bottom graph has n = 50 and p = 0.1 and shows much better agreement.]

6.6 BINOMIAL AND MULTINOMIAL ESTIMATION

A common problem in statistics is estimating the parameters of a probability distribution. In this section, we consider the Bernoulli, binomial, and multinomial distributions. As one example, consider a medical experiment to evaluate whether or not a new drug is helpful. The drug might be given to n patients. Of these n, k patients improved. What can we say about the probability the drug leads to an improvement?

Let X1, X2, ..., Xn be n iid Bernoulli random variables with unknown probability p of a 1 (and probability q = 1 - p of a 0). Let S = X1 + ··· + Xn be the sum of the random variables. Then, S is binomial with parameters n and p, and

E[S] = np    Var[S] = npq

We will use the notation p̂ to denote an estimate of p.
The estimator p̂ is printed in bold in the text because it is a random variable; its value depends on the outcomes of the experiment. Let k equal the actual number of 1's observed in the sequence and n - k equal the observed number of 0's. An obvious estimate of p is

p̂ = S/n

E[p̂] = E[S]/n = np/n = p

Var[p̂] = Var[S]/n^2 = npq/n^2 = pq/n

Since the expected value of p̂ is p, we say p̂ is an unbiased estimate of p, and since the variance of p̂ goes to 0 as n → ∞, we say p̂ is a consistent estimate of p. Unbiased means the average value of the estimator equals the value being estimated; that is, there is no bias. Consistent means the variability of the estimator goes to 0 as the number of observations goes to infinity. In short, estimators that are both unbiased and consistent are likely to give good results.

Estimating the parameters of a multinomial distribution is similar. Let Xi be an observation from an alphabet of m letters (from 0 to m - 1). Let pi = Pr[X = i], and let ki be the number of i's in n observations. Then, the estimate of pi is

p̂i = ki/n

As with the binomial distribution, p̂i = ki/n is an unbiased and consistent estimator of pi.

In summary, the obvious estimator of p in the binomial and multinomial distributions is the sample average of the n random variables, X1, X2, ..., Xn. Since it is an unbiased and consistent estimator of p, the sample average is a good estimator and is commonly used. Furthermore, as we will see in later chapters, the sample average is often a good parameter estimate for other distributions as well.

Comment 6.3: It is especially important in estimation problems like these to distinguish between the random variables and the observations. X and S are random variables. Before we perform the experiment, we do not know their values.
After the experiment, X and S have values, such as S = k. Before doing the experiment, p̂ = S/n is a random variable. After doing the experiment, p̂ has a particular value. When we say p̂ is unbiased and consistent, we mean that if we did this experiment many times, the average value of p̂ would be close to p.

6.7 ALOHANET

In 1970, the University of Hawaii built a radio network that connected four islands with the central campus in Oahu. This network was known as Alohanet and eventually led to the widely used Ethernet, Internet, and cellular networks of today. The Aloha protocol led to many advances in computer networks as researchers analyzed its strengths and weaknesses and developed improvements.

The original Aloha network was a star: the individual nodes A, B, C, and so forth communicate with the hub H in Oahu over one broadcast radio channel, and the hub communicates with the nodes over a second broadcast radio channel. The incoming channel was shared by all users. The original idea was for any outside user to send a packet (a short data burst) to the central hub whenever it had any data to send. If it received the packet correctly, the central hub broadcast an acknowledgment (if the packet was destined for the hub) or rebroadcast the packet (if the packet was destined for another node). If two or more nodes transmitted packets that overlapped in time, the hub received none of the packets correctly. This is called a collision. Since the nodes could not hear each other (radios cannot both transmit and receive on the same channel at the same time), collisions were detected by listening for the hub's response (either the acknowledgment or the rebroadcast).

Consider a collision as in the example below: one user sends a packet from time t1 to time t2 = t1 + T (solid line).
If another user starts a packet at any time between t0 = t1 - T and t2 (dashed lines), the two packets will partially overlap, and both will need to be retransmitted.

Soon afterward, a significant improvement was realized in Slotted Aloha. As the name suggests, the transmission times became slotted. Packets were transmitted only on slot boundaries. In the example above, the first dashed packet would be transmitted at time t1 and would (wholly) collide with the other packet. However, the second dashed packet would wait until t2 and not collide with the other packet. At low rates, Slotted Aloha cuts the collision rate in half.

Let us calculate the efficiency of Slotted Aloha. Let there be n nodes sharing the communications channel. In any given slot, each node generates a packet with probability p. We assume the nodes are independent (i.e., whether a node has a packet to transmit is independent of whether any other nodes have packets to transmit). The network successfully transmits a packet if one and only one node transmits a packet. If no node transmits, nothing is received. If more than one node transmits, a collision occurs. Let N be the number of nodes transmitting. Then,

Pr[N = 0] = (1 - p)^n
Pr[N = 1] = C(n, 1) p (1 - p)^(n-1) = np(1 - p)^(n-1)
Pr[N ≥ 2] = 1 - (1 - p)^n - np(1 - p)^(n-1)

Let λ = np be the offered packet rate, or the average number of packets attempted per slot. The Poisson approximation to the binomial in Equation (6.16) gives a simpler expression for the throughput:

Pr[N = 1] ≈ λ e^(-λ)

This throughput expression is plotted in Figure 6.6. The maximum throughput equals e^(-1) = 0.368 and occurs when λ = 1. Similarly, Pr[N = 0] ≈ e^(-λ) = e^(-1) when λ = 1. So, Slotted Aloha has a maximum throughput of 0.37.
This means about 37% of the time, exactly one node transmits and the packet is successfully transmitted; about 37% of the time, no node transmits; and about 26% of the time, collisions occur.

The maximum throughput of Slotted Aloha is rather low, but even 37% overstates the throughput. Consider what happens to an individual packet. Assume some node has a packet to transmit. The probability this packet gets through is the probability no other node has a packet to transmit. When n is large, this is Pr[N = 0] ≈ e^(-λ).

[FIGURE 6.6 Slotted Aloha's throughput Pr[N = 1] for large n. The maximum, 0.368, occurs at λ = 1.]

Let this number be r. In other words, with probability r = e^(-λ), the packet is successfully transmitted, and with probability 1 - r = 1 - e^(-λ), it is blocked (it collides with another packet). If blocked, the node will wait (for a random amount of time) and retransmit. Again, the probability of success is r, and the probability of failure is 1 - r. If blocked again, the node will attempt a third time to transmit the packet, and so on. Let T denote the number of tries needed to transmit the packet. Then,

Pr[T = 1] = r
Pr[T = 2] = r(1 - r)
Pr[T = 3] = r(1 - r)^2

and so on. In general, Pr[T = k] = r(1 - r)^(k-1). T is a geometric random variable, and the mean of T is E[T] = 1/r = e^λ.

On average, each new packet therefore requires e^λ tries before it is transmitted successfully. Since all nodes do this, more and more packets collide, and the throughput drops further until the point when all nodes transmit all the time and nothing gets through. The protocol is unstable unless rates well below 1/n are used.

Aloha and Slotted Aloha are early protocols, though Slotted Aloha is still sometimes used (e.g., when a cellular telephone wakes up and wants to transmit). As mentioned, however, both led to many advances in computer networks that are in practice today.
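The throughput and retry numbers above can be reproduced with a few lines; a sketch:

```python
from math import exp

# Slotted Aloha throughput lambda * e^(-lambda) peaks at lambda = 1
# with value 1/e, and a packet needs e^lambda tries on average.
def throughput(lam):
    return lam * exp(-lam)

grid = [i / 100 for i in range(1, 301)]        # lambda from 0.01 to 3.00
best = max(grid, key=throughput)
print(best, round(throughput(best), 3))        # 1.0 0.368

for lam in (0.5, 1.0, 2.0):
    # success probability per try is r = e^(-lambda); T is geometric,
    # so the expected number of tries is E[T] = 1/r = e^lambda
    print(lam, round(exp(lam), 2))
```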
6.8 ERROR CONTROL CODES

Error correcting codes (ECC) are commonly used in communications systems to reduce the effects of transmission errors. We assume the data being communicated are bits and each bit is possibly flipped by the channel. The basic idea of error control codes is to send extra bits, which help the receiver detect and correct transmission errors. As we shall see, the analysis of error control codes uses binomial probabilities.

ECC is used in many systems. Perhaps the first consumer item to use nontrivial ECC was the compact disc (CD) player in the early 1980s. Since then, ECC has been incorporated into numerous devices, including cell phones, digital television, wireless networks, and others. Pretty much any system that transmits bits uses ECC to combat noise.

Throughout this section, we consider a basic class of codes, called linear block codes. Linear block codes are widely used and have many advantages. Other useful codes, such as nonlinear block codes and convolutional codes, also exist but are beyond this text.

6.8.1 Repetition-by-Three Code

As a simple example to illustrate how ECC works, consider the repetition-by-three code. Each input bit is replicated three times, as shown in the table below:

Input   Output
0       000
1       111

The code consists of two code words, 000 and 111. The Hamming distance between code words is defined as the number of bits in which they differ. In this case, the distance is three. Let D(·,·) be the Hamming distance; for example, D(000, 111) = 3.

Each bit is passed through a binary symmetric channel with crossover probability ε: each transmitted bit is received correctly with probability 1 - ε and flipped with probability ε.

The number of bits in a code word that get flipped is binomial with parameters n = 3 and p = ε. Since ε is small, Equation (6.4) says that the b(n, k, p) sequence is decreasing.
Getting no errors is more likely than getting one error, getting one error is more likely than getting two errors, and getting two errors is more likely than getting three errors. If W denotes the number of errors, then

Pr[W = 0] > Pr[W = 1] > Pr[W = 2] > Pr[W = 3]

The receiver receives one of eight words, 000, 001, ..., 111. In Figure 6.7, the two code words are at the left and right. The three words a distance of one away from 000 are listed on the left side, and the three words a distance of one away from 111 are listed on the right.

Error control codes can be designed to detect errors or to correct them (or some combination of both). An error detection scheme is typically used with retransmissions. Upon detecting an error, the receiver asks the transmitter to repeat the communication. Error correction is used when retransmissions are impractical or impossible. The receiver tries to correct errors as best it can.
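With nearest-neighbor (majority-vote) decoding, which the distance-one grouping above suggests, a decoding error occurs when two or three of the three bits flip. The sketch below evaluates this binomial tail for a few illustrative crossover probabilities.

```python
from math import comb

# Majority-vote decoding of the repetition-by-three code fails when
# W >= 2 bits flip, where W ~ binomial(3, eps).
def decode_error(eps):
    return sum(comb(3, w) * eps**w * (1 - eps)**(3 - w) for w in (2, 3))

for eps in (0.1, 0.01, 0.001):
    print(eps, decode_error(eps))
# eps = 0.1 gives 0.028, far below the raw bit error rate of 0.1;
# eps = 0.01 gives about 3.0e-4
```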


… (Hint: One simple way to show a matrix C is singular is to find a vector x ≠ 0 such that Cx = 0.)

6.21

Use the MGF to calculate the mean and variance of a negative binomial random variable N with parameters k and p.


6.22

In a recent five-year period, a small town in the western United States saw 11 children diagnosed with childhood leukemia. The normal rate for towns that size would be one to two cases per five-year period. (No clear cause of this cluster has been identified. Fortunately, the rate of new cancers seems to have reverted to the average.)
a. Incidences of cancer are often modeled with Poisson probabilities. Calculate the probability that a similar town with n = 2000 children would have k = 11 cancers using two different values for the probability, p1 = 1/2000 and p2 = 2/2000. Use both the binomial probability and the Poisson probability.
b. Which formula is easier to use?
c. Most alleged cancer clusters are the result of chance statistics. Does this cluster seem likely to be a chance event?

6.23

Alice and Bob like to play games. To determine who is the better player, they play a "best k of n" competition, where n = 2k - 1. Typical competitions are "best 2 of 3" and "best 4 of 7", though ones as large as "best 8 of 15" are sometimes used (sometimes even larger). If the probability Alice beats Bob in an individual game is p and the games are independent, your goal is to calculate the probability Alice wins the competition.
a. Such competitions are normally conducted as follows: as soon as either player, Alice or Bob, wins k games, the competition is over. The remaining games are not played. What is the probability Alice wins? (Hint: this probability is not binomial.)
b. Consider an alternative. All n games are played. Whichever player has won k or more games wins the competition. Now, what is the probability Alice wins the competition? (Hint: this probability is binomial.)
c. Show the two probabilities calculated above are the same.

6.24

A group of n students engage in a contest to see who is the "best" coin flipper. Each one flips a coin with probability p of coming up heads. If the coin comes up tails, the student stops flipping; if the coin comes up heads, the student continues flipping. The last student flipping is the winner. Assume all the flips are independent. What is the probability at least k of the n students are still flipping after t flips?

6.25

The throughput of Slotted Aloha for finite n is Pr[N = 1] = λ(1 - λ/n)^(n-1).
a. What value of λ maximizes the throughput?
b. Plot the maximum throughput for various values of n to show the throughput's convergence to e^(-1) as n → ∞.

6.26

In the Aloha collision resolution discussion, we mentioned the node waits a random amount of time (number of slots) before retransmitting the packet. Why does the node wait a random amount of time?

6.27

If S is binomial with parameters n and p, what is the probability S is even? (Hint: Manipulate (p + q)^n and (p - q)^n to isolate the even terms.)

6.28

Consider an alternative solution to Problem 6.27. Let Sn = S(n-1) + Xn, where Xn is a Bernoulli random variable that is independent of S(n-1). Solve for Pr[Sn = even] in terms of Pr[S(n-1) = even], and then solve the recursion.

6.29

The probability of getting no heads in n flips is (1 - p)^n. This probability can be bounded as follows:

1 - np ≤ (1 - p)^n ≤ e^(-np)

a. Prove the left-hand inequality. One way is to consider the function h(x) = (1 - x)^n + nx and use calculus to show h(x) achieves its minimum value when x = 0. Another way is by induction: assume the inequality is true for n - 1 (i.e., (1 - p)^(n-1) ≥ 1 - (n - 1)p), and show this implies it is true for n.
b. Prove the right-hand inequality. Use the inequality log(1 + x) ≤ x.
c. Evaluate both inequalities for different combinations of n and p with np = 0.5 and np = 0.1.

6.30

Another way to prove the left inequality in Problem 6.29, 1 - np ≤ (1 - p)^n, is to use the union bound (Equation 1.7). Let X1, X2, ..., Xn be iid Bernoulli random variables. Let S = X1 + X2 + ··· + Xn. Then, S is binomial. Use the union bound to show Pr[S > 0] ≤ np, and rearrange to show the left-hand inequality.

6.31

here ‘s a problem foremost solved by Isaac Newton ( who did it without calculators or computers ). Which is more likely : getting at least one 6 in a bewilder of six dice, getting at least two 6 ‘s in a throw of 12 dice, or getting at least three 6 ‘s in a throw of 18 dice ?

6.32

In the game of Chuck-a-luck, three dice are rolled. The player selects a number between 1 and 6. If the player's number comes up on exactly one die, the player wins $1. If the player's number comes up on exactly two dice, the player wins $2. If the player's number comes up on all three dice, the player wins $3. If the player's number does not come up, the player loses $1. Let X denote the player's win or loss.
a. What is E[X]?
b. In some versions, the player wins $10 if all three dice show the player's number. Now what is E[X]?

6.33

Write a short program to simulate a player's fortune while playing the standard version of Chuck-a-luck as described in Problem 6.32. Assume the player starts out with $100 and each play costs $1. Let N be the number of plays before a player loses all of his or her money.
a. Generate a large number of trials, and estimate E[N] and Var[N].
b. If each play of the actual game takes 30 seconds, how long (in time) will the player typically play?

6.34

Generalize the repetition-by-three ECC code to a repetition-by-r code.
a. What is the probability of a miss?
b. If r is odd, what is the probability of decoding error?
c. What happens if r is even? (It is helpful to consider r = 2 and r = 4 in detail.)
d. Make a decision about what to do if r is even, and calculate the decoding error probability.


6.35

For the (5,2) code in Table 6.1:
a. Compute the distance between all pairs of code words, and show the distance of the code is three.
b. Show the difference between any pair of code words is a code word.

6.36

A famous code is the Hamming (7,4) code, which uses a 4 × 7 generator matrix. [Generator matrix not recoverable.]
a. What are all 2^4 = 16 code words?
b. What is the distance of the code?
c. What are its miss and decoding error rate probabilities?

6.37

Another famous code is the Golay (23,12) code. It has about the same rate as the Hamming (7,4) code but a distance of seven.
a. What are its miss and decoding error probabilities?
b. Make a log-log plot of the decoding error rates of the Golay (23,12) code and the Hamming (7,4) code. (Hint: some Matlab commands you might find useful are loglog and logspace. Span the range 10^(-4) ≤ ε ≤ 10^(-1).)

6.38

Compare the code rates and the decoding error rates for the (3,1) repetition code, the (5,2) code above, the (7,4) Hamming code, and the (23,12) Golay code. Is one better than the others?

CHAPTER 7

A CONTINUOUS RANDOM VARIABLE

What is the probability a randomly selected person weighs exactly 150 pounds? What is the probability the person is exactly 6 feet tall? What is the probability the temperature outside is exactly 45°F? The answer to all of these questions is the same: a probability of 0. No one weighs exactly 150 pounds. Each person is an assembly of an astronomical number of particles. The likelihood of having exactly 150 pounds worth of particles is negligible. What is meant by questions like "What is the probability a person weighs 150 pounds?" is "What is the probability a person weighs about 150 pounds?" where "about" is defined as some convenient level of precision. In some applications, 150 ± 5 pounds is good enough, while others might require 150 ± 1 pounds or even 150 ± 0.001 pounds. Engineers are accustomed to measurement precision. This chapter addresses continuous random variables and deals with concepts like "about."

7.1 BASIC PROPERTIES

A random variable X is continuous if Pr[X = x] = 0 for all values of x. What is important for continuous random variables are the probabilities of intervals, say, Pr[a < X ≤ b]. [...] Let x₀ > 0 be a known value. What is the probability at least k of the n random variables are greater than x₀?

8.25

Consider the random sum S = Σ_{i=1}^{N} Xᵢ, where the Xᵢ are iid Bernoulli random variables with parameter p and N is a Poisson random variable with parameter λ. N is independent of the Xᵢ values.

a. Calculate the MGF of S. b. Show S is Poisson with parameter λp. Here is one interpretation of this result: If the number of people with a certain disease is Poisson with parameter λ and each person tests positive for the disease with probability p, then the number of people who test positive is Poisson with parameter λp.

8.26
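Part b (Poisson thinning) can be checked by simulation. This sketch uses only the standard library; the Poisson sampler is Knuth's product-of-uniforms method, and the parameter values are illustrative.

```python
import math
import random

def poisson(rng, lam):
    # Knuth's method: count uniforms until their product drops below e^-lam.
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(1)
lam, p = 4.0, 0.3
samples = []
for _ in range(20000):
    n = poisson(rng, lam)
    s = sum(rng.random() < p for _ in range(n))  # thin each of the N events
    samples.append(s)
mean = sum(samples) / len(samples)
print(mean)  # should be close to lam * p = 1.2
```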

Repeat Example 8.7 with X and Y being iid U(0,1) random variables.

8.27

Let X and Y be independent with X Bernoulli with parameter p and Y ~ U(0,1). Let Z = X + Y. a. What are the mean and variance of Z? b. What are the density and distribution function of Z? c. Use the density of Z to compute the mean and variance of Z.


CHAPTER 8 MULTIPLE CONTINUOUS RANDOM VARIABLES

8.28

Let X and Y be independent with X binomial with parameters n and p and Y ~ U(0,1). Let Z = X + Y. a. What are the mean and variance of Z? b. What are the density and distribution function of Z? c. Use the density of Z to compute the mean and variance of Z.

8.29

Let X and Y be iid U(0,1) random variables, and let Z = X/Y.

a. What are the density and distribution functions of Z? b. What is the median of the distribution of Z? c. What is the mean of Z?
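A numerical check for part b: by the symmetry of (X, Y), Pr[X/Y ≤ 1] = 1/2, so the median of Z is 1. (For part c, note the mean is infinite because E[1/Y] diverges, so an empirical mean never settles down.) The sample size below is arbitrary.

```python
import random

rng = random.Random(7)
# Sort the simulated ratios and read off the middle order statistic.
zs = sorted(rng.random() / rng.random() for _ in range(100001))
median = zs[len(zs) // 2]
print(median)  # symmetry of (X, Y) gives Pr[X/Y <= 1] = 1/2, so median ~ 1
```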

CHAPTER 9

THE GAUSSIAN AND RELATED DISTRIBUTIONS

The Gaussian distribution is arguably the most important probability distribution. It occurs in numerous applications in engineering, statistics, and science. It is so common, it is also referred to as the "normal" distribution.

9.1 THE GAUSSIAN DISTRIBUTION AND DENSITY

X is Gaussian with mean μ and variance σ² if

f_X(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}

[...]

r = s√(2 log 2) = 1.177s   (9.20)

9.5 Related Distributions


FIGURE 9.9 Rayleigh density and its median, mean, and standard deviation (μ = 1.253s, σ = 0.655s). Note that the median is less than the mean (median = 1.177s < 1.253s = mean).

f_R(r) = (r/s²) e^{−r²/(2s²)},  r ≥ 0   (9.21)

E[R] = ∫₀^∞ r · (r/s²) e^{−r²/(2s²)} dr = s√(π/2) ∫_{−∞}^{∞} (r²/(s³√(2π))) e^{−r²/(2s²)} dr   (9.22)

     = s√(π/2) = 1.253s   (9.23)

E[R²] = E[X² + Y²] = 2s²   (9.24)

Var[R] = E[R²] − E[R]² = 2s² − (π/2)s² = ((4−π)/2) s² = (0.655s)²   (9.25)

The integral in Equation (9.22) is the second moment of a Gaussian density and is therefore equal to 1 (by Equation 9.14).

EXAMPLE 9.3

In Figure 9.12 in Section 9.6, 14 of the 1000 points have a radius of greater than or equal to 3.0. (The point on the circle at five o'clock has a radius of 3.004.) Are 14 such points unusual? The radius is Rayleigh with parameter s = 1. The probability of any point having a radius greater than 3.0 is p = Pr[R ≥ 3.0] = 1 − Pr[R < 3.0] [...]

9.7 Digital Communications Using QAM

[...] R(t) = S(t) + N(t)

It turns out that since sin θ and cos θ are orthogonal over a complete period,

∫₀^{2π} cos θ sin θ dθ = 0

the signals can be broken up into a cosine term and a sine term. These are known as the quadrature components:

S(t) = Sx(t) cos(ωc t) + Sy(t) sin(ωc t)
N(t) = Nx(t) cos(ωc t) + Ny(t) sin(ωc t)
R(t) = Rx(t) cos(ωc t) + Ry(t) sin(ωc t) = (Sx(t) + Nx(t)) cos(ωc t) + (Sy(t) + Ny(t)) sin(ωc t)

where ωc is the carrier frequency in radians per second, ωc = 2πfc, where fc is the carrier frequency in hertz (cycles per second). The receiver demodulates the received signal and separates the quadrature components:

Rx(t) = Sx(t) + Nx(t)
Ry(t) = Sy(t) + Ny(t)

In the QAM application considered here,

Sx(t) = Σ_{n=−∞}^{∞} Ax(n) p(t − nT)

Sy(t) = Σ_{n=−∞}^{∞} Ay(n) p(t − nT)

The pair (Ax, Ay) represents the data sent. This is the data the receiver tries to recover (much more about this below). The p(t) is a pulse-shape function. While p(t) can be complicated, in many applications it is just a simple rectangle function:

p(t) = 1 for 0 ≤ t < T, and p(t) = 0 otherwise

[...] the speedup is Pr[R ≥ r₁]/Pr[error] = exp(r₁²/(2σ²)) = exp(ψ²/2). Selected values of exp(ψ²/2) are shown in Table 9.1. When ψ = 3, the speedup is a factor of 90; when ψ = 4, the speedup factor is about 3000.


FIGURE 9.17 Illustration of the Monte Carlo study for the error rate of a rectangular region, conditioned on being outside the circle of radius r₁. The example is drawn for ψ = r₁/σ = 2.0.

TABLE 9.1 Monte Carlo speedups attained versus ψ:

ψ:          1.0   1.5   2.0   2.5   3.0   3.5   4.0
exp(ψ²/2):  1.6   3.1   7.4   23    90    457   2981

Recall that in Section 9.6.2 we showed the Box-Muller method can generate two iid zero-mean Gaussian random variables from a Rayleigh distribution and a uniform distribution. Here, we modify the Box-Muller method to generate conditional Gaussians and dramatically speed up the Monte Carlo computation. Generate a Rayleigh random variable conditioned on being greater than r₁ (i.e., conditioned on being greater than the radius of the circle):

Pr[R ≤ r | R ≥ r₁] = (F_R(r) − F_R(r₁))/(1 − F_R(r₁)) = 1 − e^{−(r²−r₁²)/(2σ²)}  for r ≥ r₁
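The conditional Rayleigh variable can be generated by inverting its conditional distribution function. A minimal sketch, assuming σ = 1 and an illustrative circle radius r₁ = 2; the theoretical check uses the conditional survival function exp(−(r² − r₁²)/2).

```python
import math
import random

def conditional_rayleigh(rng, r1, s=1.0):
    # Inverse-CDF sampling of a Rayleigh(s) conditioned on R >= r1:
    # the conditional survival function is exp(-(r^2 - r1^2) / (2 s^2)),
    # so setting it equal to 1 - u and solving for r gives the sample.
    u = rng.random()
    return math.sqrt(r1 * r1 - 2.0 * s * s * math.log(1.0 - u))

rng = random.Random(3)
samples = [conditional_rayleigh(rng, 2.0) for _ in range(200000)]
frac = sum(r >= 3.0 for r in samples) / len(samples)
print(frac)  # theory: exp(-(9 - 4) / 2) = exp(-2.5), about 0.082
```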

[...] Pr[|p̂ − p| > ε] → 0 as n → ∞ for all ε > 0. Chebyshev's inequality implies that if the estimator is unbiased and its variance approaches 0 as n → ∞, then the estimator is consistent. This is usually the easiest way to demonstrate that an estimator is consistent. In this example, the sample mean is an unbiased and consistent estimator of p. Equation (10.2) shows the estimator is unbiased, and Equation (10.3) shows the variance goes to 0 as n tends to infinity. Both unbiasedness and consistency are generally considered to be good characteristics of an estimator, but just how good is p̂ = X̄n? Or, to turn the question around, how big does n have to be in order for us to have confidence in our prediction? The variance of X̄n depends on the unknown p, but we can bound it as follows: p(1 − p) ≤ 0.25, with equality when p = 0.5. Since the unknown p is likely to be near 0.5, the bound is likely to be close; that is, p(1 − p) ≈ 0.25.

Now, we can use Chebyshev's inequality (Equation 4.16) to estimate how good this estimator is:

Pr[|p̂ − p| > ε] ≤ Var[p̂]/ε² = p(1 − p)/(nε²) ≤ 1/(4nε²)

Let us put in some numbers. If 95% of the time we want to be within ε = 0.1, then

Pr[|p̂ − p| > 0.1] ≤ 1/(4n · 0.1²) ≤ 1 − 0.95

10.1 A Simple Election Poll


Solving this yields n ≥ 500. To get this level of accuracy (−0.1 ≤ p̂ − p ≤ 0.1) and be wrong no more than 5% of the time, we need to sample n = 500 voters. Some might object that getting to within 0.1 is unimpressive, especially for an election prediction. If we tighten the bound to ε = 0.03, the number of voters we need to sample now becomes n ≥ 5555. Sampling 500 voters might be achievable, but sampling 5000 sounds like a lot of work. Is n ≥ 5555 the best estimate we can find? The answer is no. We can find a better estimate. We used Chebyshev's inequality to bound n. The primary advantage of the Chebyshev bound is that it applies to any random variable with a finite variance. However, this comes at a price: the Chebyshev bound is often loose. By making a stronger assumption (in this case, that the sample mean is approximately Gaussian), we can get a better estimate of n. In this example, p̂ = X̄n is the average of n iid Bernoulli random variables. Hence, X̄n is binomial (scaled by 1/n), and since n is large, p̂ is approximately Gaussian by the CLT, p̂ ~ N(p, pq/n). Therefore,

Pr[−ε ≤ p̂ − p ≤ ε]
  = Pr[−ε/√(pq/n) ≤ (X̄n − p)/√(pq/n) ≤ ε/√(pq/n)]   (normalize)
  ≈ Pr[−ε/√(1/4n) ≤ (X̄n − p)/√(pq/n) ≤ ε/√(1/4n)]   (bound pq)
  ≈ Pr[−2ε√n ≤ Z ≤ 2ε√n]   (Z ~ N(0,1))
  = Φ(2ε√n) − Φ(−2ε√n)
  = 2Φ(z) − 1   (z = 2ε√n)

Solving 2Φ(z) − 1 = 0.95 yields z ≈ 1.96.

Now, solve for n:

1.96 = z = 2ε√n

n = 1.96²/(4ε²) = 0.96/ε² ≈ ε⁻²   (10.4)

For ε = 0.03, n = 1067, which is about one-fifth as large as the Chebyshev inequality indicated. Thus, for a simple election poll to be within ε = 0.03 of the correct value 95% of the time, a sample of about n = 1067 people is required.
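Equation (10.4) can be wrapped in a small helper; z = 1.96 is the two-sided 95% Gaussian quantile, and other confidence levels just swap in a different z.

```python
def poll_size(eps, z=1.96):
    # n = z^2 / (4 eps^2), Equation (10.4), using the worst case p(1-p) <= 1/4.
    return z * z / (4 * eps * eps)

print(round(poll_size(0.03)))  # about 1067
print(round(poll_size(0.1)))   # about 96, versus 500 from Chebyshev
```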


CHAPTER 10 ELEMENTS OF STATISTICS

FIGURE 10.1 Effect of sample size on confidence intervals. As n increases, the confidence interval shrinks, in this example, from 0.35 ≤ p ≤ 0.55 for n = 100 to 0.40 ≤ p ≤ 0.50 for n = 400 and to 0.425 ≤ p ≤ 0.475 for n = 1600. Each time n is increased by a factor of four, the width of the confidence interval shrinks by a factor of two (and the height of the density doubles). See Example 10.1 for further discussion.

EXAMPLE 10.1

Let ‘s illustrate how the confidence intervals vary with the numeral of samples, n. As above, consider a simple poll. We are interested in who is probably to win the election. In other words, we want to know if p < 0.5 or p > 0.5. How boastfully must newton be so the probability that { p < 0.5 } or { P > 0.5 } is at least 0.95 ? In this exercise, we assume phosphorus = 0.45. The distance between p and 0.5 ( the dividing wrinkle between winning and losing ) is£= 0.5-0.45 = 0.05. From Equation ( 10.4 ), newton “ ‘£- 2 = ( 0.05 ) – 2 = 400 If£ is halved, n increases by a factor of four ; if£ is doubled, n decreases by a factor of four. thus, for n = 100, £ = 0.1 ; for north = 1600, £ = 0.025. The 95 % confidence intervals for n = 100, for n = 400, and for normality = 1600 are shown in Figure 10.1. For n = 100, we are not able to predict a achiever as the confidence interval extends from phosphorus = 0.35 to p = 0.55 ; that is, the assurance time interval is on both sides of phosphorus = 0.5. For normality = 400, however, the confidence time interval is smaller, from p = 0.40 to p = 0.50. We can now, with 95 % confidence, predict that Bob will win the election.



Comment 10.1: In the Monte Carlo example in Section 9.7.3, p is small and varies over several orders of magnitude, from about 10⁻¹ to 10⁻⁴. Consequently, we used a relative error to assess the accuracy of p̂ (see Equation 9.43):

Pr[−εp ≤ p̂ − p ≤ εp] ≥ 0.95   (10.5)

In the election poll, p ≈ 0.5. We used a simple absolute, not relative, error:

Pr[−ε ≤ p̂ − p ≤ ε] ≥ 0.95

There is no one rule for all situations: choose the accuracy criterion appropriate for the problem at hand.

10.2 ESTIMATING THE MEAN AND VARIANCE

The election poll example above illustrated many aspects of statistical analysis. In this section, we present some basic ideas of data analysis, concentrating on the simplest, most common problem: estimating the mean and variance. Let X1, X2, ..., Xn be n iid samples from a distribution F(x) with mean μ and variance σ². Let μ̂ denote an estimate of μ. The most common estimate of μ is the sample mean:

μ̂ = X̄n = (1/n) Σ_{k=1}^{n} Xk

E[μ̂] = E[X̄n] = μ

Var[μ̂] = Var[X̄n] = σ²/n

Thus, the sample mean is an unbiased estimator of μ. It is also a consistent estimator of μ since it is unbiased and its variance goes to 0 as n → ∞. The combination of these two properties is known as the weak law of large numbers.

Comment 10.2: The weak law of large numbers says the sample mean of n iid observations, each with finite variance, converges to the underlying mean, μ. For any ε > 0,

Pr[|X̄n − μ| > ε] → 0  as n → ∞

As one might suspect, there is also a strong law of large numbers. It says the same basic thing (i.e., the sample mean converges to μ), but the mathematical context is stronger:

Pr[lim_{n→∞} X̄n = μ] = 1


The reasons why the strong law is stronger than the weak law are beyond this text, but are often covered in advanced texts. Nevertheless, in practice, both laws indicate that if the data are iid with finite variance, then the sample mean converges to the mean.

The sample mean can be derived as the estimate that minimizes the squared error. Consider the following minimization problem:

min_a Q(a) = Σ_{i=1}^{n} (Xi − a)²

Differentiate Q(a) with respect to a, set the derivative to 0, and solve for a, replacing a by â:

0 = (d/da) Q(a) |_{a=â} = −2 Σ_{i=1}^{n} (Xi − â) = −2 (Σ_{i=1}^{n} Xi − nâ)

â = (1/n) Σ_{i=1}^{n} Xi = X̄n

We see the sample average is the value that minimizes the squared error; that is, the sample mean is the best constant approximating the Xi, where "best" means minimal squared error. Estimating the variance is a bit trickier. Let σ̂² denote our estimator of σ². First, we assume the mean is known and let the estimate be the obvious sample mean of the squared errors:

σ̂² = (1/n) Σ_{k=1}^{n} (Xk − μ)²

So, σ̂² is an unbiased estimator of σ². In this estimate, we assumed that the mean, but not the variance, was known. This can happen, but often both are unknown and need to be estimated. The obvious generalization is to replace μ by its estimate X̄n, but this leads to a complication:

E[Σ_{k=1}^{n} (Xk − X̄n)²] = (n − 1)σ²   (10.6)

An unbiased estimator of the variance is

σ̂² = (1/(n − 1)) Σ_{k=1}^{n} (Xk − X̄n)²   (10.7)

In the statistics literature, this estimate is usually denoted S², or sometimes Sn²:

Sn² = (1/(n − 1)) Σ_{k=1}^{n} (Xk − X̄n)²

E[Sn²] = σ²


It is worth emphasizing: the unbiased estimate of variance divides the sample squared error by n − 1, not by n. As we shall see in Section 10.10, the maximum likelihood estimate of variance is a biased estimate as it divides by n. In practice, the unbiased estimate (dividing by n − 1) is more commonly used.
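A quick simulation makes the point concrete: dividing by n underestimates σ² by the factor (n − 1)/n, while dividing by n − 1 is unbiased. The sample size and repetition count below are illustrative.

```python
import random

rng = random.Random(0)
n, reps = 5, 40000
biased, unbiased = 0.0, 0.0
for _ in range(reps):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]     # true variance = 1
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased += ss / n          # divide by n
    unbiased += ss / (n - 1)  # divide by n - 1
print(biased / reps, unbiased / reps)  # about (n-1)/n = 0.8 and about 1.0
```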

10.3 RECURSIVE CALCULATION OF THE SAMPLE MEAN

In many engineering applications, a new observation arrives each discrete time step, and the sample mean needs to be updated for each new observation. In this section, we consider recursive algorithms for computing the sample mean. Let X1, X2, ..., Xn−1 be the first n − 1 samples. The sample mean after these n − 1 samples is

X̄n−1 = (1/(n − 1)) Σ_{k=1}^{n−1} Xk

At time n, a new sample arrives, and a new sample mean is computed:

X̄n = (1/n) Σ_{k=1}^{n} Xk

When n gets large, this approach is wasteful. Surely, all that work in computing X̄n−1 can be useful to simplify computing X̄n. We present two approaches for reducing the computation. For the first recursive approach, define the sum of the observations as Tn = X1 + X2 + ··· + Xn. Then, Tn can be computed recursively:

Tn = Tn−1 + Xn

A recursive calculation uses previous values of the "same" quantity to compute the new values of the quantity. In this case, the previous value Tn−1 is used to compute Tn.

EXAMPLE 10.2

The classic example of a recursive function is the Fibonacci sequence. Let f(n) be the nth Fibonacci number. Then,

f(0) = 0
f(1) = 1
f(n) = f(n − 1) + f(n − 2)  for n = 2, 3, ...

The recurrence relation can be solved, yielding the sequence of numbers f(n) = 0, 1, 1, 2, 3, 5, 8, 13, ...


The sample mean can be computed easily from Tn:

X̄n = Tn/n

The algorithm is simple: at time n, compute Tn = Tn−1 + Xn and divide by n, X̄n = Tn/n.

EXAMPLE 10.3

Bowling leagues rate players by their average score. To update the averages, the leagues keep track of each bowler's total pins (the sum of his or her scores) and each bowler's number of games. The average is computed by dividing the total pins by the number of games bowled. Note that the recursive algorithm does not need to keep track of all the samples. It only needs to keep track of two quantities, T and n. Regardless of how large n becomes, only these two values need to be stored. For instance, assume a bowler has scored 130, 180, 200, and 145 pins. Then,

T0 = 0
T1 = T0 + 130 = 0 + 130 = 130
T2 = T1 + 180 = 130 + 180 = 310
T3 = T2 + 200 = 310 + 200 = 510
T4 = T3 + 145 = 510 + 145 = 655

The sequence of sample averages is 130/1 = 130, 310/2 = 155, 510/3 = 170, and 655/4 = 163.75.

The second recursive approach develops the sample mean in a predictor-corrector fashion. Consider the following:

X̄n = (X1 + X2 + ··· + Xn−1)/n + Xn/n
    = ((n − 1)/n) · (X1 + X2 + ··· + Xn−1)/(n − 1) + Xn/n
    = ((n − 1)/n) · X̄n−1 + Xn/n
    = X̄n−1 + (1/n)(Xn − X̄n−1)
    = prediction + gain · innovation   (10.8)

In words, the new estimate equals the old estimate (the prediction) plus a correction term. The correction is a product of a gain (1/n) times an innovation (Xn − X̄n−1). The innovation is what is new about the latest observation. The new estimate is larger than the old if the new observation is larger than the old estimate, and vice versa.
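The predictor-corrector recursion (10.8) in code, checked against the bowling example above:

```python
def running_mean(xs):
    # Predictor-corrector form of Equation (10.8):
    # new estimate = old estimate + gain * innovation.
    m = 0.0
    means = []
    for n, x in enumerate(xs, start=1):
        m += (x - m) / n   # gain 1/n, innovation x - m
        means.append(m)
    return means

scores = [130, 180, 200, 145]  # the bowling example
print(running_mean(scores))    # [130.0, 155.0, 170.0, 163.75]
```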

EXAMPLE 10.4

The predictor-corrector form is useful in hardware implementations. As n gets large, T might increase toward infinity (or minus infinity). The number of bits needed to represent T can get large, too big for many microprocessors. The predictor-corrector form is usually well behaved. For example, assume the Xk are iid uniform on 0, 1, ..., 255. The Xk are eight-bit numbers with an average value of 127.5. When n = 1024, 0 ≤ T1024 ≤ 255 × 1024 = 261,120. It takes 18 bits to represent these numbers. When n = 10⁶, it takes 28 bits to represent T. The sample average is between 0 and 255 (since all the Xk are between 0 and 255). The innovation term, Xn − X̄n−1, is between −255 and 255. Therefore, the innovation requires nine bits, regardless of how big n becomes.

Comment 10.3: Getting a million samples sounds like a lot, and for many statistics problems, it is. For many signal processing problems, however, it is not. For example, CD-quality audio is sampled at 44,100 samples per second. At this rate, a million samples are collected every 23 seconds!

10.4 EXPONENTIAL WEIGHTING

In some applications, the mean can be considered quasi-constant: the mean varies in time, but slowly in comparison to the rate at which samples arrive. One approach is to employ exponential weighting. Let a be a weighting factor between 0 and 1 (a typical value is 0.95). Then,

μ̂n = (Xn + aXn−1 + a²Xn−2 + ··· + a^{n−1}X1)/(1 + a + a² + ··· + a^{n−1})

Notice how the importance of old observations decreases exponentially. We can compute the exponentially weighted estimate recursively in two ways:

Sn = Xn + aSn−1
Wn = 1 + aWn−1
μ̂n = Sn/Wn

In the limit, as n → ∞, Wn simplifies nicely: Wn → 1/(1 − a). With this approximation, the expression for μ̂n becomes

μ̂n ≈ μ̂n−1 + (1 − a)(Xn − μ̂n−1) = aμ̂n−1 + (1 − a)Xn

This form is useful when processing power is limited.
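The direct and recursive forms of the exponentially weighted estimate can be checked against each other; the data and weight below are illustrative.

```python
def ewma_exact(xs, a):
    # Direct evaluation: weighted sum over all samples, newest first.
    num = sum(a ** k * x for k, x in enumerate(reversed(xs)))
    den = sum(a ** k for k in range(len(xs)))
    return num / den

def ewma_recursive(xs, a):
    # S_n = X_n + a S_{n-1},  W_n = 1 + a W_{n-1},  estimate = S_n / W_n.
    s = w = 0.0
    for x in xs:
        s = x + a * s
        w = 1.0 + a * w
    return s / w

xs = [1.0, 2.0, 4.0, 3.0]
print(ewma_exact(xs, 0.95), ewma_recursive(xs, 0.95))  # identical values
```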


Comment 10.4: It is an engineering observation that many real-life problems benefit from some exponential weighting (e.g., 0.95 ≤ a ≤ 0.99) even though theory indicates the underlying mean should be constant.

10.5 ORDER STATISTICS AND ROBUST ESTIMATES

In some applications, the observations may be corrupted by outliers, which are values that are inordinately large (positive or negative). The sample mean, since it weights all samples evenly, can be corrupted by the outliers. Even one bad sample can cause the sample average to deviate significantly from its correct value. Similarly, estimates of the variance that square the data values are particularly sensitive to outliers. Estimators that are insensitive to these outliers are termed robust. A large class of robust estimators are based on the order statistics. Let X1, X2, ..., Xn be a sequence of random variables. Typically, the random variables are iid, but that is not necessary. The order statistics are the sorted data values, ordered from low to high:

X(1) ≤ X(2) ≤ ··· ≤ X(n)

The minimum value is X(1), and the maximum value is X(n). The median is the middle value. If n is odd, the median is X((n+1)/2); if n is even, the median is usually defined to be the average of the two middle values, (X(n/2) + X(n/2+1))/2. For instance, if the data values are 5, 4, 3, 8, 10, and 1, the sorted values are 1, 3, 4, 5, 8, and 10. The minimum value is 1, the maximum is 10, and the median is (4 + 5)/2 = 4.5. Probabilities of order statistics are relatively easy to calculate if the original data are iid. First, note that even for iid data, the order statistics are neither independent nor identically distributed. They are dependent because of the ordering; that is, X(1) ≤ X(2), etc. For example, X(2) cannot be smaller than X(1). The order statistics are not identically distributed for the same reason. We calculate the distribution functions of order statistics using binomial probabilities. The event {X(k) ≤ x} is the event that at least k of the n random variables are less than or equal to x. The probability any one of the Xi is less than or equal to x is F(x); that is, p = F(x). Therefore,

Pr[X(k) ≤ x] = Σ_{j=k}^{n} (n choose j) p^j (1 − p)^{n−j} = Σ_{j=k}^{n} (n choose j) F^j(x) (1 − F(x))^{n−j}   (10.9)


For example, the probability the first-order statistic (the minimum value) is less than or equal to x is

Pr[X(1) ≤ x] = Σ_{j=1}^{n} (n choose j) p^j (1 − p)^{n−j}
             = 1 − (1 − p)^n   (sum is 1 minus the missing j = 0 term)
             = 1 − (1 − F(x))^n

The distribution function of the maximum is

Pr[X(n) ≤ x] = p^n = F^n(x)

The median is sometimes used as an alternative to the mean for an estimate of location. The median is the middle of the data, half below and half above. Thus, it has the advantage that even with almost half the data being outliers, the estimate is still well behaved. Another robust location estimate is the α-trimmed mean, which is formed by dropping the αn smallest and largest order statistics and then taking the average of the rest:

μ̂_α = (1/(n − 2αn)) Σ_{k=αn+1}^{n−αn} X(k)

The α-trimmed mean can tolerate up to αn outliers. A robust estimate of spread is the interquartile distance (sometimes called the interquartile range):

U = c(X(3n/4) − X(n/4))

where c is a constant chosen so that E[U] = σ. (Note the interquartile distance is an estimator of σ, not σ².) When the data are approximately Gaussian, c = 1/1.349. Using the Gaussian quantile function, the factor 1.349 is calculated as follows:

Q(0.75) − Q(0.25) = 0.6745 − (−0.6745) = 1.349

We used the following reasoning: If Xi ~ N(0,1), then Q(0.75) = Φ⁻¹(3/4) = 0.6745 gives the average value of the (3n/4)-order statistic, X(3n/4). Similarly, Q(0.25) gives the average value of the (n/4)-order statistic. The difference between these two gives the average value of the interquartile range. As an example, consider the 11 data points −1.91, −0.62, −0.46, −0.44, −0.18, −0.16, −0.07, 0.33, 0.75, 1.60, 4.00. The first 10 are samples from an N(0,1) distribution; the 11th sample is an outlier. The sample mean and sample variance of these 11 values are

X̄n = (1/11) Σ Xi = 0.26

Sn² = (1/(n − 1)) Σ (Xk − 0.26)² = 2.30


Neither of these results is particularly close to the correct values, 0 and 1. The median is X(6) = −0.16. [...]
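The 11-point example can be reproduced directly; the α = 0.1 trim drops one sample from each end, including the 4.00 outlier.

```python
import statistics

data = [-1.91, -0.62, -0.46, -0.44, -0.18, -0.16, -0.07, 0.33, 0.75, 1.60, 4.00]

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # robust: middle order statistic

def trimmed_mean(xs, alpha):
    # Drop the floor(alpha * n) smallest and largest order statistics.
    xs = sorted(xs)
    m = int(alpha * len(xs))
    kept = xs[m:len(xs) - m]
    return sum(kept) / len(kept)

print(round(mean, 2), median, round(trimmed_mean(data, 0.1), 2))
```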

l(λ) = log(L(λ)) = (n − k) log λ − λ(t1 + t2 + ··· + t_{n−k}) − kλ t_max

0 = (d/dλ) l(λ) |_{λ=λ̂} = (n − k)/λ̂ − (t1 + t2 + ··· + t_{n−k}) − k t_max

λ̂ = (n − k)/(t1 + t2 + ··· + t_{n−k} + k t_max)

As the examples above indicate, maximum likelihood estimation is especially popular in parameter estimation. While statisticians have devised pathological examples where it can be bad, in practice the MLE is almost always a good estimate. The MLE is almost always well behaved and converges to the true parameter value as n → ∞.

10.11 MINIMUM MEAN SQUARED ERROR ESTIMATION

This section discusses the most popular estimation rule, especially in engineering and scientific applications: choose the estimator that minimizes the mean squared error. Let θ represent the unknown value or values to be estimated. For instance, θ might be μ if the mean is unknown, or (μ, σ²) if both the mean and variance are unknown and to be estimated. The estimate of θ is denoted either θ̂ or θ̃, depending on whether or not the estimate depends on random variables (i.e., random observations). If it does, we denote the estimate θ̂. (We also use θ̂ when referring to general estimates.) If it does not depend on any random observations, we denote the estimate as θ̃. The error is θ̂ − θ, the squared error is (θ̂ − θ)², and the mean squared error (MSE) is E[(θ̂ − θ)²]. The value that minimizes the MSE is the MMSE estimate:

θ̂ − θ = error
(θ̂ − θ)² = squared error
E[(θ̂ − θ)²] = mean squared error
min_{θ̂} E[(θ̂ − θ)²] = minimum mean squared error

We start here with the simplest example, which is to estimate a random variable by a constant, and we progress to more complicated estimators. Let the random variable be Y and the estimate be θ̃ (since θ̃ is a constant, it is not a random variable). Then,

Q(θ̃) = E[(Y − θ̃)²]   (10.18)

Q is a function of θ̃. It measures the expected loss as a function of θ̃ and is minimized by setting its first derivative to 0:

0 = (d/dθ̃) E[(Y − θ̃)²] = −2(E[Y] − θ̃)


which implies the optimal estimate is the mean, θ̃ = E[Y]. The Q function is known as the mean squared error (MSE) and the estimator as the minimum mean squared error (MMSE) or least mean squared error (LMSE) estimator. As another example, let a random variable Y be estimated by a function g(X) of another random variable, X. Using Equation (8.7), the MSE can be written as

E[(Y − g(X))²] = ∫_{−∞}^{∞} E_{Y|X}[(Y − g(x))² | X = x] f_X(x) dx

The conditional expected value inside the integral can be minimized separately for each value of x (a sum of positive terms is minimized when each term is minimized). Letting θ̂ = g(x),

0 = (d/dθ̂) E_{Y|X}[(Y − θ̂)² | X = x] = 2E_{Y|X}[(Y − θ̂) | X = x]

which implies that the estimate of Y is the conditional mean of Y given X = x:

θ̂(x) = E_{Y|X}[Y | X = x]

For instance, if Y represents the weight and X the height of a randomly selected person, then θ̂(x) = E_{Y|X}[Y | X = x] is the average weight of a person given the height, x. One fully expects that the average weight of people who are five feet tall would be different from the average weight of those who are six feet tall. This is what the conditional mean tells us. In general, θ̂(x) = E_{Y|X}[Y | X = x] is nonlinear in x. (It is unlikely that people who are six feet tall are exactly 20% heavier than those who are five feet tall, or that seven-footers are exactly 40% heavier than five-footers.) Sometimes it is desirable to find the best linear function of X that estimates Y. Let Ŷ = aX + b with a and b to be determined. Then,

min_{a,b} Q(a, b; Y, X) = min_{a,b} E[(Y − aX − b)²]

(Technically, this is an affine function because linearity requires b = 0, but it is often referred to as linear even though it is not.) This minimization requires setting two derivatives to 0 and solving the two equations:

0 = (∂/∂a) E[(Y − aX − b)²] |_{a=â, b=b̂} = −2E[XY − âX² − b̂X]   (10.19)

0 = (∂/∂b) E[(Y − aX − b)²] |_{a=â, b=b̂} = −2E[Y − âX − b̂]   (10.20)

The simplest way to solve these equations is to multiply the second by E[X] and subtract the result from the first, obtaining

0 = E[XY] − E[X]E[Y] − â(E[X²] − E[X]²)

Solving for â,

â = (E[XY] − E[X]E[Y])/(E[X²] − E[X]²) = Cov[X,Y]/Var[X] = σ_xy/σ_x²   (10.21)

The value for â can then be substituted into Equation (10.20):

b̂ = E[Y] − âE[X]   (10.22)

It is convenient to introduce normalized quantities, and letting ρ be the correlation coefficient, ρ = σ_xy/(σ_x σ_y),

(Ŷ − E[Y])/σ_y = ρ · ((x − E[X])/σ_x)   (10.23)

The term in parentheses on the left is the normalized estimate of Y, and the one on the right is the normalized deviation of the observed value X. Normalizing estimates like this is common because it eliminates scale differences between X and Y. For example, it is possible that X and Y have different units. The normalized quantities are dimensionless. Note the important role ρ plays in the estimate. Recall that ρ is a measure of how closely related X and Y are. When ρ is close to 1 or −1, the size of the expected normalized deviation of Y is about the same as the observed deviation of X. However, when ρ is close to 0, X and Y are unrelated, and the observation X is of little use in estimating Y.
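Equations (10.21) and (10.22) translate directly into code when expectations are replaced by sample averages; the small data set below is made up for illustration and is roughly y = 2x + 1.

```python
def linear_mmse(xs, ys):
    # a = Cov[X,Y]/Var[X] (10.21), b = E[Y] - a E[X] (10.22),
    # with expectations replaced by sample averages.
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    var = sum((x - ex) ** 2 for x in xs) / n
    a = cov / var
    return a, ey - a * ex

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.0]  # roughly y = 2x + 1
print(linear_mmse(xs, ys))  # close to (2, 1)
```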

Comment 10.8: Equation (10.23), while useful in many problems, has also been misinterpreted. For instance, in the middle of the twentieth century, this equation was used to estimate a son's intelligence quotient (IQ) score from the father's IQ score. The conclusion was that over time, the population was becoming more average (since ρ < 1). The fallacy of this reasoning is illustrated by switching the roles of father and son and then predicting backward in time. In this case, the fathers are predicted to be more average than their sons. What actually happens is that the best prediction is more average than the observation, but this does not mean the random variable Y is more average than the random variable X. The population statistics do not necessarily change with time. (In fact, raw scores on IQ tests have risen over time, reflecting that people have gotten either smarter or better at taking standardized tests. The published IQ score is normalized to eliminate this rise.) Why not mothers and daughters, you might ask? At the time, the military gave IQ tests to recruits, almost all of whom were male. As a result, many more IQ results were available for males than females.

10.12 BAYESIAN ESTIMATION

In traditional estimation, θ is assumed to be an unknown but nonrandom quantity. In contrast, in Bayesian estimation, the unknown value θ is assumed to be a random value (or values) with a known density (or PMF) f(θ). Bayesian estimation is like minimum mean squared error estimation except that the unknown parameter or parameters are considered to be random themselves with a known probability distribution.
Bayesian estimation is growing in popularity among engineers and statisticians, and in this section, we present two simple examples, the first estimating the probability of a binomial random variable and the second estimating the mean of a Gaussian random variable with known variance.

Let θ represent the unknown random parameters, and let f(θ) be the a priori probability density of θ. Let the observations be x with conditional density f(x|θ), and let f(x) denote the density of x. (We use the convention in Bayes estimation of dropping the subscripts on the various densities.) Using Bayes theorem, the a posteriori probability can be written as

f(θ|x) = f(x|θ)f(θ)/f(x) = f(x|θ)f(θ) / ∫ f(x|v)f(v) dv    (10.24)

Perhaps surprisingly, f(x) plays a relatively small role in Bayesian estimation. It is a constant as far as θ goes, and it is needed to normalize f(θ|x) so the integral is 1. Otherwise, however, it is generally unimportant. The minimum mean squared estimator of θ is the conditional mean:

θ̂ = E[θ | X = x]    (10.25)

In principle, computing this estimate is an exercise in integral calculus. In practice, however, the calculation often falls to one of two extremes: either the calculation is easy (can be done in closed form), or it is so difficult that numerical techniques must be employed. The calculation is easy if f(θ) is chosen as the conjugate distribution to f(x|θ). We give two examples below.

For the first example, let θ represent the unknown probability of a 1 in a series of n Bernoulli trials, and let x = k equal the number of 1's observed (and n − k the number of 0's). The conditional density f(k|θ) is the usual binomial PMF. Thus,

f(k|θ) = (n choose k) θ^k (1 − θ)^{n−k}

The conjugate density to the binomial is the beta density.
The beta density has the form

f(θ) = Γ(α+β)/(Γ(α)Γ(β)) · θ^{α−1} (1 − θ)^{β−1}  for 0 < θ < 1    (10.26)

where the Gamma function is defined in Equation (9.26) and α ≥ 0 and β ≥ 0 are nonnegative parameters. The beta density is a generalization of the uniform density. When α = β = 1, the beta density equals the uniform density. The density is symmetric about 0.5 whenever α = β. Typical values of α and β are 0.5 and 1.0. When α = β = 0.5, the density is peaked at the edges; when α = β = 1.0, the density is uniform for 0 ≤ θ ≤ 1. The mean of the beta density is α/(α+β).

The magic of the conjugate density is that the a posteriori density is also a beta density:

f(θ|k) = f(k|θ)f(θ)/f(k)
       = (1/f(k)) (n choose k) θ^k (1 − θ)^{n−k} · Γ(α+β)/(Γ(α)Γ(β)) θ^{α−1} (1 − θ)^{β−1}
       = ( (1/f(k)) (n choose k) Γ(α+β)/(Γ(α)Γ(β)) ) · θ^{k+α−1} (1 − θ)^{n−k+β−1}

The term in the large parentheses is a normalizing constant independent of θ (it depends on α, β, n, and k, but not on θ). It can be shown to equal Γ(α+β+n)/(Γ(α+k)Γ(β+n−k)). Thus, the a posteriori density can be written as

f(θ|k) = Γ(α+β+n)/(Γ(α+k)Γ(β+n−k)) · θ^{k+α−1} (1 − θ)^{n−k+β−1}

which we recognize as a beta density with parameters k + α and n − k + β. The estimate is the conditional mean and is therefore

θ̂ = E[θ | X = k] = (α+k)/(α+β+n)    (10.27)

Thus, we see the Bayes estimate has a familiar form: if α = β = 0, the estimate reduces to the sample mean, k/n.
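A short sketch of the sequential use of Equation (10.27) (our own, in Python rather than the Matlab used elsewhere in the book), tracking k, the number of 1's among the first n bits. With a uniform prior (α = β = 1) and the bit stream 0110, it produces the estimates 1/2, 1/3, 1/2, and 3/5:

```python
from fractions import Fraction

def bayes_estimate(k, n, alpha, beta):
    """Equation (10.27): conditional-mean estimate (alpha + k)/(alpha + beta + n)."""
    return Fraction(alpha + k, alpha + beta + n)

bits = [0, 1, 1, 0]          # observed bit stream
estimates = []
k = 0                        # count of 1's seen so far
for n, bit in enumerate(bits):
    estimates.append(bayes_estimate(k, n, 1, 1))  # estimate before bit n+1
    k += bit

print(estimates)             # 1/2, 1/3, 1/2, 3/5
```

Exact fractions are used so the arithmetic is transparent; each new bit updates k and n, and the next estimate follows directly from (10.27).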
With nonzero α and β, the estimate is biased toward the a priori estimate, α/(α+β).

EXAMPLE 10.11

In Chapter 5, we considered the problem of encoding an IID binary sequence with a known probability of a 1 equal to p and developed the optimal Huffman code. In many situations, however, p is unknown and must be estimated. A common technique is to use the Bayesian sequential estimator described above. The (n+1)'st bit is encoded with a probability estimate determined from the first n bits. Let p̂_n be the estimated probability of the (n+1)'st bit. It is a function of the first n bits. Common values for α and β are α = β = 1.

Assume, for example, the input sequence is 0110···. The first bit is estimated using the a priori estimate:

p̂_0 = (α+0)/(α+β+0) = (1+0)/(1+1+0) = 0.5

For the second bit, n = 1 and k = 0, and the estimate is updated as follows:

p̂_1 = (α+0)/(α+β+1) = 1/3 ≈ 0.33

For the third bit, n = 2 and k = 1, so

p̂_2 = (α+1)/(α+β+2) = 2/4 = 0.5

For the fourth bit, n = 3 and k = 2, and

p̂_3 = (α+2)/(α+β+3) = 3/5 = 0.6

This process can continue indefinitely, each time updating the probability estimate with the new information. In practice, the probability update is often written in a predictor-corrector form.

For the second example, consider estimating the mean of a Gaussian distribution with known variance. The conjugate distribution to the Gaussian with unknown mean and known variance is the Gaussian distribution, θ ~ N(μ_0, σ_0²). (The conjugate distribution is different if the variance is also unknown.)
The a posteriori density is

f(θ|x) = f(x|θ)f(θ)/f(x)
       = (1/f(x)) · (1/(√(2π)σ)) exp(−(x−θ)²/(2σ²)) · (1/(√(2π)σ_0)) exp(−(θ−μ_0)²/(2σ_0²))
       = (1/(2πσσ_0 f(x))) exp(−(x−θ)²/(2σ²) − (θ−μ_0)²/(2σ_0²))

With much tedious algebra, this density can be shown to be a Gaussian density on θ with mean (xσ_0² + μ_0σ²)/(σ² + σ_0²) and variance (σ^{−2} + σ_0^{−2})^{−1}. The Bayesian estimate is therefore

θ̂ = (xσ_0² + μ_0σ²)/(σ² + σ_0²)    (10.28)

Thus, we see the estimate is a weighted combination of the observation x and the a priori value μ_0. If multiple observations of x are made, we can replace x by the sample mean, X̄_n, and use its variance, σ²/n, to generalize the result as follows:

θ̂ = (X̄_n σ_0² + μ_0 σ²/n)/(σ²/n + σ_0²)    (10.29)

As more observations are made (i.e., as n → ∞), the estimate puts increasing weight on the observations and less on the a priori value.

In both examples, the Bayesian estimate is a linear combination of the observation and the a priori estimate. Both estimates have an easy sequential interpretation. Before any data are observed, the estimate is the a priori value, either α/(α+β) for the binomial or μ_0 for the Gaussian. As data are observed, the estimate is updated using Equations (10.27) and (10.29). Both these estimates are commonly employed in engineering applications when data arrive sequentially and estimates are updated with each new observation.

We mentioned above that the calculation in Equation (10.25) can be difficult, particularly if conjugate densities are not used (or do not exist). Many numerical techniques have been developed, the most popular of which is the Markov chain Monte Carlo, but study of these is beyond this text.

Comment 10.9: Bayesian estimation is controversial in traditional statistics. Many statisticians object to the idea that θ is random and furthermore argue the a priori distribution f(θ) represents the statistician's biases and should be avoided.
The counterargument made by many engineers and Bayesian statisticians is that a random θ is perfectly reasonable in many applications. They also argue that the a priori distribution represents the engineer's or statistician's prior knowledge gained from past experience. Regardless of the philosophical debate, Bayesian estimation is growing in popularity. It is especially handy in sequential estimation, when the estimate is updated with each new observation.

PROBLEMS

10.1 List several practical (nonstatistical) reasons why election polls can be misleading.

10.2 Ten data samples are 1.47, 2.08, 3.77, 1.01, 0.42, 0.77, 3.17, 2.89, 2.42, and −0.65. Compute the following:
a. the sample mean
b. the sample variance
c. the sample distribution function
d. a parametric estimate of the density, assuming the data are Gaussian with the sample mean and variance

10.3 Ten data samples are 1.44, 7.62, 15.80, 14.14, 3.54, 11.76, 14.40, 12.33, 7.08, and 3.40. Compute the following:
a. the sample mean
b. the sample variance
c. the sample distribution function
d. a parametric estimate of the density, assuming the data are exponential with the sample mean

10.4 Generate 100 samples from an N(0,1) distribution.
a. Compute the sample mean and variance.
b. Plot the sample distribution function against a Gaussian distribution function.
c. Plot the parametric estimate of the density using the sample mean and variance against the actual Gaussian density.

10.5 Generate 100 samples from an exponential distribution with λ = 1.
a. Compute the sample mean and variance.
b. Plot the sample distribution function against an exponential distribution function.
c. Plot the parametric estimate of the density using the sample mean against the actual exponential density.
10.6 The samples below are IID from one of three distributions: N(μ, σ²), exponential with parameter λ, or uniform U(a, b) where b > a. For each data set, determine which distribution best describes the data (including values for the unknown parameters). Justify your answers.
a. X_i = [0.30, 0.48, −0.24, −0.04, 0.023, −0.37, −0.18, −0.02, 0.47, 0.46]
b. X_i = [2.41, 2.41, 5.01, 1.97, 3.53, 3.14, 4.74, 3.03, 2.02, 4.01]
c. X_i = [1.87, 2.13, 1.82, 0.07, 0.22, 1.43, 0.74, 1.20, 0.61, 1.38]
d. X_i = [0.46, −0.61, −0.95, 1.98, −0.13, 1.57, 1.01, 1.44, 2.68, −3.31]

10.7 The samples below are IID from one of three distributions: N(μ, σ²), exponential with parameter λ, or uniform U(a, b) where b > a. For each data set, determine which distribution best describes the data (including values for the unknown parameters). Justify your answers.
a. X_i = [1.50, 0.52, 0.88, 0.18, 1.24, 0.41, 0.32, 0.14, 0.23, 0.96]
b. X_i = [3.82, 4.39, 4.75, 0.74, 3.08, 1.48, 3.60, 1.45, 2.71, −1.75]
c. X_i = [−0.18, −0.22, 0.45, −0.04, 0.32, 0.38, −0.48, −0.01, −0.45, −0.23]
d. X_i = [−0.85, 0.62, −0.22, −0.69, −1.94, 0.80, −1.23, −0.03, 0.01, −1.28]

10.8 Let X_i for i = 1, 2, …, n be a sequence of independent, but not necessarily identically distributed, random variables with common mean E[X_i] = μ and different variances Var[X_i] = σ_i². Let T = Σ_{i=1}^n a_i X_i be an estimator of μ. Use Lagrange multipliers to minimize the variance of T subject to the constraint E[T] = μ. What are the resulting a_i? (This problem shows how to combine observations of the same quantity, where the observations have different accuracies.)

10.9 Show Equation (10.6), which is an excellent exercise in the algebra of expected values. A crucial step is the expected value of X_i X_j:

E[X_i X_j] = μ² + σ²  for i = j,  and  E[X_i X_j] = μ²  for i ≠ j

10.10 The exponential weight in Section 10.4 can be interpreted as a lowpass filter operating on a discrete time sequence. Compute the frequency response of the filter, and plot its magnitude and phase.

10.11 Let X_i be n IID U(0,1) random variables. What are the mean and variance of the minimum-order and maximum-order statistics?

10.12 Let X_i be n IID U(0,1) random variables. Plot, on the same axes, the distribution functions of X, of X_(1), and of X_(n).

10.13 Let X_1, X_2, …, X_n be n independent, but not necessarily identically distributed, random variables. Let X_(k;m) denote the kth-order statistic taken from the first m random variables. Then,

Pr[X_(k;n) ≤ x] = Pr[X_(k−1;n−1) ≤ x] Pr[X_n ≤ x] + Pr[X_(k;n−1) ≤ x] Pr[X_n > x]

a. Justify the recursion above. Why is it true?
b. Rewrite the recursion using distribution functions.
c. Recursive calculations need boundary conditions. What are the boundary conditions for the recursion?

10.14 Write a short program using the recursion in Problem 10.13 to calculate order statistic distributions.
a. Use your program to calculate the distribution function of the median of n = 9 Gaussian N(0,1) random variables.
b. Compare the computer calculation in part a, above, to Equation (10.9). For example, show the recursion calculates the same values as Equation (10.9).

10.15 Use the program you developed in Problem 10.14 to compute the mean of the median-order statistic of n = 5 random variables:
a. X_i ~ N(0,1) for i = 1, 2, …, 5.
b. X_i ~ N(0,1) for i = 1, 2, …, 4 and X_5 ~ N(1,1).
c. X_i ~ N(0,1) for i = 1, 2, 3 and X_i ~ N(1,1) for i = 4, 5.

10.16 If X_1, X_2, …, X_n are n IID continuous random variables, then the density of the kth-order statistic can be written as

f_{X_(k)}(x) dx = (n; k−1, 1, n−k) F^{k−1}(x) · f(x) dx · (1 − F(x))^{n−k}

where (n; k−1, 1, n−k) = n!/((k−1)! 1! (n−k)!) is a multinomial coefficient.
a. Justify the formula above. Why is it true?
b. Use the formula to compute and plot the density of the median of n = 5 N(0,1) random variables.

10.17 Repeat Example 10.10, but using the parameterization of the exponential distribution in Equation (8.14). In other words, what is the censored estimate of μ?

10.18 Assuming X is continuous, what value of x maximizes the variance of the distribution function estimate F̂(x) (Equation 10.10)?

10.19 Write a short computer function to compute a KDE using a Gaussian kernel and Silverman's rule for h. Your program should take two inputs: a sequence of observations and a sequence of target values. It should output a sequence of density estimates, one density estimate for each value of the target sequence. Test your program by reproducing Figure 10.3.

10.20 Use the KDE function in Problem 10.19 to compute a KDE of the density for the data in Problem 10.2.

10.21 Use the values for a (Equation 10.21) and b (Equation 10.22) to derive Equation (10.23).

10.22 Repeat the steps of Example 10.10 for the alternative parameterization of the exponential f(t) = (1/t_0) e^{−t/t_0} to find the MLE of t_0.

10.23 Let X_i be IID U(0, θ), where θ > 0 is an unknown parameter that is to be estimated. What is the MLE of θ?

10.24 Repeat the calculation of the sequence of probability estimates of p in Example 10.11 using α = β = 0.5.

CHAPTER 11

GAUSSIAN RANDOM VECTORS AND LINEAR REGRESSION

Multiple Gaussian random variables are best dealt with as random vectors. Many of the properties of Gaussian random vectors become the properties of vectors and matrices. This chapter also introduces linear regression, a common technique for estimating the parameters in linear models.

11.1 GAUSSIAN RANDOM VECTORS

Multiple Gaussian random variables occur in many applications. The easiest way to manipulate multiple random variables is to introduce random vectors. This section begins by discussing multiple Gaussian random variables, then introduces random vectors and concludes with some properties of Gaussian random vectors.

Let X, Y, and Z be independent Gaussian random variables. Then, the joint distribution function is the product of the individual distribution functions:

F_XYZ(x, y, z) = Pr[{X ≤ x} ∩ {Y ≤ y} ∩ {Z ≤ z}]
              = Pr[X ≤ x] · Pr[Y ≤ y] · Pr[Z ≤ z]    (by independence)
              = F_X(x) F_Y(y) F_Z(z)

The joint density is

f_XYZ(x, y, z) = ∂³F_XYZ(x, y, z)/∂x∂y∂z
              = (d F_X(x)/dx)(d F_Y(y)/dy)(d F_Z(z)/dz)
              = f_X(x) f_Y(y) f_Z(z)
              = 1/(σ_x σ_y σ_z (2π)^{3/2}) · exp(−(x−μ_x)²/(2σ_x²) − (y−μ_y)²/(2σ_y²) − (z−μ_z)²/(2σ_z²))

These expressions quickly get unwieldy as the number of random variables grows. To simplify the notation, and to improve the understanding, it is easier to use random vectors. We use three different notations for vectors and matrices, depending on the situation. From simplest to most complicated, a vector can be represented as

x = [x_1, x_2, …, x_n]^T

and a matrix as

A = [a_ij] = [ a_11 … a_1n ; a_21 … a_2n ; ⋮ ; a_n1 … a_nn ]

A random vector is a vector,

x = [X_1, X_2, …, X_n]^T

whose components are random variables, where each X_i is a random variable.

Comment 11.1: In this chapter, we deviate slightly from our usual practice of writing random variables as bold-italic capital letters. We write random vectors as bold-italic lowercase letters adorned with the vector symbol (the small arrow on top of the letter). This allows us to use lowercase letters for vectors and uppercase letters for matrices. We still use bold-italic capital letters for the components of random vectors, as these components are random variables.

Expected values of random vectors are defined componentwise:

E[x] = [E[X_1], E[X_2], …, E[X_n]]^T = [μ_1, μ_2, …, μ_n]^T = μ

Correlations and covariances are defined in terms of matrices:

x x^T = [ X_1X_1  X_1X_2  …  X_1X_n ;
          X_2X_1  X_2X_2  …  X_2X_n ;
          ⋮ ;
          X_nX_1  X_nX_2  …  X_nX_n ]    (11.1)

The autocorrelation matrix is

R_xx = E[x x^T] = [ E[X_1X_1]  E[X_1X_2]  …  E[X_1X_n] ;
                    E[X_2X_1]  E[X_2X_2]  …  E[X_2X_n] ;
                    ⋮ ;
                    E[X_nX_1]  E[X_nX_2]  …  E[X_nX_n] ]
     = [ r_11  r_12  …  r_1n ;
         r_21  r_22  …  r_2n ;
         ⋮ ;
         r_n1  r_n2  …  r_nn ]    (11.2)

Where possible, we drop the subscripts on R, μ, and C (see below). The covariance matrix, C = C_xx, is

C = E[(x − μ)(x − μ)^T] = R − μμ^T    (11.3)

Recall that a matrix A is symmetric if A = A^T. It is nonnegative definite if x^T A x ≥ 0 for all x ≠ 0. It is positive definite if x^T A x > 0 for all x ≠ 0.

Since X_i X_j = X_j X_i (ordinary multiplication commutes), r_ij = r_ji. Thus, R is symmetric. Similarly, C is symmetric. R and C are also nonnegative definite. To see this, let a be an arbitrary nonzero vector. Then,

a^T R a = a^T E[x x^T] a
        = E[a^T x x^T a]
        = E[(a^T x)(x^T a)]
        = E[Y^T Y]    (letting Y = x^T a)
        = E[Y²]    (Y^T = Y since Y is a 1 × 1 matrix)
        ≥ 0

In the above argument, Y = x^T a is a 1 × 1 matrix (i.e., a scalar). Therefore, Y^T = Y and Y^T Y = Y². The same argument shows that C is nonnegative definite.
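The argument above can be verified numerically. The following sketch (our own, in Python/NumPy; the particular matrix A is an arbitrary illustration) estimates R from samples of a correlated random vector and checks symmetry and a^T R a ≥ 0:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a 3-component random vector x = A z with correlated components.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
samples = rng.standard_normal((10_000, 3)) @ A.T   # rows are realizations of x^T

R = samples.T @ samples / len(samples)   # sample estimate of R = E[x x^T]

print(np.allclose(R, R.T))               # True: r_ij = r_ji, so R is symmetric

a = rng.standard_normal(3)               # an arbitrary vector
print(a @ R @ a >= 0.0)                  # True: a^T R a = E[(x^T a)^2] >= 0
```

The same check applies to the covariance matrix C after subtracting the sample mean.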

Sometimes, there are two different random vectors, x and y. The joint correlation and covariance (and subscripts are needed here) are the following:

R_xy = E[x y^T]
C_xy = E[(x − μ_x)(y − μ_y)^T] = R_xy − μ_x μ_y^T

R_xy and C_xy are, in general, neither symmetric nor nonnegative definite.

The determinant of a matrix is denoted |C|. If the covariance matrix is positive definite, then it is also invertible. If so, then we say x ~ N(μ, C) if

f_x(x) = 1/√((2π)^n |C|) · exp(−(x − μ)^T C^{−1} (x − μ)/2)    (11.4)

If C is diagonal, the density simplifies considerably:

C = diag(σ_1², σ_2², …, σ_n²)        C^{−1} = diag(1/σ_1², 1/σ_2², …, 1/σ_n²)

The quadratic form in the exponent becomes

(x − μ)^T C^{−1} (x − μ) = Σ_{i=1}^n (x_i − μ_i)²/σ_i²

The overall density then factors into a product of individual densities:

f_x(x) = 1/(σ_1 σ_2 ··· σ_n √((2π)^n)) · exp(−Σ_{i=1}^n (x_i − μ_i)²/(2σ_i²))

Thus, if C is diagonal, the X_i are independent. This is an important result. It shows that if Gaussian random variables are uncorrelated, which means C is diagonal, then they are independent. In practice, it is difficult to verify that random variables are independent, but it is often easier to show the random variables are uncorrelated.
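This factorization is easy to check numerically. The sketch below (ours, in Python/NumPy; the particular μ, C, and x are arbitrary illustrations) evaluates Equation (11.4) for a diagonal C and compares it with the product of the corresponding one-dimensional Gaussian densities:

```python
import numpy as np

def gaussian_vector_pdf(x, mu, C):
    """Equation (11.4): density of x ~ N(mu, C) for positive definite C."""
    n = len(mu)
    d = x - mu
    quad = d @ np.linalg.inv(C) @ d
    return np.exp(-quad / 2.0) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(C))

def gaussian_scalar_pdf(x, mu, var):
    """Density of a single N(mu, var) random variable."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

mu = np.array([1.0, -2.0, 0.5])
C = np.diag([0.5, 2.0, 1.0])            # diagonal covariance: uncorrelated components
x = np.array([0.3, -1.0, 1.2])

joint = gaussian_vector_pdf(x, mu, C)
product = np.prod([gaussian_scalar_pdf(x[i], mu[i], C[i, i]) for i in range(3)])
print(abs(joint - product))             # agrees to floating-point precision
```

With a nondiagonal C the two quantities would differ, reflecting the dependence among the components.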

Comment 11.2: In Comment 8.2, we pointed out that to show two random variables are independent, it is necessary to show the joint density factors for all values of x and y. When X and Y are jointly Gaussian, the process is simpler. All that needs to be done is to show X and Y are uncorrelated (i.e., that E[XY] = E[X] · E[Y]). Normally, uncorrelated random variables are not necessarily independent, but for Gaussian random variables, uncorrelated means independent.

The MGF of a vector Gaussian random variable is useful for calculating moments and showing that linear operations on Gaussian random variables result in Gaussian random variables. First, recall the MGF of a single N(μ, σ²) random variable is Equation (9.15):

M(u) = E[e^{uX}] = exp(uμ + σ²u²/2)

If X_1, X_2, …, X_n are IID N(μ, σ²), then the MGF of the vector x is the product of the individual MGFs:

M(u_1, u_2, …, u_n) = E[e^{u_1X_1 + u_2X_2 + ··· + u_nX_n}]
 = E[e^{u_1X_1}] E[e^{u_2X_2}] ··· E[e^{u_nX_n}]    (by independence)
 = M(u_1) M(u_2) ··· M(u_n)
 = exp(u_1μ + u_1²σ²/2) · exp(u_2μ + u_2²σ²/2) ··· exp(u_nμ + u_n²σ²/2)
 = exp((u_1 + u_2 + ··· + u_n)μ) · exp((u_1² + u_2² + ··· + u_n²)σ²/2)

The last expression can be simplified with matrices and vectors:

M(u) = E[e^{u^T x}] = exp(u^T μ + σ² u^T u / 2)    (11.5)

where u = [u_1, u_2, …, u_n]^T and μ = [μ, μ, …, μ]^T. The MGF of the general case x ~ N(μ, C) is the same as Equation (11.5), but with C replacing σ²I:

M(u) = exp(u^T μ + u^T C u / 2)    (11.6)

In Table 11.1, we list several equivalents between formulas for one Gaussian random variable and formulas for Gaussian vectors.

11.2 Linear Operations on Gaussian Random Vectors

TABLE 11.1 Table of equivalents for one Gaussian random variable and n Gaussian random variables.

1 Gaussian random variable | n Gaussian random variables
X ~ N(μ, σ²) | x ~ N(μ, C)
E[X] = μ | E[x] = μ
Var[X] = σ² | Cov[x] = C
f_X(x) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)) | f_x(x) = (1/√((2π)^n |C|)) exp(−(x−μ)^T C^{−1} (x−μ)/2)
M(u) = exp(uμ + σ²u²/2) | M(u) = exp(u^T μ + u^T C u / 2)

… Ĥ = 1 if N > 1.5, and choose Ĥ = 0 if N < 1.5; calculate Pr[FP], Pr[TP], Pr[FN], and Pr[TN].

12.5 Consider a simple discrete hypothesis test. Under H_0, N is uniform on the integers from 0 to 5; under H_1, N is binomial with parameters n = 5 and p.
a. What is the likelihood ratio of this test?
b. Calculate and plot the ROC curve assuming p = 0.5.
c. Assuming the two hypotheses are equally likely, what is the MAP test?

12.6 Consider a simple hypothesis test. Under H_0, X is exponential with parameter λ = 2; under H_1, X ~ U(0,1).
a. What is the likelihood ratio of this test?
b. Calculate and plot the ROC curve.
c. Assuming the two hypotheses are equally likely, what is the MAP detector?

12.7 Consider a simple hypothesis test. Under H_0, X is exponential with parameter λ = 1; under H_1, X is exponential with parameter λ = 2.
a. What is the likelihood ratio of this test?
b. Calculate and plot the ROC curve.
c. Assuming the two hypotheses are equally likely, what is the MAP detector?

12.8 Consider a simple hypothesis test. Under H_0, X ~ N(0, σ_0²); under H_1, X ~ N(0, σ_1²) with σ_1² > σ_0².
a. What is the likelihood ratio of this test?
b. What is the Neyman-Pearson likelihood ratio test?
c. Calculate Pr[FP] and Pr[TP] as functions of the test.
d. Plot the ROC curve assuming σ_1² = 2σ_0².
e. Assuming the two hypotheses are equally likely, what is the MAP detector?

12.9 In any hypothesis test, it is possible to achieve Pr[TP] = Pr[FP] = p, where 0 ≤ p ≤ 1, by inserting some randomness into the decision procedure.
a. How is this possible?
b. Draw the ROC curve for such a procedure. (Hint: the AUC is 0.5 for such a test; see Problem 12.2.)

12.10 For the hypothesis testing example shown in Figure 12.1 with x_γ = 1.5, calculate the following probabilities:
a. Pr[FP]
b. Pr[TN]
c. Pr[FN]
d. Pr[TP]

12.11 Repeat the radar example, but with Laplacian rather than Gaussian noise. That is, assume H_0: X = s + N with s > 0 and H_1: X = N, with N Laplacian with parameter λ.
a. Compute and plot the log-likelihood ratio.
b. Compute and plot the ROC curve. Note that since the log-likelihood ratio has three regimes, the ROC curve will have three regimes.

12.12 Consider a binary communications problem Y = X + N, where Pr[X = 1] = Pr[X = −1] = 0.5, N ~ N(0, σ²), and X and N are independent.
a. What is the MAP detector for this problem?
b. What is the error rate for this detector?
c. What happens to the MAP detector if Pr[X = 1] = p > 0.5 and Pr[X = −1] = 1 − p < 0.5? That is, how does the optimal decision rule change?
d. What happens to the probability of error when p > 0.5? What is the limit as p → 1?

CHAPTER 13

RANDOM SIGNALS AND NOISE

We talk, we listen, we see. All the interesting signals we experience are random. In truth, nonrandom signals are uninteresting; they are completely predictable. Random signals are unknown and irregular and therefore much more interesting. If Alice wants to communicate information to Bob, then that information must be unknown to Bob (before the communication takes place). In other words, the signal is random to Bob. This chapter introduces random signals and presents their probabilistic and spectral properties. We pay particular attention to a class of random signals called wide sense stationary (WSS), as these appear in many engineering applications. We also discuss noise and linear filters.

13.1 INTRODUCTION TO RANDOM SIGNALS

A random signal is a random function of time. Engineering has many examples of random signals. For example, a speech fricative (e.g., the "th" or "f" sound) is made by passing turbulent air through a narrow opening such as between the teeth and lips. Fricatives are noise-like. Other examples are the random voltages in an electric circuit, temperatures in the atmosphere, photon counts in a pixel detector, and radio signals. In this section, we give a quick introduction to random signals and the more general class, random processes.

Let X(t) be a random process. Fix t as, say, t_0. Then, X(t_0) is a random variable. Fix another time, t_1. Then, X(t_1) is another random variable. In general, let t_1, t_2, …, t_n be times. Then, X(t_1), X(t_2), …, X(t_n) are random variables. Let f(x; t) denote the density of X(t). The joint density of X(t_1) and X(t_2) is f(x_1, x_2; t_1, t_2). In general, the nth-order density is f(x_1, x_2, …, x_n; t_1, t_2, …, t_n). The mean of X(t) is μ(t):

μ(t) = E[X(t)] = ∫_{−∞}^{∞} x f(x; t) dx
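As a concrete illustration of these definitions (a hypothetical process of our own choosing, not one from the text), take X(t) = A cos(t) with A ~ N(0,1). Fixing t yields a random variable, and averaging many realizations estimates μ(t) = 0 and the correlation E[X(t_1)X(t_2)] = E[A²] cos(t_1) cos(t_2):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal(100_000)       # one random amplitude per realization

t1, t2 = 0.5, 1.5
x1 = A * np.cos(t1)                    # X(t1) across realizations
x2 = A * np.cos(t2)                    # X(t2) across realizations

print(x1.mean())                       # near mu(t1) = 0
print((x1 * x2).mean())                # near cos(0.5) * cos(1.5) ≈ 0.062
```

Each fixed time gives a different random variable, yet all are built from the same underlying randomness, which is exactly the random-process viewpoint.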

In the integral, the density is a function of time. Therefore, the mean is a function of time. Similarly, the second moment and variance are functions of time:

E[X²(t)] = ∫_{−∞}^{∞} x² f(x; t) dx
σ²(t) = Var[X(t)] = E[X²(t)] − μ(t)²

The autocorrelation R_XX(t_1, t_2) and autocovariance C_XX(t_1, t_2) of a random process are

R_XX(t_1, t_2) = E[X(t_1)X(t_2)]
C_XX(t_1, t_2) = E[(X(t_1) − μ(t_1))(X(t_2) − μ(t_2))]
              = E[X(t_1)X(t_2)] − μ(t_1)μ(t_2)
              = R_XX(t_1, t_2) − μ(t_1)μ(t_2)    (13.1)

The autocorrelation and autocovariance are measures of how correlated the random process is at two different times. The autocorrelation and covariance functions are symmetric in their two arguments. The proof follows because multiplication commutes:

R_XX(t_1, t_2) = E[X(t_1)X(t_2)] = E[X(t_2)X(t_1)] = R_XX(t_2, t_1)    (13.2)

C_XX(t_1, t_2) = R_XX(t_1, t_2) − E[X(t_1)]E[X(t_2)]
              = R_XX(t_2, t_1) − E[X(t_2)]E[X(t_1)] = C_XX(t_2, t_1)    (13.3)

When it is clear what random process is being referred to, we drop the subscripts on R_XX and C_XX:

R(t_1, t_2) = R_XX(t_1, t_2)
C(t_1, t_2) = C_XX(t_1, t_2)

The cross-correlation between two random processes is

R_XY(t_1, t_2) = E[X(t_1)Y(t_2)]

13.2 A SIMPLE RANDOM PROCESS

To illustrate the calculations, we present a simple random process and compute its probabilities and moments. Let X(t) = X_0 for −∞ < t < ∞ […]

[…] Pr[|X̂_a(t) − X_a(t)| > ε] ≤ E[(X̂_a(t) − X_a(t))²]/ε² = 0

Since X̂_a(t) approximates X_a(t) in the mean squared sense, the probability that X̂_a(t) differs from X_a(t) is 0 for all t.

Comment 13.10: While we cannot say X̂_a(t) = X_a(t) for all t, we can say the probability they are the same is 1. This is a mathematical technicality that does not diminish the utility of the sampling theorem for random signals.

13.9.2 Example: Figure 13.4

Figure 13.4 shows an example of the sampling theorem applied to a random signal. The top graph shows a Gaussian WSS signal. The middle graph shows the signal sampled with sampling time T = 1. The bottom graph shows the sinc reconstruction.

[FIGURE 13.4 Sampling theorem example for a random signal. The top graph shows the original signal, the middle graph the sampled signal, and the bottom graph the reconstructed signal.]

Since 1/T = 1, the highest frequency in X(t) must be 0.5 or less. We chose 0.4. In practice, a little oversampling helps compensate for the three problems in sinc reconstruction: the signal is only approximately bandlimited, the sinc function must be discretized, and the sinc function must be truncated.

Even though the first graph shows a continuous signal, it is drawn by oversampling a discrete signal (in this case, oversampled by a factor of four) and "connecting the dots." For 25 seconds, we generate 101 samples (25 · 4 + 1; the signal goes from t = 0 to t = 25, inclusive). The signal is generated with the following Matlab command:

x = randn(1, 101);

Since the signal is oversampled by a factor of 4, the upper frequency must be adjusted to 0.4/4 = 0.1.
A filter (just about any lowpass filter with a cutoff of 0.1 would work) is designed by

b = firrcos(30, 0.1, 0.02, 1);

The signal is filtered and the central portion extracted:

xa = conv(x, b);
xa = xa(30:130);

This signal is shown in the top graph (after connecting the dots). The sampled signal (middle graph) is created by zeroing the samples we do not want:

xn = xa;
xn(2:4:end) = 0;
xn(3:4:end) = 0;
xn(4:4:end) = 0;

For the reconstruction, the sinc function must be discretized and truncated:

s = sinc(-5:0.25:5);

Note that the sinc function is sampled at four samples per period, the same rate at which the signal is sampled. The reconstruction is created by convolving the sinc function with the sampled signal:

xahat = conv(xn, s);
xahat = xahat(20:121);

13.9.3 Proof of the Random Sampling Theorem

In this section, we present the basic steps of the proof of the sampling theorem for WSS random signals. The MSE can be expanded as

E[(X̂_a(t) − X_a(t))²] = E[X_a²(t)] − 2E[X_a(t)X̂_a(t)] + E[X̂_a²(t)]

The first term simplifies to E[X_a²(t)] = R(0). If X̂_a(t) ≈ X_a(t), then it must be true that E[X̂_a(t)X_a(t)] → R(0) and E[X̂_a²(t)] → R(0). If these are true, then

E[(X̂_a(t) − X_a(t))²] = R(0) − 2R(0) + R(0) = 0

The crux of the proof is to show these two equalities, E[X̂_a(t)X_a(t)] → R(0) and E[X̂_a²(t)] → R(0). The proof of these equalities relies on several facts. First, note that the autocorrelation function, R(τ), can be thought of as a deterministic bandlimited signal (the Fourier transform of R is bandlimited). Therefore, R(τ) can be written in a sinc expansion:

R(τ) = Σ_{n=−∞}^{∞} R(nT) sinc((τ − nT)/T)

Second, R is symmetric, R(τ) = R(−τ). Third, the sinc function is also symmetric, sinc(t) = sinc(−t).
Consider the first equality:

E[Xa(t)X̂a(t)] = E[Xa(t) Σ_{n=-∞}^{∞} X(n) sinc((t - nT)/T)]
             = Σ_{n=-∞}^{∞} E[Xa(t)X(n)] sinc((t - nT)/T)
             = Σ_{n=-∞}^{∞} R(nT - t) sinc((t - nT)/T)                    (13.8)

Let R'(nT) = R(nT - t). Since it is just a time-shifted version of R, R' is bandlimited (the Fourier transform of R' is phase-shifted, but the magnitude is the same as that of R). Then,

E[Xa(t)X̂a(t)] = Σ_{n=-∞}^{∞} R'(nT) sinc((t - nT)/T)
             = R'(t)          (the sampling theorem)
             = R(t - t)       (replace R'(t) = R(t - t))
             = R(0)

Now, look at the second equality:

E[X̂a²(t)] = E[ Σ_{n=-∞}^{∞} X(n) sinc((t - nT)/T) Σ_{m=-∞}^{∞} X(m) sinc((t - mT)/T) ]
          = Σ_{n=-∞}^{∞} Σ_{m=-∞}^{∞} E[X(n)X(m)] sinc((t - nT)/T) sinc((t - mT)/T)
          = Σ_{n=-∞}^{∞} Σ_{m=-∞}^{∞} R(mT - nT) sinc((t - nT)/T) sinc((t - mT)/T)
          = Σ_{n=-∞}^{∞} ( Σ_{m=-∞}^{∞} R(mT - nT) sinc((t - mT)/T) ) sinc((t - nT)/T)

Consider the expression in the large parentheses. As above, let R'(mT) = R(mT - nT). Then, the expression is a sinc expansion of R'(t):

E[X̂a²(t)] = Σ_{n=-∞}^{∞} R'(t) sinc((t - nT)/T)
          = Σ_{n=-∞}^{∞} R(t - nT) sinc((t - nT)/T)
          = Σ_{n=-∞}^{∞} R(nT - t) sinc((t - nT)/T)      (same as Equation 13.8)
          = R(0)

Thus, we have shown both equalities, which implies E[(Xa(t) - X̂a(t))²] = 0.

In summary, the sampling theorem applies to bandlimited WSS random signals as well as bandlimited deterministic signals. In practice, the sampling theorem provides mathematical support for converting back and forth between continuous and discrete time.

SUMMARY

A random signal (also known as a random process) is a random function of time. Let f(x;t) denote the density of X(t). The joint density of X(t1) and X(t2) is f(x1, x2; t1, t2).
The first and second moments of X(t) are

μ(t) = E[X(t)] = ∫_{-∞}^{∞} x f(x;t) dx

E[X²(t)] = E[X(t)X(t)] = ∫_{-∞}^{∞} x² f(x;t) dx

Var[X(t)] = E[X²(t)] - μ(t)²

The autocorrelation Rxx(t1, t2) and autocovariance Cxx(t1, t2) of a random process are

Rxx(t1, t2) = E[X(t1)X(t2)]

Cxx(t1, t2) = E[(X(t1) - μ(t1))(X(t2) - μ(t2))] = Rxx(t1, t2) - μ(t1)μ(t2)

The Fourier transform of x(t) is X(ω):

X(ω) = F{x(t)} = ∫_{-∞}^{∞} x(t) e^{-jωt} dt

The inverse Fourier transform takes a frequency function and computes a time function:

x(t) = F^{-1}{X(ω)} = (1/2π) ∫_{-∞}^{∞} X(ω) e^{jωt} dω

A random process is wide sense stationary (WSS) if it satisfies two conditions:

1. The mean is constant in time; that is, E[X(t)] = μ = constant.

2. The autocorrelation depends only on the difference of the two times, not on both times individually; that is, Rxx(t1, t2) = Rxx(t2 - t1).

The power spectral density (PSD) Sxx(ω) of a WSS random process is the Fourier transform of the autocorrelation function:

Sxx(ω) = F{R(τ)} = ∫_{-∞}^{∞} R(τ) e^{-jωτ} dτ

We have three ways to compute the average power of a WSS random process:

average power = σ² + μ² = R(0) = (1/2π) ∫_{-∞}^{∞} S(ω) dω

Let X(t) be a continuous time WSS random process that is input to a linear filter, h(t), and let the output be Y(t) = X(t) * h(t). The PSD of Y(t) is the PSD of X(t) multiplied by the filter magnitude squared, |H(ω)|².

The three most important types of noise in typical electrical systems are quantization noise (uniform), Poisson (shot) noise, and Gaussian thermal noise. When the noises are uncorrelated, the variances add:

N = N1 + N2 + ··· + Nk

Var[N] = Var[N1] + Var[N2] + ··· + Var[Nk]

White noise has a PSD that is constant for all frequencies, Sxx(ω) = N0 for -∞ < ω < ∞. The autocorrelation function, therefore, is a delta function:

Rxx(τ) = N0 δ(τ)

The amplitude modulation of a random sinusoid by a WSS signal, Y(t) = A(t)cos(ωc t + Θ), is WSS, with PSD

Syy(ω) = ( S_AA(ω - ωc) + S_AA(ω + ωc) ) / 2

The discrete time Wiener filter estimates S by a linear combination of past samples of X:

Ŝ(n) = Σ_{k=0}^{p} a_k X(n - k)

The optimal coefficients satisfy the Wiener-Hopf equations:

0 = Rxs(l) - Σ_{k=0}^{p} a_k Rxx(l - k)      for l = 0, 1, ..., p

The sampling theorem is the theoretical support of using digital computers to process analog signals. It applies to WSS signals as well:

Theorem 13.3: If Xa(t) is bandlimited to W Hz and sampled at rate 1/T > 2W, then Xa(t) can be reconstructed from its samples, X(n) = Xa(nT):

X̂a(t) = Σ_{n=-∞}^{∞} X(n) sin(π(t - nT)/T) / (π(t - nT)/T) = Σ_{n=-∞}^{∞} X(n) sinc((t - nT)/T)

The reconstruction X̂a(t) approximates Xa(t) in the mean squared sense:

E[(Xa(t) - X̂a(t))²] = 0
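As a concrete illustration of the Wiener-Hopf equations summarized above, the coefficients a_k solve the symmetric Toeplitz linear system Σ_k a_k Rxx(l - k) = Rxs(l). The correlation values below are hypothetical, chosen only to show the mechanics; scipy's Levinson-based Toeplitz solver is assumed.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

# Hypothetical correlation lags for an order p = 2 Wiener filter.
rxx = np.array([1.0, 0.8, 0.5])   # Rxx(0), Rxx(1), Rxx(2)
rxs = np.array([0.8, 0.5, 0.2])   # Rxs(0), Rxs(1), Rxs(2)

# Solve sum_k a_k Rxx(l - k) = Rxs(l) for l = 0, 1, ..., p.
a = solve_toeplitz(rxx, rxs)

# Sanity check against the explicitly formed Toeplitz system.
assert np.allclose(toeplitz(rxx) @ a, rxs)
```

If the desired signal is the one-step prediction S(n) = X(n+1), then Rxs(l) = Rxx(l+1) and this system becomes the Yule-Walker equations of Problem 13.10.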

PROBLEMS

13.1 If X(t) and Y(t) are independent WSS signals, is X(t) - Y(t) WSS?

13.2 A drunkard's random walk is a random process defined as follows: let S(n) = 0 for n ≤ 0 and S(n) = X1 + X2 + ··· + Xn, where the Xi are IID with probability Pr[Xi = 1] = Pr[Xi = -1] = 0.5.

a. What are the mean and variance of S(n)?
b. What is the autocorrelation function of S(n)?
c. Is S(n) WSS?
d. What does the CLT say about S(n) for large n?
e. Use a numerical package to generate and plot an example sequence S(n) for n = 0, 1, ..., 100.

13.3 Used to model price movements of financial instruments, the Gaussian random walk is a random process defined as follows: let S(n) = 0 for n ≤ 0 and S(n) = X1 + X2 + ··· + Xn, where the Xi are IID N(0, σ²).

a. What are the mean and variance of S(n)?
b. What is the autocorrelation function of S(n)?

c. Is S(n) WSS?
d. Use a numerical package to generate and plot an example sequence S(n) for n = 0, 1, ..., 100. Use σ² = 1.

13.4 What is the Fourier transform of x(t) sin(ωc t)?

13.5 The filtered signal PSD result in Equation (13.6) holds in discrete time as well as continuous time. Repeat the derivation in Section 13.5 with sums instead of integrals to show this.

13.6 Let X(t) be a WSS signal, and let Θ ~ U(0, 2π) be independent of X(t). Form Xc(t) = X(t)cos(ωt + Θ) and Xs(t) = X(t)sin(ωt + Θ).

a. Are Xc(t) and Xs(t) WSS? If so, what are their autocorrelation functions?
b. What is the cross-correlation between Xc(t) and Xs(t)?

13.7 Let X(t) be a Gaussian white noise with variance σ². It is filtered by a perfect lowpass filter with magnitude |H(ω)| = 1 for |ω| < ωc and |H(ω)| = 0 for other values of ω. What is the autocorrelation function of the filtered signal?

13.8 Let X(t) be a Gaussian white noise with variance σ². It is filtered by a perfect bandpass filter with magnitude |H(ω)| = 1 for ω1 < |ω| < ω2 and |H(ω)| = 0 for other values of ω. What is the autocorrelation function of the filtered signal?

13.9 The advantage of sending an unmodulated carrier is that receivers can be built cheaply with simple hardware. This was especially important in the early days of radio. The disadvantage is that the unmodulated carrier requires a considerable fraction of the total available power at the transmitter. Standard broadcast AM radio stations transmit an unmodulated carrier as well as the modulated carrier. Let W(t) be a WSS baseband signal (i.e., the voice or music) scaled so that |W(t)| ≤ 1. Then, A(t) = W(t) + 1.

a. What are the autocorrelation functions and power spectral densities of A(t) and Y(t)?
b. Redraw Figure 13.2 to reflect the presence of the carrier.

13.10 A common use of the Wiener filter is when S(n) = X(n + 1), that is, when the desired signal S(n) is a prediction of X(n + 1). What do the Wiener-Hopf equations look like in this case? (These equations are known as the Yule-Walker equations.)

13.11 Create your own version of Figure 13.4.

CHAPTER 14 SELECTED RANDOM PROCESSES

A great variety of random processes occur in applications. Advances in computers have allowed the simulation and processing of increasingly large and complicated models. In this chapter, we consider three important random processes: the Poisson process, Markov chains, and the Kalman filter.

14.1 THE LIGHTBULB PROCESS

An elementary discrete random process is the lightbulb process. This simple process helps illustrate the kinds of calculations done in studying random processes. A lightbulb is turned on at time 0 and continues until it fails at some random time T.
Let X(t) = 1 if the lightbulb is working and X(t) = 0 if it has failed. We assume the failure time T is exponentially distributed. Its distribution function is

Pr[T ≤ t] = F_T(t) = 1 - e^{-λt}

Therefore, the probability mass function of X(t) is

Pr[X(t) = 1] = Pr[t ≤ T] = 1 - F_T(t) = e^{-λt}
Pr[X(t) = 0] = Pr[T < t] = F_T(t) = 1 - e^{-λt}

A sample realization of the random process is shown in Figure 14.1. X(t) starts out at 1, stays at 1 for awhile, then switches to X(t) = 0 at T = t and stays at 0 forever afterward.

What are the properties of X(t)? The mean and variance are easily computed:

E[X(t)] = 0·Pr[X(t) = 0] + 1·Pr[X(t) = 1] = e^{-λt}
E[X²(t)] = 0²·Pr[X(t) = 0] + 1²·Pr[X(t) = 1] = e^{-λt}
Var[X(t)] = E[X²(t)] - E[X(t)]² = e^{-λt} - e^{-2λt}

E[X(t)] and Var[X(t)] are shown in Figure 14.2.

FIGURE 14.1 Example of the lightbulb process. X(t) = 1 when the lightbulb is working, and X(t) = 0 when the bulb has failed.

FIGURE 14.2 Mean and variance of X(t) for the lightbulb process.

The autocorrelation and autocovariance functions require the joint probability mass function. Let t1 and t2 be two times with t1 < t2, and let p(i, j; t1, t2) = Pr[X(t1) = i ∩ X(t2) = j]. Then, for example,

p(0, 0; t1, t2) = Pr[X(t1) = 0 ∩ X(t2) = 0] = Pr[T < t1] = 1 - e^{-λt1}

The autocorrelation is

R(t1, t2) = 0·0·p(0,0; t1, t2) + 0·1·p(0,1; t1, t2) + 1·0·p(1,0; t1, t2) + 1·1·p(1,1; t1, t2)
          = p(1,1; t1, t2) = Pr[T > t2] = e^{-λt2}

The lightbulb process is not WSS. It fails both criteria. The mean is not constant, and the autocorrelation function is not a function of only the time difference, t2 - t1.
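The moment formulas just derived are easy to check by simulation. The sketch below uses hypothetical values λ = 1 and t = 0.7 (any positive values work) and compares Monte Carlo estimates with e^{-λt} and e^{-λt} - e^{-2λt}.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, t = 1.0, 0.7          # assumed rate and observation time
n_trials = 200_000

# Failure times T ~ Exponential(lam); X(t) = 1 while the bulb still works.
T = rng.exponential(1.0 / lam, size=n_trials)
x_t = (T >= t).astype(float)

mean_mc, var_mc = x_t.mean(), x_t.var()
mean_th = np.exp(-lam * t)                      # e^{-lambda t}
var_th = np.exp(-lam * t) - np.exp(-2 * lam * t)
```

With 200,000 trials the Monte Carlo values agree with the formulas to about two or three decimal places.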
Other, more interesting random processes are more complicated than the lightbulb process and require more involved calculations. However, the lightbulb process serves as a useful example of the kinds of calculations needed.

14.2 THE POISSON PROCESS

The Poisson process is an example of a wide class of random processes that possess independent increments. The basic idea is that points occur randomly in time and the number of points in any interval is independent of the number that occur in any other nonoverlapping interval. A Poisson process also has the property that the number of points in any interval (s, t) is a Poisson random variable with parameter λ(t - s). Another way of looking at it is that the mean number of points in the interval (s, t) is proportional to the size of the interval, t - s.

As one example of a Poisson process, consider a sequence of photons incident on a detector. The mean number of photons incident in (s, t) is the intensity of the light, λ, times the size of the interval, t - s. As another example, the number of diseases (e.g., cancer) in a population versus time is often modeled as a Poisson process.

In this section, we explore some of the basic properties of the Poisson process. We calculate moments and two interesting conditional probabilities. The first is the probability of getting l points in the interval (0, t) given k points in the shorter interval (0, s), with s < t. The second is the reverse, the probability of getting k points in (0, s) given l points in (0, t).

Comment 14.1: It is conventional to refer to points, and the Poisson process is often referred to as a Poisson point process. In ordinary English, one might use the word "events" to describe the things that happen, but we have already used this word to refer to a set of outcomes.
Let (s, t) be a time interval with 0 < s < t, and let N(s, t) be the number of points in the interval (s, t). For convenience, let N(s) = N(0, s) and N(t) = N(0, t). We also assume N(0) = 0. Then,

N(s, t) = N(t) - N(s)

This can be rearranged as N(t) = N(s) + N(s, t). This is shown graphically in Figure 14.3.

FIGURE 14.3 An example of a Poisson process. The points are shown as closed dots: N(s) = 5, N(s, t) = 2, and N(t) = 7.

As mentioned above, a Poisson process has two important properties. First, it has independent increments, meaning that if (s, t) and (u, v) are disjoint (nonoverlapping) intervals, then N(s, t) and N(u, v) are independent random variables. In particular, N(s) and N(s, t) are independent since the intervals (0, s) and (s, t) do not overlap. Second, N(s, t) is Poisson with parameter λ(t - s). For a Poisson random variable, E[N(s, t)] = λ(t - s). This assumption means the expected number of points in an interval is proportional to the size of the interval.

Using the Poisson assumption, means and variances are easily computed:

E[N(t)] = Var[N(t)] = λt
E[N(s, t)] = Var[N(s, t)] = λ(t - s)

Computing autocorrelations and autocovariances uses both the Poisson and independent increments assumptions. For s < t,

R(s, t) = E[N(s)N(t)]
        = E[N(s)(N(s) + N(s, t))]
        = E[N(s)N(s)] + E[N(s)N(s, t)]
        = Var[N(s)] + E[N(s)]² + E[N(s)]E[N(s, t)]
        = λs + (λs)² + (λs)(λ(t - s))
        = λs(1 + λt)

C(s, t) = R(s, t) - E[N(s)]E[N(t)] = λs

An interesting probability is the probability that no points occur in an interval:

Pr[N(s, t) = 0] = e^{-λ(t-s)}

This is the same as the probability that an exponential random variable does not occur in (s, t). Thus, we conclude the interarrival times for the Poisson process are exponential.

That the waiting times are exponential gives a simple interpretation to the Poisson process. Starting from time 0, wait an exponential time T1 for the first point. Then, wait an additional exponential time T2 for the second point. The process continues, with each point occurring an exponential waiting time after the previous point.

The idea that the waiting times are exponential gives an easy method of generating a realization of a Poisson process:

1. Generate a sequence of U(0,1) random variables U1, U2, ....
2. Transform each as T1 = -log(U1)/λ, T2 = -log(U2)/λ, .... The Ti are exponential random variables.
3. The points occur at times T1, T1 + T2, T1 + T2 + T3, etc.

(This is how the points in Figures 14.3 and 14.4 were generated.)

We also calculate two conditional probabilities. The first uses the independent increments property to show that the future of the Poisson process is independent of its past:

Pr[N(t) = l | N(s) = k] = Pr[N(t) = l ∩ N(s) = k] / Pr[N(s) = k]
                        = Pr[N(s, t) = l - k ∩ N(s) = k] / Pr[N(s) = k]
                        = Pr[N(s, t) = l - k] Pr[N(s) = k] / Pr[N(s) = k]      (by independence)
                        = Pr[N(s, t) = l - k]

Given there are l points at time t, how many points were there at time s? The reverse conditional probability helps answer this question and gives yet another connection between the Poisson and binomial distributions:

Pr[N(s) = k | N(t) = l] = Pr[N(t) = l ∩ N(s) = k] / Pr[N(t) = l]
                        = Pr[N(s, t) = l - k ∩ N(s) = k] / Pr[N(t) = l]
                        = Pr[N(s, t) = l - k] Pr[N(s) = k] / Pr[N(t) = l]
                        = [ (λ(t-s))^{l-k} e^{-λ(t-s)} / (l-k)! ] · [ (λs)^k e^{-λs} / k! ] / [ (λt)^l e^{-λt} / l! ]
                        = [ l! / ((l-k)! k!) ] (s/t)^k ((t-s)/t)^{l-k}         (14.1)

This probability is binomial with parameters l and s/t. Curiously, the probability does not depend on λ.

Figure 14.4 shows a realization of a Poisson process created as described above with λ = 1. In the interval from t = 0 to t = 25, we expect to get about E[N(25)] = 25λ = 25 points. We obtained 31 points, slightly greater than one standard deviation (σ = √(λt) = 5) above the mean. For large t, the Poisson distribution converges to a Gaussian by the CLT. On average, we expect N(t) to be within one standard deviation of the mean about two-thirds of the time.

In summary, the Poisson process is used in many counting experiments. N(s, t) is the number of points that occur in the interval from s to t and is Poisson with parameter λ(t - s). The interarrival time between consecutive points is exponential; that is, the waiting time until the next point occurs is exponential with parameter λ (the same λ as in the Poisson probabilities). The mean of N(s, t) is λ(t - s); the variance of N(s, t) is also λ(t - s). A typical N(t) sequence is shown in Figure 14.4.
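Both facts just summarized can be checked numerically. The sketch below generates a realization by accumulating exponential interarrival times (the method described above) and computes the conditional probability of Equation (14.1) directly from the Poisson probabilities; the rate and interval values are arbitrary choices for illustration.

```python
import math
import numpy as np

def poisson_points(lam, t_max, rng):
    """Points of a rate-lam Poisson process on (0, t_max], via exponential gaps."""
    times, t = [], rng.exponential(1.0 / lam)
    while t <= t_max:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

def cond_pmf(k, l, s, t, lam):
    """Pr[N(s) = k | N(t) = l] computed from the Poisson probabilities (Eq. 14.1)."""
    p_ns = (lam * s) ** k * math.exp(-lam * s) / math.factorial(k)
    p_gap = (lam * (t - s)) ** (l - k) * math.exp(-lam * (t - s)) / math.factorial(l - k)
    p_nt = (lam * t) ** l * math.exp(-lam * t) / math.factorial(l)
    return p_ns * p_gap / p_nt

rng = np.random.default_rng(2)
pts = poisson_points(1.0, 25.0, rng)   # the setting of Figure 14.4
```

For any λ, cond_pmf(k, l, s, t, lam) equals the binomial pmf C(l, k)(s/t)^k (1 - s/t)^{l-k}, confirming that the conditional distribution is binomial and does not depend on λ.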

FIGURE 14.4 A realization of a Poisson process with λ = 1. The graph shows N(t) versus t for 0 ≤ t ≤ 25, with the points of the process marked along the time axis and the curves λt and λt ± √(λt) drawn for reference.

Gk, Hk, Qk, and Rk are all constant in time. When we discuss the optimal estimates for this case, we will drop the subscript k on these matrices. (It often happens that these matrices are time independent.)

The second function of the Kalman filter is a procedure for optimally estimating the state vector and the error covariance of the state vector. Define x̂_{k|l} as the estimate of x_k using observations through (and including) time l. While we use the term filtering broadly to mean estimating x_k from observations, sometimes the terms smoothing, filtering, and prediction are used more precisely: if l > k (future observations are used to estimate x_k), x̂_{k|l} is a smoothed estimate of x_k; if l = k, x̂_{k|k} is a filtered estimate; and if l < k, x̂_{k|l} is a prediction of x_k.

The Kalman filter is recursive: at time k - 1, an estimate x̂_{k-1|k-1} and its error covariance Σ_{k-1|k-1} are available. The first step is to generate a prediction, x̂_{k|k-1}, and its error covariance, Σ_{k|k-1}. This step uses the state equation (Equation 14.6). The second step is to generate a filtered estimate, x̂_{k|k}, and its error covariance, Σ_{k|k}, using the observation


equation (Equation 14.7). This process (i.e., predict and then filter) can be repeated as long as desired.

The Kalman filter starts with an estimate of the initial state, x̂_{0|0}. We assume x̂_{0|0} ~ N(x0, Σ_{0|0}). We assume the error in x̂_{0|0} is independent of the wk and vk for k > 0. At time k - 1, assume x̂_{k-1|k-1} ~ N(x_{k-1}, Σ_{k-1|k-1}). Knowing nothing else, the best estimate of the noises in the state equation is 0. Therefore, the prediction is

x̂_{k|k-1} = F x̂_{k-1|k-1}                                           (14.11)

The error is as follows:

x_k - x̂_{k|k-1} = F(x_{k-1} - x̂_{k-1|k-1}) + G w_k

The two noises are independent. Therefore, the error covariance is

Σ_{k|k-1} = F Σ_{k-1|k-1} F^T + G Q G^T                              (14.12)

The observation update step, going from x̂_{k|k-1} to x̂_{k|k}, is more complicated. We state, without proof, the result:

x̂_{k|k} = x̂_{k|k-1} + Σ_{k|k-1}H^T (H Σ_{k|k-1} H^T + R)^{-1} (z_k - H x̂_{k|k-1})      (14.13)

Σ_{k|k} = Σ_{k|k-1} - Σ_{k|k-1}H^T (H Σ_{k|k-1} H^T + R)^{-1} H Σ_{k|k-1}               (14.14)

The update equations are simplified by defining a gain matrix, K_k. Then,

K_k = Σ_{k|k-1} H_k^T (H_k Σ_{k|k-1} H_k^T + R_k)^{-1}               (14.15)

x̂_{k|k} = x̂_{k|k-1} + K_k (z_k - H_k x̂_{k|k-1})                      (14.16)

Σ_{k|k} = (I - K_k H_k) Σ_{k|k-1}                                    (14.17)
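These equations translate directly into code. The sketch below is a generic single predict/update cycle under the simplifying assumptions used here (G = I and time-invariant matrices); the variable names mirror the equations above.

```python
import numpy as np

def kalman_step(x_prev, P_prev, z, F, Q, H, R):
    """One Kalman predict/update cycle: Eqs. (14.11), (14.12), (14.15)-(14.17)."""
    # Prediction (Eqs. 14.11 and 14.12, with G = I)
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    # Gain (Eq. 14.15)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # Observation update (Eqs. 14.16 and 14.17)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return x_new, P_new
```

Iterating kalman_step over k = 1, 2, ... produces the full recursion.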

The state prediction update equation (Equation 14.16) has the familiar predictor-corrector form. The updated state estimate is the predicted estimate plus the gain times the innovation. Equations (14.11), (14.12), (14.15), (14.16), and (14.17) constitute what is generally called the Kalman filter. The state and observation equations are linear and corrupted by additive Gaussian noise. The state estimates are linear in the observations and are optimal in the sense of minimizing the MSE. The state estimates are unbiased with the covariances above.

Let us continue the example and calculate the updates. Assume the initial estimate is x̂_{0|0} = (0 0)^T with error covariance Σ_{0|0} = I. For simplicity, we also assume ΔT = 0.9 and σ1² = σ2² = 0.5. Then,

x̂_{1|0} = ( 0 )        Σ_{1|0} = ( 2.31  0.9 )
          ( 0 )                  ( 0.9   1.5 )

Letting σ² = 0.33, we obtain an observation z1 = 0.37. First, we need to calculate the gain matrix:

K = ( 2.31 ) (2.31 + 0.33)^{-1} = ( 0.875 )
    ( 0.9  )                      ( 0.34  )

The innovation is

z1 - H x̂_{1|0} = 0.37 - 0 = 0.37

Therefore, the new state estimate and its error covariance are the following:

x̂_{1|1} = ( 0 ) + ( 0.875 ) 0.37 = ( 0.323 )
          ( 0 )   ( 0.34  )        ( 0.126 )

Σ_{1|1} = ( ( 1  0 ) - ( 0.875 ) ( 1  0 ) ) ( 2.31  0.9 ) ≈ ( 0.29  0.11 )
          ( ( 0  1 )   ( 0.34  )          ) ( 0.9   1.5 )   ( 0.11  1.19 )

The Kalman estimate (0.323 0.126)^T is reasonable given the observation z1 = 0.37. Note that the estimated covariance Σ_{1|1} is smaller than Σ_{1|0} and that both are symmetric (as are all covariance matrices).
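The arithmetic of this example can be verified in a few lines. The state model consistent with the numbers above is a constant-velocity form F = [[1, ΔT], [0, 1]] with ΔT = 0.9; this F is an assumption inferred from Σ_{1|0} = FΣ_{0|0}F^T + Q, not stated explicitly here, while Q = 0.5 I, H = (1 0), R = σ² = 0.33, and z1 = 0.37 come from the example.

```python
import numpy as np

F = np.array([[1.0, 0.9], [0.0, 1.0]])   # assumed state matrix (Delta T = 0.9)
Q = 0.5 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.33]])
z1 = np.array([0.37])

P10 = F @ np.eye(2) @ F.T + Q            # Sigma_{1|0}, with Sigma_{0|0} = I
K = P10 @ H.T @ np.linalg.inv(H @ P10 @ H.T + R)
x11 = K @ z1                             # x_{1|0} = 0, so the innovation is z1
P11 = (np.eye(2) - K @ H) @ P10
```

This reproduces Σ_{1|0} = [[2.31, 0.9], [0.9, 1.5]], K ≈ (0.875, 0.341)^T, and x̂_{1|1} ≈ (0.323, 0.126)^T.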

14.4.2 QR Method Applied to the Kalman Filter

The Kalman filter is the optimal MSE estimator, but the straightforward implementation sometimes has numerical problems. In this section, we show how the QR method, an implementation that is fast, easy to program, and accurate, can be used to compute the Kalman estimates. In the development below, we make the simplifying assumption that Gk = I for all k. If this is not true, we can set up and solve a constrained linear regression problem, but that development is beyond this text.

Recall that we introduced the QR method in Section 11.3.3 as a way of solving a linear regression problem. The linear regression problem is to choose β̂ to minimize ||y - Xβ̂||². The optimal solution is β̂ = (X^TX)^{-1}X^Ty. Alternatively, the solution can be found by applying an orthogonal matrix to X and y, converting X to an upper triangular matrix R̃ with zeros below (Figure 11.4). The solution can then be found by solving the triangular system R̃β̂ = ỹ, where ỹ is the transformed y. The work in the QR method is applying the orthogonal matrix to X. Solving the resulting triangular system of equations is easy.

The Kalman recursion starts with an estimate, x̂_{0|0} ~ N(x0, Σ_{0|0}). We need to represent this relation in the general form of the linear regression problem. To do this, we first factor Σ_{0|0} as P^TP, where P is an upper triangular matrix. There are several ways of doing this, the most popular of which is the Cholesky factorization (see Problem 11.6). The initial estimate can be written as x̂_{0|0} = x0 + w̃0, where w̃0 has covariance Σ_{0|0} = P^TP. These equations can be transformed as

P^{-T} x̂_{0|0} = P^{-T} x0 + ε0

where ε0 ~ N(0, I). We use the notation P^{-T} to denote the transpose of the inverse of P; that is, P^{-T} = (P^{-1})^T.


To simplify the remaining steps, we change the notation a bit. Instead of writing the factorization of Σ_{0|0} as P^TP, we write it as Σ_{0|0}^{T/2} Σ_{0|0}^{1/2}. Similarly, we factor Q = Q^{T/2}Q^{1/2} and R = R^{T/2}R^{1/2}. The Kalman state equations can be written as 0 = Q^{-T/2}(x_k - F x_{k-1}) + ε1, and the Kalman observation equations can be written as R^{-T/2} z_k = R^{-T/2} H_k x_k + ε2. The noises in both equations are Gaussian with mean 0 and identity covariances. The initial estimate and the state equations combine to form a linear regression problem.

SUMMARY

For the lightbulb process, the failure time T is exponential, and

Pr[T ≤ t] = 1 - e^{-λt}
E[X(t)] = e^{-λt}
Var[X(t)] = e^{-λt} - e^{-2λt}
R(t1, t2) = e^{-λt2}      for t1 < t2

A Markov chain has a unique stationary distribution if

p_ij^(n) > 0      for all i and j

In other words, the Markov chain has a unique stationary distribution if the matrix P^n has all nonzero entries for some n.

The Kalman filter is a minimum mean squared error estimator of a state vector given observations corrupted by additive Gaussian noise. The state evolves in time according to a linear equation:

x_k = F_k x_{k-1} + G_k w_k

Observations are made:

z_k = H_k x_k + v_k

The Kalman filter is recursive. Given an estimate x̂_{k-1|k-1}, the prediction of x_k is

x̂_{k|k-1} = F_k x̂_{k-1|k-1}


The error covariance of the prediction is

Σ_{k|k-1} = F_k Σ_{k-1|k-1} F_k^T + G_k Q_k G_k^T

The observation update equations are simplified by defining a gain matrix, K_k:

K_k = Σ_{k|k-1} H_k^T (H_k Σ_{k|k-1} H_k^T + R_k)^{-1}

x̂_{k|k} = x̂_{k|k-1} + K_k (z_k - H_k x̂_{k|k-1})

Σ_{k|k} = (I - K_k H_k) Σ_{k|k-1}

The state prediction update equation has the familiar predictor-corrector form. One numerically accurate, fast, and easy-to-program implementation of the Kalman filter is to apply the QR factorization to a linear regression problem formed from the Kalman state and observation equations.
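This last point can be made concrete: the observation update is the solution of a whitened least-squares problem, stacking the prior residual (whitened by Σ_{1|0}^{-1/2}) and the observation residual (whitened by R^{-1/2}) and solving by QR. The numbers below reuse the Section 14.4.1 example values as assumptions so the answer can be compared with the gain-based update.

```python
import numpy as np

# Prior (prediction) and observation from the running example.
x_pred = np.zeros(2)
P_pred = np.array([[2.31, 0.9], [0.9, 1.5]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.33]])
z = np.array([0.37])

# Whiten both relations with inverse Cholesky factors, stack them, and solve
# min ||A x - y||; the minimizer is the filtered estimate x_{1|1}.
Wp = np.linalg.inv(np.linalg.cholesky(P_pred))   # P_pred^{-1/2}
Wr = np.linalg.inv(np.linalg.cholesky(R))        # R^{-1/2}
A = np.vstack([Wp, Wr @ H])
y = np.concatenate([Wp @ x_pred, Wr @ z])

Qm, Rm = np.linalg.qr(A)
x_filt = np.linalg.solve(Rm, Qm.T @ y)           # back-substitution on R-tilde
```

x_filt matches the Kalman update x̂_{1|1} ≈ (0.323, 0.126)^T, and the error covariance is Σ_{1|1} = (R̃^T R̃)^{-1}, where R̃ is the triangular factor.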

PROBLEMS

14.1 For the lightbulb process in Section 14.1:

a. What value of t maximizes Var[X(t)]?
b. What is the value of E[X(t)] at this value of t?
c. Why does this answer make sense?

14.2 Is the Poisson process WSS? Why or why not?

14.3 Generate your own version of Figure 14.4.

14.4 Another way of generating a realization of a Poisson process uses Equation (14.1). Given N(t) = l, N(s) is binomial with parameters n = l and p = s/t. A binomial is a sum of Bernoullis, and a Bernoulli can be generated by comparing a uniform to a threshold. Combining these ideas, a realization of a Poisson process can be generated with these two steps:

1. Generate a Poisson random variable with parameter λt. Call the value n.
2. Generate n uniform random variables on the interval (0, t). The values of the n random variables are the points of a Poisson process.

Use this technique to generate your own version of Figure 14.4.

14.5 Write a program to implement a Markov chain. The Markov chain function should accept an initial probability vector, a probability transition matrix, and a time n and then output the sequence of states visited by the Markov chain. Test your program on simple Markov chains, and show that a histogram of the states visited approximates the stationary distribution.

14.6 Why is the Markov chain in Example 14.2 converging so slowly?

14.7 In Problem 6.23, we directly solved the "first to k" competition (as phrased in that question, the first to win k of n games wins the competition) using binomial probabilities. Assume the games are independent and one team wins each game with probability p. Set up a Markov chain to solve the "first to 2 wins" competition.

a. How many states are required?


b. What is the probability transition matrix?
c. Compute the probability of winning versus p for several values of p using the Markov chain and the direct formula, and show the answers are the same.

14.8 For the "win by 2" Markov chain in Example 14.3, compute the expected number of games to result in a win or loss as a function of p.

14.9 In Problems 6.27 and 6.28, we computed the probability a binomial random variable Sn with parameters n and p is even. Here, we use a Markov chain to compute the same probability:

1. Set up a two-state Markov chain with states Even and Odd. Draw the state diagram. What is the state transition matrix P? What is the initial probability vector p(0)?
2. Find the first few state probabilities by raising P to the nth power for n = 1, 2, 3, 4.
3. Set up the flow equations, and solve for the steady-state probability Sn is even (i.e., as n → ∞).
4. Solve for the steady-state probabilities from the eigenvalue and eigenvector decomposition of P using p = 0.5, 0.6, 0.7, 0.8, 0.9.

14.10

Simulate the example Kalman filter problem in Section 14.4.1 for k = 0, 1, ..., 10 using the standard Kalman recursions.

a. What is the final estimate?
b. How close is the final estimate to the actual value?

14.11

Simulate the example Kalman filter problem in Section 14.4.1 for k = 0, 1, ..., 10 using the QR method. Set up multiple linear regression problems, and use a library function to solve them.

a. What is the final estimate?
b. How close is the final estimate to the actual value?

14.12

The error covariance in a linear regression problem is σ²(X^TX)^{-1}, where σ² is the variance of the noise (often σ² = 1, which we assume in this problem). In the QR method for computing the Kalman filter update (Section 14.4), we said, "The error covariance is Σ_{1|1} = R̃_{1|1}^{-1} R̃_{1|1}^{-T}." Show this. (Hints: Σ^{-1} = X^TX = R̃^TR̃. Write Σ in terms of R̃^{-1}; compute R̃^{-1} in terms of R̃_{0|0}, R̃_{0|1}, and R̃_{1|1}; and then compute Σ. Identify the part of Σ that is Σ_{1|1}. Also, remember that matrix multiplication does not commute: AB ≠ BA in general.)

14.13

Write the state and observation equations for an automobile cruise control. Assume the speed of the vehicle can be measured, but not the slope of the road. How might the control system incorporate a Kalman filter?

APPENDIX A

COMPUTATION EXAMPLES

Throughout the text, we have used three computation packages: Matlab, Python, and R. In this appendix, we briefly demonstrate how to use these packages. For more information, consult each package's documentation and the many websites devoted to each one. All the examples in the text have been developed using one or more of these three packages (mostly Python). However, all the plots and graphics, except those in this appendix, have been recoded in a graphics macro package for a consistent look.

A.1 MATLAB

Matlab is a popular numerical computation package widely used at many universities. To save a little space, we have eliminated blank lines and compressed multiple answers into a single line in the Matlab results below. Start with some simple calculations:

>> conv([1,1,1,1],[1,1,1,1])
ans =
     1     2     3     4     3     2     1
>> for k = 0:5
nchoosek(5,k)
end
ans = 1, 5, 10, 10, 5, 1

Here is some code for the Markov chain example in Section 14.3:

>> P = [[0.9, 0.1]; [0.2, 0.8]]
P =
    0.9000    0.1000

    0.2000    0.8000
>> P2 = P*P
P2 =
    0.8300    0.1700
    0.3400    0.6600
>> P4 = P2*P2
P4 =
    0.7467    0.2533
    0.5066    0.4934

The probability a standard normal random variable is between -1.96 and 1.96 is 0.95:

>> cdf('Normal',1.96,0,1) - cdf('Normal',-1.96,0,1)
ans = 0.9500

The probability of getting three heads in a throw of six coins is 0.3125:

>> pdf('Binomial',3,6,0.5)
ans = 0.3125

Compute basic statistics with the data from Section 10.6. The data exceed the linewidth of this page and are broken into pieces for display:

>> data = [0.70, 0.92, -0.28, 0.93, 0.40, -1.64, 1.77, 0.40, -0.46, -0.31, ...
           0.38, 0.63, -0.79, 0.07, -2.03, -0.29, -0.68, 1.78, -1.83, 0.95];

Compute the mean, median, sample variance, and interquartile range:

>> mean(data), median(data), var(data), iqr(data)
ans = 0.0310, 0.2250, 1.1562, 1.3800

The sample distribution function of this data can be computed and plotted as follows:

>> x = linspace(-3,3,101);
>> y = cdf('Normal',x,0,1);
>> plot(x, y, 'LineWidth',3, 'Color',[0.75,0.75,0.75])
>> [f,xs] = ecdf(data);
>> hold on; stairs(xs, f, 'LineWidth',1.5, 'Color',[0,0,0])
>> xlabel('x','FontSize',14), ylabel('CDF','FontSize',14)
>> title('Sample CDF vs. Gaussian CDF','FontSize',18)
>> saveas(gcf, 'MatlabCDFplot', 'epsc')


The plot is shown below. We used options to make the continuous curve wide and gray, to make the sample curve a bit narrower and black, and to increase the size of the fonts used in the title and labels.

[Figure: sample CDF of the data (black stairs) plotted against the Gaussian CDF (gray curve), titled "Sample CDF vs. Gaussian CDF".]

A.2 PYTHON

Python is an open source, general-purpose computing language. It is currently the most common first language taught at universities in the United States. The core Python language has limited numerical and data analysis capabilities. However, the core language is supplemented with numerous libraries, giving the "batteries included" Python capabilities similar to those of R and Matlab. In Python, numpy and matplotlib give Python linear algebra and plotting capabilities similar to those of Matlab. scipy.stats and statistics are basic statistical packages.

Start Python, and import the libraries and functions we need:

>>> import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
from statistics import mean, median, variance, stdev

Do routine analysis on the data from Section 10.6:

>>> data = [0.70, 0.92, -0.28, 0.93, 0.40, -1.64, 1.77, 0.40, -0.46, -0.31,
0.38, 0.63, -0.79, 0.07, -2.03, -0.29, -0.68, 1.78, -1.83, 0.95]
>>> print(mean(data), median(data), variance(data), stdev(data))
0.03100000000000001 0.225 1.15622 1.0752767085731934


Define a Gaussian random variable Z ~ N(0,1). Generate a vector of 100 samples from the distribution, and plot a histogram of the data against the density:

>>> x = np.linspace(-3,3,101)
Z = st.norm()   # N(0,1) random variable
plt.plot(x, Z.pdf(x), linewidth=2, color='k')
n, bins, patches = plt.hist(Z.rvs(100), normed=True,
    range=(-3.25,3.25), bins=13, color='0.9')
plt.xlabel('x')
plt.ylabel('Normalized Counts')
plt.savefig('PythonHistCDF.pdf')

[Figure: normalized histogram of the 100 samples plotted against the N(0,1) density.]

Now, use the same Gaussian random variable to compute probabilities:

>>> # some basic Gaussian probabilities
Z.cdf(1.96) - Z.cdf(-1.96), Z.ppf(0.975)
(0.95000420970355903, 1.959963984540054)

Do some of the calculations of the birthday problem in Section 3.6. For n = 365 days, a group of 23 has a better than 0.5 chance of a common birthday. That is, for the first 22 group sizes, Pr(no match) > 0.5:

>>> n = 365
days = np.arange(1, n+1)
p = 1.0*(n+1-days)/n
probnopair = np.cumprod(p)
np.sum(probnopair > 0.5)
22
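The cutoff can also be checked against the closed-form product Pr(no match) = ∏_{k=0}^{m-1} (1 − k/n) for a group of m people. A small sketch using only the standard library (math.prod requires Python 3.8+):

```python
import math

n = 365

def prob_no_match(group):
    """Probability that `group` people all have distinct birthdays."""
    return math.prod((n - k) / n for k in range(group))

print(prob_no_match(22))   # about 0.524, still above 0.5
print(prob_no_match(23))   # about 0.493, below 0.5
```

This agrees with the cumulative-product computation above: 22 is the largest group size with Pr(no match) > 0.5.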


Plot the probability of no pair versus the number of people in the group:

>>> xupper = 30
cutoff = 23
plt.plot(days[:xupper], probnopair[:xupper],
         days[:xupper], probnopair[:xupper], '.')
plt.axvline(x=cutoff, color='r')
plt.axhline(y=0.5)
plt.xlabel('Number of People in Group')
plt.ylabel('Probability of No Match')
plt.savefig('PythonBirthday.pdf')

[Figure: probability of no match versus number of people in the group, with a vertical line at 23 people and a horizontal line at 0.5.]

A.3 R

R is a popular open source statistics and data analysis package. It is widely used in universities, and its use is growing in industry. R's syntax is slightly different from that in Matlab and Python. For further study, the reader is urged to consult the many texts and online resources devoted to R.

Start R:

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

We begin with some simple probability calculations. The probability of two aces in a selection of two cards from a standard deck is 0.00452 = 1/221:

> choose(5,3)
[1] 10
> choose(5, c(0,1,2,3,4,5))
[1]  1  5 10 10  5  1
> choose(4,2)/choose(52,2)
[1] 0.004524887
> choose(52,2)/choose(4,2)
[1] 221

Note that the syntax above for creating a vector, c(0,1,2,3,4,5), differs from the syntax in Matlab and Python.

The probability a standard normal random variable is between -1.96 and 1.96 is 0.95:

> pnorm(1.96); qnorm(0.975)
[1] 0.9750021
[1] 1.959964
> pnorm(1.96) - pnorm(-1.96)
[1] 0.9500042

To illustrate R's data handling capabilities, import data from a Comma Separated Value (CSV) file. The data are grades for 20 students from two midterms:

> grades = read.csv('Grades.csv', header=TRUE)
> grades
   Midterm1 Midterm2
1        53       59
2        38       50
3        53       56
4        55       61
5        32       18
6        48       57
7        56       39
8        47       24
9        44       22
10       94       86
11       66       18
12       62       57
13       56       45
14       94       63
15       70       51
16       88       89
17       56       47
18      100       96
19       75       67
20       88       86

The summary command is a simple way to summarize the data:

> summary(grades)
    Midterm1         Midterm2
 Min.   : 32.00   Min.   :18.00
 1st Qu.: 51.75   1st Qu.:43.50
 Median : 56.00   Median :56.50
 Mean   : 63.75   Mean   :54.55
 3rd Qu.: 78.25   3rd Qu.:64.00
 Max.   :100.00   Max.   :96.00

Clearly, grades on Midterm2 are, on average, lower than those on Midterm1. It is easy to access the columns if we attach the data frame:

> attach(grades)

For instance, the correlation between the two columns is 0.78:

> cor(Midterm1, Midterm2)
[1] 0.7784707

A stem-and-leaf plot is an interesting way to represent the data. It combines features of a sorted list and a histogram:

> stem(Midterm1, scale=2)

  The decimal point is 1 digit(s) to the right of the |

   3 | 28
   4 | 478
   5 | 335666
   6 | 26
   7 | 05
   8 | 88
   9 | 44
  10 | 0
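Neither Matlab nor Python ships a standard stem-and-leaf command (as noted below), but a basic one is easy to write. A possible Python sketch for nonnegative integer scores, using the Midterm1 values from the table above:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Print a simple stem-and-leaf display for nonnegative integers
    (tens digit as stem, units digit as leaf); returns the stems dict."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    for stem in sorted(stems):
        print(f"{stem:3d} | {''.join(str(leaf) for leaf in stems[stem])}")
    return stems

midterm1 = [53, 38, 53, 55, 32, 48, 56, 47, 44, 94,
            66, 62, 56, 94, 70, 88, 56, 100, 75, 88]
stems = stem_and_leaf(midterm1)
```

Up to formatting, the output matches R's display above: stem 5 carries the six leaves 335666, and the single score of 100 appears as stem 10, leaf 0.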

The sorted data can be read from the stem plot: 32, 38, 44, etc. Of the 20 scores, six were in the 50s. (Note, the stem command in R differs from the stem command in Matlab and Python. Neither Matlab nor Python has a standard stem-and-leaf plot command, but such a function is easy to write in either language.) Fit a linear model to the data, and plot both the data and the fitted line. To make the plot more attractive, we use a few of the plot options:

> lmfit = lm(Midterm2 ~ Midterm1)
> pdf('grades.pdf')


> plot(Midterm1, Midterm2, cex=1.5, cex.axis=1.5, cex.lab=1.5, las=1)
> abline(lmfit)
> dev.off()

[Figure: scatter plot of Midterm2 versus Midterm1 with the fitted regression line.]
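For comparison, the same least-squares fit and correlation can be computed in Python (a sketch; the scores are transcribed from the grades table above, and numpy's polyfit with degree 1 returns the slope and intercept):

```python
import numpy as np

midterm1 = np.array([53, 38, 53, 55, 32, 48, 56, 47, 44, 94,
                     66, 62, 56, 94, 70, 88, 56, 100, 75, 88])
midterm2 = np.array([59, 50, 56, 61, 18, 57, 39, 24, 22, 86,
                     18, 57, 45, 63, 51, 89, 47, 96, 67, 86])

# Least-squares line Midterm2 = slope*Midterm1 + intercept
slope, intercept = np.polyfit(midterm1, midterm2, 1)

# Sample correlation, matching R's cor(Midterm1, Midterm2)
r = np.corrcoef(midterm1, midterm2)[0, 1]   # about 0.778
```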

APPENDIX B

ACRONYMS

AUC     Area Under the ROC Curve
CDF     Cumulative Distribution Function
CLT     Central Limit Theorem
CSV     Comma Separated Value
DFT     Discrete Fourier Transform
DTFT    Discrete Time Fourier Transform
ECC     Error Correcting Coding
IID     Independent and Identically Distributed
IQ      Intelligence Quotient
JPEG    Joint Photographic Experts Group
KDE     Kernel Density Estimate
KL      Kullback-Leibler Divergence
LTP     Law of Total Probability
MAC     Message Authentication Code
MAP     Maximum a Posteriori
MGF     Moment Generating Function
MLE     Maximum Likelihood Estimate
MMSE    Minimum Mean Squared Error
MSE     Mean Squared Error
PDF     Probability Density Function
PMF     Probability Mass Function
PSD     Power Spectral Density
PSK     Phase Shift Keying
QAM     Quadrature Amplitude Modulation
ROC     Receiver Operating Characteristic
SNR     Signal-to-Noise Ratio
WSS     Wide Sense Stationary
8PSK    Eight-Point Phase Shift Keying
4QAM    Four-Point Quadrature Amplitude Modulation
16QAM   16-Point Quadrature Amplitude Modulation

APPENDIX C

PROBABILITY TABLES

C.1 TABLES OF GAUSSIAN PROBABILITIES

TABLE C.1 Values of the Standard Normal Distribution Function Φ(z)

 z     Φ(z)      z     Φ(z)      z     Φ(z)      z     Φ(z)
0.00   0.5000   1.00   0.8413   2.00   0.9772   3.00   0.9987
0.05   0.5199   1.05   0.8531   2.05   0.9798   3.05   0.9989
0.10   0.5398   1.10   0.8643   2.10   0.9821   3.10   0.9990
0.15   0.5596   1.15   0.8749   2.15   0.9842   3.15   0.9992
0.20   0.5793   1.20   0.8849   2.20   0.9861   3.20   0.9993
0.25   0.5987   1.25   0.8944   2.25   0.9878   3.25   0.9994
0.30   0.6179   1.30   0.9032   2.30   0.9893   3.30   0.9995
0.35   0.6368   1.35   0.9115   2.35   0.9906   3.35   0.9996
0.40   0.6554   1.40   0.9192   2.40   0.9918   3.40   0.9997
0.45   0.6736   1.45   0.9265   2.45   0.9929   3.45   0.9997
0.50   0.6915   1.50   0.9332   2.50   0.9938   3.50   0.9998
0.55   0.7088   1.55   0.9394   2.55   0.9946   3.55   0.9998
0.60   0.7257   1.60   0.9452   2.60   0.9953   3.60   0.9998
0.65   0.7422   1.65   0.9505   2.65   0.9960   3.65   0.9999
0.70   0.7580   1.70   0.9554   2.70   0.9965   3.70   0.9999
0.75   0.7734   1.75   0.9599   2.75   0.9970   3.75   0.9999
0.80   0.7881   1.80   0.9641   2.80   0.9974   3.80   0.9999
0.85   0.8023   1.85   0.9678   2.85   0.9978   3.85   0.9999
0.90   0.8159   1.90   0.9713   2.90   0.9981   3.90   1.0000
0.95   0.8289   1.95   0.9744   2.95   0.9984   3.95   1.0000
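A table like C.1 can be regenerated (and extended) in any of the three packages. For example, a standard-library Python sketch, using the identity Φ(z) = (1 + erf(z/√2))/2:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Reproduce the four column blocks of Table C.1, one row at a time
for i in range(20):
    z = 0.05 * i
    print('   '.join(f'{zz:4.2f}  {Phi(zz):6.4f}'
                     for zz in (z, z + 1, z + 2, z + 3)))
```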

TABLE C.2 Values of the Standard Normal Tail Probabilities

 z     1 - Φ(z)
