## Aunt Pythia’s advice

Aunt Pythia is well-slept and excited to be here to answer your wonderful and thoughtful ethical conundrums. Please do comment on my answers, if you disagree but especially if you agree wholeheartedly and want me to keep up the good work. *Love* that kind of encouraging comment.

**And please, don’t forget to ask me a question at the bottom of the page!**

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

——

*Dear Aunt Pythia,*

*What is your text editor of choice? The most popular ones, the ones in which I know die-hard fans, are for Emacs, Vi/Vim, and Sublime. I am personally an Emacs user, but I haven’t given any other editors a chance, to be honest. Which do you prefer to use, and why?*

*Text Editor*

Dear TE,

I use Emacs mostly, and XEmacs when it’s available. It’s easy, it “knows” about Python and other languages, and the drop-down menu is easier than remembering keystroke commands. I’ve been known to use an IDE or two depending on codebase context. For me it’s all about ease of use. Since I’ve never been a professional engineer, I’ve never spent a large majority of my time with source code, so Vim doesn’t attract me, even though everything is a keystroke and you never need to use your mouse.

As an aside, I’d like to argue this point, because it’s often shrouded in weird macho crap: why not use your mouse? Does it really waste that much time? I honestly have never been prevented from coding efficiently because my arm is too tired from moving from the keyboard to the mouse and back. Is the goal really to be able to stay in the exact same position for as long as possible? I’m the kind of person that is too fidgety for such ideas. I take the “stand up and walk around every 20 minutes” rule seriously, at least before 4pm, when I become a zombie.

Good luck, young padawan!

Aunt Pythia

——

*Dear Aunt Pythia,*

*What are your thoughts on the famous (infamous?) two-daughter problem? I have three PhDs who give different answers, all of which appear to be statistically correct. Mlodinow says the answer is 1/2. The chair of the stats department at a local university says the answer is 3/7, and a chap at Fl Coastal College has yet a third answer, which I have lost.*

*How can this be?*

*Tombs*

Dear Tombs,

OK I’m pretty sure there’s only one answer to this if it’s stated precisely. So let’s try to do that. Here’s the question:

Suppose I have two children. One of them is a girl who was born on a Friday. What are the chances of both children being girls?

Now I’m a big fan of making things incredibly easy and visual. So what I’m going to do here is identify the fact that, as far as children go, there are two attributes of interest in this question, namely gender and day of birth. I will assume that all options are equally likely and that they are independent from each other as well as between kids, and in my first iteration I’ll draw up a list of equally likely bins for a given child, namely of either gender and of any day born. That’s 14 equally likely bins for a given single child, and that means they happen with probability 1/14.

Now, for the second iteration, let’s talk about having two kids. You have a 2-dimensional array of bins, which you arrange to be 14-by-14, and you assume that any of those 14*14 = 196 bins is *a priori* equally likely.

If you label the first bin as “(Female, Friday)” and the second bin as “(Female, Saturday)” and so on, you realize that the condition that “one of the two kids is a girl who was born on Friday” means that we already know we are working in the context where we are either in the left-most column or the bottom row. Here’s my awesome rendition of this area:

Specifically, the left bottom corner is the case where there are two girls, both born on Friday. The one to the right and above that corner refers to the case where there are two girls, one born on Friday and one born on Saturday. The stuff on the right and in the upper part of the column refers to the case where there’s a Friday girl and a boy.

Altogether we have 13 pink bins with two girls and 14 pink bins with a boy and a girl. So the overall chance of two girls, given one Friday girl, is 13/27.
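If you’d rather count than squint at a picture, here is a minimal Python sketch of the same bin-counting. The names (`CHILD_BINS`, `two_girls_given_friday_girl`) are made up for illustration, and it leans on the same assumptions as above: both sexes and all seven days are equally likely and independent.

```python
from fractions import Fraction
from itertools import product

SEXES = ("girl", "boy")
DAYS = ("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu")

# The 14 equally likely bins for a single child (assumption from the text).
CHILD_BINS = list(product(SEXES, DAYS))


def two_girls_given_friday_girl():
    """Return P(two girls | at least one child is a girl born on Friday)."""
    target = ("girl", "Fri")
    # All 14 * 14 = 196 equally likely (first child, second child) bins.
    families = list(product(CHILD_BINS, repeat=2))
    # Keep only the bins with at least one Friday girl (the row and column).
    shaded = [f for f in families if target in f]
    both_girls = [f for f in shaded if f[0][0] == "girl" and f[1][0] == "girl"]
    return Fraction(len(both_girls), len(shaded))


print(two_girls_given_friday_girl())  # 13/27
```

The 27 in the denominator is 14 + 14 - 1, since the corner bin with two Friday girls sits in both the row and the column and should only be counted once.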

I hope that’s convincing!

Aunt Pythia

——

*Dear Auntie P,*

*What do you think about topological data analysis (some info here). Should we trust people who can’t tell the difference between their rear end and a coffee cup because the two are topologically equivalent?*

*Topological Fear*

Dear TF,

Geez I don’t know about you but my rear end is not topologically equivalent to my coffee cups. You either need to go to a doctor or buy some coffee cups that don’t leak.

So, I don’t know very much about this stuff, but I do think it’s potentially interesting, and it’s maybe close to an idea I’ve had for a while now but for which I haven’t found a practical use. Yet.

The idea I have had, if it’s close to this idea, and I think from short conversations with people that it is, is that if you draw a bunch of scatter plots of, say, two attributes x1 and x2 and an outcome y (so you need numerical data for this), then you’ll notice in the resulting 3-dimensional blob of points some interesting topological properties. Namely, there seem to be pretty well-defined boundaries, and those boundaries might have certain kinds of curves, and there may possibly even be well-defined holes in the blob, at least if you “fatten up” the points (sufficiently but not more than necessary) and then take the union of all of the resulting spheres to be some kind of 3-d manifold. You can then play with the relationship between, say, the radius of these fattened points and the topological properties of the resulting blob.
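To make the “fatten up the points” step concrete, here is a rough Python sketch that tracks only the simplest topological property, the number of connected pieces of the union of balls, as the radius grows. It does not detect holes, which would need a proper persistent homology library. The function name `count_components` and the sample points are invented for illustration.

```python
import math


def count_components(points, radius):
    """Count connected pieces of the union of balls of `radius` around `points`."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Two balls of equal radius overlap when centers are within 2 * radius.
            if math.dist(points[i], points[j]) <= 2 * radius:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Made-up (x1, x2, y) points: two tight clusters far apart.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10)]
for radius in (0.1, 0.6, 9.0):
    print(radius, count_components(points, radius))
```

With these made-up points the count drops from 5 to 2 to 1 as the radius grows. The range of radii over which the count stays at 2 is the kind of relationship between radius and shape described above.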

Anyhoo, the idea could be that, if you see x1 and x2 then you can exclude a y that lives in a hole, or rather where point (x1, x2, y) would live in a hole. This is more than most kinds of modern models can do for you, but even so I’ve never seen this actually come in handy.

I hope that helps, and please do see a doctor!

Auntie P

——

*Dear Aunt Pythia,*

*This is a reaction to a previous post (maybe Oct 12?) where you said the following:*

My kids, to be clear, hate team sports and suck at them, like good nerds.

*Now, as a nerd whose parents never let me play team sports growing up and who now plays one in college (a “nerd” sport, but still…), I have a question for you: Why do “good nerds” have to hate sports and/or suck at them? What classifies a “good nerd”? Does this generalize to other things that nerds are stereotypically bad at, like sex lives? Is there another category that should be created for nerdy type people that are also jocky-er, like a nerock or a jord?*

*With Love,*

*A “Bad Nerd”*

Dear Bad Nerd,

Great question, and you’re not the only nerd that called me out on my outrageous discrimination. I wasn’t being fair to my nerock and jord friends, and that ain’t cool. Although statistically I believe I still have a point, there’s no reason to limit people in arbitrary ways like that, and it’s fundamentally un-nerdy of me to do so.

For all you nerocks and jords out there: you go, girls! and boys!

But just for the record, nerds are categorically excellent at sex. We all know that. Say yes.

Love,

Aunt Pythia

——

Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

I’m feeling good that I got the same answer as Auntie Pi to the 2 daughter problem – and I didn’t even have to look. Unfortunately, my approach lacked Auntie P’s cool visualization. Hopefully, the profs alluded to were working different problems.

I just want to defend the answer 1/2 to the two daughters problem. I’m not sure either is completely correct – it depends on subtle issues of language.

Suppose the question read, “I have two children – Taylor and Morgan. Taylor is a girl born on a Friday; what is the probability that Morgan is a girl?”

Here I believe the answer is unambiguously 1/2. Indeed, if when specifying the gender and day of birth you are referring to a *particular* child (first born, better at math, etc.) then the genders are independent. The answer 13/27 only makes sense if your children are completely interchangeable, which might make sense if you are doing statistics on families with two children including a daughter born on a Friday.

Overall I think I prefer the answer 1/2 since when somebody says “one of my children” they usually have a specific child in mind.

Sorry but no. Of course it’s true that, when somebody says “one of my children”, they usually have a specific child in mind, but it is not clear to the listener whether it’s the older or younger child.

As with most such probability puzzles, the problem really needs to be stated even more precisely than the supposedly precise statement above for either answer to be unambiguously correct. (At least the statement should contain the words “at least.”)

If the question is about a family selected at random from all families with exactly two children, at least one of whom is a girl born on a Friday – perhaps there is an annual convention of such families – then obviously the answer given above with the pretty pictures is correct.

OTOH if some random parent randomly starts telling you about some random child of theirs – as parents are wont to do – and it emerges that this child was a girl born on a Friday and has exactly one sibling, then the 1/2 answer is correct. In this case the diagonal parents will perforce always say the same (sex, day), while the off-diagonal parents will have had a choice and only pick (girl, Friday) half of the time. It’s the same as the principle of restricted choice in bridge, or the Monty Hall problem.
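Here is a small Python sketch of that second scenario, under the assumption that the parent picks which child to talk about uniformly at random. The function name is invented for illustration, and day `0` stands in for Friday.

```python
from fractions import Fraction
from itertools import product

# 14 equally likely bins per child; day 0 stands for Friday (an arbitrary label).
CHILD_BINS = list(product(("girl", "boy"), range(7)))
TARGET = ("girl", 0)


def two_girls_given_parent_mentions_friday_girl():
    """P(two girls | a uniformly chosen child is described as a Friday girl)."""
    mentioned = Fraction(0)
    mentioned_and_two_girls = Fraction(0)
    for family in product(CHILD_BINS, repeat=2):
        for child in family:
            # Each (family, described child) pair has weight 1/196 * 1/2.
            weight = Fraction(1, 196) * Fraction(1, 2)
            if child == TARGET:
                mentioned += weight
                if all(sex == "girl" for sex, _ in family):
                    mentioned_and_two_girls += weight
    return mentioned_and_two_girls / mentioned


print(two_girls_given_parent_mentions_friday_girl())  # 1/2
```

The diagonal family with two Friday girls contributes full weight because either child gives the same description, while each off-diagonal family contributes only half. That halving is the restricted-choice effect.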

I agree with the answer of 1/2 to pwsiegel’s reworded problem. With the rewording, the kids have been distinguished. If you supply identifiers up front (names, birth positions, hair color, etc.) and then tell me that the kid identified as “Taylor” or “First” or “Blonde” is a girl born on Friday, then that changes the conditioning of the problem, and the answer. Let

GF = girl born on Friday

GF’ = girl born on a day other than Friday

B = boy

ID1 = kid with identifier #1 (“Taylor” or “First” or…)

ID2 = kid with identifier #2 (“Morgan” or “Second” or…)

Probabilities for each of the possible outcomes (should add to 1):

| Case | ID1 | ID2 | Probability |
|------|-----|-----|-------------|
| 1 | GF | GF | (1/14)(1/14) |
| 2 | GF | GF’ | (1/14)(6/14) |
| 3 | GF’ | GF | (6/14)(1/14) |
| 4 | GF’ | GF’ | (6/14)(6/14) |
| 5 | GF | B | (1/14)(7/14) |
| 6 | GF’ | B | (6/14)(7/14) |
| 7 | B | GF | (7/14)(1/14) |
| 8 | B | GF’ | (7/14)(6/14) |
| 9 | B | B | (7/14)(7/14) |

In the original problem, one kid is a girl born on Friday in cases 1,2,3,5 and 7 above. The probability of one of the kids being a Friday-born girl is the sum of the probabilities associated with these cases, 27/196. The probability of 2 girls given one is a girl born on Friday is then the probability of case 1 or 2 or 3 (13/196) divided by 27/196 = 13/27.

In pwsiegel’s reformulation, ID1 is a Friday girl – only true in cases 1,2, and 5. The probability of one of these 3 cases occurring is 14/196. Of these, cases 1 and 2 result in two girls. So, the desired probability is 7/196 divided by 14/196 = 1/2.
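Both calculations can be checked with a short Python sketch over the nine cases listed above. The helper names (`conditional`, `two_girls`) are invented for illustration, and the per-child probabilities are the ones assumed in this comment.

```python
from fractions import Fraction

# Per-child probabilities for the three labels used in the list of cases.
P = {"GF": Fraction(1, 14), "GF'": Fraction(6, 14), "B": Fraction(7, 14)}
GIRLS = {"GF", "GF'"}

# All nine (ID1, ID2) cases with their probabilities.
cases = {(a, b): P[a] * P[b] for a in P for b in P}
assert sum(cases.values()) == 1


def conditional(event, given):
    """Return P(event | given) over the nine cases."""
    numerator = sum(p for case, p in cases.items() if given(case) and event(case))
    denominator = sum(p for case, p in cases.items() if given(case))
    return numerator / denominator


def two_girls(case):
    return case[0] in GIRLS and case[1] in GIRLS


# Original problem: at least one kid is a Friday girl (cases 1, 2, 3, 5, 7).
print(conditional(two_girls, lambda c: "GF" in c))     # 13/27
# Reformulation: the kid labelled ID1 is a Friday girl (cases 1, 2, 5).
print(conditional(two_girls, lambda c: c[0] == "GF"))  # 1/2
```

The only difference between the two calls is the conditioning event, which is the point of the comparison.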

I can’t see how the original problem can be read as pwsiegel’s formulation though. It seems pretty clear that the reformulation excludes cases 3 and 7 while the original includes them.

Regarding Cathy’s comment – I’m not convinced that birth position gives one any different information than the identifiers proposed by pwsiegel.

The only role that providing names played in changing the solution to the problem was to distinguish the children, and so the answer is still 1/2 if the setup for the puzzle is “I have two children. The older child is a girl born on a Friday…” Indeed, the solution to this puzzle can be reduced to your analysis if you imagine that the speaker is a weird parent who named his or her children “Firstchild” and “Secondchild” instead of “Taylor” and “Morgan”.

So in the original puzzle, it all comes down to how you interpret the phrase “one of my children is X”. If you interpret this phrase to mean “it is not true that both of my children are not X” then 13/27 is correct. But if you interpret it to mean “A specific element of the set containing my two children is X” then 1/2 is correct. The first interpretation is usually correct in statistical language, but the second seems more correct in colloquial language. The puzzle is confusing because it is hard to decide which linguistic environment takes precedence.

(After thinking a little bit more about this, I realized that the issue here is whether or not the speaker has chosen a coordinate system on his or her family. The answer is 1/2 in any fixed choice of coordinates, but if one works intrinsically – as those trained in algebraic geometry are known to do – then the answer is 13/27.)

Wow if “weird macho crap” isn’t bait then I don’t know what is! Some jord here needs to defend it. Might as well be me.

Quite frankly keystrokes are, to use a kinder Pythian term, awesome! Here’s just one lovely thing about them: for anyone like myself who prefers to automate repetitive tasks whenever possible, the ease with which keystrokes go into keyboard macros makes them absolutely adorable. Entering menu selections in macros can be a big pain, button presses an even bigger one. Mice have their place, but in my opinion keyboard shortcuts rank right up there with some of the best UI features of all time, including modern touchscreen ones.

Though I haven’t tried a bunch, my favorite text editors are the commercial ones Epsilon (Windows) and BBEdit (Mac). My favorite keyboard macro app of all time is Winbatch (Windows). Lately I’ve been using Keyboard Maestro (Mac) and have found it to be impressive and reliable.

I don’t like using the mouse because it takes more precision, and I’m clumsy (getting the pointer exactly where I want it is hard for me). Also, I find taking my hands on and off the keyboard takes time, and for some reason interrupts my train of thought.

Macho non-jock

pwsiegel, andeux, and greg beat me to some points I think are very important about the conditional probability puzzle. The most important thing, as with any model, is to know when the calculations would apply, and they covered this well, but let me sum up in my own words: To be sure that P(A|B) is the applicable answer to such a problem, it must be the case that you would have received your current information in *all worlds* and *only those worlds* where B is true. It is the *all worlds* direction of this condition which is often overlooked and which leads to errors, most famously in the Monty Hall problem. It is relevant here also, because we don’t know how the information provided to us was chosen. It is hard to imagine, for instance, that someone would make the statement in the problem and in fact have two daughters both born on Friday. One way to make it unambiguous would be “You ask someone if event B has happened and she says yes. You are sure she is accurate and truthful, would never refuse to answer, *and* would never volunteer additional information beyond Yes/No.” That’s a lot of conditions, but they are all necessary, and the last two especially are easily overlooked.

This is a very relevant topic on mathbabe’s blog because selectively provided information is a major way to manipulate models!

There is not enough room in this space for me to give a complete proof, so here is my last (and first) theorem:

1) We all know that each person is born on one of seven days, and for that matter, one of many times. If adding the information that the daughter was born on a Friday is truly meaningful (or whatever the correct mathematical/statistical term may be), then adding the precise time of birth is also meaningful.

2) Suppose we add that the birth time was 12:01 am on Friday.

3) By the reasoning above, there would be a much larger collection of possible combinations.

4) By arbitrarily revealing more accurate information as to the precise time of birth (e. g. Monday, 2:00000001 pm), the solution becomes arbitrarily close to 1/2.

5) Therefore, the correct probability that the other child is a girl is 1/3.

I liked that probability problem! For a little while it bugged me that if you know one kid is a girl, the probability of both kids being girls is 1/3, while if you know one kid is a girl born on a Friday, then it’s much closer to 1/2 (13/27). I finally got some decent intuition for it by filling in more of your picture:

P(2 girls | 1 girl born on a Friday) = 13 / (13+14)

P(2 girls | 1 girl born on a Friday or Saturday) = (13+11) / (13+11+2*14)

P(2 girls | 1 girl born on a Friday or Saturday or Sunday) = (13+11+9) / (13+11+9+3*14)

…

P(2 girls | 1 girl) = P(2 girls | 1 girl born on Fri, Sat, Sun, Mon, Tue, Wed, or Thu)

= (13+11+9+7+5+3+1) / (13+11+9+7+5+3+1+7*14) = 7*7 / (7*7 + 7*14) = 1/3.
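The whole sequence can be reproduced by brute-force counting. This is a minimal Python sketch, with an invented function name, that treats “one girl” as “at least one girl” and numbers the days 0 to 6 so that the first k days play the role of Friday, Saturday, and so on.

```python
from fractions import Fraction
from itertools import product

# 14 equally likely bins per child; days are numbered 0..6 starting at Friday.
CHILD_BINS = list(product(("girl", "boy"), range(7)))


def two_girls_given_girl_on_first_k_days(k):
    """P(two girls | at least one girl born on one of k chosen weekdays)."""
    def qualifies(child):
        sex, day = child
        return sex == "girl" and day < k

    shaded = [f for f in product(CHILD_BINS, repeat=2) if any(map(qualifies, f))]
    both_girls = [f for f in shaded if all(sex == "girl" for sex, _ in f)]
    return Fraction(len(both_girls), len(shaded))


for k in range(1, 8):
    print(k, two_girls_given_girl_on_first_k_days(k))
```

For k = 1 this gives 13/27, for k = 2 it gives 24/52 (printed in lowest terms as 6/13), and for k = 7 it gives 1/3, matching the sums above.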

In the other direction, as you extend the number of characteristics (probability of having two girls given that you have one girl born on Friday in March with red hair and freckles), the probability goes to 1/2. The best intuition I can come up with for that is that the more characteristics you know, the more likely it is that you’ve pinned down which kid you’re talking about, as the probability that both kids have those characteristics gets lower and lower. In other words, you’ve gotten closer and closer to saying that Kid A is a girl, which reduces the probability of both kids being girls to the probability of Kid B being a girl, which is 1/2.
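One way to see the drift toward 1/2 is to treat the whole bundle of characteristics as a single trait that any given girl has with probability p. The Python sketch below makes the strong assumption that the trait is independent between siblings and of sex (which red hair certainly is not), and the function name is invented for illustration.

```python
from fractions import Fraction


def two_girls_given_girl_with_trait(p):
    """P(two girls | at least one girl with a trait that a girl has with probability p)."""
    p = Fraction(p)
    # P(two girls, at least one of them with the trait) = (1/4) * (1 - (1 - p)**2)
    numerator = Fraction(1, 4) * (1 - (1 - p) ** 2)
    # P(at least one child is a girl with the trait) = 1 - (1 - p/2)**2
    denominator = 1 - (1 - p / 2) ** 2
    return numerator / denominator  # simplifies to (2 - p) / (4 - p)


for p in (1, Fraction(1, 7), Fraction(1, 1000)):
    print(p, two_girls_given_girl_with_trait(p))
```

With p = 1 (no extra information) this is 1/3, with p = 1/7 (a weekday) it is 13/27, and with p = 1/1000 it is 1999/3999, just under 1/2.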

My probabilist undergrad advisor always used to say that our brains weren’t wired right to think about probability!