101. What do you understand by feature vectors?
102. How do data management procedures such as missing-data handling make selection bias worse?
103. What are the advantages and disadvantages of using regularization methods such as Ridge Regression?
104. What do you understand by long and wide data formats?
105. Give some situations where you would use an SVM over a Random Forest machine learning algorithm, and vice versa.
106. What do you understand by outliers and inliers? What would you do if you found them in your dataset?
107. Write a program in Python that takes the diameter and weight of a coin as input and outputs the money value of the coin.
108. What are the basic assumptions to be made for linear regression?
109. Can you write the formula to calculate R-squared?
110. What is the advantage of performing dimensionality reduction before fitting an SVM?
111. How will you assess the statistical significance of an insight, that is, whether it is a real insight or just due to chance?
112. How would you create a taxonomy to identify key customer trends in unstructured data?
113. How will you find the correlation between a categorical variable and a continuous variable?
114. How many piano tuners are there in Chicago?
115. There is a race track with five lanes. There are 25 horses, of which you want to find the three fastest.
What is the minimum number of races needed to identify the 3 fastest horses of those 25?
116. Estimate the number of French fries sold by McDonald's every day.
117. How many times in a day do a clock's hands overlap?
118. You have two beakers. The first beaker contains 4 litres of water and the second one contains 5 litres of water. How can you pour exactly 7 litres of water into a bucket?
119. A coin is flipped 1000 times and heads shows up 560 times. Do you think the coin is biased?
120. Estimate the number of tennis balls that can fit into a plane.
121. How many haircuts do you think happen in the US every year?
122. In a city where residents prefer only boys, every family continues to have children until a boy is born. If a girl is born, they plan for another child. If a boy is born, they stop. Find the proportion of boys to girls in the city.
123. There are two companies manufacturing electronic chips. Company A manufactures defective chips with a probability of 20% and good-quality chips with a probability of 80%. Company B manufactures defective chips with a probability of 80% and good chips with a probability of 20%. If you get just one electronic chip, what is the probability that it is a good chip?
124. Suppose that you now get a pack of 2 electronic chips coming from the same company, either A or B. When you test the first electronic chip, it appears to be good.
What is the probability that the second electronic chip you received is also good?
125. A dating site allows users to select 6 out of 25 adjectives to describe their likes and preferences. A match is said to be found between two users if they match on at least 5 adjectives. If Brad and Angelina randomly pick adjectives, what is the probability that they will form a match?
126. A coin is tossed 10 times and the results are 2 tails and 8 heads. How will you analyse whether the coin is fair or not? What is the p-value for the same?
127. Continuing the above question: if you have 10 coins and each coin is tossed 10 times (100 tosses in total) with the same result, will you modify your approach to testing the fairness of the coin or continue with the same one?
128. An ant is placed on an infinitely long twig. At each discrete time step, the ant moves one step backward or one step forward with equal probability. Find the probability that the ant will return to the starting point.
129. Explain the central limit theorem.
130. What is the relevance of central limit
