Introduction

The "babynames" package, created by Hadley Wickham, includes a dataset also named "babynames", which we use for our group project. The data were provided by the Social Security Administration (SSA) and include names from Social Security card applications for births in the United States since 1880.

The dataset has extensive historical coverage: it records the number of babies given each name in the United States, by sex, for every year from 1880 to 2017. It has 1,924,665 rows and 5 columns: year of birth (year), sex of the baby (sex), name (name), raw count of babies given that name in that year (n), and the proportion of babies of that sex born in that year who received that name (prop). To protect privacy, the SSA only lists names given to at least five babies of a given sex in a given year. The dataset is useful for analyzing trends in baby names over time, for tracing shifts in American cultural and social norms, and even for practical purposes such as helping expectant parents choose a name.
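To make the five columns concrete, here is a small Python sketch of one row (Python is used only for illustration; the report's plots are made in R). The class `BabyNameRow` and the helper `make_row` are hypothetical, and defining `prop` as `n` divided by the total number of babies of that sex recorded for that year is an assumption based on our reading of the package documentation.

```python
from typing import NamedTuple


class BabyNameRow(NamedTuple):
    """One row of the babynames table (hypothetical Python mirror of its columns)."""
    year: int    # year of birth, 1880-2017
    sex: str     # "F" or "M"
    name: str    # first name
    n: int       # number of babies of this sex given this name in this year
    prop: float  # share of all babies of this sex born this year who got this name


def make_row(year: int, sex: str, name: str, n: int, total_for_sex: int) -> BabyNameRow:
    """Build a row, deriving prop as n / total_for_sex (assumed definition)."""
    if sex not in ("F", "M"):
        raise ValueError("sex must be 'F' or 'M'")
    if total_for_sex <= 0:
        raise ValueError("total_for_sex must be positive")
    return BabyNameRow(year, sex, name, n, n / total_for_sex)


# Hypothetical toy values for illustration only; they are NOT from the dataset.
row = make_row(2000, "F", "Example", 50, 1000)
print(row.prop)  # 0.05
```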

Data

The original dataset is available as babynames.csv.

It is also available for viewing and download on the TidyTuesday GitHub page.

Question 1: What is the trend in total births and the distribution of sexes over time?

This question explores historical trends in total births and in the distribution of sexes in the United States. It breaks down into two sub-questions: (1) how has the total number of babies born changed over time, and (2) how have the numbers of male and female babies changed, and what is the resulting sex gap? To answer them, we created two graphs. The first shows the total number of babies born each year; the second compares the numbers of male and female babies born and shows the resulting sex gap. The key variables are year, sex, and the number of babies born (n) in each year. We also created a dataset called "gap_data" that holds the difference between the numbers of male and female babies in each year.
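A minimal Python sketch of how a table like gap_data could be derived: sum n by year and sex, then take the difference. The report does not state the sign convention, so "male minus female" here is an assumption, as are the function name `build_gap_data` and the toy rows.

```python
from collections import defaultdict


def build_gap_data(rows):
    """Return {year: male_total - female_total}, ordered by year.

    rows: iterable of (year, sex, name, n) tuples with sex "F" or "M".
    """
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for year, sex, _name, n in rows:
        totals[year][sex] += n  # KeyError if sex is not "F" or "M"
    return {year: t["M"] - t["F"] for year, t in sorted(totals.items())}


# Hypothetical toy rows for illustration only; NOT real counts.
toy_rows = [
    (1930, "F", "Name A", 120),
    (1930, "M", "Name B", 100),
    (1940, "F", "Name A", 110),
    (1940, "M", "Name B", 130),
]
print(build_gap_data(toy_rows))  # {1930: -20, 1940: 20}
```

A positive value then means more male than female babies in that year.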

Approach

For the first sub-question, geom_area is used to visualize how the total number of babies born has changed over time. An area plot suits this question because, with the areas for each sex stacked (geom_area's default position), the top edge at each year equals the total number of babies born that year, and changes in that edge show how the total shifts between points in time. Fill color represents the sex of the baby, so the plot also shows the relative contributions of male and female babies to the total.
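Because geom_area stacks groups by default, the height of the chart at each year is the female total plus the male total. The Python sketch below computes those stacked heights from (year, sex, name, n) rows; the function name `stacked_totals` and the toy data are hypothetical.

```python
from collections import defaultdict


def stacked_totals(rows):
    """Return {year: (female_total, male_total, overall_total)}, ordered by year.

    overall_total is the height of the top edge of a stacked area chart.
    rows: iterable of (year, sex, name, n) tuples with sex "F" or "M".
    """
    slot = {"F": 0, "M": 1}
    by_year = defaultdict(lambda: [0, 0])  # [female, male]
    for year, sex, _name, n in rows:
        by_year[year][slot[sex]] += n  # KeyError if sex is not "F" or "M"
    return {year: (f, m, f + m) for year, (f, m) in sorted(by_year.items())}


# Hypothetical toy rows for illustration only; NOT real counts.
toy_rows = [
    (1950, "F", "Name A", 10),
    (1950, "M", "Name B", 12),
    (1950, "F", "Name C", 3),
    (1960, "M", "Name B", 20),
]
print(stacked_totals(toy_rows))  # {1950: (13, 12, 25), 1960: (0, 20, 20)}
```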

For the second sub-question, geom_line and geom_area are used together to show how the numbers of male and female babies born have changed over time, along with the associated sex gap. The line plot is appropriate because it traces the male and female counts separately, making each trend easy to follow. The area plot shows the sex gap, that is, the difference between the numbers of male and female babies born in a given year. Its color mapping indicates the direction of the gap (positive or negative) and therefore which sex is in the majority, so the plot shows how the gap has changed over time and when it changed direction. In addition, geom_point marks the year in which this shift happened, drawing the viewer's attention to that specific year.
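The year highlighted with geom_point is where the gap changes sign. Here is a Python sketch for locating such years, given a mapping from year to the male-minus-female gap. The function name `find_shift_years`, the sign convention, and the toy values are assumptions; the toy values are not the project's results.

```python
def find_shift_years(gap_by_year):
    """Return the years in which the sign of the gap flips.

    gap_by_year: {year: male_total - female_total}. Years with a gap of
    exactly 0 are skipped, so a flip is reported at the first nonzero year
    whose sign differs from the last nonzero sign seen.
    """
    shifts = []
    prev_sign = None
    for year in sorted(gap_by_year):
        gap = gap_by_year[year]
        sign = (gap > 0) - (gap < 0)  # -1, 0, or 1
        if sign == 0:
            continue
        if prev_sign is not None and sign != prev_sign:
            shifts.append(year)
        prev_sign = sign
    return shifts


# Hypothetical toy gaps for illustration only; NOT real values.
toy_gaps = {1935: -5, 1936: -2, 1937: 3, 1938: 4}
print(find_shift_years(toy_gaps))  # [1937]
```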

Discussion

Our analysis revealed several interesting trends in total births and sex distribution in the United States. The first area plot shows that the total number of babies born has generally increased over time, with a marked rise in the mid-20th century. We speculate that this rise reflects the post-World War II baby boom, together with substantial immigration to the United States from Europe and Asia. Advances in medical technology and public health may also have contributed.

For the second sub-question, the line plot shows that the trends in male and female births are consistent with the first plot. However, the graph as a whole shows a significant shift in the sex gap around 1937, after which the number of male babies born exceeds the number of female babies born. This pattern has continued to the present day, and the area plot shows the gap remaining relatively consistent over time. Possible explanations for the shift include changes in cultural attitudes towards gender roles and in family planning practices; other societal and environmental factors, such as changes in the economy and workforce, may also have played a role. The timing also coincides with the start of the Social Security program: many people born before 1937 never applied for a Social Security card, so counts for earlier years are incomplete and may not reflect the true ratio of boys to girls born.

Overall, our analysis shows how total births and the sex distribution of babies have evolved in the United States over time. These trends are interesting from a historical perspective and also help in understanding broader social and cultural change.

Question 2: What are the most popular baby names, and how do popular movie characters affect baby names?

This question investigates the most popular baby names and how the popularity of characters from well-known movies affects the use of their names in the following years. For the first part of the question, we used the variables name, sex, and n (count). For the second part, we filtered the original dataset to create new datasets for the names Elsa, Ariel, Rocky, and Forrest, and plotted the proportion of babies given each name against year. The variables needed for this part are the name of the movie character (which is also the baby name), the year the baby was born, and the proportion of babies given the character's name.
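Filtering the table down to one name is a simple selection. The Python sketch below keeps one sex per series; whether the project also filtered on sex is not stated, so that choice, the function name `name_series`, and the toy rows are assumptions.

```python
def name_series(rows, name, sex):
    """Return [(year, prop)] for one name and sex, sorted by year.

    rows: iterable of (year, sex, name, n, prop) tuples.
    """
    return sorted(
        (year, prop)
        for year, row_sex, row_name, _n, prop in rows
        if row_name == name and row_sex == sex
    )


# Hypothetical toy rows for illustration only; NOT real counts or proportions.
toy_rows = [
    (2014, "F", "Elsa", 100, 0.001),
    (2013, "F", "Elsa", 50, 0.0005),
    (2013, "M", "Elsa", 5, 0.00002),
    (2013, "F", "Anna", 80, 0.0008),
]
print(name_series(toy_rows, "Elsa", "F"))  # [(2013, 0.0005), (2014, 0.001)]
```

Each resulting list of (year, prop) pairs corresponds to one line in the name plots.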

Approach

The first plot shows the 10 most popular baby names across the whole period. It uses a bar chart of the 10 most frequently used names from 1880 to 2017, color-coded by sex. A bar chart suits this question because it ranks the names clearly.
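The ranking behind the bar chart amounts to summing n over all years and keeping the largest totals. In this Python sketch the grouping key is (name, sex) so each bar can be colored by sex; that grouping, the function name `top_names`, and the toy rows are assumptions rather than the project's exact code.

```python
from collections import Counter


def top_names(rows, k=10):
    """Rank (name, sex) pairs by n summed over all years; return the top k.

    rows: iterable of (year, sex, name, n) tuples.
    Returns a list of ((name, sex), total) pairs, largest total first.
    """
    totals = Counter()
    for _year, sex, name, n in rows:
        totals[(name, sex)] += n
    return totals.most_common(k)


# Hypothetical toy rows for illustration only; NOT real counts.
toy_rows = [
    (1900, "F", "Mary", 50),
    (1901, "F", "Mary", 40),
    (1900, "M", "John", 60),
    (1900, "M", "Zed", 5),
]
print(top_names(toy_rows, k=2))  # [(('Mary', 'F'), 90), (('John', 'M'), 60)]
```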

The following plots analyze four names (Elsa, Ariel, Forrest, and Rocky). In each graph, a line plot shows how the popularity of the name has changed over time, which lets the viewer follow the trend clearly and easily. Each graph also uses geom_point, with a label added using geom_label_repel from the ggrepel package, to mark the release year of the movie that may have influenced the name's popularity. For example, the label "Frozen" on the Elsa graph marks the release year of that movie. These annotations provide context for the trend and help the viewer see why certain years show an increase in the proportion of babies given that name.
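The point and label for each movie sit on the line at the release year, so their y-coordinate is the name's proportion in that year. A Python sketch of computing that position; the function name `release_marker` and the toy proportions are hypothetical, while the 2013 release year of Frozen is general knowledge rather than something stated in this report.

```python
def release_marker(series, release_year, label):
    """Return (year, prop, label) for a point and label on the line at release_year.

    series: [(year, prop)] for one name. Returns None if release_year is
    not in the series, since there is no line value to attach the label to.
    """
    lookup = dict(series)
    if release_year not in lookup:
        return None
    return (release_year, lookup[release_year], label)


# Frozen was released in 2013; the prop values are hypothetical toy numbers.
elsa_series = [(2012, 0.0002), (2013, 0.0003), (2014, 0.0010)]
print(release_marker(elsa_series, 2013, "Frozen"))  # (2013, 0.0003, 'Frozen')
```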

Discussion

The bar chart displays the 10 most commonly used names in the US from 1880 to 2017. Notably, nine of the ten are male names; only one, Mary, is a female name. This is consistent with our earlier analysis of the sex gap, which found more male than female babies for most of the period, although the imbalance may also reflect male names being concentrated in fewer distinct names, so that individual male names accumulate larger totals.

The other four graphs show a clear increase in the proportion of babies given the names of popular movie characters after the corresponding movies were released, and the size of the increase appears to track the popularity of the movie. For instance, the graphs for Elsa, Ariel, and Forrest show a steep rise after the release of Frozen, The Little Mermaid, and Forrest Gump, respectively. The graph for Rocky, on the other hand, shows a relatively smaller increase after the Rocky films were released.
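One way to make "the size of the increase" comparable across names is the ratio of the mean proportion in a few years after release to the mean in a few years before. This metric, the three-year window, the function name `relative_change`, and the toy values are our assumptions, not the measure used in the plots; for a film series such as Rocky, a single release year would have to be chosen.

```python
def relative_change(series, release_year, window=3):
    """Mean prop over the `window` years after release_year divided by the
    mean over the `window` years before it (the release year is excluded).

    series: [(year, prop)] for one name. Returns None if either window has
    no data or the 'before' mean is zero.
    """
    lookup = dict(series)
    before = [lookup[y] for y in range(release_year - window, release_year) if y in lookup]
    after = [lookup[y] for y in range(release_year + 1, release_year + window + 1) if y in lookup]
    if not before or not after:
        return None
    mean_before = sum(before) / len(before)
    if mean_before == 0:
        return None
    return (sum(after) / len(after)) / mean_before


# Hypothetical toy series for illustration only; NOT real proportions.
toy_series = [(2010, 0.001), (2011, 0.001), (2012, 0.001), (2013, 0.002),
              (2014, 0.004), (2015, 0.004), (2016, 0.004)]
print(round(relative_change(toy_series, 2013), 2))  # 4.0
```

A larger ratio indicates a sharper post-release rise relative to the name's earlier baseline.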

One possible reason for this trend is the influence of media on popular culture. Movies and other media often shape the cultural zeitgeist and can influence trends across many areas of society, including baby names. The release of a popular movie with a likable or relatable character can increase the popularity of names associated with that character. However, the strength of this influence likely depends on several factors, including the popularity of the movie, the likability of the character, and the cultural context of the time.