Project by Pedro M. Cruz at the Co-Lab for Data Impact and the Center for Design at Northeastern University. This project was funded by a National Geographic Explorer Grant for “Visualizing the Evolution of Household Diversity in America.”

Contributors
John Wihbey @ Co-Lab for Data Impact, Associate Professor, School of Journalism and Media Innovation, Northeastern University
Kathleen Foley, visual identity and typography
Leah Welch, functional prototyping and ideation
Ryan Morrill, research, ideation, and visual prototyping
Arushi Singh, data frameworks and statistics
Eunice Esomonu, data querying
Yuqing Liu, research and visual prototyping
Anuj Golesar, database setup and statistics
Dishali Sonawane, database setup and statistics

This project uses the Nunito typeface served by Google Fonts and is hosted on GitHub. It is coded in pure JavaScript and p5.js.

The application uses two canvas elements: one that is updated when a year is selected, to render all the families, and a second one that is redrawn according to the mouse position, to render a zoomed-in version of the hovered families. To avoid computing distances from the mouse pointer to every family, a hash map of families keyed by their coordinates is kept in memory, so the families under the pointer can be found just by rounding the mouse coordinates to integers. The chromosomes are drawn using the built-in Catmull-Rom splines in p5.js, then rendered with a thick stroke and rounded joints. The motion of the chromosomes was created by horizontally shifting the middle coordinate of each arm using the Perlin noise function. Ages are mapped to height not linearly but through a square root: making the height of each chromosome directly proportional to the age of the spouses creates a wide variation of heights, which makes it hard to condense so much information in a constrained space.
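The coordinate lookup and the square-root age scale described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the cell size `CELL`, the function names, the `x`/`y` fields on each family, and the `maxAge`/`maxHeight` parameters are all assumptions made for the example, and the sketch rounds down to a grid cell, which may differ from the rounding used in the application.

```javascript
// Hypothetical grid cell size in pixels; the value used by the project is not stated.
const CELL = 4;

// Turn a pixel position into a hash-map key by rounding down to a grid cell.
function cellKey(x, y) {
  return Math.floor(x / CELL) + "," + Math.floor(y / CELL);
}

// Build the lookup once per selected year: cell key -> families drawn in that cell.
function buildIndex(families) {
  const index = new Map();
  for (const family of families) {
    const key = cellKey(family.x, family.y);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(family);
  }
  return index;
}

// Look up the families under the pointer without measuring any distances.
function familiesAt(index, mouseX, mouseY) {
  return index.get(cellKey(mouseX, mouseY)) || [];
}

// Square-root age scale: height grows with sqrt(age), which compresses older ages.
// maxAge and maxHeight are illustrative parameters.
function ageToHeight(age, maxAge, maxHeight) {
  return maxHeight * Math.sqrt(age / maxAge);
}

// Usage with two made-up families.
const index = buildIndex([
  { id: "a", x: 10, y: 10 },
  { id: "b", x: 50, y: 22 },
]);
console.log(familiesAt(index, 11, 9).map((f) => f.id));
console.log(ageToHeight(25, 100, 60));
```

With a square-root scale, a 25-year-old is drawn at half the height of a 100-year-old rather than a quarter, which keeps young and old couples within a narrower band of heights.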

The micro-census data used for this project were obtained from the IPUMS USA database (IPUMS USA, University of Minnesota, www.ipums.org). Nine sampled years were used:

Year	Sample size
2020	1%
2000	1%
1980	5%
1960	100%
1940	100%
1920	100%
1900	100%
1880	100%
1860	100%

The data from IPUMS are anonymized. For each sample, the following variables were extracted from IPUMS in a rectangular format:

Variable	Label
YEAR	Census year
SAMPLE	IPUMS sample identifier
SERIAL	Household serial number
HHWT	Household weight
PERNUM	Person number in sample unit
SPLOC	Spouse's location in household
RELATE	Relationship to household head
SEX	Sex
AGE	Age
RACE	Race
HISPAN	Hispanic origin

Each sample was then inserted into a MongoDB database as a collection of person documents. The database's engine was then used to parse households, extract nuclear families from within those households, detect multi-racial families, and store them as hierarchical objects: each family consists of two parents and has a list of children. The results of the data processing were exported as static JSON files, which are loaded by the browser and used to render the visualization.
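The shape of that processing can be illustrated with a small sketch in plain JavaScript. It is not the MongoDB pipeline used by the project; it is a simplified stand-in that works on an in-memory array of person records. It assumes the IPUMS conventions that SPLOC holds the PERNUM of the spouse (0 when no spouse is present) and that RELATE codes 1 and 3 denote the household head and a child. It only considers the couple formed by the household head, and the `differentRace` rule (parents with different RACE codes) is a hypothetical criterion, since the project's actual definition of a multi-racial family is not given here.

```javascript
// Group flat person records by household serial number (SERIAL).
function groupByHousehold(persons) {
  const households = new Map();
  for (const p of persons) {
    if (!households.has(p.SERIAL)) households.set(p.SERIAL, []);
    households.get(p.SERIAL).push(p);
  }
  return households;
}

// Build one hierarchical family object per household that has a head with a spouse.
// Simplification: couples other than the head's are ignored.
function extractFamilies(persons) {
  const families = [];
  for (const [serial, members] of groupByHousehold(persons)) {
    const head = members.find((p) => p.RELATE === 1);
    if (!head || head.SPLOC === 0) continue; // no head, or no spouse present
    const spouse = members.find((p) => p.PERNUM === head.SPLOC);
    if (!spouse) continue;
    families.push({
      serial,
      weight: head.HHWT,
      parents: [head, spouse],
      children: members.filter((p) => p.RELATE === 3),
      // Hypothetical rule: flag the family when the parents' RACE codes differ.
      differentRace: head.RACE !== spouse.RACE,
    });
  }
  return families;
}

// Made-up records: one couple with a child, and one person living alone.
const sample = [
  { SERIAL: 1, HHWT: 100, PERNUM: 1, SPLOC: 2, RELATE: 1, SEX: 1, AGE: 40, RACE: 1 },
  { SERIAL: 1, HHWT: 100, PERNUM: 2, SPLOC: 1, RELATE: 2, SEX: 2, AGE: 38, RACE: 2 },
  { SERIAL: 1, HHWT: 100, PERNUM: 3, SPLOC: 0, RELATE: 3, SEX: 2, AGE: 9, RACE: 1 },
  { SERIAL: 2, HHWT: 100, PERNUM: 1, SPLOC: 0, RELATE: 1, SEX: 1, AGE: 70, RACE: 1 },
];
const families = extractFamilies(sample);
console.log(JSON.stringify(families, null, 2));
```

The output of `JSON.stringify` mirrors the kind of static JSON file a browser could load: one object per family, with the parents and the list of children nested inside it.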