넓은 형식에서 긴 형식으로 data.frame 재구성
data.frame
넓은 테이블에서 긴 테이블 로 변환하는 데 문제가 있습니다. 현재 다음과 같이 보입니다.
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
이제 이것을 이것을 data.frame
길게 변환하고 싶습니다 data.frame
. 이 같은:
Code Country Year Value
AFG Afghanistan 1950 20,249
AFG Afghanistan 1951 21,352
AFG Afghanistan 1952 22,532
AFG Afghanistan 1953 23,557
AFG Afghanistan 1954 24,555
ALB Albania 1950 8,097
ALB Albania 1951 8,986
ALB Albania 1952 10,058
ALB Albania 1953 11,123
ALB Albania 1954 12,246
나는 일부 사람들이 비슷한 질문에서 제안한 것처럼 melt()
및 reshape()
기능을 살펴보고 이미 사용해 보았습니다 . 그러나 지금까지는 지저분한 결과 만 얻었습니다.
가능한 경우 처리하기가 reshape()
조금 더 좋기 때문에 함수 로 수행하고 싶습니다 .
reshape()
melt
/ 로 익숙해지기까지 시간이 걸립니다 cast
. 다음은 데이터 프레임이 호출되었다고 가정 할 때 모양이 바뀌는 솔루션입니다 d
.
reshape(d, direction = "long", varying = list(names(d)[3:7]), v.names = "Value",
idvar = c("Code","Country"), timevar = "Year", times = 1950:1954)
세 가지 대안 솔루션 :
1 :로 reshape2
library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))
기부:
Code Country variable value
1 AFG Afghanistan 1950 20,249
2 ALB Albania 1950 8,097
3 AFG Afghanistan 1951 21,352
4 ALB Albania 1951 8,986
5 AFG Afghanistan 1952 22,532
6 ALB Albania 1952 10,058
7 AFG Afghanistan 1953 23,557
8 ALB Albania 1953 11,123
9 AFG Afghanistan 1954 24,555
10 ALB Albania 1954 12,246
동일한 결과를 제공하는 몇 가지 대체 표기법 :
# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)
# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))
2 :와 data.table
패키지 melt
에서 와 동일한 기능을 사용할 수 있습니다 reshape2
(확장 및 개선 된 구현). melt
from data.table
에는 또한 melt
-function에서 오는 더 많은 매개 변수가 있습니다 reshape2
. 예를 들어 변수 열의 이름을 지정할 수도 있습니다.
library(data.table)
long <- melt(setDT(wide), id.vars = c("Code","Country"), variable.name = "year")
대체 표기법 :
melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")
3 :와 tidyr
library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))
대체 표기법 :
wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)
제외 할 경우 NA
값을, 당신은 추가 할 수 있습니다 na.rm = TRUE
받는 사람 melt
뿐만 아니라 gather
기능.
데이터의 또 다른 문제점은 값이 R에 의해 문자 값으로 읽히는 것입니다 ( ,
숫자 의 결과 ). 다음 gsub
과 as.numeric
같이 수리 할 수 있습니다 .
long$value <- as.numeric(gsub(",", "", long$value))
또는 직접으로 data.table
또는 dplyr
:
# data.table
long <- melt(setDT(wide),
id.vars = c("Code","Country"),
variable.name = "year")[, value := as.numeric(gsub(",", "", value))]
# tidyr and dplyr
long <- wide %>% gather(year, value, -c(Code,Country)) %>%
mutate(value = as.numeric(gsub(",", "", value)))
데이터:
wide <- read.table(text="Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246", header=TRUE, check.names=FALSE)
사용하여 모양 변경 패키지를 :
#data
x <- read.table(textConnection(
"Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246"), header=TRUE)
library(reshape)
x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))
Since this answer is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack
.
Note, however, that stack
does not work with factor
s--it only works if is.vector
is TRUE
, and from the documentation for is.vector
, we find that:
is.vector
returnsTRUE
if x is a vector of the specified mode having no attributes other than names. It returnsFALSE
otherwise.
I'm using the sample data from @Jaap's answer, where the values in the year columns are factor
s.
Here's the stack
approach:
cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
## Code Country values ind
## 1 AFG Afghanistan 20,249 1950
## 2 ALB Albania 8,097 1950
## 3 AFG Afghanistan 21,352 1951
## 4 ALB Albania 8,986 1951
## 5 AFG Afghanistan 22,532 1952
## 6 ALB Albania 10,058 1952
## 7 AFG Afghanistan 23,557 1953
## 8 ALB Albania 11,123 1953
## 9 AFG Afghanistan 24,555 1954
## 10 ALB Albania 12,246 1954
Here is another example showing the use of gather
from tidyr
. You can select the columns to gather
either by removing them individually (as I do here), or by including the years you want explicitly.
Note that, to handle the commas (and X's added if check.names = FALSE
is not set), I am also using dplyr
's mutate with parse_number
from readr
to convert the text values back to numbers. These are all part of the tidyverse
and so can be loaded together with library(tidyverse)
wide %>%
gather(Year, Value, -Code, -Country) %>%
mutate(Year = parse_number(Year)
, Value = parse_number(Value))
Returns:
Code Country Year Value
1 AFG Afghanistan 1950 20249
2 ALB Albania 1950 8097
3 AFG Afghanistan 1951 21352
4 ALB Albania 1951 8986
5 AFG Afghanistan 1952 22532
6 ALB Albania 1952 10058
7 AFG Afghanistan 1953 23557
8 ALB Albania 1953 11123
9 AFG Afghanistan 1954 24555
10 ALB Albania 1954 12246
Here's a sqldf solution:
sqldf("Select Code, Country, '1950' As Year, `1950` As Value From wide
Union All
Select Code, Country, '1951' As Year, `1951` As Value From wide
Union All
Select Code, Country, '1952' As Year, `1952` As Value From wide
Union All
Select Code, Country, '1953' As Year, `1953` As Value From wide
Union All
Select Code, Country, '1954' As Year, `1954` As Value From wide;")
To make the query without typing in everything, you can use the following:
Thanks to G. Grothendieck for implementing it.
ValCol <- tail(names(wide), -2)
s <- sprintf("Select Code, Country, '%s' As Year, `%s` As Value from wide", ValCol, ValCol)
mquery <- paste(s, collapse = "\n Union All\n")
cat(mquery) #just to show the query
# Select Code, Country, '1950' As Year, `1950` As Value from wide
# Union All
# Select Code, Country, '1951' As Year, `1951` As Value from wide
# Union All
# Select Code, Country, '1952' As Year, `1952` As Value from wide
# Union All
# Select Code, Country, '1953' As Year, `1953` As Value from wide
# Union All
# Select Code, Country, '1954' As Year, `1954` As Value from wide
sqldf(mquery)
# Code Country Year Value
# 1 AFG Afghanistan 1950 20,249
# 2 ALB Albania 1950 8,097
# 3 AFG Afghanistan 1951 21,352
# 4 ALB Albania 1951 8,986
# 5 AFG Afghanistan 1952 22,532
# 6 ALB Albania 1952 10,058
# 7 AFG Afghanistan 1953 23,557
# 8 ALB Albania 1953 11,123
# 9 AFG Afghanistan 1954 24,555
# 10 ALB Albania 1954 12,246
Unfortunately, I don't think that PIVOT
and UNPIVOT
would work for R
SQLite
. If you want to write up your query in a more sophisticated manner, you can also take a look at these posts:
Using sprintf
writing up sql queries Or Pass variables to sqldf
With tidyr_1.0.0
, another option is pivot_longer
library(tidyr)
pivot_longer(df1, -c(Code, Country), values_to = "Value", names_to = "Year")
# A tibble: 10 x 4
# Code Country Year Value
# <fct> <fct> <chr> <fct>
# 1 AFG Afghanistan 1950 20,249
# 2 AFG Afghanistan 1951 21,352
# 3 AFG Afghanistan 1952 22,532
# 4 AFG Afghanistan 1953 23,557
# 5 AFG Afghanistan 1954 24,555
# 6 ALB Albania 1950 8,097
# 7 ALB Albania 1951 8,986
# 8 ALB Albania 1952 10,058
# 9 ALB Albania 1953 11,123
#10 ALB Albania 1954 12,246
data
df1 <- structure(list(Code = structure(1:2, .Label = c("AFG", "ALB"), class = "factor"),
Country = structure(1:2, .Label = c("Afghanistan", "Albania"
), class = "factor"), `1950` = structure(1:2, .Label = c("20,249",
"8,097"), class = "factor"), `1951` = structure(1:2, .Label = c("21,352",
"8,986"), class = "factor"), `1952` = structure(2:1, .Label = c("10,058",
"22,532"), class = "factor"), `1953` = structure(2:1, .Label = c("11,123",
"23,557"), class = "factor"), `1954` = structure(2:1, .Label = c("12,246",
"24,555"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
참고URL : https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format
'IT' 카테고리의 다른 글
Celery에서 대기열에있는 작업 목록 검색 (0) | 2020.06.29 |
---|---|
텍스트 영역 내에서 HTML 렌더링 (0) | 2020.06.29 |
자식 태그를 다시 만든 후 "태그가 이미 원격에 존재합니다"오류 (0) | 2020.06.29 |
반응 형 iframe 만들기 (0) | 2020.06.29 |
Python에서는 argparse를 사용하여 양의 정수만 허용하십시오. (0) | 2020.06.29 |