Fast Learner

20231018

fast learner — Fri, 18 Aug 2023 18:26:32 +0900

1. "Where have you visited in Korea?"
2. "One place I'd like to recommend to you is Gwanghwamun."
3. "The weather today isn't good for tourists, right?"
4. "Do you have any difficulties traveling in Korea?"
5. "Today's weather is typical of mid-summer in Korea."
6. "The movie was 3 hours long, but the plot was so well-structured that I never felt bored."
7. "The reason I didn't feel bored was that my major in university was physics. I already knew much about the story, and it was great to see familiar figures such as Einstein and Heisenberg."
8. "Before the movie was released, people expected it to be action-packed or visually focused because the director, Mr. Madonna, was renowned for films that emphasized visuals. However, after watching it, I realized it wasn't that type of film at all; it felt more like a documentary. It wasn't what I had expected."

230625

fast learner — Sun, 25 Jun 2023 19:36:38 +0900

"Recently, I've been searching for a microphone for language learning."
"Out of curiosity, I checked out the Craigslist app."
"Just an hour ago, someone listed their microphone for free on the Craigslist app."
"Also, he lives just 15 minutes away from my house."
"I texted him asking if the microphone was still available."
"He answered quickly and confirmed that the microphone was still available."
"We arranged to meet near his house so I could pick up the microphone."

230622

fast learner — Sat, 24 Jun 2023 14:19:04 +0900

Sure! Here are the sentences that were correct so far:

I played basketball with my son today.
My son tried to shoot and he succeeded 20 times.
But since it was a hot summer day, we stopped after just 30 minutes.
We practiced basketball near a high school called IT High School.
The basketball court at IT High School is open to the public until 6:00 PM.
We had difficulty finding a good place to practice basketball near our house, but finally we found one.
Is there anything else you would like me to do?

230619

fast learner — Mon, 19 Jun 2023 12:35:32 +0900

Today is Monday, the start of the week.
I didn’t sleep well last night because a mosquito kept biting me.
It was very strange that the mosquito didn’t bite my wife.
It was unusually hot last night for this time of the year.
I don't think global warming was the cause of the high temperature. There must be some other reason.
Population decline poses a big challenge for advanced countries.
AI technology will offset the effects of the declining population, so I don’t think it is a big problem for them. Many people think that way.
I hope this helps you review your progress. Do you want to try another sentence?

0609

fast learner — Fri, 9 Jun 2023 19:22:52 +0900

Sure, I can summarize all the sentences that are correct so that you can review them later. Here they are:

I think AI will possibly replace doctors in the near future because it can diagnose diseases faster and more accurately.
My coworker disagreed with my opinion because he thought AI would not surpass human ability in sectors such as medicine and law.
We had a discussion about whether AI can reason like a human and we disagreed on some points.
Yesterday I read a paper that described how AI can have the ability to reason.

tutor mode

fast learner — Thu, 8 Jun 2023 11:15:59 +0900

I would like to practice my English skills. The method is as follows. First, I speak English in my own way. Then you suggest improvements that make it clear and natural.

20230510

fast learner — Wed, 10 May 2023 22:59:55 +0900

오늘은 아들과 함께 학교 생활에 대해 이야기하는 시간을 보냈습니다.
그는 학교에서 작은 농구 시험에서 1등을 했기 때문에 매우 자랑스러웠습니다.
아들이 농구 시험을 보는 모습을 담은 비디오 클립을 제가 아내와 함께 공유했습니다.
아들을 포함한 세 명의 아이들이 농구를 배우고 있었습니다.
내 아들은 이제 2개월째 농구를 배우고 있습니다. 처음에는 배우기 어려웠지만, 요즘에는 아주 잘하고 계속 배울 수 있습니다.
우리가 함께 농구를 연습한 덕분에, 그가 학교 농구 시험에서 1등을 한 것을 들어서 매우 기뻤습니다.
6월 3일에 ADP 시험을 볼 예정이며, 이는 내 직업 발전에 매우 중요합니다.
ADP 시험은 성공률이 3% 미만으로 매우 어렵습니다.
이 시험은 4시간 동안 4개의 문제로 구성되어 있습니다. 교재를 허용하지만, 난이도는 매우 높습니다.
이 시험을 통과하기 위해서는 모든 유형의 문제를 조직화하고, 각각에 대한 학습 계획을 만들어야 합니다.

Today I spent some time with my son talking about his school life.
He was very proud because he got first place in a small basketball test at his school.
My wife shared a video clip of my son taking a basketball test.
There were three children learning basketball, including my son.
My son has been learning basketball for two months now. At first, he had difficulty learning it, but these days he's been doing very well and is able to continue learning.
I was so pleased to hear that he came in first place in the basketball test because we had practiced basketball together many times.
I'm supposed to take the ADP test on June 3rd, which is an important step for my career development.
The ADP test is very challenging, with a success rate of less than 3%.
The test lasts for four hours and consists of four problems. It is an open-book test, but the difficulty level is still very high.
In order to pass the ADP test, I need to organize every type of problem and create a study plan to address each one.

Handling Missing Values - Filling the Gaps in a Dataset with KNNImputer

fast learner — Tue, 2 May 2023 15:19:20 +0900

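Hello, everyone! Today we'll look at KNNImputer, which helps with handling missing values during data preprocessing. Datasets sometimes contain missing values. One way to deal with them is to use KNNImputer, which fills each missing value with the mean of the values from neighboring rows.

First, let's look at the code below.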

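import numpy as np
import pandas as pd  # required for pd.DataFrame below
from sklearn.impute import KNNImputer

# Create example data with one missing value in each of columns A, B, and C
data = {'A': [1, 2, np.nan, 4, 5],
        'B': [6, np.nan, 8, 9, 10],
        'C': [11, 12, 13, np.nan, 15],
        'D': [16, 17, 18, 19, 20]}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# n_neighbors defaults to 5
imputer = KNNImputer()
# fit_transform() returns a NumPy array, not a DataFrame
df_filled = imputer.fit_transform(df)
print(type(df_filled))
print(df_filled)
# Convert back to a DataFrame, restoring the original column names
df_filled = pd.DataFrame(df_filled, columns=df.columns)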

Result

Original DataFrame:
     A     B     C   D
0  1.0   6.0  11.0  16
1  2.0   NaN  12.0  17
2  NaN   8.0  13.0  18
3  4.0   9.0   NaN  19
4  5.0  10.0  15.0  20
<class 'numpy.ndarray'>
[[ 1.    6.   11.   16.  ]
 [ 2.    8.25 12.   17.  ]
 [ 3.    8.   13.   18.  ]
 [ 4.    9.   12.75 19.  ]
 [ 5.   10.   15.   20.  ]]

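This code performs the following steps:

Import the required libraries.
Create example data that contains missing values.
Create a KNNImputer object. The default is k=5 (the n_neighbors parameter).
Use fit_transform() to replace the missing values; the result is returned as a NumPy array.
Convert the returned NumPy array back into a DataFrame.

Running this code takes the original DataFrame with missing values and produces a new DataFrame in which each missing value is replaced by the mean of its neighbors' values. Missing values can be handled with this simple approach.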
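One detail is easy to miss in an example this small: with only five rows, each missing cell has at most four rows that can donate a value, so the default n_neighbors=5 ends up averaging every available row. The sketch below, which reuses the same example data, checks that the default result matches a plain column-mean fill and then shows how a smaller neighborhood changes the outcome. The variable names (filled_default, mean_filled, filled_k2) are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Same example data as in the post
df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 8, 9, 10],
                   'C': [11, 12, 13, np.nan, 15],
                   'D': [16, 17, 18, 19, 20]})

# Default n_neighbors=5, but each missing cell has only 4 possible donors,
# so all of them are used and the result equals a column-mean fill.
filled_default = KNNImputer().fit_transform(df)
mean_filled = df.fillna(df.mean()).to_numpy()
same_as_mean = np.allclose(filled_default, mean_filled)

# With n_neighbors=2, only the two closest rows contribute to each fill.
filled_k2 = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df),
                         columns=df.columns)

print("default equals column-mean fill:", same_as_mean)
print(filled_k2)
```

With n_neighbors=2, the missing B in row 1 is filled from rows 0 and 2 only, giving 7.0 instead of 8.25. On a real dataset with many rows, n_neighbors is the main setting to tune.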

With KNNImputer, you can easily apply a smooth way of replacing the missing values in a dataset. I hope this post helps with your data preprocessing. See you in the next post!

Drawing a Bar Chart with the pandas value_counts() Function

fast learner — Tue, 2 May 2023 09:24:59 +0900

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('../data_adp_book/student_data.csv')
print(df)
df.info()
# x: the unique grades, y: how many times each grade occurs
plt.bar(df['grade'].value_counts().index, df['grade'].value_counts().values)
print(f"Mean grade: {df['grade'].mean()}")
plt.show()

value_counts() is a method available on pandas Series objects that counts how often each value occurs in the Series. It returns a new Series whose index holds the unique values and whose values are the corresponding counts, sorted by default from the most frequent to the least frequent.

In the code plt.bar(df['grade'].value_counts().index, df['grade'].value_counts().values), df['grade'].value_counts() counts how often each value occurs in the 'grade' column. The result is used to build the bar chart as follows.

df['grade'].value_counts().index: the unique values in the 'grade' column. These are placed along the x-axis of the bar chart.
df['grade'].value_counts().values: the count of each unique value. These go on the y-axis and determine the height of each bar.

So the code plt.bar(df['grade'].value_counts().index, df['grade'].value_counts().values) draws a bar chart based on how often each value occurs in the 'grade' column.
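As a small variation, the counts can be computed once and stored in a variable, which avoids calling value_counts() twice. The sketch below is self-contained and uses a made-up grades Series instead of the CSV file, so the data and the names grades and counts are assumptions for illustration only. The Agg backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to see a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical grade data standing in for df['grade']
grades = pd.Series([3, 1, 2, 3, 3, 2], name='grade')

# Count each grade once; sort_index() orders the bars by grade, not by frequency
counts = grades.value_counts().sort_index()
print(counts)

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)  # x: unique grades, y: their counts
ax.set_xlabel('grade')
ax.set_ylabel('count')
fig.savefig('grade_counts.png')
```

Without sort_index(), the bars would follow the default order of value_counts(), which is by descending count.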

Visualizing Loan Income Data: Comparing the Distribution of Means by Sample Size (feat. seaborn, Central Limit Theorem)

fast learner — Thu, 27 Apr 2023 15:16:11 +0900

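Hello! Today we'll look at how to visualize loan income data with Python. We will draw samples from the given dataset and plot them to compare how the distribution of mean income changes with sample size. The Python code is shown below, with an explanation of each step.

First, import the required libraries.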

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Next, read the loan income data and convert it from a DataFrame to a Series.

loan_income = pd.read_csv('data_practical_statistics/loans_income.csv').squeeze()

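Now create three kinds of data.

Draw 1,000 values at random from the original dataset and store them with the type 'Data'.
Draw 5 values at random from the original dataset, take their mean, repeat this 1,000 times, and store the results with the type 'Mean of 5'.
Draw 20 values at random from the original dataset, take their mean, repeat this 1,000 times, and store the results with the type 'Mean of 20'.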

sample_data = pd.DataFrame({'income': loan_income.sample(1000), 'type': 'Data'})
sample_mean_05 = pd.DataFrame({'income': [loan_income.sample(5).mean() for _ in range(1000)], 'type': 'Mean of 5'})
sample_mean_20 = pd.DataFrame({'income': [loan_income.sample(20).mean() for _ in range(1000)], 'type': 'Mean of 20'})

Then combine the three data types into a single DataFrame.

results = pd.concat([sample_data, sample_mean_05, sample_mean_20])
print(results.head())

Now draw the histograms with seaborn's FacetGrid. One histogram is created for each data type, with income on the x-axis and count on the y-axis. The histogram range is set to 0 to 200,000 and is divided into 40 bins.

g = sns.FacetGrid(results, col='type', col_wrap=1, height=2, aspect=2)
g.map(plt.hist, 'income', range=[0, 200000], bins=40)
g.set_axis_labels('Income', 'Count')
g.set_titles('{col_name}')

FacetGrid is useful when you want to arrange several plots in a grid and compare them.

results: the data to visualize. Here it is the DataFrame named results.
col: the column used to split the data across the grid. Here the 'type' column is used, so each data type ('Data', 'Mean of 5', 'Mean of 20') gets its own plot.
col_wrap: the maximum number of plots per row. It is set to 1 here, so the plots for the data types are stacked vertically.
height: the height of each plot, in inches. Here it is set to 2.
aspect: the width-to-height ratio of each plot. Here it is set to 2, so each plot is twice as wide as it is tall.

Next, g.map() draws a histogram in each cell of the grid.

plt.hist: the function to apply to each cell. Here the hist function from matplotlib.pyplot draws the histograms.
'income': the name of the column to plot. Here it is the 'income' column, which holds the income data.
range: the x-axis range of the histogram. Here the income range is set to 0 to 200,000.
bins: the number of bins in the histogram. Here it is set to 40.

plt.tight_layout()
plt.show()

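The resulting plots show the income distribution of the original data, of the means of samples of size 5, and of the means of samples of size 20. This makes it possible to compare how the distribution of mean income changes as the sample size grows.

In the plots, the 'Mean of 5' and 'Mean of 20' histograms are closer to a normal distribution than the original data. This illustrates the Central Limit Theorem: as the sample size grows, the distribution of the sample mean approaches a normal distribution.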
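The same effect can also be checked numerically rather than by eye. The sketch below is not run on loans_income.csv; it assumes a synthetic right-skewed population (a lognormal, chosen only as a stand-in for income data) and a hypothetical helper named sample_means. It compares the spread of the sample means with the value sigma / sqrt(n) that the theory predicts, and reports skewness as a rough measure of how far each distribution is from a symmetric, normal-like shape.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Assumption: a synthetic, right-skewed population standing in for loan_income
population = pd.Series(rng.lognormal(mean=11, sigma=0.5, size=10_000))


def sample_means(series, n, repeats=1000, seed=0):
    """Draw `repeats` random samples of size n and return their means."""
    return pd.Series([series.sample(n, random_state=seed + i).mean()
                      for i in range(repeats)])


summary = {}
for n in (5, 20):
    means = sample_means(population, n)
    summary[n] = {
        'std_of_means': means.std(),                         # observed spread
        'sigma_over_sqrt_n': population.std() / np.sqrt(n),  # predicted spread
        'skew': means.skew(),                                # 0 for a symmetric shape
    }

print(pd.DataFrame(summary).T)
print('population skew:', population.skew())
```

If the theorem holds for this population, the two spread columns should be close to each other, and the skewness should shrink toward 0 as n goes from 5 to 20.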

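That concludes this walkthrough of visualizing loan income data with Python. With this example, you can plot the distribution of means for different sample sizes and get an intuitive feel for statistical concepts such as the Central Limit Theorem. I hope you will keep using a variety of visualization techniques to make your data analysis more efficient. Thank you.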