NumPy¶

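The core library for matrix operations
Short for "Numerical Python"
Provides a wide range of functions for large multidimensional array and matrix operations
Uses NumPy's ndarray object, an improvement over the Python list object
Most scientific computing packages exchange data as NumPy array objects
Source: http://taewan.kim/post/numpy_cheat_sheet/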

NumPy arrays¶

Provides high-performance multidimensional arrays along with a variety of functions and tools for working with them

%%html
 
<!-- Adjust the editor font. -->
<style type='text/css'>
.CodeMirror{
    font-size: 14px;
    font-family: consolas;
}
</style>

# import
import numpy as np

# Check the version
np.__version__

'1.16.4'

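Supports multidimensional arrays, as shown in Figure 1
The structure of a NumPy array is described by its "shape"
The shape is a Python tuple that gives the size of the array along each axis
Reference: http://taewan.kim/post/numpy_sum_axis/

Figure 1: NumPy 1-D, 2-D, and 3-D arrays and their axes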

# Print information about a NumPy array

def pprint(arr):
    print("type:{}".format(type(arr)))
    print("shape: {}, dimension: {}, dtype:{}".format(arr.shape, arr.ndim, arr.dtype))
    print("Array's Data:\n", arr)

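Creating arrays¶

A NumPy array is a numpy.ndarray object

Creating NumPy arrays from Python lists¶

np.array creates a NumPy array from a Python list
It takes the list object and, optionally, a data type (dtype)
If dtype is omitted, it is inferred from the element types of the input list

Creating a NumPy array from a 1-D Python list¶

# Create a NumPy array from a 1-D Python list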

arr = [1, 2, 3]
a = np.array(arr)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3,), dimension: 1, dtype:int32
Array's Data:
 [1 2 3]

Creating a NumPy array from a 2-D Python list with an explicit element dtype¶

# Create a NumPy array from a 2-D Python list (a list of tuples) and set the element dtype
arr = [(1,2,3), (4,5,6)]
a= np.array(arr, dtype = float)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:float64
Array's Data:
 [[1. 2. 3.]
 [4. 5. 6.]]

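Creating a NumPy array from a 3-D Python list with an explicit element dtype¶

# Create a NumPy array from a 3-D (nested) Python list and set the element dtype
arr = [
    [[1, 2, 3], [4, 5, 6]],
    [[3, 2, 1], [4, 5, 6]],
]
a = np.array(arr, dtype=float)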
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 2, 3), dimension: 3, dtype:float64
Array's Data:
 [[[1. 2. 3.]
  [4. 5. 6.]]

 [[3. 2. 1.]
  [4. 5. 6.]]]
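The dtype inference mentioned above can be checked directly. This is a minimal sketch; the variable names are illustrative, and the default integer width depends on the platform (int32 in this notebook's outputs, int64 on most Linux and macOS builds).

```python
import numpy as np

# dtype omitted: inferred from the list elements
ints = np.array([1, 2, 3])
# a single float promotes the whole array to float64
floats = np.array([1, 2.5, 3])
# dtype given explicitly: overrides the inferred type
forced = np.array([1, 2, 3], dtype=np.float32)

print(ints.dtype, floats.dtype, forced.dtype)
```

Passing dtype explicitly is the safer choice when the element width matters, because the inferred integer type is not the same on every platform.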

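Creating and initializing arrays¶

Create an array with the desired shape
Initialize every element to a specific value
Functions: zeros, ones, full, eye
zeros_like, ones_like, and full_like create an array with the same shape as the array passed as a parameter

np.zeros function¶

np.zeros(shape, dtype=float, order='C')
Creates an array of the given shape with every element initialized to 0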

a = np.zeros((3,4))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 4), dimension: 2, dtype:float64
Array's Data:
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

np.ones function¶

np.ones(shape, dtype=None, order='C')
Creates an array of the given shape with every element initialized to 1

a = np.ones((2,3,4),dtype=np.int16)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 3, 4), dimension: 3, dtype:int16
Array's Data:
 [[[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]

 [[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]]

np.full function¶

np.full(shape, fill_value, dtype=None, order='C')
Creates an array of the given shape with every element initialized to the given fill_value

a = np.full((2,2),7)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 2), dimension: 2, dtype:int32
Array's Data:
 [[7 7]
 [7 7]]

np.eye function¶

np.eye(N, M=None, k=0, dtype=float)
Creates an (N, N) identity matrix; pass M for an (N, M) array, and k to shift the diagonal of ones

np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

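np.empty function¶

np.empty(shape, dtype=float, order='C')
Creates an array of the given shape
Does not initialize the elements; they hold whatever values were already in memory
Usually the cheapest and fastest way to create an array
Use with care: assign every element before reading from the array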

a = np.empty((4,2))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (4, 2), dimension: 2, dtype:float64
Array's Data:
 [[0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 6.99596955e-321]
 [8.34441742e-308 0.00000000e+000]]

like functions¶

NumPy provides like functions that create an array with the same shape (and, by default, the same dtype) as a given array.
np.zeros_like
np.ones_like
np.full_like
np.empty_like

a = np.array([[1,2,3], [4,5,6]])
b = np.ones_like(a)
pprint(b)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 1 1]
 [1 1 1]]

a = np.array([[1,2,3], [4,5,6]])
b = np.zeros_like(a)
pprint(b)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[0 0 0]
 [0 0 0]]

a = np.array([[1,2,3], [4,5,6]])
b = np.full_like(a, 7)
pprint(b)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[7 7 7]
 [7 7 7]]

a = np.array([[1,2,3], [4,5,6]])
b = np.empty_like(a)
pprint(b)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]]

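Data generation functions¶

Functions that generate data from given conditions and return it as an array
numpy.linspace
numpy.arange
numpy.geomspace

np.linspace function¶

numpy.linspace(start, stop, num=50, endpoint=True, dtype=None)
Generates num evenly spaced values over the range from start to stop and returns them as an array
The spacing is determined by the number of elements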

a = np.linspace(0, 1, 5)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (5,), dimension: 1, dtype:float64
Array's Data:
 [0.   0.25 0.5  0.75 1.  ]

# endpoint: whether to include the stop value
a = np.linspace(0, 1, 5, endpoint=False) 
pprint(a)

type:<class 'numpy.ndarray'>
shape: (5,), dimension: 1, dtype:float64
Array's Data:
 [0.  0.2 0.4 0.6 0.8]

# dtype: sets the element type (values are truncated to integers here)
a = np.linspace(0, 1, 5, dtype=int)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (5,), dimension: 1, dtype:int32
Array's Data:
 [0 0 0 0 1]

# Plot the values generated by linspace
import matplotlib.pyplot as plt
plt.plot(a, 'o')
plt.show()

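np.arange function¶

numpy.arange([start,] stop[, step,], dtype=None)
Generates values from start up to (but not including) stop in increments of step and returns them as an array
The values are evenly spaced within the range
The spacing is set by the step size, not by the number of elements

a = np.arange(0, 10, 2, float)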
pprint(a)

type:<class 'numpy.ndarray'>
shape: (5,), dimension: 1, dtype:float64
Array's Data:
 [0. 2. 4. 6. 8.]

# Plot the values generated by arange
import matplotlib.pyplot as plt
plt.plot(a, 'o')
plt.show()

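np.geomspace function¶

np.geomspace(start, stop, num=50, endpoint=True, dtype=None)
The log-scale counterpart of linspace
Generates num values that are evenly spaced on a log scale (a geometric progression) over the given range and returns them as an array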

a = np.geomspace(0.1, 1, 20, endpoint=True)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (20,), dimension: 1, dtype:float64
Array's Data:
 [0.1        0.11288379 0.1274275  0.14384499 0.16237767 0.18329807
 0.20691381 0.23357215 0.26366509 0.29763514 0.33598183 0.37926902
 0.42813324 0.48329302 0.54555948 0.61584821 0.6951928  0.78475997
 0.88586679 1.        ]

# Plot the values generated by geomspace
import matplotlib.pyplot as plt
plt.plot(a, 'o')
plt.show()
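np.geomspace and np.logspace produce the same kind of sequence and differ only in how the endpoints are given. The sketch below is illustrative (the names g, l, and ratios are not from the original notebook) and checks that "evenly spaced on a log scale" means a constant ratio between neighbors.

```python
import numpy as np

# geomspace takes the endpoints themselves ...
g = np.geomspace(0.1, 1, 20)
# ... while logspace takes their base-10 exponents: 10**-1 = 0.1, 10**0 = 1
l = np.logspace(-1, 0, 20)

# Both describe the same geometric progression
print(np.allclose(g, l))

# Consecutive ratios are constant, unlike linspace where differences are constant
ratios = g[1:] / g[:-1]
print(np.allclose(ratios, ratios[0]))
```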

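Creating arrays from random numbers¶

The numpy.random module generates random numbers and builds arrays from them
np.random.normal
np.random.rand
np.random.randn
np.random.randint
np.random.random

np.random.normal¶

np.random.normal(loc=0.0, scale=1.0, size=None)
Draws samples from a normal (Gaussian) distribution
loc: mean of the distribution
scale: standard deviation
With mean 0 and standard deviation 1, this is equivalent to np.random.randn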

mean = 0
std = 1
a = np.random.normal(mean, std, (2, 3))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:float64
Array's Data:
 [[ 0.92214951  0.2768713   0.28871485]
 [-0.22215228  1.65003785  0.27238484]]

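Random numbers generated by np.random.normal follow a normal distribution
The next example draws 10000 samples from a normal distribution and shows them as a histogram
Splitting the 10000 samples into 30 bins reveals the bell shape of the normal distribution

data = np.random.normal(0, 1, 10000)
import matplotlib.pyplot as plt
plt.hist(data, bins=30) # bins: number of intervals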
plt.show()

np.random.rand¶

Draws samples from a uniform distribution over [0, 1)

data = np.random.rand(10000)
print(data)
import matplotlib.pyplot as plt
plt.hist(data, bins=10)
plt.show()

[0.83136404 0.08291371 0.63280369 ... 0.75685302 0.83744586 0.82761162]

np.random.randn¶

Draws samples from the standard normal distribution (mean 0, standard deviation 1)

a = np.random.randn(2, 4)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 4), dimension: 2, dtype:float64
Array's Data:
 [[ 0.63740099  2.00253445  0.59490857  0.45505599]
 [-0.24464908 -1.89789156  1.18259     0.0472664 ]]

data = np.random.randn(10000)
import matplotlib.pyplot as plt
plt.hist(data, bins=30)
plt.show()

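np.random.randint¶

numpy.random.randint(low, high=None, size=None, dtype='l')
Creates an array of the given size (shape) filled with integers drawn from low (inclusive) to high (exclusive)
If high is omitted, values are drawn from 0 up to (but not including) low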

a = np.random.randint(5, 10, size=(2, 4))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 4), dimension: 2, dtype:int32
Array's Data:
 [[7 5 7 7]
 [5 8 8 8]]

a = np.random.randint(1, size=10)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (10,), dimension: 1, dtype:int32
Array's Data:
 [0 0 0 0 0 0 0 0 0 0]

data = np.random.randint(-100, 100, 10000)
import matplotlib.pyplot as plt
plt.hist(data, bins=10)
plt.show()

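rand vs. random¶

np.random.random is an alias of np.random.random_sample
The two differ in how the shape is passed: rand takes each dimension as a separate argument, while random takes a single shape tuple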

np.random.rand(3,5)

array([[0.81730018, 0.10876448, 0.10992376, 0.19525865, 0.25390175],
       [0.10982458, 0.84511368, 0.61362997, 0.58784936, 0.38169617],
       [0.1416443 , 0.02131784, 0.73151409, 0.94102015, 0.4691242 ]])

np.random.random((3,5))

array([[0.26260145, 0.77520009, 0.5710438 , 0.29422218, 0.23341132],
       [0.61602588, 0.2764158 , 0.45973021, 0.34387439, 0.84149006],
       [0.17352401, 0.72021185, 0.26327834, 0.33899194, 0.86812198]])

Reproducible random numbers¶

Setting np.random.seed makes the sequence of random numbers reproducible

# Using random without a seed
np.random.rand(2,3)

array([[0.26714709, 0.74248027, 0.50818729],
       [0.97813143, 0.68592831, 0.16024763]])

np.random.rand(2,3)

array([[0.71470379, 0.41572334, 0.21219404],
       [0.86153737, 0.26261555, 0.72541432]])

# Using random after setting the seed
np.random.seed(0)

np.random.rand(2,3)

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411]])

np.random.randint(0, 10, (2,3))

array([[4, 7, 6],
       [8, 8, 1]])

np.random.seed(0)

np.random.rand(2,3)

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411]])

np.random.randint(0, 10, (2,3))

array([[4, 7, 6],
       [8, 8, 1]])
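The same idea can be sketched with independent generator objects instead of the global seed. This assumes NumPy 1.17 or later, where np.random.default_rng is available (newer than the 1.16.4 used in this notebook); the variable names are illustrative.

```python
import numpy as np

# Two generators created from the same seed produce identical streams
rng1 = np.random.default_rng(0)
rng2 = np.random.default_rng(0)
x1 = rng1.random((2, 3))
x2 = rng2.random((2, 3))
print(np.array_equal(x1, x2))

# The legacy global seed behaves the same way: reseeding restarts the sequence
np.random.seed(0)
y1 = np.random.rand(2, 3)
np.random.seed(0)
y2 = np.random.rand(2, 3)
print(np.array_equal(y1, y2))
```

A generator object keeps its state to itself, so seeding one part of a program does not change the numbers drawn elsewhere.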

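Data types¶

Set with the dtype argument when creating an array
np.int64: 64-bit integer
np.float32: 32-bit floating point
np.complex128: complex number (two 64-bit floats)
np.bool_: Boolean (True, False)
np.object_: Python object
np.bytes_: byte string (np.string_ in older NumPy versions)
np.str_: Unicode string (np.unicode_ in older NumPy versions)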

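Inspecting arrays¶

Item | How to check | Example | Result
Array shape | np.ndarray.shape attribute | arr.shape | (5, 2, 3)
Array length | length along the first axis | len(arr) | 5
Number of dimensions | np.ndarray.ndim attribute | arr.ndim | 3
Number of elements | np.ndarray.size attribute | arr.size | 30
Array dtype | np.ndarray.dtype attribute | arr.dtype | dtype('float64')
Array dtype name | np.ndarray.dtype.name attribute | arr.dtype.name | float64
Array dtype conversion | np.ndarray.astype method | arr.astype(int) | new array with the converted dtype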

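# Create a demo array
arr = np.random.random((5,2,3))

# Check the object type
type(arr)

numpy.ndarray

# Check the shape of the array
arr.shape

(5, 2, 3)

# Length of the array (size of the first axis)
len(arr)

5

# Number of dimensions
arr.ndim

3

# Number of elements: shape (k, m, n) ==> k*m*n
arr.size

30

# Check the element dtype
arr.dtype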

dtype('float64')

# Convert the elements to int
# astype returns a new array; arr itself is not modified
# Fractional parts are truncated, so values in [0, 1) all become 0
arr.astype(int)

array([[[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]]])

# arr was never modified by the astype call above, so converting it to float simply returns its original values
arr.astype(float)

array([[[0.79172504, 0.52889492, 0.56804456],
        [0.92559664, 0.07103606, 0.0871293 ]],

       [[0.0202184 , 0.83261985, 0.77815675],
        [0.87001215, 0.97861834, 0.79915856]],

       [[0.46147936, 0.78052918, 0.11827443],
        [0.63992102, 0.14335329, 0.94466892]],

       [[0.52184832, 0.41466194, 0.26455561],
        [0.77423369, 0.45615033, 0.56843395]],

       [[0.0187898 , 0.6176355 , 0.61209572],
        [0.616934  , 0.94374808, 0.6818203 ]]])
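astype returns a new array and leaves the original untouched, and a float-to-int conversion discards the fractional part. The following is a small sketch with illustrative values, not data from the notebook.

```python
import numpy as np

x = np.array([0.7, 1.9, -1.5])
as_int = x.astype(int)        # new array; fractional parts are truncated toward zero
back = as_int.astype(float)   # converting back does not restore the fractions

print(as_int.tolist())   # [0, 1, -1]
print(back.tolist())     # [0.0, 1.0, -1.0]
print(x.tolist())        # [0.7, 1.9, -1.5] (the original is unchanged)
```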

Help¶

Use the np.info function to look up the documentation for any NumPy API

np.info(np.ndarray.dtype)

Data-type of the array's elements.

Parameters
----------
None

Returns
-------
d : numpy dtype object

See Also
--------
numpy.dtype

Examples
--------
>>> x
array([[0, 1],
       [2, 3]])
>>> x.dtype
dtype('int32')
>>> type(x.dtype)
<type 'numpy.dtype'>

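Array operations¶

Basic array operations¶

Arithmetic operations¶

NumPy overloads the basic arithmetic operators so that they work element by element on arrays

# Use arange to create an array that increases by 1 from 1 up to (but not including) 10
# Reshape the array to (3, 3)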
a = np.arange(1, 10).reshape(3, 3)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

# Use arange to create an array that decreases by 1 from 9 down to 1 (the stop value 0 is excluded)
# Reshape the array to (3, 3)
b = np.arange(9, 0, -1).reshape(3, 3)
pprint(b)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[9 8 7]
 [6 5 4]
 [3 2 1]]

# Array operation: subtraction, -

a - b

array([[-8, -6, -4],
       [-2,  0,  2],
       [ 4,  6,  8]])

np.subtract(a, b)

array([[-8, -6, -4],
       [-2,  0,  2],
       [ 4,  6,  8]])

# Array operation: addition, +
    
a + b

array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])

np.add(a, b)

array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])

# Array operation: element-wise multiplication, *
    
a * b

array([[ 9, 16, 21],
       [24, 25, 24],
       [21, 16,  9]])

np.multiply(a, b)

array([[ 9, 16, 21],
       [24, 25, 24],
       [21, 16,  9]])

# Array operation: division, /
    
a / b

array([[0.11111111, 0.25      , 0.42857143],
       [0.66666667, 1.        , 1.5       ],
       [2.33333333, 4.        , 9.        ]])

np.divide(a, b)

array([[0.11111111, 0.25      , 0.42857143],
       [0.66666667, 1.        , 1.5       ],
       [2.33333333, 4.        , 9.        ]])

# Array operation: exponential (e**x)

np.exp(a)

array([[2.71828183e+00, 7.38905610e+00, 2.00855369e+01],
       [5.45981500e+01, 1.48413159e+02, 4.03428793e+02],
       [1.09663316e+03, 2.98095799e+03, 8.10308393e+03]])

# Array operation: square root

np.sqrt(a)

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974],
       [2.64575131, 2.82842712, 3.        ]])

# Array operation: sin

np.sin(a)

array([[ 0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ],
       [ 0.6569866 ,  0.98935825,  0.41211849]])

# Array operation: cos

np.cos(a)

array([[ 0.54030231, -0.41614684, -0.9899925 ],
       [-0.65364362,  0.28366219,  0.96017029],
       [ 0.75390225, -0.14550003, -0.91113026]])

# Array operation: tan

np.tan(a)

array([[ 1.55740772, -2.18503986, -0.14254654],
       [ 1.15782128, -3.38051501, -0.29100619],
       [ 0.87144798, -6.79971146, -0.45231566]])

# Array operation: natural logarithm

np.log(a)

array([[0.        , 0.69314718, 1.09861229],
       [1.38629436, 1.60943791, 1.79175947],
       [1.94591015, 2.07944154, 2.19722458]])

# Array operation: dot product (for 2-D arrays, np.dot is matrix multiplication)

np.dot(a, b)

array([[ 30,  24,  18],
       [ 84,  69,  54],
       [138, 114,  90]])
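The difference between the element-wise product and the dot product is easy to miss, so here is a minimal sketch that rebuilds the same a and b and compares the two; the names elementwise, matrix, and same are illustrative.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
b = np.arange(9, 0, -1).reshape(3, 3)

elementwise = a * b      # multiplies matching elements
matrix = np.dot(a, b)    # row-by-column matrix product
same = a @ b             # the @ operator gives the same matrix product for 2-D arrays

print(np.array_equal(matrix, same))
print(matrix[0, 0], elementwise[0, 0])   # 30 9
```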

Comparison operations¶

Element-wise comparison¶

The basic comparison operators compare arrays element by element

a == b

array([[False, False, False],
       [False,  True, False],
       [False, False, False]])

a > b

array([[False, False, False],
       [False, False,  True],
       [ True,  True,  True]])

Array-wise comparison¶

Use np.array_equal to compare two arrays as a whole

np.array_equal(a, b)

False

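Aggregate functions¶

NumPy aggregate functions compute their result along an axis
If no axis is given, axis=None is used
axis=None
- axis=None treats the whole array as a single flat sequence, so the aggregate covers every element.

axis=0
- axis=0 aggregates along the rows: elements with the same column index in each row form a group.
- Each group is the range of the aggregate, so a 2-D array yields one result per column.

axis=1
- axis=1 aggregates along the columns: the elements of each row form a group.
- Each group is the range of the aggregate, so a 2-D array yields one result per row.

For axis behavior on 3-D arrays, see "Understanding axis in np.sum in NumPy": http://taewan.kim/post/numpy_sum_axis/

# Use arange to create an array that increases by 1 from 1 up to (but not including) 10
# Reshape the array to (3, 3)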
a = np.arange(1, 10).reshape(3, 3)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

[ndarray object].sum(), np.sum(): sum¶

Returns the sum of the elements along the given axis

a.sum(), np.sum(a)

(45, 45)

a.sum(axis=0), np.sum(a, axis=0)

(array([12, 15, 18]), array([12, 15, 18]))

a.sum(axis=1), np.sum(a, axis=1)

(array([ 6, 15, 24]), array([ 6, 15, 24]))

[ndarray object].min(), np.min(): minimum¶

Returns the minimum of the elements along the given axis

a.min(), np.min(a)

(1, 1)

a.min(axis=0), np.min(a, axis=0)

(array([1, 2, 3]), array([1, 2, 3]))

a.min(axis=1), np.min(a, axis=1)

(array([1, 4, 7]), array([1, 4, 7]))

[ndarray object].max(), np.max(): maximum¶

Returns the maximum of the elements along the given axis

a.max(), np.max(a)

(9, 9)

a.max(axis=0), np.max(a, axis=0)

(array([7, 8, 9]), array([7, 8, 9]))

a.max(axis=1), np.max(a, axis=1)

(array([3, 6, 9]), array([3, 6, 9]))

[ndarray object].cumsum(), np.cumsum(): cumulative sum¶

Returns the cumulative sum of the elements along the given axis

a.cumsum(), np.cumsum(a)

(array([ 1,  3,  6, 10, 15, 21, 28, 36, 45], dtype=int32),
 array([ 1,  3,  6, 10, 15, 21, 28, 36, 45], dtype=int32))

a.cumsum(axis=0), np.cumsum(a, axis=0)

(array([[ 1,  2,  3],
        [ 5,  7,  9],
        [12, 15, 18]], dtype=int32), array([[ 1,  2,  3],
        [ 5,  7,  9],
        [12, 15, 18]], dtype=int32))

a.cumsum(axis=1), np.cumsum(a, axis=1)

(array([[ 1,  3,  6],
        [ 4,  9, 15],
        [ 7, 15, 24]], dtype=int32), array([[ 1,  3,  6],
        [ 4,  9, 15],
        [ 7, 15, 24]], dtype=int32))

[ndarray object].mean(), np.mean(): mean¶

Returns the mean of the elements along the given axis

a.mean(), np.mean(a)

(5.0, 5.0)

a.mean(axis=0), np.mean(a, axis=0)

(array([4., 5., 6.]), array([4., 5., 6.]))

a.mean(axis=1), np.mean(a, axis=1)

(array([2., 5., 8.]), array([2., 5., 8.]))

np.median(): median¶

Returns the median of the elements along the given axis

np.median(a)

5.0

np.median(a, axis=0)

array([4., 5., 6.])

np.median(a, axis=1)

array([2., 5., 8.])

np.corrcoef(): correlation coefficient¶

np.corrcoef(a)

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
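np.corrcoef treats each row as one variable, and every row of a is a shifted copy of the others, which is why all coefficients above are 1. The sketch below adds a hypothetical pair x and y (not from the notebook) that moves in opposite directions, to show a negative coefficient.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
c = np.corrcoef(a)                   # rows differ only by a constant offset
print(np.allclose(c, 1.0))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([8.0, 6.0, 4.0, 2.0])   # falls linearly as x rises
r = np.corrcoef(x, y)                # 2 x 2 matrix; the off-diagonal entry is corr(x, y)
print(r[0, 1])
```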

[ndarray object].std(), np.std(): standard deviation¶

Computes the standard deviation of the elements along the given axis

a.std(), np.std(a)

(2.581988897471611, 2.581988897471611)

a.std(axis=0), np.std(a, axis=0)

(array([2.44948974, 2.44948974, 2.44948974]),
 array([2.44948974, 2.44948974, 2.44948974]))

a.std(axis=1), np.std(a, axis=1)

(array([0.81649658, 0.81649658, 0.81649658]),
 array([0.81649658, 0.81649658, 0.81649658]))
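A compact way to reason about axis in any of these aggregates is to track shapes: the axis that is passed is the one that disappears from the result. The 3-D array below is an illustrative example, not data from the notebook.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# The axis passed to the aggregate is removed from the shape
print(x.sum(axis=0).shape)   # (3, 4)
print(x.sum(axis=1).shape)   # (2, 4)
print(x.sum(axis=2).shape)   # (2, 3)

# keepdims=True keeps the reduced axis with length 1
print(x.sum(axis=1, keepdims=True).shape)   # (2, 1, 4)

# axis=None (the default) reduces over every element
print(x.sum())   # 276
```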

Broadcasting¶

A binary operation on two arrays with the same shape is applied element by element. When the shapes differ, NumPy first matches them through the broadcasting process shown in Figure 2.

Figure 2: How broadcasting works

Source for Figure 2: https://mathematica.stackexchange.com/questions/99171/how-to-implement-the-general-array-broadcasting-method-from-numpy

# Create demo arrays
a = np.arange(1, 25).reshape(4, 6)
pprint(a)
b = np.arange(25, 49).reshape(4, 6)
pprint(b)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]
type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[25 26 27 28 29 30]
 [31 32 33 34 35 36]
 [37 38 39 40 41 42]
 [43 44 45 46 47 48]]

a+b

array([[26, 28, 30, 32, 34, 36],
       [38, 40, 42, 44, 46, 48],
       [50, 52, 54, 56, 58, 60],
       [62, 64, 66, 68, 70, 72]])

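Operations on two arrays with different shapes¶

Broadcasting takes place in a binary operation between two arrays with different shapes
The arrays are stretched to a common shape before the operation is applied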

a+100

array([[101, 102, 103, 104, 105, 106],
       [107, 108, 109, 110, 111, 112],
       [113, 114, 115, 116, 117, 118],
       [119, 120, 121, 122, 123, 124]])

Conceptually, a + 100 is processed in the following steps.¶

# step 1: expand the scalar into an array
new_arr = np.full_like(a, 100)
pprint(new_arr)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[100 100 100 100 100 100]
 [100 100 100 100 100 100]
 [100 100 100 100 100 100]
 [100 100 100 100 100 100]]

# step 2: binary operation on the arrays
a+new_arr

array([[101, 102, 103, 104, 105, 106],
       [107, 108, 109, 110, 111, 112],
       [113, 114, 115, 116, 117, 118],
       [119, 120, 121, 122, 123, 124]])

Case 2: Operations on arrays with different shapes¶

# Create demo arrays
a = np.arange(5).reshape((1, 5))
pprint(a)
b = np.arange(5).reshape((5, 1))
pprint(b)

type:<class 'numpy.ndarray'>
shape: (1, 5), dimension: 2, dtype:int32
Array's Data:
 [[0 1 2 3 4]]
type:<class 'numpy.ndarray'>
shape: (5, 1), dimension: 2, dtype:int32
Array's Data:
 [[0]
 [1]
 [2]
 [3]
 [4]]

a + b

array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
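The stretching step in Case 2 can be made explicit with np.broadcast_to. This is a sketch of the idea rather than of what NumPy does internally; a_full and b_full are illustrative names.

```python
import numpy as np

a = np.arange(5).reshape((1, 5))
b = np.arange(5).reshape((5, 1))

# Stretch both operands to the common shape (5, 5) explicitly
a_full = np.broadcast_to(a, (5, 5))   # the single row repeated 5 times
b_full = np.broadcast_to(b, (5, 5))   # the single column repeated 5 times
print(np.array_equal(a + b, a_full + b_full))

# Shapes are compatible only when each dimension matches or is 1
try:
    np.arange(3) + np.arange(4)       # (3,) vs (4,): neither is 1
except ValueError as err:
    print("cannot broadcast:", err)
```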

Vectorized operations¶

Comparing the speed of a Python loop with a vectorized NumPy call

import numpy as np
# sample array
a = np.arange(10000000)

result = 0

%%time
for v in a:
  result += v

f:\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: RuntimeWarning: overflow encountered in long_scalars

Wall time: 2.36 s

result

266447232

%%time 
result = np.sum(a)

Wall time: 4 ms

result

-2014260032
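Neither result above is the true sum: the array holds 32-bit integers (dtype int32 in this notebook's outputs), and the running total overflows, as the RuntimeWarning reports. A minimal sketch, assuming 64-bit elements are acceptable, that checks the vectorized sum against the closed form n(n-1)/2:

```python
import numpy as np

n = 10000000
a = np.arange(n, dtype=np.int64)   # 64-bit elements avoid the overflow

total = np.sum(a)
expected = n * (n - 1) // 2        # closed form for 0 + 1 + ... + (n - 1)

print(total, expected)             # both 49999995000000
```

The speed comparison itself is unaffected: the vectorized call is still far faster than the Python loop, but only a wide enough dtype makes the answer correct.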

Copying arrays¶

[ndarray object].copy(), np.copy()¶

# Demo array
a = np.random.randint(0, 9, (3, 3))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[0 5 1]
 [8 4 7]
 [7 1 7]]

copied_a1 =np.copy(a)
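What a copy buys you shows up only after the original changes. The sketch below uses fixed values instead of random ones so the result is predictable; alias and copied_a2 are illustrative names.

```python
import numpy as np

a = np.array([[0, 5, 1], [8, 4, 7], [7, 1, 7]])

alias = a                # plain assignment: another name for the same array
copied_a1 = np.copy(a)   # function form
copied_a2 = a.copy()     # method form

a[0, 0] = 100
print(alias[0, 0])       # 100: the alias sees the change
print(copied_a1[0, 0])   # 0: the copies keep the old value
print(copied_a2[0, 0])   # 0
```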

Sorting arrays¶

ndarray objects provide a sort method that sorts the elements in place along an axis (the last axis by default)

# Create an array
unsorted_arr = np.random.random((3, 3))
pprint(unsorted_arr)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:float64
Array's Data:
 [[0.79555181 0.11040651 0.49927913]
 [0.55927789 0.85683627 0.17198536]
 [0.47712614 0.87329614 0.78931464]]

# Copy the array for each demo
unsorted_arr1 = unsorted_arr.copy()
unsorted_arr2 = unsorted_arr.copy()
unsorted_arr3 = unsorted_arr.copy()

# Sort (default: along the last axis)
unsorted_arr1.sort()
pprint(unsorted_arr1)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:float64
Array's Data:
 [[0.11040651 0.49927913 0.79555181]
 [0.17198536 0.55927789 0.85683627]
 [0.47712614 0.78931464 0.87329614]]

# Sort along axis=0
unsorted_arr2.sort(axis=0)
pprint(unsorted_arr2)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:float64
Array's Data:
 [[0.47712614 0.11040651 0.17198536]
 [0.55927789 0.85683627 0.49927913]
 [0.79555181 0.87329614 0.78931464]]

# Sort along axis=1
unsorted_arr3.sort(axis=1)
pprint(unsorted_arr3)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:float64
Array's Data:
 [[0.11040651 0.49927913 0.79555181]
 [0.17198536 0.55927789 0.85683627]
 [0.47712614 0.78931464 0.87329614]]
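The sort method above works in place, which is why the notebook copies the array first. As a hedged alternative, np.sort returns a sorted copy and np.argsort returns the ordering itself; the small integer array here is illustrative.

```python
import numpy as np

x = np.array([[3, 1, 2], [9, 7, 8]])

sorted_copy = np.sort(x, axis=1)           # returns a sorted copy; x is untouched
order = np.argsort(x, axis=1)              # indices that would sort each row
descending = np.sort(x, axis=1)[:, ::-1]   # reverse each sorted row

print(sorted_copy.tolist())   # [[1, 2, 3], [7, 8, 9]]
print(order.tolist())         # [[1, 2, 0], [1, 2, 0]]
print(descending.tolist())    # [[3, 2, 1], [9, 8, 7]]
print(x.tolist())             # [[3, 1, 2], [9, 7, 8]]
```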

Subsetting, slicing, and indexing¶

Selecting elements¶

Each element of an array is referenced by one index per axis
A 1-D array takes 1 index, a 2-D array 2 indices, and a 3-D array 3 indices
An element referenced by index can be both read and modified

# Create demo arrays
a0 = np.arange(24) # 1-D array
pprint(a0)
a1 = np.arange(24).reshape((4, 6)) # 2-D array
pprint(a1)
a2 = np.arange(24).reshape((2, 4, 3)) # 3-D array
pprint(a2)

type:<class 'numpy.ndarray'>
shape: (24,), dimension: 1, dtype:int32
Array's Data:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
type:<class 'numpy.ndarray'>
shape: (2, 4, 3), dimension: 3, dtype:int32
Array's Data:
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]

Reading and updating elements of a 1-D array¶

a0[5] # read the element at index 5

5

# Update the element at index 5
a0[5] = 1000000

pprint(a0)

type:<class 'numpy.ndarray'>
shape: (24,), dimension: 1, dtype:int32
Array's Data:
 [      0       1       2       3       4 1000000       6       7       8
       9      10      11      12      13      14      15      16      17
      18      19      20      21      22      23]

Reading and updating elements of a 2-D array¶

pprint(a1)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]

# Read the element in the first row, second column
a1[0, 1]

1

# Update the element in the first row, second column
a1[0, 1]=10000

pprint(a1)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[    0 10000     2     3     4     5]
 [    6     7     8     9    10    11]
 [   12    13    14    15    16    17]
 [   18    19    20    21    22    23]]

Reading and updating elements of a 3-D array¶

pprint(a2)

type:<class 'numpy.ndarray'>
shape: (2, 4, 3), dimension: 3, dtype:int32
Array's Data:
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]

# Read the element in the second block (axis 0), first row (axis 1), second column (axis 2)
a2[1, 0, 1]

13

a2[1, 0, 1]=10000

pprint(a2)

type:<class 'numpy.ndarray'>
shape: (2, 4, 3), dimension: 3, dtype:int32
Array's Data:
 [[[    0     1     2]
  [    3     4     5]
  [    6     7     8]
  [    9    10    11]]

 [[   12 10000    14]
  [   15    16    17]
  [   18    19    20]
  [   21    22    23]]]

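Slicing¶

Use slicing to reference several array elements at once
A slice specifies a range for each axis
[from_index:to_index]
from_index is the start of the range and to_index is the end
to_index is not included in the result
from_index may be omitted, in which case it defaults to 0
to_index may also be omitted, in which case the slice runs through the last element
[ : ] selects the whole range
Negative values for from_index and to_index count from the end
-1 refers to the last index
A slice is a view of the original array
Updating an element of the slice updates the original as well

# Create a demo array
a1 = np.arange(1, 25).reshape((4, 6)) # 2-D array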
pprint(a1)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

Getting the middle elements

a1[1:3, 1:5]

array([[ 8,  9, 10, 11],
       [14, 15, 16, 17]])

Setting the range with negative indices¶

a1[1:-1, 1:-1]

array([[ 8,  9, 10, 11],
       [14, 15, 16, 17]])

# Slice of the array
slide_arr = a1[1:3, 1:5]
pprint(slide_arr)

type:<class 'numpy.ndarray'>
shape: (2, 4), dimension: 2, dtype:int32
Array's Data:
 [[ 8  9 10 11]
 [14 15 16 17]]

# Slice the slice again to reference 4 elements
slide_arr2 = slide_arr[:, 1:3]
pprint(slide_arr2)

type:<class 'numpy.ndarray'>
shape: (2, 2), dimension: 2, dtype:int32
Array's Data:
 [[ 9 10]
 [15 16]]

# Update the 4 elements referenced through the slice, then inspect the sliced array
slide_arr[:, 1:3]=99999
pprint(slide_arr)

type:<class 'numpy.ndarray'>
shape: (2, 4), dimension: 2, dtype:int32
Array's Data:
 [[    8 99999 99999    11]
 [   14 99999 99999    17]]

pprint(a1)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[    1     2     3     4     5     6]
 [    7     8 99999 99999    11    12]
 [   13    14 99999 99999    17    18]
 [   19    20    21    22    23    24]]
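Because a slice is a view, the update above reached a1 as well. When that is not wanted, an explicit copy detaches the data. A minimal sketch using np.shares_memory to tell the two apart (view and detached are illustrative names):

```python
import numpy as np

a1 = np.arange(1, 25).reshape((4, 6))

view = a1[1:3, 1:5]              # basic slicing: a view of a1
detached = a1[1:3, 1:5].copy()   # explicit copy: independent data

print(np.shares_memory(a1, view))       # True
print(np.shares_memory(a1, detached))   # False

detached[:] = 0                  # does not touch a1
print(a1[1, 1])                  # 8
```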

Boolean indexing¶

Selects elements by marking each one as True or False
Only the elements whose mask value is True are returned

# Create a demo array
a1 = np.arange(1, 25).reshape((4, 6)) # 2-D array
pprint(a1)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

# Find the even elements
# The comparison is broadcast across the array, giving a Boolean mask of the even elements
even_arr = a1%2==0
pprint(even_arr)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:bool
Array's Data:
 [[False  True False  True False  True]
 [False  True False  True False  True]
 [False  True False  True False  True]
 [False  True False  True False  True]]

# Same as a1[a1%2==0]
a1[even_arr]

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24])

np.sum(a1)

300
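Masks can be combined and can also appear on the left-hand side of an assignment. The following sketch extends the even-number example; mask and b are illustrative names.

```python
import numpy as np

a1 = np.arange(1, 25).reshape((4, 6))

# Combine conditions with & (and), | (or), ~ (not); each condition needs parentheses
mask = (a1 % 2 == 0) & (a1 > 10)
print(a1[mask].tolist())   # [12, 14, 16, 18, 20, 22, 24]

# A mask on the left-hand side updates only the selected elements
b = a1.copy()
b[b % 2 == 1] = 0          # zero out the odd values
print(b.sum())             # 156, the sum of the even values
```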

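Applying Boolean indexing¶

Dataset: search Google for seattle2014.csv
2014 Seattle precipitation data
Question: what was the average daily precipitation in Seattle in January 2014?

# Load the data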
import pandas as pd
rains_in_seattle = pd.read_csv("Seattle2014.csv")
rains_arr = rains_in_seattle['PRCP'].values
print("Data Size:", len(rains_arr))

Data Size: 365

# Array of day indices (0 = January 1)
days_arr = np.arange(0, 365)

days_arr

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
       182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
       195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
       208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220,
       221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233,
       234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246,
       247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259,
       260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272,
       273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285,
       286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298,
       299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311,
       312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324,
       325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337,
       338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350,
       351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363,
       364])

# Boolean index for the days in January
condition_jan = days_arr < 31
condition_jan

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False])

# Look at the first 40 days
condition_jan[:40]

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False, False, False])

# Extract the January precipitation
rains_jan = rains_arr[condition_jan]
rains_jan

array([  0,  41,  15,   0,   0,   3, 122,  97,  58,  43, 213,  15,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   5,   0,   0,   0,   0,
         0,  89, 216,   0,  23], dtype=int64)

# Number of precipitation values (January: 31 days)
len(rains_jan)

31

# Total precipitation in January
np.sum(rains_jan)

940

# Mean daily precipitation in January
np.mean(rains_jan)

30.322580645161292

Fancy indexing¶

References elements by passing arrays of indices to the array

arr = np.arange(1, 25).reshape((4, 6))
pprint(arr)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

[arr[0,0], arr[1, 1], arr[2, 2], arr[3, 3]]

[1, 8, 15, 22]

# Passing two index arrays selects (0, 0), (1, 1), (2, 2), (3, 3)
arr[[0, 1, 2, 3], [0, 1, 2, 3]]

array([ 1,  8, 15, 22])

# Columns 1 and 2 for every row
arr[:, [1, 2]]

array([[ 2,  3],
       [ 8,  9],
       [14, 15],
       [20, 21]])
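Unlike slicing, reading through a fancy index produces a copy, while assigning through one still writes into the original. A short sketch with the same arr (picked is an illustrative name):

```python
import numpy as np

arr = np.arange(1, 25).reshape((4, 6))

picked = arr[:, [1, 2]]       # fancy indexing: the result is a copy
picked[0, 0] = -1
print(arr[0, 1])              # 2: the original is unchanged

# Assigning through a fancy index does modify the original
arr[[0, 3], [0, 5]] = 0       # sets arr[0, 0] and arr[3, 5]
print(arr[0, 0], arr[3, 5])   # 0 0
```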

Transforming arrays¶

Transpose¶

Transposing swaps the row and column indices of a matrix
Use the T attribute of a numpy.ndarray object

$$ \begin{bmatrix}1 & 2 \end{bmatrix}^T = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$

$$ \begin{bmatrix}1 & 2 \\ 3 & 4 \end{bmatrix} ^T = \begin{bmatrix}1 & 3 \\ 2 & 4 \end{bmatrix} $$

$$ \begin{bmatrix}1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} ^T = \begin{bmatrix}1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} $$

# Create a matrix
a = np.random.randint(1, 10, (2, 3))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[2 2 9]
 [4 8 2]]

# Transpose the matrix
pprint(a.T)

type:<class 'numpy.ndarray'>
shape: (3, 2), dimension: 2, dtype:int32
Array's Data:
 [[2 4]
 [2 8]
 [9 2]]
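
The first formula above turns a row vector into a column vector. In NumPy that only happens for 2-D arrays; on a 1-D array `.T` leaves the shape unchanged. A short sketch of this, and of the fact that `.T` returns a view:

```python
import numpy as np

v = np.array([1, 2])
print(v.T.shape)                 # (2,)  .T does nothing to a 1-D array

# Add an axis explicitly to get a column vector
col = v.reshape(-1, 1)
print(col.shape)                 # (2, 1)

# .T is a view: writing through it changes the original
m = np.array([[1, 2], [3, 4], [5, 6]])
t = m.T
t[0, 1] = 99                     # t[0, 1] is m[1, 0]
print(m[1, 0])                   # 99
print(np.shares_memory(m, t))    # True
```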

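Changing Array Shape¶

ravel is a method that flattens an array into a 1-D array
reshape is a method that converts an array to the specified shape without changing its data

[numpy.ndarray object].ravel()¶

Method that returns the array as a 1-D array
Returns a view of the numpy.ndarray object whenever possible; a copy is made only when needed (e.g. for non-contiguous data)
When the result is a view, modifying its elements also changes the original array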

# Create a demo array
a = np.random.randint(1, 10, (2, 3))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[7 4 1]
 [4 2 6]]

a.ravel()

array([7, 4, 1, 4, 2, 6])

b = a.ravel()
pprint(b)

type:<class 'numpy.ndarray'>
shape: (6,), dimension: 1, dtype:int32
Array's Data:
 [7 4 1 4 2 6]

b[0]=99
pprint(b)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (6,), dimension: 1, dtype:int32
Array's Data:
 [99  4  1  4  2  6]
type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[99  4  1]
 [ 4  2  6]]
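
Whether `ravel` shares memory with the original depends on the memory layout. The sketch below uses `np.shares_memory` to compare `ravel` with `flatten` (which always copies) and shows a case where `ravel` has to copy as well.

```python
import numpy as np

a = np.array([[7, 4, 1], [4, 2, 6]])

r = a.ravel()      # view: a is C-contiguous
f = a.flatten()    # always a copy
print(np.shares_memory(a, r))   # True
print(np.shares_memory(a, f))   # False

# The transpose is not C-contiguous, so ravel has to copy
rt = a.T.ravel()
print(np.shares_memory(a, rt))  # False
print(rt)                       # [7 4 4 2 1 6]
```

When the result must never alias the original, `flatten()` or an explicit `.copy()` is the safer choice.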

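[numpy.ndarray object].reshape()¶

Returns the [numpy.ndarray object] with a new shape; the total number of elements must stay the same

# Check the properties of the target matrix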
a = np.random.randint(1, 10, (2, 3))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[9 4 7]
 [4 4 2]]

result = a.reshape((3, 2, 1))
pprint(result)

type:<class 'numpy.ndarray'>
shape: (3, 2, 1), dimension: 3, dtype:int32
Array's Data:
 [[[9]
  [4]]

 [[7]
  [4]]

 [[4]
  [2]]]
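
Two reshape behaviours are worth a quick sketch: one dimension can be given as -1 so NumPy infers it, and a shape whose element count does not match raises `ValueError`.

```python
import numpy as np

a = np.arange(1, 13)

# -1 lets NumPy infer that dimension: 12 elements / 3 rows = 4 columns
b = a.reshape(3, -1)
print(b.shape)     # (3, 4)

# reshape returns a view here, so the data is shared with a
b[0, 0] = 100
print(a[0])        # 100

# An incompatible shape raises ValueError
failed = False
try:
    a.reshape(5, 3)
except ValueError as e:
    failed = True
    print("reshape failed:", e)
```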

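Adding and Removing Array Elements¶

The resize, append, insert, and delete functions change, add, insert, and remove array elements

np.resize(a, new_shape)¶

np.resize and np.reshape are similar in that both change the shape of an array
resize can shrink or grow the number of elements while changing the shape
The examples in this section call the in-place method a.resize(new_shape), which fills added elements with 0

Basic resize usage¶

# Create an array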
a = np.random.randint(1, 10, (2, 6))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 6), dimension: 2, dtype:int32
Array's Data:
 [[3 2 1 7 8 7]
 [6 5 3 4 7 9]]

# Change the shape - the number of elements stays the same
a.resize((6, 2))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (6, 2), dimension: 2, dtype:int32
Array's Data:
 [[3 2]
 [1 7]
 [8 7]
 [6 5]
 [3 4]
 [7 9]]

Growing the number of elements¶

# Create an array
a = np.random.randint(1, 10, (2, 6))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 6), dimension: 2, dtype:int32
Array's Data:
 [[3 4 8 1 7 5]
 [2 5 4 1 7 5]]

# The number of elements grows from 12 to 20
# The added elements are filled with 0
a.resize((2, 10))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 10), dimension: 2, dtype:int32
Array's Data:
 [[3 4 8 1 7 5 2 5 4 1]
 [7 5 0 0 0 0 0 0 0 0]]

Shrinking the number of elements¶

# Create an array
a = np.random.randint(1, 10, (2, 6))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (2, 6), dimension: 2, dtype:int32
Array's Data:
 [[8 8 2 4 1 8]
 [8 5 9 4 3 7]]

# Reduce the number of elements from 12 to 9
a.resize((3, 3))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[8 8 2]
 [4 1 8]
 [8 5 9]]
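
The function `np.resize` and the method `ndarray.resize` differ when the array grows: the function returns a new array filled with repeated copies of the data, while the method changes the array in place and pads with zeros. A small sketch of the difference (passing `refcheck=False` is a choice made here so that the example does not depend on how many references to `a` exist):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# np.resize returns a new array and repeats the data to fill it
grown = np.resize(a, (2, 5))
print(grown)       # [[1 2 3 4 5] [6 1 2 3 4]]

# ndarray.resize works in place and pads with zeros.
# refcheck=False skips the reference check that can fail in interactive sessions.
a.resize((2, 5), refcheck=False)
print(a)           # [[1 2 3 4 5] [6 0 0 0 0]]
```

The in-place method also requires that the array owns its data, so it fails on views such as the result of `reshape`.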

np.append(arr, values, axis=None)¶

Appends values to the end of an array
axis specifies the direction along which the values are appended

# Create demo arrays
a = np.arange(1, 10).reshape(3, 3)
pprint(a)
b = np.arange(10, 19).reshape(3, 3)
pprint(b)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[10 11 12]
 [13 14 15]
 [16 17 18]]

case 1: axis not specified¶

If axis is not specified, both arrays are flattened to 1-D and then joined

# Append without specifying axis
result = np.append(a, b)
pprint(result)

type:<class 'numpy.ndarray'>
shape: (18,), dimension: 1, dtype:int32
Array's Data:
 [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18]

# The original array is not modified; a new array is created.
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

case 2: axis=0¶

# axis = 0: along the row axis (new rows are added)
# Append array b along axis 0
result = np.append(a, b, axis=0)
pprint(result)

type:<class 'numpy.ndarray'>
shape: (6, 3), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [16 17 18]]

case 3: axis=1¶

# axis = 1: along the column axis (new columns are added)
# Append array b along axis 1
result = np.append(a, b, axis=1)
pprint(result)

type:<class 'numpy.ndarray'>
shape: (3, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3 10 11 12]
 [ 4  5  6 13 14 15]
 [ 7  8  9 16 17 18]]
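
When axis is given, the appended values must have the same number of dimensions as the array and matching sizes on the other axes. The sketch below shows the common mistake of appending a 1-D row to a 2-D array, and a pattern for adding many rows without repeated copying; the `rows` list is illustrative only.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

# A 1-D row cannot be appended along axis=0 to a 2-D array
failed = False
try:
    np.append(a, [10, 11, 12], axis=0)
except ValueError as e:
    failed = True
    print("append failed:", e)

# Wrap the row so it is 2-D with shape (1, 3)
result = np.append(a, [[10, 11, 12]], axis=0)
print(result.shape)    # (4, 3)

# For many rows, collect them in a list and join once instead of appending in a loop
rows = [np.full(3, i) for i in range(4)]
stacked = np.vstack(rows)
print(stacked.shape)   # (4, 3)
```

Because every `np.append` call allocates a new array, joining once at the end is usually cheaper than appending inside a loop.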

np.insert(arr, obj, values, axis=None)¶

If axis is not specified, the array is flattened to 1-D first
axis specifies the direction of the insertion

# Create a demo array
a = np.arange(1, 10).reshape(3, 3)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

# Flatten a to 1-D and insert 999 at index 1
np.insert(a, 1, 999)

array([  1, 999,   2,   3,   4,   5,   6,   7,   8,   9])

# Insert at index 1 along axis 0 of a
# A row of 999s is inserted at row index 1
np.insert(a, 1, 999, axis=0)

array([[  1,   2,   3],
       [999, 999, 999],
       [  4,   5,   6],
       [  7,   8,   9]])

# Insert at index 1 along axis 1 of a
# A column of 999s is inserted at column index 1
np.insert(a, 1, 999, axis=1)

array([[  1, 999,   2,   3],
       [  4, 999,   5,   6],
       [  7, 999,   8,   9]])
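
The examples above insert a single scalar that is broadcast across the new row or column. `np.insert` also accepts a sequence of values, which is sketched here for inserting a full row and a full column of distinct values.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

# Insert a full row of distinct values before row index 1
with_row = np.insert(a, 1, [10, 20, 30], axis=0)
print(with_row)

# Insert a full column of distinct values before column index 1
with_col = np.insert(a, 1, [10, 20, 30], axis=1)
print(with_col)
```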

np.delete(arr, obj, axis=None)¶

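If axis is not specified, the array is flattened to 1-D first
axis specifies the direction of the deletion
delete does not modify the original array; it returns a new array

# Create a demo array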
a = np.arange(1, 10).reshape(3, 3)
pprint(a)

type:<class 'numpy.ndarray'>
shape: (3, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

# Flatten a to 1-D and delete the element at index 1
np.delete(a, 1)

array([1, 3, 4, 5, 6, 7, 8, 9])

# Return a new array with the row at index 1 (along axis 0) removed
np.delete(a, 1, axis=0)

array([[1, 2, 3],
       [7, 8, 9]])

# Return a new array with the column at index 1 (along axis 1) removed
np.delete(a, 1, axis=1)

array([[1, 3],
       [4, 6],
       [7, 9]])

a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
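
`np.delete` also accepts a list of indices, and the same selection can often be written as a boolean mask. A brief sketch; the `keep` mask is an illustrative name:

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

# Delete rows 0 and 2 at once by passing a list of indices
middle = np.delete(a, [0, 2], axis=0)
print(middle)      # [[4 5 6]]

# The same selection expressed as a boolean mask over the rows
keep = np.array([False, True, False])
print(a[keep])     # [[4 5 6]]

# a itself is unchanged
print(a.shape)     # (3, 3)
```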

Joining Arrays¶

The np.concatenate, np.vstack, and np.hstack functions join arrays together

Joining arrays: concatenate((a1, a2, ...), axis=0)¶

# Demo arrays
a = np.arange(1, 7).reshape((2, 3))
pprint(a)
b = np.arange(7, 13).reshape((2, 3))
pprint(b)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]]
type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[ 7  8  9]
 [10 11 12]]

# Join the two arrays along axis=0 (the default axis is 0)
result = np.concatenate((a, b))
result

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

# Join the two arrays along axis=0 explicitly; same result as above
result = np.concatenate((a, b), axis=0)
result

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

# np.append(a, b, axis=0) gives the same result
np.append(a, b, axis=0)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

# Join the two arrays along axis=1
result = np.concatenate((a, b), axis=1)
result

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

# np.append(a, b, axis=1) gives the same result
np.append(a, b, axis=1)

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])
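
`np.concatenate` requires the inputs to match on every axis except the one being joined. The sketch below shows a join that works, one that fails, and the `axis=None` form that flattens the inputs first; array `c` is added here only for illustration.

```python
import numpy as np

a = np.arange(1, 7).reshape((2, 3))
b = np.arange(7, 13).reshape((2, 3))
c = np.array([[13, 14, 15]])          # shape (1, 3)

# Along axis=0 only the column counts have to match
joined = np.concatenate((a, b, c), axis=0)
print(joined.shape)                   # (5, 3)

# Along axis=1 the row counts differ (2 vs 1), so this fails
failed = False
try:
    np.concatenate((a, c), axis=1)
except ValueError as e:
    failed = True
    print("concatenate failed:", e)

# axis=None flattens every input before joining
flat = np.concatenate((a, b), axis=None)
print(flat.shape)                     # (12,)
```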

Joining Arrays Vertically¶

np.vstack

np.vstack(tup)
- tup: tuple
Joins the arrays in the tuple vertically (along axis=0)
Equivalent to np.concatenate(tup, axis=0) for arrays with two or more dimensions

# Demo arrays
a = np.arange(1, 7).reshape((2, 3))
pprint(a)
b = np.arange(7, 13).reshape((2, 3))
pprint(b)

type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[1 2 3]
 [4 5 6]]
type:<class 'numpy.ndarray'>
shape: (2, 3), dimension: 2, dtype:int32
Array's Data:
 [[ 7  8  9]
 [10 11 12]]

np.vstack((a, b))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

# Pass four arrays in the tuple
np.vstack((a, b, a, b))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
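
For 1-D inputs `np.vstack` and `np.concatenate` behave differently, which the following sketch illustrates with two small vectors (`x` and `y` are example names):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# vstack treats each 1-D array as a row of shape (1, 3)
stacked = np.vstack((x, y))
print(stacked.shape)                 # (2, 3)

# concatenate along axis=0 just joins the 1-D arrays end to end
joined = np.concatenate((x, y), axis=0)
print(joined.shape)                  # (6,)
```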

Joining Arrays Horizontally¶

np.hstack

np.hstack(tup)
- tup: tuple
Joins the arrays in the tuple horizontally (along axis=1)
Equivalent to np.concatenate(tup, axis=1) for arrays with two or more dimensions

np.hstack((a, b))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

np.hstack((a, b, a, b))

array([[ 1,  2,  3,  7,  8,  9,  1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12,  4,  5,  6, 10, 11, 12]])
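
With 1-D inputs there is no axis 1, so `np.hstack` joins along axis 0 instead. If the goal is to place 1-D arrays side by side as columns, `np.column_stack` is one option, as sketched here:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# For 1-D inputs hstack joins along axis=0
flat = np.hstack((x, y))
print(flat)                         # [1 2 3 4 5 6]

# column_stack turns each 1-D array into a column instead
cols = np.column_stack((x, y))
print(cols.shape)                   # (3, 2)
```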

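Splitting Arrays¶

NumPy provides functions that split arrays vertically and horizontally.

np.hsplit(): splits an array horizontally, i.e. column-wise along axis=1
np.vsplit(): splits an array vertically, i.e. row-wise along axis=0

# Create the array to split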
a = np.arange(1, 25).reshape((4, 6))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

# Split horizontally into two groups
result = np.hsplit(a, 2)
result

[array([[ 1,  2,  3],
        [ 7,  8,  9],
        [13, 14, 15],
        [19, 20, 21]]), array([[ 4,  5,  6],
        [10, 11, 12],
        [16, 17, 18],
        [22, 23, 24]])]

result[0]

array([[ 1,  2,  3],
       [ 7,  8,  9],
       [13, 14, 15],
       [19, 20, 21]])

# Split horizontally into three groups
result = np.hsplit(a, 3)
result

[array([[ 1,  2],
        [ 7,  8],
        [13, 14],
        [19, 20]]), array([[ 3,  4],
        [ 9, 10],
        [15, 16],
        [21, 22]]), array([[ 5,  6],
        [11, 12],
        [17, 18],
        [23, 24]])]

result[0]

array([[ 1,  2],
       [ 7,  8],
       [13, 14],
       [19, 20]])

# a.shape[1] -> 6
# Split points [1, 3, 5] give a[:, :1], a[:, 1:3], a[:, 3:5], a[:, 5:]

rs = np.hsplit(a, [1,3,5])

rs[0]

array([[ 1],
       [ 7],
       [13],
       [19]])

rs[1]

array([[ 2,  3],
       [ 8,  9],
       [14, 15],
       [20, 21]])

rs[2]

array([[ 4,  5],
       [10, 11],
       [16, 17],
       [22, 23]])

rs[3]

array([[ 6],
       [12],
       [18],
       [24]])
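
When an integer is passed, `np.hsplit` requires the columns to divide evenly. For uneven groups, `np.array_split` is an alternative, sketched below on the same 4x6 array:

```python
import numpy as np

a = np.arange(1, 25).reshape((4, 6))

# 6 columns cannot be divided into 4 equal groups
failed = False
try:
    np.hsplit(a, 4)
except ValueError as e:
    failed = True
    print("hsplit failed:", e)

# array_split allows uneven groups: column counts 2, 2, 1, 1
parts = np.array_split(a, 4, axis=1)
print([p.shape for p in parts])   # [(4, 2), (4, 2), (4, 1), (4, 1)]
```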

Splitting Arrays Vertically¶

np.vsplit(ary, indices_or_sections)
Splits an array vertically (row-wise, along axis=0)

# Create the array to split
a = np.arange(1, 25).reshape((4, 6))
pprint(a)

type:<class 'numpy.ndarray'>
shape: (4, 6), dimension: 2, dtype:int32
Array's Data:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

result=np.vsplit(a, 2)
result

[array([[ 1,  2,  3,  4,  5,  6],
        [ 7,  8,  9, 10, 11, 12]]), array([[13, 14, 15, 16, 17, 18],
        [19, 20, 21, 22, 23, 24]])]

np.array(result).shape

(2, 2, 6)

result=np.vsplit(a, 4)
result

[array([[1, 2, 3, 4, 5, 6]]),
 array([[ 7,  8,  9, 10, 11, 12]]),
 array([[13, 14, 15, 16, 17, 18]]),
 array([[19, 20, 21, 22, 23, 24]])]

np.array(result).shape

(4, 1, 6)

# Split the rows into the 1st row, the 2nd-3rd rows, and the 4th row
np.vsplit(a, [1, 3])

[array([[1, 2, 3, 4, 5, 6]]), array([[ 7,  8,  9, 10, 11, 12],
        [13, 14, 15, 16, 17, 18]]), array([[19, 20, 21, 22, 23, 24]])]
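
Splitting and stacking are inverse operations, and the pieces returned by a split are views of the original array. A short sketch that checks both, using `np.split` with `axis=0` as the general form of `np.vsplit`:

```python
import numpy as np

a = np.arange(1, 25).reshape((4, 6))

# vsplit(a, 2) is the same as split(a, 2, axis=0)
top, bottom = np.vsplit(a, 2)
same = np.split(a, 2, axis=0)
print(np.array_equal(top, same[0]))                  # True

# Stacking the pieces back restores the original array
restored = np.vstack((top, bottom))
print(np.array_equal(restored, a))                   # True

# The pieces are views, so writing to one changes a
top[0, 0] = -1
print(a[0, 0])                                       # -1
```

Call `.copy()` on a piece if it needs to be modified independently of the source array.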