🇰🇷🇰🇵 Analysis of changes in South Koreans' perceptions of unification and North Korea (DPRK)

Sunghee2/Unification.NorthKorea-analysis

Changes in public perception of North Korea and unification after President Moon Jae-in took office

- Development of a Hadoop-based big data NLP system

Table of contents

  1. System architecture
  2. Data analysis results
  3. Todo List

System architecture


Data analysis results

  • Visualization with Zeppelin notebook (.gif)


  • Results

    • Mentions of 'North Korea' and 'unification' increased

      • 'North Korea' mention-volume graph: mentions of 'North Korea' rose in 2018 compared with 2017

      • 'Unification' mention-volume graph: mentions of 'unification' rose in 2018 compared with 2017

    • Shifts in positive and negative sentiment

      • 'North Korea' sentiment trend graph: sentiment was negative overall in 2017, but in 2018 the positive share exceeded the negative share.

      • 'Unification' sentiment trend graph: negative sentiment is higher than positive overall, but the gap between the two narrowed in 2018.

    • 'End of the war' (종전)

      In the graph of the 15 most-mentioned keywords, the word 'end of the war' (종전) does not appear at all in 2017.

      It first appears on March 9, 2018, and then shows up on April 17, 18, 26, and 27 (the first inter-Korean summit) and on September 19 (the third summit).

      It was mentioned 179 times on March 9 and peaked at 1,224 mentions on the day of the first inter-Korean summit (April 27).

      • March 9: 179 mentions

      • April 27: 1,224 mentions

Todo List

2018-11-04

  • Find a tweet scraper (twint, twitter-scraper)
  • Collect data

📝

The Twitter API (via tweepy) only returns data from the last 7 days, and older data has to be paid for -> scrape it from the web instead.

Options: the twint and twitter-scraper libraries, or scraping directly with Firefox and infinite scroll.

twitter-scraper only reliably returns about 25 pages (486 tweets) -> use twint.

python2.* => $ python, python3.* => $ py

🐛

Command "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-pdut0psv/cchardet/setup.py';.. -> twint fails to install. After trying various things, it turned out that twint requires Python 3.6, so I moved to Windows and installed 3.6.7, which fixed it.

Running it then gave ModuleNotFoundError: No module named 'aiohttp_socks' -> uninstalled twint and reinstalled it with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint, which fixed it.


Can it really be this easy? Should I store the data straight into a DB?


2018-11-06
  • Fix the twint error
  • Connect VS Code
  • Read the data file
  • Install NiFi -> done, but reinstall it via Hortonworks

📝

twint raises an error when it hits a tweet from a suspended account -> add error handling to output (should contribute this upstream later).


Connecting the VM to VS Code

  1. Install the 'Remote VSCode' extension in VS Code

  2. Install rmate

    wget https://raw.githubusercontent.com/sclukey/rmate-python/master/bin/rmate
    chmod +x ./rmate
    sudo mv ./rmate /usr/local/bin/rmate 
    
  3. $ ssh -R 52698:localhost:52698 maria_dev@localhost -p 2222

  4. $ rmate project/tw.py

๐Ÿ›

hdfs์— testํŒŒ์ผ ์˜ฌ๋ ธ๋Š”๋ฐ ํ•œ๊ธ€ ๋‹ค ๊นจ์ง -> $ echo $LANG $ locale ๋ณด๋ฉด ์ œ๋Œ€๋กœ(ko_KR.UTF-8) ๋˜์–ด์žˆ๋Š”๋ฐใ… 

df๋กœ ๋งŒ๋“ค๋ฉด ์Šคํ‚ค๋งˆ๊ฐ€ ์ด์ƒํ•˜๊ฒŒ c1, c2โ€ฆ ์ด๋ ‡๊ฒŒ ๋จโ€ฆ.. ---> csv loadํ•˜๋ฉด์„œ header="true" ๋นผ๋จน์Œ

UnicodeEncodeError: 'ascii' codec can't encode characters in position 1551-1552: ordinal not in range(128) ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ฌ ๋•Œ ์ธ์ฝ”๋”ฉ ์„ค์ •ํ•˜๋Š”๋ฐ๋„ ์™œ์ด๋Ÿฌ์ง€...


18-11-10
  • Read the data file -> encoding

📝

Run NiFi: ./bin/nifi.sh start -> gave up

🐛

Korean encoding is still broken. hadoop fs -text data/tweet_test.csv displays the text correctly. I made a separate test file and got the same result. What df.show() would not display does appear with print(df).

import locale
import sys

print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())

Output works now, but everything is printed as escape sequences like this: u"\ub098\ub3c4 \uc5ec\uae30\uc11c \uc774 \uc9c0\ub784\ub4e4 \ud558\uace0\uc788\uc9c0\ub9cc... \ubaa8\ub450


18-11-12
  • Install NiFi
  • Git on VirtualBox -> already installed
  • Merge date and time
  • Adjust the time
  • Drop unneeded columns
  • Split nouns with KoNLPy

📝

Let's try NiFi again! Install HDF.

🐛

mount: unknown filesystem type 'vboxsf' -> install VBoxGuestAdditions (matching version)

Does not run -> sudo yum install gcc kernel-devel make bzip2 -> run VBoxLinuxAdditions.run

Please install the Linux kernel "header" files matching the current kernel

mount: only root can use "--types" option


sys:1: DtypeWarning: Columns (0,1,2,6) have mixed types. Specify dtype option on import or set low_memory=False. -> set dtype in read_csv

Converting the date column fails -> the date column contains strange addresses -> add errors='coerce'

AttributeError: type object 'datetime.datetime' has no attribute 'timedelta' -> change from datetime import datetime to import datetime
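The import fix can be illustrated with a minimal sketch. The helper name `shift_hours` and the sample timestamp are hypothetical and only serve to show why importing the module (rather than the class) makes `datetime.timedelta` resolve.

```python
import datetime  # import the module, not the class, so datetime.timedelta resolves


def shift_hours(ts, hours):
    """Return ts shifted by the given number of hours (hypothetical helper)."""
    return ts + datetime.timedelta(hours=hours)


# With `from datetime import datetime`, the name `datetime` is the class,
# and `datetime.timedelta` raises the AttributeError quoted above.
shifted = shift_hours(datetime.datetime(2018, 11, 12, 15, 0), 9)
print(shifted)
```

An alternative that keeps the original import style would be `from datetime import datetime, timedelta`.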

While installing KoNLPy: error: command 'gcc' failed with exit status 1 -> xcode-select --install

If you get xcode-select: command not found, download the command line tools directly from Apple Developer.

RuntimeError: No matching overloads found for simplePos09 in find -> cast the value to a string


18-11-13
  • Save the NLP-split output to a DataFrame
  • Install NiFi

📝

Hannanum picks up loanwords, English, and Hanja better than the other analyzers.

🐛

ValueError: Length of values does not match length of index -> the result came out all at once for the whole column, so process the rows separately

Cannot store a list in the DataFrame...

ImportError: No module named ambari_commons.exceptions

Ambari started misbehaving, so I rebuilt the VM and then got "unable to sign in. invalid username/password combination." The admin account could not log in -> # ambari-admin-password-reset

After a lot of trial and error: # ambari-server setup

# ambari-server install-mpack --mpack=http://public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.2.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.2.0.0-520.tar.gz --verbose

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.2.0/release-notes/content/hdf_repository_locations.html

Adding NiFi in Ambari fails to install -> version mismatch. The current Ambari version is 2.6.2, and it must be at least 2.7.

https://supportmatrix.hortonworks.com/

https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.0.0/bk_ambari-upgrade/content/upgrade_ambari.html < related documentation. Try it tomorrow.


18-11-14
  • Update Ambari

🐛

ImportError: No module named ambari_commons.exceptions -> prefix the command with sudo

The NiFi UI does not come up on port 9090 -> added 127.0.0.1 localhost sandbox.hortonworks.com sandbox-hdp.hortonworks.com sandbox-hdf.hortonworks.com to /private/etc/hosts -> still not working

Permission denied: 'conf/bootstrap.conf' -> run as root

Hypotheses

  1. Port number
  2. localhost
  3. Admin permissions

18-11-15

Hive misbehaving looks like a version problem. I had not considered the HDP version. Uninstall and reinstall one last time.

The versions match and it still fails -> maybe a conflict between Hive and NiFi?

https://supportmatrix.hortonworks.com/


18-11-16
  • Convert the noun-split output into a str separated by |
  • Remove duplicate rows
  • Handle outliers
  • Separate hashtags
  • Separate mentions
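A minimal sketch of the hashtag and mention separation, assuming plain regular expressions are sufficient. The function name `split_entities` and the sample tweet are hypothetical, and the project's actual implementation may differ.

```python
import re

# In Python 3, \w also matches Hangul, so Korean hashtags are captured.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def split_entities(text):
    """Return (body, hashtags, mentions) for one tweet (hypothetical helper)."""
    hashtags = HASHTAG_RE.findall(text)
    mentions = MENTION_RE.findall(text)
    # Remove both entity types from the body and collapse leftover whitespace.
    body = MENTION_RE.sub("", HASHTAG_RE.sub("", text))
    return " ".join(body.split()), hashtags, mentions


body, hashtags, mentions = split_entities(u"@user1 통일 회담 #남북정상회담 #peace")
print(body, hashtags, mentions)
```

This simple pattern would also treat the domain part of an e-mail address as a mention, so a stricter pattern may be needed for real data.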

📝

itertuples() is much faster than iterrows()

Updating a DataFrame value directly raises an error -> change the value by index using at[]

Should I drop the tweets that flood the feed with repeated posts?

🐛

ModuleNotFoundError: No module named 'NumPy' -> fixed by writing numpy in lowercase


18-11-20
  • Make the scraper dates dynamic + delete files
  • oozie python shell

📝

Put the Python scraper on Oozie

hdfs dfs -put {vm} {hdfs}

🐛

The terminal shows the wrong time -> sudo date {month}{day}{hour}{minute}{year} ex) November 20, 2018, 18:24 -> sudo date 1120182418
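The `{month}{day}{hour}{minute}{year}` argument can be generated rather than typed by hand. The following is a small sketch; the helper name `date_command_arg` is hypothetical.

```python
from datetime import datetime


def date_command_arg(ts):
    """Format ts as MMDDhhmmYY, the argument form used with `sudo date` above."""
    return ts.strftime("%m%d%H%M%y")


# Reproduces the example in the note: November 20, 2018, 18:24.
command = "sudo date " + date_command_arg(datetime(2018, 11, 20, 18, 24))
print(command)  # sudo date 1120182418
```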

The time setting on Twitter differs per account and was wrong -> log in to Twitter and fix it under Settings. GMT+9 (the CSV uses UTC).
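Since the CSV timestamps are in UTC, they can be shifted to Korean time (GMT+9) during preprocessing. This is a minimal sketch that assumes separate date and time strings in `%Y-%m-%d` and `%H:%M:%S` form; the column format and the helper name `utc_to_kst` are assumptions, not taken from the project.

```python
from datetime import datetime, timedelta, timezone

# Fixed +09:00 offset; Korea does not observe daylight saving time.
KST = timezone(timedelta(hours=9), "KST")


def utc_to_kst(date_str, time_str):
    """Combine UTC date and time strings and convert to KST (hypothetical helper)."""
    utc_dt = datetime.strptime(date_str + " " + time_str, "%Y-%m-%d %H:%M:%S")
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(KST)


kst_dt = utc_to_kst("2018-04-26", "23:30:00")
print(kst_dt.strftime("%Y-%m-%d %H:%M"))
```

Converting before grouping by day matters here, because a tweet posted late in the UTC day belongs to the next calendar day in Korea.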

/usr/bin/env: python3: No such file or directory -> it looks like python3 has to be installed for Oozie... but it is already installed.


18-11-21

📝

Data for learning positive/negative sentiment obtained from http://word.snu.ac.kr/kosac/lexicon.php > http://word.snu.ac.kr/kosac/pub/PACLIC26.pdf

https://docs.google.com/spreadsheets/d/1OGAjUvalBuX-oZvZ_-9tEfYD2gQe7hTGsgUpiiBSXI8/edit#gid=0 -> KoNLPy tag chart -> the training data above seems closest to Komoran

🐛

ImportError: No module named numpy -> sudo pip install numpy, then sudo pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org numpy -> the error above occurs when importing RegressionEvaluator. Not solved yet.


18-11-22
  • Changed the morpheme separator to ;/
  • Processing for building training data: split morphemes, load the sentiment lexicon, and compare

All preprocessing should probably be done in Spark... -> will there be time? Do the other things first and switch if time remains.

Scraper: is it really necessary to delete the files from HDFS? -> commented out for now

Sentiment analysis -> only a sentiment lexicon is available -> use it to build my own training data

🐛

[UnicodeEncodeError: 'ascii' codec can't encode character](https://stackoverflow.com/questions/39662384/pyspark-unicodeencodeerror-ascii-codec-cant-encode-character) -> this error occurs every time show is called in Spark => run $ export PYTHONIOENCODING=utf8 before running Spark!

Oozie... running for 8 hours now

Reinstalled and now Cassandra does not work... delete it once more as a last resort -> sure enough, wiping it was the answer


18-11-24

📝

When running Python, put the file name in the python argument field at the top, and also add that file under file. This cost several days of trial and error.

Entering python3.6 makes Python 3 work too!

🐛

Oozie, why does even echo not work?

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain]


18-11-25
  • Fix duplicate scoring in sentiment analysis
  • Split nouns in Spark and extract the top 20
  • Port the Python preprocessing to Spark
  • Switch the tokenizer to Twitter
  • Save to CSV
  • Save to HBase

📝

The sentiment lexicon has about 12,000 positive entries and about 4,000 negative entries, so results always come out positive. The imbalance is so overwhelming that even percentages do not help.

Spark has no tokenizer (I thought it did, but that was not PySpark)... and running Python in Oozie fails because gcc does not work...

🐛

xcode-select: command not found. The Xcode CLT cannot be installed here, so does this have to be done locally? -> gcc can be installed with yum

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 371: ordinal not in range(128) -> with open("./data/result.csv", "r", encoding="utf-8")
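A minimal sketch of reading a file with an explicit encoding so that the platform default ('ascii' in the error above) is never used. It uses `io.open`, which accepts `encoding` on both Python 2 and Python 3. The helper name `read_utf8_lines` and the sample file contents are hypothetical.

```python
import io
import os
import tempfile


def read_utf8_lines(path):
    """Read a text file as UTF-8 regardless of the locale default."""
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Write a small sample file so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "result.csv")
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(u"date,text\n2018-04-27,종전 선언\n")

lines = read_utf8_lines(sample_path)
print(lines[1])
```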


18-11-26
  • Check whether the date is extracted!
  • python -> run on the VM
  • Decide the columns to store in HBase (date first, then retweet count, word, @, hashtags...)
  • Bundle the Python modules for Oozie

🐛

SyntaxError: Non-ASCII character '\xec' in file variable.py on line 1, -> Python 2 needs an encoding declaration at the top of the file

gcc and JPype work fine on Python 2

UnicodeDecodeError: 'ascii' codec can't decode byte 0xea in position 0: ordinal not in range(128) ->

#-*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

http://www.adaltas.com/en/2018/03/06/execute-python-in-an-oozie-workflow/ -> oozie python module

ImportError: Missing required dependencies ['numpy'] (pandas) -> fixed by changing the dynamic import

ImportError: cannot import name 'multiarray'


18-11-27
  • Fix the Spark encoding or switch to something else

📝

Maybe switch to Hive...

🐛

Tried Hive and still got ??, even with the encoding set -> stop using Spark SQL.


18-11-28

📝

Ran a word-frequency analysis, but there is too much useless data -> extract only common nouns, proper nouns, and numerals (maybe drop numerals too) -> still a lot of junk values, but 22 is the maximum

Do I need NLP to extract English? -> the foreign-word tag was F... but...

Remove external links

Extract punctuation

Spelling and spacing are poor, which seems to hurt the results -> after testing, no library works well

🐛

Spell checking raises ValueError: No JSON object could be decoded

It seems I always end up patching libraries. In py_hanspell, the baseurl and the req import need to be changed.


18-11-29

📝

Stopword data

https://github.com/6/stopwords-json

๐Ÿ›

xml.etree.ElementTree.ParseError: not well-formed (invalid token) in Python -> string์— &์ด ์žˆ์–ด์„œ ๋ฐœ์ƒํ•œ ์˜ค๋ฅ˜. ๊ตฌ๋‘์ , ํŠน๋ฌธ ๋‹ค ์ œ๊ฑฐํ•ด์•ผ๊ฒ ์Œ.
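An alternative to stripping the characters is to escape them before the text is placed into XML. The sketch below assumes the XML is assembled from raw tweet text, which the note does not state; the wrapper element name and the helper `build_text_element` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_text_element(raw_text):
    """Wrap raw text in a <text> element, escaping &, <, and > first."""
    return ET.fromstring("<text>" + escape(raw_text) + "</text>")


element = build_text_element("North & South summit <live>")
print(element.text)

# Without escaping, the same input reproduces the ParseError.
try:
    ET.fromstring("<text>North & South</text>")
    parse_failed = False
except ET.ParseError:
    parse_failed = True
```

Escaping keeps the original text intact, whereas removing punctuation changes the input to later steps such as morpheme analysis.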

After applying the stopword check, something looks off


18-11-30 & 18-12-01
  • Split sentiment entries by morpheme
  • Turn the sentiment data into a dict (per morpheme?)
  • Include predicates again
  • Extract the 15 most frequent words per date
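The per-date top-15 extraction can be sketched with `collections.Counter`. The input shape (pairs of a date string and a list of words), the skipping of rows without a date, and the function name `top_words_by_date` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def top_words_by_date(rows, n=15):
    """Return {date: [(word, count), ...]} with the n most frequent words per date."""
    counters = defaultdict(Counter)
    for date, words in rows:
        if date is None:  # assumption: rows without a date are skipped
            continue
        counters[date].update(words)
    return {date: counter.most_common(n) for date, counter in counters.items()}


rows = [
    ("2018-04-27", [u"종전", u"회담", u"종전"]),
    ("2018-04-27", [u"회담", u"종전"]),
    (None, [u"무시"]),
]
top = top_words_by_date(rows)
print(top["2018-04-27"])
```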

Removing stopwords before running KoNLPy mangles the sentences -> split first, then remove stopwords
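The ordering (tokenize first, filter afterwards) can be sketched as follows. The whitespace `tokenize` function is only a stand-in for a KoNLPy analyzer, and the stopword set is a placeholder rather than the project's actual list.

```python
# Placeholder stopword set; the project loads its own list.
STOPWORDS = {"the", "of", "and"}


def tokenize(text):
    """Stand-in for a KoNLPy morphological analyzer (assumption)."""
    return text.split()


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Filter stopwords from an already tokenized sentence."""
    return [token for token in tokens if token not in stopwords]


# Tokenize the intact sentence first, then drop stopwords from the tokens.
tokens = remove_stopwords(tokenize("the end of the war"))
print(tokens)
```

Filtering at the token level leaves the sentence intact for the analyzer, which is the reason for doing the steps in this order.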

Analysis: drop rows where date or count is null

🐛

Twitter text with syllables such as 곷 cannot be encoded -> str(hangul_text).decode('utf-8', errors="replace")

https://konlpy-ko.readthedocs.io/ko/v0.4.3/examples/wordcloud/

http://word.snu.ac.kr/kosac/lexicon.php


18-12-02
  • oozie scraper
  • oozie local to hdfs
  • sentiment function (not verified yet)

The dict KeyError was not caused by a type mismatch; the key simply did not exist.
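A missing key can be handled with `dict.get` and a default instead of letting the lookup raise. This is a minimal sketch; the lexicon entries and the function name `lookup_score` are hypothetical.

```python
def lookup_score(lexicon, morpheme, default=0):
    """Return the sentiment score for a morpheme, or default if it is not in the lexicon."""
    return lexicon.get(morpheme, default)


# Hypothetical lexicon entries for illustration only.
lexicon = {u"좋다": 1, u"나쁘다": -1}

known = lookup_score(lexicon, u"좋다")
missing = lookup_score(lexicon, u"사전에없는말")
print(known, missing)
```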

hive

Run Oozie from the root account!

🐛

sudo: no tty present and no askpass program specified -> this error appears because sudo inside Oozie has no way to obtain the password

org.apache.oozie.action.hadoop.launcherexception: output data exceeds its limit [2048] -> this error is caused by too much output in Oozie. Turning off capture output and rerunning fixes it.

Folder access gets permission denied in Oozie -> change the permissions with chmod


18-12-03
  • oozie data_preprocessing
  • Check the encoding in data_preprocessing
  • Check get_sentiment
  • Run on the real data

💕 -> Why will this one not go away? U+1F495. Everything else gets removed, so why does only this one remain?

Switching the regex to Hangul Unicode ranges fixed it. Should have done that sooner. Unicode really matters; do not forget it.
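A whitelist based on the Hangul syllable range removes emoji such as U+1F495 without having to enumerate them. This is a sketch under that assumption; the exact character classes the project keeps are not stated, and the name `keep_hangul_and_ascii` is hypothetical.

```python
import re

# Keep Hangul syllables (U+AC00 to U+D7A3), ASCII letters, digits, and whitespace.
# Everything else, including emoji and punctuation, is replaced by a space.
NON_TEXT_RE = re.compile(u"[^\uAC00-\uD7A3a-zA-Z0-9\\s]")


def keep_hangul_and_ascii(text):
    """Strip characters outside the whitelist and collapse whitespace."""
    cleaned = NON_TEXT_RE.sub(" ", text)
    return " ".join(cleaned.split())


cleaned = keep_hangul_and_ascii(u"통일 \U0001F495 2018!!")
print(cleaned)
```

Standalone jamo such as ㅠ fall outside this range and are removed as well, which may or may not be desired.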


18-12-04
  • Check get_sentiment
  • Append -다 to predicates
  • Spark results -> MySQL

🐛

sre_constants.error: bad character range emoji unicode

If Korean text does not print in cmd on Python 2: str(text).decode('utf-8')

Append -다 when the morpheme is a predicate

MySQL shows Korean text as question marks => add the following to /etc/my.cnf

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

Then run the following in MySQL:

ALTER DATABASE [DB name] DEFAULT CHARACTER SET utf8;
show variables like 'c%';

spark-submit --packages mysql:mysql-connector-java:5.1.39 [file name]


18-12-05

๐Ÿ›

mysql ์œ„์ฒ˜๋Ÿผ ํ•ด๋„ ์•ˆ๋จ -> ๋ณด๋‚ผ ๋•Œ url์„ jdbc:mysql://localhost/[db_name]?useUnicode=true&characterEncoding=utf-8 -> ํ•ด๊ฒฐ

java.lang.ClassNotFoundException: om.mysql.jdbc.Driver -> ์ œํ”Œ๋ฆฐ์—

๊ทธ๋ž˜ํ”„..pos ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š”๊ฒŒ ์—†์Œ..

pyspark ์—์„œ python 2๋ฒ„์ „


18-12-06
  • Install Zeppelin Helium
  • Synchronize the CentOS clock
  • Compute percentages

bubble... Zeppelin...

Command for immediate synchronization: $ chronyc -a makestep

positive, negative, neutral, complex
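The percentage step over the four classes can be sketched as below. It assumes each tweet has already been assigned exactly one of these labels; the function name `sentiment_percentages` is hypothetical.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral", "complex")


def sentiment_percentages(labels):
    """Return the share of each sentiment class as a percentage of all labeled items."""
    counts = Counter(labels)
    total = sum(counts[name] for name in LABELS)
    if total == 0:
        # Avoid division by zero when there is no labeled data.
        return {name: 0.0 for name in LABELS}
    return {name: 100.0 * counts[name] / total for name in LABELS}


shares = sentiment_percentages(["positive", "positive", "negative", "neutral"])
print(shares)
```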

๐Ÿ›

RROR: Exception DBusException: org.freedesktop.DBus.Error.AccessDenied ->

$ systemctl restart dbus $ systemctl restart firewalld


18-12-07
  • Make the scraper produce 2 files
  • multiprocessing
  • 2 preprocessing runs
  • import hbase
  • Connect Zeppelin to MySQL

📝

To store multiple values in HBase, specify versions! -> not available in starbase

alter "test", NAME => "tweets", VERSIONS => 1000000 -> not the right approach

🐛

Gcc error: gcc: error trying to exec 'cc1': execvp: No such file or directory -> $ sudo yum install gcc-c++

pandas read_json ValueError: Expected object or value -> the relative path was wrong

What had failed in HBase was create (column family)


18-12-08~09
  • Connect HBase
  • HBase data insert
  • Clean up sentiment_analysis
  • Connect Zeppelin to HBase
  • Build Zeppelin graphs (by date)
  • Preprocess the 'North Korea' data ->
  • oozie

🐛

NTP synchronization: it works properly after running systemctl enable ntpd

Korean time: timedatectl set-timezone Asia/Seoul

Submitting job to Oozie failed. Please check your definition/configuration. org.apache.oozie.ambari.view.exception.WfmException: -> change the hosts file entry for the sandbox domain name

18-12-10

๐Ÿ›

sudo must be owned by uid 0 and have the setuid bit set -> chown root:root /usr/bin/sudo && chmod 4755 /usr/bin/sudo

๋ฐ์ดํ„ฐ ์‹œ๊ฐํ™”

https://zzsza.github.io/development/2018/08/24/data-visualization-in-python/#

https://www.dremio.com/trump-twitter-sentiment-analysis/ > Trump

http://www.zinicap.kr/archives/2433 Later, the collected counts and IDs could be presented this way

https://github.com/Ahneunjeong/bigdata-foodelivery/blob/master/배달분석발표자료.pdf

http://wiki.gurubee.net/pages/viewpage.action?pageId=28117507
