I want to count the posts on my Hatena Diary! My road to getting BeautifulSoup working (；´Д｀)

I want to count them!!

There's probably almost zero demand for counting Diary posts, but I'm going ahead anyway lol

Installing with this "pip" thing?!

BeautifulSoup: I had a feeling I'd end up messing with it sooner or later. Now, a fair fight!!

First, before installing beautifulsoup, install pip, a package management system.
Then use the pip command to install beautifulsoup.

Uh, okay. (I don't really get it)

f:id:elve:20180321084242p:plain

pip is already installed if you are using Python 2 >=2.7.9 or Python 3 >=3.4 downloaded from python.org
Installation — pip 20.3.dev0 documentation

Huh, it's saying something like "it's already installed"???
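To check this from Python itself, here's a small sketch using the standard library's `importlib.util.find_spec`. It only looks at the interpreter you run it with, which might not be the same Python that the `pip` command on your PATH belongs to, so treat it as a rough check. Note that the beautifulsoup4 package is imported under the name `bs4`; the helper name `is_installed` is just made up for this example.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    # find_spec returns None when the module can't be found on the import path
    return find_spec(module_name) is not None


for name in ("pip", "bs4", "requests", "lxml"):
    print(name, "installed" if is_installed(name) else "not installed")
```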

Right-click "get-pip.py" on that page and save it. I saved it to c:\python.
Then, just to be safe, I ran "cd c:\python" in the Command Prompt before running:

python get-pip.py
Installation — pip 20.3.dev0 documentation

f:id:elve:20180321084326p:plain
It's reassuring when things go exactly as the docs say lol

Next, run the following command from the Command Prompt.
> pip install beautifulsoup4

f:id:elve:20180321084410p:plain
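A quick smoke test right after installing might look like this. One thing worth knowing: `BeautifulSoup(html, "lxml")` only works if the separate `lxml` package is installed as well; otherwise bs4 raises `FeatureNotFound`. This sketch falls back to Python's built-in `"html.parser"` when lxml is missing. The sample HTML is made up.

```python
from importlib.util import find_spec

from bs4 import BeautifulSoup

# "html.parser" ships with Python; "lxml" needs `pip install lxml` on top of bs4
parser = "lxml" if find_spec("lxml") is not None else "html.parser"

soup = BeautifulSoup("<html><head><title>test</title></head></html>", parser)
print(parser, soup.title.string)
```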

Alright, now... ugyaaah


Language: Python 2.7.12
Python Webスクレイピング実践入門 (A Practical Introduction to Web Scraping with Python) - Qiita

(；´Д｀)＜ Maybe it's because I'm on Python 3, but I couldn't get it to work

How to load a web page in Python 3 with urllib and BeautifulSoup
Python3で、urllibとBeautifulSoupを使ってWebページを読み込む (Loading a web page in Python 3 with urllib and BeautifulSoup) - minus9d's diary

...for some reason my soup always comes out empty (；´Д｀)

qiita.com

I can't even get soup with the most basic soup-getting code (´・ω・｀)

from bs4 import BeautifulSoup
import urllib.request
url = 'http://b/hatena.ne.jp/elve'
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
html = response.read()
soup = BeautifulSoup(html, "lxml")
# can't get past this point
title_tag = soup.title
title = title_tag.string
print(title_tag)
print(title)

Soup, I want soup...

Update

Turns out I had the URL wrong (´∀｀*)ﾎﾟｯ
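In hindsight, the typo is easy to spot if you split the URL into parts with the standard library's `urllib.parse.urlsplit`: the `/` typed where a `.` belonged turns the host into just `b`, so the request goes to a host named `b` rather than to Hatena. A small sketch comparing the two URLs from this post:

```python
from urllib.parse import urlsplit

wrong = "http://b/hatena.ne.jp/elve"   # URL from the first attempt
right = "http://d.hatena.ne.jp/elve"   # corrected URL

for url in (wrong, right):
    parts = urlsplit(url)
    # netloc is the host the request is sent to; path is everything after it
    print(parts.netloc, parts.path)
# b /hatena.ne.jp/elve
# d.hatena.ne.jp /elve
```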

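from bs4 import BeautifulSoup
import urllib.request
url = 'http://d.hatena.ne.jp/elve'      # fixed: d.hatena.ne.jp, not b/hatena.ne.jp
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)  # send the request
html = response.read()                  # raw bytes of the page
soup = BeautifulSoup(html, "lxml")      # "lxml" needs the lxml package installed

title_tag = soup.title    # the <title> element
title = title_tag.string  # just the text inside it
print(title_tag)
print(title)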

...and that got it!
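Here's what `soup.title` and `.string` each give you, using a made-up HTML string instead of the real page so it runs offline. It also shows why code like the above blows up on an empty soup: `soup.title` is `None` when there's no `<title>`, and `None.string` raises `AttributeError`.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page
html = "<html><head><title>Sample Diary</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

title_tag = soup.title        # the whole <title> Tag object
title = title_tag.string      # just the text inside it
print(title_tag)              # <title>Sample Diary</title>
print(title)                  # Sample Diary

# With no <title> in the document, soup.title is None
empty = BeautifulSoup("", "html.parser")
print(empty.title)            # None
```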

In the end,

Here it comes!! (ﾟ∀ﾟ)

requests: HTTP library
python3でwebスクレイピング(Beautiful Soup) (Web scraping with Python 3 (Beautiful Soup)) - Qiita

That one took a while, but it worked~

It's still a work in progress, but:

# coding: UTF-8
import requests
from bs4 import BeautifulSoup

# URL to access
url = "http://d.hatena.ne.jp/elve/archive"

# Access the URL; the response carries the HTML (tags included)
response = requests.get(url)     # fetch from the web with requests
# Handle the HTML with BeautifulSoup
soup = BeautifulSoup(response.text, 'lxml')
# Get the list of archive pages from the pager
lsPageDiv = soup.find('div', class_="calendar", id="pager-top")
lsPage = lsPageDiv.find_all('a')
# Print the list
#for l in lsPage:
#    print(l['href'])

# Count the
# <li class="archive archive-section"> elements
# inside <div class="day">
# (note: find() returns only the first matching <div class="day">)
lsSectionsDiv = soup.find('div', class_="day")
lsSections = lsSectionsDiv.find_all('li', class_="archive archive-section")
# Number of elements
cnt = len(lsSections)
print(cnt)
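Two things about that code are worth a closer look. `soup.find()` returns only the first `<div class="day">`, so if an archive page has several, only the first one's sections get counted; `soup.select()` with a CSS selector collects them across all of them. And the pager links in `lsPage` are presumably the other archive pages still to visit. The sketch below combines both, under some assumptions: the class names and `pager-top` id are taken from the code above, the pager on the first page is assumed to link every other archive page, and `count_sections`, `page_urls`, `count_all` and the `fetch` parameter are hypothetical names. `fetch` is passed in so the demo can run offline against made-up pages.

```python
from __future__ import annotations

from typing import Callable
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def count_sections(soup: BeautifulSoup) -> int:
    # Every <li class="archive archive-section"> inside any <div class="day">
    return len(soup.select("div.day li.archive.archive-section"))


def page_urls(soup: BeautifulSoup, base_url: str) -> list[str]:
    # Absolute URLs of the links in the pager block (same lookup as the code above)
    pager = soup.find("div", class_="calendar", id="pager-top")
    if pager is None:
        return []
    return [urljoin(base_url, a["href"]) for a in pager.find_all("a", href=True)]


def count_all(start_url: str, fetch: Callable[[str], str]) -> int:
    # fetch(url) must return the page's HTML as text
    first = BeautifulSoup(fetch(start_url), "html.parser")
    total = count_sections(first)
    seen = {start_url}
    for url in page_urls(first, start_url):
        if url in seen:  # skip the current page and duplicate links
            continue
        seen.add(url)
        total += count_sections(BeautifulSoup(fetch(url), "html.parser"))
    return total


# Made-up pages so the demo runs without network access
SAMPLE_PAGES = {
    "http://example.com/archive": (
        '<div class="calendar" id="pager-top">'
        '<a href="/archive">1</a><a href="/archive?page=2">2</a></div>'
        '<div class="day"><ul>'
        '<li class="archive archive-section">a</li>'
        '<li class="archive archive-section">b</li></ul></div>'
        '<div class="day"><ul>'
        '<li class="archive archive-section">c</li></ul></div>'
    ),
    "http://example.com/archive?page=2": (
        '<div class="day"><ul>'
        '<li class="archive archive-section">d</li></ul></div>'
    ),
}

first = BeautifulSoup(SAMPLE_PAGES["http://example.com/archive"], "html.parser")
# The find() approach only sees the first <div class="day">
print(len(first.find("div", class_="day").find_all("li", class_="archive archive-section")))  # 2
print(count_sections(first))                                              # 3
print(count_all("http://example.com/archive", SAMPLE_PAGES.__getitem__))  # 4
```

For the real blog you'd pass something like `lambda u: requests.get(u, timeout=10).text` as `fetch`. If the pager only shows "next"/"previous" links rather than every page, you'd have to keep following links from each fetched page instead.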