{"data":{"site":{"siteMetadata":{"title":"コリログ","author":"コリ"}},"markdownRemark":{"id":"b144771a-786a-5338-971b-8d73cd0599d2","excerpt":"I want to scrape Scraping is handy when, for example, you want to collect information from a website and tinker with it as CSV. This post studies BeautifulSoup, a library commonly used for scraping in Python.\nBeautifulSoup has many convenient features, and this post summarizes them. tag…","htmlAst":{"type":"root","children":[{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"I want to scrape"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Scraping is handy when, for example, you want to collect information from a website and tinker with it as CSV."},{"type":"element","tagName":"br","properties":{},"children":[]},{"type":"text","value":"\nThis post studies BeautifulSoup, a library commonly used for scraping in Python.\nBeautifulSoup has many convenient features, and this post summarizes them."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Display the tag name"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To display a tag's name with BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.title.name)"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints the name of the tag, which here is title."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Display the string inside the title tag"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To display the string enclosed by the title tag with BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[
{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.title.string)"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints the text inside the title tag."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Display the parent element of the title tag"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To display the name of the title tag's parent element with BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.title.parent.name)"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints the name of the element that directly contains the title tag."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Display the first <p> element"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To display the first <p> element together with its contents in BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.p)"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints only the first <p> element in the document, tags included."}]},{"typ
e":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Get the class names of the <p> tag"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To get the class names of the first <p> tag with BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.p['class'])"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints the class names as a list, because class is a multi-valued attribute."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Get the first <a> tag"}]},{"type":"element","tagName":"a","properties":{},"children":[{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To get the first <a> tag with BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.a)"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints the first <a> tag in the document."}]},{"type":"text","value":"\n"}]},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Get all <a> tags"}]},{"type":"ele
ment","tagName":"a","properties":{},"children":[{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To get every <a> tag with BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.find_all('a'))"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints a list of all <a> tags in the document."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Search by id"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To find an element by its id with BeautifulSoup, write:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"print(soup.find(id=\"link3\"))"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This prints the first element whose id is link3."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Complete code"}]},{"type":"text","value":"\n"},
{"type":"element","tagName":"div","properties":{"className":["gatsby-highlight"],"dataLanguage":"text"},"children":[{"type":"element","tagName":"pre","properties":{"className":["language-text"]},"children":[{"type":"element","tagName":"code","properties":{"className":["language-text"]},"children":[{"type":"text","value":"html_doc = \"\"\"\n<html><head><title>The Dormouse's story</title></head>\n<body>\n<p class=\"title\"><b>The Dormouse's story</b></p>\n\n<p class=\"story\">Once upon a time there were three little sisters; and their names were\n<a href=\"http://example.com/elsie\" class=\"sister\" id=\"link1\">Elsie</a>,\n<a href=\"http://example.com/lacie\" class=\"sister\" id=\"link2\">Lacie</a> and\n<a href=\"http://example.com/tillie\" class=\"sister\" id=\"link3\">Tillie</a>;\nand they lived at the bottom of a well.</p>\n\n<p class=\"story\">...</p>\n\"\"\"\n\nfrom bs4 import BeautifulSoup\n\n# Name the parser explicitly so bs4 does not have to guess one\nsoup = BeautifulSoup(html_doc, 'html.parser')\n\n# Display the tag name\nprint(soup.title.name)\n\n# Display the string inside the title tag\nprint(soup.title.string)\n\n# Display the name of the title tag's parent element\nprint(soup.title.parent.name)\n\n# Display the first <p> element\nprint(soup.p)\n# <p class=\"title\"><b>The Dormouse's story</b></p>\n\n# Get the class names of the first <p> tag\nprint(soup.p['class'])\n\n# Get the first <a> tag\nprint(soup.a)\n\n# Get all <a> tags\nprint(soup.find_all('a'))\n\n# Search by id\nprint(soup.find(id=\"link3\"))"}]}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"Working sample"}]},{"type":"text","value":"\n"}]},{"type":"element","tagName":"p","properties":{},"children":[{"type":"element","tagName":"a","properties":{},"children":[]},{"type":"element","tagName":"a","properties":{"href":"https://colab.research.google.com/drive/1b8yzaGchdQKOwbA1QxQ7_hoVx3pmI12w?hl=ja#scrollTo=5-VAsYl4WkJ5"},"children":[{"type":"text","value":"Python3"}]}]}],"data":{"quirksMode":false}},
"fields":{"slug":"/python/python030/"},"frontmatter":{"title":"Introduction to BeautifulSoup: Checking How soup. Behaves [Python]","categoryName":"Python","categorySlug":"python","date":"02 28, 2019"}}},"pageContext":{"slug":"/python/python030/","previous":{"fields":{"slug":"/python/python031/"},"frontmatter":{"title":"Introduction to BeautifulSoup: Scraping in Practice, Up to Fetching the HTML [python]","categorySlug":"python"}},"next":{"fields":{"slug":"/python/python034/"},"frontmatter":{"title":"Introduction to pandas: How to Load Local Files in Google Colaboratory","categorySlug":"python"}}}}