Banana Republic is an American retailer of clothing and accessories owned by the American multinational Gap Inc. The company was founded in 1978 by Mel and Patricia Ziegler and was acquired by Gap in 1983, which helped it grow. Today the company has more than 600 stores worldwide. This product scraper will help you collect product information from the bananarepublic.gap.com online store.
Approximate number of products: 20000
Approximate number of requests: 20000
Recommended subscription plan: X-Small
ATTENTION! The number of requests may exceed the number of products, because data on variations, images, etc. may be scraped through requests to additional resources. Some product data may also be delivered via XHR requests, which further increases the total number of requests required.
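A rough estimate makes the point concrete. This Python sketch assumes the request pattern of the digger script below: one request per page of the productSearch API plus one request per product page. `extra_per_product` stands in for any additional XHR or variation requests. The function name and all input numbers are hypothetical, not figures measured on the site.

```python
def estimate_requests(products: int, listing_pages: int,
                      category_pages: int = 0, extra_per_product: int = 0) -> int:
    """Rough request count for one crawl: category pages + search API pages
    + one product page per product (+ any extra requests per product)."""
    counts = (products, listing_pages, category_pages, extra_per_product)
    if any(c < 0 for c in counts):
        raise ValueError("all counts must be non-negative")
    return category_pages + listing_pages + products * (1 + extra_per_product)


# Hypothetical inputs: 20000 products spread over 400 search API pages.
print(estimate_requests(20000, 400))
# The same catalogue if every product needed one extra XHR request.
print(estimate_requests(20000, 400, extra_per_product=1))
```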
To use this scraper, you need an account with our Diggernaut service.
- Register an account with the Diggernaut service
- After registering and confirming your email address, log in to your account
- Create a project with any name and description; if you don't know how, see our documentation
- Open the newly created project and create a digger in it with any name; if you don't know how, see our documentation
- Copy the digger script below to the clipboard and paste it into the digger you created; if you don't know how, see our documentation
- Switch the digger mode from Debug to Active; if you don't know how, see our documentation
- Run your digger and wait for it to finish; if you don't know how, see our documentation
- Download the collected dataset in the format you need; if you don't know how, see our documentation
You can then set up a schedule to run your scraper and collect the data regularly.
Scraper script:
---
config:
debug: 2
agent: Firefox
do:
- walk:
to: http://bananarepublic.gap.com/
do:
- find:
path: ul.brnavigation-brol>li>a
do:
- parse:
attr: href
- space_dedupe
- trim
- if:
match: \/browse\/
do:
- normalize:
routine: url
- link_add:
pool: main
- walk:
to: links
pool: main
do:
- find:
path: .sidebar-navigation
slice: 0
do:
- node_remove: h1
- sequence:
header: h2
selector: h2,div
- find:
path: div.sequence
do:
- variable_clear: catname
- find:
path: h2
do:
- parse
- space_dedupe
- trim
- variable_set: catname
- find:
path: .sidebar-navigation--category--link
do:
- pool_clear: pager
- parse:
attr: href
filter:
- cid=(.+)
- variable_set: cid
- register_set: http://bananarepublic.gap.com/resources/productSearch/v1/search?cid=<%cid%>&locale=en_US&isFacetsEnabled=true
- link_add:
pool: pager
- walk:
to: links
pool: pager
do:
- variable_clear: ptot
- find:
path: pageNumberTotal
do:
- parse
- if:
match: (^\s*[0-1]\s*$)
else:
- variable_set: ptot
- find:
path: pageNumberRequested
do:
- parse
- if:
match: (^\s*0\s*$)
do:
- variable_get: ptot
- if:
match: (\d)
do:
- if:
gt: 1
do:
- eval:
routine: js
body: '(function (){var r = ""; for (var i = 1; i<<%ptot%>; i++){r += ""+i+""}; return r;})();'
- to_block
- find:
path: div
do:
- parse
- variable_set: pageid
- register_set: http://bananarepublic.gap.com/resources/productSearch/v1/search?cid=<%cid%>&locale=en_US&pageId=<%pageid%>&isFacetsEnabled=true
- link_add:
pool: pager
- find:
path: productCategory > name
do:
- parse
- space_dedupe
- trim
- variable_set: catname2
- find:
path: productCategory > childProducts
do:
- find:
path: parentBusinessCatalogItemId
do:
- parse
- if:
match: (\S)
do:
- variable_set: pid
- register_set: http://bananarepublic.gap.com/browse/product.do?pid=<%pid%>&cid=<%cid%>
- walk:
to: value
do:
- variable_clear: isP
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- variable_set:
field: isP
value: 1
- find:
path: html
do:
- variable_get: isP
- if:
match: (1)
do:
- object_new: product
- find:
path: head
do:
- eval:
routine: js
body: '(function (){var d = new Date(); return d.toISOString()})();'
- object_field_set:
object: product
field: date
- static_get: url
- object_field_set:
object: product
field: url
- register_set: 'Banana Republic'
- object_field_set:
object: product
field: brand
- find:
path: meta[name="keywords"]
do:
- parse:
attr: content
- object_field_set:
object: product
field: description
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- parse:
filter:
- gap\.currentBrand\s*=\s*\"(.+)\"\;
- if:
match: (\S)
do:
- object_field_set:
object: product
field: brand
- parse
- normalize:
routine: replace_substring
args:
var\s*gap\s*=\s*window\.gap\s*\|\|\s*\{\s*\}\;: ''
gap\.pageProductData\s*=\s*: ''
\s*;\s*gap.currentBrand\s*=\s*.*\;: ''
- normalize:
routine: json2xml
- to_block
- find:
path: productimages
do:
- parse:
format: html
- variable_set: imghtml
- find:
path: variants > productstylecolors > productstylecolorimages
do:
- parse
- normalize:
routine: lower
- variable_set: imgpath
- register_set: <%imghtml%>
- to_block
- find:
path: safe_<%imgpath%>
do:
- variable_clear: getit
- find:
path: xlarge
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: large
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: medium
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: small
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- find:
path: body_safe > variants > productstylecolors > colorname
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: variations
joinby: "|"
- find:
path: body_safe > name
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: name
- find:
path: body_safe > currentmaxprice, body_safe > currentminprice
do:
- parse:
filter:
- (\d+\.?\d*)
- if:
match: (\d+)
do:
- object_field_set:
object: product
field: price
type: float
- register_set: USD
- object_field_set:
object: product
field: currency
- find:
path: styleid
slice: 0
do:
- parse
- object_field_set:
object: product
field: sku
- variable_set: sid
- find:
path: body
do:
- find:
path: '#topNavWrapper a[class*=_selected]'
do:
- parse
- space_dedupe
- trim
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname2
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- object_save:
name: product
- find:
path: productCategory > childCategories
do:
- variable_clear: catname3
- find:
path: name
slice: 0
do:
- parse
- space_dedupe
- trim
- variable_set: catname3
- find:
path: parentBusinessCatalogItemId
do:
- parse
- if:
match: (\S)
do:
- variable_set: pid
- register_set: http://bananarepublic.gap.com/browse/product.do?pid=<%pid%>&cid=<%cid%>
- walk:
to: value
do:
- variable_clear: isP
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- variable_set:
field: isP
value: 1
- find:
path: html
do:
- variable_get: isP
- if:
match: (1)
do:
- object_new: product
- find:
path: head
do:
- eval:
routine: js
body: '(function (){var d = new Date(); return d.toISOString()})();'
- object_field_set:
object: product
field: date
- static_get: url
- object_field_set:
object: product
field: url
- register_set: 'Banana Republic'
- object_field_set:
object: product
field: brand
- find:
path: meta[name="keywords"]
do:
- parse:
attr: content
- object_field_set:
object: product
field: description
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- parse:
filter:
- gap\.currentBrand\s*=\s*\"(.+)\"\;
- if:
match: (\S)
do:
- object_field_set:
object: product
field: brand
- parse
- normalize:
routine: replace_substring
args:
var\s*gap\s*=\s*window\.gap\s*\|\|\s*\{\s*\}\;: ''
gap\.pageProductData\s*=\s*: ''
\s*;\s*gap.currentBrand\s*=\s*.*\;: ''
- normalize:
routine: json2xml
- to_block
- find:
path: productimages
do:
- parse:
format: html
- variable_set: imghtml
- find:
path: variants > productstylecolors > productstylecolorimages
do:
- parse
- normalize:
routine: lower
- variable_set: imgpath
- register_set: <%imghtml%>
- to_block
- find:
path: safe_<%imgpath%>
do:
- variable_clear: getit
- find:
path: xlarge
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: large
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: medium
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: small
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- find:
path: body_safe > variants > productstylecolors > colorname
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: variations
joinby: "|"
- find:
path: body_safe > name
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: name
- find:
path: body_safe > currentmaxprice, body_safe > currentminprice
do:
- parse:
filter:
- (\d+\.?\d*)
- if:
match: (\d+)
do:
- object_field_set:
object: product
field: price
type: float
- register_set: USD
- object_field_set:
object: product
field: currency
- find:
path: styleid
slice: 0
do:
- parse
- object_field_set:
object: product
field: sku
- variable_set: sid
- find:
path: body
do:
- find:
path: '#topNavWrapper a[class*=_selected]'
do:
- parse
- space_dedupe
- trim
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname2
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname3
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- object_save:
name: product
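Three steps of the script above are easier to follow when restated in plain Python. This is only an illustration under stated assumptions, not Diggernaut's engine:

- `listing_urls` mirrors how the `eval` step queues `pageId` values 1 through `pageNumberTotal - 1` once the first productSearch page reports more than one page.
- `extract_product_json` applies the three `replace_substring` patterns in order, assuming Python's `re` treats them the same way. It then parses the remainder as JSON, whereas the script converts it with `json2xml`.
- `best_image` and `collect_images` reproduce the xlarge, large, medium, small image fallback.

The function names, the dictionary shape of an image set, and resolving relative image URLs against the site root are all hypothetical.

```python
import json
import re
from typing import Iterable, List, Mapping, Optional
from urllib.parse import urljoin

# Endpoint and query layout copied from the digger's register_set steps.
SEARCH_URL = ("http://bananarepublic.gap.com/resources/productSearch/v1/search"
              "?cid={cid}&locale=en_US{page}&isFacetsEnabled=true")
SITE_ROOT = "http://bananarepublic.gap.com/"  # assumed base for relative image URLs

# The three patterns removed by the replace_substring step, in the same order.
WRAPPER_PATTERNS = (
    r"var\s*gap\s*=\s*window\.gap\s*\|\|\s*\{\s*\}\;",
    r"gap\.pageProductData\s*=\s*",
    r"\s*;\s*gap.currentBrand\s*=\s*.*\;",
)
IMAGE_SIZES = ("xlarge", "large", "medium", "small")  # largest first


def listing_urls(cid: str, page_total: int) -> List[str]:
    """First request carries no pageId; pages 1..page_total-1 are queued after it."""
    urls = [SEARCH_URL.format(cid=cid, page="")]
    # Same bounds as the JS loop in the eval step: for (i = 1; i < ptot; i++)
    for i in range(1, page_total):
        urls.append(SEARCH_URL.format(cid=cid, page=f"&pageId={i}"))
    return urls


def extract_product_json(script_text: str) -> dict:
    """Strip the JS assignments around gap.pageProductData and parse the rest as JSON."""
    text = script_text
    for pattern in WRAPPER_PATTERNS:
        text = re.sub(pattern, "", text)
    return json.loads(text)


def best_image(image_set: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty URL in xlarge, large, medium, small order, made absolute."""
    for size in IMAGE_SIZES:
        url = (image_set.get(size) or "").strip()
        if url:
            return urljoin(SITE_ROOT, url)
    return None


def collect_images(image_sets: Iterable[Mapping[str, str]]) -> str:
    """One URL per image set, joined with '|' like the images field in the export."""
    urls = (best_image(s) for s in image_sets)
    return "|".join(u for u in urls if u)


if __name__ == "__main__":
    for url in listing_urls("48422", 3):
        print(url)
    script = ('var gap = window.gap || {}; gap.pageProductData = {"name": "Shirt"};'
              ' gap.currentBrand = "banana-republic";')
    print(extract_product_json(script))
    print(collect_images([{"xlarge": "", "large": "/webcontent/a.jpg"}]))
```

In Python the size fallback collapses into one loop over `IMAGE_SIZES`. The digger expresses the same order with the `getit` flag and nested `if`/`else` blocks.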
Below is a sample dataset with several products in JSON format (for illustration). The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using a template-based approach.
[{
"product": {
"brand": "banana-republic",
"category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
"currency": "USD",
"date": "2017-12-06T20:24:20.440Z",
"description": "Riley-Fit Stain-Resistant Super-Stretch Shirt, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
"images": "http://bananarepublic.gap.com/webcontent/0013/731/030/cn13731030.jpg|http://bananarepublic.gap.com/webcontent/0013/787/545/cn13787545.jpg|http://bananarepublic.gap.com/webcontent/0013/787/550/cn13787550.jpg|http://bananarepublic.gap.com/webcontent/0013/731/030/cn13731030.jpg|http://bananarepublic.gap.com/webcontent/0013/787/545/cn13787545.jpg|http://bananarepublic.gap.com/webcontent/0013/787/550/cn13787550.jpg",
"name": "Riley-Fit Stain-Resistant Super-Stretch Shirt",
"price": 88,
"sku": "875959",
"url": "http://bananarepublic.gap.com/browse/product.do?pid=875959&cid=48422",
"variations": "White|White"
}
}
,{
"product": {
"brand": "banana-republic",
"category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
"currency": "USD",
"date": "2017-12-06T20:24:22.345Z",
"description": "Pearl Print Tie-Back Dress, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
"images": "http://bananarepublic.gap.com/webcontent/0014/333/311/cn14333311.jpg|http://bananarepublic.gap.com/webcontent/0014/511/681/cn14511681.jpg|http://bananarepublic.gap.com/webcontent/0014/511/700/cn14511700.jpg|http://bananarepublic.gap.com/webcontent/0014/501/794/cn14501794.jpg|http://bananarepublic.gap.com/webcontent/0014/333/311/cn14333311.jpg|http://bananarepublic.gap.com/webcontent/0014/511/681/cn14511681.jpg|http://bananarepublic.gap.com/webcontent/0014/511/700/cn14511700.jpg|http://bananarepublic.gap.com/webcontent/0014/501/794/cn14501794.jpg",
"name": "Pearl Print Tie-Back Dress",
"price": 128,
"sku": "878840",
"url": "http://bananarepublic.gap.com/browse/product.do?pid=878840&cid=48422",
"variations": "Navy|Navy"
}
}
,{
"product": {
"brand": "banana-republic",
"category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
"currency": "USD",
"date": "2017-12-06T20:24:23.316Z",
"description": "Stripe Pajama-Style Shirt with Piping, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
"images": "http://bananarepublic.gap.com/webcontent/0014/388/402/cn14388402.jpg|http://bananarepublic.gap.com/webcontent/0014/556/204/cn14556204.jpg|http://bananarepublic.gap.com/webcontent/0014/556/192/cn14556192.jpg|http://bananarepublic.gap.com/webcontent/0014/388/402/cn14388402.jpg|http://bananarepublic.gap.com/webcontent/0014/556/204/cn14556204.jpg|http://bananarepublic.gap.com/webcontent/0014/556/192/cn14556192.jpg",
"name": "Stripe Pajama-Style Shirt with Piping",
"price": 88,
"sku": "887053",
"url": "http://bananarepublic.gap.com/browse/product.do?pid=887053&cid=48422",
"variations": "Navy|Navy"
}
}
,{
"product": {
"brand": "banana-republic",
"category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
"currency": "USD",
"date": "2017-12-06T20:24:24.239Z",
"description": "Zero Gravity Dixie Wash Skinny Ankle Jean, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
"images": "http://bananarepublic.gap.com/webcontent/0013/683/975/cn13683975.jpg|http://bananarepublic.gap.com/webcontent/0013/684/197/cn13684197.jpg|http://bananarepublic.gap.com/webcontent/0013/745/912/cn13745912.jpg|http://bananarepublic.gap.com/webcontent/0013/683/975/cn13683975.jpg|http://bananarepublic.gap.com/webcontent/0013/684/197/cn13684197.jpg|http://bananarepublic.gap.com/webcontent/0013/745/912/cn13745912.jpg",
"name": "Zero Gravity Dixie Wash Skinny Ankle Jean",
"price": 110,
"sku": "874720",
"url": "http://bananarepublic.gap.com/browse/product.do?pid=874720&cid=48422",
"variations": "Indigo|Indigo"
}
}]
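Fields written with `joinby: "|"` (`category`, `images`, `variations`) arrive as single pipe-delimited strings. In the sample above some values repeat, for example `"variations": "White|White"`. If you only need distinct values, post-processing could look like the minimal Python sketch below. It assumes the JSON export has exactly the shape shown above (a list of objects with a `product` key); the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Fields the digger writes with joinby: "|"
PIPE_FIELDS = ("category", "images", "variations")


def split_unique(value: str) -> List[str]:
    """Split a '|'-joined value, dropping empty parts and repeats (first occurrence wins)."""
    return list(dict.fromkeys(part for part in value.split("|") if part))


def load_products(raw: str) -> List[Dict[str, Any]]:
    """Parse the JSON export (a list of {"product": {...}} records) into flat dicts."""
    products = []
    for record in json.loads(raw):
        product = dict(record["product"])
        for field in PIPE_FIELDS:
            if isinstance(product.get(field), str):
                product[field] = split_unique(product[field])
        products.append(product)
    return products


if __name__ == "__main__":
    # Trimmed copy of the first record above (only a few fields kept).
    raw = ('[{"product": {"sku": "875959", "price": 88, '
           '"variations": "White|White", "category": "Women|what\'s new|new arrivals"}}]')
    for p in load_products(raw):
        print(p["sku"], p["price"], p["variations"], p["category"])
```

Note that de-duplicating `category` would also collapse two levels that happen to share a name. Drop it from `PIPE_FIELDS` if that matters for your data.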