Discussion:
[Rails] text visible in browser but not in source
fugee ohu
2018-11-07 15:35:00 UTC
Permalink
I'm not very good with the developer consoles in Chrome and Firefox, but I
couldn't find the text I was looking for in the page source, even though it
seems to be displayed as text: the cursor changes to a vertical line on
mouse-over. I found the HTML below in the source. How does this HTML create
the text that is displayed?

<div class="ui-box product-description-main" id="j-product-description">
  <div class="ui-box-title">Product Description</div>
  <div class="ui-box-body">
    <div class="description-content" data-role="description" data-spm="1000023">
      <div class="loading32"></div>
    </div>
  </div>
</div>
--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-talk+***@googlegroups.com.
To post to this group, send email to rubyonrails-***@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/8e0eb26a-517a-4216-bb9c-8bd05e4412a5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Colin Law
2018-11-07 16:00:40 UTC
Permalink
I should think that javascript is involved. I am sure you asked a
similar question before when you were trying to scrape a website and
couldn't find the text in the html.

Colin
fugee ohu
2018-11-07 16:17:02 UTC
Permalink
Yes, but in that context, with JavaScript involved, how does it happen that
the text I'm viewing in the browser isn't visible in the source?
Colin Law
2018-11-07 16:30:07 UTC
Permalink
It isn't in the source; the DOM is updated using JavaScript. You
should see it in the DOM inspector but not in the source.

Colin
Jake Niemiec
2018-11-07 16:33:15 UTC
Permalink
The ui-box class suggests that it may be a React component:
https://github.com/segmentio/ui-box

React components run client-side, meaning the text you are looking for
is inserted into the document after the page's <script> tags have run. I
would take a look at the Sources tab in Chrome; you can find all the loaded
scripts there.
fugee ohu
2018-11-08 06:09:28 UTC
Permalink
Thanks. Can you point me to a brief tutorial that shows how to get React to
render the content?
Colin Law
2018-11-08 08:40:25 UTC
Permalink
Open it in a browser, that's what browsers do.

Note there may well be successive requests back to the server to get
the data you are looking for. Look at the Network tab in the browser
developer tools and you may see the call that fetches it.
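
A minimal sketch of replaying such a follow-up request from Ruby once it has been spotted in the Network tab. The URL below is a placeholder on example.com, not the site's real endpoint, and the `Accept` header is an assumption; the real values would be copied from the request shown in the developer tools. The helper names `build_request` and `fetch_body` are made up for this illustration.

```ruby
require "net/http"
require "uri"

# Build a GET request that mimics one observed in the Network tab.
# `url` and `headers` are whatever the developer tools showed for that call.
def build_request(url, headers = {})
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  headers.each { |name, value| request[name] = value }
  [uri, request]
end

# Send the request and return the response body as a string.
# Raises on non-2xx responses instead of quietly returning an error page.
def fetch_body(url, headers = {})
  uri, request = build_request(url, headers)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.value # raises unless the status is 2xx
  response.body
end

# Hypothetical endpoint, for illustration only.
EXAMPLE_URL = "https://example.com/product/description.json?id=123"

# Inspect the request without sending it.
_uri, request = build_request(EXAMPLE_URL, "Accept" => "application/json")
puts request.method
puts request.path
puts request["Accept"]
```

Calling `fetch_body` with the real URL and headers would return the raw response text, which can then be parsed separately.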

Colin
fugee ohu
2018-11-08 22:53:08 UTC
Permalink
I was able to find the text that wasn't shown in the source by opening the
console and expanding the ui-box div.
fugee ohu
2018-11-09 23:22:44 UTC
Permalink
So far I'm trying to get as far as the table, the last element shown below.
doc.at_css("div#j-product-description div.ui-box-body div.description-content")
gets me back the div class="description-content" element, but
doc.at_css("div#j-product-description div.ui-box-body div.description-content div.origin-part")
returns nil. There's a lot inside kse:widget that I'm not including here.

<div class="ui-box product-description-main" id="j-product-description"
data-widget-cid="widget-27">
<div class="ui-box-title">Product Description</div>
<div class="ui-box-body">
<div class="description-content" data-role="description"
data-spm="1000023"><div class="origin-part"><p> <br> <br> <br> &nbsp; </p>
<kse:widget data-widget-type="relatedProduct" id="24226336" title="TOP"
type="relation">...</kse:widget>
<table border="2">
Walter Lee Davis
2018-11-10 15:34:32 UTC
Permalink
It seems to me that you are going to have to identify the data source that the in-page JavaScript is using to generate the dynamic table data, and query that rather than trying to work everything out from the HTML (which is just a template for the in-page script to fill). There's probably a JSON URL somewhere that is being loaded into the page, and the script is building from that. This entire approach is pretty fraught with peril, though, because (like any scraping project, only more so) any change to the scheme that the site's developer chooses to implement will break your scraper immediately.

Following this path is going to force you to learn about how the site is working on a code level -- and to figure out how they go from data to presentation.

Another approach might be to use a headless browser on the server to construct a "real" DOM of the page, and query that. To be clear -- I do not recommend you follow this path -- I am noting it here to illustrate how ridiculous this effort will be.

One way to visualize this difference is to use the Web Inspector in Safari or Chrome to look at the differences between the raw HTML (Safari labels this tab "Resources") and the DOM (Safari calls this "Elements"). There is likely very little in common outside of the overall outline, if the page is changing as dramatically as you describe. If you hunt through the Resources tab (in Safari) you may find a link to a JSON file that is being required into the page. Loading that URL, rather than the HTML, may give you a much cleaner set of data (which you can parse directly using Ruby) rather than trying to execute JS on your server in order to construct an HTML DOM that you can parse with Nokogiri.
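
A small sketch of the "parse directly using Ruby" idea, using only the standard library. The payload shape here (a `description` object with `title` and `rows` keys) is invented for illustration; the real response, if there is one, will have its own structure, which would have to be read off the Network or Resources tab. The helper name `description_rows` is likewise hypothetical.

```ruby
require "json"

# Hypothetical payload, standing in for whatever the page's script loads.
SAMPLE_JSON = <<~JSON
  {
    "description": {
      "title": "Product Description",
      "rows": [
        {"name": "Color", "value": "Black"},
        {"name": "Weight", "value": "1.2 kg"}
      ]
    }
  }
JSON

# Turn the JSON text into a plain Ruby hash of name => value pairs.
# Uses dig with a fallback so a missing key yields an empty hash
# rather than a NoMethodError on nil.
def description_rows(json_text)
  data = JSON.parse(json_text)
  rows = data.dig("description", "rows") || []
  rows.to_h { |row| [row["name"], row["value"]] }
end

description_rows(SAMPLE_JSON).each do |name, value|
  puts "#{name}: #{value}"
end
```

Working from the data this way avoids depending on the page's markup, although it still breaks if the site changes the shape of its response.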

Walter
fugee ohu
2018-11-10 17:22:52 UTC
Permalink
It wasn't shown in the source, but when I expanded the element recursively
in Chrome developer tools I saw the text I was looking for. So what's that
going to be worth?
Colin Law
2018-11-10 17:25:48 UTC
Permalink
As has been said a number of times that will be because it was filled
in by javascript, probably as a result of further calls to the server.

Colin
fugee ohu
2018-11-10 21:28:22 UTC
Permalink
Would using a headless browser be cheating?
Colin Law
2018-11-10 21:57:02 UTC
Permalink
Have you done what I suggested and looked in the browser developer
tools at the Network tab? Then you will see if it fetches any further
data after the initial page fetch. Very often you will find it
fetching some json which will very likely contain the data you are
looking for.

Colin
fugee ohu
2018-11-10 22:46:45 UTC
Permalink
In Chrome developer tools under Network->All there are a lot of
config.json?key=.... requests, but under I don't see anything
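
When the request list is too long to check by hand, one option is to export it from the Network tab as a HAR file (saved with content) and search the response bodies for the text that is visible on the page. The sketch below assumes the standard HAR layout (log -> entries -> request.url and response.content.text); the function name is made up for illustration.

```ruby
require "json"

# Return the URLs of all requests in a HAR export whose response body
# contains the given text.
def urls_containing(har_text, needle)
  entries = JSON.parse(har_text).dig("log", "entries") || []
  matches = entries.select do |entry|
    content = entry.dig("response", "content") || {}
    body = content["text"].to_s
    # HAR marks base64-encoded bodies with "encoding": "base64"; decode first.
    body = body.unpack1("m") if content["encoding"] == "base64"
    # Compare as raw bytes so mixed string encodings cannot raise.
    body.b.include?(needle.b)
  end
  matches.map { |entry| entry.dig("request", "url") }
end

# Usage (hypothetical file name and search text):
#   puts urls_containing(File.read("page.har"), "some text seen on the page")
```

Any URL this prints is a candidate for the request that delivers the missing text. If it prints nothing, the text may be assembled by a script from several pieces rather than sent in one response.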
fugee ohu
2018-11-10 23:29:36 UTC
Permalink
Yes, there are a few scripts running on the page.
fugee ohu
2018-11-11 04:27:39 UTC
Permalink
I found this under Network->Initiator. The Name column to the left of
Initiator is very long. I got these by mousing over one of the Initiators:

send @ package.a6067778.js:3
ajax @ package.a6067778.js:3
_init @ main-detail-v170105.f7571793.js:24
setup @ main-detail-v170105.f7571793.js:24
initialize @ package.a6067778.js:4
o @ package.a6067778.js:4
getTrendingProductFun @ main-detail-v170105.f7571793.js:25
onRouse @ main-detail-v170105.f7571793.js:25
_rouse @ main-detail-v170105.f7571793.js:13
_activate @ main-detail-v170105.f7571793.js:13
activateAll @ main-detail-v170105.f7571793.js:13
(anonymous) @ main-detail-v170105.f7571793.js:13
(anonymous) @ ZQ8V5-LYZLD-DEX8D-M5HHU-ERM4X:16